airflow.providers.amazon.aws.transfers.mysql_to_s3

Module Contents

class airflow.providers.amazon.aws.transfers.mysql_to_s3.MySQLToS3Operator(*, query: str, s3_bucket: str, s3_key: str, mysql_conn_id: str = 'mysql_default', aws_conn_id: str = 'aws_default', verify: Optional[Union[bool, str]] = None, pd_csv_kwargs: Optional[dict] = None, index: bool = False, header: bool = False, **kwargs)[source]

Bases: airflow.models.BaseOperator

Saves data from a specific MySQL query into a file in S3.

Parameters
  • query (str) – the SQL query to execute. To run a query stored in a file, pass the absolute path to the file, which must end with the .sql extension. (templated)

  • s3_bucket (str) – bucket where the data will be stored. (templated)

  • s3_key (str) – key under which the file is stored, including the file name. (templated)

  • mysql_conn_id (str) – reference to a specific MySQL database

  • aws_conn_id (str) – reference to a specific S3 connection

  • verify (bool or str) –

    Whether or not to verify SSL certificates for S3 connection. By default SSL certificates are verified. You can provide the following values:

    • False: do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.

    • path/to/cert/bundle.pem: A filename of the CA cert bundle to use. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.

  • pd_csv_kwargs (dict) – keyword arguments to pass to the pandas DataFrame.to_csv call (header, index, columns…)

  • index (bool) – whether to include the DataFrame index in the S3 file

  • header (bool) – whether to include the header row in the S3 file
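
As a rough illustration of how the parameters above fit together, the sketch below collects keyword arguments for a daily export. The table name, bucket, key layout, separator, and the helper make_export_task are hypothetical; only the parameter names come from the signature above. The operator import is deferred into the helper so that the snippet can be loaded even where the Amazon provider is not installed.

```python
# Keyword arguments for a hypothetical daily export task.
# Table, bucket, and key names are placeholders, not values from the provider.
EXPORT_KWARGS = {
    "task_id": "orders_to_s3",
    # query, s3_bucket and s3_key are templated, so Jinja expressions
    # such as {{ ds }} are rendered when the task runs.
    "query": "SELECT id, total FROM orders WHERE order_date = '{{ ds }}'",
    "s3_bucket": "example-bucket",
    "s3_key": "exports/orders/{{ ds }}.csv",
    "mysql_conn_id": "mysql_default",
    "aws_conn_id": "aws_default",
    "header": True,  # write column names as the first row
    "index": False,  # leave the DataFrame index out of the file
    "pd_csv_kwargs": {"sep": ";"},  # extra options for the CSV writer
}


def make_export_task(dag=None):
    """Build the operator from EXPORT_KWARGS (hypothetical helper)."""
    # Imported lazily so this module loads without the provider installed.
    from airflow.providers.amazon.aws.transfers.mysql_to_s3 import MySQLToS3Operator

    return MySQLToS3Operator(dag=dag, **EXPORT_KWARGS)
```

Calling make_export_task(dag) inside a DAG definition would then register the task; to run a query kept in a file instead, query would hold the absolute path to a .sql file.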

template_fields = ['s3_bucket', 's3_key', 'query'][source]
template_ext = ['.sql'][source]
template_fields_renderers[source]
_fix_int_dtypes(self, df: pd.DataFrame)[source]

Mutate DataFrame to set dtypes for int columns containing NaN values.
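
The reason this helper is useful: pandas stores an integer column that contains missing values as float64, so an id of 1 would be written to the CSV as 1.0. The standalone function below is a minimal sketch of the idea, assuming pandas' nullable Int64 dtype is an acceptable target; it is not the provider's actual code, and fix_int_dtypes and the sample columns are hypothetical.

```python
import numpy as np
import pandas as pd


def fix_int_dtypes(df: pd.DataFrame) -> None:
    """Convert float columns that only hold whole numbers and NaN to Int64, in place."""
    for col in df.columns:
        series = df[col]
        # Only float columns with missing values are candidates.
        if pd.api.types.is_float_dtype(series.dtype) and series.hasnans:
            non_null = series.dropna().to_numpy()
            # Convert only if every non-null value is a whole number.
            if np.isclose(non_null, np.round(non_null)).all():
                # Int64 (capital I) is pandas' nullable integer dtype.
                df[col] = series.astype("Int64")


# "id" holds whole numbers plus a missing value; "price" holds real fractions.
df = pd.DataFrame({"id": [1.0, 2.0, np.nan], "price": [1.5, np.nan, 3.0]})
fix_int_dtypes(df)
print(df.dtypes)
```

After the call, the id column uses the nullable integer dtype while price stays float64, because its values are not whole numbers.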

execute(self, context)[source]
