Module Contents



Saves data from a specific MySQL query into a file in S3.




class airflow.providers.amazon.aws.transfers.mysql_to_s3.MySQLToS3Operator(*, query: str, s3_bucket: str, s3_key: str, replace: bool = False, mysql_conn_id: str = 'mysql_default', aws_conn_id: str = 'aws_default', verify: Optional[Union[bool, str]] = None, pd_csv_kwargs: Optional[dict] = None, index: bool = False, header: bool = False, file_format: typing_extensions.Literal['csv', 'parquet'] = 'csv', pd_kwargs: Optional[dict] = None, **kwargs)

Bases: airflow.models.BaseOperator

Saves data from a specific MySQL query into a file in S3.

  • query (str) -- the SQL query to execute. To execute a file, provide its absolute path, ending with the .sql extension. (templated)

  • s3_bucket (str) -- bucket where the data will be stored. (templated)

  • s3_key (str) -- desired key for the file. It includes the name of the file. (templated)

  • replace (bool) -- whether to replace the file in S3 if it already exists

  • mysql_conn_id (str) -- reference to the MySQL connection ID

  • aws_conn_id (str) -- reference to a specific S3 connection

  • verify (bool or str) --

    Whether to verify SSL certificates for the S3 connection. By default, SSL certificates are verified. You can provide the following values:

    • False: do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.

    • path/to/cert/bundle.pem: the filename of the CA cert bundle to use. Specify this if you want to use a different CA cert bundle than the one used by botocore.

  • pd_csv_kwargs (dict) -- arguments to pass to DataFrame.to_csv() (header, index, columns...)

  • index (bool) -- whether to include the dataframe index in the output file

  • header (bool) -- whether to include the header in the S3 file

  • file_format (str) -- the destination file format; only the strings 'csv' and 'parquet' are accepted.

  • pd_kwargs (dict) -- arguments to pass to DataFrame.to_parquet() or DataFrame.to_csv(). This is preferred over pd_csv_kwargs.
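
The parameters above can be put together as in the following minimal sketch. It assumes the class documented here is MySQLToS3Operator, importable from airflow.providers.amazon.aws.transfers.mysql_to_s3 in Amazon provider versions that ship this module; that import path is an assumption, not something this page states. The task id, bucket, key, and query are hypothetical values chosen for illustration, and build_export_task is a hypothetical helper name.

```python
# Keyword arguments matching the parameters documented above.
# Every value here (task id, query, bucket, key) is hypothetical.
EXPORT_ORDERS_KWARGS = {
    "task_id": "export_orders_to_s3",
    # query, s3_bucket and s3_key are templated, so Jinja expressions
    # such as {{ ds }} are rendered before the task runs.
    "query": "SELECT id, total FROM orders WHERE created_at >= '{{ ds }}'",
    "s3_bucket": "example-bucket",
    "s3_key": "exports/orders/{{ ds }}.csv",
    "replace": True,  # overwrite the key if it already exists
    "mysql_conn_id": "mysql_default",
    "aws_conn_id": "aws_default",
    "verify": None,  # default: SSL certificates are verified
    "file_format": "csv",
    "header": True,
    "index": False,
    # Extra arguments for DataFrame.to_csv(); preferred over pd_csv_kwargs.
    "pd_kwargs": {"sep": ";"},
}


def build_export_task(dag=None):
    """Instantiate the operator (requires the Amazon provider package)."""
    # Assumed import path; deferred so this module loads without Airflow.
    from airflow.providers.amazon.aws.transfers.mysql_to_s3 import (
        MySQLToS3Operator,
    )

    return MySQLToS3Operator(dag=dag, **EXPORT_ORDERS_KWARGS)
```

To write Parquet instead, set file_format to 'parquet' and pass DataFrame.to_parquet() arguments through pd_kwargs.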

template_fields :Sequence[str] = ['s3_bucket', 's3_key', 'query']
template_ext :Sequence[str] = ['.sql']
execute(self, context: airflow.utils.context.Context) -> None

This is the main method to override when creating an operator. The context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.
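
How template_fields and template_ext relate can be sketched with a simplified check. This is an illustration of the general Airflow convention, not the operator's actual code: a templated field whose value ends with one of the template_ext suffixes is treated as a path to a template file, and any other value is rendered as an inline template. The helper name is_template_file is hypothetical.

```python
# Values copied from the class attributes listed above.
TEMPLATE_FIELDS = ("s3_bucket", "s3_key", "query")
TEMPLATE_EXT = (".sql",)


def is_template_file(value: str) -> bool:
    """Return True if a templated value would be read as a template file.

    Simplified assumption: only the suffix of the string is checked.
    """
    # str.endswith accepts a tuple of suffixes.
    return value.endswith(TEMPLATE_EXT)


# A path ending in .sql is treated as a file to load and render.
print(is_template_file("/opt/airflow/sql/export_orders.sql"))
# An inline statement is rendered as a template string directly.
print(is_template_file("SELECT * FROM orders WHERE ds = '{{ ds }}'"))
```

This is why the query parameter accepts either an inline SQL statement or an absolute path to a .sql file.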
