airflow.contrib.operators.mysql_to_gcs

Module Contents

airflow.contrib.operators.mysql_to_gcs.PY3[source]
class airflow.contrib.operators.mysql_to_gcs.MySqlToGoogleCloudStorageOperator(sql, bucket, filename, schema_filename=None, approx_max_file_size_bytes=1900000000, mysql_conn_id='mysql_default', google_cloud_storage_conn_id='google_cloud_default', schema=None, delegate_to=None, export_format='json', field_delimiter=',', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Copy data from MySQL to Google Cloud Storage in JSON or CSV format.

The JSON data files generated are newline-delimited so that they can be loaded into BigQuery. Reference: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json#limitations. A minimal usage sketch follows the parameter list below.

Parameters
  • sql (str) – The SQL to execute on the MySQL table.

  • bucket (str) – The bucket to upload to.

  • filename (str) – The filename to use as the object name when uploading to Google Cloud Storage. A {} should be specified in the filename to allow the operator to inject file numbers in cases where the file is split due to size.

  • schema_filename (str) – If set, the filename to use as the object name when uploading a .json file containing the BigQuery schema fields for the table that was dumped from MySQL.

  • approx_max_file_size_bytes (long) – This operator supports the ability to split large table dumps into multiple files (see notes in the filename param docs above). Google Cloud Storage allows files to be a maximum of 4GB. This param allows developers to specify the file size of the splits.

  • mysql_conn_id (str) – Reference to a specific MySQL hook.

  • google_cloud_storage_conn_id (str) – Reference to a specific Google Cloud Storage hook.

  • schema (str or list) – The schema to use, if any. Should be a list of dict or a str. Pass a string if using a Jinja template; otherwise, pass a list of dict. Examples can be seen at: https://cloud.google.com/bigquery/docs/schemas#specifying_a_json_schema_file

  • delegate_to (str) – The account to impersonate, if any. For this to work, the service account making the request must have domain-wide delegation enabled.

  • export_format (str) – Desired format of files to be exported.

  • field_delimiter (str) – The delimiter to be used for CSV files.
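
For illustration, a task that dumps a table to newline-delimited JSON might be declared as follows (a minimal sketch; the SQL, bucket name, object names, and DAG settings are placeholders, not part of this module):

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator

    with DAG(dag_id='mysql_to_gcs_example',
             start_date=datetime(2019, 1, 1),
             schedule_interval=None) as dag:
        export_orders = MySqlToGoogleCloudStorageOperator(
            task_id='export_orders',
            sql='SELECT * FROM orders',                # placeholder query
            bucket='my-bucket',                        # placeholder bucket
            filename='exports/orders_{}.json',         # {} receives the split number
            schema_filename='schemas/orders.json',     # optional BigQuery schema object
            mysql_conn_id='mysql_default',
            google_cloud_storage_conn_id='google_cloud_default',
            export_format='json',
        )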

template_fields = ['sql', 'bucket', 'filename', 'schema_filename', 'schema'][source]
template_ext = ['.sql'][source]
ui_color = '#a0e08c'[source]
execute(self, context)[source]
_query_mysql(self)[source]

Queries MySQL and returns a cursor to the results.
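
A minimal sketch of what this query step amounts to, written as a standalone function and assuming the standard MySqlHook; the operator's actual implementation may differ:

    from airflow.hooks.mysql_hook import MySqlHook

    def query_mysql(mysql_conn_id, sql):
        # Open a connection through the configured Airflow connection and
        # return a live cursor positioned at the query results.
        hook = MySqlHook(mysql_conn_id=mysql_conn_id)
        conn = hook.get_conn()
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor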

_write_local_data_files(self, cursor)[source]

Takes a cursor and writes the results to one or more local files, splitting the output when it exceeds the configured size.

Returns

A dictionary where keys are filenames to be used as object names in GCS, and values are file handles to local files that contain the data for the GCS objects.
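
A rough sketch of how such a split-and-write step can work, written as a standalone function (type conversion of row values is simplified to default=str here; the real method is more involved):

    import json
    from tempfile import NamedTemporaryFile

    def write_local_data_files(cursor, filename_template, approx_max_file_size_bytes):
        # Stream rows into newline-delimited JSON temp files, starting a new
        # split whenever the current file grows past the size limit.
        columns = [field[0] for field in cursor.description]
        file_no = 0
        tmp_file = NamedTemporaryFile(delete=True)
        files_to_upload = {filename_template.format(file_no): tmp_file}
        for row in cursor:
            line = json.dumps(dict(zip(columns, row)), default=str) + '\n'
            tmp_file.write(line.encode('utf-8'))
            if tmp_file.tell() >= approx_max_file_size_bytes:
                file_no += 1
                tmp_file = NamedTemporaryFile(delete=True)
                files_to_upload[filename_template.format(file_no)] = tmp_file
        return files_to_upload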

_configure_csv_file(self, file_handle, schema)[source]

Configures a CSV writer on the given file_handle and writes the schema as the header row of the new file.
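
A minimal sketch of that CSV setup, assuming a text-mode file handle and the standard-library csv module (the operator itself may use a different CSV library):

    import csv

    def configure_csv_file(file_handle, schema, field_delimiter=','):
        # Wrap the open file handle in a csv writer and emit the column
        # names as the header row before any data rows are written.
        writer = csv.writer(file_handle, delimiter=field_delimiter)
        writer.writerow(schema)
        return writer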

_write_local_schema_file(self, cursor)[source]

Takes a cursor and writes the BigQuery schema for the results to a local file in .json format.

Returns

A dictionary where the key is a filename to be used as an object name in GCS, and the value is a file handle to a local file that contains the BigQuery schema fields in .json format.
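
A sketch of how such a schema file can be derived from the cursor, assuming a type_map callable like the one documented below; the indices into cursor.description follow the DB-API tuple (name, type code, ..., null_ok):

    import json
    from tempfile import NamedTemporaryFile

    def write_local_schema_file(cursor, schema_filename, type_map):
        # Build one schema entry per column, translating the MySQLdb type
        # code to a BigQuery type and marking nullable columns.
        schema = []
        for field in cursor.description:
            schema.append({
                'name': field[0],
                'type': type_map(field[1]),
                'mode': 'NULLABLE' if field[6] else 'REQUIRED',
            })
        tmp_schema_file = NamedTemporaryFile(delete=True)
        tmp_schema_file.write(json.dumps(schema).encode('utf-8'))
        return {schema_filename: tmp_schema_file}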

_upload_to_gcs(self, files_to_upload)[source]

Upload all of the file splits (and optionally the schema .json file) to Google Cloud Storage.
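
A minimal sketch of that upload loop, assuming the contrib GoogleCloudStorageHook; the MIME type below is an assumption for JSON exports:

    from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook

    def upload_to_gcs(files_to_upload, bucket,
                      google_cloud_storage_conn_id='google_cloud_default'):
        # Flush each temp file and push it to GCS under the object name it
        # was keyed by in files_to_upload.
        hook = GoogleCloudStorageHook(
            google_cloud_storage_conn_id=google_cloud_storage_conn_id)
        for object_name, file_handle in files_to_upload.items():
            file_handle.flush()
            hook.upload(bucket, object_name, file_handle.name, 'application/json')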

static _convert_types(schema, col_type_dict, row)[source]

Takes a value from MySQLdb and converts it to a value that's safe for JSON/Google Cloud Storage/BigQuery. Dates are converted to UTC seconds. Decimals are converted to floats. Binary type fields are encoded with base64, as imported BYTES data must be base64-encoded according to the BigQuery data type documentation: https://cloud.google.com/bigquery/data-types
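
A sketch of that per-value conversion, written as a standalone helper (the operator's own rules may cover more types):

    import base64
    import calendar
    import datetime
    from decimal import Decimal

    def convert_value(value):
        # Normalize a single MySQLdb value for JSON/GCS/BigQuery: dates and
        # datetimes become UTC epoch seconds, Decimals become floats, and
        # raw bytes are base64-encoded so BigQuery accepts them as BYTES.
        if isinstance(value, (datetime.datetime, datetime.date)):
            return calendar.timegm(value.timetuple())
        if isinstance(value, Decimal):
            return float(value)
        if isinstance(value, bytes):
            return base64.standard_b64encode(value).decode('ascii')
        return value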

_get_col_type_dict(self)[source]

Returns a dict mapping column name to column type, based on self.schema if it is not None.
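
A minimal sketch, assuming self.schema is a list of dicts with 'name' and 'type' keys as in the BigQuery schema examples linked above:

    def get_col_type_dict(schema):
        # e.g. [{'name': 'id', 'type': 'INTEGER'}, ...] -> {'id': 'INTEGER', ...}
        if not schema:
            return {}
        return {field['name']: field['type'] for field in schema}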

classmethod type_map(cls, mysql_type)[source]

Helper function that maps MySQL field types to BigQuery field types. Used when a schema_filename is set.
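
An illustrative sketch of such a mapping, using the MySQLdb field type constants; the exact table the operator uses may differ:

    from MySQLdb.constants import FIELD_TYPE

    def type_map(mysql_type):
        # Map a MySQLdb field type code to a BigQuery column type, falling
        # back to STRING for anything unrecognized.
        mapping = {
            FIELD_TYPE.TINY: 'INTEGER',
            FIELD_TYPE.SHORT: 'INTEGER',
            FIELD_TYPE.LONG: 'INTEGER',
            FIELD_TYPE.LONGLONG: 'INTEGER',
            FIELD_TYPE.INT24: 'INTEGER',
            FIELD_TYPE.FLOAT: 'FLOAT',
            FIELD_TYPE.DOUBLE: 'FLOAT',
            FIELD_TYPE.DECIMAL: 'FLOAT',
            FIELD_TYPE.TIMESTAMP: 'TIMESTAMP',
            FIELD_TYPE.DATETIME: 'TIMESTAMP',
            FIELD_TYPE.DATE: 'TIMESTAMP',
            FIELD_TYPE.BIT: 'INTEGER',
            FIELD_TYPE.BLOB: 'BYTES',
        }
        return mapping.get(mysql_type, 'STRING')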