airflow.providers.amazon.aws.transfers.dynamodb_to_s3
This module contains operators to replicate records from DynamoDB table to S3.
Module Contents
Classes
JSONEncoder | Custom JSON encoder implementation.
DynamoDBToS3Operator | Replicates records from a DynamoDB table to S3.
- class airflow.providers.amazon.aws.transfers.dynamodb_to_s3.JSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Bases: json.JSONEncoder
Custom JSON encoder implementation.
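As an illustration of why a custom encoder is useful here: boto3 returns DynamoDB numbers as decimal.Decimal, which the standard json module cannot serialize. The following is a minimal sketch of a json.JSONEncoder subclass, assuming Decimal handling is the goal. The class name DecimalEncoder is hypothetical and this is not necessarily how the provider's JSONEncoder is implemented.

```python
import json
from decimal import Decimal


class DecimalEncoder(json.JSONEncoder):
    """Hypothetical encoder: serializes Decimal values as floats."""

    def default(self, obj):
        # json calls default() only for objects it cannot serialize itself.
        if isinstance(obj, Decimal):
            return float(obj)
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


encoded = json.dumps({"price": Decimal("1.5")}, cls=DecimalEncoder)
```

Converting to float can lose precision for very large or very precise numbers; returning str(obj) instead is an alternative when exact values matter.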
- class airflow.providers.amazon.aws.transfers.dynamodb_to_s3.DynamoDBToS3Operator(*, dynamodb_table_name, s3_bucket_name, file_size, dynamodb_scan_kwargs=None, s3_key_prefix='', process_func=_convert_item_to_json_bytes, aws_conn_id='aws_default', **kwargs)
Bases: airflow.models.BaseOperator
Replicates records from a DynamoDB table to S3. It scans a DynamoDB table and writes the received records to a file on the local filesystem. It flushes the file to S3 once the file size reaches or exceeds the limit set by file_size.
Users can also specify filtering criteria using dynamodb_scan_kwargs to replicate only the records that satisfy them.
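The scan, write, and flush cycle described above can be sketched without any AWS dependency. This is a simplified illustration, not the operator's actual code: the names replicate, _ship, pages, and upload are hypothetical, pages stands in for the paginated scan results, upload stands in for the S3 upload, and checking the size once per page is an assumption.

```python
import os
import tempfile
from typing import Any, Callable, Iterable


def _ship(f, upload: Callable[[str], None]) -> None:
    """Close the file, hand it to `upload` if it is non-empty, then delete it."""
    f.close()
    if os.path.getsize(f.name) > 0:
        upload(f.name)
    os.remove(f.name)


def replicate(
    pages: Iterable[list[dict[str, Any]]],
    process_func: Callable[[dict[str, Any]], bytes],
    file_size: int,
    upload: Callable[[str], None],
) -> None:
    """Write items to a local temp file and flush it once it reaches file_size bytes."""
    f = tempfile.NamedTemporaryFile(delete=False)
    for page in pages:
        for item in page:
            f.write(process_func(item))
        f.flush()  # make sure the on-disk size reflects everything written
        if os.path.getsize(f.name) >= file_size:
            _ship(f, upload)
            f = tempfile.NamedTemporaryFile(delete=False)
    # Upload whatever is left after the last page.
    _ship(f, upload)
```

Because the size is only checked after a whole page is written, an uploaded file can be larger than file_size; the limit acts as a flush threshold rather than a hard cap.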
See also
For more information on how to use this operator, take a look at the guide: Amazon DynamoDB To Amazon S3 transfer operator
- Parameters
dynamodb_table_name (str) – DynamoDB table to replicate data from
s3_bucket_name (str) – S3 bucket to replicate data to
file_size (int) – Flush the file to S3 if its size >= file_size
dynamodb_scan_kwargs (dict[str, Any] | None) – Keyword arguments passed to DynamoDB.Table.scan: <https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Table.scan>
s3_key_prefix (str) – Prefix of the S3 object key
process_func (Callable[[dict[str, Any]], bytes]) – How to transform a DynamoDB item into bytes. By default, the item is dumped as JSON
aws_conn_id (str) – The Airflow connection used for AWS credentials. If this is None or empty, the default boto3 behaviour is used. When Airflow runs in a distributed manner, that default boto3 configuration must be maintained on each worker node.
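The parameters above might be combined as in the following sketch. The table name, bucket name, key prefix, task_id, filter attribute, and the helper names item_to_json_line and build_operator are all hypothetical, and treating file_size as a byte count is an assumption. The operator import sits inside the function only so the rest of the snippet can run without Airflow installed.

```python
import json
from decimal import Decimal
from typing import Any


def item_to_json_line(item: dict[str, Any]) -> bytes:
    """Hypothetical process_func: one JSON document per line, Decimals as strings."""
    return (json.dumps(item, default=str, sort_keys=True) + "\n").encode("utf-8")


# Keyword arguments forwarded to DynamoDB.Table.scan. "status" is a DynamoDB
# reserved word, so it is referenced through a name placeholder.
scan_kwargs = {
    "FilterExpression": "#s = :active",
    "ExpressionAttributeNames": {"#s": "status"},
    "ExpressionAttributeValues": {":active": "active"},
}


def build_operator():
    """Build the operator; requires the apache-airflow-providers-amazon package."""
    from airflow.providers.amazon.aws.transfers.dynamodb_to_s3 import (
        DynamoDBToS3Operator,
    )

    return DynamoDBToS3Operator(
        task_id="backup_orders",              # hypothetical task id
        dynamodb_table_name="orders",         # hypothetical table
        s3_bucket_name="my-backup-bucket",    # hypothetical bucket
        s3_key_prefix="dynamodb/orders/",     # hypothetical prefix
        file_size=20 * 1024 * 1024,           # assumed to be bytes (about 20 MiB)
        dynamodb_scan_kwargs=scan_kwargs,
        process_func=item_to_json_line,
        aws_conn_id="aws_default",
    )


line = item_to_json_line({"id": Decimal("1"), "name": "a"})
```

Writing one JSON document per line keeps each uploaded file readable by tools that expect newline-delimited JSON.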