airflow.providers.amazon.aws.transfers.dynamodb_to_s3

This module contains operators to replicate records from a DynamoDB table to S3.

Module Contents

Classes

DynamoDBToS3Operator

Replicates records from a DynamoDB table to S3.

class airflow.providers.amazon.aws.transfers.dynamodb_to_s3.DynamoDBToS3Operator(*, dynamodb_table_name, s3_bucket_name, file_size, dynamodb_scan_kwargs=None, s3_key_prefix='', process_func=_convert_item_to_json_bytes, aws_conn_id='aws_default', **kwargs)[source]

Bases: airflow.models.BaseOperator

Replicates records from a DynamoDB table to S3. It scans a DynamoDB table and writes the received records to a file on the local filesystem. It flushes the file to S3 once the file size reaches or exceeds the limit given by file_size.
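The buffer-and-flush behaviour described above can be illustrated with a small standalone sketch. It is not the operator's actual implementation: the names replicate, _ship and upload are hypothetical, the upload callback stands in for the S3 transfer, and shipping the final partial file at the end is an assumption made for this example.

```python
import os
import tempfile
from typing import Callable, Iterable


def _ship(handle, upload: Callable[[str], None]) -> None:
    """Close the temp file, pass it to `upload` if non-empty, then delete it."""
    handle.close()
    try:
        if os.path.getsize(handle.name) > 0:
            upload(handle.name)
    finally:
        os.remove(handle.name)


def replicate(records: Iterable[bytes], file_size: int,
              upload: Callable[[str], None]) -> None:
    """Append records to a local file; ship it whenever it reaches file_size bytes."""
    handle = tempfile.NamedTemporaryFile(delete=False)
    for record in records:
        handle.write(record)
        handle.flush()  # make the on-disk size reflect what was just written
        if os.path.getsize(handle.name) >= file_size:
            _ship(handle, upload)
            handle = tempfile.NamedTemporaryFile(delete=False)
    # Assumption for this sketch: the last, partially filled file is shipped too.
    _ship(handle, upload)


# Usage: five 4-byte records with a 10-byte limit; `upload` only records sizes.
shipped_sizes = []
replicate([b"abcd"] * 5, 10, lambda path: shipped_sizes.append(os.path.getsize(path)))
```

In this sketch the size check runs after each record is written, so a shipped file can be slightly larger than file_size.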

Users can also specify filtering criteria using dynamodb_scan_kwargs so that only records satisfying the criteria are replicated.
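As an illustration, a filter might be expressed with the standard DynamoDB scan parameters. The attribute name, placeholder names and values below are hypothetical; the dictionary is what would be passed as dynamodb_scan_kwargs, on the assumption that it is forwarded unchanged to the table scan.

```python
# Hypothetical filter: replicate only items whose "status" attribute is "SHIPPED".
# "status" is a DynamoDB reserved word, so it is referenced through the
# placeholder "#s" declared in ExpressionAttributeNames.
scan_kwargs = {
    "FilterExpression": "#s = :wanted",
    "ExpressionAttributeNames": {"#s": "status"},
    "ExpressionAttributeValues": {":wanted": "SHIPPED"},
}

# In a DAG this would be supplied as:
#   DynamoDBToS3Operator(..., dynamodb_scan_kwargs=scan_kwargs)
```

A scan filter is applied by DynamoDB after items are read, so it reduces what is replicated but not the read capacity consumed by the scan.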

See also

For more information on how to use this operator, take a look at the guide: DynamoDB To S3 Operator

Parameters
  • dynamodb_table_name (str) -- DynamoDB table to replicate data from

  • s3_bucket_name (str) -- S3 bucket to replicate data to

  • file_size (int) -- Flush the file to S3 when its size (in bytes) is >= file_size

  • dynamodb_scan_kwargs (Optional[Dict[str, Any]]) -- Keyword arguments passed to DynamoDB.Table.scan <https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Table.scan>

  • s3_key_prefix (str) -- Prefix of the S3 object key

  • process_func (Callable[[Dict[str, Any]], bytes]) -- How a DynamoDB item is transformed to bytes. By default, the item is dumped as JSON

  • aws_conn_id (str) -- The Airflow connection used for AWS credentials. If this is None or empty, the default boto3 behaviour is used. When Airflow runs in a distributed manner, that default boto3 configuration must be maintained on each worker node.
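A minimal sketch of a custom process_func and of the keyword arguments from the signature above. The function item_to_json_line, the table name, the bucket name, the key prefix and the task_id are all hypothetical, and converting Decimal values to float is a simplification that can lose precision.

```python
import json
from decimal import Decimal
from typing import Any, Dict


def _encode_extra(value: Any) -> Any:
    """Fallback encoder for value types that the json module cannot handle."""
    if isinstance(value, Decimal):
        return float(value)  # simplification: may lose precision
    if isinstance(value, set):
        return sorted(value)
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def item_to_json_line(item: Dict[str, Any]) -> bytes:
    """Hypothetical process_func: one item becomes one newline-terminated JSON line."""
    return (json.dumps(item, default=_encode_extra) + "\n").encode("utf-8")


# Keyword arguments matching the documented signature; all values are placeholders.
operator_kwargs = {
    "task_id": "backup_orders",
    "dynamodb_table_name": "orders",
    "s3_bucket_name": "example-backup-bucket",
    "file_size": 20 * 1024 * 1024,  # flush roughly every 20 MiB
    "s3_key_prefix": "exports/orders-",
    "process_func": item_to_json_line,
    "aws_conn_id": "aws_default",
}

# In a DAG file with the Amazon provider installed:
#   from airflow.providers.amazon.aws.transfers.dynamodb_to_s3 import DynamoDBToS3Operator
#   backup = DynamoDBToS3Operator(**operator_kwargs)
```

Writing one JSON document per line keeps each flushed file easy to consume with line-oriented tools; any other encoding works as long as process_func returns bytes.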

template_fields :Sequence[str] = ['s3_bucket_name', 'dynamodb_table_name'][source]
template_fields_renderers[source]
execute(self, context)[source]

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
