airflow.providers.amazon.aws.transfers.dynamodb_to_s3

Module Contents

Classes

JSONEncoder

Custom JSON encoder implementation.

DynamoDBToS3Operator

Replicates records from a DynamoDB table to S3.

class airflow.providers.amazon.aws.transfers.dynamodb_to_s3.JSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: json.JSONEncoder

Custom JSON encoder implementation.

default(obj)

Convert Decimal objects into a JSON-serializable format.
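The boto3 DynamoDB resource returns number attributes as decimal.Decimal, which the standard json module refuses to serialize. The sketch below shows one way a default() override can handle that. The class name DecimalJSONEncoder is hypothetical, and converting to float is an assumption of this sketch rather than a statement about the operator's exact conversion.

```python
import json
from decimal import Decimal


class DecimalJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder: serialize decimal.Decimal values as JSON numbers."""

    def default(self, obj):
        # Assumption for this sketch: Decimal -> float. Only called for objects
        # the base encoder cannot handle natively.
        if isinstance(obj, Decimal):
            return float(obj)
        # Anything else falls through to the base class, which raises TypeError.
        return super().default(obj)


item = {"id": "user-1", "score": Decimal("42.5"), "visits": Decimal("3")}
print(json.dumps(item, cls=DecimalJSONEncoder))
# {"id": "user-1", "score": 42.5, "visits": 3.0}
```

Converting to float is simple but lossy: integral values come out as `3.0`, and very large or high-precision decimals can lose digits. A variant that returns `int(obj)` when the value is integral avoids the first problem.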

class airflow.providers.amazon.aws.transfers.dynamodb_to_s3.DynamoDBToS3Operator(*, dynamodb_table_name, s3_bucket_name, file_size=1000, dynamodb_scan_kwargs=None, s3_key_prefix='', process_func=_convert_item_to_json_bytes, point_in_time_export=False, export_time=None, export_format='DYNAMODB_JSON', export_table_to_point_in_time_kwargs=None, check_interval=30, max_attempts=60, **kwargs)

Bases: airflow.providers.amazon.aws.transfers.base.AwsToAwsBaseOperator

Replicates records from a DynamoDB table to S3.

It scans a DynamoDB table and writes the received records to a file on the local filesystem. Once that file reaches the size limit set by file_size, the operator flushes it to S3.
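The buffer-then-flush pattern described above can be sketched as follows. This is a minimal illustration, not the operator's code: write_and_flush and the upload callback are hypothetical names, the callback stands in for an S3 upload, and the sketch checks the file size after every item, which may differ from the granularity the operator actually uses.

```python
import os
import tempfile
from typing import Callable, Dict, Iterable, List


def write_and_flush(
    items: Iterable[Dict],
    process_func: Callable[[Dict], bytes],
    file_size: int,
    upload: Callable[[str], None],
) -> None:
    """Hypothetical helper: buffer items in a temp file, hand it off at >= file_size bytes."""
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        for item in items:
            tmp.write(process_func(item))
            tmp.flush()  # ensure os.path.getsize sees the bytes just written
            if os.path.getsize(tmp.name) >= file_size:
                tmp.close()
                upload(tmp.name)  # stand-in for uploading this chunk to S3
                os.remove(tmp.name)
                tmp = tempfile.NamedTemporaryFile(delete=False)
        # Hand off whatever remains after the last full chunk.
        tmp.close()
        if os.path.getsize(tmp.name) > 0:
            upload(tmp.name)
    finally:
        tmp.close()
        if os.path.exists(tmp.name):
            os.remove(tmp.name)


uploaded: List[bytes] = []


def collect(path: str) -> None:
    # Fake "upload": just record the chunk's contents.
    with open(path, "rb") as f:
        uploaded.append(f.read())


write_and_flush(
    ({"n": i} for i in range(5)),
    lambda item: f"{item['n']}\n".encode(),
    file_size=4,
    upload=collect,
)
print(uploaded)  # [b'0\n1\n', b'2\n3\n', b'4\n']
```

Each item here serializes to 2 bytes, so every second item pushes the file to the 4-byte limit and triggers a flush; the fifth item is uploaded as a final partial chunk.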

Users can also specify filtering criteria through dynamodb_scan_kwargs to replicate only the records that satisfy them.
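A filter passed through dynamodb_scan_kwargs has to be sent with every page of a paginated scan. The sketch below follows DynamoDB's LastEvaluatedKey / ExclusiveStartKey pagination while copying the caller's kwargs so they are not mutated. scan_all and FakePagedTable are hypothetical; the fake table stands in for a boto3 Table and does not evaluate the filter itself. The FilterExpression uses an attribute-name placeholder because `status` is a DynamoDB reserved word.

```python
from copy import copy
from typing import Any, Dict, Iterator, List, Optional


def scan_all(table: Any, scan_kwargs: Optional[Dict[str, Any]] = None) -> Iterator[Dict]:
    """Hypothetical helper: yield every item from table.scan(), page by page."""
    kwargs = copy(scan_kwargs) if scan_kwargs else {}  # leave the caller's dict untouched
    while True:
        response = table.scan(**kwargs)
        yield from response["Items"]
        if "LastEvaluatedKey" not in response:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]


class FakePagedTable:
    """Hypothetical stand-in for a boto3 Table: fixed pages, records each call's kwargs."""

    def __init__(self, pages: List[List[Dict]]):
        self.pages = pages
        self.calls: List[Dict[str, Any]] = []

    def scan(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(dict(kwargs))
        index = kwargs.get("ExclusiveStartKey", {}).get("page", 0)
        response: Dict[str, Any] = {"Items": self.pages[index]}
        if index + 1 < len(self.pages):
            response["LastEvaluatedKey"] = {"page": index + 1}
        return response


# Example filter in the shape Table.scan accepts ("#st" aliases the reserved word "status").
scan_kwargs = {
    "FilterExpression": "#st = :active",
    "ExpressionAttributeNames": {"#st": "status"},
    "ExpressionAttributeValues": {":active": "ACTIVE"},
}

table = FakePagedTable([[{"id": "a"}, {"id": "b"}], [{"id": "c"}]])
items = list(scan_all(table, scan_kwargs))
print([i["id"] for i in items])  # ['a', 'b', 'c']
```

Note that DynamoDB applies a FilterExpression after items are read, so a filtered scan consumes the same read capacity as an unfiltered one; it only reduces what is written to S3.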

See also

For more information on how to use this operator, take a look at the guide: Amazon DynamoDB To Amazon S3 transfer operator

Parameters
  • dynamodb_table_name (str) – DynamoDB table to replicate data from

  • s3_bucket_name (str) – S3 bucket to replicate data to

  • file_size (int) – Flush the local file to S3 once its size in bytes is >= file_size

  • dynamodb_scan_kwargs (dict[str, Any] | None) – Keyword arguments passed to DynamoDB.Table.scan <https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Table.scan>

  • s3_key_prefix (str) – Prefix of the S3 object key

  • process_func (Callable[[dict[str, Any]], bytes]) – Function that transforms a DynamoDB item into bytes. By default, the item is serialized as JSON.

  • point_in_time_export (bool) – Whether the operator uses a point-in-time export (True) instead of a table scan (False)

  • export_time (datetime.datetime | None) – Time in the past from which to export table data (the underlying AWS API expresses it as seconds since the start of the Unix epoch). The table export is a snapshot of the table’s state at this point in time.

  • export_format (str) – The format for the exported data. Valid values for ExportFormat are DYNAMODB_JSON or ION.

  • export_table_to_point_in_time_kwargs (dict | None) – Extra parameters for the boto3 export_table_to_point_in_time function call, for example ExportType or IncrementalExportSpecification.

  • check_interval (int) – The amount of time in seconds to wait between attempts. Only if export_time is provided.

  • max_attempts (int) – The maximum number of attempts to be made. Only if export_time is provided.
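Because process_func only has to map one item dict to bytes, it can change the output format. The sketch below writes selected attributes as a CSV row. item_to_csv_bytes and the id and score attribute names are hypothetical; a function like this would be passed as process_func=item_to_csv_bytes.

```python
import csv
import io
from decimal import Decimal
from typing import Any, Dict


def item_to_csv_bytes(item: Dict[str, Any]) -> bytes:
    """Hypothetical process_func: emit the 'id' and 'score' attributes as one CSV row."""
    buf = io.StringIO()
    # csv.writer handles quoting of commas/quotes; str() is applied to each value,
    # so Decimal numbers from boto3 are written as their decimal text.
    csv.writer(buf, lineterminator="\n").writerow([item.get("id", ""), item.get("score", "")])
    return buf.getvalue().encode("utf-8")


row = item_to_csv_bytes({"id": "user-1", "score": Decimal("42.5"), "ignored": 1})
print(row)  # b'user-1,42.5\n'
```

The function deliberately writes no header row: it is called once per item and its output is concatenated into chunked files, so a header emitted here would repeat on every line.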

template_fields: Sequence[str] = ()
template_fields_renderers
hook()

Create DynamoDBHook.

execute(context)

This is the main method to derive when creating an operator.

The context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.
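When point_in_time_export is enabled, the work is done by DynamoDB's ExportTableToPointInTime API instead of a scan. The sketch below shows how the export parameters documented above could map onto that request, and the polling budget implied by check_interval and max_attempts. build_export_request and max_wait_seconds are hypothetical. Merging export_table_to_point_in_time_kwargs over the base arguments, dropping unset values, and rejecting a future export_time are assumptions of this sketch, not a reading of the operator's source. The ARN and bucket in the usage are made-up placeholders.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_export_request(
    table_arn: str,
    s3_bucket_name: str,
    s3_key_prefix: str,
    export_time: Optional[datetime],
    export_format: str = "DYNAMODB_JSON",
    extra_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble ExportTableToPointInTime request parameters."""
    # Assumption: an export time in the future is rejected up front.
    if export_time is not None and export_time > datetime.now(export_time.tzinfo):
        raise ValueError("export_time must not be in the future")
    request = {
        "TableArn": table_arn,
        "ExportTime": export_time,
        "S3Bucket": s3_bucket_name,
        "S3Prefix": s3_key_prefix,
        "ExportFormat": export_format,  # DYNAMODB_JSON or ION
        **(extra_kwargs or {}),  # e.g. ExportType, IncrementalExportSpecification
    }
    # Drop unset values so the service falls back to its own defaults.
    return {k: v for k, v in request.items() if v is not None}


def max_wait_seconds(check_interval: int = 30, max_attempts: int = 60) -> int:
    # Upper bound on polling time: one check_interval wait per attempt.
    return check_interval * max_attempts


req = build_export_request(
    table_arn="arn:aws:dynamodb:us-east-1:123456789012:table/example",
    s3_bucket_name="example-bucket",
    s3_key_prefix="exports/",
    export_time=datetime(2024, 1, 1, tzinfo=timezone.utc),
    extra_kwargs={"ExportType": "FULL_EXPORT"},
)
print(sorted(req))
print(max_wait_seconds())  # 1800
```

With the defaults (check_interval=30, max_attempts=60) the export is polled for at most 30 × 60 = 1800 seconds, that is, 30 minutes, before giving up.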
