airflow.providers.amazon.aws.hooks.datasync

Interact with AWS DataSync, using the AWS boto3 library.

Module Contents

class airflow.providers.amazon.aws.hooks.datasync.AWSDataSyncHook(wait_interval_seconds: int = 30, *args, **kwargs)[source]

Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook

Interact with AWS DataSync.

Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.

Parameters

wait_interval_seconds (Optional[int]) – Time to wait between two consecutive calls to check TaskExecution status. Defaults to 30 seconds.

Raises

ValueError – If wait_interval_seconds is not between 0 and 15*60 seconds.

TASK_EXECUTION_INTERMEDIATE_STATES = ['INITIALIZING', 'QUEUED', 'LAUNCHING', 'PREPARING', 'TRANSFERRING', 'VERIFYING'][source]
TASK_EXECUTION_FAILURE_STATES = ['ERROR'][source]
TASK_EXECUTION_SUCCESS_STATES = ['SUCCESS'][source]
create_location(self, location_uri: str, **create_location_kwargs)[source]

Creates a new location.

Parameters
  • location_uri (str) – Location URI used to determine the location type (S3, SMB, NFS, EFS).

  • create_location_kwargs – Passed to boto.create_location_xyz(). See AWS boto3 datasync documentation.

Return str

LocationArn of the created Location.

Raises

AirflowException – If location type (prefix from location_uri) is invalid.

get_location_arns(self, location_uri: str, case_sensitive: bool = False, ignore_trailing_slash: bool = True)[source]

Return all LocationArns which match a LocationUri.

Parameters
  • location_uri (str) – Location URI to search for, eg s3://mybucket/mypath

  • case_sensitive (bool) – Do a case sensitive search for location URI.

  • ignore_trailing_slash (bool) – Ignore / at the end of URI when matching.

Returns

List of LocationArns.

Return type

list(str)

Raises

AirflowBadRequest – if location_uri is empty

create_task(self, source_location_arn: str, destination_location_arn: str, **create_task_kwargs)[source]

Create a Task between the specified source and destination LocationArns.

Parameters
  • source_location_arn (str) – Source LocationArn. Must exist already.

  • destination_location_arn (str) – Destination LocationArn. Must exist already.

  • create_task_kwargs – Passed to boto.create_task(). See AWS boto3 datasync documentation.

Returns

TaskArn of the created Task

Return type

str

update_task(self, task_arn: str, **update_task_kwargs)[source]

Update a Task.

Parameters
  • task_arn (str) – The TaskArn to update.

  • update_task_kwargs – Passed to boto.update_task(), See AWS boto3 datasync documentation.

delete_task(self, task_arn: str)[source]

Delete a Task.

Parameters

task_arn (str) – The TaskArn to delete.

get_task_arns_for_location_arns(self, source_location_arns: list, destination_location_arns: list)[source]

Return list of TaskArns for which use any one of the specified source LocationArns and any one of the specified destination LocationArns.

Parameters
  • source_location_arns (list) – List of source LocationArns.

  • destination_location_arns (list) – List of destination LocationArns.

Returns

list

Return type

list(TaskArns)

Raises

AirflowBadRequest – if source_location_arns or destination_location_arns are empty.

start_task_execution(self, task_arn: str, **kwargs)[source]

Start a TaskExecution for the specified task_arn. Each task can have at most one TaskExecution.

Parameters
  • task_arn (str) – TaskArn

  • kwargs – kwargs sent to boto3.start_task_execution()

Returns

TaskExecutionArn

Return type

str

Raises
  • ClientError – If a TaskExecution is already busy running for this task_arn.

  • AirflowBadRequest – If task_arn is empty.

cancel_task_execution(self, task_execution_arn: str)[source]

Cancel a TaskExecution for the specified task_execution_arn.

Parameters

task_execution_arn (str) – TaskExecutionArn.

Raises

AirflowBadRequest – If task_execution_arn is empty.

get_task_description(self, task_arn: str)[source]

Get description for the specified task_arn.

Parameters

task_arn (str) – TaskArn

Returns

AWS metadata about a task.

Return type

dict

Raises

AirflowBadRequest – If task_arn is empty.

describe_task_execution(self, task_execution_arn: str)[source]

Get description for the specified task_execution_arn.

Parameters

task_execution_arn (str) – TaskExecutionArn

Returns

AWS metadata about a task execution.

Return type

dict

Raises

AirflowBadRequest – If task_execution_arn is empty.

get_current_task_execution_arn(self, task_arn: str)[source]

Get current TaskExecutionArn (if one exists) for the specified task_arn.

Parameters

task_arn (str) – TaskArn

Returns

CurrentTaskExecutionArn for this task_arn or None.

Return type

str

Raises

AirflowBadRequest – if task_arn is empty.

wait_for_task_execution(self, task_execution_arn: str, max_iterations: int = 60)[source]

Wait for Task Execution status to be complete (SUCCESS/ERROR). The task_execution_arn must exist, or a boto3 ClientError will be raised.

Parameters
  • task_execution_arn (str) – TaskExecutionArn

  • max_iterations (int) – Maximum number of iterations before timing out.

Returns

Result of task execution.

Return type

bool

Raises
  • AirflowTaskTimeout – If maximum iterations is exceeded.

  • AirflowBadRequest – If task_execution_arn is empty.

Was this entry helpful?