airflow.providers.amazon.aws.hooks.datasync

Interact with AWS DataSync, using the AWS boto3 library.

Module Contents

class airflow.providers.amazon.aws.hooks.datasync.AWSDataSyncHook(wait_interval_seconds: int = 5, *args, **kwargs)[source]

Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook

Interact with AWS DataSync.

Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.

Parameters

wait_for_task_execution (int) -- Time to wait between two consecutive calls to check TaskExecution status.

Raises

ValueError -- If wait_interval_seconds is not between 0 and 15*60 seconds.

TASK_EXECUTION_INTERMEDIATE_STATES = ['INITIALIZING', 'QUEUED', 'LAUNCHING', 'PREPARING', 'TRANSFERRING', 'VERIFYING'][source]
TASK_EXECUTION_FAILURE_STATES = ['ERROR'][source]
TASK_EXECUTION_SUCCESS_STATES = ['SUCCESS'][source]
create_location(self, location_uri: str, **create_location_kwargs)[source]

Creates a new location.

Parameters
  • location_uri (str) -- Location URI used to determine the location type (S3, SMB, NFS, EFS).

  • create_location_kwargs -- Passed to boto.create_location_xyz(). See AWS boto3 datasync documentation.

Return str

LocationArn of the created Location.

Raises

AirflowException -- If location type (prefix from location_uri) is invalid.

get_location_arns(self, location_uri: str, case_sensitive: bool = False, ignore_trailing_slash: bool = True)[source]

Return all LocationArns which match a LocationUri.

Parameters
  • location_uri (str) -- Location URI to search for, eg s3://mybucket/mypath

  • case_sensitive (bool) -- Do a case sensitive search for location URI.

  • ignore_trailing_slash (bool) -- Ignore / at the end of URI when matching.

Returns

List of LocationArns.

Return type

list(str)

Raises

AirflowBadRequest -- if location_uri is empty

_refresh_locations(self)[source]

Refresh the local list of Locations.

create_task(self, source_location_arn: str, destination_location_arn: str, **create_task_kwargs)[source]

Create a Task between the specified source and destination LocationArns.

Parameters
  • source_location_arn (str) -- Source LocationArn. Must exist already.

  • destination_location_arn (str) -- Destination LocationArn. Must exist already.

  • create_task_kwargs -- Passed to boto.create_task(). See AWS boto3 datasync documentation.

Returns

TaskArn of the created Task

Return type

str

update_task(self, task_arn: str, **update_task_kwargs)[source]

Update a Task.

Parameters
  • task_arn (str) -- The TaskArn to update.

  • update_task_kwargs -- Passed to boto.update_task(), See AWS boto3 datasync documentation.

delete_task(self, task_arn: str)[source]

Delete a Task.

Parameters

task_arn (str) -- The TaskArn to delete.

_refresh_tasks(self)[source]

Refreshes the local list of Tasks

get_task_arns_for_location_arns(self, source_location_arns: list, destination_location_arns: list)[source]

Return list of TaskArns for which use any one of the specified source LocationArns and any one of the specified destination LocationArns.

Parameters
  • source_location_arns (list) -- List of source LocationArns.

  • destination_location_arns (list) -- List of destination LocationArns.

Returns

list

Return type

list(TaskArns)

Raises

AirflowBadRequest -- if source_location_arns or destination_location_arns are empty.

start_task_execution(self, task_arn: str, **kwargs)[source]

Start a TaskExecution for the specified task_arn. Each task can have at most one TaskExecution.

Parameters
  • task_arn (str) -- TaskArn

  • kwargs -- kwargs sent to boto3.start_task_execution()

Returns

TaskExecutionArn

Return type

str

Raises
  • ClientError -- If a TaskExecution is already busy running for this task_arn.

  • AirflowBadRequest -- If task_arn is empty.

cancel_task_execution(self, task_execution_arn: str)[source]

Cancel a TaskExecution for the specified task_execution_arn.

Parameters

task_execution_arn (str) -- TaskExecutionArn.

Raises

AirflowBadRequest -- If task_execution_arn is empty.

get_task_description(self, task_arn: str)[source]

Get description for the specified task_arn.

Parameters

task_arn (str) -- TaskArn

Returns

AWS metadata about a task.

Return type

dict

Raises

AirflowBadRequest -- If task_arn is empty.

describe_task_execution(self, task_execution_arn: str)[source]

Get description for the specified task_execution_arn.

Parameters

task_execution_arn (str) -- TaskExecutionArn

Returns

AWS metadata about a task execution.

Return type

dict

Raises

AirflowBadRequest -- If task_execution_arn is empty.

get_current_task_execution_arn(self, task_arn: str)[source]

Get current TaskExecutionArn (if one exists) for the specified task_arn.

Parameters

task_arn (str) -- TaskArn

Returns

CurrentTaskExecutionArn for this task_arn or None.

Return type

str

Raises

AirflowBadRequest -- if task_arn is empty.

wait_for_task_execution(self, task_execution_arn: str, max_iterations: int = 2 * 180)[source]

Wait for Task Execution status to be complete (SUCCESS/ERROR). The task_execution_arn must exist, or a boto3 ClientError will be raised.

Parameters
  • task_execution_arn (str) -- TaskExecutionArn

  • max_iterations (int) -- Maximum number of iterations before timing out.

Returns

Result of task execution.

Return type

bool

Raises
  • AirflowTaskTimeout -- If maximum iterations is exceeded.

  • AirflowBadRequest -- If task_execution_arn is empty.

Was this entry helpful?