Amazon AppFlow¶
Amazon AppFlow is a fully managed integration service that enables you to securely transfer data between Software-as-a-Service (SaaS) applications like Salesforce, SAP, Zendesk, Slack, and ServiceNow, and AWS services like Amazon S3 and Amazon Redshift, in just a few clicks. With AppFlow, you can run data flows at enterprise scale at the frequency you choose - on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities like filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps. AppFlow automatically encrypts data in motion, and allows users to restrict data from flowing over the public Internet for SaaS applications that are integrated with AWS PrivateLink, reducing exposure to security threats.
Prerequisite Tasks¶
To use these operators, you must do a few things:
Create necessary resources using AWS Console or AWS CLI.
Install API libraries via pip.
pip install 'apache-airflow[amazon]'
Detailed information is available at Installation of Apache Airflow®.
Generic Parameters¶
- aws_conn_id
Reference to Amazon Web Services Connection ID. If this parameter is set to None then the default boto3 behaviour is used without a connection lookup. Otherwise use the credentials stored in the Connection. Default: aws_default
- region_name
AWS Region Name. If this parameter is set to None or omitted then region_name from AWS Connection Extra Parameter will be used. Otherwise use the specified value instead of the connection value. Default: None
- verify
Whether or not to verify SSL certificates.
False - Do not validate SSL certificates.
path/to/cert/bundle.pem - A filename of the CA cert bundle to use. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.
If this parameter is set to None or is omitted then verify from AWS Connection Extra Parameter will be used. Otherwise use the specified value instead of the connection value. Default: None
- botocore_config
The provided dictionary is used to construct a botocore.config.Config. This configuration can be used, for example, to avoid throttling exceptions or to tune timeouts:
{
    "signature_version": "unsigned",
    "s3": {
        "us_east_1_regional_endpoint": True,
    },
    "retries": {
        "mode": "standard",
        "max_attempts": 10,
    },
    "connect_timeout": 300,
    "read_timeout": 300,
    "tcp_keepalive": True,
}
If this parameter is set to None or omitted then config_kwargs from AWS Connection Extra Parameter will be used. Otherwise use the specified value instead of the connection value. Default: None
Note
Specifying an empty dictionary, {}, will overwrite the connection configuration for botocore.config.Config.
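As a sketch of how these generic parameters might be passed together, here is a hypothetical task definition. The connection ID, region, and flow name below are illustrative placeholders, not values defined in this guide:

```python
# Sketch: combining the generic AWS parameters on an AppFlow operator.
# "aws_eu_west" and "salesforce-campaign" are hypothetical examples.
from airflow.providers.amazon.aws.operators.appflow import AppflowRunOperator

run_flow = AppflowRunOperator(
    task_id="run_flow_with_generic_params",
    flow_name="salesforce-campaign",
    aws_conn_id="aws_eu_west",  # defaults to "aws_default" when omitted
    region_name="eu-west-1",  # overrides the region stored in the connection
    botocore_config={
        "retries": {"mode": "standard", "max_attempts": 10},
        "connect_timeout": 300,
        "read_timeout": 300,
    },
)
```

Omitting any of these keyword arguments falls back to the connection's extra parameters (or boto3 defaults), as described above.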
Operators¶
Run Flow¶
To run an AppFlow flow keeping it as is, use AppflowRunOperator.
run_flow = AppflowRunOperator(
    task_id="run_flow",
    flow_name=flow_name,
)
Note
Supported sources: Salesforce, Zendesk
Run Flow Full¶
To run an AppFlow flow removing all filters, use AppflowRunFullOperator.
campaign_dump_full = AppflowRunFullOperator(
    task_id="campaign_dump_full",
    source=source_name,
    flow_name=flow_name,
)
Note
Supported sources: Salesforce, Zendesk
Run Flow Daily¶
To run an AppFlow flow filtering daily records, use AppflowRunDailyOperator.
campaign_dump_daily = AppflowRunDailyOperator(
    task_id="campaign_dump_daily",
    source=source_name,
    flow_name=flow_name,
    source_field="LastModifiedDate",
    filter_date="{{ ds }}",
)
Note
Supported sources: Salesforce
Run Flow Before¶
To run an AppFlow flow filtering out future records and selecting only the past ones, use AppflowRunBeforeOperator.
campaign_dump_before = AppflowRunBeforeOperator(
    task_id="campaign_dump_before",
    source=source_name,
    flow_name=flow_name,
    source_field="LastModifiedDate",
    filter_date="{{ ds }}",
)
Note
Supported sources: Salesforce
Run Flow After¶
To run an AppFlow flow filtering out past records and selecting only the future ones, use AppflowRunAfterOperator.
campaign_dump_after = AppflowRunAfterOperator(
    task_id="campaign_dump_after",
    source=source_name,
    flow_name=flow_name,
    source_field="LastModifiedDate",
    filter_date="3000-01-01",  # Future date, so no records to dump
)
Note
Supported sources: Salesforce, Zendesk
Skipping Tasks For Empty Runs¶
To skip downstream tasks when an AppFlow run returns zero records, use AppflowRecordsShortCircuitOperator.
campaign_dump_short_circuit = AppflowRecordsShortCircuitOperator(
    task_id="campaign_dump_short_circuit",
    flow_name=flow_name,
    appflow_run_task_id="campaign_dump_after",  # Should short-circuit, no records expected
)
Note
Supported sources: Salesforce, Zendesk
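Putting the pieces together, the short-circuit operator typically sits between the flow run and any downstream processing, so that an empty run skips the rest of the pipeline. A minimal sketch, with an illustrative flow name and a hypothetical placeholder downstream task:

```python
# Sketch: chain a flow run, the records short-circuit, and downstream work.
# The flow name and downstream task are hypothetical examples.
from airflow.operators.empty import EmptyOperator
from airflow.providers.amazon.aws.operators.appflow import (
    AppflowRecordsShortCircuitOperator,
    AppflowRunAfterOperator,
)

campaign_dump_after = AppflowRunAfterOperator(
    task_id="campaign_dump_after",
    source="salesforce",
    flow_name="salesforce-campaign",
    source_field="LastModifiedDate",
    filter_date="{{ ds }}",
)

campaign_dump_short_circuit = AppflowRecordsShortCircuitOperator(
    task_id="campaign_dump_short_circuit",
    flow_name="salesforce-campaign",
    appflow_run_task_id="campaign_dump_after",  # task whose run is inspected
)

# Placeholder for whatever consumes the dumped records.
process_records = EmptyOperator(task_id="process_records")

# If the run moved zero records, process_records is skipped.
campaign_dump_after >> campaign_dump_short_circuit >> process_records
```

Because AppflowRecordsShortCircuitOperator behaves like Airflow's short-circuit pattern, every task downstream of it is skipped when the inspected run produced no records.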