Amazon AppFlow
Amazon AppFlow is a fully managed integration service that enables you to securely transfer data between Software-as-a-Service (SaaS) applications like Salesforce, SAP, Zendesk, Slack, and ServiceNow, and AWS services like Amazon S3 and Amazon Redshift, in just a few clicks. With AppFlow, you can run data flows at enterprise scale at the frequency you choose: on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities like filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps. AppFlow automatically encrypts data in motion, and lets you restrict data from flowing over the public Internet for SaaS applications that are integrated with AWS PrivateLink, reducing exposure to security threats.
Prerequisite Tasks
To use these operators, you must do a few things:
Create the necessary resources using the AWS Console or AWS CLI.
Install the API libraries via pip:
pip install 'apache-airflow[amazon]'
Detailed information is available in Installation.
Operators
Run Flow
To run an AppFlow flow keeping all filters as is, use AppflowRunOperator.
campaign_dump = AppflowRunOperator(
    task_id="campaign_dump",
    source=SOURCE_NAME,
    flow_name=FLOW_NAME,
)
Note
Supported sources: Salesforce, Zendesk
Run Flow Full
To run an AppFlow flow removing all filters, use AppflowRunFullOperator.
campaign_dump_full = AppflowRunFullOperator(
    task_id="campaign_dump_full",
    source=SOURCE_NAME,
    flow_name=FLOW_NAME,
)
Note
Supported sources: Salesforce, Zendesk
Run Flow Daily
To run an AppFlow flow that selects only the records for a single day, use AppflowRunDailyOperator.
campaign_dump_daily = AppflowRunDailyOperator(
    task_id="campaign_dump_daily",
    source=SOURCE_NAME,
    flow_name=FLOW_NAME,
    source_field="LastModifiedDate",
    filter_date="{{ ds }}",
)
Note
Supported sources: Salesforce
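To make the idea of a daily filter concrete, here is a plain-Python sketch of a one-day window derived from a `filter_date` string such as the rendered `{{ ds }}` value. It only illustrates the concept: the function names are hypothetical, the half-open window `[day, day + 1)` is an assumption, and the operator's exact boundary handling may differ.

```python
from datetime import date, datetime, timedelta
from typing import Tuple


def daily_window(filter_date: str) -> Tuple[datetime, datetime]:
    """Return (start, end) covering the calendar day of ``filter_date``.

    ``filter_date`` is an ISO date such as "2024-03-05". The end bound is
    exclusive (an assumption made for this illustration).
    """
    day = date.fromisoformat(filter_date)
    start = datetime.combine(day, datetime.min.time())
    return start, start + timedelta(days=1)


def in_daily_window(last_modified: datetime, filter_date: str) -> bool:
    """Check whether a record's timestamp falls on the given day."""
    start, end = daily_window(filter_date)
    return start <= last_modified < end
```

A record whose `LastModifiedDate` is 23:59:59 on the chosen day falls inside the window, while one stamped at midnight of the next day does not.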
Run Flow Before
To run an AppFlow flow that selects only records dated before filter_date, filtering out later ones, use AppflowRunBeforeOperator.
campaign_dump_before = AppflowRunBeforeOperator(
    task_id="campaign_dump_before",
    source=SOURCE_NAME,
    flow_name=FLOW_NAME,
    source_field="LastModifiedDate",
    filter_date="{{ ds }}",
)
Note
Supported sources: Salesforce
Run Flow After
To run an AppFlow flow that selects only records dated after filter_date, filtering out earlier ones, use AppflowRunAfterOperator.
campaign_dump_after = AppflowRunAfterOperator(
    task_id="campaign_dump_after",
    source=SOURCE_NAME,
    flow_name=FLOW_NAME,
    source_field="LastModifiedDate",
    filter_date="3000-01-01",  # Future date, so no records to dump
)
Note
Supported sources: Salesforce, Zendesk
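The before and after filters can be pictured as two complementary selections around `filter_date`. The sketch below is plain Python and not the operators' implementation: the helper names and the sample records are made up, and the strict comparisons (the cutoff instant itself is excluded) are an assumption, since this page does not state how the boundary is treated.

```python
from datetime import datetime
from typing import Any, Dict, List

Record = Dict[str, Any]


def select_before(records: List[Record], source_field: str, filter_date: str) -> List[Record]:
    """Keep records whose ``source_field`` is strictly earlier than ``filter_date``."""
    cutoff = datetime.fromisoformat(filter_date)
    return [r for r in records if r[source_field] < cutoff]


def select_after(records: List[Record], source_field: str, filter_date: str) -> List[Record]:
    """Keep records whose ``source_field`` is strictly later than ``filter_date``."""
    cutoff = datetime.fromisoformat(filter_date)
    return [r for r in records if r[source_field] > cutoff]


# Made-up sample data, for illustration only.
SAMPLE = [
    {"Id": "a", "LastModifiedDate": datetime(2023, 5, 1, 9, 30)},
    {"Id": "b", "LastModifiedDate": datetime(2024, 1, 15, 18, 0)},
]
```

Calling `select_after(SAMPLE, "LastModifiedDate", "3000-01-01")` yields an empty list, which mirrors why the `campaign_dump_after` example with a far-future date is expected to dump no records.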
Skipping Tasks For Empty Runs
To skip downstream tasks when an AppFlow run returns zero records, use AppflowRecordsShortCircuitOperator.
campaign_dump_short_circuit = AppflowRecordsShortCircuitOperator(
    task_id="campaign_dump_short_circuit",
    flow_name=FLOW_NAME,
    appflow_run_task_id="campaign_dump_after",  # Should short-circuit, no records expected
)
Note
Supported sources: Salesforce, Zendesk
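The short-circuit decision boils down to looking up one flow execution and checking how many records it processed. The sketch below shows that logic in plain Python under stated assumptions: the function names are hypothetical, and the record layout (`executionId`, `executionResult.recordsProcessed`) is assumed to follow the AppFlow `describe_flow_execution_records` response. It is not the operator's actual code.

```python
from typing import Any, Dict, List


def records_processed(flow_executions: List[Dict[str, Any]], execution_id: str) -> int:
    """Return the processed-record count for one execution.

    ``flow_executions`` is assumed to look like the ``flowExecutions`` list
    from ``describe_flow_execution_records``. A missing count is read as 0.
    """
    for execution in flow_executions:
        if execution.get("executionId") == execution_id:
            result = execution.get("executionResult", {})
            return int(result.get("recordsProcessed", 0))
    raise LookupError(f"No execution found with id {execution_id!r}")


def should_continue(flow_executions: List[Dict[str, Any]], execution_id: str) -> bool:
    """True if downstream tasks should run, False if they should be skipped."""
    return records_processed(flow_executions, execution_id) > 0
```

A run that processed zero records makes `should_continue` return `False`, which corresponds to the downstream tasks being skipped.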