AWS Glue DataBrew¶
AWS Glue DataBrew is a visual data preparation tool that makes it easier for data analysts and data scientists to clean and normalize data to prepare it for analytics and machine learning (ML). You can choose from over 250 prebuilt transformations to automate data preparation tasks such as filtering anomalies, converting data to standard formats, and correcting invalid values, all without writing any code. After your data is ready, you can immediately use it for analytics and ML projects.
Prerequisite Tasks¶
To use these operators, you must do a few things:
Create necessary resources using AWS Console or AWS CLI.
Install API libraries via pip.
pip install 'apache-airflow[amazon]'
Detailed information is available in Installation of Airflow®.
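One way to confirm that the install step took effect is to look up the provider distribution from Python. This is a minimal sketch: the helper name `amazon_provider_version` is hypothetical, and it assumes the `amazon` extra installs a distribution named `apache-airflow-providers-amazon`.

```python
from importlib.metadata import PackageNotFoundError, version


def amazon_provider_version():
    """Return the installed Amazon provider version, or None if it is missing."""
    try:
        # Assumed distribution name pulled in by the 'amazon' extra.
        return version("apache-airflow-providers-amazon")
    except PackageNotFoundError:
        return None


installed = amazon_provider_version()
print(installed or "Amazon provider not found; run: pip install 'apache-airflow[amazon]'")
```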
Operators¶
Start an AWS Glue DataBrew job¶
To submit a new AWS Glue DataBrew job, you can use GlueDataBrewStartJobOperator.
from airflow.providers.amazon.aws.operators.glue_databrew import GlueDataBrewStartJobOperator
start_job = GlueDataBrewStartJobOperator(task_id="startjob", job_name=job_name, delay=15)
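When a task waits for the job to finish, the behavior is conceptually a start-then-poll loop. The sketch below illustrates that pattern with a stand-in client so it runs without AWS credentials. It is not the provider's implementation: `FakeDataBrewClient`, `run_job_and_wait`, `max_attempts`, and the canned state sequence are hypothetical. The method names `start_job_run` and `describe_job_run` mirror the boto3 DataBrew client, and `delay` is assumed to be the number of seconds between status checks.

```python
import time

# States in which a DataBrew job run is finished (per the DescribeJobRun API).
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"}


class FakeDataBrewClient:
    """Hypothetical stand-in for boto3.client("databrew"); returns canned states."""

    def __init__(self, states):
        self._states = iter(states)

    def start_job_run(self, Name):
        return {"RunId": f"{Name}-run-1"}

    def describe_job_run(self, Name, RunId):
        return {"State": next(self._states)}


def run_job_and_wait(client, job_name, delay=15, max_attempts=60, sleep=time.sleep):
    """Start a job run, then check its state every `delay` seconds until it finishes."""
    run_id = client.start_job_run(Name=job_name)["RunId"]
    for _ in range(max_attempts):
        state = client.describe_job_run(Name=job_name, RunId=run_id)["State"]
        if state in TERMINAL_STATES:
            return run_id, state
        sleep(delay)
    raise TimeoutError(f"Job run {run_id} did not finish after {max_attempts} checks")


client = FakeDataBrewClient(["STARTING", "RUNNING", "SUCCEEDED"])
waits = []
# Record the waits instead of sleeping so the example finishes instantly.
result = run_job_and_wait(client, "my-databrew-job", delay=15, sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the loop testable. In a real DAG the operator handles the waiting itself, so a helper like this would not be needed.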