airflow.providers.amazon.aws.operators.comprehend

Module Contents

Classes

ComprehendBaseOperator

Base operator for Comprehend service operators (not intended to be used directly in DAGs).

ComprehendStartPiiEntitiesDetectionJobOperator

Create a Comprehend PII entities detection job for a collection of documents.

class airflow.providers.amazon.aws.operators.comprehend.ComprehendBaseOperator(input_data_config, output_data_config, data_access_role_arn, language_code, **kwargs)[source]

Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.comprehend.ComprehendHook]

Base operator for Comprehend service operators (not intended to be used directly in DAGs).

Parameters
  • input_data_config (dict) – The input properties for a PII entities detection job. (templated)

  • output_data_config (dict) – Provides configuration parameters for the output of PII entity detection jobs. (templated)

  • data_access_role_arn (str) – The Amazon Resource Name (ARN) of the IAM role that grants Amazon Comprehend read access to your input data. (templated)

  • language_code (str) – The language of the input documents. (templated)
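
As an illustration of the shapes these parameters take, the sketch below shows values that could be passed to a subclass. The key names (S3Uri, InputFormat) follow the boto3 start_pii_entities_detection_job request; the bucket, prefixes and role ARN are placeholders, and the Jinja expression is only an assumed example of how a templated field might be used.

```python
# Placeholder values: the bucket, prefixes and role ARN below are hypothetical.
input_data_config = {
    # Templated field: Airflow renders the Jinja expression at run time.
    "S3Uri": "s3://example-bucket/input/{{ ds }}/",
    # Either "ONE_DOC_PER_LINE" or "ONE_DOC_PER_FILE".
    "InputFormat": "ONE_DOC_PER_LINE",
}

output_data_config = {
    "S3Uri": "s3://example-bucket/output/{{ ds }}/",
}

# IAM role that Amazon Comprehend assumes to read the input data.
data_access_role_arn = "arn:aws:iam::123456789012:role/example-comprehend-role"

# Language of the input documents.
language_code = "en"
```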

aws_hook_class[source]
template_fields: Sequence[str][source]
template_fields_renderers: dict[source]
client()[source]

Create and return the Comprehend client.

abstract execute(context)[source]

Must be overridden in child classes.

class airflow.providers.amazon.aws.operators.comprehend.ComprehendStartPiiEntitiesDetectionJobOperator(input_data_config, output_data_config, mode, data_access_role_arn, language_code, start_pii_entities_kwargs=None, wait_for_completion=True, waiter_delay=60, waiter_max_attempts=20, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]

Bases: ComprehendBaseOperator

Create a Comprehend PII entities detection job for a collection of documents.

See also

For more information on how to use this operator, take a look at the guide: Create an Amazon Comprehend Start PII Entities Detection Job

Parameters
  • input_data_config (dict) – The input properties for a PII entities detection job. (templated)

  • output_data_config (dict) – Provides configuration parameters for the output of PII entity detection jobs. (templated)

  • mode (str) – Specifies whether the output provides the locations (offsets) of PII entities or a file in which PII entities are redacted. If you set the mode parameter to ONLY_REDACTION, you must provide a RedactionConfig in start_pii_entities_kwargs.

  • data_access_role_arn (str) – The Amazon Resource Name (ARN) of the IAM role that grants Amazon Comprehend read access to your input data. (templated)

  • language_code (str) – The language of the input documents. (templated)

  • start_pii_entities_kwargs (dict[str, Any] | None) – Any optional parameters to pass to the job. If JobName is not provided in start_pii_entities_kwargs, the operator will create one.

  • wait_for_completion (bool) – Whether to wait for the job to stop. (default: True)

  • waiter_delay (int) – Time in seconds to wait between status checks. (default: 60)

  • waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 20)

  • deferrable (bool) – If True, the operator will wait asynchronously for the job to stop. This implies waiting for completion. This mode requires aiobotocore module to be installed. (default: False)

  • aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node).

  • region_name – AWS region_name. If not specified then the default boto3 behaviour is used.

  • verify – Whether to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html

  • botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
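
The parameters above can be assembled as in the following sketch. The helpers build_pii_job_args, max_wait_seconds and make_task are hypothetical names introduced for illustration and are not part of the provider. The first one only enforces the rule that ONLY_REDACTION needs a RedactionConfig, the second multiplies the waiter settings to estimate how long polling may last, and the third assumes apache-airflow-providers-amazon is installed.

```python
from typing import Any, Optional


def build_pii_job_args(
    input_data_config: dict,
    output_data_config: dict,
    data_access_role_arn: str,
    language_code: str = "en",
    mode: str = "ONLY_OFFSETS",
    redaction_config: Optional[dict] = None,
    job_name: Optional[str] = None,
) -> dict:
    """Assemble keyword arguments for the operator (hypothetical helper)."""
    # ONLY_REDACTION requires a RedactionConfig in start_pii_entities_kwargs.
    if mode == "ONLY_REDACTION" and not redaction_config:
        raise ValueError("mode='ONLY_REDACTION' requires a RedactionConfig")

    extra: dict = {}
    if redaction_config:
        extra["RedactionConfig"] = redaction_config
    if job_name:
        # Optional: the operator creates a job name when none is given.
        extra["JobName"] = job_name

    return {
        "input_data_config": input_data_config,
        "output_data_config": output_data_config,
        "data_access_role_arn": data_access_role_arn,
        "language_code": language_code,
        "mode": mode,
        "start_pii_entities_kwargs": extra or None,
    }


def max_wait_seconds(waiter_delay: int = 60, waiter_max_attempts: int = 20) -> int:
    """Approximate upper bound on polling time: delay times attempts."""
    return waiter_delay * waiter_max_attempts


def make_task(task_id: str, **job_args: Any):
    """Instantiate the operator; the import is deferred so the helpers
    above can be used without Airflow installed."""
    from airflow.providers.amazon.aws.operators.comprehend import (
        ComprehendStartPiiEntitiesDetectionJobOperator,
    )

    return ComprehendStartPiiEntitiesDetectionJobOperator(task_id=task_id, **job_args)
```

With the documented defaults, waiter_delay=60 and waiter_max_attempts=20 allow roughly 60 × 20 = 1200 seconds of polling before the waiter gives up, so longer jobs may need larger values.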

execute(context)[source]

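Start the PII entities detection job. Depending on wait_for_completion and deferrable, the operator either returns once the job is submitted or waits for it to finish.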

execute_complete(context, event=None)[source]
