airflow.providers.amazon.aws.operators.comprehend
Module Contents
Classes
- ComprehendBaseOperator – Base operator for the Amazon Comprehend operators; not intended to be used directly in DAGs.
- ComprehendStartPiiEntitiesDetectionJobOperator – Create an Amazon Comprehend PII entities detection job for a collection of documents.
- class airflow.providers.amazon.aws.operators.comprehend.ComprehendBaseOperator(input_data_config, output_data_config, data_access_role_arn, language_code, **kwargs)
Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.comprehend.ComprehendHook]
Base operator for the Amazon Comprehend operators. It is not intended to be used directly in DAGs.
- Parameters
input_data_config (dict) – The input properties for a PII entities detection job. (templated)
output_data_config (dict) – Provides configuration parameters for the output of PII entity detection jobs. (templated)
data_access_role_arn (str) – The Amazon Resource Name (ARN) of the IAM role that grants Amazon Comprehend read access to your input data. (templated)
language_code (str) – The language of the input documents. (templated)
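As an illustration, the templated arguments are plain dictionaries and strings. The dictionaries below follow the shapes of the Amazon Comprehend StartPiiEntitiesDetectionJob request (InputDataConfig with S3Uri and InputFormat, OutputDataConfig with S3Uri). The bucket name, prefixes, account ID and role name are hypothetical placeholders, and the helper check_s3_uri is also hypothetical: it only performs a shallow format check, not an access check.

```python
# Hypothetical example values: bucket, prefixes, account ID and role are placeholders.
input_data_config = {
    "S3Uri": "s3://example-bucket/comprehend/input/",
    # Comprehend accepts ONE_DOC_PER_FILE or ONE_DOC_PER_LINE here.
    "InputFormat": "ONE_DOC_PER_LINE",
}
output_data_config = {
    "S3Uri": "s3://example-bucket/comprehend/output/",
}
data_access_role_arn = "arn:aws:iam::123456789012:role/ExampleComprehendRole"
language_code = "en"


def check_s3_uri(uri: str) -> bool:
    """Hypothetical helper: True if the URI uses the s3:// scheme and names a bucket."""
    if not uri.startswith("s3://"):
        return False
    # The bucket is everything between "s3://" and the first "/".
    bucket = uri[len("s3://"):].split("/", 1)[0]
    return bool(bucket)
```

These values would be passed unchanged as the operator's input_data_config, output_data_config, data_access_role_arn and language_code arguments; because the fields are templated, Jinja expressions inside them are rendered before the job is submitted.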
- class airflow.providers.amazon.aws.operators.comprehend.ComprehendStartPiiEntitiesDetectionJobOperator(input_data_config, output_data_config, mode, data_access_role_arn, language_code, start_pii_entities_kwargs=None, wait_for_completion=True, waiter_delay=60, waiter_max_attempts=20, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)
Bases: ComprehendBaseOperator
Create an Amazon Comprehend PII entities detection job for a collection of documents.
See also
For more information on how to use this operator, take a look at the guide: Create an Amazon Comprehend Start PII Entities Detection Job
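To make the parameter mapping concrete, the sketch below assembles the keyword arguments that boto3's Comprehend start_pii_entities_detection_job call accepts (InputDataConfig, OutputDataConfig, Mode, DataAccessRoleArn, LanguageCode, plus optional keys such as JobName and RedactionConfig) from this operator's parameters. The helper build_pii_job_request, its validation, and the generated job-name format are assumptions for illustration; this is not the provider's implementation.

```python
import time
from typing import Any, Optional

VALID_MODES = ("ONLY_REDACTION", "ONLY_OFFSETS")


def build_pii_job_request(
    input_data_config: dict[str, Any],
    output_data_config: dict[str, Any],
    mode: str,
    data_access_role_arn: str,
    language_code: str,
    start_pii_entities_kwargs: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: map operator arguments onto the keyword arguments
    of boto3's comprehend start_pii_entities_detection_job call."""
    # Copy so the caller's dict is not mutated when a JobName is added.
    extra = dict(start_pii_entities_kwargs or {})
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # Redaction output needs a RedactionConfig (entity types and mask settings).
    if mode == "ONLY_REDACTION" and "RedactionConfig" not in extra:
        raise ValueError(
            "mode='ONLY_REDACTION' requires RedactionConfig in start_pii_entities_kwargs"
        )
    # Generate a job name if none was given; this naming scheme is an assumption.
    extra.setdefault("JobName", f"start_pii_entities_detection_job-{int(time.time())}")
    return {
        "InputDataConfig": input_data_config,
        "OutputDataConfig": output_data_config,
        "Mode": mode,
        "DataAccessRoleArn": data_access_role_arn,
        "LanguageCode": language_code,
        **extra,
    }


# Usage with a placeholder bucket, account ID and role name.
request = build_pii_job_request(
    input_data_config={"S3Uri": "s3://example-bucket/in/", "InputFormat": "ONE_DOC_PER_LINE"},
    output_data_config={"S3Uri": "s3://example-bucket/out/"},
    mode="ONLY_REDACTION",
    data_access_role_arn="arn:aws:iam::123456789012:role/ExampleComprehendRole",
    language_code="en",
    start_pii_entities_kwargs={
        "RedactionConfig": {"PiiEntityTypes": ["ALL"], "MaskMode": "MASK", "MaskCharacter": "*"}
    },
)
# With boto3 installed and credentials configured, this could be sent as:
# boto3.client("comprehend").start_pii_entities_detection_job(**request)
```

Merging start_pii_entities_kwargs last lets optional API fields (for example Tags or VolumeKmsKeyId) pass through without the helper having to know about each one.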
- Parameters
input_data_config (dict) – The input properties for a PII entities detection job. (templated)
output_data_config (dict) – Provides configuration parameters for the output of PII entity detection jobs. (templated)
mode (str) – Specifies whether the output provides the locations (offsets) of PII entities (ONLY_OFFSETS) or a file in which PII entities are redacted (ONLY_REDACTION). If you set mode to ONLY_REDACTION, you must also provide a RedactionConfig in start_pii_entities_kwargs.
data_access_role_arn (str) – The Amazon Resource Name (ARN) of the IAM role that grants Amazon Comprehend read access to your input data. (templated)
language_code (str) – The language of the input documents. (templated)
start_pii_entities_kwargs (dict[str, Any] | None) – Any optional parameters to pass to the job. If JobName is not provided in start_pii_entities_kwargs, the operator generates one.
wait_for_completion (bool) – Whether to wait for the job to stop. (default: True)
waiter_delay (int) – Time in seconds to wait between status checks. (default: 60)
waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 20)
deferrable (bool) – If True, the operator waits asynchronously for the job to stop. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then the default boto3 configuration is used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified then the default boto3 behaviour is used.
verify – Whether to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
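With the defaults, the product waiter_delay × waiter_max_attempts (60 × 20 = 1200 seconds, about 20 minutes) is a rough upper bound on how long the operator waits for the job; jobs over large document collections may need a larger waiter_max_attempts. The sketch below shows that polling pattern with an injected status function so it runs without AWS. The helper wait_for_job and the set of states it treats as terminal are assumptions, modelled on Comprehend's job status values (SUBMITTED, IN_PROGRESS, COMPLETED, FAILED, STOP_REQUESTED, STOPPED); this is not the provider's waiter implementation.

```python
import time
from typing import Callable

# Comprehend job states after which polling stops (assumption for this sketch).
TERMINAL_STATES = {"COMPLETED", "FAILED", "STOPPED"}


def wait_for_job(
    get_status: Callable[[], str],
    waiter_delay: float = 60,
    waiter_max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Hypothetical helper: poll get_status() until a terminal state is seen
    or the attempt budget is exhausted."""
    for attempt in range(1, waiter_max_attempts + 1):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        # Sleep between checks, but not after the final attempt.
        if attempt < waiter_max_attempts:
            sleep(waiter_delay)
    raise TimeoutError(f"Job did not reach a terminal state after {waiter_max_attempts} attempts")


# Simulated job that finishes on the third status check.
statuses = iter(["SUBMITTED", "IN_PROGRESS", "COMPLETED"])
final = wait_for_job(lambda: next(statuses), waiter_delay=0, waiter_max_attempts=5)
print(final)  # COMPLETED
```

Injecting sleep keeps the sketch testable; in deferrable mode the same delay and attempt budget would apply, but the waiting happens in the triggerer instead of occupying a worker slot.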