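airflow.providers.amazon.aws.operators.comprehend
Classes
| Class | Description |
| --- | --- |
| ComprehendBaseOperator | This is the base operator for Comprehend Service operators (not supposed to be used directly in DAGs). |
| ComprehendStartPiiEntitiesDetectionJobOperator | Create an Amazon Comprehend PII entities detection job for a collection of documents. |
| ComprehendCreateDocumentClassifierOperator | Create an Amazon Comprehend document classifier that can categorize documents. |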
Module Contents
- class airflow.providers.amazon.aws.operators.comprehend.ComprehendBaseOperator(input_data_config, output_data_config, data_access_role_arn, language_code, **kwargs)
- Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.comprehend.ComprehendHook]

  This is the base operator for Comprehend Service operators (not supposed to be used directly in DAGs).

  Parameters:
- input_data_config (dict) – The input properties for a PII entities detection job. (templated) 
- output_data_config (dict) – Provides configuration parameters for the output of PII entity detection jobs. (templated) 
- data_access_role_arn (str) – The Amazon Resource Name (ARN) of the IAM role that grants Amazon Comprehend read access to your input data. (templated) 
- language_code (str) – The language of the input documents. (templated) 
 
 - template_fields: collections.abc.Sequence[str]
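The shared input_data_config and output_data_config parameters are plain dictionaries that the operators forward to Amazon Comprehend. The sketch below builds them for S3-hosted documents. The helper `build_io_config`, the bucket name, and the prefixes are hypothetical, and the key names (`S3Uri`, `InputFormat`) assume the request shape boto3 documents for `start_pii_entities_detection_job`. Because both fields are templated, a Jinja placeholder such as `{{ ds }}` can be left in the strings for Airflow to render.

```python
from typing import Any, Dict, Tuple


def build_io_config(
    bucket: str, input_prefix: str, output_prefix: str
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: build Comprehend input/output config dicts for S3 data.

    Key names assume the boto3 request shape for start_pii_entities_detection_job;
    check the boto3 documentation for the exact operation you call.
    """
    input_data_config = {
        "S3Uri": f"s3://{bucket}/{input_prefix}",
        # ONE_DOC_PER_LINE treats each line of an input file as a separate document.
        "InputFormat": "ONE_DOC_PER_LINE",
    }
    output_data_config = {"S3Uri": f"s3://{bucket}/{output_prefix}"}
    return input_data_config, output_data_config


# "{{ ds }}" is kept literally here; Airflow renders it because these are templated fields.
input_cfg, output_cfg = build_io_config("my-bucket", "raw/{{ ds }}/", "pii-output/")
print(input_cfg)
print(output_cfg)
```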
 
- class airflow.providers.amazon.aws.operators.comprehend.ComprehendStartPiiEntitiesDetectionJobOperator(input_data_config, output_data_config, mode, data_access_role_arn, language_code, start_pii_entities_kwargs=None, wait_for_completion=True, waiter_delay=60, waiter_max_attempts=20, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)
- Bases: ComprehendBaseOperator

  Create an Amazon Comprehend PII entities detection job for a collection of documents.

  See also: For more information on how to use this operator, take a look at the guide: Create an Amazon Comprehend Start PII Entities Detection Job

  Parameters:
- input_data_config (dict) – The input properties for a PII entities detection job. (templated) 
- output_data_config (dict) – Provides configuration parameters for the output of PII entity detection jobs. (templated) 
- mode (str) – Specifies whether the output provides the locations (offsets) of PII entities (ONLY_OFFSETS) or a file in which PII entities are redacted (ONLY_REDACTION). If you set mode to ONLY_REDACTION, you must provide a RedactionConfig in start_pii_entities_kwargs. 
- data_access_role_arn (str) – The Amazon Resource Name (ARN) of the IAM role that grants Amazon Comprehend read access to your input data. (templated) 
- language_code (str) – The language of the input documents. (templated) 
- start_pii_entities_kwargs (dict[str, Any] | None) – Any optional parameters to pass to the job. If JobName is not provided in start_pii_entities_kwargs, the operator generates one. 
- wait_for_completion (bool) – Whether to wait for the job to stop. (default: True) 
- waiter_delay (int) – Time in seconds to wait between status checks. (default: 60) 
- waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 20) 
- deferrable (bool) – If True, the operator waits asynchronously for the job to stop, which implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: the [operators] default_deferrable setting, which falls back to False) 
- aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty, the default boto3 behaviour is used. When Airflow runs in a distributed manner and aws_conn_id is None or empty, the default boto3 configuration is used and must be maintained on each worker node.
- region_name – AWS region_name. If not specified then the default boto3 behaviour is used. 
- verify – Whether to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html 
- botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html 
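The mode and start_pii_entities_kwargs rules above, plus the wait budget implied by the waiter settings, can be checked before a DAG runs. This is a minimal sketch, not the provider's code: `prepare_pii_kwargs` and `max_wait_seconds` are hypothetical helpers, the generated JobName format is illustrative, and the RedactionConfig keys (`PiiEntityTypes`, `MaskMode`) assume the boto3 request shape for `start_pii_entities_detection_job`.

```python
from __future__ import annotations

import time
from typing import Any

# Mode values accepted by Amazon Comprehend's StartPiiEntitiesDetectionJob API.
VALID_PII_MODES = {"ONLY_OFFSETS", "ONLY_REDACTION"}


def prepare_pii_kwargs(
    mode: str, start_pii_entities_kwargs: dict[str, Any] | None = None
) -> dict[str, Any]:
    """Hypothetical pre-flight check mirroring the rules documented for the operator."""
    if mode not in VALID_PII_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_PII_MODES)}, got {mode!r}")
    # Copy so the caller's dict is not mutated.
    kwargs = dict(start_pii_entities_kwargs or {})
    if mode == "ONLY_REDACTION" and "RedactionConfig" not in kwargs:
        raise ValueError("mode ONLY_REDACTION requires RedactionConfig in start_pii_entities_kwargs")
    # Generate a JobName when none is given; this name format is illustrative only.
    if not kwargs.get("JobName"):
        kwargs["JobName"] = f"pii-detection-{int(time.time())}"
    return kwargs


def max_wait_seconds(waiter_delay: int = 60, waiter_max_attempts: int = 20) -> int:
    """Approximate upper bound on how long the waiter polls before giving up."""
    return waiter_delay * waiter_max_attempts


redaction_kwargs = prepare_pii_kwargs(
    "ONLY_REDACTION",
    {
        "JobName": "redact-customer-notes",
        "RedactionConfig": {
            "PiiEntityTypes": ["ALL"],
            "MaskMode": "REPLACE_WITH_PII_ENTITY_TYPE",
        },
    },
)
print(redaction_kwargs["JobName"])  # redact-customer-notes
print(max_wait_seconds())  # 1200 seconds, i.e. 20 minutes with the default settings
```

With the defaults (waiter_delay=60, waiter_max_attempts=20) the operator gives a job roughly 20 minutes; raise waiter_max_attempts for large document collections. The resulting dict would be passed to the operator as start_pii_entities_kwargs together with mode="ONLY_REDACTION".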
 
 
- class airflow.providers.amazon.aws.operators.comprehend.ComprehendCreateDocumentClassifierOperator(document_classifier_name, input_data_config, mode, data_access_role_arn, language_code, fail_on_warnings=False, output_data_config=None, document_classifier_kwargs=None, wait_for_completion=True, waiter_delay=60, waiter_max_attempts=20, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), aws_conn_id='aws_default', **kwargs)
- Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.comprehend.ComprehendHook]

  Create an Amazon Comprehend document classifier that can categorize documents. Provide a set of training documents that are labeled with the categories.

  See also: For more information on how to use this operator, take a look at the guide: Create an Amazon Comprehend Document Classifier

  Parameters:
- document_classifier_name (str) – The name of the document classifier. (templated) 
- input_data_config (dict[str, Any]) – Specifies the format and location of the input data for the job. (templated) 
- mode (str) – Indicates the mode in which the classifier will be trained: MULTI_CLASS or MULTI_LABEL. (templated) 
- data_access_role_arn (str) – The Amazon Resource Name (ARN) of the IAM role that grants Amazon Comprehend read access to your input data. (templated) 
- language_code (str) – The language of the input documents. You can specify any of the languages supported by Amazon Comprehend. All documents must be in the same language. (templated) 
- fail_on_warnings (bool) – If set to True, the operator raises an error when the document classifier training job ends with status TRAINED_WITH_WARNING. (default: False) 
- output_data_config (dict[str, Any] | None) – Specifies the location for the output files from a custom classifier job. This parameter is required for a request that creates a native document model. (templated) 
- document_classifier_kwargs (dict[str, Any] | None) – Any optional parameters to pass to the document classifier. (templated) 
- wait_for_completion (bool) – Whether to wait for the job to stop. (default: True) 
- waiter_delay (int) – Time in seconds to wait between status checks. (default: 60) 
- waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 20) 
- deferrable (bool) – If True, the operator waits asynchronously for the job to stop, which implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: the [operators] default_deferrable setting, which falls back to False) 
- aws_conn_id (str | None) – The Airflow connection used for AWS credentials. If this is None or empty, the default boto3 behaviour is used. When Airflow runs in a distributed manner and aws_conn_id is None or empty, the default boto3 configuration is used and must be maintained on each worker node.
- region_name – AWS region_name. If not specified then the default boto3 behaviour is used. 
- verify – Whether to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html 
- botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html 
 
 - template_fields: collections.abc.Sequence[str]
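The fail_on_warnings flag only matters once training reaches TRAINED_WITH_WARNING. The sketch below shows that decision as a standalone function. `check_training_status` and `ClassifierTrainingError` are hypothetical names, the status strings assume the model statuses Amazon Comprehend reports for document classifiers, and treating STOPPED as a failure is a choice made for this sketch rather than documented operator behavior.

```python
# Model statuses assumed to be reported for an Amazon Comprehend document classifier.
SUCCESS_STATUSES = {"TRAINED", "TRAINED_WITH_WARNING"}
FAILURE_STATUSES = {"IN_ERROR", "STOPPED"}


class ClassifierTrainingError(Exception):
    """Hypothetical error type used only in this sketch."""


def check_training_status(status: str, fail_on_warnings: bool = False) -> bool:
    """Return True once training has finished acceptably, False while it is still running."""
    # fail_on_warnings turns an otherwise successful TRAINED_WITH_WARNING into an error.
    if status == "TRAINED_WITH_WARNING" and fail_on_warnings:
        raise ClassifierTrainingError("classifier trained with warnings and fail_on_warnings=True")
    if status in SUCCESS_STATUSES:
        return True
    if status in FAILURE_STATUSES:
        raise ClassifierTrainingError(f"classifier training ended with status {status}")
    # SUBMITTED, TRAINING, STOP_REQUESTED and other non-terminal statuses: keep polling.
    return False


print(check_training_status("TRAINING"))  # False
print(check_training_status("TRAINED_WITH_WARNING"))  # True
```

Set fail_on_warnings=True on the operator when a classifier trained with warnings should fail the task instead of letting downstream tasks use it.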