airflow.providers.amazon.aws.operators.redshift_cluster

Module Contents

Classes

RedshiftCreateClusterOperator

Creates a new cluster with the specified parameters.

RedshiftCreateClusterSnapshotOperator

Creates a manual snapshot of the specified cluster. The cluster must be in the available state.

RedshiftDeleteClusterSnapshotOperator

Deletes the specified manual snapshot.

RedshiftResumeClusterOperator

Resume a paused AWS Redshift Cluster.

RedshiftPauseClusterOperator

Pause an AWS Redshift cluster if it is in the available state.

RedshiftDeleteClusterOperator

Delete an AWS Redshift cluster.

class airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftCreateClusterOperator(*, cluster_identifier, node_type, master_username, master_user_password, cluster_type='multi-node', db_name='dev', number_of_nodes=1, cluster_security_groups=None, vpc_security_group_ids=None, cluster_subnet_group_name=None, availability_zone=None, preferred_maintenance_window=None, cluster_parameter_group_name=None, automated_snapshot_retention_period=1, manual_snapshot_retention_period=None, port=5439, cluster_version='1.0', allow_version_upgrade=True, publicly_accessible=True, encrypted=False, hsm_client_certificate_identifier=None, hsm_configuration_identifier=None, elastic_ip=None, tags=None, kms_key_id=None, enhanced_vpc_routing=False, additional_info=None, iam_roles=None, maintenance_track_name=None, snapshot_schedule_identifier=None, availability_zone_relocation=None, aqua_configuration_status=None, default_iam_role_arn=None, aws_conn_id='aws_default', wait_for_completion=False, max_attempt=5, poll_interval=60, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]

Bases: airflow.models.BaseOperator

Creates a new cluster with the specified parameters.

See also

For more information on how to use this operator, take a look at the guide: Create an Amazon Redshift cluster

Parameters
  • cluster_identifier (str) – A unique identifier for the cluster.

  • node_type (str) – The node type to be provisioned for the cluster. Valid Values: ds2.xlarge, ds2.8xlarge, dc1.large, dc1.8xlarge, dc2.large, dc2.8xlarge, ra3.xlplus, ra3.4xlarge, and ra3.16xlarge.

  • master_username (str) – The username associated with the admin user account for the cluster that is being created.

  • master_user_password (str) – The password associated with the admin user account for the cluster that is being created.

  • cluster_type (str) – The type of the cluster: single-node or multi-node. The default value is multi-node.

  • db_name (str) – The name of the first database to be created when the cluster is created.

  • number_of_nodes (int) – The number of compute nodes in the cluster. This parameter is required when cluster_type is multi-node.

  • cluster_security_groups (list[str] | None) – A list of security groups to be associated with this cluster.

  • vpc_security_group_ids (list[str] | None) – A list of VPC security groups to be associated with the cluster.

  • cluster_subnet_group_name (str | None) – The name of a cluster subnet group to be associated with this cluster.

  • availability_zone (str | None) – The EC2 Availability Zone (AZ).

  • preferred_maintenance_window (str | None) – The time range (in UTC) during which automated cluster maintenance can occur.

  • cluster_parameter_group_name (str | None) – The name of the parameter group to be associated with this cluster.

  • automated_snapshot_retention_period (int) – The number of days that automated snapshots are retained. The default value is 1.

  • manual_snapshot_retention_period (int | None) – The default number of days to retain a manual snapshot.

  • port (int) – The port number on which the cluster accepts incoming connections. The default value is 5439.

  • cluster_version (str) – The version of a Redshift engine software that you want to deploy on the cluster.

  • allow_version_upgrade (bool) – Whether major version upgrades can be applied during the maintenance window. The default value is True.

  • publicly_accessible (bool) – Whether cluster can be accessed from a public network.

  • encrypted (bool) – Whether data in the cluster is encrypted at rest. The default value is False.

  • hsm_client_certificate_identifier (str | None) – Name of the HSM client certificate the Amazon Redshift cluster uses to retrieve the data.

  • hsm_configuration_identifier (str | None) – Name of the HSM configuration

  • elastic_ip (str | None) – The Elastic IP (EIP) address for the cluster.

  • tags (list[Any] | None) – A list of tag instances

  • kms_key_id (str | None) – KMS key id of encryption key.

  • enhanced_vpc_routing (bool) – Whether to create the cluster with enhanced VPC routing enabled. The default value is False.

  • additional_info (str | None) – Reserved

  • iam_roles (list[str] | None) – A list of IAM roles that can be used by the cluster to access other AWS services.

  • maintenance_track_name (str | None) – Name of the maintenance track for the cluster.

  • snapshot_schedule_identifier (str | None) – A unique identifier for the snapshot schedule.

  • availability_zone_relocation (bool | None) – Enable relocation for a Redshift cluster between Availability Zones after the cluster is created.

  • aqua_configuration_status (str | None) – Whether the cluster is configured to use AQUA (Advanced Query Accelerator).

  • default_iam_role_arn (str | None) – The ARN of the IAM role that is set as the default for the cluster.

  • aws_conn_id (str | None) – The Airflow connection used for AWS credentials. The default connection id is aws_default.

  • wait_for_completion (bool) – Whether to wait for the cluster to reach the available state. The default value is False.

  • max_attempt (int) – The maximum number of attempts to be made. Default: 5

  • poll_interval (int) – The amount of time in seconds to wait between attempts. Default: 60

  • deferrable (bool) – If True, the operator will run in deferrable mode

template_fields: Sequence[str] = ('cluster_identifier', 'cluster_type', 'node_type', 'master_username', 'master_user_password',...[source]
ui_color = '#eeaa11'[source]
ui_fgcolor = '#ffffff'[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

execute_complete(context, event=None)[source]
class airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftCreateClusterSnapshotOperator(*, snapshot_identifier, cluster_identifier, retention_period=-1, tags=None, wait_for_completion=False, poll_interval=15, max_attempt=20, aws_conn_id='aws_default', deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]

Bases: airflow.models.BaseOperator

Creates a manual snapshot of the specified cluster. The cluster must be in the available state.

See also

For more information on how to use this operator, take a look at the guide: Create an Amazon Redshift cluster snapshot

Parameters
  • snapshot_identifier (str) – A unique identifier for the snapshot that you are requesting

  • cluster_identifier (str) – The cluster identifier for which you want a snapshot

  • retention_period (int) – The number of days that a manual snapshot is retained. If the value is -1, the manual snapshot is retained indefinitely.

  • tags (list[Any] | None) – A list of tag instances

  • wait_for_completion (bool) – Whether to wait for the cluster snapshot to reach the available state.

  • poll_interval (int) – Time (in seconds) to wait between two consecutive calls to check state

  • max_attempt (int) – The maximum number of attempts to be made to check the state

  • aws_conn_id (str | None) – The Airflow connection used for AWS credentials. The default connection id is aws_default

  • deferrable (bool) – If True, the operator will run as a deferrable operator.

template_fields: Sequence[str] = ('cluster_identifier', 'snapshot_identifier')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

execute_complete(context, event=None)[source]
class airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftDeleteClusterSnapshotOperator(*, snapshot_identifier, cluster_identifier, wait_for_completion=True, aws_conn_id='aws_default', poll_interval=10, **kwargs)[source]

Bases: airflow.models.BaseOperator

Deletes the specified manual snapshot.

See also

For more information on how to use this operator, take a look at the guide: Delete an Amazon Redshift cluster snapshot

Parameters
  • snapshot_identifier (str) – A unique identifier for the snapshot that you are requesting

  • cluster_identifier (str) – The unique identifier of the cluster the snapshot was created from

  • wait_for_completion (bool) – Whether to wait for snapshot deletion to complete. The default value is True.

  • aws_conn_id (str | None) – The Airflow connection used for AWS credentials. The default connection id is aws_default

  • poll_interval (int) – Time (in seconds) to wait between two consecutive calls to check snapshot state

template_fields: Sequence[str] = ('cluster_identifier', 'snapshot_identifier')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

get_status()[source]
class airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftResumeClusterOperator(*, cluster_identifier, aws_conn_id='aws_default', wait_for_completion=False, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), poll_interval=10, max_attempts=10, **kwargs)[source]

Bases: airflow.models.BaseOperator

Resume a paused AWS Redshift Cluster.

See also

For more information on how to use this operator, take a look at the guide: Resume an Amazon Redshift cluster

Parameters
  • cluster_identifier (str) – Unique identifier of the AWS Redshift cluster

  • aws_conn_id (str | None) – The Airflow connection used for AWS credentials. The default connection id is aws_default

  • poll_interval (int) – Time (in seconds) to wait between two consecutive calls to check cluster state

  • max_attempts (int) – The maximum number of attempts to check the state of the cluster.

  • wait_for_completion (bool) – If True, the operator will wait for the cluster to be in the resumed state. Default is False.

  • deferrable (bool) – If True, the operator will run as a deferrable operator.

template_fields: Sequence[str] = ('cluster_identifier',)[source]
ui_color = '#eeaa11'[source]
ui_fgcolor = '#ffffff'[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

execute_complete(context, event=None)[source]
class airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftPauseClusterOperator(*, cluster_identifier, aws_conn_id='aws_default', deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), poll_interval=10, max_attempts=15, **kwargs)[source]

Bases: airflow.models.BaseOperator

Pause an AWS Redshift cluster if it is in the available state.

See also

For more information on how to use this operator, take a look at the guide: Pause an Amazon Redshift cluster

Parameters
  • cluster_identifier (str) – Unique identifier of the AWS Redshift cluster.

  • aws_conn_id (str | None) – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node).

  • deferrable (bool) – Run operator in the deferrable mode

  • poll_interval (int) – Time (in seconds) to wait between two consecutive calls to check cluster state

  • max_attempts (int) – Maximum number of attempts to poll the cluster

template_fields: Sequence[str] = ('cluster_identifier',)[source]
ui_color = '#eeaa11'[source]
ui_fgcolor = '#ffffff'[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

execute_complete(context, event=None)[source]
class airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftDeleteClusterOperator(*, cluster_identifier, skip_final_cluster_snapshot=True, final_cluster_snapshot_identifier=None, wait_for_completion=True, aws_conn_id='aws_default', poll_interval=30, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), max_attempts=30, **kwargs)[source]

Bases: airflow.models.BaseOperator

Delete an AWS Redshift cluster.

See also

For more information on how to use this operator, take a look at the guide: Delete an Amazon Redshift cluster

Parameters
  • cluster_identifier (str) – Unique identifier of the cluster to delete.

  • skip_final_cluster_snapshot (bool) – Whether to skip creating a final cluster snapshot before deletion. The default value is True.

  • final_cluster_snapshot_identifier (str | None) – Name of the final cluster snapshot; required when skip_final_cluster_snapshot is False.

  • wait_for_completion (bool) – Whether to wait for cluster deletion to complete. The default value is True.

  • aws_conn_id (str | None) – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node).

  • poll_interval (int) – Time (in seconds) to wait between two consecutive calls to check cluster state

  • deferrable (bool) – Run operator in the deferrable mode.

  • max_attempts (int) – (Deferrable mode only) The maximum number of attempts to be made

template_fields: Sequence[str] = ('cluster_identifier',)[source]
ui_color = '#eeaa11'[source]
ui_fgcolor = '#ffffff'[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

execute_complete(context, event=None)[source]
