Apache Cassandra Operators

Apache Cassandra is an open source distributed NoSQL database that can be used when you need scalability and high availability without compromising performance. It offers linear scalability and fault-tolerance on commodity hardware or cloud infrastructure which makes it the perfect platform for mission-critical data. It supports multi-datacenter replication with lower latencies.

Prerequisite

To use operators, you must configure a Cassandra Connection.

Waiting for a Table to be created

The CassandraTableSensor operator is used to check for the existence of a table in a Cassandra cluster.

Use the table parameter (set in default_args in the example below) to poke until the provided table is found. Use dot notation to target a specific keyspace.

Waiting for a Record to be created

The CassandraRecordSensor operator is used to check for the existence of a record of a table in the Cassandra cluster.

Use the table parameter (set in default_args in the example below) to mention the keyspace and table for the record. Use dot notation to target a specific keyspace.

Use the keys parameter to poke until the provided record is found. The existence of record is identified using key value pairs. In the given example, we're are looking for value v1 in column p1 and v2 in column p2.

Example use of these sensors

airflow/providers/apache/cassandra/example_dags/example_cassandra_dag.py[source]

with DAG(
    dag_id='example_cassandra_operator',
    schedule_interval=None,
    start_date=datetime(2021, 1, 1),
    default_args={'table': 'keyspace_name.table_name'},
    catchup=False,
    tags=['example'],
) as dag:
    table_sensor = CassandraTableSensor(task_id="cassandra_table_sensor")

    record_sensor = CassandraRecordSensor(task_id="cassandra_record_sensor", keys={"p1": "v1", "p2": "v2"})

Reference

For further information, look at Cassandra Query Language (CQL) SELECT statement.

Was this entry helpful?