airflow.providers.apache.hive.transfers.s3_to_hive
This module contains an operator to move data from an S3 bucket to Hive.
Module Contents
Classes
- S3ToHiveOperator: Moves data from S3 to Hive.
- class airflow.providers.apache.hive.transfers.s3_to_hive.S3ToHiveOperator(*, s3_key: str, field_dict: Dict, hive_table: str, delimiter: str = ',', create: bool = True, recreate: bool = False, partition: Optional[Dict] = None, headers: bool = False, check_headers: bool = False, wildcard_match: bool = False, aws_conn_id: str = 'aws_default', verify: Optional[Union[bool, str]] = None, hive_cli_conn_id: str = 'hive_cli_default', input_compressed: bool = False, tblproperties: Optional[Dict] = None, select_expression: Optional[str] = None, **kwargs)
- Bases: airflow.models.BaseOperator

  Moves data from S3 to Hive. The operator downloads a file from S3 and stores it locally before loading it into a Hive table. If the create or recreate arguments are set to True, CREATE TABLE and DROP TABLE statements are generated. Hive data types are inferred from the cursor's metadata.

  Note that the table generated in Hive uses STORED AS textfile, which isn't the most efficient serialization format. If a large amount of data is loaded and/or if the table gets queried considerably, you may want to use this operator only to stage the data into a temporary table before loading it into its final destination using a HiveOperator; a sketch of that pattern follows the parameter list below.

  Parameters
- s3_key (str) -- The key to be retrieved from S3. (templated) 
- field_dict (dict) -- A dictionary with the field names in the file as keys and their Hive types as values 
- hive_table (str) -- target Hive table, use dot notation to target a specific database. (templated) 
- delimiter (str) -- field delimiter in the file 
- create (bool) -- whether to create the table if it doesn't exist 
- recreate (bool) -- whether to drop and recreate the table at every execution 
- partition (dict) -- target partition as a dict of partition columns and values. (templated) 
- headers (bool) -- whether the file contains column names on the first line 
- check_headers (bool) -- whether the column names on the first line should be checked against the keys of field_dict 
- wildcard_match (bool) -- whether the s3_key should be interpreted as a Unix wildcard pattern 
- aws_conn_id (str) -- source S3 connection 
- verify (bool or str) -- Whether or not to verify SSL certificates for the S3 connection. By default SSL certificates are verified. You can provide the following values:
  - False: do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.
  - path/to/cert/bundle.pem: A filename of the CA cert bundle to use. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.
- hive_cli_conn_id (str) -- Reference to the Hive CLI connection id. 
- input_compressed (bool) -- whether the input file is compressed and must be decompressed before the headers can be processed 
- tblproperties (dict) -- TBLPROPERTIES of the hive table being created 
- select_expression (str) -- S3 Select expression
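The note above recommends staging into a temporary table and then promoting the rows with a HiveOperator when the target table is large or heavily queried. Below is a minimal sketch of that pattern, not an official recipe: the DAG id, bucket, key, schemas, and table names (s3_to_hive_example, my-bucket, staging.events_tmp, analytics.events) are hypothetical placeholders, and the connection IDs are simply the operator defaults.

```python
# Sketch of the stage-then-promote pattern described above. All names
# (DAG id, bucket, key, schemas, tables) are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.hive.operators.hive import HiveOperator
from airflow.providers.apache.hive.transfers.s3_to_hive import S3ToHiveOperator

with DAG(
    dag_id="s3_to_hive_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Stage the S3 file into a temporary textfile-backed table.
    # field_dict keys follow the file's column order; values are Hive types.
    stage = S3ToHiveOperator(
        task_id="stage_events",
        s3_key="s3://my-bucket/events/{{ ds }}/events.csv",  # templated, hypothetical
        field_dict={"event_id": "BIGINT", "event_ts": "STRING", "payload": "STRING"},
        hive_table="staging.events_tmp",  # dot notation selects the database
        delimiter=",",
        headers=True,          # first line holds column names ...
        check_headers=True,    # ... and is validated against field_dict keys
        recreate=True,         # drop and recreate the staging table each run
        partition={"ds": "{{ ds }}"},
        aws_conn_id="aws_default",
        hive_cli_conn_id="hive_cli_default",
    )

    # Promote the staged rows into the final destination table.
    promote = HiveOperator(
        task_id="promote_events",
        hql="""
            INSERT OVERWRITE TABLE analytics.events PARTITION (ds='{{ ds }}')
            SELECT event_id, event_ts, payload
            FROM staging.events_tmp
        """,
        hive_cli_conn_id="hive_cli_default",
    )

    stage >> promote
```

Splitting the load this way keeps the STORED AS textfile staging table short-lived, while the final table can be declared in a more query-efficient format such as ORC or Parquet.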