This module contains a Salesforce Hook which allows you to connect to your Salesforce instance, retrieve data from it, and write that data to a file for other uses.

NOTE: this hook also relies on the simple_salesforce package.

Module Contents

class airflow.contrib.hooks.salesforce_hook.SalesforceHook(conn_id, *args, **kwargs)[source]

Bases: airflow.hooks.base_hook.BaseHook


Sign into Salesforce.

If we have already signed in, this will just return the original object.
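The sign-in-once behavior described above can be illustrated with a small caching sketch. The class and method names here (LazyConnection, connect_once, fake_login) are hypothetical and are not part of the hook's API; the sketch only shows the general pattern of reusing an already created connection object.

```python
class LazyConnection:
    """Hypothetical stand-in for an object that signs in at most once."""

    def __init__(self, connect):
        # `connect` is any zero-argument callable that performs the real login.
        self._connect = connect
        self._conn = None

    def connect_once(self):
        # Sign in only on the first call; afterwards reuse the cached object.
        if self._conn is None:
            self._conn = self._connect()
        return self._conn


login_calls = []


def fake_login():
    # Records each login attempt so the caching behavior can be observed.
    login_calls.append(1)
    return object()


lazy = LazyConnection(fake_login)
first = lazy.connect_once()
second = lazy.connect_once()  # same object as `first`, no second login
```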

make_query(self, query)[source]

Make a query to Salesforce. Returns the result as a dictionary.


query – The query to make to Salesforce

describe_object(self, obj)[source]

Get the description of an object from Salesforce.

This description is the object’s schema and some extra metadata that Salesforce stores for each object.


obj – Name of the Salesforce object that we are getting a description of.

get_available_fields(self, obj)[source]

Get a list of all available fields for an object.

This only returns the names of the fields.
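As a rough illustration of how field names might be pulled out of an object description, the sketch below assumes the description is a dictionary with a "fields" list whose entries each carry a "name" key. That shape, the field_names helper, and the sample data are assumptions made for this example, not something this page specifies.

```python
def field_names(description):
    """Return only the field names from a description dictionary.

    Assumes `description["fields"]` is a list of dicts with a "name" key.
    """
    return [field["name"] for field in description["fields"]]


# Hand-written sample data standing in for a real object description.
sample_description = {
    "name": "Account",
    "fields": [
        {"name": "Id", "type": "id"},
        {"name": "Name", "type": "string"},
    ],
}

names = field_names(sample_description)
```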

static _build_field_list(fields)[source]
get_object_from_salesforce(self, obj, fields)[source]

Get all instances of the object from Salesforce. For each model, only get the fields specified in fields.

All we really do under the hood is run:

SELECT <fields> FROM <obj>;
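A minimal sketch of how such a query string could be assembled from an object name and a list of fields. The build_select_query helper is hypothetical and is not the hook's own code; it only mirrors the SELECT template shown above.

```python
def build_select_query(obj, fields):
    """Build a 'SELECT <fields> FROM <obj>' string (illustrative helper)."""
    if not fields:
        # An empty field list would produce an invalid query.
        raise ValueError("at least one field is required")
    return "SELECT {} FROM {}".format(", ".join(fields), obj)


query = build_select_query("Account", ["Id", "Name"])
```

A string built this way could then be passed to make_query.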

classmethod _to_timestamp(cls, col)[source]

Convert a column of a dataframe to UNIX timestamps if applicable.


col – A Series object representing a column of a dataframe.

write_object_to_file(self, query_results, filename, fmt='csv', coerce_to_timestamp=False, record_time_added=False)[source]

Write query results to file.

Acceptable formats are:
  • csv: comma-separated values file. This is the default format.

  • json: JSON array. Each element in the array is a different row.

  • ndjson: newline-delimited JSON. Each row is a JSON object on its own line, rather than an element of a comma-separated array as in json.
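To make the difference between the json and ndjson layouts concrete, here is a small standard-library sketch. It does not use the hook; the sample rows are invented for illustration.

```python
import json

# Invented sample rows standing in for query results.
rows = [
    {"Id": "001", "Name": "Acme"},
    {"Id": "002", "Name": "Globex"},
]

# json: a single array holding every row.
as_json = json.dumps(rows)

# ndjson: one JSON object per line, with no enclosing array.
as_ndjson = "\n".join(json.dumps(row) for row in rows)

# An ndjson document is read back line by line.
parsed_back = [json.loads(line) for line in as_ndjson.splitlines()]
```

The line-per-row layout is convenient for tools that stream or bulk-load records one at a time.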

This requires a significant amount of cleanup. Pandas doesn’t handle output to CSV and JSON in a uniform way. This is especially painful for datetime types: pandas wants to write them as strings in CSV, but as millisecond Unix timestamps in JSON.

By default, this function will try to leave all values as they are represented in Salesforce. Use the coerce_to_timestamp flag to force all datetimes to become Unix timestamps (UTC). This can be greatly beneficial, as it makes all of your datetime fields look the same and makes them easier to work with in other database environments.
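The idea behind coercing datetimes to Unix timestamps can be sketched with the standard library alone. The datetime string layout below (for example 2018-01-01T00:00:00.000+0000) is an assumption about how the values arrive, and to_unix_timestamp is a hypothetical helper, not the hook's own conversion routine.

```python
from datetime import datetime

# Assumed layout of an incoming datetime string, e.g.
# "2018-01-01T00:00:00.000+0000".
ASSUMED_DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def to_unix_timestamp(value):
    """Return seconds since the Unix epoch, or `value` unchanged if it
    does not parse as a datetime in the assumed format."""
    try:
        parsed = datetime.strptime(value, ASSUMED_DATETIME_FORMAT)
    except (TypeError, ValueError):
        # Not a datetime string: leave the value as it was.
        return value
    return parsed.timestamp()


converted = to_unix_timestamp("2018-01-01T00:00:00.000+0000")
untouched = to_unix_timestamp("Acme")
```

Because the offset is part of the parsed value, the resulting number is the same regardless of the local time zone of the machine running the code.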

  • query_results – the results from a SQL query

  • filename – the name of the file where the data should be dumped to

  • fmt – the format you want the output in. Default: csv.

  • coerce_to_timestamp – True if you want all datetime fields to be converted into Unix timestamps. False if you want them to be left in the same format as they were in Salesforce. Leaving the value as False will result in datetimes being strings. Default: False.

  • record_time_added – (optional) True if you want to add a Unix timestamp field to the resulting data that marks when the data was fetched from Salesforce. Default: False.
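Putting the documented methods together, a fetch-and-export step might look like the sketch below. The export_object function and StubHook class are hypothetical; the stub exists only so the example runs without Airflow or a Salesforce login. The sketch also assumes that the query result dictionary holds its rows under a "records" key, which this page does not state.

```python
def export_object(hook, obj, fields, filename):
    """Fetch `fields` of `obj` through `hook` and write them as ndjson.

    `hook` is any object exposing the two methods documented above;
    in real use it would be a SalesforceHook instance.
    """
    results = hook.get_object_from_salesforce(obj, fields)
    hook.write_object_to_file(
        results["records"],  # assumed location of the rows
        filename,
        fmt="ndjson",
        coerce_to_timestamp=True,
        record_time_added=True,
    )
    return filename


class StubHook:
    """Stand-in that mimics the documented method signatures."""

    def __init__(self):
        self.written = None

    def get_object_from_salesforce(self, obj, fields):
        # Invented sample result.
        return {"records": [{"Id": "001", "Name": "Acme"}]}

    def write_object_to_file(self, query_results, filename, fmt="csv",
                             coerce_to_timestamp=False,
                             record_time_added=False):
        # Record the call instead of touching the file system.
        self.written = (query_results, filename, fmt,
                        coerce_to_timestamp, record_time_added)


stub = StubHook()
path = export_object(stub, "Account", ["Id", "Name"], "accounts.ndjson")
```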
