airflow.providers.salesforce.hooks.salesforce

Connect to your Salesforce instance, retrieve data from it, and write that data to a file for other uses.

Note

This hook relies on the simple_salesforce package: https://github.com/simple-salesforce/simple-salesforce

Module Contents

Classes

SalesforceHook

Creates a new connection to Salesforce and lets you pull data out of SFDC and save it to a file.

Attributes

log

airflow.providers.salesforce.hooks.salesforce.log
class airflow.providers.salesforce.hooks.salesforce.SalesforceHook(salesforce_conn_id=default_conn_name, session_id=None, session=None)

Bases: airflow.hooks.base.BaseHook

Creates a new connection to Salesforce and lets you pull data out of SFDC and save it to a file.

You can then use that file with other Airflow operators to move the data into another data source.

Parameters
  • salesforce_conn_id (str) – The name of the connection that has the parameters needed to connect to Salesforce. The connection should be of type Salesforce.

  • session_id (str | None) – The access token for a given HTTP request session.

  • session (requests.Session | None) – A custom HTTP request session. This enables the use of requests Session features not otherwise exposed by simple_salesforce.

Note

A connection to Salesforce can be created via several authentication options:

  • Password: Provide Username, Password, and Security Token

  • Direct Session: Provide a session_id and either Instance or Instance URL

  • OAuth 2.0 JWT: Provide a Consumer Key and either a Private Key or Private Key File Path

  • IP Filtering: Provide Username, Password, and an Organization ID

When connecting to a sandbox, enter a Domain value of ‘test’.
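The authentication options in the note above can be read as a set of requirements, where some requirements accept alternatives. The following is a minimal sketch of that reading. The helper `satisfied_auth_options` and the lowercase field names are hypothetical illustrations of the labels in the note; they are not part of Airflow and are not the keys used by the connection form.

```python
# Hypothetical helper, not part of Airflow. Field names are illustrative
# lowercase versions of the labels used in the note above.
# Each option maps to a list of requirements; each requirement is a set of
# alternatives, at least one of which must be provided.
AUTH_OPTIONS = {
    "password": [{"username"}, {"password"}, {"security_token"}],
    "direct_session": [{"session_id"}, {"instance", "instance_url"}],
    "oauth_jwt": [{"consumer_key"}, {"private_key", "private_key_file_path"}],
    "ip_filtering": [{"username"}, {"password"}, {"organization_id"}],
}


def satisfied_auth_options(provided):
    """Return the names of the options whose requirements are all met."""
    provided = set(provided)
    return [
        name
        for name, requirements in AUTH_OPTIONS.items()
        if all(provided & alternatives for alternatives in requirements)
    ]
```

For example, providing only a session_id and an Instance URL satisfies the Direct Session option and no other.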

conn_name_attr = 'salesforce_conn_id'
default_conn_name = 'salesforce_default'
conn_type = 'salesforce'
hook_name = 'Salesforce'
classmethod get_connection_form_widgets()

Return connection widgets to add to connection form.

classmethod get_ui_field_behaviour()

Return custom field behaviour.

conn()

Return a cached Salesforce instance.

get_conn()

Return a cached Salesforce instance.

make_query(query, include_deleted=False, query_params=None)

Make a query to Salesforce.

Parameters
  • query (str) – The query to make to Salesforce.

  • include_deleted (bool) – True if the query should include deleted records.

  • query_params (dict | None) – Additional optional arguments.

Returns

The query result.

Return type

dict
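As an illustration of how the returned dict might be consumed, here is a minimal sketch. It assumes the result has a "records" list and that each record carries an "attributes" metadata entry, which is the shape simple_salesforce query results usually have. `fetch_records` and `FakeHook` are hypothetical names; `FakeHook` stands in for SalesforceHook so the example runs without a Salesforce connection.

```python
def fetch_records(hook, soql, include_deleted=False):
    """Run a query through the hook and return plain record dicts."""
    result = hook.make_query(soql, include_deleted=include_deleted)
    # Assumption: every record has an "attributes" metadata entry,
    # which is dropped here so only the selected fields remain.
    return [
        {key: value for key, value in record.items() if key != "attributes"}
        for record in result.get("records", [])
    ]


class FakeHook:
    """Hypothetical stand-in for SalesforceHook with a canned response."""

    def make_query(self, query, include_deleted=False, query_params=None):
        return {
            "totalSize": 1,
            "done": True,
            "records": [
                {"attributes": {"type": "Account"}, "Id": "001", "Name": "Acme"}
            ],
        }


records = fetch_records(FakeHook(), "SELECT Id, Name FROM Account")
```

With a configured connection, the hook argument would instead be something like `SalesforceHook(salesforce_conn_id="salesforce_default")`.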

describe_object(obj)

Get the description of an object from Salesforce.

This description is the object’s schema and some extra metadata that Salesforce stores for each object.

Parameters

obj (str) – The name of the Salesforce object that we are getting a description of.

Returns

the description of the Salesforce object.

Return type

dict

get_available_fields(obj)

Get a list of all available fields for an object.

Parameters

obj (str) – The name of the Salesforce object that we are getting a description of.

Returns

the names of the fields.

Return type

list[str]
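One possible use of this method is to check a list of requested field names before building a query. The sketch below is an assumption about usage, not something the hook does itself. `split_known_fields` and `FakeFieldsHook` are hypothetical names, and the fake hook returns a fixed list so the example runs offline.

```python
def split_known_fields(hook, obj, requested):
    """Split requested field names into (known, unknown) for an object."""
    available = set(hook.get_available_fields(obj))
    known = [name for name in requested if name in available]
    unknown = [name for name in requested if name not in available]
    return known, unknown


class FakeFieldsHook:
    """Hypothetical stand-in for SalesforceHook with a fixed field list."""

    def get_available_fields(self, obj):
        return ["Id", "Name", "CreatedDate"]


known, unknown = split_known_fields(FakeFieldsHook(), "Account", ["Id", "Nmae"])
```

Here the misspelled name ends up in `unknown`, so it can be reported before any query is sent.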

get_object_from_salesforce(obj, fields)

Get all instances of the object from Salesforce.

For each record, only the fields specified in fields are retrieved.

Under the hood, this simply runs:

SELECT <fields> FROM <obj>;

Parameters
  • obj (str) – The object name to get from Salesforce.

  • fields (Iterable[str]) – The fields to get from the object.

Returns

all instances of the object from Salesforce.

Return type

dict
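The query shape described above can be sketched as a small string-building function. This is an illustration only: `build_select` is a hypothetical helper, and the exact spacing or quoting the hook uses internally may differ.

```python
def build_select(obj, fields):
    """Build a statement of the form SELECT <fields> FROM <obj>."""
    field_list = list(fields)  # accept any iterable of field names
    if not field_list:
        raise ValueError("at least one field is required")
    return f"SELECT {', '.join(field_list)} FROM {obj}"


statement = build_select("Account", ["Id", "Name"])
```

Because fields is typed as Iterable[str], the sketch materializes it into a list first so that generators can be passed as well.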

write_object_to_file(query_results, filename, fmt='csv', coerce_to_timestamp=False, record_time_added=False)

Write query results to file.

Acceptable formats are:
  • csv:

    comma-separated-values file. This is the default format.

  • json:

    JSON array. Each element in the array is a different row.

  • ndjson:

    Like json, but each element is written on its own line (newline-delimited) rather than as a comma-separated array.
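To make the difference between the three formats concrete, the following sketch serializes the same two rows each way using only the standard library. It illustrates the formats themselves and is not how the hook writes files; the function names and sample rows are hypothetical.

```python
import csv
import io
import json

rows = [{"Id": "001", "Name": "Acme"}, {"Id": "002", "Name": "Globex"}]


def to_csv(rows):
    """Header line followed by one comma-separated line per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """A single JSON array containing every row."""
    return json.dumps(rows)


def to_ndjson(rows):
    """One JSON object per line, with no enclosing array."""
    return "\n".join(json.dumps(row) for row in rows)
```

The ndjson output can be read back line by line, which is convenient for loaders that stream records instead of parsing one large array.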

Writing these formats requires a significant amount of cleanup. Pandas doesn’t handle output to CSV and JSON in a uniform way. This is especially painful for datetime types. Pandas wants to write them as strings in CSV, but as millisecond Unix timestamps in JSON.

By default, this function tries to leave all values as they are represented in Salesforce. Use the coerce_to_timestamp flag to force all datetimes to become Unix timestamps (UTC). This can be greatly beneficial, as it makes all of your datetime fields look the same and makes them easier to work with in other database environments.
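As a rough illustration of what coercing a datetime to a Unix timestamp means, the sketch below converts one datetime string with the standard library. The input format is an assumption about how Salesforce typically renders datetimes (for example "2023-01-15T10:30:00.000+0000"), and `salesforce_datetime_to_unix` is a hypothetical helper; the hook performs its conversion with pandas rather than with this code.

```python
from datetime import datetime


def salesforce_datetime_to_unix(value):
    """Convert a datetime string with a UTC offset to Unix seconds."""
    # Assumed format: date, "T", time with milliseconds, then an offset
    # such as +0000.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")
    return parsed.timestamp()


seconds = salesforce_datetime_to_unix("2023-01-15T10:30:00.000+0000")
```

Once every datetime field is a number of seconds since the epoch, downstream systems no longer need to agree on a string format or a time zone notation.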

Parameters
  • query_results (list[dict]) – the results from a Salesforce (SOQL) query

  • filename (str) – the name of the file where the data should be dumped to

  • fmt (str) – the format you want the output in. Default: ‘csv’

  • coerce_to_timestamp (bool) – True if you want all datetime fields to be converted into Unix timestamps. False if you want them to be left in the same format as they were in Salesforce. Leaving the value as False will result in datetimes being strings. Default: False

  • record_time_added (bool) – True if you want to add a Unix timestamp field to the resulting data that marks when the data was fetched from Salesforce. Default: False

Returns

the dataframe that gets written to the file.

Return type

pandas.DataFrame

object_to_df(query_results, coerce_to_timestamp=False, record_time_added=False)

Export query results to dataframe.

By default, this function tries to leave all values as they are represented in Salesforce. Use the coerce_to_timestamp flag to force all datetimes to become Unix timestamps (UTC). This can be greatly beneficial, as it makes all of your datetime fields look the same and makes them easier to work with in other database environments.

Parameters
  • query_results (list[dict]) – the results from a Salesforce (SOQL) query

  • coerce_to_timestamp (bool) – True if you want all datetime fields to be converted into Unix timestamps. False if you want them to be left in the same format as they were in Salesforce. Leaving the value as False will result in datetimes being strings. Default: False

  • record_time_added (bool) – True if you want to add a Unix timestamp field to the resulting data that marks when the data was fetched from Salesforce. Default: False

Returns

the dataframe.

Return type

pandas.DataFrame
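The effect of record_time_added can be pictured on plain dictionaries, without pandas. The sketch below is an approximation under stated assumptions: `add_fetch_time` is a hypothetical helper, and the column name it uses is an assumption that may not match the one the hook writes.

```python
import time


def add_fetch_time(records, fetched_at=None, column="time_fetched_from_salesforce"):
    """Return copies of the records with a Unix timestamp field added."""
    # Use the current time unless a fixed value is supplied (handy in tests).
    fetched_at = time.time() if fetched_at is None else fetched_at
    return [{**record, column: fetched_at} for record in records]


stamped = add_fetch_time([{"Id": "001"}], fetched_at=1700000000.0)
```

Every record receives the same timestamp, which marks when the batch was fetched and not when each record was created or modified.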

test_connection()

Test the Salesforce connectivity.
