Data Factory: Use a SQL Query to create a Data Source

When we include a data source inside a data flow it always requests a dataset. The dataset, in turn, requires you to point to a table. A dataset can’t be defined over a query, it needs an object on the linked service, which may be a table or view as you may notice on the image below.

 

 

How can we work around this and use a query as a source for a data flow?

The secret is on the source object we use on the data flow. We need to select a dataset, as always. However, on the 2nd tab, Source Options, we can choose the input type as Query and define a SQL query. The source will ignore the table configuration in the dataset and get the data from the query.

This is how the 1st tab will look like when we select the dataset:

This is how the 2nd tab looks like with the query defined:

 

After defining the query, we can click the button Import Projection. Data Factory will need to initialize the Integration Runtime, so it can execute the import of the schema. Once the Integration Runtime is initialized, the Import Projection can proceed. Usually you will need to click the button again.

On the Projection tab we will not see anything related to the table at all, only the query results will be there.

 

The schema drift properties on this scenario are still optional. They will act completely over the query. We import the projection schema from the query. The dataflow will accept additional fields beyond the ones defined during development if the schema drift property is enabled, as it usually does.

The Dataflow Code

We can say in some ways the data factory data flows have two different languages: The Data Flow Script (DFS) and the json syntax. The two buttons on the top right of the Data Factory screen allow us to see the code.

The DFS from this script makes no reference to the dataset at all:

Data Factory converts the DFS to a single script line in the JSON file. The JSON file requires a source dataset specified, but many dataset definitions, such as the table, will be ignored. The resulting JSON will be like the image below.

The dataset, on this example, establishes a link between the data flow and the linked service, but only that.