Creating a Single JSON File for Configuration using Notebooks

In the blog post Fabric Notebook and Deployment Pipelines, I explained a technique for keeping notebook configuration values in JSON files on lakehouses, a good solution from many different points of view.

What if we need to maintain the JSON configuration file using notebooks?

The first problem is that the typical statement to save a dataframe as JSON always creates a folder containing JSON files. The folder behaves like a table, and the files inside it contain the records.

This statement will create a folder containing JSON files:

data_frame.write.mode('overwrite').json('location')

‘location’ is always treated as a folder; a single file name will never be accepted.
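Listing the location right after the write makes this behavior visible. Below is a minimal sketch, assuming the write above used 'Files/configuration/accounts' as its location:

from notebookutils import mssparkutils

# Inspect the folder created by the JSON write
for item in mssparkutils.fs.ls('Files/configuration/accounts'):
    print(item.name)

# Typical output: a _SUCCESS marker plus one part-*.json file per partition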

The solution is a workaround: we can use mssparkutils to execute file system statements, copying the JSON out as a single file and then dropping the folder and its content.

These are the steps we need to execute:

  1. Save the dataframe as JSON in a temporary location
  2. Find the JSON part file inside the created folder (when the folder is listed, it is the last entry) and copy it to its final location and name
  3. Drop the entire folder and its content.

The final result of these steps is a single JSON file in the location we want. We can wrap these steps in a function, as in the example below. The dataframe is coalesced to a single partition before the write; otherwise, a dataframe with several partitions would produce several part files, and copying only the last one would lose records.

from notebookutils import mssparkutils

def saveResult(data_frame, temp_location, file_path):
    # Write as a single partition so the folder contains exactly one part file
    data_frame.coalesce(1).write.mode('overwrite').json(temp_location)
    # The part file sorts after the _SUCCESS marker, so it is the last entry
    file = mssparkutils.fs.ls(temp_location)[-1].path
    # Copy the part file to its final name, then drop the temporary folder
    mssparkutils.fs.cp(file, file_path)
    mssparkutils.fs.rm(temp_location, recurse=True)

The parameters of the function are:

  • data_frame: The dataframe we want to save
  • temp_location: Temporary location where the JSON folder will be saved
  • file_path: Final location of the single JSON file we want

The mssparkutils.fs statements used are the following:

  • ls: Lists the files in a location
  • cp: Copies a file to a new location
  • rm: Removes a complete folder, with the option to be recursive, removing the folder's content
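Since the whole technique depends on picking the right entry from ls, a slightly more explicit selection can make the function more robust. The filter below is a sketch based on the standard Spark part-file naming convention, not part of the original function:

from notebookutils import mssparkutils

# Pick the part file by name instead of relying on list order
files = mssparkutils.fs.ls('Files/configuration/accounts')
part_file = [f.path for f in files if f.name.startswith('part-')][0]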

Once the function is created, we just need to use it:

#read the json file
df = spark.read.json('Files/configuration/config.json', multiLine=True)

#change the content as needed
#save the json file using the function
saveResult(df, 'Files/configuration/accounts', 'Files/configuration/config.json')
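What "change the content as needed" looks like depends on the configuration schema. As a hedged illustration, assuming the configuration has a hypothetical environment key, the change could be as simple as:

from pyspark.sql.functions import lit

# Hypothetical edit: point the configuration at a different environment
df = df.withColumn('environment', lit('production'))

After this, saveResult writes the modified configuration back as a single config.json file.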