Creating a Single JSON File for Configuration using Notebooks

In the blog post Fabric Notebook and Deployment Pipelines, I explained a technique for keeping notebook configuration values in JSON files on lakehouses, a good solution from many different points of view.

What if we need to maintain the JSON configuration file using notebooks?

The first problem is that the typical statement to save a dataframe as JSON always creates a folder containing JSON files. The folder behaves like a table, and the files inside it contain the records.

This statement will create a folder containing JSON files:

data_frame.write.mode('overwrite').json('location')

‘location’ is always treated as a folder; a single file name will never be accepted.
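Listing the location right after the write makes this behavior visible. Below is a minimal sketch, assuming the write above used 'Files/configuration/accounts' as its location:

from notebookutils import mssparkutils

# Inspect the folder created by the JSON write
for item in mssparkutils.fs.ls('Files/configuration/accounts'):
    print(item.name)

# Typical output: a _SUCCESS marker plus one part-*.json file per partition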

The solution is a workaround: we can use mssparkutils to execute file system statements, copying the JSON out as a single file and then dropping the folder and its content.

These are the steps we need to execute:

  1. Save the dataframe as JSON in a temporary location
  2. Find the JSON part file inside the created folder (when the folder is listed, it is the last entry) and copy it to its final location and name
  3. Drop the entire folder and its content.

The final result of these steps is a single JSON file in the location we want. We can wrap these steps in a function, as in the example below. The dataframe is coalesced to a single partition before the write; otherwise, a dataframe with several partitions would produce several part files, and copying only the last one would lose records.

from notebookutils import mssparkutils

def saveResult(data_frame, temp_location, file_path):
    # Write as a single partition so the folder contains exactly one part file
    data_frame.coalesce(1).write.mode('overwrite').json(temp_location)
    # The part file sorts after the _SUCCESS marker, so it is the last entry
    file = mssparkutils.fs.ls(temp_location)[-1].path
    # Copy the part file to its final name, then drop the temporary folder
    mssparkutils.fs.cp(file, file_path)
    mssparkutils.fs.rm(temp_location, recurse=True)

The parameters of the function are:

  • data_frame: The dataframe we want to save
  • temp_location: Temporary location where the JSON folder will be saved
  • file_path: Final location of the single JSON file we want

The mssparkutils.fs statements used are the following:

  • ls: Lists the files in a location
  • cp: Copies a file to a new location
  • rm: Removes a complete folder, with the option to be recursive, removing the folder's content
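Since the whole technique depends on picking the right entry from ls, a slightly more explicit selection can make the function more robust. The filter below is a sketch based on the standard Spark part-file naming convention, not part of the original function:

from notebookutils import mssparkutils

# Pick the part file by name instead of relying on list order
files = mssparkutils.fs.ls('Files/configuration/accounts')
part_file = [f.path for f in files if f.name.startswith('part-')][0]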

Once the function is created, we just need to use it:

#read the json file
df = spark.read.json('Files/configuration/config.json', multiLine=True)

#change the content as needed
#save the json file using the function
saveResult(df, 'Files/configuration/accounts', 'Files/configuration/config.json')
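What "change the content as needed" looks like depends on the configuration schema. As a hedged illustration, assuming the configuration has a hypothetical environment key, the change could be as simple as:

from pyspark.sql.functions import lit

# Hypothetical edit: point the configuration at a different environment
df = df.withColumn('environment', lit('production'))

After this, saveResult writes the modified configuration back as a single config.json file.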