Azure SQL Automation: Improving the Indexing Job

The Azure SQL Databases don’t give us access to the SQL Server Agent. Usually, we use SQL Server Agent to schedule jobs on premise, but in the cloud, we need a different solution.

In the technical sessions I deliver I usually explain about Azure Automation and how we can create runbooks to execute scheduled tasks on SQL Server.

We can start from many ready-to-use runbooks of different kinds provided for us on the Automation Gallery. We choose one, import it and start from this point.

Publishing to the Automation Gallery

These runbooks are not provided only by Microsoft. Anyone can create a repository in github and the repository will appear inside the runbook gallery.

Microsoft provided instructions about how to contribute with the Automation Gallery. We need to follow some rules, but basically it involves including one specific tag on the github repo. Azure will identify the repos with this tag and import the content of the repo to the Automation Gallery, within something like 12 or 24 hours of delay.

There is one small catch: the Automation Account supports 4 kinds of runbook, but although the instructions say it is possible, I only managed to contribute with one of the types. Considering that no one outside Microsoft ever contributed with different types as well, maybe there is something missing on the instructions.

The 4 kinds of runbook supported by the Automation Account are:

  • Powershell script
  • Graphical
  • Powershell Workflow
  • Python Script

This is the filter of the gallery inside the Automation Account:

Graphical user interface, application

Description automatically generated

Python Runbook has two “sub-types”, Python 2 and Python 3. Graphical runbook can be a regular graphical one or a Graphical Powershell Workflow.

Graphical user interface, text, application, email

Description automatically generated

The only kind of runbook repository we are able to create is the PowerShell script runbook. All the others are not recognized by the Gallery, at least until now no one managed to publish a different type.

Graphical user interface

Description automatically generated with low confidence

The Existing Azure SQL indexing script

There is an Azure SQL indexing script created by the Azure Automation team.

This script is very interesting. These are some details:

  • It’s flexible. It has parameters to make the script very flexible, good for many different situations.
  • It checks the fragmentation to decide if it will reindex or not.
  • The fragmentation value used to decide for the reindex is parameterized.
  • We can use the script to reindex all the tables or a single one.

Unfortunately, the script has one bug: It doesn’t use the table schema in the code. This is not a small bug. Any table with a schema different than dbo will cause an error on the script.

Of course, it’s not only about dbo schema. It’s about the default schema configured for the login used to execute the script. Tables on the default schema will work, tables on different schemas will fail. But we don’t need to go so in deep on Azure SQL technical details.

The Solution – a new reindexing Runbook for Azure SQL

I made a fork and a pull request to the original github repository which contains the runbook and the code, contributing with a fix to the code.

However, the review of the pull request will take a while. In the meantime, I received permission from the developer to publish my changes in a new repository if I would like, creating a new runbook template in the gallery.

Unlike the original one, a PowerShell Workflow runbook, the one I created is recognized only as a PowerShell Runbook, but this is already enough to achieve the result.

The code of the new Runbook is this:

            
            
    param(            
            
        [parameter(Mandatory=$True)]            
            
        [string] $SqlServer,            
            
               
        [parameter(Mandatory=$True)]            
            
        [string] $Database,            
            
                
        [parameter(Mandatory=$True)]            
            
        [string] $SQLCredentialName,            
                    
            
        [parameter(Mandatory=$False)]            
            
        [int] $FragPercentage = 20,            
            
        [parameter(Mandatory=$False)]            
            
        [int] $SqlServerPort = 1433,            
            
                    
        [parameter(Mandatory=$False)]            
            
        [boolean] $RebuildOffline = $False,            
            
        [parameter(Mandatory=$False)]            
            
        [string] $Table            
            
                              
            
    )            
            
    # Get the stored username and password from the Automation credential            
            
    $SqlCredential = Get-AutomationPSCredential -Name $SQLCredentialName            
            
    if ($SqlCredential -eq $null)            
            
    {            
            
        throw "Could not retrieve '$SQLCredentialName' credential asset. Check that you created this first in the Automation service."            
            
    }            
            
                
            
    $SqlUsername = $SqlCredential.UserName             
            
    $SqlPass = $SqlCredential.GetNetworkCredential().Password            
            
                
            
    $TableNames = Inlinescript {            
            
                  
            
        # Define the connection to the SQL Database            
            
        $Conn = New-Object System.Data.SqlClient.SqlConnection("Server=tcp:$using:SqlServer,$using:SqlServerPort;Database=$using:Database;User ID=$using:SqlUsername;Password=$using:SqlPass;Trusted_Connection=False;Encrypt=True;Connection Timeout=30;")            
            
                     
            
        # Open the SQL connection            
            
        $Conn.Open()            
            
                    
            
        # SQL command to find tables and their average fragmentation            
            
        $SQLCommandString = @"

        SELECT a.object_id,so.name as TableName, sc.name as schemaName,avg_fragmentation_in_percent

        FROM sys.dm_db_index_physical_stats (

               DB_ID(N'$Database')

             , OBJECT_ID(0)

             , NULL

             , NULL

             , NULL) AS a

        JOIN sys.indexes AS b 

        ON a.object_id = b.object_id AND a.index_id = b.index_id

        join sys.objects so

        ON a.object_id=so.object_id

        join sys.schemas sc

        on so.schema_id=sc.schema_id

        WHERE so.type_desc='USER_TABLE'

"@            
            
        # Return the tables with their corresponding average fragmentation            
            
        $Cmd=new-object system.Data.SqlClient.SqlCommand($SQLCommandString, $Conn)            
            
        $Cmd.CommandTimeout=120            
            
                    
            
        # Execute the SQL command            
            
        $FragmentedTable=New-Object system.Data.DataSet            
            
        $Da=New-Object system.Data.SqlClient.SqlDataAdapter($Cmd)            
            
        [void]$Da.fill($FragmentedTable)            
            
        # Return the table names that have high fragmentation            
            
        ForEach ($FragTable in $FragmentedTable.Tables[0])            
            
        {            
            
            Write-Verbose ("Table Object ID:" + $FragTable.Item("object_id"))            
            
            Write-Verbose ("Fragmentation:" + $FragTable.Item("avg_fragmentation_in_percent"))            
            
                        
            
            If ($FragTable.avg_fragmentation_in_percent -ge $Using:FragPercentage)            
            
            {            
            
                # Table is fragmented. Return this table for indexing by finding its name            
            
                $result=$FragTable.Item("schemaName")  + "." + $FragTable.Item("TableName")            
            
                $result                            
            
            }            
            
        }            
            
        $Conn.Close()            
            
    }            
            
    # If a specific table was specified, then find this table if it needs to indexed, otherwise            
            
    # set the TableNames to $null since we shouldn't process any other tables.            
            
    If ($Table)            
            
    {            
            
        Write-Verbose ("Single Table specified: $Table")            
            
        If ($TableNames -contains $Table)            
            
        {            
            
            $TableNames = $Table            
            
        }            
            
        Else            
            
        {            
            
            # Remove other tables since only a specific table was specified.            
            
            Write-Verbose ("Table not found: $Table")            
            
            $TableNames = $Null            
            
        }            
            
    }            
            
    # Interate through tables with high fragmentation and rebuild indexes            
            
    ForEach ($TableName in $TableNames)            
            
    {            
            
      Write-Verbose "Creating checkpoint"            
            
      Checkpoint-Workflow            
            
      Write-Verbose "Indexing Table $TableName..."            
            
                  
            
      InlineScript {            
            
                      
            
        $SQLCommandString = @"

        EXEC('ALTER INDEX ALL ON $Using:TableName REBUILD with (ONLINE=ON)')

"@            
            
        # Define the connection to the SQL Database            
            
        $Conn = New-Object System.Data.SqlClient.SqlConnection("Server=tcp:$using:SqlServer,$using:SqlServerPort;Database=$using:Database;User ID=$using:SqlUsername;Password=$using:SqlPass;Trusted_Connection=False;Encrypt=True;Connection Timeout=30;")            
            
                    
            
        # Open the SQL connection            
            
        $Conn.Open()            
            
        # Define the SQL command to run. In this case we are getting the number of rows in the table            
            
        $Cmd=new-object system.Data.SqlClient.SqlCommand($SQLCommandString, $Conn)            
            
        # Set the Timeout to be less than 30 minutes since the job will get queued if > 30            
            
        # Setting to 25 minutes to be safe.            
            
        $Cmd.CommandTimeout=1500            
            
        # Execute the SQL command            
            
        Try             
            
        {            
            
            $Ds=New-Object system.Data.DataSet            
            
            $Da=New-Object system.Data.SqlClient.SqlDataAdapter($Cmd)            
            
            [void]$Da.fill($Ds)            
            
        }            
            
        Catch            
            
        {            
            
            if (($_.Exception -match "offline") -and ($Using:RebuildOffline) )            
            
            {            
            
                Write-Verbose ("Building table $Using:TableName offline")            
            
                $SQLCommandString = @"

                EXEC('ALTER INDEX ALL ON $Using:TableName REBUILD')

"@                          
            
                # Define the SQL command to run.             
            
                $Cmd=new-object system.Data.SqlClient.SqlCommand($SQLCommandString, $Conn)            
            
                # Set the Timeout to be less than 30 minutes since the job will get queued if > 30            
            
                # Setting to 25 minutes to be safe.            
            
                $Cmd.CommandTimeout=1500            
            
                # Execute the SQL command            
            
                $Ds=New-Object system.Data.DataSet            
            
                $Da=New-Object system.Data.SqlClient.SqlDataAdapter($Cmd)            
            
                [void]$Da.fill($Ds)            
            
            }            
            
            Else            
            
            {            
            
                # Will catch the exception here so other tables can be processed.            
            
                Write-Error "Table $Using:TableName could not be indexed. Investigate indexing each index instead of the complete table $_"            
            
             }            
            
        }            
            
        # Close the SQL connection            
            
        $Conn.Close()            
            
      }              
            
    }            
            
    Write-Verbose "Finished Indexing"

These are the main differences between this new script and the original one:

  • The original one is a powershell workflow, while the new one is only a powershell script, because the gallery doesn’t recognize workflows runbooks.
  • The new script changed the SQL query to include the sys.schema DMV, making the correct joins to retrieve the table schema
  • The retrieved schema is concatenated with the table name to build the name which will be used during the reindexing.

Conclusion

The Azure Automation Account is a great feature to schedule tasks on Azure SQL and all other Azure services. The fact we can contribute to this feature building runbooks templates on github and offering them to the entire community makes this feature even better.