January 31, 2017

D3D12 Shader Live-Reloading

Introduction

I previously wrote about ShaderSet, which was my attempt at making a clean, efficient, and simple shader live-reloading interface for OpenGL 4.

Since ShaderSet was so fun to use, I wanted the same thing in my D3D12 coding. As a result, I came up with PipelineSet. This class makes shaders easy to live-reload, encapsulates the complexity of compiling pipeline state in a multi-threaded fashion, and still allows advanced usage to fit your rendering engine’s needs.

Show Me The Code

In summary, the interface looks something like the following. I tried to show how it fits into the design of a component-based renderer.

// Example component of the renderer
class MyRenderComponent
{
  ID3D12RootSignature** mppRS;
  ID3D12PipelineState** mppPSO;

public:
  void Init(IPipelineSet* pPipeSet)
  {
    // set up your PSO desc
    D3D12_GRAPHICS_PIPELINE_STATE_DESC desc = { ... };

    // associate the compiled shader file names to shader stages
    GraphicsPipelineFiles files;
    // note: scene.vs.cso also contains root signature
    files.RSFile = L"scene.vs.cso";
    files.VSFile = L"scene.vs.cso";
    files.PSFile = L"scene.ps.cso";

    std::tie(mppRS, mppPSO) = pPipeSet->AddPipeline(desc, files);
  }

  void WriteCmds(ID3D12GraphicsCommandList* pCmdList)
  {
    if (!*mppRS || !*mppPSO)
    {
      // not compiled yet, or failed to compile
      return;
    }

    pCmdList->SetGraphicsRootSignature(*mppRS);
    pCmdList->SetPipelineState(*mppPSO);
    // TODO: set root parameters, etc.
    pCmdList->DrawInstanced(...);
  }
};

std::shared_ptr<IPipelineSet> pPipeSet;

// gComponents is hypothetical: the renderer's list of components,
// e.g. a std::vector<MyRenderComponent*> filled elsewhere.
extern std::vector<MyRenderComponent*> gComponents;

void RendererInit()
{
  pPipeSet = IPipelineSet::Create(pDevice, kMaximumFrameLatency);

  // let each component add its pipelines
  for (MyRenderComponent* component : gComponents)
  {
    component->Init(pPipeSet.get());
  }

  // Kick off building the pipelines.
  // Pipelines can no longer be added after this point.
  HANDLE hBuild = pPipeSet->BuildAllAsync();

  // wait for pipelines to finish building
  if (WaitForSingleObject(hBuild, INFINITE) != WAIT_OBJECT_0) {
    fprintf(stderr, "BuildAllAsync fatal error\n");
    exit(1);
  }
}

void RendererUpdate()
{
  // updates pipelines that have reloaded since last update
  // also garbage-collects unused pipelines after kMaximumFrameLatency updates
  pPipeSet->UpdatePipelines();

  for (MyRenderComponent* component : gComponents)
  {
    component->WriteCmds(pCmdList);
  }

  SubmitCmds();
}

The big idea is to add pipeline descs to the PipelineSet, and those descs don’t need to specify bytecode for their shader stages. Instead, the names of the compiled shader objects for each shader stage are passed through the “GraphicsPipelineFiles” or “ComputePipelineFiles” struct.

Each call to AddPipeline() returns a pair of double-pointers: one to the root signature and one to the pipeline state. This indirection allows the root signature and pipeline state to be swapped out when they are reloaded, and it also lets code deal with the PipelineSet abstractly. (It’s “just a double pointer”, not a PipelineSet-specific class.)
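
To make the double-pointer indirection concrete, here is a minimal standalone sketch. FakePipeline and SlotTable are hypothetical names, not part of PipelineSet: the table owns one heap-allocated slot per pipeline and hands out the slot's address, so when a reload publishes a new object, every holder of the double pointer sees it on its next dereference without being notified.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a compiled pipeline object (not a D3D12 type).
struct FakePipeline {
    int version;
};

// Hypothetical minimal "set": owns one stable slot per pipeline and hands out
// the slot's address. Changing what a slot points to is invisible to holders,
// who simply dereference the slot again each frame.
class SlotTable {
public:
    // Returns a double pointer whose target is null until a build publishes.
    FakePipeline** Add() {
        slots_.push_back(std::make_unique<FakePipeline*>(nullptr));
        return slots_.back().get();
    }

    // Simulates a (re)build finishing: point the slot at the new object.
    void Publish(FakePipeline** slot, FakePipeline* newPipeline) {
        *slot = newPipeline;
    }

private:
    // Each slot is its own allocation, so handed-out addresses stay valid
    // even when the vector reallocates as more pipelines are added.
    std::vector<std::unique_ptr<FakePipeline*>> slots_;
};
```

In this sketch, Publish() is assumed to run on the same thread that dereferences the slots, mirroring how the article swaps reloaded objects in UpdatePipelines() once per frame.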

From there, BuildAllAsync() builds all the pipelines in the PipelineSet in parallel, using the Windows Threadpool. When the returned handle is signaled, compilation has finished.
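
The Threadpool details are specific to the library, but the overall shape of BuildAllAsync() can be sketched portably. The following is an assumption, not PipelineSet's code: it uses std::async instead of the Windows Threadpool and a std::future<bool> instead of an event HANDLE, and it treats each pipeline build as a hypothetical job that returns true on success.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Hypothetical portable stand-in for BuildAllAsync(): runs every build job as
// its own std::async task and returns a future that becomes ready once all of
// them have finished. Its value is true only if every job succeeded.
std::future<bool> BuildAllAsyncSketch(std::vector<std::function<bool()>> jobs)
{
    return std::async(std::launch::async, [jobs = std::move(jobs)]() {
        std::vector<std::future<bool>> pending;
        pending.reserve(jobs.size());
        for (const auto& job : jobs)
            pending.push_back(std::async(std::launch::async, job));

        bool allSucceeded = true;
        for (auto& f : pending)
            allSucceeded = f.get() && allSucceeded;  // always waits on every job
        return allSucceeded;
    });
}
```

Calling get() on the returned future plays the role of WaitForSingleObject() in RendererInit(). Launching one task per job keeps the sketch short; a real thread pool caps the number of threads running at once.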

Finally, you must call UpdatePipelines() once per frame. This does two things. First, it swaps in any root signatures and pipelines that have been reloaded since the last update. Second, it garbage-collects root signatures and pipelines that are no longer used (i.e., because they have been replaced by their newly reloaded versions). An old object is deleted only after kMaximumFrameLatency updates have passed. This is safe because, by then, no frame still in flight on the GPU can be using it: the delay exceeds the depth of your CPU-to-GPU pipeline.
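
The delayed deletion can be modeled as a FIFO of retired objects, each tagged with the update count at which it was replaced. This is a hypothetical sketch (DeferredDeleter is not a PipelineSet class), with a std::function standing in for whatever actually releases the object, such as a COM Release() call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>

// Hypothetical sketch of latency-based garbage collection: retired objects are
// queued with the update index at which they were replaced, and destroyed once
// at least maxFrameLatency further Update() calls have happened.
class DeferredDeleter {
public:
    explicit DeferredDeleter(std::uint64_t maxFrameLatency)
        : latency_(maxFrameLatency) {}

    // Queue an object that was just replaced; 'destroy' releases it later.
    void Retire(std::function<void()> destroy) {
        pending_.push_back({updateIndex_, std::move(destroy)});
    }

    // Call once per frame. Entries are queued in retirement order, so we can
    // stop at the first one that is still too young.
    void Update() {
        ++updateIndex_;
        while (!pending_.empty() &&
               updateIndex_ - pending_.front().retiredAt >= latency_) {
            pending_.front().destroy();
            pending_.pop_front();
        }
    }

    std::size_t PendingCount() const { return pending_.size(); }

private:
    struct Entry {
        std::uint64_t retiredAt;
        std::function<void()> destroy;
    };
    std::uint64_t latency_;
    std::uint64_t updateIndex_ = 0;
    std::deque<Entry> pending_;
};
```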

The Workflow

IPipelineSet is designed to work alongside Visual Studio’s built-in HLSL compiler. The big idea is to rebuild your shaders from Visual Studio while your program is running. This is convenient because Visual Studio’s default behavior for .hlsl files is to compile them to .cso (“compiled shader object”) files, which D3D12 can load directly as bytecode.
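
Loading a .cso is just reading the file's bytes. The sketch below uses only the standard library: ShaderBytecode is a local stand-in with the same two fields as D3D12_SHADER_BYTECODE (pShaderBytecode and BytecodeLength), and ReadCsoFile and AsBytecode are hypothetical helpers. In real code, structs like this fill the VS and PS members of a D3D12_GRAPHICS_PIPELINE_STATE_DESC before ID3D12Device::CreateGraphicsPipelineState is called.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <vector>

// Local stand-in mirroring D3D12_SHADER_BYTECODE so this compiles without d3d12.h.
struct ShaderBytecode {
    const void* pShaderBytecode;
    std::size_t BytecodeLength;
};

// Reads a compiled shader object into memory. std::filesystem::path accepts
// both narrow and wide strings (e.g. L"scene.vs.cso"). Returns an empty
// vector if the file cannot be opened.
std::vector<char> ReadCsoFile(const std::filesystem::path& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) return {};
    return std::vector<char>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}

// Points a bytecode struct at the loaded bytes without copying them.
ShaderBytecode AsBytecode(const std::vector<char>& blob) {
    return ShaderBytecode{blob.data(), blob.size()};
}
```

The byte buffer must stay alive at least until the pipeline state object has been created, since the struct only points at it.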

Normally, Visual Studio forces you to stop debugging before you can rebuild your solution. However, if you use “Start Without Debugging” (Ctrl+F5 instead of F5), you can still build while your program is running. You can then edit your HLSL shaders and hit Ctrl+Shift+B to rebuild them live. The IPipelineSet detects the change in your .cso files and live-reloads any affected root signatures and pipeline state objects.
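
How PipelineSet watches files isn't described here, so the sketch below shows one simple, portable way to notice rebuilt .cso files: polling std::filesystem::last_write_time. CsoWatcher is a hypothetical name, and polling is a substitute technique, not necessarily what the library does; on Windows, a change-notification API such as ReadDirectoryChangesW avoids polling altogether.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>
#include <map>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical polling watcher: remembers each file's last write time and
// reports the files whose timestamp changed since the previous Poll().
class CsoWatcher {
public:
    void Watch(const fs::path& file) {
        std::error_code ec;  // a missing file records file_time_type::min()
        lastWrite_[file] = fs::last_write_time(file, ec);
    }

    std::vector<fs::path> Poll() {
        std::vector<fs::path> changed;
        for (auto& [file, stamp] : lastWrite_) {
            std::error_code ec;
            fs::file_time_type now = fs::last_write_time(file, ec);
            if (ec) continue;  // e.g. file briefly missing mid-rebuild
            if (now != stamp) {
                stamp = now;
                changed.push_back(file);
            }
        }
        return changed;
    }

private:
    std::map<fs::path, fs::file_time_type> lastWrite_;
};
```

A rebuild can briefly leave a file missing or half-written, so a reloader should tolerate read or compile failures and keep the previous pipeline until a later attempt succeeds.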

To keep bindings between shaders and C++ in sync, I used a so-called “preamble file” in ShaderSet. This preamble is unnecessary with HLSL, since HLSL has native #include support. Instead, I create an .hlsli file (the HLSL equivalent of a C header) for the shaders I use. For example, if I have two shaders “scene.vs.hlsl” and “scene.ps.hlsl”, I create a third file “scene.rs.hlsli”, which contains two things:

The root signature, as #define SCENE_RS "RootFlags(0), etc"
The root parameter locations, like #define SCENE_CAMERA_CBV_PARAM 0

I include this rs.hlsli file from my vertex and pixel shaders, then put [RootSignature(SCENE_RS)] before their entry points. From there, I pick registers for buffers, textures, etc. following the conventions specified in the root signature.

I also include this rs.hlsli file from my C++ code, which lets me directly refer to the root parameter slots in my code that sets root signature parameters.

As an example, let’s suppose I want to render a 3D model in a typical 3D scene. The vertex shader transforms each vertex by the MVP matrix, and the pixel shader reads from a texture to color the model. I might have a scene.rs.hlsli as follows:

#ifndef SCENE_RS_HLSLI
#define SCENE_RS_HLSLI

#define SCENE_RS \
"RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT)," \
"CBV(b0, visibility=SHADER_VISIBILITY_VERTEX)," \
"DescriptorTable(SRV(t0), visibility=SHADER_VISIBILITY_PIXEL)," \
"StaticSampler(s0, visibility=SHADER_VISIBILITY_PIXEL)"

#define SCENE_RS_MVP_CBV_PARAM 0
#define SCENE_RS_TEX0_DESCRIPTOR_TABLE_PARAM 1

#endif // SCENE_RS_HLSLI

This code defines the root signature for use in HLSL. (See: Specifying Root Signatures in HLSL) The defines at the bottom correspond to root parameter slots, and they match the order of root parameters specified in the root signature string.
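
Because the parameter defines and the root signature string live side by side, a debug-time check can confirm they still agree. The following is a hypothetical helper, not part of PipelineSet: it splits the string at top-level commas and lists the clauses that occupy root parameter slots, relying on the fact that root parameters are numbered in declaration order while RootFlags and StaticSampler clauses do not take a slot. The defines are copied inline so the sketch compiles on its own.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Copy of the defines from scene.rs.hlsli so this sketch is self-contained.
#define SCENE_RS \
"RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT)," \
"CBV(b0, visibility=SHADER_VISIBILITY_VERTEX)," \
"DescriptorTable(SRV(t0), visibility=SHADER_VISIBILITY_PIXEL)," \
"StaticSampler(s0, visibility=SHADER_VISIBILITY_PIXEL)"
#define SCENE_RS_MVP_CBV_PARAM 0
#define SCENE_RS_TEX0_DESCRIPTOR_TABLE_PARAM 1

// Splits a root signature string at top-level commas (commas inside
// parentheses belong to a clause) and returns the keyword of each clause that
// declares a root parameter, in order, so index i is root parameter slot i.
std::vector<std::string> RootParameterKinds(const std::string& rs) {
    std::vector<std::string> kinds;
    std::string clause;
    int depth = 0;

    auto flush = [&]() {
        std::string::size_type start = clause.find_first_not_of(" \t\r\n");
        if (start != std::string::npos) {
            std::string keyword =
                clause.substr(start, clause.find('(', start) - start);
            // These clauses exist in the string but are not root parameters.
            if (keyword != "RootFlags" && keyword != "StaticSampler")
                kinds.push_back(keyword);
        }
        clause.clear();
    };

    for (char c : rs) {
        if (c == '(') ++depth;
        if (c == ')') --depth;
        if (c == ',' && depth == 0) { flush(); continue; }
        clause += c;
    }
    flush();
    return kinds;
}
```

In a debug build, asserting that RootParameterKinds(SCENE_RS)[SCENE_RS_MVP_CBV_PARAM] is "CBV" catches a reordered root signature before it turns into a confusing GPU-side binding bug.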

The vertex shader scene.vs.hlsl would then be something like:

#include "scene.rs.hlsli"

cbuffer MVPCBV : register(b0) {
    float4x4 MVP;
};

struct VS_INPUT {
    float3 Position : POSITION;
    float2 TexCoord : TEXCOORD;
};

struct VS_OUTPUT {
    float4 Position : SV_Position;
    float2 TexCoord : TEXCOORD;
};

[RootSignature(SCENE_RS)]
VS_OUTPUT VSmain(VS_INPUT input)
{
    VS_OUTPUT output;
    output.Position = mul(float4(input.Position,1.0), MVP);
    output.TexCoord = input.TexCoord;
    return output;
}

Notice that register b0 is chosen to match what was specified in the root signature in scene.rs.hlsli. Also notice the [RootSignature(SCENE_RS)] attribute above the entry point.

From there, the pixel shader scene.ps.hlsl might look like this:

#include "scene.rs.hlsli"

Texture2D Tex0 : register(t0);
SamplerState Smp0 : register(s0);

struct PS_INPUT {
    float4 Position : SV_Position;
    float2 TexCoord : TEXCOORD;
};

struct PS_OUTPUT {
    float4 Color : SV_Target;
};

[RootSignature(SCENE_RS)]
PS_OUTPUT PSmain(PS_INPUT input)
{
    PS_OUTPUT output;
    output.Color = Tex0.Sample(Smp0, input.TexCoord);
    return output;
}

Again, notice that the texture and sampler registers match those specified in the root signature, and notice the RootSignature attribute above the entry point.

Finally, I use these shaders from my C++ code: I include the header from the source file of the corresponding renderer component, set the root parameters, and issue the draw. It might look something like this:

#include "scene.rs.hlsli"

class SceneRenderer
{
    ID3D12RootSignature** mppRS;
    ID3D12PipelineState** mppPSO;

public:
    void Init(IPipelineSet* pPipeSet)
    {
        D3D12_GRAPHICS_PIPELINE_STATE_DESC desc = { ... };

        GraphicsPipelineFiles files;
        files.RSFile = L"scene.vs.cso";
        files.VSFile = L"scene.vs.cso";
        files.PSFile = L"scene.ps.cso";

        std::tie(mppRS, mppPSO) = pPipeSet->AddPipeline(desc, files);
    }

    void WriteCmds(
        BufferAllocator* pPerFrameAlloc,
        ID3D12GraphicsCommandList* pCmdList)
    {
        if (!*mppRS || !*mppPSO)
        {
            // not compiled yet, or failed to compile
            return;
        }

        float4x4* pCPUMVP;
        D3D12_GPU_VIRTUAL_ADDRESS pGPUMVP;
        std::tie(pCPUMVP, pGPUMVP) = pPerFrameAlloc->allocate(
            sizeof(float4x4), D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT);

        *pCPUMVP = MVP; 

        pCmdList->SetGraphicsRootSignature(*mppRS);
        pCmdList->SetPipelineState(*mppPSO);

        pCmdList->SetGraphicsRootConstantBufferView(
            SCENE_RS_MVP_CBV_PARAM, pGPUMVP);

        pCmdList->SetGraphicsRootDescriptorTable(
            SCENE_RS_TEX0_DESCRIPTOR_TABLE_PARAM, Tex0SRV_GPU);

        /* TODO: Set other GPU state */
        pCmdList->DrawIndexedInstanced(...);
    }
};

There are a few things going on here that aren’t strictly the topic of this article, but I’ll explain them anyway because I think they’re very useful for writing D3D12 code.

I use a big upload buffer each frame to hold all my CBV allocations; that’s the purpose of pPerFrameAlloc. Its allocate() function returns both a CPU (mapped) pointer and the corresponding GPU virtual address for the allocation, which lets me write to the allocation from the CPU and then pass the GPU VA while writing commands.
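
The allocate() call can be sketched as a linear allocator over one mapped buffer. This is a hypothetical sketch, not the author's BufferAllocator: it applies the same aligned offset to the mapped CPU pointer and to the buffer's GPU virtual address, and GpuVA stands in for D3D12_GPU_VIRTUAL_ADDRESS. D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT is 256, which is the alignment a CBV allocation passes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for D3D12_GPU_VIRTUAL_ADDRESS (a 64-bit integer).
using GpuVA = std::uint64_t;

// Sketch of a per-frame linear allocator over one mapped upload buffer.
// cpuBase is where the buffer is mapped and gpuBase is its GPU virtual
// address; every allocation applies the same offset to both.
class LinearAllocator {
public:
    LinearAllocator(void* cpuBase, GpuVA gpuBase, std::size_t capacity)
        : cpuBase_(static_cast<std::uint8_t*>(cpuBase)),
          gpuBase_(gpuBase), capacity_(capacity) {}

    // alignment must be a power of two. Returns {nullptr, 0} when full.
    std::pair<void*, GpuVA> allocate(std::size_t size, std::size_t alignment) {
        std::size_t offset = (head_ + alignment - 1) & ~(alignment - 1);
        if (offset + size > capacity_) return {nullptr, 0};
        head_ = offset + size;
        return {cpuBase_ + offset, gpuBase_ + offset};
    }

    // Call at the start of a frame, once the GPU is done with this buffer.
    void reset() { head_ = 0; }

private:
    std::uint8_t* cpuBase_;
    GpuVA gpuBase_;
    std::size_t capacity_;
    std::size_t head_ = 0;
};
```

Aligning the offset rather than the absolute address assumes the buffer's base is at least as aligned as any request, which holds for D3D12 buffer resources (their placement alignment is 64 KB). Callers cast the returned void* to their data type, e.g. the float4x4* in WriteCmds().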

In this case, the per-frame allocation lives in an upload buffer, so I don’t need to explicitly copy from CPU to GPU (the shader just reads from host memory). An alternative implementation could use an additional allocator for a default heap, and explicitly copy from the upload heap to the default heap.

The per-frame allocator is a simple lock-free linear allocator, so I can use it to make allocations from multiple threads, if I’m recording commands from multiple threads.
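
Making a linear allocator lock-free mostly means turning the head offset into a std::atomic. The sketch below is one way to do it, an assumption rather than the author's code: each call claims its aligned range with a compare-exchange loop, so concurrent callers always receive disjoint ranges. GpuVA and AllocateFromThreads are hypothetical names; a plain fetch_add would also work if every request used the same alignment and was padded to a multiple of it.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <utility>
#include <vector>

using GpuVA = std::uint64_t;  // hypothetical stand-in for D3D12_GPU_VIRTUAL_ADDRESS

// Lock-free linear allocator sketch: the head offset is atomic, and each
// allocation claims [offset, offset + size) with a compare-exchange loop.
class LockFreeLinearAllocator {
public:
    LockFreeLinearAllocator(void* cpuBase, GpuVA gpuBase, std::size_t capacity)
        : cpuBase_(static_cast<std::uint8_t*>(cpuBase)),
          gpuBase_(gpuBase), capacity_(capacity) {}

    // alignment must be a power of two. Returns {nullptr, 0} when full.
    std::pair<void*, GpuVA> allocate(std::size_t size, std::size_t alignment) {
        std::size_t oldHead = head_.load(std::memory_order_relaxed);
        std::size_t offset;
        do {
            offset = (oldHead + alignment - 1) & ~(alignment - 1);
            if (offset + size > capacity_) return {nullptr, 0};
            // On failure, compare_exchange_weak reloads oldHead and we retry.
        } while (!head_.compare_exchange_weak(oldHead, offset + size,
                                              std::memory_order_relaxed));
        return {cpuBase_ + offset, gpuBase_ + offset};
    }

    // Not safe to call concurrently with allocate(); call between frames.
    void reset() { head_.store(0, std::memory_order_relaxed); }

private:
    std::uint8_t* cpuBase_;
    GpuVA gpuBase_;
    std::size_t capacity_;
    std::atomic<std::size_t> head_{0};
};

// Usage sketch: several threads allocate at once; returns every GPU address,
// sorted for easy inspection.
std::vector<GpuVA> AllocateFromThreads(LockFreeLinearAllocator& alloc,
                                       int threadCount, int allocsPerThread) {
    std::vector<std::vector<GpuVA>> perThread(threadCount);
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t)
        threads.emplace_back([&, t] {
            for (int i = 0; i < allocsPerThread; ++i)
                perThread[t].push_back(alloc.allocate(16, 256).second);
        });
    for (auto& th : threads) th.join();

    std::vector<GpuVA> all;
    for (auto& v : perThread) all.insert(all.end(), v.begin(), v.end());
    std::sort(all.begin(), all.end());
    return all;
}
```

Relaxed ordering is enough because the atomic only hands out offsets; ordering between each thread's writes and command-list submission comes from however the recording threads are synchronized with the submitting thread.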

I could do something similar to the per-frame allocator for the descriptor behind Tex0SRV_GPU, or I could create the descriptor once up front in Init(). It’s really up to you.

When the time comes to finally specify the root parameters, I do it using the defines from the included scene.rs.hlsli, such as SCENE_RS_MVP_CBV_PARAM. This makes sure my C++ code stays synchronized to the HLSL code.

In Summary

IPipelineSet implements D3D12 shader live-reloading. It encapsulates both the concurrent code that reloads shaders and the parallel code that speeds up PSO compilation through multi-threading. It integrates with existing code without that code needing to be aware of PipelineSet (it’s “just a double-pointer”), and garbage collection is handled efficiently and automatically. Finally, PipelineSet is designed around a Visual Studio workflow that makes it easy to rebuild shaders while your program is running, and it lets you easily share resource bindings between HLSL and C++.

There are several more advanced features. For example, you can supply an externally created root signature or shader bytecode, and you can “steal” root signature or pipeline objects from the live-reloader by manipulating their reference counts. See the comments in pipelineset.h for details.

You can download PipelineSet from GitHub: https://github.com/nlguillemot/PipelineSet

You can integrate it into your codebase by adding pipelineset.h and pipelineset.cpp to your project. It should “just work”, assuming you already have D3D12 and DXGI linked.

Comments, critique, pull requests, all welcome.
