DVC on Heroku

DVC is a version control system for machine learning datasets and models. It allows you to store large files outside of Git while keeping them versioned. In a few steps, you can have it working on Heroku.

This tutorial assumes you’re already using DVC and are ready to deploy your app to Heroku.

Getting Started

Use the Apt buildpack from Heroku to install DVC.

heroku buildpacks:add --index 1 heroku-community/apt

Create an Aptfile with the latest release:

https://github.com/iterative/dvc/releases/download/1.9.1/dvc_1.9.1_amd64.deb

Next, add your storage credentials. We recommend an Amazon S3 bucket in the same region as your Heroku dynos (us-east-1 by default) to avoid paying for data transfer, which is free in-region:

heroku config:set AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... AWS_REGION=...

Finally, configure your application to run the following commands:

dvc config core.no_scm true
dvc pull
rm -r .dvc .apt/usr/lib/dvc # reduces slug size

You can run them at build time or runtime. We recommend build time unless it causes your app to exceed the maximum slug size (500 MB compressed).

With Django, add to settings.py:

if "DYNO" in os.environ and os.path.isdir(".dvc"):
    print("Running DVC")
    os.system("dvc config core.no_scm true")
    if os.system("dvc pull") != 0:
        exit("Pull failed")
    os.system("rm -r .dvc .apt/usr/lib/dvc")

With Rails, create an initializer with:

if Rails.env.production? && Dir.exist?(".dvc")
  puts "Running DVC"
  system "dvc config core.no_scm true"
  system "dvc pull" or abort "Pull failed"
  system "rm -r .dvc .apt/usr/lib/dvc"
end

Your DVC files are now available on Heroku tada

Published November 7, 2020


You might also enjoy

The Origin of SQL Queries

Blind Index 1.0

Just Table It


All code examples are public domain.
Use them however you’d like (licensed under CC0).