Building a digital music collection in 2023

Justin Wernick
Curly Tail, Curly Braces
2023-08-20

Abstract

I've stopped paying for music streaming services in favour of going back to buying albums and building my own digital music collection. This article is about some of the modern tools I'm applying to this endeavour, including updating my choice in audio codecs for storing the collection.

I like to listen to music.

When I was small, this meant cassette tapes, or later CDs. I even had some adventurous dives into my parents' vinyl collection. As someone who grew up interested in computers, digital music was of course my medium of choice, and I started up a collection of MP3 files.

At some point in the last decade, internet access became omnipresent. I stopped collecting new music, because it was convenient enough to open up a music streaming site and search for the music I wanted to listen to.

But something was lost with music streaming. My music library didn't grow to suit my tastes anymore. I'd sometimes find that particular songs or artists were delisted, or just not available in my country anymore. Streaming sites seemed to become increasingly interested in trying to get me to listen to the songs they were promoting, rather than the songs I was interested in. The "recommended for you" sections filled up with popular artists who I didn't want anything to do with. The number of adverts went up, and with them the value of music streaming sites dropped.

I've decided to go back to curating a big directory of music files. Not to brag, but I know a lot more about managing directories of files now than I did when I was in school. This article is going to look at two aspects of this music management problem:

  1. How I sync my music between devices, and keep backups so I don't lose it.

  2. How to keep the file sizes down to make the syncing problem easier.

Push it to a Git repo

The underlying technical challenge with keeping your own music collection is managing a big directory of files. I want it to be easy to sync that directory between various computers and players. I care about it having a backup so I don't lose it when things inevitably go wrong. Long-time readers of this blog will not be surprised at all to discover that my solution for this is to put it all in a Git repo. Then I can easily commit, push, pull, and all the other normal Git actions to keep my work computer and personal computer in sync. Also, my Git server runs a daily backup, so anything I push to it is backed up automatically!
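Here's a minimal sketch of that commit, push, and pull cycle. It's an illustration rather than my exact setup: everything happens in a throwaway temp directory, a bare repo named server.git stands in for a real Git server, and the artist, album, and track names are placeholders.

```shell
#!/bin/sh
# Sketch of syncing a music directory through Git. Everything lives in a
# throwaway temp directory; "server.git" stands in for a real Git server.
demo=$(mktemp -d)

# The "server": a bare repository that both computers push to and pull from.
git init -q --bare "$demo/server.git"

# Computer A: clone the (still empty) repo, add an album, commit, and push.
git clone -q "$demo/server.git" "$demo/work" 2>/dev/null
mkdir -p "$demo/work/Some Artist/Some Album"
printf 'placeholder audio\n' > "$demo/work/Some Artist/Some Album/01 Track.ogg"
git -C "$demo/work" add .
git -C "$demo/work" -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q -m "Add Some Album"
# The branch is named explicitly so the sketch doesn't depend on the
# local default branch name.
git -C "$demo/work" push -q origin HEAD:refs/heads/main

# Computer B: clone from the server, and the new album is there.
git clone -q -b main "$demo/server.git" "$demo/home"
ls "$demo/home/Some Artist/Some Album"
```

On a real setup the clone would point at your server's URL, and after the first clone a plain git pull on each computer picks up whatever the other one pushed.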

There are a few pitfalls here, and they all come down to media archives generally having larger file sizes than source code.

  1. Your music repo might be too big for your Git hosting provider. I didn't run into this because I run my own Git server (although I did need to ask my cloud provider for a bigger hard drive). You can probably also get around this by using an extension to Git called Git LFS (Large File Storage), which many Git servers support. If you're not as invested in Git as I am, you can also use other file syncing products (I don't know which are the "good" ones these days).

  2. Over time, Git doesn't automatically store the changes to your music efficiently. I found that regularly running git gc on the server helped to keep the repo's overall size closer to just the size of the music in it. I also found that Pijul (a different version control system) keeps the file sizes under control better than Git. Unfortunately I also found that making new commits in Pijul got slow as the repo got big. I see a lot of promise in Pijul and I hope that this improves in future releases.
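To see what git gc is doing, here's a small sketch that counts a repo's loose (unpacked) objects before and after running it. It uses a throwaway repo with placeholder files, and loose_objects is a helper name I made up for the illustration. The count comes from git count-objects -v.

```shell
#!/bin/sh
# Sketch: watch `git gc` pack loose objects in a throwaway repository.
demo=$(mktemp -d)
git init -q "$demo/music"

# Helper (hypothetical name): number of loose, unpacked objects in a repo.
loose_objects() {
    git -C "$1" count-objects -v | sed -n 's/^count: //p'
}

# Commit two versions of a placeholder file so there is history to pack.
for version in 1 2; do
    printf 'placeholder audio, version %s\n' "$version" > "$demo/music/track.ogg"
    git -C "$demo/music" add track.ogg
    git -C "$demo/music" -c user.name="Demo" -c user.email="demo@example.com" \
        commit -q -m "Version $version"
done

loose_before=$(loose_objects "$demo/music")
git -C "$demo/music" gc --quiet
loose_after=$(loose_objects "$demo/music")

echo "loose objects before gc: $loose_before, after gc: $loose_after"
```

On a real server you'd run git gc inside the actual repo directory instead. How much space it saves will depend on your collection, since already-compressed audio doesn't shrink much further.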

Let's talk about getting file sizes down

Smaller music files are faster to sync and cheaper to store in the cloud. Big files make everything harder.

Now you can get some improvements by taking your music file and putting it into a .zip file, but to really get that size down you need to use a codec.

What are audio codecs?

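The word "codec" comes from shoving the words "coder" and "decoder" together. Basically a codec is made up of two programs: an encoder, which turns a signal into a smaller, compressed form, and a decoder, which turns that compressed form back into a signal you can use.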

There are lots of different codecs that are good for different things. Some are good for video. Some are good for audio. For the purposes of this blog post, I'm going to focus specifically on audio codecs.

When you're looking at the air pressure measurements that make up an audio signal, they aren't just random numbers. Audio follows a whole bunch of patterns just because it was created by and moved through the real corporeal world. Audio also has some common use cases like talking or music which have their own patterns. Audio codecs can take advantage of these patterns when compressing audio.

Broadly speaking, there are two types of codecs. "Lossless" codecs will give back exactly the same signal after decoding. "Lossy" codecs decode to a different signal that has the same effect. A lossless audio codec would give back exactly the same air pressure measurements. A lossy audio codec would give different air pressure measurements that sound the same to a human listener. A lossy codec might cut out sound that's outside of the range the human ear can hear, or sounds that you can't hear because they're covered by other sounds. Lossy codecs are great for our purposes because they can make the file much smaller than lossless codecs, and are just as good for listening to.

Many codecs have different settings, where you can decrease the audio quality to get smaller files. The term "bitrate" refers to how many kilobits of file are needed for each second of audio. Ideally, we'd like to have a really low bitrate while the music still sounds the same.
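As a rough worked example of what bitrate means for file size: kilobits per second times seconds gives kilobits, and dividing by 8 gives kilobytes. The sketch below assumes a hypothetical 4 minute track and two example bitrates picked for illustration, and it ignores metadata and container overhead. The function name estimate_kb is made up for this example.

```shell
#!/bin/sh
# Estimated size in kilobytes for a given bitrate (kilobits per second)
# and duration (seconds). Dividing kilobits by 8 gives kilobytes.
estimate_kb() {
    echo $(( $1 * $2 / 8 ))
}

# A hypothetical 4 minute (240 second) track at two example bitrates.
echo "192 kbps: $(estimate_kb 192 240) kB"
echo "64 kbps: $(estimate_kb 64 240) kB"
```

So a third of the bitrate means roughly a third of the file size, which adds up quickly over a whole collection.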

How do you choose a bitrate?

It's really difficult to say what a good quality level to choose is, since the "quality" of sound is fairly subjective. Researchers have asked people to listen to music encoded using a variety of codecs and a variety of bitrates and rate them.

I skipped reading that research and went straight to FFmpeg's high quality audio encoding guide. They have minimum and recommended bitrates for a variety of codecs, and I just picked the "recommended" bitrate.

The audio codec everyone has heard of: MP3

MP3 files were so ubiquitous when I was growing up that I hardly feel like I need to explain what they are. But with the rise of music streaming services, and people not really managing their own music files anymore, maybe they aren't as well known anymore.

MP3 is a lossy codec which was developed in the 90s. When music started being downloaded on the internet (both legally and questionably legally), MP3 was the codec people generally used. It was also the codec generally supported by early portable players.

This made MP3 the gold standard when I started building my music collection.

FFMPEG's high quality audio encoding guide recommends a bitrate of 192Kbps or higher for high quality audio in MP3 format.

The new awesome audio codec: Opus

In 2012, the Xiph.org Foundation released the first version of their Opus codec.

Opus is also a lossy codec. For my use case, there's one important distinction:

In listening tests, Opus files are significantly smaller than MP3 files of the same perceived quality. FFMPEG's high quality audio encoding guide recommends a bitrate of 64Kbps or higher for high quality audio in Opus format.

Opus is also a free and open codec. The source code for the Xiph.org implementation is released under an open source license. Anyone is free to write and distribute their own Opus implementation. This makes it a good choice in terms of sustainability. If someone wants to add Opus support to their project, they don't need to consider the costs of patents and licensing, they can just use it.

How do I convert my library to use Opus?

The Opus format is great, so I wanted a nice way to bulk convert my existing collection. I also wanted the script to make it easy to convert new albums I download if they aren't already using Opus.

This script is called to-opus.sh, and I use find to bulk run it on all the files that match a pattern in a directory. Be careful when trying this out. It deletes the input files after doing the conversion, so maybe try this on a backup of your collection at first!

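#!/bin/sh

# Use together with find, for example:
# find ./ -name "*.mp3" -exec ./to-opus.sh "{}" \;

IN="$1"
OUT="${1%.*}.ogg"
BITRATE=64k

# Never convert a file onto itself, or the rm below could delete the
# only copy.
if [ "$IN" = "$OUT" ]; then
    echo "Skipping $IN: input and output are the same file" >&2
    exit 1
fi

echo "$IN->$OUT"

# Only delete the input if the conversion succeeded.
ffmpeg -y -loglevel "error" -i "$IN" -vn -acodec libopus -b:a "$BITRATE" "$OUT" && rm "$IN"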
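Since the script deletes its inputs, it can help to preview what a bulk run would touch first. Here's a minimal sketch of a dry run that only prints names and never calls ffmpeg or rm. The function names opus_name and preview, and the demo directory with its empty placeholder files, are made up for this illustration.

```shell
#!/bin/sh
# Print the name a file would get after conversion, using the same
# parameter expansion as to-opus.sh.
opus_name() {
    printf '%s\n' "${1%.*}.ogg"
}

# List every MP3 under a directory next to its would-be output name.
# Nothing is converted or deleted.
preview() {
    find "$1" -name "*.mp3" | while IFS= read -r f; do
        echo "$f -> $(opus_name "$f")"
    done
}

# Try it on a throwaway directory containing empty placeholder files.
demo=$(mktemp -d)
mkdir -p "$demo/Album"
: > "$demo/Album/01 Intro.mp3"
: > "$demo/Album/cover.jpg"
preview "$demo"
```

Only the MP3 shows up in the output, and the cover image is left alone. Pointing preview at your real collection shows exactly which files the find command would hand to to-opus.sh.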

What do I look for when getting more music?

Unfortunately, most online music stores and other people putting their music online I've encountered don't offer Opus yet. Here are some guidelines that work for me:

Actively curate the stuff you have

It's easier than ever to download a lot of music and end up with a massive collection. Some of it is bought, and some of it is put on the internet as a free download by its artists. It's very easy for a collection to get out of hand over time.

Here are some things I do to curate my collection:

I've been enjoying how once again the music that I play when I open up my computer is evolving to match my particular taste. I like that if I don't like a song or genre I can just not have it in my collection and it will never be auto-played next.

I like that this is a collection that I control. That I can keep backups and sync it between my computers as I see fit, even if that way is unusual like using a Git repo. I really appreciate having found Opus as an alternative to MP3, which can basically get my file sizes down with no noticeable loss of quality!

This is working well for me. I can recommend you try it too. Keep the bits that work for you, and leave the rest.


If you'd like to share this article on social media, please use this link: https://www.worthe-it.co.za/blog/2023-08-20-building-a-digital-music-collection-in-2023.html


Tags: blog, git, music, shell-script

