Home | Benchmarks | Categories | Atom Feed

Posted on Wed 04 May 2022 under Databases

Install ClickHouse Faster

Last September I advocated for first-class Homebrew support for ClickHouse on macOS. ClickHouse is a fast and feature-rich database that runs well on Linux and macOS among many other platforms. There were two solutions given at the time for simplifying its installation. But two weeks ago, I was pointed to an even easier way:

$ curl --user-agent "curl/7.54.1" https://clickhouse.com/ | sh

The above executes 57 lines of shell code with your regular user permissions. It detects which build is compatible with your machine and downloads a pre-compiled binary. I've set the user agent manually as many people run curl with alternative user agents which won't work.

When I executed the above on my Intel-based MacBook Pro I'm greeted with the following.

Will download https://builds.clickhouse.com/master/macos/clickhouse

Standalone Binary

Though there are formal installation methods for production systems, ClickHouse can also run without elevated permissions from any folder you have permission to execute binaries in.

ClickHouse is often used in a client-server setting but can also run without a server, much like SQLite. Below I can use the downloaded binary to convert and enrich a compressed CSV file into a Parquet file. Details on the Cell Tower CSV can be found in my Faster Geospatial Enrichment post.

$ wget "https://opencellid.org/ocid/downloads?token=...&type=full&file=cell_towers.csv.gz"
$ gunzip -c cell_towers.csv.gz \
    | ./clickhouse local \
          --input-format CSVWithNames \
          -q "SELECT *,
                     geoToH3(lat, lon, 7) AS h3_7,
                     geoToH3(lat, lon, 8) AS h3_8,
                     geoToH3(lat, lon, 9) AS h3_9
              FROM table
              FORMAT Parquet" \
    > cell_towers.pq

The above will launch version 22.5.1.1008 the local client for ClickHouse which was about a month newer than the latest stable version at the time I wrote this. The above converted a 45M-record, GZIP-compressed CSV into a Snappy-compressed, Parquet file in 1 minute and 40 seconds. This was run on my MacBook Pro which has a 4-core, 8-thread Intel i5-8257U CPU which can burst up to 3.9 GHz under heavy workloads.

Below is an extract of the Parquet file's structure.

$ brew install parquet-tools
$ parquet-tools meta cell_towers.pq
...
creator:       parquet-cpp version 1.5.1-SNAPSHOT

file schema:   schema
--------------------------------------------------------------------------------
radio:         OPTIONAL BINARY R:0 D:1
mcc:           OPTIONAL DOUBLE R:0 D:1
net:           OPTIONAL DOUBLE R:0 D:1
area:          OPTIONAL DOUBLE R:0 D:1
cell:          OPTIONAL DOUBLE R:0 D:1
unit:          OPTIONAL DOUBLE R:0 D:1
lon:           OPTIONAL DOUBLE R:0 D:1
lat:           OPTIONAL DOUBLE R:0 D:1
range:         OPTIONAL DOUBLE R:0 D:1
samples:       OPTIONAL DOUBLE R:0 D:1
changeable:    OPTIONAL DOUBLE R:0 D:1
created:       OPTIONAL DOUBLE R:0 D:1
updated:       OPTIONAL DOUBLE R:0 D:1
averageSignal: OPTIONAL DOUBLE R:0 D:1
h3_7:          OPTIONAL INT64 L:INTEGER(64,false) R:0 D:1
h3_8:          OPTIONAL INT64 L:INTEGER(64,false) R:0 D:1
h3_9:          OPTIONAL INT64 L:INTEGER(64,false) R:0 D:1
...

The above schema can also be extracted with ClickHouse alone.

$ ./clickhouse local \
    --input-format Parquet \
    -q "DESCRIBE TABLE table" \
    < cell_towers.pq
radio   Nullable(String)
mcc Nullable(Float64)
net Nullable(Float64)
area    Nullable(Float64)
cell    Nullable(Float64)
unit    Nullable(Float64)
lon Nullable(Float64)
lat Nullable(Float64)
range   Nullable(Float64)
samples Nullable(Float64)
changeable  Nullable(Float64)
created Nullable(Float64)
updated Nullable(Float64)
averageSignal   Nullable(Float64)
h3_7    Nullable(UInt64)
h3_8    Nullable(UInt64)
h3_9    Nullable(UInt64)

The ClickHouse binary I'm using provides 19 execution verbs. Everything you need to explore ClickHouse lives in a single, self-contained binary.

clickhouse local [args]
clickhouse client [args]
clickhouse benchmark [args]
clickhouse server [args]
clickhouse extract-from-config [args]
clickhouse compressor [args]
clickhouse format [args]
clickhouse copier [args]
clickhouse obfuscator [args]
clickhouse git-import [args]
clickhouse keeper [args]
clickhouse keeper-converter [args]
clickhouse install [args]
clickhouse start [args]
clickhouse stop [args]
clickhouse status [args]
clickhouse restart [args]
clickhouse static-files-disk-uploader [args]
clickhouse hash-binary [args]
Thank you for taking the time to read this post. I offer both consulting and hands-on development services to clients in North America and Europe. If you'd like to discuss how my offerings can help your business please contact me via LinkedIn.

Copyright © 2014 - 2024 Mark Litwintschik. This site's template is based off a template by Giulio Fidente.