As you may know, I have been working on polyglot for some time now. It is now the second-most popular ATS project on GitHub[0] and has reached a state of relative maturity.

With that in mind, I would like to make a pitch for polyglot on two counts: how it handles ambiguous file extensions, and how it performs.

Ambiguous Extensions

Of the many source-counting tools tested, nearly every one falls victim to the same bug: it has no mechanism to distinguish languages that share a file extension. Such collisions do happen in practice; for example, .v is used by both Coq and Verilog.

polyglot, linguist, enry, and cloc were the only tools to correctly handle this issue.
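
To make the problem concrete, here is a minimal sketch of one way a tool could tell the two .v languages apart by looking at file contents. This is not polyglot's actual implementation; the function name guess_v_language and the keyword lists are assumptions chosen purely for illustration, and a real tool would need a far richer heuristic.

```python
import re

# Hypothetical marker keywords, chosen for illustration only.
# Each pattern matches a keyword at the start of a line (after optional indentation).
COQ_MARKERS = re.compile(
    r"^[ \t]*(Theorem|Lemma|Proof|Qed|Definition|Inductive|Require)\b",
    re.MULTILINE,
)
VERILOG_MARKERS = re.compile(
    r"^[ \t]*(module|endmodule|wire|reg|assign|always)\b",
    re.MULTILINE,
)


def guess_v_language(source: str) -> str:
    """Guess whether the contents of a .v file are Coq or Verilog."""
    coq_hits = len(COQ_MARKERS.findall(source))
    verilog_hits = len(VERILOG_MARKERS.findall(source))
    if coq_hits > verilog_hits:
        return "Coq"
    if verilog_hits > coq_hits:
        return "Verilog"
    # Ties (including files with no markers at all) are left undecided.
    return "Unknown"
```

A tool that keys only on the extension has to pick one of the two languages for every .v file, so it misreports any repository that uses the other one.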

Performance

Our first benchmark is against the Rust source repo.

Tool      Language  Time
polyglot  ATS       143.2 ms
loc       Rust      171.8 ms
tokei     Rust      304.6 ms
scc       Go        471.1 ms
gocloc    Go        839.8 ms
cloc      Perl      5.052 s
enry      Go        5.440 s
linguist  Ruby      17.46 s

Second, we look at the Go source repo:

Tool      Language  Time
polyglot  ATS       152.5 ms
loc       Rust      177.3 ms
tokei     Rust      299.1 ms
scc       Go        502.7 ms
gocloc    Go        1.201 s
enry      Go        1.758 s
linguist  Ruby      13.42 s
cloc      Perl      17.16 s

Third, the Linux source tree:

Tool      Language  Time
polyglot  ATS       1.113 s
loc       Rust      2.034 s
tokei     Rust      3.088 s
scc       Go        5.841 s
gocloc    Go        13.68 s
cloc      Perl      2m 3.9s
enry      Go        2m 12.9s
linguist  Ruby      3m 11.3s

Finally, the OpenBLAS source tree:

Tool      Language  Time
polyglot  ATS       164.7 ms
loc       Rust      273.7 ms
tokei     Rust      373.6 ms
scc       Go        633.3 ms
gocloc    Go        1.501 s
enry      Go        5.633 s
cloc      Perl      24.17 s
linguist  Ruby      29.72 s

As you can see, polyglot is the fastest tool in every benchmark. Moreover, it beats cloc, enry, and linguist by at least an order of magnitude, which makes it by far the fastest tool with any claim to correctness.
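
The order-of-magnitude claim can be checked directly from the tables above. The short script below is only a sketch: the timings are copied from the four tables and converted to seconds, while the dictionary layout and the speedups helper are my own naming, not part of any tool.

```python
# Times in seconds, copied from the benchmark tables above
# (e.g. "2m 3.9s" is written as 123.9).
timings = {
    "rust": {"polyglot": 0.1432, "cloc": 5.052, "enry": 5.440, "linguist": 17.46},
    "go": {"polyglot": 0.1525, "cloc": 17.16, "enry": 1.758, "linguist": 13.42},
    "linux": {"polyglot": 1.113, "cloc": 123.9, "enry": 132.9, "linguist": 191.3},
    "OpenBLAS": {"polyglot": 0.1647, "cloc": 24.17, "enry": 5.633, "linguist": 29.72},
}


def speedups(repo_times):
    """Return how many times slower each tool is than polyglot on one repo."""
    base = repo_times["polyglot"]
    return {tool: t / base for tool, t in repo_times.items() if tool != "polyglot"}


for repo, times in timings.items():
    for tool, ratio in speedups(times).items():
        print(f"{repo:9s} {tool:9s} {ratio:6.1f}x")
```

The closest any of the three comes is enry on the Go repo, at roughly 11.5 times polyglot's runtime; every other ratio is larger.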

[0]: by number of stars