Word Counting My Whole Site

My site is static HTML, built with Jekyll (more details in my colophon). This means I have a folder that contains the whole site in HTML files.

I wanted to find the total word count. I found this combination of commands works great:

find . -iname "*.html" | parallel pandoc -t plain | wc -w

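It uses find to list every HTML file, GNU parallel to run pandoc on each one, pandoc to convert the HTML to plain text, and wc -w to count the words in the combined output.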

It took about 2 seconds on my computer to tell me my site currently has about 75,000 words. That is more than I expected, though the total counts repeated elements like footers once for every page.
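To see which pages contribute most to that total, a per-file breakdown can help. Here is a rough sketch, assuming pandoc is installed. The name words_per_file is made up for illustration, and the loop runs pandoc serially, so it will be slower than the parallel version.

```shell
# words_per_file is a hypothetical helper name, not part of any tool.
# Prints "<word count> <path>" for each HTML file, largest first.
words_per_file() {
    dir="${1:-.}"
    find "$dir" -iname "*.html" | while IFS= read -r file; do
        # Convert one page to plain text and count its words.
        # tr strips the padding some wc implementations add.
        count=$(pandoc -t plain "$file" | wc -w | tr -d ' ')
        printf '%s %s\n' "$count" "$file"
    done | sort -rn
}
```

Calling words_per_file . | head would then list the ten wordiest pages.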

Thanks to pandoc’s universality, you can also use this to count words in many file formats: Markdown, reStructuredText, MS Word, etc.
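For other formats, only the filename pattern needs to change, since pandoc guesses the input format from each file's extension. A small sketch, assuming GNU parallel and pandoc are installed; count_words_matching is a made-up name:

```shell
# count_words_matching is a hypothetical helper name, not part of pandoc or parallel.
# Usage: count_words_matching <directory> <filename pattern>
count_words_matching() {
    # pandoc picks the reader (Markdown, reStructuredText, docx, ...)
    # based on the extension of each file it is given.
    find "$1" -iname "$2" | parallel pandoc -t plain | wc -w
}

# Example calls (uncomment to run):
# count_words_matching . "*.md"
# count_words_matching . "*.docx"
```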

If your site is more dynamic, but still small enough to download, you might consider using GNU wget. Its --recursive flag will let you download every page as HTML locally, following links to find everything on the website.
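As a sketch of how that could look, the function below mirrors a site and then runs the same counting pipeline over the download. It assumes wget, GNU parallel, and pandoc are installed. mirror_and_count is a made-up name and https://example.com/ is a placeholder URL.

```shell
# mirror_and_count is a hypothetical helper name.
# Usage: mirror_and_count <url> [download directory]
mirror_and_count() {
    url="$1"
    dest="${2:-mirror}"
    # --recursive follows links, --level=inf lifts the default depth limit,
    # --no-parent stays below the starting URL, and --adjust-extension
    # saves HTML pages with an .html suffix so the find pattern matches them.
    wget --quiet --recursive --level=inf --no-parent --adjust-extension \
        --directory-prefix="$dest" "$url"
    find "$dest" -iname "*.html" | parallel pandoc -t plain | wc -w
}

# Example call (uncomment to run):
# mirror_and_count https://example.com/
```

Be gentle with sites you do not own: wget's --wait option adds a pause between requests.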

Fin

May you continue to increase your word count,

—Adam

