Christian Heilmann

Kirby apps and regressive enhancements

Wednesday, March 6th, 2024 at 4:40 pm

Kirby vomiting lots of app icons

ChatGPT announced that it now has a killer new feature: it can read out results in several synthesised voices. This means you can speak your prompt and the computer answers. We’ve reached Star Trek TNG levels of human-computer interaction. All praise be AI and the large corporations that make it happen!

Except, when I read the announcement, my immediate reaction was “so what?”. Generated voices have been a thing since the 80s and I remember SAM / Reciter being fun for, well, hours at most. Check it out in all its glory in this JavaScript port or on YouTube.

macOS even has generated voices in the terminal, via the `say` command.

Speech is slow…

Just a few years ago speech was announced as the de-facto standard for human-mobile interaction. Siri, Alexa, Bixby, Cortana, they all came to make our lives easier and more “human”. And we all used them for a while, but soon discarded them, because speaking and listening seem great but have big drawbacks: they are synchronous and fraught with communication errors.

It is frustrating to not be understood by people but even more hurtful to have a computer tell you it doesn’t know what you are on about.

Reading online – as in skimming – through a wrong response feels much less annoying than waiting for a synthesised voice to finish spouting nonsense. Maybe it is me having lived in England for too long, but I even feel weird interrupting a virtual person.

But OK, maybe this is a generational issue and people do want to talk and listen to their computers. Fair enough, but the ChatGPT announcement irked me in another way: why would I need to have a synthesised voice as part of the app when these services already exist in the operating system?

Speech synthesis is an accessibility feature

Every OS comes with voice recognition and voice synthesis as part of its accessibility stack. People who can’t use a mouse or a keyboard, or can’t see the screen, depend on these to interact with the computer. And these solutions have been tested and improved over decades and are available for all users – for free.

MacOS Voice Settings

If you, for example, read this post in Microsoft Edge, you can choose `Tools` -> `Read aloud` and get the page read out to you. You can choose from dozens of different voices for different languages and you can change the speed of the voice to suit your needs.

What did I have to do to give you this feature? Nothing. Well, I needed to do one thing: publish this text as HTML and that was easy enough.

Kirby apps – suck in features already available and deliver a worse experience

And this is what my real problem is with this “killer feature” announcement: instead of integrating the app into an already customised environment, it “innovates” by offering the feature in-app. This feels like a massive step backwards as you duplicate functionality.

And it validates something I’ve been complaining about ever since the concept of “App Stores” came up: this isn’t about user convenience, but about controlling the whole experience and keeping people in your app. It’s “time spent in app” KPIs over and over again.

Speech recognition and speech synthesis are things we already have on the platform level. An app running on the platform should integrate with these instead of competing. As a user, I have spent a lot of time setting up my environment to fit my needs. And I spent time and money to install and buy solutions I like to use for various tasks. Apps should recognise my efforts to tailor the experience to my wants and needs and not offer me a lesser experience and sell it as innovation.
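To make “integrate instead of competing” a bit more concrete, here is a minimal sketch of how a web app could hand text to the speech synthesis the platform already offers. It assumes a browser that exposes the Web Speech API (`speechSynthesis` and `SpeechSynthesisUtterance`), which is my choice for illustration and not something any particular app is known to use. The function name `readAloud` is hypothetical.

```typescript
// Hypothetical helper: ask the platform to read a text aloud.
// Returns true if the text was handed to the platform's speech synthesis,
// false if no speech synthesis is available (for example outside a browser).
function readAloud(text: string): boolean {
  if (
    typeof speechSynthesis === "undefined" ||
    typeof SpeechSynthesisUtterance === "undefined"
  ) {
    // No platform speech available: the caller simply keeps showing the text.
    return false;
  }

  const utterance = new SpeechSynthesisUtterance(text);

  // Deliberately do not set utterance.voice or utterance.rate.
  // Leaving them alone means the browser's default voice is used,
  // which is typically the one the user has configured on their system.
  speechSynthesis.speak(utterance);
  return true;
}
```

Calling `readAloud("Hello there")` in a supporting browser speaks the text with the default voice. The point of the sketch is what it leaves out: no bundled voices, no in-app voice picker, just a hand-over to what the user already set up.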

Of course, there is another thing at play here: the “not invented here” or “I can do that” attitude of developers. This is the same thing that makes people add inaccessible dark/light theme switches to their web sites right now instead of using a simple media query to load the correct CSS according to the setting of the operating system. It reminds me of the dark days when Internet Explorer didn’t resize pixel-based fonts and everybody added “font resizing widgets” to their sites, adding to the already overflowing landfill of dead code on the web.
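The media query in question is `prefers-color-scheme`, and in most cases plain CSS is all it takes. Where a script really does need to know the theme, the same operating system setting can be read through `matchMedia` instead of being stored in a custom switch. The following is a rough sketch under that assumption; `MediaQueryLike`, `themeFromPreference` and `readSystemTheme` are hypothetical names made up for this example.

```typescript
// Minimal shape of what matchMedia() returns; only `matches` is needed here.
interface MediaQueryLike {
  matches: boolean;
}

type Theme = "dark" | "light";

// Hypothetical pure helper: map the media query result to a theme.
function themeFromPreference(query: MediaQueryLike | null): Theme {
  return query !== null && query.matches ? "dark" : "light";
}

// Hypothetical helper: read the operating system setting through the browser.
// Outside a browser (no matchMedia available) it falls back to "light".
function readSystemTheme(): Theme {
  const host = globalThis as unknown as {
    matchMedia?: (query: string) => MediaQueryLike;
  };
  const query =
    typeof host.matchMedia === "function"
      ? host.matchMedia("(prefers-color-scheme: dark)")
      : null;
  return themeFromPreference(query);
}
```

A page could call `readSystemTheme()` once on load and set a class on the root element, but that is only worth doing when CSS alone can’t cover the case.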

I love the idea of releasing content and functionality that users can tailor to their needs and likes. I also like to give a basic experience to everyone and include more functionality when and if it is possible. This is called progressive enhancement and it has been my guiding star ever since I started coding for the web. It’s fun to let go and let the platform help me build things that are damn hard to do right. So why not allow it to do that?
