Dear readers, today I am going to talk about a journal script I kind of wrote. 😉
The problem I’m trying to solve is that I want to save my thoughts.
I have no problem reading what I wrote, but I don’t enjoy writing. I can dictate, but I don’t want to save or listen to my voice.
Whenever I encounter such a situation, I get into engineering mode, and if it’s something I can tackle within a few hours of work, I go for it.
First, I researched a voice-to-text library that’s easy to use, and I found Vosk. It has a huge library of models. I opted for two small ones because I want to use the app while I code. They give somewhat decent results.
Then, with the magic of multiple AI models, I arrived at a solution in Python. It streams my microphone and system sound to the Vosk models, and each transcribed phrase is written with a timestamp to a file named with the current date.
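To make the flow above concrete, here is a minimal sketch of the microphone half only. It is not the script from my repository: the function names, the `[HH:MM:SS]` line format, and the `YYYY-MM-DD.txt` file name are assumptions made for illustration, and it assumes the `vosk` and `sounddevice` packages plus a downloaded model directory.

```python
import json
import queue
from datetime import datetime
from pathlib import Path


def journal_path(directory: Path, now: datetime) -> Path:
    """Return the journal file for the given day (assumed naming: YYYY-MM-DD.txt)."""
    return directory / f"{now:%Y-%m-%d}.txt"


def format_entry(now: datetime, text: str) -> str:
    """Prefix one transcribed phrase with a wall-clock timestamp (assumed format)."""
    return f"[{now:%H:%M:%S}] {text.strip()}\n"


def transcribe_microphone(model_dir: str, out_dir: Path, sample_rate: int = 16000) -> None:
    """Stream the default microphone to Vosk and append phrases to today's file."""
    # Imported lazily so the helpers above work without the audio stack installed.
    import sounddevice as sd
    from vosk import KaldiRecognizer, Model

    audio = queue.Queue()

    def on_audio(indata, frames, time, status):
        # Runs on sounddevice's audio thread: hand the raw bytes to the main loop.
        audio.put(bytes(indata))

    recognizer = KaldiRecognizer(Model(model_dir), sample_rate)
    out_dir.mkdir(parents=True, exist_ok=True)

    with sd.RawInputStream(samplerate=sample_rate, blocksize=8000,
                           dtype="int16", channels=1, callback=on_audio):
        while True:
            # AcceptWaveform returns True once Vosk has a finished phrase.
            if recognizer.AcceptWaveform(audio.get()):
                text = json.loads(recognizer.Result()).get("text", "")
                if text:
                    now = datetime.now()
                    with journal_path(out_dir, now).open("a", encoding="utf-8") as f:
                        f.write(format_entry(now, text))
```

Calling `transcribe_microphone("path/to/model", Path.home() / "journal")` would run until interrupted. Capturing system sound depends on the audio setup, so it is left out of this sketch.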
It serves the purpose, but it’s not convenient for daily use. One of my mottoes is: if it’s not easy and instant, I won’t use it. So, I packed the script into a Python module and wrote a *.desktop file to register it as a regular Linux application (in my case, on Pop!_OS). As a quick extra touch, I added a keyboard shortcut, and behold the miracle: it works! Notifications sent with notify-send let you know the app's current state.
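Roughly, the packaging step can look like the sketch below. The module name `voice_journal`, the file name, and the helper functions are hypothetical, and the per-user applications directory is the usual `~/.local/share/applications`. The notification helper simply shells out to notify-send and does nothing if it is not installed.

```python
import shutil
import subprocess
from pathlib import Path


def desktop_entry(name: str, exec_cmd: str) -> str:
    """Build the text of a minimal desktop entry for a non-terminal application."""
    return "\n".join([
        "[Desktop Entry]",
        "Type=Application",
        f"Name={name}",
        f"Exec={exec_cmd}",
        "Terminal=false",  # do not open a console window
        "",
    ])


def install_desktop_entry(file_name: str, content: str, apps_dir: Path = None) -> Path:
    """Write the entry into the per-user applications directory and return its path."""
    apps_dir = apps_dir or Path.home() / ".local" / "share" / "applications"
    apps_dir.mkdir(parents=True, exist_ok=True)
    target = apps_dir / file_name
    target.write_text(content, encoding="utf-8")
    return target


def notify(summary: str, body: str = "") -> bool:
    """Show a desktop notification; return False when notify-send is unavailable."""
    if shutil.which("notify-send") is None:
        return False
    command = ["notify-send", summary]
    if body:
        command.append(body)
    subprocess.run(command, check=False)
    return True
```

For example, `install_desktop_entry("voice-journal.desktop", desktop_entry("Voice Journal", "python3 -m voice_journal"))` registers the launcher, and `notify("Voice Journal", "Transcription started")` reports a state change.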
One thing that irritates me is when an application runs in the console because it clutters my workspace. To avoid this, I needed a simple way to start and stop the app without relying on the terminal. My solution was to implement a lock file system.
When the app starts, it creates a lock file containing its process ID (PID). If the lock already exists, the script reads the PID from it, sends that process an interrupt signal (SIGINT, which Python raises as KeyboardInterrupt) to stop the running instance, and exits. This way, the first call starts the app and begins transcribing, while the second call stops it.
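The start/stop toggle described above can be sketched like this. The function names and the stale-lock handling are my assumptions for the example rather than a copy of the real script, and the sketch is for Linux, where `os.kill` delivers a real signal.

```python
import os
import signal
from pathlib import Path


def read_pid(lock: Path):
    """Return the PID stored in the lock file, or None if it is missing or garbled."""
    try:
        return int(lock.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def toggle(lock: Path) -> str:
    """Stop a running instance if there is one, otherwise claim the lock."""
    pid = read_pid(lock)
    if pid is not None:
        try:
            # SIGINT surfaces as KeyboardInterrupt in the target's main thread.
            os.kill(pid, signal.SIGINT)
            return "stopped"
        except ProcessLookupError:
            pass  # stale lock: that process is gone, so take over below
    lock.write_text(str(os.getpid()))
    return "started"


def run(lock: Path, work) -> str:
    """First call runs `work` until interrupted; a second call stops the first."""
    state = toggle(lock)
    if state == "stopped":
        return state
    try:
        work()
    except KeyboardInterrupt:
        pass  # the second invocation asked us to stop
    finally:
        lock.unlink(missing_ok=True)  # always release the lock on exit
    return state
```

Here `work` would be the transcription loop. The check and the write are not atomic, so two launches within the same instant could both start; for a shortcut pressed by hand that seems acceptable.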
I hope this article sparked someone’s wish to solve their own problem in a unique, inventive, and somewhat polished way.
Feel free to check my other similar article:
Automating Text Extraction from Screenshots (Dev.to)
Also, feel free to check out the code on GitHub.
Have a great day 🚀
Thanks for reading! ^_^