Show HN: Quick and dirty speech transcription on an Intel NPU

4 points by ellenhp 7 months ago | 1 comment
tldr: I made a global hotkey on my laptop that will record my voice while I hold it down, transcribe the result and type out what it thinks I said.

Background:

My laptop was stolen recently, and the new one I got to replace it has an Intel NPU [0] in it. The promise of the NPU is running small machine learning models efficiently on mobile hardware. I thought a good application of this would be using Whisper to transcribe speech into text. There’s not really much [1] out there on Linux that can do this right now, which is kind of a bummer because being able to type with your voice is a big accessibility thing.

My Sway configuration [2] maps pressing the right control key to running a wrapper program [3], and releasing it to sending a SIGINT to that program. The wrapper catches the SIGINT, ends transcription, and types the transcribed text into the focused application with the `enigo` crate.
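
For anyone who wants the shape of that hold-to-talk flow without reading the repo, here is a rough sketch. It is in Python with only the standard library, not the Rust the actual wrapper uses, and it is not the wrapper's real code. The names `run_push_to_talk`, `read_chunk`, `transcribe`, and `type_text` are placeholders made up for illustration; the real audio capture, Whisper call, and `enigo` typing would plug in where those callables are.

```python
import signal


def run_push_to_talk(read_chunk, transcribe, type_text):
    """Collect audio until SIGINT arrives, then transcribe and type the text.

    All three arguments are hypothetical callables supplied by the caller:
    read_chunk() returns bytes, transcribe(bytes) returns str,
    type_text(str) sends the text to the focused application.
    """
    stop_requested = False

    def on_sigint(signum, frame):
        # Only set a flag here; the main loop does the real work.
        nonlocal stop_requested
        stop_requested = True

    # Replace the default handler (which raises KeyboardInterrupt).
    previous = signal.signal(signal.SIGINT, on_sigint)
    chunks = []
    try:
        # Key is held down: keep recording until the release binding
        # sends SIGINT (killall -2 in the Sway config).
        while not stop_requested:
            chunks.append(read_chunk())
    finally:
        # Restore whatever handler was installed before.
        signal.signal(signal.SIGINT, previous)

    text = transcribe(b"".join(chunks))
    type_text(text)
    return text
```

Keeping the signal handler down to a single flag means the transcription and typing happen in normal control flow after the loop exits, not inside the handler.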

Repo link: https://github.com/ellenhp/whisper-npu-server

This is not one of my high-polish projects, but I did want to throw it out there into the world, especially because the OpenVINO project doesn't have any containerized NPU examples, even for LLMs.

[0] https://intel.github.io/intel-npu-acceleration-library/npu.h...

[1] I found this, and based some of my code on it: https://github.com/oddlama/whisper-overlay

[2] See end of post for example.

[3] https://github.com/ellenhp/whisper-transcription-wayland/

Sample Sway config:

bindsym --no-repeat Control_R exec "whisper-transcription"

bindsym --release Control_R exec killall -2 whisper-transcription

  • basiskarten 6 months ago
    This is really cool, I wish I could use it on my Windows 11 X1 Carbon, which also comes with an NPU.

    I was quite disappointed that the dictation tool Lenovo praised on their website for this "CoPilot+ AI PC" turned out to be a shortcut to the Windows 11 transcription tool. My hope was that they were indeed putting the NPU to use for that. Other than that, this is a great machine. In any case, Lenovo should fire whoever is responsible for that feature and hire you instead.

    It would also be great to see a demo of how this works on your machine. Thank you for sharing it at all, though!