Essential insights from Hacker News discussions

Show HN: Whispering – Open-source, local-first dictation you can trust

Here's a summary of the key themes discussed in the Hacker News thread:

Local-First and Open-Source Ethos

A strong undercurrent in the discussion is the appreciation for local-first, open-source software and the desire for data ownership. Users express enthusiasm for projects that align with these principles.

  • "I think there should be an open-source, local-first version of every app, and I would like them all to work together. The idea of Epicenter is to store your data in a folder of plaintext and SQLite, and build a suite of interoperable, local-first tools on top of this shared memory. Everything is totally transparent, so you can trust it." - chrisweekly
  • "I’m basically obsessed with local-first open-source software." - marcodiego
  • "We all should be." - marcodiego

Support for Parakeet and Alternative Models

There's significant interest in using models beyond OpenAI's Whisper, particularly Parakeet, due to its speed and accuracy. Users inquire about its integration and express a desire for more diverse model options.

  • "Does this support using the Parakeet model locally? I'm a MacWhisper user and I find that Parakeet is way better and faster than Whisper for on-device transcription." - wkcheng
  • "Parakeet is amazing - 3000x real-time on an A100 and _5x real-time even on a laptop CPU_, while being more accurate than whisper-large-v3" - daemonologist
  • "Not yet, but I want it too! Parakeet looks incredible (saw that leaderboard result). My current roadmap is: finish stabilizing whisper.cpp integration, then add Parakeet support." - braden-w
  • "Are there any non-Whisper-based voice models/tech/APIs?" - random3
  • "All these all just Whisper wrappers? I don't get it, the underlying model still isn't as good as paid custom models from companies, is there an actual open source / weights alternative to Whisper for speech to text? I know only of Parakeet." - satvikpendem

Text-to-Speech (TTS) Integration

The discussion naturally extends to the need for Text-to-Speech (TTS) capabilities to enable truly hands-free interaction with computers, with users suggesting various open-source TTS solutions.

  • "Now we just need text to speech so we can truly interact with our computers hands free." - codybontecou
  • On Mac, users can use "say" via the command line. - PyWoody
  • Linux users can use "espeak-ng" or "festival". - Aachen, 0xbadcafebee
  • "I also just found something that sounds genuinely realistic: Piper" - 0xbadcafebee
  • The broader "Year of Voice" project by Home Assistant is also mentioned as a precedent for open-source voice assistants. - 0xbadcafebee
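A minimal sketch of the fallback chain these comments describe, assuming the standard binary names (`say`, `espeak-ng`, `piper`); the Piper model path and the plain-`echo` fallback are illustrative additions, not from the thread:

```shell
# Speak a line of text with whichever open-source TTS engine is installed:
# say (macOS built-in), espeak-ng (common on Linux), or Piper (the
# realistic-sounding neural engine mentioned above). Prints the text if
# no engine is found.
speak() {
    text="$1"
    if command -v say >/dev/null 2>&1; then
        say "$text"
    elif command -v espeak-ng >/dev/null 2>&1; then
        espeak-ng "$text"
    elif command -v piper >/dev/null 2>&1; then
        # Piper reads text on stdin and writes audio; model path is a placeholder
        echo "$text" | piper --model en_US-lessac-medium.onnx --output_file /tmp/tts.wav
    else
        echo "$text"
    fi
}

speak "transcription finished"
```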

Security Concerns and False Positives

A significant portion of the thread addresses security concerns, specifically reports of Windows Defender flagging the application as infected. The developer suggests potential causes related to the build process or specific dependencies.

  • "Windows Defender says it is infected." - satisfice
  • "This needs to be higher, the installer on the README has a trojan." - sa-code
  • "More details please? Which installer?" - fencepost (followed by detailed VirusTotal links and analysis of EXE vs. MSI installers)
  • "Need to run a diff of the 7.2.2 tag against 7.3.0; I suspect the issue might be something related to an edit I made on tauri.conf.json or one of my Rust dependencies." - braden-w
  • "I'm no expert, but since it acts as a keyboard wedge it's likely to be unpopular with security software." - barryfandango
  • "Ahh that's unfortunate. This most likely is related to the Rust enigo crate, which we use to write text to the cursor." - braden-w
  • "If it's still an issue, feel free to build it locally on your machine to ensure your supply chain is clean!" - braden-w

Performance and Usability of Transcription

Users discuss the practicalities of using the software, including transcription speed, accuracy, UI/UX, and specific features like semantic correction and push-to-transcribe.

  • "I can’t go back to regular typing at this point, just feels super inefficient." - dumbmrblah
  • "Does Whispering support semantic correction?" - newman314
  • "we support prompts at both 1. the model level (the Whisper supports a "prompt" parameter that sometimes works) and 2. transformations level (inject the transcribed text into a prompt and get the output from an LLM model of your choice)." - braden-w
  • "For the Whispering dev: would it be possible to set "right shift" as a toggle? Also do it like VoiceInk, which is: either a short right-shift press to start and another short press to stop, or a long right-shift press (e.g. held for at least 0.5s) to start, stopping when you release the key. It's quite convenient." - oulipo
  • "The best option right now seems to pipe the output into another LLM to do some cleanup, which we try to help you do in Whispering. Recent transcription models don't have very good built-in inference/cleanup, with Whisper having the very weak "prompt" parameter." - braden-w
  • "I'm a huge fan of using Whisper hosted on Groq since the transcription is near instantaneous. ElevenLabs' Scribe model is also particularly great with accuracy, and I use it for high-quality transcriptions or manually upload files to their API to get diarization and timestamps" - braden-w
  • "I keep these in a dictate.sh script and bind to press/release on a single key. A programmable keyboard helps here. I use https://git.sr.ht/%7Egeb/dotool to turn the transcription into keystrokes." - pstroqaty
  • "Transcription quality with large-v3-turbo-q8_0 is excellent IMO and a Vulkan build is very fast on my 6600XT. It takes about 1s for an average sentence to appear after I release the hotkey." - pstroqaty
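The press/release setup pstroqaty describes can be sketched roughly as below; the binary names (`arecord`, `whisper-cli`, `dotool`) and the model path are assumptions for illustration, not the commenter's actual script:

```shell
# Rough shape of a dictate.sh bound to a single key: bind start_recording
# to key press and stop_and_type to key release.
WAV=/tmp/dictate.wav
MODEL="$HOME/models/ggml-large-v3-turbo-q8_0.bin"   # the quantized model named above

start_recording() {
    # 16 kHz mono is what whisper.cpp expects
    arecord -f S16_LE -r 16000 -c 1 "$WAV" &
    echo $! > /tmp/dictate.pid
}

stop_and_type() {
    kill "$(cat /tmp/dictate.pid)"                   # stop the recorder
    text=$(whisper-cli -m "$MODEL" -f "$WAV" --no-timestamps 2>/dev/null)
    printf 'type %s\n' "$text" | dotool              # inject the text as keystrokes
}
```

On a programmable keyboard the two functions map naturally to a key's press and release events, giving the push-to-transcribe behavior discussed above.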

Speaker Diarization

A recurring request and area of interest is speaker diarization (identifying different speakers) for use cases like meeting notes or dictating in a multi-person environment.

  • "Is there speaker detection?" - ideashower
  • "Speaker diarization is the term you are looking for, and this is more difficult than simple transcription." - hephaes7us
  • "Diarization is on the roadmap! Some providers support it, but some don't and the adapter for that could be tricky. Currently, for diarization I use the Elevenlabs Scribe API" - braden-w
  • "Can it tell voices apart?" - dllthomas
  • "I'd love to find a tool which could recognise a few different speakers so that I could automatically dictate 1:1 sessions." - mrgaro
  • "Speaker "diarization" is what you're looking for, and currently the most popular solution is pyannote.audio." - torstenvl
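As a concrete starting point for the 1:1-dictation use case, the pyannote.audio pipeline torstenvl names is also wired into the WhisperX CLI mentioned elsewhere in the thread; the flags below are illustrative, and a Hugging Face token is needed because pyannote's diarization models are gated:

```shell
# Speaker-labeled transcription of a recorded conversation, assuming
# whisperx is installed and HF_TOKEN grants access to the gated
# pyannote diarization models.
diarize() {
    if ! command -v whisperx >/dev/null 2>&1; then
        echo "whisperx not installed" >&2
        return 1
    fi
    whisperx "$1" --diarize --hf_token "$HF_TOKEN" --output_format srt
}

# diarize meeting.wav   # emits a transcript with per-speaker labels
```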

Clarity on Local vs. Cloud Usage

Some users expressed initial confusion about whether the application inherently relied on cloud services or fully supported local-only operation, leading to clarification from the developer and moderators.

  • "The text here says all data remains on device and emphasises how much you can trust that, that you're obsessed with local-first software, etc. Clicking on the demo video, step one is... configuring access tokens for external services? Are the services shown at 0:21 (Groq, OpenAI, Anthropic, Google, ElevenLabs) doing the actual transcription, listening to everything I say, and is only the resulting text that they give us subject to "it all stays on your device"?" - Aachen
  • "All your data is stored locally on your device, and your audio goes directly from your machine to your chosen cloud provider (Groq, OpenAI, ElevenLabs, etc.) or local provider (Speaches, owhisper, etc.) Their point is they aren’t a middleman with this, and you can use your preferred supplier or run something locally." - IanCal
  • "The issue is that "All your data is stored locally on your device" is fundamentally incompatible with half of the following sentence." - bangaladore
  • "Great correction, wish I could edit the post! Updated the README to reflect this." - braden-w
  • "The app supports both external APIs (Groq, OpenAI, etc.), and more recently local transcription (via whisper.cpp, OWhisper, Speaches, etc.), which never leaves your device." - braden-w

Other Notable Mentions and Comparisons

Users also brought up alternative projects and provided feedback on specific aspects of the software.

  • Willow Voice was mentioned as a current alternative that the poster might migrate from. - Johnny_Bonk
  • MacWhisper was praised for its features and ongoing development. - polo
  • VoiceInk was highlighted as a similar open-source macOS tool, with discussion on its specific shortcut behaviors. - oulipo, michael-sumner
  • Hyprnote was recommended for meeting notes and diarization. - braden-w, Brajeshwar
  • Vibe and WhisperX were mentioned as local alternatives using Whisper. - hereme888, icelancer
  • Whisper.cpp itself was praised for its quality and speed. - pstroqaty
  • FUTO Keyboard and Coqui STT/TTS were mentioned in the context of local dictation and STT technologies. - Tmpod, 0xbadcafebee
  • The Epicenter project's architecture of storing data in plaintext and SQLite was seen as a strong foundation. - chrisweekly
  • Questions arose about the model's performance with children's speech. - glial
  • Users experienced issues with downloading models on Linux. - emacsen