Show HN: Semantic grep for Claude Code (local embeddings)

Here's a summary of the themes from the Hacker News discussion:

Comparison to Existing Tools

Commenters quickly asked how the tool, called "ck" and described as a semantic grep, compares with existing alternatives.

"this is so cool, is there any other tool which is more mature?" asked dprophecyguy.
redhale mentioned "SemTools [0]" as a tool they had seen but not yet tried.
fakebizprez expressed strong confidence in LlamaIndex, stating "LlamaIndex is batting a thousand since their inception. Can't go wrong with this tool, either."

The Rise of Advanced CLI Tools and TUIs

Several commenters remarked on the growing sophistication and popularity of command-line interface (CLI) tools, including a trend towards text-based user interfaces (TUIs) built entirely from npm packages.

fakebizprez observed, "We really are living in the golden age of the terminal. I thought this would take a chunk out of Typescript/node marketshare of young coders, but i'm starting to see more and more of these animals building TUIs using nothing but npm packages. Have they no shame?"
floydnoel shared their experience: "Last week I built my own CLI coding agent tool using just nodejs and zero dependencies! It is a lot of fun to build, really, I think everyone should try it out."
cheesyFishes commented, "Seems like CLI tools are all the rage these days."
ozten posited, "This generalizes to a whole new category of tools: UX which requires more thought and skill, but is way more powerful. Human devs are mostly too lazy to use, but LLMs will put in the work to use them."

Integration and Interaction with LLMs (Especially Claude Code)

A central discussion point is how this semantic grep tool can be integrated with and enhance Large Language Models (LLMs), particularly Claude Code. The tool's ability to provide semantic search capabilities to LLMs that might otherwise rely on simpler methods like grep was highlighted.

Runonthespot explained the motivation behind the tool: "Mainly I wrote it because I noticed Claude's "by design" use of grep meant it couldn't search the code base for things it didn't already know the name of, or find "the auth section". But equally, it's well documented that e.g. Cursor's old RAG technique wasn't that great."
Runonthespot further elaborated on the hybrid approach: "Note that it’s grep AND semantic - so Claude can start with a grep strategy and if it finds nothing can switch to semantic, and since it’s local and fast, it keeps in sync easily enough."
brookst asked for clarification on integration: "How do you tell CC to use it? Just as an entry in Claude.md?"
Runonthespot provided an example: '"We have a new grep semantic hybrid tool installed called ck - check it out using ck --help and take it for a spin"'
dmd questioned the relevance: "What does this have to do with Claude Code?"
furyofantares expressed curiosity about the hybrid mode and speed: "If you're getting useful results from hybrid mode that's very interesting to me since well-constructed grep that claude executes don't really look like they'd work great for semantic search to me! ... I am very curious your thoughts on speed. I'd rather any tools claude invokes be as fast as possible so it can get feedback immediately and execute again."
mikebiglan sought more information on best practices: "Went to the github repo and was expecting a section about Claude Code and best practices on how to set this up with Claude Code. Very curious to hear how that might work, especially with what you've found compared to Claude Code's love of grep."
ayhanfuat inquired about Claude Code's design philosophy: "Isn't Claude Code's selling point that it doesn't use embeddings?"
joshuanapoli likewise saw no direct connection between the tool and Claude Code: "I don’t think that “Claude Code” is relevant to this semantic grep tool."
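
The grep-first, semantic-second workflow Runonthespot describes can be sketched roughly as follows. This is only an illustration of the control flow, not ck's implementation: the function name search_with_fallback is hypothetical, and semantic_search is a caller-supplied placeholder standing in for whatever embedding-based search is available.

```python
import re
from typing import Callable, Iterable


def search_with_fallback(
    pattern: str,
    query: str,
    lines: Iterable[str],
    semantic_search: Callable[[str, list[str]], list[str]],
) -> tuple[str, list[str]]:
    """Try a regex (grep-style) search first; fall back to semantic search.

    Returns the strategy that produced the results along with the results.
    semantic_search is a hypothetical callable supplied by the caller.
    """
    corpus = list(lines)
    regex = re.compile(pattern)
    hits = [line for line in corpus if regex.search(line)]
    if hits:
        return "grep", hits
    # Nothing matched by name, so ask the semantic backend instead.
    return "semantic", semantic_search(query, corpus)
```

An agent that does not yet know an identifier's name gets no grep hits for its guess and lands in the semantic branch, which is the case the author says plain grep could not handle.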

Technical Implementation: Embeddings, Indexing, and Performance

Discussions delved into the technical underpinnings of the tool, including the use of embeddings, indexing strategies, and performance considerations.

commandar described how Roo, a separate tool, handles codebase indexing: "Roo has codebase indexing that it'll instruct the agent to use if enabled. It uses whatever arbitrary embedding model you want to point it at and backs it with a qdrant vector db... I've found the nomic text embed model is fast enough for the task even running on CPU."
commandar also emphasized the value of indexing: "FWIW, I've found that the indexing is worth the effort to set up. The models are generally better about finding what they need without completely blowing up their context windows when it's available."
skybrian asked about index updates: "Looks like you have to build an index. When should it be rebuilt? Any support for automatic rebuilds?"
Runonthespot confirmed automatic updates: "Yes- files are hashed and checked whenever you search so index should always remain up to date. Only changed files are reindexed."
abyesilyurt inquired about the embedding model: "What model are you using to create the embeddings?"
Runonthespot responded: "BAAI/bge-small-en-v1.5 but considering switching this to google's latest gemmaembedding - it's fairly switchable."
alvis suggested a more descriptive title: "A proper title could be "Semantic grep with completely local embeddings". Put the title aside, the tool, if it works as described, is pretty insane."
rane reported performance issues: "I tried in my relatively small project. ... All I got was spinning M2 Mac fan after a minute, and gave up."
Runonthespot requested more diagnostic information: "interesting - can I ask you to try a ck --index . ?"
postalcoder also reported heavy CPU load and asked for .gitignore support: "It'd be nice if respected gitignore. It's turning my M4 MBP into a space heater too."
Runonthespot acknowledged the gitignore feature request: "coming up next."
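
The hash-and-check behaviour Runonthespot describes ("files are hashed and checked whenever you search ... Only changed files are reindexed") can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the choice of SHA-256, and the in-memory dictionary standing in for the stored index are hypothetical and say nothing about ck's actual index format.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return a SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_stale_files(root: Path, known_hashes: dict[str, str]) -> list[Path]:
    """Return files under root whose content differs from the recorded hash.

    known_hashes maps a relative path to the digest recorded at the last
    indexing run and is updated in place (hypothetical index layout).
    """
    stale = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = str(path.relative_to(root))
        digest = file_digest(path)
        if known_hashes.get(key) != digest:
            # New or modified file: record the new hash and mark for re-embedding.
            known_hashes[key] = digest
            stale.append(path)
    return stale
```

Only the files returned would need re-embedding before a search. The sketch does not handle deleted files or ignore rules such as .gitignore, both of which a real indexer would need.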

Language Support and Extensibility

The discussion touched upon the tool's language support and the process of adding new languages, highlighting the use of tree-sitter for semantic chunking.

Alifatisk was impressed by the core features and inquired about extensibility: "I did look into the core features and I gotta say, that looked quite cool. It's like Google search, but for the codebase. What does it take to support other languages?"
Runonthespot explained the requirements: "It supports most languages but needs a bit of tree-sitter setup to do semantic chunking. Let me know what languages you’d like added."
Several users requested specific language support:
- benzible: "I'd love to see elixir support."
- Bigsy: "Clojure would be awesome."
- Alifatisk: "Thanks for your quick response, most large codebases I've been fiddling on is Ruby!"
- t0mas88: "Java would be useful as well for larger backend codebases."
mellosouls noted a discrepancy regarding Rust support: "Apart from anything else it appears to be very misleading as Rust (ironically) according to the documentation is not one of the languages supported."
Runonthespot clarified the titling and promised updates: "I'll add rust, ruby, elixir, Clojure next. It says rust as it's written in rust, sorry about that!"

The Role of Rust in CLI Tooling

The fact that the tool is written in Rust generated commentary, with some users finding it noteworthy and others questioning its relevance as a primary selling point.

Alifatisk noted the emphasis on Rust: "At this point, we aren't even saying it's written in Rust anymore, we just mention it in the title whenever possible."
AmazingTurtle questioned the title's focus: "Why does it need to say RUST in the headline as if this was a feature, lol."
Runonthespot defended the emphasis, alluding to a common perception: "we all know rust CLI tools are better right?"
anthonyronning expressed disappointment with the title's implication: "I clicked on this because it said rust in the title. Very disappointed."

Debate on Semantic vs. Conventional Search (grep)

A recurring debate was the utility and positioning of semantic search compared to traditional keyword-based search (grep). Some users questioned why semantic search was not the default.

0x696C6961 questioned the re-implementation of grep and its default behavior: "This is cool, but I don't understand why it tries to re-implement (a subset of) grep. Not only that, but the grep-like behaviour is the default and I need to opt-in to the semantic search using the --sem flag. If I want grep I can use grep/ripgrep."
Runonthespot explained the hybrid approach: "Fair comment- the initial thinking was to have both and in fact a hybrid mode too which fuses results so you can get chunks that match both semantically and on keyword search in one resultset."
alvis argued for prioritizing semantic search: "How much is the penalty we are talking about for semantic vs conventional grep? My thinking is that for large codebase, sorting embedding matches maybe more efficient than reading all files and hence there is no point to put semantic search behind a --semantic flag."
dsiegel2275 questioned the premise of "not using embeddings" as a selling point: "Why would "not using embeddings" be a selling point? Some of the most effective IR systems use embeddings (bi-encoders, cross-encoders)."
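
The thread does not say how the hybrid mode fuses keyword and semantic results. One common technique for merging two ranked lists is reciprocal rank fusion (RRF), sketched below purely as an example of what such fusion can look like. The function name, the chunk identifiers, and the constant k=60 are assumptions and are not taken from ck.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of result IDs into a single ranking.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by both searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


keyword_hits = ["auth.py:10", "login.py:5", "util.py:1"]
semantic_hits = ["login.py:5", "session.py:8", "auth.py:10"]
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

In this example the two chunks found by both searches (login.py:5 and auth.py:10) rank ahead of the chunks found by only one of them.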

Future Potential and Alternative Approaches

Users discussed the broader implications of such tools and explored alternative implementations, including leveraging Abstract Syntax Trees (ASTs) for deeper code understanding.

MarkMarine highlighted an AST-related comment from a previous discussion: "I saw this comment a little bit back and I don’t think the OP expanded on it, but this looks like a fantastic idea to me." The comment in question, posted by sam0x17 20 days earlier, read: "Didn't want to bury the lead, but I've done a bunch of work with this myself. It goes fine as long as you give it both the textual representation and the ability to walk along the AST."
dorian-graph mentioned another similar, recent tool: "There's also https://github.com/bartolli/codanna, that's similarly new. I'll have to try that again, and this one."