Essential insights from Hacker News discussions

Tools: Code Is All You Need

Here's a summary of the themes from the Hacker News discussion, with direct quotes where appropriate:

LLM Capabilities and Practical Use Cases

A central theme is the ongoing debate about the true capabilities and practical applications of LLMs, particularly in coding and DevOps tasks. While many users acknowledge the potential and find LLMs useful for assistance and learning, there is a strong undercurrent of skepticism, both about their current reliability for critical tasks and about the hype surrounding them.

  • "LLMs are great at processing vague descriptions of problems and offering a solution that's reasonably close to the mark." - loudmax
  • "its really kind of uncomfortable to realize that a bunch of people you had tons of respect for are either ignorant or dishonest/bought enough to say otherwise." - benreesman
  • "The LLM output is often coerced back into something more deterministic such as types, or DB primary keys." - pclowes
  • "It’s not like tool calls have access to some secret deterministic mode of the LLM; it’s still just text." - wrs
  • "The job then becomes identifying those problems and figuring out how to configure a sandbox for them, what tools to provide and how to define the success criteria for the model." - simonw
  • "LLMs are not a substitute for learning. When used properly, they're an enhancement to learning." - loudmax
  • "I don’t want to sound like a hard-core LLM believer. I get your point and it’s fair." - whiplash451
  • "LLMs have already found large-scale usage (deep research, translation) which makes them more ubiquitous today than 3D printers ever will or could have been." - whiplash451
  • "I don't think these kinds of bets often pay off. The opposite actually, I think every truly revolutionary technological advance in the contemporary timeframe has arisen out of its very obvious killer app(s), they were in a sense inevitable. Speculative tech--the blockchain being one of the more salient and frequently tapped examples--tends to work in pretty clear bubbles, in my estimation." - nativeit
  • "The fact that LLMs are only currently useful for tasks that don't require precise answers is a massive red flag for me." - jrm4
  • "It's not living up to the hype, but most of that hype was obvious nonsense." - rapind
  • "There's a hole in the ground where something between 100 billion and a trillion dollars in the ground that so far has about 20B in revenue (not profit) going into it annually." - benreesman
  • "The economics of it sucked." - threetonesun (referring to Google's hesitance on LLMs)
  • "If it’s not happening at the scale it was pitched, then it’s not happening." - deadbabe
  • "3D printers are the backbone of modern physical prototyping. They're far more important to today's global economy than LLMs are, even if you don't have the vantage point to see it from your sector." - kibwen
  • "I've never seen anyone rely on users to use GitBash to run shell scripts." - forrestthewoods

The Utility of Predefined Snippets and Knowledge Bases

A significant portion of the discussion revolves around the effectiveness of providing LLMs with structured examples, documentation, or "skill files" to guide their behavior. Users share their experiences creating personal knowledge bases of shell commands or integration details, allowing LLMs to recall and adapt these for new tasks. This is seen as a way to enhance LLM efficiency and reduce the need for the user to remember specific syntax.

  • "I have a 'devops' folder with a CLAUDE.md with bash commands for common tasks (e.g. find prod / staging logs with this integration ID)." - mritchie712
  • "When I complete a novel task (e.g. count all the rows that were synced from stripe to duckdb) I tell Claude to update CLAUDE.md with the example. The next time I ask a similar question, Claude one-shots it." - mritchie712
  • "The commands aren't the special sauce, it's the analytical capabilities of the LLM to view the outputs of all those commands and correlate data or whatever." - lreeves
  • "Because now the capabilities of the model grow over time. And I can ask questions that involve a handful of those snippets. When we get to something new that requires some doing, it becomes another snippet." - light_hue_1
  • "I can offload everything I used to know about an API and never have to think about it again." - light_hue_1
  • "the snippets are examples. You can ask hundreds of variations of similar, but different, complex questions and the LLM can adjust the example for that need." - mritchie712
  • "I don't have a snippet for, 'find all 500's for the meltano service for duckdb syntax errors', but it'd easily nail that given the existing examples." - mritchie712
  • "I use a similar file, but just for myself (I've never used an LLM 'agent'). I live in Emacs, but this is the only thing I use org-mode for: it lets me fold/unfold the sections, and I can press C-c C-c over any of the code snippets to execute it." - chriswarbo
  • "Giving LLMs the right context -- eg in the form of predefined 'cognitive tools', as explored with a ton of rigor here^1 -- seems like the way forward, at least to this casual observer." - chrisweekly
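
As a concrete illustration of the pattern mritchie712 describes, such a "devops" CLAUDE.md is just a plain Markdown file of worked examples that the model can recall and adapt. Everything below (service names, file paths, commands) is hypothetical:

```markdown
# devops tasks

## Find prod / staging logs for an integration ID

    kubectl logs -n prod deploy/sync-worker --since=1h | grep "$INTEGRATION_ID"

## Count rows synced from Stripe to DuckDB

    duckdb warehouse.db "SELECT count(*) FROM stripe_invoices"
```

Because the entries are examples rather than rigid tools, the model can recombine and vary them, e.g. filtering the same logs for 500 errors, without needing a snippet for that exact question.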

MCP (Model Context Protocol) vs. Direct Code Generation

The discussion features a significant debate on the merits of using MCP (or similar "tool-calling" mechanisms) versus having LLMs directly generate code.

Proponents of MCP argue that it provides a more structured, constrained, and potentially more reliable way for LLMs to interact with systems, by defining tools with clear schemas. This approach can relieve the LLM of needing to understand complex API intricacies, authentication, or edge cases.

Critics, however, suggest that MCP adds overhead (larger inputs, token inefficiency) and that direct code generation, especially with the LLM's ability to leverage extensive training data on common languages and CLIs, can be more efficient and equally or more effective. Some feel that MCP is an unnecessary abstraction layer when direct code execution or API calls are sufficient.

Conversely, others point out that MCP is beneficial for internal or poorly documented tools where dumping the entire documentation into the context might be unwieldy.

  • "You are giving textual instructions to Claude in the hopes that it correctly generates a shell command for you vs giving it a tool definition with a clearly defined schema for parameters and your MCP Server is, presumably, enforcing adherence to those parameters BEFORE it hits your shell." - lsaferite
  • "To an LLM there’s not much difference between the list of sample commands above and the list of tool commands it would get from an MCP server. JSON and GNU-style args are very similar in structure." - wrs
  • "Sure, you could dump all the documentation into context for code generation, but that often requires more context than interacting with an MCP tool. More importantly, generated code for unfamiliar APIs is prone to errors so you'd need robust testing and retry mechanisms built in to the process." - victorbjorklund
  • "With MCP, if the tools are properly designed and receive correct inputs, they work reliably. The LLM doesn't need to figure out API intricacies, authentication flows, or handle edge cases - that's already handled by the MCP server." - victorbjorklund
  • "MCP works exactly that way: you dump documentation into the context. That's how the LLM knows how to call your tool." - the_mitsuhiko
  • "I found the limit of tools available to be <15 with sonnet4. That's a super low amount. Basically the official playwright MCP alone is enough to fully exhaust your available tool space." - the_mitsuhiko
  • "It demands too much context." - the_mitsuhiko (criticizing MCP's context usage)
  • "The problem I see with MCP is very simple. It's using JSON as the format and that's nowhere as expressive as a programming language." - never_inline
  • "I hate Bash. Hate it. And hate the ecosystem of Unix CLIs that are from the 80s and have the most obtuse, inscrutable APIs ever designed." - forrestthewoods
  • "MCP is literally the same as giving an LLM a set of man page summaries and a very limited shell over HTTP. It’s just in a different syntax (JSON instead of man macros and CLI args)." - wrs
  • "It would be better for MCP to deliver function definitions and let the LLM write little scripts in a simple language." - wrs
  • "I ended up writing an OSS MCP server that securely executes LLM generated JavaScript using a C# JS interpreter (Jint) and handing it a fetch analogue as well as jsonpath-plus. Also gave it a built-in secrets manager." - CharlieDigital
  • "I had the same experience this week attempting to build an agent first by using the Playwright MCP server, realizing it was slow, token-inefficient, and flaky, and rewriting with direct Playwright calls." - pamelafox
  • "MCP servers might be fun to get an idea for what's possible, and good for one-off mashups, but API calls are generally more efficient and stable, when you know what you want." - pamelafox
  • "Wouldn't the sweet spot for MCP be where the LLM is able to do most of the heavy lifting on its own (outputting some kind of structured or unstructured output), but needs a bit of external/dynamic data that it can't do without?" - pramodbiligiri
  • "Give it an objective and the LLM writes its own code and uses the tool iteratively to accomplish the task (as long as you can interact with it via a REST API)." - CharlieDigital
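
The core of lsaferite's argument (enforcing a parameter schema "BEFORE it hits your shell") can be sketched in a few lines. This is not the MCP wire protocol, just a minimal, hypothetical validator in its spirit; the `find_logs` tool and its parameters are invented for illustration:

```python
from typing import Any

# Hypothetical tool definition, in the spirit of an MCP tool schema.
FIND_LOGS_TOOL = {
    "name": "find_logs",
    "parameters": {
        "service": {"type": str, "required": True},
        "status":  {"type": int, "required": False},
    },
}

def validate_call(tool: dict, args: dict[str, Any]) -> dict[str, Any]:
    """Reject malformed arguments before they reach any real system."""
    for name, spec in tool["parameters"].items():
        if spec["required"] and name not in args:
            raise ValueError(f"missing required parameter: {name}")
        if name in args and not isinstance(args[name], spec["type"]):
            raise TypeError(f"{name} must be {spec['type'].__name__}")
    unknown = set(args) - set(tool["parameters"])
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    return args
```

The counterargument in the thread is that this buys little: as wrs notes, the model sees both the schema and a list of sample shell commands as just text, while the validation layer costs context tokens and caps how many tools fit.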

The Role of Sandboxing and Security

A recurring concern is the safety of granting LLMs access to execute commands or interact with systems. The idea of sandboxing the LLM execution environment is frequently brought up as a crucial security measure.

  • "You're letting the LLM execute privileged API calls against your production/test/staging environment, just hoping it won't corrupt something, like truncate logs, files, databases etc?" - e12e
  • "I have a small little POC agentic tool on my side which is fully sandboxed, an it's inherently 'non prompt injectable' by the data that it processes since it only ever passes that data through generated code." - the_mitsuhiko
  • "I've been using a VM for a sandbox, just to make sure it won't delete my files if it goes insane." - dist-epoch
  • "The problem with this is that you have to give your LLM basically unbounded access to everything you have access to, which is a recipe for pain." - empath75
  • "The half-baked or missing security aspects are more fundamental." - chrisweekly
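
A minimal sketch of the sandboxing idea raised above: run generated code in a throwaway working directory, with a stripped environment and a hard timeout. This is an illustrative baseline only; as dist-epoch's comment suggests, a VM (or container) is a much stronger boundary than any in-process measure:

```python
import subprocess
import sys
import tempfile

def run_untrusted(code: str, timeout: float = 5.0) -> str:
    """Run LLM-generated Python in a weak, illustrative sandbox."""
    with tempfile.TemporaryDirectory() as scratch:
        result = subprocess.run(
            [sys.executable, "-c", code],
            cwd=scratch,           # code starts in an empty scratch dir
            env={},                # no inherited secrets or API keys
            capture_output=True,
            text=True,
            timeout=timeout,       # kill runaway loops
        )
    return result.stdout
```

This limits accidental damage (stray files, leaked environment variables, hangs) but does not stop deliberately malicious code from reading the filesystem, which is why the thread keeps returning to VMs and interpreter-level isolation like Jint.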

The Future of LLMs and Human Collaboration

The discussion touches on the evolving relationship between human developers and LLMs, with some envisioning a future of "agentic coding" and others emphasizing the need for human oversight and collaboration. The concept of LLMs as powerful "colleagues" or accelerators rather than replacements for human ingenuity is a recurring theme.

  • "The current state of affairs, but of course many will try very hard to replace programmers that is currently not possible. What it is possible is to accelerate the work of a programmer several times (but they must be good both at programming and LLM usage), or take a smart person that has a relatively low skill in some technology, and thanks to LLM make this person productive in this field without the long training otherwise needed." - antirez
  • "It is not MCP: it is autonomous agents that don't get feedbacks from smart humans." - antirez
  • "Maybe we have to drop the self-assurance and opinionated view points and tackle this like a scientific problem." - khalic
  • "I always dreamed of a tool which would know the intent, semantic and constraints of all inputs and outputs of any piece of code and thus could combine these code pieces automatically." - luckystarr
  • "CLojure-mcp [...] gives the LLM a bash tool, a persistent Clojure REPL, and structural editing tools. The effect is that it's far more efficient at editing Clojure code than any purely string-diff-based approach..." - galdre
  • "I treat an LLM the same way I'd treat myself as it relates to context and goals when working with code." - rasengan
  • "It’s not a fad, it is a paradigm change." - basch (referring to ChatGPT's adoption)
  • "You can be right but too early." - golergka

Cost, Performance, and Vendor Lock-in

Concerns about the economic viability and scalability of LLM usage, particularly regarding ever-increasing costs and potential vendor lock-in, are also prevalent. Users question whether current pricing models are sustainable and whether cheaper, more open alternatives will eventually catch up.

  • "I'm using claude-code daily and find it very useful, but I'm expecting the price to continue getting jacked up." - rapind
  • "We'll see. I expect at some point the more open models and tools will catch up when the expensive models like ChatGPT plateau (assuming they do plateau). Then we'll find out if these valuations measure up to reality." - rapind
  • "No, they don't [get cheaper], the cost is just still hidden from you but the freebies will end just like MoviePass and cheap Ubers" - dingnuts
  • "I guess I don't see the technical limitation. Seem like a protocol update issue." - JyB (responding to context issues in MCP)
  • "I'm back on ChatGPT for O3 use, but expect that Grok4 will be next." - briandw
  • "If you can successfully abstract away the LLM cost with minimal engineer time, it's a net positive." - pclowes
  • "My understanding is that the model trained on just the first will never beat the model trained on both. Bloomberg model is my favorite example of this. If you can squirell away special data then that special data plus everything else will beat the any other models. But that's basically what openai, google, and anthropic are all currently doing." - johnsmith1840
  • "The economic arguments are too compelling not to try it given the recent drops in pricing." - pclowes