Here's a summary of the themes from the Hacker News discussion:
Simplicity and Efficacy of Minimalist Agents
A primary theme is the surprising effectiveness of a very simple, minimal code agent (mini-swe-agent) for tasks like SWE-bench. This approach is lauded for its conciseness and ability to function with various LLMs, highlighting that complexity isn't always necessary for a working solution. A minimal sketch of this tools-in-a-loop pattern appears after the quotes below.
- "OK that really is pretty simple, thanks for sharing." - simonw
- "Lack of tools in mini-swe-agent is a feature. You can run it with any LLM no matter how big or small." - diminish
- "Mini swe agent, as an academic tool, can be easily tested aimed to show the power of a simple idea against any LLM. You can go and test it with different LLMs." - diminish
The Role and Limitations of Tools
The discussion extensively covers the debate around the necessity and design of tools for AI agents. While some argue that a minimal set of tools, or even just a bash tool, is sufficient, others emphasize the benefits of specific, well-designed tools for efficiency, safety, and better LLM performance; a sketch of that split appears after the quotes below.
- "Nice but sad to see lack of tools. Most your code is about the agent framework instead of specific to SWE." - BenderV
- "Imho, right tools allow small models to perform better than undirected tool like bash to do everything." - BenderV
- "Why are any of the tools beyond the bash tool required? Surely listing files, searching a repo, editing a file can all be achieved with bash?" - normie3000
- "The Bash tool, for instance, at times gets confused by bashisms, not escaping arguments correctly, not handling whitespace correctly etc." - the_mitsuhiko
- "Separate tools is simpler than having everything go through bash. If everything goes through bash then you need some way to separate always safe commands that don't need approval (such as listing files), from all other potentially unsafe commands that require user approval. If you have listing files as a separate tool then you can also enforce that the agent doesn't list any files outside of the project directory." - zarzavat
- "This is a very strong argument for more specific tools, thanks!" - normie3000
- "The model is aware of how these tools work, it is more token-efficient and it is generally much more successful at performing those actions." - the_mitsuhiko
- "This saves the LLM from having to do multiple low level clicking and typing and keeps it on track. Help the poor model out, will ya!?" - kissgyorgy (quoted from an unknown source)
- "You just keep throwing tokens at the loop, and then you've got yourself an agent. Money. Replace "tokens" with "money". You just keep throwing money at the loop, and then you've got yourself an agent." - codingdave
Challenges of Working with Existing Codebases
A key concern raised is the difficulty of applying LLM-based coding agents to existing, complex codebases compared to self-contained, simpler problems. The challenges involve understanding code organization, avoiding unintended side effects, and managing token limits.
- "when a problem is entirely self contained in a file, it's very easy to edit it with LLM. that's not the case with a codebase, where things are littered around in tune with specific model of organisation the developer had in mind." - faangguyindia
- "anyone can build a coding agent which works on a) fresh code base b) when you've unlimited token budget. now build it for old codebase, let's see how precisely it edits or removes features without breaking the whole codebase lets see how many tokens it consumes per bug fix or feature addition." - faangguyindia
- "This comment belongs in a discussion about using LLMs to help write code for large existing systems - it's a bit out of place in a discussion about a tutorial on building coding agents to help people understand how the basic tools-in-a-loop pattern works." - simonw
- "Gemini CLI often makes incorrect edits and gets confused, at least on my codebase. Just like ChatGPT would do in a longer chat where the context gets lost: random, unnecessary and often harmful edits are made confidently." - cryptoz
The Future of HCI for Coding Agents: Beyond the Terminal
Several users expressed dissatisfaction with the current terminal-based, chat-like interfaces for coding agents, particularly for complex tasks. They advocate for more sophisticated Human-Computer Interaction (HCI) paradigms, such as dashboards, HUDs, and interactive previews, to improve usability, error handling, and agent collaboration.
- "I really think the current trend of CLI coding agents isn't going to be the future. They're cool but they are too simple." - cryptoz
- "You'll get a much better result with a dashboard/HUD. The future of agents is that multiple of them will be working at once on the codebase and they'll be good enough that you'll want more of a status-update-confirm loop than an agentic code editing tool update." - cryptoz
- "The single-file lineup of agentic actions with user input, in a terminal chat UI, just isn't gonna cut it for more complicated problems. You need faster error reporting from multiple sources, you need to be able to correct the LLM and break it out of error loops. You won't want to be at the terminal even though it feels comfortable because it's just the wrong HCI tool for more complicated tasks." - cryptoz
- "Why do humans need a IDE when we could do anything in a shell? Interface give you the informations you need at a given moment and the actions you can take." - BenderV
The Importance of Learning and Building Custom Agents
A significant portion of the conversation emphasizes the value of understanding the fundamentals by building one's own coding agents. This is seen not just as an educational exercise but as a future-proofing skill for software engineering roles, akin to current whiteboard coding interviews.
- "Nice but sad to see lack of tools." - BenderV
- "For me, the post is missing an explanation of the reason why I would want to build my own coding agent instead of just using one of the publicly available ones." - prodimmune
- "You wouldn't. This project and this post are for the curious and for the learners." - dotancohen
- "Knowing how to build your own agent and what that loop is going to be the new whiteboard coding question in a couple of years. Absolute. It's going to be the same as 'Reverse this string', 'I've got a linked list, can you reverse it?', or 'Here's my graph, can you traverse it?'" - ghuntley
- "Exactly, dude. This is the most important thing, the fundamentals to understand how this stuff works under the hood. I don't get how people aren't curious. Why aren't people being engineers? This is one of the most transformative things to happen in our profession in the last 20 years." - ghuntley
Feedback on Presentation and Content
There was some critique regarding the presentation of the linked article, specifically its use of numerous images and lack of accessibility features like alt-text, suggesting it was a direct conversion of conference slides. Additionally, some users questioned the focus on building agents versus using existing ones.
- "I hate to do meta-commentary (the content is a decent beginner level introduction to the topic!), but this is some of the worst AI-slop-infused presentation I've seen with a blog post in a while." - hobofan
- "Why the unnecessary generated AI pictures in between? Why put everything that could have been a bullet point into it's own individual picture (even if it's not AI generated)? It's very visually distracting, breaks the flow of reading, and it's less accessible as all the picture lack alt-text." - hobofan
- "Agreed. It's unreadable." - bambax
- "Wow. Yeah. That's unreadable - my frustration and annoyance levels got high fast, had to close the page before I went for the power button on my machine :)" - gregrata
- "If a picture is usually worth 1000 words, the pictures in this are on a 99.6% discount. What the actual...?" - akk0
Program Synthesis and Model Capabilities
The discussion touches on program synthesis capabilities within agents, where models generate and execute code to perform tasks rather than just making direct edits. This is highlighted as an advanced behavior that contributes to an agent's effectiveness; an illustrative reconstruction appears after the quotes below.
- "Where is the program synthesis? My way of thinking is given primitives as tools, i want the model to construct and return the program to execute." - revskill
- "Sonnet does this via the edit tool and bash tool. Itβs inbuilt to the model." - ghuntley
- "Keep an eye out for Sonnet generating Python files. What typically happens is: let's say you had a refactor that needs to happen, and let's say 100 symbols need renaming. Instead of invoking the edit tool 100 times, Sonnet has this behaviour where it will synthesise a Python program and then execute it to do it all in one shot." - ghuntley
Specific Models and Tools Mentioned
Several specific LLMs and coding agent tools were mentioned as benchmarks or alternatives, with opinions varying on their performance. A sketch of the diff-based edits contrasted with whole-file rewrites appears after the quotes below.
- "claude code (with max subscription), cursor-agent (with usage based pricing)" - mrugge
- "Claude code is the strongest atm, but roocode or cline (vscode extensions) can also work well. Roo with gpt5-mini (so cheap, pretty fast) does diff based edits w/ good coordination over a task, and finishes most tasks that I tried. It even calls them "surgical diffs" :D" - NitpickLawyer
- "Gemini CLI often makes incorrect edits and gets confused" - cryptoz
- "Gemini CLI still uses archaic whole file format for edits, it's not a good representative of current state of coding agents." - faangguyindia
- "Many agents can send diffs. Whole file reading and writing burns tokens and pollutes context." - faangguyindia
- "Opencode is pretty good and likely meets your needs. One thing I'll call out is Gemini is terrible as an agent currently because Gemini is not a very good tool calling an LLM. It's an oracle." - ghuntley