My experience creating software with LLM coding agents – Part 2 (Tips)

This Hacker News discussion revolves around the evolving use of Large Language Model (LLM) agents in software development, with a strong emphasis on practical experiences, cost-effectiveness, and refining interaction techniques.

The Efficacy and Learning Curve of LLM Agents

Several users highlight that achieving success with LLM agents requires a thoughtful and deliberate approach, akin to traditional software development best practices. The quality of output is directly tied to the quality of input and the methodology employed.

"This lines up with my own experience of learning how to succeed with LLMs. What really makes them work isn't so different from what leads to success in any setting: being careful up front, measuring twice and cutting once." - xwowsersx
"Clarifying the tasks and breaking them down is very helpful too. Just you end up spending lots of time on it." - 3abiton

Strategies for Better LLM Agent Performance

A significant portion of the discussion focuses on specific techniques users have found effective in improving the quality and reducing the "wasted money" associated with using LLM agents. These strategies often involve more directive prompting and simulated human interaction patterns.

"One weird trick is to tell the LLM to ask you questions about anything that’s unclear at this point. I tell it eg to ask up to 10 questions. Often I do multiple rounds of these Q&A and I‘m always surprised at the quality of the questions (w/ Opus). Getting better results that way, just because it reduces the degrees of freedom in which the agent can go off in a totally wrong direction." - manmal
"The way coding works, is I produce sub-standard code, and then show it to others on stackoverflow! Others chime in with fixes!" "Get the LLM to simulate this process, by asking to to post its broken code, then asking for "help" on "stackoverflow" (eg, the questions it asks), and then after pasting the fix responses." [...] "So just go through the simulated exchange, and success." - bbarnett
"Another tip. For a specific tasks don't bother with "please read file x.md", Claude Code (and others) accept the @file syntax which puts that into context right away." - athrowaway3z
"I’ve seen going very successfully using both codex with gpt5 and claude code with opus. You develop a solution with one, then validate it with the other. I’ve fixed many bugs by passing the context between them saying something like: “my other colleague suggested that…”." - Lucasoato

Cost and Pricing Models for Heavy LLM Users

The financial implications of using LLMs, particularly for "heavy users," emerge as a critical theme. Concerns about API costs are prevalent, leading to discussions about alternative pricing structures and the perceived value of the expenditure.

"If I paid for my API usage directly instead of the plan it'd be like a second mortgage." - CuriouslyC
"Am I alone in spending $1k+/month on tokens? It feels like the most useful dollars i've ever spent in my life. The software I've been able to build on a whim over the last 6 months is beyond my wildest dreams from a a year or two ago." - ramesh31
"I would if there were any positive ROI for these $12k/year, or if it were a small enough fraction of my income. For me, neither are true, so I don’t :)." - kergonath
"I’m unclear how you’re hitting $1k/mo in personal usage. GitHub Copilot charges $0.04 per task with a frontier model in agent mode - and it’s considered expensive. That’s 850 coding tasks per day for $1k/mo, or around 1 per minute in a 16hr day." - OtherShrezzing
"This is very very wrong. Anthropic's Max plan is like 10% of the cost of paying for tokens directly if you are a heavy user. And if you still hit the rate-limits, Claude Code can roll-over into you paying for tokens through API credits. Although, I have never hit the rate limits since I upgraded to the $200/month plan." - sothatsit

The "Value" Proposition and User Output

A debate surfaces regarding the actual output and the long-term benefits of heavy LLM usage, with some users questioning the demonstrable results and others extolling the transformative potential.

"Am I alone in spending $1k+/month on tokens? It feels like the most useful dollars i've ever spent in my life. The software I've been able to build on a whim over the last 6 months is beyond my wildest dreams from a a year or two ago." - ramesh31
"I would personally never. Do I want to spend all my time reviewing AI code instead of writing? Not really. I also don't like having a worse mental model of the software." - tovej
"If freelancing and if I am doing 2x as much as previously with same time, it would make sense that I am able to make 2x as much. But honestly to me with many projects I feel like I was able to scale my output far more than 2x." - mewpmewp2
"Audit and review? Sounds like a vibe killer." - F7F7F7

Best Practices for Context and Readme Files

A discussion emerges about file naming conventions and best practices for providing context to LLM agents, particularly concerning the use of README.md files.

"As a human dev, can I humbly ask you to separate out your LLM "readme" from your human README.md? If I see a README.md in a directory I assume that means the directory is a separate module that can be split out into a separate repo or indeed storage elsewhere. If you're putting copy in your codebase that's instructions for a bot, that isn't a README.md. By all means come up with a new convention e.g. BOTS.md for this. As a human dev I know I can safely ignore such a file unless I am working with a bot." - alex-moon
"I think things are moving towards using AGENTS.md files: https://agents.md/. I’d like something like this to become the consensus for most commonly used tools at some point." - kergonath
"Readme literally means read me. Not 'this is a separate project'. Not 'project documentation file'. You can have read mes dotted all over a project if that's necessary." - mattmanser

LLM Agents and Test Failure Handling

The behavior of LLM agents when confronted with test failures is a point of discussion, with observations that they may resort to disabling tests rather than fixing them. The specific framework and language used can influence this behavior.

"One of the weird things I found out about agents is that they actually give up on fixing test failures and just disable tests. They’ll try once or twice and then give up." - No Author Provided
"Its important to not think in terms of generalities like this. How they approach this depends on your tests framework, and even on the language you use. If disabling tests is easy and common in that language / framework, its more likely to do it." - athrowaway3z

Perceived Bias in Blog Post

One user offers a meta-commentary on the original blog post, suggesting it might be an advertisement, which they find ironic given the author's previous critique of modern advertising.

"The blogpost is transparently an advertisement, which is ironic considering the author's last blogpost was https://blog.efitz.net/blog/modern-advertising-is-litter/" - yifanl