Essential insights from Hacker News discussions

Building your own CLI coding agent with Pydantic-AI

This Hacker News discussion centers on the user experience and effectiveness of AI agent development libraries, with a strong focus on Pydantic AI and lighter-weight alternatives such as LiteLLM. Several key themes emerge:

Pydantic AI: Mixed Experiences and Ongoing Development

There's a clear division in user experiences with Pydantic AI. Some find it "lovely" and easy to use for constructing agents, while others encounter significant difficulties that lead them to seek simpler solutions. Maintainers are actively involved, acknowledging some pain points and working on improvements.

  • Positive Experiences:

    • "Pydantic-AI is lovely - I've been working on a forever, fun project to build a coding agent CLI for a year plus now. IMO it does make constructing any given agent very easy." - binalpatel
    • "After maintaining my own agents library for a while, I’ve switched over to pydantic ai recently. I have some minor nits, but overall it's been working great for me." - bluecoconut
    • "we[0] have a pretty complex agent[1] running on Pydantic AI. The team is very responsive to bugs / feature requests. If I had to do it over again, I'd pick Pydantic AI again." - mritchie712
  • Negative Experiences and Criticisms:

    • "I have been using it [Pydantic AI] a lot lately and anything beyond basic usage is an absolute chore." - iLoveOncall
    • "I had the opposite experience. I liked the niceties of Pydantic AI, but had trouble with it that I found difficult to deal with. For example, some of the models wouldn't stream, but the OpenAI models did. It took months to resolve..." - ziftface
    • "I feel like the Claude code cli always does a little bit better, subjectively for me, but I haven’t seen a LLMarena or clear A vs B, comparison or measure." - bluecoconut
    • "I wanted to love pydantic AI as much as I love pydantic but the killer feature is pydantic-model-completion and weirdly.. it has always seemed to work better for me when I naively build it from scratch without pydantic AI." - photonthug
    • "All I know is that with the same LLM models, openai.client.chat.completions + a custom prompt to pass in the pydantic JSON schema + post-processing to instantiate SomePydanticModel(*json) creates objects successfully whereas vanilla pydantic-ai rarely does, regardless of the number of retries." - photonthug
  • Maintainer Response and Clarifications:

    • DouweM, a Pydantic AI maintainer, actively engages with the feedback, addressing specific issues like streaming support and encouraging users to report bugs with reproducible examples.
    • Regarding streaming: "I'm not sure how long ago you tried streaming with Pydantic AI, but as of right now we ... support streaming against the OpenAI, Claude, Bedrock, Gemini, Groq, HuggingFace, and Mistral APIs, as well as all OpenAI Chat Completions-compatible APIs..." - DouweM
    • Regarding bug reporting for Azure OpenAI: "Pydantic AI maintainer here! Did you happen to file an issue for the problem you were seeing with Azure OpenAI?... The vast majority of bugs we encounter are not in Pydantic AI itself but rather in having to deal with supposedly OpenAI Chat Completions-compatible APIs that aren't really..." - DouweM
    • On nested models: "Thanks, a reproducible example would be very useful. Note that earlier this month I made Pydantic AI try a lot harder to use strict JSON mode... so if you haven't tried it in a little while, the problem you were seeing may very well have been fixed already!" - DouweM

Simplicity vs. Abstraction: The Value of LiteLLM and Direct Implementations

A recurring theme is the trade-off between the convenience of high-level abstractions like Pydantic AI and the simplicity and control offered by more direct interfaces or simpler wrappers like LiteLLM. Some users find that abstractions can introduce unnecessary complexity and a higher chance of bugs, especially when dealing with niche features or less common model providers.

  • Preference for Simplicity:

    • "LiteLLM's docs were simple and everything worked as expected. The agentic code is simple enough that I'm not sure what the value-add for some of these libraries is besides adding complexity and the opportunity for more bugs." - ziftface
    • "I'm sure for more complex use cases they can be useful, but for most of the applications I've seen, a simple translation layer like LiteLLM or maybe OpenRouter is more than enough." - ziftface
  • Challenges with Abstractions:

    • "These abstractions are nice to not get locked in with one llm provider - but like with langchain - once you use some more niche feature the bugs do shine through. I tried it out with structured output for azure openai but had to give up since somewhere somewhat was broken and it's difficult to figure out if it's the abstraction or the library of the llm provider which the abstraction uses." - siva7
    • "I wish Pydantic invested in... Pydantic, instead of some AI API wrapper garbage." - iLoveOncall
  • "Getting Locked In":

    • "These abstractions are nice to not get locked in with one llm provider..." - siva7
    • "In this example, you get locked into pydantic_ai, another proprietary provider." - dcreater
    • "...you get quickly locked in a extremely fast paced market where today's king can change weekly." - siva7

Pydantic Core and Python's Evolution

Beyond the AI-specific wrapper, there's a sentiment that Pydantic, while a powerful tool, should ideally be more integrated into Python's standard library. This reflects a broader discussion about the evolution of Python and its core data structures.

  • Desire for Core Integration:

    • "I wish Python would improve to bridge the gaps between pydantic and dataclasses, so that we don't have to rely on pydantic. It's too foundational a piece to not be part of the core python anymore" - dcreater
  • Alternatives and Comparisons:

    • "attrs + cattrs is pretty close. I know it’s not in the stdlib, but dataclasses were modelled on attrs in the first place and using attrs + cattrs feels quite a bit more idiomatic than Pydantic." - JimDabell
  • Pydantic's Ongoing Health:

    • "Pydantic still sees multiple commits per week, which is less than it was at one point, but I'd say that's a sign of its maturity and stability more than a lack of attention." - DouweM

Evaluating Agent Performance and Benchmarking

One user raises a critical question about how to effectively measure the performance and efficiency of different coding agent implementations. The lack of standardized benchmarks in this rapidly evolving field is highlighted as a significant gap.

  • The Need for Benchmarking:
    • "Towards coding agents, I wonder if there are any good / efficient ways to measure how much different implementations work on coding? SWE-bench seems good, but expensive to run." - bluecoconut
    • "Effectively I’m curious for things like: given tool definition X vs Y ..., prompt for tool X vs Y ..., model choice ..., sub-agents, todo lists, etc. how much across each ablation, does it matter? And measure not just success, but cost to success too (efficiency)." - bluecoconut
    • "Overall, it seems like in the phase space of options, everything “kinda works” but I’m very curious if there are any major lifts, big gotchas, etc." - bluecoconut
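bluecoconut's "cost to success" framing can be made concrete with a small scoring helper. Everything here is hypothetical — the run records and variant names are made up — but it shows the efficiency metric being asked for: dollars spent per solved task, per ablation:

```python
# Hypothetical sketch: compare agent variants by cost per successful task,
# not just raw success rate.
from dataclasses import dataclass

@dataclass
class RunResult:
    variant: str      # e.g. "tool-set X" vs "tool-set Y"
    solved: bool      # did the agent pass the task's checks?
    cost_usd: float   # total LLM spend for the attempt

def cost_per_success(results: list[RunResult]) -> dict[str, float]:
    totals: dict[str, list] = {}
    for r in results:
        spent, wins = totals.setdefault(r.variant, [0.0, 0])
        totals[r.variant] = [spent + r.cost_usd, wins + int(r.solved)]
    # Cost per solved task; infinity when a variant never succeeds.
    return {v: (spent / wins if wins else float("inf"))
            for v, (spent, wins) in totals.items()}

runs = [
    RunResult("tool-set X", True, 0.40),
    RunResult("tool-set X", False, 0.55),
    RunResult("tool-set Y", True, 0.20),
    RunResult("tool-set Y", True, 0.30),
]
efficiency = cost_per_success(runs)
```

Under this metric a variant that solves fewer tasks cheaply can still beat one that solves more at high token cost, which is exactly the ablation comparison the comment asks for.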

The Power of Runtime Inspection and Dynamic Context

One commenter highlights the advantage of leveraging runtime inspection, model documentation, and even source code to supply dynamic, structured context to LLMs. This approach is seen as crucial for building AI systems that are more reliable and less "magical."

  • Leveraging Context and Inspection:
    • "getting dynamic prompt context by leveraging JSON schemas, model-and-field docs from pydantic, plus maybe other results from runtime-inspection (like the actual source-code) is obviously a very good idea." - photonthug
    • "Documentation is context, and even very fuzzy context is becoming a force multiplier." - photonthug
    • "Similarly languages/frameworks with good support for runtime-inspection/reflection and have an ecosystem with strong tools for things like ASTs really should be the best things to pair with AI and agents." - photonthug
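The runtime-inspection idea photonthug sketches could look something like the following: assemble prompt context from a Pydantic model's JSON schema, its docstrings and field descriptions, and even its own source. The `Invoice` model and the `context_for` helper are hypothetical illustrations, not an API from any library:

```python
# Sketch: build dynamic prompt context from runtime inspection of a
# Pydantic model — docstring, JSON schema, and actual source code.
import inspect
import json
from pydantic import BaseModel, Field

class Invoice(BaseModel):  # hypothetical example model
    """An invoice extracted from an email."""
    vendor: str = Field(description="Name of the billing company")
    total_cents: int = Field(description="Grand total in cents")

def context_for(model: type[BaseModel]) -> str:
    try:
        # The class's own source can be handed to the LLM as context too.
        source = inspect.getsource(model)
    except OSError:  # source unavailable (e.g. defined in a REPL)
        source = "<source unavailable>"
    return "\n\n".join([
        f"Target type docs: {inspect.getdoc(model)}",
        f"JSON schema:\n{json.dumps(model.model_json_schema(), indent=2)}",
        f"Source:\n{source}",
    ])

ctx = context_for(Invoice)
```

Because the schema and docs are generated at runtime, the prompt context stays in sync with the code — "documentation is context" with no manual duplication.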

Cultural Parallel to Agile and Software Development Methodologies

One comment draws a parallel between today's "AI" and "vibe coding" wave and earlier methodology shifts in software development, pointing to Martin Fowler's long record of advocating practices like agile.

  • Historical Parallels in Development:
    • "Fowler has pushed Agile, UML, Design Patterns, NoSQL, Extreme Programming, Pair Programming. "AI" and vibe coding fits very well into that list." - bgwalter