Launch HN: Reducto Studio (YC W24) – Build accurate document pipelines, fast

Website Design Praise

Several users complimented the website's design. "I'm not a product fit, but I would like to take a moment to praise the detailed beauty of the design work on the site," said weego, highlighting specific elements like "the typography and layout to the line-work down to how the gradients in the, in fashion, large logotype at the bottom of the footer are tied in by using texture." iyn agreed, saying, "Agreed — came here to say exactly that. I like that this is not yet another tailwind template (nothing wrong with them, I use them all the time) but something with its own identity. I especially love the illustrations/icons. Well done!"

Launch Bug & Mobile Issues

The initial launch had some technical issues. "FYI - https://links.reducto.ai/studio doesn't seem to be working... ERR_TOO_MANY_REDIRECTS," noted omaerkhan. adit_a replied "Fixed!", but TimMeade reported, "Still not working here." Later, serjester reported, "Congrats on the launch guys, mobile website seems to be broken though." The Reducto team addressed these early issues quickly.

Competitive Landscape & Differentiation

A key theme was how Reducto stacks up against competitors in the crowded document extraction space. skadamat asked about Datalab (https://www.datalab.to/): "Congrats on the launch! How do you guys compare with Datalab with regards to accuracy?" gbertb echoed this: "I want to know this, too. Lots of these companies are doing the same thing, but leave out benchmarks that include marker." adit_a responded, "We have a lot of respect for the work VikP and his team did on Surya but we haven't benchmarked his newer pipeline so I don't want to make a 1:1 claim. If you want to do a side by side with your use case we'd be happy to set you up with free trial access." jackienotchan, noting the number of document extraction companies funded by YC, also asked about differentiation: "How do you differentiate from these? And how do you see the space evolving as LLMs commoditize PDF extraction?"

adit_a answered that feature sets will converge, so the team focuses elsewhere: "generally speaking we don't see differentiation in the sense of just feature set since that'll converge over time, and instead primarily focus on accuracy, reliability, and scalability, all 3 of which have a very substantive impact from last mile improvements." They also emphasized that the company uses LLMs and VLMs as tools and does not view improvements in those models as "antagonistic."

kbyatnal, founder of Extend, claimed Reducto had "cloned us down to the small details... a recursive spreadsheet-like experience." adit_a responded, "Hey, we've never used or even attempted to use your platform. Respectfully I think you know that, and that you also know that your team has tried to get access to ours using personal gmail accounts dating back to 2024. A schema builder with nested array fields has been part of our playground (and nearly every structured extraction solution) for a very long time and is just not something that we even view as a defining part of the platform."

Product Value Proposition

Commenters also discussed the value Reducto provides to users. willwjack commented, "This would have saved me so much pain back when I was working on RAG workflows. Great to see." b0a04gl suggested that Reducto could evolve into a system that remembers and adapts to changes in document formats over time: "if reducto leans in fully as the layer that remembers every correction, every edge case, every shift in layout or wording across document versions it starts becoming more than a pipeline. it becomes institutional memory for unstructured data...continuity's the moat imo among the competitors." raunakchowdhuri confirmed this as Reducto's vision: "this is exactly where we're going with this! glad you see the vision :)" adit_a added, "Yeah, we're extremely excited about the potential of building a flywheel for each individual customer's pipeline."

Series A and Launch Timing

echelon questioned the timing of the launch relative to the Series A funding: "How do you raise Series A before launch / PMF? I assume y'all launched before this to select partners? Or perhaps this is a new product on top of the core product?" adit_a responded, "To clarify, our API was already fully launched and in prod with customers when we raised our series A. This launch is specifically for the platform we're building around the API :)"

Legal Compliance and DPAs

Fraaaank asked about data processing agreements (DPAs): "Why do you only get a data processing agreement when on the enterprise plan? It's a legal requirement for any European company." adit_a responded, "We have a default DPA we're willing to sign on all tiers -- the note in the pricing page is meant to refer to custom/redlined DPAs that become complex to manage over time. We'll edit that to make it more clear."

Extend's Perspective

kbyatnal provided a detailed perspective from Extend, emphasizing that while the document processing space is large and competition is beneficial, there's still a significant gap in going from raw OCR outputs to production-ready document pipelines: "Having worked with a ton of startups & F500s, we've seen that there's still a large gap for businesses in going from raw OCR outputs —> document pipelines deployed in prod for mission-critical use cases. LLMs and VLMs aren't magic, and anyone who goes in expecting 100% automation is in for a surprise." He further explained the complexities involved in building and maintaining these pipelines, including dataset labeling, pipeline orchestration, and human-in-the-loop correction, highlighting Extend's focus on providing AI teams with the necessary tooling to "hit accuracy quickly."