Here's a summary of the themes from the Hacker News discussion, presented with markdown headers and direct quotes:
## DIY Search Engines and the Barrier to Entry
Building a search engine can look feasible for an independent developer or small group, but commenters largely agree that competing with established players like Google is far harder than it seems. The thread covers both the appeal of DIY search and the practical obstacles: crawling an adversarial web, storing the index, and paying for compute at scale.
- "Google was invented many years ago by two guys in a dorm room and since then there's been so many white papers and advancements in the public sphere and the actual underlying problem has not changed that much, that it seems like it could be done by a small group or independent person." - the_real_cher
- "The actual underlying problem has changed altogether. Pagerank is easily gamed by SEO. Search candidates and rankings now require assessment by LLM. Moreover, as a default, users want the results intelligently synthesized into a text response with references rather than as raw results. Crawling too requires innovative approaches to bypass server filters. I doubt any independent person can afford to run a vector database or LLMs at immense scale." - OutOfHere
- "Google basically invented the modern cloud in order to efficiently use the hardware necessary to actually build those search engine indices. It's not really a question of implementing a good algorithm and away we go." - ambicapter
- "The hardest part about building a search engine is not the actual searching though, it is (like others here have pointed out), building your index and crawling the (extremely adversarial) internet, especially when you're running the thing from a single server in your own home without fancy rotating IPs." - luizfelberti
- "Most likely it is, the issue then becomes being able to store and afford the storage for all the files." - giancarlostoro
- "I think there are two factors that helped Google. First, the search engine landscape back then was absolutely abysmal. ... Second, the internet was different: when all nerds declared that Google is good, that was CNN-grade newsworthy... Today, that's not the case." - non_aligned
- "I know that Google engineers have a cushy life but I actually find it unlikely that a guy, who isn't attempting some radical new type of search (like pagerank back in the day) can hope to compete with the orgs in Google who support search." - HardCodedBias
- "I absolutely devoured Wilson Lins articles recently .. they are very high quality and informative for any amateur interested in search engines and LLMs!" - udkl
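
To make the "Pagerank is easily gamed by SEO" point concrete, here is a minimal sketch of PageRank by power iteration on a toy link graph. The page names, the 20-page link farm, and the damping factor of 0.85 are illustrative assumptions, not anything taken from the thread or from a real search engine.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a baseline share from the random-jump term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy web: nobody links to "spam".
honest_web = {
    "wiki": ["blog", "news"],
    "blog": ["wiki"],
    "news": ["wiki"],
    "spam": ["wiki"],
}

# Same graph plus a hypothetical link farm whose pages all point at "spam".
farmed_web = dict(honest_web)
for i in range(20):
    farmed_web[f"farm{i}"] = ["spam"]

before = pagerank(honest_web)["spam"]
after = pagerank(farmed_web)["spam"]
print(f"spam rank before farm: {before:.4f}, after: {after:.4f}")
```

In this toy graph the spam page's score rises once the farm exists, even though none of the farm pages has any inbound links of its own. Real ranking systems layer many other signals on top, which this sketch does not attempt to model.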
## The Changing Nature of Search and User Expectations
A significant portion of the discussion revolves around how user expectations for search results have evolved, driven by the rise of AI and LLMs. The preference for synthesized answers over raw links is a key point of contention.
- "users want the results intelligently synthesized into a text response with references rather than as raw results." - OutOfHere
- "The reason I pay for Kagi is that I specifically don't want this to occur." - kcbanner
- "How much do you technologically relate to the average person on the street though? Every person I have seen (outside the tiny tech bubble) google something has just read the AI overview without skipping a beat." - Workaccount2
- "At this point the web is also so centralized you only need 3 bookmarks these days (your news, youtube and Amazon) A search is just learning what you don't know and AI does a better job than search has ever done for me - and I'm in tech." - throwmeaway222
- "We’re being force fed them. I’m an AI hater and I catch myself reading those sometimes. Yes, people want the answer directly. Google wants you to stay on their site to read some mishmash. I think the ideal would be to immediately go to the source’s site." - jkestner
- "Incidentally, are there any sites that do actual web search any more, better than Yandex? I'd rather avoid a Russian site if I can, but there are whole topics where it's impossible to find anything useful on heavily "massaged" allegedly-Web-search-but-not-really sites like Google and DDG (Bing), but I can find what I want on page 1 or 2 of a Yandex search. Is Kagi as good as that, or is their index simply ignoring a whole bunch of the Web like so many others?" - yepitwas
- "I suspect that chat apps dominate (80+%?) the under-20 demographic, and have a sizable chunk of the under-30 demographic. Within the next five years it will probably represent 50+% of total search traffic. Maybe it already does. It makes sense that any search site that wants to be in the game tomorrow would keep racing down the AI chat path." - freeopinion
- "I've been thinking that google could use its own AI to evaluate URLs instead of relying on pagerank and backlinks which are almost completely valueless as a signal in 2025. in my niche there's more slop than ever being produced daily and it's all hitting rank 1. it's tragic what google is doing to the internet." - p3rls
- "When I started using it (~ 2 years) , it was necessary. Google was simply not solving my actual issues (software related). Now, It seems that google might have improved a bit. I check from time to time and the gap isn't as huge, as when Kagi started" - vlucas
## Technical Hurdles in Crawling and Indexing
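The difficulty of crawling and indexing the modern web is a recurring theme. Many users point to content behind logins, heavy JavaScript requirements, CAPTCHAs and Cloudflare-style filtering, and adversarial website operators, although one commenter who runs a crawler argues these are fairly small problems for a well-behaved bot.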
- "Crawling is much more difficult than it used to be. Significantly more content is behind a login, Javascript is required for way more than it should be, and almost the entire web is behind cloudflare or another type of captcha." - dec0dedab0de
- "These things are actually fairly small problems. The parts that absolutely require JS can't be reliably linked to and nobody indexes that stuff. Most apparent SPA:s serve a HTML alternative if you don't claim to be a web browser in the UA. Cloudflare and the like are also fairly easy to deal with as long as your crawler is well behaved. You can register the fingerprint and mostly get access to cf:ed websites." - marginalia_nu
- "It feels a lot harder to crawl the internet these days, as others have said around here as well. What are some good practices these days to ensure a good crawl/scrape? Invest in proxies, preferably residential?" - risico
- "No one wants to pay for search. People on HN are probably 90% of their total possible market." - throwaway290
- "The curse of Hacker News!" - ytrt54e (referring to a site going down)
- "The last time I saw a fastcompany link must have been a decade ago! I was nostalgically looking forward to read another article of theirs. Alas..." - lucb1e
- "The IP thing is interesting, I was trying to make this CSGO bot one time to scrape steam's prices and there are proxy services out there you rent, tried at least one and it was blocked by steam. So I wonder if people buy real IPs." - ge96
- "Yeah people buy residential IPs on the black market. They are essentially infected home PCs and botnets." - kccqzy
- "why can't crawling be crowd sourced? It would solve ip rotation and spread the load" - wordpad
- "Is the common crawl usable for something like this? https://commoncrawl.org" - moduspol
- "The crawl seems hard but the difference between having something and not having it is is very obvious. Ordering the results is not. What should go on page 200 and do those results still count as having them?" - 6510
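
One small part of being a "well behaved" crawler, as marginalia_nu puts it, is honoring a site's robots.txt. The sketch below uses Python's standard `urllib.robotparser` against an inline robots.txt body so it runs without network access. The crawler name, URLs, and rules are hypothetical, and it does not cover the Cloudflare registration or proxy questions raised above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; a real crawler would fetch it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

USER_AGENT = "example-hobby-crawler"  # hypothetical crawler name


def make_policy(robots_body):
    """Parse a robots.txt body into a reusable policy object."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser


policy = make_policy(ROBOTS_TXT)

# Check each URL before fetching it, and read the requested delay (seconds).
allowed = policy.can_fetch(USER_AGENT, "https://example.com/articles/1")
blocked = policy.can_fetch(USER_AGENT, "https://example.com/private/notes")
delay = policy.crawl_delay(USER_AGENT)

print(allowed, blocked, delay)
```

A crawler loop would typically skip any URL for which `can_fetch` returns `False` and sleep for at least `delay` seconds between requests to the same host.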
## The "Kagi Discussion" and User Advocacy
Kagi, a paid search engine, comes up repeatedly, and the mentions themselves become a topic of debate. Some users see them as genuine advocacy for a good product, while others are skeptical and question whether they are organic or a form of subtle marketing.
- "I greatly prefer Kagi https://help.kagi.com/kagi/company/ but it's very nice to see more competition in this space in general." - ourguile
- "Kagi is a polished product. This is drying someones laundry." - tmdetect
- "Do Kagi users get paid for shilling the company? Nearly all threads relating to the subject of search has a few mentionings of the glory of Kagi, often including links to the site. I suspect this is not as effective as the Kagi crew thinks since there is likely to be a large overlap between their potential customers and those who are really turned off by such shilling." - the_third_wave
- "Have you considered it's a good product that causes its users to become advocates?" - hamdingers
- "Nope, it's just a nice thing I like. It is nearly the platonic ideal of a search engine for me. It causes me no problems and doesn't try to sell me garbage. It's like discovering that there a better pair of shoes that're more comfortable. Everybody can use a slightly improved more comfortable pair of shoes, so it comes up frequently." - lelandbatey
- "I just don’t understand people who get so upset that someone might like something enough to talk about liking it. So upset that they won’t ever try the thing. Like … ok I guess? You do you. It’s just a strange way to make decisions. At least this is just a consumer product. Worse is when people here say they make technical decisions using the same process. They’d black list certain tech because they’ve heard people talking about how it solved their problems. Also ok, but now I know I should avoid them professionally." - testdelacc1
- "I get the impression it's the volume of the folks who sing its praises. There was a web3 crowd for a while, Bitwarden champions would show up to any mention of a password manager, and (ahem) some AI champions can be over the top" - mdaniel
- "Flip side how much does Google pay you to defend their monopoly? Kagi is a solid product with a team that clearly cares about what they’re building. They’re transparent and post change logs when things update. I simply trust them infinitely more than Google." - dawnerd
- "I understand skepticism in the age of LLM-generated content and CAPTCHA-solving bots. What I don't understand is why people choose such weird hills to die on and think that posting about it will accomplish anything." - alexjplant
- "Kagi customer here. Not getting paid to shill. I think it's worth occasionally mentioning alternatives that are good enough to pay for so that other people know there are other people using other options." - datadrivenangel
- "Whenever I fall back to Google and see how terrible it has become I feel sorry for everyone still using it as their main search engine so I tend to link people to kagi because it's just so much better. Especially the customization aspects. I also like the idea of mainstreaming to pay for critical services like search. No paid shilling whatsoever." - jasonvorhe
## The Evolving Internet Landscape and Centralization
Several users commented on the internet's shift towards centralization, with large platforms dominating content and making it harder for independent crawlers and niche search engines to thrive.
- "People used to submit their sites to search engines and now they might actively block search engines. So a search engine author might have to spend a lot of effort in adversarial games." - freeopinion
- "The internet now isn't at all the same as the internet in 1999. Discovery isn't really that useful. If you find someone's self-hosted blog about dinosaurs, it probably hasn't been updated since 2004, all the links and images are broken, and it's just thoroughly upstaged by Wikipedia and the Smithsonian." - phendrenad2
- "We've basically come full circle to the AOL model, where there are "hubs" of content that cater to specific categories. YouTube has ALL the long-form essays. Tiktok has ALL the humorous videos. Medium has ALL the opinion pieces. Reddit has ALL the flame wars. Mayo Clinic has ALL the drug side-effects. Amazon has ALL the shopping. Ebay has ALL the collectables." - phendrenad2
- "None of these big companies want nasty little web crawlers poking and prodding their site. But they accept Google crawlers, because Google brings them users. Are they going to be that friendly to your crawler?" - phendrenad2
- "Google maps is probably a big moat that's very hard to replicate. You can't as easily just crawl all of that data." - snek_case
- "If you wrote that 100 people could outwork one person, I'd nod my head. If you wrote that 10k people could outwork 1k people, I'd shrug. If you tell me that 100 people can combine to tie my shoe faster than I can, I'd question that." - freeopinion
- "With the decentralization/recentralization of the Web, it may become easier for certain people to roll their own search engines for their respective communities and crawl/index pages only according to their shared tastes." - tolerance
## Hardware and Cost Considerations for Search Engines
The discussion touches upon the hardware requirements and associated costs of running a search engine, with some users sharing experiences with acquiring powerful, albeit used, server components.
- "The beefy CPU running this setup, a 32-core AMD EPYC 7532, underlines just how fast technology moves. At the time of its release in 2020, the processor alone would have cost more than $3,000. It can now be had on eBay for less than $200" - ofrzeta
- "You need to spend a lot of time looking through badly labeled offers, and be willing to buy from sellers with no reputation." - progval
- "I got a 7551p plus motherboard and ram for about 600 bucks from China this January. I may have overpaid but it works great, and gets the job done." - throwawayffffas
- "Not for a CPU but earlier this year I bought a Thinkpad workstation off eBay for $500. It's a machine from 2020 and when it was new cost $5,700. I see this for pretty much all hardware out on eBay, just go back 5 years and watch the price fall 10x." - _fat_santa
- "Has eBay fixed their "and then they ship you a box of rocks" problem?" - saalweachter
- "IME eBay favors buyers." - accrual
- "You don't get that with used old stuff, you get it with unrealistic low prices for new stuff. A 7532 CPU is now ewaste for all the datacenters out there 1/10 of original price is reasonable, but the latest Nvidia GPU for 200 bucks is obviously a scam." - throwawayffffas
- "TheServerStore.com often has good deals. I actually bought a brand new 64-core EPYC 7702 server with 256 GB RAM and 8TB NVMe storage for about $3K fully assembled earlier this year." - Gormo
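
The price figures quoted above can be checked with some quick arithmetic. This is a rough sketch: the five-year span is an assumption based on the "2020" and "go back 5 years" remarks, and because the EPYC prices are given as "more than $3,000" and "less than $200", the factor computed for it is a lower bound.

```python
def price_drop(new_price, used_price, years):
    """Return (overall drop factor, average fraction of value lost per year)."""
    factor = new_price / used_price
    # Constant yearly retention r satisfies new_price * r**years == used_price.
    annual_retention = (used_price / new_price) ** (1 / years)
    return factor, 1 - annual_retention


YEARS = 5  # assumed age of the hardware

epyc = price_drop(3000, 200, YEARS)      # EPYC 7532 figures from the quote
thinkpad = price_drop(5700, 500, YEARS)  # Thinkpad workstation figures

print(f"EPYC 7532: {epyc[0]:.1f}x cheaper, about {epyc[1]:.0%} lost per year")
print(f"Thinkpad:  {thinkpad[0]:.1f}x cheaper, about {thinkpad[1]:.0%} lost per year")
```

Both examples land in the same range as the "price fall 10x" rule of thumb, which corresponds to losing a bit under 40% of value each year over five years.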
## "Nostalgia" and the "Found" Experience
A few users express a longing for the days of discovering unique, independently hosted personal websites and the charm of older internet exploration, contrasting it with the current era of hyper-centralization.
- "The great thing about this is that with the decentralization/recentralization of the Web, it may become easier for certain people to roll their own search engines for their respective communities and crawl/index pages only according to their shared tastes." - tolerance
- "More to the point, it's a shame that we can't collectively grok (dammit, they took that from us too) concepts like 'personal' and/or 'curated' directories, e.g. individual and group wikis and so forth on perhaps more directed topics with lists of good links." - jrm4
- "Right, I suppose I mean 'getting more people to think about why a few of these bookmarked for your favorite topics, especially tied to a trustworthy person, is a million times better than just hitting up Google.'" - jrm4