Essential insights from Hacker News discussions

Ban me at the IP level if you don't like me

This discussion revolves around the challenges of managing unwanted internet traffic, primarily bots and scrapers, and the various strategies being employed or considered to combat them. Key themes include IP-based blocking, whitelisting, the role of cloud providers and residential IPs, the impact on legitimate users, and the increasing complexity of identifying and mitigating malicious actors.

The Growing Problem of Bots and Scrapers

A significant portion of the conversation highlights the sheer volume and persistence of unwanted traffic, often from automated sources, that overwhelms servers and degrades service.

  • "The sheer amount of IP addresses one has to block to keep malware and bots at bay is becoming unmanageable." - ygritte
  • "I'm pretty sure I still owe t-mobile money. When I moved to the EU, we kept our old phone plans for a while. Then, for whatever reason, the USD didn't make it to the USD account in time and we missed a payment. Then t-mobile cut off the service and you need to receive a text message to login to the account. Obviously, that wasn't possible. So, we lost the ability to even pay, even while using a VPN. We just decided to let it die, but I'm sure in t-mobile's eyes, I still owe them." - withinboredom (This expresses frustration with service access issues which, while not directly about bots, touches on the user experience friction that can arise from restrictive network policies, a common side effect of bot mitigation.)
  • "My personal server was making constant hdd grinding noises before I banned the entire nation of China. I only use this server for jellyfin and datahoarding so this was all just logs constantly rolling over from failed ssh auth attempts" - snickerbockers
  • "My friend has a small public gitea instance, only used by him and a few friends. He's getting thousands of requests an hour from bots. I'm sorry, but even if it does not impact his service, at the very least it feels like harassment" - phito

IP Address Blocking and Whitelisting Strategies

Participants discuss the effectiveness and feasibility of blocking or whitelisting IP addresses and ranges as a primary defense mechanism.

  • Blocking Aggressive Scrapers: "I've seen blocks like that for e.g. alibaba cloud. It's sad indeed, but it can be really difficult to handle aggressive scrapers." - def
  • Whitelisting IP Ranges: "One starts to wonder, at what point might it be actually feasible to do it the other way around, by whitelisting IP ranges." - Etheryte
  • Positive Results from Whitelisting: "I admin a few local business sites.. I whitelist all the countries isps and the strangeness in the logs and attack counts have gone down." - worthless-trash
  • Challenges in Whitelisting: "So, it's relatively easy because there are limited ISPs in my country. I imagine it's a much harder option for the US." - worthless-trash
  • Publicly Curated Lists: "Is there a public curated list of "good ips" to whitelist?" - coffee_am
  • Community Efforts: partyguy links to https://github.com/AnTheMaker/GoodBots as a potential resource. aorth expresses interest but also a sense of futility: "Noble effort. I might make some pull requests, though I kinda feel it's futile."
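A minimal sketch of the whitelisting approach discussed above, using Python's standard ipaddress module. The CIDR ranges here are illustrative placeholders (documentation TEST-NET blocks), not a vetted "good IPs" list:

```python
import ipaddress

# Illustrative allowlist of CIDR ranges (placeholders, not real ISP data)
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # stand-in for a local ISP range
    ipaddress.ip_network("198.51.100.0/24"),  # stand-in for another ISP range
]

def is_allowed(ip_str: str) -> bool:
    """Return True if the address falls inside any allowlisted network."""
    ip = ipaddress.ip_address(ip_str)
    return any(ip in net for net in ALLOWED_NETWORKS)
```

In practice such a check would sit in front of the application (or feed a firewall rule set) rather than run per-request in Python, but the membership logic is the same.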

The Problem with Residential and Cloud IPs

A major point of friction is the use of residential and cloud provider IP addresses by malicious actors, which complicates simple blocking strategies.

  • Residential Proxies: "Unfortunately, well-behaved bots often have more stable IPs, while bad actors are happy to use residential proxies. If you ban a residential proxy IP you're likely to impact real users while the bad actor simply switches. Personally I don't think IP level network information will ever be effective without combining with other factors." - bobbiechen
  • CGNAT and Shared IPs: "In these days of CGNAT, a residential IP is shared by multiple customers." - richardwhiuk
  • ISP-Level Blocking: A discussion ensues about the consequences of blocking entire ISPs or cities due to residential proxy usage. "If enough websites block the entire ISP / city in this way, and enough users get annoyed by being blocked and switch ISPs, then the ISPs will be motivated to stay in business and police their customers' traffic harder." - Arnavion
  • Cloud Provider Blocking: "Why not just ban all IP blocks assigned to cloud providers? Won't halt botnets but the IP range owned by AWS, GCP, etc is well known" - shortrounddev2. However, this is countered: "Because crawlers would then just use a different IP which isn’t owned by cloud vendors." - hnlmorg. And: "Tricky to get a list of all cloud providers, all their networks, and then there are cases like CATO Networks Ltd and ZScaler, which are apparently enterprise security products that route clients traffic through their clouds 'for security'." - aorth.
  • Broad Impact of Geo-Blocking: "Many US companies do it already. It should be illegal, at least for companies that still charge me while I’m abroad and don’t offer me any other way of canceling service or getting support." - lxgr, highlighting the collateral damage geo-blocking inflicts on paying customers.
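Cloud providers do publish their ranges: AWS, for example, serves a machine-readable file at https://ip-ranges.amazonaws.com/ip-ranges.json. A minimal sketch of filtering against such a feed, using a tiny inline sample in that file's general shape (the prefixes below are illustrative, and a real deployment would fetch and refresh the full file periodically):

```python
import ipaddress
import json

# Tiny inline sample mimicking the shape of AWS's published ip-ranges.json;
# the prefix values here are illustrative, not an authoritative list.
SAMPLE = json.loads("""
{"prefixes": [
  {"ip_prefix": "3.5.140.0/22",   "service": "AMAZON", "region": "ap-northeast-2"},
  {"ip_prefix": "13.34.37.64/27", "service": "AMAZON", "region": "ap-southeast-4"}
]}
""")

CLOUD_NETWORKS = [ipaddress.ip_network(p["ip_prefix"]) for p in SAMPLE["prefixes"]]

def is_cloud_ip(ip_str: str) -> bool:
    """Return True if the address belongs to a known cloud provider range."""
    ip = ipaddress.ip_address(ip_str)
    return any(ip in net for net in CLOUD_NETWORKS)
```

As the thread notes, this only catches traffic that actually originates from the published ranges; botnets on residential IPs sail straight past it.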

Geographical Blocking and its Consequences

Blocking entire countries or regions is a common, albeit controversial, tactic.

  • Pragmatic Blocking: "We solved a lot of our problems by blocking all Chinese ASNs. Admittedly, not the friendliest solution, but there were so many issues originating from Chinese clients that it was easier to just ban the entire country." - wandal. "It's not like we can capitalize on commerce in China anyway, so I think it's a fairly pragmatic approach." - lwansbrough
  • Escalation of Blocking: "Why stop there? Just block all non-US IPs!" - lxgr, sarcastically pointing out the trend.
  • Impact on Travelers and VPN users: "This causes me about 90% of my major annoyances. Seriously. It’s almost always these stupid country restrictions. ... I was in UK. I wanted to buy a movie ticket there. Fuck me, because I have an Austrian ip address, because modern mobile backends pass your traffic through your home mobile operator." - ruszki.
  • Economic Rationale: "It is not silly pseudo-security, it is economics. Ban Chinese, lower your costs while not losing any revenue. It is capitalism working as intended." - ordu. "Not sure I'd call dumping externalities on a minority of your customer base without recourse 'capitalism working as intended'." - lxgr counters.
  • Specific Country Targets: Seychelles and Cyprus are noted as sources of inordinate bad traffic, with speculation linking them to financial or tax structures used by other nations (India, Russia, China). "The Seychelles has a sweetheart tax deal with India such that a lot of corporations who have an India part and a non-India part will set up a Seychelles corp to funnel cash between the two entities." - seanhunter. "So the seychelles traffic is likely really disguised chinese traffic." - grandinj.

Technical Approaches and Sophistication

The discussion delves into more nuanced technical and strategic approaches to bot mitigation.

  • IP Reputation and ASN Blocking: The idea of blocking entire Autonomous System Numbers (ASNs) that are prolific sources of bad traffic is raised. Tools like bgp.tools and hackertarget.com are mentioned for ASN lookups.
  • User Agent String Filtering: Blocking based on User-Agent strings is suggested as a simpler method, though acknowledged as easily spoofed. By contrast, "Blocking the IP can be done at the firewall" - lexicality, highlighting an advantage IP-based blocking has over application-layer filtering.
  • Rate Limiting and Tarpitting: Throttling bandwidth or employing tarpitting (slowing down responses) is discussed as a way to make scraping more resource-intensive for bots. "What’s worked better for me is tarpitting or just confusing the hell out of scrapers so they waste their own resources." - niczem.
  • Dynamic Blocking: Solutions that automatically detect and block malicious IPs or networks in real-time are explored. "I’m currently working on a sniffer that tracks all inbound TCP connections and UDP/ICMP traffic and can trigger firewall rule addition/removal based on traffic attributes..." - sneak.
  • The "Thinkbot" Problem: A specific bot that identifies itself via its User Agent and even suggests IP blocking is discussed, leading to a debate on whether to block by User Agent or IP.
  • "Adversarial" Security: Some users adopt strategies that actively confuse or inconvenience bots, such as intentionally serving malformed data or using obscure server banners. Others favor reactive blocking instead: "I add re-actively. I figure there are "legitimate" IP's that companies use and I only look at IP addresses that are 'vandalizing' my servers with inappropriate scans and block them." - roguebloodrage.
  • Allowlisting: The concept of "allowlists" (whitelisting) is presented as a more secure, albeit more challenging, approach. "I've been experimenting with a system where allowed users can create short-lived tokens via some out-of-band mechanism..." - imiric.
  • Identity-Aware Proxies: Tools like GCP's Identity-Aware Proxy are mentioned as a means to outsource security verification.
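The rate-limiting idea above can be sketched as a per-client token bucket. This is a generic illustration of the technique, not any commenter's actual setup; class and parameter names are my own:

```python
import time

class TokenBucket:
    """Per-client token bucket: refills `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = {}  # client ip -> remaining tokens
        self.last = {}    # client ip -> timestamp of last check

    def allow(self, client_ip, now=None):
        """Spend one token for this client if available; False means throttle."""
        now = time.monotonic() if now is None else now
        tokens = self.tokens.get(client_ip, self.capacity)
        last = self.last.get(client_ip, now)
        # Refill proportionally to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self.tokens[client_ip] = tokens
        self.last[client_ip] = now
        return allowed
```

Denied requests can simply get a 429, or, in the tarpitting spirit, an intentionally slow drip-fed response so the scraper burns its own resources.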

The Future and Broader Implications

The conversation touches on the long-term implications of this traffic for the internet.

  • IPv6 Complexity: The potential for IPv6 to exacerbate blocking challenges is noted, with suggestions of blocking larger prefix ranges. "For ipv6 you just start nuking /64s and /48s if they're really rowdy." - snerbles.
  • The "Open Web" in Decline: Concerns are raised about the erosion of an open internet due to the increasing need for authentication and restrictions. "The open, anonymous web is on the verge of extinction." - rglullis.
  • Trustless Systems and Blockchain: A tangential discussion emerges on the role of trustless systems and blockchain in managing access and identity in a more hostile internet environment.
  • Corporate Responsibility and Capitalism: The economic incentives behind these security measures are debated, with participants questioning whether cost-saving by offloading externalities onto users is truly "capitalism working as intended."
  • Lack of Collaboration: The lack of shared data or cooperative efforts to combat bots is lamented. "What I’d really love to see - but probably never will—is companies joining forces to share data or support open projects like Common Crawl. That would raise the floor for everyone. But, you know… capitalism, so instead we all reinvent the wheel in our own silos." - niczem.
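The /64-level blocking mentioned for IPv6 can be sketched with the standard ipaddress module: instead of banning individual addresses (trivially rotated within a subscriber's allocation), collapse offenders to their containing prefix and block the prefix once it crosses a threshold. The threshold and function names here are illustrative:

```python
import ipaddress
from collections import Counter

def prefix_of(addr, prefixlen=64):
    """Collapse a single IPv6 address to its containing /64 (or /48) network."""
    ip = ipaddress.ip_address(addr)
    return ipaddress.ip_network(f"{ip}/{prefixlen}", strict=False)

def rowdy_prefixes(offender_ips, threshold=3, prefixlen=64):
    """Return the prefixes accounting for at least `threshold` offending addresses."""
    counts = Counter(prefix_of(ip, prefixlen) for ip in offender_ips)
    return {net for net, n in counts.items() if n >= threshold}
```

Escalating from /64 to /48 (a common site-level allocation) follows the same pattern when an operator keeps rotating across /64s.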

In essence, the discussion reflects a growing sense of a "cyberpunk dystopia" where the open internet is becoming increasingly difficult to navigate due to automated threats, leading to a fragmentation and securitization of access that impacts legitimate users and the very principles of an open global network.