To Block or Not to Block: A Full Guide to Good Bots vs. Bad Bots

Internet bots, or simply bots, are essentially tools for human users, so they are inherently neither good nor bad. While bots have gained notoriety in recent years because hackers and cybercriminals often use them in various attacks, it’s important to remember that there are also good bots, operated by reputable companies, that are not only beneficial but can be integral to a website’s operation.

For any website that relies on search engine traffic, for example, crawlers such as Googlebot and Bingbot are very important. Similarly, many websites now rely on chatbots to provide their online customer service.

So, when managing bot traffic on our websites and servers, there is always one key dilemma: to block or not to block. The dilemma stems from two core challenges of bot management.

Key Challenges of Bot Management

Can’t we simply identify all non-human traffic and block it? While bot management might sound relatively simple on the surface, there are two main challenges that create the dilemma of whether to block or not to block:

  • Good bots vs. bad bots

Since, as discussed, some good bots are not only beneficial but essential to our website’s daily operation, we wouldn’t want to block these good bots by accident.

Good bots are typically operated by reputable companies (think Google, Amazon, Facebook, etc.), and they will generally follow your site’s rules and policies, which you can define via robots.txt or .htaccess, among other means.

The problem is that many malicious bots disguise themselves as good bots, pretending to be operated by these reputable companies by spoofing their user agents and other fingerprints. Distinguishing good bots from bad bots can therefore be quite complicated.
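
One practical check, which Google itself documents for verifying Googlebot, is a reverse-then-forward DNS lookup: a genuine Googlebot IP resolves to a googlebot.com or google.com hostname, and that hostname resolves back to the same IP. Below is a minimal Python sketch of this idea; the trusted suffixes are for Google only, and you would extend the approach for other operators you care about.

```python
import socket

# Hostname suffixes used by genuine Google crawlers (per Google's
# published verification guidance); other operators use other domains.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip: str) -> bool:
    """Reverse-then-forward DNS check for an IP claiming to be Googlebot."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse (PTR) lookup
    except socket.herror:
        return False  # no PTR record at all
    if not hostname.endswith(TRUSTED_SUFFIXES):
        return False  # PTR points at a domain Google doesn't use
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]  # forward lookup
    except socket.gaierror:
        return False
    return ip in forward_ips  # hostname must resolve back to the same IP
```

A request whose user agent claims to be Googlebot but whose IP fails this check is almost certainly an impostor and can be treated as a bad bot.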

  • False positives: blocking legitimate users

Another core challenge is the fact that malicious bots often masquerade as human users. In fact, the latest generation of bots actively uses AI technologies to mimic human-like behaviors and interactions, such as non-linear mouse movements and randomized interaction patterns.

These bots can also rotate between hundreds, if not thousands, of IP addresses and user agents while making requests on your site, rendering fingerprinting-based detection approaches ineffective.

On the other hand, accidentally blocking legitimate users can hurt your site’s performance and your business, and can ruin your reputation in the long run.

Choosing the Right Strategy for Effective Bot Management

There are many different ways to block and manage bots, but some strategies are more effective than others, depending on your business.

Below are some effective ways to manage bots and minimize the risks associated with their activities:

  • Setting up rules and policies for good bots

All proper bot management should begin by setting up policies for which pages can and cannot be crawled by bots, and which resources bots are not permitted to access. As briefly discussed above, you can do this by configuring robots.txt or .htaccess, among other methods.

Remember that not all good bots will benefit your site, and left unmanaged, they can still eat up resources and slow down your website. For example, if you don’t publish any content in Chinese, you might want to block Baidu’s crawler via robots.txt, as in the sketch below.
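
For illustration, a minimal robots.txt along those lines might look like the following; the disallowed paths are hypothetical placeholders for whatever sections of your site crawlers shouldn’t touch.

```
# Turn away Baidu's crawler entirely (it honors robots.txt)
User-agent: Baiduspider
Disallow: /

# All other crawlers: keep out of private or resource-heavy paths
User-agent: *
Disallow: /admin/
Disallow: /internal-search/
```

Keep in mind that robots.txt is purely advisory: reputable bots honor it, while malicious bots simply ignore it, which is why the detection measures below are still necessary.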

We can also keep a whitelist (allowlist) of bots that benefit our site, as well as a blacklist (blocklist) of known malicious bots, and update both lists regularly; a minimal lookup is sketched below.
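
Here is a minimal Python sketch of that lookup. The bot names in both sets are placeholder examples, and the user-agent match is deliberately naive, since user agents are trivially spoofed; pair an allowlist hit with the reverse-DNS verification shown earlier before trusting it.

```python
# Placeholder lists; in practice, maintain and update these regularly.
ALLOWLIST = {"googlebot", "bingbot"}   # beneficial crawlers
BLOCKLIST = {"mj12bot", "petalbot"}    # bots you've chosen to block

def classify(user_agent: str) -> str:
    """Return a first-pass verdict based on the claimed user agent."""
    ua = user_agent.lower()
    if any(bot in ua for bot in BLOCKLIST):
        return "block"
    if any(bot in ua for bot in ALLOWLIST):
        return "allow"    # still verify: the UA string alone proves nothing
    return "inspect"      # unknown clients go on to behavioral checks

print(classify("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # -> "allow"
```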

  • Bot detection and management software

A capable bot management solution is required to tackle the two challenges discussed above. The software should be able to:

  1. Distinguish good bots from bad bots effectively
  2. Differentiate between bot traffic and legitimate human traffic
  3. Apply the right action (block, throttle, etc.) according to the bot’s behavior
  4. Use behavior-based analysis to compare each client’s behavior against a baseline (see the sketch after this list)
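
As a toy illustration of point 4, the sketch below flags clients whose request rate far exceeds what a typical human session produces. The window size and thresholds are illustrative assumptions, not tuned values, and a real solution would combine many more signals (mouse movement, navigation order, device fingerprints, and so on).

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10        # sliding window length (assumed)
HUMAN_BASELINE_MAX = 20    # assumed max human requests per window

history: dict[str, deque] = defaultdict(deque)

def score_request(client_id: str) -> str:
    """Record one request and classify the client against the baseline."""
    now = time.monotonic()
    window = history[client_id]
    window.append(now)
    # Drop timestamps that have fallen out of the sliding window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) > 5 * HUMAN_BASELINE_MAX:
        return "block"      # far beyond any plausible human rate
    if len(window) > HUMAN_BASELINE_MAX:
        return "throttle"   # suspicious: slow it down rather than block
    return "allow"
```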

An AI-powered bot and account takeover prevention solution like DataDome uses behavioral analysis to detect and manage malicious bots in real time and on autopilot, effectively differentiating good bots from bad bots and handling malicious bot traffic accordingly.

  • Serving fake/modified content

A good approach to ‘fool’ an attacker is to serve fake content while letting them keep operating on your site. For example, you can redirect the bot to a special page with thin or false content, so the attacker harvests false information instead. Ideally, this wastes enough of the attacker’s time and resources that they give up and switch to another target.
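
A minimal sketch of this idea as a Python WSGI app is shown below. The is_suspected_bot check is a stand-in for whatever detection verdict you already have (a reverse-DNS failure, a behavioral score, a vendor flag), and the decoy body is an arbitrary placeholder.

```python
from wsgiref.simple_server import make_server

DECOY_BODY = b"<html><body>Outdated placeholder catalog data</body></html>"
REAL_BODY = b"<html><body>The real page content</body></html>"

def is_suspected_bot(environ) -> bool:
    # Placeholder: plug in your real detection verdict here.
    return "scrapy" in environ.get("HTTP_USER_AGENT", "").lower()

def app(environ, start_response):
    # Return 200 either way, so a detected bot gets no signal that it
    # has been caught; it just quietly receives worthless content.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [DECOY_BODY if is_suspected_bot(environ) else REAL_BODY]

if __name__ == "__main__":
    make_server("127.0.0.1", 8000, app).serve_forever()
```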

  • Rate-limiting/throttling

Another option, rather than black-holing (fully blocking) a bot, is to rate-limit the specific client to slow down its operation. Bots run on resources that can be expensive, and competition among bot operators is actually very tight, so when you significantly slow down their operations, they might just give up.
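
A common way to implement this is a per-client token bucket, sketched below in Python. The capacity and refill rate are illustrative assumptions to be tuned against your real traffic.

```python
import time

class TokenBucket:
    """Each request spends one token; tokens refill at a steady rate."""

    def __init__(self, capacity: float = 10.0, refill_per_sec: float = 1.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: delay or drop this request

buckets: dict[str, TokenBucket] = {}

def may_proceed(client_ip: str) -> bool:
    """True if the request may proceed, False if it should be slowed."""
    return buckets.setdefault(client_ip, TokenBucket()).allow()
```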

Conclusion

With good and bad bots together accounting for nearly half of all internet traffic, it’s very important for every business with an online presence to adopt a proper practice for managing the bot activity reaching its site.

As we have seen, blocking bot traffic isn’t always the best option; in many cases it can even be a bad choice, for several reasons. Instead, we have to take a case-by-case approach and manage each bot according to its behavior and objectives.