
How To Prevent SEO Bots From Crawling Your Site

Sometimes you may need to prevent SEO bots from crawling your website if you don’t want its content to be indexed online. Here’s how to prevent SEO bots from crawling your site using a robots.txt file. You can also use these steps to stop spam bots and malicious bots from crawling your website.

 


What is robots.txt?

Robots.txt is a text file that contains crawling instructions for incoming bots. Search bots, spam bots and other bots look for this file before they crawl your website, and proceed according to the instructions it contains. Robots.txt must be served at the www.yourdomain.com/robots.txt URL. So if your website is www.helloworld.com, then robots.txt should be served at www.helloworld.com/robots.txt.

You can use robots.txt to tell search bots not to crawl your entire website, or to skip specific folders and pages in it.

There are quite a few rules available to instruct crawl bots. The most common ones are (see the example after this list):

  • User-agent: Search bots use the User-agent attribute to identify themselves. You can allow/disallow crawl bots by mentioning their user agent names.
  • Disallow: Specifies the files or folders that are not allowed to be crawled.
  • Crawl-delay: Specifies the number of seconds a bot should wait before crawling each page. Not every crawler honors this rule.
  • Wildcard (*): Used in the User-agent rule to match all bots.
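
For illustration, here is a small robots.txt that uses these rules together (the /private folder is just an example path):

# Applies to all bots
User-agent: *
Crawl-delay: 10
Disallow: /private

This tells every bot to wait 10 seconds before each page it crawls and to stay out of the /private folder.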

Bonus Read : NGINX SSL Configuration (Step by Step)

 

How to Prevent Search Bots from crawling your website

We will look at a few examples to disallow bots from crawling your site. Here are the user agent names of common bots for your reference – Googlebot, Yahoo! Slurp, bingbot, AhrefsBot, Baiduspider, Ezooms, MJ12bot, YandexBot.

 

Disallow all search engines from crawling your website

Here’s what you need to add to your robots.txt file if you want to disallow all bots from crawling your website:

User-agent: *
Disallow: /

In the above configuration, the wildcard * in the User-agent rule matches all bots, and the root path (/) in the Disallow rule covers the entire website. As a result, all bots are disallowed from crawling the entire site.

Bonus Read : Linux List All Processes by Name, User, PID

 

Allow all search engines to crawl your website

Here’s what you need to add to your robots.txt file if you want to allow all bots to crawl your website:

User-agent: *
Disallow:

In the above configuration, the wildcard * in the User-agent rule matches all crawl bots, and the Disallow rule is left blank. As a result, all bots are allowed to crawl the entire website.
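
If you prefer an explicit rule, major crawlers such as Googlebot and Bingbot also recognize an Allow directive, so the following achieves the same effect for them:

User-agent: *
Allow: /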

Bonus Read : How to Prevent Image Hotlinking in NGINX

 

Disallow One Specific Search Engine from crawling website

If you want to disallow only one specific crawl bot from crawling your website, mention its user agent name in the User-agent rule:

User-agent: BaiduSpider
Disallow: /
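
If you need to block several specific bots, repeat the group for each user agent (the bot names below are only examples):

User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /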

Bonus Read : How to List all virtual hosts in Apache

 

Disallow All Search Engines from Crawling specific folders

If you want to disallow all search engines from crawling specific folders (e.g. /product, /uploads), list each of them in a separate Disallow rule:

User-agent: *
Disallow: /uploads
Disallow: /product
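
Keep in mind that Disallow values are matched as URL path prefixes, so Disallow: /uploads also blocks paths such as /uploads-old. If you only want to block the contents of the folder itself, add a trailing slash:

User-agent: *
Disallow: /uploads/
Disallow: /product/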

 

Disallow All Search Engines from Crawling specific files

If you want to disallow all search engines from crawling specific files (e.g. /signup.html, /payment.php), list each of them in a separate Disallow rule:

User-agent: *
Disallow: /signup.html
Disallow: /payment.php
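
Some major crawlers, including Googlebot and Bingbot, also support * and $ wildcards inside the path, which is handy for blocking a whole file type (smaller bots may ignore this extension):

User-agent: *
Disallow: /*.php$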

 

You can always use a combination of the above configurations in your robots.txt file.
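
For example, a robots.txt like the following (bot, folder and file names are illustrative) blocks one bot completely, while keeping all other bots out of one folder and one file:

# Block this bot entirely
User-agent: AhrefsBot
Disallow: /

# Rules for every other bot
User-agent: *
Disallow: /uploads/
Disallow: /signup.html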

Hopefully, now you can easily prevent SEO bots from crawling your website.

Ubiq makes it easy to visualize data in minutes, and monitor in real-time dashboards. Try it today!


About Ubiq

Ubiq is a powerful dashboard & reporting platform for small & medium businesses. Build dashboards, charts & reports for your business in minutes. Get insights from data quickly. Try it for free today!