
I have the following entry in my AWStats report:

Unknown robot (identified by 'bot*')

How can I block this bot?
I tried each of the following separately, but none of them seems to catch it:

RewriteCond %{HTTP_USER_AGENT} ^bot* 

RewriteCond %{HTTP_USER_AGENT} bot\* 

RewriteCond %{HTTP_USER_AGENT} bot[*]

Here is the full .htaccess code I am using:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^bot*
RewriteRule .? - [F,L]

I tried each of the three patterns (^bot*, bot\*, bot[*]) on the RewriteCond line, and none of them stopped the bot.
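To see what each of these patterns actually tests, here is a small sketch that runs them through Python's re module. The assumption is that Python's regex engine behaves like Apache's PCRE for patterns this simple, and the user-agent strings are made-up examples. Note that ^bot* means "starts with bo followed by zero or more t", while bot\* and bot[*] both require a literal asterisk, which real user agents rarely contain.

```python
import re

# Hypothetical user-agent strings, for illustration only.
SAMPLES = [
    "bot/1.0",
    "Mozilla/5.0 (compatible; ExampleBot/2.1)",
    "boring-client/0.1",
    "bot*",
]

# The three patterns tried in the question.
PATTERNS = [r"^bot*", r"bot\*", r"bot[*]"]


def matches(pattern, ua):
    """Unanchored, case-sensitive search, like a RewriteCond without [NC]."""
    return re.search(pattern, ua) is not None


for pattern in PATTERNS:
    hits = [ua for ua in SAMPLES if matches(pattern, ua)]
    print(f"{pattern!r:10} -> {hits}")
```

Under these assumptions, ^bot* also catches "boring-client/0.1" (it only needs "bo" at the start) yet misses any agent where "bot" appears later in the string, and the other two only match the literal text "bot*".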

2 Answers


The asterisk (*) is not literal. AWStats is simply telling you which of its rules it used to decide that the request came from a bot. In your case, bot* means that the user agent string started with bot, and that rule matched.

As the asterisk is not literal, you can use the following instead:

# Matches bot* (the same as ^bot.*$)
RewriteCond %{HTTP_USER_AGENT} ^bot [OR]
# Matches *bot (the same as ^.*bot$)
RewriteCond %{HTTP_USER_AGENT} bot$
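A rough sketch of how the two conditions combine: [OR] means the rule fires if either pattern matches. This uses Python's re as a stand-in for Apache's regex engine, and is_blocked plus the sample user agents are hypothetical names chosen for illustration.

```python
import re


def is_blocked(user_agent: str) -> bool:
    """Mirror the two RewriteCond lines joined by [OR] (case-sensitive, no [NC])."""
    starts_with_bot = re.search(r"^bot", user_agent) is not None  # bot*
    ends_with_bot = re.search(r"bot$", user_agent) is not None    # *bot
    return starts_with_bot or ends_with_bot


# Hypothetical user agents, for illustration only.
for ua in ["bot/1.0", "examplebot", "Mozilla/5.0 (compatible; examplebot/1.0)"]:
    print(ua, "->", "403" if is_blocked(ua) else "allowed")
```

Because both patterns are anchored, an agent that only contains "bot" in the middle of the string (the third sample) passes both conditions.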

Note: It is better to check your access logs to see exactly which user agents these are and block them specifically. Otherwise you may find yourself blocking bots that you actually want.


Recommendation: Change your rule from RewriteRule .? - [F,L] to RewriteRule ^ - [F,L]
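For the recommendation above, a quick sketch of why .? and ^ behave the same: .? can match zero characters and ^ only asserts the start of the string, so both succeed on every path, including the empty one. The assumption here is that Python's re search mirrors how RewriteRule tests its pattern, and the paths are hypothetical per-directory paths (in .htaccess the leading directory prefix is stripped, so a request for the directory root is the empty string).

```python
import re

# Hypothetical per-directory paths RewriteRule might see in .htaccess.
paths = ["", "index.html", "images/logo.png"]

for path in paths:
    dot_q = re.search(r".?", path) is not None  # zero or one character: always succeeds
    caret = re.search(r"^", path) is not None   # start-of-string anchor: always succeeds
    print(repr(path), dot_q, caret)
```

Either way, every request that passes the conditions gets the [F] response; any difference between the two is a matter of style or a minor performance detail.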


4 Comments

Thanks Mike. I am using your first line now and will report back here in a day or two on whether it blocks the bot. Can you please elaborate on your "recommendation", i.e. why I should change the rule like that? I was told that using .? as the regex will match anything (even an empty string) and apply the fail condition specified.
I put it in small print because it isn't important. ^ simply anchors to the start of the test string, and every string has a start, so it matches every request, just as .? does. Essentially they're the same, but I think there is a slight performance boost with my suggestion.
Mike, it seems the bot is still visiting my site: AWStats shows that it visited and generated hits today. Here is my .htaccess code again:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^bot [NC]
RewriteRule .? - [F,L]
I think the bots can no longer view the content they're requesting, but AWStats will continue to log the requests. I recommend using a user-agent switcher in your browser to test what is being served to bots.
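To narrow down whether the rules are failing or AWStats is simply counting rejected requests, it can help to check what the anchored, case-insensitive conditions from the comment above actually match. This sketch uses Python's re.IGNORECASE as a stand-in for Apache's [NC] flag; the function name and user agents are hypothetical.

```python
import re


def blocked_by_comment_rules(user_agent: str) -> bool:
    """Emulate: RewriteCond ^spider [NC,OR] followed by RewriteCond ^bot [NC]."""
    flags = re.IGNORECASE  # stands in for Apache's [NC] flag
    return (re.search(r"^spider", user_agent, flags) is not None
            or re.search(r"^bot", user_agent, flags) is not None)


# Hypothetical user agents.
print(blocked_by_comment_rules("Bot/1.0"))
print(blocked_by_comment_rules("Mozilla/5.0 (compatible; ExampleBot/1.0)"))
```

If the access log shows status 403 for that agent, the rule is working and AWStats is just recording the rejected hits; if it shows 200, the agent probably does not start with "spider" or "bot".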

You can block bots by their exact names in the .htaccess file. The example below should help; I am currently using the same setup, and it is saving my server resources.

SetEnvIfNoCase User-Agent "Yandex" bad_bot    
SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot    
SetEnvIfNoCase User-Agent "MJ12bot" bad_bot

<IfModule mod_authz_core.c>
 <Limit GET POST>
  <RequireAll>
   Require all granted
   Require not env bad_bot
  </RequireAll>
 </Limit>
</IfModule>

Let me know if you have any queries.
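A minimal sketch of the logic in the config above, assuming Python's re approximates Apache's matching: SetEnvIfNoCase does a case-insensitive, unanchored regex match on the User-Agent header and sets bad_bot on a hit, and the Require lines inside <Limit GET POST> then deny those requests (Apache applies a GET limit to HEAD as well). The function names are hypothetical, and methods outside the <Limit> are assumed to be allowed by the rest of the configuration.

```python
import re

# The three names from the SetEnvIfNoCase lines above.
BAD_BOT_PATTERNS = ["Yandex", "AhrefsBot", "MJ12bot"]


def sets_bad_bot(user_agent: str) -> bool:
    """Sketch of SetEnvIfNoCase: case-insensitive, unanchored regex search."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in BAD_BOT_PATTERNS)


def allowed(user_agent: str, method: str) -> bool:
    """Sketch of <Limit GET POST> with RequireAll: deny bad_bot on limited methods."""
    if method not in ("GET", "HEAD", "POST"):  # HEAD is covered by GET in <Limit>
        return True  # assumption: other methods are allowed by the rest of the config
    return not sets_bad_bot(user_agent)
```

Because the match is unanchored and case-insensitive, "Yandex" also catches agents such as "Mozilla/5.0 (compatible; YandexBot/3.0)", which is why exact bot names work here where the anchored ^bot pattern did not.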
