Webmaster Forum

  #1  
02-06-2012, 09:04 PM
vectro
Contributing Member
 
Join Date: 12-29-08
Location: U.S.A.
Posts: 437
Need mod_security rules to prevent Googlebot from crawling one file

I need mod_security rules to prevent Googlebot from indexing any file named browse.php anywhere on the server, while still allowing Googlebot to access anything else. I figured mod_security will do the trick because it can recognize user-agents and set rules accordingly.

Any ideas?
 

  #2  
02-13-2012, 05:07 PM
vectro
I did some research on creating mod_security rules and figured this out myself. Here is a server-wide mod_security rule for the main Apache configuration that keeps Googlebot off one particular file. It applies to all domains on the server. Note that the LocationMatch pattern is an unanchored regular expression, so it matches any URL path containing /file.php, not only a file in a domain's root directory.

Code:
<LocationMatch "/file.php">
    SecRule REQUEST_HEADERS:User-Agent "@pm Googlebot" "deny,status:403"
</LocationMatch>
Change file.php to the name of the file you want to protect. The "Googlebot" string can be changed to any other user-agent. The @pm operator performs a case-insensitive phrase match rather than an exact match, so the rule applies whenever the full User-Agent header contains that word.
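
For the original goal (any file named browse.php, anywhere on the server), a variant along these lines might work. This is only a sketch, assuming ModSecurity 2.x loaded into Apache. The id value 1000001 is an arbitrary placeholder (ModSecurity 2.7 and later require every rule to carry a unique id), and the escaped, end-anchored pattern is an assumption about which URLs should match.

Code:
# Sketch: deny Googlebot on any URL path ending in /browse.php, on every vhost.
# The rule id below is an arbitrary placeholder; pick one unused on your server.
<LocationMatch "/browse\.php$">
    # phase:2 is the ModSecurity default phase; it is stated explicitly here
    # because the rule depends on Apache's Location scoping being applied.
    SecRule REQUEST_HEADERS:User-Agent "@pm Googlebot" \
        "id:1000001,phase:2,deny,status:403"
</LocationMatch>

The backslash escapes the dot so it matches a literal period, and the trailing $ keeps the pattern from matching longer names such as browse.php.bak. Requests that carry extra path info after the script name would not match the anchored pattern. Running apachectl configtest before reloading should catch syntax errors.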
 
  #3  
03-06-2012, 04:28 PM
vectro
Quote:
Originally Posted by mountainman
SecRule REQUEST_HEADERS:User-Agent "compatible; Googlebot" "nolog,allow"
Adding this rule allows Googlebot to crawl; removing it blocks Google.
I'm not exactly referring to allowing/denying all Googlebot traffic. I was only thinking about blocking it from one file.
 
  #4  
03-30-2012, 12:00 AM
seriesn
Member
 
Join Date: 06-06-11
Posts: 44
Quote:
Originally Posted by vectro
I'm not exactly referring to allowing/denying all Googlebot traffic. I was only thinking about blocking it from one file.
Assuming the file is served from a website, why not set up a robots.txt file?
Something like this?
Quote:
User-agent: *
Disallow: /file.extension
 
  #5  
03-30-2012, 11:49 PM
vectro
robots.txt only applies to one site. I use the mod_security rule so that it applies to the entire server. There are a lot of sites with this one particular file that tends to confuse Googlebot.
 
  #6  
07-30-2016, 02:53 AM
emilyrose9
Banned
 
Join Date: 05-12-15
Location: Augusta
Posts: 120
Add all the crawler IPs to /etc/csf/csf.allow to whitelist them. You can also check your audit log to find out which rule blocks Google and then adjust it.
 
  #7  
07-30-2016, 05:40 AM
vectro
Quote:
Originally Posted by emilyrose9
Add all the crawler IPs to /etc/csf/csf.allow to whitelist them.
CSF is a firewall, not mod_security.

Quote:
Originally Posted by emilyrose9
You can also check your audit log to find out which rule blocks Google and then adjust it.
That could work, but the audit log can be enormous, so make sure to use a filter like grep on Linux.
 