Help - we're blocking SEOmoz cawlers
-
We have a fairly stringent blacklist and by the looks of our crawl reports we've begin unintentionally blocking the SEOmoz crawler.
can you guys let me know the useragent string and anything else I need to enable mak sure you're crawlers are whitelisted?
Cheers!
-
Hi Keri,
Still testing, though i see no reason why this shouldn't work so will close the QA ticket.
cheers!
-
Hi! Did this work for you, or would you like our help team to lend a hand?
-
We maintain a crawler (and others) blacklist to control server loads, so I'm just looking for the useragent string I can add to the white list. this one should do the trick;
Mozilla/5.0 (compatible; rogerBot/1.0; UrlCrawler; http://www.seomoz.org/dp/rogerbot)
-
Still way to early for me ;-). I block specific robots rather than excluding all but a few.
I have not tried the following (but think/hope it will work) - this should block all robots, but allow SeoMoz and Google:
User-agent: *
Disallow: /User-agent: rogerbot
Disallow:User-agent: Google
Disallow:You would already have something like this in your robots.txt (unless your block occurs on a network/firewall level).
-
Thanks Gerd, though looks like your robots.txt is a disallow rule, when I'm looking to let it through.
I'll give this one a try: Mozilla/5.0 (compatible; rogerBot/1.0; UrlCrawler; http://www.seomoz.org/dp/rogerbot)
-
I have it as "rogerbot"
<code>User-agent: rogerbot Disallow: /</code>
Access-log: Mozilla/5.0 (compatible; rogerBot/1.0; UrlCrawler; http://www.seomoz.org/dp/rogerbot)
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I'm struggling to understand (and fix) why I'm getting a 404 error. The URL includes this "%5Bnull%20id=43484%5D" but I cannot find that anywhere in the referring URL. Does anyone know why please? Thanks
Can you help with how to fix this 404 error please? It appears that I have a redirect from one page to the other, although the referring page URL works, but it appears to be linking to another URL with this code at the end of the the URL - %5Bnull%20id=43484%5D that I'm struggling to find and fix. Thanks
Technical SEO | | Nichole.wynter20200 -
Google has deindexed a page it thinks is set to 'noindex', but is in fact still set to 'index'
A page on our WordPress powered website has had an error message thrown up in GSC to say it is included in the sitemap but set to 'noindex'. The page has also been removed from Google's search results. Page is https://www.onlinemortgageadvisor.co.uk/bad-credit-mortgages/how-to-get-a-mortgage-with-bad-credit/ Looking at the page code, plus using Screaming Frog and Ahrefs crawlers, the page is very clearly still set to 'index'. The SEO plugin we use has not been changed to 'noindex' the page. I have asked for it to be reindexed via GSC but I'm concerned why Google thinks this page was asked to be noindexed. Can anyone help with this one? Has anyone seen this before, been hit with this recently, got any advice...?
Technical SEO | | d.bird0 -
The word 'shop' in a page title
I'm reworking most of the page titles on our site and I'm considering the use of the word 'Shop' before a product category. ex. Shop 'keyword' | Brand Name As opposed to just using the keyword sans 'Shop.' Some of the keywords are very generic, especially for a top level category page. Question: Is the word 'Shop' damaging my SEO efforts in any way?
Technical SEO | | rhoadesjohn0 -
On-Page SEO of the SEOmoz Blog Section
Hey Everyone My brain isn't working (only had 1 cup of coffee so far - #2 on it's way) this morning and I could use some help. We're creating a blog on a site for a client of ours and I've been looking at the SEOmoz blog for best practices when it comes to the implementation of pagination, canonical tags and noindex. My questions: There is no use of the canonical tag on the main blog page or any of the paginated pages but it is being used on blog post pages. Why not use it on the main blog pages as well? I'm assuming because the blog pages are always changing with different content there is not much point? Paginated pages in the category sections i.e. http://www.seomoz.org/blog/category/1?page=2 are noindexed but paginated pages in the main blog section i.e. http://www.seomoz.org/blog?page=2 are not. Is this because of a duplicate content concern since the posts in the category sections are in the main blog section as well? If that's the case, why wouldn't the main category page i.e.http://www.seomoz.org/blog/category/1 be noindexed as well? What's the reason for noindexing the "Show # Posts" pages i.e.http://www.seomoz.org/blog?show=5 ? I'm assuming another concern of duplicate content? Any insights into these questions would be greatly appreciated and would help with the implementation of our clients blog. Thanks, Ken
Technical SEO | | noBulMedia0 -
Warnings for blocked by blocked by meta-robots/meta robots Nofollow...how to resolve?
Hello, I see hundreds of notices for blocked by meta-robots/meta robots nofollow and it appears it is linked to the comments on my site which I assume I would not want to be crawled. Is this the case and these notices are actually a positive thing? Please advise how to clear them up if these notices can be potentially harmful for my SEO. Thanks, Talia
Technical SEO | | M80Marketing0 -
Blocking AJAX Content from being crawled
Our website has some pages with content shared from a third party provider and we use AJAX as our implementation. We dont want Google to crawl the third party's content but we do want them to crawl and index the rest of the web page. However, In light of Google's recent announcement about more effectively indexing google, I have some concern that we are at risk for that content to be indexed. I have thought about x-robots but have concern about implementing it on the pages because of a potential risk in Google not indexing the whole page. These pages get significant traffic for the website, and I cant risk. Thanks, Phil
Technical SEO | | AU-SEO0 -
Ignore url parameters without the 'parameter=' ?
We are working on an ecommerce site that sorts out the products by color and size but doesn't use the sortby= but uses sortby/. Can we tell Google to ignore the sortby/ parameter in Webmaster Tools even though it is not followed by an = sign? For example: www.mysite.com/shirts/tshirts/shopby/size-m www.mysite.com/shirts/tshirts/shopby/color-black Can we tell WMT to ignore the 'shopby/' parameter so that only the tshirts page will be indexed? Or does the shopby have to be set up as 'shopby=' ? Thanks!
Technical SEO | | Hakkasan0 -
Purchasing a site for a 301 Re-direct
Hi Mozzers, I have a question regarding a tactic I'm considering for a client. My client has a web hosting company and is ranking well for his keywords and in position 3 for his main term. There is a site available on flippa that is a keyword rich domain and has a decent link portfolio and domain authority and the price is attractive. I'm considering buying it to 301 it to his domain but I've never done this tactic before. Is this grey/black hat? Has anyone done this before and to what extent did it work? Thanks Bush
Technical SEO | | Bush_JSM0