SEOmoz crawler is still crawling a subdomain despite a disallow
-
This is for a client whose site has a subdomain. We only want to analyze their main website, as that is the one we are optimizing. The subdomain is not optimized, so we know it's bound to have lots of errors.
We added the disallow code when we started and it was working fine: we only saw the errors for the main domain, and we were able to fix them. However, just a month ago, the errors and warnings spiked, and the errors we saw were for the subdomain.
According to our web guys, the disallow code is still there and was not touched:

User-agent: rogerbot
Disallow: /

We would like to know if there's anything we might have unintentionally changed, or anything we need to do, so that the SEOmoz crawler will stop going through the subdomain.
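One thing we are double-checking on our end: robots.txt only applies to the exact host it is served from, so a disallow in www.example.com/robots.txt would not cover sub.example.com (domains here are illustrative, not the client's). A minimal sketch of the separate file the subdomain itself would need to serve at its own /robots.txt, assuming rogerbot is the only crawler we want to keep out:

User-agent: rogerbot
Disallow: /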
Any help is greatly appreciated!
-
Thanks, Peter, for your assistance.
Hope to hear from the SEOmoz team soon with regard to this issue.
-
John,
Thanks for writing in! I would like to take a look at the project where this is happening. I will go ahead and start a ticket so we can better answer your questions. You should hear from me soon!
Best,
Peter
Moz Help Team
-
I have heard of this recently; I think the Moz crawler may now ignore the disallow, either always or in some cases, because it is not a usual search engine crawler.
Hopefully one of the staff can provide some insight into this for you.
All the best.
Related Questions
-
Disallowed "Search" results with robots.txt and Sessions dropped
Hi, I've started working on our website and I've found millions of "Search" URLs which I don't think should be getting crawled and indexed (e.g. .../search/?q=brown&prefn1=brand&prefv1=C.P. COMPANY|AERIN|NIKE|Vintage Playing Cards|BIALETTI|EMMA PAKE|QUILTS OF DENMARK|JOHN ATKINSON|STANCE|ISABEL MARANT ÉTOILE|AMIRI|CLOON KEEN|SAMSONITE|MCQ|DANSE LENTE|GAYNOR|EZCARAY|ARGOSY|BIANCA|CRAFTHOUSE|ETON). I tried to disallow them in the robots.txt file, but our sessions dropped about 10% and our average position in Search Console dropped 4-5 positions over one week. It looks like over 50 million URLs were blocked; all of them look like the example above and aren't getting any traffic to the site. I've allowed them again, and we're starting to recover. We've been fixing problems with getting the site crawled properly (sitemaps weren't added correctly, products were blocked from spiders on category pages, canonical pages were being blocked from crawlers in robots.txt) and I'm thinking Google was doing us a favour and using these pages to crawl the product pages, as it was the best/only way of accessing them. Should I be blocking these "Search" URLs, or is there a better way of going about it? I can't see any value from these pages except Google using them to crawl the site.
Intermediate & Advanced SEO | Frankie-BTDublin
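For reference, a sketch of the kind of robots.txt rule being described, assuming the faceted search results all live under /search/ as in the example URL above (the pattern is illustrative, not the site's actual file):

User-agent: *
Disallow: /search/

One note on the trade-off observed here: a blanket disallow stops crawling, but it also stops crawlers reaching anything only linked from those pages, which is consistent with the drop described. When search pages are the main crawl path to products, a commonly suggested alternative is a meta robots noindex on the search template while leaving it crawlable, ideally after fixing the sitemaps and internal linking so products are reachable without it.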
Why is a canonicalized URL still in index?
Hi Mozers, We recently canonicalized a few thousand URLs, but when I search for these pages using the site: operator I can see that they are all still in Google's index. Why is that? Is it reasonable to expect that they will be taken out of the index? Or should we only expect that they won't rank as highly as the canonical URLs? Thanks!
Intermediate & Advanced SEO | yaelslater
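For reference, rel=canonical is a hint rather than a directive, and Google can keep canonicalized URLs retrievable via site: queries even after consolidating them; site: results are not a reliable reflection of what is actually served in normal searches. A minimal sketch of the tag as it would sit in the <head> of each duplicate page (URL illustrative):

<link rel="canonical" href="https://www.example.com/preferred-page/" />

If the tags are in place, the usual expectation is the second outcome the question guesses at: the duplicates stop ranking in normal results long before they stop appearing for site: queries.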
Blog - subdomain vs. subfolder
Hi everyone, I work on an ecommerce site and I'm trying to get more content together for the site and blog. The development team want to put our blog on a subdomain of the site. My question is: which is better for SEO, a subfolder or a subdomain? I've read a couple of articles saying a subfolder is better and that a subdomain needs a lot of management to build up authority on its own. Thanks!
Intermediate & Advanced SEO | BeckyKey
Should I set a max crawl rate in Webmaster Tools?
We have a website with around 5,000 pages, and for the past few months we've had our crawl rate set to maximum (we'd just started paying for a top-of-the-range dedicated server at the time, so performance wasn't an issue). Google Webmaster Tools alerted me this morning that the crawl rate setting has expired, so I'd have to manually set the rate again. In terms of SEO, is setting a max rate a good thing? I found this post on Moz, but it's dated 2008. Any thoughts on this?
Intermediate & Advanced SEO | LiamMcArthur
Duplicate Page Content Errors on Moz Crawl Report
Hi All, I seem to be losing a 'firefighting' battle with regard to various errors being reported on the Moz crawl report, relating to: Duplicate Page Content, Missing Page Title, Missing Meta, and Duplicate Page Title. While I acknowledge that some of the errors are valid (and we are working through them), I find some of them difficult to understand... Here is an example of a 'duplicate page content' error being reported: http://www.bolsovercruiseclub.com (which is obviously our homepage) is reported to have duplicate page content compared with the following pages:
http://www.bolsovercruiseclub.com/guides/gratuities
http://www.bolsovercruiseclub.com/cruise-deals/cruise-line-deals/holland-america-2014-offers/?order_by=brochure_lead_difference
http://www.bolsovercruiseclub.com/about-us/meet-the-team/craig
All three of those pages are completely different, hence my confusion... This is just a solitary example; there are many more! I would be most interested to hear people's opinions. Many thanks, Andy
Intermediate & Advanced SEO | TomKing
URL Parameter Being Improperly Crawled & Indexed by Google
Hi All, We just discovered that Google is indexing a subset of our URLs with our analytics tracking parameter embedded. For the search "dresses" we are appearing in position 11 (page 2, rank 1) with the following URL: www.anthropologie.com/anthro/category/dresses/clothes-dresses.jsp?cm_mmc=Email--Anthro_12--070612_Dress_Anthro-_-shop You'll note that "cm_mmc=Email" is appended. This is causing our analytics (CoreMetrics) to misattribute this traffic and revenue to Email vs. SEO. A few questions:
1) Why is this happening? This is an email from June 2012 and we don't have an email-specific landing page embedded with this parameter. Somehow Google found and indexed this page with these tracking parameters. Has anyone else seen something similar happening?
2) What is the recommended method of "politely" telling Google to index the version without the tracking parameters? Some thoughts on this:
a. Implement a self-referencing canonical on the page. This is done, but we have some technical issues with the canonical due to our ecommerce platform (ATG). Even though the page source code looks correct, Googlebot is seeing the canonical with a JSession ID.
b. Resubmit both URLs via the Fetch feature in WMT, hoping that Google recognizes the canonical. We did this, but given the canonical issue it won't be effective until we can fix it.
c. Change the URL parameter handling in WMT. We made this change, but it didn't seem to fix the problem.
d. 301 or noindex the version with the email tracking parameters. This seems drastic, and I'm concerned that we'd lose ranking on this very strategic keyword.
Thoughts? Thanks in advance, Kevin
Intermediate & Advanced SEO | kevin_reyes
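A sketch of the self-referencing canonical described in option (a), as it would need to appear in the rendered source of the parameterized URL (the path is taken from the question; the key point is that the href must carry no session ID and no cm_mmc parameter):

<link rel="canonical" href="http://www.anthropologie.com/anthro/category/dresses/clothes-dresses.jsp" />

If the platform injects a JSESSIONID into that href, as described, Googlebot sees a different "canonical" target than the intended one and may disregard the hint entirely, which would also explain why options (b) and (c) had no visible effect.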
Why is my site not getting crawled by Google?
Hi Moz Community, I have an escort directory website that is built with Ajax. We basically followed all the recommendations, like implementing the escaped-fragment code, so Google would be able to see the content. The problem is that whenever I submit my sitemap in Google Webmaster Tools, it always shows 700 URLs submitted and only 12 static pages indexed. I did a site: query and only a small number of pages were indexed. Does it have anything to do with my site being on HTTPS and not HTTP? My site is on HTTPS and all my content is Ajax-based. Thanks
Intermediate & Advanced SEO | en-gageinc
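For reference, a sketch of the (now-deprecated) AJAX crawling scheme the question refers to: a page whose URL has no #! opts in with this tag in its <head>,

<meta name="fragment" content="!">

which tells the crawler to re-request the page as .../page?_escaped_fragment_= and index whatever HTML snapshot the server returns for that URL. If that snapshot URL returns empty or near-empty HTML, only the static pages will get indexed, regardless of whether the site is on HTTP or HTTPS; fetching the ?_escaped_fragment_= version of a few pages by hand is a quick way to check.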
No reply from Google despite reconsideration and hard work!
Hi everyone, I hope someone can help. The site www.danbro.co.uk suffered a manual penalty in April. It had a number of poor-quality sitewide blog links, and the link profile looked (and still does, despite a massive improvement) unnatural. The work to overturn the penalty: since then, 80% of the spammy blog links have been taken down; the other 20% of sites simply do not respond to requests (these sites were documented). The webmaster has also started building more natural links, as 95% of the anchor text could be classed as 'money keywords' and a low percentage of links included the brand name at all. A number of reconsideration requests have been sent (I told the site manager this should have been avoided, but hey-ho). The last request, which I assisted them with, was sent last week. The reconsideration was extremely detailed and documented all link removals and links that were still live yet poor quality. Some questions:
1. It's been a few months since the penalty. Should a brand new site be launched?
2. If option 1 were implemented, is a 302 redirect a feasible option, as the site's content is vast? Will this pass on the penalty?
3. In your opinion, is the website's link profile the root of this penalty?
4. If point 3 is true, would you a) start a fresh site or b) continue working on balancing the link profile?
Any help or case studies would be tremendous. Thanks in advance.
Intermediate & Advanced SEO | Townpages
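For what it's worth on question 2, the mechanical difference between the two redirect types is a single token; the debate is about what Google passes through each. A sketch of a site-wide redirect in Apache .htaccess (domains are illustrative, not the sites in question; swap R=301 for R=302 for a temporary redirect):

RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?old-domain\.co\.uk$ [NC]
RewriteRule ^(.*)$ http://www.new-domain.co.uk/$1 [R=301,L]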