SEOMOZ crawler is still crawling a subdomain despite disallow
-
This is for our client with a subdomain. We only want to analyze their main website, as this is the one we want to optimize. The subdomain is not optimized, so we know it's bound to have lots of errors.
We added the disallow code when we started and it was working fine: we only saw the errors for the main domain and we were able to fix them. However, just a month ago, the errors and warnings spiked, and the errors we saw were for the subdomain.
As far as our web guys are concerned, the disallow code is still there and was not touched:
User-agent: rogerbot
Disallow: /
We would like to know if there's anything we might have unintentionally changed or something we need to do so that the SEOMOZ crawler will stop going through the subdomain.
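One thing worth checking: robots.txt rules are scoped per host, so a `User-agent: rogerbot / Disallow: /` block served from the main domain's robots.txt does not cover the subdomain — the subdomain has to serve its own robots.txt with the same directives. A minimal sketch using Python's standard-library parser to sanity-check the directive itself (the path is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The disallow block as served by ONE host. robots.txt is per-host,
# so the subdomain must serve its own copy of these directives.
rules = """\
User-agent: rogerbot
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# rogerbot is blocked from every path on this host...
print(rp.can_fetch("rogerbot", "/any/page.html"))   # False
# ...while other crawlers are unaffected.
print(rp.can_fetch("Googlebot", "/any/page.html"))  # True
```

If the directive parses as blocked here but the subdomain is still being crawled, the likely culprit is that the subdomain itself isn't serving these rules.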
Any help is greatly appreciated!
-
Thanks Peter for your assistance.
Hope to hear from the SEOMOZ team soon with regards to this issue.
-
John,
Thanks for writing in! I would like to take a look at the project where this is happening. I will go ahead and start a ticket so we can better answer your questions. You should hear from me soon!
Best,
Peter
Moz Help Team
-
I have heard of this recently as well. I think the Moz crawler may now ignore the disallow, all or some of the time, because it is not a usual search engine crawler.
Hopefully one of the staff can provide some insight in this for you.
All the best.
Related Questions
-
Old product URLs still indexed and maybe causing problems?
Hi all, need some expertise here: We recently (3 months ago) launched a newly updated site on the same domain. We also added an SSL certificate and dropped the www (with proper redirects), going from http://www.mysite.com to https://mysite.com. I joined the company about a week after launch of the new site.
All pages I want indexed are indexed, on the sitemap, and submitted (submitted in July but processed regularly). When I check site:mysite.com everything is there, but so are pages from the old site that are not on the sitemap. These do have 301 redirects. I am finding our non-product pages are ranking with no problem (including category pages), but our product pages are not, unless I type in the title almost exactly. We 301 redirected all old URLs to the new comparable product or, if the product is not available anymore, to the home page.
For better or worse, as it turns out and prior to my arrival, in building the new site the team copied much of the content (descriptions, reviews, etc.) from the old site to create the new product pages. After some frustration and research, I am finding the old pages are still indexed and possibly causing a duplicate content issue. Now, I gather there is supposedly no "penalty", per se, for duplicate content, but a page or site will simply not show in the SERPs. Understandable, and this seems to be the case. We also sell a lot of product wholesale, and it turns out many dealers are using the same descriptions we have (and have had) on our site. Some are much larger than us, so I'd expect to be pushed down a bit, but we don't even show in the top 10 pages... for our own product.
How long will it take for Google to drop the old pages and rank the new as unique? I have re-written some pages, but much is technical specification and tough to paraphrase or re-write. I know I could do this in Search Console, but I don't have access to the old site any longer. Should I remove the 301s a few at a time and see if the old pages get dropped faster? Maybe just re-write ALL the content? Wait?
As a side note, I'm also on a Drupal CMS with a Shopify ecommerce module, so maybe the shop.mysite.com vs mysite.com split is throwing it off with the products(?) (again, the Drupal non-product AND category pages rank fine). Thoughts on this would be much appreciated. Thx so much!
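One cheap sanity check for a migration like this (http + www → https, no www) is to script the expected redirect target for each old URL and compare it against what the server actually returns. A minimal sketch of just the URL mapping; mysite.com is the placeholder from the question:

```python
from urllib.parse import urlsplit, urlunsplit

def expected_target(old_url):
    """Expected 301 target after the migration: force https, drop the
    leading www, keep path and query untouched."""
    parts = urlsplit(old_url)
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]
    return urlunsplit(("https", host, parts.path, parts.query, parts.fragment))

print(expected_target("http://www.mysite.com/old-product?color=red"))
# https://mysite.com/old-product?color=red
```

You could then issue a non-following HEAD request per old URL and verify the Location header matches the expected target before worrying about indexing lag.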
Intermediate & Advanced SEO | mcampanaro0 -
Unrelated subdomain hurts domain rankings?
Hi All, One of our subdomains has a lot of content created by different users, mostly outgoing links from landing pages. Moreover, the top-ranking content is about "cigarettes", which is nowhere near related to our niche. Will this hurt our domain rankings?
Intermediate & Advanced SEO | vtmoz0 -
Subdomain vs totally new domain
Problem: Our organization publishes maps for public viewing using Google Maps. We are currently getting limited value from these links. We need to separate our public and private maps for infrastructure purposes, and are weighing up the strengths and weaknesses of separating by domain or subdomain with regard to SEO and infrastructure. Current situation: maps.mycompany.com currently has a page authority of 30 and mycompany.com has a domain authority of 39. We are currently only getting links from 8 maps which are shared via social media, whereas most people embed our maps on their website using an iframe, which I believe doesn't do us any favours with SEO. We currently have approx. 3K public maps. Question: What SEO impact can you see if we move our public maps from the subdomain maps.mycompany.com to mycompanypublicmaps.com? Thanks in advance for your help, and happy to give more info if you need it!
Intermediate & Advanced SEO | eSpatial0 -
What referrer is shown in the HTTP request when the Google crawler visits a page?
Is it legitimate to show different content for HTTP requests with different referrers? Case A: a user views one page of the site with plenty of information about a brand and clicks a link on that page to see a product detail page of that brand; here I don't want to repeat information about the brand itself. Case B: a user reaches the product detail page directly by clicking a SERP result; in this case I would like to show him a few paragraphs about the brand. Is it bad? Does anyone have experience doing this? My main concern is the Google crawler. It should not be considered cloaking, because I am not differentiating on user-agent (bot vs. no bot). But when Google is crawling the site, which referrer will it use? I have no idea; does anyone know? When going from one link to another on the website, does the Google crawler leave the referrer empty?
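On the mechanics: crawlers, including Googlebot, generally send no Referer header at all, so any referrer-based branch puts a crawler in the same bucket as a direct visit. A hypothetical sketch of such a branch (example.com and the function name are placeholders), which also shows why the crawler would always see the fuller "case B" version:

```python
def show_brand_intro(headers):
    """Return True when the visitor did not arrive from one of our own
    brand pages -- i.e. show the brand paragraphs (case B)."""
    referer = headers.get("Referer", "")
    # No Referer header: direct visits, most crawlers, and privacy-
    # stripped requests all land here and get the full intro.
    return not referer.startswith("https://www.example.com/")

print(show_brand_intro({}))                                            # True
print(show_brand_intro({"Referer": "https://www.example.com/brand"}))  # False
```

Since the crawler consistently sees one version and users see content that varies only by navigation path, this is closer to ordinary personalization than user-agent cloaking, but that is a judgment call, not a guarantee.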
Intermediate & Advanced SEO | max.favilli0 -
Why is my site not getting crawled by Google?
Hi Moz Community, I have an escort directory website built with AJAX. We basically followed all the recommendations, like implementing the escaped-fragment code so Google would be able to see the content. The problem is that whenever I submit my sitemap in Google Webmaster Tools, it always shows 700 URLs submitted but only 12 static pages indexed. I did a site: query and only a handful of pages were indexed. Does it have anything to do with my site being on HTTPS and not on HTTP? My site is on HTTPS and all my content is AJAX-based. Thanks
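For reference, the escaped-fragment scheme (Google's AJAX crawling scheme, now deprecated but the one described above) maps #! URLs to a query parameter that the crawler requests from the server instead. A minimal sketch of that mapping; the example URL is a placeholder:

```python
from urllib.parse import quote

def escaped_fragment_url(pretty_url):
    """Map a #! AJAX URL to the _escaped_fragment_ form a crawler
    following Google's AJAX crawling scheme would request."""
    if "#!" not in pretty_url:
        return pretty_url
    base, fragment = pretty_url.split("#!", 1)
    sep = "&" if "?" in base else "?"
    return base + sep + "_escaped_fragment_=" + quote(fragment, safe="")

print(escaped_fragment_url("https://example.com/#!/listings/london"))
# https://example.com/?_escaped_fragment_=%2Flistings%2Flondon
```

A quick way to debug low indexing is to request the _escaped_fragment_ version of a few sitemap URLs directly and confirm the server returns the full rendered HTML, on HTTPS, with no redirect back to the #! version.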
Intermediate & Advanced SEO | en-gageinc0 -
How to Disallow Tag Pages With Robots.txt
Hi, I have a site I'm dealing with that has tag pages, for instance: http://www.domain.com/news/?tag=choice. How can I exclude these tag pages (about 20+) from being crawled and indexed by the search engines with robots.txt? Also, they're sometimes created dynamically, so I want something that automatically excludes tag pages from being crawled and indexed. Any suggestions? Cheers, Mark
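A prefix rule like `Disallow: /news/?tag=` covers every tag page under /news/, present or future, since robots.txt Disallow rules are prefix matches (major engines also support a `*` wildcard, e.g. `Disallow: /*?tag=`, though the stdlib parser in the sketch below only handles plain prefixes). A quick check of the rule using Python's standard library; domain.com is the placeholder from the question:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /news/?tag=
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Every tag page -- including dynamically created ones -- shares the prefix...
print(rp.can_fetch("*", "http://www.domain.com/news/?tag=choice"))   # False
# ...while regular news URLs stay crawlable.
print(rp.can_fetch("*", "http://www.domain.com/news/latest-story"))  # True
```

Note robots.txt only stops crawling; pages already indexed may need a noindex or removal request to drop out of the index.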
Intermediate & Advanced SEO | monster990 -
Rebuilding a site with pre-existing high authority subdomains
I'm rebuilding a real estate website with 4 subdomains that have Page Authorities between 45 and 50. Since it's a real estate website, it has 20,000+ pages of unique (listing) content PER subdomain. The subdomains are structured like washington.xyzrealty.com and california.xyzrealty.com. The root domain has a ~50 Page Authority. The site is about 7 years old. My preference is to focus all of my efforts on the primary domain going forward, but I don't want to waste the power of the subdomains. I'm considering:
1. Putting blogs or community/city pages on the subdomains
2. 301 redirecting all of the existing pages to matching pages on the new root domain
3. Any other ideas??
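If option 2 is chosen, the 301 map is mechanical because the state is already encoded in the hostname. A hypothetical sketch of the URL mapping (the xyzrealty.com names are the placeholders from the question, and the target structure is an assumption):

```python
from urllib.parse import urlsplit

def redirect_target(old_url, root="https://www.xyzrealty.com"):
    """Map washington.xyzrealty.com/listing/123 to the matching path on
    the root domain: www.xyzrealty.com/washington/listing/123."""
    parts = urlsplit(old_url)
    state = parts.netloc.split(".", 1)[0]  # leading subdomain label
    path = "/" + state + parts.path
    if parts.query:
        path += "?" + parts.query
    return root + path

print(redirect_target("http://washington.xyzrealty.com/listing/123"))
# https://www.xyzrealty.com/washington/listing/123
```

With 20,000+ listings per subdomain, generating the map programmatically like this (rather than hand-listing rules) keeps the redirects one-to-one, which is what preserves the most equity.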
Intermediate & Advanced SEO | | jonathanwashburn0 -
SEOMOZ duplicate page result: True or false?
SEOMOZ says I have six (6) duplicate pages. A duplicate content checker tool says zero (0).
On the physical computer that hosts the website, each page exists as one file. The casing of the file name is irrelevant to the host machine; it wouldn't allow two files of the same name in the same directory. To reinforce this point, you can access said file by camel-casing the URI in any fashion (e.g. http://www.agi-automation.com/Pneumatic-grippers.htm). This does not bring up a different file each time; the server merely processes the URI as case-less and pulls the file by its name.
What is happening in the example given is that some sort of indexer is being used to create a "dummy" reference of all the site files. Since the indexer doesn't have file access to the server, it does this by link crawling instead of reading files. It is the crawler that is making the assumption that the different casings of the pages are in fact different files. Perhaps there is a setting in the indexer to ignore casing. So the indexer thinks these are two different pages when they really aren't. This makes all of the other points moot, though they would certainly be relevant in the case of an actual duplicated page.
URL | Page Authority | Linking Root Domains
http://www.agi-automation.com/ | 43 | 82
http://www.agi-automation.com/index.html | 25 | 2
http://www.agi-automation.com/Linear-escapements.htm | 21 | 1
www.agi-automation.com/linear-escapements.htm | 16 | 1
http://www.agi-automation.com/Pneumatic-grippers.htm | 30 | 3
http://www.agi-automation.com/pneumatic-grippers.htm | 16 | 1
The duplicate content tool checks the following: www and non-www header response; Google cache check; similarity check; default page check; 404 header response; PageRank dispersion check (i.e. whether www and non-www versions have different PR).
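The crawler behavior described above is easy to reproduce: a link crawler that treats URLs as case-sensitive strings will file /Pneumatic-grippers.htm and /pneumatic-grippers.htm separately even though the server returns one file. A small sketch of that case-insensitive grouping, using an illustrative list based on the URLs from the report:

```python
def find_case_duplicates(urls):
    """Group URLs that differ only by letter case -- exactly the pairs a
    case-sensitive link crawler reports as duplicate pages."""
    groups = {}
    for url in urls:
        groups.setdefault(url.lower(), []).append(url)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}

crawled = [
    "http://www.agi-automation.com/Linear-escapements.htm",
    "http://www.agi-automation.com/linear-escapements.htm",
    "http://www.agi-automation.com/Pneumatic-grippers.htm",
    "http://www.agi-automation.com/pneumatic-grippers.htm",
]
print(len(find_case_duplicates(crawled)))  # 2 case-variant groups
```

The practical fix is usually to pick one canonical casing, link to it consistently internally, and 301 (or rel=canonical) the other casings to it, so crawlers and link equity converge on a single URL.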
Intermediate & Advanced SEO | AGIAutomation0