A question about Mozbot and a recent crawl on our website.
-
Hi All,
Rogerbot has been reporting errors on our websites for over a year now, and we correct the issues as soon as they are reported.
However, I have two questions regarding the crawl report we received on the 8th.
1.) Pages with a "noindex" tag are being crawled by Roger and reported as duplicate page content errors. I can ignore these, as Google doesn't see these pages, but surely Roger should ignore pages with "noindex" instructions as well? Also, these errors won't go away in our campaign until Roger ignores the URLs.
2.) What bugs me most is that resource pages that have been around for about 6 months have only just been reported as duplicate content. Our weekly crawls have never flagged these resource pages as a problem, so why now, all of a sudden? (It makes me wonder how extensive each crawl is.)
Anyone else had a similar problem?
Regards
GREG
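
For anyone who wants to confirm which pages actually carry the tag discussed in question 1, here is a minimal sketch using only the Python standard library. It looks for a `<meta name="robots">` tag whose content includes `noindex`. The class and function names are illustrative, not part of any Moz tool, and the sketch does not check the `X-Robots-Tag` HTTP header, which can carry the same directive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tag includes 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page markup, for illustration only.
booking_form = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
resource_page = '<html><head><meta name="robots" content="index, follow"></head></html>'
print(has_noindex(booking_form), has_noindex(resource_page))
```

Running this over the URLs listed in the crawl report would show whether the flagged pages really are the noindexed ones.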
-
It's pretty big.
Over 1,000 pages in the index, and many more internal URLs to crawl that have a noindex tag (booking forms, etc.).
I'll see if we can archive our other campaigns and let Roger crawl our main site properly.
-
How big is your website, Greg?
-
Thanks Nakul,
I do a weekly scan with Xenu, which doesn't have a URL limit like SF.
I was under the impression that a full scan of the site was done each week but, as you say, it's being scanned in chunks, divided across our 3 other websites.
If this is the case, it would be great to let Mozbot know where to crawl, to avoid using up resources unnecessarily when it could be scanning our most important pages.
Greg
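
One common way to tell a crawler where not to go is a robots.txt rule aimed at its user-agent. The sketch below assumes Moz's crawler honours robots.txt and identifies itself as `rogerbot`, and the `/booking/` and `/forms/` paths are hypothetical stand-ins for the booking-form URLs mentioned above. It uses Python's standard `urllib.robotparser` to check what such rules would allow before they are deployed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep rogerbot out of the noindexed booking forms
# while leaving every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: rogerbot
Disallow: /booking/
Disallow: /forms/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the rules above."""
    return parser.can_fetch(agent, url)


# example.com is a placeholder domain.
print(allowed("rogerbot", "https://www.example.com/booking/step-1"))
print(allowed("rogerbot", "https://www.example.com/resources/guide"))
print(allowed("Googlebot", "https://www.example.com/booking/step-1"))
```

Note the difference from noindex: a crawler has to fetch a page to see its noindex tag, whereas a Disallow rule stops the fetch altogether, which is what saves crawl budget.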
-
Greg, the crawl is limited to 10,000 pages in total across all 5 of your campaigns.
As for whether Rogerbot should ignore noindex, here's what I think: the intent of the tool is to find issues. In this scenario, Rogerbot is making sure you are aware that some of those pages have a noindex tag. Roger does not know whether it's intentional or not.
You can also do a deeper crawl of your website using Screaming Frog SEO Spider: http://www.screamingfrog.co.uk/seo-spider/ It does a great job of deep crawling on demand, since it's desktop software and you can set all sorts of options to identify issues.
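
To illustrate why a capped crawl can miss pages for months and then suddenly report them, here is a toy breadth-first crawl with a page budget. This is not how Rogerbot is known to work internally; the link graph, the page paths, and the budget values are all made up for the example.

```python
from collections import deque


def crawl_with_budget(link_graph, start, budget):
    """Breadth-first traversal that stops once `budget` pages are crawled."""
    seen = {start}
    queue = deque([start])
    crawled = []
    while queue and len(crawled) < budget:
        page = queue.popleft()
        crawled.append(page)
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return crawled


# Hypothetical site: page -> pages it links to.
SITE = {
    "/": ["/services", "/booking", "/resources"],
    "/services": ["/services/a", "/services/b"],
    "/booking": ["/booking/step-1", "/booking/step-2"],
    "/resources": ["/resources/guide-1", "/resources/guide-2"],
}

small = crawl_with_budget(SITE, "/", budget=6)
full = crawl_with_budget(SITE, "/", budget=100)
print(small)
print(full)
```

With the small budget the crawl stops before it reaches the resource guides. Raising the budget, or excluding the booking pages so they stop consuming it, brings them into the report for the first time.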