How to remove URLs blocked by robots.txt from crawl diagnostics
-
I suddenly have a huge jump in the number of errors in crawl diagnostics, and it all seems to be down to a load of URLs that should be blocked by robots.txt. These have never appeared before. How do I remove them or stop them from appearing again?
-
Hi Simon,
A noindex, follow meta tag sounds like the way to go.
It's best to read this first: http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world
Hope this helps.
Justin
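For anyone who wants to confirm that a page actually carries the tag Justin mentions, here is a minimal Python sketch using only the standard library. It reads the directives from a `<meta name="robots">` tag in a page's HTML. The sample markup and the names `RobotsMetaParser` and `robots_directives` are made up for illustration; they are not part of any Moz tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for bare attributes, so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page markup, used only to demonstrate the check.
sample_page = """
<html><head>
<title>Example</title>
<meta name="robots" content="noindex, follow">
</head><body></body></html>
"""

directives = robots_directives(sample_page)
print(sorted(directives))
```

A page that should drop out of the index but still pass link value would be expected to report both `noindex` and `follow`. Note that a crawler can only see this tag if robots.txt does not block it from fetching the page.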
Related Questions
-
Unsolved: Moz Pro crawl signaling missing canonical tags that are not missing?
Hi,
I'm trying Moz Pro and considering using it.
One of the tools I find appealing is the crawl and its insights.
After a quick try, I really question many of the alerts. For instance, I got a "missing canonical tag" alert on this URL: https://vintners.co/wine/grawu_gto#2020 but when I check my markup, there's clearly a canonical tag: <link rel="canonical" href="https://vintners.co/wine/grawu_gto"> Can anybody explain?
I asked Moz Pro staff when being onboarded but didn't get an answer...
Honestly, I'm questioning the value of these crawls, or maybe I'm missing something?
Moz Pro | rolandvintners
-
Special Characters in URL & Google Search Engine (Index & Crawl)
G'day everyone, I need help understanding how special characters impact SEO, e.g. é, ë, ô in words. Does anyone have good insights or reference material regarding the treatment of special characters by the Google search engine?
How are page titles and meta descriptions with special characters indexed and crawled?
What are the best practices when it comes to URLs: use of Unicode and HTML entity references, when and where?
Is there any disadvantage to using special characters?
Do special characters in URLs have any impact on SEO performance, user search, and user experience?
Thanks heaps, Amy
Moz Pro | LabeliumUSA
-
Is it possible to request another unscheduled crawl?
We have just sorted a couple of issues on the website which threw the crawl into spasm and gave us hundreds of hugely long URLs. We are pretty sure that we have corrected this and do not want to wait another week to check what SEOMOZ comes up with. Is there any way that we can request a special crawl of the website so that we are hopefully left with only the legitimate remaining issues?
Moz Pro | dmckenzie456
-
Mozscape API Batching URLs LIMIT
Guys, there's an example of batching URLs using PHP: http://apiwiki.seomoz.org/php What is the maximum number of URLs I can add to that batch?
Moz Pro | Srvwiz
-
Where does the crawler find the urls?
The SEO Moz crawler has found a number of 500 error pages, 404s, etc., which is very useful 🙂 However, some of the URLs are in weird or broken formats that we don't recognise and nobody remembers ever using. They are not weird enough to imply hacking, but suggest something is broken in the CMS. Is there any way to find out where the crawler found these URLs? I can patch up and redirect the end result as best I can, but I would prefer to plug the leak. Thanks 🙂
Moz Pro | Fammy
-
I have corrected the problems in Crawl Diagnostics. When will it refresh / re-crawl my site?
I have corrected most of the problems shown in crawl diagnostics and changed the meta descriptions, titles, etc. When will SEOMOZ recrawl those pages and show that they are correct now?
Moz Pro | VarunBansal
-
Robots review
Anything in this that would have caused Rogerbot to stop indexing my site? It only saw 34 of 5000+ pages on the last pass. It had no problems seeing the whole site before.
User-agent: Rogerbot
Disallow: /default.aspx?*
// Keep from crawling the CMS urls default.aspx?Tabid=234. Real home page is home.aspx
Disallow: /ctl/
// Keep from indexing the admin controls
Disallow: ArticleAdmin
// Keep from indexing article admin page
Disallow: articleadmin
// same in lower case
Disallow: /images/
// Keep from indexing CMS images
Disallow: captcha
// keep from indexing the captcha image which appears to be a page to crawls.
general rules lacking wildcards
User-agent: *
Disallow: /default.aspx
Disallow: /images/
Disallow: /DesktopModules/DnnForge - NewsArticles/Controls/ImageChallenge.captcha.aspx
Moz Pro | sprynewmedia
-
Crawl Errors Confusing Me
The SEOMoz crawl tool is telling me that I have a slew of crawl errors on the blog of one domain. All are related to the MSNbot, and to trackbacks (which we do want to block, right?) and attachments (makes sense to block those, too). Any idea why these are crawl issues with MSNbot and not Google? My robots.txt is here: http://www.wevegotthekeys.com/robots.txt. Thanks, MJ
Moz Pro | mjtaylor