How to remove URLs blocked by robots.txt from crawl diagnostics
-
I suddenly have a huge jump in the number of errors in crawl diagnostics, and it all seems to be down to a load of URLs that should be blocked by robots.txt. These have never appeared before. How do I remove them or stop them appearing again?
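A common reason a crawler reports URLs that robots.txt was meant to block is that the Disallow rules sit under a user-agent group the crawler does not match: each crawler follows the group that names it (or the `*` group), not rules written for a different bot. The sketch below uses Python's standard-library `urllib.robotparser` to check one URL for two user agents. The robots.txt content and URL are hypothetical, and it assumes Moz's crawler identifies itself as `rogerbot`. Python's parser does plain prefix matching (no `*` or `$` wildcards inside paths) and checks named groups before falling back to `*`, so real crawlers may interpret complex files differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /search/ rule is only in the Googlebot group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search/

User-agent: *
Disallow: /cgi-bin/
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


url = "http://www.example.com/search/results?q=widgets"  # hypothetical URL
print(is_blocked(ROBOTS_TXT, "Googlebot", url))  # True
print(is_blocked(ROBOTS_TXT, "rogerbot", url))   # False: only the * group applies
```

Under these assumptions, moving the rule into the `*` group (or adding a `User-agent: rogerbot` group) is what would make it apply to Moz's crawler as well.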
-
Hi Simon,
A noindex, follow meta tag sounds like the way to go.
Best to read this first... http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world
Hope this helps.
Justin
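For reference, the tag Justin suggests is normally written as `<meta name="robots" content="noindex, follow">` in a page's `<head>`: it asks search engines not to index the page but still to follow its links. A crawler can only see this tag on pages it is allowed to fetch, so a robots.txt Disallow on the same URL would hide it. A minimal sketch, using Python's standard-library `html.parser`, that reads the robots directives from a page's HTML; the sample page is hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # Directives are comma-separated and case-insensitive.
            for part in attr_map.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set[str]:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
print(sorted(robots_directives(page)))  # ['follow', 'noindex']
```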
-
Related Questions
-
Is It Necessary to Remove Inbound Links With a Spam Score?
Hi friends, I am new to this community. I just checked my inbound links using the Moz tool and found that some of them have a spam score. Should I remove those links using the disavow tool? I look forward to your replies.
Moz Pro | Flyin.com
-
What to do with a site of more than 50,000 pages, given the crawl limit?
What happens if you have a site in your Moz Pro campaign that has more than 50,000 pages? Would it be better to choose a sub-folder of the site to get a thorough look at that sub-folder? I have a few different large government websites that I'm tracking to see how they are faring in rankings and SEO. They are not my own websites. I want to see how these agencies are doing compared to what the public searches for on technical topics and social issues that the agencies manage. I'm an academic looking at science communication.
I am in the process of setting up my campaigns again to get better data than I have been getting. I am a newbie to SEO, and the campaigns I slapped together a few months ago need to be set up better: all on the same day, set to include www or not for what ranks, with refined keywords, etc. I am stumped on what to do about the agency websites being really huge, and what all the options are for getting good data in light of the 50,000-page crawl limit.
Here is an example of what I mean: to see how EPA is doing in searches related to air quality, ideally I'd track all of EPA's web presence. www.epa.gov has 560,000 pages. If I put in www.epa.gov for a campaign, what happens with the site having so many more pages than the 50,000-page crawl limit? What do I miss out on? Can I "trust" what I get? www.epa.gov/air has only 1,450 pages, so if I choose this for what I track in a campaign, the crawl will cover that sub-folder completely and I will get a complete picture of this air-focused sub-folder. But (1) I'll miss out on air-related pages in other sub-folders of www.epa.gov, and (2) it seems like I'd be leaving most of the 50,000-page crawl limit unused. (However, maybe that's not quite true: I'd also be tracking other sites as competitors, e.g. non-profits that advocate on air quality and industry air-quality sites. Do those competitors count towards the 50,000-page crawl limit, and would they get me up to the limit? How do the competitors you choose figure into the crawl limit?)
Any opinions on what I should do in general in this kind of situation? The small sub-folder vs. the full humongous site, or is there some other way to go here that I'm not thinking of?
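Part of the trade-off is simple arithmetic: with a 50,000-page limit, a crawl of a 560,000-page site can reach at most about 8.9% of its pages (50,000 / 560,000), while the 1,450-page /air sub-folder would use only about 2.9% of the limit (1,450 / 50,000). Scoping by sub-folder also needs to match whole path segments, so that /air does not also pick up a sibling such as /airnow. A small sketch of both points; the helper names and sample URLs are hypothetical, and it says nothing about which 50,000 pages Moz's crawler would actually pick on a larger site.

```python
from urllib.parse import urlsplit


def in_subfolder(url: str, folder: str) -> bool:
    """True if url's path is folder itself or lies beneath it (segment-aware)."""
    path = urlsplit(url).path
    folder = folder.rstrip("/")
    return path == folder or path.startswith(folder + "/")


def crawl_coverage(site_pages: int, crawl_limit: int = 50_000) -> float:
    """Upper bound on the fraction of a site's pages a capped crawl can reach."""
    return min(1.0, crawl_limit / site_pages)


# Hypothetical URLs for illustration only.
urls = [
    "https://www.epa.gov/air/",
    "https://www.epa.gov/air/quality.html",
    "https://www.epa.gov/airnow",          # sibling folder, not /air
    "https://www.epa.gov/water/air.html",  # air-related, but outside /air
]
print([u for u in urls if in_subfolder(u, "/air")])  # first two URLs only
print(round(crawl_coverage(560_000), 3))  # 0.089
print(round(crawl_coverage(1_450), 3))    # 1.0
```

The last URL shows point (1) from the question: a sub-folder campaign misses related pages stored elsewhere on the site.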
Moz Pro | scienceisrad
-
How to remove 404 pages in WordPress
I used the crawl tool and it returned a 404 error for several pages that I no longer have published in WordPress. Are they still on the server somewhere? Do you know how to remove them? I don't think they are files on the server like HTML files, since WordPress uses a database. I figure that getting rid of the 404 errors will improve SEO; is this correct? Thanks, David
Moz Pro | DJDavid
-
SEO Crawl Report Images?
Does the SEOmoz crawl report cover images? Raven Tools is showing me about 200 missing alt tags and title tags, but I cannot find any of this information in the SEOmoz report. Am I missing something?
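To cross-check a report like Raven's, you can list the images on a page that have no `alt` attribute at all. A minimal sketch using Python's standard-library `html.parser`; the sample HTML is hypothetical. It treats an empty `alt=""` as present, because empty alt text is the usual markup for purely decorative images.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Records the src of every <img> tag that has no alt attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            if "alt" not in attr_map:
                self.missing.append(attr_map.get("src") or "(no src)")


def images_missing_alt(html: str) -> list[str]:
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


page = """
<img src="/logo.png" alt="Company logo">
<img src="/divider.gif" alt="">
<img src="/product-123.jpg">
"""
print(images_missing_alt(page))  # ['/product-123.jpg']
```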
Moz Pro | jasonsixtwo
-
URL Encoding
Hi, SEOmoz has finished crawling the site and surprised me with nearly 4k 301s, all of them on my deal pages. An example of one of the 301s:
http://www.economy-car-leasing.co.uk/van-leasing-deals/ford/transit-lease/transit-lwb-el-minibus-diesel-rwd-high-roof-17-seater-tdci-135ps%3D586165
That URL returns a 404, but the URL is actually sent as:
http://www.economy-car-leasing.co.uk/van-leasing-deals/ford/transit-lease/transit-lwb-el-minibus-diesel-rwd-high-roof-17-seater-tdci-135ps=586165
For some reason the SEOmoz crawler is converting the = to %3D and reporting it as a 301, even though it returns a 404. Is this an error on SEOmoz's part, or is there an error on my site? When I use Fetch as Googlebot, every URL comes back with the = sign intact, and every other tool I have tried is fine too, so I'm not sure why SEOmoz sees it differently and then lists the URL as a 301. I am hoping this is just a glitch in the report tool, as I'm already struggling after a recent site-wide 301 redirect.
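In the two URLs above, `%3D` is simply the percent-encoded form of `=`. Under RFC 3986, `=` is a reserved character, and URLs that differ only in replacing a reserved character with its percent-encoded form are not guaranteed to be equivalent, so a server may route one form to the deal page and return a 404 for the other. The sketch below, using Python's `urllib.parse`, shows that a general-purpose encoder such as `quote()` turns `=` into `%3D` and that decoding restores the original; it illustrates the encoding only and makes no claim about how the SEOmoz crawler builds its URLs.

```python
from urllib.parse import quote, unquote, urlsplit

sent = ("http://www.economy-car-leasing.co.uk/van-leasing-deals/ford/transit-lease/"
        "transit-lwb-el-minibus-diesel-rwd-high-roof-17-seater-tdci-135ps=586165")
reported = ("http://www.economy-car-leasing.co.uk/van-leasing-deals/ford/transit-lease/"
            "transit-lwb-el-minibus-diesel-rwd-high-roof-17-seater-tdci-135ps%3D586165")

# quote() leaves "/" alone by default but percent-encodes "=" as %3D.
encoded_path = quote(urlsplit(sent).path)
print(encoded_path == urlsplit(reported).path)  # True

# Decoding the reported URL recovers the URL the site actually links to.
print(unquote(reported) == sent)  # True
```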
Moz Pro | kellymandingo
-
Crawl Diagnostics Summary
Sorry if I am not asking in the right place. The Crawl Diagnostics Summary says this, right? "To get you started quickly Roger is crawling up to 250 pages on your site. You should see these results within two hours. The full crawl will complete within 7 days." A day has passed and it still doesn't show anything. It says "Processing Crawl Data for 358 pages". How long should I wait?
Moz Pro | Dussk
-
Crawl Diagnostic Errors
Hi there, I'm seeing a large number of errors in the SEOmoz Pro crawl results. The 404 errors are for pages that look like this: http://www.example.com/2010/07/blogpost/http:%2F%2Fwww.example.com%2F2010%2F07%2Fblogpost%2F I know that %2F%2F represents the two slashes, but I'm not sure why these addresses are being crawled. The site is a WordPress site. Has anyone seen anything like this?
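The crawled addresses look like an absolute URL that was percent-encoded (`%2F` is `/`) and then resolved as a relative link against the post's own URL; a plugin or theme writing an encoded URL into an `href` is one plausible source, though that is only a guess. A small sketch for spotting such URLs in a crawl export: it decodes each path and flags ones that contain an embedded `http://` or `https://`. The function name is hypothetical, and the sample URLs reuse the question's example.com placeholder.

```python
import re
from urllib.parse import unquote, urlsplit

# Matches an http(s) URL embedded somewhere in a decoded path.
EMBEDDED_URL = re.compile(r"https?://", re.IGNORECASE)


def has_embedded_url(url: str) -> bool:
    """True if the decoded path of url contains another http(s) URL."""
    path = unquote(urlsplit(url).path)
    return bool(EMBEDDED_URL.search(path))


crawled = [
    "http://www.example.com/2010/07/blogpost/",
    "http://www.example.com/2010/07/blogpost/http:%2F%2Fwww.example.com%2F2010%2F07%2Fblogpost%2F",
]
for url in crawled:
    if has_embedded_url(url):
        print("suspicious:", unquote(url))
```

Searching the site's HTML for the decoded target of any flagged URL is a reasonable next step for finding which link produces it.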
Moz Pro | rosstaylor