Strange Webmaster Tools Crawl Report
-
Up until recently I had robots.txt blocking the indexing of my pdf files which are all manuals for products we sell. I changed this last week to allow indexing of those files and now my webmaster tools crawl report is listing all my pdfs as not founds.
What is really strange is that Webmaster Tools is listing an incorrect link structure: "domain.com/file.pdf" instead of "domain.com/manuals/file.pdf"
Why is google indexing these particular pages incorrectly? My robots.txt has nothing else in it besides a disallow for an entirely different folder on my server and my htaccess is not redirecting anything in regards to my manuals folder either. Even in the case of outside links present in the crawl report supposedly linking to this 404 file when I visit these 3rd party pages they have the correct link structure.
Hope someone can help because right now my not founds are up in the 500s and that can't be good
Thanks is advance!
-
Hello,
Did you check the "linked From" tab? click on each error and see which are the sites that are linked from
-
Thanks for the help Wissam!
What I have done is changed all relative paths to direct- then I ran screaming frog and it did not pick up any 404s at all - this was last Thursday. Unfortunately webmaster tools is still reporting the same style 404s having been discovered since then. Is there a reason why screaming frog and webmaster tools would be seeing different crawl results?
-
all link reported in the GWT is based on a crawl.( so there is either an external or internal link pointing to these.com/file.pdf)
So what i would do is fire up Screaming Frog or Xenu and do a full site crawl and check the reports. You might find some pages linking or using relative urls in the a href elements.
If you land into a situation where you have external links pointing to wrong URLS I would recommend either by contacting them or just 301 /file.pdf to /manuals/file.pdf
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Crawl Results
How fresh is SEOMOZ crawl results ?. On my report for today I can see that my website ranking for several keywords run manually and individually on Google, Yahoo and bing to be better than the actual SEOMOZ report. Also have been noticing that Back link count on SEOMOZ report to be significantly less than counted with other sites and software.Can someone advise me on this ?
Technical SEO | | sherohass0 -
Sitemap Generator Tool
We have developed a very large domain with well over 500 pages that need to be indexed. The tool we usually use to create a sitemap has a limit of 500 pages. Does anyone know of good tool we can use to create a sitemap text and xml that doesn't have a limit of pages? Thanks!
Technical SEO | | TracSoft0 -
CDN Being Crawled and Indexed by Google
I'm doing a SEO site audit, and I've discovered that the site uses a Content Delivery Network (CDN) that's being crawled and indexed by Google. There are two sub-domains from the CDN that are being crawled and indexed. A small number of organic search visitors have come through these two sub domains. So the CDN based content is out-ranking the root domain, in a small number of cases. It's a huge duplicate content issue (tens of thousands of URLs being crawled) - what's the best way to prevent the crawling and indexing of a CDN like this? Exclude via robots.txt? Additionally, the use of relative canonical tags (instead of absolute) appear to be contributing to this problem as well. As I understand it, these canonical tags are telling the SEs that each sub domain is the "home" of the content/URL. Thanks! Scott
Technical SEO | | Scott-Thomas0 -
Google Webmasters News Errors ressolution
Hello to the community, i had a sudden increase from just a couple to 50 someting Google Webmaster News Errors. The two areas affected are Content of article and date of article.I found a very good article in SEOMoz about Google Webmasters, but it was published before the changes early last year were done in Google Webmasters. http://www.seomoz.org/blog/how-to-fix-crawl-errors-in-google-webmaster-tools The people that have been asking the same question in the internet have not yet received replies from Google and the Google support replies dont make it really clear. http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93994 Any views experiences with this. My site is in Google News, but we do not have a Google News Sitemap. Thanks, Polar
Technical SEO | | Polarstar0 -
DNS error on webmaster tool
Google webmaster tool is showing DNS error and that is leading to many server error (502,500) almost 50+ in every crawl. Recently Google crawled one of our sub domains that we did not want google to crawl. We blocked it via Robots.txt and also removed all the URL's and since then we are having this issue. Any suggestions how to fix this DNS error? Thanks in advance.
Technical SEO | | tpt.com0 -
Wordpress & use of 'www' vs not for webmaster tools - explanation needed
I am having a hard time understanding the issue of canonization of site pages, specifically in regards to the 'www' or 'non-www' versions of a site. And specifically in regards to wordpress. I can see that it doesn't matter whether you type in 'www' or not in the url for a wordpress site, what is going on in the back end that allows this? When I link up to google webmaster tools, should i use www or not? thanks for any help d
Technical SEO | | dnaynay0 -
301 mistake in Google Webmaster Tools?
Google webmaster tools has a warning for our site map saying that this url (and a couple of others) have a 301 redirect in them. http://www.aquinasandmore.com/catholic-gifts/Immaculate-Heart-of-Mary-Bookmark/sku/59682 I've checked the link and don't see that it actually is redirecting. Any thoughts on why this is popping up?
Technical SEO | | IanTheScot0 -
Blocking Google from Crawling Parameters
Hi guys: What is the best way to keep Google from crawling certain urls with parameters? I used the setting in Webmaster Tools, but that doesn't seem to be helping at all. Can I use robots.txt or some other method? Thanks! Some examples are: <colgroup><col width="797"></colgroup> www.mayer-johnson.com/category/assistive-technology?manufacturer=179 www.mayer-johnson.com/category/assistive-technology?manufacturer=226 www.mayer-johnson.com/category/assistive-technology?manufacturer=227 <colgroup><col width="797"></colgroup> www.mayer-johnson.com/category/english-language-learners?condition=212 www.mayer-johnson.com/category/english-language-learners?condition=213 www.mayer-johnson.com/category/english-language-learners?condition=214 <colgroup><col width="797"></colgroup>
Technical SEO | | DanaDV
| www.mayer-johnson.com/category/english-language-learners?roles=164 |
| www.mayer-johnson.com/category/english-language-learners?roles=165 |
| www.mayer-johnson.com/category/english-language-learners?roles=197 | | |0