Strange Webmaster Tools Crawl Report
-
Up until recently I had robots.txt blocking the indexing of my pdf files which are all manuals for products we sell. I changed this last week to allow indexing of those files and now my webmaster tools crawl report is listing all my pdfs as not founds.
What is really strange is that Webmaster Tools is listing an incorrect link structure: "domain.com/file.pdf" instead of "domain.com/manuals/file.pdf"
Why is google indexing these particular pages incorrectly? My robots.txt has nothing else in it besides a disallow for an entirely different folder on my server and my htaccess is not redirecting anything in regards to my manuals folder either. Even in the case of outside links present in the crawl report supposedly linking to this 404 file when I visit these 3rd party pages they have the correct link structure.
Hope someone can help because right now my not founds are up in the 500s and that can't be good
Thanks is advance!
-
Hello,
Did you check the "linked From" tab? click on each error and see which are the sites that are linked from
-
Thanks for the help Wissam!
What I have done is changed all relative paths to direct- then I ran screaming frog and it did not pick up any 404s at all - this was last Thursday. Unfortunately webmaster tools is still reporting the same style 404s having been discovered since then. Is there a reason why screaming frog and webmaster tools would be seeing different crawl results?
-
all link reported in the GWT is based on a crawl.( so there is either an external or internal link pointing to these.com/file.pdf)
So what i would do is fire up Screaming Frog or Xenu and do a full site crawl and check the reports. You might find some pages linking or using relative urls in the a href elements.
If you land into a situation where you have external links pointing to wrong URLS I would recommend either by contacting them or just 301 /file.pdf to /manuals/file.pdf
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
No index and Crawl Budget
Hello, If we noindex pages, will it improve crawl budget ? For example pages like these - https://x-z.com/2012/10/
Technical SEO | | Johnroger
https://x-y.com/2012/06/
https://x-y.com/2013/03/
https://x-y.com/2019/10/
https://x-y.com/2019/08/ Should we delete/redirect such pages ? Thanks0 -
Ignore these external links reported in GWT?
Taking a long, Ace-Ventura-like breath here. This question is loaded. Here we go: No manual actions reported against my client's site in GWT HOWEVER, a link: operator search for external links to my client's website shows NO links in results. That seems like a very bad omen to me. OSE shows 13 linking domains, but not the one that's listed in the next bullet point. Issue: In Google Webmaster Tools, I noticed 1,000+ external links to my client's website all coming from riddim-donmagazine.com (there are a small handful of other domains listed, but this one stuck out for the large quantity of links coming from this domain) Those external links all point to two URLs on my client's website. I have no knowledge of any campaigns run by client that would use this other domain (or any schemes for that matter) It appears that this website riddim-donmagazine.com has been suspended by hostgator All of the links were first discovered last year (dates vary but basically August through December 2013) There have not been any newly discovered links from this website reported by GWT since those 2013 dates All of the external links are /? based. Example: http://riddim-donmagazine.com/?author=1&paged=31 If I run that link in preceding bullet point through http://www.webconfs.com/http-header-check.php, or any others from riddim-donmagazine.com those external links return 302 status. My best guess is at one time the client was running an advertising program and this website may have been on that network. One of the external links points to an ad page on the client's website.
Technical SEO | | EEE3
(web.archive.org confirms this is a WordPress site and that it's coverage of Bronx news could trigger an ad for my client or make it related to my client's website when it comes to demographics.) Believe me, this externally linked domain is only a small problem in comparison with the rest of my client's issues (mainly they've changed domains, then they changed website vendors, etc., etc.), but I did want to ask about that one externally linked domain. Whew.Thanks in advance for insights/thoughts/advice!0 -
Google Disavow Tool
Some background: My rankings have been wildly fluctuating for the past few months for no apparent reason. When I inquired about this, many people said that even though I haven't received any penalty notice, I was probably affected by penguin. (http://moz.com/community/q/ranking-fluctuations) I recently did a link detox by LinkRemovalTools and it gave me a list of all my links, 2% were toxic and 51% were suspiscious. Should I simply disavow the 2%? There are many sites where is no contact info.
Technical SEO | | EcomLkwd0 -
Google Webmasters News Errors ressolution
Hello to the community, i had a sudden increase from just a couple to 50 someting Google Webmaster News Errors. The two areas affected are Content of article and date of article.I found a very good article in SEOMoz about Google Webmasters, but it was published before the changes early last year were done in Google Webmasters. http://www.seomoz.org/blog/how-to-fix-crawl-errors-in-google-webmaster-tools The people that have been asking the same question in the internet have not yet received replies from Google and the Google support replies dont make it really clear. http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93994 Any views experiences with this. My site is in Google News, but we do not have a Google News Sitemap. Thanks, Polar
Technical SEO | | Polarstar0 -
Which are the best website reporting tools for speed and errors
Hi, i have just made some changes on my site by adding some redirects and i want to know what affect this has had on my site www.in2town.co.uk I have been using tools such as Pingdom tools, http://gtmetrix.com but these always give me different time reports so i do not know the true speed of my site and give me different advice. So i am wanting to know how to check the true speed for my site in the UK and how to check for the errors to make it better any advice would be great on which tools to use
Technical SEO | | ClaireH-1848860 -
Are there any tools that will detect when new pages are added?
need help with keeping track of when new pages pop up that might compete internally with already optimized pages.
Technical SEO | | SEOmoxy0 -
How do i Organize an XML Sitemap for Google Webmaster Tools?
OK, so i used am xlm sitemap generator tool, xml-sitemaps.com, for Google Webmaster Tools submission. The problem is that the priorities are all out of wack. How on earth do i organize it with 1000's of pages?? Should i be spending hours organizing it?
Technical SEO | | schmeetz0 -
What tool do you use to check for URLs not indexed?
What is your favorite tool for getting a report of URLs that are not cached/indexed in Google & Bing for an entire site? Basically I want a list of URLs not cached in Google and a seperate list for Bing. Thanks, Mark
Technical SEO | | elephantseo3