Strange Webmaster Tools Crawl Report
-
Until recently my robots.txt blocked Google from crawling my PDF files, which are all manuals for products we sell. Last week I changed it to allow those files, and now my Webmaster Tools crawl report is listing all my PDFs as Not Found.
What is really strange is that Webmaster Tools is listing an incorrect link structure: "domain.com/file.pdf" instead of "domain.com/manuals/file.pdf"
Why is Google crawling these particular files at the wrong URLs? My robots.txt has nothing else in it besides a disallow for an entirely different folder on my server, and my htaccess is not redirecting anything related to my manuals folder either. Even where the crawl report shows outside links supposedly pointing to the 404 URL, when I visit those third-party pages they have the correct link structure.
Hope someone can help, because right now my Not Found count is up in the 500s and that can't be good.
Thanks in advance!
-
Hello,
Did you check the "Linked from" tab? Click on each error and see which pages are linking to that URL.
-
Thanks for the help Wissam!
What I have done is change all relative paths to absolute ones. Then I ran Screaming Frog and it did not pick up any 404s at all. This was last Thursday. Unfortunately Webmaster Tools is still reporting the same style of 404s as having been discovered since then. Is there a reason why Screaming Frog and Webmaster Tools would be seeing different crawl results?
-
Every link reported in GWT is based on a crawl (so there is either an external or an internal link pointing to these domain.com/file.pdf URLs).
So what I would do is fire up Screaming Frog or Xenu, do a full site crawl, and check the reports. You might find some pages linking with relative URLs in their a href elements.
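To see why a relative href can produce the URLs in the report, here is a minimal Python sketch of how a crawler resolves a link against the page it was found on. The page names (index.html, products.html) are made up for illustration; only domain.com, file.pdf and the /manuals/ folder come from the question.

```python
from urllib.parse import urljoin

# Hypothetical pages on the site; only the /manuals/ folder is from the thread.
pages = [
    "http://domain.com/manuals/index.html",
    "http://domain.com/products.html",
]


def resolve(page_url, href):
    """Resolve an href relative to the page it appears on, as a crawler would."""
    return urljoin(page_url, href)


for page in pages:
    # The same relative href lands in a different folder depending on the page.
    print(page, "->", resolve(page, "file.pdf"))
```

The same `file.pdf` href resolves into /manuals/ from a page inside that folder, but to the site root from a page outside it. A root-relative href such as `/manuals/file.pdf`, or a full URL, resolves the same way from every page.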
If you end up in a situation where external links point to the wrong URLs, I would recommend either contacting those sites or simply adding a 301 redirect from /file.pdf to /manuals/file.pdf
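With around 500 files, writing those redirects by hand gets tedious. Below is a rough Python sketch that prints one Apache mod_alias `Redirect 301` line per file, ready to paste into htaccess. It assumes the server is Apache with mod_alias available, that the file names contain no spaces, and that the site is served from http://domain.com; the helper name `redirect_rules` and the second file name are made up for illustration.

```python
def redirect_rules(filenames, folder="/manuals/", site="http://domain.com"):
    """Build one mod_alias `Redirect 301` line per misplaced PDF.

    Assumes file names without spaces; `site` and `folder` are placeholders.
    """
    rules = []
    for name in filenames:
        rules.append(f"Redirect 301 /{name} {site}{folder}{name}")
    return rules


# Hypothetical file names; in practice take them from the crawl report's 404 list.
not_found = ["file.pdf", "other-manual.pdf"]
print("\n".join(redirect_rules(not_found)))
```

Each line sends a request for the root-level URL to the matching file under /manuals/. It is worth testing a couple of them in a browser before adding the whole list.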