Strange Webmaster Tools Crawl Report
-
Up until recently, my robots.txt was blocking the indexing of my PDF files, which are all manuals for products we sell. I changed this last week to allow indexing of those files, and now my Webmaster Tools crawl report is listing all of my PDFs as not found.
What is really strange is that Webmaster Tools is listing an incorrect link structure: "domain.com/file.pdf" instead of "domain.com/manuals/file.pdf".
Why is Google indexing these particular pages incorrectly? My robots.txt contains nothing besides a disallow for an entirely different folder on my server, and my .htaccess is not redirecting anything related to my manuals folder either. Even for the outside links that the crawl report says point to these 404 files, when I visit those third-party pages they use the correct link structure.
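For reference, a minimal sketch of the robots.txt described above (the blocked folder name is a placeholder, not the real one):

    User-agent: *
    # The only remaining rule: an entirely different folder is blocked
    Disallow: /private-folder/
    # Removed last week, which opened the PDFs up for crawling:
    # Disallow: /manuals/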
Hope someone can help, because right now my not-found count is up in the 500s, and that can't be good.
Thanks in advance!
-
Hello,
Did you check the "Linked from" tab? Click on each error and see which sites it is linked from.
-
Thanks for the help Wissam!
What I have done is change all relative paths to absolute URLs; then I ran Screaming Frog, and it did not pick up any 404s at all. This was last Thursday. Unfortunately, Webmaster Tools is still reporting the same style of 404s as having been discovered since then. Is there a reason why Screaming Frog and Webmaster Tools would see different crawl results?
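To illustrate the relative-vs-absolute change (file and page names here are hypothetical): a relative href resolves against the URL of the page it sits on, so the same link can work on one page and 404 on another:

    <!-- On a page at domain.com/index.html -->
    <a href="file.pdf">Manual</a>            <!-- resolves to domain.com/file.pdf (404) -->
    <a href="/manuals/file.pdf">Manual</a>   <!-- resolves to domain.com/manuals/file.pdf (correct) -->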
-
All links reported in GWT are based on a crawl, so there is either an external or internal link pointing to domain.com/file.pdf.
So what I would do is fire up Screaming Frog or Xenu, do a full site crawl, and check the reports. You might find some pages linking with relative URLs in their a href elements.
If you end up in a situation where external links point to the wrong URLs, I would recommend either contacting those sites or simply 301-redirecting /file.pdf to /manuals/file.pdf.
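A minimal .htaccess sketch of that redirect, assuming Apache with mod_alias and that every PDF lives in /manuals/ (test before deploying):

    # 301 any root-level PDF request into /manuals/
    RedirectMatch 301 ^/([^/]+\.pdf)$ /manuals/$1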
Related Questions
-
Many errors in Search Console (strange parameters)
Hello, I have many strange parameters in my Search Console that create many 404 pages, for example:
mywebsite.com/article-name/&ct=ga&cd=CAIyGjk4YjY4ZDExNTYxOTgzZTk6Y29tOmVuOlVT&usg=AFQjCNFvpYpYpYf9DoyRBBu8jbiQB8JcIQ
mywebsite.com/article-name/&sa=U&ved=0ahUKEwj1zMLR0JbLAhUGM5oKHejjBJAQqQIILSgAMAk&usg=AFQjCNEBNFx3dG5B0-16X6eXTS7k-Srm6Q
Can someone tell me how to solve this?
Technical SEO | JohnPalmer
-
Bing Webmaster Tools Incompatibility Issues with new Microsoft Edge Browser
Our client received an email from Bing WMT saying "We have identified 4 known issues with your website in Microsoft Edge – the new default browser for Windows 10 and Bing." Of the four problems mentioned, only two seem to be relevant (maybe):
"We've found that this webpage may include HTML markup that treats Microsoft Edge differently from other modern browsers. The new EdgeHTML rendering engine for Microsoft Edge is document-mode agnostic and designed for fast, modern rendering. We recommend that you implement one code base for all modern browsers and include Microsoft Edge as part of your modern browser test matrix."
"We've found that this webpage may have missing vendor-specific prefixes, or may have implemented vendor-specific prefixes when they are not required, in common CSS properties. This may cause compatibility problems with how this webpage renders across different browsers."
Last month the client received 20K visitors from all IE browsers, and this is significant enough to be concerned about. Are other folks making changes to their code to adapt to MS Edge?
Technical SEO | RosemaryB
-
Tool to Generate All the URLs on a Domain
Hi all, I've been using xml-sitemaps.com for a while to generate a list of all the URLs that exist on a domain. However, this tool only works for websites with under 500 URLs, and the paid tool doesn't offer what we are looking for either. I'm hoping someone can help with a recommendation. We're looking for a tool that can:
- Crawl, and list, all the indexed URLs on a domain, including .pdf and .doc files (ideally exported to a .xls or .txt file)
- Crawl multiple domains with unlimited URLs (we have 5 websites with 500+ URLs on them)
Seems pretty simple, but we haven't been able to find something that isn't tailored toward management of a single domain or that can crawl a huge volume of content.
Technical SEO | timfrick
-
Google stopped crawling my site. Everybody is stumped.
This has stumped the WordPress staff and people in the Google Webmasters forum. We are in Google News (have been for years), so new posts are crawled immediately. On Feb 17-18, Crawl Stats dropped 85%, and new posts were no longer indexed (not appearing in News or search). Data Highlighter attempts return "This URL could not be found in Google's index." No manual actions by Google. No changes to the website; no custom CSS. No site errors or new URL errors. No sitemap problems (resubmitting didn't help). We're on wordpress.com, so no odd code. We can see the robots.txt file. Other search engines can see us, as can social media websites. Older posts still index, but the loss of News is a big hit, and I think overall Google referrals are dropping. We can Fetch the URL for a new post, and many hours later it appears on Google and News, and we can then use Data Highlighter. It's now 6 days with no recovery. Everybody is stumped. Any ideas? I just joined, so this might be the wrong venue. If so, apologies.
Technical SEO | Editor-FabiusMaximus_Website
-
Duplicate content on report
Hi, I just had my Moz campaign scan 10K pages, of which 2K were flagged as duplicate content. The URLs are:
http://www.Somesite.com/modal/register?destination=question%2F37201
http://www.Somesite.com/modal/register?destination=question%2F37490
and the title for all 2K is "Register". How can I deal with this? All my pages have the register and login links, and when you're done it comes back to the same page where you left off, so it is not actually duplicate content, but we need to deal with it properly. Thanks
Technical SEO | mtthompsons
-
SEO basics for Q&A tool
Hi everyone, our company wants to launch a Q&A forum on our website. The goal is to keep users interacting with our website, generate leads (of course) and... last but not least... generate UGC for our website (and Google, of course). [We organise career events with big companies for students and professionals, give career advice, etc.] From an SEO perspective, I find the following points difficult to overcome:
- The possible problem of "thin" content: many URLs with a question and only 1 or 2 answers will not look good to Google, especially when there are a lot of them (Panda update). One solution could be to noindex pages with thin content, but imagine you have an active community; this could take ages, and we have other things to do...
- The problem of finding ALL content: what would be the best solution to make sure that Google finds all UGC, even the older content? Would it be enough to link to older questions on the page of the current question? Let's say this page contains links to the 5 questions before it, and so on... Or should there be categories of questions, where you list all of the questions ever asked?
- Would you/can one optimise the content? Users do not ask questions with the beloved keywords, and if there were a standard solution where the URL and title tag contain the question, there could be a lot of strange/not useful pages on our domain...
I hope I could make clear what my problems are, and I hope someone can give me some good advice... Thanks!!
Technical SEO | accessKellyOCG
-
Should we block URL param in Webmaster tools after URL migration?
Hi, We have just released a new version of our website that now has human-readable, nice URLs. Our old, ugly URLs are still accessible and cannot be blocked or redirected. These old URLs use a URL param with an XPath-like expression language to define the location in our catalog. We have about 2 million pages indexed with this old URL param, while we have approximately 70K nice URLs after the migration. This high number of old URLs is due to faceting that was done using this URL param. I wonder if we should now completely block this URL param in Google Webmaster Tools so that these ugly URLs will be removed from the Google index, or will this harm our position in Google? Thanks, Chris
Technical SEO | eCommerceSEO
-
Website Grader Report - Permanent Redirect Not Found
Have you ever checked HubSpot's Website Grader at www.websitegrader.com? I usually notice that the tool gives an error, namely "Permanent Redirect Not Found", with the explanation below: "Search engines may think www.example.com and example.com are two different sites. You should set up a permanent redirect (technically called a "301 redirect") between these sites. Once you do that, you will get full search engine credit for your work on these sites." (Website Grader) Can we trust this tool?
Technical SEO | merkal2005