Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Why is Google Webmaster Tools showing 404 Page Not Found Errors for web pages that don't have anything to do with my site?
-
I am currently working on a small site with approx 50 web pages. In the crawl error section in WMT Google has highlighted over 10,000 page not found errors for pages that have nothing to do with my site. Anyone come across this before?
-
These extensions look like they are attachments. Go into GWT, click on the 404 link, then a box with pop up, click on the "Linked From" tab.. Go to the page and Ctrl U to see the source code. Do a CTRL F and search for the broken link. When you find it in your source code you should be able to figure out what's triggering that response. If you can't find the URLs in your source code, mark them as fixed and it should take care of the problem. Especially if they are older. It looks like it could be a shipment status, a product out of stock message, and a PDF of train schedules.
I would check the linked from pages and make sure that there isn't some erroneous code that is creating a page when it doesn't need to.
-
I think you could have is the following problem:
You got a Domain wich earned Links bevore and these Links are still out there. Google can see them and you will again and again see them in WMT.
You cant just say solved and than they are gone, caused by the backlinks. They come again.
Check that in WMT - I dont know how it is called in english versions (I just see the german), you can click on the 404 and than take a look at "what is linking to that page"
An easy solution in that case may be to disallow "/fs" for bots. (if you dont use fs in your url structure) -
The 404s start with the correct url for the site but are then suffixed after the forward slash with urls such as;
fs/201410/a_Royal_Mail_Item_Is_Currently_Being_Processed_For_Delivery_.html
/fs/201410/a_When_will_John_lewis_restock_the_Dr_Dre_beats_solo_hd_.html
fs/201410/a_Food_stalls_at_train_stations_in_London_.html
There are 10,000+ of these in my WMT account. I have never seen this before - any ideas?
-
Need more data to help, eh!
-
What do you mean "pages that have nothing to do with my site" are these not on your domain or are they on your domain but you are not familiar with them?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why can't google mobile friendly test access my website?
getting the following error when trying to use google mobile friendly tool: "page cannot be reached. This could be because the page is unavailable or blocked by robots.txt" I don't have anything blocked by robots.txt or robots tag. i also manage to render my pages on google search console's fetch and render....so what can be the reason that the tool can't access my website? Also...the mobile usability report on the search console works but reports very little, and the google speed test also doesnt work... Any ideas to what is the reason and how to fix this? LEARN MOREDetailsUser agentGooglebot smartphone
Technical SEO | | Nadav_W0 -
Robots.txt & meta noindex--site still shows up on Google Search
I have set up my robots.txt like this: User-agent: *
Technical SEO | | RoxBrock
Disallow: / and I have this meta tag in my on a Wordpress site, set up with SEO Yoast name="robots" content="noindex,follow"/> I did "Fetch as Google" on my Google Search Console My website is still showing up in the search results and it says this: "A description for this result is not available because of this site's robots.txt" This site has not shown up for years and now it is ranking above my site that I want to rank for this keyword. How do I get Google to ignore this site? This seems really weird and I'm confused how a site with little content, that has not been updated for years can rank higher than a site that is constantly updated and improved.1 -
404 Error Pages being picked up as duplicate content
Hi, I recently noticed an increase in duplicate content, but all of the pages are 404 error pages. For instance, Moz site crawl says this page: https://www.allconnect.com/sc-internet/internet.html has 43 duplicates and all the duplicates are also 404 pages (https://www.allconnect.com/Coxstatic.html for instance is a duplicate of this page). Looking for insight on how to fix this issue, do I add an rel=canonical tag to these 60 error pages that points to the original error page? Thanks!
Technical SEO | | kfallconnect0 -
How Does Google's "index" find the location of pages in the "page directory" to return?
This is my understanding of how Google's search works, and I am unsure about one thing in specific: Google continuously crawls websites and stores each page it finds (let's call it "page directory") Google's "page directory" is a cache so it isn't the "live" version of the page Google has separate storage called "the index" which contains all the keywords searched. These keywords in "the index" point to the pages in the "page directory" that contain the same keywords. When someone searches a keyword, that keyword is accessed in the "index" and returns all relevant pages in the "page directory" These returned pages are given ranks based on the algorithm The one part I'm unsure of is how Google's "index" knows the location of relevant pages in the "page directory". The keyword entries in the "index" point to the "page directory" somehow. I'm thinking each page has a url in the "page directory", and the entries in the "index" contain these urls. Since Google's "page directory" is a cache, would the urls be the same as the live website (and would the keywords in the "index" point to these urls)? For example if webpage is found at wwww.website.com/page1, would the "page directory" store this page under that url in Google's cache? The reason I want to discuss this is to know the effects of changing a pages url by understanding how the search process works better.
Technical SEO | | reidsteven750 -
We have set up 301 redirects for pages from an old domain, but they aren't working and we are having duplicate content problems - Can you help?
We have several old domains. One is http://www.ccisound.com - Our "real" site is http://www.ccisolutions.com The 301 redirect from the old domain to the new domain works. However, the 301-redirects for interior pages, like: http://www.ccisolund.com/StoreFront/category/cd-duplicators do not work. This URL should redirect to http://www.ccisolutions.com/StoreFront/category/cd-duplicators but as you can see it does not. Our IT director supplied me with this code from the HT Access file in hopes that someone can help point us in the right direction and suggest how we might fix the problem: RewriteCond%{HTTP_HOST} ccisound.com$ [NC] RewriteRule^(.*)$ http://www.ccisolutions.com/$1 [R=301,L] Any ideas on why the 301 redirect isn't happening? Thanks all!
Technical SEO | | danatanseo0 -
Google's "cache:" operator is returning a 404 error.
I'm doing the "cache:" operator on one of my sites and Google is returning a 404 error. I've swapped out the domain with another and it works fine. Has anyone seen this before? I'm wondering if G is crawling the site now? Thx!
Technical SEO | | AZWebWorks0 -
Blank pages in Google's webcache
Hello all, Is anybody experiencing blanck page's in Google's 'Cached' view? I'm seeing just the page background and none of the content for a couple of my pages but when I click 'View Text Only' all of teh content is there. Strange! I'd love to hear if anyone else is experiencing the same. Perhaps this is something to do with the roll out of Google's updates last week?! Thanks,
Technical SEO | | A_Q
Elias0 -
Does Google pass link juice a page receives if the URL parameter specifies content and has the Crawl setting in Webmaster Tools set to NO?
The page in question receives a lot of quality traffic but is only relevant to a small percent of my users. I want to keep the link juice received from this page but I do not want it to appear in the SERPs.
Technical SEO | | surveygizmo0