Why do I have so many extra indexed pages?
-
Stats:
Webmaster Tools Indexed Pages: 96,995
Site: Search: 97,800 pages
Sitemap Submitted: 18,832
Sitemap Indexed: 9,746
I went through the search results up to page 28 and every item shown was correct. How do I figure out where these extra 80,000 items are coming from? I tried crawling the site with Screaming Frog a while back, but it locked up because of so many URLs. The site is a Magento site, so there are a million URLs, but I checked and all of the canonicals are set up properly. Where should I start looking?
-
It ended up being my search results. I was able to use the site operator to break it down.
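For anyone wanting to repeat that breakdown, here is a minimal sketch that builds one site: query per section so each can be pasted into Google and its result count compared. The host and the section paths (typical Magento paths such as catalogsearch) are illustrative assumptions, not the actual site's structure.

```python
def site_queries(host, sections):
    """Build one 'site:' query per section of the site.

    `host` and `sections` are hypothetical placeholders; swap in the
    real hostname and the top-level paths of the site being checked.
    """
    return [f"site:{host}/{section.strip('/')}/" for section in sections]


if __name__ == "__main__":
    # Assumed example sections; Magento's internal search normally
    # lives under /catalogsearch/.
    for query in site_queries(
        "www.site.com", ["catalogsearch", "catalog/product/view", "checkout"]
    ):
        print(query)
```

A section whose result count is far larger than the number of real pages it should contain is the place to look first.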
-
To make sure Screaming Frog can handle the crawl, you could chunk up the site and crawl it in parts, e.g. one subdirectory at a time. This can be done within the 'configuration' menu under 'include'. There are loads of tutorials online.
You can also use 'exclude' to make sure it doesn't crawl unnecessary pages, images or scripts. For example, on WordPress I often block wp-content.
It definitely sounds like a problem with query parameters being indexed, though, and it's often good to make sure these are addressed in Search Console.
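If you have any list of URLs to hand (a partial crawl export, a server log, the sitemap), a rough tally by top-level directory and by query parameter name can show which sections to crawl separately and which parameters are multiplying URLs. This is a minimal sketch using only the Python standard library; the sample URLs are made up for illustration.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def summarize(urls):
    """Count URLs per top-level directory and per query parameter name."""
    sections = Counter()
    params = Counter()
    for url in urls:
        parts = urlsplit(url)
        # First non-empty path segment, e.g. "catalogsearch".
        segments = [s for s in parts.path.split("/") if s]
        sections[segments[0] if segments else "(root)"] += 1
        # Count each parameter name once per occurrence in the URL.
        for name, _value in parse_qsl(parts.query, keep_blank_values=True):
            params[name] += 1
    return sections, params


if __name__ == "__main__":
    # Hypothetical sample; replace with your own URL list.
    sample = [
        "http://www.site.com/catalogsearch/result/?q=boots",
        "http://www.site.com/catalogsearch/result/?q=red+boots&p=2",
        "http://www.site.com/shoes/boots.html?color=red",
        "http://www.site.com/shoes/boots.html",
        "http://www.site.com/",
    ]
    sections, params = summarize(sample)
    print(sections.most_common())
    print(params.most_common())
```

The directories with the highest counts are natural candidates for separate 'include' crawls, and the most frequent parameter names are the ones worth checking in the parameter settings.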
-
1. Your first one is interesting. I actually haven't been in there before. There are 96 rows and every one of them is set to Let Googlebot Decide. Do you think I should change that?
2. I'm not sure how many images we have, but it is a lot. No, we do not have an image sitemap.
I tried Screaming Frog and it couldn't handle it. After about 1.5 million URLs it kept locking up. I just set up a free trial for DeepCrawl. It can only do 10,000 URLs, but I will see if it turns up anything worthwhile.
-
- Have you checked the parameter settings in Google Search Console to find out how many pages Google has found on your site for each parameter? That might give you some insight on that side.
- How many images do you have across the site? Do you have image sitemaps for these kinds of pages?
What I would advise, and what you've already been trying, is to get a full crawl using either Screaming Frog or DeepCrawl. This will give you better insight into how many pages a search engine can really find.
-
I wouldn't say it is doing fine. Before I started they launched a new site and messed up the 301 redirects. Traffic hasn't recovered yet.
For robots.txt I am using the Inchoo version (http://inchoo.net/ecommerce/ultimate-magento-robots-txt-file-examples/). Maybe it is a parameters issue, but I can't figure out how to see all my indexed pages.
I tried doing a search for both inurl:= site:www.site.com and inurl:? site:www.site.com and nothing showed up unless I am missing something.
I can't figure out how to check if some of the canonicalized urls are indexed. The pages are all identical though.
We have fewer than 100 out-of-stock items.
-
As long as your organic traffic is doing fine, I wouldn't be too concerned. That being said:
- Is your robots.txt or Search Console disallowing crawler access to parameters like '?count=' or '?color='?
- Is your robots.txt disallowing crawler access to URLs that carry a 'noindex' but were indexed before the noindex was added? If crawlers are blocked, they never get to see the noindex.
- You can also take a couple of parameters from your site and test whether any URLs containing them have been indexed, by using the 'inurl:parameter site:www.site.com' query.
- Are some of the canonicalized URLs indexed anyway? This may indicate that the page content is different enough for Google to index both versions.
- If there are a ton of articles that go in and out of stock and use dynamic IDs, Google may keep these in its index. Do out-of-stock articles return a 404 or are they kept alive?
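To sanity-check the first point, here is a simplified sketch of how Google-style robots.txt patterns (with `*` as a wildcard and a trailing `$` as an end anchor) match against a URL path. It is an assumption-laden approximation: it only looks at Disallow patterns and ignores Allow rules, rule precedence and user-agent groups, and the rules shown are hypothetical, not taken from the Inchoo file.

```python
import re


def rule_to_regex(rule):
    """Translate one Disallow pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally from the start of the path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))


def is_disallowed(path, disallow_rules):
    """Return True if any Disallow pattern matches the path (plus query)."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


if __name__ == "__main__":
    # Hypothetical rules mirroring the '?count=' and '?color=' examples.
    rules = ["/*?count=", "/*?color="]
    for path in [
        "/shoes.html?count=36",
        "/shoes.html?size=9&color=red",
        "/shoes.html",
    ]:
        print(path, is_disallowed(path, rules))
```

Note that a pattern like `/*?color=` only matches when the parameter comes first in the query string; a URL such as `/shoes.html?size=9&color=red` slips through, so a second pattern for the `&color=` form would be needed to cover it.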