Why do I have so many extra indexed pages?
-
Stats-
Webmaster Tools Indexed Pages- 96,995
Site: Search- 97,800 Pages
Sitemap Submitted- 18,832
Sitemap Indexed- 9,746
I went through the search results through page 28 and every item it showed was correct. How do I figure out where these extra 80,000 items are coming from? I tried crawling the site with screaming frog awhile back but it locked because of so many urls. The site is a Magento site so there are a million urls, but I checked and all of the canonicals are setup properly. Where should I start looking?
-
It ended up being my search results. I was able to use the site operator to break it down.
-
To ensure Screaming Frog can handle the crawl you could chunk up the site and crawl it in parts, e.g. by each subdirectory. This can be done within the 'configuration' menu under 'include'. There's loads of tutorials online.
You can also use exclude to ensure it doesn't crawl unnecessary pages, images or scripts for example on wordpress I often block wp-content
Definitely sounds like a problem with query parameters being indexed though and its often good to ensure these are addressed in the search console.
-
1. Your first one is interesting. I actually haven't been in there before. There are 96 rows and everyone of them is set to let Googlebot Decide. Do you think I should change that up?
2. Not sure on how many images we have but it is a lot. Not we do not have an image sitemap.
I tried Screaming Frog and it couldn't handle it. After about 1.5 million urls it kept locking up. I just setup a free trial for Deep Crawl. It can only do 10,000 but I will see if it has anything worthwhile.
-
- Have you checked out the parameters settings in Google Search Console to find out how many pages Google has found for your site with the same parameters? That might give some insights on that side.
- How many images do you have across the site? Do you have image sitemaps for these kind of pages.
What I would advise + what you've already been trying is to get a full crawl by either using ScreamingFrog or Deepcrawl. This will provide you with better insights into how many pages a search engine can really find.
-
I wouldn't say it is doing fine. Before I started they launched a new site and messed up the 301 redirects. Traffic hasn't recovered yet.
For Robots I am using the Inchoo robots.txt-http://inchoo.net/ecommerce/ultimate-magento-robots-txt-file-examples/ maybe it is a parameters issue, but I can't figure out how to see all my indexed pages.
I tried doing a search for both inurl:= site:www.site.com and inurl:? site:www.site.com and nothing showed up unless I am missing something.
I can't figure out how to check if some of the canonicalized urls are indexed. The pages are all identical though.
We have less then 100 out of stock items.
-
As long as your organic traffic is doing fine I shouldn't be too concerned. That being said:
- Is your robots.txt or search console disallowing crawler access to parameters like '?count=' or '?color='?
- Is your robots.txt disallowing crawler access to urls that have a 'noindex' but were indexed before they got noindex?
- You can also take a couple of parameters from your site and test if any url's have been indexed, by using the 'inurl:parameter site:www.site.com' query.
- Are some of the canonicalized urls indexed anyway? This may indicate that page content is different enough for Google to index both versions.
- If there's a ton of articles that go in and out of stock and use dynamic ID's, Google may keep these in their index. Do out of stock articles return a 404 or are they kept alive?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
May integrating my main category page in the index page improve my ranking of main category keyword?
90% of our sales are made with products in one of our product categories.
Intermediate & Advanced SEO | | lcourse
A search for main category keyword returns our root domain index page in google, not the category page.
I was wondering whether integrating the complete main category directly in the index page of the root domain and this way including much more relevant content for this main category keyword may have a positive impact on our google ranking for the main category keyword. Any thoughts?1 -
Moving html site to wordpress and 301 redirect from index.htm to index.php or just www.example.com
I found page duplicate content when using Moz crawl tool, see below. http://www.example.com
Intermediate & Advanced SEO | | gozmoz
Page Authority 40
Linking Root Domains 31
External Link Count 138
Internal Link Count 18
Status Code 200
1 duplicate http://www.example.com/index.htm
Page Authority 19
Linking Root Domains 1
External Link Count 0
Internal Link Count 15
Status Code 200
1 duplicate I have recently transfered my old html site to wordpress.
To keep the urls the same I am using a plugin which appends .htm at the end of each page. My old site home page was index.htm. I have created index.htm in wordpress as well but now there is a conflict of duplicate content. I am using latest post as my home page which is index.php Question 1.
Should I also use redirect 301 im htaccess file to transfer index.htm page authority (19) to www.example.com If yes, do I use
Redirect 301 /index.htm http://www.example.com/index.php
or
Redirect 301 /index.htm http://www.example.com Question 2
Should I change my "Home" menu link to http://www.example.com instead of http://www.example.com/index.htm that would fix the duplicate content, as indx.htm does not exist anymore. Is there a better option? Thanks0 -
Indexing Dynamic Pages
http://www.oreillyauto.com/site/c/search/Wiper+Blade/03300/C0047.oap?make=Honda&model=Accord&year=2005&vi=1430764 How is O'Reilly getting this page indexed? It shows up in organic results for [2005 honda accord windshield wiper size].
Intermediate & Advanced SEO | | Kingof50 -
Any downsides of (permanent)redirecting 404 pages to more generic pages(category page)
Hi, We have a site which is somewhat like e-bay, they have several categories and advertisements posted by customers/ client. These advertisements disappear over time and turn into 404 pages. We have the option to redirect the user to the corresponding category page, but we're afraid of any negative impact of this change. Are there any downsides, and is this really the best option we have? Thanks in advance!
Intermediate & Advanced SEO | | vhendriks0 -
Duplicate Page Title/Content Issues on Product Review Submission Pages
Hi Everyone, I'm very green to SEO. I have a Volusion-based storefront and recently decided to dedicate more time and effort into improving my online presence. Admittedly, I'm mostly a lurker in the Q&A forum but I couldn't find any pre-existing info regarding my situation. It could be out there. But again, I'm a noob... So, in my recent SEOmoz report I noticed that over 1,000 Duplicate Content Errors and Duplicate Page Title Errors have been found since my last crawl. I can see that every error is tied to a product in my inventory - specifically each product page has an option to write a review. It looks like the subsequent page where a visitor can fill out their review is the stem of the problem. All of my products are shown to have the same issue: Duplicate Page Title - Review:New Duplicate Page Content - the form is already partially filled out with the corresponding product My first question - It makes sense that a page containing a submission form would have the same title and content. But why is it being indexed, or crawled (or both for that matter) under every parameter in which it could be accessed (product A, B, C, etc)? My second question (an obvious one) - What can I do to begin to resolve this? As far as I know, I haven't touched this option included in Volusion other than to simply implement it. If I'm missing any key information, please point me in the right direction and I'll respond with any additional relevant information on my end. Many thanks in advance!
Intermediate & Advanced SEO | | DakotahW0 -
To land page or not to land page
Hey all, I wish to increase my sites rankings on a variety of keywords within sub categories but I'm unsure where to be spending the time in SEO. Here's an example of the website page structure: General Home Page > Sub Category 1 Home Page
Intermediate & Advanced SEO | | DPSSeomonkey
> Searching / Results pages
- Sub Category 1
- Sub Category 2
- Sub Category 3
- Sub Category 4 > Sub Category 2 Home Page
> Searching / Results pages
- Sub Category 1
- Sub Category 2
- Sub Category 3
- Sub Category 4 We've newly introduced the Sub Category Home Pages and I was wondering if SEO is best performed on these pages or should landing pages be built, one for each of the 4 sub categories in each section. Those landing pages would have links to the "Searching / Results pages" for that sub category. Thanks!0 -
Too many on page links - product pages
Some of the pages on my client's website have too many on page links because they have lists of all their products. Is there anything I should/could do about this?
Intermediate & Advanced SEO | | AlightAnalytics0 -
Pop Up Pages Being Indexed, Seen As Duplicate Content
I offer users the opportunity to email and embed images from my website. (See this page http://www.andertoons.com/cartoon/6246/ and look under the large image for "Email to a Friend" and "Get Embed HTML" links.) But I'm seeing the ensuing pop-up pages (Ex: http://www.andertoons.com/embed/5231/?KeepThis=true&TB_iframe=true&height=370&width=700&modal=true and http://www.andertoons.com/email/6246/?KeepThis=true&TB_iframe=true&height=432&width=700&modal=true) showing up in Google. Even worse, I think they're seen as duplicate content. How should I deal with this?
Intermediate & Advanced SEO | | andertoons0