Index bloating issue
-
Hello,
In the last month, I noticed a huge spike in the number of pages indexed on my site, which I think is impacting my SEO quality score.
While I only have about 90 pages in my sitemap, the number of pages indexed jumped to 446, with about 536 pages blocked by robots.txt. At first we thought this might be due to duplicate product pages showing up in different categories on my site, so we added rules to our robots.txt file to keep those pages from being indexed, but the number has not gone down. I've tried to consult our hosting vendor, but no one seems concerned or has any idea why there was such a big jump in the last month.
Any insights or pointers would be so greatly appreciated, so that I can fix/improve my SEO as quickly as possible!
Thanks!
-
To determine whether your website has been hacked, this is one of the best tools I know of, both for finding malware and for removing it.
To determine whether you have on-site SEO problems at a very technical, granular level, I would use
https://www.deepcrawl.com/ At $80 a month, you cannot go wrong.
Another amazing tool, free for the first 500 pages and only about $150 a year if you want the added features (which you do) or more pages, is
-
Thank you. These are helpful suggestions.
-
A couple of things to note:
- As Robert mentioned, I would definitely make sure there is no longer an issue on your WordPress site relating to your previous hack.
- A robots.txt disallow does not stop pages from being indexed. It only tells search engines to stop crawling those pages from here on. The meta noindex tag is the better fit for removing pages that are already in the index, but search engines have to be able to crawl a page to see the tag, so a page that is disallowed in robots.txt will not have its noindex read.
- I would check your Search Console crawl errors to see if there's a hefty spike in 404 errors as well, as these may be old spam pages you removed from the site.
- If the pages bloating your index are all old spam-filled pages from when you were hacked, you could start with Search Console's "Remove URLs" tool, which removes those URLs from the index temporarily. For a longer-term fix, instead of returning a 404 for removed pages, have the server return a 410 (Gone) response. That tells Google they are gone for good, and they will be dropped from the index over time.
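To make the robots.txt versus noindex point concrete, here is a rough Python sketch that checks whether a page is blocked from crawling and whether it carries a meta noindex tag. The function names and the example URLs in the usage note are made up for illustration, and the standard library parser does simple prefix matching, so it will not interpret wildcard rules exactly as Google does. It also ignores the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content looks like "noindex, follow"; split it into directives
            self.directives.extend(
                part.strip().lower() for part in attr.get("content", "").split(",")
            )


def has_noindex(html):
    """True if the HTML contains a robots meta tag with a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def crawl_allowed(robots_txt, url, agent="Googlebot"):
    """True if the given robots.txt text lets the agent fetch the URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(agent, url)


def noindex_can_take_effect(robots_txt, url, html):
    """A noindex tag only helps if the crawler is allowed to fetch the page."""
    return crawl_allowed(robots_txt, url) and has_noindex(html)
```

With a robots.txt containing `Disallow: /category/`, a page under `/category/` that has a noindex tag would come back `False` from `noindex_can_take_effect`, which is the situation where the disallow rule has to be lifted before the noindex can do anything.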
-
When I do the search for my main URL, the results are clean: just the pages of my site show up. Yet the index count for this site is still bloated. However, for my WordPress site, which is a subdomain and on a different platform from my main site, there are some issues (it was hacked, as Rob noted below). But we have since cleaned up the pages, re-uploaded the sitemaps, etc. So I'm a little stumped on my main site (which wasn't hacked, that I'm aware of).
-
What do you see if you do a search for site:yoursite.com ?
-
Hello Julie,
This sounds like you might have a hacking issue on your website. You probably need someone to conduct a full code audit of your site to determine whether any files you have uploaded (plugins, for example) were contaminated. If a site is hacked, new pages can be added that are hidden from view and difficult to detect unless handled by a security specialist.
We recently brought on a new client who had this issue and discovered that his site had thousands of pages dedicated to testosterone pills, etc. We had to go through Google Webmaster Tools (GWT) and the site logs to determine what new pages were created, and it was a complete hack job.
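As a rough idea of what going through the logs can look like, here is a minimal Python sketch that counts successfully served paths that are not on your list of expected pages. It assumes access logs in Common or Combined Log Format; the `unknown_paths` name and the `known_paths` set (for example, the paths from your sitemap) are hypothetical, not part of any tool.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common/Combined Log Format line.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')


def unknown_paths(log_lines, known_paths):
    """Count paths served with a 200 status that are not in known_paths."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or match.group("status") != "200":
            continue
        # Ignore the query string so variants of one page are counted together.
        path = match.group("path").split("?", 1)[0]
        if path not in known_paths:
            hits[path] += 1
    return hits
```

Paths that show up here with many hits but are not pages you created are the ones to inspect first.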
In terms of fixing your SEO, the first step is to determine where/if the hack exists. Once that is decided, you have to clean up the site and restore the site's security.
I would be happy to help you with the next steps if you would like. I am always available!
Thanks and best of luck,
Rob