Index bloating issue
-
Hello,
In the last month, I noticed a huge spike in the number of pages indexed on my site, which I think is impacting my SEO quality score.
While I only have about 90 pages in my sitemap, the number of pages indexed jumped to 446, with about 536 pages being blocked by robots. At first we thought this might be due to duplicate product pages showing up in different categories on my site, so we added a rule to our robots.txt file to keep those pages from being indexed, but the number has not gone down. I've tried to consult with our hosting vendor, but no one seems to be concerned or to have any idea why there was such a big jump in the last month.
Any insights or pointers would be so greatly appreciated, so that I can fix/improve my SEO as quickly as possible!
Thanks!
-
To determine whether your website has been hacked, this is one of the best tools I know of, both for finding the malware and for removing it.
To determine whether or not you have on-site SEO problems at a very technical and granular level, I would use
https://www.deepcrawl.com/ At $80 a month, you cannot go wrong.
Another amazing tool, which is free for the first 500 pages and only about $150 a year if you want the added features (which you do) or more pages, is
-
Thank you. These are helpful suggestions.
-
A couple of things to note:
- As Robert mentioned, I would definitely make sure there is no longer an issue on your WordPress site relating to your previous hack.
- A robots.txt disallow does not stop pages from being indexed. It merely tells search engines to stop crawling that page from here on out. The meta noindex tag is more applicable for de-indexing pages that are already out there. Keep in mind that a crawler can only see a noindex tag on a page it is allowed to fetch, so the tag will not be read on a page that is also disallowed in robots.txt.
- I would check your Search Console crawl errors to see if there's a hefty spike in 404 errors as well, as these may be old spam pages you removed from the site.
- If the pages bloating your index are all old spam-filled pages from when you were hacked, you could start by using Search Console's "Remove URLs" tool, which temporarily removes those URLs from the index. For a longer-term approach, instead of returning a 404 for pages that have been removed, have the server return a "410" response, which tells Google they are gone for good, and they will be dropped from the index over time.
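To make the noindex and 410 points above concrete, here is a minimal sketch in Python, not a drop-in fix for any particular platform. It assumes a plain WSGI server; the paths in `REMOVED_PATHS` and `NOINDEX_PREFIXES` are hypothetical placeholders, and the `X-Robots-Tag: noindex` header is used here as the HTTP-header counterpart of the meta noindex tag.

```python
from http import HTTPStatus

# Hypothetical examples; replace with the real URLs from your own site.
REMOVED_PATHS = {"/cheap-pills.html", "/spam/page-2.html"}  # deleted spam pages
NOINDEX_PREFIXES = ("/category-b/",)  # duplicate listings that should stay crawlable


def response_for(path):
    """Return (status, headers) for a request path.

    Deleted spam pages answer 410 Gone instead of a generic 404.
    Duplicate pages still answer 200 but carry a noindex directive.
    """
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path in REMOVED_PATHS:
        return HTTPStatus.GONE, headers
    if path.startswith(NOINDEX_PREFIXES):
        headers.append(("X-Robots-Tag", "noindex"))
    return HTTPStatus.OK, headers


def app(environ, start_response):
    """Minimal WSGI app that applies response_for to each request."""
    status, headers = response_for(environ.get("PATH_INFO", "/"))
    start_response(f"{status.value} {status.phrase}", headers)
    return [status.phrase.encode("utf-8")]
```

On a hosted shop or WordPress install you would normally set the same status codes and headers through the platform or web server configuration rather than in application code; the sketch only shows which response each kind of URL should get.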
-
When I do the search for my main URL, the results are clean. Just the pages on my site show up. Yet the index count for this site is still bloated. However, for my WordPress site, which is a subdomain and on a different platform from my main site, there are some issues (it was hacked, as Rob noted below). But we have since cleaned up the pages, re-uploaded the sitemaps, etc. So I'm a little stumped on my main site (which wasn't hacked, that I'm aware of).
-
What do you see if you do a search for site:yoursite.com ?
-
Hello Julie,
This sounds like you might have a hacking issue on your website. You probably need someone to conduct a full code audit of your site to determine whether any files you have uploaded (plugins, for example) were contaminated. If a site is hacked, new pages can be added that are hidden from view and difficult to detect unless handled by a security specialist.
We recently brought on a new client who had this issue and discovered that his site had thousands of pages dedicated to testosterone pills, etc. We had to go through GWT (Google Webmaster Tools) and the site logs to determine what new pages had been created, and it was a complete hack job.
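A log check like the one described above can be roughly sketched in Python. This is only an illustration under stated assumptions: the access log is assumed to be in the common/combined format, the sitemap is assumed to be a standard sitemaps.org `urlset`, and the helper names `sitemap_paths` and `unexpected_paths` are made up for this example.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Captures the requested URL and the status code from a common-format log line.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')


def sitemap_paths(sitemap_xml):
    """Return the set of URL paths listed in a sitemap XML string."""
    root = ET.fromstring(sitemap_xml)
    return {
        urlsplit(loc.text.strip()).path or "/"
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
    }


def unexpected_paths(log_lines, known_paths):
    """Return sorted paths that were served with 200 but are not in the sitemap."""
    found = set()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "200":
            path = urlsplit(match.group(1)).path  # drop any query string
            if path not in known_paths:
                found.add(path)
    return sorted(found)
```

The result will also include legitimate URLs that are simply not in the sitemap (images, stylesheets, paginated listings), so it is a list of candidates to review by hand, not a list of confirmed hacked pages.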
In terms of fixing your SEO, the first step is to determine where/if the hack exists. Once that is decided, you have to clean up the site and restore the site's security.
I would be happy to help you with the next steps if you would like. I am always available!
Thanks and best of luck,
Rob