Index bloating issue
-
Hello,
In the last month, I noticed a huge spike in the number of pages indexed on my site, which I think is impacting my SEO quality score.
While I've only have about 90 pages on my site map, the number of pages indexed jumped to 446, with about 536 pages being blocked by robots. At first we thought this might be due to duplicate product pages showing up in different categories on my site, but we added something to our robot.txt file to not index those pages. But the number has not gone down. I've tried to consult with our hosting vendor, but no one seems to be concerned or have any idea why there was such a big jump in the last month.
Any insights or pointers would be so greatly appreciated, so that I can fix/improve my SEO as quickly as possible!
Thanks!
-
in order to determine if your website is hacked this is one of the best tools I know of both to find out and to remove the malware.
In order to determine rather not you have on-site SEO problems on a very technical and granular scale I would use
https://www.deepcrawl.com/ $80 a month you cannot go wrong
another amazing tool and it's free for the first 500 pages and if you want the added features which you do or more pages only about $150 a year is
-
Thank you. These are helpful suggestions.
-
A couple of things to note:
- As Robert mentioned, I would definitely make sure there is no longer an issue on your wordpress site relating to your previous hack.
- Robots.txt disallow does not stop pages from being indexed. It merely tells search engines to stop crawling that page from here out. The meta noindex tag is more applicable for noindexing pages that are already out there.
- I would check your search console crawl errors to see if there's a hefty spike in 404 errors as well, as it may be old spam pages you removed from the site.
- If these pages that are bloating your index are all still old spam filled pages from when you were hacked, you could start by using the search console's "remove url's" tool, which will remove all these url's from the index temporarily. For a more long term approach, instead of them giving off a 404 if they have been removed, making the server give off a "410" response would tell google they are gone forever, and thus they will be removed from the index as time goes on.
-
When I do the search for my main url - the results are clean. Just the pages to my site show up. And the index results for this site still bloated. However, for my wordpress site, which is a subdomain and on a different platform to my main site, there are some issues (it was hacked as Rob noted below). But we have since cleaned up the pages etc, reuploaded the site maps, etc. So I'm a little stumped on my main site (which wasn't hacked - that I'm aware of).
-
What do you see if you do a search for site:yoursite.com ?
-
Hello Julie,
This sounds like you might have a hacking issue on your website. You probably need someone to conduct a full code audit of your site to determine whether any files you have uploaded (plugins, for example) were contaminated. If a site is hacked, new pages can be added that are hidden from view and difficult to detect unless handled by a security specialist.
We recently brought on a new client who had this issue and discovered that his site had 1000's of pages dedicated to testosterone pills, etc. We had to go through GWT and the site logs to determine what new pages were created and it was a complete hack job.
In terms of fixing your SEO, the first step is to determine where/if the hack exists. Once that is decided, you have to clean up the site and restore the site's security.
I would be happy to help you with the next steps if you would like. I am always available!
Thanks and best of luck,
Rob
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Home page canonical issues
Hi, I've noticed I can access/view a client's site's home page using the following URL variations - http://example.com/
Technical SEO | | simon-145328
http://example/index.html
http://www.example.com/
http://www.example.com/index.html There's been no preference set in Google WMT but Google has indexed and features this URL - http://example.com/ However, just to complicate matters, the vast majority of external links point to the 'www' version. Obviously i would like to tidy this up and have asked the client's web development company if they can place 301 redirects on the domains we no longer want to work - I received this reply but I'm not sure whether this does take care of the duplicate issue - Understand what you're saying, but this shouldn't be an issue regarding SEO. Essentially all the domains listed are linking to the same index.html page hosted at 1 location My question is, do i need to place 301 redirects on the domains we don't want to work and do i stick with the 'non www' version Google has indexed and try to change the external links so they point to the 'non www' version or go with the 'www' version and set this as the preferred domain in Google WMT? My technical knowledge in this area is limited so any help would be most appreciated. Regards,
Simon.0 -
Search results indexed
Hi there, is is bad practice in seo to have search results for products indexed? For example a search result of holidays to Ibiza, with lots of deals coming up? its a search query url that would be indexed, with just an image and price per product on the page, with about 10 per page? Any advice appreciated.
Technical SEO | | pauledwards0 -
301 Redirect with index.asp
I am very new to all of this so forgive the newbie questions I will get better. Ok so after starting a campaign I see that I have many issues including where some pages are being deemed as duplicate content. 1. The report says the http://lucid8.com has duplicate content on 2 other pages 2. When I look at them it shows that http://lucid8.com/index.asp and http://www.lucid8.com are duplicates. 3. Really these are the exactly the same page because the default page that is opened for www.lucid8.com http://www.lucid8.com etc always opens the index.asp page. 4. Now I read that I should do permanent redirects and how to do this via IIS and I tried to do a redirect from index.asp to www.lucid8.com but that does not work because www.lucid8.com is pointing to index.asp and so we end up in a circle. So the question is how do I get rid of these duplicate page references without causing problems. Thanks
Technical SEO | | TroyW0 -
OUTGOING LINK ISSUES
NO LINK TO HOME MY WEBSIDE PROFITIOK.COM FOR EXAMPLE YOU SEE THE PAGE I HAVE CHANGE TWO DAYS BACK PREVIOUS
Technical SEO | | Luckykhullar
ABOUT US - Profitok.com 2 DAYS BACK I CHANGED THE TITLE title>Commodity tips | Mcx TIPS | Mcx Daily Tips |Mcx Good tips Provider BUT WHEN I CHECKING IT IS NOT UPDATED THIS TITLE DEAR SIR CAN Y GIVE ME THE CORRECT ANSWER0 -
IP addresses indexed?
I've met with a potential client who has a site with 1,000's of very specific part #'s which don't show in the SERP's on Google. They definitely have the issue of dynamic URL's - but the URL for the part # searches is an IP address rather than their domain name - example: 188.888.888.888/partssearch.php?pnum='1233445' I've not seen the IP address used like this for an external website - is this acceptable for SEO purposes? Thanks, Mark
Technical SEO | | DenverKelly0 -
Yahoo and Bing do not index all pages
Only 20% of our pages are indexed by Bing and Yahoo although we have correctly submitted the sitemap to bing webmaster tools and other search engines index all our content. Do you have any suggestions?
Technical SEO | | AEM130 -
Crawling and indexing content
If a page element (div, e.g.) is initially hidden and shown only by a hover descriptor or Javascript call, will Google crawl and index it’s content?
Technical SEO | | Mont0 -
Problem with indexing
Hello, we've changed our CMS recently, everything seems to work well, but for some reason google, and other crawlers can't see or index other pages than main. There is no restriction in robots, nor any other visible issue. Please help if you can. Website: http://www.design-glassware.com/
Technical SEO | | divan0