Index bloating issue
-
Hello,
In the last month, I noticed a huge spike in the number of pages indexed on my site, which I think is impacting my SEO quality score.
While I only have about 90 pages on my sitemap, the number of pages indexed jumped to 446, with about 536 pages being blocked by robots.txt. At first we thought this might be due to duplicate product pages showing up in different categories on my site, so we added rules to our robots.txt file to keep those pages from being indexed. But the number has not gone down. I've tried to consult with our hosting vendor, but no one seems to be concerned or to have any idea why there was such a big jump in the last month.
Any insights or pointers would be so greatly appreciated, so that I can fix/improve my SEO as quickly as possible!
Thanks!
-
To determine whether your website has been hacked, this is one of the best tools I know of, both for finding malware and for removing it.
To determine whether or not you have on-site SEO problems on a very technical and granular scale, I would use
https://www.deepcrawl.com/ At $80 a month, you cannot go wrong.
Another amazing tool, which is free for the first 500 pages and only about $150 a year if you want the added features (which you do) or more pages, is
-
Thank you. These are helpful suggestions.
-
A couple of things to note:
- As Robert mentioned, I would definitely make sure there is no longer an issue on your WordPress site relating to your previous hack.
- A robots.txt disallow does not stop pages from being indexed. It merely tells search engines to stop crawling those pages from here on out. The meta noindex tag is more applicable for removing pages that are already in the index, and search engines have to be able to crawl a page to see that tag, so the page must not also be blocked in robots.txt.
- I would check your Search Console crawl errors to see if there's a hefty spike in 404 errors as well, as these may be old spam pages you removed from the site.
- If the pages bloating your index are all old spam-filled pages from when you were hacked, you could start by using Search Console's "Remove URLs" tool, which will remove all these URLs from the index temporarily. For a more long-term approach, instead of returning a 404 for pages that have been removed, have the server return a "410" response, which tells Google they are gone forever, so they will drop out of the index as time goes on.
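To illustrate the last two points, here is a rough Python sketch of what the server side could look like. It is only an illustration under assumptions: the two path prefixes are made up, and your actual platform (WordPress or otherwise) would do this through its own configuration or plugins rather than a hand-written WSGI app. It returns 410 for removed spam URLs and sends an X-Robots-Tag: noindex header, which search engines treat like the meta noindex tag, for duplicate listing pages.

```python
# Sketch only: both prefix tuples are hypothetical examples, not real site paths.
REMOVED_PREFIXES = ("/cheap-pills/",)  # spam pages that were deleted after the hack
NOINDEX_PREFIXES = ("/category/",)     # duplicate listing pages to keep out of the index


def app(environ, start_response):
    """Minimal WSGI app: 410 for removed spam, noindex header for duplicates."""
    path = environ.get("PATH_INFO", "/")

    if path.startswith(REMOVED_PREFIXES):
        # 410 Gone signals the page was removed on purpose and will not return.
        start_response("410 Gone", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"This page has been permanently removed.\n"]

    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIXES):
        # Header equivalent of <meta name="robots" content="noindex">.
        headers.append(("X-Robots-Tag", "noindex"))

    start_response("200 OK", headers)
    return [b"<html><body>Regular page</body></html>\n"]


def fetch(path):
    """Call the app in-process and return (status line, headers as a dict)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    b"".join(app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response))
    return captured["status"], captured["headers"]


if __name__ == "__main__":
    for p in ("/cheap-pills/offer", "/category/shoes", "/about"):
        print(p, fetch(p))
```

Keep in mind that the noindex pages must stay crawlable: if robots.txt still disallows /category/ in this example, the header would never be seen.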
-
When I do the search for my main URL, the results are clean. Just the pages on my site show up. Yet the index count for this site is still bloated. My WordPress site, however, which is a subdomain and on a different platform from my main site, does have some issues (it was hacked, as Rob noted below). We have since cleaned up the pages, re-uploaded the sitemaps, etc. So I'm a little stumped on my main site (which wasn't hacked, as far as I'm aware).
-
What do you see if you do a search for site:yoursite.com ?
-
Hello Julie,
This sounds like you might have a hacking issue on your website. You probably need someone to conduct a full code audit of your site to determine whether any files you have uploaded (plugins, for example) were contaminated. If a site is hacked, new pages can be added that are hidden from view and difficult to detect unless handled by a security specialist.
We recently brought on a new client who had this issue and discovered that his site had thousands of pages dedicated to testosterone pills, etc. We had to go through GWT (Google Webmaster Tools) and the site logs to determine what new pages were created, and it was a complete hack job.
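As a rough idea of what that log review can look like, here is a small Python sketch. It assumes the server writes the usual common/combined access log format, and the sample lines and the known_paths set are invented for illustration; in practice known_paths would come from your sitemap and the lines from your real access log. It counts successful requests for URLs you do not recognize.

```python
import re
from collections import Counter

# Matches the request and status fields of a common/combined access log line.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*" (?P<status>\d{3}) ')


def unexpected_paths(log_lines, known_paths):
    """Count 200 responses for paths that are not in known_paths."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or match.group("status") != "200":
            continue
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if path not in known_paths:
            hits[path] += 1
    return hits


# Hypothetical data for illustration only.
known_paths = {"/", "/products/widget"}
sample_log = [
    '66.249.66.1 - - [10/Oct/2016:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Oct/2016:13:55:40 +0000] "GET /cheap-pills/offer?id=7 HTTP/1.1" 200 812 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Oct/2016:13:56:02 +0000] "GET /cheap-pills/offer?id=9 HTTP/1.1" 200 812 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Oct/2016:13:56:10 +0000] "GET /old-page HTTP/1.1" 404 210 "-" "Googlebot/2.1"',
]

if __name__ == "__main__":
    for path, count in unexpected_paths(sample_log, known_paths).most_common():
        print(count, path)
```

Any path that is served successfully but is not a page you created is worth a closer look.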
In terms of fixing your SEO, the first step is to determine where/if the hack exists. Once that is decided, you have to clean up the site and restore the site's security.
I would be happy to help you with the next steps if you would like. I am always available!
Thanks and best of luck,
Rob