Index bloating issue
-
Hello,
In the last month, I noticed a huge spike in the number of pages indexed on my site, which I think is impacting my SEO quality score.
While I only have about 90 pages in my sitemap, the number of pages indexed jumped to 446, with about 536 pages being blocked by robots.txt. At first we thought this might be due to duplicate product pages showing up in different categories on my site, so we added a rule to our robots.txt file to keep those pages from being indexed, but the number has not gone down. I've tried to consult with our hosting vendor, but no one seems to be concerned or to have any idea why there was such a big jump in the last month.
Any insights or pointers would be so greatly appreciated, so that I can fix/improve my SEO as quickly as possible!
Thanks!
-
To determine whether your website has been hacked, this is one of the best tools I know of, both for finding out and for removing the malware.
To determine whether or not you have on-site SEO problems on a very technical and granular scale, I would use
https://www.deepcrawl.com/. At $80 a month, you cannot go wrong.
Another amazing tool, free for the first 500 pages and only about $150 a year if you want the added features (which you do) or more pages, is
-
Thank you. These are helpful suggestions.
-
A couple of things to note:
- As Robert mentioned, I would definitely make sure there is no longer an issue on your WordPress site relating to your previous hack.
- A robots.txt disallow does not stop pages from being indexed. It merely tells search engines to stop crawling that page from here on out. The meta noindex tag is more applicable for removing pages that are already in the index, and crawlers can only see that tag on pages that robots.txt does not block.
- I would check your Search Console crawl errors to see if there's a hefty spike in 404 errors as well, as those may be old spam pages you removed from the site.
- If the pages bloating your index are all old spam-filled pages from when you were hacked, you could start by using Search Console's "Remove URLs" tool, which will remove all these URLs from the index temporarily. For a more long-term approach, instead of returning a 404 for pages that have been removed, have the server return a "410" response. That tells Google they are gone forever, and they will be removed from the index as time goes on.
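The noindex and 410 suggestions above could be wired up roughly as follows. This is only a minimal Python sketch, not tied to any particular platform: `REMOVED_PATHS`, `NOINDEX_PATHS`, and `response_for` are hypothetical names, and the example paths are made up. It uses the `X-Robots-Tag: noindex` response header, which is the HTTP-header counterpart of the meta noindex tag.

```python
from http import HTTPStatus

# Hypothetical example data: spam pages that were deleted after the hack,
# and duplicate pages that should stay reachable but drop out of the index.
REMOVED_PATHS = {"/cheap-pills.html", "/spam/offer-1.html"}
NOINDEX_PATHS = {"/category-b/widget"}


def response_for(path):
    """Return (status, extra_headers) for a requested path."""
    if path in REMOVED_PATHS:
        # 410 Gone signals the page was removed on purpose, unlike a 404.
        return HTTPStatus.GONE, {}
    if path in NOINDEX_PATHS:
        # The page still loads for visitors but asks crawlers not to index it.
        # The path must not be disallowed in robots.txt, or crawlers never
        # fetch the page and never see this header.
        return HTTPStatus.OK, {"X-Robots-Tag": "noindex"}
    return HTTPStatus.OK, {}
```

In a real site the same decision would live in the web server or CMS configuration; the point is only that removed pages answer with 410 while pages you want dropped from the index stay crawlable and carry noindex.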
-
When I do the search for my main URL, the results are clean. Just the pages on my site show up. Yet the index count for this site is still bloated. However, my WordPress site, which is a subdomain and on a different platform from my main site, does have some issues (it was hacked, as Rob noted below). But we have since cleaned up the pages, reuploaded the sitemaps, etc. So I'm a little stumped on my main site (which wasn't hacked - that I'm aware of).
-
What do you see if you do a search for site:yoursite.com ?
-
Hello Julie,
This sounds like you might have a hacking issue on your website. You probably need someone to conduct a full code audit of your site to determine whether any files you have uploaded (plugins, for example) were contaminated. If a site is hacked, new pages can be added that are hidden from view and difficult to detect unless handled by a security specialist.
We recently brought on a new client who had this issue and discovered that his site had thousands of pages dedicated to testosterone pills, etc. We had to go through Google Webmaster Tools (GWT) and the site logs to determine what new pages had been created, and it was a complete hack job.
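For anyone wanting to try that kind of log check themselves, here is a rough Python sketch. It assumes a standard sitemap XML file and an access log in the common/combined format used by Apache and Nginx; the function names `sitemap_paths` and `unexpected_paths` are made up for illustration. It lists paths that returned 200 in the log but do not appear in the sitemap, which is where injected pages would tend to show up.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Captures the requested URL and the status code from an access log line.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')


def sitemap_paths(sitemap_xml):
    """Return the set of URL paths listed in a sitemap XML string."""
    root = ET.fromstring(sitemap_xml)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return {urlparse(loc.text.strip()).path or "/" for loc in locs}


def unexpected_paths(log_lines, known_paths):
    """Return sorted paths served with 200 that are not in known_paths."""
    found = set()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "200":
            # Drop the query string so variants of one page collapse together.
            path = urlparse(match.group(1)).path
            if path not in known_paths:
                found.add(path)
    return sorted(found)
```

The result will also include legitimate pages that are simply missing from the sitemap, as well as images and scripts, so treat it as a list to review by hand rather than proof of a hack.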
In terms of fixing your SEO, the first step is to determine where/if the hack exists. Once that is decided, you have to clean up the site and restore the site's security.
I would be happy to help you with the next steps if you would like. I am always available!
Thanks and best of luck,
Rob