I have removed over 2,000 pages but Google still says I have 3,000+ pages indexed
-
Good Afternoon,
I run an office equipment website called top4office.co.uk.
My predecessor decided to make an exact copy of the content on our existing site, top4office.com, and place it on the top4office.co.uk domain, which included over 2,000 thin pages.
Since coming in, I have hired a copywriter who has rewritten all the important content, and I have removed over 2,000 of those thin pages.
I set up 301s, blocked the thin pages using robots.txt, and then used Google's removal tool to remove the pages from the index, which completed successfully.
But although they were removed and can no longer be found in Google, when I use site:top4office.co.uk I still see over 3,000 indexed pages (originally I had 3,700).
Does anyone have any ideas why this is happening and, more importantly, how I can fix it?
Our rankings on this site are woeful compared to what they were in 2011. I have a deadline and was wondering how quickly, in your opinion, all these changes will impact my SERP rankings?
Look forward to your responses!
-
I agree with DrPete. You can't have the pages blocked in robots.txt, otherwise Google will not crawl them, "see" the 301s, and then update the index.
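To illustrate (these paths are just hypothetical), if your robots.txt currently contains something like:

    User-agent: *
    Disallow: /old-thin-pages/
    Disallow: /duplicate-products/

those Disallow lines need to come out so Googlebot can actually request the old URLs, follow the 301s, and update the index.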
Something else to consider: on the new pages, add a canonical tag pointing to themselves. We had a site where Google was caching old URLs whose 301 redirects had been in place for two years. Google was finding the new pages with their new titles and new content, but was still referencing the old URLs. We saw this in the SERPs and also in GWT, which was reporting duplicate titles and descriptions for sets of pages that had been 301ed. Adding the self-referencing canonical helped get that cleaned up.
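For example, each new page would carry a canonical tag in its <head> pointing at its own URL (the URL below is just a placeholder):

    <link rel="canonical" href="http://www.top4office.co.uk/office-printers/" />

That way, even if Google is still holding on to an old URL, the new page explicitly declares itself as the preferred version.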
Cheers.
-
This process can take a painfully long time, even when done right, but I do have a couple of concerns:
(1) Assuming I understand the situation, I think using robots.txt on top of 301 redirects is a bad idea. Robots.txt is bad for removal (good for prevention, but not once something is already in the index), and by blocking these pages you're telling Google not to re-crawl them; if Google doesn't re-crawl them, it never processes the 301s. So I'd drop the robots.txt blocking for now, honestly.
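Once the robots.txt block is removed, it's worth spot-checking that the old URLs really do return a 301. Something like this from a terminal (hypothetical URLs) will show the status and redirect target:

    curl -I http://www.top4office.co.uk/old-thin-page.html

    HTTP/1.1 301 Moved Permanently
    Location: http://www.top4office.co.uk/new-page.html

If you see a 302 or a 200 instead, the redirect isn't set up as a permanent move.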
(2) What's your internationalization strategy? You could potentially try rel="alternate"/hreflang to specify US vs. UK English, target each domain in Webmaster Tools, and leave the duplicates alone. If you 301-redirect, you're not giving the UK site a chance to rank properly on Google.co.uk (if that's your objective).
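A rough sketch of what the hreflang markup could look like on each pair of equivalent pages (URLs are placeholders):

    <link rel="alternate" hreflang="en-us" href="http://www.top4office.com/office-printers/" />
    <link rel="alternate" hreflang="en-gb" href="http://www.top4office.co.uk/office-printers/" />

Both pages would carry the same pair of tags, each also keeping a self-referencing canonical, so Google understands the two sites are regional alternates rather than duplicates.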
-
It sounds like you have done pretty much everything you could do to remove those pages from Google, and that Google has removed them.
There are two possibilities I can think of. First, Google is finding new pages, or at least new URLs. These may be old pages with some sort of parameter appended, causing Google to discover "new" URLs even though you're not adding any new pages.
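For example (made-up URLs), Google can treat all of these as separate indexed pages even though they serve the same content:

    www.top4office.co.uk/printers.html
    www.top4office.co.uk/printers.html?sort=price
    www.top4office.co.uk/printers.html?sessionid=12345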
The other possibility is that the site: search is not entirely accurate. Like most numbers Google gives us, it's an estimate of the actual figure. It's possible that the original 3,700 was an undercount, and Google is now simply reporting more of the pages it already had in its index but wasn't showing before.
By the way, when I do a search for site:top4office.co.uk, I only get 2,600 results.
-
I no longer see the removed pages. There's no chance Google has found any additional pages, as we check newly indexed pages every day using the filter and site:top4office.co.uk.
Any ideas?
-
Just a quick question: do you still see the URLs you "removed" in the index? Or is it possible that Google has found a different set of 3,000 URLs on your site?