How do we decide which pages to index/de-index? Help for a 250k page site
-
At Siftery (siftery.com) we have about 250k pages, most of them listed in our sitemap. After we submitted the sitemap, the number of pages Google indexed rose at first, but over the past few weeks it has stalled at about 80k pages and has even been dropping very slightly.
Due to the nature of the site, a lot of the pages likely look very similar to search engines. We've also split our sitemap into a sitemap index, so we know that most of the indexation problems come from one particular type of page (company profiles).
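For anyone wanting to run the same per-sitemap check, here is a minimal sketch of the breakdown. The sitemap names and the split between them are hypothetical, chosen only so that the totals match the roughly 250k submitted and 80k indexed mentioned above; real counts would have to be copied in from the search engine's sitemap report.

```python
def indexation_rates(sitemaps):
    """Return {sitemap_name: fraction_indexed}, sorted from worst to best.

    `sitemaps` maps a sitemap name to a (submitted, indexed) pair of counts.
    """
    rates = {}
    for name, (submitted, indexed) in sitemaps.items():
        # Guard against empty sitemaps to avoid dividing by zero.
        rates[name] = indexed / submitted if submitted else 0.0
    return dict(sorted(rates.items(), key=lambda item: item[1]))


# Hypothetical counts (assumption, not real Siftery data).
example = {
    "product-pages.xml": (50_000, 40_000),
    "company-profiles.xml": (200_000, 40_000),
}

for name, rate in indexation_rates(example).items():
    print(f"{name}: {rate:.0%} indexed")
```

The sitemap with the lowest rate comes first, which makes it easy to see which page type is dragging the overall number down.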
Given these facts, what do you recommend we do? Should we de-index all of the pages that are not being picked up by the Google index (and are therefore likely seen as low quality)? There seems to be a school of thought that de-indexing "thin" pages improves the ranking potential of the indexed pages. We have plans for enriching and differentiating the pages that are being picked up as thin (Moz itself picks them up as 'duplicate' pages even though they're not).
Thanks for sharing your thoughts and experiences!
-
I was advised to deindex pages that had not been visited in the recent past. I deindexed about 150 pages and had a nice bump in the SERPs: previously I was #9 and I jumped to #4. I have about a hundred more thin pages I'm working on and #crossyourfingers maybe I'll be top three.
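A rough sketch of how a "not visited recently" list could be pulled together, assuming you can export a last-visit date per URL from your analytics tool. The function name, the 90-day cutoff, and the sample URLs are all assumptions for illustration, not a recommended threshold.

```python
from datetime import date, timedelta


def noindex_candidates(last_visit_by_url, today, max_idle_days=90):
    """Return URLs with no recorded visit within the last `max_idle_days`.

    `last_visit_by_url` maps a URL to the date of its most recent visit,
    or to None if no visit was ever recorded.
    """
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        url
        for url, last_visit in last_visit_by_url.items()
        if last_visit is None or last_visit < cutoff
    )


# Hypothetical analytics export (assumption).
visits = {
    "/companies/acme": date(2024, 5, 20),
    "/companies/globex": date(2023, 11, 2),
    "/companies/initech": None,
}

print(noindex_candidates(visits, today=date(2024, 6, 1)))
```

The resulting list is only a set of candidates to review. Pages you do decide to drop could then get a `<meta name="robots" content="noindex">` tag, while pages you plan to enrich are probably better left alone.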