For large sites, best practices for pages hidden behind internal search?
-
If a website has 1M+ pages, most of them reachable only through internal search, what's the best way to get those pages included in a search engine's index?
Does a direct click path to those pages need to exist from the homepage or other major hub pages on the site?
Is submitting an XML sitemap enough?
-
Hello Vlevit,
Thanks for explaining the situation. You could do several things. I recommend giving Google your product feed, which should accomplish your goals. Another option would be to make those internal search pages noindex,follow so they don't end up getting indexed themselves, but Google can still use them to discover the pages they link to.
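For reference, a minimal sketch of that directive as it would sit in the <head> of each internal search results page:

```html
<!-- Keep this search results page out of the index, but let crawlers
     follow its links to discover the product pages it lists -->
<meta name="robots" content="noindex, follow">
```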
Below is more on submitting product feeds. It's written for Google Product Search, but I would imagine the "link" field, where you put the URL of your product detail page, will help those pages get indexed in the standard results:
http://support.google.com/merchants/bin/answer.py?hl=en&answer=188494#USEverett
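For illustration only, here is a stripped-down sketch of a single feed item; the store name, URLs, and values are placeholders, and a real feed requires more attributes, but the link element is what points Google at the product detail page you want indexed:

```xml
<?xml version="1.0"?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <title>Example Store</title>
    <link>http://www.example.com/</link>
    <item>
      <g:id>SKU-12345</g:id>
      <title>Green Widget</title>
      <!-- The "link" field: the product detail page URL -->
      <link>http://www.example.com/products/green-widget</link>
      <description>A green widget.</description>
      <g:price>9.99 USD</g:price>
    </item>
  </channel>
</rss>
```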
-
Everett, thanks for your reply. I understand the problems with indexing internal search pages. I'm not looking to have the internal search results themselves indexed, just the pages the results link to. We're in eCommerce.
I was under the impression that there was a clever way to have the individual product pages indexed without establishing a direct click path, but best practices recommend otherwise.
Question answered. Thanks all for your help.
-
Hello Vlevit,
If you can be more specific we may be able to be of more help. Google doesn't want you to show internal search result pages, but if this is a different type of situation there may be an exception. Are these search result pages, product pages, category pages, content pages? Is it an eCommerce site, community, content site?
Generally speaking, 1M+ pages with no links going into them, and content that is either sparse/thin or partially/fully duplicated across similar pages (like a search for widgets and a search for green widgets showing overlapping content), is exactly the type of thing that will get you into hot water and can affect even the rankings of your home page.
Do you feel like your question has been answered or would you like to be more specific about your site and goals?
Cheers,
Everett
-
This is what I was assuming, but I was wondering if there was a clever way around creating direct click paths to those pages while still maintaining their importance to the site. Thanks for the info.
-
Make sure they are part of the actual structure of your website, not just reachable through search. Meaning, you have to have links pointing at them. You will also want to make sure that those pages have value.
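To make that concrete, a small illustration with placeholder URLs: crawlers won't submit a search form, so pages reachable only through one go undiscovered, while plain anchor links on category pages create a crawlable path and pass link equity:

```html
<!-- A search form alone leaves the product pages it can reach invisible to crawlers -->
<form action="/search" method="get">
  <input type="text" name="q">
  <input type="submit" value="Search">
</form>

<!-- Plain links on a category page give crawlers a path to every product -->
<ul>
  <li><a href="/widgets/green-widget">Green Widget</a></li>
  <li><a href="/widgets/blue-widget">Blue Widget</a></li>
</ul>
```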
-
Hi vlevit,
The best practice would be to have a direct click path from the index page, something like: index -> category (filter) -> subcategory (filter) -> page/product. But in some cases XML sitemaps can also help with indexing.
But beware of overly large XML sitemaps: create more than one sitemap and group them logically where possible.
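For context, the sitemap protocol caps each sitemap file at 50,000 URLs, so a site of 1M+ pages needs many sitemap files tied together by a sitemap index. A minimal sketch, with placeholder URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- sitemap-index.xml: submit this one file; it points to the individual
     sitemaps, each of which may list up to 50,000 URLs -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap-products-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap-products-2.xml</loc>
  </sitemap>
</sitemapindex>
```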
A few very good resources can be found under the next links:
http://www.seomoz.org/ugc/solving-new-content-indexation-issues-for-large-b2b-websites
http://www.seomoz.org/qa/view/29009/sitemaps-management-for-big-sites-tens-of-millions-of-pages
I hope it helps,
Istvan
-
Related Questions
-
Moving a site from HTML to WordPress: Should I port all old pages and redirect?
Any help would be appreciated. I am porting an old legacy .html site, which has about 500,000 visitors/month and over 10,000 pages, to a new custom WordPress site with a responsive design (long overdue, of course) that has been written and only needs a few finishing touches, and which includes many database features to generate new pages that did not previously exist. My questions are: Should I bother to port over older pages that are "thin" and have no incoming links, given that reworking them would take time away from the need to port quickly? I will be restructuring the legacy URLs to be lean and clean, so 301 redirects will be necessary. I know that there will be some loss of link juice, but how long does it usually take for the redirects to "take hold"? I will be moving to https at the same time to avoid yet another porting issue. Many thanks for any advice and opinions as I embark on this massive data entry project.
Technical SEO | gheh2013
-
Is site: a reliable method for getting full list of indexed pages?
The site:domain.com search seems to show fewer pages than it used to (in both Google and Bing), and not just for one site but across all sites. For example, I will get "page 1 of about 3,000 results", but by the time I've paged through the results it will end and change to "page 24 of 201 results". In that example, GSC shows 1,932 pages indexed. Should I now accept that the page count reported by site: is an unreliable metric?
Technical SEO | bjalc2011
-
Google not indexing/showing my site in search results...
Hi there, I know there are answers all over the web to this type of question (and in Webmaster Tools), but I think I have a specific problem that I can't really find an answer to online. The site is: www.lizlinkleter.com
Firstly, the site has been live for over 2 weeks. I have done everything from adding analytics, to submitting a sitemap, to adding the site to Webmaster Tools, to fetching each individual page as Googlebot and then submitting it to the index via Webmaster Tools. I've checked my robots files and code elsewhere on the site, and the site is not blocking search engines (as far as I can see). There are no security issues in Webmaster Tools or Moz. Google says it has indexed 31 pages in the 'Index Status' section, but on the site dashboard it says only 2 URLs are indexed. When I do a site:www.lizlinketer.com search, the only results I get are pages that are excluded in the robots file: /xmlrpc.php & /admin-ajax.php.
Now, here's where I think the issue stems from: I developed the site myself for my wife and I am new to doing this, so I developed it on the live URL (I now know this was silly). I did block the content from search engines and have the site password-protected, but I think Google must have crawled the site before I did this. The problem is that I had pulled in the WordPress theme's dummy content to make the site easier to build, so there was lots of nasty duplicate content. The site took me a couple of months to construct (working on it on and off) and I eventually pushed it live and submitted it to Analytics and Webmaster Tools (obviously it was all original content at this stage). But this is where I made another mistake: I submitted an old sitemap that had quite a few old dummy-content URLs in it. I corrected this almost immediately, but it probably did not look good to Google.
My guess is that Google is punishing me for having the dummy content on the site when it first went live. Fair enough, I was stupid, but how can I get it to index the real site? My question is, with no tech issues left to clear up (I can't resubmit the site through Webmaster Tools), how can I get Google to take notice of the site and have it show up in search results? Your help would be massively appreciated! Regards, Fraser
Technical SEO | valdarama
-
HTTP to HTTPS Transition, Large Drop in Search Traffic
My URL is: https://www.seattlecoffeegear.com
We implemented https across the site on Friday. Saturday and Sunday search traffic was normal/slightly higher than normal (in analytics) and slightly down in GWT. Today, it has dropped significantly in both, to about half of normal search traffic. From everything we can see, we implemented this correctly:
- 301 redirected all http requests to https (and yes, they go to the correct page and not to the homepage 😉)
- Rewrote hardcoded internal links
- Registered/submitted sitemaps from https in Bing and GWT
- Used fetch and render to ensure Google could reach the site and was redirected appropriately from http to https versions
- Ensured robots.txt does not block https or secure
We also use a CDN (though I don't think that impacts anything) and have had no customer issues with accessing or using the website since the transition. Is there anything else I might be missing that could correlate to a drop in search impressions, or is this just a waiting game of a few days to let Google sort through the change we've made and reindex everything (it dropped to 0 indexed for a day and is now up to 1,744 of our 2,180 pages indexed)? Thank you so much for any input! Kaylie
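The blanket http-to-https 301 described in the first bullet above commonly looks something like this on Apache; this is only a sketch assuming mod_rewrite is available, and server/CDN setups vary:

```apache
# Redirect every http request to its https equivalent with a single 301
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```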
Technical SEO | Marketing.SCG
-
What is the best way to stop a page being indexed?
What is the best way to stop a page being indexed? Is it to implement robots.txt at a site level, with a robots.txt file in the main directory, or at a page level, with a meta robots tag?
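For reference, a sketch of the two mechanisms the question contrasts, with placeholder paths. Note that they behave differently: robots.txt blocks crawling, but a blocked URL can still appear in the index if other pages link to it, whereas a meta noindex tag removes the page from the index but only takes effect once the page is recrawled:

```
# robots.txt in the site root (site level)
User-agent: *
Disallow: /private-page/
```

```html
<!-- In the page's <head> (page level) -->
<meta name="robots" content="noindex">
```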
Technical SEO | cbarron
-
Is a canonical tag the best solution for multiple search listing pages in a site?
I have a site where dozens of paginated listing pages are showing in my report, with a parameter indicating the page number. Is the best solution to canonical these pages back to a core page (all-products)? Or do I change my site configuration in Webmasters to ignore "page" parameters? What's the solution? Example URL 1: http://mydomain.com/products/all-products?page=84 Example URL 2: http://mydomain.com/products/all-products?page=85 Example URL 3: http://mydomain.com/products/all-products?page=86 Thanks in advance for your direction.
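A sketch of the canonical approach the question describes, using its own example URLs; each paginated variant would carry the same tag in its <head>:

```html
<!-- On http://mydomain.com/products/all-products?page=84, ?page=85, etc. -->
<link rel="canonical" href="http://mydomain.com/products/all-products">
```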
Technical SEO | JoshKimber
-
Onsite SEO Strategy for a large accommodation site
Hi All, I have been thinking about the best strategy for keyword optimisation on a forthcoming accommodation website I am involved with. This may be a bit of a newbie question, but most of my work has been on considerably smaller sites to date. Let's say the site will have 1 primary landing page for "Hotels in Bristol" and then 50 pages, each for a hotel in Bristol. The aim would be for the primary page, which will be a browse/search-result type page, to rank well for the term 'Hotels in Bristol' and other similar terms. If each of the hotel listing pages has the phrase 'Hotel in Bristol' contained within the title, URL, page content, and maybe headings/alt tags etc., will the result be that the ranking for the site is 'spread too thin' across the domain? What's the best way to drive all the relevancy and keyword usage on the 50 listing pages to the primary page, such that that is the one that ranks well, while the other pages rank more for the hotel name etc.? I guess one way would be to avoid using the words hotels and Bristol in the title/URL etc., but the natural approach for usability (not SEO) would be to use these words, i.e. http://www.newtravelsite.com/hotels/bristol/stgeorgeshotel/. Or would each of the 50 listing pages simply need a followed, anchored link pointing to the main landing page? I'm sure there may be a fundamental technique to do this that has eluded me so far, but any help, thoughts or guidance much appreciated! Regards, Simon
Technical SEO | SCL-SEO
-
Best practices for temporary articles
Hello, I would like some expert input on the best way to manage temporary content. In my case, I have a page (e.g. mydomain.com/agenda) that lists temporary articles, some with a lifetime of 1 to 6 months. Each article also has its own URL, e.g. mydomain.com/agenda/12-02-2011/thenameofmyarticle/. As you can guess, I've ended up with hundreds of 404s 😞 I'm already using the canonical tag; should I use a noindex tag on the listing page? I'm a bit lost here...
Technical SEO | Alexandre_