Block search engines from URLs created by internal search engine?
-
Hey guys,
I've got a question for you all that I've been pondering for a few days now. I'm currently doing an SEO Technical Audit for a large scale directory.
One major issue that they are having is that their internal search system (Directory Search) will create a new URL everytime a search query is entered by the user. This creates huge amounts of duplication on the website.
I'm wondering if it would be best to block search engines from crawling these URLs entirely with Robots.txt?
What do you guys think? Bearing in mind there are probably thousands of these pages already in the Google index?
Thanks
Kim
-
That sounds perfect - if the user-generated URLs are getting enough traffic, make them permanent pages and 301-redirect or canonical. If not, weed them out of the index.
-
Thanks for your reply Dr. Meyers. I think you're probably right.
Yes I'm recommending they define a canonical set of pages that are the most popular searches, categories and locations which can be reached via internal links and we'll get all those duplicates re-directed back to that canonical set.
But for pages that fall outside those categories and locations, I'll recommend a meta-no-index tag.
-
It can be a complicated question on a very large site, but in most cases I'd META NOINDEX those pages. Robots.txt isn't great at removing content that's already been indexed. Admittedly, NOINDEX will take a while to work (virtually any solution will), as Google probably doesn't crawl these pages very often.
Generally, though, the risk of having your index explode with custom search pages is too high for a site like yours (especially post-Panda). I do think blocking those pages somehow is a good bet.
The only exception I would add is if some of the more popular custom searches are getting traffic and/or links. I assume you have a solid internal link structure and other paths to these listings, but if it looks like a few searches (or a few dozen) have attracted traffic and back-links, you'll want to preserve those somehow.
-
Sure, check below and some of the duplication I mean:
Capitalization Duplication
http://yellow.co.nz/yellow+pages/Car+dealer/Auckland+Region
http://yellow.co.nz/yellow+pages/Car+Dealer/Auckland+Region
With a few URL parameters
And with location duplication
http://yellow.co.nz/yellow+pages/Car+Dealer/Auckland
Let me know if you need any more info!
Cheers
Kim
-
Whats the content look like on the new url? Can you give us an example?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate content with URLs
Hi all, Do you think that is possible to have duplicate content issues because we provide a unique image with 5 different URLs ? In the HTML code pages, just one URL is provide. It's enough for that Google don't see the other URLs or not ? Example, in this article : http://www.parismatch.com/People/Kim-Kardashian-sa-securite-n-a-pas-de-prix-1092112 The same image is available on: http://cdn-parismatch.ladmedia.fr/var/news/storage/images/paris-match/people/kim-kardashian-sa-securite-n-a-pas-de-prix-1092112/15629236-1-fre-FR/Kim-Kardashian-sa-securite-n-a-pas-de-prix.jpg http://resize-parismatch.ladmedia.fr/img/var/news/storage/images/paris-match/people/kim-kardashian-sa-securite-n-a-pas-de-prix-1092112/15629236-1-fre-FR/Kim-Kardashian-sa-securite-n-a-pas-de-prix.jpg http://resize1-parismatch.ladmedia.fr/img/var/news/storage/images/paris-match/people/kim-kardashian-sa-securite-n-a-pas-de-prix-1092112/15629236-1-fre-FR/Kim-Kardashian-sa-securite-n-a-pas-de-prix.jpg http://resize2-parismatch.ladmedia.fr/img/var/news/storage/images/paris-match/people/kim-kardashian-sa-securite-n-a-pas-de-prix-1092112/15629236-1-fre-FR/Kim-Kardashian-sa-securite-n-a-pas-de-prix.jpg http://resize3-parismatch.ladmedia.fr/img/var/news/storage/images/paris-match/people/kim-kardashian-sa-securite-n-a-pas-de-prix-1092112/15629236-1-fre-FR/Kim-Kardashian-sa-securite-n-a-pas-de-prix.jpg Thank you very much for your help. Julien
Intermediate & Advanced SEO | | Julien.Ferras0 -
Google Search Analytics How to Get Search Keywords for a Page?
How do I get the keywords coming into a page on the new Google Webmaster Tools Search Analytics? Used to be there in the old version. You would just view your most popular urls and when you expanded the urls you would see the terms driving the traffic. How do I see the most popular keyword queries for a given page in the new tool? Alternatively can I still use the old tool somehow?
Intermediate & Advanced SEO | | K-WINTER0 -
How important is the optional <priority>tag in an XML sitemap of your website? Can this help search engines understand the hierarchy of a website?</priority>
Can the <priority>tag be used to tell search engines the hierarchy of a site or should it be used to let search engines know which priority to we want pages to be indexed in?</priority>
Intermediate & Advanced SEO | | mycity4kids0 -
How to Create an Infographic
Kindly let me know how to create inforgraphic I am non designer. Any Best tool or template to buy to create infographic?
Intermediate & Advanced SEO | | marknorman0 -
Should I block temporary pages
I need some SEO advice on an odd scenario: We are launching a new product line (party supplies) on it's own domain (PartySuperCenter.com). Due to some internal/technical reasons we will not be able to launch the site until the summer. We already have the product in our warehouse so the owners want to created a section on our current site (CostumeSuperCenter.com) for the new products. Once the new site is up the product will be removed from our current site and moved to the new site. I am concerned about the effect this will have on our SEO - having thousands of product pages appear and then disappear after a few months. I was thinking about blocking the pages using the "noindex" tag. Is this how you would handle it? Thanks in advance for your help!
Intermediate & Advanced SEO | | costume0 -
What is the best way to hide duplicate, image embedded links from search engines?
**Hello! Hoping to get the community’s advice on a technical SEO challenge we are currently facing. [My apologies in advance for the long-ish post. I tried my best to condense the issue, but it is complicated and I wanted to make sure I also provided enough detail.] Context: I manage a human anatomy educational website that helps students learn about the various parts of the human body. We have been around for a while now, and recently launched a completely new version of our site using 3D CAD images. While we tried our best to design our new site with SEO best practices in mind, our daily visitors dropped by ~15%, despite drastic improvements we saw in our user interaction metrics, soon after we flipped the switch. SEOMoz’s Website Crawler helped us uncover that we now may have too many links on our pages and that this could be at least part of the reason behind the lower traffic. i.e. we are not making optimal use of links and are potentially ‘leaking’ link juice now. Since students learn about human anatomy in different ways, most of our anatomy pages contain two sets of links: Clickable links embedded via JavaScript in our images. This allows users to explore parts of the body by clicking on whatever objects interests them. For example, if you are viewing a page on muscles of the arm and hand and you want to zoom in on the biceps, you can click on the biceps and go to our detailed biceps page. Anatomy Terms lists (to the left of the image) that list all the different parts of the body on the image. This is for users who might not know where on the arms the biceps actually are. But this user could then simply click on the term “Biceps” and get to our biceps page that way. Since many sections of the body have hundreds of smaller parts, this means many of our pages have 150 links or more each. And to make matters worse, in most cases, the links in the images and in the terms lists go to the exact same page. My Question: Is there any way we could hide one set of links (preferably the anchor text-less image based links) from search engines, such that only one set of links would be visible? I have read conflicting accounts of different methods from using JavaScript to embedding links into HTML5 tags. And we definitely do not want to do anything that could be considered black hat. Thanks in advance for your thoughts! Eric**
Intermediate & Advanced SEO | | Eric_R0 -
Changing URL Structure
We are going to be relaunching our website with a new URL structure. My question is, how is it best to deal with the migration process in terms of old URLS appearing whilst we launch the new ones. How best should we launch the new structure, considering we've in the region of 10,000 pages currently indexed in Google.
Intermediate & Advanced SEO | | NeilTompkins0