Robots.txt, Disallow & Indexed Pages
-
Hi guys,
hope you're well.
I have a problem with my new website. I have 3 pages with the same content:
- http://example.examples.com/brand/brand1 (good page)
- http://example.examples.com/brand/brand1?show=false
- http://example.examples.com/brand/brand1?show=true
The good page has rel=canonical and it is the only page that should appear in search results, but Google has indexed all 3 pages...
I'm not sure what to do now, but I'm considering 2 possibilities:
- Remove the filters (true, false), leave only the good page, and return a 404 for the other pages.
- Update robots.txt with a disallow for these parameters and remove those URLs manually (a rough sketch of what I mean is below).
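For reference, the robots.txt rule I have in mind would look something like this (just a sketch, assuming "show" is the only parameter that needs blocking):

```
User-agent: *
# Block crawling of any URL that carries a show parameter
Disallow: /*?show=
```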
Thank you so much!
-
Finally, I decided to do the following:
- Delete all pages with filters from my site (I had the option, so it wasn't a problem)
- Delete the URLs individually using GWT (Google Webmaster Tools)
It works!
-
Hi thekiller99! Did this get worked out? We'd love an update.
-
Hi,
Did you actually implement canonical tags on the duplicate pages, and do they point to the original piece?
-
Hi!
I'm not sure I understood how you implemented the canonical element on your pages, but it sounds like you have only put the canonical tag on what you call the "good page".
The scenario should be like this:
1. You have 3 pages with similar/exact content.
2. Obviously you want to index only one of them, and in your case it is the one without the parameters (the "good page").
3. You need to go ahead and implement the canonical elements in the following way (see the sketch after this list):
- page-1: http://example.examples.com/brand/brand1 (you do not have to, but if it makes it easier for you, you can use a self-referencing canonical.)
- page-2: http://example.examples.com/brand/brand1?show=false (canonical to page-1)
- page-3: http://example.examples.com/brand/brand1?show=true (canonical to page-1)
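In practice, the element in the head of the two parameter pages would look something like this (a minimal sketch using the URLs above):

```html
<!-- placed in the <head> of both ?show=false and ?show=true -->
<link rel="canonical" href="http://example.examples.com/brand/brand1" />
```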
PS. Google best practice suggests that you should never use robots.txt to de-index a page from the search results: robots.txt only blocks crawling, and a blocked page can still remain in the index. If you decide to remove certain pages completely from the search results, the best practice is to 404 them and use Google Search Console to signal to Google that these pages are no longer available. But if you implement the canonical element as described above, you will have no problems.
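If you want a page out of the index but still live for users, another common option is a meta robots noindex tag rather than a 404 (a sketch; note the page must remain crawlable, i.e. not blocked in robots.txt, for Google to see the tag):

```html
<meta name="robots" content="noindex">
```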
Best
Yossi
Related Questions
-
I have two sitemaps which partly duplicate - one is blocked by robots.txt but can't figure out why!
Hi, I've just found two sitemaps - one of them is .php and represents part of the site structure on the website. The second is a .txt file which lists every page on the website. The .txt file is blocked via robots exclusion protocol (which doesn't appear to be very logical as it's the only full sitemap). Any ideas why a developer might have done that?
Intermediate & Advanced SEO | McTaggart
-
Indexing a new website with several million pages
Hello everyone, I am currently working for a huge classifieds website that will be released in France in September 2013. The website will have up to 10 million pages. I know the indexing of a website of such size should be done step by step and not all at once, to avoid a long sandbox risk and to keep more control over it. Do you guys have any recommendations or good practices for such a task? Maybe some personal experience you might have had? The website will cover about 300 jobs: in all regions (= 300 * 22 pages), in all departments (= 300 * 101 pages), in all cities (= 300 * 37,000 pages). Do you think it would be wiser to index a couple of jobs at a time (for instance 10 jobs every week) or to index by page level (for example, 1st step with jobs by region, 2nd step with jobs by department, etc.)? More generally speaking, how would you go about avoiding penalties from Google while indexing the whole site as fast as possible? One more specification: we'll rely on a (big?) press follow-up and on a linking effort that still has to be determined. Thanks for your help! Best Regards, Raphael
Intermediate & Advanced SEO | Pureshore
-
Noindex/nofollow certain pages
Hi, I want to stop Google et al. from finding some pages within my website. The URL is www.mywebsite.com/call_backrequest.php?rid=14. These pages are creating a lot of duplicate content issues. Would the easiest solution be to place a 'noindex,nofollow' META tag in the page www.mywebsite.com/call_backrequest.php (a sketch is below)? Many thanks in advance
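That tag would look something like this (a sketch; it goes in the page's head, and the page must remain crawlable for Google to see it):

```html
<meta name="robots" content="noindex,nofollow">
```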
Intermediate & Advanced SEO | wood1e1968
-
How important is the number of indexed pages?
I'm considering a change to AJAX filtered navigation on my e-commerce site. If I do this, the user experience will be significantly improved, but the number of pages that Google finds on my site will go down significantly (in the 10,000s). It feels to me like our filtered navigation has grown out of control and we spend too much time worrying about its URL structure - in some ways it's paralyzing us. I'd like to be able to focus on the pages that matter (explicit Category and Sub-Category pages) and then just let AJAX take care of filtering products below those levels. For customer usability this is smart. From the perspective of manageable code and long-term design this also seems very smart - we can't continue to worry so much about filtered navigation. My concern is that losing so many indexed pages will have a large negative effect (however, we will reduce duplicate content and be able to provide much better category and sub-category pages). We probably should have thought about this a year ago before Google indexed everything :-). Does anybody have any experience with this or insight on what to do? Thanks, -Jason
Intermediate & Advanced SEO | cre8
-
Can pages compete with each other? Inbound links & domain authority: how to determine problem areas?
Hey, I'm having some pretty big SEO issues. 😞 We have had some drops in our rankings. We're 5th page or worse, depending on location, for a few of the keywords that we used to rank well for. There are all sorts of random, non-relevant sites outranking us for the terms "stickley" and "stickley furniture". One thing I noticed is that we are ranking with a different page for each keyphrase: our home page is ranking for "Stickley" and our Stickley page is ranking for "Stickley Furniture". Is this normal? I guess Google is just picking what it sees as more relevant. Is it possible that these two pages are "competing"? Do similar phrases linking to different pages cause pages to "fight" or unevenly disperse link juice? I'm having trouble knowing which page I should send inbound links to, since Google seems to be linking similar keywords to different pages. How much should I stress about which pages receive links? Is it true that any inbound link to a site will help increase its overall domain authority and overall SEO? What should I be focusing on? I've added 301 redirects for non-WWW and tried to make the pages well optimized for SEO. Should I just add more related content to the pages? I know backlinks are important, but I'm having a really hard time figuring out how to get links that aren't just spammy forum-post footers or junk directory submissions. The thing that bothers me is we were ranking well and then suddenly dropped way back. We have never done any black-hat SEO of any sort. I feel a bit stuck and confused at the moment 😞 Thanks in advance for any help! -Amy
Intermediate & Advanced SEO | SheffieldMarketing
-
Significant drop in indexed pages in Webmaster Tools
Has anyone noticed a significant drop in indexed pages within their Google Webmaster Tools sitemap area? We went from 1300 to 83 between Friday, June 23 and today, June 25, 2012, and no errors or warnings are showing. Please let me know if anyone else is experiencing this, along with any suggestions to fix it.
Intermediate & Advanced SEO | datadirect
-
Does It Really Matter to Restrict Dynamic URLs by Robots.txt?
Today I was checking Google Webmaster Tools and found that 117 dynamic URLs are restricted by robots.txt. I have added the following syntax to my robots.txt (you can get more detail from the attached Excel sheet):
# Dynamic URLs
Disallow: /*?osCsid
Disallow: /*?q=
Disallow: /*?dir=
Disallow: /*?p=
Disallow: /*?limit=
Disallow: /*review-form
I have concerns about the following kinds of pages. Sorting by specification: http://www.vistastores.com/table-lamps?dir=asc&order=name Items per page: http://www.vistastores.com/table-lamps?dir=asc&limit=60&order=name Numbered product pages: http://www.vistastores.com/table-lamps?p=2 Will these rules hurt the organic performance of my category pages?
Intermediate & Advanced SEO | CommercePundit
-
Subdomains - duplicate content - robots.txt
Our corporate site provides MLS data to users, with the end goal of generating leads. Each registered lead is assigned to an agent, essentially in round-robin fashion. However, we also give each agent a domain of their choosing that points to our corporate website. The domain can be whatever they want, but upon loading it is immediately redirected to a subdomain. For example, www.agentsmith.com would be redirected to agentsmith.corporatedomain.com. Finally, any leads generated from agentsmith.easystreetrealty-indy.com are always assigned to Agent Smith instead of the agent pool (by parsing the current host name). To avoid being penalized for duplicate content, any page viewed on one of the agent subdomains always has a canonical link pointing to the corporate host name (www.corporatedomain.com). The only content difference between our corporate site and an agent subdomain is the phone number and contact email address, where applicable. Two questions: 1) Can/should we use robots.txt or robots meta tags to tell crawlers to ignore these subdomains, but obviously not the corporate domain? 2) If the answer to question 1 is yes, would it be better for SEO to do that, or to leave it how it is?
Intermediate & Advanced SEO | EasyStreet