Robots.txt, Disallow & Indexed Pages
-
Hi guys,
hope you're well.
I have a problem with my new website. I have 3 pages with the same content:
- http://example.examples.com/brand/brand1 (good page)
- http://example.examples.com/brand/brand1?show=false
- http://example.examples.com/brand/brand1?show=true
The good page has rel=canonical and is the only page that should appear in search results, but Google has indexed all 3 pages...
I don't know what to do now, but I am considering 2 possibilities:
- Remove the filters (true, false), keep only the good page, and return a 404 for the other pages.
- Update robots.txt with a Disallow rule for these parameters and remove those URLs manually.
Thank you so much!
-
In the end, I decided to do the following:
-
Delete all the pages with filters from my site (I had the option, and it wasn't a problem)
-
Remove the URLs individually using GWT (Google Webmaster Tools)
It works!
-
-
Hi thekiller99! Did this get worked out? We'd love an update.
-
Hi,
Did you actually implement canonical tags on the duplicate pages, and do they point to the original page?
-
Hi!
Not sure I understood how you implemented the canonical element on your pages, but it sounds like you only added the canonical tag to what you call the "good page".
The scenario should be like this:
1. You have 3 pages with similar or identical content.
2. Obviously you want only one of them indexed, and in your case it is the one without the parameters (the "good page").
3. You need to implement the canonical elements in the following way:
- page-1: http://example.examples.com/brand/brand1 (a self-referencing canonical is optional, but you can use one if it makes things easier for you)
- page-2: http://example.examples.com/brand/brand1?show=false (canonical to page-1)
- page-3: http://example.examples.com/brand/brand1?show=true (canonical to page-1)
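The mapping above can be sketched in Python. This is only a minimal sketch, assuming `show` is the only parameter that produces duplicates; the helper names `canonical_url` and `canonical_link_tag` are hypothetical and not part of any library or CMS.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: `show` is the only query parameter that creates duplicate pages.
DUPLICATE_PARAMS = {"show"}


def canonical_url(url: str) -> str:
    """Return the URL with duplicate-producing parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in DUPLICATE_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element to place in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


# All three variants should emit the same canonical element.
for page in (
    "http://example.examples.com/brand/brand1",
    "http://example.examples.com/brand/brand1?show=false",
    "http://example.examples.com/brand/brand1?show=true",
):
    print(canonical_link_tag(page))
```

Each of the three URLs ends up with the same element pointing at page-1, which is the signal Google needs to consolidate them.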
PS. Google best practice suggests that you should never use robots.txt to de-index a page from the search results. Robots.txt only blocks crawling, so Google cannot see a canonical element on a blocked URL. If you decide to remove certain pages completely from the search results, the best practice is to return a 404 for them and use Google Search Console to signal to Google that these pages are no longer available. But if you implement the canonical element as described above, you will have no problems.
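One way to confirm the canonical elements are in place is to inspect the HTML of each variant. The following is a minimal sketch, assuming the page HTML has already been fetched; `CanonicalFinder`, `find_canonicals`, and the sample markup are hypothetical and only for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # rel may hold several space-separated tokens, so compare per token.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html: str) -> list:
    """Return all canonical hrefs found in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical markup for the ?show=true variant (page-3).
sample = (
    "<html><head>"
    '<link rel="canonical" href="http://example.examples.com/brand/brand1">'
    "</head><body></body></html>"
)

expected = "http://example.examples.com/brand/brand1"
found = find_canonicals(sample)
print("OK" if found == [expected] else f"Unexpected canonicals: {found}")
```

An empty list means the page has no canonical element, and more than one entry means conflicting canonicals, both of which are worth fixing before relying on the tag.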
Best
Yossi