Robots.txt, Disallow & Indexed Pages
-
Hi guys,
hope you're well.
I have a problem with my new website. I have 3 pages with the same content:
- http://example.examples.com/brand/brand1 (good page)
- http://example.examples.com/brand/brand1?show=false
- http://example.examples.com/brand/brand1?show=true
The good page has rel=canonical and it is the only page that should appear in search results, but Google has indexed all 3 pages...
I don't know what to do now, but I am thinking of 2 possibilities:
- Remove the filters (true, false), leave only the good page, and show a 404 page for the other pages.
- Update robots.txt with a disallow for these parameters & remove those URLs manually
Thank you so much!
-
In the end, I decided to do the following:
-
Delete all pages with filters from my site (I have that option and it wasn't a problem)
-
Delete the URLs individually using GWT (Google Webmaster Tools)
It works!
-
-
Hi thekiller99! Did this get worked out? We'd love an update.
-
Hi,
Did you actually implement canonical tags on the duplicate pages, and do they point to the original page?
-
Hi!
Not sure if I understood how you implemented the canonical element on your pages, but it sounds like you have only put the canonical code on what you call the "good page".
The scenario should be like this:
1. You have 3 pages with similar/exact content.
2. Obviously you want to index only one of them and in your case it is the one without the parameters ("good page")
3. You need to go ahead and implement the canonical elements in the following way:
- page-1: http://example.examples.com/brand/brand1 (you do not have to, but if it makes it easier for you, you can use a self-referencing canonical)
- page-2: http://example.examples.com/brand/brand1?show=false (canonical to page-1)
- page-3: http://example.examples.com/brand/brand1?show=true (canonical to page-1)
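The mapping above can be sketched in code. This is only a minimal illustration, assuming the pages are generated server-side and that `show` is the only parameter that does not change the content; the names `FILTER_PARAMS`, `canonical_url` and `canonical_tag` are made up for this example and are not part of any SEO tool.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters only toggle display and never change the content.
FILTER_PARAMS = {"show"}


def canonical_url(url: str) -> str:
    """Return the URL with the filter parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in FILTER_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """Build the <link> element to place in the <head> of the page served at `url`."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


if __name__ == "__main__":
    for page in (
        "http://example.examples.com/brand/brand1",
        "http://example.examples.com/brand/brand1?show=false",
        "http://example.examples.com/brand/brand1?show=true",
    ):
        print(page, "->", canonical_tag(page))
```

All three URLs end up with the same tag pointing at page-1, so page-1 gets a self-referencing canonical and the two filtered pages point back to it.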
PS. Google best practice suggests that you should never use robots.txt to de-index a page from the search results. If you decide to remove certain pages completely from the search results, the best practice is to return a 404 for them and use Google Search Console to signal to Google that these pages are no longer available. But if you implement the canonical element as described above, you will have no problems.
Best
Yossi