I have more pages in my site map being blocked by the robot file than I have being allowed to be crawled. Is Google going to hate me for this?
-
I'm using some rules to block all pages whose URLs start with "copy-of" on my website, because people have a bad habit of duplicating new product listings to create our refurbished, surplus, etc. listings for those products. To keep Google from seeing these as duplicate pages, I've blocked them in the robots.txt file, but of course they are still automatically included in our sitemap. How bad is this?
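To make the setup above concrete, here is a minimal Python sketch that checks which sitemap URLs a robots.txt rule would block. The rule text, the example.com URLs, and the function name `split_sitemap_urls` are assumptions for illustration; the site's real robots.txt is not shown in this thread. Note also that Python's standard-library parser only does simple prefix matching, so it will not behave exactly like Googlebot if the real rules use wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule; the actual robots.txt on the site may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /copy-of
"""


def split_sitemap_urls(urls, robots_txt=ROBOTS_TXT, user_agent="*"):
    """Split sitemap URLs into (allowed, blocked) according to robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed, blocked = [], []
    for url in urls:
        if parser.can_fetch(user_agent, url):
            allowed.append(url)
        else:
            blocked.append(url)
    return allowed, blocked


if __name__ == "__main__":
    sitemap = [
        "https://example.com/widget",
        "https://example.com/copy-of-widget",
    ]
    allowed, blocked = split_sitemap_urls(sitemap)
    print("allowed:", allowed)
    print("blocked:", blocked)
```

A check like this could also be used to leave the blocked URLs out of the generated sitemap, so the sitemap and robots.txt stop contradicting each other.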
-
When you say "people," do you mean your own web team duplicates content to make their job easier? Or am I missing something?
If that's the case, you really should create unique URLs with unique page titles, product info, etc. That's the correct way to avoid getting hit for duplicate content: don't create it. What you're doing now seems more like a band-aid solution to the problem.
Even though creating unique content in situations like this can seem daunting and/or more expensive, there are probably huge long-term gains to be made if you do it right.
-
It is not bad, just not best practice, because Google may still index the URLs if they are mentioned on other pages. To quote Google:
"While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information..."
What I would do instead is use either rel="canonical" or 301 redirects. I hope that helps.
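As a rough sketch of the rel="canonical" option, the Python below derives the original listing's URL from a duplicated one and builds the tag for the page head. It assumes the duplicate's last path segment is simply "copy-of-" followed by the original's segment, which is a guess about how the listings are named; the function names and example.com URL are hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumed naming convention for duplicated listings.
PREFIX = "copy-of-"


def canonical_url(url):
    """Return the URL with any leading 'copy-of-' stripped from the last path segment."""
    parts = urlsplit(url)
    head, sep, slug = parts.path.rpartition("/")
    while slug.startswith(PREFIX):
        slug = slug[len(PREFIX):]
    # Query string and fragment are dropped so the canonical URL is clean.
    return urlunsplit((parts.scheme, parts.netloc, head + sep + slug, "", ""))


def canonical_link_tag(url):
    """Build a <link rel="canonical"> tag pointing at the original listing."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    print(canonical_link_tag("https://example.com/products/copy-of-widget"))
```

The same mapping could drive a 301 redirect instead, if the duplicated page does not need to stay reachable. Keep in mind that Google can only see a canonical tag on a page it is allowed to crawl, so this would not work together with a robots.txt block on the same URLs.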
Related Questions
-
Do you need to include the top menu on every single page of the site in the code?
When using the cache: operator on Google and clicking the text-only version, our site shows the top menu as gibberish at the top. My feeling is that this takes away SEO juice from our title and focus keyword. Our website is culinarydepotinc.com
On-Page Optimization | | Sammyh1 -
Responsive site.com vs m.site.com
Hi all, my client's website has two URLs for each page, such as site.com/a.html and **m.site.com/a.html**. Will it hurt Google rankings for this website because there are two versions of it? Please help!
On-Page Optimization | | binhlai1 -
Why Google did not index exactly these 2 pages? Any ideas?
Dear Community, on 27th of July I relaunched my own website, submitted the sitemap, and also sent the index page to be crawled, including all linked pages. By the next day the new pages had been indexed. Today I checked manually whether they have been indexed. The result is that 2 of 13 pages have not been indexed, here marked in bold:
http://inlinear.com/
http://inlinear.com/suchmaschinenoptimierung-online-marketing.php
http://inlinear.com/design/
http://inlinear.com/design/printmedien-gestaltung.php
http://inlinear.com/design/corporate-design-und-corporate-identity.php
http://inlinear.com/design/corporate-raum-design.php
http://inlinear.com/webentwicklung/
http://inlinear.com/virtueller-rundgang-360grad-fotografie.php
http://inlinear.com/business-atlas-online-verzeichnis.php
http://inlinear.com/baudokumentation-bauueberwachung.php
http://inlinear.com/ueber-uns.php
http://inlinear.com/blog/
http://inlinear.com/kontakt/
The page "/design/" (which is the index.php of this folder) should be the main page because it's about WEB DESIGN.
Should I create a copy and call it /design/web-design.php? Maybe Google prefers a meaningful URL to index.php? Should I then put a rel=canonical to web-design.php in my index.php?
design/corporate-design-und-corporate-identity.php
The URL is a little long, but that should not be the reason? Or might the reason be that another page, which is still in the index but no longer online (it even redirects to /design/), is still more dominant? Strange... Or should I simply wait a little, or try submitting these two pages manually to Google?
When checking Google Webmaster Tools, Google tells me that just 3 pages have been indexed.
When I was checking which pages were indexed, I checked each URL with the site: search option:
site:inlinear.com/pageX.php ... when Google showed the page, I took it as a sign that it was indexed, but why does Webmaster Tools show only 3 pages? (see screenshot) Do you have any ideas?
Thank You 🙂 indexed.png
On-Page Optimization | | inlinear0 -
301 to Intermediate Page then Rel=Canonical from Intermediate to target page
Hi, I'm working on an eCommerce site and don't have direct access to the CMS. I asked the developers to give me a facility to set up 301s via .htaccess, but it works slightly differently than I expected. I need guidance from experts on whether this is okay. Old page: example.com/old. Target new page: example.com/new. After implementing the redirect, it redirects to an intermediate page, in other words the same target URL with a question mark added: example.com/new? (notice the question mark in the new URL). This intermediate page has a canonical tag for the exact target URL. So, if I 301 redirect example.com/old to example.com/new? (the intermediate page), and the intermediate page example.com/new? has a canonical tag for the exact target URL (example.com/new), will I be able to pass the link juice and authority of the old page to the new page?
On-Page Optimization | | Ankkesh0 -
Lead With Branded Keywords or Descriptive Keywords in Page Title for (Niche) Site?
Our site is hingeheads.com, and our products and product catalog are unique in two ways. First, our product is not something that people are generally aware of; second, our entire product catalog consists of different variations of the same product.
**Catalog Overview:** http://hingeheads.com/collections/all
Product Example: http://hingeheads.com/products/dolphin
I keep wondering whether it is better to lead the title with "branded keywords" [1] or with "descriptive keywords" [2]:
[1] Dolphin HingeHead | Unique Home Decor & Gift Idea | HingeHeads
[2] Dolphin Decor Accessories & Unique Gift Ideas | HingeHeads
I am currently going with the second option, but I keep wondering if that's the better one. I am curious to hear feedback from people who have more experience with this than I do. How would you structure the title for our product pages? Thanks! Kai
On-Page Optimization | | hingeheads0 -
How long does it take for Google to see Changes to a site?
Hi, I have a low PR site (PR 1) that I am starting to work on. In general, when I make changes to my site, how long does it take Google to recognize and index those changes? The reason I am wondering is that the site I am working on had a lot of duplicate content (around 700 pages). I got rid of it all, but I wasn't sure how long it would take Google to spider all these pages and re-index them, since the site is low PR. Thanks, Ken
On-Page Optimization | | Jason_3420 -
Is there a report in SEOMoz that will show me what keywords each page ranks for on my site?
I would like to find all of the keywords, not just the keywords that I specified in the tracking section.
On-Page Optimization | | Court_H0 -
Avoiding "Duplicate Page Title" and "Duplicate Page Content" - Best Practices?
We have a website with a searchable database of recipes. You can search the database using an online form with dropdown options for:
Course (starter, main, salad, etc)
Cooking Method (fry, bake, boil, steam, etc)
Preparation Time (Under 30 min, 30min to 1 hour, Over 1 hour)
Here are some examples of how URLs may look when searching for a recipe:
find-a-recipe.php?course=starter
find-a-recipe.php?course=main&preperation-time=30min+to+1+hour
find-a-recipe.php?cooking-method=fry&preperation-time=over+1+hour
There is also pagination of search results, so the URL could also have the variable "start", e.g. find-a-recipe.php?course=salad&start=30
There can be any combination of these variables, meaning there are hundreds of possible search results URL variations. This all works well on the site, but it gives multiple "Duplicate Page Title" and "Duplicate Page Content" errors when crawled by SEOmoz. I've searched online and found several possible solutions for this, such as:
Setting a canonical tag
Adding these URL variables to Google Webmasters to tell Google to ignore them
Changing the title tag in the head dynamically based on which URL variables are present
However, I am not sure which of these would be best. As far as I can tell, the canonical tag should be used when you have the same page available at two separate URLs, but this isn't the case here, as the search results are always different. Adding these URL variables to Google Webmasters won't fix the problem in other search engines, and we will presumably continue to get these errors in our SEOmoz crawl reports. Changing the title tag each time can lead to very long title tags, and it doesn't address the problem of duplicate page content. I had hoped there would be a standard solution for problems like this, as I imagine others will have come across this before, but I cannot find the ideal solution. Any help would be much appreciated. Kind Regards
On-Page Optimization | | smaavie5
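For the third option mentioned in the question above (changing the title dynamically from the URL variables), a minimal Python sketch might look like this. The parameter names are copied from the example URLs, including the site's own spelling "preperation-time"; the labels, the base title, the function name `search_title`, and the page size of 10 are assumptions, since the post does not state them. It only addresses duplicate titles, not duplicate content.

```python
from urllib.parse import parse_qs, urlsplit

# (query parameter, label shown in the title); the labels are hypothetical.
FILTERS = [
    ("course", "Course"),
    ("cooking-method", "Method"),
    ("preperation-time", "Time"),  # spelled as in the site's URLs
]


def search_title(url, page_size=10, base="Find a Recipe"):
    """Build a page title that reflects the active search filters and page."""
    params = parse_qs(urlsplit(url).query)
    parts = [
        f"{label}: {params[key][0]}"
        for key, label in FILTERS
        if key in params
    ]
    # "start" is the result offset; page_size is an assumed value.
    start = int(params.get("start", ["0"])[0])
    if start > 0:
        parts.append(f"Page {start // page_size + 1}")
    return " | ".join([base] + parts)


if __name__ == "__main__":
    print(search_title("find-a-recipe.php?course=salad&start=30"))
    print(search_title("find-a-recipe.php?course=main&preperation-time=30min+to+1+hour"))
```

Listing the filters in a fixed order keeps the title the same no matter what order the variables appear in the URL, and short labels help limit title length.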