Medium-sized forum with thousands of thin-content gallery pages. Disallow or noindex?
-
I have a forum at http://www.onedirection.net/forums/ which contains a gallery with thousands of very thin-content pages. We've currently got these photo pages disallowed for the main Googlebot via robots.txt, but we do allow the Google Images crawler access.
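Roughly speaking, the relevant part of our robots.txt looks something like this (paths simplified rather than the exact rules):

User-agent: Googlebot
Disallow: /forums/gallery/

User-agent: Googlebot-Image
Disallow:

The empty Disallow in the Googlebot-Image group leaves everything open to the image crawler, so the photos themselves can still be picked up.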
Now I've been reading that we shouldn't really use disallow, and instead should add a noindex tag on the page itself.
It's a little awkward to edit the source of the gallery pages (and to keep any amendments in place the next time the forum software gets updated).
What's the best way of handling this?
Chris.
-
Hey Chris,
I agree that your current implementation, while not ideal, is perfectly adequate for ensuring you don't have duplicate content or cannibalisation problems, while still allowing Google to index the UGC images.
You're also preventing Googlebot from seeing the user profile pages, which is a good idea, since many of them are very thin and mostly duplicate.
So, from a pure SEO perspective, I think you've done a good job.
However... I think you should also consider the ethical implications of potentially blocking the images Googlebot as well. By preventing Google from indexing all those images of young girls fawning over the vacuous runners-up of a televised talent show, you would undoubtedly be doing the world a great service.
-
Hi Chris, I second Jarno's opinion in this regard. If adding the page-level blocking is going to be a huge overhead, you can rely on your current robots.txt setup. There is a small catch here, though. Even if you block a URL in the robots.txt file, Google can still index it (usually as a bare URL with no content) if it finds references to it elsewhere on the web. In situations like this, page-level blocking is the way forward. So to fully keep your content out of the index, you should ideally be using the page-level robots meta tag or the X-Robots-Tag header; note that for either of those to work, the pages must not also be disallowed in robots.txt, because Google has to be able to crawl a page to see the tag.
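Just to illustrate, the page-level tag is a single line in the head section of each thin gallery page, something along these lines (the "follow" part lets link equity keep flowing even though the page itself stays out of the index):

<meta name="robots" content="noindex, follow">

The X-Robots-Tag is the same instruction sent as an HTTP response header instead, which is handy when editing the HTML is awkward:

X-Robots-Tag: noindex, follow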
Here you go for more: https://support.google.com/webmasters/answer/156449?hl=en
Hope it helps.
Best,
Devanur Rafi.
-
Chris,
If the noindex meta tag update is too complicated for you to add due to software issues etc., then I feel that your current method is the right way to go. Normally you would be absolutely right, for the simple reason that the page-level tag overrules robots.txt. But if a software update wipes out the rules placed in your code, then you have to manually re-add them after each and every update, and I'm not sure you want to do that.
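One middle ground, if you ever do want the page-level block without touching the templates: an X-Robots-Tag header set in the server config (for example a .htaccess file on Apache) never touches the page source, so it survives forum software updates. A rough sketch, assuming Apache with mod_setenvif and mod_headers, and assuming the gallery lives under /forums/gallery/ (adjust to the real URL structure):

# Attach a noindex header to gallery URLs without editing any templates
<IfModule mod_setenvif.c>
SetEnvIf Request_URI "^/forums/gallery/" THIN_GALLERY
</IfModule>
<IfModule mod_headers.c>
Header set X-Robots-Tag "noindex, follow" env=THIN_GALLERY
</IfModule>

Bear in mind that for Google to see that header you'd also need to lift the robots.txt disallow for the main Googlebot, since it has to be able to fetch the pages first.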
regards
Jarno