Can't get auto-generated content de-indexed
-
Hello and thanks in advance for any help you can offer me!
Customgia.com, a costume jewelry e-commerce site, has two types of product pages - public pages that are internally linked and private pages that are only accessible by accessing the URL directly. Every item on Customgia is created online using an online design tool. Users can register for a free account and save the designs they create, even if they don't purchase them. Prior to saving their design, the user is required to enter a product name and choose "public" or "private" for that design. The page title and product description are auto-generated.
Since launching in October '11, the number of products grew and grew as more users designed jewelry items. Most users chose to show their designs publicly, so the number of products in the store swelled to nearly 3000. I realized many of these designs were similar to each other and occasionally exact duplicates. So over the past 8 months, I've made 2300 of these designs "private" - no longer accessible unless the designer logs into their account (although these pages can still be linked to directly).
When I realized that Google had indexed nearly all 3000 products, I entered URL removal requests on Webmaster Tools for the designs that I had changed to "private". I did this starting about 4 months ago. At the time, I did not have NOINDEX meta tags on these product pages (obviously a mistake) so it appears that most of these product pages were never removed from the index. Or if they were removed, they were added back in after the 90 days were up.
Of the 716 products currently showing (the ones I want Google to know about), 466 have unique, informative descriptions written by humans. The remaining 250 have auto-generated descriptions that read coherently but are somewhat similar to one another. I don't think these 250 descriptions are the big problem right now but these product pages can be hidden if necessary.
I think the big problem is the 2000 product pages that are still in the Google index but shouldn't be. The following Google query tells me roughly how many product pages are in the index: site:Customgia.com inurl:shop-for
Ideally, it should return just over 716 results but instead it's returning 2650 results. Most of these 1900 product pages have bad product names and highly similar, auto-generated descriptions and page titles. I wish Google never crawled them.
Last week, NOINDEX tags were added to all 1900 "private" designs so currently the only product pages that should be indexed are the 716 showing on the site. Unfortunately, over the past ten days the number of product pages in the Google index hasn't changed.
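One way to confirm the NOINDEX tags are actually being served is to check a private page's HTML for a robots meta tag. The snippet below is only a rough Python sketch, not the site's actual code: `RobotsMetaParser` and `has_noindex` are hypothetical names, and it assumes the page HTML has already been fetched into a string.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag containing 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Running this over the HTML of a handful of private product pages would show whether the tag is present everywhere it should be.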
One solution I initially thought might work is to re-enter the removal requests because now, with the NOINDEX tags, these pages should be removed permanently. But I can't determine which product pages need to be removed because Google doesn't let me see that deep into the search results. If I look at the removal request history it says "Expired" or "Removed" but these labels don't seem to correspond in any way to whether or not that page is currently indexed. Additionally, Google is unlikely to crawl these "private" pages because they are orphaned and no longer linked to any public pages of the site (and no external links either).
Currently, Customgia.com averages 25 organic visits per month (branded and non-branded) and close to zero sales. Does anyone think de-indexing the entire site would be appropriate here? Start with a clean slate and then let Google re-crawl and index only the public pages - would that be easier than battling with Webmaster Tools for months on end?
Back in August, I posted a similar problem that was solved using NOINDEX tags (de-indexing a different set of pages on Customgia): http://moz.com/community/q/does-this-site-have-a-duplicate-content-issue#reply_176813
Thanks for reading through all this!
-
I don't think there's any harm in submitting a new/full list, even if it duplicates past lists. The URLs haven't been removed, and you did fix the tags. This isn't like disavowing links - it's more of a technical issue. Worst case, it doesn't work, from what I've seen.
-
Thanks for helping me with this.
You are correct that all the product pages are in the same folder regardless of whether they are public or private so unfortunately, removing an entire folder isn't an option at this point.
When I go to Webmaster Tools and view past removal requests, each one shows as either "Expired" or "Removed". WMT only allows me to resubmit the removal request if the label is "Expired". Going back past 90 days, many are still labeled "Removed", but the further back I go, the more say "Expired". There are too many requests to try to determine whether or not each page is indexed - so I think our best bet is to re-submit every expired private product page removal request and then monitor removal. Does this make sense?
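To make the resubmission list concrete, here is a rough Python sketch of the filtering step. It assumes the removal history has been copied by hand into (url, status) pairs, oldest first, and that a list of private product URLs is available from the site's own database; `urls_to_resubmit` is a hypothetical helper, not a Webmaster Tools feature.

```python
def urls_to_resubmit(requests, private_urls):
    """Pick the private URLs whose most recent removal request is "Expired".

    requests: iterable of (url, status) pairs, oldest first (assumed format).
    private_urls: iterable of URLs currently marked private on the site.
    """
    private = set(private_urls)
    latest = {}
    for url, status in requests:
        # Later rows overwrite earlier ones, so only the newest status is kept.
        latest[url] = status
    return sorted(
        url for url, status in latest.items()
        if status == "Expired" and url in private
    )
```

Anything the function returns is a candidate for a fresh removal request; URLs whose latest status is still "Removed" are left alone, since WMT won't accept a resubmission for them anyway.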
Back in August, a Moz crawl showed tons of duplicates for the designer pages (the pages where the user actually designs the jewelry). Using NOINDEX tags and removal requests (credit to Dr. Pete and Everett Sizemore) the number of designer pages in the index dropped from 5K to exactly 8 - so it worked.
Our XML sitemap is dynamic and doesn't list private product pages.
-
It honestly sounds like you're on the right track - you do need to explicitly mark those (and META NOINDEX should be fine). Could you just request removal for all private pages? Worst case, Google removes some that aren't in the index, or attempts to. Since the public/private setting can be changed, you can't really put the private pages all in one folder (real or virtual) - that would make life easier, long-term, but probably isn't useful/appropriate for your case.
I'd also recommend having a clean XML sitemap with just the public entries (updated dynamically). That won't deindex the other pages, but it's one more cue Google can use. You want all of the signals you're sending to be consistent.
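As a rough illustration of the "public entries only" idea, a sitemap generator could look something like the Python sketch below. The product records, the `public` flag and the example.com URLs are all assumptions made up for the example; the real site will store this differently.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(products):
    """Return sitemap XML listing only the products flagged as public.

    products: iterable of dicts with "url" and "public" keys (assumed shape).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for product in products:
        if not product["public"]:
            continue  # private designs never appear in the sitemap
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = product["url"]
    return ET.tostring(urlset, encoding="unicode")
```

Because the list is rebuilt from the current public/private flags each time, a design that is switched to private drops out of the sitemap on the next build.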
I agree with Doug, though - this is really tricky, because ideally you would want people to share these pages, and if you NOINDEX then you're losing out on that. My gut feeling is that, until your site is stronger, you probably can't support 3K near duplicates (and counting). If you want to get sophisticated, though, you could apply NOINDEX dynamically, and only to product pages that have very little content or are obvious dupes. As people fill out or share a product, you could remove the NOINDEX.
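A dynamic rule like that might be sketched as follows. This is a deliberately crude Python example: the 50-word threshold is an arbitrary assumption, "duplicate" here means an exact match after lowercasing and collapsing whitespace (real near-duplicate detection would need more than that), and `should_noindex` is a hypothetical name.

```python
def should_noindex(product, seen_descriptions, min_words=50):
    """Decide whether a product page should carry NOINDEX.

    product: dict with a "description" key (assumed shape).
    seen_descriptions: set of normalised descriptions already encountered;
        it is updated in place so later duplicates are caught.
    min_words: thin-content threshold (arbitrary assumption).
    """
    # Normalise case and whitespace so trivial variations still match.
    description = " ".join(product["description"].lower().split())
    too_thin = len(description.split()) < min_words
    duplicate = description in seen_descriptions
    seen_descriptions.add(description)
    return too_thin or duplicate
```

The first page with a given description stays indexable (if it is long enough) and any later copy gets NOINDEX. Once a user rewrites the description, the page would pass the check on the next run and the tag could be dropped.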
-
Hi Doug,
Thanks for the quick response. I will do my best to answer each of your points.
In Webmaster Tools, under Index Status, it shows 1781 pages indexed, with a high of 6515 on June 2, 2013. Not sure that helps to clarify anything but it's another piece of Google data to consider.
We continually monitor WMT and Analytics. I'm addressing this issue specifically because search impressions on our product pages average less than 5 impressions/day despite continuous improvements over the last 12 months - keyword research, better page titles/product names and longer, more informative descriptions. These 500 or so product pages are vastly better today than they were 12 months ago - but impressions have not improved at all.
Every design, public or private, has social/sharing buttons. As I mentioned above, these designs can all be linked to directly from any external website.
I think the category pages are sufficient. There is some fine-tuning that could be done in terms of how products are organized within categories but overall it's pretty solid and probably not an issue.
Our initial strategy was to attract long-tail traffic with user-generated content but the problem is most users gave their products personal, irrelevant (and possibly spammy) product names. There were other problems with the user generated designs as well - like one user who designed 15 earrings that looked exactly the same except for one bead which she changed to a different color for each design. Anyway, we left all these designs public for over 12 months - as more and more designs were added to the site, organic search traffic actually fell.
-
I agree with Doug.
Create better category pages - make sure each product page is under a category.
The user-generated products are great and should be indexed.
-
Hey Richard,
First, note that the number of pages displayed by that query is an estimate, which gets refined the deeper you go into the search results. On page one, these estimates tend to be wildly inaccurate.
If you go all the way to the end (page 13) and then repeat the process with omitted results included, you still get to page 13, and a total of 123 pages. (Somewhat better than the 2k+ results.)
This is less than the 716 pages you mention, so maybe you've got the opposite problem? What do you see if you check your Google Analytics and Webmaster Tools? Which pages are getting organic traffic from Google? Which pages are showing in the search results (Webmaster Tools, Impressions)?
What are the pages you want to appear in search and what are the keywords you're targeting?
My first thought is - if you're allowing people to design their own jewellery - are you also allowing them to easily share their creations on social, etc? Have you got embed codes so that they can put their designs on their blog etc? If you're not, then I think you're missing a trick.
All of these individual items, designed by users, will (or should) link back to the specific category pages (or other landing page) and increase the authority of that page. Make sure your category/landing pages have good unique content that communicates both the value proposition and the products you've got available.
If you don't have these category pages, then it might be worth looking at your site architecture/hierarchy and think about creating them.
Your individual product pages might get long-tail traffic (and having lots of different variations, described in real people's own words, might actually work to your advantage here), but your category pages should be the ones targeting head terms.
I notice you've no-indexed and no-followed the product pages in question. This means that if these pages are shared, any inbound authority/link equity/link juice is just being discarded. Are you sure you want to do that?
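To show the difference between the two settings, here is a small Python sketch that builds the robots meta tag from two separate flags, so "keep out of the index" and "don't follow links" can be controlled independently. `robots_meta` is a hypothetical helper, and how a search engine ultimately treats links on a noindexed page is up to the search engine.

```python
def robots_meta(index: bool, follow: bool) -> str:
    """Build a robots meta tag from independent index/follow flags."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    return '<meta name="robots" content="{}">'.format(", ".join(directives))
```

For example, `robots_meta(index=False, follow=True)` produces a "noindex, follow" tag, which asks for the page to stay out of the index without also telling crawlers to ignore its links.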
I don't think you need to worry too much about google's index at this point and I certainly wouldn't consider deindexing the whole site.