10,000 New Pages of New Content - Should I Block in Robots.txt?
-
I'm almost ready to launch a redesign of a client's website. The new site has over 10,000 new product pages, each with a unique product description, though many share similar text with other products across the site.
For example, take these two products:
-
Brown leather 2 seat sofa
-
Brown leather 4 seat corner sofa
Obviously, the products are different, but the pages feature very similar terms and phrases.
I'm worried that the Panda update will mean these pages are sandboxed and/or penalised.
Would you block the new pages? Add them gradually? What would you recommend in this situation?
-
-
Consider reversing your thinking from "what will be my loss to Panda?" to "what can I do to make this site kick ass?"
Reach for opportunity, extend yourself.
If this were my site, I would get a writer on those product descriptions to make them unquestionably unique, beef them up, add salesmanship, and optimize them for search. This will give you substantive, unique content that converts better, pulls more long-tail traffic, and moves you out of competition with other sites that do the bare minimum.
Sure, it will cost money, but in the long run it could bring back a huge return.
My only caution is that if you make this investment in writing, you need to do it on a site that can pull reasonable traffic. If you do this on a site that has no links, it will not do you much good. It is part of a marketing plan, not a single item on a "to do" list.
Related Questions
-
How to speed up transition towards new 301 redirected landing pages?
Hi SEOs, I have a question about moving local landing pages from many separate pages towards integrating them into a search results page. Currently we have many separate local pages (e.g. www.3dhubs.com/new-york). For both scalability and conversion reasons, we'll integrate our local pages into our search page (e.g. www.3dhubs.com/3d-print/Bangalore--India).

**Implementation details:** To mitigate the risk of a sudden organic traffic drop, we're currently running a test on just 18 local pages (Bangalore = 1 of 18). We applied a 301 redirect from the old URLs to the new URLs 3 weeks ago. Note: we didn't yet update the sitemap for this test (technical reasons) and will only do this once we 301 redirect all local pages. For the 18 test pages I manually told the crawlers to index them in Webmaster Tools. That should do, I suppose.

**Results so far:** The old URLs of the 18 test cities are still generating > 99% of the traffic, while the new pages are already indexed (see: https://www.google.nl/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=site:www.3dhubs.com/3d-print/&start=0). Overall organic traffic on the test cities hasn't changed.

Questions:
1. Will updating the sitemap for this test have a big impact? Google has already picked up the new URLs, so that's not the issue. Furthermore, the 301 redirect on the old pages should tell Google to show the new page instead, right?
2. Is it normal that search impressions will slowly shift from the old page towards the new page? How long should I expect it to take before the new pages are consistently shown over the old pages in the SERPs?
Intermediate & Advanced SEO | robdraaijer
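For reference, the kind of per-city 301 described above might look like this in an Apache .htaccess file. This is a minimal sketch: the new URL comes from the question, but the old `/bangalore` path is a hypothetical stand-in, since only the `/new-york` pattern was given for old URLs.

```apache
RewriteEngine On
# Hypothetical old local page -> new search-page URL, one rule per test city
RewriteRule ^bangalore$ /3d-print/Bangalore--India [R=301,L]
```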
Our parent company has included their sitemap links in our robots.txt file - will that have an impact on the way our site is crawled?
Our parent company has included their sitemap links in our robots.txt file. All of their sitemap links are on a different domain and I'm wondering if this will have any impact on our searchability or potential rankings.
Intermediate & Advanced SEO | tsmith131
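For context, the setup described above would look something like the sketch below (the parent domain is a placeholder). The Sitemap directive is informational and is allowed to reference a different host; it doesn't allow or disallow anything, so by itself it shouldn't change how the child site is crawled.

```
User-agent: *
Disallow:

# Sitemap entries pointing at the parent company's domain (placeholder URL)
Sitemap: https://parent-company.example/sitemap.xml
```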
Duplicate content on product pages
Hi, We are considering the impact when you want to deliver content directly on the product pages. If the products were manufactured in a specific way and it's the same process across 100 other products, you might want to tell your readers about it. If you believe the product page is the best place to deliver this information for your readers, then you could potentially be creating mass content duplication, especially as the storytelling of the product could equate to 60% of the page content, which could really flag as duplication. Our options would appear to be:

1. Instead add the content as a link on each product page to one centralised URL and risk taking users away from the product page (not going to help with conversion rate or the designers' plans).
2. Put the content behind some JavaScript which requires interaction, hopefully deterring the search engine from crawling the content (doesn't fit the designers' plans, and users have to interact, which is a big ask).
3. Assign one product as a canonical and risk the other products not appearing in search for relevant searches.
4. Leave the copy as crawlable and risk being marked down or de-indexed for duplicated content.

It seems the search engines do not offer a way for us to serve this great content to our readers without being at risk of going against guidelines, or without the search engines being unable to crawl it. How would you suggest a site should go about this for optimal results?
Intermediate & Advanced SEO | FashionLux
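For reference, option 3 in the list above is implemented with a single tag in the head of each duplicate page. A minimal sketch with a placeholder URL:

```html
<!-- On each product page sharing the story copy; the canonical URL is a placeholder -->
<link rel="canonical" href="https://www.example.com/products/brown-leather-sofa" />
```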
Wildcarding Robots.txt for Particular Word in URL
Hey All, So I know that this isn't a standard robots.txt use case; I'm aware of how to block or wildcard certain folders, but I'm wondering whether it's possible to block all URLs with a certain word in them. We have a client that was hacked a year ago and now they want us to help remove some of the pages that were being autogenerated with the word "viagra" in them. I saw this article and tried implementing it (https://builtvisible.com/wildcards-in-robots-txt/), and it seems that I've been able to remove some of the URLs (although I can't confirm yet until I do a full pull of the SERPs on the domain). However, when I test certain URLs inside of WMT it still says that they are allowed, which makes me think that it's not working fully or not working at all. In this case these are the lines I've added to the robots.txt:

Disallow: /*&viagra
Disallow: /*&Viagra

I know I have the solution of individually requesting URLs to be removed from the index, but I want to see if anybody has ever had success with wildcarding URLs with a certain word in their robots.txt? The individual URL route could be very tedious. Thanks! Jon
Intermediate & Advanced SEO | EvansHunt
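Worth noting on the pattern itself: `/*&viagra` only matches URLs where the word follows an `&`, which may be why WMT still reports some URLs as allowed. A broader sketch (robots.txt matching is case-sensitive, hence both spellings, as in the question):

```
User-agent: *
# Matches "viagra" anywhere in the path or query string, not only after an "&"
Disallow: /*viagra
Disallow: /*Viagra
```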
Duplicate Content From Indexing of Non-File-Extension Page
Google somehow has indexed a page of mine without the .html extension, so they indexed www.samplepage.com/page. I am showing duplicate content because Google also sees www.samplepage.com/page.html. How can I force Google or Bing or whoever to only index and see the page including the .html extension? I know people are saying not to use the file extension on pages, but I want to, so please, anybody... HELP!!!
Intermediate & Advanced SEO | WebbyNabler
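One common approach for the situation above is a server-side 301 from the extensionless URL back to the .html version, so only one version can be indexed. A minimal Apache .htaccess sketch, assuming mod_rewrite is enabled and the .html files exist on disk:

```apache
RewriteEngine On
# If the bare URL isn't a real file but its .html twin is, 301 to the .html version
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.+)$ /$1.html [R=301,L]
```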
How is it possible to 301 specific pages to a new domain?
The old site is small, only 100 pages or so, and about 10 of them are particularly useful. I would like to 301 those 10 pages to 10 similar pages on the new site, and also 301 the other 90 pages to the new site... the new site's home page, I suppose. Does it make sense to do this, and if so, how? I think if I simply 301 the whole of the old domain to the new one, the juice will be shared among the new site's pages equally, which is not what I want. I know where the htaccess file is and I can 301 a page within a domain, but I'm at a loss with this. Thanks for any help.

EDIT: I'm hoping for something like this:

old.com/page_1 >> new.com/page_A
old.com/page_2 >> new.com/page_B
... and 8 more of those

And then the other 90 pages:

old.com/Remaining pages >> new.com/index
Intermediate & Advanced SEO | Brocberry
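A minimal .htaccess sketch of exactly that mapping, using mod_rewrite on the old domain (the page names are the hypothetical ones from the question). Order matters: the ten page-to-page rules must come before the catch-all, because the [L] flag stops processing at the first match.

```apache
RewriteEngine On
# Ten specific page-to-page redirects first
RewriteRule ^page_1$ https://new.com/page_A [R=301,L]
RewriteRule ^page_2$ https://new.com/page_B [R=301,L]
# ...and 8 more of those

# Everything else on the old domain goes to the new home page
RewriteRule ^ https://new.com/ [R=301,L]
```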
SEO-Friendly Method to Load XML Content onto Page
I have a client who has about 100 portfolio entries, each with its own HTML page. Those pages aren't getting indexed because of the way the main portfolio menu page works: it uses JavaScript to load the list of portfolio entries from an XML file, along with metadata about each entry. Because it uses JavaScript, crawlers aren't seeing anything on the portfolio menu page. Here's a sample of the JavaScript used (this is one of many more lines of code):

// load project xml
try {
    var req = new Request({
        method: 'get',
        url: '/data/projects.xml',

Normally I'd have them just manually add entries to the portfolio menu page, but part of the metadata that's getting loaded is project characteristics that are used to filter which portfolio entries are shown on the page, such as client type (government, education, industrial, residential, etc.) and project type (depending on the type of service that was provided). It's similar to the filtering you'd see on an e-commerce site. This has to stay, so the page needs to remain dynamic. I'm trying to summarize the alternate methods they could use to load that content onto the page instead of JavaScript (I assume that server-side solutions are the only ones I'd want, unless there's another option I'm unaware of). I'm aware that PHP could probably load all of their portfolio entries from the XML file on the server side. I'd like to get some recommendations on other possible solutions. Please feel free to ask any clarifying questions. Thanks!
Intermediate & Advanced SEO | KaneJamison
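On the PHP idea mentioned in the question, here is a minimal sketch of rendering the XML server-side so crawlers see the same list. The node names are assumptions, since the real structure of /data/projects.xml isn't shown; the filter metadata goes into data- attributes so the existing client-side filtering could keep working on top of it.

```php
<?php
// Sketch: render portfolio entries server-side from the same XML the JS loads.
// Assumed structure: <projects><project><title/><url/><client_type/><project_type/></project>...</projects>
$projects = simplexml_load_file(__DIR__ . '/data/projects.xml');
if ($projects === false) {
    exit('Could not load projects.xml');
}

echo '<ul id="portfolio">';
foreach ($projects->project as $p) {
    printf(
        '<li data-client="%s" data-type="%s"><a href="%s">%s</a></li>',
        htmlspecialchars((string) $p->client_type),
        htmlspecialchars((string) $p->project_type),
        htmlspecialchars((string) $p->url),
        htmlspecialchars((string) $p->title)
    );
}
echo '</ul>';
```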
Duplicate Page Content
There have been over 300 pages on our client's site with duplicate page content. Before we embark on a programming solution to this with canonical tags, our developers are requesting the list of originating sites/links/sources for these odd URLs. How can we find a list of the originating URLs? If you can provide a list of originating sources, that would be helpful. For example, the following pages are showing (as a sample) as duplicate content:

www.crittenton.com/Video/View.aspx?id=87&VideoID=11
www.crittenton.com/Video/View.aspx?id=87&VideoID=12
www.crittenton.com/Video/View.aspx?id=87&VideoID=15
www.crittenton.com/Video/View.aspx?id=87&VideoID=2

"How did you get all those duplicate URLs? I have tried to Google the 'contact us', 'news', and 'video' pages. I didn't get all those duplicate pages. The id=87 on most of the duplicate pages is not supposed to be there. I was wondering how the visitors got to all those duplicate pages. Please advise."

Note: the CMS does not create this type of hybrid URL. We are as curious as you as to where/why/how these are being created. Thanks.
Intermediate & Advanced SEO | dlemieux
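One way to answer the developers' question above — where those odd URLs come from — is to pull the referrer field out of the web server's access logs for the affected URLs. A hypothetical one-liner for an Apache combined-format log (the log path is a placeholder; an IIS log, which an .aspx site more likely has, would need a different field split):

```
grep 'Video/View.aspx?id=87' /var/log/apache2/access.log | awk -F'"' '{print $4}' | sort | uniq -c | sort -rn
```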