How do I address low-quality/duplicate content issues for a job portal?
-
Hi,
I want to optimize my job portal for maximum search traffic.
Problems
- Duplicate content: The portal takes jobs from other portals/blogs and posts them on our site. Sometimes employers provide the same job posting to multiple portals and we are not allowed to change it, resulting in duplicate content.
- Empty content pages: We have a lot of pages that can be reached by filtering on multiple options, like IT jobs in New York. If there are no IT jobs posted in New York, the result is a blank page with little or no content.
- Repeated content: Each job listing page carries the same "about the company" information. If a company has 1000 jobs listed with us, that means 1000 pages have exactly the same "about the company" wording.
Solutions Implemented
- Rel=prev and next: We have implemented this for pagination. We also have self-referencing canonical tags on each page. Even if pages are filtered with additional parameters, our system strips off the parameters and always shows the correct URL, both in the rel=prev and next links and in the self-referencing canonical tags.
- For duplicate content: Due to the volume of job listings that come in each day, it's impossible to create unique content for each one. We try to make the initial paragraph (at least 130 characters) unique. However, we use a template system for each job, so a similar pattern can be detected after even 10 or 15 jobs. Sometimes we also take the wordy job descriptions and convert them into bullet points. If bullet points are already available, we take only a few of them and sometimes reshuffle them.
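For readers who want to picture the parameter stripping described in the list above, here is a minimal Python sketch. It is only an illustration and not the poster's actual system: the `page` query parameter, the `KEEP_PARAMS` set and both function names are assumptions made for the example.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "page" is the only query parameter worth keeping.
KEEP_PARAMS = {"page"}


def canonical_url(url: str) -> str:
    """Drop the fragment and every query parameter not in KEEP_PARAMS."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in KEEP_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def head_links(listing_url: str, page: int, last_page: int) -> list[str]:
    """Build canonical and rel=prev/next <link> tags for one listing page."""
    base = canonical_url(listing_url).split("?")[0]

    def page_url(n: int) -> str:
        # Assumption: page 1 lives at the bare listing URL.
        return base if n == 1 else f"{base}?page={n}"

    tags = [f'<link rel="canonical" href="{escape(page_url(page))}">']
    if page > 1:
        tags.append(f'<link rel="prev" href="{escape(page_url(page - 1))}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{escape(page_url(page + 1))}">')
    return tags
```

With this sketch, a filtered URL such as `https://example.com/jobs/it?sort=date` (a made-up address) would map back to the clean listing URL for the canonical tag.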
Can anyone provide me additional pointers to improve my site in terms of on-page SEO/technical SEO?
Any help would be much appreciated.
We are also thinking of noindexing or deleting old jobs once they are more than X days old. Do you think this would be a smart strategy? Should I noindex empty listing pages as well?
Thank you.
-
Unique Listing Copy
I would try to get that unique content to the top of the source order - it doesn't necessarily have to appear at the top of the page - it could be in a sidebar for instance, but it should be first in the source so that Googlebot gobbles it up before it reaches duplicate stuff or secondary nav / footer links etc.
No Results pages
Yes, you could certainly noindex your no-results pages with a robots meta tag - that would be a good idea.
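As a rough illustration of the noindex suggestion, a listing template could choose its robots meta tag from the number of results. This is just a sketch: the function name `robots_meta` and the idea of passing in a result count are assumptions, not something taken from the poster's system.

```python
def robots_meta(result_count: int) -> str:
    """Return a robots meta tag for a filtered listing page."""
    if result_count == 0:
        # Keep empty filter pages out of the index, but let crawlers follow links.
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'
```

The tag belongs in the page's head; once jobs are posted for that filter again, the same template would go back to an indexable page.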
Loading duplicate content with ajax
In terms of Google and ajax content, yes Googlebot can and does follow links it finds in javascript.
All I can tell you here is my own experience. On my product detail template, I have loaded up category descriptions with ajax that appear canonically (if that's the right way of putting it) on my listing pages. In the SERPs, the category description content is indexed for the pages I want it to be indexed for (the listings in this case), and not for the product detail pages where I'm loading it with ajax. And these product detail pages still perform well and get good organic landing traffic.
On the product detail page where I'm loading with ajax, I have the copy in an accordion, and it's loaded with an ajax request on document ready. It might be considered slightly more kosher to do this in response to a user action though - such as clicking on the accordion in my case. The theory being that you're making your site responsive to a user's needs, rather than loading up half the content one way and the other half another way, if you get what I mean.
Sometimes of course you just cannot avoid certain amounts of content duplication within your site.
-
Luke,
Thank you for your detailed reply.
I forgot to mention that for each of our important filter pages (like IT jobs in New York) we do have a unique paragraph of text which is human-readable and at the same time SEO-optimized. (These paragraphs are around 200 words long and are not present on all filter pages due to the volume of such pages.) This unique block of text sits at the bottom of the page, just above the footer, after the latest 20 job listings are shown.
"Filtering & Blank Results Pages Could this not be done with javascript / ajax, so that Google never finds an empty listing?"
I am afraid this cannot be done due to the structure of our system. Noindexing them would be much easier. Wouldn't that do?
"You could load this content from an ajax template, either as the page loads, or in response to a user action (eg. click on a button 'show company details')."
Sounds like a good idea. Are you sure Google will not consider this as cloaking and that Google cannot read Ajax content?
"Try not to load up the duplicate description by default."- Do you mean we should implement Ajax again for this part?
"You will want to, where possible, specify a view-all page for Google"- not sure if this will be possible from our side due to engineering limitations. I thought rel=next and prev would solve the issue. However, I still see intermediate pages indexed.
-
Hi, I've tried to address your issues point by point according to your post...
Duplicate Job Posting Content
You can try to offset this by having a couple hundred words of unique copy per listing page url exposed to Google. So, if your page lists all jobs in the catering industry in New Jersey for instance, write some copy on the topic, and try to make it readable and useful to the user as well. Add microdata to the template using schema.org, so that Google can understand what's there at the data level - there will likely be entities available there to describe your content in terms of location, the companies that are hiring, etc.
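To give a feel for the structured data idea, here is a small Python sketch that renders a schema.org JobPosting. Note the swap: it emits JSON-LD in a script tag instead of inline microdata attributes, which is a different syntax for the same schema.org vocabulary. The function name, its parameters and the sample values are made up for illustration, and a real posting would likely carry more properties.

```python
import json


def job_posting_jsonld(title, description, date_posted, company, city, region):
    """Render a schema.org JobPosting as a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "JobPosting",
        "title": title,
        "description": description,
        "datePosted": date_posted,  # ISO 8601 date string, e.g. "2024-01-15"
        "hiringOrganization": {"@type": "Organization", "name": company},
        "jobLocation": {
            "@type": "Place",
            "address": {
                "@type": "PostalAddress",
                "addressLocality": city,
                "addressRegion": region,
            },
        },
    }
    # Escape "</" so text in the description cannot close the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Calling it with, say, a hypothetical "Line Cook" posting in Newark, NJ gives a block that can be dropped into the job detail template.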
I'm inclined to say don't bother with reshuffling duplicate content and adding bullet points to it - Google is smart enough to recognise that this copy is the same, and will not give you any points - perhaps the opposite - for trying to disguise this.
Filtering & Blank Results Pages
Could this not be done with javascript / ajax, so that Google never finds an empty listing?
'About the Company' Duplicate Content
You could load this content from an ajax template, either as the page loads, or in response to a user action (eg. click on a button 'show company details'). I have solved this exact problem like this in the past - loading a tour category description that appears on a great many tour detail pages.
Perhaps you can do as I'm suggesting above for the job description duplication - where possible, and as long as it's done in a way that does not come across as cloaking. It's good that you have a unique paragraph above the duplicate description. Try not to load up the duplicate description by default. I'm not sure about your source order or site / template structure, so it's difficult to get too detailed here, and I don't want to risk suggesting something that could be interpreted as a violation of Google's guidelines.
Pagination
You will want to, where possible, specify a view-all page for Google - this is suggested by them as a best practice, and in my experience, Googlebot loves to find chunky listing content, PROVIDED that it loads quickly enough not to hamper the user experience.
You can make sure of this by lazyloading images and other media. Be sure to specify the correct image src attributes (not spacer.gif for instance) inside of noscript tags to make sure that image content is still indexed.
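A rough Python sketch of the noscript idea, for a server-side template helper. The `data-src` attribute is an assumption here: it is a common convention that a lazy-loading script of your choice would read and swap into `src`, not something the browser handles by itself. The helper name `lazy_image` and the default placeholder are also made up.

```python
from html import escape


def lazy_image(src: str, alt: str, placeholder: str = "spacer.gif") -> str:
    """Render a lazy-loaded image plus a noscript fallback with the real src."""
    src_attr = escape(src, quote=True)
    alt_attr = escape(alt, quote=True)
    placeholder_attr = escape(placeholder, quote=True)
    return (
        # A client-side script is assumed to copy data-src into src later.
        f'<img src="{placeholder_attr}" data-src="{src_attr}" alt="{alt_attr}">'
        # The fallback carries the real image URL for clients without javascript.
        f'<noscript><img src="{src_attr}" alt="{alt_attr}"></noscript>'
    )
```

The point of the fallback is that the real image URL is still present in the markup even when the placeholder is what loads first.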
You could also load up the markup for all items in the listing, and then use javascript to chunk the content into 'pages', or load it asynchronously where javascript is available. If there is no javascript, then load all content. By using javascript pagination, you basically avert the need for a separate view-all page, meaning you only have one template to maintain and optimise.