Metadata and duplicate content issues
-
Hi there: I'm seeing a steady decline in organic traffic, but at the same time an increase in pageviews and direct traffic. My site has about 3,000 crawl errors!! The errors are duplicate content, missing description tags, and descriptions that are too long. Most of these issues are related to events that are being imported from Google calendars via iCal and the pages created from these events. Should we block calendar events from being crawled by using the disallow directive in the robots.txt file? Here's the site: https://www.landmarkschool.org/
-
Yes, of course you can keep running the calendar.
But keep in mind that some pages may still appear in search results even after you have deleted those URLs.
You can watch this video, where Matt Cutts explains why a page that is disallowed in robots.txt may still appear in Google's search results. In that case, just to be sure, you can implement a 301 redirect.
This will be your second line of defense: just redirect all of those URLs to your home page.
There are many options for setting up a redirect. In my case I'm a WordPress user, so with a simple plugin I can solve the problem in five minutes. I've checked your website, but I have no idea which CMS you are using.
Anyway, you can use this app, 301 Redirect Code Generator, which has many options available:
PHP, JS, ASP, ASP.NET, and of course Apache (.htaccess).
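If your server happens to run Apache, a minimal .htaccess sketch could look like the one below. This is only an illustration: the /events/ path is a placeholder, so swap it for whatever paths your imported calendar pages actually use.
<------------------------------START HERE------------------------------>
# Hypothetical .htaccess sketch - assumes Apache with mod_rewrite enabled
# and that the imported calendar pages live under /events/ (placeholder path)
RewriteEngine On

# 301-redirect every URL that starts with /events/ to the home page
RewriteRule ^events/ https://www.landmarkschool.org/ [R=301,L]
<------------------------------END HERE------------------------------>
Whichever method you use, the idea is the same: any calendar URL that is already indexed gets permanently redirected instead of returning a thin or duplicate page.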
Now is the right moment to use the list I mentioned in my first answer (step 2 - create a list of all the URLs that you want to block). **So let's talk about your second question.**
Of course it will hurt your rankings. If you have 3,020 pages indexed on Google but just 20 of those pages are useful to users, you have a big problem. A website should address any question or concern that a current or potential customer or client may have. If it doesn't, the website is essentially useless.
With a simple division, 20 / 3,020 ≈ 0.0066, so less than 1% of your site is useful. I'm pretty sure your rankings have been affected.
Don't forget to mark my answer as a "Good Answer"; that will make me happy. Good luck.
-
Hi Roman: Thanks so much for your prompt reply. I agree that using robots.txt is the way to go. I do not want to disable the Google Calendar sync (we're a school and need our events to feed from several Google calendars). I want to confirm that the robots.txt option will still work if the calendars are still syncing with the site.
One more question--do you think that all these errors are causing the dip in organic traffic?
-
SOLUTION
1 - You have to disable the Google Calendar sync with your website
2 - Create a list of all the URLs that you want to block
3 - At this point you have multiple options for blocking the URLs you want to exclude from search engines. So first, let's define your problem.
By blocking a URL on your site, you can stop Google from indexing that web page for display in Google Search results. In other words, people looking through Google Search results can't see or navigate to a blocked URL or its content.
If you have pages or other content that you don't want to appear in Google Search results, you can do this using a number of options:
- robots.txt files (Best Option)
- meta tags
- password-protection of web server files
In your case, option 2 would take a lot of time. Why? Because you would have to manually add the "noindex" meta tag to each page, one by one, which doesn't make sense. Option 3 requires some server configuration and, at least for me, is a bit complex and time consuming; I would have to research it on Google, watch some videos on YouTube, and see what happens.
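For reference, the "noindex" meta tag from option 2 is just a single line placed in the head of every page you want kept out of the index. A generic illustration (not taken from your site):
<------------------------------START HERE------------------------------>
<!-- Added to the <head> of each individual event page (illustration only) -->
<meta name="robots" content="noindex">
<------------------------------END HERE------------------------------>
It works, but applying it by hand across thousands of imported event pages is not practical.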
So the first option is the winner for me. Let's see an example of how your robots.txt should look.
The following example robots.txt file specifies that no robots should visit any URL starting with "/events/january/" or "/tmp/", or the page /calendar.html:
<------------------------------START HERE------------------------------>
# robots.txt for https://www.landmarkschool.org/
User-agent: *
Disallow: /events/january/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
Disallow: /calendar.html
<------------------------------END HERE------------------------------>
FOR MORE INFO SEE THE VIDEO > https://www.youtube.com/watch?v=40hlRN0paks
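As a last sanity check after uploading the file (assuming it is served from your site root, as robots.txt must be), you can simply fetch it from the command line and confirm the new Disallow rules are live:
<------------------------------START HERE------------------------------>
# Confirm the robots.txt is reachable and contains the new Disallow rules
curl https://www.landmarkschool.org/robots.txt
<------------------------------END HERE------------------------------>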