Metadata and duplicate content issues
-
Hi there: I'm seeing a steady decline in organic traffic, but at the same time an increase in pageviews and direct traffic. My site has about 3,000 crawl errors! The errors are duplicate content, missing description tags, and descriptions that are too long. Most of these issues are related to events that are imported from Google Calendars via iCal and the pages created from those events. Should we block calendar events from being crawled by using the disallow directive in the robots.txt file? Here's the site: https://www.landmarkschool.org/
-
Yes, of course you can keep running the calendar.
But keep in mind that some pages will still appear in search results even after you have removed those URLs.
You can watch this video:
Matt Cutts explains why a page that is disallowed in robots.txt may still appear in Google's search results. In that case, just to be safe, you can implement a 301 redirect.
This will be your second line of defense: redirect all of those URLs to your home page.
There are many options for setting up a redirect. In my case I'm a WordPress user, so with a simple plugin I can solve the problem in five minutes. I've checked your website, but I can't tell which CMS you are using.
Either way, you can use the 301 Redirect Code Generator app, which offers many options:
PHP, JS, ASP, ASP.NET, and of course Apache (.htaccess). Now is the right moment to use the list that I mentioned in my first answer.
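Since .htaccess is one of the generator's outputs, here is a minimal sketch of what the Apache version could look like. The paths are hypothetical placeholders, not your actual URLs; substitute the entries from your list:

```apache
# Hypothetical example paths -- replace with the event URLs from your list.
# Both rules send visitors (and crawlers) to the home page with a
# permanent (301) redirect.
Redirect 301 /calendar.html https://www.landmarkschool.org/
RedirectMatch 301 ^/events/.* https://www.landmarkschool.org/
```

`Redirect` matches a single path prefix, while `RedirectMatch` takes a regular expression, which is handy when the calendar generated many URLs under one directory.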
(Step 2: create a list of all the URLs that you want to disable.) **So let's talk about your second question.**
Of course it will hurt your ranking. If you have 3,020 pages indexed on Google but only 20 of those pages are useful to users, you have a big problem. A website should address any question or concern that a current or potential customer or client may have; if it doesn't, the website is essentially useless.
A simple division, 20 / 3,020 ≈ 0.0066, shows that less than 1% of your site is useful. So I'm pretty sure your ranking has been affected.
Don't forget to mark my answer as a "GOOD ANSWER";
that will make me happy. Good luck!
-
Hi Roman: Thanks so much for your prompt reply. I agree that using robots.txt is the way to go. I do not want to disable the google calendar sync (we're a school and need our events to feed from several google calendars). I want to confirm that the robots.txt option will still work if the calendars are still syncing with the site.
One more question--do you think that all these errors are causing the dip in organic traffic?
-
SOLUTION
1 - You have to disable the Google Calendar sync with your website.
2 - Create a list of all the URLs that you want to disable.
3 - At this point you have multiple options to block the URLs that you want to exclude from search engines. So first, let's define your problem.
By blocking a URL on your site, you can stop Google from indexing that web page for display in Google Search results. In other words, people looking through Google Search results can't see or navigate to a blocked URL or its content.
If you have pages or other content that you don't want to appear in Google Search results, you can do this using a number of options:
- robots.txt files (Best Option)
- meta tags
- password-protection of web server files
In your case, option 2 would take a lot of time. Why? Because you would have to manually add the "noindex" meta tag to each page, one by one, which makes no sense. Option 3 requires some server configuration, which is a little complex and time-consuming, at least for me; I would have to research it on Google, watch some videos on YouTube, and see what happens.
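For reference, option 2 would mean putting something like this in the `<head>` of each event page. The tag itself is standard; whether your CMS lets you inject it per page is the part that varies:

```html
<!-- Tells all crawlers not to index this page. Note: unlike robots.txt,
     the page must remain crawlable for the tag to be seen at all. -->
<meta name="robots" content="noindex">
```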
So the first option is the winner for me. Let's see an example of what your robots.txt should look like.
- The following example "/robots.txt" file specifies that no robots should visit any URL starting with "/events/january/" or "/tmp/", or the URL "/calendar.html":
<------------------------------START HERE------------------------------>
# robots.txt for https://www.landmarkschool.org/
User-agent: *
Disallow: /events/january/ # this is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
Disallow: /calendar.html
<------------------------------END HERE------------------------------>
FOR MORE INFO SEE THE VIDEO > https://www.youtube.com/watch?v=40hlRN0paks
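If you want to sanity-check the rules before deploying, Python's standard-library robots.txt parser can tell you whether a given URL would be blocked. This sketch mirrors the example rules above; the specific URLs tested are just illustrations:

```python
import urllib.robotparser

# The example rules from above, parsed directly (no network fetch needed).
rules = """User-agent: *
Disallow: /events/january/
Disallow: /tmp/
Disallow: /calendar.html
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Paths under a disallowed prefix are blocked for all user agents...
print(rp.can_fetch("*", "https://www.landmarkschool.org/events/january/open-house"))  # False
print(rp.can_fetch("*", "https://www.landmarkschool.org/calendar.html"))              # False
# ...while the rest of the site stays crawlable.
print(rp.can_fetch("*", "https://www.landmarkschool.org/admissions"))                 # True
```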