Canonical URLs and screen scraping
-
So a little question here. I was looking into a module to help implement canonical URLs on a certain CMS and I came a cross a snarky comment about relative vs. absolute URLs being used. This person was insistent that relative URLs are fine and absolute URLs are only for people who don't know what they are doing.
My question is, if using relative URLs, doesn't it make it easier to have your content scraped? After all, if you do get your content scraped at least it would point back to your site if using absolute URLs, right? Am I missing something or is my thinking OK on this?
Any feedback is much appreciated!
-
Thanks for your reply, Alan. I also considered a screen scraper removing the canonical tag, but to me screen scraping seemed lazy in the first place and so maybe they wouldn't bother in most cases. I guess that a best practice with canonicals is really situation dependent.
-
Thanks, Robert. Your rational for using relative links make sense. I appreciate you helping me sort through the noise on this issue.
John
-
People don’t abuse people when you have facts on their side, reminds me of "you don’t believe in global warming, because your un-educated" argument.
I have seen just in the last few weeks where using absolute url has got me a link. I wrote a youmoz article with a link to my website, it has been copied and has the link in it. Of cause being on SEOMoz, I have to use a absolute url back to myself
I don’t usually use absolute links on my own site, I think search engines almost always know who copied who.
I agree with rob, but I will add, a good screen scraper will remove a canonical tag, but removing absolute links is not so easy, as you then have broken links, also I believe if you have image in the article linking back to you, search engines will know who the real owner is, same with css, js and a number of other refs. Screen scrapers rarely get credit for these reasons as well as the fact that if your site has a lot of duplicate, then it is obvious that you are the one coping It’s either the one site is copied from many locations or many locations have copied from the one site. -
John
You can use either and the web is full of those who go back and forth on this issue. My guess is that any really good scraper software can likely deal with absolute urls today. The advantage that we like with relative is all about page load speed - the file size is smaller with relative urls.
So, you will get arguments both ways. If scraping is a huge issue for you, maybe you go with absolute. We know people will scrape content and we continue with relative for the above reason and because it is easier to make certain changes/linking/redirects within a CMS.
Oh as to people who use absolutes not knowing what they are doing....that is bunk. They have other priorities, maybe.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Same URL for languages sub-directories
Hi All, I have a main domain and 9 different subdirectories for languages, example: www.example.com/page.html www.example.com/uk/page-uk.html www.example.com/es/page-es.html we are implementing hreflang tags for the languages, but we are thinking to get rid of the dashes on the languages URL: -uk or -es, so it will be: www.example.com/page.html www.example.com/uk/page.html www.example.com/es/page.hrml would this be a problem? to have same page names even if they are in different subdirectories? would we need to add canonical tags, at lease for the main domain URLs? www.kornferry.com/page.html Thank you, Rachel
Technical SEO | | RaquelSaiz0 -
Question on canonicals
hi community let's say i have to 2 e-commerce sites selling the same English books in different currencies - one of the site serves the UK market ( users can purchase in sterling) while another one European markets ( user can purchase in euro). Sites are identical. SEO wise, while the "European" site homepage has a good ranking across major search engines in europe, product pages do not rank very well at all. Since site is a .com too it s hard to push it in local search engines. I would like then to push one of the sites across all search engines,tackling duplicate content etc.Geotargeting would make the rest. I would like to add canonicals tag pointing at the UK version across all EU site product pages, while leaving the EU homepage rank. I have 2 doubts though: is it ok to have canonical tags pointing at an external site. is it ok to have part of a site with canonical tags, while other parts are left ranking?
Technical SEO | | Mrlocicero0 -
URL Format
Often we have web platforms that have a default URL structure that looks something like this www.widgetcompany.co.uk/widget-gallery/coloured-widgets/red-widgets This format is quite well structured but would it just be more effective to be www.widgetcompany.co.uk/red-widgets? I realise that it may depend on a lot of factors but generally is it better to have the shorter URL if targeting the key phrase "red widgets" One thing, it certainly looks a bit keyword stuffy with all those "widgets"
Technical SEO | | vital_hike0 -
Duplicate blog URLs in Magenton
On one my sites Moz is picking up 4483 duplicate content pages. The majority of these are from our blog and video sections on our site. We're using a URL shortener and it appears that some of the pages are the full version of the URL then the shortened version. However if you go to the full version you get redirected to the shorter one. So I would assume that the Moz crawler should get the same redirect? We're also getting pagination being shown as duplicate pages, which I would half expect, but the URLs Magento is creating are truly bizarre: e.g http://www.xxx.com/uk/blog/cat/view/identifier/news/page/news/index.php/alarms-doorbells/?p=2 Alarms and doorbells is one of our product categories, which is displayed in the LHN on the blog page but has nothing to do with the blog itself. On another site on the same Magento instance, with the same content (they're for two different regions) we're show as having 248 duplicate pages, again in the video and news section, but this is a completely different scale of issue. Has anyone else encountered issues like these? I'm probably going to put a noindex in place on these two sections until we can get a solution in place as we're completely unranked in google on this site. Thanks
Technical SEO | | ahyde0 -
SEO URLs?
What are the best practices for generating SEO-friendly headlines? dashes between words? underscores between words? etc. Looking for a programatically generated solution that's using editor-written headlines to produce an SEO-friendly URL Thanks.
Technical SEO | | ShaneHolladay0 -
Canonical: Is this a problem?
Hi!!
Technical SEO | | petrospan
I am running a small wordpress website and i have a question because i am a litle confusic about Rel Canonical notices in the crawl diagnostics! I have the seo by yoast and i have fix all the canonical url for my page, but i take notices. I must worried about it or is something that inform me that everyting is ok? rel.jpg rel.jpg0 -
Changing all urls
A client of mine has a wordpress website that is installed in a directory, called "site". So when you go to www.domain.com you are redirected to www.domain.com/site. We all know how bad it is to have a redirect fron your subdomain to another page. In this case I measured a loss of 5 points of page authority. The question is: what is the best practice to remove the "site" from the address and changing all the urls? Should I use the webmaster tool to tell to Google that the site is moving? It's not 100% true, cause the site is just moving one level up. Should I install a copy of the website under www.domain.com and just redirect 301 every old page to its new url? This way I think the site would be deindexet for 2/3 months. Any suggestions or tips welcome! Thanks DoMiSol
Technical SEO | | DoMiSoL0 -
Canonical URL Issue
Hi Everyone, I'm fairly new here and I've been browsing around for a good answer for an issue that is driving me nuts here. I tried to put the canonical url for my website and on the first 5 or 6 pages I added the following script SEOMoz reported that there was a problem with it. I spoke to another friend and he said that it looks like it's right and there is nothing wrong but still I get the same error. For the URL http://www.cacaniqueis.com.br/video-caca-niqueis.html I used the following: <link rel="<a class="attribute-value">canonical</a>" href="http://www.cacaniqueis.com.br/video-caca-niqueis.html" /> Is there anything wrong with it? Many thanks in advance for the attention to my question.. 🙂 Alex
Technical SEO | | influxmedia0