Canonical URLs and screen scraping
-
So, a little question here. I was looking into a module to help implement canonical URLs on a certain CMS and I came across a snarky comment about relative vs. absolute URLs. This person was insistent that relative URLs are fine and absolute URLs are only for people who don't know what they are doing.
My question is: doesn't using relative URLs make it easier to have your content scraped? After all, if your content does get scraped, at least it would point back to your site if you used absolute URLs, right? Am I missing something, or is my thinking OK on this?
Any feedback is much appreciated!
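For anyone following along, here is a minimal sketch of the behaviour the question is about, assuming the copied page has no `<base>` tag and is served from a different domain. The domains and paths are hypothetical and only there for illustration; `urljoin` from Python's standard library resolves an href roughly the way a browser or crawler would.

```python
from urllib.parse import urljoin

# Hypothetical URLs, used only for illustration.
ORIGINAL_PAGE = "https://www.example.com/blog/canonical-urls/"
SCRAPER_PAGE = "https://scraper.example.net/stolen/canonical-urls/"

relative_href = "/services/seo-audit/"
absolute_href = "https://www.example.com/services/seo-audit/"


def resolve(page_url, href):
    """Resolve an href against the URL of the page that contains it."""
    return urljoin(page_url, href)


# On the original site, both forms end up at the same place.
on_original_rel = resolve(ORIGINAL_PAGE, relative_href)

# On a verbatim copy, the relative link now resolves to the scraper's domain,
# while the absolute link still points back to the original site.
on_scraper_rel = resolve(SCRAPER_PAGE, relative_href)
on_scraper_abs = resolve(SCRAPER_PAGE, absolute_href)

print("relative on original:", on_original_rel)
print("relative on copy:    ", on_scraper_rel)
print("absolute on copy:    ", on_scraper_abs)
```

This only covers an unmodified copy. A scraper that rewrites links before republishing would change the picture.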
-
Thanks for your reply, Alan. I also considered that a screen scraper might remove the canonical tag, but to me screen scraping seemed lazy in the first place, so maybe they wouldn't bother in most cases. I guess that best practice with canonicals is really situation-dependent.
-
Thanks, Robert. Your rationale for using relative links makes sense. I appreciate you helping me sort through the noise on this issue.
John
-
People don’t abuse others when they have the facts on their side. It reminds me of the "you don’t believe in global warming because you’re uneducated" argument.
Just in the last few weeks I have seen a case where using an absolute URL got me a link. I wrote a YouMoz article with a link to my website; it has been copied, and the copy still has the link in it. Of course, since the article is on SEOmoz, I had to use an absolute URL to link back to my own site.
I don’t usually use absolute links on my own site. I think search engines almost always know who copied whom.
I agree with Rob, but I will add that a good screen scraper will remove a canonical tag, whereas removing absolute links is not so easy, as you then have broken links. I also believe that if you have an image in the article linking back to you, search engines will know who the real owner is; the same goes for CSS, JS and a number of other references. Screen scrapers rarely get credit for these reasons, and also because if your site has a lot of duplicate content, it is obvious that you are the one copying. Either the one site has copied from many locations, or many locations have copied from the one site.
-
John
You can use either, and the web is full of people who go back and forth on this issue. My guess is that any really good scraper software can likely deal with absolute URLs today. The advantage we like with relative URLs is page load speed: the file size is smaller.
So, you will get arguments both ways. If scraping is a huge issue for you, maybe you go with absolute. We know people will scrape content, and we continue with relative for the above reason and because it is easier to make certain changes, links and redirects within a CMS.
Oh, as to people who use absolutes not knowing what they are doing... that is bunk. Maybe they just have other priorities.
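To put the file-size point above in rough terms, here is a small sketch that compares the markup size of the same set of links written as relative and as absolute URLs. The origin, the paths and the count of 200 links are made-up values for illustration, and the figures are for uncompressed HTML; with gzip or similar compression on the server, the repeated origin string would likely cost less than this suggests.

```python
# Hypothetical origin and link paths, for illustration only.
ORIGIN = "https://www.example.com"
paths = [f"/category-{i}/product-{i}.html" for i in range(200)]


def anchors_size(hrefs):
    """Return the UTF-8 byte size of a block of simple anchor tags."""
    html = "".join(f'<a href="{href}">link</a>\n' for href in hrefs)
    return len(html.encode("utf-8"))


relative_bytes = anchors_size(paths)
absolute_bytes = anchors_size(ORIGIN + path for path in paths)

# The difference is the origin string repeated once per link.
saved = absolute_bytes - relative_bytes

print("relative markup:", relative_bytes, "bytes")
print("absolute markup:", absolute_bytes, "bytes")
print("saved by using relative URLs:", saved, "bytes")
```

Whether that saving matters will depend on how many internal links a page carries and how the server compresses responses.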