Canonical URLs and screen scraping
-
So a little question here. I was looking into a module to help implement canonical URLs on a certain CMS and I came across a snarky comment about relative vs. absolute URLs. This person was insistent that relative URLs are fine and that absolute URLs are only for people who don't know what they are doing.
My question is, if using relative URLs, doesn't it make it easier to have your content scraped? After all, if you do get your content scraped at least it would point back to your site if using absolute URLs, right? Am I missing something or is my thinking OK on this?
Any feedback is much appreciated!
-
Thanks for your reply, Alan. I also considered a screen scraper removing the canonical tag, but to me screen scraping seemed lazy in the first place, so maybe they wouldn't bother in most cases. I guess the best practice with canonicals is really situation-dependent.
-
Thanks, Robert. Your rationale for using relative links makes sense. I appreciate you helping me sort through the noise on this issue.
John
-
People don’t abuse others when they have the facts on their side; it reminds me of the "you don’t believe in global warming because you're uneducated" argument.
Just in the last few weeks I have seen a case where using an absolute URL got me a link. I wrote a YouMoz article with a link to my website; the article has been copied, and the copy still has the link in it. Of course, being on SEOMoz, I had to use an absolute URL back to my own site.
I don’t usually use absolute links on my own site, because I think search engines almost always know who copied whom.
I agree with Rob, but I will add this: a good screen scraper will remove a canonical tag, but removing absolute links is not so easy, because the scraper is then left with broken links. I also believe that if you have images in the article linking back to you, search engines will know who the real owner is; the same goes for CSS, JS, and a number of other references. Screen scrapers rarely get credit for these reasons, and also because if your site has a lot of duplicate content, it is obvious that you are the one copying. Either the one site has copied from many locations, or many locations have copied from the one site.
-
John
You can use either, and the web is full of people who go back and forth on this issue. My guess is that any really good scraper software can deal with absolute URLs today. The advantage we like with relative URLs is page load speed: the file size is smaller with relative URLs.
So, you will get arguments both ways. If scraping is a huge issue for you, maybe you go with absolute. We know people will scrape content and we continue with relative for the above reason and because it is easier to make certain changes/linking/redirects within a CMS.
Oh, and as to people who use absolute URLs not knowing what they are doing: that is bunk. Maybe they just have other priorities.
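For anyone who wants to see the difference being discussed, here is a minimal Python sketch of how a browser or crawler would resolve the same link on the original page and on a scraped copy. The hostnames and paths are made up for illustration, and `resolve` is just a hypothetical helper around the standard library's `urljoin`; it says nothing about how any particular scraper or search engine actually behaves.

```python
from urllib.parse import urljoin

# Hypothetical hosts, used only for illustration.
ORIGINAL_PAGE = "https://www.example.com/blog/canonical-urls/"
SCRAPER_PAGE = "https://scraper.example.net/copied/canonical-urls/"

# The same internal link written two ways.
RELATIVE_HREF = "/services/seo-audit/"
ABSOLUTE_HREF = "https://www.example.com/services/seo-audit/"


def resolve(page_url: str, href: str) -> str:
    """Resolve href against the URL of the page that contains it."""
    return urljoin(page_url, href)


# On the original site, both forms end up at the same place.
print(resolve(ORIGINAL_PAGE, RELATIVE_HREF))
print(resolve(ORIGINAL_PAGE, ABSOLUTE_HREF))

# On a verbatim copy, the relative link now resolves to the scraper's host,
# while the absolute link still points back to the original site.
print(resolve(SCRAPER_PAGE, RELATIVE_HREF))
print(resolve(SCRAPER_PAGE, ABSOLUTE_HREF))
```

So a copy that keeps relative links untouched ends up with links that resolve to the copying site (and break if those paths do not exist there), while absolute links keep pointing home unless the scraper rewrites them.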