Brackets vs Encoded URLs: The "Same" in Google's eyes, or dup content?
-
Hello,
This is the first time I've asked a question here, but I would really appreciate the advice of the community - thank you, thank you! Scenario: Internal linking is pointing to two different versions of a URL, one with brackets [] and the other version with the brackets encoded as %5B%5D
Version 1: http://www.site.com/test?hello**[]=all&howdy[]=all&ciao[]=all
Version 2: http://www.site.com/test?hello%5B%5D**=all&howdy**%5B%5D**=all&ciao**%5B%5D**=allQuestion: Will search engines view these as duplicate content? Technically there is a difference in characters, but it's only because one version encodes the brackets, and the other does not (See: http://www.w3schools.com/tags/ref_urlencode.asp)
We are asking the developer to encode ALL URLs because this seems cleaner but they are telling us that Google will see zero difference. We aren't sure if this is true, since engines can get so _hung up on even one single difference in character. _
We don't want to unnecessarily fracture the internal link structure of the site, so again - any feedback is welcome, thank you.
-
Thanks guys - yes, we're using canonical tags already to help resolve this, but I'd like even better if we didn't have to resort to this. It also makes me nervous that these characters are technically classified as "unsafe", but I haven't been able to find any official word from Google on whether or not they will index URLs with brackets or not. It's definitely not the web standard....
-
Hi,
I wouldn't worry to much on this issue, it's true that you don't want to depend on the level of the Googlebot to find out if this could be an issue but I think that the encoding of characters will make sure you'll be fine. As a suggestion I would say use canonical tags on of these pages to direct Google or other search engines to the right page. This makes sure you'll never get an issue with duplicate content. However I really doubt that this will turn into an issue.
-
Hi Mirabile,
This is a difficult one. My understanding would be to use the hexadecimal encoding of potentially unsafe characters (of which a square bracket would be) in a URL (i.e. %5b instead of [ ), but I think assuming the URLs are the same, then it makes no difference.
But that said, whilst Google might read the URLs as the same, that's not to say another search engine will do that as well. And then, what about how a browser might interpret a URL encoded differently but being effectively the same?
Probably, the main danger is that the search engine or the browser won't be able to follow the link with unsafe characters in at all.
I'm not sure that is the full answer you were looking for, but maybe someone with more expertise will be able to shed more light on this for you.
I hope my answer helps at least in part.
Peter
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
When "pruning" old content, is it normal to see an drop in Domain Authority on Moz crawl report?
After reading several posts about the benefits of pruning old, irrelevant content, I went through a content audit exercise to kick off the year. The biggest category of changes so far has been to noindex + remove from sitemap a number of blog posts from 2015/2016 (which were very time-specific, i.e. software release details). I assigned many of the old posts a new canonical URL pointing to the parent category. I realize it'd be ideal to point to a more relevant/current blog post, but could this be where I've gone wrong? Another big change was to hide the old posts from the archive pages on the blog. Any advice/experience from anyone doing something similar much appreciated! Would be good to be reassured I'm on the right track and a slight drop is nothing to worry about. 🙂 If anyone is interested in having a look: https://vivaldi.com https://vivaldi.com/blog/snapshots [this is the category where changes have been made, primarily] https://vivaldi.com/blog/snapshots/keyboard-shortcut-editing/ [example of a pruned post]
Intermediate & Advanced SEO | | jonmc1 -
Why is rel="canonical" pointing at a URL with parameters bad?
Context Our website has a large number of crawl issues stemming from duplicate page content (source: Moz). According to an SEO firm which recently audited our website, some amount of these crawl issues are due to URL parameter usage. They have recommended that we "make sure every page has a Rel Canonical tag that points to the non-parameter version of that URL…parameters should never appear in Canonical tags." Here's an example URL where we have parameters in our canonical tag... http://www.chasing-fireflies.com/costumes-dress-up/womens-costumes/ rel="canonical" href="http://www.chasing-fireflies.com/costumes-dress-up/womens-costumes/?pageSize=0&pageSizeBottom=0" /> Our website runs on IBM WebSphere v 7. Questions Why it is important that the rel canonical tag points to a non-parameter URL? What is the extent of the negative impact from having rel canonicals pointing to URLs including parameters? Any advice for correcting this? Thanks for any help!
Intermediate & Advanced SEO | | Solid_Gold1 -
How to get a site out of Google's Sandbox
Hi I am working on a website that is ranking well in bing for the domain name / exact url search but appears no where in Google or Yahoo. I have done the site search in Google and it is indexed so I am presuming it is in the sandbox. The website was originally developed in India and I do not know whether it had some history of bad backlinks. The website itself is well optimised and I have checked all pages in Moz - getting a grade A. Webmaster Tools is not showing any manual actions - I was wondering what I could do next?
Intermediate & Advanced SEO | | AllieMc0 -
How can I get a list of every url of a site in Google's index?
I work on a site that has almost 20,000 urls in its site map. Google WMT claims 28,000 indexed and a search on Google shows 33,000. I'd like to find what the difference is. Is there a way to get an excel sheet with every url Google has indexed for a site? Thanks... Mike
Intermediate & Advanced SEO | | 945010 -
Should I "NoIndex" Pages with Almost no Unique Content
I have a real estate site with MLS data (real estate listings shared across the Internet by Realtors, which means data exist across the Internet already). Important pages are the "MLS result pages" - the pages showing thumbnail pictures of all properties for sale in a given region or neighborhood. 1 MLS result page may be for a region and another for a neighborhood within the region:
Intermediate & Advanced SEO | | khi5
example.com/region-name and example.com/region-name/neighborhood-name
So all data on the neighborhood page will be 100% data from the region URL. Question: would it make sense to "NoIndex" such neighborhood page, since it would reduce nr of non-unique pages on my site and also reduce amount of data which could be seen as duplicate data? Will my region page have a good chance of ranking better if I "NoIndex" the neighborhood page? OR, is Google so advanced they know Realtors share MLS data and worst case simple give such pages very low value, but will NOT impact ranking of other pages on a website? I am aware I can work on making these MLS result pages more unique etc, but that isn't what my above question is about. thank you.0 -
Why are our sites top landing pages URL's that no longer exist and retrun 404 errors?
Digging through analytics today an noticed that our sites top landing pages are for pages that were part of the old www.towelsrus.co.uk website taken down almost 12 months ago. All these pages had the 301 re-directs which were removed a few months back but still have not dropped out of Googles crawl error logs. I can't understand why this is happening but almost certainly the bounce rate on these pages (100%) mean we are loosing potential conversions. How can I identify what keywords and links people are using to land on these pages?
Intermediate & Advanced SEO | | Towelsrus0 -
Wordpress.com content feeding into site's subdomain, who gets SEO credit?
I have a client who had created a Wordpress.com (not Wordpress.org) blog, and feeds blog posts into a subdomain blog.client-site.com. My understanding was that in terms of SEO, Wordpress.com would still get the credit for these posts, and not the client, but I'm seeing conflicting information. All of the posts are set with permalinks on the client's site, such as blog.client-site.com/name-of-post, and when I run a Google site:search query, all of those individual posts appear in the Google search listings for the client's domain. Also, I've run a marketing.grader.com report, and these same results are seen. Looking at the source code on the page, however, I see this information which leads me to believe the content is being credited to, and fed in from, Wordpress.com ('client name' altered for privacy): href="http://client-name.files.wordpress.com/2012/08/could_you_survive_a_computer_disaster.jpeg">class="alignleft size-thumbnail wp-image-2050" title="Could_you_survive_a_computer_disaster" src="http://client-name.files.wordpress.com/2012/08/could_you_survive_a_computer_disaster.jpeg?w=150&h=143" I'm looking to provide a recommendation to the client on whether they are ok to continue moving forward with this current setup, or whether we should port the blog posts over to a subfolder on their primary domain www.client-site.com/blog and use Wordpress.org functionality, for proper SEO. Any advice?? Thank you!
Intermediate & Advanced SEO | | grapevinemktg0 -
Use of rel="alternate" hreflang="x"
Google states that use of rel="alternate" hreflang="x" is recommended when: You translate only the template of your page, such as the navigation and footer, and keep the main content in a single language. This is common on pages that feature user-generated content, like a forum post. Your pages have broadly similar content within a single language, but the content has small regional variations. For example, you might have English-language content targeted at readers in the US, GB, and Ireland. Your site content is fully translated. For example, you have both German and English versions of each page. Does this mean that if I write new content in different language for a website hosted on my sub-domain, I should not use this tag? Regards, Shailendra Sial
Intermediate & Advanced SEO | | IM_Learner0