Long URLs due to foreign characters
-
I have a site which provides forum sections for various languages. When foreign characters are used in the post title, each letter is replaced by a three-character escape sequence such as %93. This conversion makes the URLs long.
The site's software automatically uses the thread's title in the URL. It is never a problem except in these instances.
Any suggestions on how to handle this issue?
-
Thank you John.
The solution you offered works if a site is geared toward one particular language. The site I am working with has language-dedicated forums covering more than a dozen languages. The end solution will need to handle all of them.
I will speak to the forum software's developers about your idea, and hopefully we can build something from your suggestion. Thanks for taking the time to share your experience.
-
You should have a meta tag for the page language (adjust language code as needed):
As far as the URLs go, many sites convert these characters to unescaped ASCII variants on save. Magento, for example, treats e, é, and ê as e in the URL. Check out Lemonde.fr, a French news source; they simply strip the accents as well.
To adjust for the accents, you would need to transliterate them. First, find the function that generates the URL. Next, if your system has the iconv() function:
$new_url = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $old_url);
If not... then you could go this sort of route:
$table = array(
'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z',
'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'Ae',
'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
'Õ'=>'O', 'Ö'=>'Oe', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
'Û'=>'U', 'Ü'=>'Ue', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'ss',
'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'ae',
'å'=>'a', 'æ'=>'ae', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
'ô'=>'o', 'õ'=>'o', 'ö'=>'oe', 'ø'=>'o', 'ù'=>'u',
'ú'=>'u', 'û'=>'u', 'ü'=>'ue', 'ý'=>'y',
'þ'=>'b', 'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r', 'Ā'=>'A',
'ā'=>'a', 'Ē'=>'E', 'ē'=>'e', 'Ī'=>'I', 'ī'=>'i',
'Ō'=>'O', 'ō'=>'o', 'Ū'=>'U', 'ū'=>'u', 'œ'=>'oe',
'ĳ'=>'ij'
);
$new_url = strtr($old_url, $table);
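The table approach above could be wrapped into a single helper along these lines. This is only a sketch: the function name slugify_title(), the shortened example table, and the lowercase hyphen-separated output are assumptions for illustration, not XenForo's actual URL code.

```php
<?php
// Hypothetical helper (name and output format are assumptions, not a XenForo API):
// turn a thread title into an ASCII-only URL slug.
function slugify_title(string $title, array $table): string
{
    // 1. Apply the explicit replacement table first (e.g. 'ü' => 'ue').
    $slug = strtr($title, $table);

    // 2. Collapse every run of bytes outside A-Z, a-z, 0-9 into one hyphen.
    //    Unmapped non-ASCII characters are dropped by this step.
    $slug = preg_replace('/[^A-Za-z0-9]+/', '-', $slug);

    // 3. Trim leading/trailing hyphens and lowercase the result.
    return strtolower(trim($slug, '-'));
}

// Shortened table for the example; in practice the full table would be used.
$table = array('é' => 'e', 'ü' => 'ue', 'ß' => 'ss');

echo slugify_title('Café in Zürich: große Auswahl', $table), "\n";
```

Note that a title written entirely in a script the table does not cover (Cyrillic or Korean, for example) comes out as an empty string with this sketch, so a fallback such as the thread ID would still be needed for those forums.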
I'm not sure about Korean handling - perhaps someone else knows how these are being handled?
-John
-
XenForo is the forum software in use.
I was really wondering what type of replacement process would be used?
When Google crawls a Russian or Korean site, do they convert the characters? If not, is there a way of telling Google "hey, this title is from the Russian forums, so please use the Russian alphabet"?
If they do still convert the characters, how do other countries handle this change? The title length would be reduced by two-thirds.
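To see where the extra length comes from, here is a small PHP sketch. It assumes the titles are stored as UTF-8 (an assumption; the thread does not state the encoding) and uses made-up sample titles. Percent-encoding works on bytes, not letters, so every non-ASCII byte becomes a three-character %XX sequence.

```php
<?php
// Assumes this file is saved as UTF-8, so each literal below holds UTF-8 bytes.
// The sample titles are illustrative only.
$samples = array(
    'cafe'   => 'ASCII only',
    'café'   => 'one accented Latin letter (2 bytes)',
    'Привет' => 'Cyrillic (2 bytes per letter)',
    '한국어' => 'Korean (3 bytes per syllable)',
);

foreach ($samples as $title => $note) {
    // rawurlencode() turns each non-ASCII byte into a %XX sequence.
    $encoded = rawurlencode($title);
    printf("%s  (%d chars)  %s\n", $encoded, strlen($encoded), $note);
}
```

Under that assumption, a Cyrillic letter takes six characters in the URL and a Korean syllable takes nine, rather than three per letter.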
-
Hey Ryan-
What software are you using?
Depending on your coding experience, you may be able to set up replacements for the foreign characters and override the URL generating function.
Just let me know, I may be able to help you out.
-John