Long URLs due to foreign characters
-
I have a site which provides forum sections for various languages. When foreign characters are used in the post title, each letter is replaced by a three-character escape sequence such as %93. This conversion makes the URLs long.
The site's software automatically uses the thread's title in the URL. It is never a problem except in these instances.
Any suggestions on how to handle this issue?
-
Thank you, John.
The solution you offered works if a site is geared toward one particular language. The site I am working with has language-dedicated forums covering more than a dozen languages. The end solution will need to adjust for all of them.
I will speak to the forum software's developers about your idea, and hopefully we can build something off your suggestion. Thanks for taking the time to share your experience.
-
You should have a meta tag for the page language (adjust language code as needed):
As far as the URLs go... many sites convert these characters to unaccented ASCII variants on save. Magento, for example, treats e, é, and ê as e in the URL. Check out Lemonde.fr, a French news source. They are just stripping the accents as well.
To adjust for the accents, you would need to transliterate them. First, find the function that generates the URL. Next, if your system has the iconv() function:
$new_url = iconv('utf-8', 'us-ascii//TRANSLIT//IGNORE', $old_url);
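A minimal sketch of how that iconv() call might sit inside a slug helper. The function name title_to_slug() and the en_US.UTF-8 locale are assumptions for illustration, not part of any forum software; the result of //TRANSLIT depends on the iconv implementation and the active locale, so the output for accented input can vary between servers.

```php
<?php
// Hypothetical helper; title_to_slug() is an illustrative name only.
function title_to_slug($title)
{
    // //TRANSLIT output depends on the active locale, so request a UTF-8 one.
    // Assumes en_US.UTF-8 is installed; if not, setlocale() just returns false.
    setlocale(LC_CTYPE, 'en_US.UTF-8');

    $ascii = function_exists('iconv')
        ? @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $title)
        : false;

    if ($ascii === false) {
        // Fallback: drop every non-ASCII byte rather than fail outright.
        $ascii = preg_replace('/[^\x00-\x7F]/', '', $title);
    }

    // Lowercase, then collapse each run of non-alphanumerics into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($ascii));

    return trim($slug, '-');
}

echo title_to_slug("Élève à l'école"), "\n";
echo title_to_slug('Hello World'), "\n";
```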
If not... then you could go this sort of route:
$table = array(
'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z',
'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'Ae',
'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
'Õ'=>'O', 'Ö'=>'Oe', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
'Û'=>'U', 'Ü'=>'Ue', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'ss',
'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'ae',
'å'=>'a', 'æ'=>'ae', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
'ô'=>'o', 'õ'=>'o', 'ö'=>'oe', 'ø'=>'o', 'ù'=>'u',
'ú'=>'u', 'û'=>'u', 'ü'=>'ue', 'ý'=>'y',
'þ'=>'b', 'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r', 'Ā'=>'A',
'ā'=>'a', 'Ē'=>'E', 'ē'=>'e', 'Ī'=>'I', 'ī'=>'i',
'Ō'=>'O', 'ō'=>'o', 'Ū'=>'U', 'ū'=>'u', 'œ'=>'oe',
'ĳ'=>'ij'
);
// Replace each listed character with its ASCII equivalent.
$new_url = strtr($old_url, $table);
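For a site with many language forums, one table may not suit every language: the table above maps ü to ue, which follows German convention, while other languages may prefer a plain u. A rough sketch of one way to handle that, layering per-language overrides on a shared base table. The function name, the language codes, and the short tables here are hypothetical and only show the idea.

```php
<?php
// Hypothetical shared base table (only a few entries shown).
$base = array('é' => 'e', 'ñ' => 'n', 'ü' => 'u', 'ö' => 'o', 'ß' => 'ss');

// Hypothetical per-language overrides, keyed by forum language code.
$overrides = array(
    'de' => array('ü' => 'ue', 'ö' => 'oe'),
);

function transliterate_title($title, $lang, array $base, array $overrides)
{
    // Array union (+) keeps the left-hand entry when a key exists in both,
    // so language-specific rules win over the base table.
    $table = isset($overrides[$lang]) ? $overrides[$lang] + $base : $base;

    return strtr($title, $table);
}

echo transliterate_title('für', 'de', $base, $overrides), "\n"; // fuer
echo transliterate_title('für', 'tr', $base, $overrides), "\n"; // fur
```

The forum a thread belongs to would supply the language code, so the URL function does not have to guess the language from the title itself.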
I'm not sure about Korean handling - perhaps someone else knows how these are being handled?
-John
-
XenForo is the forum software in use.
I was really wondering what type of replacement process would be used?
When Google crawls a Russian or Korean site, do they convert the characters? If not, is there a way of telling Google "hey, this title is from the Russian forums so please use the Russian alphabet"?
If they do still convert the characters, how do other countries handle this change? The title length would be reduced by two-thirds.
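To put numbers on the length problem: percent-encoding produces one %XX triplet per byte, not per letter. Assuming the titles are stored as UTF-8, a Cyrillic letter takes two bytes (six characters in the URL) and a Korean syllable takes three bytes (nine characters). The sketch below measures this with rawurlencode(); the helper name url_cost() and the sample titles are made up for illustration.

```php
<?php
// Hypothetical helper: compares a title's character count with the
// length of its percent-encoded form.
function url_cost($title)
{
    $chars   = preg_match_all('/./su', $title); // number of UTF-8 characters
    $encoded = rawurlencode($title);            // form that appears in the URL

    return array($chars, strlen($encoded), $encoded);
}

foreach (array('cafe', 'café', 'привет', '안녕') as $title) {
    list($chars, $length, $encoded) = url_cost($title);
    echo "$title: $chars characters, $length in the URL ($encoded)\n";
}
```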
-
Hey Ryan-
What software are you using?
Depending on your coding experience, you may be able to set up replacements for the foreign characters and override the URL generating function.
Just let me know, I may be able to help you out.
-John