Internationalised URLs
One of the web projects Rob and I are working on at the moment has some internationalisation requirements that are pretty key to its success. The standard user-application interactions aren’t that problematic: there are some things to think about encoding- and storage-wise, but it’s a well-understood area.
The tricky bit is that URLs are ASCII only. You can percent-encode non-ASCII characters and handle things in the application so that it looks and acts as if it copes with different character sets, but this only really works if you think of a URL as a pure reference that carries no information itself. For web 2.0 type applications (and when using REST), that doesn’t hold, because the URL itself carries information. If you want a piece of information referenced by a URL like http://mysite/user/page1, making that URL make sense in languages that don’t use ASCII is hard.
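To make the encoding point concrete, here is a minimal sketch in Python using the standard library's urllib.parse. The path "/user/página1" is a made-up example for illustration, not a URL from our project. It shows that the encoded form round-trips cleanly but is no longer readable as text.

```python
from urllib.parse import quote, unquote

# Hypothetical path "/user/página1"; the á is written as an escape so the
# example does not depend on how this page itself is encoded.
path = "/user/p\u00e1gina1"

# quote() percent-encodes the UTF-8 bytes of every non-ASCII character
# and leaves "/" alone by default.
encoded = quote(path)

# unquote() reverses the process, so the application can recover the text.
decoded = unquote(encoded)

print(encoded)          # /user/p%C3%A1gina1
print(decoded == path)  # True
```

The application gets its original string back, but what the user sees in the address bar or in a pasted link may well be the %C3%A1 form, which is exactly the problem when the URL is meant to carry meaning.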

June 16th, 2006 at 9:26 am
Tim Bray recently linked to a blog entry by James Holderness at his site: http://www.詹姆斯.com. It makes an interesting read on the troubles of using & and < in RSS feed titles (such as AT&T).
More relevant to this discussion, however, is James’ site URL: http://www.詹姆斯.com or http://www.xn--8ws00zhy3a.com.
I wonder what WordPress will make of the characters in this post; hopefully it will do the right thing. If not, go see Tim’s original article.
June 16th, 2006 at 10:49 am
looks liek we ant to be looking at punycode (interesting name…). There’s aconvertor we can try out here.
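For anyone who would rather try the conversion in code than in an online converter, here is a rough sketch using Python's built-in "idna" codec. It assumes the host name from the comment above and relies on the codec's IDNA 2003 rules, so treat it as an illustration rather than a complete solution. Note that it only covers the host name; the path part of a URL still needs percent-encoding.

```python
# Host name taken from the earlier comment; everything else is illustrative.
host = "www.詹姆斯.com"

# The "idna" codec converts each non-ASCII label to its "xn--" Punycode form
# and leaves the ASCII labels ("www", "com") as they are.
ascii_host = host.encode("idna").decode("ascii")

# Decoding with the same codec turns the ASCII form back into Unicode.
round_trip = ascii_host.encode("ascii").decode("idna")

print(ascii_host)
print(round_trip == host)
```

The printed ASCII host should correspond to the xn-- address quoted above, and the round trip shows that nothing is lost in the conversion.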
June 19th, 2006 at 4:20 pm
wow i was suffering from some serious finger dyslexia in that last comment ;)