suggests treating TWiki:Support.InternationalPagesUsingUTF8 as a bug; posting here for reference:
Our site is working fine with mixed-language characters in topic content. However, if a user creates a topic with, for example, a Korean name, that topic's Korean content is corrupted.
You can see this in our sandbox: https://gopedia.gopetslive.com/twiki/bin/view/Sandbox/WebHome
Relevant settings from TWiki.cfg:
$useLocale = 1;
$siteLocale = "ko_KR.utf8";
$siteCharsetOverride = "";
$localeRegexes = 1;
-- TWikiGuest - 06 Jan 2006
It looks like the page contents have the problem, but the page name is OK in Firefox 1.5 with its built-in fonts (i.e. this page). This is odd, as usually it's the URLs that have the problem and the page contents that are fine. Also, the https://gopedia.gopetslive.com/twiki/bin/view/Sandbox/WebHome page was fine for embedding those characters in UTF-8 as part of the URL, so it seems that only that one page has the problem.
You don't seem to have set the CHARSET parameter in TWikiPreferences at all, but see the TWikiInstallationGuide section on I18N troubleshooting.
It's interesting that this URL using XML entity codes, i.e. https://gopedia.gopetslive.com/twiki/bin/view/Sandbox/고피디아, seems to work. I wouldn't have thought Apache would accept that sort of URL, but somehow it is working. Do you have any additional Apache modules for I18N, e.g. mod_fileiri as mentioned in EncodeURLsWithUTF8?
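For comparison, here is a minimal Perl sketch of the two ways the same Korean topic name can be written in a URL: as numeric character references (the "XML entity codes" form) and as percent-encoded UTF-8 bytes. The variable names are made up for illustration, and this is not code from TWiki or Apache.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The topic name from the sandbox URL, written as Unicode code points
# so the example does not depend on the encoding of this source file.
my $topic = "\x{ACE0}\x{D53C}\x{B514}\x{C544}";

# Form 1: numeric character references, one per character.
my $entities = join '', map { sprintf '&#%d;', ord($_) } split //, $topic;

# Form 2: encode to UTF-8 bytes, then percent-encode every byte
# that is not an unreserved URL character.
my $percent = encode('UTF-8', $topic);
$percent =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;

print "$entities\n";   # &#44256;&#54588;&#46356;&#50500;
print "$percent\n";    # %EA%B3%A0%ED%94%BC%EB%94%94%EC%95%84
```

The second form is what a browser normally sends for a UTF-8 URL, so it is the one the web server and TWiki would usually be expected to see.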
You might also want to try commenting out the following line in TWiki.pm since it doesn't really help matters.
$fullTopicName = Encode::decode("utf8", $fullTopicName); # 'decode' into UTF-8
From tomorrow I'm on holiday until 16th Jan, and busy thereafter, but ping me by email if this doesn't work.
There are some Chinese sites using UTF-8 successfully, e.g. http://www.pgsqldb.org/, so it may help to check their testenv settings and other setup, or email their administrators.
-- TWiki:Main.RichardDonkin - 06 Jan 2006
Commenting out that line in TWiki.pm worked like a charm! Many, many thanks!
-- TWiki:Main.TWikiGuest - 09 Jan 2006
Interesting - this sounds like a bug in using TWiki with UTF-8 as the $siteCharset. Could you log this as a bug in Codev? This should really be fixed in Dakar since it's a low-impact fix.
-- TWiki:Main.RichardDonkin - 16 Jan 2006
AFAICT this issue does not affect Dakar. I've just reproduced the experiment and everything went ok.
I wrote that line of code originally when trying to put in full Unicode support (i.e. using Perl's internal UTF-8 characters). Since that support was not completed, this line doesn't do anything useful, and in the case of this support issue, did cause problems, so it should be removed.
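One plausible way a half-finished switch to Perl's internal characters can corrupt content (the thread does not confirm that this is exactly what happened) is that a decoded topic name gets concatenated with page text that is still raw UTF-8 bytes. Perl then treats each raw byte as a Latin-1 character, and the content is encoded a second time on output. A minimal sketch, with hypothetical variable names and sample data:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample data: raw UTF-8 bytes, as they would arrive
# from a URL or be read from a topic file without any decoding.
my $topic_bytes   = "\xEA\xB3\xA0";   # U+ACE0
my $content_bytes = "\xED\x94\xBC";   # U+D53C

# The kind of call under discussion: turn the topic name into Perl characters.
my $topic_chars = Encode::decode("utf8", $topic_bytes);

# Mixing a character string with a byte string: Perl upgrades the
# bytes as if they were Latin-1, one character per byte.
my $mixed       = encode('UTF-8', $topic_chars . $content_bytes);

# Keeping everything as bytes leaves the data untouched.
my $bytes_only  = $topic_bytes . $content_bytes;

# Helper for this example only: show a byte string as hex.
sub hex_dump { join ' ', map { sprintf '%02X', ord($_) } split //, shift }

print hex_dump($mixed), "\n";       # EA B3 A0 C3 AD C2 94 C2 BC
print hex_dump($bytes_only), "\n";  # EA B3 A0 ED 94 BC
```

In the mixed case the topic name survives but the content bytes come out double-encoded, which would match the reported symptom of a readable page name with corrupted Korean content.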
This appears to have been fixed in Dakar, i.e. this line of code has been removed anyway.