Reported in TWiki:Support/JapaneseHeadersInIE6:
Our TWiki site is mostly in English but we want to have a few pages in Japanese (we are a Japanese company). With this in mind we have set our CharSet to UTF-8. We have written a topic in Japanese and it works fine in IE7 and Firefox, but renders very poorly in IE6 (fonts enormous, Japanese not displayed, etc).
We have narrowed the problem down to the header at the top of the topic. The TWikiML reads as follows:
---+ トトロシステムのご紹介
The corresponding html generated by TWiki contains an anchor at this point, i.e.:
<h1><a name="トトロシステムのご紹�"></a> トトロシステムのご紹介 </h1>
Note that the final character in the name of the anchor seems to have been corrupted in some way. If we remove this character from the heading, the page renders fine in IE6.
Does anyone know what is going wrong here? Or is there a way to stop headers (---+ etc) generating anchors in html?
It looks like the second byte of a multi-byte character is cut off when applying the length limit to anchor names.
--
TWiki:Main/PeterThoeny - 15 May 2007
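To illustrate the analysis above, here is a minimal sketch of how a 32-unit limit behaves when it is applied to undecoded UTF-8 bytes versus decoded characters. The helper names (truncate_bytes, truncate_chars, is_valid_utf8) are hypothetical and not part of TWiki; only the regex is taken from Render.pm. The reported heading has 11 characters of 3 bytes each, so a cut at 32 bytes lands inside the final character.

```perl
use strict;
use warnings;
use utf8;                         # this source contains UTF-8 literals
use Encode qw(encode decode);

# Hypothetical helper: byte-wise limit, the same regex Render.pm applies,
# here run on an undecoded UTF-8 byte string.
sub truncate_bytes {
    my ($bytes) = @_;
    $bytes =~ s/^(.{32})(.*)$/$1/s;
    return $bytes;
}

# Hypothetical helper: decode first, limit to 32 characters, encode again.
sub truncate_chars {
    my ($bytes) = @_;
    my $chars = decode( 'UTF-8', $bytes );
    return encode( 'UTF-8', substr( $chars, 0, 32 ) );
}

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;             # copy, because utf8::decode works in place
    return utf8::decode($bytes) ? 1 : 0;
}

my $heading = encode( 'UTF-8', 'トトロシステムのご紹介' );
for my $cut ( truncate_bytes($heading), truncate_chars($heading) ) {
    printf "%d bytes, well-formed UTF-8: %s\n",
        length($cut), is_valid_utf8($cut) ? 'yes' : 'no';
}
```

The byte-wise result ends in an incomplete sequence, which is the kind of corrupted trailing character visible in the generated anchor.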
Analysis sounds credible. Another I18N issue (I set the Component accordingly).
We really need someone with a vested interest in this area to help out.
--
CC
Also reported in TWiki:Codev/UtfAnchorError, with a patch to fix it.
--
PTh
This bug was not fixed in the Georgetown release.
Before making a suggestion about this bug, I have a question.
Do you think anchor names have to be human-readable? I mean, I don't think an anchor name needs to keep the summarized form (the first 32 characters) of the actual heading string.
Actually, I have my own solution, which is to use an MD5 hash of the heading as the anchor name. If you agree, I'll commit my code to the repository.
--
TWiki:Main.JustinKim - 31 Mar 2009
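For readers who want to see what the MD5 idea could look like, here is a minimal sketch. It is not the code that was committed; the function name md5_anchor and the leading 'A' prefix are assumptions made for illustration (the prefix only ensures the name starts with a letter). Digest::MD5 works on bytes, so the heading is encoded to UTF-8 before hashing.

```perl
use strict;
use warnings;
use utf8;                         # this source contains UTF-8 literals
use Digest::MD5 qw(md5_hex);
use Encode qw(encode);

# Hypothetical helper: derive a fixed-length, ASCII-only anchor name
# from a heading given as a decoded character string.
sub md5_anchor {
    my ($heading) = @_;
    # md5_hex croaks on wide characters, so hash the UTF-8 bytes.
    return 'A' . md5_hex( encode( 'UTF-8', $heading ) );
}

print md5_anchor('トトロシステムのご紹介'), "\n";
print md5_anchor('Setting Preferences Variables'), "\n";
```

The result is always 33 ASCII characters, so no multi-byte character can be split. The trade-off is that the anchor no longer resembles the heading text.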
I've committed my solution to the Georgetown branch.
--
TWiki:Main.JustinKim - 01 Apr 2009
Thank you Justin for working on this!
We need to consider compatibility: People send links by e-mail pointing inside a TWiki page via TOC, for example
https://develop.twiki.org/do/view/TWiki/TWikiVariables#Setting_Preferences_Variables. Therefore I think we should keep the current behavior for ASCII characters, and do something special for UTF-8 chars.
Please keep trunk and latest branch in sync as much as possible, e.g. besides applying changes to the 4.3 branch, apply your changes also to trunk.
--
TWiki:Main.PeterThoeny - 09 Apr 2009
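One possible reading of "keep the current behavior for ASCII, do something special for UTF-8" is sketched below. The function name make_anchor_name is hypothetical, and the sketch covers only the 32-character limit, not the rest of TWiki's anchor normalization. It assumes the input is a UTF-8 encoded byte string and uses only the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: apply the 32-character limit to an anchor name
# given as a UTF-8 encoded byte string.
sub make_anchor_name {
    my ($bytes) = @_;

    if ( $bytes !~ /[^\x00-\x7F]/ ) {
        # Pure ASCII: keep the existing truncation so that links already
        # sent around by e-mail continue to resolve.
        $bytes =~ s/^(.{32})(.*)$/$1/s;
        return $bytes;
    }

    # Non-ASCII: count characters, not bytes, so no character is split.
    my $chars = decode( 'UTF-8', $bytes );
    return encode( 'UTF-8', substr( $chars, 0, 32 ) );
}

print make_anchor_name('Setting_Preferences_Variables'), "\n";
```

With this split, an anchor such as #Setting_Preferences_Variables is unchanged, and only headings containing non-ASCII characters take the new code path.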
One more point to consider: Unicode::String was added to Render.pm. This module is listed as an optional module in TWiki:TWiki04x03/TWikiSystemRequirements. It is OK to move it to the required section if that module comes standard with the required Perl version for TWiki, 5.6.1.
--
TWiki:Main.PeterThoeny - 09 Apr 2009
We now have a support question about TWiki 4.3.1 not running due to the new dependency on Unicode::String, see TWiki:Support/SID-00291.
We need to discuss and decide what to do with this dependency. IMHO we should make that dependency optional, e.g. use the module if installed, else fall back to old code.
--
TWiki:Main.PeterThoeny - 30 Apr 2009
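The "use the module if installed, else fall back to old code" idea could be sketched as follows. This is an illustration, not the fix that went into TWiki; the helper name limit_anchor_name and the $HAVE_UNICODE_STRING flag are hypothetical. The Unicode::String calls mirror those in the 4.3.1 change to Render.pm, and the fallback is the earlier regex.

```perl
use strict;
use warnings;

# Try to load the optional module once; a missing module must not be fatal.
my $HAVE_UNICODE_STRING = eval { require Unicode::String; 1 } ? 1 : 0;

# Hypothetical helper: limit an anchor name to 32 characters.
sub limit_anchor_name {
    my ($anchorName) = @_;

    if ($HAVE_UNICODE_STRING) {
        # Character-aware truncation, same calls as the 4.3.1 change.
        my $utf8AnchorName = Unicode::String->new($anchorName);
        my $cut            = $utf8AnchorName->substr( 0, 32 );
        return "$cut";    # force a plain string
    }

    # Fallback: the pre-4.3.1 truncation, unchanged.
    $anchorName =~ s/^(.{32})(.*)$/$1/;
    return $anchorName;
}

print limit_anchor_name('Setting_Preferences_Variables'), "\n";
```

A site without Unicode::String then behaves exactly like 4.3.0, while a site with the module installed gets the character-aware limit.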
(Per Peter's request, mostly quoting from SID-00291): As a result of this code, upgrading from 4.3.0 --> 4.3.1 requires an additional CPAN package as a dependency. It seems odd to introduce a new dependency in a minor version release, especially as a hotfix.
Also, this module is not mentioned in configure -> CGI Setup -> Perl Modules. The only Unicode module there is Unicode::MapUTF8, which doesn't seem to be required unless you need international chars. That would be a good place to give admins a clue that something is missing or gone wrong.
In any case, unfortunately, this code makes a perfectly functional and usable 4.3.0 TWiki site (without the Unicode::String module) completely unusable.
--
TWiki:Main.JohnDeStefano - 30 Apr 2009
No problem, it was an unintended consequence. A learning experience.
--
TWiki:Main.PeterThoeny - 30 Apr 2009
FWIW and for those who want to revert the dependency, here is the diff:
--- TWiki-4.3.0/lib/TWiki/Render.pm 2009-03-30 02:05:08.000000000 -0700
+++ TWiki-4.3.1/lib/TWiki/Render.pm 2009-04-29 13:39:22.000000000 -0700
@@ -12,6 +12,7 @@
use strict;
use Assert;
use Error qw(:try);
+use Unicode::String qw(utf8 latin1 utf16be);
require TWiki::Time;
@@ -423,7 +424,10 @@
if ( !$compatibilityMode ) {
$anchorName =~ s/^[\s#_]+//; # no leading space nor '#', '_'
}
- $anchorName =~ s/^(.{32})(.*)$/$1/; # limit to 32 chars - FIXME: Use Unicode chars before truncate
+
+ my $utf8AnchorName = Unicode::String->new($anchorName);
+ $anchorName = $utf8AnchorName->substr(0, 32);
+
if ( !$compatibilityMode ) {
$anchorName =~ s/[\s_]+$//; # no trailing space, nor '_'
}
--
TWiki:Main.PeterThoeny - 30 Apr 2009
See the manual patch at TWiki:Support/SID-00291.
--
TWiki:Main.PeterThoeny - 13 May 2009
I am raising an urgent bug, Item6439, to fix this.
--
TWiki:Main.PeterThoeny - 27 Apr 2010