Item4074: TOC and heading anchors not multi-byte aware

Item Form Data

AppliesTo: Engine
Component: I18N
Priority: Normal
CurrentState: Closed
WaitingFor:
TargetRelease: patch
ReleasedIn: 4.3.1

Detail

Reported in TWiki:Support/JapaneseHeadersInIE6
Our TWiki site is mostly in English but we want to have a few pages in Japanese (we are a Japanese company). With this in mind we have set our CharSet to UTF-8. We have written a topic in Japanese and it works fine in IE7 and Firefox, but renders very poorly in IE6 (fonts enormous, Japanese not displayed, etc).

We have narrowed the problem down to the header at the top of the topic. The TWikiML reads as follows:

---+ トトロシステムのご紹介

The corresponding html generated by TWiki contains an anchor at this point, i.e.:

<h1><a name="トトロシステムのご紹�"></a> トトロシステムのご紹介 </h1>

Note that the final character in the name of the anchor seems to have been corrupted in some way. If we remove this character from the heading, the page renders fine in IE6.

Does anyone know what is going wrong here? Or is there a way to stop headers (---+ etc) generating anchors in html?

It looks like the second byte of a multi-byte character is cut off when applying the length limit to anchor names.

-- TWiki:Main/PeterThoeny - 15 May 2007
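The truncation described above can be reproduced outside TWiki. What follows is a minimal sketch, not TWiki code: it applies the same 32-unit limit regex that appears in the Render.pm diff further down in this item, once to the raw UTF-8 bytes of the reported heading and once to the decoded characters. The helper name limit_32 is made up for illustration.

```perl
use strict;
use warnings;
use utf8;                       # this source file contains a UTF-8 literal
use Encode qw(decode encode);

# Hypothetical helper: the same limit regex as in the Render.pm diff.
sub limit_32 {
    my ($s) = @_;
    $s =~ s/^(.{32})(.*)$/$1/;
    return $s;
}

my $heading = 'トトロシステムのご紹介';     # 11 characters
my $bytes   = encode('UTF-8', $heading);   # 33 bytes, 3 per character

# On a byte string, '.' matches one byte, so the limit drops the
# final byte of the last character.
my $cut_bytes   = limit_32($bytes);
my $copy        = $cut_bytes;
my $still_valid = utf8::decode($copy);     # false if not well-formed UTF-8

# On a decoded string, '.' matches one character; 11 < 32, so the
# heading is left untouched.
my $cut_chars = limit_32(decode('UTF-8', $bytes));

printf "bytes before: %d, after: %d, valid UTF-8: %s\n",
    length($bytes), length($cut_bytes), $still_valid ? 'yes' : 'no';
printf "characters after character-aware limit: %d\n", length($cut_chars);
```

The point of the sketch is only that the limit has to be applied to decoded characters, or at a character boundary, rather than to bytes.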

The analysis sounds credible. This is another I18N issue (I set the Component accordingly).

We really need someone with a vested interest in this area to help out.

CC

Reported in TWiki:Codev/UtfAnchorError, with patch to fix.

-- PTh

This bug was not fixed in the Georgetown release. Before making a suggestion about this bug, I have a question.

Do you think anchor names have to be human-readable? I mean, I don't think the anchor name needs to keep the summarized form (the first 32 characters) of the actual heading string.

Actually, I have my own solution, which is to use an MD5 hash as the anchor name. If you agree with my opinion, I'll submit my code to the repository.

-- TWiki:Main.JustinKim - 31 Mar 2009

I've committed my own solution to the Georgetown branch.

-- TWiki:Main.JustinKim - 01 Apr 2009

Thank you Justin for working on this!

We need to consider compatibility: People send links by e-mail pointing inside a TWiki page via TOC, for example https://develop.twiki.org/do/view/TWiki/TWikiVariables#Setting_Preferences_Variables. Therefore I think we should keep the current behavior for ASCII characters, and do something special for UTF-8 chars.

Please keep trunk and latest branch in sync as much as possible, e.g. besides applying changes to the 4.3 branch, apply your changes also to trunk.

-- TWiki:Main.PeterThoeny - 09 Apr 2009
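One way to combine the two positions above (hashed anchors, but unchanged anchors for ASCII headings) is sketched below. This is an illustration under stated assumptions, not the code that was committed: make_anchor is a hypothetical name, the ASCII branch is a deliberately simplified stand-in for TWiki's real anchor rules, and the 'h' prefix is an arbitrary choice so that the hashed anchor starts with a letter.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode);

# Hypothetical helper. Expects a decoded (character) string.
sub make_anchor {
    my ($heading) = @_;

    if ( $heading !~ /[^\x00-\x7F]/ ) {
        # ASCII only: readable anchor, so existing links keep working.
        # Simplified stand-in for the legacy rules.
        my $anchor = $heading;
        $anchor =~ s/^[\s#_]+//;             # no leading space, '#', '_'
        $anchor =~ s/[^A-Za-z0-9]+/_/g;      # collapse other chars to '_'
        $anchor =~ s/^(.{32})(.*)$/$1/;      # limit to 32 chars
        $anchor =~ s/[\s_]+$//;              # no trailing space, '_'
        return $anchor;
    }

    # Non-ASCII: hash the UTF-8 bytes. md5_hex needs bytes, not wide
    # characters, hence the explicit encode.
    return 'h' . md5_hex( encode( 'UTF-8', $heading ) );
}

print make_anchor('Setting Preferences Variables'), "\n";
print make_anchor("\x{30C8}\x{30C8}\x{30ED}"), "\n";
```

The hashed form is always plain ASCII and fixed-length, so no truncation is needed in that branch; the trade-off is that the anchor is no longer readable.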

One more point to consider: Unicode::String was added to Render.pm. This module is listed as an optional module in TWiki:TWiki04x03/TWikiSystemRequirements. It is OK to move it to the required section if that module comes standard with the required Perl version for TWiki, 5.6.1.

-- TWiki:Main.PeterThoeny - 09 Apr 2009

We now have a support question about TWiki 4.3.1 not running due to the new dependency on Unicode::String, see TWiki:Support/SID-00291.

We need to discuss and decide what to do with this dependency. IMHO we should make that dependency optional, e.g. use the module if installed, else fall back to old code.

-- TWiki:Main.PeterThoeny - 30 Apr 2009
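The "use the module if installed, else fall back to old code" idea could look roughly like this. It is a sketch only, not the fix that was eventually applied: limit_anchor and $HAVE_UNICODE_STRING are hypothetical names, the Unicode::String calls mirror the ones in the 4.3.1 diff below, and the fallback is the pre-4.3.1 regex.

```perl
use strict;
use warnings;

# Probe for the optional module once. The block eval keeps a missing
# module from being fatal, unlike a top-level "use Unicode::String".
my $HAVE_UNICODE_STRING = eval { require Unicode::String; 1 } ? 1 : 0;

# Hypothetical helper: limit an anchor name to 32 units.
sub limit_anchor {
    my ($anchorName) = @_;

    if ($HAVE_UNICODE_STRING) {
        # Same calls as the 4.3.1 change; stringify the result so the
        # caller gets a plain scalar back.
        return '' . Unicode::String->new($anchorName)->substr( 0, 32 );
    }

    # Fallback: the 4.3.0 behaviour, which is not multi-byte aware.
    $anchorName =~ s/^(.{32})(.*)$/$1/;
    return $anchorName;
}

print limit_anchor( 'a' x 40 ), "\n";
```

With this shape, a site without Unicode::String keeps the 4.3.0 behaviour, including the multi-byte truncation problem, instead of failing to start.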

(Per Peter's request, mostly quoting from SID-00291): As a result of this code, upgrading from 4.3.0 to 4.3.1 requires an additional CPAN package as a dependency. It seems odd to introduce a new dependency in a minor version release, especially as a hotfix.

Also, this module is not mentioned in configure -> CGI Setup -> Perl Modules. The only Unicode module there is Unicode::MapUTF8, which doesn't seem to be required unless you need international chars. That would be a good place to give admins a clue that something is missing or gone wrong.

In any case, unfortunately, this code makes a perfectly functional and usable 4.3.0 TWiki site (without the Unicode::String module) completely unusable.

-- TWiki:Main.JohnDeStefano - 30 Apr 2009

No problem, it was an unintended consequence. A learning experience.

-- TWiki:Main.PeterThoeny - 30 Apr 2009

FWIW and for those who want to revert the dependency, here is the diff:

--- TWiki-4.3.0/lib/TWiki/Render.pm     2009-03-30 02:05:08.000000000 -0700
+++ TWiki-4.3.1/lib/TWiki/Render.pm     2009-04-29 13:39:22.000000000 -0700
@@ -12,6 +12,7 @@
 use strict;
 use Assert;
 use Error qw(:try);
+use Unicode::String qw(utf8 latin1 utf16be);
 
 require TWiki::Time;
 
@@ -423,7 +424,10 @@
     if ( !$compatibilityMode ) {
         $anchorName =~ s/^[\s#_]+//;  # no leading space nor '#', '_'
     }
-    $anchorName =~ s/^(.{32})(.*)$/$1/; # limit to 32 chars - FIXME: Use Unicode chars before truncate
+
+    my $utf8AnchorName = Unicode::String->new($anchorName);
+    $anchorName = $utf8AnchorName->substr(0, 32);
+
     if ( !$compatibilityMode ) {
         $anchorName =~ s/[\s_]+$//;    # no trailing space, nor '_'
     }

-- TWiki:Main.PeterThoeny - 30 Apr 2009

See manual patch at TWiki:Support/SID-00291

-- TWiki:Main.PeterThoeny - 13 May 2009

I am raising an urgent bug, Item6439, to fix this.

-- TWiki:Main.PeterThoeny - 27 Apr 2010

ItemTemplate
Summary TOC and heading anchors not multi-byte aware
ReportedBy TWiki:Main.PeterThoeny
Codebase 4.0.5, 4.3.0
SVN Range TWiki-4.1.2, Sun, 13 May 2007, build 13714
AppliesTo Engine
Component I18N
Priority Normal
CurrentState Closed
WaitingFor

Checkins TWikirev:17949 TWikirev:17951
TargetRelease patch
ReleasedIn 4.3.1
Topic revision: r16 - 2010-04-27 - PeterThoeny