
The current entityEncode() seems broken. The &#-notation represents a Unicode code point, regardless of the content encoding.

I found that TWiki::entityEncode() doesn't work when $TWiki::cfg{Site}{CharSet} is set to 'utf8'. For example, the character "日" becomes &#230;&#151;&#165; instead of the correct &#26085;. Note that the UTF-8 representation is &#-encoded byte-wise, one entity per byte.
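The symptom can be reproduced outside TWiki. The sketch below is not TWiki code: encode_high_chars is a hypothetical helper that entity-encodes every non-ASCII character, applied once to the raw UTF-8 bytes and once to the decoded character string.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of TWiki): entity-encode every
# character whose ordinal is above 0x7f.
sub encode_high_chars {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7f])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# "日" (U+65E5) as raw UTF-8 bytes, the form a utf8 site hands around.
my $bytes = "\xE6\x97\xA5";

# Byte-wise: each of the three bytes becomes its own entity.
my $bytewise = encode_high_chars($bytes);

# Character-wise: decode to a Perl character string first, so ord()
# sees the code point instead of the individual bytes.
my $charwise = encode_high_chars( Encode::decode( 'UTF-8', $bytes ) );

print "$bytewise\n";    # &#230;&#151;&#165;
print "$charwise\n";    # &#26085;
```

The first result matches the broken output reported above; the second is the entity a browser resolves to "日".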

In Perl 5.8, we can use utf8 flags on strings as follows:

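sub entityEncode {
    my $text = shift;

    use Encode;
    # Decode the site-charset bytes into Perl characters so that ord()
    # returns Unicode code points rather than byte values.
    $text = Encode::decode($TWiki::cfg{Site}{CharSet}, $text);
    $text =~ s/([^ -~\n\r]|[]["<>&])/'&#'.ord( $1 ).';'/ge;
    # Convert back to bytes in the site charset before returning.
    return Encode::encode($TWiki::cfg{Site}{CharSet}, $text);
}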

entityDecode would need a similar change.
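A minimal sketch of what the decoding side might look like, under the same idea. This is an illustration only, not the actual TWiki entityDecode: the name entity_decode_sketch and the $site_charset variable (standing in for $TWiki::cfg{Site}{CharSet}) are assumptions, and only decimal &#NNN; entities are handled.

```perl
use strict;
use warnings;
use Encode ();

# Assumption: stands in for $TWiki::cfg{Site}{CharSet}.
my $site_charset = 'UTF-8';

# Hypothetical counterpart to the entityEncode proposal: work on
# characters while expanding entities, then return bytes in the
# site charset again.
sub entity_decode_sketch {
    my ($text) = @_;
    $text = Encode::decode( $site_charset, $text );
    # chr() on a code point above 255 yields a wide character, which
    # is why the text must be a character string at this point.
    $text =~ s/&#(\d+);/chr($1)/ge;
    return Encode::encode( $site_charset, $text );
}

my $decoded = entity_decode_sketch('&#26085;&#60;b&#62;');
# $decoded now holds the UTF-8 bytes of "日" followed by "<b>".
```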

Alternatively, just make entityEncode() a no-op for characters above 127, and Dakar works better:

    $text =~ s/([]["<>&])/'&#'.ord( $1 ).';'/ge;
This way, byte sequences with the MSB set are sent as is, which is what browsers expect for the declared content charset.
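To illustrate the effect of that one-line substitution, here is a small standalone sketch. The wrapper name encode_markup_only is hypothetical; the regex is the one quoted above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution quoted above: only the
# markup-significant ASCII characters ] [ " < > & are entity-encoded.
sub encode_markup_only {
    my ($text) = @_;
    $text =~ s/([]["<>&])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# "日" as UTF-8 bytes, followed by some markup characters.
my $in  = "\xE6\x97\xA5 <b> & \"q\"";
my $out = encode_markup_only($in);
# The three high bytes are untouched; only the markup characters change:
# $out is "\xE6\x97\xA5 &#60;b&#62; &#38; &#34;q&#34;"
```

Because the multi-byte sequence is never split into per-byte entities, the same code works for any site charset in which these ASCII characters keep their usual byte values.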


Kaoru, could you specify the version you are testing? This issue was fixed in TWiki:Codev/DakarRelease some time ago (days, weeks, I can't really remember).


I got it from TWiki:Codev.TWikiRelease2005x11x06x7338beta. The installed TWiki.WebHome says:

  • This site is running TWiki version Sun, 06 Nov 2005 build 7330, Plugin API version 1.1

I installed it on a newly created VMware virtual machine running Debian GNU/Linux sarge.

-- TWiki:Main.KaoruMaeda - 22 Nov 2005

It was fixed in revision 7375, just after the beta 4 release.

Thanks. (I usually don't look into SVN branches.)

r7375 | AntonioTerceiro | 2005-11-08 19:41:29 +0100 | 2 lines
Item782: changing regex to not break non 1-byte encodings.

-- TWiki:Main.KaoruMaeda - 22 Nov 2005

Summary: TWiki::entityEncode() doesn't work for chars above 0x7f
ReportedBy: KaoruMaeda
AppliesTo: Engine
Priority: Normal
CurrentState: Closed

Topic revision: r7 - 2005-11-22 - KaoruMaeda