
Item6896: When charset=UTF-8, viewing a page containing non-ASCII characters with the raw=on parameter corrupts some characters

Item Form Data

AppliesTo: Engine
Priority: Normal
CurrentState: Closed
TargetRelease: major
ReleasedIn: 6.0.0


Detail

[Screenshot: utf-8-corruption.png]

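This is caused by the CGI::textarea() call in view() in core/lib/TWiki/UI/View.pm, which HTML-escapes the raw topic text according to CGI.pm's current charset (ISO-8859-1 unless set otherwise). Calling CGI::charset($TWiki::cfg{Site}{CharSet}) before the CGI::textarea() call solves the problem. This has been confirmed to be harmless with ISO-8859-1, and it should do no harm with other character sets.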

-- TWiki:Main.HideyoImazu - 2012-06-29
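Here is a minimal sketch of the fix described above, using CGI.pm's function-oriented interface. It is not the actual View.pm code. `$site_charset` is a hypothetical stand-in for `$TWiki::cfg{Site}{CharSet}`, and the sample text is an assumption: U+4ECB encoded as UTF-8, whose last byte is \x8b. The sketch requires the CGI module to be installed.

```perl
use strict;
use warnings;
use CGI ();

# Hypothetical stand-in for $TWiki::cfg{Site}{CharSet}.
my $site_charset = 'UTF-8';

# Sample raw topic text as bytes: U+4ECB in UTF-8 is E4 BB 8B.
my $raw_text = "\xe4\xbb\x8b";

# The fix: declare the real charset to CGI.pm *before* calling CGI::textarea().
# textarea() HTML-escapes its -default value, and in CGI.pm versions with the
# Latin-only branch in escapeHTML() that escaping depends on this charset.
CGI::charset($site_charset);

my $html = CGI::textarea(
    -name    => 'text',
    -default => $raw_text,
);
print "$html\n";
```

Because the charset is set first, the \x8b byte reaches the browser unchanged and the character renders intact.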

It turned out that UTF-8 is not the only affected character set. Any character set that uses the bytes \x8b and \x9b differently from Windows CP1252 is affected, for example:

  • Shift JIS (a.k.a. Windows CP932, very common for Japanese)
  • Big5 (a.k.a. Windows CP950, traditional Chinese)
  • Windows CP936 (Simplified Chinese)
  • Windows CP949 (Korean)
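The following sketch shows why these encodings are affected. It uses Perl's core Encode module to find, for each encoding, the first character whose encoded form contains a 0x8B or 0x9B byte. In CP1252 those bytes are the single guillemets ‹ and ›. The sketch checks only UTF-8 and three of the code pages above, using Encode's names cp932, cp936 and cp949. The scanned ranges (CJK Unified Ideographs and Hangul syllables) were chosen only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Candidate characters: CJK Unified Ideographs, then Hangul syllables.
my @code_points = (0x4E00 .. 0x9FFF, 0xAC00 .. 0xD7A3);
my %first_hit;

for my $enc (qw(UTF-8 cp932 cp936 cp949)) {
    for my $cp (@code_points) {
        # Default fallback: characters the encoding cannot map become '?'.
        my $bytes = encode($enc, chr $cp);
        next unless $bytes =~ /[\x8b\x9b]/;
        $first_hit{$enc} = $cp;
        printf "%-6s U+%04X -> %s\n", $enc, $cp,
            join ' ', map { sprintf '%02X', ord } split //, $bytes;
        last;    # one example per encoding is enough
    }
}
```

For UTF-8 the first hit is U+4E0B (E4 B8 8B), because 0x8B is a valid continuation byte. In the double-byte code pages, 0x8B and 0x9B occur as lead or trail bytes.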

In CGI.pm, the following lines of escapeHTML() decide whether those bytes get rewritten:

         my $latin = uc $self->{'.charset'} eq 'ISO-8859-1' ||
                     uc $self->{'.charset'} eq 'WINDOWS-1252';
         if ($latin) {  # bug in some browsers

Character sets other than ISO-8859-1 and WINDOWS-1252 may need similar treatment here, but that is for the CGI.pm maintainers to handle.

-- TWiki:Main.HideyoImazu - 2012-07-02
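The excerpt stops at the start of the Latin-only branch. This sketch re-creates that branch in plain Perl to show its effect on UTF-8 bytes. It assumes, based on CGI.pm 3.x, that the branch rewrites \x8b and \x9b as &#8249; and &#8250;, the CP1252 single guillemets. `latin_escape` and `hex_bytes` are hypothetical helpers, not part of CGI.pm, and the quote and newline substitutions of the real branch are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: re-creates only the \x8b/\x9b part of the Latin branch
# of CGI.pm 3.x escapeHTML(); quote and newline handling are omitted.
sub latin_escape {
    my ($charset, $text) = @_;
    my $latin = uc($charset) eq 'ISO-8859-1'
             || uc($charset) eq 'WINDOWS-1252';
    if ($latin) {
        $text =~ s{\x8b}{&#8249;}g;    # CP1252 0x8B is U+2039 (single left guillemet)
        $text =~ s{\x9b}{&#8250;}g;    # CP1252 0x9B is U+203A (single right guillemet)
    }
    return $text;
}

# Render a byte string as space-separated hex for display.
sub hex_bytes { join ' ', map { sprintf '%02X', ord } split //, shift }

my $utf8_bytes = "\xe4\xbb\x8b";                            # U+4ECB in UTF-8
my $as_latin   = latin_escape('ISO-8859-1', $utf8_bytes);   # charset left at CGI.pm's default
my $as_utf8    = latin_escape('UTF-8',      $utf8_bytes);   # after CGI::charset('UTF-8')

print hex_bytes($as_latin), "\n";   # E4 BB 26 23 38 32 34 39 3B
print hex_bytes($as_utf8),  "\n";   # E4 BB 8B
```

With the default charset, the final UTF-8 byte is replaced by "&#8249;". That leaves a truncated multibyte sequence followed by a stray ‹, which is the corruption shown in the screenshot. Once the charset is UTF-8, the branch is skipped and the bytes pass through unchanged.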

ItemTemplate
Summary: When charset=UTF-8, viewing a page with non-ASCII character with raw=on parameter corrupts some characters
ReportedBy: TWiki:Main.HideyoImazu
Codebase:
SVN Range:
AppliesTo: Engine
Component:
Priority: Normal
CurrentState: Closed
WaitingFor:
Checkins: TWikirev:23066
TargetRelease: major
ReleasedIn: 6.0.0
Topic attachments
  • utf-8-corruption.png (PNG, r1, 34.3 K, 2012-06-29 09:45, HideyoImazu)
Topic revision: r10 - 2013-10-15 - PeterThoeny
 
This site is powered by the TWiki collaboration platform. Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.