
Item5626: Encoding problem with iso-8859 and umlauts

Item Form Data

AppliesTo: Extension
Component: TinyMCEPlugin
Priority: Urgent
CurrentState: Closed
WaitingFor:
TargetRelease: patch
ReleasedIn: 4.2.1


Detail

To reproduce this bug you need:
  • A site charset other than UTF-8, e.g. an ISO-8859 charset
  • Text containing umlauts
  • TWiki 4.1.2 (reportedly it also happens on 4.2)

With the current version I have problems with the encoding. Editing a topic and adding umlauts works, and they display correctly. But after saving and editing the topic again, or simply switching to raw mode and back (via the pickaxe), the umlauts are displayed incorrectly; it looks like a wrong encoding.

I tried to debug it in pickaxe mode and found that the problem must lie in the TML2HTML conversion path. However, even after skipping the converter completely, so that the _restTML2HTML handler simply returned the TML text, the encoding was still wrong. Following some tips from CDot, it appears that any content coming from TinyMCE is UTF-8, regardless of the encoding configured in TWiki.
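To illustrate the symptom, here is a small standalone Perl sketch. It uses only the core Encode module and is not TWiki code. It shows what happens to an "ä" when UTF-8 octets are read as ISO-8859-1, and how decoding as UTF-8 first recovers it. The charset names are assumptions chosen for the example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A Perl character string containing a-umlaut (U+00E4).
my $text = "\x{e4}";

# What the browser/TinyMCE sends: the UTF-8 octets C3 A4.
my $utf8_octets = encode('UTF-8', $text);

# If the server treats those octets as ISO-8859-1, each octet becomes
# its own character: C3 -> A with tilde, A4 -> currency sign.
my $misread = decode('iso-8859-1', $utf8_octets);

# Decoding as UTF-8 first recovers the single umlaut character,
# which can then be re-encoded for an ISO-8859-1 site.
my $correct    = decode('UTF-8', $utf8_octets);
my $latin1_out = encode('iso-8859-1', $correct);

printf "UTF-8 octets:      %s\n", unpack('H*', $utf8_octets);  # c3a4
printf "misread length:    %d\n", length $misread;             # 2
printf "correct length:    %d\n", length $correct;             # 1
printf "ISO-8859-1 octets: %s\n", unpack('H*', $latin1_out);   # e4
```

The two-character result of the misread is the classic pattern of UTF-8 text displayed under a Latin-1 charset.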

That is where the bug comes from. The current version contains a commented-out line:

    #_handleUTF8( $tml ); if not utf8?
I am not sure what it was originally for, but simply uncommenting it does not solve the problem. Looking at the code, it only decodes the content from UTF-8 when the TWiki charset is utf-8, which is not what we need here. In our case the content is always UTF-8, whatever charset is set in TWiki, because that is how TinyMCE behaves. So to solve the problem I changed lib/TWiki/Plugins/WysiwygPlugin.pm as follows:

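sub _restTML2HTML {
    my ($session) = @_;
    my $tml = TWiki::Func::getCgiQuery()->param('text');
    # TinyMCE always posts UTF-8, whatever charset TWiki is configured
    # with, so decode the incoming text as UTF-8 unconditionally.
    require Encode;
    $tml = Encode::decode_utf8( $tml );
    #_handleUTF8( $tml ); if not utf8?
    # ... rest of _restTML2HTML unchanged ...
}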

So the relevant lines are:

sub _restTML2HTML {
    require Encode;
    $tml = Encode::decode_utf8( $tml );

So I decode the content as UTF-8 in every case. I did not touch the _handleUTF8 method, as I am not sure what it is supposed to do in other cases.

So why do I post all this? I am no guru on TinyMCE or on encodings, but this worked for me. You should check whether this is the correct solution and, more importantly, the right place to fix it.

After all, I modified WysiwygPlugin to solve a TinyMCEPlugin bug, which is probably not the right way. Maybe we need a method in TinyMCEPlugin, between TinyMCE and the REST handler, that decodes everything before it is passed to _restTML2HTML.

Don't use this fix on systems that are not ISO-8859; it breaks the encoding there. I hope to find a proper fix in the next few days.

-- TWiki:Main.MayerEugen - 13 May 2008
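The intermediate step suggested in the report (decoding TinyMCE's UTF-8 before the REST handler works on it) could be sketched roughly as below. This is not code from TinyMCEPlugin or WysiwygPlugin: `from_tinymce` is a hypothetical name, and `$site_charset` stands in for however the configured site charset would actually be looked up. Converting back to the site charset, rather than only decoding, is one possible way to avoid the breakage on non-ISO-8859 systems, but it assumes the rest of the code expects octets in that charset and has not been tested against TWiki.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: take the raw octets TinyMCE posted (assumed to
# be UTF-8) and return octets in the site charset ($site_charset is a
# stand-in for the configured charset).
sub from_tinymce {
    my ($octets, $site_charset) = @_;

    # On a UTF-8 site the posted octets are already in the site charset.
    return $octets if $site_charset =~ /^utf-?8$/i;

    # UTF-8 octets -> Perl characters.
    my $chars = Encode::decode('UTF-8', $octets);

    # Characters -> site-charset octets. Characters with no mapping
    # (e.g. the euro sign in ISO-8859-1) become &#NNNN; entities.
    return Encode::encode($site_charset, $chars, Encode::FB_HTMLCREF());
}
```

For example, `from_tinymce("\xc3\xa4", 'iso-8859-1')` returns the single octet `\xe4`, which is "ä" in ISO-8859-1.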

Mayer, I'm sorry, but I'm no longer responding to "it works for me" reports that leave me to do the bulk of the work. Our understanding of all the transforms and encoding assumptions is incomplete (as I'm sure you realise) and until we build up a complete picture, I can't be sure your fix doesn't break anything.

Having said that, your proposed change does reflect some code I already have in my dev version but have not yet checked in (no time).

Here's my current understanding of what goes on. Assume a server configured for encoding X (at this stage it doesn't matter what X is):

  1. Server sends edit page encoded using X to the client. This includes a textarea that has the topic content (encoded using X) embedded in it.
  2. Client (e.g. firefox) loads the page, recognises the encoding, and builds the DOM. This implicitly converts the text encoded using X into unicode.
  3. TinyMCE runs, and camps on the textarea, replacing it in the DOM with a div. It then fires a callback that invokes my JS.
  4. My JS compiles an XHR using the content of the textarea (which was converted to unicode in step 2). The XHR uses URL-encoding on the data.
  5. CGI.pm calls the REST handler passing in the (now UTF8 encoded) textarea content.
  6. This is converted to HTML, which is then (because this is how TWiki works) written assuming encoding X
  7. XHR returns the result of the request to JS, which then sets innerHTML.
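Steps 4 to 6 can be simulated in a few lines of standalone Perl. This sketch assumes the JS builds the XHR body with encodeURIComponent, which always percent-encodes UTF-8; it also assumes X = ISO-8859-1. The percent-decoding one-liner stands in for what CGI.pm does and is not CGI.pm code.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Step 4 (browser): encodeURIComponent("ä") yields "%C3%A4",
# because JavaScript percent-encodes strings as UTF-8.
my $query_value = '%C3%A4';

# Step 5 (server): undoing the URL encoding yields raw octets.
(my $octets = $query_value) =~ s/%([0-9A-Fa-f]{2})/chr(hex $1)/ge;

# Those octets are UTF-8 regardless of the page encoding X,
# so they have to be decoded as UTF-8 ...
my $chars = decode('UTF-8', $octets);

# Step 6: ... and re-encoded as X (assumed ISO-8859-1) before output.
my $out = encode('iso-8859-1', $chars);

printf "%s -> %s -> U+%04X -> %s\n",
    $query_value, unpack('H*', $octets), ord $chars, unpack('H*', $out);
# prints: %C3%A4 -> c3a4 -> U+00E4 -> e4
```

If the server skips the decode in step 5 and writes the octets straight out as X, the browser shows two characters instead of one, matching the reported symptom.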
Now, there are a number of dubious assumptions here:
  • Does the browser really convert the textarea to unicode in step 2? How can we check?
  • What really happens in step 4? How does unicode become UTF-8 embedded in the URL encoding? Does it?
  • Does the XHR call really use UTF8 in step 5? Or does it try to convert back to the encoding used in the page? How can we check?
  • When setting innerHTML in step 7, does it really assume unicode?
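For the step-5 question, one rough check is to log whether the raw text parameter received by the REST handler is valid UTF-8. Below is a minimal standalone sketch using the core Encode module; `classify_octets` is a hypothetical helper, not TWiki code. It is only a heuristic: some ISO-8859-1 byte sequences (for example "Ã¤") also happen to be valid UTF-8.

```perl
use strict;
use warnings;
use Encode ();

# Classify raw octets: pure ASCII, valid UTF-8, or neither.
sub classify_octets {
    my ($octets) = @_;
    return 'ascii' if $octets !~ /[^\x00-\x7F]/;
    my $copy = $octets;   # decode with a CHECK flag may modify its argument
    my $ok = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };
    return $ok ? 'utf-8' : 'not utf-8 (possibly iso-8859-1)';
}

for my $sample ("abc", "\xc3\xa4", "\xe4") {
    printf "%-8s %s\n", unpack('H*', $sample), classify_octets($sample);
}
```

Logging this for a topic containing umlauts would show whether the XHR delivers UTF-8 or falls back to the page encoding.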
Until we have answers to these questions (i.e. a complete picture of the encodings used at each stage of the above flow) I can't deal with your report.

Any help you (or anyone else) can give in analysing this would be very helpful. I have already spent far too much of my free time working on this.

-- CrawfordCurrie - 14 May 2008

Please see http://develop.twiki.org/~twiki4/cgi-bin/view/LitterTray/TestEncodings and run the test script. If it works, then I can fix this bug.

-- TWiki:Main.CrawfordCurrie - 23 May 2008

Never mind, I ran the necessary tests myself, and fixed the editor.

-- CrawfordCurrie - 25 May 2008

ItemTemplate
Summary Encoding problem with iso-8859 and umlauts
ReportedBy TWiki:Main.MayerEugen
Codebase 4.1.2, 4.2.0
SVN Range TWiki-5.0.0, Sun, 04 May 2008, build 16770
AppliesTo Extension
Component TinyMCEPlugin
Priority Urgent
CurrentState Closed
WaitingFor

Checkins TWikirev:16830 TWikirev:16831
TargetRelease patch
ReleasedIn 4.2.1
Topic revision: r9 - 2008-08-04 - KennethLavrsen