What you need to reproduce this bug:
- A site charset other than UTF-8, e.g. iso-8859
- Topic text containing umlauts
- TWiki 4.1.2 (I heard that it also happens on 4.2)
With the current version I have problems with the encoding. In detail: editing a topic and adding umlauts works, and they are shown correctly. But after saving and editing that topic again, or simply switching into raw mode and back (via the pickaxe), the umlauts break when viewed. "Break" means the umlauts are displayed incorrectly, as if the wrong encoding were used.
I tried to debug it in pickaxe mode and found that it must be the TML2HTML converter. Even after skipping the whole converter and just returning the TML text from the _restTML2HTML handler, the encoding was still wrong. After some tips from CDot, it seems that any content coming from TinyMCE is UTF-8, regardless of the encoding set in the TWiki configuration.
And that is where the bug comes from. In the current version there is a commented-out line:
#_handleUTF8( $tml ); if not utf8?
I am not sure what it was initially for, but simply uncommenting it does not solve the problem. Looking at the code, it only decodes the content from UTF-8 when the TWiki charset is utf8, which is not what we need here. In our case we always get UTF-8, regardless of what is set in TWiki, because that is how TinyMCE behaves. So what I did to solve the problem, in lib/TWiki/Plugins/WysiwygPlugin.pm, was:
sub _restTML2HTML {
my ($session) = @_;
my $tml = TWiki::Func::getCgiQuery()->param('text');
require Encode;
$tml = Encode::decode_utf8( $tml );
#_handleUTF8( $tml ); if not utf8?
So the interesting lines are of course
sub _restTML2HTML {
require Encode;
$tml = Encode::decode_utf8( $tml );
So I decode the content as UTF-8 in every case. I did not touch the _handleUTF8 method, as I am not sure what it is supposed to do in other cases.
So why am I posting all this? I am no guru on TinyMCE or on encodings, but it worked for me. You guys should check whether this is the correct solution and, more importantly, whether this is the right place to solve it.
I mean, I am modifying the WysiwygPlugin to solve a TinymcePlugin bug, and that should not be the right way. Maybe we need a method in the TinymcePlugin, between the REST handler and the TinymcePlugin, which decodes everything before sending it to the _restTML2HTML method.
Don't use this fix on systems that are not iso-8859; it breaks the encoding there. I hope to find a proper fix in the next few days.
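A charset-aware conversion could be sketched roughly as follows. This is only an illustration, not a tested fix for the plugin: utf8_to_site_charset is a hypothetical helper that does not exist in WysiwygPlugin, the site charset is passed in as a plain parameter, and the sketch assumes that the text from TinyMCE really always arrives as UTF-8 octets.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of WysiwygPlugin): convert UTF-8 octets,
# as assumed to arrive from TinyMCE, into octets in the site charset.
sub utf8_to_site_charset {
    my ( $octets, $site_charset ) = @_;

    # Nothing to convert when the site itself uses UTF-8.
    return $octets if $site_charset =~ /^utf-?8$/i;

    # Octets -> Perl characters; die on malformed UTF-8 input.
    my $chars = Encode::decode( 'UTF-8', $octets,
        Encode::FB_CROAK | Encode::LEAVE_SRC );

    # Characters -> site charset; characters that the charset cannot
    # represent are written as numeric entities (&#NNN;).
    return Encode::encode( $site_charset, $chars, Encode::FB_HTMLCREF );
}

# "ü" as the two UTF-8 octets C3 BC
my $from_editor = "\xC3\xBC";
my $for_site = utf8_to_site_charset( $from_editor, 'iso-8859-1' );
print unpack( 'H*', $for_site ), "\n";
```

On a UTF-8 site the input is returned untouched, which is the case the unconditional decode above does not handle.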
--
TWiki:Main.MayerEugen - 13 May 2008
Mayer, I'm sorry but I'm no longer responding to "it works for me" reports that leave me to do the bulk of the work. Our understanding of all the transforms and encoding assumptions is incomplete (as I'm sure you realise) and until we build up a complete picture, I can't be sure your fix doesn't break anything.
Having said that, your proposed change does reflect some code I already have in my dev version but have not yet checked in (no time).
Here's my current understanding of what goes on. Assume a server configured for encoding X (it doesn't matter at this stage what X is):
1. Server sends the edit page encoded using X to the client. This includes a textarea that has the topic content (encoded using X) embedded in it.
2. Client (e.g. firefox) loads the page, recognises the encoding, and builds the DOM. This implicitly converts the text encoded using X into unicode.
3. TinyMCE runs, and camps on the textarea, replacing it in the DOM with a div. It then fires a callback that invokes my JS.
4. My JS compiles an XHR using the content of the textarea (which was converted to unicode in step 2). The XHR uses URL-encoding on the data.
5. CGI.pm calls the REST handler, passing in the (now UTF8 encoded) textarea content.
6. This is converted to HTML, which is then (because this is how TWiki works) written assuming encoding X.
7. XHR returns the result of the request to JS, which then sets innerHTML.
Now, there are a number of dubious assumptions here:
- Does the browser really convert the textarea to unicode in step 2? How can we check?
- What really happens in step 4? How does unicode become UTF-8 embedded in the URL encoding? Does it?
- Does the XHR call really use UTF8 in step 5? Or does it try to convert back to the encoding used in the page? How can we check?
- When setting innerHTML in step 7, does it really assume unicode?
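If the picture above holds, the corruption suspected in steps 4 to 6 can be reproduced outside the browser. The following is a minimal sketch under that assumption, with the URL-encoding done by hand rather than by a real XHR, and with iso-8859-1 standing in for encoding X:

```perl
use strict;
use warnings;
use Encode ();

# One character, "ü" (U+00FC), as the browser would hold it in the DOM.
my $char = "\x{FC}";

# Assumed step 4: the text is serialised as UTF-8, then URL-encoded.
my $utf8 = Encode::encode( 'UTF-8', $char );
( my $url_encoded = $utf8 ) =~ s/(.)/sprintf('%%%02X', ord $1)/ges;

# Assumed step 5: the server undoes the URL-encoding and gets raw octets.
( my $received = $url_encoded ) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;

# Step 6 as it goes wrong: the octets are treated as iso-8859-1,
# so one character turns into two.
my $misread = Encode::decode( 'iso-8859-1', $received );

# What a correct reading would give: the octets decoded as UTF-8.
my $correct = Encode::decode( 'UTF-8', $received );

print "URL-encoded: $url_encoded\n";
print "characters if read as iso-8859-1: ", length($misread), "\n";
print "characters if read as UTF-8:      ", length($correct), "\n";
```

This only shows what the failure would look like if each step behaves as assumed. It does not answer whether the browser actually behaves that way.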
Until we have answers to these questions (i.e. a complete picture of the encodings used at each stage of the above flow) I can't deal with your report.
Any help you (or anyone else) can give in analysing this would be very helpful. I have already spent far too much of my free time working on this.
--
CrawfordCurrie - 14 May 2008
Please see
http://develop.twiki.org/~twiki4/cgi-bin/view/LitterTray/TestEncodings and run the test script. If it works, then I can fix this bug.
--
TWiki:Main.CrawfordCurrie - 23 May 2008
Never mind, I ran the necessary tests myself, and fixed the editor.
--
CrawfordCurrie - 25 May 2008