
Item4831: Embedded HTML elements within TML are mutilated or eliminated

Item Form Data

AppliesTo: Extension
Component: TinyMCEPlugin
Priority: Urgent
CurrentState: Closed
WaitingFor:
TargetRelease: n/a
ReleasedIn:


Consider a TWiki topic which contains explicit HTML, e.g.
<div style="border: thin solid red">

After a TMCE edit/save cycle either the attributes are gone (for example in <br class="wide" />) or the elements vanish completely (as in the div example above). During editing they're still there, visible both in WYSIWYG and HTML source.

Currently this is a show stopper for us. In the long run, I would have tried to replace all explicit HTML with TWiki preferences, but there's still a lot of stuff around which has been copypasted from legacy web pages. Some are tables with class attributes which are absolutely needed for the table layout.

The critical thing is that the whole topic is messed up if someone tries to make a tiny correction in another place. In longer topics this might go unnoticed by someone who just fixes a typo.

Apparently form gets a special treatment because it is so frequently encountered. Is there a list of "permitted" HTML elements?

-- TWiki:Main/HaraldJoerg - 15 Oct 2007

Kinda. Any tag that does not have a conversion function in lib/TWiki/Plugins/WysiwygPlugin/HTML2TML/Node.pm will pass through unscathed.

The issue you describe here is a really difficult one. We made the decision to retain TML underneath WYSIWYG, and the requirement on WYSIWYG was and remains to edit TML. When we allow too much HTML to remain in topics, we get flamed for that. When we remove too much of it, we get flamed as well. The balance point is different for everybody; it all depends on how important TML is to you. If it is totally unimportant, you want the minimum HTML removed. If it is totally critical, you want the reverse.

There are various mechanisms in place to try and find the balance point. For example, there is a list of PALATABLE_HTML. The comment for this variable reads as follows:

# HTML elements that are palatable to editors. Other HTML tags will be
# rendered in 'protected' regions to prevent the WYSIWYG editor mussing
# them up.

so, if you removed DIV from this list, it would be protected from the WYSIWYG editor and so displayed as <div>. If you leave it in the list, then the translator tries to convert that DIV back to TML, which necessitates the loss of attributes.
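
A minimal sketch of the gating described above, written in Python purely for illustration (the plugin itself is Perl). The `PALATABLE` set, its contents, and the `classify` function are assumptions made for this example; they are not the real PALATABLE_HTML list or the plugin's code.

```python
import re

# Hypothetical stand-in for PALATABLE_HTML. The real list lives in the
# WysiwygPlugin source and is not reproduced here.
PALATABLE = {"p", "div", "br", "b", "i", "table"}

# Matches an opening (or self-closing) tag and captures its element name.
TAG_RE = re.compile(r"<\s*([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def classify(tag_html, palatable=PALATABLE):
    """Return 'convert' if the translator may turn the tag back into TML
    (attributes are lost), or 'protect' if the tag is shielded from the
    WYSIWYG editor and survives unchanged."""
    match = TAG_RE.match(tag_html)
    if match is None:
        raise ValueError("not an opening tag: %r" % tag_html)
    name = match.group(1).lower()
    return "convert" if name in palatable else "protect"
```

With `div` in the set, `classify('<div style="border: thin solid red">')` gives `"convert"`; calling it with `PALATABLE - {"div"}` gives `"protect"`, which mirrors the trade-off in the paragraph above.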

Note that we could choose to prevent post-filtering of tags that have specific blocking attributes - for example, if a P has an onclick or a style, don't try and post-convert it to a TML paragraph. However, you more frequently want to apply this sort of filtering, for example when pasting content from another web site or importing an M$Word document. For this reason I decided to remove all the existing filters of this type (they existed in a previous incarnation).

I would be happy to work on this problem, but not alone. One user group seems more concerned about TML integrity, but another is more worried about HTML integrity. Right now I'm sitting on the fence getting poked with sharp sticks from both sides :-(

-- CrawfordCurrie - 16 Oct 2007

See also Item4790 and Item4832 which report what is essentially the same problem.

-- TWiki:Main.CrawfordCurrie - 16 Oct 2007

I see your point. I admit that I neither monitored nor contributed to any WYSIWYG discussion because I myself prefer plain text. I have never experimented with copypasting M$ stuff, but I know and loathe their moronic approach to HTML.

That hurts. I had hoped that certain empty elements seasoned with attributes could be used as "marker" for TML specials, invisible to TMCE but allowing HTML2TML to create correct TML (the Item4747 case).

My first experiments of copypasting office documents (OpenOffice in my case) to TMCE were nevertheless devastating - often the topic would end up empty. I guess this is a Firefox/OO issue, have to experiment a bit more.

... And finally, I've understood what the f*ck "Protect on save" is supposed to do. It only sort of works for HTML. Once. If I write:

<span style="border: thin solid red">protect on save</span>
...and mark that paragraph as "protect on save" before saving, then I get a thin, solid, red border in my topic. Fine. If I edit again, I see the text in a thin, solid, red border in TMCE. And if I save, again, the border is gone, as is the span.

-- TWiki:Main.HaraldJoerg - 16 Oct 2007

I added a new mapping table that lets you constrain HTML removal. The table can be set up to keep HTML if certain attributes exist on the tags; though I haven't gone as far as inspecting attribute values. With <sticky> also available, I believe this is now closed.
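
To make the idea of such a table concrete, here is a hedged Python sketch (again only an illustration; the plugin is Perl and its actual table format is documented in WysiwygPluginSettings). The `KEEP_IF_ATTRS` table, its entries, and `keep_as_html` are hypothetical names. As in the description above, only the presence of an attribute is checked, never its value.

```python
from html.parser import HTMLParser

# Hypothetical table: tag name -> attribute names whose presence means
# "leave this tag as HTML, do not convert it to TML".
KEEP_IF_ATTRS = {
    "div": {"style", "class"},
    "br": {"class"},
    "table": {"class"},
}


class _FirstTag(HTMLParser):
    """Records the name and attribute names of the first start tag seen."""

    def __init__(self):
        super().__init__()
        self.tag = None
        self.attr_names = set()

    def handle_starttag(self, tag, attrs):
        if self.tag is None:
            self.tag = tag  # HTMLParser already lower-cases tag names
            self.attr_names = {name for name, _value in attrs}


def keep_as_html(tag_html, table=KEEP_IF_ATTRS):
    """Return True if the tag carries at least one attribute listed for it."""
    parser = _FirstTag()
    parser.feed(tag_html)
    parser.close()
    if parser.tag is None:
        return False
    return bool(parser.attr_names & table.get(parser.tag, set()))
```

Under these assumed entries, `<div style="border: thin solid red">` and `<br class="wide" />` would be kept as HTML, while a bare `<div>` would still be eligible for conversion to TML.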


Sorry, but I have no idea how to use the mapping table. Did you forget to commit the documentation changes? TinyMCEPlugin.txt has a sentence reading

To do this, you must . 
I suppose this was intended to be the cookbook I'm looking for?

-- TWiki:Main.HaraldJoerg - 18 Oct 2007

Hi Crawford,

Sorry to hear about your troubles. Let's try to figure out something that is agreeable to both camps. It seems you have the $WC::VERY_CLEAN variable in place to control this sort of behavior.

I'm in the "more HTML" camp, and would be satisfied if this variable were settable from the WysiwygPlugin preferences page. When unset, the plugin will still try to convert to TML, but do nothing to break the Wysiwyg qualities of the editor (removing font tags, contracting whitespace, etc.).

I'd be happy to submit patches and help maintain unit tests for this.

-- TWiki:Main.BuckGolemon - 23 Oct 2007

I am getting shit scared that we are about to turn TWiki and Wysiwyg into the worst possible hack and create a compatibility hell.

It is important to note that the TinyMCE Wysiwyg is a "the best we have at the moment" thing which for sure will be replaced with something else in TWiki 5.0. A Wysiwyg editor that converts between TML and HTML will never really be good and will be replaced by something else.

In Item4790 it was suggested that pasting from MS Word to TWiki should preserve all the junk html code. I did not meet one single developer from other wiki platforms at WikiSym2007 who agreed. On the contrary, we all agreed that the wiki markup is a STRENGTH in wikis and losing that is suicide. We should not try and turn TWiki and other wikis into a "MS Word online". The TWiki formatted searching, the TWiki Applications etc all live from a predictable and simple markup. If we allow a super-hack of a Wysiwyg editor to let people paste all sorts of junk html into topics, we reduce TWiki to an on-line typewriter.

And one year from now, when TWiki 5.0 gets released, we will end up with 1000s of topics that are pure junk that no one can edit. I have seen some pretty good examples of Wysiwyg editors for other wikis at WikiSym 2007, and things are about to change in several wiki platforms in the next year or so.

It is also essential that plugins made for TWiki work on the TML in the topics. And those plugins should not depend on what settings you may have in the WysiwygPlugin.

And from a user's point of view, having to hack settings in the WysiwygPlugin to be able to edit topics makes no sense. Only in a TWiki where every application is created by a single geek and users are regarded as dumb people who are only allowed to fill out forms does such a setup make sense. And this is exactly what wikis are NOT about.

I am all for a feature that enables TWiki to protect an area from being modified by TMCE. That could also later work with a different Wysiwyg editor. But major hacks in the plugin for specific HTML tags should be avoided.

-- TWiki:Main.KennethLavrsen - 24 Oct 2007

Harald, I checked in more doc before I left for Scotland; please SVN update and check again. The doc is in WysiwygPluginSettings.

-- TWiki:Main.CrawfordCurrie - 24 Oct 2007

Summary Embedded HTML elements within TML are mutilated or eliminated
ReportedBy TWiki:Main.HaraldJoerg
Codebase 4.2.0
SVN Range TWiki-4.3.0, Fri, 12 Oct 2007, build 15261
AppliesTo Extension
Component TinyMCEPlugin
Priority Urgent
CurrentState Closed

Checkins TWikirev:15324 TWikirev:15326
TargetRelease n/a

Topic revision: r10 - 2007-10-24 - CrawfordCurrie