Item3715: Simplify I18N configuration

Item Form Data

AppliesTo: Engine
Component: Configuration, I18N
Priority: Normal
CurrentState: Closed
WaitingFor:
TargetRelease: patch
ReleasedIn: 4.2.1

Detail

Triggered by the I18N discussions in other closed bug items, I tried to set up my test server for a non-English setup.

And it is beyond anything normal people will ever learn to set up.

In configure we have

  • {UserInterfaceInternationalisation}
  • {UseLocale}
  • {Site}{Locale} which takes very mysterious values that you need to be a guru to know
  • {Site}{LocaleRegexes}
  • {Site}{CharSet} - again mysterious values that you have to know to get right
  • {Site}{Lang} - And another one doing almost the same???
  • {Site}{FullLang} - And yet another one?????

I am sorry! I do not understand what all these different settings actually do or which values I should put when.

And I am sure people downloading TWiki and installing it do not know either.

This is far too complex, and it is clear that most developers have no clue either. We do not even understand what the specified behaviour is when different values are selected.

-- TWiki:Main/KennethLavrsen - 04 Mar 2007

Me neither. We inherited most of these settings from Cairo (and probably before that?). I didn't understand them then, and I don't understand them now.

This is another case where someone with expertise in the field is required to rationalise it. While several have entered the race, no-one has stayed the course.

CC

I'm sure this can be simplified, but the complexity is partly because I18N is complex (locales are sometimes broken), partly a matter of reading the docs, and partly a matter of finding a better config UI:

  • {UserInterfaceInternationalisation} - this is a simple on-off switch, useful for debugging and to preserve performance for non-I18N sites
  • {UseLocale} - ditto, but for locale-based character set and WikiWord handling
  • {Site}{Locale} - takes industry-standard locale values - you can find these fairly easily, but better docs mapping countries to recommended locales would help.

For most people, that's it - just three settings.

There are two more for special cases:

  • {Site}{LocaleRegexes} - this is for broken locales, including those on Windows
  • {Site}{CharSet} - this is an override only, sometimes configured by mistake but not needed in 99% of cases
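To make the {Site}{LocaleRegexes} idea concrete: on a platform with a broken locale, the WikiWord character classes can be spelled out explicitly instead of trusting the locale. The Python sketch below is only an illustration; UPPER_NATIONAL, LOWER_NATIONAL and is_wikiword are hypothetical names loosely modelled on the "upper and lower national characters" idea discussed in this item, and the pattern approximates the WikiWord shape rather than reproducing TWiki's actual regex.

```python
import re

# Hypothetical explicit national letters for a Danish site.
UPPER_NATIONAL = "ÆØÅ"
LOWER_NATIONAL = "æøå"

UPPER = "A-Z" + UPPER_NATIONAL
LOWER = "a-z" + LOWER_NATIONAL
ALNUM = UPPER + LOWER + "0-9"

# Approximate WikiWord shape: capital(s), lowercase letters/digits,
# a capital, then any alphanumerics.
WIKIWORD_RE = re.compile(f"[{UPPER}]+[{LOWER}0-9]+[{UPPER}][{ALNUM}]*")
ASCII_WIKIWORD_RE = re.compile("[A-Z]+[a-z0-9]+[A-Z][A-Za-z0-9]*")

def is_wikiword(word: str, pattern: "re.Pattern[str]" = WIKIWORD_RE) -> bool:
    """True if the whole word matches the given WikiWord pattern."""
    return pattern.fullmatch(word) is not None

print(is_wikiword("ØstBy"), is_wikiword("ØstBy", ASCII_WIKIWORD_RE))  # → True False
```

The explicit classes make the result independent of whatever the platform's locale believes is a letter, which is the situation this expert setting exists for.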

And two that should not be in configure at all, as they are calculated from the {Locale} - let's remove these now:

  • {Site}{Lang}
  • {Site}{FullLang}

We could simplify this probably - just set {Site}{Locale} to 'none' (mapping to 'C' locale which is null), and eliminate {UseLocale}. However... we want to move to Unicode support, at which point locales disappear completely and there will be a new {UseUnicode} setting (you can't mix locales and Unicode without interesting bugs). And there's also a need for a {LangAlphabetic} setting to make TOCs and other features work securely for both Chinese and alphabetic languages. So a config wizard of some sort would be very helpful - just ask for country and set everything else to recommended defaults.
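A config wizard of the kind suggested here could start as little more than a lookup from country to recommended defaults. A minimal Python sketch; the table contents and the function name recommended_config are hypothetical, and any locale it proposes must still exist on the server (check with locale -a).

```python
from typing import Dict, Union

# Hypothetical country -> locale table; the names follow the common
# language_COUNTRY.codeset convention but are examples only.
RECOMMENDED_LOCALES: Dict[str, str] = {
    "US": "en_US.ISO-8859-1",
    "GB": "en_GB.ISO-8859-1",
    "DK": "da_DK.ISO-8859-1",
    "FR": "fr_FR.ISO-8859-1",
}

def recommended_config(country_code: str) -> Dict[str, Union[bool, str]]:
    """Map a two-letter country code to recommended I18N settings."""
    site_locale = RECOMMENDED_LOCALES.get(country_code.strip().upper())
    if site_locale is None:
        # Unknown country: keep locale support off instead of guessing.
        return {"{UseLocale}": False, "{Site}{Locale}": "C"}
    return {"{UseLocale}": True, "{Site}{Locale}": site_locale}

print(recommended_config("dk"))
# → {'{UseLocale}': True, '{Site}{Locale}': 'da_DK.ISO-8859-1'}
```

The user answers one question; everything else is filled in from the table and can still be overridden in configure.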

I've written some installation documentation at TWiki:TWiki.InstallationWithI18N which is linked from various places - did you not find this page, or was it not clear, or too long, or something else?

The spec for what happens was originally documented in comments in TWiki.cfg - localeRegexes is rather complex and could probably be simplified. Also the upper and lower national characters stuff is only for Perl 5.5.3, and could be dropped when we drop support for that.

It is good to see some interest in I18N - for a while I wondered if anyone was using it. That has now changed significantly, but it would be good to see it taken further, including the ever-elusive Unicode support (see TWiki:Codev.UnicodeSupport). I don't have a huge amount of time for coding, but am always happy to advise - email is a good idea if I don't comment on a topic, though I will try to track Bugs better as that's where the action is these days.

-- RichardDonkin - 05 Mar 2007

Related discussion is at Item3751, which suggests a cleanup to return the behaviour to the original, simpler model I documented above (which is not the current behaviour, so Kenneth is right!). I would still like to know whether he read the docs, though - running locale -a is not that hard.

We also need to talk about locale-gen, which, at least on Debian/Ubuntu, is needed to actually generate all but the basic .utf8 type locales. Does anyone know of other Linux distros that use this command, particularly Red Hat / SUSE? I'm an Ubuntu and Debian user.
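Whether a locale is actually usable can also be checked programmatically instead of reading the output of locale -a. This Python sketch (locale_available is a hypothetical helper) asks the C library to switch LC_CTYPE and restores the previous value afterwards; on Debian/Ubuntu a failure often means the locale has not yet been generated with locale-gen.

```python
import locale

def locale_available(name: str) -> bool:
    """Return True if the C library accepts `name` as an LC_CTYPE locale.

    setlocale() changes process-wide state, so this is not thread-safe.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # no second argument: just query
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the previous setting

for candidate in ("C", "da_DK.ISO-8859-1", "da_DK.utf8"):
    print(candidate, locale_available(candidate))
```

The output depends on which locales are installed on the machine, which is exactly what the check is for.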

-- TWiki:Main.RichardDonkin - 12 Mar 2007

Regarding the documentation: there is too much of it and it includes too much old stuff. You read something until your brain hurts, then at the end you discover it no longer applies. Docs which seem to apply and be useful tend to say "see also" and refer to other docs, so you are compelled to read through the whole nest of them. With some years of TWiki experience, it has taken me a couple of weeks to read them all and decide that I've found a bug after all.

It would be very helpful to have one self contained up to date complete admin reference topic that required no "see also" to get the entire picture, instructions, problems, etc for the current version, and which clearly identifies itself as being that kind of document.

-- TWiki:Main.SueBlake - 13 Mar 2007

Sue, you seem to be talking about the whole doc set, not just the I18N doc which is pretty short (see link from I18N page) - and this bug is about Configure not just the docs, so it's best if you file a new bug.

-- TWiki:Main.RichardDonkin - 14 Mar 2007

Richard fixed this some time ago (thanks!), closing.

-- TWiki:Main.SteffenPoulsen - 17 Sep 2007

This is not even close to being solved. The original bug report was that there were too many settings in configure that do not make sense and that MUST be possible to simplify.

  • {UserInterfaceInternationalisation}
  • {UseLocale}
  • {Site}{Locale}
  • {Site}{LocaleRegexes}
  • {Site}{CharSet}
  • {Site}{Lang}
  • {Site}{FullLang}

I have one setting that I can set to en_US.ISO-8859-1

I have one that I can set to iso-8859-15. Or should it be -1?

I have one that I can set to en

And one I can set to en-us

This is crazy and no normal person will be able to set this up. In other products you just choose your language from a pull down menu and the geek stuff is handled in the background.

It is clearly my impression that the settings of {Site}{Locale}, {Site}{CharSet}, {Site}{Lang}, and {Site}{FullLang} must match. It should be possible to replace these by ONE setting.
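The example values above suggest that the other settings can be computed from {Site}{Locale} alone, which matches the earlier point that {Site}{Lang} and {Site}{FullLang} are derivable. A minimal Python sketch, assuming locale names of the form language_TERRITORY.codeset[@modifier]; derive_from_locale is a hypothetical name, and the output formats simply mirror the values quoted in this comment.

```python
from typing import NamedTuple, Optional

class DerivedI18N(NamedTuple):
    lang: str               # what {Site}{Lang} held, e.g. "en"
    full_lang: str          # what {Site}{FullLang} held, e.g. "en-us"
    charset: Optional[str]  # e.g. "iso-8859-1"; None if the locale names no codeset

def derive_from_locale(site_locale: str) -> DerivedI18N:
    """Derive language and charset values from one {Site}{Locale} string."""
    base = site_locale.split("@", 1)[0]  # drop any @modifier
    lang_territory, _, codeset = base.partition(".")
    lang, _, territory = lang_territory.partition("_")
    lang = lang.lower()
    full_lang = f"{lang}-{territory.lower()}" if territory else lang
    charset: Optional[str] = codeset.lower() if codeset else None
    if charset == "utf8":
        charset = "utf-8"  # IANA spelling, suitable for an HTTP Content-Type header
    return DerivedI18N(lang, full_lang, charset)

print(derive_from_locale("en_US.ISO-8859-1"))
# → DerivedI18N(lang='en', full_lang='en-us', charset='iso-8859-1')
```

With this kind of derivation the charset question ("-15 or -1?") answers itself: it is whatever the chosen locale says.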

You cannot document your way out of this.

-- TWiki:Main.KennethLavrsen - 17 Sep 2007

In standard configure they are already replaced by ONE setting ({Site}{Locale}).

{Site}{Lang} and {Site}{FullLang} can probably be deprecated as mentioned above, but as far as I can tell there are special cases justifying the rest of the options *shrug*. Let's leave it open then.

-- TWiki:Main.SteffenPoulsen - 17 Sep 2007

The whole reason I opened this item was to get rid of the redundant settings.

So let us do that. Richard if you read this - can we get rid of at least {Site}{Lang} and {Site}{FullLang} now?

-- TWiki:Main.KennethLavrsen - 17 Sep 2007

I am raising this to urgent.

After spending 3 days of development because I did not understand how to define these settings, I say this has to be fixed now.

Half of these settings have to go.

I will walk through the code one setting at a time over the next few days.

{Site}{FullLang} is, as far as I can see, not used for anything other than setting $functionTags{LANG} in TWiki.pm, and that tag does not appear to be used anywhere, so it seems we can remove it. {Site}{Lang} sets $functionTags{SHORTLANG}, and I cannot see this used anywhere either. {Site}{Lang} is also used in one place where it is not really needed, and that use can be removed as well, so this setting can be eliminated too. Unless I have missed something, both are history as of 4.2.1. I will remove them from configure and TWiki.pm unless someone protests.

Now I need to look at {Site}{CharSet}. As a minimum, the TWiki code must tolerate upper vs lower case, and the dash being present or absent, in utf-8/utf8. The help text needs to lose the lame geek explanation and simply state which value to choose in which case, with some examples of the syntax.
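The tolerance asked for here amounts to comparing charset names on a normalised key rather than as exact strings. A minimal Python sketch with hypothetical helper names charset_key and is_utf8:

```python
import re

def charset_key(name: str) -> str:
    """Comparison key for charset names: case-insensitive, ignoring '-' and '_'."""
    return re.sub(r"[-_]", "", name.strip().lower())

def is_utf8(charset: str) -> bool:
    """True for any common spelling of UTF-8."""
    return charset_key(charset) == "utf8"

for spelling in ("utf-8", "UTF-8", "utf8", "UTF8"):
    assert is_utf8(spelling)
```

Python's own codecs.lookup() applies similar normalisation for encodings it knows (codecs.lookup("UTF8").name is "utf-8"), but a plain key like this keeps the comparison independent of which codecs are available.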

{Site}{Locale} and {Site}{CharSet} need better help text. The UTF8 case needs to be added to the examples in both places.

And I still wonder why we need both settings.

-- TWiki:Main.KennethLavrsen - 01 Apr 2008

I am running code with {Site}{Lang} and {Site}{FullLang}, and all code related to them, eliminated.

I need to test it for a day before I check in.

I corrected one place where utf-8 was hardcoded but there may be more places where the exact case and "-" in utf-8 must match exactly.

-- TWiki:Main.KennethLavrsen - 01 Apr 2008

Congratulations Kenneth - too many people stopped through fear of breaking things when, as you indicate, the complexity is itself a brokenness.

-- TWiki:Main.SvenDowideit - 03 Apr 2008

Thanks for the encouragement Sven.

I did my best. I think it all works. I had one unit test failure but it was also there without my code changes.

One small sacrifice: TWiki disabled the plural feature when the language was not en. But we have a configure option called {PluralToSingular} which directly controls this, so I see no reason to also rely on an expert setting. If people want automatic behaviour, it is better to derive it from {Site}{Locale} (the first two letters). But this kind of logic, like the old code relying on {Site}{Lang}, is hidden and surprising. If you enable {PluralToSingular}, you expect it to work. If it disturbs you, you disable it.
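The two positions in this item can be combined: an explicit {PluralToSingular} value always wins, and only an unset value falls back to the first two letters of {Site}{Locale}. The Python sketch below is hypothetical throughout; in particular singular_form uses simplified English rules chosen for illustration, not TWiki's actual implementation.

```python
from typing import Optional

def plural_to_singular_enabled(setting: Optional[bool], site_locale: str) -> bool:
    """Decide whether WikiWord plural-to-singular matching is active."""
    if setting is not None:
        return setting  # an explicit {PluralToSingular} always wins
    # Hypothetical fallback when unset: English locales only.
    return site_locale[:2].lower() == "en"

def singular_form(word: str) -> Optional[str]:
    """Simplified English rules, for illustration only; None if no rule applies."""
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return None

# A Danish-locale site with mostly English content can still opt in:
if plural_to_singular_enabled(True, "da_DK.ISO-8859-1"):
    print(singular_form("WebPages"))  # → WebPage
```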

I tried to improve the configure help text. Its TWiki:Codev.NerdoMeter reading can surely be reduced another step or two later, but this will do for 4.2.1.

I removed the EXPERT flag from the {Site}{CharSet} option. Assuming we get the urgent UTF8 bug fixed, it is important that this setting is set along with {Site}{Locale}.

I still left the help text saying that utf-8 is experimental, because I am sure we have many new bugs waiting once we get the first urgent utf8 bug fixed.

-- TWiki:Main.KennethLavrsen - 04 Apr 2008

Just found this bug again after a long gap....

The {Site}{Lang} and {Site}{FullLang} items were never intended to be config items originally - back when we used testenv and TWiki.cfg, they were not configurable at all, but were derived from the locale as internal TWiki variables. The idea was that I18N code could be enhanced later to support translation support etc by knowing the site's default language etc. At some point in TheGreatTWikiRefactoring this was changed so these became config options.

The {Site}{CharSet} config was always intended to be blank by default, normally derived from locale setting and used only if specified as a non-blank, as an override. At the time of refactoring, this was also broken. At least one "hard-coding" of utf-8 was also intended to work like this by taking the locale's spelling of "utf8" and generating the IANA version "utf-8" which works as an HTTP header with various browsers. If this has been removed it may well have broken this case.
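The intended {Site}{CharSet} semantics described here (blank by default, a non-blank value as an override, otherwise derived from the locale with "utf8" rewritten to the IANA name "utf-8") can be sketched in Python as follows. effective_charset is a hypothetical name, and returning None for a locale without a codeset is an assumption.

```python
from typing import Optional

def effective_charset(charset_override: str, site_locale: str) -> Optional[str]:
    """Charset for HTTP headers: a non-blank override wins, else derive from the locale."""
    override = charset_override.strip()
    if override:
        return override  # explicit {Site}{CharSet} override, used as given
    base = site_locale.split("@", 1)[0]  # drop any @modifier
    codeset = base.partition(".")[2]
    if not codeset:
        return None  # assumption: no codeset in the locale means no explicit charset
    if codeset.lower().replace("-", "") == "utf8":
        return "utf-8"  # locale spelling "utf8" -> IANA name "utf-8"
    return codeset.lower()
```

Keeping the override blank by default means the common case needs no configuration at all, which is the simplification this item asks for.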

On the plural-to-singular change, I don't agree with it. The language check was added because the TWiki plural-to-singular feature on WikiWords only works for English - your change has made the config interface more complex for non-English sites, whereas previously it just worked automatically.

Getting I18N working correctly is hard when you're trying to work across Perl versions and on platforms with broken locales (including Windows) - however, the help text needs improving so most of these things can be ignored in the normal case.

On your last point, it would be better to say in the text that utf-8 is completely unsupported - I wrote that text while in the middle of utf-8 hacking which was never completed, and it's misleading to imply that utf-8 works at all other than as documented in TWiki:TWiki.InstallationWithI18N.

The original config when I coded this in early 2003 was quite simple (translated into configure):

  • {UseLocale}
  • {Site}{Locale} - you just specify a locale - finding out a suitable locale by typing locale -a on the server is not rocket science compared to configuring TWiki

Arguably I should have just had one setting, the locale, and when this was blank just turned off the locale code.
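A sketch of that single-setting model in Python: use_locale is a hypothetical helper, and treating 'none', 'C' and 'POSIX' like a blank value follows the 'none' idea mentioned earlier in this item rather than any existing TWiki behaviour.

```python
def use_locale(site_locale: str) -> bool:
    """Single-setting model: a blank (or 'none'/'C'/'POSIX') locale disables locale code."""
    value = site_locale.strip().lower()
    return value not in ("", "none", "c", "posix")
```

With this, {UseLocale} becomes redundant: the switch is implied by whether a real locale has been chosen.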

There were two expert items added to work around platform or browser issues:

  • {Site}{LocaleRegexes} - used to work around broken locales
  • {Site}{CharSet} - overrides spelling of charset in locale as mentioned.

The extra complexity added in the refactoring is essentially bitrot - the problems are partly my fault for not remaining more involved and partly because people who didn't understand the I18N code have broken a few things. However, this is all fixable and it's good to see this code getting some attention.

-- TWiki:Main.RichardDonkin - 26 Jun 2008

With respect to the plural S feature: it actually works pretty well in French too, but not in Danish. So it is better that you can turn it on or off, because none of us has the perfect overview of which languages use a plural S and which do not that an automated feature would need. Another thing is that many TWikis are bilingual or trilingual. I can easily see a Danish TWiki set up with the Danish locale so that ÆØÅ work as letters in words with correct sorting (the Danish alphabet is ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ), while still using the plural S feature because a lot of the content is English. In a Danish company with even a little international context (most companies here employ many foreign people), you write the content in English, but you still need the site to use the Danish locale because our names of people and places are full of ÆØÅ.
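Why the Danish locale matters for sorting can be shown without any locale installed: plain code-point order puts Å before Æ and Ø, whereas the Danish alphabet ends ÆØÅ. This Python sketch uses a hypothetical danish_key built from the alphabet quoted above; real Danish collation has further rules (for example around "aa"), which a system locale via locale.strxfrm, or a collation library, would handle.

```python
from typing import List

DANISH_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"
_RANK = {letter: i for i, letter in enumerate(DANISH_ALPHABET)}

def danish_key(word: str) -> List[int]:
    """Sort key following the Danish alphabet; other characters sort after it by code point."""
    return [_RANK.get(ch, len(DANISH_ALPHABET) + ord(ch)) for ch in word.upper()]

places = ["Århus", "Ærø", "Østerbro", "Odense"]
print(sorted(places))                  # → ['Odense', 'Århus', 'Ærø', 'Østerbro']
print(sorted(places, key=danish_key))  # → ['Odense', 'Ærø', 'Østerbro', 'Århus']
```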

We live in a complex world. It would be so much easier if we all spoke Danish.

-- TWiki:Main.KennethLavrsen - 26 Jun 2008

Good point about Danish and mixed Danish-English sites - always useful to understand more about how people use TWiki I18N.

-- TWiki:Main.RichardDonkin - 28 Jun 2008

ItemTemplate
Summary Simplify I18N configuration
ReportedBy TWiki:Main.KennethLavrsen
Codebase ~twiki4
SVN Range TWiki-4.1.2, Sat, 03 Mar 2007, build 13043
AppliesTo Engine
Component Configuration, I18N
Priority Normal
CurrentState Closed
WaitingFor
Checkins TWikirev:16611 TWikirev:16612
TargetRelease patch
ReleasedIn 4.2.1
Topic revision: r21 - 2008-08-10 - GilmarSantosJr
 
This site is powered by the TWiki collaboration platform. Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.