The core strips characters that aren't [A-Za-z0-9_], which means fields Dón and Dán can't be disambiguated from each other. There are many comments already in the code pointing out that this is a BadThing. It also forces other plugins in the system to know about this procedure (see Item2167).
# Chop out all except A-Za-z0-9_.
# I'm sure there must have been a good reason for this once.
sub _cleanField {
    my( $text ) = @_;
    $text = '' if( ! $text );
    # TODO: make this dependent on a 'character set includes non-alpha'
    # setting in TWiki.cfg - and do same in Render.pm re 8859 test.
    # I18N: don't get rid of non-ASCII characters
    # TW: this is applied to the key in the field; it is not obvious
    # why we need I18N in the key (albeit there could be collisions due
    # to the filtering... but all the current topics are keyed on _cleanField
    $text =~ s/<nop>//go; # support <nop> character in title
    $text =~ s/[^A-Za-z0-9_\.]//go;
    return $text;
}
-- WN
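The collision described above can be reproduced with a short standalone script. This is only a sketch: `_cleanField` is copied from the excerpt (note that its regex also keeps `.`), while `find_collisions` is a hypothetical helper written for this illustration and is not part of the TWiki core.

```perl
use strict;
use warnings;
use utf8;    # the field names below are written as UTF-8 literals

# Copy of the filter quoted above, behaviour unchanged.
sub _cleanField {
    my ($text) = @_;
    $text = '' if !$text;
    $text =~ s/<nop>//g;              # support <nop> in titles
    $text =~ s/[^A-Za-z0-9_\.]//g;    # drops every non-ASCII character
    return $text;
}

# Hypothetical helper: group field names by their cleaned key and
# return only the keys that more than one name maps to.
sub find_collisions {
    my @names = @_;
    my %by_key;
    push @{ $by_key{ _cleanField($_) } }, $_ for @names;
    return {
        map  { $_ => $by_key{$_} }
        grep { @{ $by_key{$_} } > 1 } keys %by_key
    };
}

binmode STDOUT, ':encoding(UTF-8)';
my $collisions = find_collisions( 'Dón', 'Dán', 'Name' );
for my $key ( sort keys %$collisions ) {
    print "$key <= @{ $collisions->{$key} }\n";
}
```

Both Dón and Dán reduce to the key Dn, so whichever field is stored second can no longer be told apart from the first.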
Do you have an alternative implementation in mind?
CC
Is there any (really) good reason for removing non-ASCII characters from field names?
Just removing this filter wouldn't break applications that don't use international characters in field names...
-- AT
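One possible shape for a less aggressive filter is sketched below. It is an assumption-laden illustration, not the change that went into SVN: the name `_cleanFieldI18N` is hypothetical, and the sketch assumes field names arrive as decoded Perl character strings, which may not match how the core handles character sets.

```perl
use strict;
use warnings;

# Hypothetical variant of _cleanField: keeps letters and digits from
# any script, plus '_' and '.', instead of ASCII only.
# Assumption: $text is a decoded character string, not raw bytes.
sub _cleanFieldI18N {
    my ($text) = @_;
    $text = '' if !$text;
    $text =~ s/<nop>//g;                # support <nop> in titles
    $text =~ s/[^\p{L}\p{N}_.]//g;      # \p{L} = letters, \p{N} = numbers
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print _cleanFieldI18N("D\x{f3}n"), "\n";    # Dón, the ó is kept
print _cleanFieldI18N("D\x{e1}n"), "\n";    # Dán, the á is kept
```

For names that contain only ASCII characters the result is the same as with the existing filter, which matches the point that applications without international field names would not be affected. Topics already saved with keys produced by the old filter would still need consideration.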
Some info on this bug at TWiki:Support.InternationalCharactersInFormFields and the corresponding Codev page. Thought it was fixed in SVN, probably mainline though...
Other previously logged I18N bugs are generally at TWiki:Codev.InternationalisationIssues in case anyone wants to go bug-fixing.
-- RD
This is fixed in SVN.
-- SP