Caught the RSS feed failing: unescaped special characters in the feed's XML output break the DTD.
Note the raw <code> and <pre> tags inside the title and description elements in this snippet:
<item rdf:about="http://develop.twiki.org/~develop/cgi-bin/view/Bugs/Item1978">
<title>Item1978 - Form.pm fails when the <code>name</code> field is <pre>[[Topic][fieldname]]</pre> for controls -- Waiting for Release</title>
<link>http://develop.twiki.org/~develop/cgi-bin/view/Bugs/Item1978?t=2006-03-29T18:37:06Z</link>
<description>Form.pm fails when the <code>name</code> field is <pre>[[Topic][fieldname]]</pre> for controls State: Waiting for Release -- last changed by KennethLavrsen</description>
<dc:date>2006-03-29T18:37:06Z</dc:date>
<dc:contributor>
<rdf:Description link="http://develop.twiki.org/~develop/cgi-bin/view?topic=Main.KennethLavrsen">
<rdf:value>KennethLavrsen</rdf:value>
</rdf:Description>
</dc:contributor>
</item>
One way to solve this would be to alter Render.pm:
Index: lib/TWiki/Render.pm
===================================================================
--- lib/TWiki/Render.pm (revision 9600)
+++ lib/TWiki/Render.pm (working copy)
@@ -1263,6 +1263,16 @@
defined( $TWiki::cfg{Site}{CharSet} ) &&
$TWiki::cfg{Site}{CharSet} =~ /^iso-?8859-?1$/i ) {
$text =~ s/([\x7f-\xff])/"\&\#" . unpack( 'C', $1 ) .';'/ge;
+
+ # if there is an & that is not part of an entity, convert it
+ # to &amp;
+ $text =~ s/&(?!#?[a-zA-Z0-9]+;)/&amp;/g;
+
+ # do the rest of the standard escapes for XML: <, >, ', "
+ $text =~ s/</&lt;/g;
+ $text =~ s/>/&gt;/g;
+ $text =~ s/"/&quot;/g;
+ $text =~ s/'/&apos;/g;
}
return $text;
This would solve it for RSS feeds only. Would it be more sensible to add it somewhere more generic, e.g. for SEARCH results?
--
SP
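To try the escaping from the Render.pm patch in isolation, here is a minimal standalone sketch. The sub name xml_escape is hypothetical (it is not a TWiki function); the body just repeats the patch's substitutions, in the same order, so that & is handled before the other characters and existing entity/character references are left alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the substitutions added to Render.pm above.
sub xml_escape {
    my ($text) = @_;
    # Escape & unless it already starts an entity or character reference
    # such as &#233; or &amp; (same lookahead as the patch).
    $text =~ s/&(?!#?[a-zA-Z0-9]+;)/&amp;/g;
    # The remaining predefined XML entities.
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/'/&apos;/g;
    return $text;
}

my $summary = 'Form.pm fails when the <code>name</code> field is <pre>[[Topic][fieldname]]</pre> for controls';
print xml_escape($summary), "\n";
# prints: Form.pm fails when the &lt;code&gt;name&lt;/code&gt; field is &lt;pre&gt;[[Topic][fieldname]]&lt;/pre&gt; for controls
```

Because of the lookahead, named HTML entities such as &eacute; also pass through untouched, which only helps if the XML parser knows them.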
Item1924 is related to this.
--
SP
Perhaps add a new parameter to %SEARCH%, e.g. escapexmlentities, that escapes the 5 predefined XML entities, as above (and as documented at
http://www.xml.com/pub/a/98/08/xmlqna1.html#INTENT)?
| Entity Name | Replacement Text |
| lt | The less than sign (<) |
| gt | The greater than sign (>) |
| amp | The ampersand (&) |
| apos | The single quote or apostrophe (') |
| quot | The double quote (") |
--
SP
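A minimal sketch of what the proposed (hypothetical) escapexmlentities option could do, driven directly by the table of the 5 predefined entities. The names %xml_entity and escape_xml_entities are illustrative only. Doing a single substitution pass means an & produced by one replacement is never escaped again, so the ordering concern from the Render.pm patch disappears.

```perl
use strict;
use warnings;

# The five predefined XML entities, keyed by the character each replaces.
my %xml_entity = (
    '<' => 'lt',
    '>' => 'gt',
    '&' => 'amp',
    "'" => 'apos',
    '"' => 'quot',
);

# Hypothetical escapexmlentities behaviour: replace each special
# character with its entity in one pass over the text.
sub escape_xml_entities {
    my ($text) = @_;
    $text =~ s/([<>&'"])/&$xml_entity{$1};/g;
    return $text;
}

print escape_xml_entities(q{Tom & Jerry's "<show>"}), "\n";
# prints: Tom &amp; Jerry&apos;s &quot;&lt;show&gt;&quot;
```

Unlike the Render.pm patch, this escapes every &, so text that already contains references like &#233; would be double-escaped; which behaviour the parameter should have is a design choice.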
Closed duplicate Item3612.
These 5 entities are the only ones an XML parser is required to know (they are predefined). The remaining problem is all the other HTML entities: one approach (probably the best?) is to spit out a list of HTML entity declarations (defining &eacute; and the rest...) at the top of the RSS feed, in the DOCTYPE's internal subset. See
http://www.xml.com/pub/a/98/08/xmlqna1.html#INTENT and
http://www.w3.org/TR/REC-xml/#sec-entity-decl
--
TWiki:Main.WillNorris - 20 May 2007
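A rough sketch of the entity-declaration approach, assuming the feed's root element is rdf:RDF (as in RSS 1.0). The three-entry %html_entities map and the entity_doctype sub are hypothetical; a real feed would need the full list of HTML entity names. Each entity is declared as a numeric character reference, and the resulting DOCTYPE would go after the <?xml ...?> declaration and before the root element.

```perl
use strict;
use warnings;

# Hypothetical subset of HTML entity names mapped to Unicode code points.
my %html_entities = (
    eacute => 233,
    nbsp   => 160,
    copy   => 169,
);

# Build a DOCTYPE whose internal subset declares each entity, so an
# XML parser can resolve references like &eacute; in the feed body.
sub entity_doctype {
    my ($root, %entities) = @_;
    my $decls = '';
    for my $name (sort keys %entities) {
        $decls .= "  <!ENTITY $name \"&#$entities{$name};\">\n";
    }
    return "<!DOCTYPE $root [\n$decls]>\n";
}

print entity_doctype('rdf:RDF', %html_entities);
# prints:
# <!DOCTYPE rdf:RDF [
#   <!ENTITY copy "&#169;">
#   <!ENTITY eacute "&#233;">
#   <!ENTITY nbsp "&#160;">
# ]>
```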
It would also be good to add <![CDATA[ ... ]]> sections when XML is generated, or at least to use them in an example. Currently the < in the <![CDATA[ tag is converted to &lt;.
--
TWiki:Main.ArthurClemens - 21 May 2007
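A minimal sketch of wrapping generated text in a CDATA section (the sub name cdata is hypothetical). The one subtlety is that the sequence ]]> cannot appear inside CDATA, so any occurrence has to be split across two adjacent CDATA sections; the parser concatenates them back into the original text.

```perl
use strict;
use warnings;

# Wrap text in a CDATA section, splitting any "]]>" so it cannot
# terminate the section early.
sub cdata {
    my ($text) = @_;
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return "<![CDATA[$text]]>";
}

print cdata('<code>secret</code> parameter not working'), "\n";
# prints: <![CDATA[<code>secret</code> parameter not working]]>
```

This only helps if the CDATA markup reaches the feed unescaped and the reader understands CDATA.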
Is this still a problem? AFAICT all the relevant entities are escaped correctly. Can anyone reproduce a problem? I can't.
CC
The relevant part of WebRss currently looks like this:
%SEARCH{"%URLPARAM{"search" default=".*" }%" web="%WEB%" excludetopic="WebStatistics" regex="on"
nosearch="on" order="modified" reverse="on" nototal="on" limit="16" format="<item rdf:about=\"%SCRIPTURL{"view"}%/$web/$topic\">$n
<title><noautolink>$topic - $formfield(Summary) -- $formfield(CurrentState)</noautolink></title>$n
<link>%SCRIPTURL{"view"}%/$web/$topic?t=$isodate</link>$n <description><noautolink>$formfield(Summary)
State: $formfield(CurrentState) -- last changed by <nop>$wikiname</noautolink></description>$n
<dc:date>$isodate</dc:date>$n <dc:contributor>$n <rdf:Description link=\"%SCRIPTURL{"view"}%?topic=$wikiusername\">$n
<rdf:value>$username</rdf:value>$n </rdf:Description>$n </dc:contributor>$n</item>"}%
The problem is how to XML-encode the $formfield(Summary) parts, not the search summary itself. The <![CDATA[ suggestion above is one way to go, but many RSS readers don't recognize this construction and fail anyway. In my experience the same is still true for other (non-strict) XML parsers in general.
Currently authorization is required to read the Bugs feed, but normally the feed can be validated directly at feedvalidator.org.
--
TWiki:Main.SteffenPoulsen - 14 Jun 2007
Before we get stuck in a deadlock, could we use CDATA as a partial solution first?
--
TWiki:Main.ArthurClemens - 14 Jun 2007
I have added <![CDATA[ ... ]]> sections to the title and description in WebRss to demonstrate that this is not a valid solution; the output becomes:
<title><![CDATA[Item4298 - <code>secret</code> parameter not working -- Closed]]></title>
<link>http://develop.twiki.org/~twiki4/cgi-bin/view/Bugs/Item4298?t=2007-06-24T11:16:55Z</link>
<description><![CDATA[=secret= parameter not working State: Closed -- last changed by CrawfordCurrie]]></description>
with the < of each <![CDATA[ translated to its HTML entity &lt; (as Arthur also states above).
Is it correct, per spec, that the original search string is translated to HTML entities, but the output from $formfield is not?
-- TWiki:Main.SteffenPoulsen - 24 Jun 2007
Yes, I think that is correct. $formfield ought to come out exactly as found in meta.
-- TWiki:Main.CrawfordCurrie - 02 Jul 2007
In the above example, the < to &lt; translation is ruining the CDATA markup. Any ideas on how to escape it?
-- TWiki:Main.SteffenPoulsen - 18 Dec 2007
Figured out this was an error in Render.pm, which mistakenly took CDATA sections for lone < and > characters.
The Bugs RSS feed with CDATA markup now validates, closing this.
-- TWiki:Main.SteffenPoulsen - 18 Dec 2007
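The actual Render.pm change isn't shown in this item. As a hypothetical sketch of the idea only (not the real fix), an escaping pass can leave CDATA sections alone by splitting them out first and escaping only the text between them. The sub name escape_outside_cdata is illustrative, and as a simplification it escapes every < and > outside CDATA rather than only "lone" ones.

```perl
use strict;
use warnings;

# Escape < and > outside CDATA sections; CDATA content is kept verbatim.
sub escape_outside_cdata {
    my ($text) = @_;
    # The capturing group makes split keep the CDATA sections as elements.
    my @parts = split /(<!\[CDATA\[.*?\]\]>)/s, $text;
    for my $part (@parts) {
        next if $part =~ /^<!\[CDATA\[/;    # leave CDATA untouched
        $part =~ s/</&lt;/g;
        $part =~ s/>/&gt;/g;
    }
    return join '', @parts;
}

print escape_outside_cdata('1 < 2 <![CDATA[<code>secret</code>]]> 3 > 2'), "\n";
# prints: 1 &lt; 2 <![CDATA[<code>secret</code>]]> 3 &gt; 2
```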