Version
TWIKI: http://ilps-twiki.science.uva.nl/twiki/bin/view/Main/WebHome?topic=ParliamentTransformers

GIT: get your local copy for editing:
$ git clone git@bitbucket.org:ilps/pm-transformer.git

EXIST LOGS: http://monitor.politicalmashup.nl/monitor/logs

TODO: make this README a readable format, as it is included in transformer.politicalmashup.nl/index.php

PULL INTO transformer.politicalmashup.nl:
ssh mashup2.science.uva.nl
sudo su ilps_bg
/scratch/tools/git/pull.sh transformer

All info in

Remember to clone DutchParlSchema if you need it:
$ git clone git@github.science.uva.nl:politicalmashup/schema.git
(an updated clone lives on http://schema.politicalmashup.nl/)

TODO: most information below is out of date. Update the twiki page, and fill this README (for people who cannot read the twiki).

Addendum to the note below: the character is a non-breaking space (U+00A0, &nbsp; in HTML), which is obviously whitespace, but is apparently not cleaned by normalize-space().

There is a strange "invisible" character that every now and then crops up in the Swedish data (perhaps Norwegian too). normalize-space() does not remove it, but replace(.,'&#160;','') does (for HTML display, that is &nbsp;). A replace() with the literal character pasted between the quotes should too (note the whitespace-looking character; it should not be whitespace, and hexdump gets confused). No clue why, but this "non-disappearing" space might be good to remove everywhere.
check: http://parliament.politicalmashup.nl/monitor/logs?pipe=se&result=&action=&top=10 with set-xml-base.xsl on se/live/se-ge-2003.xsl

(re)move: folder nl/ with example documents
(re)move: earliest two se periods (1970-1990), and clean up any old se stuff
clean up: scripts (transform/validate etc.); move to subfolder script, or delete altogether
move: nl, new and old (!), to its own folder
 - nl-draft and nl-officielebekendmakingen-ge-2011.xsl are to use the new structure (common/live/ includes)
 - nl-officielebekendmakingen.xsl (and its Preprocess?) are older and more or less standalone, and will probably remain as such for now.
 - nl-sgd.* is trickier: it uses several pre/post processing steps and the old All2Pol.. includes. Long-term, this should also be rewritten to the new format.
 - N.B. it is not immediately important, but desirable, to also convert the older nl-* xslt to the new format. This makes them easier to maintain for when we want to re-run all data and include new structure (new meta-fields perhaps?).
(remove: make sure (after nl) they are not needed anymore. Then delete all All2Pol... in the root, and remove CountrySpecificTemplate.xsl)

about nl order: http://www.overheid.nl/help/oep/handelingen/nummering
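
The non-breaking space note above can be reproduced outside XSLT. The following is a minimal Python sketch, not code from this repository: xpath_normalize_space() only emulates what normalize-space() does (it touches space, tab, CR and LF, nothing else), and the sample string is made up. Note one deliberate difference from the replace(...,'') in the note: the sketch turns the non-breaking space into a plain space rather than deleting it, so that adjacent words are not glued together. That choice is an assumption about what is wanted.

```python
import re

NBSP = "\u00a0"  # non-breaking space: &#160; in XML, &nbsp; in HTML

# The only characters XPath normalize-space() treats as whitespace.
_XML_WS = " \t\r\n"


def xpath_normalize_space(text: str) -> str:
    """Emulate XPath normalize-space(): collapse and trim XML whitespace only."""
    return re.sub(f"[{_XML_WS}]+", " ", text).strip(_XML_WS)


def clean_text(text: str) -> str:
    """Assumed policy: map U+00A0 to a plain space first, then normalize."""
    return xpath_normalize_space(text.replace(NBSP, " "))


# Made-up sample with leading spaces, an inner NBSP and a trailing NBSP.
sample = f"  Herr{NBSP}talman! {NBSP} "

normalized = xpath_normalize_space(sample)  # NBSP survives this step
cleaned = clean_text(sample)                # NBSP is gone after this step
```

Comparing repr(normalized) with repr(cleaned) makes the otherwise invisible character show up as \xa0, which is easier to read than hexdump output.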
(and TekstLuft2)
Looking at the stylesheet, TekstIndryk has an indent and TekstLuft some margin on top. It looks like Tekst is always the start, TekstIndryk starts a new 'alinea' (in the Dutch sense, not the American one), and TekstLuft starts a new 'paragraaf' (TekstLuft2 has a double margin: a "really different" 'paragraaf').
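
The reading above could be turned into a grouping rule along these lines. This is only a sketch of that interpretation, in Python rather than XSLT: the Paragraaf class, group_blocks() and the (kind, text) input format are all hypothetical, and treating a plain Tekst as the opener of a new 'paragraaf' is an assumption that goes slightly beyond what the stylesheet shows.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Labels as they occur in the source; the grouping is an interpretation,
# not a specification.
STARTS_PARAGRAAF = {"Tekst", "TekstLuft", "TekstLuft2"}
STARTS_ALINEA = {"TekstIndryk"}


@dataclass
class Paragraaf:
    strong_break: bool = False  # True when opened by TekstLuft2 (double margin)
    alineas: list[str] = field(default_factory=list)


def group_blocks(blocks: list[tuple[str, str]]) -> list[Paragraaf]:
    """Group (kind, text) pairs into 'paragraaf' units holding 'alinea' texts."""
    result: list[Paragraaf] = []
    for kind, text in blocks:
        if kind not in STARTS_PARAGRAAF | STARTS_ALINEA:
            raise ValueError(f"unknown block kind: {kind}")
        # Open a new paragraaf on Tekst/TekstLuft/TekstLuft2, or when the
        # input unexpectedly starts with a TekstIndryk.
        if kind in STARTS_PARAGRAAF or not result:
            result.append(Paragraaf(strong_break=(kind == "TekstLuft2")))
        result[-1].alineas.append(text)
    return result


# Made-up input: five blocks in document order.
grouped = group_blocks([
    ("Tekst", "A"),
    ("TekstIndryk", "B"),
    ("TekstLuft", "C"),
    ("TekstLuft2", "D"),
    ("TekstIndryk", "E"),
])
```

With this input the sketch yields three 'paragraaf' units; only the last one is flagged as a strong break, because it was opened by TekstLuft2.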