[H-GEN] Edit massive XML files
Russell Stuart
russell-humbug at stuart.id.au
Sun Sep 18 23:55:30 EDT 2011
On Mon, 2011-09-19 at 13:11 +1000, Mick wrote:
> small snips
> ===========
> ...
> <node id="290343" version="3" timestamp="2011-07-01T10:12:42Z"
> uid="61216" user="UniEagle" changeset="8597887" lat="51.2238425"
> lon="-2.5264778"/>
> <node id="290344" version="7" timestamp="2011-06-01T11:50:39Z"
> uid="109205" user="Jack Stringer" changeset="8309592"
> lat="51.2237079" lon="-2.5236032">
> <tag k="name" v="Oakhill"/>
> <tag k="place" v="village"/>
> <tag k="postal_code" v="BA3 5"/>
> <tag k="source" v="npe"/>
> </node>
> <node id="290345" version="7" timestamp="2009-10-29T15:49:02Z"
> uid="90943" user="chris_debian" changeset="2982115"
> lat="51.2222188" lon="-2.5236527"/>
> ...
>
> should become
> =============
> ...
> <node id="290343" version="3" timestamp="2011-07-01T10:12:42Z"
> uid="61216" user="UniEagle" changeset="8597887" lat="51.2238425"
> lon="-2.5264778"/>
> <node id="290344" version="7" timestamp="2011-06-01T11:50:39Z"
> uid="109205" user="Jack Stringer" changeset="8309592"
> lat="51.2237079" lon="-2.5236032">
> <tag k="name" v="Oakhill"/>
> <tag k="place" v="village"/>
> </node>
> <node id="290345" version="7" timestamp="2009-10-29T15:49:02Z"
> uid="90943" user="chris_debian" changeset="2982115"
> lat="51.2222188" lon="-2.5236527"/>
> ...
> ==========
>
> in this instance the tags:
> <tag k="postal_code" v="BA3 5"/>
> <tag k="source" v="npe"/>
> need to be deleted along with EVERY other instance whatever the v="
> value is
I am not sure what the word "instance" in "EVERY other instance whatever
the v= value is" refers to. Is it the node the tag was found in, every
tag with the same k in the file, or just the tags with the same k under
the node the original was found in?
If it is one of those, it becomes difficult in the general case, as you
may have to delete lines before the one you match on. In that case you
are up for writing a program, be it in sed, awk or whatever, but the
response you got from Peter is probably the best advice: the simplest
way is to use a streaming XML parser in the language of your choice.
For Python, that would be xml.sax or xml.parsers.expat.
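As a rough sketch of the xml.sax approach, assuming you mean "every tag
with the same k anywhere in the file": the filter below copies the
document through unchanged except for <tag> elements whose k is in a
drop list. The names DROP_KEYS, DropTags and strip_tags are made up for
this example, and the two keys are just the ones from your snippet.

```python
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

# Assumption: these are the keys to delete, taken from the example above.
DROP_KEYS = {"postal_code", "source"}


class DropTags(XMLFilterBase):
    """Forward all SAX events except <tag> elements whose k is in DROP_KEYS."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._skipping = False  # True while inside a <tag> being dropped

    def startElement(self, name, attrs):
        if name == "tag" and attrs.get("k") in DROP_KEYS:
            self._skipping = True
            return
        super().startElement(name, attrs)

    def endElement(self, name):
        if self._skipping and name == "tag":
            self._skipping = False
            return
        super().endElement(name)

    def characters(self, content):
        # Suppress any text that sits inside a dropped element.
        if not self._skipping:
            super().characters(content)


def strip_tags(source, out):
    """Stream `source` (file name or binary file object) to text stream `out`."""
    reader = DropTags(xml.sax.make_parser())
    reader.setContentHandler(
        XMLGenerator(out, encoding="utf-8", short_empty_elements=True)
    )
    reader.parse(source)
```

You would call it as strip_tags("in.osm", open("out.osm", "w",
encoding="utf-8")), with the file names being placeholders. Because the
parser streams, memory use should not grow with the size of the file.
The whitespace around each removed element is left in place, so expect
blank lines where the tags used to be. If you meant one of the other
interpretations (for example dropping the whole node), the filter would
have to buffer each node's events before deciding whether to emit them.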