[H-GEN] Edit massive XML files

Johannes Sprigode johannes at paradise.net.nz
Mon Sep 19 00:34:08 EDT 2011


Mick,

awk is your friend.

Here is a very basic awk script:

---> cut below <---
#!/usr/bin/awk -f
# Print every line except those holding a source or postal_code <tag>.

BEGIN {
	s1 = "<tag k=\"source\"";
	s2 = "<tag k=\"postal_code\"";
}

{
	if (index($0, s1) == 0 && index($0, s2) == 0)
		print $0;
}
---> cut above <---

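Call it with

	awk -f mick.awk mick.xml > new.xml

where mick.awk is the script above, mick.xml is your input file and new.xml is any new file name you like. awk reads its input one line at a time, so even a massive file is never held in memory all at once. The result goes to standard output; redirect it to a new file, not back onto mick.xml, because the shell would truncate mick.xml before awk had read it.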
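If the tags are not always written exactly as <tag k="source" (for instance with a tab or two spaces after <tag), the same filter can be written as a single awk regular expression. This is a sketch, assuming each <tag .../> element sits on a line of its own and k is its first, double-quoted attribute; the strip_tags function name is hypothetical. Because the pattern includes the closing quote, keys that merely begin with "source" (such as source:date) are left alone.

```shell
#!/bin/sh
# Hypothetical helper: print the input with every source/postal_code
# <tag> line removed, whatever its v="..." value. Assumes one
# <tag .../> element per line; [ \t]+ tolerates tabs or extra spaces
# between "<tag" and k=. With no file arguments awk reads stdin.
strip_tags() {
	awk '!/<tag[ \t]+k="(source|postal_code)"/' "$@"
}

# Demo on the node from the question. For a real file, write to a
# temporary file and only then replace the original, e.g.
#   strip_tags mick.xml > mick.xml.tmp && mv mick.xml.tmp mick.xml
strip_tags <<'EOF'
  <node id="290344" version="7" timestamp="2011-06-01T11:50:39Z"
	uid="109205" user="Jack Stringer" changeset="8309592"
	lat="51.2237079" lon="-2.5236032">
    <tag k="name" v="Oakhill"/>
    <tag k="place" v="village"/>
    <tag k="postal_code" v="BA3 5"/>
    <tag k="source" v="npe"/>
  </node>
EOF
```

A line filter like this cannot see a <tag> split across lines or two tags sharing a line; if the file is ever laid out that way, an XML-aware tool is the safer choice.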

Cheers

Johannes

On Mon, 19 Sep 2011 15:11:12 Mick wrote:
> [ Humbug *General* list - semi-serious discussions about Humbug and     ]
> [ Unix-related topics. Posts from non-subscribed addresses will vanish. ]
> 
> On Mon, 19 Sep 2011 12:51:39 +1000
> 
> Russell Stuart <russell-humbug at stuart.id.au> wrote:
> > On Mon, 2011-09-19 at 12:16 +1000, Mick wrote:
> > > I am sure a combination of grep & sed or similar could automate this
> > > but I can't get my head around the sed part.
> > 
> > Maybe it could, but without actual samples showing the data you are
> > trying to edit and what you are trying to accomplish it is hard to
> > say.
> 
> mea culpa
> 
> small snips
> ===========
> ...
>   <node id="290343" version="3" timestamp="2011-07-01T10:12:42Z"
> 	uid="61216" user="UniEagle" changeset="8597887" lat="51.2238425"
> 	lon="-2.5264778"/>
>   <node id="290344" version="7" timestamp="2011-06-01T11:50:39Z"
> 	uid="109205" user="Jack Stringer" changeset="8309592"
> 	lat="51.2237079" 	lon="-2.5236032">
>     <tag k="name" v="Oakhill"/>
>     <tag k="place" v="village"/>
>     <tag k="postal_code" v="BA3 5"/>
>     <tag k="source" v="npe"/>
>   </node>
>   <node id="290345" version="7" timestamp="2009-10-29T15:49:02Z"
> 	uid="90943" user="chris_debian" changeset="2982115"
> 	lat="51.2222188" lon="-2.5236527"/>
> ...
> 
> should become
> =============
> ...
>   <node id="290343" version="3" timestamp="2011-07-01T10:12:42Z"
> 	uid="61216" user="UniEagle" changeset="8597887" lat="51.2238425"
> 	lon="-2.5264778"/>
>   <node id="290344" version="7" timestamp="2011-06-01T11:50:39Z"
> 	uid="109205" user="Jack Stringer" changeset="8309592"
> 	lat="51.2237079" 	lon="-2.5236032">
>     <tag k="name" v="Oakhill"/>
>     <tag k="place" v="village"/>
>   </node>
>   <node id="290345" version="7" timestamp="2009-10-29T15:49:02Z"
> 	uid="90943" user="chris_debian" changeset="2982115"
> 	lat="51.2222188" lon="-2.5236527"/>
> ...
> ==========
> 
> in this instance the tags:
>     <tag k="postal_code" v="BA3 5"/>
>     <tag k="source" v="npe"/>
> need to be deleted along with EVERY other instance whatever the v="
> value is
> 
> mick
> _______________________________________________
> General mailing list
> General at lists.humbug.org.au
> http://lists.humbug.org.au/mailman/listinfo/general


