[H-GEN] Extracting info from XML open street map file

mick bareman at tpg.com.au
Tue Jan 26 21:27:19 EST 2016


On Tue, 26 Jan 2016 18:49:49 +1100
Edwin Groothuis <edwin at mavetju.org> wrote:

> [ Humbug *General* list - semi-serious discussions about Humbug and     ]
> [ Unix-related topics. Posts from non-subscribed addresses will vanish. ]
> 
> On 26/01/2016 12:10 pm, mick wrote:
> > 
> > I've been struggling on and off for a few years to extract usable subsets from OpenStreetMap files, with very limited success. osm2pgsql produces the best results, but it depends on knowing all the keys in the input file.
> > 
> > I have run the input through grep and sort -u to produce a list of unique key/value tags (2.1 million of them). The next step is to process that list into an osm2pgsql style file, but I can't think how to automate that process.
> > 
> > Generic key/value lines:
> > ========================
> > 		<tag k="107" v="96"/>
> > 		<tag k="1744_field_ref" v="143"/>
> > 		<tag k="1744_field_ref" v="94"/>
> > 		<tag k="1860name" v="Aberargie Mill (Corn &amp; Flour)"/>
> > 
> > osm2pgsql style:
> > ================
> > node,way   building     text         polygon
> > node       capital      text         delete	#linear
> > node,way   construction text         delete	#linear
> > 
> > I've spent the last 3 days doing it manually but only got about 10% into it and am at the stage where I keep making mistakes.
> > 
> > Can someone please give me a few pointers on the nifty *nix utilities that can work some magic on this process?
> 
> Write a Perl/Python/PHP/whatever script which reads the XML file and
> then deals with the data. Don't go the way you are currently going.
> 
> Edwin

I had come to that conclusion myself: doing it by hand just doesn't work with the amount of data involved. I am seeking guidance on how to do it. I was thinking along the lines of simple Unix utilities like grep, awk, etc., but I haven't used them in so long that I can't remember what to use.
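
For illustration, the script approach suggested above might look roughly like the Python sketch below. It is only a starting point under several assumptions: the function names collect_keys and style_lines are made up for this example, every key is given the data type "text", and the "linear" flag is a placeholder that would still need reviewing by hand for each key. It streams the XML rather than loading it all at once, records which element types (node, way) each key appears on, and prints one line per key in the four-column layout shown earlier.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny stand-in for a real .osm file (assumption: same element layout).
SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="0.0" lon="0.0">
    <tag k="building" v="yes"/>
    <tag k="capital" v="yes"/>
  </node>
  <way id="2">
    <nd ref="1"/>
    <tag k="building" v="house"/>
    <tag k="construction" v="yes"/>
  </way>
</osm>"""


def collect_keys(source):
    """Stream OSM XML and map each tag key to the element types that use it."""
    key_types = defaultdict(set)
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag in ("node", "way", "relation"):
            if elem.tag != "relation":  # only node and way keys are recorded here
                for tag in elem.findall("tag"):
                    key_types[tag.get("k")].add(elem.tag)
            root.clear()  # drop finished elements so memory use stays bounded
    return key_types


def style_lines(key_types, flag="linear"):
    """Format one style line per key; the flag is a placeholder to review."""
    lines = []
    for key in sorted(key_types):
        osm_types = ",".join(sorted(key_types[key]))
        lines.append(f"{osm_types:<10} {key:<24} text         {flag}")
    return lines


if __name__ == "__main__":
    # Replace io.BytesIO(SAMPLE) with a file name to process a real extract.
    for line in style_lines(collect_keys(io.BytesIO(SAMPLE))):
        print(line)
```

Because the keys are collected straight from the parsed elements, there is no need for the intermediate grep and sort -u list; the values are ignored entirely, which should shrink the output from millions of key/value pairs to one line per distinct key.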

