[H-GEN] Extracting info from XML open street map file
mick
bareman at tpg.com.au
Tue Jan 26 21:27:19 EST 2016
On Tue, 26 Jan 2016 18:49:49 +1100
Edwin Groothuis <edwin at mavetju.org> wrote:
> [ Humbug *General* list - semi-serious discussions about Humbug and ]
> [ Unix-related topics. Posts from non-subscribed addresses will vanish. ]
>
> On 26/01/2016 12:10 pm, mick wrote:
> >
> > I've been struggling for a few years, on and off, to extract usable subsets from OpenStreetMap files, with very limited success. osm2pgsql produces the best results but depends on knowing all the keys in the input file.
> >
> > I have grep'd the input and run it through sort -u to produce a list of unique key/value tags (2.1 million of them). The next step is to process that list into an osm2pgsql style file, but I can't think how to automate that process.
> >
> > Generic key/value lines:
> > ========================
> > <tag k="107" v="96"/>
> > <tag k="1744_field_ref" v="143"/>
> > <tag k="1744_field_ref" v="94"/>
> > <tag k="1860name" v="Aberargie Mill (Corn & Flour)"/>
> >
> > osm2pgsql style:
> > ================
> > node,way building text polygon
> > node capital text delete #linear
> > node,way construction text delete #linear
> >
> > I've spent the last 3 days doing it manually but have only got about 10% of the way through, and I'm at the stage where I keep making mistakes.
> >
> > Can someone please give me a few pointers on which nifty *nix utilities could work some magic on this process?
>
> Write a Perl/Python/PHP/whatever script which reads the XML file and
> then deals with the data. Don't go the way you are currently going.
>
> Edwin
I had come to that conclusion myself; doing it by hand just doesn't work with the amount of data involved. What I am seeking is guidance on how to do it. I was thinking along the lines of simple Unix utilities like grep, awk, etc., but I haven't used them in so long that I couldn't remember what to use.
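Along the lines Edwin suggested, here is a minimal sketch of such a script, assuming Python 3 and only its standard library. It streams the .osm file with `xml.etree.ElementTree.iterparse` rather than loading it whole, and records each tag key together with the object types it appears on. Because an osm2pgsql style file lists keys rather than key/value pairs, this is usually a much shorter list than the 2.1 million unique key/value lines. The names `collect_keys` and `style_lines`, the treatment of relations as `way`, and the default `text linear` columns are illustrative assumptions, not part of osm2pgsql; the output is only a draft to hand-edit (e.g. to `polygon` or `delete`).

```python
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption for this sketch: osm2pgsql style lines only name "node" and
# "way", so keys found on relations are reported under "way".
STYLE_TYPE = {"node": "node", "way": "way", "relation": "way"}


def collect_keys(source):
    """Stream OSM XML (a path or binary file object) and map each tag key
    to the set of style object types it was seen on."""
    keys = defaultdict(set)
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <osm> root
    for event, elem in context:
        if event != "end" or elem.tag not in STYLE_TYPE:
            continue
        for tag in elem.iter("tag"):
            keys[tag.get("k")].add(STYLE_TYPE[elem.tag])
        root.clear()  # discard finished objects so memory stays bounded
    return keys


def style_lines(keys):
    """Yield one draft style line per key. "text linear" is a hypothetical
    default that still has to be reviewed (polygon, delete, ...) by hand."""
    for key in sorted(keys):
        types = ",".join(t for t in ("node", "way") if t in keys[key])
        yield f"{types:<9} {key:<30} text linear"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python osm_keys.py input.osm > draft.style
    for line in style_lines(collect_keys(sys.argv[1])):
        print(line)
```

Run as `python osm_keys.py input.osm > draft.style` (file names hypothetical). Clearing the root after each object keeps memory roughly constant apart from the key table itself. Keys containing whitespace, if any turn up, would need attention by hand, since the style file columns are whitespace-separated.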