[H-GEN] regex brain fade

Fri Jun 5 21:04:58 EDT 2015

On Fri, 5 Jun 2015 10:08:59 +0100
Benjamin Fowler <ben.fowler.bjf at gmail.com> wrote:

> Hold it right there!
> 
> On 5 June 2015 at 05:59, mick <bareman at tpg.com.au> wrote:
> 
> > On Fri, 05 Jun 2015 14:43:20 +1000
> > Paul Gear <humbug at libertysys.com.au> wrote:
> >
> > >  "[a-zA-Z0-9:]*"
> >
> > many thanks to Paul & Russell
> >
> > just what I needed
> >
> 
> 
> You're trying to parse XML with regexes. Not recommended, because you can
> have any combination of unmatchable whitespace in an XML document, and
> still have the 'same' XML document. If you're not careful, your regexes
> will break without warning.
> 
> I would do this for a quick hack (say, to whip up some test data), but I
> sure wouldn't put this into production.
> 
> Depending on what you're doing, I'd look at something like XQuery, an XSLT
> stylesheet (selecting all the interesting nodes, and then outputting them
> as text) (use xsltproc(1)), or a one-shot Python script.
> 
> Cheers, Ben.

What I needed was a "quick hack" to find the list of unique keys, so that I can set parameters for further processing (loading into a PostGIS database).

I'm not making any changes to the input, just doing a bit of analysis of it with this:

grep -o -e "<tag k=\"[a-zA-Z0-9:_-]*\"" ~/Documents/gis/OSM-AU/new_south_wales.osm | sort -u > ~/Documents/gis/OSM-AU/keys.txt

I still have to scan the output list to check the range of characters, but that's today's task.
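
For the character check, a one-shot Python script along the lines Ben suggested might look roughly like the sketch below. It is only a sketch: it assumes the keys live in the k attribute of <tag> elements, the function name collect_tag_keys is made up for illustration, and the sample XML is invented rather than taken from the real extract. Because it uses a real XML parser, odd whitespace or single-quoted attributes should not trip it up the way they can with a regex.

```python
import io
import xml.etree.ElementTree as ET


def collect_tag_keys(source):
    """Return (unique keys, characters used in them) for all <tag k="..."> elements.

    `source` is a filename or a binary file object, as accepted by iterparse.
    The name collect_tag_keys is hypothetical, chosen for this sketch.
    """
    keys = set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "tag":
            key = elem.get("k")
            if key is not None:
                keys.add(key)
        # Drop the element's contents once it has been seen, to limit memory growth.
        elem.clear()
    chars = set("".join(keys))
    return keys, chars


# Invented sample data: note the line break inside one tag and the single quotes in another.
SAMPLE = b"""<osm>
  <node id="1"><tag k="addr:street" v="High St"/><tag
      k="name"  v="x"/></node>
  <way id="2"><tag k='is_in:state-code' v="NSW"/><tag k="name" v="y"/></way>
</osm>"""

keys, chars = collect_tag_keys(io.BytesIO(SAMPLE))
print(sorted(keys))            # the unique keys
print("".join(sorted(chars)))  # every character that appears in any key
```

To run it against the real file, pass the path to the .osm file instead of the BytesIO object. The second line of output shows at a glance whether any key uses a character outside the class in the grep pattern.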

mick