[H-GEN] regex brian fade
mick
bareman at tpg.com.au
Fri Jun 5 21:04:58 EDT 2015
On Fri, 5 Jun 2015 10:08:59 +0100
Benjamin Fowler <ben.fowler.bjf at gmail.com> wrote:
> Hold it right there!
>
> On 5 June 2015 at 05:59, mick <bareman at tpg.com.au> wrote:
>
> > On Fri, 05 Jun 2015 14:43:20 +1000
> > Paul Gear <humbug at libertysys.com.au> wrote:
> >
> > > "[a-zA-Z0-9:]*"
> >
> > many thanks to Paul & Russell
> >
> > just what I needed
> >
>
>
> You're trying to parse XML with regexes. Not recommended, because you can
> have any combination of unmatchable whitespace in an XML document, and
> still have the 'same' XML document. If you're not careful, your regexes
> will break without warning.
>
> I would do this for a quick hack (say, to whip up some test data), but I
> sure wouldn't put this into production.
>
> Depending on what you're doing, I'd look at something like XQuery, an XSLT
> stylesheet (selecting all the interesting nodes, and then outputting them
> as text) (use xsltproc(1)), or a one-shot Python script.
>
> Cheers, Ben.
What I needed was a "quick hack" to find a list of unique keys, so I can set parameters for further processing (loading into a PostGIS database).
I'm not making any changes to the input, just doing a bit of analysis of it with this:
grep -o -e "<tag k=\"[a-zA-Z0-9:_-]*\"" ~/Documents/gis/OSM-AU/new_south_wales.osm | sort -u > ~/Documents/gis/OSM-AU/keys.txt
I still have to scan the output list to check the range of characters, but that's today's task.
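
For the character check, something like the following might do as a rough sketch. It assumes GNU grep (for -o) and that every key sits in a `<tag k="..."` attribute exactly as in the command above. The function name key_chars and the sample data are made up for illustration. It extracts with [^"]* rather than a fixed character class, so keys containing unexpected characters are not silently skipped.

```shell
#!/bin/sh
# key_chars (hypothetical helper): print every distinct character that
# appears in any <tag k="..."> key of the given file, on one line.
key_chars() {
    grep -o '<tag k="[^"]*"' "$1" |               # pull out each key attribute
        sed -e 's/^<tag k="//' -e 's/"$//' |      # strip the wrapper, keep the key
        grep -o . |                               # one character per line
        LC_ALL=C sort -u |                        # distinct characters, byte order
        tr -d '\n'                                # join them onto a single line
    echo
}

# Made-up sample standing in for the real .osm extract
sample=$(mktemp)
printf '%s\n' \
    '<node id="1"><tag k="addr:street" v="High St"/></node>' \
    '<node id="2"><tag k="name" v="Cafe"/><tag k="is_in" v="NSW"/></node>' > "$sample"
key_chars "$sample"
rm -f "$sample"
```

Anything in the output that is not a letter, digit, colon, hyphen or underscore would point to keys the narrower pattern misses.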
mick