[H-GEN] regex brain fade

Peter Hall hall.peter.john at gmail.com
Fri Jun 5 06:19:37 EDT 2015


See the classic Stack Overflow post on parsing HTML with regular
expressions for why this shouldn't be done in production:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
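
As a small illustration of the kind of breakage involved, here is a rough
Python sketch. The pattern is hypothetical (built around the character class
quoted further down the thread, not necessarily what anyone actually ran),
and the three sample strings are made up, but each is well-formed XML for
the same element. The regex only finds the first spelling; a real parser
handles all three.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern built around the character class quoted in the thread.
PATTERN = re.compile(r'k="([a-zA-Z0-9:]*)"')

# Three spellings of the same element; all are well-formed XML.
VARIANTS = [
    '<tag k="highway" v="traffic_signals"/>',
    "<tag k='highway' v='traffic_signals'/>",
    '<tag k = "highway"\n     v = "traffic_signals" />',
]


def regex_keys(text):
    """Return the k values the regex manages to find."""
    return PATTERN.findall(text)


def parser_keys(text):
    """Return the k value as seen by an XML parser."""
    return [ET.fromstring(text).get("k")]


if __name__ == "__main__":
    for xml in VARIANTS:
        print(regex_keys(xml), parser_keys(xml))
```

The regex could of course be extended to cope with single quotes and
optional whitespace, but each such patch is a rule the parser already
implements.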

To solve the problem at hand, have a look at xmlstarlet, which is
easier to use than XSLT: http://xmlstar.sourceforge.net/doc/UG/ch04.html

For the entire tag:
> xmlstarlet sel -t -m '//tag[@k]' -c . -n ~/Documents/gis/OSM-AU/new_south_wales.osm
<tag k="highway" v="traffic_signals"/>
<tag k="source" v="yahoo"/>
<tag k="highway" v="traffic_signals"/>
<tag k="highway" v="traffic_signals"/>
<tag k="highway" v="traffic_signals"/>
<tag k="crossing" v="traffic_signals"/>
<tag k="highway" v="traffic_signals"/>


Just the k attribute:
> xmlstarlet sel -t -m '//tag[@k]' -v '@k' -n ~/Documents/gis/OSM-AU/new_south_wales.osm
highway
source
highway
highway
highway
crossing
highway

Print out k=v:
> xmlstarlet sel -t -m '//tag[@k]' -v '@k' -o '=' -v '@v' -n ~/Documents/gis/OSM-AU/new_south_wales.osm
highway=traffic_signals
source=yahoo
highway=traffic_signals
highway=traffic_signals
highway=traffic_signals
crossing=traffic_signals
highway=traffic_signals
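
If you'd rather go the one-shot Python script route Ben mentions, something
like the following should give the same k=v output. It's a minimal sketch
using only the standard library; the function name iter_tags is my own, and
it assumes the file is well-formed XML whose <tag> elements are not in a
namespace.

```python
import sys
import xml.etree.ElementTree as ET


def iter_tags(source):
    """Yield (k, v) for every <tag> element that has a k attribute.

    source may be a filename or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "tag" and "k" in elem.attrib:
            # A missing v attribute becomes an empty string.
            yield elem.get("k"), elem.get("v", "")
        # Drop the element's contents once seen, to limit memory use
        # on large extracts.
        elem.clear()


if __name__ == "__main__" and len(sys.argv) > 1:
    for k, v in iter_tags(sys.argv[1]):
        print(f"{k}={v}")
```

Saved as, say, osm_tags.py (a made-up name), it would be run as
"python3 osm_tags.py new_south_wales.osm".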

Cheers,
Peter Hall

On 5 June 2015 at 19:08, Benjamin Fowler <ben.fowler.bjf at gmail.com> wrote:

> Hold it right there!
>
> On 5 June 2015 at 05:59, mick <bareman at tpg.com.au> wrote:
>
>> On Fri, 05 Jun 2015 14:43:20 +1000
>> Paul Gear <humbug at libertysys.com.au> wrote:
>>
>> >  "[a-zA-Z0-9:]*"
>>
>> many thanks to Paul & Russell
>>
>> just what I needed
>>
>
>
> You're trying to parse XML with regexes. Not recommended, because you can
> have any combination of unmatchable whitespace in an XML document, and
> still have the 'same' XML document. If you're not careful, your regexes
> will break without warning.
>
> I would do this for a quick hack (say, to whip up some test data), but I
> sure wouldn't put this into production.
>
> Depending on what you're doing, I'd look at something like XQuery, an XSLT
> stylesheet (selecting all the interesting nodes, and then outputting them
> as text) (use xsltproc(1)), or a one-shot Python script.
>
> Cheers, Ben.


-- 
Trapped in signature factory please send help