[H-GEN] Text Search and Replace @ shell prompt

Jason Parker-Burlingham jasonp at uq.net.au
Mon Aug 26 14:13:46 EDT 2002


[ Humbug *General* list - semi-serious discussions about Humbug and     ]
[ Unix-related topics. Posts from non-subscribed addresses will vanish. ]

Greg Black <gjb at gbch.net> writes:

> Jason Parker-Burlingham wrote:
> | (Basically I think one can't beat the "don't use regexes on HTML" drum
> | too much, especially since there are much better tools;
> I'm not really convinced by this.  I think you can do more with
> RE's (especially where you can use them interactively to accept
> or reject the selections) than people often realise.

If you've got a human involved, or the HTML was generated from some
other source, then there's a good chance you can get a long way.

On the whole, though, when someone comes to me and asks for `a regular
expression to parse HTML', I wince, since such people rarely
understand either regular expressions or context-sensitive languages.
I have made a habit of not using REs for HTML work any more, and this
is where it got me.

I find that people select REs as their hammer, and spend weeks
pounding away trying to make things work.  First they find that
attribute values can be quoted with ' instead of "; then they find
that maybe they don't need to be quoted at all; then they find out
about anchors, case sensitivity, and line breaks.  Gah!
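
To make that list concrete, here is a minimal Python sketch.  The
pattern and the sample tags are made up for illustration; they are
not taken from anyone's real code.  It shows a "first attempt" RE for
pulling out href values failing on exactly the variations above.

```python
import re

# Hypothetical first-attempt pattern: assumes lower-case names,
# double-quoted values, and the whole tag on one line.
NAIVE_HREF = re.compile(r'<a href="([^"]*)">')

SAMPLES = [
    '<a href="one.html">',       # the case the pattern was written for
    "<a href='two.html'>",       # single-quoted value
    '<a href=three.html>',       # unquoted value
    '<A HREF="four.html">',      # upper-case tag and attribute names
    '<a\n   href="five.html">',  # line break inside the tag
]


def naive_links(html):
    """Return the href values the naive pattern manages to find."""
    return NAIVE_HREF.findall(html)


if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), "->", naive_links(sample))
```

Only the first sample matches.  Each of the others can be patched
into the pattern one at a time, which is how the weeks of pounding
get spent.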

> Of course, there is a whole class of stuff that you just cannot do
> with RE's, but I suspect that a general statement that they're not
> usable with HTML is overkill.

Okay, so that's a little hyperbole on my part, but there exist any
number of good tools that will parse valid (and even not-so-valid)
HTML perfectly well and allow the user to make all sorts of changes.
Some of them, like HTML::TreeBuilder, are even simple.
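
HTML::TreeBuilder is a Perl module.  As a rough analogue only, and
assuming Python's standard html.parser module in its place, a sketch
of the parser-based approach might look like this.  The LinkCollector
class and collect_links function are illustrative names, not part of
any library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names and strips
        # the quotes (if any) from attribute values before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html):
    """Parse an HTML string and return all link targets in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = (
        '<a href="one.html">1</a> '
        "<a href='two.html'>2</a> "
        '<a href=three.html>3</a> '
        '<A HREF="four.html">4</A> '
        '<a\n   href="five.html">5</a>'
    )
    print(collect_links(page))
```

The quoting, case and line-break variations are dealt with by the
parser, so the calling code never has to know about them.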

jason, who did, however, once write some code that used REs out the
       wazoo to find proper names
-- 
||----|---|------------|--|-------|------|-----------|-#---|-|--|------||
| ``Ooooaah!                                                            |
|   I'm getting so excited about cheese-making I can't stand it!''      |
||--|--------|--------------|----|-------------|------|---------|-----|-|

--
* This is list (humbug) general handled by majordomo at lists.humbug.org.au .
* Postings to this list are only accepted from subscribed addresses of
* lists 'general' or 'general-post'.  See http://www.humbug.org.au/


