[H-GEN] Text Search and Replace @ shell prompt
Jason Parker-Burlingham
jasonp at uq.net.au
Mon Aug 26 14:13:46 EDT 2002
[ Humbug *General* list - semi-serious discussions about Humbug and ]
[ Unix-related topics. Posts from non-subscribed addresses will vanish. ]
Greg Black <gjb at gbch.net> writes:
> Jason Parker-Burlingham wrote:
> | (Basically I think one can't beat the "don't use regexes on HTML" drum
> | too much, especially since there are much better tools;
> I'm not really convinced by this. I think you can do more with
> RE's (especially where you can use them interactively to accept
> or reject the selections) than people often realise.
If you've got a human involved, or the HTML was generated from some
other source, then there's a good chance you can get a long way.
On the whole, though, when someone comes to me and asks for `a regular
expression to parse HTML', I wince, since such people rarely
understand regular expressions or context-sensitive languages. I
have made a habit of not using REs for HTML work any more, and this is
where it has got me.
I find that people select REs as their hammer, and spend weeks
pounding away trying to make things work. First they find that
attribute values can be quoted with ' instead of "; then they find
that maybe they don't need to be quoted at all; then they find out
about anchors, case sensitivity, and line breaks. Gah!
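
To make the hammer problem concrete, here is a minimal sketch in Python
(not Perl, purely for illustration). The pattern NAIVE_HREF, the sample
tags and the helper naive_links are all made up for this example; they
stand in for the kind of first attempt people tend to bring along.

```python
import re

# Hypothetical first attempt: assumes lower-case tag and attribute names,
# double-quoted values, and the whole tag sitting on one line.
NAIVE_HREF = re.compile(r'<a href="([^"]*)">')

SAMPLES = [
    '<a href="one.html">',      # the only form the pattern was written for
    "<a href='two.html'>",      # single quotes instead of double quotes
    '<a href=three.html>',      # no quotes at all
    '<A HREF="four.html">',     # upper-case tag and attribute names
    '<a\n  href="five.html">',  # line break inside the tag
]


def naive_links(html):
    """Return every href value the naive pattern manages to find."""
    return NAIVE_HREF.findall(html)


found = naive_links("\n".join(SAMPLES))
print(found)  # ['one.html']
```

Four of the five links are silently missed, and each fix (alternative
quote characters, re.IGNORECASE, \s+ in place of the single space) makes
the pattern longer without making it a parser.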
> Of course, there is a whole class of stuff that you just cannot do
> with RE's, but I suspect that a general statement that they're not
> usable with HTML is overkill.
Okay, so that's a little hyperbole on my part, but there exist any
number of good tools that will parse valid (and even not-so-valid)
HTML perfectly well and allow the user to make all sorts of changes.
Some of them, like HTML::TreeBuilder, are even simple.
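
As a rough illustration of what a parser buys you, here is a minimal
sketch using html.parser from the Python standard library. It is a
stand-in for HTML::TreeBuilder, not the same API, and the names
LinkCollector, collect_links and MESSY are invented for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser hands over tag and attribute names already lower-cased,
        # and attribute values with any surrounding quotes removed.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html):
    """Parse html and return the list of link targets in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Not-so-valid input: mixed quoting, mixed case, an unclosed <a>,
# and a tag split across two lines.
MESSY = """<a href="one.html">1</a>
<a href='two.html'>2</a>
<a href=three.html>3
<A HREF="four.html">4</A>
<a
  href="five.html">5</a>"""

print(collect_links(MESSY))
# ['one.html', 'two.html', 'three.html', 'four.html', 'five.html']
```

The quoting, case and line-break variations are all handled by the
parser, so the code only has to say which tag and attribute it wants.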
jason, who did, however, once write some code that used REs out the
wazoo to find proper names
--
||----|---|------------|--|-------|------|-----------|-#---|-|--|------||
| ``Ooooaah! |
| I'm getting so excited about cheese-making I can't stand it!'' |
||--|--------|--------------|----|-------------|------|---------|-----|-|