[H-GEN] Scripting guide
Greg Black
gjb at gbch.net
Thu May 25 19:42:51 EDT 2006
On 2006-05-25, Jason Parker-Burlingham wrote:
> On 5/25/06, Greg Black <gjb at gbch.net> wrote:
>>On 2006-05-25, Jason Parker-Burlingham wrote:
>>> What you want is:
>>> awk '/([0-9]+\.)+[0-9]+/ {print $8,$13}' \
>>> /var/log/apache/error.log.1
>>>
>>> I'm not sure if + is a GNU extension to awk, but [0-9][0-9]* will do in its
>>> place.
>>
>> I was going to leave this topic alone, but this historical
>> element got me in. I have nothing handy on my shelf prior to
>> 1984, but I can assure you that basic regular expressions have
>> used the '+' since at least that far back. I'm pretty sure it
>> dates from the early 1970s, certainly before there was such a
>> thing as GNU.
>
> The reason I mentioned it is that if I'm remembering correctly---and NetBSD
> grep bears me out---the plus sign is a part of the extended regular
> expression syntax.
No, the plus sign is part of basic regular expressions (modulo
some final remarks at the end of this message).
> My question was whether or not awk's extended regular
> expression support was a GNU extension.
Again, no; extended regex support has been in awk more or less
forever.
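
For anyone who wants to compare the two spellings from earlier in
the thread, here is a small sketch. The sample lines are made up
(they stand in for the Apache log), and the plus-free pattern is my
own expansion, rewriting each X+ as XX*; I'd expect any awk with
extended regex support to print the same two lines for both.

```shell
# Made-up sample input standing in for the log file.
lines=$(printf '%s\n' \
    'client 10.0.0.1 denied' \
    'no address here' \
    'version 2 only' \
    'build 1.2')

# Pattern as posted: one or more "digits-dot" groups, then digits.
with_plus=$(printf '%s\n' "$lines" | awk '/([0-9]+\.)+[0-9]+/')

# Same pattern without '+': each X+ is written as XX*.
without_plus=$(printf '%s\n' "$lines" |
    awk '/([0-9][0-9]*\.)([0-9][0-9]*\.)*[0-9][0-9]*/')

printf 'with +:\n%s\nwithout +:\n%s\n' "$with_plus" "$without_plus"
```

Both patterns should select only the lines containing a dotted
number; the second is just longer to type.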
Here's some data from a good source (Steve Bourne in 1987):
He covers various operators, including the '*', '+' and '?'
(meaning "0 or more", "1 or more" and "0 or 1") and then says:

    The expressions described so far are available in all
    the programs providing patterns, namely awk, ed, grep,
    lex and sed. Further facilities are provided in awk,
    lex and egrep [...]

And then he describes what we now call extended regular
expressions, including the '|' operator.
However, I note that the versions of ed and grep provided on my
current machines do not support the '+' operator. I don't have
a traditional version of awk available for testing.
I also note that the BSD manuals from 1994 claim (as part of 7th
Edition Unix) that basic regular expressions (which it calls
obsolete regular expressions) treat the '+' (inter alia) as an
ordinary character and provide no equivalent for its function.
This is in the re_format(7) page.
Current BSD manuals still state that there is no equivalent for
'|', but that '+' and '?' can be expressed with '{1,}' and '{0,1}'
respectively.
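
A rough way to check this on whatever grep is to hand is sketched
below. The sample text is made up, and I'm assuming a grep that
follows POSIX, where the bound delimiters in a basic regular
expression are written \{ and \}; older or more unusual builds may
behave differently.

```shell
# Made-up sample input: one line with a number, one with a literal '+'.
sample='x=42
1+1'

# Basic RE: '+' is an ordinary character, so this looks for a digit
# followed by a literal plus sign.
bre_literal=$(printf '%s\n' "$sample" | grep '[0-9]+')

# Basic RE with a bound: \{1,\} means "one or more".
bre_bound=$(printf '%s\n' "$sample" | grep '[0-9]\{1,\}')

# Extended RE (grep -E, historically egrep): '+' means "one or more".
ere_plus=$(printf '%s\n' "$sample" | grep -E '[0-9]+')

printf 'BRE +:\n%s\nBRE bound:\n%s\nERE +:\n%s\n' \
    "$bre_literal" "$bre_bound" "$ere_plus"
```

On a grep that behaves as the manual describes, the first pattern
should pick out only the "1+1" line, while the bound and the
extended '+' should both match every line containing a digit.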
So it seems that, in the real world, we not only have two
standard types of regular expressions, but we don't even have
agreement about what they consist of. My own memory was that
the '+' operator was always part of basic regular expressions,
hence my original interruption. But it seems that either Steve
Bourne and I are both wrong, or history has been re-written.
In general, I'd suggest that people try to use the most modern
or capable variants of their utilities in the expectation that
they will provide the better facilities.
I could do more digging, but I suspect that I've probably gone
well past the likely level of interest in this amongst the list
members, so I'll stop.
Cheers, Greg