Perl Context (was: [H-GEN] Mergesort)

Jason Henry Parker jasonp at uq.net.au
Mon Jul 31 10:04:26 EDT 2000


[ Humbug *General* list - semi-serious discussions about Humbug and ]
[ Unix-related topics.  Please observe the list's charter.          ]

Byron Ellacott <bje at apnic.net> writes:

> #!/usr/bin/perl -w
> 
> $/ = undef;
> my $array = <STDIN>;
> print join("\n", sort {$a <=> $b}  split(/\s/, $array));

If you just want to sort a file and print it to stdout:

#!/usr/bin/perl
print sort { $a <=> $b } <>;

Basically you have the right idea in supplying a comparison block to
C<sort> (using the spaceship operator (`<=>') instead of the default
string comparison operator (`cmp')), but why cripple yourself by
slurping STDIN into a scalar and then splitting it straight back into
a list on the next line?[1]  Without anything better to go on[2], I'm
putting this down to oversight, so I'll leap into a quick chat about
context.

WARNING:  Those of you who don't like perl, don't want to understand
context, or are offended by a lack of whitespace should *STOP READING
NOW*.  I'm demonstrating a point about Perl, not writing good, fast,
nice, or even perfectly correct code, okay?  Treat the perl that
follows as working pseudocode.

Context is by far the oddest notion Perl has.  I've struck[6] a large
number of people who simply do *not* understand it, some of them very
competent in many languages (even Perl!).  It runs like this:

In Perl, many functions, operators and variables have different
meanings in different settings, or contexts.  Scalar context and
array context are the most important.  For example, in scalar context
@a (the array `a') evaluates to the number of elements it holds.
Don't confuse this with $#a, which is the index of the last element
and so is one less than the count.
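A minimal sketch of the difference (the array and its values are made
up for illustration): assigning @a to a scalar gives the element
count, $#a gives the last index, and a list assignment copies
elements instead.

```perl
use strict;
use warnings;

# A small, hypothetical array to experiment with.
my @a = (10, 20, 30);

my $count   = @a;    # scalar assignment: the number of elements, 3
my $last    = $#a;   # index of the last element, 2 (one less than the count)
my ($first) = @a;    # list assignment: copies the first element, 10

print "count=$count last_index=$last first=$first\n";
# prints "count=3 last_index=2 first=10"
```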

A good example of an operator returning different values in scalar
and array context is the <> operator[3]: in scalar context it returns
the next line from the filehandle; in array context it returns all the
remaining lines of the file, each line in its own element of the list.
Voila!
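Here is a small sketch of that, assuming Perl 5.8 or later (which
added the in-memory filehandles used here so the example needs no
real file; the text being read is invented):

```perl
use strict;
use warnings;

# Hypothetical input; open() on a reference to a scalar gives an
# in-memory filehandle (Perl 5.8 and later).
my $text = "alpha\nbeta\ngamma\n";
open(my $fh, '<', \$text) or die "can't open in-memory file: $!";

my $first = <$fh>;    # scalar context: reads just the next line
my @rest  = <$fh>;    # list context: reads every remaining line
close($fh);

print "first: $first";
print "rest:  ", scalar(@rest), " lines\n";   # 2 lines
```

Note that the list-context read picks up where the scalar read left
off, so @rest holds only "beta" and "gamma".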

The other thing to remember about context is that, as well as
behaving differently in different contexts, operators *supply* a
scalar or list context to their arguments[4].  For example, C<print>
evaluates its arguments in list context.  Telling list context
operators from scalar context ones is *easy*: look at your perldoc
(`perldoc -f print', for example).  If you see a description like
`print LIST', then that function supplies list context to its
arguments.
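You can watch the caller's context arrive with the built-in
C<wantarray>.  The sub below is a hypothetical helper written just for
this demonstration:

```perl
use strict;
use warnings;

# Hypothetical helper that reports the context it was called in.
# wantarray() is true in list context and false (but defined) in
# scalar context.
sub which_context {
    return wantarray ? "list" : "scalar";
}

my @l = which_context();              # list assignment supplies list context
my $s = which_context();              # scalar assignment supplies scalar context
print which_context(), "\n";          # print LIST supplies list context: "list"
print scalar(which_context()), "\n";  # scalar() forces scalar context: "scalar"
```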

Both these statements:

        print "Joe ", "F. ", "Bloggs","\n";
        print "Joe F. Bloggs\n";

print exactly the same thing.  In the first case, the argument to
C<print> is a list.  In the second case, it's still a list, but it
contains only one element.  If you're doing a lot of string
interpolation (ie, this sort of thing):

        print "$firstname $initial $surname\n";

this can get a little (I'm not sure how little) slow, since Perl takes
these scalar values and joins them together through string
concatenation (the dot operator, C<.>) before passing them to
C<print>.  However, if you're *smart*, you'll take advantage of
C<print>'s list context:

        print $firstname, " ", $initial, " ", $surname, "\n";

On its own, this lacks the snappiness of string interpolation, but
there is a final twist of context: functions and operators that take a
list can be handed the lists returned by other functions and
operators!  This is an incredibly useful syntactic and semantic
device, which feels a lot like a Unix shell pipeline.  To wit:

        print join " ", $firstname, $initial, $surname, "\n";

The Perlish among you will spot that this does not print the same
string as before.  More skilled Perl Geeks than I will know immediately
how to fix it.  It took me a few minutes to work it out (usually it
wouldn't matter).

Wow!  What an excellent opportunity to advertise the perlgeeks mailing
list!  Send "subscribe perlgeeks" in the body of a message to
majordomo at pisoftware.com *now*!  No^H^HLow traffic!  Perl Golf!
Locals you can laugh at^W^Wrelate to!
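For the curious, here is one possible fix for the C<join> example (the
name values are invented).  Without parentheses C<join> swallows
everything to its right, "\n" included, so a space lands before the
newline; parenthesising its arguments ends its list early:

```perl
use strict;
use warnings;

# Hypothetical name parts.
my ($firstname, $initial, $surname) = ("Joe", "F.", "Bloggs");

# join takes the whole list, including "\n": "Joe F. Bloggs \n"
my $joined_all = join " ", $firstname, $initial, $surname, "\n";

# Parentheses end join's list; "\n" becomes a separate argument to print.
print join(" ", $firstname, $initial, $surname), "\n";

# The same fixed string, built explicitly: "Joe F. Bloggs\n"
my $fixed = join(" ", $firstname, $initial, $surname) . "\n";
```

Setting C<$,> (the output field separator) to " " would be another
route, at the cost of also separating the "\n" with a space unless it
is printed separately.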

My favourite example of list vs. scalar context is C<localtime>
without an argument (which returns the current local time).
If you say

        print localtime,"\n";

the call is in *list* context---C<localtime> returns a *list* which is
basically a struct tm.  But if you say

        print localtime."\n";

you'll see the local time, rendered in human-readable form.  This is a
nicety provided by the function (you'll also get a nice warning you
should pay attention to, but, as I said, I'm making a point, not good
code).
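The same contrast, with assignments supplying the context instead of
C<print> and C<.>, as a short sketch:

```perl
use strict;
use warnings;

# List context: localtime returns nine struct tm style fields
# (sec, min, hour, mday, mon, year, wday, yday, isdst).
my @fields = localtime();

# Scalar context: the same call returns a ctime(3)-like string
# such as "Mon Jul 31 10:04:26 2000".
my $string = localtime();

print scalar(@fields), " fields\n";   # 9 fields
print "$string\n";
```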

Just so you don't think context is completely useless, grok the
fullness of this sort, which reimplements the basic feature of
sort | uniq -c | sort -rn:

while(<>) {
        chomp;    # strip the newline; the map below adds its own
        $h{$_}++; # %h is a hash containing a count for each line,
                  # $_ is the default argument, in this case the line
                  # just read from the input stream.
}

print map { $h{$_}, "\t", $_, "\n" } sort { $h{$b} <=> $h{$a} } keys %h;

The statement after the loop works because C<print> takes a list,
here the list returned by C<map>.  C<map> (yes, it supplies list
context too) works on the list returned by C<sort>, which in turn
sorts the list returned by C<keys>, which, unfortunately for this
example, takes a hash as its argument[5].
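To make that chain easy to try without an input file, here is a
sketch that wraps it in a hypothetical sub, C<count_lines>, which
takes its lines as arguments and returns the report in the same format
as the script above, chomping each line first:

```perl
use strict;
use warnings;

# Hypothetical helper: returns "count<TAB>line\n" strings, most
# frequent line first.
sub count_lines {
    my %h;
    for my $line (@_) {
        chomp(my $copy = $line);   # drop the newline without altering the caller's data
        $h{$copy}++;
    }
    # keys -> sort -> map, each stage feeding its list to the next
    return map { "$h{$_}\t$_\n" } sort { $h{$b} <=> $h{$a} } keys %h;
}

print count_lines("b\n", "a\n", "b\n", "c\n", "b\n", "a\n");
# prints "3\tb", "2\ta" and "1\tc", one per line
```

Lines with equal counts come out in whatever order C<keys> happens to
produce; adding C<or $a cmp $b> to the sort block would make ties
deterministic.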

I write this sort of thing on the shell command line when I can't be
bothered finding the appropriate arguments to cut, paste, reverse,
uniq, sort, awk, sed, grep, and occasionally vi.

jason

[1] : The only benefit is that it copes with lines like `39 42 56',
      whereas my first version will behave incorrectly.  A better
      version would be:

      perl -lane'push @A, @F}{map {print} sort { $a <=> $b } @A'

      This isn't short for shortness's sake---it just happens to *be*
      short because it can and because that's the way I would really
      write it if I wanted something quick to write I could remember
      for next time (in fact this *is* a variant on something I used
      last time, coupled with a recent winner in a round of Perl
      Golf).

      (Yes, those are unbalanced braces.  If you don't know why, you
      need to read perlrun(1) again.)

[2] : As in, say, bothering to *ask* first...

[3] : The observant will have noticed I don't use it with a
      filehandle, er, argument, since it's easier to type.  Without
      one, it is a Highly Magic operator that reads, line by line,
      each file named in @ARGV (i.e., the arguments to your program)
      in turn, or STDIN if @ARGV is empty; in a while() loop (more
      context!) it also assigns each line to $_ for you.

[4] : I'm quite sure array and list context are the same thing.
      Someone quietly bop me if this is not the case.

[5] : Sadly, there is no hash context.  An appropriately formatted
      list of key/value pairs can stand in for a hash in many places
      (when assigning to one, for instance), however.

[6] : In that I've encountered them and their thinking.  I have wanted
      to hit some of them, though.
-- 
``I worry that you'll work in an office ... have children ... celebrate
wedding anniversaries ... the world of heterosexualism is a sick and
boring life!''  --  Aunt Ida

--
* This is list (humbug) general handled by majordomo at lists.humbug.org.au .
* Postings to this list are only accepted from subscribed addresses of
* lists 'general' or 'general-post'.
