[H-GEN] one for an awk guru

The Fuzzy One s335810 at student.uq.edu.au
Tue Feb 3 20:45:26 EST 1998


I don't have access to the awk standard, so some of what I say here is
educated guess and heresay :)

On Wed, 4 Feb 1998, Michael Dooley wrote:

> I get:
> 47
> 21
> 60
> 46
> 51
> 9

	Awk arrays are stored as hash tables.  Many of you may not be
familiar with hash tables, so I'll explain the basic concept (with a few
simplifications :).

Imagine you want to quickly access a small block of data in a very large
index.  Say you have to find a student according to his or her student
number. You certainly don't want to have to start at the beginning and
search to the end, that's far too slow.  An array is the obvious solution,
becuase if you want, say, the fifth element you just go grab it.  If you
want the 2096th element, it's just a matter of going there.  The problem
is that if you did that, then most of the array would be empty.  There are
no students with #0, or #120, or #1123423  (I think :).

The most basic hash table is an array with a randomising function.  You
know you'll never have more than 100 000 students, so that's how big you
make the array, and you simply customise the randomising function to
always end up somewhere between 0 and 100 000.  That way you get access
almost as quick as an array, but without taking nearly as much space.


	Awk stores it's arrays in this way.  Notice that it's a
randomizing function.  Unless the program takes special care, all ordering
is (may be :) lost in this process.  As I undersand the awk standard,
there is no requirement for a standard awk implementation to enfoce an
ordering on it's hash table.

> If I use the alternative 'for' definition : for (m=1;m<=x;m++), I get
> the output I want.  However x will vary from file to file so I'd rather
> use the array specification of for.

	By using the for (m=1;m<=x;m++) loop, you are imposing an ordering
on the hash table by looking up the index on "1", "2", "3"...
consecutively.  This will produce the output you expect.  I suggest you
keep x as a count of how many elements are currently in the array at any
particular time.  That is, whenever you add an element to the array,
increment x.  Whenever you delete an element, decrement x, and whenever
you delete the whole array, set x=0.

        fuzzy BSc.

                                //////      //////    ///  ///
         ////////   // //    ////   ///  ////   ///  ///  ///
                            ///    ///  ///    ///  ///  ///
       // // //  ////////  ///    ///  ///    ///  ///  ///
                          ///    ///  ///    ///  ///  ///

            <A href="http://student.uq.edu.au/~s335810">
            <A href="mailto:s335810 at student.uq.edu.au">



----------------------- HUMBUG General List --------------------------------
echo "unsubscribe general" | mail majordomo at humbug.org.au # To Unsubscribe



More information about the General mailing list