[H-GEN] Processing large text files in Perl

Jason Parker-Burlingham jasonp at uq.net.au
Wed Mar 19 00:26:59 EST 2003


Michael Anthon <michael at anthon.net> writes:

> Now... the problem is that these files can be upwards of 500M and I
> need to be able to run aggregate queries while doing the conversion
> to look for errors/inconsistencies or whatever.  The current system
> uses the Borland Database Engine (local files on the machine running
> the program).

I think you're on the right track to use a database, but if the data
can be encoded compactly, could you create a bunch of objects,
ensure consistency by cross-referencing between them, and then call
a "stringify" method to write them out to the other format?

This only works if the consistency checks and the output format are
pretty simple, but it's a method I've used to great effect in the
past: once you know what's going on, it's nice to be able to add
extra checking methods to the objects while leaving the input and
output routines alone.
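
Something like this is what I have in mind.  It's only a minimal
sketch; the Order class, its field names, and the CSV output are
all assumptions for illustration, not your actual format:

    #!/usr/bin/perl
    use strict;
    use warnings;

    package Order;

    sub new {
        my ($class, %fields) = @_;
        return bless {%fields}, $class;
    }

    # Cross-reference against a table of known customer IDs; extra
    # checks can be added here without touching input or output code.
    sub is_consistent {
        my ($self, $customers) = @_;
        return exists $customers->{ $self->{customer_id} };
    }

    # Emit the record in the target format (CSV, purely for example).
    sub stringify {
        my $self = shift;
        return join ',', @{$self}{qw(id customer_id amount)};
    }

    package main;

    my %customers = (42 => 1);    # pretend this was loaded earlier
    my $order = Order->new(id => 1, customer_id => 42,
                           amount => '19.95');
    print $order->stringify, "\n" if $order->is_consistent(\%customers);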

One last suggestion: if the input format is particularly regular,
you may want to mess about with C<unpack> a little instead of
using regular expressions.  It's likely to be a lot faster than
constantly matching your input against a pattern.
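
For example, with a fixed-width layout (the field widths and the
template below are invented; adjust them to the real records):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Assumed layout: 8-char ID, 20-char name, 10-char amount.
    my $template = 'A8 A20 A10';    # 'A' strips trailing spaces

    while (my $line = <DATA>) {
        chomp $line;
        my ($id, $name, $amount) = unpack $template, $line;
        print "id=$id name=$name amount=$amount\n";
    }

    __DATA__
    00000001Widgets, Inc.       0000019.95

A single unpack slices each line in one pass without touching the
regex engine at all, which adds up over a 500M file.
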
-- 
``I may have agreed to something involving a goat.''  -- CJ
