[H-GEN] Processing large text files in perl

Michael Anthon michael at anthon.net
Thu Mar 13 22:39:45 EST 2003



I am currently struggling here at work adding a new import filter to one of
my programs.  Basically this program takes files from various places in
various (very bad) formats and converts them to a single format suitable for
use further down the track in our system.  The current converter is written
in Delphi.  That in itself is not a problem but the type of stuff I need to
do with the data would be a LOT easier with regex in perl.  As a rough guess
I reckon I could code a new import in perl in about 1/4 of the time it takes
using Delphi.
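
Just to give an idea of the kind of thing I mean, here is a rough sketch of
the sort of line-by-line regex cleanup I have in mind (file names, field
layout and patterns are made up purely for illustration):

#!/usr/bin/perl
use strict;
use warnings;

# Clean up a badly formatted input file line by line and write a single
# pipe-delimited output format.  The record pattern below is invented.
open my $in,  '<', 'input.dat' or die "input.dat: $!";
open my $out, '>', 'clean.dat' or die "clean.dat: $!";

while (my $line = <$in>) {
    chomp $line;
    $line =~ s/[\x00-\x1f]//g;       # strip stray control characters
    $line =~ s/^\s+|\s+$//g;         # trim leading/trailing whitespace
    $line =~ s/\s+/ /g;              # collapse runs of whitespace
    # pull out the fields of interest (made-up record layout)
    if ($line =~ /^(\d+) (\S+) (.*)$/) {
        print {$out} join('|', $1, $2, $3), "\n";
    }
    else {
        warn "unparsable line $.: $line\n";
    }
}
close $out;
close $in;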

Now... the problem is that these files can be upwards of 500M and I need to
be able to run aggregate queries while doing the conversion to look for
errors/inconsistencies or whatever.  The current system uses the Borland
Database Engine (local files on the machine running the program).  

My real question is how I should go about this if I were to rewrite it in
perl.  My first thought was to have mysql/postgresql installed on the
machine that will be running the process (to avoid network traffic) and use
perl DBI, but I don't know if that is the "best" way to do it.  Is there some
other simple and fast DB system I could use instead?
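
To be a bit more concrete about what I'm imagining: the converted records
would get pushed through DBI into a database on the local machine, with an
aggregate check run at the end, along these lines (the connect string, table
and column names are all just placeholders I've made up):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Load converted records into a mysql server on the local machine.
my $dbh = DBI->connect('dbi:mysql:database=importwork;host=localhost',
                       'someuser', 'somepass',
                       { RaiseError => 1, AutoCommit => 0 });

my $ins = $dbh->prepare(
    'INSERT INTO records (batch, code, amount) VALUES (?, ?, ?)');

while (my $line = <STDIN>) {
    chomp $line;
    my ($batch, $code, $amount) = split /\|/, $line;
    $ins->execute($batch, $code, $amount);
}
$dbh->commit;

# Aggregate sanity check, e.g. totals per batch to spot missing or
# duplicated data.
my $rows = $dbh->selectall_arrayref(
    'SELECT batch, COUNT(*), SUM(amount) FROM records GROUP BY batch');
for my $r (@$rows) {
    printf "batch %s: %d rows, total %s\n", @$r;
}

$dbh->disconnect;

That part seems straightforward enough; it's really the choice of back end
that I'm unsure about.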

I don't want this thread to degenerate into a language war please...  I know
enough perl to do the job I need to do, so I would probably be able to
write this reasonably efficiently.  My main concern is how best to manage
the large amounts of data that I need to process.

Cheers
Michael



