[H-GEN] Processing large text files in perl
Michael Anthon
michael at anthon.net
Thu Mar 13 22:39:45 EST 2003
[ Humbug *General* list - semi-serious discussions about Humbug and ]
[ Unix-related topics. Posts from non-subscribed addresses will vanish. ]
I am currently struggling here at work adding a new import filter to one of
my programs. Basically this program takes files from various places in
various (very bad) formats and converts them to a single format suitable for
use further down the track in our system. The current converter is written
in Delphi. That in itself is not a problem but the type of stuff I need to
do with the data would be a LOT easier with regex in perl. As a rough guess
I reckon I could code a new import in perl in about 1/4 of the time it takes
using Delphi.
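To give an idea of why regex makes this kind of filter so much shorter, here is a minimal sketch. The input format, field names and layout below are entirely made up for illustration (the real "very bad" formats aren't described in this post); the point is just that one regex does the parse-and-reshape step:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical example record: "CUST0001 |Smith, John  |  42.50"
# normalised to pipe-delimited "0001|John Smith|42.50".
sub clean_record {
    my ($line) = @_;
    chomp $line;
    # One regex captures the id, surname, first name and amount,
    # tolerating the erratic padding around the delimiters.
    return unless $line =~ /^CUST(\d{4})\s*\|\s*(\w+),\s*(\w+)\s*\|\s*([\d.]+)\s*$/;
    return "$1|$3 $2|$4";
}

# Filter standard input to standard output, skipping unparseable lines.
while (my $line = <STDIN>) {
    my $clean = clean_record($line);
    print "$clean\n" if defined $clean;
}
```

In Delphi the same job means hand-rolled scanning and substring bookkeeping; in perl it is one pattern per record type, which is where the rough 1/4-of-the-time estimate comes from.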
Now... the problem is that these files can be upwards of 500M and I need to
be able to run aggregate queries while doing the conversion to look for
errors/inconsistencies or whatever. The current system uses the Borland
Database Engine (local files on the machine running the program).
My real question is how I should go about this if I were to rewrite it in
perl. My first thought was to have mysql/postgresql installed on the
machine that will be running the process (to avoid network traffic) and use
perl DBI but I don't know if that is the "best" way to do it. Is there some
other simple and fast DB system I could use instead?
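For a "simple and fast" local option, SQLite via DBD::SQLite might be worth a look: the whole database lives in one local file, so there is no server to install and no network traffic, and you still get SQL aggregates through the standard perl DBI. The sketch below assumes DBD::SQLite is installed; the table and column names are invented for illustration, and a real run would point dbname at a file on disk rather than an in-memory database:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# ':memory:' keeps this sketch self-contained; use e.g.
# dbname=import.db for a real 500M conversion run.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 0 });

$dbh->do('CREATE TABLE records (id TEXT, amount REAL)');
my $ins = $dbh->prepare('INSERT INTO records (id, amount) VALUES (?, ?)');

# Stand-in for the converted rows; in the real filter these would be
# streamed from the input file. Batching inserts inside one transaction
# matters -- committing per row would crawl on a 500M input.
my @rows = ("0001|42.50", "0002|137.00", "0001|10.00");
for my $row (@rows) {
    my ($id, $amount) = split /\|/, $row;
    $ins->execute($id, $amount);
}
$dbh->commit;

# Aggregate query run during conversion, e.g. flag duplicate ids.
my $dups = $dbh->selectall_arrayref(
    'SELECT id, COUNT(*) FROM records GROUP BY id HAVING COUNT(*) > 1');
warn "duplicate id: $_->[0] ($_->[1] rows)\n" for @$dups;
```

mysql/postgresql via DBI would look almost identical at the perl level, so starting with SQLite doesn't lock you in if you later need a proper server.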
I don't want this thread to degenerate into a language war please... I know
enough perl to do the job I need to do, so I would probably be able to
write this reasonably efficiently. My main concern is how best to manage
the large amounts of data that I need to process.
Cheers
Michael
--
* This is list (humbug) general handled by majordomo at lists.humbug.org.au .
* Postings to this list are only accepted from subscribed addresses of
* lists 'general' or 'general-post'. See http://www.humbug.org.au/