[H-GEN] Converting csv file to tab delimited text file
Greg Black
gjb at gbch.net
Wed Jun 7 05:14:53 EDT 2006
On 2006-06-07, Kelvin Heng wrote:
> I am writing scripts using bash shell in cygwin. Right now, I have a set of
> raw data that is daily exported from a server into a csv format. My script
> can only process a Tab Delimited text file, so I have to open the file in
> excel and save it as Tab Delimited text file. Anyone has any alternative to
> automate this process? Please comments.
I know this question has been answered several times already,
but many of the answers were somewhat deficient and some issues
seem not to have been addressed at all.
This won't be exhaustive, either, but I hope it will be of some
help to those who have been thinking about this.
The first puzzle is why the bash script can only process a tab
delimited file. It is trivial in bash (and all other shells) to
use different field delimiters -- if this is news, read up on
the IFS (internal field separator) shell variable in the manual
for your shell. This is worth knowing about anyway, so don't
dismiss it out of hand.
Next, all trivial solutions using awk[1] and sed to split the
fields are intrinsically incorrect, although they will probably
work just fine for the simple sample data that was provided.
The danger is that an apparently working solution based on such
trivial data can easily lead to an expectation that the solution
will continue to work. But as soon as the CSV data contains
something you haven't allowed for, such as a quoted field with
an embedded comma, quote or newline, trouble strikes.
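
To make the failure concrete, here is a small sketch in Python. The
sample row is invented for illustration and was not part of the data
posted to the list. It compares a plain split on commas, which is
roughly what a sed or awk one-liner does, with the csv module from
Python's standard library.

```python
import csv
import io

# Hypothetical sample row (not from the original data): the second field
# has a comma inside quotes, the third has doubled quotes inside quotes.
line = '1,"Smith, John","said ""hi"""\n'

# Naive approach: split on every comma, ignoring the quoting entirely.
naive_fields = line.rstrip("\n").split(",")

# CSV-aware approach: let a real parser keep track of the quoting.
parsed_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields), naive_fields)
print(len(parsed_fields), parsed_fields)
```

The naive split reports four fields and leaves the quote characters
in place; the parser reports the three fields that were intended.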
I have written parsers to handle the CSV output of Microsoft
software and they generally require more than 100 lines of C to
get right and to handle the special cases correctly. This is
why the suggestions that involved using some existing library
with a scripting language like Python make a lot of sense, as
you can hope that the library's author has at least come up
against the perverse cases and handled them correctly -- and you
even get somebody to blame if they got it wrong.
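
As one possible shape for the library approach, here is a minimal
sketch using the csv module from Python's standard library. The
function name csv_to_tsv and the sample data are made up for this
example, and it assumes the export uses the common Excel-style
quoting, which is the module's default dialect.

```python
import csv
import io


def csv_to_tsv(src, dst):
    """Copy CSV rows from file object src to dst as tab-delimited text."""
    reader = csv.reader(src)
    # lineterminator="\n" gives Unix line endings, which suits a shell script.
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)


# Hypothetical sample: one field contains a comma inside quotes.
sample = 'id,name\n1,"Smith, John"\n'
out = io.StringIO()
csv_to_tsv(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

For real files, open both with newline="" as the csv documentation
recommends. Note that the writer will still quote any field that
itself contains a tab, a quote or a newline, so the script reading
the output has to decide what to do with those.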
Cheers, Greg
----------
[1] Non-trivial solutions to splitting up a CSV file in awk are,
of course, possible -- but they involve an awk script of
several lines, not a one-liner.