[H-GEN] sed query

Greg Black gjb at gbch.net
Tue May 6 22:06:56 EDT 2003


On 2003-05-06, Scott Pullen wrote:

> I thought that I could run the file through sed, insert some newlines as
> appropriate and redirect the output into a new file and then be able to make
> sense of the contents.  When I use:
> 
> sed -e 's/SOMETEXT/\n SOMETEXT/g' > output.txt
> 
> the output file is empty.

I find it extremely difficult to believe that -- using the exact
command line shown -- the output file would be empty.  If there
are no instances of "SOMETEXT", the output should be the same as
the input.  If there are instances of "SOMETEXT" they should be
replaced by "n SOMETEXT" (read that carefully).

For standard sed (not counting variants that go beyond it), the
expression "\n" in the replacement text is not defined to mean a
newline; traditional implementations simply turn it into "n", and
only some variants, GNU sed among them, treat it as a newline.

To give a simple example, here's how to change every space
character in a file into a newline with standard sed:

    $ sed 's/ /\
    > /g' < input_file

The "> " is $PS2 in my shell; the backslash escapes a literal
newline typed inside the quotes.
This will always work (except where bugs in sed or unparseable
input cause it to fail).
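
Applied to the substitution from the question, the same technique
might look like the sketch below.  The function name
split_before_marker and the sample input are made up for
illustration; SOMETEXT is the placeholder from the original command,
and the space the original put in front of it is left out here.

```shell
# Hypothetical helper: start a new line in front of every SOMETEXT.
# Reads standard input, writes standard output.
split_before_marker() {
    # The trailing backslash escapes the literal newline inside the
    # replacement text; the continuation must start in column one or
    # the indentation would end up in the output.
    sed 's/SOMETEXT/\
SOMETEXT/g'
}

# Made-up sample input: one line containing two markers.
printf 'aaaSOMETEXTbbbSOMETEXTccc\n' | split_before_marker
```

In real use the input would come from a redirect, for example
"split_before_marker < input_file > output.txt".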

> Is there any way that a file could be large enough
> to overflow sed and cause it to fail?

Yes, but the answer depends on the actual sed implementation and
the input data.  Bear in mind that sed reads a line at a time
and may, depending on the commands it's given, have to read more
than one line into memory.  If the line really is too long for
the memory that sed chooses to make available, then it will
clearly fail.  But we have not seen the input data and we don't
know what sed implementation is being used, so it's hard to say
more than that.
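
One cheap way to find out whether line length is even a plausible
culprit is to compare the file's size with its newline count: a big
file with no newlines, or only one, is a single enormous line as far
as sed is concerned.  A minimal sketch, in which describe_lines is a
hypothetical helper name:

```shell
# Hypothetical helper: report the size and newline count of a file.
describe_lines() {
    # Usage: describe_lines FILE
    # tr strips the padding that some wc implementations print.
    bytes=$(wc -c < "$1" | tr -d ' ')
    newlines=$(wc -l < "$1" | tr -d ' ')
    echo "bytes=$bytes newlines=$newlines"
}

# Example call (input_file is a placeholder):
#   describe_lines input_file
```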

If this were my problem, I'd use a real editor to look at the
file; or if the file was too big to make that wise, I'd use dd
to cut a chunk off one end of it and then look at it with a real
editor.  In this context a real editor is one that doesn't care
about line lengths and which can show unambiguously exactly what
characters are in the file.
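
Cutting a chunk off the front with dd could be sketched as follows.
The helper name first_chunk, the 512-byte block size and the default
of 8 blocks are arbitrary choices for illustration, not anything the
problem dictates.

```shell
# Hypothetical helper: copy the start of a file to standard output.
first_chunk() {
    # Usage: first_chunk FILE [BLOCKS]
    # Copies the first BLOCKS 512-byte blocks (default 8, i.e. 4096
    # bytes); dd's transfer summary on stderr is discarded.
    dd if="$1" bs=512 count="${2:-8}" 2>/dev/null
}

# Example call (file names are placeholders):
#   first_chunk input_file > chunk.txt
```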

For instance, it might be that the newlines are really in there,
but just got translated into carriage returns instead.  If that
was the problem, then you'd fix it with this command line:

    $ tr '\r' '\n' < input_file > output_file
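
Before running that, it may be worth counting the carriage returns
and newlines actually present.  If the two counts come out roughly
equal, the file probably uses CR LF pairs, and translating would
double every line break; deleting the carriage returns with
"tr -d '\r'" is the usual alternative in that case.  The helper names
below are hypothetical.

```shell
# Hypothetical helpers: count CR and LF bytes on standard input.
# tr -cd deletes everything except the named character; the final
# tr strips the padding that some wc implementations print.
count_cr() { tr -cd '\r' | wc -c | tr -d ' '; }
count_lf() { tr -cd '\n' | wc -c | tr -d ' '; }

# Example calls (input_file is a placeholder):
#   count_cr < input_file
#   count_lf < input_file
```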

As you can see, knowledge is power -- you need to know what
you're dealing with before you can be sure how to fix it.  If
you don't have a "real editor" (e.g., emacs), you can always use
od or one of its relatives to give you a dump of the actual
content for analysis.
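
A dump of the first few hundred bytes with od might be obtained
along these lines.  The helper name dump_chars and the 256-byte limit
are assumptions made for the example; "od -c" prints one character
per column and shows non-printing bytes as escapes such as \r, \n
and \t.

```shell
# Hypothetical helper: character dump of the start of a file.
dump_chars() {
    # Usage: dump_chars FILE
    # Take the first 256 bytes with dd, then let od show each byte
    # as a character or a backslash escape.
    dd if="$1" bs=256 count=1 2>/dev/null | od -c
}

# Example call (input_file is a placeholder):
#   dump_chars input_file
```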

Greg

-- 
Greg Black <gjb at gbch.net> <http://www.gbch.net/gjb.html>
GPG signed mail preferred; further information in headers.