[H-GEN] Edit massive XML files

Paul Gearon gearon at ieee.org
Mon Oct 3 11:40:24 EDT 2011


On Sun, Oct 2, 2011 at 7:38 PM, Michael Anthon <michael at anthon.net> wrote:
<snip/>
> I've tried writing XML parsers in various types of languages using various libraries before but have often run into difficulties with the size of files I needed to work with.  The main issue I ran into is that a lot of the libraries I was using attempted to build the whole DOM as an in-memory object... which can be a bit of a problem at times :-)

Ugh. *Never* build a DOM. The only time I break this rule is in a web
browser, since the browser has already built it for me.

SAX is useful here, if you're prepared to process the data as it's
coming in. However, for really large datasets, that can be akin to
drinking from the fire hose. It depends on the application. In a lot
of cases, a more useful API is StAX, which lets you pull things off
the stream as you need them.
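
To illustrate the pull style, here is a minimal sketch using the standard
javax.xml.stream API that ships with the JDK. The class name StaxSketch, the
helper firstText, and the <log>/<entry> sample document are made up for
illustration and are not from the original thread. It assumes the target
element contains only text, since getElementText() throws if it meets a
nested element.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxSketch {
    // Hypothetical helper: returns the text of the first element with the
    // given local name, or null if the stream ends without finding one.
    static String firstText(Reader in, String elementName) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            // The caller drives the parser: each next() pulls one event.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(elementName)) {
                    // Stop as soon as we have what we need; the rest of
                    // the document is never parsed.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<log><entry>first</entry><entry>second</entry></log>";
        System.out.println(firstText(new StringReader(xml), "entry"));
    }
}
```

Because the loop only asks for the next event when it wants one, it can
return early and memory use stays roughly constant regardless of file size.
With SAX the parser would instead call your handler for every event in the
document, whether you still care about them or not.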

Either SAX or StAX is fine, but never DOM.  :-)

Regards,
Paul


