I started with what I guess is the straightforward approach: using XMLHttpRequest's responseXML object, traversing the XML object to populate a binding params array, and then executing the associated SQL statement to populate my SQLite database. This works fine until you import a really big XML file - files around 100 MB in size.
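For reference, here's a minimal sketch of that first approach. It assumes a Firefox extension context with an open mozIStorage connection (db), and the URL, table, column and element names are all made up:

Code:
/* sketch of the original responseXML approach; db is assumed to be an
   open mozIStorageConnection, and the URL, table, column and element
   names are hypothetical */
var xhr = new XMLHttpRequest();
xhr.open("GET", "file:///path/to/import.xml", true); // placeholder URL
xhr.onload = function () {
  var doc = this.responseXML; // the whole file parsed into one DOM
  var stmt = db.createStatement(
      "INSERT INTO rows (name, value) VALUES (:name, :value)");
  var paramsArray = stmt.newBindingParamsArray();
  var elements = doc.getElementsByTagName("element");
  for (var i = 0; i < elements.length; i++) {
    var bp = paramsArray.newBindingParams();
    bp.bindByName("name", elements[i].getAttribute("name"));
    bp.bindByName("value", elements[i].textContent);
    paramsArray.addParams(bp);
  }
  stmt.bindParameters(paramsArray);
  stmt.executeAsync(); // one big asynchronous insert
};
xhr.send();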
This consumed a huge amount of memory (circa 2 GB), so much so that on a computer without enough RAM and CPU it just destroys the Firefox session - frozen browser, 'unresponsive script' warnings, unresponsive computer. The word 'lag' doesn't do it justice.
I discovered two things: importing a 100 MB XML file produces an XML object of around six times that size, and creating the binding params array for that amount of data can quite happily consume over a gig of memory on top. To put this in context, I was testing with a file of over 2 million rows, iterating over 100,000 parent elements.
My approach was to stop using responseXML and instead use responseText (with an overrideMimeType of "text/plain"). I then use .match(regex) to produce an array of the XML parent elements - another memory hit, but nowhere near as bad as the responseXML object. Next I step through that array in smaller chunks: join each chunk back into a single string, parseFromString it into a (much smaller) XML object, and traverse that to populate a binding params array as before. Rinse and repeat for each slice of the larger array until you're done.
Code:
/* create an array of the XML parent elements from responseText;
   <element> stands in for the real parent tag name */
var xmlArray = this.responseText.match(/<element[\s\S]*?<\/element>/g);

/* slice out a chunk, join it back into a single string, and parse it
   into a (much smaller) XML document */
var parser = new DOMParser();
var xmlObject = parser.parseFromString('<root>' + xmlArray.slice(start, end).join('') + '</root>', "application/xml");
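The loop around that looks something like the sketch below; the chunk size is just a number to tune for memory, and processChunk() is a hypothetical stand-in for the traverse-and-bind step from the first approach:

Code:
/* step through the array in chunks; CHUNK is tunable and processChunk()
   is a placeholder that traverses the fragment and populates a binding
   params array as in the original approach */
var CHUNK = 10000;
var parser = new DOMParser();
for (var start = 0; start < xmlArray.length; start += CHUNK) {
  var chunk = xmlArray.slice(start, start + CHUNK).join('');
  var xmlObject = parser.parseFromString(
      '<root>' + chunk + '</root>', "application/xml");
  processChunk(xmlObject);
}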
So, as if this isn't already convoluted enough, back to the drawing board: for really large files the responseText itself was getting truncated.
I discovered that blobs don't get truncated! So I'm now taking XMLHttpRequest's response (with responseType = "blob") and using FileReader's readAsText to turn the blob into a text string. The readAsText result then goes forward as before to be match(ed), slice(d) and join(ed) into the smaller XML objects.
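Roughly like this (a sketch; importText() is a hypothetical stand-in for the match/slice/join pipeline above):

Code:
/* fetch the file as a blob, then read it back out as text; importText()
   is a placeholder for the match/slice/join pipeline above */
var xhr = new XMLHttpRequest();
xhr.open("GET", "file:///path/to/import.xml", true); // placeholder URL
xhr.responseType = "blob";
xhr.onload = function () {
  var reader = new FileReader();
  reader.onload = function () {
    importText(reader.result); // the full file as a string
  };
  reader.readAsText(xhr.response);
};
xhr.send();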
In this way I finally got an ancient Pentium 4 with 2 GB of memory to import and process a 90 MB XML file in about 5 minutes, but I still think this is a seriously ugly solution. On the other hand, it may in fact be a really cunning one, in which case I hope it's of help to someone.
As I don't have any great expertise in either XMLHttpRequest or XML - I have more experience with 'proper' databases - I've pretty much had to make the whole import process up with the help of Google and a lot of trial and error. So I'd be really interested to know if there's a more fit-for-purpose way to handle really large XML files in JavaScript. Most of the XML examples I find don't use JavaScript at all, which might be a clue.
Ben.