Store and instantly access a billion XML records (HOW TO)

admin

These are my preliminary results from storing 'many' XML files whose record size matches the drive's cluster size.

Format as big a hard drive as you need (or can) with NTFS at 512-byte clusters, and turn compression on. If your XML files exceed this, adjust the cluster size accordingly. Just be sure to sample your files first to find the size that stores the vast majority of the records in a single cluster.

Next, path-encode your primary keys to distribute your files in a hierarchical tree. In my case, I had two different kinds of records:

F:\Mailboxes\ZIP\ZIP4\Street (note that 'ZIP\ZIP4\Street' is the primary key)
and
F:\Records\abc\def\ghi\jkl ('abc\def\ghi\jkl' is the primary key)

In initial sampling, 63MB of records took 56MB on disk across 51,000 files. In other words, you should be able to store as many records as there are clusters on the disk: that's right, billions with a 'B'.

Create record objects that persist their data using MSXML2.DOMDocument, then spawn PublicNonCreatable classes that persist themselves based on an element of the XML tree, for example:

Friend Sub Constructor(hElement As MSXML2.IXMLDOMElement)

Your business logic/interface then writes to that DLL and never (directly) touches XML. Best part: you can persist and instantly access a record among billions. It's also possible for an end user to search the text from the top '\\share\xml\' directory with Explorer and then view the record in an understandable way. Binary files could accompany the XML data file in the same directory.

The downside is that you have to forget about sequential access altogether, since the primary keys must be stored hierarchically. Instead, code new objects that keep lists of the things you need. If you still have to go the RDBMS route to SQL a recordset, make sure the primary keys can be encoded and decoded.
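The cluster-sizing step (sample your files, then pick a cluster that holds most records whole) can be sketched in Python. This is a minimal sketch, not the original VB6 code: `pick_cluster_size`, `slack_bytes` and the `coverage` threshold are hypothetical names, with `coverage` standing in for "the vast majority of the records", and the size list is the classic set of NTFS allocation-unit sizes. Keep in mind that NTFS compression is only available on volumes with clusters of 4 KB or smaller, so going above 4096 bytes means giving up the compression this approach relies on.

```python
import math
from typing import Iterable

# Allocation-unit sizes offered by the classic NTFS format dialog, in bytes.
NTFS_CLUSTER_SIZES = (512, 1024, 2048, 4096, 8192, 16384, 32768, 65536)


def pick_cluster_size(sample_sizes: Iterable[int], coverage: float = 0.95) -> int:
    """Smallest cluster size that fits at least `coverage` of the sampled
    records in a single cluster (capped at the largest size listed).

    `coverage` is a hypothetical tuning knob, not something from the post.
    """
    sizes = sorted(sample_sizes)
    if not sizes:
        raise ValueError("need at least one sampled file size")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    # Size of the record sitting at the requested percentile.
    target = sizes[max(0, math.ceil(coverage * len(sizes)) - 1)]
    for cluster in NTFS_CLUSTER_SIZES:
        if target <= cluster:
            return cluster
    return NTFS_CLUSTER_SIZES[-1]


def slack_bytes(sample_sizes: Iterable[int], cluster: int) -> int:
    """Unused bytes left in partially filled clusters (ignores compression)."""
    # Round each size up to a whole number of clusters, then subtract the data.
    return sum((s + cluster - 1) // cluster * cluster - s for s in sample_sizes)
```

In practice you would feed it the sizes of a few thousand sampled files (for example via `os.path.getsize`) and compare `slack_bytes` across the candidate sizes before formatting the drive.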
If your primary keys are random long integers, format them at a fixed length (Format$(lnPrimaryKey&, "000000000000")), then split them up into directories like \\share\xml\category\000\000\000\[email protected]
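Here is a minimal Python sketch of that fixed-length key encoding, plus the decoding step needed when keys have to round-trip through an RDBMS. It assumes non-negative integer keys, three-digit directory levels, and a hypothetical `.xml` file name for the last digit group; `key_to_path`, `path_to_key`, `KEY_WIDTH`, `GROUP` and `SUFFIX` are illustrative names. `PureWindowsPath` is used so the backslash UNC paths come out the same on any OS.

```python
from pathlib import PureWindowsPath

KEY_WIDTH = 12   # same width as Format$(lnPrimaryKey&, "000000000000")
GROUP = 3        # digits per directory level (assumption)
SUFFIX = ".xml"  # hypothetical file extension for the record file


def key_to_path(root: str, key: int) -> PureWindowsPath:
    """Encode an integer primary key as a hierarchical record path."""
    if key < 0:
        raise ValueError("this sketch only handles non-negative keys")
    digits = f"{key:0{KEY_WIDTH}d}"  # zero-pad to a fixed width
    if len(digits) > KEY_WIDTH:
        raise ValueError(f"key {key} needs more than {KEY_WIDTH} digits")
    groups = [digits[i:i + GROUP] for i in range(0, KEY_WIDTH, GROUP)]
    # All groups but the last become directories; the last names the file.
    return PureWindowsPath(root, *groups[:-1], groups[-1] + SUFFIX)


def path_to_key(root: str, path: PureWindowsPath) -> int:
    """Decode a record path produced by key_to_path back to its key."""
    parts = list(PureWindowsPath(path).relative_to(root).parts)
    parts[-1] = PureWindowsPath(parts[-1]).stem  # drop the file extension
    return int("".join(parts))
```

For example, key 123456789 under \\share\xml\category becomes \\share\xml\category\000\123\456\789.xml. Three-digit groups cap every directory at 1,000 entries, and twelve digits leave plenty of headroom: a VB6 Long tops out at 2,147,483,647, which is only ten digits.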
 