SQL Server XML shredding performance

nekroman

New Member
I am working with NOAA's current observation XML (example: Washington DC) and am shredding the files for the 4000+ stations into a SQL Server 2008 R2 table. After trying many different approaches, I have one that I am moving forward with. This question is about the performance between the different methods and most importantly why is it so drastic.First AttemptWorking in C# I parsed all files with Linq to XML and wrote the resulting records to the database with Linq to SQL. The code for this is predictable, so I won't bore you with it.Rewriting with linq to Entity Framework didn't help.This resulted in the application taking running for over an hour and having only processed 1600 or so files. The slowness is the result of both Linq to SQL and Linq to Entities executing an insert and select for each record.Second AttemptStill working in C# I attempted to speed it up by using the bulk insert methods available online (example: Speeding up inserts using Linq-to-SQL - Part 1).Still slow, although noticably faster than first attempt.At this point I moved to using a stored procedure to handle the XML shredding and insert with the C# code concatenating the files into one XML string and adding a wrapper tag.Third AttemptUsing SQL Server's XML Query similar to this (@xml is the xml file) [from memory]:\[code\]select credit = T.observation.value('credit[1]', 'varchar(256)') ,... -- the rest of the elements possible in the file.from @xml.nodes('wrapper') W(station) cross apply W.station.nodes('current_observation') T(observation)\[/code\]I let it run for 15 minutes and cancelled with 250 or so records processed.Fourth AttemptI changed the query to use OpenXML:\[code\]declare $idoc intexec sp_xml_preparedocument @idoc output, @xmlselect Credit ,... -- the rest of the elementsfrom openxml(@idoc, '/wrapper/current_observations', 2) with ( Credit varchar(256) 'credit' ,...) -- the rest of the elementsexec sp_xml_removedocument @idoc\[/code\]This processed all 4000+ records in 10 seconds! Quite acceptable.While I expected some differences between the methods, I didn't expect the difference to be so dramatic. So my question is simply, 'Why is there such a drastic difference in performance between the different methods?'I am quite happy to be shown that I was using the first 3 wrong.
 
Back
Top