Most Efficient Way of Combining or Joining Objects Read From Separate XML Files

rinvend

I have a large amount of data which is received in separate XML files each morning. I need to combine the objects within the XML files and generate a report from them, and I am looking for an optimal solution to this problem.

To demonstrate, I have fabricated the following example. There are 2 XML files: the first is a list of languages and the countries they are spoken in; the second is a list of products and the countries they are sold in. The report I generate is each product name followed by the languages its packaging has to be in.

XML1:

```xml
<?xml version="1.0" encoding="utf-8"?>
<languages>
  <language>
    <name>English</name>
    <country>8</country>
    <country>9</country>
    <country>3</country>
    <country>11</country>
    <country>12</country>
  </language>
  <language>
    <name>French</name>
    <country>3</country>
    <country>6</country>
    <country>7</country>
    <country>13</country>
  </language>
  <language>
    <name>Spanish</name>
    <country>1</country>
    <country>2</country>
    <country>3</country>
  </language>
</languages>
```

XML2:

```xml
<?xml version="1.0" encoding="utf-8"?>
<products>
  <product>
    <name>Screws</name>
    <country>3</country>
    <country>12</country>
    <country>29</country>
  </product>
  <product>
    <name>Hammers</name>
    <country>1</country>
    <country>13</country>
  </product>
  <product>
    <name>Ladders</name>
    <country>12</country>
    <country>39</country>
    <country>56</country>
  </product>
  <product>
    <name>Wrenches</name>
    <country>8</country>
    <country>13</country>
    <country>456</country>
  </product>
  <product>
    <name>Levels</name>
    <country>19</country>
    <country>18</country>
    <country>17</country>
  </product>
</products>
```

Sample program output:

```
Screws -> English, French, Spanish
Wrenches -> English, French
Hammers -> French, Spanish
Ladders -> English
```

Levels does not appear because none of its countries (19, 18, 17) is listed in XML1.

Currently I deserialise each file into a DataSet and then use LINQ to join across the datasets to generate the required report strings.
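It may help to see what `DataSet.ReadXml` builds from these files, since the join in the method below depends on it. With no schema supplied, `ReadXml` infers one: the root element becomes the `DataSetName`, each repeated element (`language`, `country`) becomes its own table, a text-only repeated element such as `<country>` gets a `country_Text` column, and nested tables are linked through a generated `language_Id` column plus a `DataRelation`. The sketch below prints what was inferred; it assumes a trimmed copy of XML1, and `InferenceDemo` is a hypothetical class name used only for illustration.

```csharp
using System;
using System.Data;
using System.IO;
using System.Linq;

// Hypothetical demo class; the XML is a trimmed copy of XML1 from the question.
public static class InferenceDemo
{
    public const string LanguagesXml =
        "<languages>" +
        "<language><name>English</name><country>8</country><country>3</country></language>" +
        "<language><name>French</name><country>3</country><country>13</country></language>" +
        "</languages>";

    // No schema is supplied, so ReadXml (XmlReadMode.Auto) infers one from the elements.
    public static DataSet Load(string xml)
    {
        var ds = new DataSet();
        ds.ReadXml(new StringReader(xml));
        return ds;
    }

    public static void Main()
    {
        var ds = Load(LanguagesXml);
        Console.WriteLine("DataSetName: " + ds.DataSetName);
        foreach (DataTable table in ds.Tables)
        {
            var columns = table.Columns.Cast<DataColumn>().Select(c => c.ColumnName);
            Console.WriteLine($"{table.TableName}: {string.Join(", ", columns)} ({table.Rows.Count} rows)");
        }
        // The generated *_Id columns are tied together by an inferred DataRelation.
        foreach (DataRelation relation in ds.Relations)
            Console.WriteLine("Relation: " + relation.RelationName);
    }
}
```

The same inference applied to XML2 is what yields the `product` and `country` tables (linked by `product_Id`) on the product side of the join.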
The method is shown below; the names of the files are passed in as command-line arguments.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;

public static List<String> XMLCombine(String[] args)
{
    var output = new List<String>();
    var dataSets = new List<DataSet>();

    //Load each of the Documents specified in the args
    foreach (var s in args)
    {
        var path = Path.Combine(Environment.CurrentDirectory, s);
        var tempDS = new DataSet();
        try
        {
            tempDS.ReadXml(path);
        }
        catch (Exception)
        {
            //Custom Logging + Error Reporting
            return null;
        }
        dataSets.Add(tempDS);
    }

    //determine order of files submitted
    var productIndex = dataSets[0].DataSetName == "products" ? 0 : 1;
    var languageIndex = dataSets[0].DataSetName == "products" ? 1 : 0;

    var joined = from tProducts in dataSets[productIndex].Tables["product"].AsEnumerable()
                 join tProductCountries in dataSets[productIndex].Tables["country"].AsEnumerable()
                     on (int)tProducts["product_Id"] equals (int)tProductCountries["product_Id"]
                 join tLanguageCountries in dataSets[languageIndex].Tables["country"].AsEnumerable()
                     on (String)tProductCountries["country_Text"] equals (String)tLanguageCountries["country_Text"]
                 join tLanguages in dataSets[languageIndex].Tables["language"].AsEnumerable()
                     on (int)tLanguageCountries["language_Id"] equals (int)tLanguages["language_Id"]
                 select new
                 {
                     Language = tLanguages["name"].ToString(),
                     Product = tProducts["name"].ToString()
                 };

    var listOfProducts = joined.OrderByDescending(_ => _.Product).Select(_ => _.Product).Distinct().ToList();

    foreach (var e in listOfProducts)
    {
        var e1 = e;
        var languages = joined.Where(_ => _.Product == e1).Select(_ => _.Language).Distinct().ToList();
        languages.Sort();
        //Custom simple Array to text method (not shown)
        output.Add(String.Format("{0} {1}", e, ArrayToText(languages)));
    }
    return output;
}
```

This works fine, but I know there must be more optimal solutions to this problem, particularly as the XML files are huge in real life.
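One cost in the method above is visible without changing data structures: `joined` is a deferred LINQ query, so it runs once to build `listOfProducts` and then again for every `joined.Where(...)` call inside the loop, i.e. P + 1 executions of the four-way join for P products. A minimal sketch of grouping the join output once instead follows. `GroupOnceDemo` and `JoinedRows` are hypothetical; `JoinedRows` stands in for the join's output, with tuple element names mirroring the anonymous type's `Product` and `Language` properties, and `" -> "` replaces the unshown `ArrayToText` helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupOnceDemo
{
    // Hypothetical stand-in for the rows the four-way join produces
    // (one row per product/country/language match, duplicates included).
    public static readonly (string Product, string Language)[] JoinedRows =
    {
        ("Screws", "English"), ("Screws", "French"), ("Screws", "Spanish"), ("Screws", "English"),
        ("Hammers", "Spanish"), ("Hammers", "French"),
        ("Ladders", "English"),
        ("Wrenches", "English"), ("Wrenches", "French"),
    };

    // Builds all report lines in a single pass over the join output
    // instead of re-running the query once per product.
    public static List<string> BuildReport(IEnumerable<(string Product, string Language)> joined)
    {
        return joined
            .GroupBy(row => row.Product)
            .OrderByDescending(group => group.Key, StringComparer.Ordinal)
            .Select(group => group.Key + " -> " + string.Join(", ",
                group.Select(row => row.Language)
                     .Distinct()
                     .OrderBy(language => language, StringComparer.Ordinal)))
            .ToList();
    }

    public static void Main()
    {
        foreach (var line in BuildReport(JoinedRows))
            Console.WriteLine(line);
    }
}
```

In the original method the same `GroupBy` chain could be applied to `joined` directly, replacing both `listOfProducts` and the loop; calling `ToList()` on `joined` before the loop would be the smaller change with a similar effect.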
Does anyone have experience with alternative approaches (other than LINQ), or advice on optimising the current approach, that would bring me closer to the best solution?

Many thanks in advance.

Solution

Implementation of the suggested solutions:

Casperah's approach using Dictionaries processed the data set in 312 ms.

yamen's approach using a LINQ Lookup processed the data set in 452 ms.
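The answers' code is not included in this post. As a rough illustration only (not Casperah's or yamen's actual implementation), both ideas share one step: index XML1 once as country code -> languages, then, for each product in XML2, collect the languages of its countries with hash-based lookups instead of a four-table join. This sketch swaps LINQ to XML (`XDocument`) in for `DataSet`; `PackagingReport`, its methods and the embedded sample strings are hypothetical names, and with real files `XDocument.Load(path)` would replace `XDocument.Parse`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Rough sketch of the two suggested ideas, not the answerers' actual code.
// PackagingReport and its members are hypothetical names.
public static class PackagingReport
{
    // Sample data from the question; with real files use XDocument.Load(path).
    public const string LanguagesXml =
        "<languages>" +
        "<language><name>English</name><country>8</country><country>9</country><country>3</country><country>11</country><country>12</country></language>" +
        "<language><name>French</name><country>3</country><country>6</country><country>7</country><country>13</country></language>" +
        "<language><name>Spanish</name><country>1</country><country>2</country><country>3</country></language>" +
        "</languages>";

    public const string ProductsXml =
        "<products>" +
        "<product><name>Screws</name><country>3</country><country>12</country><country>29</country></product>" +
        "<product><name>Hammers</name><country>1</country><country>13</country></product>" +
        "<product><name>Ladders</name><country>12</country><country>39</country><country>56</country></product>" +
        "<product><name>Wrenches</name><country>8</country><country>13</country><country>456</country></product>" +
        "<product><name>Levels</name><country>19</country><country>18</country><country>17</country></product>" +
        "</products>";

    // Dictionary variant: index country code -> languages once, then probe it per product.
    public static List<string> WithDictionary(XDocument languages, XDocument products)
    {
        var byCountry = new Dictionary<string, List<string>>();
        foreach (var language in languages.Root.Elements("language"))
        {
            var name = (string)language.Element("name");
            foreach (var country in language.Elements("country"))
            {
                if (!byCountry.TryGetValue(country.Value, out var names))
                {
                    names = new List<string>();
                    byCountry.Add(country.Value, names);
                }
                names.Add(name);
            }
        }

        var output = new List<string>();
        foreach (var product in products.Root.Elements("product"))
        {
            // SortedSet removes duplicate languages and keeps them in ordinal order.
            var found = new SortedSet<string>(StringComparer.Ordinal);
            foreach (var country in product.Elements("country"))
            {
                if (byCountry.TryGetValue(country.Value, out var names))
                    found.UnionWith(names);
            }
            if (found.Count > 0) // skip products whose countries match no language
                output.Add((string)product.Element("name") + " -> " + string.Join(", ", found));
        }
        return output;
    }

    // ILookup variant: the same index built declaratively with ToLookup.
    public static List<string> WithLookup(XDocument languages, XDocument products)
    {
        ILookup<string, string> byCountry = languages.Root.Elements("language")
            .SelectMany(l => l.Elements("country"),
                        (l, c) => new { Country = c.Value, Language = (string)l.Element("name") })
            .ToLookup(x => x.Country, x => x.Language);

        return products.Root.Elements("product")
            .Select(p => new
            {
                Name = (string)p.Element("name"),
                // A missing key yields an empty sequence, so unknown countries are ignored.
                Languages = p.Elements("country")
                    .SelectMany(c => byCountry[c.Value])
                    .Distinct()
                    .OrderBy(language => language, StringComparer.Ordinal)
                    .ToList()
            })
            .Where(x => x.Languages.Count > 0)
            .Select(x => x.Name + " -> " + string.Join(", ", x.Languages))
            .ToList();
    }

    public static void Main()
    {
        var languages = XDocument.Parse(LanguagesXml);
        var products = XDocument.Parse(ProductsXml);
        foreach (var line in WithDictionary(languages, products)) Console.WriteLine(line);
        foreach (var line in WithLookup(languages, products)) Console.WriteLine(line);
    }
}
```

Both methods should produce the four report lines from the question, in XML2's document order (Screws, Hammers, Ladders, Wrenches), with Levels skipped. The dictionary version fills its index with explicit loops and deduplicates with a `SortedSet`, while the lookup version builds the same index with `ToLookup`; neither builds `DataTable`s or performs a multi-table join.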
 