Rails Parsing a large XML with Nokogiri::XML::Reader => Model.create

TodyGoaxogy

New Member
I have plenty of large (32 MB) XML files with product information from different stores. I am using Rails, hosted on Heroku. I want to parse these XML feeds and write the products into my database. I have a semi-working solution, but it is very slow and too memory-intensive. Until now I have been using more or less this:

\[code\]
open_uri_fetched = open(xml_from_url)
xml_list = Nokogiri::HTML(open_uri_fetched)
xml_list.xpath("//product").each do |product|
  # parsed nodes
  # Model.create()
end
\[/code\]

This has worked to some extent. However, it causes memory problems on Heroku, which crash the script. It is also VERY slow (I do this for 200+ feeds). Heroku told me to fix the problem by using Nokogiri::XML::Reader, which is what I am trying to do now.

I have also looked into using:

\[code\]
ActiveRecord::Base.transaction do
  Model.create()
end
\[/code\]

to speed up the Model.create() process.

Question 1: Is this the right way (or at least a decent way) to approach my problem?

Now, this is what I am trying to do:

\[code\]
reader = Nokogiri::XML::Reader(File.open('this_feed.xml'))
reader.each do |node|
  if node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
    if node.name.downcase == xname
      puts "Name: " + node.inner_xml
      use_name = node.inner_xml
    end
  end
end
\[/code\]

Question 2: Where do I put the Model.create code?

\[code\]
ActiveRecord::Base.transaction do
  Model.create(:name => use_name)
end
\[/code\]

If I put it in the loop, it will try to create a record for each node, which is wrong.
I want it to be called once after each product in the XML list, right? If I build up a Hash while reading the XML (and then use it for the Model.create calls), won't that be just as memory-intensive as opening the whole XML file in the first place?

The XML file looks, in short, like this:

\[code\]
<?xml version="1.0" encoding="UTF-8" ?>
<products>
  <product>
    <name>This cool product</name>
    <categories>
      <category>Food</category>
      <category>Drinks</category>
    </categories>
    <SKU />
    <EAN />
    <description>A long description...</description>
    <model />
    <brand />
    <gender />
    <price>126.00</price>
    <regularPrice>126.00</regularPrice>
    <shippingPrice />
    <currency>SEK</currency>
    <productUrl>http://www.domain.com/1.html</productUrl>
    <graphicUrl>http://www.domain.com/1.jpg</graphicUrl>
    <inStock />
    <inStockQty />
    <deliveryTime />
  </product>
</products>
\[/code\]
 