Trouble with Python inserting UTF characters from XML into MySQL

gradystewart35

So I'm looping through multiple large XML files and generating MySQL insert statements to add rental property listings to a database. The problem is that a number of elements contain special (non-ASCII) characters, including some dashes and bullets. I can get the elements just fine, and I can build a string to hold the insert statement, but as soon as I try to execute the statement I get dumped out to the next file. I've put the insert in its own try block, thinking that would let me move on to the next listing rather than scrap the remainder of the XML document, but that's not happening. I've tried making sure the insert is UTF-8 encoded, but it makes no difference. Here is the gist of the code I've got:

```python
try:
    print "About to read file: "+fullpath
    data = f.read() #read the file into a string
    print "Data read from file, now closing: "+fullpath
    f.close() #close the file, we don't need it any more
    dom = minidom.parseString(data) #parse the xml
    #get the first child node -- <property_data>
    property_data = dom.firstChild
    properties = property_data.getElementsByTagName('property')
    for property in properties:
        try:
            print "getting details"
            details = property.getElementsByTagName('property_details')
            for detail in details:
                print "attempting to get detail values"
                try:
                    checkin = getElementValue('check_in', detail)
                    name = stripCDATA(getElementValue('name', detail))
                    checkout = getElementValue('check_out', detail)
                    ...etc, etc...
                    print "building insert string"
                    sql = u"""insert into PROPERTY(NAME, CHECKIN, CHECKOUT, etc...) values(%s,%s,%s,...)""".encode('utf-8')
                    print "starting insert with query:"
                    print sql % (name,checkin,checkout, etc...)
                    try: #HERE IS WHERE THE PROBLEM HAPPENS
                        cursor.execute(sql,(name, checkin, checkout, ...))
                        #display number of rows affected
                        print "Number of rows inserted: %d" % cursor.rowcount
                        conn.commit()
                    except Exception as (errno, strerror):
                        print "Problem inserting the property. Error({0}): {1}".format(errno, strerror)
                except Exception as (errno, strerror):
                    print "Problem with reading/inserting details. Error({0}): {1}".format(errno, strerror)
        except Exception as (errno, strerror):
            print "The loop broke with the following error({0}): {1}".format(errno, strerror)
            errCount += 1
            print "This has happened %d times" % (errCount)
except: #HERE IS WHERE I GET DUMPED TO
    print "Something bad happened while reading and inserting"
```

As you can see, I've got lines printing at various points so I can see when things blow up. I know it's parsing the file correctly, getting all my elements correctly, and building the insert statement correctly, and as long as I hit a property with no special characters in any of the elements I grab, it inserts into the database correctly. It's only when it hits a special character that it breaks, and when it breaks it dumps me out 3 levels higher than it should. Yelling and pulling my hair out have been ineffective so far.
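A likely reason the failure jumps exactly three levels: the three inner handlers all use `except Exception as (errno, strerror)`, which assumes the exception carries exactly two args. A `UnicodeEncodeError` carries five (encoding, object, start, end, reason), so the unpacking inside the handler itself raises a `ValueError`. That new error escapes to the next handler out, which fails to unpack it the same way, and so on until the bare `except:` catches it. The sketch below is written for Python 3 (which no longer allows tuple unpacking in `except`); it reproduces the unpacking failure and shows a per-listing loop whose handler reports the exception without unpacking it. `insert_listing` is a hypothetical stand-in for `cursor.execute` that encodes with latin-1, as the connection in the traceback does.

```python
def report(exc):
    """Format any exception without assuming how many args it carries."""
    # exc.args varies by type: an OSError may carry (errno, strerror),
    # but a UnicodeEncodeError carries five items.
    return "{}: {}".format(type(exc).__name__, exc)


def insert_listing(value):
    """Hypothetical stand-in for cursor.execute(): encodes like a latin-1 connection."""
    value.encode("latin-1")  # raises UnicodeEncodeError for an en dash (U+2013)


def load_listings(values):
    inserted, failed = [], []
    for value in values:
        try:
            insert_listing(value)
            inserted.append(value)
        except Exception as exc:
            # Report and move on to the next listing; nothing is unpacked here,
            # so the handler itself cannot raise and escape to an outer level.
            failed.append(report(exc))
    return inserted, failed


# The failure mode in the question: unpacking a 5-item args tuple into two names.
try:
    "\u2013".encode("latin-1")
except UnicodeEncodeError as exc:
    print(len(exc.args))  # 5
    try:
        errno, strerror = exc.args
    except ValueError:
        print("the handler itself raises ValueError")

inserted, failed = load_listings(["Cottage", "Beach house \u2013 sleeps 6", "Loft"])
print(inserted)     # ['Cottage', 'Loft']
print(len(failed))  # 1
```

With a handler like `report`, a bad listing is logged and skipped, and the loop carries on with the rest of the file.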
Any ideas?

As per the suggestion from @deadly, I removed all the try...except blocks and got the following traceback:

```
Traceback (most recent call last):
  File "dbinsert2.py", line 118, in
    cursor.execute(sql,([bunch of var names]))
  File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 159, in execute
    query = query % db.literal(args)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 264, in literal
    return self.escape(o, self.encoders)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 202, in unicode_literal
    return db.literal(u.encode(unicode_literal.charset))
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013' in position 20: ordinal not in range(256)
```
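The traceback points at the underlying cause: MySQLdb encodes each unicode parameter separately in `unicode_literal`, using the connection's character set, which here is latin-1. Encoding the SQL template with `.encode('utf-8')` therefore has no effect on the values, and `u'\u2013'` (an en dash) has no latin-1 representation. A common fix is to open the connection with `charset='utf8'` and `use_unicode=True`, both keyword arguments of `MySQLdb.connect`, and to make sure the target table and columns use a UTF-8 character set as well. The sketch below, in Python 3, only demonstrates the encoding difference and builds the connection keywords; the host, user, password and database values are hypothetical placeholders, and the `connect` call is commented out because it needs a running server.

```python
def can_encode(text, charset):
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


listing_name = "Beach house \u2013 sleeps 6"  # contains an en dash (U+2013)

print(can_encode(listing_name, "latin-1"))  # False: latin-1 has no en dash
print(can_encode(listing_name, "utf-8"))    # True
print(listing_name.encode("utf-8")[12:15])  # b'\xe2\x80\x93'

# Keyword arguments for MySQLdb.connect(); only charset and use_unicode
# matter here. Host, user, password and database are hypothetical placeholders.
DB_SETTINGS = {
    "host": "localhost",
    "user": "listings_user",
    "passwd": "change-me",
    "db": "rentals",
    "charset": "utf8",   # or "utf8mb4" where both server and driver support it
    "use_unicode": True,
}

# import MySQLdb
# conn = MySQLdb.connect(**DB_SETTINGS)
```

If the `PROPERTY` table was created as latin1, its columns may also need converting (for example with `ALTER TABLE ... CONVERT TO CHARACTER SET utf8`); otherwise the server can still reject or mangle characters that the driver now sends correctly.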
 