Python – Import XML into SQL database

pythonsqlxml

I'm working with a 20 gig XML file that I would like to import into a SQL database (preferably MySQL, since that is what I am familiar with). This seems like it would be a common task, but after Googling around a bit I haven't been able to figure out how to do it. What is the best way to do this?

I know this ability is built into MySQL 6.0, but that is not an option right now because it is an alpha development release.

Also, if I have to do any scripting I would prefer to use Python because that's what I am most familiar with.

Thanks.

Best Solution

You can use the getiterator() function to iterate over the XML file without parsing the whole thing at once. You can do this with ElementTree, which is included in the standard library, or with lxml.

for record in root.getiterator('record'):
    add_element_to_database(record) # Depends on your database interface.
                                    # I recommend SQLAlchemy.