How Do Convert a Pandas Dataframe to Xml

How do convert a pandas dataframe to XML?

You can create a function that creates the item node from a row in your DataFrame:

def func(row):
    xml = ['<item>']
    for field in row.index:
        xml.append('  <field name="{0}">{1}</field>'.format(field, row[field]))
    xml.append('</item>')
    return '\n'.join(xml)

And then apply the function along the axis=1.

>>> print '\n'.join(df.apply(func, axis=1))
<item>
  <field name="field_1">cat</field>
  <field name="field_2">15,263</field>
  <field name="field_3">2.52</field>
  <field name="field_4">00:03:00</field>
</item>
<item>
  <field name="field_1">dog</field>
  <field name="field_2">1,652</field>
  <field name="field_3">3.71</field>
  <field name="field_4">00:03:47</field>
</item>
...

Converting pandas dataframe to XML

The question you have linked to actually has a great answer to your question but I guess you’re having difficulty transposing your data into that solution so Ive done it below for you.

Ok your level of detail is a bit sketchy. If your specific situation differs slightly then you'll need to tweak my answer but heres something that works for me:

First off assuming you have a text file as follows :

0       Alahärmä  2014      0   0.1   0.2
1      Alajärvi  2014  10171  5102  5069
2      Alastaro  2014      0     0     0
3     Alavieska  2014   2687  1400  1287
4        Alavus  2014  12103  6102  6001
5  Anjalankoski  2014      0     0     0

Moving on to creating the python script, we first import that text file using the following line:

pop = pd.read_csv(r'directory\population.csv', delimiter=r"\s+", names=['cityname', 'year', 'total', 'male', 'females'])

This brings in the text file as a dataframe and gives the new dataframe the correct column headers.

Then taking the data from the question you linked to, we add the following to our python script:

def func(row):
    xml = ['<item>']
    for field in row.index:
        xml.append('  <field name="{0}">{1}</field>'.format(field, row[field]))
    xml.append('</item>')
    return '\n'.join(xml)

print('\n'.join(pop.apply(func, axis=1)))

Now we put it all together and we get the below:

import pandas as pd
pop = pd.read_csv(r'directory\population.csv', delimiter=r"\s+", names=['cityname', 'year', 'total', 'male', ‘females'])

def func(row):
    xml = ['<item>']
    for field in row.index:
        xml.append('  <field name="{0}">{1}</field>'.format(field, row[field]))
    xml.append('</item>')
    return '\n'.join(xml)

print('\n'.join(pop.apply(func, axis=1)))

When we run the above file we get the following output:

<item>
  <field name="cityname">Alahärmä</field>
  <field name="year">2014</field>
  <field name="total">0</field>
  <field name="male">0.1</field>
  <field name="females">0.2</field>
</item>
<item>
  <field name="cityname">Alajärvi</field>
  <field name="year">2014</field>
  <field name="total">10171</field>
  <field name="male">5102.0</field>
  <field name="females">5069.0</field>
</item>
<item>
  <field name="cityname">Alastaro</field>
  <field name="year">2014</field>
  <field name="total">0</field>
  <field name="male">0.0</field>
  <field name="females">0.0</field>
</item>
<item>
  <field name="cityname">Alavieska</field>
  <field name="year">2014</field>
  <field name="total">2687</field>
  <field name="male">1400.0</field>
  <field name="females">1287.0</field>
</item>
<item>
  <field name="cityname">Alavus</field>
  <field name="year">2014</field>
  <field name="total">12103</field>
  <field name="male">6102.0</field>
  <field name="females">6001.0</field>
</item>
<item>
  <field name="cityname">Anjalankoski</field>
  <field name="year">2014</field>
  <field name="total">0</field>
  <field name="male">0.0</field>
  <field name="females">0.0</field>
</item>

Pandas DataFrame to XML with added data

Single-liner func:

def func(df, width, height, depth):
    return '<annotations>\n'+f'<width>{width}</width>\n<height>{height}</height>\n<depth>{depth}</depth>\n'+df.apply(lambda row:f'<item>\n<xmin>{row.xmin}</xmin>\n<ymin>{row.ymin}</ymin>\n<xmax>{row.xmax}</xmax>\n<ymax>{row.ymax}</ymax>\n</item>\n',axis=1).str.cat()+'\n</annotations>'

Concatenating strings with + and using a map-reduce approach to the dataframe with apply and cat. Apply will build each dataframe row and transform it to a string equivalent to the <item> tag, and str.cat() will concatenate each line (also renamed the input parameter row to df)

Python Pandas Dataframe to XML

First of all get rid of the Series tags in your to_xml method:

def to_xml(df, filename=None, mode='w'):
    def row_to_xml(row):
        date = row.TIMESTAMP.split()[0]
        time = row.TIMESTAMP.split()[1]
        value = row.A
        xml = '<event date="{0}" time="{1}" value="{2}"></event>'.format(date, time, value)
        return xml
    res = ' '.join(df.apply(row_to_xml, axis=1))

    if filename is None:
        return res
    with open(filename, mode) as f:
        f.write(res)

Then you can create your XML tree like this:

*from xml.dom.minidom import parseString 

doc = Document()
base = doc.createElement('Timeseries')
doc.appendChild(base)
series = parseString('<Series>' + to_xml(df) + '</Series>').childNodes[0]
base.appendChild(series)

base.appendChild(series)

header = doc.createElement('Header')
series.appendChild(header)

type = doc.createElement('type')
type_content = doc.createTextNode('instantaneous')
type.appendChild(type_content)
header.appendChild(type)

timeStepElem = doc.createElement('timeStep')
timeStepElem.setAttribute ('unit','minute')
timeStepElem.setAttribute ('multiplier','5')
header.appendChild(timeStepElem)

startDateElem = doc.createElement('startDate')
startDateElem.setAttribute ('time','13:30:00')
startDateElem.setAttribute ('date','2015-06-24')
header.appendChild(startDateElem)
print(doc.toprettyxml())*

Output:

<?xml version="1.0" ?>
<Timeseries>
        <Series>
                <event date="2015/03/09" time="15:40" value="52.2885"/>

                <event date="2015/03/09" time="15:50" value="52.3277"/>

                <event date="2015/03/09" time="16:00" value="52.5045"/>

                <event date="2015/03/09" time="16:10" value="52.5702"/>

                <event date="2015/03/09" time="16:20" value="52.5608"/>
                <Header>
                        <type>instantaneous</type>
                        <timeStep multiplier="5" unit="minute"/>
                        <startDate date="2015-06-24" time="13:30:00"/>
                </Header>
        </Series>
</Timeseries>

Creating an XML File from a dataframe using Python and LXML

One solution is to iterate over the dataframe, populating each XML element individually:

import pandas as pd
from lxml import etree as et

df = pd.DataFrame({'id_profile': [439, 444654, 56454, 12222], 'ServiceDate':
['2017-12-05', '2017-01-25', '2017-12-05', '2017-01-25'],
'PrimaryServiceCategory': [25, 25, 33, 25]})

root = et.Element('root')

for row in df.iterrows():
    report = et.SubElement(root, 'ClientReport')
    idnum = et.SubElement(report, 'ID')
    prime_serv = et.SubElement(report, 'ServiceCategory')
    serv_date = et.SubElement(report, 'ServiceDate')
    idnum.text = str(row[1]['id_profile'])
    prime_serv.text = str(row[1]['PrimaryServiceCategory'])
    serv_date.text = str(row[1]['ServiceDate'])

print(et.tostring(root, pretty_print=True).decode('utf-8'))

Converting a XML-file with data saved as vectors to a pandas dataframe

The key is to split the text -> this converts it into a list, that Pandas can internally convert to a Series/Column.

data = """<Hospital>
     ...:     <HospitalClass name = "St. Mungo's Hospital for Magical Maladies">
     ...:         <dataStorage id ="3" class="UnitVector">
     ...:             <UnitVector name="numHospitalized">
     ...:                 <Data> 3; 5; 1; 2; 6; 9; 8  </Data>
     ...:             </UnitVector>
     ...:         </dataStorage>
     ...:         <dataStorage id ="1" class="UnitVector">
     ...:             <UnitVector name="numOperated">
     ...:                 <Data> 5; 0; 12; 8; 4; 5; 7</Data>
     ...:             </UnitVector>
     ...:         </dataStorage>
     ...:         <dataStorage id = "2" class ="UnitVector">
     ...:             <UnitVector name="antibioticsUsed">
     ...:                 <Data> 4.54; 5.71; nan; 7.12; 8.75; 2.99; 4.94</Data>
     ...:             </UnitVector>
     ...:         </dataStorage>
     ...:     </HospitalClass>
     ...: </Hospital>"""

import xml.etree.ElementTree as ET
root = ET.fromstring(data)

Collect data into a dictionary:

collection = {}
for entry in root.findall(".//UnitVector"):
    key = entry.attrib['name']
    values = entry.find(".Data").text.split(";")
    collection[key] = values

Create the dataframe:

pd.DataFrame(collection)
 
  numHospitalized numOperated antibioticsUsed
0               3           5            4.54
1               5           0            5.71
2               1          12             nan
3               2           8            7.12
4               6           4            8.75
5               9           5            2.99
6               8           7            4.94

Your data is going to be strings -> you can convert to int/float with astype

Python convert CSV to XML with pandas groupby

ElementTree can do it without a trouble. See below

from collections import defaultdict
from xml.etree.ElementTree import Element, SubElement,ElementTree

data = defaultdict(list)

with open('in.txt') as f:
  next(f)
  for line in f:
    parts = line.split(',')
    data[parts[0]].append(parts[1].strip())

root = Element('root')
for k,v in data.items():
  sub = SubElement(root,'coupon-codes',attrib={'coupon-id':k})
  for vv in v:
    sub_v = SubElement(sub,'code')
    sub_v.text = vv

tree = ElementTree(root)
with open('out.xml', 'w') as f:
  tree.write(f, encoding='unicode')

Creating XML values from values in Pandas Dataframe

You had the right idea with ElementTree

from io import StringIO
import pandas as pd
from xml.etree import ElementTree


# in-place prettyprint formatter
# https://web.archive.org/web/20200703234431/http://effbot.org/zone/element-lib.htm
def indent(elem, level=0):
    i = "\n" + level * "  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for elem in elem:
            indent(elem, level + 1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i


data = """\
Id,Email,State,Country,LastName
Mjkx,sealover71@yahoo.com,CA,United States,Withers
Asdf,foo@yahoo.com,IL,United States,Capone
Qwer,bar@yahoo.com,NV,United States,Sinatra
"""

f = StringIO(data)
df = pd.read_csv(f)

#print(df)

root = ElementTree.Element('crc')

for _, r in df.iterrows():
    lead = ElementTree.SubElement(root, 'lead')

    for c in df.columns:
        e = ElementTree.SubElement(lead, c)
        e.text = str(r[c])

indent(root)
x = ElementTree.tostring(root)

print(x.decode('UTF-8'))

How do I convert an XML file to a pandas dataframe?

I would suggest that you pull all the data into dictionaries, and do the final work in the dataframe. More efficient, than individually creating a series and appending.

The solution I propose below gets the id and name separately into a dictionary(defaultdict), while pulling the plot summary into a different dictionary(mapping).

After that, you can convert to pandas data structures and merge.

from collections import defaultdict
data = defaultdict(list)
mapping = {}

In [142]: for entry in root:
     ...:     data['id'].append(entry.attrib['id'])
     ...:     data['name'].append(entry.attrib['name'])
     ...:     for ent in entry.findall("./info"):
     ...:         if ent.attrib['type'] == "Plot Summary":
     ...:             mapping[entry.attrib['id']] = ent.text


In [150]: pd.DataFrame(data).merge(pd.Series(mapping, name='plot_summary'), 
                                   left_on='id', 
                                   right_index=True, 
                                   how='left')
Out[150]: 
       id                                               name                                       plot_summary
0   24235                        Love After World Domination                                                NaN
1   24233                          Himitsu Kessha Yaruminati                                                NaN
2   24232                          Enman Kaiketsu! Enma-chan                                                NaN
3   24231                          Zenryoku Kaihi Flag-chan!                                                NaN
4   24230                               Konketsu no Karekore                                                NaN
5   24229                                      Teikō Penguin                                                NaN
6   24227                                      Black Channel                                                NaN
7   24225                                    She's My Knight  Haruma Ichinose, 17, has been popular since he...
8   24224                                Watanuki-san Chi to                                                NaN
9   24223                                Watanuki-san Chi no                                                NaN
10  24222                                    Tiger & Bunny 2                                                NaN
11  24220                                          Super Cub                                                NaN
12  24218                                           FUUTO PI                                                NaN
13  24217                                        Fūto Tantei                                                NaN
14  24216                                       Inō no Aicis                                                NaN
15  24215                     Gyakuten Sekai no Denchi Shōjo                                                NaN
16  24214                                     Eiga Yurukyan△                                                NaN
17  24213                            Re:cycle of Penguindrum                                                NaN
18  24212            That Time I Got Reincarnated as a Slime                                                NaN
19  24211                                Wonder Egg Priority                                                NaN
20  24210                                 Dosukoi Sushi-Zumō                                                NaN
21  24209          Motto! Majime ni Fumajime Kaiketsu Zorori                                                NaN
22  24208                                     Pui Pui Molcar                                                NaN
23  24207                              Case Study of Vanitas                                                NaN
24  24206                                              HOME!                                                NaN
25  24205                         Hachimitsu Suicide Machine                                                NaN
26  24204  Deliver Police: Nishitokyo-shi Deliver Keisats...                                                NaN
27  24203                               Ryūsatsu no Kyōkotsu                                                NaN
28  24202                          Muteking the Dancing Hero                                                NaN
29  24201                                      World Trigger                                                NaN
30  24200  Gekijō-ban Utano☆Princesama♪ Maji Love ST☆RISH...                                                NaN
31  24199  My Hero Academia THE MOVIE: World Heroes' Mission                                                NaN
32  24198                            Vampire Dies in No Time                                                NaN
33  24196                                      Visual Prison                                                NaN
34  24195                               IDOLiSH7 Third Beat!  Kujo starts carrying out his plans to defame G...
35  24194                                   Jujutsu Kaisen 0                                                NaN
36  24193                        Gekijō-ban Jujutsu Kaisen 0                                                NaN
37  24192                                           takt op.                                                NaN
38  24191        She Professed Herself Pupil of the Wise Man                                                NaN
39  24189                             Akebi's Sailor Uniform                                                NaN
40  24187                                     Love and Heart  Sure, university freshman Yagisawa has a lot o...
41  24186                                   Do It Yourself!!                                                NaN
42  24185                                   Ningen Kaishūsha                                                NaN
43  24183                           Kanashiki Debu Neko-chan                                                NaN
44  24182                    Ikinuke! Bakusō! Kusohamu-chan!                                                NaN
45  24180                                        Kaiju No. 8                                                NaN
46  24179                                       Phantom Seer                                                NaN
47  24178                      Magu-chan: God of Destruction  The God of Destruction Magu Menueku has been s...
48  24177                                           i tell c                                                NaN
49  24176                 High School Family: Kokosei Kazoku                                                NaN

How Do Convert a Pandas Dataframe to Xml