How Do Convert a Pandas Dataframe to Xml

How do convert a pandas dataframe to XML?

You can create a function that creates the item node from a row in your DataFrame:

def func(row):
xml = ['<item>']
for field in row.index:
xml.append(' <field name="{0}">{1}</field>'.format(field, row[field]))
xml.append('</item>')
return '\n'.join(xml)

And then apply the function along the axis=1.

>>> print '\n'.join(df.apply(func, axis=1))
<item>
<field name="field_1">cat</field>
<field name="field_2">15,263</field>
<field name="field_3">2.52</field>
<field name="field_4">00:03:00</field>
</item>
<item>
<field name="field_1">dog</field>
<field name="field_2">1,652</field>
<field name="field_3">3.71</field>
<field name="field_4">00:03:47</field>
</item>
...

Converting pandas dataframe to XML

The question you have linked to actually has a great answer to your question but I guess you’re having difficulty transposing your data into that solution so Ive done it below for you.

Ok your level of detail is a bit sketchy. If your specific situation differs slightly then you'll need to tweak my answer but heres something that works for me:

First off assuming you have a text file as follows :

0       Alahärmä  2014      0   0.1   0.2
1 Alajärvi 2014 10171 5102 5069
2 Alastaro 2014 0 0 0
3 Alavieska 2014 2687 1400 1287
4 Alavus 2014 12103 6102 6001
5 Anjalankoski 2014 0 0 0

Moving on to creating the python script, we first import that text file using the following line:

pop = pd.read_csv(r'directory\population.csv', delimiter=r"\s+", names=['cityname', 'year', 'total', 'male', 'females'])

This brings in the text file as a dataframe and gives the new dataframe the correct column headers.

Then taking the data from the question you linked to, we add the following to our python script:

def func(row):
xml = ['<item>']
for field in row.index:
xml.append(' <field name="{0}">{1}</field>'.format(field, row[field]))
xml.append('</item>')
return '\n'.join(xml)

print('\n'.join(pop.apply(func, axis=1)))

Now we put it all together and we get the below:

import pandas as pd
pop = pd.read_csv(r'directory\population.csv', delimiter=r"\s+", names=['cityname', 'year', 'total', 'male', ‘females'])

def func(row):
xml = ['<item>']
for field in row.index:
xml.append(' <field name="{0}">{1}</field>'.format(field, row[field]))
xml.append('</item>')
return '\n'.join(xml)

print('\n'.join(pop.apply(func, axis=1)))

When we run the above file we get the following output:

<item>
<field name="cityname">Alahärmä</field>
<field name="year">2014</field>
<field name="total">0</field>
<field name="male">0.1</field>
<field name="females">0.2</field>
</item>
<item>
<field name="cityname">Alajärvi</field>
<field name="year">2014</field>
<field name="total">10171</field>
<field name="male">5102.0</field>
<field name="females">5069.0</field>
</item>
<item>
<field name="cityname">Alastaro</field>
<field name="year">2014</field>
<field name="total">0</field>
<field name="male">0.0</field>
<field name="females">0.0</field>
</item>
<item>
<field name="cityname">Alavieska</field>
<field name="year">2014</field>
<field name="total">2687</field>
<field name="male">1400.0</field>
<field name="females">1287.0</field>
</item>
<item>
<field name="cityname">Alavus</field>
<field name="year">2014</field>
<field name="total">12103</field>
<field name="male">6102.0</field>
<field name="females">6001.0</field>
</item>
<item>
<field name="cityname">Anjalankoski</field>
<field name="year">2014</field>
<field name="total">0</field>
<field name="male">0.0</field>
<field name="females">0.0</field>
</item>

Pandas DataFrame to XML with added data

Single-liner func:

def func(df, width, height, depth):
return '<annotations>\n'+f'<width>{width}</width>\n<height>{height}</height>\n<depth>{depth}</depth>\n'+df.apply(lambda row:f'<item>\n<xmin>{row.xmin}</xmin>\n<ymin>{row.ymin}</ymin>\n<xmax>{row.xmax}</xmax>\n<ymax>{row.ymax}</ymax>\n</item>\n',axis=1).str.cat()+'\n</annotations>'

Concatenating strings with + and using a map-reduce approach to the dataframe with apply and cat. Apply will build each dataframe row and transform it to a string equivalent to the <item> tag, and str.cat() will concatenate each line (also renamed the input parameter row to df)

Python Pandas Dataframe to XML

First of all get rid of the Series tags in your to_xml method:

def to_xml(df, filename=None, mode='w'):
def row_to_xml(row):
date = row.TIMESTAMP.split()[0]
time = row.TIMESTAMP.split()[1]
value = row.A
xml = '<event date="{0}" time="{1}" value="{2}"></event>'.format(date, time, value)
return xml
res = ' '.join(df.apply(row_to_xml, axis=1))

if filename is None:
return res
with open(filename, mode) as f:
f.write(res)

Then you can create your XML tree like this:

*from xml.dom.minidom import parseString 

doc = Document()
base = doc.createElement('Timeseries')
doc.appendChild(base)
series = parseString('<Series>' + to_xml(df) + '</Series>').childNodes[0]
base.appendChild(series)

base.appendChild(series)

header = doc.createElement('Header')
series.appendChild(header)

type = doc.createElement('type')
type_content = doc.createTextNode('instantaneous')
type.appendChild(type_content)
header.appendChild(type)

timeStepElem = doc.createElement('timeStep')
timeStepElem.setAttribute ('unit','minute')
timeStepElem.setAttribute ('multiplier','5')
header.appendChild(timeStepElem)

startDateElem = doc.createElement('startDate')
startDateElem.setAttribute ('time','13:30:00')
startDateElem.setAttribute ('date','2015-06-24')
header.appendChild(startDateElem)
print(doc.toprettyxml())*

Output:

<?xml version="1.0" ?>
<Timeseries>
<Series>
<event date="2015/03/09" time="15:40" value="52.2885"/>

<event date="2015/03/09" time="15:50" value="52.3277"/>

<event date="2015/03/09" time="16:00" value="52.5045"/>

<event date="2015/03/09" time="16:10" value="52.5702"/>

<event date="2015/03/09" time="16:20" value="52.5608"/>
<Header>
<type>instantaneous</type>
<timeStep multiplier="5" unit="minute"/>
<startDate date="2015-06-24" time="13:30:00"/>
</Header>
</Series>
</Timeseries>

Creating an XML File from a dataframe using Python and LXML

One solution is to iterate over the dataframe, populating each XML element individually:

import pandas as pd
from lxml import etree as et

df = pd.DataFrame({'id_profile': [439, 444654, 56454, 12222], 'ServiceDate':
['2017-12-05', '2017-01-25', '2017-12-05', '2017-01-25'],
'PrimaryServiceCategory': [25, 25, 33, 25]})

root = et.Element('root')

for row in df.iterrows():
report = et.SubElement(root, 'ClientReport')
idnum = et.SubElement(report, 'ID')
prime_serv = et.SubElement(report, 'ServiceCategory')
serv_date = et.SubElement(report, 'ServiceDate')
idnum.text = str(row[1]['id_profile'])
prime_serv.text = str(row[1]['PrimaryServiceCategory'])
serv_date.text = str(row[1]['ServiceDate'])

print(et.tostring(root, pretty_print=True).decode('utf-8'))

Converting a XML-file with data saved as vectors to a pandas dataframe

The key is to split the text -> this converts it into a list, that Pandas can internally convert to a Series/Column.

data = """<Hospital>
...: <HospitalClass name = "St. Mungo's Hospital for Magical Maladies">
...: <dataStorage id ="3" class="UnitVector">
...: <UnitVector name="numHospitalized">
...: <Data> 3; 5; 1; 2; 6; 9; 8 </Data>
...: </UnitVector>
...: </dataStorage>
...: <dataStorage id ="1" class="UnitVector">
...: <UnitVector name="numOperated">
...: <Data> 5; 0; 12; 8; 4; 5; 7</Data>
...: </UnitVector>
...: </dataStorage>
...: <dataStorage id = "2" class ="UnitVector">
...: <UnitVector name="antibioticsUsed">
...: <Data> 4.54; 5.71; nan; 7.12; 8.75; 2.99; 4.94</Data>
...: </UnitVector>
...: </dataStorage>
...: </HospitalClass>
...: </Hospital>"""

import xml.etree.ElementTree as ET
root = ET.fromstring(data)

Collect data into a dictionary:

collection = {}
for entry in root.findall(".//UnitVector"):
key = entry.attrib['name']
values = entry.find(".Data").text.split(";")
collection[key] = values

Create the dataframe:

pd.DataFrame(collection)

numHospitalized numOperated antibioticsUsed
0 3 5 4.54
1 5 0 5.71
2 1 12 nan
3 2 8 7.12
4 6 4 8.75
5 9 5 2.99
6 8 7 4.94

Your data is going to be strings -> you can convert to int/float with astype

Python convert CSV to XML with pandas groupby

ElementTree can do it without a trouble. See below

from collections import defaultdict
from xml.etree.ElementTree import Element, SubElement,ElementTree

data = defaultdict(list)

with open('in.txt') as f:
next(f)
for line in f:
parts = line.split(',')
data[parts[0]].append(parts[1].strip())

root = Element('root')
for k,v in data.items():
sub = SubElement(root,'coupon-codes',attrib={'coupon-id':k})
for vv in v:
sub_v = SubElement(sub,'code')
sub_v.text = vv

tree = ElementTree(root)
with open('out.xml', 'w') as f:
tree.write(f, encoding='unicode')

Creating XML values from values in Pandas Dataframe

You had the right idea with ElementTree

from io import StringIO
import pandas as pd
from xml.etree import ElementTree


# in-place prettyprint formatter
# https://web.archive.org/web/20200703234431/http://effbot.org/zone/element-lib.htm
def indent(elem, level=0):
i = "\n" + level * " "
if len(elem):
if not elem.text or not elem.text.strip():
elem.text = i + " "
if not elem.tail or not elem.tail.strip():
elem.tail = i
for elem in elem:
indent(elem, level + 1)
if not elem.tail or not elem.tail.strip():
elem.tail = i
else:
if level and (not elem.tail or not elem.tail.strip()):
elem.tail = i


data = """\
Id,Email,State,Country,LastName
Mjkx,sealover71@yahoo.com,CA,United States,Withers
Asdf,foo@yahoo.com,IL,United States,Capone
Qwer,bar@yahoo.com,NV,United States,Sinatra
"""

f = StringIO(data)
df = pd.read_csv(f)

#print(df)

root = ElementTree.Element('crc')

for _, r in df.iterrows():
lead = ElementTree.SubElement(root, 'lead')

for c in df.columns:
e = ElementTree.SubElement(lead, c)
e.text = str(r[c])

indent(root)
x = ElementTree.tostring(root)

print(x.decode('UTF-8'))

How do I convert an XML file to a pandas dataframe?

I would suggest that you pull all the data into dictionaries, and do the final work in the dataframe. More efficient, than individually creating a series and appending.

The solution I propose below gets the id and name separately into a dictionary(defaultdict), while pulling the plot summary into a different dictionary(mapping).

After that, you can convert to pandas data structures and merge.

from collections import defaultdict
data = defaultdict(list)
mapping = {}

In [142]: for entry in root:
...: data['id'].append(entry.attrib['id'])
...: data['name'].append(entry.attrib['name'])
...: for ent in entry.findall("./info"):
...: if ent.attrib['type'] == "Plot Summary":
...: mapping[entry.attrib['id']] = ent.text


In [150]: pd.DataFrame(data).merge(pd.Series(mapping, name='plot_summary'),
left_on='id',
right_index=True,
how='left')
Out[150]:
id name plot_summary
0 24235 Love After World Domination NaN
1 24233 Himitsu Kessha Yaruminati NaN
2 24232 Enman Kaiketsu! Enma-chan NaN
3 24231 Zenryoku Kaihi Flag-chan! NaN
4 24230 Konketsu no Karekore NaN
5 24229 Teikō Penguin NaN
6 24227 Black Channel NaN
7 24225 She's My Knight Haruma Ichinose, 17, has been popular since he...
8 24224 Watanuki-san Chi to NaN
9 24223 Watanuki-san Chi no NaN
10 24222 Tiger & Bunny 2 NaN
11 24220 Super Cub NaN
12 24218 FUUTO PI NaN
13 24217 Fūto Tantei NaN
14 24216 Inō no Aicis NaN
15 24215 Gyakuten Sekai no Denchi Shōjo NaN
16 24214 Eiga Yurukyan△ NaN
17 24213 Re:cycle of Penguindrum NaN
18 24212 That Time I Got Reincarnated as a Slime NaN
19 24211 Wonder Egg Priority NaN
20 24210 Dosukoi Sushi-Zumō NaN
21 24209 Motto! Majime ni Fumajime Kaiketsu Zorori NaN
22 24208 Pui Pui Molcar NaN
23 24207 Case Study of Vanitas NaN
24 24206 HOME! NaN
25 24205 Hachimitsu Suicide Machine NaN
26 24204 Deliver Police: Nishitokyo-shi Deliver Keisats... NaN
27 24203 Ryūsatsu no Kyōkotsu NaN
28 24202 Muteking the Dancing Hero NaN
29 24201 World Trigger NaN
30 24200 Gekijō-ban Utano☆Princesama♪ Maji Love ST☆RISH... NaN
31 24199 My Hero Academia THE MOVIE: World Heroes' Mission NaN
32 24198 Vampire Dies in No Time NaN
33 24196 Visual Prison NaN
34 24195 IDOLiSH7 Third Beat! Kujo starts carrying out his plans to defame G...
35 24194 Jujutsu Kaisen 0 NaN
36 24193 Gekijō-ban Jujutsu Kaisen 0 NaN
37 24192 takt op. NaN
38 24191 She Professed Herself Pupil of the Wise Man NaN
39 24189 Akebi's Sailor Uniform NaN
40 24187 Love and Heart Sure, university freshman Yagisawa has a lot o...
41 24186 Do It Yourself!! NaN
42 24185 Ningen Kaishūsha NaN
43 24183 Kanashiki Debu Neko-chan NaN
44 24182 Ikinuke! Bakusō! Kusohamu-chan! NaN
45 24180 Kaiju No. 8 NaN
46 24179 Phantom Seer NaN
47 24178 Magu-chan: God of Destruction The God of Destruction Magu Menueku has been s...
48 24177 i tell c NaN
49 24176 High School Family: Kokosei Kazoku NaN


Related Topics



Leave a reply



Submit