Create Pandas Dataframe from a String

Create Pandas DataFrame from a string

A simple way to do this is to use StringIO.StringIO (python2) or io.StringIO (python3) and pass that to the pandas.read_csv function. E.g:

import sys
if sys.version_info[0] < 3:
from StringIO import StringIO
else:
from io import StringIO

import pandas as pd

TESTDATA = StringIO("""col1;col2;col3
1;4.4;99
2;4.5;200
3;4.7;65
4;3.2;140
""")

df = pd.read_csv(TESTDATA, sep=";")

Create Pandas DataFrame from space separated String

First, recreate the string:

s = """
C1 C2 DATE C4 C5 C6 C7
0 0.0 W04 2021-01-08 00:00:00+00:00 E EUE C1 157
1 0.0 W04 2021-01-08 00:00:00+00:00 E AEU C1 157
2 0.0 W04 2021-01-01 00:00:00+00:00 E SADA H1 747
3 0.0 W04 2021-01-04 00:00:00+00:00 E SSEA H1 747
4 0.0 W04 2021-01-05 00:00:00+00:00 E GPEA H1 747
"""

Now, you can use Pandas.read_csv to import a buffer:

from io import StringIO
df = pd.read_csv(StringIO(s), sep=r"\s\s+")

From what I can tell, this results in exactly the DataFrame that you are looking for:

Screenshot of resulting DataFrame

You may want to convert the DATE column to datetime values as well:

df['DATE'] = df.DATE.astype('datetime64')

Create pandas dataframe from string (in csv format)

I believe you need StringIO with read_csv:

import pandas as pd

data = '"A1","B1","C1","D1","E1","F1","G1","H1"\n"A2","B2","C2","D2","E2","F2"'
df = pd.read_csv(pd.compat.StringIO(data), header=None)

print (df)

0 1 2 3 4 5 6 7
0 A1 B1 C1 D1 E1 F1 G1 H1
1 A2 B2 C2 D2 E2 F2 NaN NaN

how do create a pandas data frame from a string output in python

Don't create any dataframe upfront, and also don't convert and concatenate the string values (since it may have performance impact for a large data).
Just create an empty list to hold each rows (as a list), and append each of the list in a loop, then finally create the dataframe, once you have required list of lists.

import  requests
# this code is to call each url and get the data
hyperId = ['hyper1', 'hyper2', 'hyper3']
# Don't create any Data Frame
# df1 = pd.DataFrame(columns=['id', 'servername', 'modelname'])
dataList = []
for id in hyperId:
hyperUrl = "http://testabc" + id
resp = requests.get(hyperUrl)
# data1 = id + "," + resp['servername'] + "," + resp['model']
dataList.append([id, resp['servername'], resp['model']])

df = pd.DataFrame(dataList, columns=['id', 'servername', 'modelname'])

Create pandas dataframe from string

In the off chance that you are getting data from elsewhere in the weird format that you described, following regular expression based substitutions can fix your json and there after you can go as per @Anton vBR 's solution.

import pandas as pd
import json
import re

string2 = '{"Country":"USA","Name":"Ryan"}{"Country":"Sweden","Name":"Sam"}{"Country":"Brazil","Name":"Ralf"}'

#create dict of substitutions
rd = { '^{' : '[{' , #substitute starting char with [
'}$' : '}]', #substitute ending char with ]
'}{' : '},{' #Add , in between two dicts
}

#replace as per dict
for k,v in rd.iteritems():
string2 = re.sub(r'{}'.format(k),r'{}'.format(v),string2)

df = pd.DataFrame(json.loads(string2))
print(df)

Create dataframe from a string Python

In the code below s is the string:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(s)).dropna(axis=1)
df.rename(columns={df.columns[0]: ""}, inplace=True)

By the way, if the string comes from a csv file then it is simpler to read the file directly using pd.read_csv.

Edit: This code will create a multiindex of columns:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(s), header = None).dropna(how="all", axis=1).T
df[0] = df.loc[1, 0]
df = df.set_index([0, 1]).T

How to create a new columns of dataframe based on string containing condition

You can do it with pd.Series.str.contains with giving the list l as a OR string :

import re
import pandas as pd

df = pd.DataFrame({'Date':['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
'Phrases':['I have a cool family', 'I like avocados', 'I would like to go to school', 'I enjoy Harry Potter']})

l=['cool','avocado','lord of the rings']

df['new_column']=df['Phrases'].str.contains('|'.join(l))

df['matched strings']=df['Phrases'].apply(lambda x: ','.join(re.findall('|'.join(l),x)))

df
Out[18]:
Date Phrases new_column matched strings
0 10/2/2011 I have a cool family True cool
1 11/2/2011 I like avocados True avocado
2 12/2/2011 I would like to go to school False
3 13/2/2011 I enjoy Harry Potter False

Replace special characters in pandas dataframe from a string of special characters

You can use apply with str.replace:

import re
chars = ''.join(map(re.escape, listOfSpecialChars))

df2 = df.apply(lambda c: c.str.replace(f'[{chars}]', '', regex=True))

Alternatively, stack/unstack:

df2 = df.stack().str.replace(f'[{chars}]', '', regex=True).unstack()

output:

  col1 col2
0 1 A
1 3 B
2 4 C


Related Topics



Leave a reply



Submit