Create Pandas DataFrame from a string
A simple way to do this is to use StringIO.StringIO
(python2) or io.StringIO
(python3) and pass that to the pandas.read_csv
function. E.g:
import sys
if sys.version_info[0] < 3:
from StringIO import StringIO
else:
from io import StringIO
import pandas as pd
TESTDATA = StringIO("""col1;col2;col3
1;4.4;99
2;4.5;200
3;4.7;65
4;3.2;140
""")
df = pd.read_csv(TESTDATA, sep=";")
Create Pandas DataFrame from space separated String
First, recreate the string:
s = """
C1 C2 DATE C4 C5 C6 C7
0 0.0 W04 2021-01-08 00:00:00+00:00 E EUE C1 157
1 0.0 W04 2021-01-08 00:00:00+00:00 E AEU C1 157
2 0.0 W04 2021-01-01 00:00:00+00:00 E SADA H1 747
3 0.0 W04 2021-01-04 00:00:00+00:00 E SSEA H1 747
4 0.0 W04 2021-01-05 00:00:00+00:00 E GPEA H1 747
"""
Now, you can use Pandas.read_csv
to import a buffer:
from io import StringIO
df = pd.read_csv(StringIO(s), sep=r"\s\s+")
From what I can tell, this results in exactly the DataFrame that you are looking for:
You may want to convert the DATE
column to datetime
values as well:
df['DATE'] = df.DATE.astype('datetime64')
Create pandas dataframe from string (in csv format)
I believe you need StringIO
with read_csv
:
import pandas as pd
data = '"A1","B1","C1","D1","E1","F1","G1","H1"\n"A2","B2","C2","D2","E2","F2"'
df = pd.read_csv(pd.compat.StringIO(data), header=None)
print (df)
0 1 2 3 4 5 6 7
0 A1 B1 C1 D1 E1 F1 G1 H1
1 A2 B2 C2 D2 E2 F2 NaN NaN
how do create a pandas data frame from a string output in python
Don't create any dataframe upfront, and also don't convert and concatenate the string values (since it may have performance impact for a large data).
Just create an empty list to hold each rows (as a list), and append each of the list in a loop, then finally create the dataframe, once you have required list of lists.
import requests
# this code is to call each url and get the data
hyperId = ['hyper1', 'hyper2', 'hyper3']
# Don't create any Data Frame
# df1 = pd.DataFrame(columns=['id', 'servername', 'modelname'])
dataList = []
for id in hyperId:
hyperUrl = "http://testabc" + id
resp = requests.get(hyperUrl)
# data1 = id + "," + resp['servername'] + "," + resp['model']
dataList.append([id, resp['servername'], resp['model']])
df = pd.DataFrame(dataList, columns=['id', 'servername', 'modelname'])
Create pandas dataframe from string
In the off chance that you are getting data from elsewhere in the weird format that you described, following regular expression based substitutions can fix your json and there after you can go as per @Anton vBR 's solution.
import pandas as pd
import json
import re
string2 = '{"Country":"USA","Name":"Ryan"}{"Country":"Sweden","Name":"Sam"}{"Country":"Brazil","Name":"Ralf"}'
#create dict of substitutions
rd = { '^{' : '[{' , #substitute starting char with [
'}$' : '}]', #substitute ending char with ]
'}{' : '},{' #Add , in between two dicts
}
#replace as per dict
for k,v in rd.iteritems():
string2 = re.sub(r'{}'.format(k),r'{}'.format(v),string2)
df = pd.DataFrame(json.loads(string2))
print(df)
Create dataframe from a string Python
In the code below s
is the string:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(s)).dropna(axis=1)
df.rename(columns={df.columns[0]: ""}, inplace=True)
By the way, if the string comes from a csv file then it is simpler to read the file directly using pd.read_csv
.
Edit: This code will create a multiindex of columns:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(s), header = None).dropna(how="all", axis=1).T
df[0] = df.loc[1, 0]
df = df.set_index([0, 1]).T
How to create a new columns of dataframe based on string containing condition
You can do it with pd.Series.str.contains
with giving the list l
as a OR string :
import re
import pandas as pd
df = pd.DataFrame({'Date':['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
'Phrases':['I have a cool family', 'I like avocados', 'I would like to go to school', 'I enjoy Harry Potter']})
l=['cool','avocado','lord of the rings']
df['new_column']=df['Phrases'].str.contains('|'.join(l))
df['matched strings']=df['Phrases'].apply(lambda x: ','.join(re.findall('|'.join(l),x)))
df
Out[18]:
Date Phrases new_column matched strings
0 10/2/2011 I have a cool family True cool
1 11/2/2011 I like avocados True avocado
2 12/2/2011 I would like to go to school False
3 13/2/2011 I enjoy Harry Potter False
Replace special characters in pandas dataframe from a string of special characters
You can use apply
with str.replace
:
import re
chars = ''.join(map(re.escape, listOfSpecialChars))
df2 = df.apply(lambda c: c.str.replace(f'[{chars}]', '', regex=True))
Alternatively, stack
/unstack
:
df2 = df.stack().str.replace(f'[{chars}]', '', regex=True).unstack()
output:
col1 col2
0 1 A
1 3 B
2 4 C
Related Topics
Class Method Differences in Python: Bound, Unbound and Static
How to Make a Timezone Aware Datetime Object
Matplotlib Different Size Subplots
What Do Ellipsis [...] Mean in a List
Pythonic Way to Print List Items
How to Convert a String with Dot and Comma into a Float in Python
Why Is Button Parameter "Command" Executed When Declared
What Is the Most Efficient Way to Loop Through Dataframes with Pandas
What Is the Reason for Performing a Double Fork When Creating a Daemon
How Accurate Is Python's Time.Sleep()
Capturing Repeating Subpatterns in Python Regex
Set Bash Variable from Python Script
I Have a Problem with Sending Mail:Typeerror: _Init_() Got an Unexpected Keyword Argument 'Context'