Creating dataframe from a dictionary where entries have different lengths
In Python 3.x:
import pandas as pd
import numpy as np
d = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )
pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in d.items() ]))
Out[7]:
A B
0 1.0 1
1 2.0 2
2 NaN 3
3 NaN 4
In Python 2.x, replace d.items() with d.iteritems().
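In Python 3 the same construction can also be written as a dict comprehension. The following is a minimal sketch that reuses the d from above; the variable name df is only for illustration. Note that column A becomes float because the missing positions are filled with NaN.

```python
import numpy as np
import pandas as pd

d = dict(A=np.array([1, 2]), B=np.array([1, 2, 3, 4]))

# Wrapping each value in a Series lets pandas align on the index
# and pad the shorter column with NaN.
df = pd.DataFrame({k: pd.Series(v) for k, v in d.items()})
print(df)
```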
Creating a Pandas DataFrame from a Dictionary of different-length list of Tuples
You are on the right track. Just further flatten the tupled values so the resulting table can be pivoted easily.
Code
dic is the given dict data.
df = pd.DataFrame(
[[k, v[0], v[1]] for k, ls_v in dic.items() for v in ls_v],
columns=["Person", "GUID", "value"]
).pivot(index="GUID", columns="Person")
# drop hierarchical level of "value"
df.columns = df.columns.droplevel(0)
Result
print(df)
Person Person A Person B Person C
GUID
abc123 1.0 4.0 2.0
bcc222 2.0 NaN NaN
icy558 NaN NaN 7.0
igh643 1.0 NaN NaN
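For a self-contained run, here is a sketch of the same idea. The contents of dic are an assumption, reconstructed from the printed result, since the original data is not shown here. Passing values="value" to pivot is a small variation that avoids the extra column level, so no droplevel is needed.

```python
import pandas as pd

# Assumed input, reconstructed from the printed result
dic = {
    "Person A": [("abc123", 1), ("bcc222", 2), ("igh643", 1)],
    "Person B": [("abc123", 4)],
    "Person C": [("abc123", 2), ("icy558", 7)],
}

# Flatten to one (Person, GUID, value) row per tuple
rows = [(person, guid, value)
        for person, pairs in dic.items()
        for guid, value in pairs]
long_df = pd.DataFrame(rows, columns=["Person", "GUID", "value"])

# One row per GUID, one column per person; missing pairs become NaN
wide_df = long_df.pivot(index="GUID", columns="Person", values="value")
print(wide_df)
```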
How to create a DataFrame from a dict of unequal-length lists, truncating to a specific length?
You can slice the values of the dict in a dict comprehension; once every list has the same length, the DataFrame constructor works as usual:
print ({k:v[:min_length] for k,v in data_dict.items()})
{'b': [1, 2, 3], 'c': [2, 45, 67], 'a': [1, 2, 3]}
df = pd.DataFrame({k:v[:min_length] for k,v in data_dict.items()})
print (df)
a b c
0 1 1 2
1 2 2 45
2 3 3 67
If some lists may be shorter than min_length, wrap each value in a Series:
data_dict = {'a': [1,2,3,4], 'b': [1,2], 'c': [2,45,67,93,82,92]}
min_length = 3
df = pd.DataFrame({k:pd.Series(v[:min_length]) for k,v in data_dict.items()})
print (df)
a b c
0 1 1.0 2
1 2 2.0 45
2 3 NaN 67
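If the goal is simply to cut every column down to the shortest list, min_length can be derived from the data instead of being hard-coded. A minimal sketch, assuming every value supports len() and slicing:

```python
import pandas as pd

data_dict = {'a': [1, 2, 3, 4], 'b': [1, 2], 'c': [2, 45, 67, 93, 82, 92]}

# Shortest list decides the number of rows
min_length = min(len(v) for v in data_dict.values())

# After slicing, all lists have equal length, so no Series wrapper is needed
df = pd.DataFrame({k: v[:min_length] for k, v in data_dict.items()})
print(df)
```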
Timings:
In [355]: %timeit (pd.DataFrame({k:v[:min_length] for k,v in data_dict.items()}))
The slowest run took 5.32 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 520 µs per loop
In [356]: %timeit (pd.DataFrame({k:pd.Series(v[:min_length]) for k,v in data_dict.items()}))
The slowest run took 4.50 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 937 µs per loop
#Allen's solution
In [357]: %timeit (pd.DataFrame.from_dict(data_dict,orient='index').T.dropna())
1 loop, best of 3: 16.7 s per loop
Code for timings:
np.random.seed(123)
L = list('ABCDEFGH')
N = 500000
min_length = 10000
data_dict = {k:np.random.randint(10, size=np.random.randint(N)) for k in L}
Pandas DataFrame from Dict with different length value
Build a Series from the dict first, so that each list is kept as a single cell:
pd.Series(my_dict).rename_axis('Key_Column').reset_index(name='Value_Column')
Key_Column Value_Column
0 a [1, 2, 3]
1 b [0]
2 c [3, 5]
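If one row per list element is preferred over one list per cell, Series.explode may help. The sketch below assumes my_dict has the contents implied by the output above; long_df is a name used only for illustration.

```python
import pandas as pd

# Assumed input, inferred from the printed output
my_dict = {'a': [1, 2, 3], 'b': [0], 'c': [3, 5]}

long_df = (
    pd.Series(my_dict)
    .rename_axis('Key_Column')
    .explode()                      # one row per list element
    .reset_index(name='Value_Column')
)
print(long_df)
```

The exploded values keep the object dtype, so a conversion such as astype(int) may be needed before doing arithmetic on them.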
Generate a dataframe from list with different length
Use a Series for each column:
In [9]: pd.DataFrame({'a': pd.Series(a), 'b': pd.Series(b)})
Out[9]:
a b
0 1 2.0
1 2 3.0
2 3 NaN
Or,
In [10]: pd.DataFrame.from_dict({'a': a, 'b': b}, orient='index').T
Out[10]:
a b
0 1.0 2.0
1 2.0 3.0
2 3.0 NaN
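Both variants can be tried side by side. In this sketch the lists a and b are assumptions inferred from the outputs above. The outputs also show the practical difference: with the Series approach a complete column keeps its integer dtype, while the transposed from_dict result is float throughout.

```python
import numpy as np
import pandas as pd

# Assumed inputs, inferred from the printed outputs
a = [1, 2, 3]
b = [2, 3]

# Variant 1: one Series per column, aligned on the index
df1 = pd.DataFrame({'a': pd.Series(a), 'b': pd.Series(b)})

# Variant 2: build one row per key, then transpose
df2 = pd.DataFrame.from_dict({'a': a, 'b': b}, orient='index').T

print(df1)
print(df2)
```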
Create a pandas dataframe from a dict of uneven length
Here is an example of a possible solution:
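import numpy as np

d = {
    "a": [1],
    "b": 2,
    "c": [[7, 8, 9], ["a", "b", "c"], [9, 10, 11]],
    "d": None,
}
# Longest column; scalars and None count as length 1
max_len = max(len(l) if isinstance(l, list) else 1 for l in d.values())
for key in d.keys():
    if isinstance(d[key], list):
        # Repeats each element max_len times (only right for one-element lists)
        if len(d[key]) != max_len:
            d[key] = np.repeat(d[key], max_len).tolist()
    else:
        # Broadcast a scalar (or None) to max_len entries
        d[key] = np.repeat(np.array(d[key]), max_len).tolist()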
Result:
{
'a': [1, 1, 1],
'b': [2, 2, 2],
'c': [[7, 8, 9], ['a', 'b', 'c'], [9, 10, 11]],
'd': [None, None, None]
}
Obviously, this only works in one particular case: when all columns but one have a single element. To solve the task in general, one must also specify how columns of different lengths should be handled: should the whole column be repeated and the remainder trimmed on the last iteration, should only the first or last value be repeated, or should some other approach be used?
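To make those choices concrete, here is a rough sketch of two such policies. The helper equalize and its how parameter are hypothetical names, not part of pandas, and the sketch assumes no column is an empty list.

```python
from itertools import cycle, islice

import pandas as pd


def equalize(d, how="last"):
    """Hypothetical helper: pad every column of d to the longest length.

    how="last"  repeats the final value of a short column;
    how="cycle" repeats the whole column and trims the remainder.
    """
    # Treat scalars (and None) as one-element columns
    cols = {k: v if isinstance(v, list) else [v] for k, v in d.items()}
    max_len = max(len(v) for v in cols.values())
    out = {}
    for k, v in cols.items():
        if how == "last":
            out[k] = v + [v[-1]] * (max_len - len(v))
        elif how == "cycle":
            out[k] = list(islice(cycle(v), max_len))
        else:
            raise ValueError(f"unknown policy: {how!r}")
    return out


data = {"a": [1, 2], "b": [1, 2, 3, 4, 5], "c": 7}
df_last = pd.DataFrame(equalize(data, how="last"))
df_cycle = pd.DataFrame(equalize(data, how="cycle"))
print(df_last)
print(df_cycle)
```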
Creating a dataframe where one of the arrays has a different length
You can loop over forecast_items and use next with iter to select the first match for each field; if there is no match, NaN is stored in the dictionary instead:
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

page = requests.get("http://forecast.weather.gov/MapClick.php?lat=37.7772&lon=-122.4168")
soup = BeautifulSoup(page.content, 'html.parser')
seven_day = soup.find(id="seven-day-forecast")
forecast_items = seven_day.find_all(class_="tombstone-container")
out = []
for x in forecast_items:
    # Take the first match for each field, or NaN when the element is missing
    periods = next(iter([t.get_text() for t in x.select('.period-name')]), np.nan)
    short_descs = next(iter([t.get_text() for t in x.select('.short-desc')]), np.nan)
    temps = next(iter([t.get_text() for t in x.select('.temp')]), np.nan)
    descs = next(iter([d['alt'] for d in x.select('img')]), np.nan)
    out.append({'period':periods, 'short_desc':short_descs, 'temp':temps, 'descs':descs})
weather = pd.DataFrame(out)
print (weather)
descs period \
0 NOW until4:00pm Sat
1 Today: Showers, with thunderstorms also possib... Today
2 Tonight: Showers likely and possibly a thunder... Tonight
3 Sunday: A chance of showers before 11am, then ... Sunday
4 Sunday Night: Rain before 11pm, then a chance ... SundayNight
5 Monday: A 40 percent chance of showers. Cloud... Monday
6 Monday Night: A 30 percent chance of showers. ... MondayNight
7 Tuesday: A 50 percent chance of rain. Cloudy,... Tuesday
8 Tuesday Night: Rain. Cloudy, with a low aroun... TuesdayNight
short_desc temp
0 Wind Advisory NaN
1 Showers andBreezy High: 56 °F
2 ShowersLikely Low: 49 °F
3 Heavy Rainand Windy High: 56 °F
4 Heavy Rainand Breezythen ChanceShowers Low: 52 °F
5 ChanceShowers High: 58 °F
6 ChanceShowers Low: 53 °F
7 Chance Rain High: 59 °F
8 Rain Low: 53 °F
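The next(iter(...), default) idiom used above can be tried without a network request. In this sketch the scraped list is a hypothetical stand-in for the parsed page, with an empty list where an element was missing:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the per-item matches found on the page
scraped = [
    {"period": ["NOW until4:00pm Sat"], "temp": []},
    {"period": ["Today"], "temp": ["High: 56 °F"]},
]

out = []
for item in scraped:
    out.append({
        # First element if present, otherwise the NaN default
        "period": next(iter(item["period"]), np.nan),
        "temp": next(iter(item["temp"]), np.nan),
    })

weather = pd.DataFrame(out)
print(weather)
```

Because every record has the same keys, the resulting columns always have equal length, which is what avoids the original mismatch.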