How to fill nan when length of values does not match length of index?
Try the Series() and concat() methods:
date=pd.Series(pd.date_range(start='2012-08-23', end='2014-08-23', freq='D'),name='date')
Finally:
df=pd.concat([date,df],axis=1)
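A minimal sketch of what the two lines above do. The frame df and its value column are hypothetical, and the date range is shortened to five days to keep the output small. Because concat aligns on the index labels, the rows that df lacks come out as NaN:

```python
import pandas as pd

# Hypothetical data: a frame with fewer rows than the date range.
df = pd.DataFrame({"value": [10, 20, 30]})

# A named Series of daily dates (5 entries here to keep the output short).
date = pd.Series(pd.date_range(start="2012-08-23", periods=5, freq="D"), name="date")

# concat aligns on index labels 0..4; labels missing from df become NaN.
result = pd.concat([date, df], axis=1)
print(result)
```

This relies on both objects using the default 0..n-1 index; with other index labels the rows would be matched by label, not by position.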
Fill with Nan when Length of values does not match length of index
from_dict and orient='index'
# DataFrame.iteritems() was removed in pandas 2.0; items() is the replacement
pd.DataFrame.from_dict({n: c.unique() for n, c in oldDF.items()}, orient='index').T
Transportation Condition
0 cars New
1 bikes None
2 trains None
zip_longest
from itertools import zip_longest
pd.DataFrame([*zip_longest(*map(pd.unique, map(oldDF.get, oldDF)))], columns=[*oldDF])
Transportation Condition
0 cars New
1 bikes None
2 trains None
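For reference, a self-contained sketch of both approaches. The contents of oldDF are not shown in the answer, so the frame below is a hypothetical input chosen to give the same shape of result:

```python
import pandas as pd
from itertools import zip_longest

# Hypothetical input; the original question's oldDF is not shown.
oldDF = pd.DataFrame({
    "Transportation": ["cars", "bikes", "trains", "cars"],
    "Condition": ["New", "New", "New", "New"],
})

# Option 1: one row per column name (ragged rows are padded), then transpose.
via_from_dict = pd.DataFrame.from_dict(
    {name: col.unique() for name, col in oldDF.items()}, orient="index"
).T

# Option 2: zip_longest pads the shorter sequences with None.
via_zip = pd.DataFrame(
    list(zip_longest(*(oldDF[name].unique() for name in oldDF))),
    columns=list(oldDF),
)

print(via_from_dict)
print(via_zip)
```

Either way, the unique values of each column are packed from the top, so values on the same row no longer belong to the same original record.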
Length of values does not match length of index - update dataframe column
The issue with pakpe's answer is the same as the logic in your original code. You say you want:
"the 'newCol' column is populated with a list of the matching items from myList; otherwise, in case of no match, Nan or an empty list"
Neither of your attempts accounts for that.
import pandas as pd
import numpy as np
#initialize with blank list in newCol
testdata = [['Subject Test1 yes', []], ['Subject Test2 yes', []], ['Subject Test3 no', []]]
myList = ['yes', '2', 'random']
#create dataframe
df = pd.DataFrame(testdata, columns=['subject', 'newCol'])
# set column type of newCol as object to ensure it may contain a list of values
df['newCol'] = df['newCol'].astype('object')
for index, row in df.iterrows():
    for x in myList:
        if x in row['subject']:
            # append every matching item to this row's list
            df.at[index, 'newCol'].append(x)
print(df)
Or if you want to fill with NaN you can do:
for index, row in df.iterrows():
    for x in myList:
        if x in row['subject']:
            df.at[index, 'newCol'].append(x)
    # no match: replace the empty list with NaN (np.NAN was removed in NumPy 2.0)
    if not df.at[index, 'newCol']:
        df.at[index, 'newCol'] = np.nan
Output:
myList = ['yes', '2', 'random']
             subject     newCol
0  Subject Test1 yes      [yes]
1  Subject Test2 yes   [yes, 2]
2   Subject Test3 no      [nan]
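The same result can also be produced without iterrows. This is only a sketch of an alternative, not the approach used in the answer above; the helper name matches_or_nan is made up for illustration:

```python
import numpy as np
import pandas as pd

myList = ["yes", "2", "random"]
df = pd.DataFrame(
    {"subject": ["Subject Test1 yes", "Subject Test2 yes", "Subject Test3 no"]}
)

def matches_or_nan(subject):
    """Return the items of myList found in subject, or NaN if there are none."""
    found = [x for x in myList if x in subject]
    return found if found else np.nan

# One call per row; each result is stored as a single cell of an object column.
df["newCol"] = df["subject"].apply(matches_or_nan)
print(df)
```

Returning `found` unconditionally would give an empty list instead of NaN for rows without a match.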
ValueError: Length of values does not match length of index | Pandas DataFrame.unique()
The error comes up when you try to assign a list or NumPy array of a different length to a data frame column, and it can be reproduced as follows:
A data frame of four rows:
df = pd.DataFrame({'A': [1,2,3,4]})
Now trying to assign a list/array of two elements to it:
df['B'] = [3,4] # or df['B'] = np.array([3,4])
Both raise:
ValueError: Length of values does not match length of index
This is because the data frame has four rows but the list and the array have only two elements.
Workaround (use with caution): convert the list/array to a pandas Series; on assignment, index labels missing from the Series are filled with NaN:
df['B'] = pd.Series([3,4])
df
# A B
#0 1 3.0
#1 2 4.0
#2 3 NaN # NaN because the value at index 2 and 3 doesn't exist in the Series
#3 4 NaN
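The caution matters because the assignment aligns on index labels, not on position. A small sketch, using a hypothetical frame whose index is not the default 0..3:

```python
import pandas as pd

# Hypothetical frame whose index labels are 10..13 rather than 0..3.
df = pd.DataFrame({"A": [1, 2, 3, 4]}, index=[10, 11, 12, 13])

# The Series carries labels 0 and 1, which do not exist in df.index,
# so nothing lines up and the whole column is NaN.
df["B"] = pd.Series([3, 4])

# Reusing the first two labels of df.index places the values as intended.
df["C"] = pd.Series([3, 4], index=df.index[:2])
print(df)
```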
For your specific problem, if you don't care about the index or the correspondence of values between columns, you can reset index for each column after dropping the duplicates:
df.apply(lambda col: col.drop_duplicates().reset_index(drop=True))
# A B
#0 1 1.0
#1 2 5.0
#2 7 9.0
#3 8 NaN
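A runnable sketch of that one-liner. The input frame is not shown in the answer, so the data below is a hypothetical input chosen to reproduce the output above:

```python
import pandas as pd

# Hypothetical input with repeated values in both columns.
df = pd.DataFrame({"A": [1, 2, 7, 8, 8], "B": [1, 5, 5, 9, 9]})

# Each column is de-duplicated independently and re-indexed from 0,
# so the shorter result (B) is padded with NaN at the bottom.
deduped = df.apply(lambda col: col.drop_duplicates().reset_index(drop=True))
print(deduped)
```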
ValueError: Length of values (4) does not match length of index (179) Pandas
This is a bit tricky: to access an item of each list stored in a pandas Series, you must go through the .str accessor. Therefore, you should replace:
employees["job1month"] = employees["jobDateRange"][0]
With:
employees["job1month"] = employees["jobDateRange"].str[0]
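A small sketch of the difference, using a hypothetical employees frame (the real one has 179 rows and is not shown). Plain `[0]` selects the single cell with label 0, which is a whole list, whereas `.str[0]` takes the first element of the list in every row:

```python
import pandas as pd

# Hypothetical data: each cell of jobDateRange holds a list.
employees = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "jobDateRange": [
        ["Jan 2020", "Mar 2021"],
        ["Feb 2019", "Dec 2019"],
        ["May 2018", "Jun 2022"],
    ],
})

# Row lookup: returns the list stored in the row labelled 0.
first_cell = employees["jobDateRange"][0]

# Element-wise lookup: first item of every row's list.
employees["job1month"] = employees["jobDateRange"].str[0]
print(employees)
```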
Pandas group by cumsum length of values does not match length of index
This assigns a group number to each row (the counter increases at every zero value) and then drops the zero rows.
df = pd.DataFrame({'code': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112],
'value': [0.0, 0.0, 23.2, 10.3, 0.2, 0.0, 22.6, 0.0, 0.0, 2.2, 3.8, 0.0]})
df['group'] = df.value.eq(0).cumsum()
df = df.loc[df.value.ne(0)]
Output
code value group
2 103 23.2 2
3 104 10.3 2
4 105 0.2 2
6 107 22.6 3
9 110 2.2 5
10 111 3.8 5
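With the group column in place, a per-group running total is one more line. This is a sketch under the assumption that a cumulative sum within each run of non-zero values is what is wanted; the column name running_total is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "code": [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112],
    "value": [0.0, 0.0, 23.2, 10.3, 0.2, 0.0, 22.6, 0.0, 0.0, 2.2, 3.8, 0.0],
})

# The counter increases at every zero, so each run of non-zero rows shares a number.
df["group"] = df["value"].eq(0).cumsum()
df = df.loc[df["value"].ne(0)].copy()

# groupby(...).cumsum() returns a result aligned with df's index,
# so it can be assigned back without a length mismatch.
df["running_total"] = df.groupby("group")["value"].cumsum()
print(df)
```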
ValueError: Length of values does not match length of index when trying to modify column values a pandas groupby
Your method didn't work because of the index error. When you groupby 'A', the index is represented the same way in the grouped data too. Since set_value(0) could not find the correct index, it creates a new object with that index. That's the reason why there was a length mismatch.
Fix 1: reset_index(drop=True)
df['A'] = df.groupby('A')['A'].apply(lambda x: x.str.replace('.*', '')\
.reset_index(drop=True).set_value(0, x.values[0])).values
df
       A         C         D
0    one  0.410599 -0.205158
1         0.144044  0.313068
2         0.333674 -0.742165
3  three  0.761038 -2.552990
4         1.494079  2.269755
5    two  1.454274 -0.854096
6         0.121675  0.653619
7         0.443863  0.864436
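Series.set_value was removed in pandas 1.0, so the line above does not run on current versions. A possible equivalent, offered as a sketch and not as the answer's own method, keeps the label only on the first row of each group. The frame below is hypothetical and omits the random C and D columns:

```python
import pandas as pd

# Hypothetical frame with the same group layout as the output above.
df = pd.DataFrame({"A": ["one", "one", "one", "three", "three", "two", "two", "two"]})

# cumcount() numbers the rows inside each group starting at 0.
is_first = df.groupby("A").cumcount().eq(0)

# Keep the label where the row is the first of its group, blank it elsewhere.
df["A"] = df["A"].where(is_first, "")
print(df)
```

Because the mask shares df's index, the assignment cannot produce a length mismatch.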
Fix 2: set_value
set_value has a 3rd parameter called takeable, which determines how the index is treated. It is False by default, but setting it to True worked for my case.
In addition to Zero's solutions, the solution for isolating values at the centre of their groups is as follows:
df.A = df.groupby('A')['A'].apply(lambda x: x.str.replace('.*', '')\
.set_value(len(x) // 2, x.values[0], True)).values
df
       A         C         D
0         0.410599 -0.205158
1    one  0.144044  0.313068
2         0.333674 -0.742165
3         0.761038 -2.552990
4  three  1.494079  2.269755
5         1.454274 -0.854096
6    two  0.121675  0.653619
7         0.443863  0.864436
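On pandas versions where set_value is no longer available (it was removed in 1.0), the centred variant might be written with a position mask instead. This is a sketch on a hypothetical frame, not the original author's code:

```python
import pandas as pd

# Hypothetical frame with the same group layout as the output above.
df = pd.DataFrame({"A": ["one", "one", "one", "three", "three", "two", "two", "two"]})

grouped = df.groupby("A")
position = grouped.cumcount()             # 0, 1, 2, ... within each group
size = grouped["A"].transform("size")     # group length repeated on every row

# Keep the label only on the row at position len(group) // 2.
df["A"] = df["A"].where(position.eq(size // 2), "")
print(df)
```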