Fill With NaN When Length of Values Does Not Match Length of Index

How do I fill with NaN when the length of values does not match the length of the index?

Try a Series plus concat(). First build the longer column as a Series:

date=pd.Series(pd.date_range(start='2012-08-23', end='2014-08-23', freq='D'),name='date')

Finally:

df=pd.concat([date,df],axis=1)
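A minimal, self-contained sketch of the same idea follows. The `value` column and the shortened date range are made up for illustration and are not from the question. Because concat() with axis=1 aligns on the row index, rows that exist only in the longer object come out as NaN.

```python
import pandas as pd

# Longer object: five consecutive days (a shortened, illustrative range).
date = pd.Series(pd.date_range(start='2012-08-23', end='2012-08-27', freq='D'), name='date')

# Shorter object: hypothetical data with only three rows.
df = pd.DataFrame({'value': [1.0, 2.0, 3.0]})

# Column-wise concat aligns on the index labels 0..4; labels 3 and 4 are
# missing from df, so 'value' is NaN in those rows.
out = pd.concat([date, df], axis=1)
print(out)
```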

Fill with Nan when Length of values does not match length of index

from_dict and orient='index'

pd.DataFrame.from_dict({n: c.unique() for n, c in oldDF.items()}, orient='index').T

  Transportation Condition
0           cars       New
1          bikes      None
2         trains      None

zip_longest

from itertools import zip_longest

pd.DataFrame([*zip_longest(*map(pd.unique, map(oldDF.get, oldDF)))], columns=[*oldDF])

  Transportation Condition
0           cars       New
1          bikes      None
2         trains      None
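Both snippets assume an existing oldDF. A hedged, runnable sketch of the two approaches side by side is given here; the input frame is guessed from the output shown (three transport types, one repeated condition), so the real oldDF may differ.

```python
from itertools import zip_longest

import pandas as pd

# Assumed input, reconstructed from the output above.
oldDF = pd.DataFrame({
    'Transportation': ['cars', 'bikes', 'trains'],
    'Condition': ['New', 'New', 'New'],
})

# Unique values per column: lists of unequal length (3 and 1).
uniques = {name: col.unique().tolist() for name, col in oldDF.items()}

# Option 1: one row per original column, short rows padded, then transpose.
via_from_dict = pd.DataFrame.from_dict(uniques, orient='index').T

# Option 2: zip_longest pads the shorter lists with None.
via_zip = pd.DataFrame(list(zip_longest(*uniques.values())), columns=list(uniques))

print(via_from_dict)
print(via_zip)
```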

Length of values does not match length of index - update dataframe column

The issue with pakpe's answer is the same as the logic in your original code. You say you want

the 'newCol' column is populated with a list of the matching items
from myList; otherwise, in case of no match, Nan or an empty list

Neither of your attempts accounts for the no-match case. The version below starts every row with an empty list and appends each match:

import pandas as pd
import numpy as np

#initialize with blank list in newCol
testdata = [['Subject Test1 yes', []], ['Subject Test2 yes', []], ['Subject Test3 no', []]]
myList = ['yes', '2', 'random']

#create dataframe
df = pd.DataFrame(testdata, columns=['subject', 'newCol'])

# set column type of newCol as object to ensure it may contain a list of values
df['newCol'] = df['newCol'].astype('object')

for index, row in df.iterrows():
    for x in myList:
        if x in row['subject']:
            df.at[index, 'newCol'].append(x)

print(df)

Or if you want to fill with NaN you can do:

for index, row in df.iterrows():
    for x in myList:
        if x in row['subject']:
            df.at[index, 'newCol'].append(x)
    # no match at all: replace the empty list with NaN
    if not df.at[index, 'newCol']:
        df.at[index, 'newCol'] = np.nan

Output:

myList = ['yes', '2', 'random']

subject newCol
0 Subject Test1 yes [yes]
1 Subject Test2 yes [yes, 2]
2 Subject Test3 no NaN
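The same result can be sketched without an explicit row loop. This is an alternative formulation, not the code from the answer, and the helper name `matches` is invented here. It relies on an empty list being falsy, so `or np.nan` swaps it for NaN.

```python
import numpy as np
import pandas as pd

myList = ['yes', '2', 'random']
df = pd.DataFrame({'subject': ['Subject Test1 yes', 'Subject Test2 yes', 'Subject Test3 no']})

def matches(subject):
    """Return the items of myList found in subject, or NaN if none are found."""
    found = [x for x in myList if x in subject]
    return found or np.nan

# Series.apply calls matches() once per row and keeps each list as one cell.
df['newCol'] = df['subject'].apply(matches)
print(df)
```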

ValueError: Length of values does not match length of index | Pandas DataFrame.unique()

The error is raised when you assign a list or NumPy array to a data frame column and its length differs from the number of rows. It can be reproduced as follows:

A data frame of four rows:

df = pd.DataFrame({'A': [1,2,3,4]})

Now trying to assign a list/array of two elements to it:

df['B'] = [3,4]   # or df['B'] = np.array([3,4])

Both raise:

ValueError: Length of values does not match length of index

The data frame has four rows, but the list (or array) has only two elements.

Workaround (use with caution): convert the list/array to a pandas Series. Assignment then aligns on the index, and any row label missing from the Series is filled with NaN:

df['B'] = pd.Series([3,4])

df
#    A    B
# 0  1  3.0
# 1  2  4.0
# 2  3  NaN   # NaN because labels 2 and 3 don't exist in the Series
# 3  4  NaN
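The "use with caution" part is easiest to see on a frame whose index is not 0..n-1. The sketch below uses an invented index (10 to 13) and an extra column `C` to show that the Series is matched by label, not by position.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4]}, index=[10, 11, 12, 13])

# A plain list is matched by length, so two values for four rows fail.
try:
    df['B'] = [3, 4]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# A Series is matched by index label. Its labels are 0 and 1, neither of
# which exists in df, so every row of 'B' becomes NaN.
df['B'] = pd.Series([3, 4])

# Giving the Series the first two labels of df puts the values where intended.
df['C'] = pd.Series([3, 4], index=df.index[:2])
print(df)
```

If positional filling is what you want, building the Series on a slice of df.index, as done for `C`, avoids the silent all-NaN column.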

For your specific problem, if you don't care about the index or the correspondence of values between columns, you can reset index for each column after dropping the duplicates:

df.apply(lambda col: col.drop_duplicates().reset_index(drop=True))

# A B
#0 1 1.0
#1 2 5.0
#2 7 9.0
#3 8 NaN
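The frame behind that output is not shown in the answer. The sketch below uses made-up data, chosen only so that the per-column unique values have different lengths, to show how apply() pads the shorter column with NaN.

```python
import pandas as pd

# Hypothetical input with duplicates in both columns.
df = pd.DataFrame({'A': [1, 2, 2, 7, 8], 'B': [1, 5, 5, 9, 9]})

# Each column is de-duplicated and re-labelled 0..k-1 independently; apply()
# then aligns the resulting Series on the union of their labels.
result = df.apply(lambda col: col.drop_duplicates().reset_index(drop=True))
print(result)
```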

ValueError: Length of values (4) does not match length of index (179) Pandas

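This is a bit tricky: employees["jobDateRange"][0] selects the row labelled 0, which is a single list (here one of 4 items, hence the 4 versus 179 mismatch), not the first item of every row's list. To index into the list stored in each row of a pandas Series, go through the .str accessor. Therefore, you should replace: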

employees["job1month"] = employees["jobDateRange"][0]

With:

employees["job1month"] = employees["jobDateRange"].str[0]
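A small sketch of the difference, on an invented three-row frame (the date strings are placeholders, not data from the question):

```python
import pandas as pd

# Hypothetical frame: each row stores a list of date strings.
employees = pd.DataFrame({
    'jobDateRange': [
        ['Jan 2019', 'Feb 2020'],
        ['Mar 2020', 'Apr 2021'],
        ['Jul 2021', 'Aug 2022'],
    ],
})

# Label-based lookup: returns the single list stored in row 0.
first_row = employees['jobDateRange'][0]

# Element-wise lookup: returns the first item of every row's list.
employees['job1month'] = employees['jobDateRange'].str[0]
print(employees)
```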

Pandas group by cumsum length of values does not match length of index

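This gives you the groups, then drops the zero rows. The counter value.eq(0).cumsum() increases at every zero, so each run of consecutive non-zero values shares one group number.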

df = pd.DataFrame({'code': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112],
'value': [0.0, 0.0, 23.2, 10.3, 0.2, 0.0, 22.6, 0.0, 0.0, 2.2, 3.8, 0.0]})

df['group'] = df.value.eq(0).cumsum()
df = df.loc[df.value.ne(0)]

Output

    code  value  group
2    103   23.2      2
3    104   10.3      2
4    105    0.2      2
6    107   22.6      3
9    110    2.2      5
10   111    3.8      5
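If the goal is a running total within each group, the group labels can then feed groupby().cumsum(). This is a sketch under that assumption, and the `running_total` column name is invented here. The result has one value per row, so assigning it back does not trigger the length error.

```python
import pandas as pd

df = pd.DataFrame({
    'code': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112],
    'value': [0.0, 0.0, 23.2, 10.3, 0.2, 0.0, 22.6, 0.0, 0.0, 2.2, 3.8, 0.0],
})

# Label each run of non-zero values, then keep only the non-zero rows.
df['group'] = df['value'].eq(0).cumsum()
df = df.loc[df['value'].ne(0)].copy()

# Cumulative sum restarts at the beginning of every group.
df['running_total'] = df.groupby('group')['value'].cumsum()
print(df)
```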

ValueError: Length of values does not match length of index when trying to modify column values in a pandas groupby

Your method didn't work because of an index mismatch. When you group by 'A', each group keeps the row labels it had in the original frame. In a group that does not contain label 0, set_value(0, ...) cannot find that label, so it appends a new entry with it instead of overwriting an existing one. Those extra entries are the reason for the length mismatch.

Fix 1

reset_index(drop=True)

df['A'] = df.groupby('A')['A'].apply(lambda x: x.str.replace('.*', '')\
.reset_index(drop=True).set_value(0, x.values[0])).values
df

       A         C         D
0    one  0.410599 -0.205158
1         0.144044  0.313068
2         0.333674 -0.742165
3  three  0.761038 -2.552990
4         1.494079  2.269755
5    two  1.454274 -0.854096
6         0.121675  0.653619
7         0.443863  0.864436

Fix 2

set_value

set_value has a third parameter, takeable, which determines how the index argument is interpreted. It is False by default (the argument is looked up as a label); setting it to True makes the lookup positional, which worked for my case.

In addition to Zero's solutions, the solution for isolating values at the centre of their groups is as follows:

df.A = df.groupby('A')['A'].apply(lambda x: x.str.replace('.*', '')\
    .set_value(len(x) // 2, x.values[0], True)).values

df

       A         C         D
0         0.410599 -0.205158
1    one  0.144044  0.313068
2         0.333674 -0.742165
3         0.761038 -2.552990
4  three  1.494079  2.269755
5         1.454274 -0.854096
6    two  0.121675  0.653619
7         0.443863  0.864436
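Series.set_value is not available in pandas 1.0 and later, so neither fix runs there as written. A rough equivalent, sketched on assumed data (only column A, with the group sizes implied by the outputs above), works from the position of each row within its group. The names `pos`, `size`, `first_only` and `centre_only` are invented for this sketch.

```python
import pandas as pd

# Assumed column A: three rows of 'one', two of 'three', three of 'two'.
df = pd.DataFrame({'A': ['one', 'one', 'one', 'three', 'three', 'two', 'two', 'two']})

# Position of each row inside its group (0, 1, 2, ...) and the group size.
pos = df.groupby('A').cumcount()
size = df.groupby('A')['A'].transform('size')

# Keep the label only on the first row of each group (as in Fix 1).
first_only = df['A'].where(pos == 0, '')

# Keep the label only on the middle row of each group (as in Fix 2).
centre_only = df['A'].where(pos == size // 2, '')

print(first_only.tolist())
print(centre_only.tolist())
```

Both results have exactly one value per original row, so either can be assigned straight back to df['A'] without a length mismatch.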

