Number of Columns

How do I retrieve the number of columns in a Pandas data frame?

Like so:

import pandas as pd
df = pd.DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})

len(df.columns)
3
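
Equivalently, df.shape returns a (rows, columns) tuple, so df.shape[1] gives the same count:

df.shape[1]
3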

How to get number of columns in a DataFrame row that are above threshold

With axis=1, the x passed to the lambda is each row as a Series, which can be boolean-indexed like this.

df[9] = df.apply(lambda x: x[x > 2].count(), axis=1)
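
A minimal self-contained sketch of the same idea (reusing the small fruit DataFrame from the first answer; the threshold of 2 and the new column names are arbitrary here), together with an equivalent vectorized form that is usually faster than apply:

import pandas as pd

df = pd.DataFrame({"pear": [1, 2, 3], "apple": [2, 3, 4], "orange": [3, 4, 5]})

# count the values above the threshold in each row
df["above_2"] = df.apply(lambda x: x[x > 2].count(), axis=1)

# equivalent vectorized version: boolean mask, then sum per row
df["above_2_alt"] = df.gt(2).sum(axis=1)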

How to count the number of columns with a value on each row in python?

Replace all the blank values with NaN, then count the non-null values in each row with sum(1):

df['Chains'] = df.iloc[:,1:].replace('',np.nan).notnull().sum(1)

>>> df
   IndividualID  Trip1  Trip2  Trip3  Trip4  Trip5  Trip6  Trip7  Trip8  Trip9  Chains
0     200100001     23    1.0    2.0    4.0    4.0    1.0    5.0    5.0    5.0       9
1     200100002     21    1.0   12.0    3.0    1.0   55.0    7.0    7.0    NaN       8
2     200100003     12    3.0    3.0    6.0    3.0    NaN    NaN    NaN    NaN       5
3     200100004      4    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN       1
4     200100005      6    5.0    3.0    9.0    3.0    5.0    6.0    NaN    NaN       7
5     200100005     23    4.0    4.0    2.0    4.0    3.0    6.0    5.0    NaN       8
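
If you want to run the idea in isolation, here is a minimal sketch on a small assumed frame (column names and values made up for illustration):

import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [1, 2], "Trip1": ["23", ""], "Trip2": ["", "4"]})

# blank strings become NaN, then non-null cells are counted per row
df["Chains"] = df.iloc[:, 1:].replace("", np.nan).notnull().sum(1)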

Python ValueError: The number of columns in this dataset is different from the one used to fit this transformer (when using the fit() method)

Problem solved. In my pipeline, the categorical features are one-hot encoded. The training set contained 42 unique categories, so one-hot encoding produced 42 columns; the test set contained only 27 unique categories, producing 27 columns. Hence the ValueError.
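
The usual fix is to fit the encoder (or the whole pipeline) on the training data only and reuse that fitted object to transform the test data, so both sides end up with the same columns. A minimal sketch with scikit-learn's OneHotEncoder (the column name and data are made up):

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"category": ["a", "b", "c"]})
test = pd.DataFrame({"category": ["a", "d"]})    # "d" never appears in training

enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(train[["category"]])                     # learn the training categories only

print(enc.transform(train[["category"]]).shape)  # (3, 3)
print(enc.transform(test[["category"]]).shape)   # (2, 3) -- same column count as training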

Split a pandas DataFrame column into a variable number of columns

You could slightly change the function and use it in a list comprehension; then assign the nested list to columns:

import re
import numpy as np

def get_header_properties(header):
    pf_type = re.match(r".*?(?=\.)", header).group()
    pf_id = re.search(rf"(?<={pf_type}\.).*?(?=(,|$))", header).group()
    pf_coords = re.search(rf"(?<={pf_id}).*", header).group()
    coords = pf_coords.split(",")[1:]
    # pad with NaN so every row yields exactly four fields: Type, ID, dim1, dim2
    return [pf_type, pf_id] + coords + ([np.nan] * (2 - len(coords)) if len(coords) < 2 else [])

df[['Type', 'ID', 'dim1', 'dim2']] = [get_header_properties(i) for i in df['index']]
out = df.drop(columns='index')[['Type', 'ID', 'dim1', 'dim2', 'value']]

That said, instead of the function, it's simpler and more efficient to use str.split once on the "index" column and join the result to df:

df = (df['index'].str.split('[.,]', expand=True)
                 .fillna(np.nan)
                 .rename(columns={i: col for i, col in enumerate(['Type', 'ID', 'dim1', 'dim2'])})
                 .join(df[['value']]))

Output:

        Type       ID  dim1  dim2   value
0  FirstType  FirstID   NaN   NaN    0.23
1  OtherType  OtherID     1   NaN   50.00
2  OtherType  OtherID     4   NaN   60.00
3   LastType   LastID     1     1  110.00
4   LastType   LastID     1     2  199.00
5   LastType   LastID     2     3  123.00
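
For reference, an input frame reconstructed from the output above (the exact original data is not shown, so this is an assumption) reproduces the result with either approach:

import pandas as pd

df = pd.DataFrame({
    "index": ["FirstType.FirstID", "OtherType.OtherID,1", "OtherType.OtherID,4",
              "LastType.LastID,1,1", "LastType.LastID,1,2", "LastType.LastID,2,3"],
    "value": [0.23, 50.0, 60.0, 110.0, 199.0, 123.0],
})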

Adding dummy columns if the number of columns is less than the number of rows

You can use reindex:

out = df.reindex(columns=df.columns.to_list() + [*range(df.shape[0] - df.shape[1])],
                 fill_value=0)

Out[65]:
   A  B  C  0  1  2
0  4  2  5  0  0  0
1  2  6  8  0  0  0
2  8  3  4  0  0  0
3  4  2  5  0  0  0
4  3  6  7  0  0  0
5  7  3  8  0  0  0
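
For reference, a frame matching the A/B/C values in the output above lets you run the reindex line as-is:

import pandas as pd

# 6 rows and 3 columns, so df.shape[0] - df.shape[1] == 3 extra columns named 0, 1, 2 are added
df = pd.DataFrame({"A": [4, 2, 8, 4, 3, 7],
                   "B": [2, 6, 3, 2, 6, 3],
                   "C": [5, 8, 4, 5, 7, 8]})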

Change amount of columns in Pandas

Create a flat list of all values from the file, then reshape it with numpy.reshape into a 4-column DataFrame:

import numpy as np
import pandas as pd

with open('data.txt') as f:
    L = [x for line in f for x in line.strip().split()]

print(L)
['32', '45', '2.65', '-845', '1', '-84', '97.236', '454',
 '35.78', '77.12', '948.87', '151', '-23.5', '-787.48', '13.005', '31']

df = pd.DataFrame(np.array(L).reshape(-1, 4))
print(df)
       0        1       2     3
0     32       45    2.65  -845
1      1      -84  97.236   454
2  35.78    77.12  948.87   151
3  -23.5  -787.48  13.005    31

But this solution does not work if the values cannot fill a complete set of 4 columns; then it is a bit more complicated:

# missing last value
print(L)
['32', '45', '2.65', '-845', '1', '-84', '97.236', '454', '35.78',
 '77.12', '948.87', '151', '-23.5', '-787.48', '13.005']

# allocate an object array whose length is the next multiple of 4,
# copy L into it, reshape, and replace the trailing gaps with '0'
arr = np.empty(((len(L) - 1) // 4 + 1) * 4, dtype='O')
arr[:len(L)] = L
df = pd.DataFrame(arr.reshape((-1, 4))).fillna('0')
print(df)
       0        1       2     3
0     32       45    2.65  -845
1      1      -84  97.236   454
2  35.78    77.12  948.87   151
3  -23.5  -787.48  13.005     0
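
An alternative sketch for the ragged case (reusing L and the imports from above): pad the flat list with '0' up to the next multiple of 4 before reshaping, which avoids the intermediate object-dtype array:

pad = (-len(L)) % 4                      # values missing from the last row
df = pd.DataFrame(np.array(L + ['0'] * pad).reshape(-1, 4))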

