Number of Columns

How do I retrieve the number of columns in a Pandas data frame?

Like so:

import pandas as pd
df = pd.DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})

len(df.columns)
3
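
An equivalent option, sketched here with the same frame as in the answer above, is to read the column count from df.shape, which is a (rows, columns) tuple. The names n_rows and n_cols are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"pear": [1, 2, 3], "apple": [2, 3, 4], "orange": [3, 4, 5]})

# shape is a (number_of_rows, number_of_columns) tuple
n_rows, n_cols = df.shape

# Both approaches report the same column count
assert n_cols == len(df.columns)
print(n_cols)
```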

How to get number of columns in a DataFrame row that are above threshold

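With axis=1, the lambda receives each row as a Series x, so x[x > 2] keeps only the values above the threshold and count() returns how many there are. Here 2 is the threshold and 9 is the label of the new column: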

df[9] = df.apply(lambda x: x[x > 2].count(), axis=1)
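
For a purely numeric frame, the same per-row count can likely be obtained without apply by comparing the whole frame to the threshold and summing the resulting booleans. The sketch below uses made-up data and a hypothetical column name n_above. Missing values compare as False, so they are not counted, which matches count() in the apply version.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({"a": [1, 5, 3], "b": [4, 0, 3], "c": [2, 7, 1]})
threshold = 2

# Row-wise apply, as in the answer: each x is one row as a Series
via_apply = df.apply(lambda x: x[x > threshold].count(), axis=1)

# Vectorized alternative: boolean frame, then count the True values per row
via_sum = (df > threshold).sum(axis=1)

# Both approaches agree on this data
assert via_apply.tolist() == via_sum.tolist()

df["n_above"] = via_sum
print(df)
```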

How to count the number of columns with a value on each row in python?

Replace all blank values with NaN, then count the non-null values in each row with sum(axis=1). The iloc[:, 1:] slice skips the first column (IndividualID) so it is not counted:

df['Chains'] = df.iloc[:, 1:].replace('', np.nan).notnull().sum(axis=1)

>>> df
   IndividualID  Trip1  Trip2  Trip3  Trip4  Trip5  Trip6  Trip7  Trip8  \
0     200100001     23    1.0    2.0    4.0    4.0    1.0    5.0    5.0   
1     200100002     21    1.0   12.0    3.0    1.0   55.0    7.0    7.0   
2     200100003     12    3.0    3.0    6.0    3.0    NaN    NaN    NaN   
3     200100004      4    NaN    NaN    NaN    NaN    NaN    NaN    NaN   
4     200100005      6    5.0    3.0    9.0    3.0    5.0    6.0    NaN   
5     200100005     23    4.0    4.0    2.0    4.0    3.0    6.0    5.0   

   Trip9  Chains  
0    5.0       9  
1    NaN       8  
2    NaN       5  
3    NaN       1  
4    NaN       7  
5    NaN       8
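
As a self-contained illustration of the same steps, here is a minimal sketch on a much smaller, hypothetical version of the trips table, where "" marks a blank cell. Depending on the pandas version, replace may emit a warning about downcasting on mixed-type columns.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the trips table; "" marks a blank cell
df = pd.DataFrame({
    "IndividualID": [1, 2, 3],
    "Trip1": [23, 21, 4],
    "Trip2": [1.0, "", np.nan],
    "Trip3": [2.0, 12.0, ""],
})

# Skip the ID column, turn blanks into NaN, then count non-null cells per row
trip_cols = df.iloc[:, 1:]
df["Chains"] = trip_cols.replace("", np.nan).notna().sum(axis=1)
print(df)
```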

Python ValueError: The number of columns in this dataset is different from the one used to fit this transformer (when using the fit() method)

Problem solved. In my pipeline, the categorical features are one-hot encoded. My training set had 42 unique categories, which produces 42 columns after one-hot encoding. My testing set had only 27 unique categories, which produces 27 columns. Because the two column counts differed, the ValueError was raised.
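
One common way to avoid this mismatch is to learn the set of dummy columns from the training data only and force the test data into that same layout. The sketch below illustrates the idea with pandas alone (get_dummies plus reindex) rather than with the pipeline from the question. The column name color and the sample values are hypothetical. In scikit-learn, the analogous step would presumably be to fit the encoder on the training data only and reuse it to transform the test data.

```python
import pandas as pd

# Hypothetical data: the test set lacks "green" and contains an unseen "purple"
train = pd.DataFrame({"color": ["red", "green", "blue", "red"]})
test = pd.DataFrame({"color": ["blue", "purple"]})

# Learn the dummy columns from the training data only
train_encoded = pd.get_dummies(train, columns=["color"], dtype=int)

# Encode the test data, then align it to the training columns:
# categories missing from the test set become all-zero columns,
# categories unseen during training are dropped
test_encoded = (pd.get_dummies(test, columns=["color"], dtype=int)
                .reindex(columns=train_encoded.columns, fill_value=0))

print(train_encoded.shape[1], test_encoded.shape[1])
```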

Split a pandas DataFrame column into a variable number of columns

You could slightly change the function and use it in a list comprehension; then assign the nested list to columns:

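import re
import numpy as np

def get_header_properties(header):
    # Type: everything before the first dot
    pf_type = re.match(r".*?(?=\.)", header).group()
    # ID: whatever follows "<type>." up to the first comma or the end of the string
    pf_id = re.search(rf"(?<={re.escape(pf_type)}\.).*?(?=(,|$))", header).group()
    # Remainder after the ID: the comma-separated coordinates, if any
    pf_coords = re.search(rf"(?<={re.escape(pf_id)}).*", header).group()
    coords = pf_coords.split(",")[1:]
    # Pad with NaN so that rows with fewer than two coordinates still fill dim1 and dim2
    return [pf_type, pf_id] + coords + ([np.nan]*(2-len(coords)) if len(coords)<2 else [])

df[['Type','ID','dim1','dim2']] = [get_header_properties(i) for i in df['index']]
out = df.drop(columns='index')[['Type','ID','dim1','dim2','value']]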

That said, it seems simpler and more efficient to skip the function, call str.split once on the "index" column, and join the result to df:

df = (df['index'].str.split('[.,]', expand=True)
      .fillna(np.nan)
      .rename(columns={i: col for i,col in enumerate(['Type','ID','dim1','dim2'])})
      .join(df[['value']]))

Output:

        Type       ID dim1 dim2   value
0  FirstType  FirstID  NaN  NaN    0.23
1  OtherType  OtherID    1  NaN   50.00
2  OtherType  OtherID    4  NaN   60.00
3   LastType   LastID    1    1  110.00
4   LastType   LastID    1    2  199.00
5   LastType   LastID    2    3  123.00
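
A self-contained sketch of the str.split approach follows. The original input frame is not shown, so the one below is an assumption reconstructed from three rows of the output above. The reindex step is an addition that is not in the answer: it guarantees four columns even when no row happens to carry two coordinates.

```python
import pandas as pd

# Assumed input, reconstructed from the output above (the original frame is not shown)
df = pd.DataFrame({
    "index": ["FirstType.FirstID", "OtherType.OtherID,1", "LastType.LastID,1,2"],
    "value": [0.23, 50.0, 199.0],
})

names = ["Type", "ID", "dim1", "dim2"]

# Split on "." or ","; a multi-character pattern is treated as a regular expression
parts = df["index"].str.split(r"[.,]", expand=True)

# Force exactly four columns (missing ones are filled with NaN), then name them
parts = parts.reindex(columns=range(len(names)))
parts.columns = names

out = parts.join(df["value"])
print(out)
```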

Adding Dummy Columns If the Number of Columns Is Less Than the Number of Rows

You can use reindex to append zero-filled columns until the number of columns matches the number of rows:

out = df.reindex(columns=df.columns.to_list() + [*range(df.shape[0] - df.shape[1])], fill_value=0)
print(out)
   A  B  C  0  1  2
0  4  2  5  0  0  0
1  2  6  8  0  0  0
2  8  3  4  0  0  0
3  4  2  5  0  0  0
4  3  6  7  0  0  0
5  7  3  8  0  0  0
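
To make the snippet reproducible, here is a minimal sketch with the input frame reconstructed from the A, B and C columns of the output above. The max(..., 0) guard is an addition for readability. range() is already empty for a negative argument, so a frame with more columns than rows would be left unchanged either way.

```python
import pandas as pd

# Input reconstructed from the A, B, C columns of the output above
df = pd.DataFrame({"A": [4, 2, 8, 4, 3, 7],
                   "B": [2, 6, 3, 2, 6, 3],
                   "C": [5, 8, 4, 5, 7, 8]})

n_rows, n_cols = df.shape

# Labels 0, 1, 2, ... for the dummy columns needed to reach n_rows columns
extra = list(range(max(n_rows - n_cols, 0)))

# reindex keeps the existing columns and creates the new ones filled with 0
out = df.reindex(columns=df.columns.to_list() + extra, fill_value=0)
print(out.shape)
```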

Change amount of columns in Pandas

Build a flat list of all the values in the file, then reshape it with numpy.reshape into a 4-column DataFrame:

with open('data.txt') as f:
    L = [x for line in f for x in line.strip().split()]
    print (L)
['32', '45', '2.65', '-845', '1', '-84', '97.236', '454', 
 '35.78', '77.12', '948.87', '151', '-23.5', '-787.48', '13.005', '31']

df = pd.DataFrame(np.array(L).reshape(-1, 4))
print (df)
       0        1       2     3
0     32       45    2.65  -845
1      1      -84  97.236   454
2  35.78    77.12  948.87   151
3  -23.5  -787.48  13.005    31

This solution fails when the number of values is not a multiple of 4, because reshape cannot fill the last row. In that case the array has to be padded first, which is a bit more complicated:

#missing last value
print (L)
['32', '45', '2.65', '-845', '1', '-84', '97.236', '454', '35.78', 
 '77.12', '948.87', '151', '-23.5', '-787.48', '13.005']

arr = np.empty(((len(L) - 1)//4 + 1)*4, dtype='O')
arr[:len(L)] = L
df = pd.DataFrame(arr.reshape((-1, 4))).fillna('0')
print(df)
       0        1       2     3
0     32       45    2.65  -845
1      1      -84  97.236   454
2  35.78    77.12  948.87   151
3  -23.5  -787.48  13.005     0
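
The padding logic can also be wrapped in a small helper. The sketch below is one possible variant, not the code from the answer: to_frame, n_cols and fill are hypothetical names, and the list is padded before the array is built instead of filling NaN afterwards.

```python
import numpy as np
import pandas as pd

def to_frame(values, n_cols=4, fill="0"):
    """Hypothetical helper: lay a flat list out row by row in n_cols columns."""
    # Ceiling division: number of rows needed to hold every value
    n_rows = -(-len(values) // n_cols)
    # Pad the tail so the length is an exact multiple of n_cols
    padded = list(values) + [fill] * (n_rows * n_cols - len(values))
    # dtype=object keeps the Python strings as they are
    return pd.DataFrame(np.array(padded, dtype=object).reshape(-1, n_cols))

L = ['32', '45', '2.65', '-845', '1', '-84', '97.236']
df = to_frame(L)
print(df)
```

The values stay strings, as in the answer above; df.astype(float) would convert them to numbers if that is what is needed.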