Handling Variable Number of Columns with Pandas - Python

One way that seems to work (at least in pandas 0.10.1 and 0.11.0.dev-fc8de6d):

>>> !cat ragged.csv
1,2,3
1,2,3,4
1,2,3,4,5
1,2
1,2,3,4
>>> import pandas as pd
>>> my_cols = ["A", "B", "C", "D", "E"]
>>> pd.read_csv("ragged.csv", names=my_cols, engine='python')
   A  B    C    D    E
0  1  2    3  NaN  NaN
1  1  2    3    4  NaN
2  1  2    3    4    5
3  1  2  NaN  NaN  NaN
4  1  2    3    4  NaN

Note that this approach requires you to name the columns you want, though. It's not as general as some other ways, but it works well enough when it applies.
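
If you don't know in advance how many columns to expect, one workaround is to scan the file for the widest row first and generate that many names. A minimal sketch (the file name and generated names are placeholders, not part of the answer above):

import csv
import pandas as pd

# Find the widest row so we can generate enough column names
with open('ragged.csv', newline='') as f:
    width = max(len(row) for row in csv.reader(f))

names = [f'col{i}' for i in range(width)]
df = pd.read_csv('ragged.csv', names=names, engine='python')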

Handling Variable Number of Columns Dataframe - Python

I believe you can just pass the list of lists into pd.DataFrame() and you will get NaN for the values that don't exist.

For example:

import pandas as pd

List_of_Lists = [[1, 2, 3, 4],
                 [5, 6, 7],
                 [9, 10],
                 [11]]
df = pd.DataFrame(List_of_Lists)
print(df)
    0     1    2    3
0   1   2.0  3.0  4.0
1   5   6.0  7.0  NaN
2   9  10.0  NaN  NaN
3  11   NaN  NaN  NaN

Then, to get the naming the way you want, just use pandas.DataFrame.add_prefix:

df = df.add_prefix('Column')
print(df)
   Column0  Column1  Column2  Column3
0        1      2.0      3.0      4.0
1        5      6.0      7.0      NaN
2        9     10.0      NaN      NaN
3       11      NaN      NaN      NaN

You might also want each list to be a column rather than a row. In that case you need to transpose List_of_Lists, padding the shorter lists with itertools.zip_longest:

from itertools import zip_longest

df = pd.DataFrame(list(map(list, zip_longest(*List_of_Lists))))
print(df)
   0    1     2     3
0  1  5.0   9.0  11.0
1  2  6.0  10.0   NaN
2  3  7.0   NaN   NaN
3  4  NaN   NaN   NaN

Reading csv with variable number of columns with pandas

Just make use of the usecols parameter instead of the names one. names assumes that you're listing all the columns' names, whereas usecols assumes a subset of the columns.

from io import StringIO
import pandas as pd

file = StringIO(
'''1, 2, 3, 4,
1, 2
1, 2, 3, 4,
1, 2, 3,''')

df = pd.read_csv(file, usecols=[0, 1, 2], header=None)
df
   0  1    2
0  1  2  3.0
1  1  2  NaN
2  1  2  3.0
3  1  2  3.0

How to deal with variable number of columns in dataframe

Use Index.intersection:

df[df.columns.intersection(['Col_A', 'Col_E'], sort=False)]
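
The one-liner assumes df already exists; here's a self-contained sketch with made-up column names, keeping only the requested columns that are actually present:

import pandas as pd

df = pd.DataFrame({'Col_A': [1, 2], 'Col_C': [3, 4], 'Col_E': [5, 6]})

# 'Col_Z' is not in df, so the intersection silently drops it
wanted = ['Col_A', 'Col_E', 'Col_Z']
print(df[df.columns.intersection(wanted, sort=False)])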

Multiple condition over a variable number of columns

Here's a solution using any and mask without apply:

import numpy as np
import pandas as pd

df = pd.DataFrame(index=range(8), columns=['TOT_SIGNAL', 'TRADING_DAY']).join(pd.DataFrame(np.eye(8, 5)))

df.TRADING_DAY = df.TRADING_DAY.mask((df.iloc[:, 2:] != 0).any(axis=1), 1)

Result:

  TOT_SIGNAL TRADING_DAY    0    1    2    3    4
0        NaN           1  1.0  0.0  0.0  0.0  0.0
1        NaN           1  0.0  1.0  0.0  0.0  0.0
2        NaN           1  0.0  0.0  1.0  0.0  0.0
3        NaN           1  0.0  0.0  0.0  1.0  0.0
4        NaN           1  0.0  0.0  0.0  0.0  1.0
5        NaN         NaN  0.0  0.0  0.0  0.0  0.0
6        NaN         NaN  0.0  0.0  0.0  0.0  0.0
7        NaN         NaN  0.0  0.0  0.0  0.0  0.0

How to Read A CSV With A Variable Number of Columns?

You could pass a dummy separator, and then use str.split (by ",") with expand=True:

import pandas as pd

df = pd.read_csv('path/to/file.csv', sep=" ", header=None)
df = df[0].str.split(",", expand=True).fillna("")
print(df)

Output

      0     1     2     3
0  5783  145v
1  g656  4589  3243  tt56
2  6579

Split a pandas DataFrame column into a variable number of columns

You could slightly change the function and use it in a list comprehension; then assign the nested list to columns:

import re
import numpy as np

def get_header_properties(header):
    pf_type = re.match(r".*?(?=\.)", header).group()
    pf_id = re.search(rf"(?<={pf_type}\.).*?(?=(,|$))", header).group()
    pf_coords = re.search(rf"(?<={pf_id}).*", header).group()
    coords = pf_coords.split(",")[1:]
    return [pf_type, pf_id] + coords + ([np.nan] * (2 - len(coords)) if len(coords) < 2 else [])

df[['Type', 'ID', 'dim1', 'dim2']] = [get_header_properties(i) for i in df['index']]
out = df.drop(columns='index')[['Type', 'ID', 'dim1', 'dim2', 'value']]

That said, instead of the function, it's simpler and more efficient to use str.split once on the "index" column and join the result to df:

df = (df['index'].str.split('[.,]', expand=True)
        .fillna(np.nan)
        .rename(columns={i: col for i, col in enumerate(['Type', 'ID', 'dim1', 'dim2'])})
        .join(df[['value']]))

Output:

        Type       ID dim1 dim2   value
0  FirstType  FirstID  NaN  NaN    0.23
1  OtherType  OtherID    1  NaN   50.00
2  OtherType  OtherID    4  NaN   60.00
3   LastType   LastID    1    1  110.00
4   LastType   LastID    1    2  199.00
5   LastType   LastID    2    3  123.00
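
For reference, a self-contained run of the str.split approach; the 'index' strings below are reconstructed from the output above:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'index': ['FirstType.FirstID',
              'OtherType.OtherID,1',
              'OtherType.OtherID,4',
              'LastType.LastID,1,1',
              'LastType.LastID,1,2',
              'LastType.LastID,2,3'],
    'value': [0.23, 50.0, 60.0, 110.0, 199.0, 123.0],
})

df = (df['index'].str.split('[.,]', expand=True)
        .fillna(np.nan)
        .rename(columns={i: col for i, col in enumerate(['Type', 'ID', 'dim1', 'dim2'])})
        .join(df[['value']]))
print(df)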

Pandas comparison with variable number of columns

So, assuming your dataframe has the datetime columns parsed (you can use to_datetime for that, or e.g. specify parse_dates in read_csv; a sketch of that step follows the frame below):

In [64]: df
Out[64]:
   id       date birth_date_1 birth_date_2
0   1 2000-01-01   2000-01-03   2000-01-05
1   1 2000-01-07   2000-01-03   2000-01-05
2   2 2000-01-02   2000-01-10   2000-01-01
3   2 2000-01-05   2000-01-10   2000-01-01
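
For completeness, the parsing step itself might look like this (the file name is hypothetical):

import pandas as pd

# Parse the date columns while reading...
df = pd.read_csv('data.csv', parse_dates=['date', 'birth_date_1', 'birth_date_2'])

# ...or after the fact, with to_datetime
for col in ['date', 'birth_date_1', 'birth_date_2']:
    df[col] = pd.to_datetime(df[col])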

You can now check where the values in the 'birth_date' columns are lower than the values in the 'date' column, and then use sum to count:

In [65]: df[['birth_date_1', 'birth_date_2']].lt(df['date'], axis=0)
Out[65]:
   birth_date_1  birth_date_2
0         False         False
1          True          True
2         False          True
3         False          True

In [66]: df[['birth_date_1', 'birth_date_2']].lt(df['date'], axis=0).sum(axis=1)
Out[66]:
0    0
1    2
2    1
3    1
dtype: int64

To deal with the varying number of 'birth_date' columns, you can select them automatically with filter, like this:

In [67]: df.filter(like="birth_date")
Out[67]:
  birth_date_1 birth_date_2
0   2000-01-03   2000-01-05
1   2000-01-03   2000-01-05
2   2000-01-10   2000-01-01
3   2000-01-10   2000-01-01

Altogether, this would give:

In [68]: df.filter(like="birth_date").lt(df['date'], axis=0).sum(axis=1)
Out[68]:
0    0
1    2
2    1
3    1
dtype: int64

Pandas - string split into multiple columns with variable number of delimited values into 3 columns

Use DataFrame.reindex:

s.str.split(' - ', expand=True).reindex(range(3), axis=1).astype(object).mask(lambda x: x.isna(), None)

Or:

s.str.split(' - ', expand=True).reindex(range(3), axis=1).fillna('')
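
A self-contained run with a made-up Series; the widest row here splits into only two parts, so reindex adds the missing third column:

import pandas as pd

s = pd.Series(['a - b', 'a'])

out = s.str.split(' - ', expand=True).reindex(range(3), axis=1).fillna('')
print(out)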

How to set a value on a dataframe given a variable number of conditions?

If I understand you correctly, you are looking for the .query() method:

import pandas as pd
from itertools import product

animals = ["dogs", "cats"]
eyes = ['brown', 'blue', 'green']
height = ['short', 'average', 'tall']
a = [animals, eyes, height]
df = pd.DataFrame(list(product(*a)), columns=["animals", "eyes", "height"])
df['value'] = 1

def zero_out(df, lst):
    q = ' & '.join('{} == "{}"'.format(col, val) for col, val in lst)
    df.loc[df.query(q).index, 'value'] = 0

zero_out(df, [("height", "tall")])
print(df)

Prints:

   animals   eyes   height  value
0     dogs  brown    short      1
1     dogs  brown  average      1
2     dogs  brown     tall      0
3     dogs   blue    short      1
4     dogs   blue  average      1
5     dogs   blue     tall      0
6     dogs  green    short      1
7     dogs  green  average      1
8     dogs  green     tall      0
9     cats  brown    short      1
10    cats  brown  average      1
11    cats  brown     tall      0
12    cats   blue    short      1
13    cats   blue  average      1
14    cats   blue     tall      0
15    cats  green    short      1
16    cats  green  average      1
17    cats  green     tall      0

Or zero_out(df, [("animals", "dogs"), ("eyes", "blue")]):

   animals   eyes   height  value
0     dogs  brown    short      1
1     dogs  brown  average      1
2     dogs  brown     tall      1
3     dogs   blue    short      0
4     dogs   blue  average      0
5     dogs   blue     tall      0
6     dogs  green    short      1
7     dogs  green  average      1
8     dogs  green     tall      1
9     cats  brown    short      1
10    cats  brown  average      1
11    cats  brown     tall      1
12    cats   blue    short      1
13    cats   blue  average      1
14    cats   blue     tall      1
15    cats  green    short      1
16    cats  green  average      1
17    cats  green     tall      1

