Add columns of different length in pandas
Use concat and pass axis=1 and ignore_index=True:
In [38]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':np.arange(5)})
df1 = pd.DataFrame({'b':np.arange(4)})
print(df1)
df
b
0 0
1 1
2 2
3 3
Out[38]:
a
0 0
1 1
2 2
3 3
4 4
In [39]:
pd.concat([df,df1], ignore_index=True, axis=1)
Out[39]:
0 1
0 0 0
1 1 1
2 2 2
3 3 3
4 4 NaN
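Note that with axis=1, ignore_index=True replaces the column labels with 0 and 1. If you would rather keep the names a and b, a minimal sketch (same frames as above) is to leave ignore_index out:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(5)})
df1 = pd.DataFrame({'b': np.arange(4)})

# Rows are aligned on the index; the row missing from df1 becomes NaN,
# so column 'b' is upcast to float.
result = pd.concat([df, df1], axis=1)
print(result)
```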
Add column vector to a dataframe of different length
Simple example: np.repeat() does what you need.
D2 = np.array([1,2])
np.repeat(D2,60)
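As a sketch of how the repeated vector might then be attached as a column: the 120-row frame and the column names x and d2 below are made up for illustration. np.repeat repeats each element in turn, so the result is sixty 1s followed by sixty 2s.

```python
import numpy as np
import pandas as pd

D2 = np.array([1, 2])
df = pd.DataFrame({'x': np.arange(120)})  # hypothetical frame with 2 * 60 rows

# Each element is repeated 60 times, giving an array as long as the frame.
df['d2'] = np.repeat(D2, 60)
print(df['d2'].value_counts())
```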
Adding list with different length as a new column to a dataframe
If you convert the list to a Series then it will just work:
datasetTest.loc[:,'predict_close'] = pd.Series(test_pred_list)
Example:
In[121]:
df = pd.DataFrame({'a':np.arange(3)})
df
Out[121]:
a
0 0
1 1
2 2
In[122]:
df.loc[:,'b'] = pd.Series(['a','b'])
df
Out[122]:
a b
0 0 a
1 1 b
2 2 NaN
The docs refer to this as setting with enlargement, which covers adding or expanding, but it also works when the Series is shorter than the pre-existing index: the values are aligned on the index and the unmatched rows become NaN.
To handle the case where the index doesn't start at 0, or in fact is not an int:
In[126]:
df = pd.DataFrame({'a':np.arange(3)}, index=np.arange(3,6))
df
Out[126]:
a
3 0
4 1
5 2
In[127]:
s = pd.Series(['a','b'])
s.index = df.index[:len(s)]
s
Out[127]:
3 a
4 b
dtype: object
In[128]:
df.loc[:,'b'] = s
df
Out[128]:
a b
3 0 a
4 1 b
5 2 NaN
You can optionally replace the NaN values by calling fillna.
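A minimal sketch of that last step, reusing the frame from the example above. The fill value 'missing' is an arbitrary placeholder chosen for illustration, and the column is assigned with df['b'] = s, which aligns on the index in the same way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(3)}, index=np.arange(3, 6))
s = pd.Series(['a', 'b'])
s.index = df.index[:len(s)]   # align the short Series with the first rows
df['b'] = s                   # unmatched row 5 becomes NaN

# Replace the missing value in the new column only.
df['b'] = df['b'].fillna('missing')
print(df)
```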
How to add row and column to a dataframe of different length?
Transpose 'y' and repeat to the desired number of rows. Set column names to 'x'.
cbind(Dataset, `colnames<-`(t(Headers$y)[rep(1, nrow(Dataset)), ], Headers$x))
H W x1 x2 x3 x4
1 20 30 1 2 3 4
2 10 20 1 2 3 4
3 11 30 1 2 3 4
4 8 10 1 2 3 4
5 10 6 1 2 3 4
Concat two Pandas DataFrame column with different length of index
Use:
import pandas as pd

df1 = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
})
df2 = pd.DataFrame({
'SMA':list('rty')
})
df3 = df1.join(df2.set_index(df1.index[-len(df2):]))
Or:
df3 = pd.concat([df1, df2.set_index(df1.index[-len(df2):])], axis=1)
print (df3)
A B SMA
0 a 4 NaN
1 b 5 NaN
2 c 4 NaN
3 d 5 r
4 e 5 t
5 f 4 y
How it works:
First, the last len(df2) labels of the df1 index are selected:
print (df1.index[-len(df2):])
RangeIndex(start=3, stop=6, step=1)
Then the existing index of df2 is overwritten with these labels by DataFrame.set_index:
print (df2.set_index(df1.index[-len(df2):]))
SMA
3 r
4 t
5 y
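The same idea can be wrapped in a small helper. This is only a sketch: join_at_tail is a hypothetical name, and the string index on df1 is made up to show that the approach does not rely on a RangeIndex. One pitfall worth guarding against is an empty df2, because index[-0:] selects the whole index rather than nothing.

```python
import pandas as pd

def join_at_tail(left, right):
    """Align `right` with the last len(right) rows of `left`, then join."""
    n = len(right)
    if n == 0 or n > len(left):
        raise ValueError("right must have between 1 and len(left) rows")
    tail_labels = left.index[-n:]            # last n index labels of left
    return left.join(right.set_index(tail_labels))

df1 = pd.DataFrame({'A': list('abcdef'), 'B': [4, 5, 4, 5, 5, 4]},
                   index=list('uvwxyz'))     # non-numeric index, for illustration
df2 = pd.DataFrame({'SMA': list('rty')})

df3 = join_at_tail(df1, df2)
print(df3)
```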
Python: Add a column of different length to a dataframe, repeating the added column until it fills the dataframe length
You can use np.tile to repeat the elements of column C:
m, n = len(df1), len(df2)
df1['C'] = np.tile(df2['C'], int(np.ceil(m / n)))[:m]
Result:
A B C
0 1 AA 11
1 2 AB 12
2 3 AC 11
3 5 AD 12
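For a self-contained version, the sketch below rebuilds the two frames from the result shown above; their exact contents are an assumption inferred from that output. np.tile repeats the whole array end to end, and the slice [:m] trims any overshoot when m is not a multiple of n.

```python
import numpy as np
import pandas as pd

# Frames inferred from the result table above (assumed, not given in the answer).
df1 = pd.DataFrame({'A': [1, 2, 3, 5], 'B': ['AA', 'AB', 'AC', 'AD']})
df2 = pd.DataFrame({'C': [11, 12]})

m, n = len(df1), len(df2)
reps = int(np.ceil(m / n))                       # enough copies to cover m rows
df1['C'] = np.tile(df2['C'].to_numpy(), reps)[:m]
print(df1)
```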
Adding new column to dataframe of different length from list
I'm not exactly clear on what you're trying to do, but maybe you want something like this?
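from pandas import DataFrame

df = DataFrame()

def myfunc(number):
    row_index = 0
    for x in range(0, 10):
        if 'some condition':  # placeholder: replace with the real test
            # Write each accepted value to the next row of this call's column
            df.loc[row_index, 'results%d' % number] = x
            row_index += 1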
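If the goal is simply to collect result lists of different lengths as columns, an alternative sketch (a different technique from the loop above) is to wrap each list in a Series so that pandas pads the shorter ones with NaN. The dictionary and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical result lists of unequal length.
results = {
    'results1': [0, 1, 2],
    'results2': [5, 6],
}

# Each Series gets its own default index; the constructor aligns them
# and fills the gaps with NaN.
df = pd.DataFrame({name: pd.Series(values) for name, values in results.items()})
print(df)
```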
Split lists in a dataframe with different length lists in columns and rows
It seems that the exploded columns and the non-exploded columns need to be separated. Since we can't hide them in the index as we normally would (C2 contains lists, which are unhashable), we must split the DataFrame and then rejoin it.
# Convert to single series to explode
cols = ['C1', 'C4']
new_df = df[cols].stack().explode().to_frame()
# Enumerate groups then unstack
new_df = new_df.set_index(
new_df.groupby(level=[0, 1]).cumcount(),
append=True
).unstack(1).groupby(level=0).ffill()
# Join Back Unaffected columns
new_df = new_df.droplevel(0, axis=1).droplevel(1, axis=0).join(
df[df.columns.symmetric_difference(cols)]
)
# Re order columns and reset index
new_df = new_df.reindex(df.columns, axis=1).reset_index(drop=True)
new_df:
C1 C2 C3 C4
0 A [1] s1 123
1 B [1] s1 123
2 C [2] s2 321
3 D [3] s3 777
4 E [3] s3 111
5 F [4] s4 145
We stack to get all values into a single series, then explode together and convert back with to_frame:
cols = ['C1', 'C4']
new_df = df[cols].stack().explode().to_frame()
new_df
         0
0 C1     A
  C1     B
  C4   123
1 C1     C
  C4   321
2 C1     D
  C1     E
  C4   777
  C4   111
3 C1     F
  C4   145
We can create a new index by enumerating groups with groupby cumcount, then set_index and unstacking:
new_df = new_df.set_index(
new_df.groupby(level=[0, 1]).cumcount(),
append=True
).unstack(1)
      0
     C1   C4
0 0   A  123
  1   B  NaN
1 0   C  321
2 0   D  777
  1   E  111
3 0   F  145
We can then groupby ffill within index groups:
new_df = new_df.groupby(level=0).ffill()
new_df:
      0
     C1   C4
0 0   A  123
  1   B  123
1 0   C  321
2 0   D  777
  1   E  111
3 0   F  145
We can then join the unaffected columns back to the DataFrame and reindex to restore the original column order. droplevel removes the unneeded index levels, and reset_index restores a default index:
# Join Back Unaffected columns
new_df = new_df.droplevel(0, axis=1).droplevel(1, axis=0).join(
df[df.columns.symmetric_difference(cols)]
)
# Re order columns and reset index
new_df = new_df.reindex(df.columns, axis=1).reset_index(drop=True)
new_df:
C1 C2 C3 C4
0 A [1] s1 123
1 B [1] s1 123
2 C [2] s2 321
3 D [3] s3 777
4 E [3] s3 111
5 F [4] s4 145
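For comparison, here is a sketch of a different route to the same result, not the method described above: pad the shorter list in each row by repeating its last element, then explode both columns at once (multi-column explode needs pandas 1.3 or later). The input frame is an assumption reconstructed from the outputs shown above.

```python
import pandas as pd

# Input frame inferred from the outputs above (assumed).
df = pd.DataFrame({
    'C1': [['A', 'B'], ['C'], ['D', 'E'], ['F']],
    'C2': [[1], [2], [3], [4]],
    'C3': ['s1', 's2', 's3', 's4'],
    'C4': [[123], [321], [777, 111], [145]],
})
cols = ['C1', 'C4']

# Longest list per row across the columns to explode.
width = [max(map(len, vals)) for vals in zip(*(df[c] for c in cols))]

padded = df.copy()
for c in cols:
    # Repeat the last element so every list in a row has the same length.
    padded[c] = [v + [v[-1]] * (n - len(v)) for v, n in zip(df[c], width)]

# Explode both columns together; C2 and C3 are repeated for each new row.
result = padded.explode(cols).reset_index(drop=True)
print(result)
```

Padding with the last element plays the role of the ffill step: a row whose C4 list is shorter than its C1 list reuses the final C4 value.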
Python Pandas: Assign lists with different lengths as a row to pandas dataframe
You could construct your lists into a DataFrame and concat them:
(pd.concat([df.set_index('ColA'),
pd.DataFrame([list_a, list_c], index=['a', 'c'])],
axis=1).rename_axis('ColA').reset_index())
[out]
ColA ColB 0 1 2
0 a 0 0.0 1.0 NaN
1 b 1 NaN NaN NaN
2 c 2 0.0 1.0 2.0
Or, as @QuangHoang suggested, use DataFrame.merge:
df.merge(pd.DataFrame([list_a, list_c], index=['a', 'c']),
left_on='ColA',
right_index=True,
how='left')
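A self-contained sketch of the merge variant follows. The contents of df, list_a and list_c are assumptions inferred from the output shown above. Rows of unequal length are padded with NaN when the lists are turned into a DataFrame, and the left merge keeps row b, which has no list.

```python
import pandas as pd

# Inputs inferred from the output above (assumed).
df = pd.DataFrame({'ColA': ['a', 'b', 'c'], 'ColB': [0, 1, 2]})
list_a = [0, 1]
list_c = [0, 1, 2]

# One row per list, labelled by the ColA value it belongs to.
lists = pd.DataFrame([list_a, list_c], index=['a', 'c'])

out = df.merge(lists, left_on='ColA', right_index=True, how='left')
print(out)
```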