Shifting Non-NA Cells to the Left

Shifting non-NA cells to the left

You can use the standard apply function:

df=data.frame(x=c("l","m",NA,NA,"p"),y=c(NA,"b","c",NA,NA),z=c("u",NA,"w","x","y"))
df2 = as.data.frame(t(apply(df,1, function(x) { return(c(x[!is.na(x)],x[is.na(x)]) )} )))
colnames(df2) = colnames(df)

> df
     x    y    z
1    l <NA>    u
2    m    b <NA>
3 <NA>    c    w
4 <NA> <NA>    x
5    p <NA>    y
> df2
  x    y    z
1 l    u <NA>
2 m    b <NA>
3 c    w <NA>
4 x <NA> <NA>
5 p    y <NA>

Move non-empty cells to the left in pandas DataFrame

Here's what I did:

I stacked your dataframe into a longer format, then grouped by the Name index level. Within each group, I dropped the NaNs, then reindexed to the full h1 through h4 set, which re-creates your NaNs on the right.

from io import StringIO
import pandas

def defragment(x):
    values = x.dropna().values
    return pandas.Series(values, index=df.columns[:len(values)])

datastring = StringIO("""\
Name h1 h2 h3 h4
A 1 nan 2 3
B nan nan 1 3
C 1 3 2 nan""")

df = pandas.read_table(datastring, sep=r'\s+').set_index('Name')
long_index = pandas.MultiIndex.from_product([df.index, df.columns])

print(
df.stack()
.groupby(level='Name')
.apply(defragment)
.reindex(long_index)
.unstack()
)

And so I get:

   h1  h2   h3   h4
A   1   2    3  NaN
B   1   3  NaN  NaN
C   1   3    2  NaN

Pandas how to shift na values to the right?

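Try sorting each row with pd.isnull as the key. Python's sort is stable, so the non-null values keep their order and the missing values move to the right:

df = df.replace('na', np.nan).transform(lambda x: sorted(x, key=pd.isnull), 1)
print(df)
   x    y    z
0  1  NaN  NaN
1  2    1  NaN
2  2    3    1
3  1  NaN  NaN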
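
The one-liner above relies on sorted being stable when the key only says whether a value is missing. A minimal self-contained sketch of the same idea follows; the input frame is made up for illustration (the original input is not shown), and it uses apply with axis=1 in place of transform, which is a substitution on my part rather than the answer's exact call.

```python
import numpy as np
import pandas as pd

# Hypothetical input, chosen so the result matches the output shown above.
df = pd.DataFrame({
    "x": [np.nan, 2, 2, np.nan],
    "y": [np.nan, np.nan, 3, 1],
    "z": [1, 1, 1, np.nan],
})

# For each row, sort with pd.isnull as the key: False (present) sorts before
# True (missing), and the stable sort keeps present values in their order.
shifted = df.apply(
    lambda row: pd.Series(sorted(row, key=pd.isnull), index=row.index),
    axis=1,
)
print(shifted)
```

Rebuilding each row as a Series with the original index keeps the column names, so no renaming step is needed afterwards.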

Shift values in dataframe to the left if column name == Year and value is NaN pandas

The idea is to relabel each column that follows a year column with that year (forward-filling the missing labels), then group the columns with DataFrame.groupby and axis=1 and take the first non-missing value in each group, if one exists, with GroupBy.first:

s = df.columns.astype(str).to_series()
a = s.where(s.str.contains(r'\d{4}')).ffill().fillna(s)
print (a)
0 0
1 1
2018 2018
3 2018
2017 2017
5 2017
dtype: object

df1 = df.groupby(pd.Index(a), axis=1).first()
print (df1)
            0    1      2017      2018
0  Population  3.0  501433.0  418980.0
1     British  4.0   96797.0   31514.0
2      French  NaN     201.0    3089.0
3         NaN  NaN   96998.0   34603.0
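
Here is a small self-contained sketch of the same relabel-then-group idea. The frame and its column names are invented for illustration. Because recent pandas releases deprecate groupby with axis=1, the sketch groups the transposed frame instead; that is a substitution, not the code from the answer above.

```python
import numpy as np
import pandas as pd

# Made-up frame: each year column is followed by a stray overflow column.
df = pd.DataFrame({
    "2018": [1.0, np.nan],
    "x":    [np.nan, 2.0],
    "2017": [np.nan, 4.0],
    "y":    [3.0, np.nan],
})

# Keep labels that look like a year, blank out the rest, then forward-fill
# so every overflow column inherits the year to its left.
s = df.columns.to_series()
labels = s.where(s.str.contains(r"^\d{4}$")).ffill().fillna(s)

# Group the transposed rows (the original columns) by the new labels and take
# the first non-missing value per group, then transpose back.
merged = df.T.groupby(labels.to_numpy()).first().T
print(merged)
```

The result has one column per year, holding whichever of the grouped columns had a value in that row.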

Dropping all left NAs in a dataframe and left shifting the cleaned rows

I don't think you can do this without a loop.

dat <- as.data.frame(rbind(c(NA,NA,1,3,5,NA,NA,NA), c(NA,1:3,6:8,NA), c(1:7,NA)))
dat[3,2] <- NA

#   V1 V2 V3 V4 V5 V6 V7 V8
# 1 NA NA  1  3  5 NA NA NA
# 2 NA  1  2  3  6  7  8 NA
# 3  1 NA  3  4  5  6  7 NA

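t(apply(dat, 1, function(x) {
  if (is.na(x[1])) {
    # drop everything before the first non-NA value
    y <- x[-seq_len(which.min(is.na(x))-1)]
    # pad with NA on the right back to the original length
    length(y) <- length(x)
    y
  } else x
}))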

#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#[1,]    1    3    5   NA   NA   NA   NA   NA
#[2,]    1    2    3    6    7    8   NA   NA
#[3,]    1   NA    3    4    5    6    7   NA

Then turn the matrix into a data.frame if you must.
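
For comparison, the same "drop only the leading NAs" logic can be sketched in Python with NumPy. The function name drop_leading_nans is hypothetical, and the sketch assumes numeric rows so that np.isnan applies. Interior and trailing NaNs are kept as they are.

```python
import numpy as np

def drop_leading_nans(row):
    """Shift a 1D float row left past its leading NaNs, padding with NaN on the right."""
    row = np.asarray(row, dtype=float)
    valid = ~np.isnan(row)
    if not valid.any():
        return row.copy()              # all-NaN row: nothing to shift
    first = int(np.argmax(valid))      # position of the first non-NaN value
    out = np.full(row.shape, np.nan)
    out[: row.size - first] = row[first:]
    return out

nan = np.nan
dat = np.array([
    [nan, nan, 1, 3, 5, nan, nan, nan],
    [nan, 1, 2, 3, 6, 7, 8, nan],
    [1, nan, 3, 4, 5, 6, 7, nan],
])

shifted = np.vstack([drop_leading_nans(r) for r in dat])
print(shifted)
```

Like the R version, this still loops over rows; only the work inside each row is vectorised.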

Pandas: shifting columns depending on if NaN or not

Use:

# for each row, drop the NaNs and build a new Series; these become the rows of the final df
df1 = df.apply(lambda x: pd.Series(x.dropna().values), axis=1)
# reindex in case the result has fewer columns than the original df
df1 = df1.reindex(columns=range(len(df.columns)))
# assign the original column names
df1.columns = df.columns
print (df1)
   phone_number_1_clean  phone_number_2_clean  phone_number_3_clean
0               8546987                   NaN                   NaN
1               8316589               8751369                   NaN
2               4569874               2645981                   NaN

Or:

s = df.stack()
s.index = [s.index.get_level_values(0), s.groupby(level=0).cumcount()]

df1 = s.unstack().reindex(columns=range(len(df.columns)))
df1.columns = df.columns
print (df1)
   phone_number_1_clean  phone_number_2_clean  phone_number_3_clean
0               8546987                   NaN                   NaN
1               8316589               8751369                   NaN
2               4569874               2645981                   NaN

Or use a slightly modified justify function:

import numpy as np
import pandas as pd

def justify(a, invalid_val=0, axis=1, side='left'):
    """
    Justifies a 2D array

    Parameters
    ----------
    a : ndarray
        Input array to be justified
    invalid_val : scalar
        Value treated as empty; pass np.nan to treat missing values as empty
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.

    """

    if invalid_val is np.nan:
        mask = pd.notnull(a) #changed to pandas notnull
    else:
        mask = a!=invalid_val
    # sorting the boolean mask pushes the True (valid) slots to one side
    justified_mask = np.sort(mask,axis=axis)
    if (side=='up') | (side=='left'):
        justified_mask = np.flip(justified_mask,axis=axis)
    out = np.full(a.shape, invalid_val, dtype=object)
    if axis==1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out

df = pd.DataFrame(justify(df.values, invalid_val=np.nan),  
index=df.index, columns=df.columns)
print (df)
  phone_number_1_clean phone_number_2_clean phone_number_3_clean
0              8546987                  NaN                  NaN
1              8316589              8751369                  NaN
2              4569874              2645981                  NaN
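
If the data is purely numeric, a shorter NumPy-only variant is possible: argsort the NaN mask with a stable sort and reorder each row by the resulting indices. This is an alternative sketch, not part of the justify function above; the name justify_left and the input values are assumptions for illustration, and unlike justify it does not handle object arrays.

```python
import numpy as np

def justify_left(a):
    """Move non-NaN values of a 2D float array to the left of each row, keeping their order."""
    a = np.asarray(a, dtype=float)
    # False (valid) sorts before True (NaN); kind="stable" preserves the
    # original left-to-right order among the valid values.
    order = np.argsort(np.isnan(a), axis=1, kind="stable")
    return np.take_along_axis(a, order, axis=1)

nan = np.nan
# Hypothetical input with gaps in different positions.
values = np.array([
    [nan, 8546987, nan],
    [8316589, nan, 8751369],
    [4569874, 2645981, nan],
])

result = justify_left(values)
print(result)
```

Because the whole array is handled in one argsort call, this avoids the per-row Python overhead of the apply-based version.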

Performance:

#3k rows
df = pd.concat([df] * 1000, ignore_index=True)

In [442]: %%timeit
...: df1 = df.apply(lambda x: pd.Series(x.dropna().values), axis=1)
...: # reindex in case the result has fewer columns than the original df
...: df1 = df1.reindex(columns=range(len(df.columns)))
...: #assign original columns names
...: df1.columns = df.columns
...:
1.17 s ± 10.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [443]: %%timeit
...: s = df.stack()
...: s.index = [s.index.get_level_values(0), s.groupby(level=0).cumcount()]
...:
...: df1 = s.unstack().reindex(columns=range(len(df.columns)))
...: df1.columns = df.columns
...:
...:
5.88 ms ± 74.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [444]: %%timeit
...: pd.DataFrame(justify(df.values, invalid_val=np.nan),
index=df.index, columns=df.columns)
...:
941 µs ± 131 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Shifting the last non-NA value by id

Here's a data.table solution with an assist from zoo:

library(data.table)
library(zoo)

DT[, `:=`(day_shift = shift(day),
          yj = shift(Consumption)),
   by = id]

# where yj is NA, set day_shift to NA as well
DT[is.na(yj), day_shift := NA_integer_]

# fill each column with the last non-NA value within each id
DT[, `:=`(day_shift = zoo::na.locf(day_shift, na.rm = FALSE),
          yj = zoo::na.locf(yj, na.rm = FALSE)),
   by = id]

# finally calculate j
DT[, j := day - day_shift]

# you can clean up the ordering or remove columns later
DT

   day Consumption id day_shift yj  j
1:   1           5  1        NA NA NA
2:   2           9  2        NA NA NA
3:   3          10  3        NA NA NA
4:   4           2  1         1  5  3
5:   5          NA  1         4  2  1
6:   6          NA  2         2  9  4
7:   7          NA  2         2  9  5
8:   8          NA  1         4  2  4
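
The same steps (shift within each id, blank out the day where the shifted value is missing, carry the last non-NA forward, then subtract) could be sketched in pandas roughly as follows. The input is reconstructed from the day, Consumption and id columns of the output above; the pandas translation itself is an assumption on my part, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "day": [1, 2, 3, 4, 5, 6, 7, 8],
    "Consumption": [5, 9, 10, 2, np.nan, np.nan, np.nan, np.nan],
    "id": [1, 2, 3, 1, 1, 2, 2, 1],
})

# Previous consumption and previous day within each id.
df["yj"] = df.groupby("id")["Consumption"].shift()
df["day_shift"] = df.groupby("id")["day"].shift()

# Where the previous consumption is missing, the previous day is not usable.
df["day_shift"] = df["day_shift"].where(df["yj"].notna())

# Carry the last non-missing values forward within each id.
df["day_shift"] = df.groupby("id")["day_shift"].ffill()
df["yj"] = df.groupby("id")["yj"].ffill()

# Days elapsed since the last observed consumption.
df["j"] = df["day"] - df["day_shift"]
print(df)
```

The new columns come out as floats because they contain NaN; cast them afterwards if integer output matters.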

How to move cells with a value row-wise to the left in a dataframe

yourdata[] <- t(apply(yourdata, 1, function(x) {
  c(x[!is.na(x)], x[is.na(x)])
}))

This should work: for each row, it replaces the row with a vector consisting of the non-NA values first, followed by the NA values.
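
For readers working in pandas, the same "non-missing values first, then the missing ones" rule can be sketched row by row as shown here. The data is the example frame from the first answer on this page; the helper name values_first is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "x": ["l", "m", None, None, "p"],
    "y": [None, "b", "c", None, None],
    "z": ["u", None, "w", "x", "y"],
})

def values_first(row):
    """Return the row with its non-missing values first, followed by the missing ones."""
    present = list(row[row.notna()])
    missing = list(row[row.isna()])
    return pd.Series(present + missing, index=row.index)

df2 = df.apply(values_first, axis=1)
print(df2)
```

This mirrors the R one-liner closely, which makes it easy to read but slow on large frames, since apply with axis=1 calls the function once per row.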


