How to move cells with a value row-wise to the left in a dataframe
yourdata[]<-t(apply(yourdata,1,function(x){
c(x[!is.na(x)],x[is.na(x)])}))
This should work: for each row, it replaces the row with a vector consisting of the non-NA values first, followed by the NA values.
Python Dataframe. Move rows values left according index of rows
Another way is to use a simple loop to shift the values in every row, and then use fillna to replace NA values with 0:
for i in range(len(df)):
    # shift row i left by i positions; the vacated cells become NaN
    df.iloc[i, :] = df.iloc[i, :].shift(-i)
df.fillna(0, inplace=True)
Output:
>>> df
one two three four
0 20 15.0 10.0 5.0
1 15 10.0 5.0 0.0
2 10 5.0 0.0 0.0
3 5 0.0 0.0 0.0
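The same result can be sketched without assigning rows in place. This is a minimal version under assumptions: the frame is purely numeric, and the function name shift_rows_left_by_position and the sample data are made up for illustration (the original input frame is not shown above).

```python
import numpy as np
import pandas as pd

def shift_rows_left_by_position(frame):
    """Shift row i left by i positions and fill the vacated cells with 0."""
    values = frame.to_numpy(dtype=float)
    out = np.zeros_like(values)
    n_rows, n_cols = values.shape
    for i in range(n_rows):
        k = min(i, n_cols)  # never shift by more than the row width
        out[i, :n_cols - k] = values[i, k:]
    return pd.DataFrame(out, index=frame.index, columns=frame.columns)

# Hypothetical input: every row holds the same four values.
df = pd.DataFrame({"one": [20] * 4, "two": [15] * 4,
                   "three": [10] * 4, "four": [5] * 4})
result = shift_rows_left_by_position(df)
print(result)
```

Unlike the loop with fillna, this sketch only zero-fills the cells vacated by the shift; NaN values already present in the data stay NaN.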
Shift pd.dataframe's rows depending of value in a specific cells
By default, shift in pandas drops the values that are pushed past the last column. So first add new columns filled with missing values; the number of new columns is the largest difference between the non-2017 years and 2017.
df = df.set_index('Year')
diff = np.setdiff1d(df.index.dropna().unique(), [2017]).astype(int)
print (diff)
[2018 2019]
df = df.assign(**{f'new{x}':np.nan for x in range(max(diff-2017))})
Then you can use shift in a loop and select the rows with DataFrame.loc by the years in the index:
for y in diff:
    # shift the rows of year y right by (y - 2017) columns
    df.loc[y, :] = df.astype(float).shift(y - 2017, axis=1).loc[y, :]
Finally, replace missing values, cast to integers and convert the index back to a column:
df = df.fillna(0).astype(int).reset_index()
print (df)
Year B C D E new0 new1
0 2017 4 0 0 5 0 0
1 2019 0 0 5 0 1 3
2 2018 0 4 0 3 6 0
3 2017 5 0 5 9 0 0
4 2017 5 0 7 2 0 0
5 2017 4 7 1 4 0 0
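Why the extra columns are needed can be seen in a small standalone example. It is only an illustration: the one-row frame and the names row, padded, shifted and shifted_padded are made up here.

```python
import pandas as pd

# One hypothetical row of float values.
row = pd.DataFrame({"B": [4.0], "C": [0.0], "D": [3.0], "E": [6.0]})

# Shifting right by one column pushes E's value past the last column, so it is lost.
shifted = row.shift(1, axis=1)
print(shifted)

# With a spare column of missing values appended first, the value survives.
padded = row.assign(new0=float("nan"))
shifted_padded = padded.shift(1, axis=1)
print(shifted_padded)
```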
EDIT:
Solution with another column:
df = pd.DataFrame({
'new':list('abcdef'),
'Year':[2017, 2019, 2018, 2017, 2017, 2017],
'B':[4,5,4,5,5,4],
'C':[0,0,0,0,0,7],
'D':[0,1,3,5,7,1],
'E':[5,3,6,9,2,4]})
print (df)
new Year B C D E
0 a 2017 4 0 0 5
1 b 2019 5 0 1 3
2 c 2018 4 0 3 6
3 d 2017 5 0 5 9
4 e 2017 5 0 7 2
5 f 2017 4 7 1 4
df = df.set_index(['new','Year'])
diff = np.setdiff1d(df.index.get_level_values('Year').dropna().unique(), [2017]).astype(int)
print (diff)
[2018 2019]
df1 = pd.DataFrame(index=df.index, columns=['new{}'.format(x) for x in range(max(diff-2017))])
df = pd.concat([df, df1], axis=1)
print (df)
B C D E new0 new1
new Year
a 2017 4 0 0 5 NaN NaN
b 2019 5 0 1 3 NaN NaN
c 2018 4 0 3 6 NaN NaN
d 2017 5 0 5 9 NaN NaN
e 2017 5 0 7 2 NaN NaN
f 2017 4 7 1 4 NaN NaN
idx = pd.IndexSlice
for y in diff:
    # shift the rows whose Year level equals y right by (y - 2017) columns
    df.loc[idx[:, y], :] = df.astype(float).shift(y - 2017, axis=1).loc[idx[:, y], :]
df = df.fillna(0).astype(int).reset_index()
print (df)
new Year B C D E new0 new1
0 a 2017 4 0 0 5 0 0
1 b 2019 0 0 5 0 1 3
2 c 2018 0 4 0 3 6 0
3 d 2017 5 0 5 9 0 0
4 e 2017 5 0 7 2 0 0
5 f 2017 4 7 1 4 0 0
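As an alternative to the loop over years, the per-row shift can be sketched with NumPy fancy indexing. This is not the method from the answer above; the helper name shift_rows_right and its fill parameter are assumptions made for this sketch, and it expects non-negative integer offsets.

```python
import numpy as np
import pandas as pd

def shift_rows_right(values, offsets, fill=0):
    """Shift each row of a 2-D array right by its own offset, widening as needed."""
    values = np.asarray(values)
    offsets = np.asarray(offsets)
    n_rows, n_cols = values.shape
    out = np.full((n_rows, n_cols + offsets.max()), fill, dtype=values.dtype)
    rows = np.arange(n_rows)[:, None]              # shape (n_rows, 1)
    cols = np.arange(n_cols) + offsets[:, None]    # target column of every cell
    out[rows, cols] = values
    return out

df = pd.DataFrame({
    'Year': [2017, 2019, 2018, 2017, 2017, 2017],
    'B': [4, 5, 4, 5, 5, 4],
    'C': [0, 0, 0, 0, 0, 7],
    'D': [0, 1, 3, 5, 7, 1],
    'E': [5, 3, 6, 9, 2, 4]})

wide = shift_rows_right(df[['B', 'C', 'D', 'E']].to_numpy(),
                        (df['Year'] - 2017).to_numpy())
print(wide)
```

The result is a plain array with two extra columns, so column names such as new0 and new1 still have to be attached when it is wrapped back into a DataFrame.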
Pandas. A pretty way to delete cell and shift left others in row?
You need to select the rows to shift. Here, for example, the first 2 characters of X1 are tested for being numeric with str[:2] and Series.str.isnumeric, and the mask is inverted with ~ so that DataFrame.shift is applied only to rows with a non-numeric value:
m = ~df['X1'].str[:2].str.isnumeric()
Another idea for the mask (thank you @Manakin) is to test whether the values parse as times in the format HH:MM:
m = pd.to_datetime(df['X1'],format='%H:%M',errors='coerce').isna()
Alternatively, test for two 2-digit numbers separated by a colon:
m = ~df['X1'].str.contains(r'^\d{2}:\d{2}$')
df[m] = df[m].shift(-1, axis=1)
print(df)
X1 X2 X3
0 12:40 anytext anytext
1 12:44 anytext NaN
2 14:06 anytext NaN
3 15:44 anytext anytext
4 16:01 anytext anytext
If you need to modify all columns from X1 onward, one idea is:
df=pd.DataFrame({'X0':['anytext','anytext','anytext','anytext','anytext'],
'X1':['12:40','boss','engen','15:44','16:01'],
'X2':['anytext','12:44','14:06','anytext','anytext'],
'X3':['anytext','anytext','anytext','anytext','anytext']})
m = ~df['X1'].str.contains(r'^\d{2}:\d{2}$')
df.loc[m, 'X1':] = df.loc[m, 'X1':].shift(-1, axis=1)
print(df)
X0 X1 X2 X3
0 anytext 12:40 anytext anytext
1 anytext 12:44 anytext NaN
2 anytext 14:06 anytext NaN
3 anytext 15:44 anytext anytext
4 anytext 16:01 anytext anytext
Another option is to convert X0 to the index:
df = df.set_index('X0')
m = ~df['X1'].str.contains(r'^\d{2}:\d{2}$')
df[m] = df[m].shift(-1, axis=1)
df = df.reset_index()
print(df)
X0 X1 X2 X3
0 anytext 12:40 anytext anytext
1 anytext 12:44 anytext NaN
2 anytext 14:06 anytext NaN
3 anytext 15:44 anytext anytext
4 anytext 16:01 anytext anytext
Is there a way to shift pandas data frame first row only one cell to the right?
Yes, you can shift the first row of a dataframe one column to the right. Use iloc to select all columns of this row, which returns a pd.Series, then use shift to move the values of this series one position, and assign the shifted series back to the first row of the dataframe.
df.iloc[0, :] = df.iloc[0, :].shift()
MCVE:
import pandas as pd
import numpy as np
df = pd.DataFrame([[*'ABCD']+[np.nan],[1,2,3,4,5],[5,6,7,9,10],[11,12,13,14,15]])
df
# Input DataFrame
# 0 1 2 3 4
# 0 A B C D NaN
# 1 1 2 3 4 5.0
# 2 5 6 7 9 10.0
# 3 11 12 13 14 15.0
df.iloc[0, :] = df.iloc[0, :].shift()
df
# Output DataFrame
# 0 1 2 3 4
# 0 NaN A B C D
# 1 1 2 3 4 5
# 2 5 6 7 9 10
# 3 11 12 13 14 15
Remove all cells containing 0 and move values to the left
You can also do:
read.table(text=gsub('\\b0\\b','',do.call(paste,df)),fill=TRUE,col.names = names(df))
Item X35 X45 X55 X65 X75 X85 X95 X100
1 1 35 85 NA NA NA NA NA NA
2 2 55 65 NA NA NA NA NA NA
3 3 75 85 NA NA NA NA NA NA
4 4 45 100 NA NA NA NA NA NA
5 5 85 95 NA NA NA NA NA NA
Move non-empty cells to the left in pandas DataFrame
Here's what I did:
I stacked your dataframe into a longer format, then grouped by the name column. Within each group, I drop the NaNs, but then reindex to the full h1 through h4 set, thus re-creating your NaNs to the right.
from io import StringIO
import pandas
def defragment(x):
    # keep the non-missing values and relabel them with the leftmost columns
    values = x.dropna().values
    return pandas.Series(values, index=df.columns[:len(values)])
datastring = StringIO("""\
Name h1 h2 h3 h4
A 1 nan 2 3
B nan nan 1 3
C 1 3 2 nan""")
df = pandas.read_table(datastring, sep=r'\s+').set_index('Name')
long_index = pandas.MultiIndex.from_product([df.index, df.columns])
print(
    df.stack()
      .groupby(level='Name')
      .apply(defragment)
      .reindex(long_index)
      .unstack()
)
And so I get:
h1 h2 h3 h4
A 1 2 3 NaN
B 1 3 NaN NaN
C 1 3 2 NaN
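A shorter route to the same layout is a stable sort on the missing-value mask. This is a sketch rather than the approach described above; it assumes a purely numeric frame, and the function name push_values_left is made up for illustration.

```python
import numpy as np
import pandas as pd

def push_values_left(frame):
    """Move the non-missing values of every row to the left, keeping their order."""
    values = frame.to_numpy(dtype=float)
    # A stable sort on the "is missing" flag places present cells (False)
    # before missing cells (True) without reordering the present ones.
    order = np.argsort(np.isnan(values), axis=1, kind="stable")
    packed = np.take_along_axis(values, order, axis=1)
    return pd.DataFrame(packed, index=frame.index, columns=frame.columns)

df = pd.DataFrame(
    {"h1": [1, np.nan, 1], "h2": [np.nan, np.nan, 3],
     "h3": [2, 1, 2], "h4": [3, 3, np.nan]},
    index=pd.Index(["A", "B", "C"], name="Name"))

result = push_values_left(df)
print(result)
```

To treat another marker as empty, for instance 0, it can be replaced first: push_values_left(df.replace(0, np.nan)).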
Using R to shift values to the left of data.frame
We can loop over the rows and concatenate the non-NA elements followed by the NA elements and assign it back to the dataset
df[] <- t(apply(df, 1, function(x) c(x[!is.na(x)], x[is.na(x)])))
df
# A B C
#1 yellow purple <NA>
#2 yellow <NA> <NA>
#3 orange yellow <NA>
#4 orange brown <NA>
#5 brown purple <NA>
#6 yellow purple pink
#7 purple green pink
#8 yellow pink green
#9 purple orange <NA>
#10 purple brown <NA>
data
df <- structure(list(A = c("yellow", NA, "orange", "orange", NA, "yellow",
"purple", "yellow", "purple", "purple"), B = c("purple", NA,
"yellow", NA, "brown", "purple", "green", "pink", "orange", NA
), C = c(NA, "yellow", NA, "brown", "purple", "pink", "pink",
"green", NA, "brown")), .Names = c("A", "B", "C"), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")
Python Pandas: How to move one row to the first row of a Dataframe?
Reindexing is probably the optimal solution for putting the rows in any new order in 1 apparent step, except it may require producing a new DataFrame which could be prohibitively large.
For example
import pandas as pd
t = pd.read_csv('table.txt', sep=r'\s+')
t
Out[81]:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
1 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
2 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
t.index
Out[82]: Int64Index([0, 1, 2, 3], dtype='int64')
t2 = t.reindex([2,0,1,3]) # cannot do this in place
t2
Out[93]:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
2 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
0 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
1 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
Now the index can be set back to range(4) without reindexing:
t2.index=range(4)
Out[102]:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
1 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
2 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
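The reindex-then-renumber steps can be wrapped in a small helper. This is a minimal sketch assuming unique index labels; the name move_row_first and the cut-down sample table are illustrative and do not come from table.txt.

```python
import pandas as pd

def move_row_first(frame, label):
    """Return a new frame with the row labelled `label` first and a fresh 0..n-1 index."""
    new_order = [label] + [i for i in frame.index if i != label]
    return frame.reindex(new_order).reset_index(drop=True)

# Hypothetical stand-in for the table read from table.txt.
t = pd.DataFrame({"DG/VD": ["0/0", "1/1", "2/2", "3/3"],
                  "Name": ["one", "two", "three", "four"]})
t2 = move_row_first(t, 2)
print(t2)
```

Like reindex itself, this builds a new DataFrame rather than modifying t in place.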
It can also be done with 'tuple switching' and row selection as a basic mechanism and without creating a new DataFrame. For example:
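import pandas as pd
t = pd.read_csv('table.txt', sep=r'\s+')
# .ix has been removed from pandas; .iloc selects rows by position.
# Copy the right-hand rows so the swap never reads an already overwritten row.
t.iloc[1], t.iloc[2] = t.iloc[2].copy(), t.iloc[1].copy()
t.iloc[0], t.iloc[1] = t.iloc[1].copy(), t.iloc[0].copy()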
t
Out[96]:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
1 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
2 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
Another in place method sets the DataFrame index for the desired ordering so that, for example, the 3rd row gets index 0, etc. and then the DataFrame is sorted in place. It's encapsulated in the following function that assumes the rows are indexed with some range(m) for positive integer m and the DataFrame is simply indexed (no MultiIndex) as in the example provided in the question.
def putfirst(n, df):
    # Move the n-th row (1-based) of df to the first position, in place.
    if not isinstance(n, int):
        print('error: 1st arg must be an int')
        return
    if n < 1:
        print('error: 1st arg must be an int > 0')
        return
    if n == 1:
        print('nothing to do when first arg == 1')
        return
    if n > len(df):
        print('error: n exceeds the number of rows in the DataFrame')
        return
    # Rows above the n-th move down one label; the n-th row gets label 0.
    df.index = list(range(1, n)) + [0] + list(range(n, df.index[-1] + 1))
    df.sort_index(inplace=True)
The arguments of putfirst are n, which is the ordinal position of the row to relocate to the first row position, so that if the 3rd row is to be so relocated then n = 3; and df is the DataFrame containing the row to be relocated.
Here is a demo:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 5),columns=['a', 'b', 'c', 'd', 'e'])
df.set_index("a") # ineffective without assignment or inplace=True
Out[182]:
b c d e
a
1.394072 -1.076742 -0.192466 -0.871188 0.420852
-1.211411 -0.258867 -0.581647 -1.260421 0.464575
-1.070241 0.804223 -0.156736 2.010390 -0.887104
-0.977936 -0.267217 0.483338 -0.400333 0.449880
0.399594 -0.151575 -2.557934 0.160807 0.076525
-0.297204 -1.294274 -0.885180 -0.187497 -0.493560
-0.115413 -0.350745 0.044697 -0.897756 0.890874
-1.151185 -2.612303 1.141250 -0.867136 0.383583
-0.437030 0.347489 -1.230179 0.571078 0.060061
-0.225524 1.349726 1.350300 -0.386653 0.865990
df
Out[183]:
a b c d e
0 1.394072 -1.076742 -0.192466 -0.871188 0.420852
1 -1.211411 -0.258867 -0.581647 -1.260421 0.464575
2 -1.070241 0.804223 -0.156736 2.010390 -0.887104
3 -0.977936 -0.267217 0.483338 -0.400333 0.449880
4 0.399594 -0.151575 -2.557934 0.160807 0.076525
5 -0.297204 -1.294274 -0.885180 -0.187497 -0.493560
6 -0.115413 -0.350745 0.044697 -0.897756 0.890874
7 -1.151185 -2.612303 1.141250 -0.867136 0.383583
8 -0.437030 0.347489 -1.230179 0.571078 0.060061
9 -0.225524 1.349726 1.350300 -0.386653 0.865990
df.index
Out[184]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')
putfirst(3,df)
df
Out[186]:
a b c d e
0 -1.070241 0.804223 -0.156736 2.010390 -0.887104
1 1.394072 -1.076742 -0.192466 -0.871188 0.420852
2 -1.211411 -0.258867 -0.581647 -1.260421 0.464575
3 -0.977936 -0.267217 0.483338 -0.400333 0.449880
4 0.399594 -0.151575 -2.557934 0.160807 0.076525
5 -0.297204 -1.294274 -0.885180 -0.187497 -0.493560
6 -0.115413 -0.350745 0.044697 -0.897756 0.890874
7 -1.151185 -2.612303 1.141250 -0.867136 0.383583
8 -0.437030 0.347489 -1.230179 0.571078 0.060061
9 -0.225524 1.349726 1.350300 -0.386653 0.865990
Moving data from right to left column in a tibble
Using dplyr and tidyr: reshape from wide to long, exclude diagnoses that start with R, S or Y as well as NA values, then reshape from long back to wide.
library(dplyr)
library(tidyr)
gather(data, key = "k", value = "v", -id) %>%
filter(!(grepl("^[RSY]", v) | is.na(v))) %>%
group_by(id) %>%
mutate(diagN = paste0("diagnosis_", row_number())) %>%
select(-k) %>%
spread(key = "diagN", value = "v") %>%
ungroup()
# # A tibble: 10 x 3
# id diagnosis_1 diagnosis_2
# <int> <chr> <chr>
# 1 1 F32 F40
# 2 2 F431 NA
# 3 3 F65 NA
# 4 4 F431 NA
# 5 5 F11 F19
# 6 6 F60 NA
# 7 7 G35 NA
# 8 8 F32 NA
# 9 9 F32 F11
# 10 10 Z032 NA