Pandas - Replace values based on index
Use loc:
df.loc[0:15, 'A'] = 16
print(df)
A B
0 16 45
1 16 5
2 16 97
3 16 58
4 16 26
5 16 87
6 16 51
7 16 17
8 16 39
9 16 73
10 16 94
11 16 69
12 16 57
13 16 24
14 16 43
15 16 77
16 41 0
17 3 21
18 0 98
19 45 39
20 66 62
21 8 53
22 69 47
23 48 53
Solution with ix is deprecated.
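As a sketch of the migration away from ix (with a made-up frame): loc is label-based and, unlike ordinary Python slicing, includes both slice endpoints.

```python
import pandas as pd

# Hypothetical frame with a default integer index
df = pd.DataFrame({'A': range(6), 'B': range(10, 16)})

# Old (deprecated): df.ix[0:3, 'A'] = 16
# Replacement: loc is label-based and includes BOTH endpoints,
# so labels 0, 1, 2 and 3 are all assigned.
df.loc[0:3, 'A'] = 16
print(df['A'].tolist())  # [16, 16, 16, 16, 4, 5]
```

For purely positional slicing (endpoint excluded), iloc is the other replacement for ix.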
Pandas - Replace values based on index and not in index
np.where
I'm making an assumption that there is a better way to do both 'Yes' and 'No' at the same time. If you truly just want to fill in the 'No' after you've already got the 'Yes', then refer to Fatemehhh's answer:
df.loc[:, 'A'] = np.where(df.index.isin(dl), 'Yes', 'No')
Experimental Section
Not meant for actual suggestions
f = dl.__contains__            # membership test: index label -> bool
g = ['No', 'Yes'].__getitem__  # bool (treated as 0/1) -> 'No'/'Yes'
df.loc[:, 'A'] = [*map(g, map(f, df.index))]
df
A
0 Yes
1 No
2 Yes
3 Yes
4 Yes
5 No
6 No
7 Yes
8 No
9 No
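A self-contained sketch of the np.where approach, with made-up data; dl (the name from the question) holds the index labels to flag as 'Yes'.

```python
import numpy as np
import pandas as pd

# Hypothetical data; dl holds the index labels to mark 'Yes'
df = pd.DataFrame({'A': [10, 20, 30, 40, 50]})
dl = [0, 2, 3]

# isin builds a boolean mask over the index; np.where maps it to labels.
# The answer above assigns via df.loc[:, 'A']; plain column assignment
# also works and sidesteps in-place dtype issues on newer pandas.
df['A'] = np.where(df.index.isin(dl), 'Yes', 'No')
print(df['A'].tolist())  # ['Yes', 'No', 'Yes', 'Yes', 'No']
```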
Replace value in column by value in list by index
Don't use list as a variable name, because it shadows the Python built-in. Then use Series.map with enumerate inside Series.mask:
L = ['a', 'b', 'c', 'd']
df['idx'] = df['idx'].mask(df['idx'] >=1, df['idx'].map(dict(enumerate(L))))
print (df)
id idx
0 A 0
1 B 0
2 C c
3 D b
A similar idea is to process only the matched rows with a boolean mask:
L = ['a', 'b', 'c', 'd']
m = df['idx'] >=1
df.loc[m,'idx'] = df.loc[m,'idx'].map(dict(enumerate(L)))
print (df)
id idx
0 A 0
1 B 0
2 C c
3 D b
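A self-contained version of the mask/map pattern, guessing the input from the printed output (idx holds 0 for rows to leave alone and otherwise a key into enumerate(L)):

```python
import pandas as pd

# Hypothetical input matching the printed output above
df = pd.DataFrame({'id': ['A', 'B', 'C', 'D'], 'idx': [0, 0, 2, 1]})
L = ['a', 'b', 'c', 'd']

# mask replaces values where the condition holds with the mapped values;
# enumerate(L) builds the lookup {0: 'a', 1: 'b', 2: 'c', 3: 'd'}
df['idx'] = df['idx'].mask(df['idx'] >= 1, df['idx'].map(dict(enumerate(L))))
print(df['idx'].tolist())  # [0, 0, 'c', 'b']
```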
Python Pandas - Replacing values of a part of data frame column based on index
From the way you used
df1.head(first_idx)
I assume your indices are numeric. Note, though, that chained indexing such as df1.iloc[first_idx + 1:, :]['Column1'].replace({'10': '3'}, inplace=True) operates on a copy and leaves df1 unchanged. Assign back through a single iloc indexer instead:
col = df1.columns.get_loc('Column1')
df1.iloc[first_idx + 1:, col] = df1.iloc[first_idx + 1:, col].replace({'10': '3'})
Should do.
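A runnable sketch with a hypothetical df1 and first_idx:

```python
import pandas as pd

# Hypothetical data: string codes, numeric index
df1 = pd.DataFrame({'Column1': ['10', '10', '10', '10', '10']})
first_idx = 1

# Replace only from position first_idx + 1 onward, assigning back
# through a single iloc indexer (no chained indexing)
col = df1.columns.get_loc('Column1')
df1.iloc[first_idx + 1:, col] = df1.iloc[first_idx + 1:, col].replace({'10': '3'})
print(df1['Column1'].tolist())  # ['10', '10', '3', '3', '3']
```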
Replace matching values from one dataframe with index value from another dataframe
Try:
df1['fruit'] = df1.fruit.map(dict(df2[['fruit','id']].values))
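A self-contained sketch, guessing the shape of both frames from the one-liner (df2 maps each fruit name to an id):

```python
import pandas as pd

# Hypothetical frames: df2 is the lookup of fruit -> id
df1 = pd.DataFrame({'fruit': ['apple', 'banana', 'apple', 'cherry']})
df2 = pd.DataFrame({'fruit': ['apple', 'banana', 'cherry'], 'id': [1, 2, 3]})

# Build a dict from df2's two columns and map it over df1['fruit']
df1['fruit'] = df1.fruit.map(dict(df2[['fruit', 'id']].values))
print(df1['fruit'].tolist())  # [1, 2, 1, 3]
```

Fruit names missing from df2 would become NaN; chain .fillna(df1['fruit']) if you want to keep them unchanged.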
Replace values in a dataset based off an index of values in another using base R
You can use match to change content using a lookup table.
i <- startsWith(colnames(x), "diagnosis_")
x[,i] <- y[match(unlist(x[,i]), y[,1]),2]
x
# ID diagnosis_1 diagnosis_2 diagnosis_3 diagnosis_4 diagnosis_5 diagnosis_6 diagnosis_7 diagnosis_8 diagnosis_9 diagnosis_10 diagnosis_11 diagnosis_12 diagnosis_13 age
#1 123 1 3 NA NA NA NA NA NA NA NA NA NA NA 54
#2 5345 2 3 1 NA NA NA NA NA NA NA NA NA NA 65
#3 234 3 NA NA NA NA NA NA NA NA NA NA NA NA 23
#4 453 4 1 NA NA NA NA NA NA NA NA NA NA NA 22
#5 3656 5 NA NA NA NA NA NA NA NA NA NA NA NA 33
#6 345 1 4 3 1 NA NA NA NA NA NA NA NA NA 77
And in case the lookup has the different structure given in z:
zz <- strsplit(z, "[, ]+")
zz <- setNames(rep(seq_along(zz), lengths(zz)), unlist(zz))
i <- startsWith(colnames(x), "diagnosis_")
x[,i] <- zz[unlist(x[,i])]
In case codes are not found and you don't want to set them to NA:
i <- startsWith(colnames(x), "diagnosis_")
j <- match(unlist(x[,i]), y[,1])
k <- !is.na(j)
tt <- unlist(x[,i])
tt[k] <- y[j[k],2]
x[,i] <- tt
rm(i, j, k, tt)
Data:
x <- structure(list(ID = c(123, 5345, 234, 453, 3656, 345), diagnosis_1 = c("B657",
"B658", "B659", "B660", "B661", "B662"), diagnosis_2 = c("F8827",
"G432", NA, "B657", NA, "H8940"), diagnosis_3 = c(NA, "B657",
NA, NA, NA, "G432"), diagnosis_4 = c(NA, NA, NA, NA, NA, "B657"
), diagnosis_5 = c(NA, NA, NA, NA, NA, NA), diagnosis_6 = c(NA,
NA, NA, NA, NA, NA), diagnosis_7 = c(NA, NA, NA, NA, NA, NA),
diagnosis_8 = c(NA, NA, NA, NA, NA, NA), diagnosis_9 = c(NA,
NA, NA, NA, NA, NA), diagnosis_10 = c(NA, NA, NA, NA, NA,
NA), diagnosis_11 = c(NA, NA, NA, NA, NA, NA), diagnosis_12 = c(NA,
NA, NA, NA, NA, NA), diagnosis_13 = c(NA, NA, NA, NA, NA,
NA), age = c(54, 65, 23, 22, 33, 77)), row.names = c(NA,
-6L), class = "data.frame")
y <- read.table(text="B657 1
B658 2
B659 3
B660 4
B661 5
B662 1
F8827 3
G432 3
H8940 4")
z <- readLines(con=textConnection("B657, B662
B658
B659, F8827, G432
B660 H8940
B661"))
Replace value of column by index value
Assuming your data looks like the following:
id 1-t 2-t 3-t
Index
2022-01-06 0 5 4 2
2022-01-05 1 3 5 4
2022-01-04 2 4 3 5
2022-01-03 3 3 0 1
2022-01-02 4 2 3 0
2022-01-01 5 1 2 4
i.e. what you labeled Index in your table above is the actual index of the pandas DataFrame. All you need to do is use the DataFrame.filter routine like so:
for col in data.filter(like='-t'):
    data[col] = data.index[data[col]]
print(data)
# id 1-t 2-t 3-t
#Index
#2022-01-06 0 2022-01-01 2022-01-02 2022-01-04
#2022-01-05 1 2022-01-03 2022-01-01 2022-01-02
#2022-01-04 2 2022-01-02 2022-01-03 2022-01-01
#2022-01-03 3 2022-01-03 2022-01-06 2022-01-05
#2022-01-02 4 2022-01-04 2022-01-03 2022-01-06
#2022-01-01 5 2022-01-05 2022-01-04 2022-01-02
There might even be a way to replace all the columns at once. In case Index is just the name of a column, replace data.index by data.Index.
EDIT: I forgot to use the index value in the column. Should work now.
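A runnable sketch of the loop, with a made-up frame shaped like the one above (one '-t' column is enough to show the idea):

```python
import pandas as pd

# Hypothetical frame: the date index is the real index, and the '-t'
# columns hold integer positions into that index
idx = pd.to_datetime(['2022-01-06', '2022-01-05', '2022-01-04',
                      '2022-01-03', '2022-01-02', '2022-01-01'])
data = pd.DataFrame({'id': [0, 1, 2, 3, 4, 5],
                     '1-t': [5, 3, 4, 3, 2, 1]}, index=idx)

# filter(like='-t') selects only the lag columns; iterating a DataFrame
# yields column names, and data.index[positions] looks up the dates
for col in data.filter(like='-t'):
    data[col] = data.index[data[col]]

print(data['1-t'].iloc[0])  # 2022-01-01 00:00:00
```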
Replacing values in n-dimensional tensor given indices from np.argwhere()
In [486]: data = np.random.randn(3,3,3)
With this creation all terms are finite, so nonzero returns a tuple of (27,) arrays:
In [487]: idx = np.nonzero(np.isfinite(data))
In [488]: len(idx)
Out[488]: 3
In [489]: idx[0].shape
Out[489]: (27,)
argwhere produces the same numbers, but in a 2d array:
In [490]: idxs = np.argwhere(np.isfinite(data))
In [491]: idxs.shape
Out[491]: (27, 3)
So you select a subset.
In [492]: dropidxs = idxs[np.random.choice(idxs.shape[0], 3, replace=False)]
In [493]: dropidxs.shape
Out[493]: (3, 3)
In [494]: dropidxs
Out[494]:
array([[1, 1, 0],
[2, 1, 2],
[2, 1, 1]])
We could have generated the same subset by choosing x = np.random.choice(...) and applying that x to the arrays in idxs. But in this case, the argwhere array is easier to work with.
But to apply that array to indexing we still need a tuple of arrays:
In [495]: tup = tuple([dropidxs[:,i] for i in range(3)])
In [496]: tup
Out[496]: (array([1, 2, 2]), array([1, 1, 1]), array([0, 2, 1]))
In [497]: data[tup]
Out[497]: array([-0.27965058, 1.2981397 , 0.4501406 ])
In [498]: data[tup]=np.nan
In [499]: data
Out[499]:
array([[[-0.4899279 , 0.83352547, -1.03798762],
[-0.91445783, 0.05777183, 0.19494065],
[ 0.6835925 , -0.47846423, 0.13513958]],
[[-0.08790631, 0.30224828, -0.39864576],
[ nan, -0.77424244, 1.4788093 ],
[ 0.41915952, -0.09335664, -0.47359613]],
[[-0.40281937, 1.64866377, -0.40354504],
[ 0.74884493, nan, nan],
[ 0.13097487, -1.63995208, -0.98857852]]])
Or we could index with:
In [500]: data[dropidxs[:,0],dropidxs[:,1],dropidxs[:,2]]
Out[500]: array([nan, nan, nan])
Actually, a transpose of dropidxs might be more convenient:
In [501]: tdrop = dropidxs.T
In [502]: tuple(tdrop)
Out[502]: (array([1, 2, 2]), array([1, 1, 1]), array([0, 2, 1]))
In [503]: data[tuple(tdrop)]
Out[503]: array([nan, nan, nan])
Sometimes we can use * to expand a list/array into a tuple, but not when indexing (Python 3.11 later lifted this restriction via PEP 646):
In [504]: data[*tdrop]
File "<ipython-input-504-cb619d907adb>", line 1
data[*tdrop]
^
SyntaxError: invalid syntax
but we can create the tuple with:
In [506]: data[(*tdrop,)]
Out[506]: array([nan, nan, nan])
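The transcript above can be condensed into a reproducible sketch (a seeded generator replaces the session's random data):

```python
import numpy as np

# Reproducible sketch of the argwhere -> tuple-indexing pattern
rng = np.random.default_rng(0)
data = rng.standard_normal((3, 3, 3))

# argwhere gives an (N, ndim) array of coordinates; here all 27 entries
# are finite, so every coordinate appears once
idxs = np.argwhere(np.isfinite(data))

# pick 3 distinct coordinate rows to knock out
dropidxs = idxs[rng.choice(idxs.shape[0], 3, replace=False)]

# transpose to ndim rows, wrap in a tuple, and use as a fancy index
data[tuple(dropidxs.T)] = np.nan
print(np.isnan(data).sum())  # 3
```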