Replacing Values in a Dataframe for Given Indices

Pandas - Replace values based on index

Use loc:

df.loc[0:15, 'A'] = 16
print(df)
     A   B
0   16  45
1   16   5
2   16  97
3   16  58
4   16  26
5   16  87
6   16  51
7   16  17
8   16  39
9   16  73
10  16  94
11  16  69
12  16  57
13  16  24
14  16  43
15  16  77
16  41   0
17   3  21
18   0  98
19  45  39
20  66  62
21   8  53
22  69  47
23  48  53

Note that solutions using the old .ix indexer are deprecated (.ix was removed in pandas 1.0); use loc instead.
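
Worth noting: loc slicing is label-based and inclusive of both endpoints, which is why rows 0 through 15 all change. A minimal sketch, assuming a default RangeIndex (the frame below is hypothetical):

import numpy as np
import pandas as pd

# hypothetical frame with a default RangeIndex 0..23
df = pd.DataFrame({'A': np.arange(24), 'B': np.arange(24)[::-1]})

df.loc[0:15, 'A'] = 16       # label slice: endpoints 0 and 15 are both included
df.loc[[2, 5, 7], 'B'] = 0   # a list of index labels also works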

Pandas - Replace values based on index and not in index

np.where

I'm making an assumption that there is a better way to do both 'Yes' and 'No' at the same time. If you truly just want to fill in the 'No' values after you've already got the 'Yes' values, refer to Fatemehhh's answer.

import numpy as np

# dl is the collection of index labels that should be marked 'Yes'
df.loc[:, 'A'] = np.where(df.index.isin(dl), 'Yes', 'No')
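
A self-contained sketch (the frame and dl here are hypothetical, chosen to match the output shown further down):

import numpy as np
import pandas as pd

df = pd.DataFrame(index=range(10), columns=['A'])
dl = [0, 2, 3, 4, 7]  # hypothetical index labels that get 'Yes'

df.loc[:, 'A'] = np.where(df.index.isin(dl), 'Yes', 'No')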

Experimental Section

Not meant as an actual suggestion.

f = dl.__contains__              # f(i) -> True if i is in dl
g = ['No', 'Yes'].__getitem__    # g(False) -> 'No', g(True) -> 'Yes'
df.loc[:, 'A'] = [*map(g, map(f, df.index))]

df

     A
0  Yes
1   No
2  Yes
3  Yes
4  Yes
5   No
6   No
7  Yes
8   No
9   No

Replace value in column by value in list by index

Don't use list as a variable name, because it shadows the Python builtin.

Then use Series.map with enumerate inside Series.mask:

L = ['a', 'b', 'c', 'd']
df['idx'] = df['idx'].mask(df['idx'] >= 1, df['idx'].map(dict(enumerate(L))))
print(df)
  id idx
0  A   0
1  B   0
2  C   c
3  D   b
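
For reference, the input both snippets assume, reconstructed from the output above:

import pandas as pd

df = pd.DataFrame({'id': ['A', 'B', 'C', 'D'], 'idx': [0, 0, 2, 1]})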

A similar idea is to process only the matched rows via a boolean mask:

L = ['a', 'b', 'c', 'd']
m = df['idx'] >= 1
df.loc[m, 'idx'] = df.loc[m, 'idx'].map(dict(enumerate(L)))
print(df)
  id idx
0  A   0
1  B   0
2  C   c
3  D   b

Python Pandas - Replacing values of a part of data frame column based on index

From the way you used

df1.head(first_idx)

I assume your indices are numeric. Note, though, that the chained-indexing form

df1.iloc[first_idx + 1:, :]['Column1'].replace({'10': '3'}, inplace=True)

operates on a temporary copy, is not guaranteed to modify df1, and typically triggers a SettingWithCopyWarning. A single-step assignment is safer:

col = df1.columns.get_loc('Column1')
df1.iloc[first_idx + 1:, col] = df1.iloc[first_idx + 1:, col].replace({'10': '3'})
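
A quick check of the fixed version on hypothetical data (first_idx = 1, so rows 0 and 1 stay untouched):

import pandas as pd

df1 = pd.DataFrame({'Column1': ['10', '20', '10', '10']})
first_idx = 1

col = df1.columns.get_loc('Column1')
df1.iloc[first_idx + 1:, col] = df1.iloc[first_idx + 1:, col].replace({'10': '3'})
print(df1['Column1'].tolist())  # ['10', '20', '3', '3']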

Replace matching values from one dataframe with index value from another dataframe

Try:

df1['fruit'] = df1.fruit.map(dict(df2[['fruit','id']].values))
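
A minimal sketch of what this does (df1 and df2 below are hypothetical; df2 acts as a fruit -> id lookup):

import pandas as pd

df1 = pd.DataFrame({'fruit': ['apple', 'pear', 'apple']})
df2 = pd.DataFrame({'fruit': ['apple', 'pear'], 'id': [1, 2]})

df1['fruit'] = df1.fruit.map(dict(df2[['fruit', 'id']].values))
print(df1['fruit'].tolist())  # [1, 2, 1]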

Replace values in a dataset based off an index of values in another using base R

You can use match to change content using a lookup table: match returns, for each code, the position of its first occurrence in y's first column, and those positions then index into y's second column.

i <- startsWith(colnames(x), "diagnosis_")
x[,i] <- y[match(unlist(x[,i]), y[,1]),2]
x
# ID diagnosis_1 diagnosis_2 diagnosis_3 diagnosis_4 diagnosis_5 diagnosis_6 diagnosis_7 diagnosis_8 diagnosis_9 diagnosis_10 diagnosis_11 diagnosis_12 diagnosis_13 age
#1 123 1 3 NA NA NA NA NA NA NA NA NA NA NA 54
#2 5345 2 3 1 NA NA NA NA NA NA NA NA NA NA 65
#3 234 3 NA NA NA NA NA NA NA NA NA NA NA NA 23
#4 453 4 1 NA NA NA NA NA NA NA NA NA NA NA 22
#5 3656 5 NA NA NA NA NA NA NA NA NA NA NA NA 33
#6 345 1 4 3 1 NA NA NA NA NA NA NA NA NA 77

And in case the lookup table comes in the different structure given by z below (each line lists the codes sharing one value, the value being the line number):

zz <- strsplit(z, "[, ]+")
zz <- setNames(rep(seq_along(zz), lengths(zz)), unlist(zz))
i <- startsWith(colnames(x), "diagnosis_")
x[,i] <- zz[unlist(x[,i])]

In case some codes are not found in the lookup table and you don't want them set to NA:

i <- startsWith(colnames(x), "diagnosis_")
j <- match(unlist(x[,i]), y[,1])
k <- !is.na(j)
tt <- unlist(x[,i])
tt[k] <- y[j[k],2]
x[,i] <- tt
rm(i, j, k, tt)

Data:

x <- structure(list(ID = c(123, 5345, 234, 453, 3656, 345), diagnosis_1 = c("B657", 
"B658", "B659", "B660", "B661", "B662"), diagnosis_2 = c("F8827",
"G432", NA, "B657", NA, "H8940"), diagnosis_3 = c(NA, "B657",
NA, NA, NA, "G432"), diagnosis_4 = c(NA, NA, NA, NA, NA, "B657"
), diagnosis_5 = c(NA, NA, NA, NA, NA, NA), diagnosis_6 = c(NA,
NA, NA, NA, NA, NA), diagnosis_7 = c(NA, NA, NA, NA, NA, NA),
diagnosis_8 = c(NA, NA, NA, NA, NA, NA), diagnosis_9 = c(NA,
NA, NA, NA, NA, NA), diagnosis_10 = c(NA, NA, NA, NA, NA,
NA), diagnosis_11 = c(NA, NA, NA, NA, NA, NA), diagnosis_12 = c(NA,
NA, NA, NA, NA, NA), diagnosis_13 = c(NA, NA, NA, NA, NA,
NA), age = c(54, 65, 23, 22, 33, 77)), row.names = c(NA,
-6L), class = "data.frame")
y <- read.table(text="B657 1
B658 2
B659 3
B660 4
B661 5
B662 1
F8827 3
G432 3
H8940 4")
z <- readLines(con=textConnection("B657, B662
B658
B659, F8827, G432
B660 H8940
B661"))

Replace value of column by index value

Assuming your data looks like the following:

            id  1-t  2-t  3-t
Index
2022-01-06   0    5    4    2
2022-01-05   1    3    5    4
2022-01-04   2    4    3    5
2022-01-03   3    3    0    1
2022-01-02   4    2    3    0
2022-01-01   5    1    2    4

i.e. that what you labeled Index in your table above is the actual index of the pandas DataFrame, then all you need to do is use DataFrame.filter like so:

for col in data.filter(like='-t'):
    data[col] = data.index[data[col]]

print(data)
#             id         1-t         2-t         3-t
#Index
#2022-01-06   0  2022-01-01  2022-01-02  2022-01-04
#2022-01-05   1  2022-01-03  2022-01-01  2022-01-02
#2022-01-04   2  2022-01-02  2022-01-03  2022-01-01
#2022-01-03   3  2022-01-03  2022-01-06  2022-01-05
#2022-01-02   4  2022-01-04  2022-01-03  2022-01-06
#2022-01-01   5  2022-01-05  2022-01-04  2022-01-02

There might even be a way to replace all the columns at once; see the sketch after the edit note below. In case Index is just the name of a column, replace data.index with data.Index.

EDIT: I forgot to use the index value in the column. Should work now.
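
Following up on the "all the columns at once" remark, a vectorized sketch (assuming the -t columns still hold their original integer positions into the index):

cols = data.filter(like='-t').columns
# index the array of dates with the whole block of integer positions at once
data[cols] = data.index.to_numpy()[data[cols].to_numpy()]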

Replacing values in n-dimensional tensor given indices from np.argwhere()

In [486]: data = np.random.randn(3,3,3)

Since randn never produces non-finite values, isfinite is True everywhere and nonzero returns a tuple of three (27,) arrays:

In [487]: idx = np.nonzero(np.isfinite(data))
In [488]: len(idx)
Out[488]: 3
In [489]: idx[0].shape
Out[489]: (27,)

argwhere produces the same numbers, but in a 2d array:

In [490]: idxs = np.argwhere(np.isfinite(data))
In [491]: idxs.shape
Out[491]: (27, 3)

So you select a subset.

In [492]: dropidxs = idxs[np.random.choice(idxs.shape[0], 3, replace=False)]
In [493]: dropidxs.shape
Out[493]: (3, 3)
In [494]: dropidxs
Out[494]:
array([[1, 1, 0],
       [2, 1, 2],
       [2, 1, 1]])

We could have generated the same subset by drawing x = np.random.choice(...) once and applying that x to each of the arrays in the idx tuple. But in this case, the argwhere array is easier to work with.
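
A sketch of that tuple-based alternative, reusing data and idx from the session above (dropidx is a hypothetical name):

x = np.random.choice(idx[0].shape[0], 3, replace=False)  # draw 3 flat positions once
dropidx = tuple(a[x] for a in idx)                       # apply them to every index array
data[dropidx] = np.nan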

But to apply that array to indexing we still need a tuple of arrays:

In [495]: tup = tuple([dropidxs[:,i] for i in range(3)])
In [496]: tup
Out[496]: (array([1, 2, 2]), array([1, 1, 1]), array([0, 2, 1]))
In [497]: data[tup]
Out[497]: array([-0.27965058, 1.2981397 , 0.4501406 ])
In [498]: data[tup]=np.nan
In [499]: data
Out[499]:
array([[[-0.4899279 ,  0.83352547, -1.03798762],
        [-0.91445783,  0.05777183,  0.19494065],
        [ 0.6835925 , -0.47846423,  0.13513958]],

       [[-0.08790631,  0.30224828, -0.39864576],
        [        nan, -0.77424244,  1.4788093 ],
        [ 0.41915952, -0.09335664, -0.47359613]],

       [[-0.40281937,  1.64866377, -0.40354504],
        [ 0.74884493,         nan,         nan],
        [ 0.13097487, -1.63995208, -0.98857852]]])

Or we could index with:

In [500]: data[dropidxs[:,0],dropidxs[:,1],dropidxs[:,2]]
Out[500]: array([nan, nan, nan])

Actually, a transpose of dropidxs might be more convenient:

In [501]: tdrop = dropidxs.T
In [502]: tuple(tdrop)
Out[502]: (array([1, 2, 2]), array([1, 1, 1]), array([0, 2, 1]))
In [503]: data[tuple(tdrop)]
Out[503]: array([nan, nan, nan])

Sometimes we can use * to expand a list/array into a tuple, but not when indexing (at least before Python 3.11, which began allowing starred expressions in subscripts):

In [504]: data[*tdrop]
  File "<ipython-input-504-cb619d907adb>", line 1
    data[*tdrop]
         ^
SyntaxError: invalid syntax

but we can create the tuple with:

In [506]: data[(*tdrop,)]
Out[506]: array([nan, nan, nan])

