Subset a column in a data frame based on another data frame/list
We can use %in% to get a logical vector and subset the rows of 'table1' based on that.
subset(table1, gene_ID %in% accessions40$V1)
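For large tables, data.table may be a faster option. Its %chin% operator is a version of %in% for character vectors, so gene_ID and V1 must both be character.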
library(data.table)
setDT(table1)[gene_ID %chin% accessions40$V1]
Or use filter from dplyr
library(dplyr)
table1 %>%
  filter(gene_ID %in% accessions40$V1)
R: Filter a dataframe based on another dataframe
If you only want to keep the rows of e whose rownames occur in pf (or, to keep those that don't occur, negate the condition with !), then you can just filter on the rownames:
library(tidyverse)
e %>%
  filter(rownames(e) %in% rownames(pf))
Another possibility is to create a rownames column for both dataframes. Then, we can do the semi_join on the rownames (i.e., rn). Then, convert the rn column back to the rownames.
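library(tidyverse)
# add the rownames as a column 'rn' in both data frames
list(e, pf) %>%
  map(~ .x %>%
        as.data.frame %>%
        rownames_to_column('rn')) %>%
  # keep only the rows of e whose 'rn' also occurs in pf
  reduce(semi_join, by = 'rn') %>%
  column_to_rownames('rn')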
Output
JHU_113_2.CEL JHU_144.CEL JHU_173.CEL JHU_176R.CEL JHU_182.CEL JHU_186.CEL JHU_187.CEL JHU_188.CEL JHU_203.CEL
2315374 6.28274 6.79161 6.11265 6.13997 6.68056 6.48156 6.45415 6.04542 5.99176
2315376 5.81678 5.71165 6.02794 5.37082 5.95527 5.75999 5.87863 5.54830 6.35571
2315587 8.88557 8.95699 8.36898 8.28993 8.41361 8.64980 8.74305 8.31915 8.43548
2315588 6.28650 6.66750 6.07503 6.76625 6.19819 6.84260 6.13916 6.40219 6.45059
2315591 6.97515 6.61705 6.51994 6.74982 6.60917 6.55182 6.62240 6.44394 5.76592
2315595 5.94179 5.39178 5.09497 4.96199 2.96431 4.95204 5.00979 4.06493 5.38048
2315598 4.99420 5.56888 5.57912 5.43960 5.19249 5.87991 5.60540 5.09513 5.43618
2315603 7.67845 7.90005 7.47594 6.75087 7.62805 8.00069 7.34296 6.81338 7.52014
2315604 6.20952 6.59687 6.14608 5.70518 6.49572 6.12622 6.23690 6.39569 6.70869
2315640 5.85307 6.07303 6.41875 6.07282 6.28283 6.13699 6.16377 6.48616 6.34162
Filtering dataframe based on another dataframe
You can use .isin() to filter df1 down to the rows whose ticker also appears in df2.
df1_filtered = df1[df1['ticker'].isin(df2['ticker'].tolist())]
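A minimal, self-contained sketch of the same idea follows. The ticker symbols and the price column are made up for illustration and are not from the original question. It also shows that negating the mask with ~ keeps the rows that are not in df2.

```python
import pandas as pd

# Hypothetical example data; only the 'ticker' column name comes from the answer above.
df1 = pd.DataFrame({
    "ticker": ["AAPL", "MSFT", "GOOG", "TSLA"],
    "price": [1.0, 2.0, 3.0, 4.0],
})
df2 = pd.DataFrame({"ticker": ["MSFT", "TSLA", "IBM"]})

# Boolean mask: True where df1's ticker also appears in df2.
# isin accepts a Series directly, so .tolist() is optional.
mask = df1["ticker"].isin(df2["ticker"])

df1_filtered = df1[mask]    # rows whose ticker is in df2
df1_excluded = df1[~mask]   # rows whose ticker is not in df2

print(df1_filtered)
print(df1_excluded)
```

The original row index is preserved in both results; call reset_index(drop=True) if a fresh index is wanted.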
Flag subset of a dataframe based on another dataframe values
First create a MultiIndex on both dataframes, then use MultiIndex.isin to test whether the index values of the first dataframe occur in the index of the second dataframe, in order to create the boolean flag:
i1 = first_df.set_index([first_df['A'] * 10, 'B']).index
i2 = second_df.set_index(['C1', 'C2']).index
first_df['Match'] = i1.isin(i2)
Result
print(first_df)
A B C D Match
1 1 a q zz True
2 2 b w xx True
3 3 c e yy False
4 4 d r vv False
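To make the approach easier to try out, here is a runnable sketch with stand-in data. The contents of second_df are an assumption (the question's data is not shown here), chosen so that the first two rows match after A is scaled by 10, as in the result above.

```python
import pandas as pd

# first_df mirrors the result shown above; second_df is hypothetical.
first_df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": ["a", "b", "c", "d"],
    "C": ["q", "w", "e", "r"],
    "D": ["zz", "xx", "yy", "vv"],
})
second_df = pd.DataFrame({"C1": [10, 20, 50], "C2": ["a", "b", "c"]})

# Build one MultiIndex per frame from the key columns.
# A is multiplied by 10 so that it is on the same scale as C1.
i1 = pd.MultiIndex.from_arrays([first_df["A"] * 10, first_df["B"]])
i2 = pd.MultiIndex.from_frame(second_df[["C1", "C2"]])

# Element-wise test: is each (A * 10, B) pair present among the (C1, C2) pairs?
first_df["Match"] = i1.isin(i2)

print(first_df)
```

The pair (30, 'c') does not match (50, 'c') because both levels must agree, which is the reason for comparing tuples rather than testing each column separately.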
Subset of dataframe based on values in another dataframe
As mentioned in the comments, the data contained extra whitespace, so the values did not match. We can use trimws to strip the leading and trailing whitespace before subsetting. If df1 is a vector:
df2[trimws(df2$relevantcolumn) %in% trimws(df1), ]
Or, if df1 is a data frame:
df2[trimws(df2$relevantcolumn) %in% trimws(df1$relevant_column), ]
Find a subset of columns based on another dataframe?
I was able to put together a function that I think works for this, but it assumes that the columns don't change order and that no more columns get added. If the shape of the df changes, the function would need to be updated for that.
First, I merged your example_g_table and example_s_table to get everything into one dataframe.
df = pd.merge(left=example_g_table,right=example_s_table,on=['Date_Time','CID'],how='left')
Date_Time CID 0 1 2 3 4 5 event_1 event_2 event_3
0 4/20/21 4:20 302 0 1.0 2.0 3.0 4.0 5.0 0 2 3
1 2/17/21 9:20 135 1 1.4 1.8 2.0 8.0 10.0 0 1 4
2 2/17/21 9:20 111 4 5.0 5.1 5.2 5.3 5.4 3 4 5
Now we use a new function that pulls out the values of event_2 and event_3 and returns the average of the numbered columns from event_2 through event_3, inclusive. We will later run df.apply on this with axis=1, so it receives one row at a time as a Series.
def func(row):
    # row is a single row of df, passed in as a Series by df.apply(..., axis=1)
    event_2 = row['event_2']
    event_3 = row['event_3']
    # The column called 0 is the third column (position 2 when counting from 0),
    # the column called 1 is the fourth, etc., so a column's position is its name + 2.
    start = int(event_2 + 2)
    end = int(event_3 + 2)  # same as above
    # This line is the key: it sums the values at positions start through end, inclusive.
    total = sum(row.iloc[start:end+1])
    avg = total/(end-start+1)  # (end-start+1) is the number of columns in the range
    return avg
Last, we run df.apply on this to get our new column.
df['avg'] = df.apply(func,axis=1)
df
Date_Time CID 0 1 2 3 4 5 event_1 event_2 event_3 avg
0 4/20/21 4:20 302 0 1.0 2.0 3.0 4.0 5.0 0 2 3 2.50
1 2/17/21 9:20 135 1 1.4 1.8 2.0 8.0 10.0 0 1 4 3.30
2 2/17/21 9:20 111 4 5.0 5.1 5.2 5.3 5.4 3 4 5 5.35
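If the column order might change, one possible variation is to select the numbered columns by label rather than by position. The sketch below rebuilds the merged table by hand and assumes the numbered columns are labelled with the integers 0 to 5 (they could equally be strings in the real data, in which case the labels would need converting). The function name avg_between_events is made up for this example.

```python
import pandas as pd

# Merged table from above, rebuilt by hand; integer column labels are an assumption.
df = pd.DataFrame({
    "Date_Time": ["4/20/21 4:20", "2/17/21 9:20", "2/17/21 9:20"],
    "CID": [302, 135, 111],
    0: [0, 1, 4],
    1: [1.0, 1.4, 5.0],
    2: [2.0, 1.8, 5.1],
    3: [3.0, 2.0, 5.2],
    4: [4.0, 8.0, 5.3],
    5: [5.0, 10.0, 5.4],
    "event_1": [0, 0, 3],
    "event_2": [2, 1, 4],
    "event_3": [3, 4, 5],
})

def avg_between_events(row):
    """Average the numbered columns from event_2 through event_3, inclusive."""
    # Build the list of column labels to average, e.g. [1, 2, 3, 4].
    labels = list(range(int(row["event_2"]), int(row["event_3"]) + 1))
    # .loc selects by label, so the result does not depend on column positions.
    # The row Series has object dtype (it also holds strings), hence astype(float).
    return row.loc[labels].astype(float).mean()

df["avg"] = df.apply(avg_between_events, axis=1)
print(df[["CID", "event_2", "event_3", "avg"]])
```

This gives the same averages as the positional version for the example rows, and it keeps working if extra columns are inserted before the numbered ones.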
Subsetting a data frame based on contents of another data frame
Both %in% and match() can be used for this. Here is the former:
> which( df1$x %in% df2$y )
[1] 1 2 3 4 27 28 29 30 53 54 55 56 79 80 81 82 105
[18] 106 107 108 131 132 133 134 157 158 159 160 183 184 185 186 209 210
[35] 211 212 235 236 237 238 261 262 263 264 287 288 289 290 313 314 315
[52] 316 339 340 341 342 365 366 367 368 391 392 393 394
>
>
> table(df1[ which( df1$x %in% df2$y ), "x"])
a b c d e f g h i j k l m n o p q r s t u v w x y
16 16 16 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
z
0
>