How to plot a subset of a data frame in R?
with(dfr[dfr$var3 < 155,], plot(var1, var2))
should do the trick.
Edit regarding multiple conditions:
with(dfr[(dfr$var3 < 155) & (dfr$var4 > 27),], plot(var1, var2))
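One thing to watch with the bracket form: if var3 or var4 contains NA, the logical index is NA for that row and the result gains a row of NAs, whereas subset drops such rows. A minimal sketch, assuming made-up values (only the column names come from the answer above):

```r
# Hypothetical data; the values are invented for illustration.
dfr <- data.frame(var1 = 1:6,
                  var2 = c(2, 4, 6, 8, 10, 12),
                  var3 = c(150, 160, 140, NA, 154, 170),
                  var4 = c(30, 30, 20, 40, 28, 50))

# Bracket indexing keeps a row of NAs where the condition evaluates to NA.
by_index <- dfr[(dfr$var3 < 155) & (dfr$var4 > 27), ]

# subset() treats NA in the condition as FALSE and drops that row.
by_subset <- subset(dfr, var3 < 155 & var4 > 27)

nrow(by_index)   # 3 (two matching rows plus one all-NA row)
nrow(by_subset)  # 2

with(by_subset, plot(var1, var2))
```

If NAs are possible, wrapping the condition in which() gives the bracket form the same behaviour as subset.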
Plotting a subset of a dataframe with R?
Without a sample of your data, I can't test the answers below, but you have some errors in your code, which I've tried to fix:
When you use with or subset, you don't need to restate the name of the data frame when you refer to individual columns.
Original code:
with(subset(fin,fin$Species == "TRAT"), plot(fin$FR.CoYear, fin$Young /fin$Sample))
Change to:
with(subset(fin, Species == "TRAT"), plot(FR.CoYear, Young/Sample))
Here you misplaced a parenthesis, in addition to not needing to restate the name of the data frame in the call to plot.
Original code:
with(fin[fin$Species == "TRAT",], plot((fin$FR.CoYear, fin$Young / fin$Sample))
##gives the error: unexpected ',' in "with(fin[fin$Species == "TRAT",], plot((fin$FR.CoYear,"
Change to:
with(fin[fin$Species == "TRAT",], plot(FR.CoYear, Young / Sample))
fin$Young must also be indexed by Species.
Original code:
plot(fin$FR.CoYear[fin$Species == "BLKI"],fin$Young / fin$Sample[fin$Species == "BLKI"])
##Error in xy.coords(x, y, xlabel, ylabel, log) :
##  'x' and 'y' lengths differ
Change to:
plot(fin$FR.CoYear[fin$Species == "BLKI"],
fin$Young[fin$Species == "BLKI"]/ fin$Sample[fin$Species == "BLKI"])
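To see why the lengths differ, here is a small sketch, assuming an invented four-row stand-in for fin (the values are hypothetical). Dividing the full Young column by the subsetted Sample column recycles the shorter vector, so y ends up longer than x:

```r
# Hypothetical stand-in for the question's data frame.
fin <- data.frame(Species   = c("BLKI", "TRAT", "BLKI", "TRAT"),
                  FR.CoYear = 2001:2004,
                  Young     = c(10, 20, 30, 40),
                  Sample    = c(100, 200, 300, 400))

keep <- fin$Species == "BLKI"

x       <- fin$FR.CoYear[keep]
y_wrong <- fin$Young / fin$Sample[keep]        # only Sample is subsetted
y_right <- fin$Young[keep] / fin$Sample[keep]  # both columns are subsetted

length(x)        # 2
length(y_wrong)  # 4, so plot(x, y_wrong) fails
length(y_right)  # 2

plot(x, y_right)
```

Storing the condition once in keep also avoids repeating it three times.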
If you're willing to learn ggplot2, you can easily create separate plots for each value of Species. For example (once again, I couldn't test this without a sample of your data):
library(ggplot2)
# One panel, separate lines for each species
ggplot(fin, aes(FR.CoYear, Young/Sample, group=Species, colour=Species)) +
geom_point() + geom_line()
# One panel for each species
ggplot(fin, aes(FR.CoYear, Young/Sample)) +
geom_point() + geom_line() +
facet_grid(Species ~ .)
Spatial subset of data frame in R
The point data needs to be in a specific format (i.e., a matrix with x and y) when you use plot, and for getpoly to recognize the coordinates.
library(splancs)
library(tidyverse)
library(sf)
set.seed(543)
xy <-
cbind(x = runif(n = 25, min = -118, max = -117),
y = runif(n = 25, min = 40, max = 42))
plot(xy)
# Draw a polygon for study area.
poly <- getpoly()
# Convert to sf objects.
polysf <- st_as_sf(as.data.frame(poly), coords = c("V1", "V2"), crs = 4326) %>%
dplyr::summarise() %>%
st_cast("POLYGON") %>%
st_convex_hull()
xysf <- st_as_sf(as.data.frame(xy), coords = c("x", "y"), crs = 4326)
# Do an intersection to keep only points inside the drawn polygon.
xy_intersect <- st_intersection(polysf, xysf)
Output
Simple feature collection with 9 features and 0 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -117.7913 ymin: 40.82405 xmax: -117.4264 ymax: 41.7448
Geodetic CRS: WGS 84
geometry
1 POINT (-117.4264 41.18712)
2 POINT (-117.5756 41.7448)
3 POINT (-117.7913 40.82405)
4 POINT (-117.7032 41.15077)
5 POINT (-117.5634 41.23936)
6 POINT (-117.7441 40.84163)
7 POINT (-117.692 41.27514)
8 POINT (-117.6864 40.98462)
9 POINT (-117.5759 40.88477)
Plotted with mapview::mapview(xy_intersect) from library(mapview).
However, if you want to extract rows from your original dataframe, then here is another hack for extracting the points that fall within a drawn polygon (when the polygon coordinates look like 0.003456 for example).
library(splancs)
library(tidyverse)
set.seed(543)
xy <-
cbind(x = runif(n = 25, min = -118, max = -117),
y = runif(n = 25, min = 40, max = 42))
plot(xy)
# Draw a polygon for study area.
poly <- getpoly()
# Plot the results.
plot(xy)
polygon(poly)
# This will return a logical vector for points in the polygon
io <- inout(xy, poly)
points(xy[io,], pch = 16, col = "blue")
# Then, can use the index from io to extract the points that
# are inside the polygon from the original set of points.
extract_points <- as.data.frame(xy)[which(io == TRUE),]
extract_points
Output
x y
2 -117.4506 41.17794
3 -117.4829 40.71030
8 -117.4679 40.71702
19 -117.3354 40.53687
21 -117.5219 40.47077
22 -117.4876 40.18188
25 -117.2015 40.86243
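Because getpoly() is interactive, the code above can't be run from a script. For a non-interactive check, the polygon can be typed in by hand. The sketch below is a stand-in, not the splancs implementation: point_in_poly is a hypothetical helper that does a plain ray-casting test in base R, and the rectangle coordinates are invented.

```r
set.seed(543)
xy <- cbind(x = runif(n = 25, min = -118, max = -117),
            y = runif(n = 25, min = 40, max = 42))

# Hypothetical study area: a rectangle given as a two-column matrix of vertices.
poly <- cbind(x = c(-117.8, -117.3, -117.3, -117.8),
              y = c(40.5, 40.5, 41.5, 41.5))

# Ray casting: count how many polygon edges a horizontal ray from the point crosses.
point_in_poly <- function(px, py, poly) {
  n <- nrow(poly)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    xi <- poly[i, 1]; yi <- poly[i, 2]
    xj <- poly[j, 1]; yj <- poly[j, 2]
    # The edge straddles the ray's height and lies to the right of the point.
    if (((yi > py) != (yj > py)) &&
        (px < (xj - xi) * (py - yi) / (yj - yi) + xi)) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Logical vector, one entry per point (the role inout() plays above).
io <- mapply(point_in_poly, xy[, "x"], xy[, "y"], MoreArgs = list(poly = poly))

extract_points <- as.data.frame(xy[io, , drop = FALSE])
```

With splancs installed, inout(xy, poly) should accept the same hand-typed matrix.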
subset dataframe and plot all the subsets with a loop [R]
Here's how to plot the charts in a loop. In the example you gave, we only have one file number. However, the loop should create a chart for every number in the file column. On Windows, you can use savePlot to save to your drive. I simplified your example because I was getting errors.
DataOzono <- read.table(text="pressure height Temperature RH Ozone file LogP
753.6 2541 16.8 76 0 80131 0.3475673
748.0 2604 17.7 32 0 80131 0.347959
743.5 2656 15.9 38 0 80131 0.3482766
739.8 2697 15.4 39 0 80131 0.3485396
736.6 2734 15.0 41 0 80131 0.3487685
731.8 2790 14.5 42 0 80131 0.3491142", header=TRUE, stringsAsFactors=FALSE)
original_par <- par(no.readonly = TRUE) #save only the parameters that can be set again
par(mar=c(5.1, 8.1, 4.1, 3.1))
for (i in unique(DataOzono$file)){
DataOzono_subset <- DataOzono[DataOzono$file==i,] #keep only rows for that file number
plot(DataOzono_subset$LogP, DataOzono_subset$Temperature, axes= F,type="l",col="red", ylab = "", xlab = 'LogP',xaxt="n",yaxt="n" )
axis(2,col="red",col.axis="red")
mtext(text = 'T',line = 2,side = 2,col="red",col.lab="red")
par(new=TRUE)
plot(DataOzono_subset$LogP, DataOzono_subset$RH,type="l",col="blue",xaxt="n",yaxt="n",xlab="",ylab="")
axis(4,col="blue",col.axis="blue")
mtext("RH",side=4,line=2,col="blue",col.lab="blue" )
par(new=TRUE)
plot(DataOzono_subset$LogP, DataOzono_subset$Ozone,type="l",col="darkgreen",xaxt="n",yaxt="n",xlab="",ylab="")
mtext("O3",side=2,line=6,col="darkgreen",col.lab="darkgreen")
axis(2, line = 4,col="darkgreen",col.axis="darkgreen")
savePlot(filename=paste0("c:/temp/",i,".png"),type="png")
}
par(original_par) #restore par to initial value.
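savePlot only works with the Windows screen device. A portable variant, sketched under some assumptions, opens a file device inside the loop and closes it with dev.off(). The data below is invented (two file numbers so the loop runs twice), the output goes to tempdir(), and pdf() is used because it is available on every R build; png() works the same way where it is supported.

```r
# Hypothetical data with two file numbers.
DataOzono <- data.frame(file        = c(80131, 80131, 80131, 80132, 80132, 80132),
                        LogP        = c(0.3476, 0.3480, 0.3483, 0.3485, 0.3488, 0.3491),
                        Temperature = c(16.8, 17.7, 15.9, 15.4, 15.0, 14.5))

out_dir <- tempdir()
saved <- character(0)

for (i in unique(DataOzono$file)) {
  # Keep only the rows for this file number.
  DataOzono_subset <- DataOzono[DataOzono$file == i, ]

  out_file <- file.path(out_dir, paste0(i, ".pdf"))
  pdf(out_file)  # open the file device
  plot(DataOzono_subset$LogP, DataOzono_subset$Temperature,
       type = "l", col = "red", xlab = "LogP", ylab = "T")
  dev.off()      # close it so the file is written

  saved <- c(saved, out_file)
}

saved
```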
Subsetting data for ggplot2
For your specific case the problem is that you are not quoting Male/Female and Weighted Average Income. Also, your data and basic aesthetics should likely be part of ggplot and not geom_line. Putting them in geom_line isolates them to that single layer, and you would have to repeat the code in every layer of your plot if you were to add, for example, geom_smooth.
So to fix your problem you could do
library(tidyverse)
plot <- ggplot(data = dt[Country == 'Germany'],
               aes(x = Birthyear,
                   y = `Weighted Average Income`,
                   col = `Weighted Average Income`)) + # `x` quotes a non-syntactic name; !!sym("x") also works
  geom_line() +
  facet_grid(Country ~ `Male/Female`) # the formula needs backticks as well
plot
Now ggplot2 actually has a (lesser known) built-in functionality for changing your data, so if you wanted to compare this to the plot with all of your countries included you could do:
plot %+% dt # `%+%` is used to change the data used by one or more layers. See help("+.gg")
Subset and plot data frames with the same column names in ggplot in R
We get the unique column names from all the list elements ('un1'), loop over the names, extract the columns with the same name from each element of 'samp' in a nested lapply, and use cbind.fill from rowr to cbind the list elements (filling the unequal rows with NA for those datasets that have fewer rows) to create 'lst1'. Another list is created to record the index of the list elements each column name comes from ('lst2'). These two lists are used in Map to extract the corresponding 'h' column based on the index from 'lst2' and cbind it with each of the datasets of 'lst1'.
library(rowr)
un1 <- setdiff(unique(unlist(lapply(samp, names))), "h")
lst1 <- lapply(un1, function(nm) do.call(cbind.fill,
c(Filter(length, lapply(samp, function(x)
x[colnames(x) == nm])), fill = NA)))
lst2 <- lapply(un1, function(nm) which(do.call(c,
lapply(samp, function(x) any(names(x) == nm)))))
out <- Map(function(dat1, ind) {
tmp <- do.call(cbind.fill, c(lapply(samp[ind], `[[`, 'h'), fill = NA))
names(tmp) <- paste0("h", seq_along(tmp))
cbind(dat1, tmp)},
lst1, lst2)
length(out)
#[1] 22
Checking the output:
lapply(out, head, 2)
#[[1]]
# DLC12s h1
#1 86.19998 -52.500
#2 83.16610 -43.375
#[[2]]
# DLC17p h1
#1 0.5184452 -52.500
#2 1.5012423 -43.375
#[[3]]
# DLC17q h1
#1 0.2929875 -52.500
#2 0.3105346 -43.375
#[[4]]
# DLC21gs h1
#1 12.7175189 -52.500
#2 0.1544069 -43.375
#[[5]]
# DLC24as h1
#1 0.2228264 -52.500
#2 0.2411541 -43.375
#[[6]]
# DLC24bs h1
#1 0.02773543 -52.500
#2 0.04170485 -43.375
#[[7]]
# DLC31s h1
#1 0.001799534 -52.500
#2 0.451788609 -43.375
#[[8]]
# DLC41es h1
#1 0.0003281455 -52.500
#2 0.0094817520 -43.375
#[[9]]
# DLC41is h1
#1 0.001144196 -52.500
#2 0.369375492 -43.375
#[[10]]
# DLC41ms h1
#1 0.003163386 -52.500
#2 0.121520955 -43.375
#[[11]]
# DLC64h DLC64h DLC64h h1 h2 h3
#1 0.003437833 0.01828710 0.0682039 -52.500 -69.3 -75.4
#2 1.063494100 0.08393471 0.3838715 -43.375 -65.0 -66.0
#[[12]]
# DLC64l DLC64l DLC64l h1 h2 h3
#1 2.456927e-16 0.07751714 0.0491324765 -52.500 -69.3 -75.4
#2 1.902683e+00 0.13670254 0.0006464645 -43.375 -65.0 -66.0
#[[13]]
# DLC72 DLC72 DLC72 h1 h2 h3
#1 0.01063255 12.82851 8.336495 -52.500 -69.3 -75.4
#2 10.66651137 27.71747 36.174530 -43.375 -65.0 -66.0
#[[14]]
# DLC12 DLC12 h1 h2
#1 86.53149 54.44353 -69.3 -75.4
#2 70.64820 60.40582 -65.0 -66.0
#[[15]]
# DLC24a DLC24a h1 h2
#1 0.2187664 0.1598862 -69.3 -75.4
#2 0.1533400 0.1716777 -65.0 -66.0
#[[16]]
# DLC24b DLC24b h1 h2
#1 0.04532141 0.01841368 -69.3 -75.4
#2 0.04852150 0.02924072 -65.0 -66.0
#[[17]]
# DLC31 DLC31 h1 h2
#1 0.1142758 0.1051915 -69.3 -75.4
#2 0.4196964 0.3760683 -65.0 -66.0
#[[18]]
# DLC41e DLC41e h1 h2
#1 0.001120229 0.001992596 -69.3 -75.4
#2 0.005298573 0.009939579 -65.0 -66.0
#[[19]]
# DLC41i DLC41i h1 h2
#1 0.1384648 0.0763053 -69.3 -75.4
#2 0.6957711 0.4806988 -65.0 -66.0
#[[20]]
# DLC41m DLC41m h1 h2
#1 0.02624807 0.1084238 -69.3 -75.4
#2 0.09105723 0.2136423 -65.0 -66.0
#[[21]]
# DLCE4 h1
#1 31.8570262 -75.4
#2 0.2500975 -66.0
#[[22]]
# DLCE7 h1
#1 4.775404 -75.4
#2 1.503764 -66.0
If we don't have rowr, then an option is to pad the list elements that have fewer rows with rows of NA:
un1 <- setdiff(unique(unlist(lapply(samp, names))), "h")
lst1 <- lapply(un1, function(nm) {
tmplst <- Filter(length, lapply(samp, function(x)
x[colnames(x) == nm]))
mx <- max(sapply(tmplst, nrow))
do.call(cbind, lapply(tmplst, function(x) {
if(mx > nrow(x)) x[(nrow(x) + 1):mx, ] <- NA # pad below the last row instead of overwriting it
x}))})
lst2 <- lapply(un1, function(nm) which(do.call(c,
lapply(samp, function(x) any(names(x) == nm)))))
out <- Map(function(dat1, ind) {
tmplst <- lapply(samp[ind], `[[`, 'h')
mx <- max(lengths(tmplst))
tmplst1 <- do.call(cbind, lapply(tmplst, `length<-`, mx))
colnames(tmplst1) <- paste0('h', seq_len(ncol(tmplst1)))
cbind(dat1, tmplst1)
}, lst1, lst2)
sapply(out, dim)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
#[1,] 22 22 22 22 22 22 22 22 22 22 38 38 38 38 38 38 38 38 38 38
#[2,] 2 2 2 2 2 2 2 2 2 2 6 6 6 4 4 4 4 4 4 4
# [,21] [,22]
#[1,] 24 24
#[2,] 2 2
Update
With the named list, we can change the
colnames(tmplst1) <- paste0('h', seq_len(ncol(tmplst1)))
to
colnames(tmplst1) <- paste0('h', colnames(tmplst1))
i.e.
out <- Map(function(dat1, ind) {
tmplst <- lapply(samp[ind], `[[`, 'h')
mx <- max(lengths(tmplst))
tmplst1 <- do.call(cbind, lapply(tmplst, `length<-`, mx))
colnames(tmplst1) <- paste0('h', colnames(tmplst1))
cbind(dat1, tmplst1)
}, lst1, lst2)
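The padding step is easier to follow on a toy input. In this sketch h_list is a hypothetical named list standing in for lapply(samp[ind], `[[`, 'h'); `length<-` extends each vector to the longest length and fills the new positions with NA:

```r
# Hypothetical stand-in for the 'h' columns of two datasets of unequal length.
h_list <- list(a = c(-52.5, -43.375, -30),
               b = c(-69.3, -65))

mx <- max(lengths(h_list))

# Extend every vector to length mx, then bind them as columns.
padded <- do.call(cbind, lapply(h_list, `length<-`, mx))

# Prefix the list names, as in the update above.
colnames(padded) <- paste0("h", colnames(padded))

padded
```

The result is a 3 x 2 matrix with columns ha and hb, where the last entry of hb is NA.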
Plotly Express - plot subset of dataframe columns by default and the rest as option
You can use the visible property of the traces to state that a trace appears only in the legend. The code below adds all columns to the figure, then sets the first two columns as visible; all other columns appear only in the legend.
import plotly.express as px
import pandas as pd
import numpy as np
# simulate dataframe
df = pd.DataFrame(
{c: np.random.uniform(0, 1, 100) + cn for cn, c in enumerate("ABCDEF")}
)
fig = px.line(df, x=df.index, y=df.columns)
# for example only display first two columns of data frame, all others can be displayed
# by clicking on legend item
fig.for_each_trace(
lambda t: t.update(visible=True if t.name in df.columns[:2] else "legendonly")
)
Use index to subset dataframe based on unique values in a column
You should subset with a logical vector:
df[df$ID %in% unique(df$ID)[1:5], ]
df[df$ID %in% unique(df$ID)[6:10], ]
You can also use split with cut to split your dataframe into n datasets (here, 2) by group.
split(df, cut(as.numeric(as.factor(df$ID)), 2))
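Both approaches can be checked on a small example. The data frame below is hypothetical (four IDs, two rows each), so each half should contain two IDs:

```r
# Hypothetical data: four IDs with two rows each.
df <- data.frame(ID = rep(c("a", "b", "c", "d"), each = 2),
                 value = 1:8)

ids <- unique(df$ID)

# Logical-vector subsetting: all rows belonging to the first two unique IDs.
first_half <- df[df$ID %in% ids[1:2], ]

# split + cut: number the IDs, cut that range into 2 intervals, split on the interval.
groups <- split(df, cut(as.numeric(as.factor(df$ID)), 2))

nrow(first_half)       # 4
length(groups)         # 2
unique(groups[[1]]$ID) # "a" "b"
```

Note that cut makes intervals of equal width over the ID codes, not groups with equal numbers of rows, so the pieces can be unbalanced when some IDs have many more rows than others.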
Basic bar plot with subset of data frame in R
There are a few typos in your code... but if I'm interpreting correctly what you are trying to accomplish then this is what you want:
library(dplyr)
library(ggplot2)
Df1 <- data.frame(
education = c("high", "high", "high", "high", "high", "college", "college", "college", "college", "grad", "grad", "grad", "grad", "grad"),
salary = c("65", "65", "65", "90", "65", "65", "65", "90", "90", "90", "90", "65", "75", "75")
)
Df2 <- Df1 %>%
filter(education == "high") %>%
group_by(education, salary) %>%
summarise(SCount = n())
ggplot(Df2, aes(x = salary, y = SCount)) +
geom_bar(stat = "identity") +
coord_flip()
This produces a horizontal bar chart of the counts for each salary value in the "high" education group.
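If you'd rather not load dplyr and ggplot2, the same counts can be cross-checked in base R. This is a sketch of an alternative, not what the code above does internally; it uses the same Df1 values with table and barplot:

```r
Df1 <- data.frame(
  education = c("high", "high", "high", "high", "high", "college", "college",
                "college", "college", "grad", "grad", "grad", "grad", "grad"),
  salary = c("65", "65", "65", "90", "65", "65", "65", "90", "90", "90",
             "90", "65", "75", "75"),
  stringsAsFactors = FALSE
)

# Keep the "high" rows, then count how often each salary occurs.
high <- Df1[Df1$education == "high", ]
counts <- table(high$salary)

counts  # 65 occurs 4 times, 90 occurs once

# Horizontal bars, matching coord_flip() in the ggplot2 version.
barplot(counts, horiz = TRUE, xlab = "SCount", ylab = "salary")
```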