Boxplot of Table Using ggplot2

ggplot2 expects data in long format: one column holding the values (mapped to y) and one column identifying which original column each value came from (mapped to x). Use melt() from the reshape2 package to reshape the data into this format, then plot.

library(reshape2)
library(ggplot2)
ggplot(data = melt(dd), aes(x = variable, y = value)) +
  geom_boxplot(aes(fill = variable))
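
To see what melt() actually produces, here is a minimal sketch. The data frame dd below is made up for illustration (the original table is not shown), and it assumes every column is numeric so that melt() treats them all as measure variables.

```r
library(reshape2)

# Hypothetical wide table: one numeric column per group
dd <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# With no id variables given, melt() stacks every column into two columns:
# 'variable' (the original column name) and 'value' (the number)
long <- melt(dd)

names(long)
nrow(long)
```

Each of the three rows in dd contributes one row per column, so the long table has six rows, which is the shape aes(x = variable, y = value) relies on.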

How to use ggplot2 in R to plot a boxplot with number of observations?

Neither the base boxplot function nor ggplot2's geom_boxplot expects data with weights/counts like this, so I think your best bet is to expand the data into individual observations.

expanded_data <- data[rep(seq_len(nrow(data)), times = data$NumberObservations), ]
ggplot(data = expanded_data,
       aes(x = Grouping, y = Value, group = Grouping)) +
  geom_boxplot()
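
If you want to check the expansion before plotting, here is a small sketch. The counts table is invented for illustration, reusing the column names from the snippet above, and tidyr::uncount() is shown as one possible alternative to the rep() indexing, not as what the original answer used.

```r
library(tidyr)

# Hypothetical count table: one row per distinct value, with its frequency
counts <- data.frame(
  Grouping = c("a", "a", "b"),
  Value = c(1.5, 2.0, 3.5),
  NumberObservations = c(2, 3, 1)
)

# Base R: repeat each row index as many times as its count
expanded_base <- counts[rep(seq_len(nrow(counts)), times = counts$NumberObservations), ]

# tidyr: uncount() does the same and drops the count column by default
expanded_tidy <- uncount(counts, NumberObservations)

nrow(expanded_base)
nrow(expanded_tidy)
```

Both versions should give one row per individual observation, so the row count equals the sum of NumberObservations.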

Box plot with ggplot2 using data from read.table

I think the best plot to represent a vector of data is a histogram. However, you can still use a boxplot by creating a dummy factor that puts all your observations into one group, e.g.

library(tidyverse)
data %>%
  pivot_longer(cols = everything()) %>%
  mutate(type = "student") %>%   # dummy factor: a single group for all values
  ggplot(aes(x = type, y = value)) +
  geom_boxplot() +
  theme_classic() +
  xlab("Students") +
  ylab("Height") +
  ggtitle("Height of students")

If you want a histogram (much better for your situation, I think), you don't need the dummy factor and you could do something like:

data %>%
  pivot_longer(cols = everything()) %>%
  ggplot(aes(x = value)) +
  geom_histogram() +
  theme_classic() +
  xlab("Height") +               # x now shows the measured values
  ylab("Number of students") +   # y is the count per bin
  ggtitle("Height of students")

How to write a function for boxplot using list of dataframes in R

Here is a solution.

When splitting the data, split by year and state in a single call. Then loop through the resulting list, plotting each data set and saving it with ggsave().

In the function below, the output filenames depend on the year/state combination, and I have included an argument verbose that prints each filename as it is written to disk.

library(ggplot2)

boxplotter <- function(X, file = "boxplotter%s.pdf", width = 7, height = 5, verbose = FALSE){
  # create a list of data.frame's by year and state
  year_list <- split(X, list(X[["year"]], X[["state"]]), sep = "_")
  # remove from the list the empty sub-lists. This is needed
  # because there might be combinations of year/state not
  # present in the input data and 'split' will create them
  # anyway
  year_list <- year_list[sapply(year_list, nrow) > 0L]

  # loop with an index into the list to make it possible
  # to get the data and also the names attribute, used
  # to form the output filenames
  for(i in seq_along(year_list)){
    # work with a copy, this just makes the code that
    # follows easier to read
    Y <- year_list[[i]]
    # plot and save the plot
    filename <- sprintf(file, names(year_list)[i])
    g <- ggplot(Y, aes(x = dept, y = corp_tax)) +
      geom_boxplot(outlier.colour = "red", outlier.shape = 8, outlier.size = 4) +
      scale_y_continuous(limits = c(0, max(Y$corp_tax, na.rm = TRUE)))
    ggsave(filename, plot = g, device = "pdf", width = width, height = height)
    # want to see what was written to disk?
    if(verbose){
      msg <- paste("output file:", filename)
      message(msg)
    }
  }
  # return nothing
  invisible(NULL)
}

boxplotter(df, verbose = TRUE)
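
To illustrate why the empty groups need removing, here is a small sketch with an invented data frame X (the column names follow the function above; the values are hypothetical). It also shows split()'s drop = TRUE argument as a possible alternative to filtering on the row count.

```r
# Hypothetical input: there are no rows for year 2019 in state "NY"
X <- data.frame(
  year = c(2019, 2020, 2020),
  state = c("CA", "CA", "NY"),
  corp_tax = c(1.2, 3.4, 5.6)
)

# split() builds every year/state combination, including the empty one
all_groups <- split(X, list(X[["year"]], X[["state"]]), sep = "_")
names(all_groups)

# Filtering on the row count, as in the function above
kept <- all_groups[sapply(all_groups, nrow) > 0L]

# Alternative: ask split() to drop the unused combinations itself
dropped <- split(X, list(X[["year"]], X[["state"]]), sep = "_", drop = TRUE)

names(kept)
names(dropped)
```

Either way, the names of the remaining list elements are what end up in the output filenames via sprintf().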

ggplot2 boxplot from count table

ggplot2 is able to work with weights through the weight aesthetic, so you could try this:

ggplot(df1, aes(x = 1, y = nSiblings, weight = count)) + geom_boxplot()
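
Here is a minimal sketch of how that could look end to end. The df1 below is invented for illustration (the original count table is not shown), and as far as I know ggplot2 computes the weighted quantiles with the quantreg package, so treat the need to have it installed as an assumption. The expanded copy is only there as a cross-check; weighted quantiles may not match it exactly.

```r
library(ggplot2)

# Hypothetical count table: number of siblings and how many people reported it
df1 <- data.frame(nSiblings = 0:4, count = c(10, 25, 12, 5, 1))

# Weighted boxplot built directly from the counts
p <- ggplot(df1, aes(x = 1, y = nSiblings, weight = count)) +
  geom_boxplot()

# Cross-check: expand the counts into individual observations
expanded <- df1[rep(seq_len(nrow(df1)), times = df1$count), ]
summary(expanded$nSiblings)
```

Printing p draws the plot; comparing the box against summary(expanded$nSiblings) is a quick sanity check that the weights were picked up.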

creating a boxplot for two different column of data frame using ggplot

Maybe you are looking for this. The key is reshaping the data to long format with pivot_longer(); after that you can draw the plot. Here is the code:

library(tidyverse)
#Data
level <- c(1,2,3,5,2,4,3,1,3)
pay1 <- c(10,21,32,12,41,21,36,14,17)
pay2 <- c(26,36,5,6,52,12,18,17,19)
data <- data.frame(level, pay1, pay2)
#Plot
data %>% pivot_longer(-level) %>%
  ggplot(aes(x = name, y = value, fill = name)) +
  geom_boxplot()

Or if level is relevant:

#Plot 2
data %>% pivot_longer(-level) %>%
  ggplot(aes(x = name, y = value, fill = factor(level))) +
  geom_boxplot()
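
If it helps to see what the reshaping step hands to ggplot, here is a short sketch that runs only pivot_longer() on the same example data. It assumes the default output column names, name and value, which is what the aes() calls above refer to.

```r
library(tidyr)

# Same example data as above
level <- c(1, 2, 3, 5, 2, 4, 3, 1, 3)
pay1 <- c(10, 21, 32, 12, 41, 21, 36, 14, 17)
pay2 <- c(26, 36, 5, 6, 52, 12, 18, 17, 19)
data <- data.frame(level, pay1, pay2)

# Keep 'level' as an identifier and stack pay1/pay2 into name/value pairs
long <- pivot_longer(data, -level)

head(long)
dim(long)
```

Each of the nine input rows becomes two rows, one for pay1 and one for pay2, with level repeated alongside.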

R ggplot2 and boxplot() - different plots?

I think your problem is caused by the use of limits in your call to scale_y_continuous. This appears to filter the data before the statistics for the box-and-whisker plots are calculated, so values outside the limits never reach the boxplot computation.

The solution is to use coord_cartesian(). This lets ggplot use the whole data frame to calculate the statistics and then "zooms" the plot to the required size and location:

# na.rm is an argument of the geom, not an aesthetic, so it is set only in geom_boxplot()
ggplot(d, aes(x = Location, y = Value, fill = Variable)) +
  geom_boxplot(outlier.shape = NA, na.rm = TRUE) +
  scale_fill_manual(values = c("grey", "red", "lightblue")) +
  scale_y_continuous(breaks = c(0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5)) +
  coord_cartesian(ylim = c(0, 3.7))
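
The difference is easy to check on a toy example. The sketch below uses an invented five-point data set (not the data from the question) and ggplot2's layer_data() to read back the median that each version of the plot actually computes.

```r
library(ggplot2)

# Hypothetical data with one large value outside the range we want to display
d_toy <- data.frame(g = "a", y = c(1, 2, 3, 4, 100))

# Scale limits: values outside c(0, 10) are turned into NA before the
# boxplot statistics are computed
p_limits <- ggplot(d_toy, aes(x = g, y = y)) +
  geom_boxplot() +
  scale_y_continuous(limits = c(0, 10))

# Coordinate zoom: statistics use all five values, only the view is cropped
p_zoom <- ggplot(d_toy, aes(x = g, y = y)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 10))

# 'middle' is the median of the box; the limits version warns about removed rows
median_limits <- suppressWarnings(layer_data(p_limits)$middle)
median_zoom <- layer_data(p_zoom)$middle

median_limits
median_zoom
```

With this data the limits version reports a median of 2.5 (the value 100 was dropped), while the coord_cartesian() version reports 3, the median of all five values.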
