Is There an Equivalent in ggplot to the varwidth Option in plot?


It's not elegant, but you can do it like this:

library(ggplot2)

# 700 observations in group "1" and 50 in group "2"
data <- data.frame(X1 = c(rnorm(700, 0, 10), rnorm(50, 0, 10)),
                   X2 = c(rep("1", 700), rep("2", 50)))
# Box width for each group: square root of the group's share of the data
w <- sqrt(table(data$X2) / nrow(data))
ggplot(NULL, aes(factor(X2), X1)) +
  geom_boxplot(width = as.numeric(w["1"]), data = subset(data, X2 == "1")) +
  geom_boxplot(width = as.numeric(w["2"]), data = subset(data, X2 == "2"))

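The widths above are the square roots of each group's share of the observations. A minimal base-R sketch of that arithmetic for the 700/50 split used here (the vector `n` simply holds the two group sizes from the example; ggplot2 does not compute it for you in this approach):

```r
# Group sizes from the example: 700 observations in "1", 50 in "2"
n <- c("1" = 700, "2" = 50)

# Width of each box = sqrt(group size / total size)
w <- sqrt(n / sum(n))
print(round(w, 3))

# Ratio of the small group's width to the large group's width (about 0.27)
print(round(w[["2"]] / w[["1"]], 3))
```

So the 50-observation box ends up a bit more than a quarter as wide as the 700-observation one; taking the square root keeps small groups visible instead of shrinking them in direct proportion to their size.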

If X2 has many levels, you can avoid hardcoding each one:

# Add one boxplot layer per level of X2; ggplot2 accepts a list of layers
ggplot(NULL, aes(factor(X2), X1)) +
  lapply(unique(data$X2), function(i)
    geom_boxplot(width = as.numeric(w[i]), data = subset(data, X2 == i)))

You could also post a feature request at:
https://github.com/hadley/ggplot2/issues

Using ggplotly on a ggplot2 graph does not work with boxplot variable width or outliers shape change

Indeed, plotly does not seem to carry over every ggplot2 argument. The outliers, at least, can be modified by following this thread: https://github.com/ropensci/plotly/issues/1114

library(tidyverse)
library(plotly)

p <- iris %>%
  ggplot(aes(Species, Sepal.Length)) +
  geom_boxplot()

# Convert to a plotly object so its traces can be modified directly
p <- plotly_build(p)

# Make the outlier markers fully transparent in every trace
for (i in seq_along(p$x$data)) {
  p$x$data[[i]]$marker$opacity <- 0
}

p

This hides the outliers. I am not sure whether varwidth can be carried over into plotly boxplots.

That said, I'm not sure varwidth actually helps the visualisation: many people, myself included, are not very good at comparing box widths. To compare sample sizes, it may be clearer to show the individual values, e.g. with geom_jitter():

p <- iris %>%
  ggplot(aes(Species, Sepal.Length)) +
  geom_boxplot() +
  geom_jitter(width = 0.2)

ggplotly(p)

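Along the same lines, you can print each group's size directly on the plot. Here is a sketch using stat_summary() with a small helper, n_label() (a hypothetical name, not a ggplot2 function), that places an "n = ..." label just above each group's highest value. With iris every species has 50 rows, so all the labels read n = 50; the idea pays off with unbalanced groups. It is shown for the static ggplot only.

```r
library(ggplot2)

# Hypothetical helper: turn one group's y values into a single text label
# positioned at the group's maximum
n_label <- function(y) {
  data.frame(y = max(y), label = paste0("n = ", length(y)))
}

p <- ggplot(iris, aes(Species, Sepal.Length)) +
  geom_boxplot() +
  geom_jitter(width = 0.2) +
  # fun.data receives each group's y values and returns y plus a label
  stat_summary(fun.data = n_label, geom = "text", vjust = -0.5)

print(p)
```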

How can I resize the boxes in a boxplot created with R and ggplot2 to account for different frequencies amongst different boxplots?

As @aosmith mentioned, varwidth is the argument you want. It looks like it may have been accidentally removed from ggplot2 at some point (https://github.com/hadley/ggplot2/blob/master/R/geom-boxplot.r); the commit title there is about adding the varwidth parameter back in. I'm not sure whether that ever made it into the CRAN package, so you might want to check your version. It works with mine (ggplot2 v1.0.0), though I'm not sure how recently the feature was added.
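To check which version you have installed, base R's packageVersion() is enough:

```r
# Installed ggplot2 version
v <- packageVersion("ggplot2")
print(v)

# TRUE if it is at least the version this answer was tested with (1.0.0)
print(v >= "1.0.0")
```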

Here is an example:

library(ggplot2)

set.seed(1234)
df <- data.frame(cond = factor(c(rep("A", 200), rep("B", 150), rep("C", 200), rep("D", 10))),
                 rating = c(rnorm(200), rnorm(150, mean = 0.2), rnorm(200, mean = 0.8), rnorm(10, mean = 0.6)))

head(df, 5)
tail(df, 5)

p <- ggplot(df, aes(x = cond, y = rating, fill = cond)) +
  guides(fill = "none") +  # hide the redundant fill legend
  coord_flip()

p + geom_boxplot()

This gives four boxes of the same width, regardless of how many observations each condition has.

p + geom_boxplot(varwidth = TRUE)

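With varwidth = TRUE, each box's width is proportional to the square root of its group's size, so the box for D (only 10 observations) is much narrower than the others.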
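To get a feel for how much narrower the boxes become, here is the arithmetic for the group sizes in this example (200, 150, 200 and 10). This is a sketch of the documented behaviour (widths proportional to the square roots of the group sizes), assuming the largest group keeps the full box width; it does not call ggplot2 itself:

```r
# Group sizes used in the example data frame
n <- c(A = 200, B = 150, C = 200, D = 10)

# Widths proportional to sqrt(n), scaled so the largest group has width 1
rel_width <- sqrt(n) / max(sqrt(n))
print(round(rel_width, 3))
```

Group D's box comes out a little under a quarter as wide as A's (sqrt(10/200) is about 0.22), while B, with three quarters of A's observations, still gets almost 87% of the width.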

For a couple more options, you can also use a violin plot whose areas are scaled by the number of observations (the scale = "count" argument):

p + geom_violin(scale = "count")


Or combine violin plots and boxplots to show as much information as possible:

p + geom_violin(scale = "count") + geom_boxplot(fill = "white", width = 0.2, alpha = 0.3)


Increase maximum width of boxplots with facet_wrap in ggplot2

The width parameter in geom_boxplot should do the trick:

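library(ggplot2)

# width is relative to the spacing between categories; values closer to 1 give wider boxes
ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot(width = 0.6, position = "dodge") +
  facet_wrap(~ color)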

histogram with varying bin widths

Technically speaking, this is a bar plot, not a histogram (a histogram specifically shows binned frequencies of a continuous variable).

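Your cbind() call is what breaks things: cbind() coerces all the vectors to one common type (here, character), so abatement and cost end up as factors instead of numbers. Create the data frame directly: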

data <- data.frame(measure, abatement, cost)

Here's a start:

with(dplyr::arrange(data, cost),
     barplot(width = abatement, height = cost, space = 0))
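Since the original data aren't shown, here is a self-contained sketch with made-up values for measure, abatement and cost (the measure names and numbers are purely hypothetical). It sorts by cost with base order() instead of dplyr::arrange(), and uses barplot(plot = FALSE) to show where the bars land: with space = 0, each bar's midpoint is the running total of the widths minus half its own width (1, 3.5, 9 and 15.5 for these numbers).

```r
# Hypothetical example data: one row per abatement measure
data <- data.frame(
  measure   = c("insulation", "solar", "wind", "efficiency"),
  abatement = c(5, 2, 8, 3),     # bar widths
  cost      = c(40, -10, 25, 5)  # bar heights
)

# Sort from cheapest to most expensive
data <- data[order(data$cost), ]

# plot = FALSE returns the bar midpoints without drawing anything
mids <- barplot(height = data$cost, width = data$abatement,
                space = 0, plot = FALSE)
print(as.vector(mids))

# The same call with plot = TRUE (the default) draws the chart
barplot(height = data$cost, width = data$abatement, space = 0,
        names.arg = data$measure)
```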

How to make a continuous fill in a ggplot2 bar plot with one variable

You can reverse the legend with + guides(fill = guide_colourbar(reverse = TRUE)); however, a colour gradient doesn't seem very informative here. Another option is to cut rating into discrete ranges, as in the example below, which gives a clearer picture of the distribution of ratings within each mpaa category. Even so, because the bars have different heights, it is still hard to see how the average rating or the distribution of ratings varies by mpaa category.

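library(tidyverse)
library(ggplot2movies)
theme_set(theme_classic())

movies %>%
  filter(mpaa != "") %>%
  # Bin ratings into 2-point ranges; fct_rev() reverses the level order
  mutate(rating = fct_rev(cut(rating, seq(0, ceiling(max(rating)), 2)))) %>%
  ggplot(aes(mpaa, fill = rating)) +
  # linewidth replaces size for bar outlines in ggplot2 >= 3.4.0
  geom_bar(colour = "white", linewidth = 0.2) +
  # One colour per rating bin (five bins, five colours)
  scale_fill_manual(values = c(hcl(240, 100, c(30, 70)), "yellow", hcl(0, 100, c(70, 30))))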

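The binning step above relies on base cut() with breaks every 2 points, which yields five rating bins, matching the five colours passed to scale_fill_manual(). A small base-R check of the bins it produces (the ratings here are made-up examples; fct_rev() only reverses the resulting level order):

```r
ratings <- c(1.4, 3.0, 5.5, 7.9, 9.2)        # made-up example ratings
breaks  <- seq(0, ceiling(max(ratings)), 2)  # 0, 2, 4, 6, 8, 10
bins    <- cut(ratings, breaks)              # right-closed intervals

print(levels(bins))
print(rev(levels(bins)))  # the order fct_rev() would produce
```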

Perhaps a boxplot or violin plot would be more informative. In the boxplot below, the box widths are proportional to the square root of the number of movies rated, because of the varwidth = TRUE argument. (I'm not that keen on this, since the square-root scale is hard to interpret, but it is an option.) In the violin plot, the area of each violin is proportional to the number of movies in each mpaa category, because of the scale = "count" argument. I've also put the number of movies in each category in the x-axis labels and marked the mean rating for each mpaa category in blue.

p = movies %>%
  filter(mpaa != "") %>%
  group_by(mpaa) %>%
  # x-axis label: the MPAA rating with the group's movie count underneath
  mutate(xlab = paste0(mpaa, "\n(", format(n(), big.mark = ","), ")")) %>%
  ggplot(aes(xlab, rating)) +
  labs(x = "MPAA Rating\n(number of movies)",
       y = "Viewer Rating") +
  scale_y_continuous(limits = c(0, 10))

pl = list(geom_boxplot(varwidth = TRUE, colour = "grey70"),
          geom_violin(colour = "grey70", scale = "count",
                      draw_quantiles = c(0.25, 0.5, 0.75)),
          # Mean rating per group as blue text (fun and after_stat() need ggplot2 >= 3.3.0)
          stat_summary(fun = mean, geom = "text", aes(label = sprintf("%1.1f", after_stat(y))),
                       colour = "blue", size = 3.5))

# Left: boxplot plus means; right: violin plot plus means
gridExtra::grid.arrange(p + pl[-2], p + pl[-1], ncol = 2)
