Use hist() function in R to get percentages as opposed to raw frequencies
Simply using the freq=FALSE argument does not give a histogram with percentages. It normalizes the histogram so that the total area equals 1.
To get a histogram of percentages of some data set, say x, do:
h = hist(x) # or hist(x,plot=FALSE) to avoid the plot of the histogram
h$density = h$counts/sum(h$counts)*100
plot(h,freq=FALSE)
Basically what you are doing is creating a histogram object, changing the density property to be percentages, and then re-plotting.
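To check that the recipe above really yields percentages, here is a minimal sketch. The simulated vector x and the axis label are assumptions made for illustration only; they are not part of the original question.

```r
# Hypothetical data for illustration
set.seed(1)
x <- rnorm(200)

# Build the histogram object without drawing it
h <- hist(x, plot = FALSE)

# Overwrite the density component with the percentage of observations per bin
h$density <- h$counts / sum(h$counts) * 100

# The bar heights should now add up to 100 (up to floating-point error)
total <- sum(h$density)
print(total)

# freq = FALSE makes plot() use h$density for the bar heights
plot(h, freq = FALSE, ylab = "Percent", main = "Histogram of x (percent)")
```

Note that the bar heights read naturally as percentages only when all bins have the same width, which is what hist() produces by default.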
How do you use hist to plot relative frequencies in R?
You can try using the histogram() function in lattice:
a <- c(0,0,0,1,1,2)
library(lattice)
histogram(a)
The y-axis of histogram() defaults to percent of total.
My hist function is not doing plots in R, what can I do?
General considerations
The hist function in your case does exactly what it is supposed to do. This function is used mainly for its side effect, i.e. plotting the histogram (when the argument plot is TRUE, which is the default). But the value the function returns is a list with several components (breaks, counts, density, mids, xname, equidist). So when you call t = hist(b), you assign the value returned by hist to the variable t (by the way, t is already a function in R, so I wouldn't use t as a variable name). As a result, when you call t, it prints the returned value, the list mentioned above, but not the plot you probably expect to see.
Getting your hands dirty (update)
Assuming that your objective is to save a histogram and plot it later, you can use the following observations:
Reproducible data
First we create a reproducible example:
set.seed(72158867)
x <- rnorm(100)
Here we assign 100 random draws from the standard normal distribution to the variable x.
Create a histogram without plotting and assign the result to a variable
Now we are ready to create a histogram and assign the result returned by hist() to the variable h:
h <- hist(x, plot = FALSE)
Here, with the argument plot = FALSE, we tell the function to return the result without plotting.
Inspect the content of h
The value returned by hist() is an object. We can dig into this object to find out how to work with it in order to achieve our objective.
Structure of the returned object
First we can examine the structure of the returned object using the str() function:
str(h)
# List of 6
# $ breaks : num [1:11] -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 ...
# $ counts : int [1:10] 5 8 15 20 17 14 14 5 1 1
# $ density : num [1:10] 0.1 0.16 0.3 0.4 0.34 0.28 0.28 0.1 0.02 0.02
# $ mids : num [1:10] -1.75 -1.25 -0.75 -0.25 0.25 0.75 1.25 1.75 2.25 2.75
# $ xname : chr "x"
# $ equidist: logi TRUE
# - attr(*, "class")= chr "histogram"
Here we can see that the object returned by hist() is a list-like structure with several elements. We can also see that the result hist() returned is an object of class histogram.
Class of the returned object
Another way to identify the class of the object returned by hist() is to call the class() function on it.
class(h)
#[1] "histogram"
You can also call ?hist from your console and read about the value returned by hist().
Methods defined on the returned object
To find out which methods are defined for objects of class histogram, we can use the methods() function, as shown in the example below:
methods(class = class(h))
# [1] lines plot
# see '?methods' for accessing help and source code
Now we know that an object of class histogram has two methods, lines and plot. We are not interested in the first one (though I wonder what it does), but the second one seems to be what we are looking for.
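For the curious: as far as the base R documentation describes it, calling lines() on a histogram object behaves like plot() with add = TRUE, so it draws the bars on top of an existing plot. A minimal sketch, where the second sample y and the colours are assumptions chosen purely for illustration:

```r
set.seed(72158867)
x <- rnorm(100)
y <- rnorm(100, mean = 1)   # hypothetical second sample

# Compute both histograms without drawing them
hx <- hist(x, plot = FALSE)
hy <- hist(y, plot = FALSE)

# Draw the first histogram, leaving room on the x-axis for both
plot(hx, col = "skyblue",
     xlim = range(hx$breaks, hy$breaks),
     main = "Two histograms on one plot")

# Overlay the second histogram as unfilled red outlines
lines(hy, col = NA, border = "red")
```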
Call the plot method on the returned object
Let's call the plot() function on a histogram object:
plot(h, col = 'skyblue', border = 'yellow', main = 'Histogram for SOq #72158867')
It works as expected!
Basically, this is what the hist() function does when you call it:
- It creates an object of class histogram from your data;
- It calls the plot method on your histogram object;
- It invisibly returns the resulting histogram object.
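The three steps can be imitated with a small wrapper. This is only a sketch of the behaviour described above, not the actual source of hist(); the function name my_hist is hypothetical.

```r
# Hypothetical wrapper mimicking the three steps listed above
my_hist <- function(x, ...) {
  h <- hist(x, plot = FALSE)  # 1. create the histogram object
  plot(h, ...)                # 2. call the plot method on it
  invisible(h)                # 3. return the object invisibly
}

set.seed(72158867)
x <- rnorm(100)

# Draws the plot; nothing is printed because the value is invisible
my_hist(x, col = "skyblue")

# The returned object can still be captured and re-plotted later
h <- my_hist(x)
class(h)
```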
Of course there is a shorter way to find this out. Just type ?hist in your console, and you will find all the hist-related information yourself.
Find exact probability of values in range using hist() R function
You want the ratio of (A) times that you got more than 5 visitors to (B) the total number of times:
sum(n_visitors>5) / length(n_visitors)
Or, equivalently:
sum(n_visitors>5) / n_samples
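Because the mean of a logical vector is the fraction of TRUE values, the same ratio can also be written with mean(). A minimal sketch, assuming n_visitors holds simulated counts; the Poisson draw and the rate of 4 are made up for illustration and do not come from the original question:

```r
set.seed(42)
n_samples <- 10000
n_visitors <- rpois(n_samples, lambda = 4)  # hypothetical simulated data

# Ratio of "more than 5 visitors" cases to all cases
p_ratio <- sum(n_visitors > 5) / length(n_visitors)

# Same quantity: the mean of TRUE/FALSE values is the share of TRUEs
p_mean <- mean(n_visitors > 5)

all.equal(p_ratio, p_mean)
```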
How to plot a density curve in R using percentages?
You can change the stat used by a geom_* to get the desired output.
I'll use the mpg data set from the ggplot2 package for this example.
As you noted,
library(ggplot2)
ggplot(mpg) + aes(x = hwy, y = ..count../sum(..count..)) + geom_histogram()
yields the wanted output as a histogram:
By calling geom_density with stat = 'bin', the same stat as geom_histogram, instead of the default stat = 'density' for geom_density, you'll get what I think you are looking for:
ggplot(mpg) + aes(x = hwy, y = ..count../sum(..count..)) + geom_density(stat = 'bin')
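In recent ggplot2 releases (3.4.0 and later) the ..count.. notation is deprecated in favour of after_stat(). The sketch below restates both plots in that style and formats the axis as percentages. The bins = 30 setting and the object names p_hist and p_line are assumptions added for illustration.

```r
library(ggplot2)

# Histogram with bar heights as a share of all observations
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), bins = 30) +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Percent")

# Same binned statistic, drawn with geom_density instead of bars
p_line <- ggplot(mpg, aes(x = hwy)) +
  geom_density(aes(y = after_stat(count / sum(count))),
               stat = "bin", bins = 30) +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Percent")

print(p_hist)
print(p_line)
```

Fixing the number of bins in both layers keeps the two plots comparable, since both are built from the same binned counts.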
Show the percentage instead of count in histogram using ggplot2 | R
We can replace the y aesthetic with the relative value of the count computed statistic, and set the scale to show percentages:
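library(ggplot2)
library(easyGgplot2)  # assumed source of ggplot2.histogram()

# dat is the data frame from the question, with columns 'dens' and 'lines'
ggplot2.histogram(data = dat, xName = 'dens',
                  groupName = 'lines', legendPosition = "top",
                  alpha = 0.1) +
  labs(x = "X", y = "Percent") +
  theme_bw() +  # apply the complete theme first so it does not reset the tweaks below
  theme(panel.border = element_rect(colour = "black"),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"),
        legend.title = element_blank()) +
  # relative count per bin; after_stat() replaces the older stat()
  aes(y = after_stat(count / sum(count))) +
  scale_y_continuous(labels = scales::percent)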