Use the hist() Function in R to Get Percentages as Opposed to Raw Frequencies

Simply using the freq=FALSE argument does not give a histogram with percentages; it normalizes the histogram so that the total area of the bars equals 1.

To get a histogram of percentages of some data set, say x, do:

h <- hist(x)  # or hist(x, plot = FALSE) to avoid the plot of the histogram
h$density <- h$counts / sum(h$counts) * 100
plot(h, freq = FALSE)

Basically what you are doing is creating a histogram object, changing the density property to be percentages, and then re-plotting.
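
As a quick sanity check of both claims, here is a minimal sketch on simulated data (the seed and the sample x are assumptions for illustration only). It shows that the default densities integrate to 1, while the rescaled heights sum to 100:

```r
set.seed(1)          # arbitrary seed, only for reproducibility
x <- rnorm(200)      # hypothetical sample data

h <- hist(x, plot = FALSE)

# Default density: bar areas (height * bin width) add up to 1
area <- sum(h$density * diff(h$breaks))

# Percentages: bar heights add up to 100
pct <- h$counts / sum(h$counts) * 100
total_pct <- sum(pct)

print(area)
print(total_pct)
```

Overwriting h$density with pct and calling plot(h, freq = FALSE) then draws those percentages as bar heights.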

How do you use hist to plot relative frequencies in R?

You can try using the histogram() function in the lattice package:

a <- c(0,0,0,1,1,2)
library(lattice)
histogram(a)

The y-axis defaults to percent of total.
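
A small sketch that makes this choice explicit, assuming the lattice package is installed. The type argument also accepts "count" and "density". Note that inside a script or function, a lattice plot is only drawn when the returned object is printed:

```r
library(lattice)

a <- c(0, 0, 0, 1, 1, 2)

# type = "percent" requests percent of total on the y-axis explicitly
p <- histogram(~ a, type = "percent")

# lattice functions return a "trellis" object; print() draws it
print(p)
```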

My hist function is not producing plots in R, what can I do?

General considerations

The hist function in your case does exactly what it is supposed to do. It is used mainly for its side effect, i.e. plotting the histogram (when the plot argument is TRUE, which is the default). But the value the function returns is a list with several components (breaks, counts, density, mids, xname, equidist). So when you call t = hist(b), you assign the value returned by hist to the variable t (by the way, t is already a function in R, so I wouldn't use t as a variable name). As a result, when you then call t, it prints the returned value, the list mentioned above, and not the plot you probably expected to see.

Getting your hands dirty (update)

Assuming that your objective is to save a histogram and plot it later, you can use the following observations:

Reproducible data

First we create a reproducible example:

set.seed(72158867)

x <- rnorm(100)

Here we assigned 100 random values to the variable x.

Create histogram, without plotting, assign the result to the variable

Now we are ready to create a histogram, and assign the result returned by hist() to the variable h:

h <- hist(x, plot = FALSE)

Here, with the argument plot = FALSE, we tell the function to return the result without plotting.

Inspect the content of h

The value returned by hist() is an object. We can dig into this object, to find how to work with it in order to achieve our objectives.

Structure of the returned object

First we can examine the structure of the returned object using the str() function:

str(h)

# List of 6
# $ breaks : num [1:11] -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 ...
# $ counts : int [1:10] 5 8 15 20 17 14 14 5 1 1
# $ density : num [1:10] 0.1 0.16 0.3 0.4 0.34 0.28 0.28 0.1 0.02 0.02
# $ mids : num [1:10] -1.75 -1.25 -0.75 -0.25 0.25 0.75 1.25 1.75 2.25 2.75
# $ xname : chr "x"
# $ equidist: logi TRUE
# - attr(*, "class")= chr "histogram"

Here we can see that the object returned by hist() is a list-like structure with several elements. We can also see that the result is an object of class histogram.
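
Because h is an ordinary list underneath, its elements can be read with $. As an illustrative sketch (the bins data frame and its column names are hypothetical, not something hist() provides), the bin edges and counts can be collected into a table with percentages alongside:

```r
set.seed(72158867)
x <- rnorm(100)
h <- hist(x, plot = FALSE)

# One row per bin: left edge, right edge, count, and share of the total
bins <- data.frame(
  from    = head(h$breaks, -1),
  to      = tail(h$breaks, -1),
  count   = h$counts,
  percent = 100 * h$counts / sum(h$counts)
)

print(bins)
```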

Class of the returned object

Another way to identify the class of the object returned by hist() is to call the class() function on it.

class(h)

#[1] "histogram"

You can also run ?hist in your console and read the Value section of the help page to see what hist() returns.

Methods defined on the returned object

To find out which methods are defined for objects of class histogram, we can use the methods() function, as shown in the example below:

methods(class = class(h))

# [1] lines plot
# see '?methods' for accessing help and source code

Now we know that two methods are defined for objects of class histogram: lines and plot. We are not interested in the first one (though I wonder what it does), but the second one seems to be what we are looking for.
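
For the curious: the help page for plot.histogram describes the lines method as equivalent to plotting with add = TRUE, so it draws a histogram on top of an existing plot. A minimal sketch, assuming a hypothetical second sample y that is not part of the original question:

```r
set.seed(72158867)
x <- rnorm(100)
y <- rnorm(100, mean = 1)   # hypothetical second sample, for illustration

hx <- hist(x, plot = FALSE)
hy <- hist(y, plot = FALSE)

# Draw the first histogram, leaving room on the x-axis for both
plot(hx, border = "steelblue",
     xlim = range(hx$breaks, hy$breaks),
     main = "Two histograms overlaid")

# Add the second histogram to the same plot
lines(hy, border = "tomato")
```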

Call plot method on the returned object

Let's call the plot() function on a histogram object:

plot(h, col = 'skyblue', border = 'yellow', main = 'Histogram for SOq #72158867')

It works as expected!

Basically, this is what the hist() function does when you call it:

  • It creates an object of class histogram from your data;
  • It calls the plot method on your histogram object;
  • It invisibly returns the resulting histogram object.
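
The three steps above can be sketched by hand as follows. This is only an illustration of the behavior, not the actual source of hist(); withVisible() is a base R helper that reports whether a value was returned visibly:

```r
set.seed(72158867)
x <- rnorm(100)

# Step 1: build the histogram object without drawing anything
h <- hist(x, plot = FALSE)

# Step 2: draw it with the plot method
plot(h)

# Step 3: a plain hist(x) call draws the plot and returns the object
# invisibly; withVisible() exposes that flag
res <- withVisible(hist(x))
print(res$visible)
print(class(res$value))
```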

Of course, there is a shorter way to find this out. Just type ?hist in your console, and you will find all the hist-related information yourself.

Find exact probability of values in range using hist() R function

You want the ratio of (A) the number of times you got more than 5 visitors to (B) the total number of times:

sum(n_visitors>5) / length(n_visitors)

Or, equivalently:

sum(n_visitors>5) / n_samples
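
As a self-contained sketch, assume n_visitors holds simulated Poisson counts (the distribution, rate, and sample size here are hypothetical stand-ins for the original data). The same proportion can also be recovered from the counts that hist() returns, provided the bins are one unit wide and centered on the integers:

```r
set.seed(42)
n_samples  <- 1000
n_visitors <- rpois(n_samples, lambda = 4)   # hypothetical data

# Direct proportion of samples with more than 5 visitors
p_direct <- sum(n_visitors > 5) / length(n_visitors)

# mean() of a logical vector gives the same proportion
p_mean <- mean(n_visitors > 5)

# Via hist(): unit-wide bins centered on 0, 1, 2, ...
h <- hist(n_visitors,
          breaks = seq(-0.5, max(n_visitors) + 0.5, by = 1),
          plot = FALSE)
p_hist <- sum(h$counts[h$mids > 5]) / sum(h$counts)

print(c(p_direct, p_mean, p_hist))
```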

How to plot a density curve in R using percentages?

You can change the stat used by a geom_* to get the desired output.

I'll use the mpg data set from the ggplot2 package for this example.

As you noted,

library(ggplot2)
# after_stat() replaces the older ..count.. notation (ggplot2 >= 3.3.0)
ggplot(mpg) + aes(x = hwy, y = after_stat(count / sum(count))) + geom_histogram()

yields the wanted output as a histogram.

By calling geom_density with stat = 'bin', the same stat that geom_histogram uses, instead of the default stat = 'density', you'll get what I think you are looking for:

ggplot(mpg) + aes(x = hwy, y = after_stat(count / sum(count))) + geom_density(stat = 'bin')

Show the percentage instead of count in histogram using ggplot2 | R

We can replace the y aesthetic with the relative value of the computed count statistic, and set the scale to show percentages:

# ggplot2.histogram() comes from the easyGgplot2 package
ggplot2.histogram(data = dat, xName = 'dens',
                  groupName = 'lines', legendPosition = "top",
                  alpha = 0.1) +
  labs(x = "X", y = "Count") +
  theme(panel.border = element_rect(colour = "black"),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black")) +
  theme_bw() +
  theme(legend.title = element_blank()) +
  # after_stat() replaces the older stat() helper (ggplot2 >= 3.3.0)
  aes(y = after_stat(count) / sum(after_stat(count))) +
  scale_y_continuous(labels = scales::percent)
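
The snippet above depends on ggplot2.histogram(), which is not part of ggplot2 itself. A minimal sketch of the same idea using only ggplot2 (version 3.3.0 or later, for after_stat()) is given here. It assumes the built-in mpg data in place of dat, and the bin count and axis labels are arbitrary choices for illustration:

```r
library(ggplot2)

# mpg ships with ggplot2; bins = 20 is an arbitrary choice
p <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), bins = 20) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Highway miles per gallon", y = "Percent of cars")

print(p)
```

Each bar height is that bin's share of all observations, so the heights add up to 1 before the percent labels are applied.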

