How can I do standard deviation in Ruby?
module Enumerable
def sum
self.inject(0){|accum, i| accum + i }
end
def mean
self.sum/self.length.to_f
end
def sample_variance
m = self.mean
sum = self.inject(0){|accum, i| accum +(i-m)**2 }
sum/(self.length - 1).to_f
end
def standard_deviation
Math.sqrt(self.sample_variance)
end
end
Testing it:
a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
a.standard_deviation
# => 4.594682917363407
01/17/2012:
fixing "sample_variance" thanks to Dave Sag
How do I find the standard deviation in Ruby?
Use
variance = variance / contents.size # or contents.length
How do I find the average and standard deviation in ruby?
I think the problem comes from splitting the data using \r\n
which is OS dependant: if you're on Linux, it should be contents.split('\n')
. Either way, you probably would be better off using IO#each
to iterate over every line in the file and let Ruby deal with line ending characters.
data = File.open("test.txt", "r+")
count = 0
sum = 0
variance = 0
data.each do |line|
value = line.split(',')[1]
sum = sum + value.to_f
count += 1
end
avg = sum / count
puts "The average of your large data set is: #{ avg.round(3)} (Answer is rounded to nearest thousandth place)"
# We need to get back to the top of the file
data.rewind
data.each do |line|
value = line.split(',')[1]
variance = variance + (value.to_f - avg)**2
end
variance = variance / count
variance = Math.sqrt(variance)
puts "The standard deviation of your large data set is: #{ variance.round(3)} (Answer is rounded to nearest thousandth place)"
Square root and Looping on Standard Deviation in Ruby
The error is because you are giving Math::sqrt
a negative number as argument.
To calculate the difference of num
and average
, use its absolute value:
Math::sqrt((num - average).abs)
iterating over an array of objects and adding to an earlier element...is any more efficient than iterating over a small portion?
stddev
is sqrt(variance)
. Population variance is the mean of the sum of squares of the population. You say you want the running stddev over sublists of 20 elements. So you could calculate this faster by starting by calculating the sum of the squares of the first 20 elements, then iterate through the remaining elements, subtracting the square of the n-20th element and adding the square of the new element and calculating sqrt(current_sum_of_squares/20.0)
for the stddev. This will result in about a factor of 20 fewer computations as calculating the stddev independently over N-20 20-element sub-lists.
Pushing the stdev onto the n-20th element is trivial as it doesn't involve any major mutation to the big list, just an append to that one element.
I've gotta run to a meeting now or I'd show some code. Perhaps later tonight if this isn't clear.
Algorithm for deviations
It's a very easy thing to calculate, but you will need to tune one parameter. You want to know if any given value is X standard deviations from the mean. To figure this out, calculate the standard deviation (see Wikipedia), then compare each value's deviation abs(mean - value)
from the mean to this value. If a value's deviation is say, more than two standard deviations from the mean, flag it.
Edit:
To track deviations by weekday, keep an array of integers, one for each day. Every time you encounter a deviation, increment that day's counter by one. You could also use doubles and instead maintain a percentage of deviations for that day (num_friday_deviations/num_fridays)
for example.
How to find cumulative variance or standard deviation in R
You can also check the wiki site for Algorithms for calculating variance and implement Welford's Online algorithm with Rcpp
as follows
library(Rcpp)
func <- cppFunction(
"arma::vec roll_var(arma::vec &X){
const arma::uword n_max = X.n_elem;
double xbar = 0, M = 0;
arma::vec out(n_max);
double *x = X.begin(), *o = out.begin();
for(arma::uword n = 1; n <= n_max; ++n, ++x, ++o){
double tmp = (*x - xbar);
xbar += (*x - xbar) / n;
M += tmp * (*x - xbar);
if(n > 1L)
*o = M / (n - 1.);
}
if(n_max > 0)
out[0] = std::numeric_limits<double>::quiet_NaN();
return out;
}", depends = "RcppArmadillo")
# it gives the same
x <- c(1, 4, 5, 6, 9)
drop(func(x))
#R [1] NaN 4.50 4.33 4.67 8.50
sapply(seq_along(x), function(i) var(x[1:i]))
#R [1] NA 4.50 4.33 4.67 8.50
# it is fast
x <- rnorm(1e+3)
microbenchmark::microbenchmark(
func = func(x),
sapply = sapply(seq_along(x), function(i) var(x[1:i])))
#R Unit: microseconds
#R expr min lq mean median uq max neval
#R func 9.09 9.88 30.7 20.5 21.9 1189 100
#R sapply 43183.49 45040.29 47043.5 46134.4 47309.7 80345 100
Taking the square root gives you the standard deviation.
A major advantage of this method is that there are no issues with cancellation. E.g., compare
# there are no issues with cancellation
set.seed(99858398)
x <- rnorm(1e2, mean = 1e8)
cumvar <- function (x, sd = FALSE) {
n <- seq_along(x)
v <- (cumsum(x ^ 2) - cumsum(x) ^ 2 / n) / (n - 1)
if (sd) v <- sqrt(v)
v
}
z1 <- drop(func(x))[-1]
z2 <- cumvar(x)[-1]
plot(z1, ylim = range(z1, z2), type = "l", lwd = 2)
lines(seq_along(z2), z2, col = "DarkBlue")
The blue line is the algorithm where you subtract the squared values from the squared mean.
Finding standard deviation using only mean, min, max?
A standard deviation cannot in general be computed from just the min, max, and mean. This can be demonstrated with two sets of scores that have the same min, and max, and mean but different standard deviations:
- 1 2 4 5 : min=1 max=5 mean=3 stdev≈1.5811
- 1 3 3 5 : min=1 max=5 mean=3 stdev≈0.7071
Also, what does an 'overall score' of 90 mean if the maximum is 84?
Related Topics
Can't Use Compass After Installing It
Adding an Instance Variable to a Class in Ruby
Building a Windows Executable from My Ruby App
Generating Unique Token on the Fly with Rails
Converting an Array of Keys and an Array of Values into a Hash in Ruby
Simple_Form: Remove Outer Label for an Inline Checkbox with Label
How to Apply a Patch to Ruby on Rails
How to Get Files Count in a Directory Using Ruby
Rails Activeadmin: Showing Table of a Related Resource in the Same View
Find Records with Datetime That Match Today's Date - Ruby on Rails
How to Render a Partial in Sinatra View (Haml in Haml)
Iterate Every Month with Date Objects
Ruby: Initialize() VS Class Body
How to Pass Arguments from the Parent Task to the Child Task in Rake
Ruby: How to Chain Multiple Method Calls Together with "Send"