Recursive formula for variance

I'd like to know how I can recursively (iteratively) compute variance, so that I may calculate the standard deviation of a very large dataset in javascript. The input is a sorted array of positive integers.

15.1k 14 14 gold badges 58 58 silver badges 94 94 bronze badges asked Apr 28, 2013 at 3:50 Rose Perrone Rose Perrone 405 1 1 gold badge 4 4 silver badges 12 12 bronze badges

$\begingroup$ You can estimate the variance by randomly sampling the dataset. See Wikipedia for details. $\endgroup$

Commented Apr 28, 2013 at 3:56

$\begingroup$ Unless the data are being made available to you one at a time, recursive methods for computing the variance usually require more computation than straightforward calculatiom. Since the data set is large, one suggestion is to calculate the sum and the sum of squares simultaneously so that only one pass through the array is needed rather than two (as in compute $\sum_i x_i$ and divide by $n$ to get $\bar$. Then compute $\sum_i (x_i-\bar)^2$). $\endgroup$

Commented Apr 28, 2013 at 4:03

$\begingroup$ The values in the data set are too large to compute the sum of all values at once. $\endgroup$

Commented Apr 28, 2013 at 4:25

5 Answers 5

$\begingroup$

Recall that, for every $n\geqslant1$, $$ \bar x_n=\frac1n\sum_^nx_k, $$ and $$ \bar\sigma^2_n=\frac1n\sum_^n(x_k-\bar x_n)^2=\frac1n\sum_^nx_k^2-(\bar x_n)^2. $$ Hence simple algebraic manipulations starting from the identities $$ (n+1)\bar x_=n\bar x_n+x_, $$ and $$ (n+1)(\bar\sigma^2_+(\bar x_)^2)=n(\bar\sigma^2_n+(\bar x_n)^2)+x_^2, $$ lead to $$ \bar x_=\bar x_n+\frac, $$ and $$ \bar\sigma^2_=\bar\sigma^2_n+(\bar x_n)^2-(\bar x_)^2+\frac. $$ Thus, $(n,\bar x_n,x_)$ yield $\bar x_$ and $(n,\bar\sigma^2_n,\bar x_n,\bar x_,x_)$ yield $\bar\sigma^2_$.

answered Apr 28, 2013 at 8:11 281k 27 27 gold badges 310 310 silver badges 591 591 bronze badges

$\begingroup$ @hackape: The answer is correct. It differs from Schatzi's answer because Schatzi incorrectly treated the data as a sample. The question doesn't talk about a sample; it's not about estimating the variance of a population from a sample, but about calculating the variance of the data. $\endgroup$

Commented Jan 6, 2020 at 9:24 $\begingroup$ @joriki you're right. I delete my comment. $\endgroup$ Commented Jan 7, 2020 at 8:43

$\begingroup$ @joriki: I couldn't match the final answer of Did with that of DolphinDream. I think $\bar x_$ should be replaced with $x_$ in the last equation. $\endgroup$

Commented Jan 13, 2022 at 12:25 $\begingroup$

There are two problems in the preceding answer, the first being the formula for the variance is incorrect(see the formula below for the correct version) and the second is that the formula for the recursion ends up subtracting large, nearly equal, numbers.

The definition for unbiased estimates of mean( $\bar x$ ) and variance( $\sigma^2$ ) for a sample of size n are: $$ \bar x_n=\frac1n\sum_^nx_k, $$ and $$ \bar\sigma^2_n=\frac1\sum_^n(x_k-\bar x_n)^2 $$

Define the recursion variables

$$ M_n = n \bar x_n=\sum_^nx_k, $$ and $$ S_n = (n-1)\bar\sigma^2_n=\sum_^n(x_k-\bar x_n)^2 $$

The recursion relation for $M_$ is obvious $$ M_ = M_n + x_ $$ and the recursion relation for $S_n$ is obtained via $$ S_ = (x_-\bar x_)^2+\sum_^n(x_k-\bar x_)^2\phantom\\ \phantom = (x_-\bar x_)^2+\sum_^n(x_k-\bar x_n+\bar x_n-\bar x_)^2\\ \phantom = (x_-\bar x_)^2+\sum_^n(x_k-\bar x_n)^2+2(\bar x_n-\bar x_)\sum_^n(x_n-\bar x_n) + \sum_^n(\bar x_n -\bar x_)^2\\ $$ And since $$S_n = \sum_^n(x_k-\bar x_n)^2$$ $$\sum_^n(\bar x_n-\bar x_)^2 = n(\bar x_n-\bar x_)^2$$ and $$\sum_^n(x_k-\bar x_n) = 0$$ this simplifies to $$ S_ = S_n+(x_-\bar x_)^2 +n(\bar x_n -\bar x_)^2 $$

Now, this recursion relation has the nice property that it $S_n$ a sum of squared terms, and thus cannot be negative. Written in terms of $M_n$ and $S_n$ , the recursion relations are: $$ M_ = M_n + x_ $$ $$ S_ = S_n+\left(x_-\frac\right)^2 +n\left(\frac -\frac\right)^2 $$ and we can further simplify the recursion relation for $S_n$ to \begin S_ &= S_n + \left(\frac\right)^2+n\left(\frac\right)^2\\ &=S_n+ \left(1+\frac1n\right)\left(\frac\right)^2\\ &=S_n+ \frac<(n x_-M_n)^2> \end

So we have the simple recursion relations: $$ M_ = M_n + x_ $$ $$ S_ = S_n + \frac<(n x_-M_n)^2>$$

with the mean given by $$\bar x_n = \fracn$$ and the unbiased estimate of the variance is given by $$\sigma_n^2 = \frac$$ .