Function File: p = anderson_darling_cdf (A, n)

Return the CDF of the Anderson-Darling statistic A computed from n values sampled from a distribution. Given a sample of length n, evaluate the CDF of the hypothesized distribution at each sample value and sort the results in ascending order to obtain x. With i = [1:n]', you can use these values to compute A as follows:

A = -n - sum( (2*i-1) .* (log(x) + log(1 - x(n:-1:1,:))) )/n;

Given A, anderson_darling_cdf returns the probability that n samples drawn from the hypothesized distribution yield a statistic no larger than A.
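As a minimal sketch of the formula above for a single sample, the following tests n standard normal values against the standard normal distribution. The sample size and the use of randn are illustrative assumptions, and the final call runs only if anderson_darling_cdf is available (for example after loading the statistics package).

```octave
n = 50;                                   % illustrative sample size
z = randn (n, 1);                         % sample to be tested
x = sort ((1 + erf (z / sqrt (2))) / 2);  % sorted standard normal CDF values
i = (1:n)';                               % ranks 1..n used in the weights 2*i-1
A = -n - sum ((2*i - 1) .* (log (x) + log (1 - x(n:-1:1)))) / n;
if (exist ("anderson_darling_cdf"))       % skip when the function is not loaded
  p = anderson_darling_cdf (A, n)
endif
```

Note that x must be sorted before A is computed, because the formula pairs the smallest value with the largest through x(n:-1:1).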

The algorithm given in [1] is claimed to approximate the Anderson-Darling CDF to six decimal places.

Demonstrate using:

n = 300; reps = 10000;
z = randn(n, reps);
x = sort ((1 + erf (z/sqrt (2)))/2);
i = [1:n]' * ones (1, size (x, 2));
A = -n - sum ((2*i-1) .* (log (x) + log (1 - x (n:-1:1, :))))/n;
p = anderson_darling_cdf (A, n);
hist (100 * p, [1:100] - 0.5);

You will see that the histogram is approximately flat, which is to say that the probabilities returned by anderson_darling_cdf are uniformly distributed when the samples really come from the hypothesized distribution.
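One hedged way to put a number on "flat" is the largest gap between the empirical CDF of p and the uniform CDF on [0, 1]. The sketch below assumes p is the vector computed in the demonstration; if p is not in the workspace, uniform random numbers stand in for it so the block still runs. The names ps, m, k, and D are illustrative.

```octave
if (! exist ("p", "var"))
  p = rand (1, 10000);          % stand-in data, NOT output of anderson_darling_cdf
endif
ps = sort (p(:));               % sorted probabilities as a column vector
m = numel (ps);
k = (1:m)';
% Largest distance between the empirical CDF of p and the uniform CDF
D = max (max (k/m - ps), max (ps - (k - 1)/m));
printf ("max deviation from uniform: %g (sqrt(m)*D = %g)\n", D, sqrt (m) * D);
```

For uniformly distributed p, sqrt(m)*D is typically of order 1; a value well above 2 suggests that the histogram is not flat.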

You can easily determine the extreme values of p:

[junk, idx] = sort (p);

Histograms of the samples with the smallest, median, and largest p aren't very informative:

histfit (z (:, idx (1)), linspace (-3, 3, 15));
histfit (z (:, idx (end/2)), linspace (-3, 3, 15));
histfit (z (:, idx (end)), linspace (-3, 3, 15));

More telling is the qqplot:

qqplot (z (:, idx (1))); hold on; plot ([-3, 3], [-3, 3], ';;'); hold off;
qqplot (z (:, idx (end/2))); hold on; plot ([-3, 3], [-3, 3], ';;'); hold off;
qqplot (z (:, idx (end))); hold on; plot ([-3, 3], [-3, 3], ';;'); hold off;

Try a similar analysis for z uniform:

z = rand (n, reps); x = sort(z);

and for z exponential:

z = rande (n, reps); x = sort (1 - exp (-z));
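Since the normal, uniform, and exponential cases differ only in how z is mapped to CDF values, the statistic can be factored into a small helper. This is a sketch: ad_statistic is a hypothetical name, not a function of the statistics package, and the smaller n and reps are chosen only to keep it quick.

```octave
1;  % script marker so the function below can be defined inline

% Hypothetical helper: Anderson-Darling statistic for each column of u,
% where u holds the hypothesized CDF evaluated at the samples.
function A = ad_statistic (u)
  x = sort (u);                         % sort each column in ascending order
  n = rows (x);
  i = (1:n)' * ones (1, columns (x));   % rank index replicated per column
  A = -n - sum ((2*i - 1) .* (log (x) + log (1 - x(n:-1:1, :)))) / n;
endfunction

n = 100; reps = 200;
A_unif = ad_statistic (rand (n, reps));              % uniform: CDF is the identity
A_exp  = ad_statistic (1 - exp (-rande (n, reps)));  % exponential: 1 - exp(-z)
if (exist ("anderson_darling_cdf"))     % skip when the function is not loaded
  p_unif = anderson_darling_cdf (A_unif, n);
  p_exp  = anderson_darling_cdf (A_exp, n);
endif
```

Each result is a row vector with one statistic per column of samples, so it can be passed to hist in the same way as in the normal case.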

[1] Marsaglia, G.; Marsaglia, J. C. W. (2004). "Evaluating the Anderson-Darling Distribution", Journal of Statistical Software, 9(2).

See also: anderson_darling_test.

Package: statistics