 Grubbs' tests for one or two outliers in a data sample
 
 Description:
 
      Performs Grubbs' test for one outlier, two outliers on one tail,
      or two outliers on opposite tails, in a small sample.
 
 Usage:
 
      [pval,G,U] = grubbstest(x,type,opposite,twosided)
 
 Arguments:
 
        x: a numeric vector or matrix of data values. Matrices are treated
           columnwise (each column is an independent data set).
 
     type: integer value selecting the test variant. 10 tests for one
           outlier (the side is detected automatically and can be
           reversed with the 'opposite' parameter), 11 tests for two
           outliers on opposite tails, and 20 tests for two outliers
           on the same tail. Default 10.
 
 opposite: logical value (default 0). If true, the test checks not the
           value with the largest difference from the mean but the
           extreme value on the opposite tail (the lowest value if the
           most suspicious one is the highest, and vice versa).
 
 twosided: logical value indicating whether the test should be treated
           as two-sided. Default 0.
 
 Details:
 
      The function can perform three tests given and discussed by Grubbs
      (1950).
 
      The first test (10) detects whether the sample contains one
      outlier, i.e. a value statistically different from the others.
      It is based on the score G of the suspected outlier (its
      difference from the sample mean, divided by the sample standard
      deviation), which is compared with the appropriate critical
      values. An alternative approach is to calculate the ratio U of
      the variances of two datasets: the full dataset and the dataset
      without the outlier. U is related to G by a simple formula.
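 
      The relation between G and U can be checked by hand. The sketch
      below is a minimal Octave illustration, not the code of
      grubbstest itself: the sample values are made up, and it assumes
      that U is taken as the ratio of the sums of squared deviations
      (sample without the suspect over full sample), as in Grubbs
      (1950). Under that assumption U = 1 - n*G^2/(n-1)^2, where n is
      the sample size.

```octave
% Hand computation of the type 10 statistics for one column of data.
x = [2.1; 2.4; 2.2; 2.3; 2.5; 3.9];   % illustrative sample (assumption)
n = numel(x);

[~, idx] = max(abs(x - mean(x)));     % index of the value farthest from the mean
G = abs(x(idx) - mean(x)) / std(x);   % outlier minus mean, divided by sd

rest = x;
rest(idx) = [];                       % sample with the suspected outlier removed

% Assumed definition of U: ratio of sums of squared deviations.
U = sum((rest - mean(rest)).^2) / sum((x - mean(x)).^2);
U_from_G = 1 - n * G^2 / (n - 1)^2;   % the same quantity expressed through G

printf("G = %.4f, U = %.4f, U from G = %.4f\n", G, U, U_from_G);
```

      With this definition a large G corresponds to a U close to 0, so
      either statistic carries the same information.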
 
      The second test (11) checks whether the lowest and highest values
      are two outliers on opposite tails of the sample. It is based on
      the ratio of the sample range to the sample standard deviation.
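 
      As a rough illustration of the type 11 statistic, the range-to-sd
      ratio can be computed columnwise as sketched below. The matrix X
      is made-up data and the variable names are chosen for this
      example only; the sketch does not compute p-values.

```octave
% Range divided by standard deviation, one value per column.
X = [2.1 5.0;
     2.4 5.2;
     2.2 4.1;
     2.3 5.1;
     2.5 5.3;
     3.9 6.4];                         % illustrative data (assumption)

col_range = max(X, [], 1) - min(X, [], 1);   % highest minus lowest value
G = col_range ./ std(X, 0, 1);               % ratio of range to sample sd

disp(G);
```

      Each column is handled as an independent data set, so G holds one
      statistic per column.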
 
      The third test (20) calculates the ratio of the variances of the
      full sample and the sample without its two most extreme
      observations. It is used to detect whether the dataset contains
      two outliers on the same tail.
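 
      A minimal sketch of the type 20 statistic follows. It is an
      illustration under stated assumptions rather than the function's
      own code: the data are made up, the suspicious tail is assumed to
      be the one whose extreme lies farther from the mean, and U is
      assumed to be the ratio of sums of squared deviations (reduced
      sample over full sample), as in Grubbs (1950).

```octave
% Ratio of sums of squares without and with the two suspected outliers.
x = [2.1; 2.4; 2.2; 2.3; 3.8; 3.9];   % illustrative sample (assumption)
n = numel(x);
xs = sort(x);

% Choose the tail whose extreme value lies farther from the mean.
if (xs(end) - mean(xs)) > (mean(xs) - xs(1))
  rest = xs(1:end-2);                 % drop the two largest values
else
  rest = xs(3:end);                   % drop the two smallest values
end

U = sum((rest - mean(rest)).^2) / sum((xs - mean(xs)).^2);
printf("U = %.4f\n", U);
```

      Under these assumptions U lies between 0 and 1, and small values
      point toward two outliers on the chosen tail.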
 
      The p-values are calculated using the 'grubbscdf' function.
 
 Value:
 
      G,U: the values of the test statistics. For type 10, G is the
           difference between the outlier and the mean, divided by the
           standard deviation; for type 11, it is the sample range
           divided by the standard deviation. The additional value U is
           the ratio of the sample variances with and without the
           suspicious outlier. According to Grubbs (1950), for type 10
           these two values are related by a simple formula and only
           one of them is needed, but the function returns both. For
           type 20, G is the same as U.
 
     pval: the p-value for the test.
 
 
 Author(s):
 
      Lukasz Komsta, ported from R package "outliers".
      See R News, 6(2):10-13, May 2006.
 
 References:
 
      Grubbs, F.E. (1950). Sample Criteria for testing outlying
      observations. Ann. Math. Stat. 21, 1, 27-58.