Unit ProbFun/ProbF87


Probability distribution functions for statistical calculations

Copyright 1990 by J. W. Rider


This unit uses the math and specfun units (available separately)

for float types and for the beta-, erf-, and gamma-related functions.



This unit is not intended to be a self-contained tutorial in

probability.  The probability cumulative distribution functions

(cdf's) are provided with the caveat that they only work with the

correct inputs.  The "probability" returned assumes that the

"null hypothesis" (that some number is a random variate with a

particular probability distribution) is true.  Based upon the

returned probability, you can determine at what level you want to

accept or reject the "null hypothesis".


This unit provides "probability distributions" rather than

"statistical routines".  They will not help you compute

statistics.  However, they will tell you what the probability

of computing a statistic from a particular distribution will be.


Because of the number of probability distributions available, I

have tried to adopt a consistent (but not standard anywhere else)

way of describing the functions.  Each distribution has a base

"prefix" to which is appended either "CDF", "INV", or "PF".

"CDF" indicates that the function is a "cumulative distribution

function".  Where possible, I have tried to make this

consistently the probability that a random variate will be less

than or equal to ("<=") the "x" argument.  "INV" indicates an

"inverse cumulative distribution function" which takes a

probability as the argument and returns an "x" for which the

"CDF" would yield the given probability.  "PF" indicates a

"probability density function" which is the derivative of the

"CDF".  (Or, inversely, the "CDF" is the integral from minus

infinity to "x" of the "PF".)
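The relationship among the three suffixes can be sketched in a
few lines of Python.  This is only an illustration, not the
unit's Pascal source: it assumes the unit-mean exponential
distribution and borrows the names xcdf, xinv, and xpf from the
"x-" prefix.

```python
import math

def xcdf(x):
    """P(X <= x) for a unit-mean exponential variate; zero for x <= 0."""
    return -math.expm1(-x) if x > 0.0 else 0.0

def xinv(prob):
    """Return the x for which xcdf(x) equals prob (0 <= prob < 1)."""
    return -math.log1p(-prob)

def xpf(x):
    """Density: the derivative of xcdf with respect to x."""
    return math.exp(-x) if x >= 0.0 else 0.0

x = 1.5
h = 1e-6
# "PF" is the derivative of "CDF": compare with a central difference.
slope = (xcdf(x + h) - xcdf(x - h)) / (2.0 * h)
# "INV" undoes "CDF": the round trip should return the original x.
round_trip = xinv(xcdf(x))
```

Here slope agrees with xpf(x) to within the accuracy of the
finite difference, and round_trip reproduces x up to rounding.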


Not all distributions have a complete set of functions defined.


Probability functions supplied for:


  PROB DIST             PREFIX  CDF     PF      INV     type

  Beta                  beta-    y      y               cnts

  Binomial              bin-     y      y               disc

  Cauchy                cauchy-  y      y        y      cnts

  Chi-square            chs-     y      y               cnts

  Double-Exponential

     (Laplacian)        dx-      y      y        y      cnts

  Error                 erf-     y      y               cnts

  Exponential           x-       y      y        y      cnts

  Snedecor's F          f-       y      y               cnts

  Gamma                 gam-     y      y               cnts

  Gaussian (Normal)     g-       y      y               cnts

  Geometric             geo-            y               disc

  Hypergeometric        hgeo-           y               disc

  Kolmogorov-Smirnov

     D                  ks-      y                      cnts

  Maxwell               maxwell-                        cnts

  Pascal

    (Negative Binomial) pas-     y      y               disc

  Poisson               poi-     y      y               disc

  Rayleigh              ray-                            cnts

  Student's T           t-       y      y               cnts

  Uniform

     (Rectangular)      u-       y      y        y      cnts



In most of the references cited at the end of this document, some

sort of mathematical expression is provided for any particular

distribution.  Unfortunately, this is often insufficient to

determine when the practitioner should use a particular

distribution.  Knowing precisely what circumstances yield random

variates that follow a particular distribution can be especially

fruitful in determining which hypotheses can be tested.

Consequently, I have tried to explain those circumstances in the

descriptions that follow.


The concept of a "Bernoulli trial" occurs frequently in

relationship with probability distributions.  Briefly, a

Bernoulli trial has a fixed probability of success without regard

to when it is tried, and any one Bernoulli trial is independent

of all others.



BETA. The beta distribution is a continuous distribution related to

the binomial distribution.  Mathematically, the Beta distribution

describes the distribution of the probability of success of "n"

Bernoulli trials given "s" successes, rather than the

distribution of the number of successes out of "n" Bernoulli

trials with a given probability "p". If X1 and X2 are independent

chi-square random variates with "degrees of freedom" v1 and v2

respectively, then the expression "X1/(X1+X2)" will follow a beta

distribution with parameters v1/2 and v2/2.


     Limits:  0 <= x <= 1

              0 < dof1,dof2


function betacdf(x,dof1,dof2:xfloat):xfloat

function betapf(x,dof1,dof2:xfloat):xfloat
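The link between the beta and binomial distributions can be
checked numerically.  The Python sketch below is an illustration
only, not the unit's implementation: the names mirror the "beta-"
prefix, the CDF is obtained by plain Simpson integration (an
assumption made for brevity; the unit relies on the specfun
beta-related functions), and binomial_upper_tail is a
hypothetical helper.

```python
import math

def betapf(x, dof1, dof2):
    """Beta density with parameters dof1, dof2 > 0 on 0 <= x <= 1."""
    norm = math.gamma(dof1 + dof2) / (math.gamma(dof1) * math.gamma(dof2))
    return norm * x ** (dof1 - 1.0) * (1.0 - x) ** (dof2 - 1.0)

def betacdf(x, dof1, dof2, steps=2000):
    """Integrate betapf from 0 to x with Simpson's rule.

    Only suitable for dof1, dof2 >= 1, where the density stays finite
    at both ends.  steps must be even.
    """
    h = x / steps
    total = betapf(0.0, dof1, dof2) + betapf(x, dof1, dof2)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * betapf(i * h, dof1, dof2)
    return total * h / 3.0

def binomial_upper_tail(k, n, p):
    """P(at least k successes in n Bernoulli trials of probability p)."""
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)
               for j in range(k, n + 1))

# For integer k and n, the beta CDF at p with parameters (k, n-k+1)
# equals the probability of at least k successes in n trials.
n, k, p = 10, 3, 0.4
lhs = betacdf(p, k, n - k + 1)
rhs = binomial_upper_tail(k, n, p)
```

The two values lhs and rhs agree to within the integration error.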



BINOMIAL. The binomial distribution describes the number of

successes out of a specific number of Bernoulli trials.


        The CDF returns the probability of less than "k"

successes out of "n" trials with probability "p" of success per

trial.  Low values indicate that "p" is likely too big;  high

values, "p" too small, for less than "k" events out of "n".


     Limits:  0 <= k <= n

              0 <= p <= 1


function bincdf(k,n,p:xfloat):xfloat

function binpf(k,n,p:xfloat):xfloat
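A minimal Python sketch of the convention described above, where
the CDF counts strictly fewer than "k" successes.  It is an
illustration rather than the unit's code, and it takes integer
"k" and "n" where the Pascal functions are declared with xfloat
arguments.

```python
import math

def binpf(k, n, p):
    """P(exactly k successes in n Bernoulli trials)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def bincdf(k, n, p):
    """P(fewer than k successes in n trials), that is, 0 to k-1."""
    return sum(binpf(j, n, p) for j in range(k))

fair = bincdf(3, 10, 0.5)    # (1 + 10 + 45) / 1024
biased = bincdf(3, 10, 0.8)  # far smaller: p = 0.8 is too big for k < 3
```

A low value such as biased suggests that "p" is too big to be
consistent with fewer than "k" successes.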



CAUCHY. The continuous Cauchy distribution is peculiar in the

sense that it has no well-defined mean or variance.  However, it

arises in some physical phenomena.


function cauchycdf(x:xfloat):xfloat

function cauchyinv(prob:xfloat):xfloat

function cauchypf(x:xfloat):xfloat
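Since the functions take only "x", the sketch below assumes the
standard Cauchy distribution (zero location, unit scale).  It is
written in Python for illustration and is not the unit's source.

```python
import math

def cauchycdf(x):
    """P(X <= x) for a standard Cauchy variate."""
    return 0.5 + math.atan(x) / math.pi

def cauchyinv(prob):
    """Return the x for which cauchycdf(x) equals prob (0 < prob < 1)."""
    return math.tan(math.pi * (prob - 0.5))

def cauchypf(x):
    """Density of the standard Cauchy distribution."""
    return 1.0 / (math.pi * (1.0 + x * x))

upper_quartile = cauchyinv(0.75)  # equals 1 up to rounding
```

The quartiles fall at -1 and +1 even though the mean and variance
are undefined.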



CHI-SQUARE. If X1, X2, ..., XN are independent gaussian random

variates of zero mean and unit variance, then the sum of the

squares of the variates will follow a continuous Chi-square

distribution with "N" degrees of freedom.


        The CDF returns the probability that an observed

chi-square statistic will be less than "chs" with "dof"

degrees-of-freedom.  Low values indicate "cooked" or "biased"

experimentation.  High values indicate significant differences

between model predictions and experimental outcomes.


     Limits:  0 < chs,dof


function chscdf(chs,dof:xfloat):xfloat

function chspf(chs,dof:xfloat):xfloat
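The chi-square CDF is a regularized incomplete gamma function.
The Python sketch below evaluates it with the standard power
series; this is an illustrative choice (lower_gamma_p is a
hypothetical helper), not necessarily the algorithm used by the
specfun unit, and the series is only efficient for moderate "chs".

```python
import math

def lower_gamma_p(a, x, terms=500):
    """Regularized lower incomplete gamma P(a, x) by its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chscdf(chs, dof):
    """P(chi-square statistic <= chs) with dof degrees of freedom."""
    return lower_gamma_p(0.5 * dof, 0.5 * chs)

def chspf(chs, dof):
    """Chi-square density for chs > 0."""
    half = 0.5 * dof
    return math.exp((half - 1.0) * math.log(chs) - 0.5 * chs
                    - half * math.log(2.0) - math.lgamma(half))

# With 2 degrees of freedom the CDF has the closed form 1 - exp(-chs/2).
closed_form = 1.0 - math.exp(-1.5)
series_value = chscdf(3.0, 2.0)
```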



DOUBLE-EXPONENTIAL. The double-exponential or Laplacian is a

continuous, double-ended version of the exponential distribution.



function dxcdf(x:xfloat):xfloat

function dxinv(prob:xfloat):xfloat

function dxpf(x:xfloat):xfloat



ERROR.  The distribution of the absolute value of a gaussian

variate (with zero mean and unit variance).  For positive x,

this is the same as the "error function" (ERF) defined in the

SpecFun unit.  The difference lies at negative "x":  ERF is an

"odd" function in that "ERF(-x)=-ERF(x)", whereas the CDF and PF

functions here are strictly zero for negative values of "x".


    Limits:  0 <= x


function erfcdf(x:xfloat):xfloat

function erfpf(x:xfloat):xfloat



EXPONENTIAL.  The continuous exponential distribution describes

the intervals between Poisson events.


        The CDF returns the probability that an observed

exponential deviate (mean 1) will be less than "x".  Another

easy-to-understand function, not as useless as "ucdf".


     Limits:  0 <= x


function xcdf(x:xfloat):xfloat

function xinv(prob:xfloat):xfloat

function xpf(x:xfloat):xfloat



SNEDECOR'S F.  If X1 and X2 are independent chi-square variates

with v1 and v2 degrees of freedom, then the expression (the

"F-ratio") (X1/v1)/(X2/v2) follows an F-distribution.


        The CDF returns the probability that an observed F-ratio

will be less than "f" with "dof1" and "dof2" degrees of freedom.

Low and high values indicate significant differences between two

sample variances.


     Limits:  0 < f,dof1,dof2


function fcdf(f,dof1,dof2:xfloat):xfloat

function fpf(f,dof1,dof2:xfloat):xfloat





GAMMA.


     Limits: 0 <= x

             0 < p < 1


function gamcdf(x,p:xfloat):xfloat

function gampf(x,p:xfloat):xfloat



GAUSSIAN.  The Gaussian (Normal) distribution arises as the sum of

many small, independent variates.


        The CDF returns the probability that a random gaussian

deviate (mean 0, var 1) will be less than "x". Another

easy-to-understand function, and quite useful considering the

number of ways that the gaussian distribution arises.


function gcdf(x:xfloat):xfloat

function gpf(x:xfloat):xfloat
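The Gaussian CDF can be written in terms of the error function.
The following Python sketch illustrates that relationship for
mean 0 and variance 1; it is not the unit's Pascal code, and it
uses Python's math.erfc rather than the specfun routines.

```python
import math

def gcdf(x):
    """P(X <= x) for a gaussian variate with mean 0 and variance 1."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def gpf(x):
    """Gaussian density with mean 0 and variance 1."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# About 95% of the probability lies within 1.96 standard deviations.
central = gcdf(1.959964) - gcdf(-1.959964)
```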



GEOMETRIC. The interval between Bernoulli successes, or the number

of trials until the first success.


function geopf(x,p:xfloat):xfloat
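A short Python sketch under the "number of trials until first
success" reading, so that "x" starts at 1.  That convention is an
assumption; the unit may instead count the failures before the
first success, in which case "x" would start at 0.

```python
def geopf(x, p):
    """P(first success occurs on trial x), for x = 1, 2, 3, ..."""
    if x < 1:
        return 0.0
    return (1.0 - p) ** (x - 1) * p

# Two failures followed by a success with p = 0.5.
third_trial = geopf(3, 0.5)
```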



HYPERGEOMETRIC.  This is perhaps the most primitive of the

probability distributions in this collection.  In a finite

population "Npop" of items there is a specific number "T" of

items of interest.  Examine "Nsamp" of the population items

(sampled without replacement).  The number of items of interest

in the sample follows a Hypergeometric distribution.


     Limits: 0 <= x <= min(Nsamp,T)

             0 <= Nsamp,T <= Npop


function hgeopf(x,Nsamp,T,Npop:xfloat):xfloat
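The sampling description above translates directly into a ratio
of binomial coefficients.  The Python sketch below is an
illustration with integer arguments (the Pascal function is
declared with xfloat arguments) and is not the unit's source.

```python
import math

def hgeopf(x, nsamp, t, npop):
    """P(exactly x items of interest in a sample of nsamp items)."""
    if x < 0 or x > min(nsamp, t) or nsamp - x > npop - t:
        return 0.0
    return (math.comb(t, x) * math.comb(npop - t, nsamp - x)
            / math.comb(npop, nsamp))

# Example: 5 cards drawn from a 52-card deck holding 4 aces;
# probability that the hand contains exactly 2 aces.
two_aces = hgeopf(2, 5, 4, 52)
```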





KOLMOGOROV-SMIRNOV.


        The CDF returns the probability that the observed D-statistic

will be less than "d". High values indicate significant

difference between source distributions.


function kscdf(d,dof:xfloat):xfloat; { NR calls this "PROBKS" }



MAXWELL.  If X1, X2, and X3 are independent gaussian random variates

with zero mean and unit variance, then sqrt( sqr(X1) + sqr(X2) +

sqr(X3)) has a Maxwell distribution.  This distribution arises in three

dimensional applications with "spherical error probabilities".



     Limits:  0 <= x



PASCAL. (Negative Binomial)  The distribution of the number of

failures in a run of Bernoulli trials that has exactly "n"

successes where the probability of success of each trial is "p".


     Limits:  0 <= x

              0 < p < 1


function pascdf(x,n,p:xfloat):xfloat

function paspf(x,n,p:xfloat):xfloat
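A Python sketch of the failure-counting description above, for
integer "x" and "n".  It is illustrative only: the CDF here is
taken as "at most x failures", which is an assumption, since the
text does not say whether the unit's pascdf includes "x" itself.

```python
import math

def paspf(x, n, p):
    """P(exactly x failures before the n-th success)."""
    return math.comb(x + n - 1, x) * p ** n * (1.0 - p) ** x

def pascdf(x, n, p):
    """P(at most x failures before the n-th success); assumed convention."""
    return sum(paspf(j, n, p) for j in range(x + 1))

# Three successes in a row with no failures, p = 0.5.
no_failures = paspf(0, 3, 0.5)
```

With n = 1 the distribution reduces to the number of failures
before the first success.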



POISSON. Poisson is a limiting case of the binomial distribution

as the probability of each individual Bernoulli event goes to

zero, and the number of trials goes to infinity, but the expected

number of events remains constant.


        The CDF returns the probability that a Poisson (mean

"mu") random event will be less than "k" (that is, 0 to k-1).

Low values indicate that "mu" is too high; high, "mu" too low.


     Limits:  0 <= k

              0 < mu


function poicdf(k,mu:xfloat):xfloat

function poipf(k,mu:xfloat):xfloat
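The limiting relationship with the binomial distribution can be
seen numerically.  In the Python sketch below (an illustration,
not the unit's code), the CDF follows the "0 to k-1" convention
described above, and binpf is a local helper defined only for
the comparison.

```python
import math

def poipf(k, mu):
    """P(exactly k events) for a Poisson variate with mean mu > 0."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1.0))

def poicdf(k, mu):
    """P(fewer than k events), that is, 0 to k-1."""
    return sum(poipf(j, mu) for j in range(k))

def binpf(k, n, p):
    """P(exactly k successes in n Bernoulli trials)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

# Hold the expected count n*p fixed at mu while n grows: the binomial
# probability of 3 events approaches the Poisson probability.
mu = 2.0
gaps = [abs(binpf(3, n, mu / n) - poipf(3, mu))
        for n in (10, 100, 1000, 10000)]
```

The entries of gaps shrink steadily as "n" increases.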



RAYLEIGH.  If X1 and X2 are independent gaussian random variates

with zero mean and unit variance, then sqrt( sqr(X1) + sqr(X2))

has a Rayleigh distribution.  This distribution arises in two

dimensional applications with "circular error probabilities".


     Limits: 0 <= x
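No functions are listed for the Rayleigh or Maxwell
distributions, but both have simple closed forms for unit-variance
components.  The Python sketch below is illustrative; the names
raycdf, raypf, maxwellcdf, and maxwellpf are hypothetical, formed
from the "ray-" and "maxwell-" prefixes.

```python
import math

ROOT_2_OVER_PI = math.sqrt(2.0 / math.pi)

def raycdf(x):
    """P(sqrt(X1^2 + X2^2) <= x) for unit gaussian X1, X2."""
    return -math.expm1(-0.5 * x * x) if x > 0.0 else 0.0

def raypf(x):
    """Rayleigh density; zero for x <= 0."""
    return x * math.exp(-0.5 * x * x) if x > 0.0 else 0.0

def maxwellcdf(x):
    """P(sqrt(X1^2 + X2^2 + X3^2) <= x) for unit gaussian X1, X2, X3."""
    if x <= 0.0:
        return 0.0
    return (math.erf(x / math.sqrt(2.0))
            - ROOT_2_OVER_PI * x * math.exp(-0.5 * x * x))

def maxwellpf(x):
    """Maxwell density; zero for x <= 0."""
    return ROOT_2_OVER_PI * x * x * math.exp(-0.5 * x * x) if x > 0.0 else 0.0

# Radius of the circle that contains half of the two dimensional
# probability (the "circular error probable" radius).
median_radius = math.sqrt(2.0 * math.log(2.0))
```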



STUDENT'S T. Student's T describes the distribution of sample means

drawn from a normal distribution.


        The CDF returns the probability that an observed

t-statistic will be greater than "t" (or less than "-t") with

"dof" degrees of freedom.  Two-tail test.  Low values indicate

significant differences between sample means.


function tcdf(t,dof:xfloat):xfloat

function tpf(t,dof:xfloat):xfloat
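The two-tail probability described above can be sketched in
Python by integrating the density over [-t, t] and subtracting
from one.  Simpson integration is an assumption chosen to keep
the example short (the unit presumably uses the beta-related
functions from specfun); the names mirror the "t-" prefix.

```python
import math

def tpf(t, dof):
    """Student's t density with dof degrees of freedom."""
    log_norm = (math.lgamma(0.5 * (dof + 1.0)) - math.lgamma(0.5 * dof)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - 0.5 * (dof + 1.0) * math.log1p(t * t / dof))

def tcdf(t, dof, steps=2000):
    """Two-tail probability P(|T| > t); steps must be even."""
    t = abs(t)
    h = 2.0 * t / steps
    total = tpf(-t, dof) + tpf(t, dof)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * tpf(-t + i * h, dof)
    return 1.0 - total * h / 3.0

# With 1 degree of freedom the t distribution is the Cauchy
# distribution, whose two-tail probability at t = 1 is one half.
cauchy_case = tcdf(1.0, 1.0)
```

A low returned value, such as the roughly 0.05 obtained for
t = 2.228 with 10 degrees of freedom, indicates a significant
difference between sample means.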



UNIFORM. (Rectangular) The trivial "uniform" probability

distribution function.


        The CDF returns the probability that an observed uniform

deviate between 0 and 1 will be less than "x". Not particularly

useful, but provided because the distribution is easy to understand.



       Limits: 0 <= x <= 1


function ucdf(x:xfloat):xfloat

function uinv(prob:xfloat):xfloat

function upf(x:xfloat):xfloat





[HMF]   Abramowitz and Stegun, Handbook of Mathematical Functions,

        Government Printing Office. (Also available as a Dover reprint.)



[HMS]   Beyer, Handbook of Mathematical Sciences, CRC Press.


[BST]   Beyer, Basic Statistical Tables, CRC Press.


[SNA]   Knuth, Semi-numerical Algorithms.


[FFP]   Menzel, Fundamental Formulas of Physics, Dover reprint.


[HAM]   Pearson, Handbook of Applied Mathematics, Van Nostrand



[NR]    Press, et al., Numerical Recipes, Cambridge.