Introduction

We look at some of the basic operations associated with probability distributions. R provides a large number of probability distributions, but we only look at a few. To see which distributions are available, search the help system with the command help.search("distribution").

Here we give details about the commands associated with the normal distribution and briefly mention the commands for other distributions. The functions for the different distributions are very similar, and the differences are noted below.

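For every distribution there are four commands. The name of each command is the distribution's name prefixed with a letter that indicates its functionality: d returns the height of the probability density function, p returns the cumulative distribution function, q returns the inverse cumulative distribution function (quantiles), and r returns randomly generated numbers.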
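
As a quick illustration of the naming convention, the sketch below calls all four normal-distribution functions. The variable names are chosen here for illustration only, and set.seed is used just to make the random draw repeatable.

```r
# d: density (height of the curve) at 0 for the standard normal
height <- dnorm(0)

# p: cumulative probability P(X <= 1.96)
prob <- pnorm(1.96)

# q: quantile, the inverse of p; recovers (approximately) 1.96
cutoff <- qnorm(prob)

# r: random draws; set.seed makes the draw repeatable
set.seed(1)
draws <- rnorm(5)

print(c(height = height, prob = prob, cutoff = cutoff))
print(draws)
```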

The Normal Distribution

There are four functions that can be used to generate the values associated with the normal distribution.

dnorm

The first function we look at is dnorm. Given a set of values, it returns the height of the probability density at each point. If you only give the points, it assumes a mean of zero and a standard deviation of one. The mean and sd arguments let you use different values, though:

dnorm(0)
## [1] 0.3989423
dnorm(0) * sqrt(2 * pi)
## [1] 1
dnorm(0, mean = 4)
## [1] 0.0001338302
dnorm(0, mean = 4, sd = 10)
## [1] 0.03682701
v <- c(0, 1, 2)
dnorm(v)
## [1] 0.39894228 0.24197072 0.05399097
x <- seq(-20, 20, by = .1)
y <- dnorm(x)
plot(x, y)

y <- dnorm(x, mean = 2.5, sd = 0.1)
plot(x, y)
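
The values returned by dnorm can be checked against the normal density formula, f(x) = exp(-(x - mean)^2 / (2 sd^2)) / (sd sqrt(2 pi)). The helper name normal_density below is hypothetical and exists only for this comparison.

```r
# Hand-written normal density, for comparison with dnorm (hypothetical helper)
normal_density <- function(x, mean = 0, sd = 1) {
  exp(-(x - mean)^2 / (2 * sd^2)) / (sd * sqrt(2 * pi))
}

v <- c(0, 1, 2)
by_hand  <- normal_density(v, mean = 4, sd = 10)
built_in <- dnorm(v, mean = 4, sd = 10)

# TRUE when the two agree up to floating-point tolerance
print(all.equal(by_hand, built_in))
```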

pnorm

The second function we examine is pnorm. Given a number or a vector of numbers, it computes the probability that a normally distributed random number will be less than each number. This function also goes by the rather ominous title of the “Cumulative Distribution Function.” It accepts the same options as dnorm:

pnorm(0)
## [1] 0.5
pnorm(1)
## [1] 0.8413447
pnorm(0, mean = 2)
## [1] 0.02275013
pnorm(0, mean = 2, sd = 3)
## [1] 0.2524925
v <- c(0, 1, 2)
pnorm(v)
## [1] 0.5000000 0.8413447 0.9772499
x <- seq(-20, 20, by = .1)
y <- pnorm(x)
plot(x,y)

y <- pnorm(x, mean = 3, sd = 4)
plot(x,y)

If you wish to find the probability that a number is larger than the given number, set the lower.tail option to FALSE:

pnorm(0, lower.tail = FALSE)
## [1] 0.5
pnorm(1, lower.tail = FALSE)
## [1] 0.1586553
pnorm(0, mean = 2, lower.tail = FALSE)
## [1] 0.9772499
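
Two common uses of pnorm follow from the definitions above: the probability of landing in an interval is a difference of two cumulative values, and the upper tail is one minus the lower tail. A minimal sketch, with variable names chosen here for illustration:

```r
# P(-1 < X < 1) for a standard normal: difference of two cumulative values
inside <- pnorm(1) - pnorm(-1)

# The upper tail can be written two equivalent ways
upper_a <- pnorm(1, lower.tail = FALSE)
upper_b <- 1 - pnorm(1)

print(inside)
print(all.equal(upper_a, upper_b))
```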

qnorm

The next function we look at is qnorm, which is the inverse of pnorm. You give it a probability, and it returns the number whose cumulative distribution matches that probability. For example, for a normally distributed random variable with mean zero and standard deviation one, it returns the Z-score associated with the probability you give it:

qnorm(0.5)
## [1] 0
qnorm(0.5, mean = 1)
## [1] 1
qnorm(0.5, mean = 1, sd = 2)
## [1] 1
qnorm(0.5, mean = 2, sd = 2)
## [1] 2
qnorm(0.5, mean = 2, sd = 4)
## [1] 2
qnorm(0.25, mean = 2, sd = 2)
## [1] 0.6510205
qnorm(0.333)
## [1] -0.4316442
qnorm(0.333, sd = 3)
## [1] -1.294933
qnorm(0.75, mean = 5, sd = 2)
## [1] 6.34898
v <- c(0.1, 0.3, 0.75)
qnorm(v)
## [1] -1.2815516 -0.5244005  0.6744898
x <- seq(0, 1, by = .05)
y <- qnorm(x)
plot(x, y)

y <- qnorm(x, mean = 3, sd = 2)
plot(x, y)

y <- qnorm(x, mean = 3, sd = 0.1)
plot(x, y)
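
Because qnorm is the inverse of pnorm, applying one after the other should return the starting values. The sketch below checks this, and also checks that quantiles for a general mean and standard deviation are a shift and scale of the standard normal quantiles. The variable names are illustrative.

```r
p <- c(0.1, 0.3, 0.75)

# Round trip: quantile, then cumulative probability, recovers p
z <- qnorm(p, mean = 3, sd = 2)
back <- pnorm(z, mean = 3, sd = 2)
print(all.equal(back, p))

# Quantiles for other means and standard deviations are a shift and scale
# of the standard normal quantiles
print(all.equal(z, 3 + 2 * qnorm(p)))
```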

rnorm

The last function we examine is rnorm, which generates random numbers whose distribution is normal. Its argument is the number of random numbers that you want, and it has optional arguments to specify the mean and standard deviation. Because the values are random, your output will differ from what is shown here:

rnorm(4)
## [1]  0.07298852 -1.32810844 -0.28272296  0.12861971
rnorm(4, mean = 3)
## [1] 3.493688 2.070186 2.933694 2.899632
rnorm(4, mean = 3, sd = 3)
## [1] 1.929643 3.092662 2.095764 6.711923
rnorm(4, mean = 3, sd = 3)
## [1] 3.837437 3.038792 3.543452 4.120511
y <- rnorm(200)
hist(y)

y <- rnorm(200, mean = -2)
hist(y)

y <- rnorm(200, mean = -2, sd = 4)
hist(y)

qqnorm(y)
qqline(y)
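
If you need the same random numbers on every run, you can fix the seed of the random number generator with set.seed before calling rnorm. A minimal sketch, where the seed value 42 and the sample size are arbitrary choices made for this example:

```r
# Fixing the seed makes the "random" draws repeatable
set.seed(42)
first <- rnorm(1000, mean = -2, sd = 4)

set.seed(42)
second <- rnorm(1000, mean = -2, sd = 4)

print(identical(first, second))

# Sample statistics should be near, but not exactly, the requested values
print(mean(first))
print(sd(first))
```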

The t Distribution

There are four functions that can be used to generate the values associated with the t distribution.

These commands work just like the commands for the normal distribution. One difference is that the commands assume the values are normalized to mean zero and standard deviation one, so you have to use a little algebra to use these functions in practice. The other difference is that you have to specify the number of degrees of freedom. The commands follow the same naming convention: dt, pt, qt, and rt.

dt

A few examples are given below to show how to use the different commands. First we have the density function, dt:

x <- seq(-20, 20, by = .5)
y <- dt(x, df = 10)
plot(x, y)

y <- dt(x, df = 50)
plot(x, y)

pt

Next we have the cumulative probability distribution function:

pt(-3, df = 10)
## [1] 0.006671828
pt(3, df = 10)
## [1] 0.9933282
1 - pt(3, df = 10)
## [1] 0.006671828
pt(3, df = 20)
## [1] 0.9964621
x <- c(-3, -4, -2, -1)
pt((mean(x) - 2)/sd(x), df = 20)
## [1] 0.001165548
pt((mean(x) - 2)/sd(x), df = 40)
## [1] 0.000603064
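
The "little algebra" usually means converting a sample mean into a t statistic before calling pt. The sketch below assumes a one-sample, lower-tailed test against a hypothetical null value mu0 = 2. It divides by the standard error sd(x)/sqrt(n) and uses n - 1 degrees of freedom, which differs from the illustrative calls above, and it cross-checks the result against R's t.test function.

```r
x <- c(-3, -4, -2, -1)
mu0 <- 2                      # hypothetical null-hypothesis mean
n <- length(x)

# Standardize: (sample mean - null mean) / standard error
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Lower-tail p-value with n - 1 degrees of freedom
p_value <- pt(t_stat, df = n - 1)

# Cross-check against R's built-in one-sample t test
check <- t.test(x, mu = mu0, alternative = "less")$p.value
print(c(t_stat = t_stat, p_value = p_value, check = check))
```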

qt

Next we have the inverse cumulative probability distribution function:

qt(0.05, df = 10)
## [1] -1.812461
qt(0.95, df = 10)
## [1] 1.812461
qt(0.05, df = 20)
## [1] -1.724718
qt(0.95, df = 20)
## [1] 1.724718
v <- c(0.005, .025, .05)
qt(v, df = 253)
## [1] -2.595401 -1.969385 -1.650899
qt(v, df = 25)
## [1] -2.787436 -2.059539 -1.708141
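
A typical use of qt is to find the critical value for a confidence interval. The sketch below assumes a two-sided 95% interval for the mean of a small sample, which is a standard textbook construction rather than something taken from the examples above. The result is compared with the interval that t.test reports.

```r
x <- c(-3, -4, -2, -1)
n <- length(x)

# Critical value leaving 2.5% in each tail, with n - 1 degrees of freedom
t_crit <- qt(0.975, df = n - 1)

# Interval: sample mean plus or minus critical value times standard error
half_width <- t_crit * sd(x) / sqrt(n)
ci <- c(lower = mean(x) - half_width, upper = mean(x) + half_width)
print(ci)

# Cross-check against the interval reported by t.test
print(t.test(x)$conf.int)
```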

rt

Finally, random numbers can be generated according to the t distribution:

rt(3, df = 10)
## [1]  0.4652373 -0.6712735 -0.6656343
rt(3, df = 20)
## [1] -0.8350284  1.1645995  0.6598491
rt(3, df = 20)
## [1] -0.2318627  0.5637573  2.4368563

The Binomial Distribution

There are four functions that can be used to generate the values associated with the binomial distribution.

These commands work just like the commands for the normal distribution. The binomial distribution requires two extra parameters, the number of trials and the probability of success for a single trial. The commands follow the same kind of naming convention, and the names of the commands are dbinom, pbinom, qbinom, and rbinom.

A few examples are given below to show how to use the different commands.

dbinom

First we have the probability mass function, dbinom, which gives the probability of exactly x successes:

x <- seq(0, 50, by = 1)
y <- dbinom(x, 50, 0.2)
plot(x, y)

y <- dbinom(x, 50, 0.6)
plot(x, y)

x <- seq(0, 100, by = 1)
y <- dbinom(x, 100, 0.6)
plot(x, y)

pbinom

Next we have the cumulative probability distribution function:

pbinom(24, 50, 0.5)
## [1] 0.4438624
pbinom(25, 50, 0.5)
## [1] 0.5561376
pbinom(25, 51, 0.5)
## [1] 0.5
pbinom(26, 51, 0.5)
## [1] 0.610116
pbinom(25, 50, 0.5)
## [1] 0.5561376
pbinom(25, 50, 0.25)
## [1] 0.999962
pbinom(25, 500, 0.25)
## [1] 4.955658e-33
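
Because the binomial distribution is discrete, pbinom is simply a running sum of the point probabilities from dbinom. The sketch below checks that relationship and shows how the upper tail works at a whole-number boundary. Variable names are illustrative.

```r
n <- 50
p <- 0.5

# The cumulative probability is a running sum of the point probabilities
cumulative <- pbinom(25, n, p)
summed <- sum(dbinom(0:25, n, p))
print(all.equal(cumulative, summed))

# lower.tail = FALSE at 25 gives P(X > 25), which is P(X >= 26):
# the value 25 itself is excluded from the upper tail
upper <- pbinom(25, n, p, lower.tail = FALSE)
print(all.equal(upper, 1 - cumulative))
```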

qbinom

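Next we have the inverse cumulative probability distribution function. Its first argument must be a probability between 0 and 1. Passing a count such as 23 or 22 instead, as in the last two calls, produces NaN with a warning: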

qbinom(0.5, 51, 1/2)
## [1] 25
qbinom(0.25, 51, 1/2)
## [1] 23
qbinom(23, 51, 1/2)
## Warning in qbinom(23, 51, 1/2): NaNs produced
## [1] NaN
qbinom(22, 51, 1/2)
## Warning in qbinom(22, 51, 1/2): NaNs produced
## [1] NaN
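
Since the binomial distribution only takes whole-number values, qbinom cannot always hit a probability exactly. It returns the smallest count whose cumulative probability is at least the requested probability. A minimal sketch that checks this for one case, with illustrative variable names:

```r
n <- 51
p <- 0.5
target <- 0.25

k <- qbinom(target, n, p)

# k is the smallest count whose cumulative probability reaches the target:
# pbinom(k) is at least the target, while pbinom(k - 1) falls short
print(k)
print(pbinom(k, n, p) >= target)
print(pbinom(k - 1, n, p) < target)
```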

rbinom

Finally, random numbers can be generated according to the binomial distribution:

rbinom(5, 100, .2)
## [1] 30 30 22 20 21
rbinom(5, 100, .7)
## [1] 69 70 61 66 73

The Chi-Squared Distribution

There are four functions that can be used to generate the values associated with the Chi-Squared distribution.

These commands work just like the commands for the normal distribution. The first difference is that the values are assumed to be normalized, so no mean can be specified. The other difference is that you have to specify the number of degrees of freedom. The commands follow the same naming convention: dchisq, pchisq, qchisq, and rchisq.

A few examples are given below to show how to use the different commands.

dchisq

First we have the density function, dchisq. The Chi-Squared density is zero for negative values, which is why the left half of each plot is flat:

x <- seq(-20, 20, by = .5)
y <- dchisq(x, df = 10)
plot(x, y)

y <- dchisq(x, df = 12)
plot(x, y)

pchisq

Next we have the cumulative probability distribution function:

pchisq(2, df = 10)
## [1] 0.003659847
pchisq(3, df = 10)
## [1] 0.01857594
1 - pchisq(3, df = 10)
## [1] 0.9814241
pchisq(3, df = 20)
## [1] 4.097501e-06
x <- c(2, 4, 5, 6)
pchisq(x, df = 20)
## [1] 1.114255e-07 4.649808e-05 2.773521e-04 1.102488e-03

qchisq

Next we have the inverse cumulative probability distribution function:

qchisq(0.05, df = 10)
## [1] 3.940299
qchisq(0.95, df = 10)
## [1] 18.30704
qchisq(0.05, df = 20)
## [1] 10.85081
qchisq(0.95, df = 20)
## [1] 31.41043
v <- c(0.005, .025, .05)
qchisq(v, df = 253)
## [1] 198.8161 210.8355 217.1713
qchisq(v, df = 25)
## [1] 10.51965 13.11972 14.61141
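
In practice pchisq and qchisq are often used on the upper tail, for example to compare a test statistic with a critical value. The sketch below assumes a hypothetical observed statistic of 20 with 10 degrees of freedom and a 5% level. These numbers are made up for illustration.

```r
# Upper-tail critical value at the 5% level with 10 degrees of freedom
crit <- qchisq(0.95, df = 10)

# A hypothetical observed test statistic
stat <- 20

# Upper-tail p-value, written two equivalent ways
p_upper <- pchisq(stat, df = 10, lower.tail = FALSE)
print(all.equal(p_upper, 1 - pchisq(stat, df = 10)))

# The statistic exceeds the critical value exactly when p_upper < 0.05
print(stat > crit)
print(p_upper < 0.05)
```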

rchisq

Finally, random numbers can be generated according to the Chi-Squared distribution:

rchisq(3, df = 10)
## [1] 13.361635 21.328800  8.636825
rchisq(3, df = 20)
## [1] 22.65361 16.17566 16.40695
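
One way to see where the Chi-Squared distribution comes from is to simulate it: the sum of the squares of k independent standard normal values follows a Chi-Squared distribution with k degrees of freedom. The sketch below compares such sums with direct draws from rchisq. The seed, the number of draws, and the variable names are arbitrary choices for this example.

```r
set.seed(123)
dof <- 10
n_draws <- 5000

# Each row holds dof standard normal draws; the row sum of squares
# follows a chi-squared distribution with dof degrees of freedom
z <- matrix(rnorm(n_draws * dof), nrow = n_draws, ncol = dof)
from_normals <- rowSums(z^2)

direct <- rchisq(n_draws, df = dof)

# Both sample means should be close to dof, the theoretical mean
print(mean(from_normals))
print(mean(direct))
```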

This was originally published at http://www.cyclismo.org/tutorial/R/probability.html. I made a few modifications.