**********************
Statistical Treatment
**********************

.. role:: bmagenta
.. role:: magenta
.. highlight:: r

R contains a very comprehensive library with statistical functions, including the most common probability distributions:

.. image:: images/distri.png
    :scale: 50 %
    :align: center


Associated Functions
====================

There are several functions associated to every probability distribution, and they can be accessed adding a 
prefix to the distribution name:


.. code-block::ini
   
::

  _______       _______________________________________________________________________
  
  Prefix		                    Meaning
  _______       _______________________________________________________________________
  
    d		density function
    p		distribution function (cumulative function)
    q		inverse of the distribution function (quantile function)
    r		random generation of numbers following the probability distribution
  _______       _______________________________________________________________________   

  
The arguments are obviously different for each associated function. For the *Normal Distribution*:

::

  > ?Normal
  
  Normal                  package:stats                  R Documentation

  The Normal Distribution

  Description:

     Density, distribution function, quantile function and random
     generation for the normal distribution with mean equal to ‘mean’
     and standard deviation equal to ‘sd’.

  Usage:


    dnorm(x, mean = 0, sd = 1, log = FALSE)
    pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
    qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
    rnorm(n, mean = 0, sd = 1)
    
  ...
  
  
1. :magenta:`dnorm(x, mean = 0, sd = 1, log = FALSE)`

It evaluates the density of the normal distribution with mean ``mean`` and standard deviation ``sd`` in ``x`` abscissa.
The normal distribution has density 

:math:`f(x) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}`

where :math:`\mu` is the mean of the distribution and :math:`\sigma` the standard deviation.

::

  > x <- seq(-10,10,by=.5)			# sequence of numbers
  > x
   [1] -10.0  -9.5  -9.0  -8.5  -8.0  -7.5  -7.0  -6.5  -6.0  -5.5  -5.0  -4.5
  [13]  -4.0  -3.5  -3.0  -2.5  -2.0  -1.5  -1.0  -0.5   0.0   0.5   1.0   1.5
  [25]   2.0   2.5   3.0   3.5   4.0   4.5   5.0   5.5   6.0   6.5   7.0   7.5
  [37]   8.0   8.5   9.0   9.5  10.0

  > y <- dnorm(x, mean=3, sd=2)			# Normal distribution with mean=3 and sd=2

  > plot(x,y,main="Normal Distribution Example")        # Plot the result
  
.. image:: images/dnorm_ejemplo.png
    :scale: 50 %
    :align: center

2. :magenta:`rnorm(n, mean = 0, sd = 1)`

Random sequence of ``n`` numbers following a normal distribution with mean ``mean`` and standard deviation ``sd``.

::

  > x <- rnorm(1000,mean=3,sd=2)		# 1000 random numbers with mean=3 and sd=2
  > summary(x)
     Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  -3.783   1.694   3.003   3.047   4.436   9.864 

  > hist(x,main="Normal Distribution Simulation", ylab="Frequency", plot=TRUE)
  
.. image:: images/rnorm_ejemplo.png
    :scale: 50 %
    :align: center

To ensure reproducibility, it is important to set the random number seed when performing simulations:

::

  > set.seed(1000)
  > rnorm(10)
    [1] -0.44577826 -1.20585657  0.04112631  0.63938841 -0.78655436 -0.38548930
    [7] -0.47586788  0.71975069 -0.01850562 -1.37311776
  > rnorm(10)
    [1] -0.98242783 -0.55448870  0.12138119 -0.12087232 -1.33604105  0.17005748
    [7]  0.15507872  0.02493187 -2.04658541  0.21315411
  > set.seed(1000)
  > rnorm(10)
    [1] -0.44577826 -1.20585657  0.04112631  0.63938841 -0.78655436 -0.38548930
    [7] -0.47586788  0.71975069 -0.01850562 -1.37311776


3. :magenta:`pnorm(q, mean = 0, sd = 1, lower.tail = FALSE, log.p = FALSE)`

It evaluates the distribution function (area below the probability distribution) for a normal distribution with 
mean ``mean`` and standard deviation ``sd``. By default, ``lower.tail = TRUE`` returns the area in the left wing of 
the distribution (:math:`P[X \le x]`) and ``lower.tail = FALSE`` returns the right wing (:math:`P[X > x]`). 

.. image:: images/pnorm_ejemplo.png
    :scale: 50 %
    :align: center

::

  > pnorm(1.5,mean=3,sd=2)				# left wing (default)
  [1] 0.2266274

  > pnorm(1.5,mean=3,sd=2,lower.tail=FALSE)		# right wing
  [1] 0.7733726

  
The R object ``ecdf(x)`` lets us calculate and plot the *Empirical Cumulative Distribution Function* (useful when the 
cumulative distribution is not known). Let's see with an example how to plot the cumulative function in the case of a normal distribution:

::

  > par(mfrow = c(1, 2))		# define 1 row and 2 columns to plot
  > x <- rnorm(50, 2, 4)		# random numbers following normal distribution
  
  > plot(ecdf(x),verticals = TRUE, col.points = "blue",
    + col.hor = "red", col.vert = "bisque")     # plot Empirical Cumulative Distribution Function

which is equivalent to:

::

  > y <- pnorm(x, 2, 4)
  > plot(x,y, main="CDF using pnorm",
  +      col="darkolivegreen",pch=20) # plot Cumulative Distribution Function using 'pnorm'
  
.. image:: images/ecdf.png
    :scale: 60 %
    :align: center

  
4. :magenta:`qnorm(p, mean = 0, sd = 1, lower.tail = FALSE, log.p = FALSE)`

It evaluates the inverse of the distribution function (the abscissa for an area ``p`` under the probability 
distribution) for a normal distribution with mean ``mean`` and standard deviation ``sd``. By default, 
``lower.tail = TRUE`` assumes that the area is that of the left wing of the distribution and 
``lower.tail = FALSE`` assumes that is the right wing area.

::

  > qnorm(0.2266274,mean=3,sd=2)			# left wing (default)
  [1] 1.5
  
  > qnorm(0.7733726,mean=3,sd=2,lower.tail=FALSE)       # right wing
  [1] 1.5

  
Common probability distributions
=================================

.. code-block::ini

::

  ______________________________________________________________________________

  Distribution			Associated Function
  ______________________________________________________________________________
  
  Uniform			dunif, punif, qunif, runif
  Binomial			dbinom, pbinom, qbinom, rbinom
  Poisson			dpois, ppois, qpois, rpois
  ...				d..., p..., q..., r...
  ______________________________________________________________________________
  
  Normal			dnorm, pnorm, qnorm, rnorm
  t de Student			dt, pt, qt, rt
  chi				dchisq, pchisq, qchisq, rchisq
  F de Fisher			df, pf, qf, rf
  ...				d..., p..., q..., r...
  ______________________________________________________________________________

  
Example script
===============

**Purpose**: Estimation of the value of :math:`\pi` using random points generated inside a square.

**Procedure**: Calculate the ratio between the *inner* and *outer* points in a circle with radius equal to 1, 
inscribed in a square of side equal to 2 (i.e., the circle's diameter is equal to the square's side).

We save the script in a file called ``pirandom.R``:

.. code-block::ini

::

  # estimate PI by using random numbers
  #    A.squ = n = (2*r)²
  #    A.cir = n.inside = pi * r²
  #  
  #    pi = n.inside/ r² = 4*n.inside/n	
  #
  pirandom <- function(n) 		# define function
    {
      x <- runif(n,-1,1) 		# random numbers in [-1,1]
      y <- runif(n,-1,1) 		# random numbers in [-1,1]
      plot(x,y) # plot
      r <- sqrt(x*x+y*y) 		# distance to centre
      rinside <- r[r<1] 		# inside circle with r=1?
      n.inside <- length(rinside)
      print(4*n.inside/n) 		# print pi estimation
  }

The code is executed in R as follows:

::

  > source("pirandom.R")		# load the code (function) in the script
  > pirandom(1000)  			# run the function for 1000 points
  [1] 3.184				# 'pi' value estimation
  
  
.. image:: images/pirandom.jpg
    :scale: 50 %
    :align: center