
Statistics Formula Sheet

This page will help you to revise formulas and concepts of Statistics instantly for various exams.

Neetesh Kumar | May 29, 2024

1. Measures of Central Tendency:

An average, or central value, of a distribution is a single value of the variable that represents the entire distribution. Such representative values are called measures of central tendency. The following five measures are in common use:

(a) Mathematical average

  • (i) Arithmetic mean
  • (ii) Geometric mean
  • (iii) Harmonic mean

(b) Positional average

  • (i) Median
  • (ii) Mode

2. Arithmetic Mean:

(i) For ungrouped distribution: If $x_1, x_2, \ldots, x_n$ are $n$ values of a variate $x$, then their mean $\overline{x}$ is defined as

$$\overline{x} = \frac{x_1 + x_2 + \ldots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \implies \sum_{i=1}^{n} x_i = n \overline{x}$$

(ii) For ungrouped and grouped frequency distribution: If $x_1, x_2, \ldots, x_n$ are values of the variate (for a grouped distribution, the class marks, i.e. the mid-values of the classes) with corresponding frequencies $f_1, f_2, \ldots, f_n$, then their mean is given by

$$\overline{x} = \frac{f_1 x_1 + f_2 x_2 + \ldots + f_n x_n}{f_1 + f_2 + \ldots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{\sum_{i=1}^{n} f_i x_i}{N}$$

where $N = \sum_{i=1}^{n} f_i$

(iii) By shortcut method:
Let $d_i = x_i - a$. Then

$$\overline{x} = a + \frac{\sum_{i=1}^{n} f_i d_i}{N}$$

where $a$ is the assumed mean.
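The three mean formulas above can be checked against each other with a short Python sketch. The function names and the sample values and frequencies are illustrative assumptions, not part of the formula sheet.

```python
def mean_ungrouped(values):
    """Arithmetic mean of raw values: sum(x_i) / n."""
    return sum(values) / len(values)


def mean_frequency(values, freqs):
    """Mean of a frequency distribution: sum(f_i * x_i) / N."""
    total = sum(freqs)  # N = sum of frequencies
    return sum(f * x for x, f in zip(values, freqs)) / total


def mean_shortcut(values, freqs, assumed_mean):
    """Shortcut method: a + sum(f_i * d_i) / N, with d_i = x_i - a."""
    total = sum(freqs)
    return assumed_mean + sum(
        f * (x - assumed_mean) for x, f in zip(values, freqs)
    ) / total


x = [10, 20, 30, 40]  # illustrative values (or class marks)
f = [1, 2, 3, 4]      # illustrative frequencies

print(mean_ungrouped(x))        # 25.0
print(mean_frequency(x, f))     # 30.0
print(mean_shortcut(x, f, 30))  # 30.0
```

Any choice of assumed mean `a` gives the same result; a value near the centre of the data simply keeps the deviations small for hand calculation.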

3. Median:

The median of a series is the value of the middle term of the series when the values are written in ascending order. Therefore, the median divides an arranged series into two equal parts.

Formulae of Median:

(i) For ungrouped distribution: Let $n$ be the number of values in the series, arranged in ascending order. Then

$$\text{Median} = \begin{cases} \left(\frac{n + 1}{2}\right)^{\text{th}} \text{ term (the middle term)}, & \text{if } n \text{ is odd} \\ \text{Mean of } \left(\frac{n}{2}\right)^{\text{th}} \text{ and } \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ terms}, & \text{if } n \text{ is even} \end{cases}$$

(ii) For ungrouped frequency distribution: First, we prepare the cumulative frequency (c.f.) column and find the value of $N$. Then

$$\text{Median} = \begin{cases} \left(\frac{N + 1}{2}\right)^{\text{th}} \text{ term}, & \text{if } N \text{ is odd} \\ \text{Mean of } \left(\frac{N}{2}\right)^{\text{th}} \text{ and } \left(\frac{N}{2} + 1\right)^{\text{th}} \text{ terms}, & \text{if } N \text{ is even} \end{cases}$$
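As a rough illustration of cases (i) and (ii), the sketch below finds the median of a raw series and of an ungrouped frequency distribution. The function names and sample data are assumptions made for this example.

```python
def median_ungrouped(values):
    """Median of raw values, following the odd/even rule."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the ((n + 1) / 2)-th term, counting from 1
    # mean of the (n / 2)-th and (n / 2 + 1)-th terms
    return (ordered[mid - 1] + ordered[mid]) / 2


def median_frequency(values, freqs):
    """Median of an ungrouped frequency distribution via cumulative frequency."""
    pairs = sorted(zip(values, freqs))
    total = sum(freqs)  # N

    def term(k):
        """Return the k-th term (counting from 1) using the c.f. column."""
        cumulative = 0
        for x, f in pairs:
            cumulative += f
            if cumulative >= k:
                return x
        raise ValueError("k exceeds N")

    if total % 2 == 1:
        return term((total + 1) // 2)
    return (term(total // 2) + term(total // 2 + 1)) / 2


print(median_ungrouped([7, 3, 5]))               # 5
print(median_ungrouped([4, 1, 3, 2]))            # 2.5
print(median_frequency([1, 2, 3], [2, 3, 2]))    # 2
```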

(iii) For grouped frequency distribution: Prepare the cumulative frequency column and find the value of $\frac{N}{2}$. Then find the class whose cumulative frequency is equal to or just greater than $\frac{N}{2}$. This class is the median class.

$$\text{Median} = l + \left(\frac{\frac{N}{2} - F}{f}\right) \times h$$

where:

  • $l$: lower limit of the median class
  • $f$: frequency of the median class
  • $F$: cumulative frequency of the class preceding the median class
  • $h$: class width (class interval) of the median class
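The grouped-median formula can be sketched in Python as follows. The function name `grouped_median`, the assumption of equal class widths, and the sample classes (0-10, 10-20, ..., 40-50) are illustrative choices, not part of the formula sheet.

```python
def grouped_median(lower_limits, width, freqs):
    """Median = l + ((N/2 - F) / f) * h for equal-width classes."""
    total = sum(freqs)  # N
    if total == 0:
        raise ValueError("empty distribution")
    half = total / 2
    cumulative = 0  # c.f. of the classes before the current one (F)
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= half:
            # first class whose c.f. reaches N/2: the median class
            return lower + (half - cumulative) / f * width
        cumulative += f


# Classes 0-10, 10-20, 20-30, 30-40, 40-50 (illustrative data)
lowers = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 10, 5]

# N = 40, N/2 = 20, median class 20-30, F = 13, f = 12
print(round(grouped_median(lowers, 10, freqs), 2))  # 25.83
```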

4. Mode:

In a frequency distribution, the mode is the value of the variate that has the maximum frequency.

Method for Determining Mode:

(i) For ungrouped distribution: The mode is the value of the variate that is repeated the maximum number of times.

(ii) For ungrouped frequency distribution: The mode is the value of the variate with the maximum frequency.

(iii) For grouped frequency distribution: First, we find the class with the maximum frequency. This class is the modal class.

$$\text{Mode} = l + \left(\frac{f_0 - f_1}{2f_0 - f_1 - f_2}\right) \times h$$

where:

  • $l$: lower limit of the modal class
  • $f_0$: frequency of the modal class
  • $f_1$: frequency of the class preceding the modal class
  • $f_2$: frequency of the class succeeding the modal class
  • $h$: class width (class interval) of the modal class
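A minimal Python sketch of the grouped-mode formula is given below. The function name, the equal class width, and the sample data are assumptions. Treating a missing neighbouring class as having frequency 0 (when the modal class is first or last) is also an assumption of this sketch, not a rule stated in the text.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode = l + ((f0 - f1) / (2*f0 - f1 - f2)) * h for equal-width classes."""
    # index of the first class with the maximum frequency (the modal class)
    i = max(range(len(freqs)), key=freqs.__getitem__)
    f0 = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0               # assumed 0 if no preceding class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0  # assumed 0 if no succeeding class
    denominator = 2 * f0 - f1 - f2
    if denominator == 0:
        raise ValueError("formula undefined when f0 = f1 = f2")
    return lower_limits[i] + (f0 - f1) / denominator * width


# Classes 0-10, 10-20, 20-30, 30-40, 40-50 (illustrative data)
lowers = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 10, 5]

# modal class 20-30, f0 = 12, f1 = 8, f2 = 10
print(round(grouped_mode(lowers, 10, freqs), 2))  # 26.67
```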

5. Relation Between Mean, Median, and Mode:

In a moderately asymmetric distribution, the following is the relation between a distribution's mean, median, and mode. It is known as the empirical formula.

$$\text{Mode} = 3 \times \text{Median} - 2 \times \text{Mean}$$

Note:

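(i) When this relation holds, the median lies between the mean and the mode, because the relation can be rewritten as $\text{Mode} - \text{Median} = 2 (\text{Median} - \text{Mean})$.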

(ii) The mean, median, and mode coincide for a symmetric distribution.
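As a small illustration of the empirical formula, the hypothetical helper below estimates the mode from a given mean and median. The input numbers are made up for the example, and the result is only an approximation that applies to moderately asymmetric distributions.

```python
def empirical_mode(mean, median):
    """Empirical relation: Mode = 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


# Illustrative values only
print(empirical_mode(mean=25.5, median=26.0))  # 27.0

# For a symmetric distribution, mean = median, so the estimate equals both
print(empirical_mode(mean=10.0, median=10.0))  # 10.0
```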

6. Measures of Dispersion:

The dispersion of a statistical distribution is the measure of the deviation of its values about their average (central) value. Generally, the following measures of dispersion are commonly used:

(i) Range

(ii) Mean deviation

(iii) Variance and standard deviation

Range:

The difference between the greatest and least values of a variate of a distribution is called the range of that distribution. If the distribution is a grouped distribution, then its range is the difference between the upper limit of the highest class and the lower limit of the lowest class.

$$\text{Coefficient of range} = \frac{\text{Difference of extreme values}}{\text{Sum of extreme values}}$$
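A short sketch of the range and its coefficient for raw data is shown below. The function names and the sample data are illustrative assumptions.

```python
def value_range(values):
    """Range = greatest value - least value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """(Difference of extreme values) / (Sum of extreme values)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


data = [4, 9, 1, 6]  # illustrative data
print(value_range(data))           # 8
print(coefficient_of_range(data))  # 0.8
```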

Mean Deviation (M.D.): The mean deviation of a distribution is the mean of the absolute values of the deviations of the variate values from a statistical average (mean, median, or mode).

If $A$ is any statistical average of a distribution, then the mean deviation about $A$ is defined as

$$\text{Mean deviation} = \frac{\sum_{i=1}^{n} |x_i - A|}{n} \quad \text{(for ungrouped distribution)}$$

$$\text{Mean deviation} = \frac{\sum_{i=1}^{n} f_i |x_i - A|}{N} \quad \text{(for frequency distribution)}$$

Note: It is minimum when taken about the median.

$$\text{Coefficient of Mean deviation} = \frac{\text{Mean deviation}}{A}$$

(where $A$ is the measure of central tendency about which the mean deviation is taken)
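The sketch below computes the mean deviation about the mean and about the median for a small made-up data set, which also illustrates the note that the value is smallest about the median. The function names and data are assumptions for this example.

```python
from statistics import mean, median


def mean_deviation(values, about):
    """Mean of |x_i - A| for raw data."""
    return sum(abs(x - about) for x in values) / len(values)


def mean_deviation_freq(values, freqs, about):
    """Sum of f_i * |x_i - A| divided by N, for a frequency distribution."""
    total = sum(freqs)
    return sum(f * abs(x - about) for x, f in zip(values, freqs)) / total


data = [2, 4, 6, 8, 20]  # illustrative data: mean = 8, median = 6

md_about_mean = mean_deviation(data, mean(data))
md_about_median = mean_deviation(data, median(data))

print(md_about_mean)    # 4.8
print(md_about_median)  # 4.4

# Coefficient of mean deviation about the median
print(round(md_about_median / median(data), 4))  # 0.7333
```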

Variance and Standard Deviation: The variance of a distribution is the mean of the squares of the deviations of the variate values from their mean. It is denoted by $\sigma^2$ or $\text{var}(x)$.

The positive square root of the variance is called the standard deviation. It is denoted by $\sigma$ or S.D.

$$\text{Standard deviation} = \sqrt{\text{variance}}$$

Formulae for Variance:

(i) For ungrouped distribution:

$$\sigma_x^2 = \frac{\sum (x_i - \overline{x})^2}{n}$$

$$\sigma_x^2 = \frac{\sum x_i^2}{n} - \overline{x}^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2$$

$$\sigma_x^2 = \frac{\sum d_i^2}{n} - \left(\frac{\sum d_i}{n}\right)^2 \quad \text{where } d_i = x_i - a$$

(ii) For frequency distribution:

$$\sigma_x^2 = \frac{\sum f_i (x_i - \overline{x})^2}{N}$$

$$\sigma_x^2 = \frac{\sum f_i x_i^2}{N} - \overline{x}^2 = \frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2$$

$$\sigma_x^2 = \sigma_d^2 = \frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2 \quad \text{where } d_i = x_i - a$$

$$\sigma_x^2 = h^2 \sigma_u^2 = h^2 \left( \frac{\sum f_i u_i^2}{N} - \left(\frac{\sum f_i u_i}{N}\right)^2 \right) \quad \text{where } u_i = \frac{x_i - a}{h}$$
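To see that the definition and the step-deviation form give the same variance, the following sketch computes both for one frequency distribution. The function names, the class marks, the frequencies, and the choices `a = 25` and `h = 10` are illustrative assumptions.

```python
from math import isclose, sqrt


def variance_direct(values, freqs):
    """Definition: sum(f_i * (x_i - mean)^2) / N."""
    total = sum(freqs)
    mean_x = sum(f * x for x, f in zip(values, freqs)) / total
    return sum(f * (x - mean_x) ** 2 for x, f in zip(values, freqs)) / total


def variance_step_deviation(values, freqs, a, h):
    """Step-deviation form: h^2 * (sum(f u^2)/N - (sum(f u)/N)^2), u = (x - a)/h."""
    total = sum(freqs)
    u = [(x - a) / h for x in values]
    mean_u = sum(f * ui for ui, f in zip(u, freqs)) / total
    mean_u_sq = sum(f * ui * ui for ui, f in zip(u, freqs)) / total
    return h * h * (mean_u_sq - mean_u ** 2)


marks = [5, 15, 25, 35, 45]  # illustrative class marks
freqs = [5, 8, 12, 10, 5]    # illustrative frequencies

v1 = variance_direct(marks, freqs)
v2 = variance_step_deviation(marks, freqs, a=25, h=10)

print(round(v1, 2))        # 144.75
print(isclose(v1, v2))     # True
print(round(sqrt(v1), 3))  # standard deviation
```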

Some Results on Standard Deviation:

1. $\sigma_x = 0$ when all the variate values are equal.

2. $\sigma_x$ is independent of change of origin but is dependent on the change of scale:

$$\sigma_{a + bx} = |b| \sigma_x$$
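These two results can be checked numerically with the standard library's `statistics.pstdev` (population standard deviation, which divides by $n$ as in the formulas here). The data and the constants `a` and `b` are arbitrary illustrative choices.

```python
from math import isclose
from statistics import pstdev

x = [1, 2, 3, 4, 5]  # illustrative data
a, b = 10, -3        # arbitrary shift (origin) and scale

shifted = [xi + a for xi in x]          # change of origin only
transformed = [a + b * xi for xi in x]  # change of origin and scale

# Shifting the data leaves the standard deviation unchanged
print(isclose(pstdev(shifted), pstdev(x)))               # True

# Scaling by b multiplies the standard deviation by |b|
print(isclose(pstdev(transformed), abs(b) * pstdev(x)))  # True

# All values equal gives a standard deviation of zero
print(pstdev([7, 7, 7, 7]))                              # 0.0
```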

Combined Standard Deviation: If $\overline{x_1}$, $\sigma_1$, $N_1$ and $\overline{x_2}$, $\sigma_2$, $N_2$ are the means, standard deviations, and numbers of observations of two distributions respectively, then the standard deviation $\sigma$ of the combined distribution is given by

$$\sigma = \sqrt{\frac{N_1 \sigma_1^2 + N_2 \sigma_2^2}{N_1 + N_2} + \frac{N_1 N_2 (\overline{x_1} - \overline{x_2})^2}{(N_1 + N_2)^2}}$$
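The combined standard deviation formula can be sanity-checked by pooling two small groups and comparing against a direct calculation. The function name `combined_sd` and the two sample groups are assumptions for this sketch.

```python
from math import isclose, sqrt
from statistics import mean, pstdev


def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Standard deviation of two distributions combined into one."""
    n = n1 + n2
    within = (n1 * sd1 ** 2 + n2 * sd2 ** 2) / n        # weighted variances
    between = n1 * n2 * (mean1 - mean2) ** 2 / n ** 2   # spread between the means
    return sqrt(within + between)


group1 = [1, 2, 3]           # illustrative data
group2 = [10, 12, 14, 16]    # illustrative data

from_formula = combined_sd(
    len(group1), mean(group1), pstdev(group1),
    len(group2), mean(group2), pstdev(group2),
)
direct = pstdev(group1 + group2)  # population SD of the pooled data

print(isclose(from_formula, direct))  # True
```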

Quartile Deviation:

Quartiles divide a series into four equal parts.

Formulae for Quartiles:

(i) For ungrouped distribution:

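With the $n$ values arranged in ascending order, the first quartile $Q_1$ and the third quartile $Q_3$ are the $\left[\frac{n+1}{4}\right]^{\text{th}}$ and $\left[\frac{3(n+1)}{4}\right]^{\text{th}}$ terms, respectively.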

(ii) For frequency distribution:

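Prepare the cumulative frequency column and find $\frac{N}{4}$ and $\frac{3N}{4}$. $Q_1$ and $Q_3$ are then located at the values (or classes) whose cumulative frequency is equal to or just greater than $\frac{N}{4}$ and $\frac{3N}{4}$, respectively.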

Interquartile Range (Q):

$$Q = Q_3 - Q_1$$

Quartile Deviation:

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$

$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
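The quartile measures for raw data can be sketched as below, using the $(n+1)/4$ positions. When a position is not a whole number, this sketch interpolates linearly between the two neighbouring terms. That interpolation rule, the function names, and the sample data are assumptions, since the text does not say how fractional positions are handled.

```python
def term_at(ordered, position):
    """Value at a position counted from 1; fractional positions are interpolated."""
    lower = int(position)    # whole part of the position
    frac = position - lower  # fractional part
    if lower < 1:
        return float(ordered[0])
    if lower >= len(ordered):
        return float(ordered[-1])
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


def quartile_summary(values):
    """Return Q1, Q3, interquartile range, quartile deviation and its coefficient."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = term_at(ordered, (n + 1) / 4)
    q3 = term_at(ordered, 3 * (n + 1) / 4)
    return {
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
        "QD": (q3 - q1) / 2,
        "coefficient": (q3 - q1) / (q3 + q1),
    }


data = [2, 4, 6, 8, 10, 12, 14]  # n = 7, so Q1 is the 2nd term and Q3 the 6th
print(quartile_summary(data))
# {'Q1': 4.0, 'Q3': 12.0, 'IQR': 8.0, 'QD': 4.0, 'coefficient': 0.5}
```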

7. Moments, Skewness, and Kurtosis:

Moments: The $r$th central moment (the moment about the mean $\overline{x}$) of a distribution is given by

$$\mu_r = \frac{\sum (x_i - \overline{x})^r}{n} \quad \text{or} \quad \frac{\sum f_i (x_i - \overline{x})^r}{N}$$

The first four central moments are used to describe the shape of a distribution.

Skewness: Skewness measures the asymmetry of a distribution. It is given by

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$

$$\text{Coefficient of Skewness} = \frac{\mu_3}{\mu_2^{3/2}}$$

Kurtosis: Kurtosis measures the peakedness or flatness of a distribution. It is given by

$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$

$$\text{Coefficient of Kurtosis} = \frac{\mu_4}{\mu_2^2} - 3$$
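The moment-based measures above can be computed directly from the central moments, as in this sketch. The function name `central_moment` and the sample data are illustrative assumptions.

```python
def central_moment(values, r):
    """r-th central moment: sum((x_i - mean)^r) / n."""
    n = len(values)
    mean_x = sum(values) / n
    return sum((x - mean_x) ** r for x in values) / n


data = [1, 2, 3, 4, 10]  # illustrative data with a long right tail

mu2 = central_moment(data, 2)  # variance
mu3 = central_moment(data, 3)
mu4 = central_moment(data, 4)

beta1 = mu3 ** 2 / mu2 ** 3
skewness = mu3 / mu2 ** 1.5    # positive here: the tail is on the right
beta2 = mu4 / mu2 ** 2
excess_kurtosis = beta2 - 3    # the "coefficient of kurtosis" above

print(mu2, mu3)                   # 10.0 36.0
print(round(beta1, 3))            # 1.296
print(round(skewness, 4))         # 1.1384
print(round(beta2, 3))            # 2.788
print(round(excess_kurtosis, 3))  # -0.212
```

The first central moment is always zero, and the third is zero for perfectly symmetric data, which is why $\mu_3$ is used to measure asymmetry.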
