# Statistics

Statistics is the field of Mathematics concerned with collecting, organizing, analysing, and interpreting data.
Statistics is also the term used for the data itself.
When determining what is to be studied, it is important to define the population first.
Is it going to be all cats, all long haired cats, or all curly haired dogs?
To study the whole population you would use a census.
However, most of the time, the population is too large, so a sample is chosen. A sample will be a subset of the whole population. There are several different Sampling Techniques:

- Cluster
- Convenience
- Random
- Stratified
- Systematic

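Three of these techniques can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the ID numbers are made up for the example. "Random" is taken to mean every member is equally likely to be chosen, "systematic" to mean every k-th member, and "stratified" to mean a random pick from each subgroup. Cluster and convenience sampling are not shown.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Pick k members so that every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, start=0):
    """Pick every step-th member, beginning at index start."""
    return population[start::step]

def stratified_sample(strata, k_per_stratum, seed=None):
    """Pick k_per_stratum members at random from each named group (stratum)."""
    rng = random.Random(seed)
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

population = list(range(1, 21))  # hypothetical ID numbers 1..20
print(systematic_sample(population, step=5))  # [1, 6, 11, 16]
print(simple_random_sample(population, 4))    # 4 different IDs, changes each run
```
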
The data that is collected will be one of two Data Types:

- Qualitative Data: names and similar labels
- Quantitative Data: numbers, which can be discrete (only particular values) or continuous (any real number value)

Data can be further sorted into Levels of Measurement:

- Nominal: data can only be categorized, i.e. named
- Ordinal: data can be ordered or ranked
- Interval: data can be used for calculations, but zero does not mean 'none'
- Ratio: data can be used for calculations, cannot be negative, and zero means 'none'

Videos on Data Types and Levels of Measurement are on the Data Types page.

Now the data can be organized and displayed using a variety of methods:

- A Frequency Distribution is a chart that lists how many times (the frequency) each piece of data occurs.
- A histogram is a bar chart displaying the frequency on the vertical axis, with the classes on the horizontal axis.
- A frequency polygon is a line graph, similar to the histogram. Often the ends of the graph are connected to the x-axis.
- A pie chart is a circular graph, often used for comparing parts of the data to all of the data.
- A Stem and Leaf Display is a chart that lists each piece of data, split into a stem (the leading digits) and a leaf (the last digit).

Videos on Charting and Graphing Data are on the Graphing Data page.
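
As a rough illustration, a frequency distribution and a stem and leaf display can be built with Python's standard library. The data and function names below are made up for the example, and the stem and leaf sketch assumes non-negative whole numbers with the tens as the stem.

```python
from collections import Counter, defaultdict

def frequency_distribution(data):
    """Return {value: how many times it occurs}, sorted by value."""
    return dict(sorted(Counter(data).items()))

def stem_and_leaf(data):
    """Return {stem: [leaves]} for non-negative whole numbers (stem = tens)."""
    display = defaultdict(list)
    for value in sorted(data):
        display[value // 10].append(value % 10)
    return dict(display)

scores = [12, 15, 21, 21, 27, 34, 12, 21]  # made-up sample data

print(frequency_distribution(scores))
# {12: 2, 15: 1, 21: 3, 27: 1, 34: 1}

for stem, leaves in stem_and_leaf(scores).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
# 1 | 2 2 5
# 2 | 1 1 1 7
# 3 | 4
```
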

Where is one value in relation to the rest of the data?
When you have a lot of data, normally over 100 pieces of data, sorting it into percentiles may be a good idea. The data will be divided into 100 sections (percent = per 100). We often see this with nationwide statistics, like a child's height being in the 65th percentile, meaning the child is taller than about 65% of children of the same age.
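
A minimal sketch of finding where one value sits as a percentile is shown below. Textbooks use slightly different conventions; this one assumes the percentile rank is the percent of the data strictly below the value. The function name and the heights are made up for the example.

```python
def percentile_rank(data, value):
    """Percent of the data strictly below value (one of several conventions)."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

heights_cm = [98, 101, 103, 104, 106, 107, 109, 110, 112, 115]  # made-up data
print(percentile_rank(heights_cm, 109))  # 60.0
```
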
Similar to percentiles, quartiles are good for large amounts of data, but can also be useful when there are fewer than 100 pieces of data. The quartiles are found by sorting the data, finding the median of all the data (Q2), and then finding the medians of the lower and upper 'halves' of the data (Q1 & Q3).
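
The median-of-halves method can be sketched as follows. One assumption is made here: when there is an odd number of values, the middle value is left out of both halves. Some textbooks and software use other conventions, so results can differ slightly.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values below the middle position
    upper = ordered[half + n % 2:]  # values above the middle position
    return median(lower), median(ordered), median(upper)

print(quartiles([1, 3, 4, 6, 7, 8, 10, 12]))  # (3.5, 6.5, 9.0)
```
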
The main 'averages' are the mean, median, mode, and midrange. The Mean and Median are terms most people are familiar with.
The Mean is the arithmetic average, where you add all your pieces of data and divide by how many there are.
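The Median is the middle value once the data is sorted in order; with an even number of values, it is the mean of the two middle values.
The Mode is the piece of data that occurs most often; a data set can have more than one mode.
The Midrange is the mean of only the smallest and largest values.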
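
The four 'averages' can be compared on one small data set. The data is made up, and `midrange` is a helper written for this example because Python's `statistics` module does not provide one; `multimode` requires Python 3.8 or later.

```python
from statistics import median, multimode

def midrange(data):
    """Mean of only the smallest and largest values."""
    return (min(data) + max(data)) / 2

data = [2, 3, 3, 5, 7, 10]  # made-up sample data

print(sum(data) / len(data))  # mean: 5.0
print(median(data))           # median: 4.0 (mean of the middle values 3 and 5)
print(multimode(data))        # mode(s): [3]
print(midrange(data))         # midrange: 6.0
```
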
When you want to know how far the data typically lies from the mean, the Standard Deviation is the calculation you need. Combined with the Empirical Rule, which applies to roughly bell-shaped (normal) data, it tells you how much of the data is within a certain 'distance' from the mean: about 68% within 1 standard deviation, 95% within 2, and 99.7% within 3.
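
A short sketch of the calculation is given below. It assumes the population standard deviation (dividing by the number of values), which is what `statistics.pstdev` computes; `share_within` is a helper made up for this example. The Empirical Rule is a rule of thumb for bell-shaped data, so a tiny made-up data set like this one will not match its percentages exactly.

```python
from statistics import mean, pstdev

def share_within(data, k):
    """Fraction of the data within k standard deviations of the mean."""
    mu = mean(data)
    sigma = pstdev(data)
    return sum(1 for x in data if abs(x - mu) <= k * sigma) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample data, mean is 5

print(pstdev(data))           # 2.0
print(share_within(data, 1))  # 0.75 (6 of the 8 values lie between 3 and 7)
print(share_within(data, 2))  # 1.0  (all values lie between 1 and 9)
```
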
Another tricky but useful calculation is the Correlation Coefficient, r. When r is close to 1 or -1, there may be a linear relation between the two variables; when it is close to 0, there is little or no linear relation between them. Do not confuse relation with cause and effect.
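
The sketch below computes r from its usual definition (the Pearson correlation coefficient). The function name and both data sets are made up for the example. The second data set shows why r near 0 only rules out a straight-line relation: y is exactly x squared, yet r is 0.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for paired data xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]   # made-up paired data
scores = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, scores), 3))  # 0.775

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # a curved relation, not a straight line
print(pearson_r(xs, ys))   # 0.0
```
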
Videos on these calculations are on the Relational Calculations page.

Definitions and formulas are drawn from numerous years of teaching the topics, but have recently been double-checked against:
Borowski, E. J., & Borwein, J. M. (2006). Collins Web-linked dictionary of mathematics. New York, NY: HarperCollins Pub.