**Measures of Central Tendency**

__Mean:__ the average of the data. Not outlier resistant.

__Median:__ the midpoint of the sorted data. If there is an even number of data points, the median is the average of the two middle points. Outlier resistant.
- If the mean and median are far from each other, this indicates the presence of outliers or a skewed distribution.

__Mode:__ the number that appears most often in a set of data.
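
The three measures above can be compared on a small sample. This is a minimal sketch using Python's standard `statistics` module; the data values are hypothetical and chosen so that one extreme value pulls the mean away from the median.

```python
import statistics

# Hypothetical sample; 100 acts as an outlier.
data = [2, 3, 3, 4, 5, 100]

mean = statistics.mean(data)      # pulled upward by the outlier
median = statistics.median(data)  # average of the two middle values (3 and 4)
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)  # 19.5 3.5 3
```

The large gap between the mean (19.5) and the median (3.5) is the kind of signal the notes describe as pointing to an outlier or a skewed distribution.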

**Distributions**

__Normal Distribution:__ any normal distribution can be transformed to a **standard distribution** with a mean of zero and a standard deviation of one.

__Skewed Distribution:__ contains a **tail** on one side of the data set and is thus not symmetric.
- __Negatively skewed:__ has a tail on the left; the mean will be lower than the median.
- __Positively skewed:__ has a tail on the right; the mean will be larger than the median.

__Bimodal Distribution:__ has two peaks; can sometimes be analyzed as two separate distributions.
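
The transformation to a standard distribution mentioned for the normal distribution amounts to subtracting the mean and dividing by the standard deviation (a z-score). The sketch below applies it to a hypothetical sample and checks that the result has mean 0 and standard deviation 1. The sample values are illustrative only, and the population standard deviation (`pstdev`) is an assumption made for simplicity.

```python
import statistics

# Hypothetical sample; the values are illustrative only.
data = [4.0, 6.0, 8.0, 10.0, 12.0]

mu = statistics.mean(data)
sigma = statistics.pstdev(data)  # population standard deviation (assumed here)

# z-score: distance from the mean in units of standard deviation.
z_scores = [(x - mu) / sigma for x in data]

z_mean = statistics.mean(z_scores)   # approximately 0
z_sd = statistics.pstdev(z_scores)   # approximately 1
print(z_mean, z_sd)
```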

**Measures of Distribution**

__Range:__ difference between the largest and smallest values of a data set. Heavily affected by the presence of outliers. Standard deviation can be approximated as ¼ × range.

__Interquartile Range (IQR):__ the third quartile minus the first quartile (Q3 − Q1).

__Quartiles:__ divide data into groups that each comprise one-fourth of the entire data set.
- To calculate the position of the first quartile: sort the data in ascending order and multiply *n* by 1/4.
  - If this is a whole number, the quartile is the mean of the value at this position and the value at the next highest position.
  - If this is a decimal, round up to the next whole number and take the value at that position as the quartile.
- For the 3rd quartile, multiply *n* by 3/4 and follow the same process as for the first quartile.
- Outliers are those points that fall more than 1.5 × IQR below Q1 or above Q3.
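
The quartile position rule and the 1.5 × IQR outlier check can be sketched as follows. The helper `quartile` is a hypothetical name written for this illustration; it follows the rule in these notes, which differs slightly from the interpolation methods that many statistics libraries use by default. The data values are also hypothetical.

```python
import math

def quartile(sorted_data, fraction):
    """Quartile by the position rule in the notes (positions are 1-based)."""
    pos = len(sorted_data) * fraction
    if pos == int(pos):
        k = int(pos)
        # Whole number: mean of the value at this position and the next one.
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    # Decimal: round up and take the value at that position.
    return sorted_data[math.ceil(pos) - 1]

data = sorted([1, 3, 4, 6, 7, 8, 10, 45])  # hypothetical data, n = 8

q1 = quartile(data, 0.25)  # position 2 -> mean of 3 and 4 = 3.5
q3 = quartile(data, 0.75)  # position 6 -> mean of 8 and 10 = 9.0
iqr = q3 - q1              # 5.5

# Points beyond these fences are flagged as outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(q1, q3, iqr, outliers)  # 3.5 9.0 5.5 [45]
```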

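__Standard Deviation:__ a measure of the spread of the data around the mean.
- If a data point falls more than three standard deviations from the mean, it is considered an outlier.
- On a normal distribution, the 68-95-99.7 rule applies: about 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three.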
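
The percentages in the normal-distribution rule can be checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library; it uses the standard normal distribution (mean 0, standard deviation 1), but the same fractions hold for any normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} SD: {p:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```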

__Outliers:__ usually result from one of three causes:
- A true statistical anomaly
- A measurement error
- A distribution that is not approximated by a normal distribution

**Probability**

__Mutually Exclusive Outcomes:__ cannot occur at the same time.

__Exhaustive Set of Outcomes:__ no other outcomes are possible.

**Calculations**

- For independent events, the probability of two or more events all occurring is the product of their individual probabilities: P(A and B) = P(A) × P(B).
- The probability of at least one of two events occurring is equal to the sum of their individual probabilities minus the probability that both will occur: P(A or B) = P(A) + P(B) − P(A and B).
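
As a worked example of the two rules above, consider two independent events: rolling a 6 on a fair die and flipping heads on a fair coin. The scenario is hypothetical and chosen only for illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Hypothetical independent events.
p_six = Fraction(1, 6)    # rolling a 6 on a fair die
p_heads = Fraction(1, 2)  # flipping heads on a fair coin

# Both occur: product of the individual probabilities (independence assumed).
p_both = p_six * p_heads

# At least one occurs: sum of the probabilities minus the overlap.
p_either = p_six + p_heads - p_both

print(p_both, p_either)  # 1/12 7/12
```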

**Statistical Testing**

**Hypothesis Testing**

__Null Hypothesis:__ hypothesis of equivalence; says that two populations are equal.

__Alternative Hypothesis:__ nondirectional (not equal) or directional (greater than or less than).
- Z-tests and t-tests are commonly used tests.
- A **test statistic** is calculated from the collected data and compared to a table to determine the likelihood that a statistic at least this extreme would be obtained by random chance if the null hypothesis were true. This likelihood is known as the **p-value**.
- If the p-value is greater than the **level of significance** (usually 0.05), the null hypothesis cannot be rejected.
- If the p-value is less than the level of significance, the null hypothesis is rejected and the results are statistically significant: the data support a difference between the two groups.
- The level of significance (α) is the level of risk that is accepted for incorrectly rejecting the null hypothesis, that is, for making a **type I error**.

__Type I Error:__ reporting a difference between the two populations when one does not actually exist. Its probability is α.

__Type II Error:__ incorrectly failing to reject the null hypothesis; no difference is reported when one actually exists. Its probability is β.

__Power:__ the probability of correctly rejecting the null hypothesis: 1 − β.

__Confidence:__ the probability of correctly failing to reject the null hypothesis when no difference exists: 1 − α.

| | H0 true (no difference) | Ha true (difference exists) |
| --- | --- | --- |
| Reject H0 | Type I error (α) | Power (1 − β) |
| Fail to reject H0 | Confidence (1 − α) | Type II error (β) |
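
The hypothesis-testing steps can be illustrated with a one-sample, two-sided z-test. This is a sketch under stated assumptions: all numbers are hypothetical, the population standard deviation is treated as known, and the p-value is computed with `statistics.NormalDist` instead of being looked up in a table.

```python
import math
from statistics import NormalDist

# Hypothetical values, for illustration only.
null_mean = 100.0    # population mean under the null hypothesis
pop_sd = 15.0        # population standard deviation (assumed known)
sample_mean = 106.0
n = 36
alpha = 0.05         # level of significance

# Test statistic: distance of the sample mean from the null mean,
# measured in standard errors.
z = (sample_mean - null_mean) / (pop_sd / math.sqrt(n))

# Two-sided p-value: probability of a statistic at least this extreme
# if the null hypothesis is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = p_value < alpha
print(round(z, 2), round(p_value, 4), reject_null)  # 2.4 0.0164 True
```

Here the p-value falls below 0.05, so the null hypothesis is rejected; accepting α = 0.05 means accepting a 5% risk that this rejection is a type I error.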

**Confidence Intervals**

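- Reverse of hypothesis testing: start with a desired confidence level (usually 95%) and use a table to find the corresponding z- or t-score. The score is multiplied by the standard deviation (for the mean of a sample of size *n*, the standard error s/√n), and the result is added to and subtracted from the mean to give the upper and lower bounds of the interval.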
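
A confidence interval for a mean can be sketched as follows. The numbers are hypothetical. The sketch assumes a z-based interval, obtains the z-score from `NormalDist.inv_cdf` instead of a table, and divides the standard deviation by √n (the standard error), which is the usual form when the interval is for a sample mean.

```python
import math
from statistics import NormalDist

# Hypothetical sample summary, for illustration only.
sample_mean = 50.0
sample_sd = 8.0
n = 64
confidence = 0.95

# z-score that leaves (1 - confidence) / 2 in each tail.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error: z-score times the standard error of the mean.
margin = z * sample_sd / math.sqrt(n)

lower = sample_mean - margin
upper = sample_mean + margin
print(round(z, 2), round(lower, 2), round(upper, 2))  # 1.96 48.04 51.96
```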

**Charts, Graphs, and Tables**

**Types of Charts**

__Pie/Circle Charts:__ represent relative amounts of entities. Lose impact as the number of categories increases.

__Bar Charts and Histograms:__ bar charts are used for categorical data, while histograms are used for numerical data.

__Box Plots:__ used to show the range, median, quartiles, and outliers for a set of data. A **box-and-whisker** plot is a labeled box plot.
- __Box:__ bounded by Q1 and Q3; Q2 (the median) is the line in the middle.
- __End of Whiskers:__ the largest and smallest values in the data set that are not outliers.

__Maps:__ data are displayed geographically.

**Graphs and Axes**

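__Linear Graphs:__ the plotted curve can be linear, parabolic, exponential, or logarithmic.
- The axes of a linear graph have units that occupy the same amount of space (equal intervals are equally spaced).

__Semilog and Log-Log Graphs:__ the **axis ratios** are changed on one axis (semilog) or both axes (log-log), so that equal distances represent equal multiplicative changes rather than equal additive changes.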
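
One reason to use a semilog graph is that exponential data plot as a straight line when the y-axis is logarithmic. The sketch below checks this numerically for a hypothetical exponential series: after taking the base-10 logarithm of y, each unit step in x adds the same constant amount.

```python
import math

# Hypothetical exponential data: y = 3 * 2**x.
xs = [0, 1, 2, 3, 4]
ys = [3 * 2 ** x for x in xs]

# A semilog plot places log10(y) against x.
log_ys = [math.log10(y) for y in ys]

# A constant step between consecutive points means a straight line.
steps = [b - a for a, b in zip(log_ys, log_ys[1:])]
print([round(s, 4) for s in steps])  # every step is log10(2), about 0.301
```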

**Applying Data**

**Correlation** refers to a connection (a direct relationship, an inverse relationship, etc.) between data. It does not imply **causation**.
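
Direct and inverse relationships can be quantified with the Pearson correlation coefficient, which is not named in these notes and is used here as one common choice. The helper `pearson_r` and the data are hypothetical; a value near +1 indicates a direct relationship and a value near −1 an inverse one, and neither says anything about causation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

xs = [1, 2, 3, 4, 5]
direct = [2, 4, 6, 8, 10]   # rises with x
inverse = [10, 8, 6, 4, 2]  # falls as x rises

r_direct = pearson_r(xs, direct)    # approximately +1
r_inverse = pearson_r(xs, inverse)  # approximately -1
print(round(r_direct, 6), round(r_inverse, 6))
```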