Correlation and Confidence Level

Pearson Correlation coefficient: r

The Pearson correlation coefficient measures the strength of a linear relationship between 2 variables. The Pearson product-moment correlation attempts to draw a line of best fit through the data of the 2 variables, and $r$ quantifies how far the data points are from that line. $r$ takes values between $-1$ and $+1$. $r$ is the correlation coefficient for the sample and $\rho$ is the correlation coefficient for the population.

  • if $r = 0$, no relationship (not linearly correlated)
  • $r$ close to $+1$ => strong positive relationship
  • $r$ close to $-1$ => strong negative relationship

The coefficient of determination is the square of the Pearson correlation: $R^2 = r^2$
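As a minimal sketch of the definitions above, the sample coefficient $r$ can be computed as the sum of cross-products of the deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the data are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2  # coefficient of determination
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")  # r = 0.7746, r^2 = 0.6000
```

Note that this sketch does not guard against a constant variable, for which the denominator is zero and $r$ is undefined.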

Pearson correlation assumes that the 2 variables are:

  • interval or ratio level
  • linearly related
  • bivariate normally distributed

If the data does not meet the above assumptions, then use Spearman’s rank correlation.
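One common way to compute Spearman's rank correlation is to replace each value by its rank (tied values share the average of their ranks) and then apply the Pearson formula to the ranks. The sketch below follows that approach. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical, as is the data.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y = x**3 is monotonic but not linear
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

print(f"Pearson  r   = {pearson_r(x, y):.4f}")
print(f"Spearman rho = {spearman_rho(x, y):.4f}")
```

Because the ranks of `x` and `y` agree perfectly here, the Spearman coefficient is 1 while the Pearson coefficient stays below 1.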

t-test

The t-test is used to establish whether the correlation coefficient is significantly different from zero, and hence whether there is evidence of a relationship between the 2 variables. Formula for the t-test for the correlation coefficient:

$$t = r \sqrt{\frac{n - 2}{1 - r^2}}$$

$t$ is called the t-score, with degrees of freedom $df = n - 2$, i.e. the number of pairs (x, y) of data minus 2.
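A short sketch of this calculation, assuming the standard form of the test statistic, $t = r\sqrt{(n-2)/(1-r^2)}$. The function name `t_score_from_r` and the values of `r` and `n` are hypothetical.

```python
import math

def t_score_from_r(r, n):
    """Return (t, df) for testing whether a sample correlation r differs from zero."""
    if n < 3:
        raise ValueError("need at least 3 pairs of data")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    df = n - 2  # number of (x, y) pairs minus 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df

# Hypothetical example: r = 0.6 observed on n = 27 pairs
t, df = t_score_from_r(0.6, 27)
print(f"t = {t:.2f} with {df} degrees of freedom")  # t = 3.75 with 25 degrees of freedom
```

For 25 degrees of freedom, a standard t-table gives a two-tailed critical value of about 2.060 at the 5% level, so a t-score of 3.75 would be judged significant.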

Consult a t-table to compare the calculated t-value (t-score) with the critical t-value to determine statistical significance. If the absolute value of the calculated t-value exceeds the critical value, the probability that the correlation between $x$ and $y$ is simply due to error or chance is less than the significance level attached to that critical value.

The t-score formula is:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, $\mu$ the population mean, $s$ the sample standard deviation and $n$ the sample size.
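A worked illustration of the one-sample t-score, assuming the usual form: sample mean minus population mean, divided by the sample standard deviation over the square root of the sample size. The function name `one_sample_t`, the sample and the hypothesised mean are all hypothetical.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """t-score of a sample against a hypothesised population mean mu."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (x_bar - mu) / (s / math.sqrt(n))

# Hypothetical sample and population mean, for illustration only
sample = [4, 6, 8, 10, 12]
t_one_sample = one_sample_t(sample, mu=6)
print(f"t = {t_one_sample:.4f} with {len(sample) - 1} degrees of freedom")
```

Here the sample mean is 8 and the sample variance is 10, so the standard error is $\sqrt{10/5} = \sqrt{2}$ and $t = 2/\sqrt{2} \approx 1.414$.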

We use the degrees of freedom, along with the confidence level we are willing to accept, to decide whether to reject or fail to reject the null hypothesis.

p-test

The p-test determines whether the correlation between variables is significant by comparing the p-value to the significance level $\alpha$. Typically, $\alpha = 0.05$: it means the risk of concluding that the correlation exists when actually there is no correlation is 5%. The level of confidence, $c$, tells how confident we are in our decision. The level of significance is: $\alpha = 1 - c$.

The p-value is used in statistical hypothesis testing to decide whether or not to reject the null hypothesis.

  • $H_0$: the null hypothesis is the hypothesis that there is no difference or no correlation in the population ($\rho = 0$), any observed difference being due to sampling or experimental error.
  • $H_1$: the alternative hypothesis means that there is a significant correlation in the population ($\rho \neq 0$).

The null hypothesis and the alternative hypothesis are opposites. For example, $H_0: \rho = 0$ and $H_1: \rho \neq 0$. We always start by assuming the null hypothesis is true, and the result of the test is one of:

  1. Reject the null hypothesis
  2. Fail to reject the null hypothesis

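The p-value for Pearson correlation is computed from the t-distribution with $n - 2$ degrees of freedom:

  • a small p-value ($p \le \alpha$): strong evidence against the null hypothesis - reject the null hypothesis
  • a large p-value ($p > \alpha$): weak evidence against the null hypothesis - fail to reject the null hypothesis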

Decision rule:

  1. if $p \le \alpha$, the correlation is statistically significant

  2. if $p > \alpha$, the correlation is not statistically significant
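The decision rule can be sketched in code as follows, under two stated assumptions: the test is two-sided, and the p-value is the probability, under Student's t-distribution with $n - 2$ degrees of freedom, of a t-score at least as extreme as the one observed. To stay within the Python standard library, the tail probability is approximated by Simpson's rule on the t density rather than taken from a statistics package. All function names and input values are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=2000):
    """Approximate P(|T| >= |t|) using Simpson's rule (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * area)

def correlation_p_value(r, n):
    """Two-sided p-value for a sample correlation r computed on n pairs."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return two_sided_p_value(t, df)

alpha = 0.05                      # significance level
p = correlation_p_value(0.6, 27)  # hypothetical r and n
significant = p <= alpha
print(f"p = {p:.5f}:", "significant" if significant else "not significant")
```

In practice a statistics library would normally supply the p-value directly. The numerical integration here only keeps the example self-contained.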

The sample size has an influence on the significance:

  • a strong relationship (large $|r|$) may not be significant when the sample is small
  • a weak relationship (small $|r|$) can be significant when the sample is large
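The effect of sample size can be illustrated with two hypothetical cases, using the t-score $t = r\sqrt{(n-2)/(1-r^2)}$ and two-tailed critical values at the 5% level read from a standard t-table (about 3.182 for 3 degrees of freedom and about 1.972 for 198). The values of `r` and `n` are invented for illustration.

```python
import math

def t_score_from_r(r, n):
    """t-score for testing whether a sample correlation r differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# (label, r, n, two-tailed critical t at alpha = 0.05 for df = n - 2)
# Critical values are taken from a standard t-table.
cases = [
    ("strong r, small sample", 0.8, 5, 3.182),
    ("weak r, large sample", 0.2, 200, 1.972),
]

results = {}
for label, r, n, t_crit in cases:
    t = t_score_from_r(r, n)
    results[label] = abs(t) > t_crit  # True means statistically significant
    verdict = "significant" if results[label] else "not significant"
    print(f"{label}: r = {r}, n = {n}, t = {t:.3f}, critical = {t_crit} -> {verdict}")
```

With these numbers, $r = 0.8$ on 5 pairs gives $t \approx 2.31$, below the critical value, while $r = 0.2$ on 200 pairs gives $t \approx 2.87$, above it.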

Correlation coefficients

Pearson correlations can be used when the residuals have a normal distribution. If the residuals are not normal, use Spearman correlations.

A Pearson correlation coefficient is appropriate when the outcome variable is continuous, the independent variable is continuous, and we want to see if the 2 variables have a systematic linear relationship. Spearman correlation is similar, but because it works on ranks it also detects monotonic relationships that are not linear.