# Relationship between correlation and regression for dummies

### Introduction to Correlation and Regression Analysis

There are three main reasons for using correlation and regression together: 1) to test a hypothesis of causality, 2) to see the association between variables, and 3) to estimate a value of … Correlation is a measure of association between two variables. The value of a correlation coefficient can vary from minus one to plus one. Squaring the coefficient gives the proportion of variance explained: for example, r = 0.7 gives r squared = 0.49, which means that 49% of the variance in the dependent variable can be explained by the regression equation. You are simply computing a correlation coefficient (r) that tells you how much one variable tends to change when the other one does.
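As a minimal sketch of how a correlation coefficient and its square might be computed by hand, the snippet below uses only the Python standard library. The function name `pearson_r` and the five data points are made up for illustration and do not come from any study mentioned here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative numbers, not data from any study
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```

With these toy numbers the script prints `r = 0.80, r squared = 0.64`, so about 64% of the variance in `y` would be accounted for by a straight-line fit on `x`.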

Linear regression finds the best line for predicting Y from X. Correlation does not fit a line. What kind of data suits each method?

## What is the difference between correlation and linear regression?

Correlation is almost always used when you measure both variables. It is rarely appropriate when one variable is something you experimentally manipulate. Linear regression is usually used when X is a variable you manipulate (time, concentration, etc.).


Does it matter which variable is X and which is Y? With correlation, you don't have to think about cause and effect. It doesn't matter which of the two variables you call "X" and which you call "Y".
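This symmetry can be checked numerically. The sketch below, which assumes made-up data and hypothetical helper names (`deviation_sums`, `pearson_r`, `ls_slope`), computes the correlation in both directions and, for contrast, the least squares slope in both directions.

```python
import math

def deviation_sums(xs, ys):
    """Return (Sxy, Sxx, Syy): sums of products of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy

def pearson_r(xs, ys):
    sxy, sxx, syy = deviation_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def ls_slope(xs, ys):
    """Least squares slope of the line that predicts ys from xs."""
    sxy, sxx, _ = deviation_sums(xs, ys)
    return sxy / sxx

# Made-up illustrative numbers
a = [1, 2, 3, 4, 5]
b = [2, 6, 4, 10, 8]

r_ab = pearson_r(a, b)
r_ba = pearson_r(b, a)
slope_b_on_a = ls_slope(a, b)  # predict b from a
slope_a_on_b = ls_slope(b, a)  # predict a from b

print(f"r(a, b) = {r_ab:.2f}, r(b, a) = {r_ba:.2f}")
print(f"slope of b on a = {slope_b_on_a:.2f}, slope of a on b = {slope_a_on_b:.2f}")
```

For this toy data the correlation is 0.80 either way, while the two slopes are 1.60 and 0.40. The second slope is not simply the reciprocal of the first, so the choice of X and Y does matter once a line is being fitted.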

You'll get the same correlation coefficient if you swap the two.

Either a simple or multiple regression model is initially posed as a hypothesis concerning the relationship among the dependent and independent variables. The least squares method is the most widely used procedure for developing estimates of the model parameters.

As an illustration of regression analysis and the least squares method, suppose a university medical centre is investigating the relationship between stress and blood pressure.

Assume that both a stress test score and a blood pressure reading have been recorded for a sample of 20 patients. The data are shown graphically in the figure below, called a scatter diagram. Values of the independent variable, stress test score, are given on the horizontal axis, and values of the dependent variable, blood pressure, are shown on the vertical axis.
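As a rough sketch of the least squares method for a setup like this one, the snippet below fits a straight line to five invented pairs of stress scores and blood pressure readings. The numbers are hypothetical and are not the 20 patients from the illustration, so the fitted equation is not the one plotted in the figure; `fit_line` is also an assumed helper name.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                      # b1 minimises the squared residuals
    intercept = mean_y - slope * mean_x    # line passes through the means
    return intercept, slope

# Invented values for illustration only (independent variable first)
stress_score = [50, 60, 70, 80, 90]
blood_pressure = [110, 118, 125, 135, 142]

b0, b1 = fit_line(stress_score, blood_pressure)
print(f"estimated blood pressure = {b0:.1f} + {b1:.2f} * stress score")

# Use the fitted line to estimate blood pressure for a new stress score
predicted = b0 + b1 * 75
print(f"estimate at stress score 75: {predicted:.2f}")
```

For these invented numbers the slope comes out at 0.81 and the intercept at roughly 69.3. The slope is read as the estimated change in the dependent variable for a one-unit change in the independent variable.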

The line passing through the data points is the graph of the estimated regression equation.

A correlation close to zero suggests no linear association between two continuous variables.
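The word "linear" matters here. As a small sketch using made-up numbers, the example below builds a variable that is completely determined by another (y is the square of x) and yet has a correlation of zero, because the relationship is not a straight line.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up values: y depends perfectly on x, but along a curve
x = [-2, -1, 0, 1, 2]
y = [value ** 2 for value in x]

r = pearson_r(x, y)
print(f"r = {r:.2f}")
```

A near-zero r therefore rules out a straight-line trend, not every kind of relationship, which is one reason to plot the data first.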

### Correlation and Regression

You say that the correlation coefficient is a measure of the "strength of association", but if you think about it, isn't the slope a better measure of association?

We use risk ratios and odds ratios to quantify the strength of association. The analogous quantity in correlation is the slope. And "r", or perhaps better R-squared, is a measure of how much of the variability in the dependent variable can be accounted for by differences in the independent variable. The analogous measure for a dichotomous variable and a dichotomous outcome would be the attributable proportion.
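One way to see how the slope and r differ is to change the units of the independent variable. The sketch below assumes made-up data and hypothetical helper names: it measures the same x values in weeks and then in days, and compares the least squares slope and the correlation in each case.

```python
import math

def slope_and_r(xs, ys):
    """Return (least squares slope of ys on xs, Pearson r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Made-up illustrative numbers
x_weeks = [1, 2, 3, 4, 5]
x_days = [7 * w for w in x_weeks]   # same measurements, different units
y = [2, 6, 4, 10, 8]

slope_w, r_w = slope_and_r(x_weeks, y)
slope_d, r_d = slope_and_r(x_days, y)

print(f"per week: slope = {slope_w:.3f}, r = {r_w:.2f}")
print(f"per day:  slope = {slope_d:.3f}, r = {r_d:.2f}")
```

The slope shrinks by a factor of seven when x is expressed in days, while r is unchanged. The slope describes how much Y changes per unit of X, and r (or R-squared) describes how closely the points follow the line, so the two answer different questions.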

It is always important to evaluate the data carefully before computing a correlation coefficient. Graphical displays are particularly useful for exploring associations between variables. The figure below shows four hypothetical scenarios in which one continuous variable is plotted along the X-axis and the other along the Y-axis.

Scenario 3 might depict the lack of association (r approximately 0) between the extent of media exposure in adolescence and the age at which adolescents initiate sexual activity.

### Example - Correlation of Gestational Age and Birth Weight

A small study is conducted involving 17 infants to investigate the association between gestational age at birth, measured in weeks, and birth weight, measured in grams.


We wish to estimate the association between gestational age and infant birth weight. In this example, birth weight is the dependent variable and gestational age is the independent variable.

The data are displayed in a scatter diagram in the figure below. Each point represents an (x, y) pair (in this case, the gestational age, measured in weeks, and the birth weight, measured in grams). Note that the independent variable is on the horizontal axis (or X-axis), and the dependent variable is on the vertical axis (or Y-axis).