# Regression Analysis and Linear Models

## Concepts, Applications, and Implementation

### Richard B. Darlington and Andrew F. Hayes

1. Statistical Control and Linear Models

1.1 Statistical Control

1.1.1 The Need for Control

1.1.2 Five Methods of Control

1.1.3 Examples of Statistical Control

1.2 An Overview of Linear Models

1.2.1 What You Should Know Already

1.2.2 Statistical Software for Linear Modeling and Statistical Control

1.2.3 About Formulas

1.2.4 On Symbolic Representations

1.3 Chapter Summary

2. The Simple Regression Model

2.1 Scatterplots and Conditional Distributions

2.1.1 Scatterplots

2.1.2 A Line through Conditional Means

2.1.3 Errors of Estimate

2.2 The Simple Regression Model

2.2.1 The Regression Line

2.2.2 Variance, Covariance, and Correlation

2.2.3 Finding the Regression Line

2.2.4 Example Computations

2.2.5 Linear Regression Analysis by Computer

2.3 The Regression Coefficient versus the Correlation Coefficient

2.3.1 Properties of the Regression and Correlation Coefficients

2.3.2 Uses of the Regression and Correlation Coefficients

2.4 Residuals

2.4.1 The Three Components of Y

2.4.2 Algebraic Properties of Residuals

2.4.3 Residuals as Y Adjusted for Differences in X

2.4.4 Residual Analysis

2.5 Chapter Summary

3. Partial Relationship and the Multiple Regression Model

3.1 Regression Analysis with More Than One Predictor Variable

3.1.1 An Example

3.1.2 Regressors

3.1.3 Models

3.1.4 Representing a Model Geometrically

3.1.5 Model Errors

3.1.6 An Alternative View of the Model

3.2 The Best-Fitting Model

3.2.1 Model Estimation with Computer Software

3.2.2 Partial Regression Coefficients

3.2.3 The Regression Constant

3.2.4 Problems with Three or More Regressors

3.2.5 The Multiple Correlation R

3.3 Scale-Free Measures of Partial Association

3.3.1 Semipartial Correlation

3.3.2 Partial Correlation

3.3.3 The Standardized Regression Coefficient

3.4 Some Relations among Statistics

3.4.1 Relations among Simple, Multiple, Partial, and Semipartial Correlations

3.4.2 Venn Diagrams

3.4.3 Partial Relationships and Simple Relationships May Have Different Signs

3.4.4 How Covariates Affect Regression Coefficients

3.4.5 Formulas for $b_j$, $pr_j$, $sr_j$, and $R$

3.5 Chapter Summary

4. Statistical Inference in Regression

4.1 Concepts in Statistical Inference

4.1.1 Statistics and Parameters

4.1.2 Assumptions for Proper Inference

4.1.3 Expected Values and Unbiased Estimation

4.2 The ANOVA Summary Table

4.2.1 Data = Model + Error

4.2.2 Total and Regression Sums of Squares

4.2.3 Degrees of Freedom

4.2.4 Mean Squares

4.3 Inference about the Multiple Correlation

4.3.1 Biased and Less Biased Estimation of ${}_{T}R^2$

4.3.2 Testing a Hypothesis about ${}_{T}R$

4.4 The Distribution of and Inference about a Partial Regression Coefficient

4.4.1 Testing a Null Hypothesis about ${}_{T}b_j$

4.4.2 Interval Estimates for ${}_{T}b_j$

4.4.3 Factors Affecting the Standard Error of $b_j$

4.4.4 Tolerance

4.5 Inferences about Partial Correlations

4.5.1 Testing a Null Hypothesis about ${}_{T}pr_j$ and ${}_{T}sr_j$

4.5.2 Other Inferences about Partial Correlations

4.6 Inferences about Conditional Means

4.7 Miscellaneous Issues in Inference

4.7.1 How Great a Drawback Is Collinearity?

4.7.2 Contradicting Inferences

4.7.3 Sample Size and Nonsignificant Covariates

4.7.4 Inference in Simple Regression (When k = 1)

4.8 Chapter Summary

5. Extending Regression Analysis Principles

5.1 Dichotomous Regressors

5.1.1 Indicator or Dummy Variables

5.1.2 Ŷ Is a Group Mean

5.1.3 The Regression Coefficient for an Indicator Is a Difference

5.1.4 A Graphic Representation

5.1.5 A Caution about Standardized Regression Coefficients for Dichotomous Regressors

5.1.6 Artificial Categorization of Numerical Variables

5.2 Regression to the Mean

5.2.1 How Regression Got Its Name

5.2.2 The Phenomenon

5.2.3 Versions of the Phenomenon

5.2.4 Misconceptions and Mistakes Fostered by Regression to the Mean

5.2.5 Accounting for Regression to the Mean Using Linear Models

5.3 Multidimensional Sets

5.3.1 The Partial and Semipartial Multiple Correlation

5.3.2 What It Means If PR = 0 or SR = 0

5.3.3 Inference Concerning Sets of Variables

5.4 A Glance at the Big Picture

5.4.1 Further Extensions of Regression

5.4.2 Some Difficulties and Limitations

5.5 Chapter Summary

6. Statistical versus Experimental Control

6.1 Why Random Assignment?

6.1.1 Limitations of Statistical Control

6.1.2 The Advantage of Random Assignment

6.1.3 The Meaning of Random Assignment

6.2 Limitations of Random Assignment

6.2.1 Limitations Common to Statistical Control and Random Assignment

6.2.2 Limitations Specific to Random Assignment

6.2.3 Correlation and Causation

6.3 Supplementing Random Assignment with Statistical Control

6.3.1 Increased Precision and Power

6.3.2 Invulnerability to Chance Differences between Groups

6.3.3 Quantifying and Assessing Indirect Effects

6.4 Chapter Summary

7. Regression for Prediction

7.1 Mechanical Prediction and Regression

7.1.1 The Advantages of Mechanical Prediction

7.1.2 Regression as a Mechanical Prediction Method

7.1.3 A Focus on R Rather Than the Regression Weights

7.2 Estimating True Validity

7.2.1 Shrunken versus Adjusted R

7.2.2 Estimating ${}_{T}RS$

7.2.3 Shrunken R Using Statistical Software

7.3 Selecting Predictor Variables

7.3.1 Stepwise Regression

7.3.2 All Subsets Regression

7.3.3 How Do Variable Selection Methods Perform?

7.4 Predictor Variable Configurations

7.4.1 Partial Redundancy (the Standard Configuration)

7.4.2 Complete Redundancy

7.4.3 Independence

7.4.4 Complementarity

7.4.5 Suppression

7.4.6 How These Configurations Relate to the Correlation between Predictors

7.4.7 Configurations of Three or More Predictors

7.5 Revisiting the Value of Human Judgment

7.6 Chapter Summary

8. Assessing the Importance of Regressors

8.1 What Does It Mean for a Variable to Be Important?

8.1.1 Variable Importance in Substantive or Applied Terms

8.1.2 Variable Importance in Statistical Terms

8.2 Should Correlations Be Squared?

8.2.1 Decision Theory

8.2.2 Small Squared Correlations Can Reflect Noteworthy Effects

8.2.3 Pearson’s r as the Ratio of a Regression Coefficient to Its Maximum Possible Value

8.2.4 Proportional Reduction in Estimation Error

8.2.5 When the Standard Is Perfection

8.2.6 Summary

8.3 Determining the Relative Importance of Regressors in a Single Regression Model

8.3.1 The Limitations of the Standardized Regression Coefficient

8.3.2 The Advantage of the Semipartial Correlation

8.3.3 Some Equivalences among Measures

8.3.4 Cohen’s f²

8.3.5 Comparing Two Regression Coefficients in the Same Model

8.4 Dominance Analysis

8.4.1 Complete and Partial Dominance

8.4.2 Example Computations

8.4.3 Dominance Analysis Using a Regression Program

8.5 Chapter Summary

9. Multicategorical Regressors

9.1 Multicategorical Variables as Sets

9.1.1 Indicator (Dummy) Coding

9.1.2 Constructing Indicator Variables

9.1.3 The Reference Category

9.1.4 Testing the Equality of Several Means

9.1.5 Parallels with Analysis of Variance

9.1.6 Interpreting Estimated Y and the Regression Coefficients

9.2 Multicategorical Regressors as or with Covariates

9.2.1 Multicategorical Variables as Covariates

9.2.2 Comparing Groups and Statistical Control

9.2.3 Interpretation of Regression Coefficients

9.2.4 Adjusted Means

9.2.5 Parallels with ANCOVA

9.2.6 More Than One Covariate

9.3 Chapter Summary

10. More on Multicategorical Regressors

10.1 Alternative Coding Systems

10.1.1 Sequential (Adjacent or Repeated Categories) Coding

10.1.2 Helmert Coding

10.1.3 Effect Coding

10.2 Comparisons and Contrasts

10.2.1 Contrasts

10.2.2 Computing the Standard Error of a Contrast

10.2.3 Contrasts Using Statistical Software

10.2.4 Covariates and the Comparison of Adjusted Means

10.3 Weighted Group Coding and Contrasts

10.3.1 Weighted Effect Coding

10.3.2 Weighted Helmert Coding

10.3.3 Weighted Contrasts

10.3.4 Application to Adjusted Means

10.4 Chapter Summary

11. Multiple Tests

11.1 The Multiple-Test Problem

11.1.1 An Illustration through Simulation

11.1.2 The Problem Defined

11.1.3 The Role of Sample Size

11.1.4 The Generality of the Problem

11.1.5 Do Omnibus Tests Offer “Protection”?

11.1.6 Should You Be Concerned about the Multiple-Test Problem?

11.2 The Bonferroni Method

11.2.1 Independent Tests

11.2.2 The Bonferroni Method for Nonindependent Tests

11.2.3 Revisiting the Illustration

11.2.4 Bonferroni Layering

11.2.5 Finding an “Exact” p-Value

11.2.6 Nonsense Values

11.2.7 Flexibility of the Bonferroni Method

11.2.8 Power of the Bonferroni Method

11.3 Some Basic Issues Surrounding Multiple Tests

11.3.1 Why Correct for Multiple Tests at All?

11.3.2 Why Not Correct for the Whole History of Science?

11.3.3 Plausibility and Logical Independence of Hypotheses

11.3.4 Planned versus Unplanned Tests

11.4 Summary

11.5 Chapter Summary

12. Nonlinear Relationships

12.1 Linear Regression Can Model Nonlinear Relationships

12.1.1 When Must Curves Be Fitted?

12.1.2 The Graphical Display of Curvilinearity

12.2 Polynomial Regression

12.2.1 Basic Principles

12.2.2 An Example

12.2.3 The Meaning of the Regression Coefficients for Lower-Order Regressors

12.2.4 Centering Variables in Polynomial Regression

12.2.5 Finding a Parabola’s Maximum or Minimum

12.3 Spline Regression

12.3.1 Linear Spline Regression

12.3.2 Implementation in Statistical Software

12.3.3 Polynomial Spline Regression

12.3.4 Covariates, Weak Curvilinearity, and Choosing Joints

12.4 Transformations of Dependent Variables or Regressors

12.4.1 Logarithmic Transformation

12.4.2 The Box–Cox Transformation

12.5 Chapter Summary

13. Linear Interaction

13.1 Interaction Fundamentals

13.1.1 Interaction as a Difference in Slope

13.1.2 Interaction between Two Numerical Regressors

13.1.3 Interaction versus Intercorrelation

13.1.4 Simple Linear Interaction

13.1.5 Representing Simple Linear Interaction with a Cross-Product

13.1.6 The Symmetry of Interaction

13.1.7 Interaction as a Warped Surface

13.1.8 Covariates in a Regression Model with an Interaction

13.1.9 The Meaning of the Regression Coefficients

13.1.10 An Example with Estimation Using Statistical Software

13.2 Interaction Involving a Categorical Regressor

13.2.1 Interaction between a Dichotomous and a Numerical Regressor

13.2.2 The Meaning of the Regression Coefficients

13.2.3 Interaction Involving a Multicategorical and a Numerical Regressor

13.2.4 Inference When Interaction Requires More Than One Regression Coefficient

13.2.5 A Substantive Example

13.2.6 Interpretation of the Regression Coefficients

13.3 Interaction between Two Categorical Regressors

13.3.1 The 2 × 2 Design

13.3.2 Interaction between a Dichotomous and a Multicategorical Regressor

13.3.3 Interaction between Two Multicategorical Regressors

13.4 Chapter Summary

14. Probing Interactions and Various Complexities

14.1 Conditional Effects as Functions

14.1.1 When the Interaction Involves Dichotomous or Numerical Variables

14.1.2 When the Interaction Involves a Multicategorical Variable

14.2 Inference about a Conditional Effect

14.2.1 When the Focal Predictor and Moderator Are Numerical or Dichotomous

14.2.2 When the Focal Predictor or Moderator Is Multicategorical

14.3 Probing an Interaction

14.3.1 Examining Conditional Effects at Various Values of the Moderator

14.3.2 The Johnson–Neyman Technique

14.3.3 Testing versus Probing an Interaction

14.3.4 Comparing Conditional Effects

14.4 Complications and Confusions in the Study of Interactions

14.4.1 The Difficulty of Detecting Interactions

14.4.2 Confusing Interaction with Curvilinearity

14.4.3 How the Scaling of Y Affects Interaction

14.4.4 The Interpretation of Lower-Order Regression Coefficients When a Cross-Product Is Present

14.4.5 Some Myths about Testing Interaction

14.4.6 Interaction and Nonsignificant Linear Terms

14.4.7 Homogeneity of Regression in ANCOVA

14.4.8 Multiple, Higher-Order, and Curvilinear Interactions

14.4.9 Artificial Categorization of Continua

14.5 Organizing Tests on Interaction

14.5.1 Three Approaches to Managing Complications

14.5.2 Broad versus Narrow Tests

14.6 Chapter Summary

15. Mediation and Path Analysis

15.1 Path Analysis and Linear Regression

15.1.1 Direct, Indirect, and Total Effects

15.1.2 The Regression Algebra of Path Analysis

15.1.3 Covariates

15.1.4 Inference about the Total and Direct Effects

15.1.5 Inference about the Indirect Effect

15.1.6 Implementation in Statistical Software

15.2 Multiple Mediator Models

15.2.1 Path Analysis for a Parallel Mediation Model

15.2.2 Path Analysis for a Serial Mediation Model

15.3 Extensions, Complications, and Miscellaneous Issues

15.3.1 Causality and Causal Order

15.3.2 The Causal Steps Approach

15.3.3 Mediation of a Nonsignificant Total Effect

15.3.4 Multicategorical Independent Variables

15.3.5 Fixing Direct Effects to Zero

15.3.6 Nonlinear Effects

15.3.7 Moderated Mediation

15.4 Chapter Summary

16. Detecting and Managing Irregularities

16.1 Regression Diagnostics

16.1.1 Shortcomings of Eyeballing the Data

16.1.2 Types of Extreme Cases

16.1.3 Quantifying Leverage, Distance, and Influence

16.1.4 Using Diagnostic Statistics

16.1.5 Generating Regression Diagnostics with Computer Software

16.2 Detecting Assumption Violations

16.2.1 Detecting Nonlinearity

16.2.2 Detecting Non-Normality

16.2.3 Detecting Heteroscedasticity

16.2.4 Testing Assumptions as a Set

16.2.5 What about Nonindependence?

16.3 Dealing with Irregularities

16.3.1 Heteroscedasticity-Consistent Standard Errors

16.3.2 The Jackknife

16.3.3 Bootstrapping

16.3.4 Permutation Tests

16.4 Inference without Random Sampling

16.5 Keeping the Diagnostic Analysis Manageable

16.6 Chapter Summary

17. Power, Measurement Error, and Various Miscellaneous Topics

17.1 Power and Precision of Estimation

17.1.1 Factors Determining Desirable Sample Size

17.1.2 Revisiting the Standard Error of a Regression Coefficient

17.1.3 On the Effect of Unnecessary Covariates

17.2 Measurement Error

17.2.1 What Is Measurement Error?

17.2.2 Measurement Error in Y

17.2.3 Measurement Error in Independent Variables

17.2.4 The Biggest Weakness of Regression: Measurement Error in Covariates

17.2.5 Summary: The Effects of Measurement Error

17.2.6 Managing Measurement Error

17.3 An Assortment of Problems

17.3.1 Violations of the Basic Assumptions

17.3.2 Collinearity

17.3.3 Singularity

17.3.4 Specification Error and Overcontrol

17.3.5 Noninterval Scaling

17.3.6 Missing Data

17.3.7 Rounding Error

17.4 Chapter Summary

18. Logistic Regression and Other Linear Models

18.1 Logistic Regression

18.1.1 Measuring a Model’s Fit to Data

18.1.2 Odds and Logits

18.1.3 The Logistic Regression Equation

18.1.4 An Example with a Single Regressor

18.1.5 Interpretation of and Inference about the Regression Coefficients

18.1.6 Multiple Logistic Regression and Implementation in Computing Software

18.1.7 Measuring and Testing the Fit of the Model

18.1.8 Further Extensions

18.1.9 Discriminant Function Analysis

18.1.10 Using OLS Regression with a Dichotomous Y

18.2 Other Linear Modeling Methods

18.2.1 Ordered Logistic and Probit Regression

18.2.2 Poisson Regression and Related Models of Count Outcomes

18.2.3 Time Series Analysis

18.2.4 Survival Analysis

18.2.5 Structural Equation Modeling

18.2.6 Multilevel Modeling

18.2.7 Other Resources

18.3 Chapter Summary

Appendices

A. The RLM Macro for SPSS and SAS

B. Linear Regression Analysis Using R

C. Statistical Tables

D. The Matrix Algebra of Linear Regression Analysis

Author Index

Subject Index

References

About the Authors