Social Statistics/Key Terms

This is a list of the key terms from each chapter, for the sake of convenience.
 * Chapter 1:
 * Conceptualization is the process of developing a theory about some aspect of the social world.
 * Cases are the individuals or entities about which data have been collected.
 * Databases are arrangements of data into variables and cases.
 * Dependent variables are variables that are thought to depend on other variables in a model.
 * Generalization is the act of turning theories about specific situations into theories that apply to many situations.
 * Independent variables are variables that are thought to cause the dependent variables in a model.
 * Metadata are additional attributes of cases that are not meant to be included in analyses.
 * Operationalization is the process of turning a social theory into specific hypotheses about real data.
 * Scatter plots are very simple statistical models that depict data on a graph.
 * Statistical models are mathematical simplifications of the real world.
 * Variables are analytically meaningful attributes of cases.
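The Chapter 1 terms can be illustrated with a tiny sketch (the data are made up, not from the text): a "database" arranges data into cases and variables, and choosing which variable depends on which turns the data into the raw material for a model.

```python
# Hypothetical database: each dict is a case, each key is a variable.
# "name" acts like metadata: it identifies the case but is not analyzed.
database = [
    {"name": "A", "income": 30, "life_expectancy": 72},
    {"name": "B", "income": 45, "life_expectancy": 78},
    {"name": "C", "income": 20, "life_expectancy": 68},
]

# If we theorize that life expectancy depends on income, income is the
# independent variable and life expectancy is the dependent variable.
x = [case["income"] for case in database]           # independent variable
y = [case["life_expectancy"] for case in database]  # dependent variable
```

Plotting `x` against `y` on a graph would give a scatter plot, the simplest statistical model in the list above.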


 * Chapter 2:
 * Expected values are the values that a dependent variable would be expected to have based solely on values of the independent variable.
 * Linear regression models are statistical models in which expected values of the dependent variable are thought to rise or fall in a straight line according to values of the independent variable.
 * Outliers are data points in a statistical model that are far away from most of the other data points.
 * Regression error is the degree to which an expected value of a dependent variable in a linear regression model differs from its actual value.
 * Robustness is the extent to which statistical models give similar results despite changes in operationalization.
 * Slope is the change in the expected value of the dependent variable divided by the change in the value of the independent variable.
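The Chapter 2 terms fit together in one calculation. A minimal sketch on made-up data (the ordinary least-squares formulas, which the chapter's linear regression models use):

```python
# Made-up data for a simple linear regression.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope: change in the expected value of y per unit change in x.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
intercept = mean_y - slope * mean_x

# Expected values lie on the straight line; regression errors are the
# differences between actual and expected values.
expected = [intercept + slope * xi for xi in x]
errors = [yi - ei for yi, ei in zip(y, expected)]
```

A data point whose error is much larger than the others would be an outlier.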


 * Chapter 3:
 * Extrapolation is the process of using a regression model to compute predicted values outside the range of the observed data.
 * Intercepts are the places where regression lines cross the dependent variable axis in a scatter plot.
 * Interpolation is the process of using a regression model to compute predicted values inside the range of the observed data.
 * Predicted values are expected values of a dependent variable that correspond to selected values of the independent variable.
 * Regression coefficients are the slopes and intercepts that define regression lines.
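The distinction between interpolation and extrapolation can be sketched with hypothetical regression coefficients (the intercept and slope here are invented for illustration):

```python
# Suppose a fitted regression line has these coefficients, estimated on
# observed x values ranging from 1 to 5.
intercept, slope = 0.14, 1.96

def predicted(x):
    """Predicted value of the dependent variable for a selected x."""
    return intercept + slope * x

inside = predicted(2.5)   # interpolation: 2.5 lies inside the observed range
outside = predicted(10)   # extrapolation: 10 lies outside it, so riskier
```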


 * Chapter 4:
 * Conditional means are the expected values of dependent variables for specific groups of cases.
 * Degrees of freedom are the number of errors in a model that are actually free to vary.
 * Mean models are very simple statistical models in which a variable has just one expected value, its mean.
 * Means are the expected values of variables.
 * Parameters are the figures associated with statistical models, like means and regression coefficients.
 * Regression error standard deviation is a measure of the amount of spread in the error in a regression model.
 * Standard deviation is a measure of the amount of spread in a variable, which is the same thing as the amount of spread in the error in a mean model.
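A short sketch of the Chapter 4 terms on made-up data: in a mean model the mean is the single expected value, and the standard deviation measures the spread of the errors around it.

```python
# Made-up values of a single variable.
y = [4, 7, 5, 8, 6]

n = len(y)
mean = sum(y) / n                 # the one expected value in a mean model
errors = [yi - mean for yi in y]  # errors in the mean model

# Degrees of freedom: estimating the mean uses up one, leaving n - 1
# errors that are actually free to vary.
df = n - 1
std_dev = (sum(e ** 2 for e in errors) / df) ** 0.5
```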


 * Chapter 5:
 * Case-specific error is error resulting from any of the millions of influences and experiences that may cause a specific case to have a value that is different from its expected value.
 * Descriptive statistics is the use of statistics to describe the data we actually have in hand.
 * Inferential statistics is the use of statistics to make conclusions about characteristics of the real world underlying our data.
 * Measurement error is error resulting from accidents, mistakes, or misunderstandings in the measurement of a variable.
 * Observed parameters are the actually observed values of parameters like means, intercepts, and slopes based on the data we actually have in hand.
 * Sampling error is error resulting from the random chance of which research subjects are included in a sample.
 * Standard error is a measure of the amount of error associated with an observed parameter.
 * True parameters are the true values of parameters like means, intercepts, and slopes based on the real (but unobserved) characteristics of the world.
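The gap between observed and true parameters is what the standard error measures. A hedged sketch on invented sample data, using the common formula for the standard error of a mean (standard deviation divided by the square root of the sample size):

```python
# Invented sample; the observed mean is our estimate of the true mean.
sample = [4, 7, 5, 8, 6]

n = len(sample)
mean = sum(sample) / n   # observed parameter
std_dev = (sum((x - mean) ** 2 for x in sample) / (n - 1)) ** 0.5

# Standard error: roughly how far the observed mean tends to fall from
# the true (but unobserved) mean due to sampling error.
std_error = std_dev / n ** 0.5
```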


 * Chapter 6:
 * Paired samples are databases in which each case represents two linked observations.
 * Statistical significance is when a statistical result is so large that it is unlikely to have occurred just by chance.
 * Substantive significance is when a statistical result is large enough to be meaningful in the view of the researcher and society at large.
 * t statistics are measures based on observed parameters that are used to make specific inferences about the probabilities of true parameters.
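A minimal sketch of a t statistic on made-up data: the one-sample version divides an observed mean by its standard error, testing the hypothesis that the true mean is zero.

```python
# Made-up sample data.
sample = [1.2, 0.8, 1.5, 0.9, 1.1]

n = len(sample)
mean = sum(sample) / n
std_dev = (sum((x - mean) ** 2 for x in sample) / (n - 1)) ** 0.5
std_error = std_dev / n ** 0.5

# t statistic: observed mean in units of its standard error. A large |t|
# suggests the result is statistically significant, i.e. unlikely to have
# occurred just by chance if the true mean were zero.
t = mean / std_error
```

Whether the result also matters in practice is a separate question of substantive significance.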


 * Chapter 7:
 * Complementary controls are control variables that complement an independent variable of interest by unmasking its explanatory power in a multiple regression model.
 * Competing controls are control variables that compete with an independent variable of interest by splitting its explanatory power in a multiple regression model.
 * Control variables are variables that are "held constant" in a multiple regression analysis in order to highlight the effect of a particular independent variable of interest.
 * Multicausal models are statistical models that have one dependent variable but two or more independent variables.
 * Multiple linear regression models are statistical models in which expected values of the dependent variable are thought to rise or fall in straight lines according to values of two or more independent variables.
 * Predictors are the independent variables in regression models.


 * Chapter 8:
 * Correlation (r) is a measure of the strength of the relationship between two variables that runs from r = −1 (perfect negative correlation) through r = 0 (no correlation) to r = +1 (perfect positive correlation).
 * R2 is a measure of the proportion of the total variability in the dependent variable that is explained by a regression model.
 * Standardized coefficients are the coefficients of regression models that have been estimated using standardized variables.
 * Standardized variables are variables that have been transformed by subtracting the mean from every observed value and then dividing by the standard deviation.
 * Unstandardized coefficients are the coefficients of regression models that have been estimated using original unstandardized variables.
 * Unstandardized variables are variables that are expressed in their original units.
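The Chapter 8 terms connect neatly in code. A sketch on invented data: standardize both variables, then compute the correlation r as the average cross-product of the standardized values (and R2 as its square in the two-variable case).

```python
# Invented data for two variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def standardize(values):
    """Subtract the mean from every value, then divide by the standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

zx, zy = standardize(x), standardize(y)

# Correlation: runs from -1 through 0 to +1.
r = sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)
r_squared = r ** 2   # proportion of variability in y explained by x
```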


 * Chapter 9:
 * Base models are initial models that include all of the background independent variables in an analysis that are not of particular theoretical interest for a regression analysis.
 * Confounding variables are variables that might affect both the dependent variable and an independent variable of interest.
 * Explanatory models are regression models that are primarily intended to be used for evaluating different theories for explaining the differences between cases in their values of the dependent variable.
 * Parsimony is the virtue of using simple models that are easy to understand and interpret.
 * Predictive models are regression models that are primarily intended to be used for making predictions about dependent variables as outcomes.
 * Saturated models are final models that include all of the variables used in a series of models in an analysis.


 * Chapter 10:
 * Analysis of variance (ANOVA) is a type of regression model that focuses on the proportion of the total variability in a dependent variable that is explained by a categorical variable.
 * ANOVA variables are the numerical variables in a regression model that together describe the effects of categorical group memberships.
 * Categorical variables are variables that divide cases into two or more groups.
 * Mixed models are regression models that include both ANOVA components and ordinary independent variables.
 * Numerical variables are variables that take numerical values that represent meaningful orderings of the cases from lowest to highest.
 * Reference groups are the groups that are set aside in ANOVA variables and not explicitly included as variables in ANOVA models.
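A hedged sketch of how a categorical variable becomes ANOVA variables (the group labels are invented): each non-reference group gets a 0/1 indicator, and the reference group is set aside.

```python
# Invented categorical variable dividing five cases into groups.
region = ["north", "south", "west", "south", "north"]

groups = sorted(set(region))   # ["north", "south", "west"]
reference = groups[0]          # "north" is set aside as the reference group

# One 0/1 ANOVA variable per non-reference group.
indicators = {
    g: [1 if r == g else 0 for r in region]
    for g in groups if g != reference
}
```

Cases with zeros in every indicator belong to the reference group.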


 * Chapter 11:
 * Interaction effects are the coefficients of the interaction variables in an interaction model.
 * Interaction models are regression models that allow the slopes of some variables to differ for different categorical groups.
 * Interaction variables are variables created by multiplying an ANOVA variable by an independent variable of interest.
 * Intercept effects are the coefficients of the ANOVA variables in an interaction model.
 * Main effects are the coefficients of the independent variable of interest in an interaction model for the reference group.
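The Chapter 11 terms can be sketched with invented data: an interaction variable is just the product of an ANOVA variable and an independent variable of interest.

```python
# Invented data.
x = [10, 20, 30, 40]    # independent variable of interest
female = [1, 0, 1, 0]   # ANOVA variable (reference group: male)

# Interaction variable: ANOVA variable multiplied by x, which lets the
# slope of x differ for the indicated group.
interaction = [f * xi for f, xi in zip(female, x)]

# In a regression of y on x, female, and interaction:
#   coefficient of x           -> main effect (slope for the reference group)
#   coefficient of female      -> intercept effect
#   coefficient of interaction -> interaction effect (extra slope for females)
```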