Recipes for the Design of Experiments/Chapter 1: One Factor, Two Level Experiments

1.1 One Factor, Two Level Experiments (Shamus W, Alexis Z)

In experimental design, the number of factors and levels dictates how effects are calculated and which statistical inference tests are used. A one factor, two level experiment studies the effect of a single independent variable (the factor) on a dependent variable (the response). Only two levels of the factor are studied in this experiment.

A main effect is the effect that a change in the level of a factor has on the response. In a one factor, two level experiment, the main effect is the difference in the average response caused by changing the factor from one level to the other. An interaction effect occurs when the difference in the response between the levels of one factor depends on the level of another factor. In one factor experiments, there are no interaction effects.

To be able to make inferences about a population, samples of the population are taken. These samples should be randomly selected and randomly assigned to a treatment, and the order of the experimental runs should also be randomized. It is also assumed that the data come from a normal distribution. A normal distribution is one whose probability density is defined by the equation in the figure on the right.

For one factor, two level experiments, a t-test is used to indicate whether the difference between the two sample response averages is due to a difference in the effect of the two levels or to random variation. T-tests are commonly used when the sample size is small. The formula for the t-test statistic is provided in the figure on the right.
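The figures referred to above are not reproduced in this text. As a sketch, the standard textbook forms are given below. The pooled (equal-variance) two-sample form of the statistic is an assumption, since the original figure may have shown a different variant. Here μ and σ are the mean and standard deviation of the distribution, and ȳ_i, S_i² and n_i are the sample mean, sample variance and sample size at level i.

```latex
% Normal probability density
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

% Pooled two-sample t statistic (assumes equal variances at both levels)
t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}
```

Under the null hypothesis of equal means, t_0 follows a t distribution with n_1 + n_2 - 2 degrees of freedom.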

To determine whether we can reject the null hypothesis, the test statistic is compared to the critical t-value. The degrees of freedom and the significance level are used to calculate the critical value. The critical t-value and the test statistic can both be plotted on the distribution of the test statistic. On this distribution, the “upper tail” is the region under the curve that lies above the critical t-value. Under the null hypothesis, we assume that the means (μ) of the two populations are equal: μ1 = μ2. If the test statistic is lower than the critical t-value, we fail to reject the null hypothesis; if it falls in the upper tail, we reject it.
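The comparison of a test statistic with an upper-tail critical value can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not the method used in the recipes: the data are hypothetical, the pooled (equal-variance) statistic is assumed, and the helper names `pooled_t_statistic`, `t_upper_tail` and `t_critical` are made up for this example. The critical value is found numerically rather than read from a table.

```python
import math
import statistics


def pooled_t_statistic(sample_1, sample_2):
    """Return (t statistic, degrees of freedom) for the pooled two-sample t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    pooled_var = (
        (n1 - 1) * statistics.variance(sample_1)
        + (n2 - 1) * statistics.variance(sample_2)
    ) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(sample_1) - statistics.mean(sample_2)) / std_error, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t): Simpson's rule (steps must be even) on [0, |t|], then symmetry."""
    width = abs(t)
    h = width / steps
    total = t_pdf(0.0, df) + t_pdf(width, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def t_critical(alpha, df):
    """Upper-tail critical value: the t with P(T > t) = alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail area still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical responses at the two levels of the factor (not from the text).
level_low = [16.9, 17.2, 16.6, 17.4, 16.8]
level_high = [17.6, 18.0, 17.5, 18.2, 17.9]

alpha = 0.05
t_stat, df = pooled_t_statistic(level_high, level_low)
t_crit = t_critical(alpha, df)
reject_null = t_stat > t_crit  # upper-tail decision rule

print(f"t = {t_stat:.3f}, df = {df}, critical t = {t_crit:.3f}, reject H0: {reject_null}")
```

In practice a statistics package would supply the critical value or p-value directly; the numerical integration here only keeps the sketch self-contained.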

1.2 Main Effects (Liang Z, Joonhyuk B)

This section discusses main effects in experimental design. When treatments have a factorial structure, we can use a factorial analysis of the data, whose central concepts are main effects and interactions. When we design an experiment, we cannot avoid thinking about which factors drive the response. For example, when economists study differences in average hourly earnings (AHE) among people, they need to consider whether education level, gender, or other factors cause the difference. In pharmaceutical science, scientists need to consider whether a particular medicine works for people. Furthermore, in the tennis racket experiment discussed in class, we could change string tension, racket mass, racket balance, and ball hardness all at the same time in order to check their effects on power (the main effects). However, results based on main effects alone cannot be relied on when the factors interact: we cannot be sure which combination of factors gives the greatest power, because that analysis does not consider interaction effects between factors. In this chapter, we start from the beginning: One Factor, Two Level Experiments. We will see what main effects are and how we test them.

1.2.1. Concept of Main Effect

A main effect is the effect of an independent variable on a dependent variable, averaged across the levels of any other factors. In other words, main effects are differences in means over the levels of one factor, collapsed over the levels of the other factors. For example, in an experiment with two factors, Method and Experience, the main effect of Method is simply the difference between the mean final exam scores for the two levels of Method, ignoring (collapsing over) Experience. In One Factor, Two Level Experiments, we consider only one factor with two levels. For instance, in the AHE example above, if we want to test whether a college education has an effect on AHE, we have one factor, College Degree, with two levels (Bachelor = 1, High School or Below = 0). Later chapters may also consider other factors, such as gender and district; we do not include them in this chapter.

1.2.2. Experiment Design

At this point, you may have a question: how can we design the experiment so that the result is close to the truth? You may have noted that other effects may bias our test result if we do not deal with them properly. Here, we can use a completely randomized design (one-factor). In a completely randomized design, there is only one factor, and subjects are randomly assigned to treatments. Because of the random assignment, we do not need to model individual differences between subjects, and the experiment has a one-way treatment structure, such as the effect of fertilizer on wheat production. We can use a t-test if the factor has two levels, and an F-test if the factor has three or more levels. This method is discussed in detail in a later section of this chapter.

In our AHE example above, we may randomly sample people who went to college and people who did not, and then analyze the difference in their earnings. In the pharmaceutical science example, we may select people and split them randomly into two groups, one receiving the medicine and the other a placebo.

For another example, consider a study in which 10-year-olds and 17-year-olds are given IQ tests. The students are selected completely at random, without regard to their actual test scores, to see if teacher expectations alone have an impact on student performance. Age is included as a second factor to see if teacher expectations have a different effect depending on the age of the student. This would be a 2 (teacher expectations: high or average) x 2 (age of student: 10 years or 17 years) factorial design, which goes beyond the one factor case.
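Random assignment in a completely randomized design can be sketched as follows. This is a minimal illustration for the medicine versus placebo example; the subject labels, the group size of 10, the seed and the function name `completely_randomized_assignment` are all assumptions made for the example.

```python
import random


def completely_randomized_assignment(subjects, treatments, seed=None):
    """Shuffle the subjects and split them into equal-sized treatment groups."""
    if len(subjects) % len(treatments) != 0:
        raise ValueError("subjects must divide evenly among treatments")
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    group_size = len(shuffled) // len(treatments)
    return {
        treatment: shuffled[i * group_size:(i + 1) * group_size]
        for i, treatment in enumerate(treatments)
    }


# Hypothetical subject labels (not from the text).
subjects = [f"subject_{i:02d}" for i in range(1, 21)]
groups = completely_randomized_assignment(subjects, ["medicine", "placebo"], seed=1)

# The order of the experimental runs is randomized as well.
run_order = list(subjects)
random.Random(2).shuffle(run_order)

for treatment, members in groups.items():
    print(treatment, members)
print("run order:", run_order)
```

Because every subject has the same chance of landing in either group, differences between individuals are spread across the two treatments rather than confounded with them.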

1.3 Interaction Effects (Andreas V, Prasanna D)

Interaction describes the failure of one factor to produce the same effect on the response at different levels of another factor. This is a major issue with the one-factor-at-a-time (OFAT) approach, which cannot detect interactions. The interaction effect between factors can be either substantial or virtually non-existent, and it is important to determine which factors influence each other and how strongly. Factorial experiments are designed so that interaction effects can be estimated. An example of a strong interaction effect could be racket weight vs. power, and an example of a very weak or absent interaction effect may be the type of hat worn by a golfer and the type of driver used.

Let us understand the significance of interaction effects and their computation by means of an example. Suppose we have a group of 100 students who have been randomly divided into four groups of 25 students each, as follows:
 * Group 1 studies during the day and takes a math test, obtaining a mean score of 13.
 * Group 2 studies during the day and takes a social science test, obtaining a mean score of 9.
 * Group 3 studies at night and takes a math test, obtaining a mean score of 9.
 * Group 4 studies at night and takes a social science test, obtaining a mean score of 11.

As can be seen here, we have two factors: 'Study Time' and 'Subject'. 'Study Time' has two levels: 'Day' and 'Night', and 'Subject' has two levels: 'Math' and 'Social Science'. The main effect of both 'Study Time' and 'Subject' is -1. The interaction effect can be computed as follows:
 * 1) Take the mean of (Night, Math) and (Day, Social Science), which is equal to 9.
 * 2) Take the mean of (Night, Social Science) and (Day, Math), which is equal to 12.
 * 3) Subtract (2) from (1), which gives 9 - 12 = -3.

Thus, in this example, the interaction effect is much greater in magnitude than either of the two main effects.
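The arithmetic of this example can be checked with a short Python sketch. The cell means are the four group means given above. The sign conventions (Night minus Day, Social Science minus Math) are an assumption, chosen because they reproduce the main effect of -1 quoted in the text.

```python
# Mean scores of the four groups, keyed by (study time, subject).
cell_means = {
    ("Day", "Math"): 13,
    ("Day", "Social Science"): 9,
    ("Night", "Math"): 9,
    ("Night", "Social Science"): 11,
}


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    values = list(values)
    return sum(values) / len(values)


def level_mean(factor_index, level):
    """Mean response at one level of a factor, collapsed over the other factor."""
    return mean(v for key, v in cell_means.items() if key[factor_index] == level)


# Main effects: difference between the two level means of each factor.
study_time_effect = level_mean(0, "Night") - level_mean(0, "Day")
subject_effect = level_mean(1, "Social Science") - level_mean(1, "Math")

# Interaction: steps 1) to 3) from the text.
step_1 = mean([cell_means[("Night", "Math")], cell_means[("Day", "Social Science")]])
step_2 = mean([cell_means[("Night", "Social Science")], cell_means[("Day", "Math")]])
interaction_effect = step_1 - step_2

print("Study Time main effect:", study_time_effect)
print("Subject main effect:", subject_effect)
print("Interaction effect:", interaction_effect)
```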

1.4 Simple Two-way Comparative Experiments (Trilce E, Bjarke H)

Simple two-way comparative experiments evaluate the effects of two different treatments on subsets of a population. The subjects of the experiments are grouped into pairs, based on some blocking variable. By assuming that the two groups are probabilistically equivalent, we can use random assignment to select the individuals or samples that receive the treatment within each pair. Consider the example of a study of Portland cement mortar. The engineer in charge of the study has prepared two sets of 10 samples each, with one set receiving the treatment. In this case, he added a polymer latex emulsion to determine whether it affects the curing time and tension strength of the mortar. The factor is the mortar formulation, and the two levels are "modified" and "unmodified". The observations are shown in the following box plot.

Link to box plot: http://imgur.com/AMTyEQl

To evaluate the treatment effect, the statistical technique of hypothesis testing allows the two formulations to be compared on objective terms. By using hypothesis tests, such as the t-test or one-way Analysis of Variance (ANOVA), as well as confidence interval procedures, we can compare the two treatment means to determine whether the populations differ due to the effect of the treatment or due to random chance.
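As an illustration of the confidence interval procedure, the sketch below computes a 95% confidence interval for the difference between two treatment means with 10 samples per level, as in the mortar example. The observations are hypothetical, since the actual data appear only in the box plot, and the pooled (equal-variance) interval is assumed. The value 2.101 is the two-sided 5% critical value of the t distribution with 18 degrees of freedom, taken from a standard t table.

```python
import math
import statistics

# Hypothetical strength measurements (not the data from the study).
modified = [16.8, 16.4, 17.2, 16.4, 16.5, 17.0, 17.0, 17.1, 16.6, 16.6]
unmodified = [17.5, 17.6, 18.2, 17.9, 17.8, 17.7, 18.1, 17.9, 18.0, 18.3]

n1, n2 = len(modified), len(unmodified)
df = n1 + n2 - 2  # 18 degrees of freedom for two samples of 10

# Pooled variance estimate, assuming both formulations have equal variance.
pooled_var = (
    (n1 - 1) * statistics.variance(modified)
    + (n2 - 1) * statistics.variance(unmodified)
) / df
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

T_CRITICAL_975_DF18 = 2.101  # from a t table: two-sided alpha = 0.05, df = 18

difference = statistics.mean(modified) - statistics.mean(unmodified)
margin = T_CRITICAL_975_DF18 * std_error
lower, upper = difference - margin, difference + margin

print(f"difference in means: {difference:.3f}")
print(f"95% confidence interval: ({lower:.3f}, {upper:.3f})")
print("interval excludes zero:", not (lower <= 0 <= upper))
```

If the interval excludes zero, the two-sided t-test at the 5% level rejects the hypothesis of equal means; if it contains zero, the test fails to reject it.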

1.5 The t-test and the 1-way Analysis of Variance (ANOVA) (TC)

A t-test is an inferential statistic that is used to “draw conclusions about the properties of populations from related properties of the sample.” (Dunn, 10/16, Introduction to the Design of Experiments) The paired t-test is a standard procedure for testing null hypotheses about the difference between two means. The formula for a one sample t-test can be found at the following link: http://imgur.com/gallery/miVeT A t-test is generally used to determine whether two data sets are significantly different from one another.

ANOVA, which stands for Analysis of Variance, is Fisher’s statistical method for analyzing factorial experiments. It takes into account all possible combinations of factors and levels, each tested in its own experimental run. ANOVA can be used to help determine main effects. One-way ANOVA is the special case for computing the main effect of a single factor on a response variable. In other words, “with ANOVA, an inference procedure is included to assess the likelihood that there is a model relationship between the factor(s) and response variable that is something other than randomization.” (Dunn, 10/16, Introduction to the Design of Experiments)
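For a single factor with two levels, the one-way ANOVA F statistic equals the square of the pooled two-sample t statistic, so the two tests lead to the same conclusion. The sketch below checks this numerically. The data are hypothetical and the function names `one_way_anova_f` and `pooled_t` are made up for the example.

```python
import math
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (value - statistics.mean(group)) ** 2 for group in groups for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def pooled_t(sample_1, sample_2):
    """Pooled (equal-variance) two-sample t statistic."""
    n1, n2 = len(sample_1), len(sample_2)
    pooled_var = (
        (n1 - 1) * statistics.variance(sample_1)
        + (n2 - 1) * statistics.variance(sample_2)
    ) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(sample_1) - statistics.mean(sample_2)) / std_error


# Hypothetical responses at the two levels of one factor (not from the text).
level_a = [16.9, 17.2, 16.6, 17.4, 16.8]
level_b = [17.6, 18.0, 17.5, 18.2, 17.9]

f_stat, df_between, df_within = one_way_anova_f([level_a, level_b])
t_stat = pooled_t(level_a, level_b)

print(f"F = {f_stat:.3f} on ({df_between}, {df_within}) degrees of freedom")
print(f"t = {t_stat:.3f}, t squared = {t_stat ** 2:.3f}")
```

With three or more levels the t-test no longer applies directly, while the same F computation still does.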

1.6 Sample Recipes

This is a collection of R recipes for the analysis of one factor, two level experiments. Each of these recipes is structured according to the Setting, Design, Analysis paradigm, and the Design portion is structured in four parts: Exploratory Analysis, Testing, Estimation, and Model Adequacy Checking.

The Setting, Design, Analysis Paradigm

The Project Outline
 * 1) Exploratory Analysis
 * 2) Testing
 * 3) Estimation (of Parameters)
 * 4) Model Adequacy Checking

Sample Recipes

-> These data are a collection of various measures, including variables such as elevation, temperature (surface and air), ozone, air pressure, and cloud cover. For this section, a t-test was conducted to explore whether a statistically significant difference existed between the two temperature variables. The H0 was that no difference existed between the mean values of the two temperature variables, whereas the HA was that a difference existed. Based on the results of the test, the H0 was rejected at an alpha value of .05. However, after model adequacy checking was performed, the results were invalidated because the t-test assumes normality and the Shapiro-Wilk normality test indicated that the data are not normal. Further steps to "coerce" the data toward normality include performing data transformations, but this has not yet been completed. http://rpubs.com/manzat/28671

-> The following is an analysis of a data set that includes wind speed, barometric pressure, longitude, latitude and time points for a large number of storms recorded by NASA. In my analysis I decided to examine whether there was a statistically significant difference in the wind speed and pressure readings between hurricanes and extratropical storms. This analysis was performed using a t-test, with a QQ plot as a check for model adequacy. http://rpubs.com/adamato/28910

-> Using fuel economy data from the EPA collected from 1985-2015, a one-factor, two-level experiment is performed to see if the “make” of a vehicle has a statistically significant effect on the fuel economy of that vehicle. The two “makes” that are considered in this analysis include ‘Toyota’ and ‘Audi’. Additionally, this analysis separately considers two different metrics for measuring fuel economy, including both highway fuel economy [in mpg] (“hwy”) and city fuel economy [in mpg] (“cty”). Upon performing this analysis, it was determined that the fact that these two vehicles are classified by different “makes” likely does appear to have an effect on the average city fuel economy that is achieved when driving either “make” of vehicle. However, the fact that these two vehicles are classified by different “makes” does not appear to have an effect on the average highway fuel economy that is achieved when driving either “make” of vehicle. - Brendan Howell http://rpubs.com/howelb/29127

-> The following analysis is based on the EPA fuel economy data from 1985 to 2015. To explore the effect of vehicle engine power (number of cylinders) on fuel economy, a one factor (number of cylinders), two level (cyl = 4 and cyl = 6) experiment was carried out and a linear regression model was estimated to quantify the marginal effect. The result shows that vehicle fuel economy decreases with the number of cylinders in the vehicle. http://rpubs.com/serena049/doehw1

-> The html file below uses fuel economy data collected by the EPA from 1985 to 2015. Specific data are analyzed in a one-factor, two-level experiment to determine whether variation in the make of a vehicle is responsible for variation in highway gas mileage. The factor, "make," is grouped into two levels, "Toyota" and "Honda," with a response variable of "hwy" (the highway mileage). An unpaired, two-sample t-test is conducted, with a null hypothesis that the mean highway mileages of the two vehicle makes are equal. http://rpubs.com/maxwinkelman/28916

-> This analysis uses the ‘nycflights13’ data set, which contains information about all flights that departed from NYC (JFK, EWR and LGA) in 2013. There are 336,776 flights in total. In order to study the underlying causes of flight delay, an experiment was conducted that can be termed a ‘one factor, two level’ experiment. Here ‘time-delay’ is the factor, and ‘departure time-delay’ and ‘arrival time-delay’ are the two levels. We conduct a t-test of the null hypothesis that the mean departure delay is equal to the mean arrival delay. This is essentially to test whether there are other factors that contribute to the delay of an aircraft apart from the delay in its take-off time. For this purpose we take a subset of the data (only UA flights departing from JFK). Our t-test leads to the rejection of the null hypothesis. However, our exploratory data analysis and diagnostic checks lead us to the conclusion that the data are not normally distributed. The assumption on which the t-test is based therefore does not hold, and we conclude that the t-test is not valid for our current data set. http://rpubs.com/Uzma_1004/28917

-> The following analysis is based on the data set called 'nasaweather', collected from the National Hurricane Center. It uses the classification of different types of storms and their respective air pressure measurements to perform a t-test. As you will see in the analysis, the results show that we reject the null hypothesis that the mean for Tropical Storms equals the mean for Hurricanes. http://rpubs.com/hsiac/28942

-> The following analysis investigates whether electric/hybrid cars are actually more fuel efficient than cars with more traditional gasoline engines. The analysis looked at both highway and city driving, using data from an EPA fuel economy dataset covering 1985 to 2015. Individual analyses of highway and city driving were conducted in a one-factor, two-level manner using t-tests. Cars with electric/hybrid engines were determined to get significantly better gas mileage than cars with traditional gasoline engines on the highway as well as around the city. http://rpubs.com/JohnMariani/28953

-> The following link uses the R data "storms" from the "nasaweather" package. "Storms" includes 4 observations a day for every named storm from 1995 to 2000. The storm name, year, month, day, hour, latitude, longitude, pressure, wind speed, and type of storm were recorded for each observation. This analysis focuses on only two of the observations a day (hour 0 and 12) and uses one factor, two level testing to examine whether the variation in wind could be explained by the hour of the day. Exploratory data analysis, a t-test, and normality testing are performed. There is also a section at the end that covers contingencies if the assumptions are violated. http://rpubs.com/svoboa/28937

-> The following analysis is based on the 'nycflights13' dataset, which contains information on flights from NYC, including JFK, EWR and LGA. Plane information and weather conditions are also recorded in this dataset. It is well known that several airlines are often not on time, and this analysis aims to test the effect of the airline on delay time and to help passengers choose the best one. We take the test for Delta Airlines as an example, treating it as a 'one factor, two level' model where 'departure delay' and 'arrival delay' are the two levels. To block possible confounding factors, we select only flights from JFK. The t-test shows no relationship between the two; however, subsequent tests reveal that the data are not normally distributed, so it is not correct to conduct a t-test in this case. This analysis is a first version and still needs to be improved in the future. http://rpubs.com/chenh16/29271

-> The following is an experiment designed to investigate the relationship between year and storm wind speed in the 'nasaweather' dataset. The experiment looks into the normality of the data and also attempts to discover whether there is a significant difference between the means via a two-sample t-test. http://rpubs.com/macchm/30638

-> This experiment tests whether there are differences between the Mean Surface Temperature from Clear Sky Composite and the Mean Near-Surface Air Temperature in the month of January. The Mean Surface Temperature from Clear Sky Composite is the monthly mean temperature based on the energy being emitted from the Earth’s surface under clear sky conditions in K. The Mean Near-Surface Air Temperature is the monthly mean temperature of the air near the surface of the Earth in K. http://rpubs.com/hsiac/30643 -by Cheryl Tran

-> The following link is an experiment that uses flight data from NYC, in particular the date of the flight, departure and arrival times, departure and arrival locations and some information about the plane. The experiment examines the effect of the origin on the departure delay time, the assumption being that a flight's delay is due to problems at the departure airport. The study first explores the data with summary statistics, box plots and histograms. The data are analyzed using t-tests, and the normality of the data is also checked using QQ plots and the Shapiro-Wilk test. Lastly, it addresses possible contingencies in the experiment. http://rpubs.com/Tothk2/31630 -By Kevin Toth

-> The following analysis examined flight data from planes leaving from three of New York City's major airports. The data include variables such as the date, the time of arrival/departure, the arrival/departure delays, the airline carrier, the origin and destination, along with multiple others. In particular, this experiment analyzed delays and how they are related to the origin and departure city, along with the specific airline carrier responsible for the flights. A one factor, two level t-test was completed for two different airline carriers and two different origin locations. Unfortunately, after completing a Shapiro-Wilk test for normality, it was found that the data are not normally distributed and would therefore need additional testing using nonparametric methods. http://rpubs.com/braunj6/31855

-> The following analysis of a one factor two level experiment uses a t-test to test the difference in means of departure delays across two of NYC's airports. http://rpubs.com/konraz/39536