q1

which consists of 100 homes purchased in 2018. It includes variables for the number of bedrooms, the number of bathrooms, whether the house has a pool or garage, the age, size and price of the home, what the house is constructed from, and the appraisals in 2016 and 2017.

**Assigned Problem 1:** Real estate professionals have claimed that swimming pools do not increase the value of a home. Conduct a hypothesis test for two independent samples to determine whether the mean sale price differs for homes with and without a pool. Use a .05 significance level. Describe your findings using proper statistical language. Note that this problem requires an assumption that homes with a pool are otherwise not much different from those without one; i.e., if those with a pool are in better locations, made of better materials, newer, and larger, then those homes will be worth more and it will not have anything to do with the pool. For the sake of simplicity, we will assume that the homes with and without a pool have similar variability in them.
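
One way to set up the equal-variance (pooled) two-sample t-test described above is sketched here. The function name `pooled_t` and the price lists are hypothetical placeholders, not values from the data file. The sketch computes only the test statistic and degrees of freedom; these would then be compared with a two-tailed critical value at the .05 level, or passed to statistical software for a p-value.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for an equal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: each sample variance weighted by its own degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_error, df


# Placeholder sale prices in $1000s (assumption: NOT taken from the data file).
with_pool = [250.0, 310.0, 275.0, 298.0, 260.0]
without_pool = [240.0, 295.0, 255.0, 270.0, 265.0, 280.0]

t_stat, df = pooled_t(with_pool, without_pool)
print(f"t = {t_stat:.3f}, df = {df}")
```

Because the alternative hypothesis is "different" rather than "greater", the comparison uses the absolute value of t against the two-tailed critical value.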

**Assigned Problem 2:** We would like to find out how different two real estate agents can be in their appraisals of a property. Conduct a hypothesis test for paired samples and test if there is a difference in the mean appraisal prices given by these agents on the same homes. Use a .05 significance level. Describe your findings using proper statistical language.
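
A paired test works on the per-home differences rather than on the two columns separately. The sketch below assumes the two appraisals are available as equal-length lists in the same home order; `paired_t` and the sample values are hypothetical and only illustrate the arithmetic.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for a paired-samples t-test."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # Each home contributes one difference between its two appraisals.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    std_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / std_error, n - 1


# Placeholder appraisals in $1000s (assumption: NOT taken from the data file).
appraisal_a = [235.0, 210.0, 231.0, 242.0, 205.0, 230.0]
appraisal_b = [228.0, 205.0, 219.0, 240.0, 198.0, 223.0]

t_stat, df = paired_t(appraisal_a, appraisal_b)
print(f"t = {t_stat:.3f}, df = {df}")
```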

**Assigned Problem 3:** If people are going to invest in their homes by constructing them out of brick, are they going to take the plunge and install a swimming pool? Conduct a hypothesis test of proportions to determine whether homes made of brick are more likely to have a swimming pool than homes made of other materials. Use a .05 significance level. Describe your findings using proper statistical language.
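
Since the question asks whether brick homes are *more* likely to have a pool, a right-tailed two-proportion z-test is one reasonable reading. The sketch below uses the pooled proportion under the null hypothesis; the function name and the counts are hypothetical placeholders.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Right-tailed z-test of H1: p1 > p2, using the pooled proportion under H0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    # Right-tail area under the standard normal curve.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Placeholder counts (assumption: NOT taken from the data file):
# brick homes with a pool / all brick homes, then the same for other materials.
z, p_value = two_proportion_z(x1=18, n1=40, x2=20, n2=60)
print(f"z = {z:.3f}, one-tailed p = {p_value:.4f}")
```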

**Assigned Problem 4:** You might expect that homes with more bedrooms are worth more since they are probably larger, but is there more to the value; e.g., location, construction, or age? Using the sample of 100 homes in the data file, conduct a hypothesis test using Analysis of Variance (ANOVA) to determine if there is a difference in the mean sale price of homes with two bedrooms versus those with three, four or five bedrooms. Use a .05 significance level. Since this sample includes homes of varying sizes, at different locations, and made of different materials, it would be reasonable to assume that location and construction are not factors in this test. Describe your findings as you do on the other problems.
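
The one-way ANOVA F statistic compares the variation between the bedroom groups with the variation inside them. The following sketch, with a hypothetical `one_way_anova` helper and placeholder prices, shows the sums of squares involved; the resulting F would be compared with the critical F value for the returned degrees of freedom at the .05 level.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder sale prices in $1000s by bedroom count (assumption: NOT from the data file).
prices_by_bedrooms = {
    2: [180.0, 195.0, 210.0],
    3: [220.0, 240.0, 235.0, 250.0],
    4: [260.0, 255.0, 290.0],
    5: [300.0, 310.0, 280.0],
}
f_stat, df_b, df_w = one_way_anova(list(prices_by_bedrooms.values()))
print(f"F = {f_stat:.3f}, df = ({df_b}, {df_w})")
```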

q2

which consists of 50 states plus D.C. in 2010. It includes variables regarding the names of the states, whether the population density is relatively low / moderate / high, whether gun ownership by residents is relatively low / moderate / high, and whether murders committed by guns are relatively low / moderate / high.

**Assigned Problem 1:** In 2000, gun ownership in the U.S. had shown a decline from prior years, with about 10% of the states having high gun ownership, about 30% having moderate ownership, and about 60% having low ownership. Since then we have seen 9/11 and other terrorist acts as well as much discussion about tightened gun control. Conduct a Chi Square Goodness-of-Fit test to determine if the 2010 distribution fits the pattern from a decade earlier. Use a .05 significance level. Describe your findings.
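
For the goodness-of-fit test, the expected counts come from applying the 2000 proportions (10% high, 30% moderate, 60% low) to the 51 observations. The sketch below uses placeholder observed counts and a hypothetical `chi_square_gof` helper. With three categories there are 2 degrees of freedom, and for that case the chi-square right-tail probability has the closed form exp(-x/2), which is used here in place of a table lookup.

```python
from math import exp


def chi_square_gof(observed, expected_props):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1, expected


# Order: high, moderate, low. Proportions are the 2000 pattern from the problem.
props_2000 = [0.10, 0.30, 0.60]
# Placeholder 2010 counts summing to 51 (assumption: NOT taken from the data file).
observed_2010 = [10, 15, 26]

chi2, df, expected = chi_square_gof(observed_2010, props_2000)
# Valid only because df == 2: the chi-square survival function is exp(-x / 2).
p_value = exp(-chi2 / 2)
print(f"expected = {[round(e, 1) for e in expected]}")
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
```

Note that the smallest expected count is 51 × 0.10 = 5.1, which sits right at the common rule-of-thumb minimum of 5 for this test.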

**Assigned Problem 2:** The argument for gun control is still brewing, with the main dispute being whether gun control results in more or fewer gun-related deaths. Conduct a Chi Square Test-of-Independence to determine if gun-related deaths are dependent on the prevalence of gun ownership. Use a .05 significance level. Describe your findings.
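
The test of independence starts from a contingency table of gun ownership (rows) by gun murders (columns), each at low / moderate / high. The sketch below uses a placeholder 3×3 table and hypothetical helper names. For a 3×3 table the degrees of freedom are (3 − 1)(3 − 1) = 4, and for even degrees of freedom the right-tail probability can be computed with a short finite sum, which avoids a table lookup.

```python
from math import exp, factorial


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def chi2_sf_even_df(x, df):
    """Right-tail chi-square probability; this closed form holds only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


# Placeholder counts summing to 51 (assumption: NOT taken from the data file).
# Rows: ownership low / moderate / high. Columns: gun murders low / moderate / high.
table = [
    [8, 6, 4],
    [5, 7, 5],
    [4, 5, 7],
]
chi2, df = chi_square_independence(table)
print(f"chi2 = {chi2:.3f}, df = {df}, p = {chi2_sf_even_df(chi2, df):.4f}")
```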

q3

consists of 30 Major League Baseball teams’ statistics from 2015. It includes variables regarding the games played, wins and losses, at bats, runs scored, hits, home runs, total bases, runs batted in, batting average, on base percentage, strikeouts, stolen bases, earned run average, saves, opponent runs, opponent batting average, errors, team payroll, the dollar value of each team win, and whether the team made it to the playoffs (FYI, Kansas City ultimately won the World Series).

**Assigned Problem 1:** A team must score more than their opponents to win a game, and so they must ultimately score runs, since they cannot win with a score of zero. Are runs correlated with wins; i.e., does a team that scores more runs win more games? Conduct a correlation analysis to determine if there is a correlation between Runs and Wins. Use a .05 significance level. Answer the following questions:

- State your conclusion. Base this on the p-value.
- Find the correlation coefficient and the coefficient of determination.
- What is the best fit regression equation that can predict the Wins from the Runs?
- How many games would you expect a team to win if they scored 800 runs in a season?
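
The quantities asked for above (correlation coefficient, coefficient of determination, regression equation, and a prediction at 800 runs) all follow from the same sums of squares. The sketch below is a minimal illustration with a hypothetical `simple_regression` helper and placeholder team totals, not the 2015 data. The t statistic it returns tests whether the correlation differs from zero, with n − 2 degrees of freedom.

```python
from math import sqrt
from statistics import mean


def simple_regression(x, y):
    """Least-squares fit of y on x, plus r, r^2 and the t statistic for H0: rho = 0."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    df = len(x) - 2
    # Undefined when |r| == 1 (a perfect fit), which real data rarely produces.
    t_stat = r * sqrt(df) / sqrt(1 - r ** 2)
    return {"slope": slope, "intercept": intercept, "r": r,
            "r_squared": r ** 2, "t": t_stat, "df": df}


# Placeholder season totals (assumption: NOT taken from the data file).
runs = [650.0, 700.0, 720.0, 760.0, 690.0, 640.0]
wins = [76.0, 84.0, 83.0, 93.0, 81.0, 74.0]

fit = simple_regression(runs, wins)
predicted_wins = fit["intercept"] + fit["slope"] * 800
print(f"Wins = {fit['intercept']:.2f} + {fit['slope']:.4f} * Runs")
print(f"r = {fit['r']:.3f}, r^2 = {fit['r_squared']:.3f}, t = {fit['t']:.3f}, df = {fit['df']}")
print(f"Predicted wins at 800 runs: {predicted_wins:.1f}")
```

If 800 runs lies outside the range of runs in the actual data, the prediction is an extrapolation and should be reported with that caveat.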

**Assigned Problem 2:** Let us look at a popular argument that it takes money to win. Kansas City had one of the lowest payrolls and won it all, Houston had the second lowest payroll and made it far in the playoffs, and seven of the top ten payrolls did not even qualify for the playoffs. The evidence will show that money is not a predicting factor, but we are going to make it official. Perform a correlation analysis to determine if there is a correlation between Payroll and Wins. Use a .05 significance level. Answer the following questions:

- Is the p-value statistically significant?
- What is the correlation coefficient and the coefficient of determination?
- If the results are not statistically significant, would you normally stop at this point?
- What is the regression equation, even if the results are not statistically significant?

**Assigned Problem 3:** In order to obtain an even better model for predicting Wins, we are going to look at multiple variables. Create a multiple regression equation predicting the Wins using Runs, Hits, Home Runs, Average, Strikeouts, Stolen Bases and ERA. Use the Stepwise procedure to eliminate non-significant variables until the final equation has only significant variables. After you create your regression equation, predict the Wins for a team that has 700 runs, 1500 hits, 200 home runs, a .260 average, 1000 strikeouts, 100 stolen bases and a 3.00 ERA.