Monday, November 13, 2017

Assignment 4

Introduction: 
The objective of this assignment is to learn how to calculate 'z' and 't' tests and to know when to use each one. This also involves working through the steps of hypothesis testing, which is central to the scientific method. From a hypothesis test, I will be able to make decisions about the null and alternative hypotheses and use real-world data to connect statistics and geography.

Key Terms:
I will first define and explain some key terms that help in understanding hypothesis tests.

Null Hypothesis: When performing a hypothesis test, the goal is to see whether the observed mean is the same as or different from the hypothesized mean. The null hypothesis says that there is no difference between the observed mean and the hypothesized mean (the difference equals 0).

Alternative Hypothesis: The alternative hypothesis states that there is a difference between the hypothesized mean and the observed mean (the difference does not equal 0).

Reject or Fail to Reject: When performing a hypothesis test, the question is whether we reject or fail to reject the null hypothesis; we never simply accept it. By rejecting the null hypothesis, we are saying that there is a significant difference between the means. When we fail to reject it, we are saying that the data do not show a significant difference between the means.

Steps in Hypothesis Testing:
1. State the null hypothesis 
2. State the alternative hypothesis
3. Choose a statistical test
4. Choose α or the level of significance
5. Calculate test statistic 
6. Make decision about the null and alternative hypothesis 
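
Steps 5 and 6 can be sketched in a few lines of Python. This is only a rough illustration of the decision rule, not something from the assignment itself: the function name `decide` and the `tails` argument are names I made up, and the numbers in the usage lines are arbitrary placeholders.

```python
def decide(statistic, critical_value, tails="two"):
    """Compare a test statistic with a (positive) critical value.

    tails: "two" for a two-tailed test, "upper" or "lower" for one-tailed.
    Returns "reject" or "fail to reject" (we never "accept" the null).
    """
    if tails == "two":
        reject = abs(statistic) > critical_value
    elif tails == "upper":
        reject = statistic > critical_value
    elif tails == "lower":
        reject = statistic < -critical_value
    else:
        raise ValueError("tails must be 'two', 'upper' or 'lower'")
    return "reject" if reject else "fail to reject"


# Placeholder numbers, only to show how the rule behaves.
print(decide(-1.0, 2.0))           # inside the two-tailed interval
print(decide(3.0, 2.0))            # outside the two-tailed interval
print(decide(2.5, 1.7, "upper"))   # one-tailed, above the cutoff
```

The statistic and the critical value are kept as separate inputs so the same rule works for both 'z' and 't' tests.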

Part One: 

Question 1: For part one of the assignment, I was given a chart that was partially filled with information about 't' and 'z' tests. My task was to complete the rest of the chart (Figure 1) by filling in 'α', the significance level for the test; 'z or t', which asks whether a t-test or z-test applies; and 'z or t value', the critical value for that significance level.

Figure 1: Chart containing information on 'z' and 't' tests.
Question 2: For the second question, we were given the following scenario:
In Kenya, the Live Stock Development Organization and the Department of Agriculture estimate that yields in a certain district should approach the following amounts in metric tons per hectare (averages based on data from the whole country): groundnuts, 0.55; cassava, 3.8; and beans, 0.28. Data collected from 23 farmers gave these results:

Crop           μ       σ
Ground Nuts    0.51    0.3
Cassava        3.4     0.74
Beans          0.33    0.13


With the given information, I can now test the hypothesis for each of these products. I will treat these as two-tailed tests with a confidence level of 95%. I will also determine the probability associated with each crop's test statistic and explain the differences in my results. The statistical test I chose was a t-test because the sample size (n = 23) is under 30.
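
For reference, the calculation behind each crop can be written as below. I am assuming the standard one-sample form of the t statistic; the symbols are my own notation, with \bar{x} the farmers' sample mean (the μ column in the table), \mu_0 the hypothesized national average, s the sample standard deviation (the σ column), and n = 23.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1 = 22

% Worked example for ground nuts:
t = \frac{0.51 - 0.55}{0.3 / \sqrt{23}} \approx -0.64
```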

Ground Nuts:
1. Null Hypothesis: There is no difference between the yield of ground nuts from the sample farmers compared to the country as a whole.
2. Alternative Hypothesis: There is a difference between the yield of ground nuts from the sample farmers compared to the country as a whole.
3. T-test
4. Level of Significance: 0.05 (95% confidence), two-tailed, so 0.025 in each tail
5. Calculation (Figure 2): -0.64
6. This is a two-tailed test at the 95% confidence level, with critical values of -2.07 and 2.07. Because -0.64 falls between -2.07 and 2.07, we fail to reject the null hypothesis. This means that there is no significant difference between the yield of ground nuts from the sample farmers and that of the country as a whole.
Probability: 26.4%
Figure 2: Test calculation for ground nuts.
Cassava:
1. Null Hypothesis: There is no difference between the yield of cassava from the sample farmers compared to the country as a whole.
2. Alternative Hypothesis: There is a difference between the yield of cassava from the sample farmers compared to the country as a whole.
3. T-test
4. Level of Significance: 0.05 (95% confidence), two-tailed, so 0.025 in each tail
5. Calculation (Figure 3): -2.59
6. Because the significance level is the same, the critical values are still -2.07 and 2.07. Because -2.59 does not fall between -2.07 and 2.07, we reject the null hypothesis. This means that there is a significant difference between the yield of cassava from the sample farmers and that of the country as a whole.
Probability: 0.84%
Figure 3: Test calculation for cassava.
Beans:
1. Null Hypothesis: There is no difference between the yield of beans from the sample farmers compared to the country as a whole.
2. Alternative Hypothesis: There is a difference between the yield of beans from the sample farmers compared to the country as a whole.
3. T-test
4. Level of Significance: 0.05 (95% confidence), two-tailed, so 0.025 in each tail
5. Calculation (Figure 4): 1.84
6. Because the significance level is the same, the critical values are still -2.07 and 2.07. Because 1.84 falls between -2.07 and 2.07, we fail to reject the null hypothesis. This means that there is no significant difference between the yield of beans from the sample farmers and that of the country as a whole.
Probability: 96.03%
Figure 4: Test calculation for beans.
Similarities and Differences: For both ground nuts and beans, we failed to reject the null hypothesis, which shows that there was no significant difference between the yield of that product from the sample farmers and that of the country as a whole. For cassava, however, we rejected the null hypothesis. Cassava is the only product that showed a significant difference, and because its test statistic is negative, the sample farmers' cassava yield is below the national average.
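
To double check the three crop results, here is a small Python sketch that recomputes each t statistic, the decision, and a probability. It rests on a few assumptions of mine: the critical value 2.07 is taken from the results above rather than computed, the reported "Probability" is read as the cumulative (lower-tail) probability of the t distribution with 22 degrees of freedom, and that probability is approximated with Simpson's rule instead of a statistics library. All function and variable names are my own.

```python
import math


def t_statistic(sample_mean, hypothesized_mean, sd, n):
    """One-sample t statistic: (x_bar - mu_0) / (s / sqrt(n))."""
    return (sample_mean - hypothesized_mean) / (sd / math.sqrt(n))


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=1000):
    """Approximate P(T <= t): Simpson's rule on [0, |t|], then symmetry."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


N = 23           # farmers sampled
CRITICAL = 2.07  # two-tailed critical value used in the write-up

# crop: (hypothesized mean, sample mean, sample standard deviation)
crops = {
    "Ground Nuts": (0.55, 0.51, 0.3),
    "Cassava": (3.8, 3.4, 0.74),
    "Beans": (0.28, 0.33, 0.13),
}

results = {}
for crop, (mu0, mean, sd) in crops.items():
    t = t_statistic(mean, mu0, sd, N)
    decision = "reject" if abs(t) > CRITICAL else "fail to reject"
    results[crop] = {"t": t, "decision": decision, "p_lower": t_cdf(t, N - 1)}
    print(f"{crop}: t = {t:.2f}, {decision}, P(T <= t) = {results[crop]['p_lower']:.4f}")
```

The t values round to -0.64, -2.59 and 1.84, which matches the hand calculations in Figures 2 through 4.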

Question 3: I was next asked to test whether the level of pollutants in a stream is over the allowable limit of 4.4 mg/l. The sample size (n) is 17, with a mean pollutant level of 6.8 mg/l and a standard deviation of 4.2. This test also uses a 95% confidence level, but as a one-tailed test.

Level of Pollutants:
1. Null Hypothesis: There is no difference between the sample mean pollutant level of 6.8 and the allowable limit of 4.4; the stream is not over the limit.
2. Alternative Hypothesis: The mean pollutant level is higher than the allowable limit of 4.4.
3. T-test: n is under 30
4. Level of Significance: 0.05 (95% confidence), one-tailed
5. Calculation (Figure 5): 2.36
6. This is a one-tailed test at the 95% confidence level, with a critical value of 1.75. The result was 2.36, which is higher than 1.75, so we reject the null hypothesis. This is evidence that the mean pollutant level is significantly higher than the allowable limit of 4.4.
Probability: 98.6%
Figure 5: Test calculation for level of pollutants. 
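
The one-tailed pollutant test can be checked the same way. This is a minimal sketch using only the numbers given in the question; the critical value 1.75 is taken from the result above rather than computed, and the variable names are mine.

```python
import math

limit = 4.4        # allowable limit, mg/l (hypothesized mean)
sample_mean = 6.8  # observed mean, mg/l
sd = 4.2           # sample standard deviation
n = 17             # sample size, so df = 16
critical = 1.75    # one-tailed critical value used in the write-up

# One-sample t statistic
t_pollutant = (sample_mean - limit) / (sd / math.sqrt(n))

# Upper-tailed test: only a large positive t counts as evidence
# that the stream is over the limit.
reject_null = t_pollutant > critical

print(f"t = {t_pollutant:.2f}, reject null: {reject_null}")
```

Because the question is only whether the stream is over the limit, the statistic is compared with the upper critical value alone and not with plus or minus a value as in the crop tests.
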
Part Two:
For part two of the assignment, I created a map (Figure 6) displaying the average value of homes per block group in Eau Claire County. My objective was to see whether the average value of homes for the City of Eau Claire block groups is significantly different from that of the block groups for Eau Claire County. I will use a Z-test to determine this because the sample size (n) is greater than 30.

Average Value of Homes:
1. Null Hypothesis: There is no difference between the average values of homes in the city of Eau Claire compared to the county of Eau Claire.
2. Alternative Hypothesis: There is a difference between the average values of homes in the city of Eau Claire compared to the county of Eau Claire.
3. Z-Test: n is over 30.
4. Level of Significance: 0.05 (95% confidence), two-tailed, so 0.025 in each tail
5. Calculation (Figure 7): -2.57
6. The critical values for this test were -1.96 and 1.96, and our result of -2.57 did not fall between the two, so we reject the null hypothesis. This means that there is a significant difference between the average values of homes in the city of Eau Claire compared to the county of Eau Claire. With the help of the map, we can see the direction of that difference. It appears that homes in the City of Eau Claire block groups are lower in value than those in the rest of the county, which agrees with the negative test statistic. The map is a good visual aid here because the block groups within the city are shaded a lighter blue, which indicates lower values according to the legend.


Figure 7: Test calculation for average home values. 

Figure 6: Map of average home values in Eau Claire County.