Thursday, October 26, 2017

Assignment 3

Introduction:
An independent research consortium hired me to study the geography of foreclosures in Dane County, Wisconsin. Foreclosures increased from 2011 to 2012, which left county officials concerned. I was given the addresses of all the foreclosures in Dane County for 2011 and 2012. With this information, I was able to analyze the foreclosures spatially, keeping in mind that this analysis cannot identify their cause. My main focus was to evaluate the spatial differences between the two years and use this information to try to predict foreclosures in 2013. I also looked at three Tracts specifically: Tract 108, 25, and 120.01. By using Z-Scores and Probability, I was able to estimate the number of foreclosures for all of Dane County that will be exceeded 10% of the time and 80% of the time.

Key Terms:

I will first define and explain some key terms that will help in understanding the methods.

Z-Scores: A Z-Score indicates the number of standard deviations an observation is below or above the mean. It is also referred to as the standard score of a given value. To find the Z-Score, one must use a specific formula (Figure 1). A breakdown of the formula: Zi: Z-Score, Xi: observation, U: mean of data, S: standard deviation of data.

Figure 1
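
Using the symbols defined above, the standard Z-Score formula is Zi = (Xi - U) / S. Below is a minimal Python sketch of that calculation. The counts are made-up illustrative values, not the Dane County data, and the use of the sample standard deviation is an assumption, since the assignment does not say which form was used.

```python
from statistics import mean, stdev


def z_score(x, mu, s):
    """Return how many standard deviations x lies above (+) or below (-) mu."""
    return (x - mu) / s


# Hypothetical foreclosure counts for five tracts (illustrative only).
counts = [2, 5, 8, 11, 14]
mu = mean(counts)   # mean of the data (U)
s = stdev(counts)   # sample standard deviation (S); an assumption

# Z-Score of the tract with 14 foreclosures.
z = z_score(14, mu, s)
print(round(z, 2))
```

A positive result means the observation is above the mean, and a negative result means it is below the mean.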
Probability: The likelihood that something will occur, expressed as a percentage. Z-Scores help to find the probability, based on a normal distribution. Once the Z-Score is found, it is used to look up the probability in a specialized chart (Figure 2).

Figure 2
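
The chart lookup can also be done in code. The sketch below uses Python's standard normal distribution in place of a printed chart. The function names are my own, and it assumes the data follow a normal distribution.

```python
from statistics import NormalDist


def prob_below(z):
    """Probability that a standard normal value falls at or below z."""
    return NormalDist().cdf(z)


def prob_above(z):
    """Probability that a standard normal value falls above z."""
    return 1.0 - prob_below(z)


# A Z-Score of 1.28 leaves about 90% of the distribution below it,
# so it is exceeded only about 10% of the time.
print(round(prob_below(1.28), 2))
# A Z-Score of -0.84 is exceeded about 80% of the time.
print(round(prob_above(-0.84), 2))
```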
Data: The data I used for this study were foreclosure data from Dane County, Wisconsin, specifically for the years 2011 and 2012.

Methods:
First, I created a map (Figure 3) showing the Dane County Tracts and highlighting Tracts 25, 108, and 120.01, so that their locations in the county are clear. An inset map shows where Dane County is located in Wisconsin.

Figure 3
Next, I created a map (Figure 4) that displays the difference in foreclosures in Dane County between 2011 and 2012, using the standard deviation classification. To do this, I added a field to the attribute table and subtracted the 2011 values from the 2012 values, so that positive values represent an increase in 2012.
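
As a rough illustration of this step outside the GIS software, here is a minimal Python sketch. It assumes the difference is taken as 2012 minus 2011, so that positive values mean an increase, which is the sign convention the Results imply. The half-standard-deviation class breaks, the tract labels, and the counts are all hypothetical, and the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def sd_class(z):
    """Label a Z-Score with a standard deviation class (breaks are assumed)."""
    if z < -1.5:
        return "< -1.5 SD"
    if z < -0.5:
        return "-1.5 to -0.5 SD"
    if z <= 0.5:
        return "-0.5 to 0.5 SD"
    if z <= 1.5:
        return "0.5 to 1.5 SD"
    if z <= 2.5:
        return "1.5 to 2.5 SD"
    return "> 2.5 SD"


def classify_differences(counts_2011, counts_2012):
    """Return {tract: (difference, class)} for the 2012 minus 2011 change."""
    diffs = {t: counts_2012[t] - counts_2011[t] for t in counts_2011}
    values = list(diffs.values())
    mu = mean(values)
    sd = pstdev(values)  # population SD; assumes the differences are not all equal
    return {t: (d, sd_class((d - mu) / sd)) for t, d in diffs.items()}


# Hypothetical tracts and counts (illustrative only).
counts_2011 = {"A": 5, "B": 3, "C": 8, "D": 2, "E": 6}
counts_2012 = {"A": 6, "B": 3, "C": 7, "D": 2, "E": 15}
result = classify_differences(counts_2011, counts_2012)
for tract, (diff, label) in result.items():
    print(tract, diff, label)
```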

Then, I was asked to calculate by hand the Z-Scores of the three selected Tracts for both 2011 and 2012, which left me with a total of six scores. I calculated the Z-Scores using the mean and standard deviation for each year. I added all of the necessary information, along with the results, to a spreadsheet to show the bigger picture (Figure 5).
Figure 5
Results:
In Figure 4, the map that shows the differences in foreclosures from 2011 to 2012, the darker blues represent an increase in foreclosures in 2012, whereas the darker browns represent a decrease in foreclosures since 2011. We can also see that the center of Dane County does not show much change, but the Tracts on the outer edge of the county do. It is important to know that the center of Dane County is where the capital of Wisconsin is located. Another aspect to notice is that Tract 120.01 is the darkest blue, which is more than 2.5 standard deviations above the mean.

Figure 4
To better analyze the differences, I created a map of just the 2011 foreclosures (Figure 6) and just the 2012 foreclosures (Figure 7) in Dane County. Figure 6 also uses the standard deviation classification, which helped with calculating the Z-Scores. Looking at each of the three selected Tracts individually, Tracts 120.01 and 108 were more than 1.5 standard deviations above the mean. This means that they had more foreclosures than average during 2011. Tract 25, however, had fewer foreclosures than average in 2011 because it fell below -0.50 standard deviations.

Figure 6
Now looking at Figure 7, the only Tract that displayed a change was 108, which ended up having foreclosures in the average range for 2012. Taken together, Figure 6 and Figure 7 show that the Tracts clustered around the center of Dane County, where the capital of Wisconsin is located, mostly fall more than 0.50 standard deviations below the mean, which indicates that this area of the county had fewer foreclosures than average in both years. This reinforces Figure 4, which shows that the center Tracts did not change much between the two years.

Figure 7
Lastly, after creating these maps to analyze the spatial differences between the foreclosures in 2011 and 2012, I could make my prediction for 2013 using Probability. To refresh, the goal was to use Probability to determine the number of foreclosures for all of Dane County that will be exceeded 80% of the time and 10% of the time. If the patterns continue into 2013, the number of foreclosures that will be exceeded 80% of the time is 3.98, or 4 when rounded to a whole foreclosure. The number of foreclosures that will be exceeded only 10% of the time is 24.98, or 25 when rounded to a whole foreclosure.
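
The calculation behind this kind of prediction can be sketched in Python as follows. This is only an illustration under a normal-distribution assumption: the mean and standard deviation below are made-up values, not the figures from my spreadsheet, and the function name is my own.

```python
from statistics import NormalDist


def count_exceeded(mean, sd, p_exceed):
    """Value that a normal(mean, sd) variable exceeds with probability p_exceed."""
    # Z-Score that leaves p_exceed of the distribution above it.
    z = NormalDist().inv_cdf(1.0 - p_exceed)
    # Rearranged Z-Score formula: X = mean + Z * sd.
    return mean + z * sd


# Hypothetical mean and standard deviation of foreclosures (illustrative only).
mean_count, sd_count = 10.0, 5.0
exceeded_80 = count_exceeded(mean_count, sd_count, 0.80)  # roughly 5.8
exceeded_10 = count_exceeded(mean_count, sd_count, 0.10)  # roughly 16.4
print(round(exceeded_80, 2), round(exceeded_10, 2))
```

The value exceeded 80% of the time uses a negative Z-Score (below the mean), while the value exceeded 10% of the time uses a positive Z-Score (above the mean).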

Conclusions:
To tie everything together, we reviewed the differences in foreclosures in Dane County, Wisconsin between 2011 and 2012, with an emphasis on Tracts 108, 25, and 120.01. Tract 120.01 changed the most of the three. Figure 4 maps the difference in foreclosures between the two years, classified by standard deviation. We observed that the biggest changes occurred around the borders of the county and the smallest changes occurred in the center, where the capital of Wisconsin is located. Figures 6 and 7 show the foreclosures for 2011 and 2012 separately, also using the standard deviation classification. These maps were useful as well because they show that the Tracts in the center of Dane County were mostly more than 0.50 standard deviations below the mean in both years, which ties in with Figure 4, where the center Tracts show little change between the two years. Lastly, using Z-Scores and Probability, we predicted foreclosures for 2013, finding that 4 foreclosures will likely be exceeded 80% of the time and 25 foreclosures will likely be exceeded only 10% of the time. The implication of these results is that they can help us locate foreclosures spatially; however, they do not tell us their cause. Also, these are just predictions and do not guarantee that any increase or decrease will occur. My recommendation is to use this information as a reference when making decisions, but not as the sole source of data.




Wednesday, October 11, 2017

Assignment 2

Goals:
The goal of this assignment is to become familiar with a variety of statistical methods and programs. 

Part 1:
For part one of this assignment, I will be analyzing a sample of the test scores from two high schools in the Eau Claire School District: Eau Claire North and Eau Claire Memorial. These test scores come from standardized tests taken by juniors at both schools. Over the years, Eau Claire Memorial has consistently had the student with the highest test score. This leads the public to question how well the students at Eau Claire North are being taught, since North never has the student with the highest test score. I will analyze both sets of test scores by looking at the Range, Mean, Median, Mode, Kurtosis, Skewness, and Standard Deviation. Then I will look at the results and determine whether the public should actually be concerned with the teaching methods at Eau Claire North. First, I will define each of these terms and then provide the calculated values.

Range: The range is the difference between the highest number and the lowest number. For example, if the highest value in a set of data is 66 and the lowest value is 45, then the range would be 21. In relation to the data sets for this assignment, the range for Eau Claire North is 83 and the Range for Eau Claire Memorial is 91.

Mean: The mean is the average of a set of values. To find the mean, add all of the values together and then divide by the number of values. The mean for Eau Claire North is 160.92 and Eau Claire Memorial's mean is 158.54.

Median: The median is the middle value of a set of values that have been ranked in order. If the number of values is odd, the value in the middle is the median. If the number of values is even, the median is the average of the two middle values. The median for Eau Claire North is 164.5 and the median for Eau Claire Memorial is 159.5.

Mode: The mode is the number that occurs most often in a set of observations. Eau Claire North's mode is 170 and Eau Claire Memorial's is 120.

Kurtosis: This refers to the shape of the distribution of a data set. Kurtosis describes whether the distribution is more peaked or flatter than the normal distribution. A more peaked distribution has a positive kurtosis and a flatter distribution has a negative kurtosis. If the kurtosis is above +1 or below -1, the departure from the normal distribution is considered significant. The kurtosis for Eau Claire North is -0.56 and Eau Claire Memorial's kurtosis is -1.17.

Skewness: This shows how asymmetric the distribution is around the mean. The distribution can be positively skewed (a longer tail toward higher values) or negatively skewed (a longer tail toward lower values). The skewness of Eau Claire North is -0.58 and Eau Claire Memorial's is -0.18.

Standard Deviation (SD): This is useful in showing how close the observations are to the mean of the data. For normally distributed data, about 68% of the data will fall within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations. To better understand how standard deviation is calculated, I have written out the calculations by hand for Eau Claire Memorial (Figure 1) and Eau Claire North (Figure 2), which you can see below. The standard deviation for Eau Claire Memorial is 27.16 and Eau Claire North's is 23.63.

Figure 1: SD of Eau Claire Memorial's Test Scores

Figure 2: SD of Eau Claire North's Test Scores
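
The statistics defined above can also be computed in a few lines of Python. This is a minimal sketch on made-up scores, not the actual test data. It assumes the sample standard deviation and the simple moment-based formulas for skewness and kurtosis. Spreadsheet programs often apply a small-sample correction, so their skewness and kurtosis values can differ slightly from these.

```python
from statistics import mean, median, multimode, stdev


def describe(scores):
    """Return the descriptive statistics discussed above for a list of scores."""
    n = len(scores)
    mu = mean(scores)
    # Central moments (population form), used for skewness and kurtosis.
    m2 = sum((x - mu) ** 2 for x in scores) / n
    m3 = sum((x - mu) ** 3 for x in scores) / n
    m4 = sum((x - mu) ** 4 for x in scores) / n
    return {
        "range": max(scores) - min(scores),
        "mean": mu,
        "median": median(scores),     # averages the two middle values if n is even
        "mode": multimode(scores),    # list, because a data set can have ties
        "sd": stdev(scores),          # sample standard deviation (divides by n - 1)
        "skewness": m3 / m2 ** 1.5,   # negative means a longer tail toward low scores
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis; 0 for a normal distribution
    }


# Hypothetical test scores out of 200 (illustrative only).
scores = [120, 150, 160, 170, 170, 190]
stats = describe(scores)
for name, value in stats.items():
    print(name, value)
```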

Results: 
Although Eau Claire Memorial has continuously had a student achieve the highest score between both schools (198 out of 200), we can tell from the statistics that this doesn't mean that the teaching methods at Memorial are any better. The mean for Memorial is 158.54 and for North it is 160.92, which shows that the average at North is actually higher than at Memorial. The standard deviation also helps to show why North shouldn't be concerned with their test scores. First of all, the SD for North is 23.63 and the SD for Memorial is 27.16. This means that the test scores at North are bunched closer around the mean than Memorial's are. The Memorial test scores are more widespread, which means the scores vary a lot more, and that isn't usually a good thing when it comes to test scores. Secondly, knowing that the maximum score a student can get is 200, we can see that most of the scores for North fall above 160.9, which is good because it means over half of the students achieved a score of 80% or higher. As noted before, the test scores for Memorial are more widespread and have a larger range. I think the mean and standard deviation are both useful statistics when comparing test scores. The mean shows the average of the scores, and the standard deviation shows how closely the data cluster around the mean. So to tie everything together, Eau Claire North should not be concerned about their teaching methods because, although they didn't have the student with the highest score, their overall test scores were actually higher than Eau Claire Memorial's. When looking at test scores, it is more effective to look at all of the scores as a whole, not just an individual score.

Part 2:
For part two of this assignment, I have calculated the Geographic Mean Center of Population at the county level for Wisconsin as well as the Weighted Mean Center of Population for 2000 and 2015. I have also created a map to show my calculations (Figure 3). I will also explain the relationships of the weighted mean centers on the map. But first of all, I will define what these calculations actually mean.

Geographic Mean Center of Population: This is the point found by averaging the x values and the y values of the features on a map. So for this assignment, I calculated the geographic mean center of population for the counties in Wisconsin. This is represented by the purple circle on the map in Figure 3.

Weighted Mean Center of Population: This is similar to the geographic mean center, but the weighted mean center includes the frequencies of grouped data, so some points are weighted more than others. Some counties in Wisconsin have higher populations than others, which influences the mean center, and that is why it is weighted. For this assignment, I compared the weighted mean center for the 2000 population and the 2015 population. The 2000 population mean center is shown as a red circle and the 2015 population mean center is shown as a yellow circle on the map in Figure 3.

Figure 3: Map of Geographic Mean Centers of Population in Wisconsin
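
The two definitions above can be sketched in Python as follows. The coordinates and populations are made-up values chosen to show how a heavily populated location pulls the weighted center toward itself. They are not the Wisconsin county data, and the function names are my own.

```python
def mean_center(points):
    """Unweighted mean center: average of the x values and of the y values."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def weighted_mean_center(points, weights):
    """Mean center where each point counts in proportion to its weight."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return (x, y)


# Hypothetical county centroids and populations (illustrative only).
points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
populations = [1, 1, 2]

print(mean_center(points))                        # center of the three locations
print(weighted_mean_center(points, populations))  # pulled toward the third point
```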

Explanation:
By looking at the map, we can see that the 2015 population mean center (yellow) barely moved from the 2000 population mean center (red). However, the 2015 population mean center did shift slightly southwest. Each county's population either increased or decreased from 2000 to 2015, and the biggest population increase was in Dane County, which is labeled in green on the map. In 2000, Dane County's population was 426,526, and in 2015 it jumped to 510,198. That is an increase of over 80,000 people, which explains why the 2015 mean center shifted slightly towards Dane County. One reason the population of Dane County increased the most may be that it is where the capital of Wisconsin is located.

Sources:
Census of Agriculture: 2010 SF1 Census Data