Assignment to be completed within a 2-hour limit and uploaded to GitHub.
Save the four queries below in a file in the GitHub repository:
- Write a query that shows how many pageviews occurred in the last week
- How many sessions have organic as a medium?
- Create a list of all nodes with the number of pageviews that came from users with an organic medium in the last 30 days, sorted with the highest number first
- Write another query using the tables above and explain what it does
- Pick two datasets, at least one from data.gov
- Combine and examine the datasets looking for interesting and actionable insights.
- Reusable scripts to run your analysis
- Short write-up of approach and findings including information about the datasets that you used
- Share all code and results in a GitHub repository.
I searched for the first dataset on data.gov, where I looked for interesting health data. The most interesting one was the NCHS dataset on leading causes of death. This dataset contains the number of deaths attributed to each of the 10 leading causes of death in each state from 1999 to 2015. It also contains the age-adjusted death rate, which is well suited to analysis because it accounts for the age distribution of each state's population and is expressed per 100,000 people.
I had to find another dataset and combine the two to find something interesting. I thought it might be worthwhile to check whether any disease is more prevalent in areas with a lower level of education. After some searching for educational attainment data, I found a good dataset on the Census Bureau's American FactFinder. I used the 2010 data since that one should be the most accurate. This dataset contains the highest level of education completed by the population aged 25 years or older. It also contains information about race and many other things that I did not need for this analysis.
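As a rough illustration of how the two datasets can be combined, the sketch below joins them on the state name using only the Python standard library. The column names (`Year`, `Cause Name`, `State`, `Age-adjusted Death Rate`, `Percent Bachelors Or Higher`) and the rows are placeholders I made up for the example, not the real files or real figures, so they would need to be adapted to the actual downloads.

```python
import csv
import io

# Placeholder rows, NOT real figures. Column names are assumptions
# and may differ from the actual NCHS and Census Bureau files.
DEATHS_CSV = """Year,Cause Name,State,Age-adjusted Death Rate
2010,Stroke,State A,40.0
2010,Stroke,State B,35.0
2010,Cancer,State A,170.0
"""

EDUCATION_CSV = """State,Percent Bachelors Or Higher
State A,20.0
State B,30.0
State C,25.0
"""


def load_death_rates(text, year, cause):
    """Return {state: age-adjusted death rate} for one year and one cause."""
    rates = {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["Year"] == str(year) and row["Cause Name"] == cause:
            rates[row["State"]] = float(row["Age-adjusted Death Rate"])
    return rates


def load_education(text):
    """Return {state: percent of adults 25+ with a Bachelor's degree or higher}."""
    return {
        row["State"]: float(row["Percent Bachelors Or Higher"])
        for row in csv.DictReader(io.StringIO(text))
    }


def join_by_state(rates, education):
    """Inner join: keep only states that appear in both datasets."""
    return [(s, education[s], rates[s]) for s in sorted(rates) if s in education]


merged = join_by_state(
    load_death_rates(DEATHS_CSV, 2010, "Stroke"),
    load_education(EDUCATION_CSV),
)
for state, pct_bachelors, death_rate in merged:
    print(state, pct_bachelors, death_rate)
```

An inner join is used here so that a state missing from either file is dropped instead of producing an incomplete pair.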
I ran Pearson's correlation between the death rate of each disease and the education level. It turns out that stroke has the strongest correlation with education level. The correlation is negative, which means that states with a lower level of education tend to have a higher rate of death from stroke.
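The comparison described above can be sketched as follows. This is a minimal pure-Python version of Pearson's correlation coefficient, not necessarily the code used for the analysis; the helper `strongest_correlation` and its input layout (one list of education values plus a dictionary of per-cause death rates, all in the same state order) are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def strongest_correlation(education, rates_by_cause):
    """Return (cause, r) for the cause whose |r| with education is largest.

    education: list of education values, one per state.
    rates_by_cause: {cause: list of death rates in the same state order}.
    """
    results = {
        cause: pearson_r(education, rates)
        for cause, rates in rates_by_cause.items()
    }
    best = max(results, key=lambda cause: abs(results[cause]))
    return best, results[best]
```

Ranking by the absolute value of r means a strong negative correlation, like the one found for stroke, is treated as just as strong as a positive one of the same size.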
The graph shows that the more people in a state who hold a Bachelor's degree, the fewer deaths from stroke occurred in that state.