This is great #1

Open
bkmontgom opened this issue Apr 3, 2020 · 3 comments

Comments

@bkmontgom

I'm an actuary and I find this very useful. Is this based entirely on South Korean data? How many deaths are included? I didn't see the data anywhere. I'm looking to add error bars.

@yuryatin
Owner

yuryatin commented Apr 4, 2020

Hi Brian,

Thank you for your feedback.

Data source
The URL of the dataset is buried inside the Python script, on line 96 (https://www.kaggle.com/kimjihoo/coronavirusdataset#PatientInfo.csv). I avoid keeping the data in the repository because its legal status is not clear to me.
It is very difficult for me to find a case-by-case dataset from a country that has both widespread testing and a comparatively widespread outbreak. South Korea seems to be the best case overall; Germany, and soon the USA, may become the second-best alternatives.
As of yesterday evening, that dataset had 2771 South Korean cases, 53 of which had unfortunately ended in death.
In preprocessing (which can be followed in the Python script), I could keep only 1723 cases, 43 of them with a death outcome. The rest were dropped for two reasons: critical data were missing (either the year of birth, or both the date of confirmation and the date of first symptoms), or the case was too recent to have had enough time to be exposed to the risk of death (this is described in the Python script).
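The two exclusion rules can be sketched roughly as follows. This is not the code from the script: the column names (`birth_year`, `symptom_onset_date`, `confirmed_date`), the 14-day follow-up cutoff, and the rows in `SAMPLE` are all assumptions made up for illustration, and the sketch drops every recent case regardless of outcome, which may differ from what the script does.

```python
import csv
import io
from datetime import date, timedelta

# Assumed cutoff; the real script may use a different follow-up window.
MIN_FOLLOW_UP = timedelta(days=14)

# Made-up rows with hypothetical column names, only to exercise the filter.
SAMPLE = """patient_id,birth_year,symptom_onset_date,confirmed_date,state
1,1950,2020-02-20,2020-02-25,deceased
2,,2020-02-21,2020-02-26,released
3,1980,,,isolated
4,1975,,2020-03-30,isolated
5,1990,,2020-03-01,released
"""

def parse_date(text):
    """Return a date for 'YYYY-MM-DD', or None for an empty or invalid field."""
    try:
        return date.fromisoformat((text or "").strip())
    except ValueError:
        return None

def keep_case(row, snapshot):
    """Keep a row only if age and a start date are known and follow-up is long enough."""
    if not (row.get("birth_year") or "").strip():
        return False  # age cannot be computed
    # Prefer the date of first symptoms, fall back to the date of confirmation.
    start = parse_date(row.get("symptom_onset_date")) or parse_date(row.get("confirmed_date"))
    if start is None:
        return False  # neither date is available
    return snapshot - start >= MIN_FOLLOW_UP

snapshot = date(2020, 4, 3)  # assumed date of the dataset snapshot
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept_ids = [row["patient_id"] for row in rows if keep_case(row, snapshot)]
print(kept_ids)
```

In this toy sample, case 2 is dropped for a missing year of birth, case 3 for having neither date, and case 4 for being too recent.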

Error bars
Depending on your preference and your goals, you may want to choose between the frequentist confidence-interval paradigm and the Bayesian one, which in this case yields (essentially) a probability distribution over a probability.
In the first case, as far as I understand, one needs the Fisher information for each parameter, obtained by calculating the second partial derivatives of the (analytically non-expressible) multi-variable log-likelihood function (e.g., as described on page 4, equation 2.11, of http://www.stat.umn.edu/geyer/s06/5102/notes/fish.pdf).
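A minimal one-parameter sketch of that idea, under a strong simplification: instead of the fitted age curve, it treats the risk of death as a single probability `p` with a binomial likelihood, using the 43 deaths among 1723 cases quoted above. The second derivative is taken numerically, as one would have to do for a likelihood without a closed form. The function names are illustrative and do not come from the script.

```python
import math

DEATHS, CASES = 43, 1723  # counts after preprocessing, as quoted above

def log_likelihood(p):
    """Binomial log-likelihood of one death probability p (constant term dropped)."""
    return DEATHS * math.log(p) + (CASES - DEATHS) * math.log(1.0 - p)

def observed_information(f, x, h=1e-4):
    """Negative second derivative of f at x, by central finite differences."""
    return -(f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

p_hat = DEATHS / CASES                      # maximum-likelihood estimate
info = observed_information(log_likelihood, p_hat)
std_err = 1.0 / math.sqrt(info)             # asymptotic standard error
ci_95 = (p_hat - 1.96 * std_err, p_hat + 1.96 * std_err)  # Wald interval
print(p_hat, std_err, ci_95)
```

For the multi-parameter curve, the same finite-difference idea would give a Hessian matrix, and the standard errors would come from the diagonal of its inverse.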
In the second case, one needs a prior p.d.f. for the probability (the risk of death), with age as a parameter. A likelihood function for that prior distribution should be monotonically increasing. The prior then needs to be transformed into a posterior age-parameterized p.d.f., but I have never thought through how to do that.
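As a starting point only, here is a sketch of the Bayesian update for a much simpler problem than the age-parameterized one described above: a single, age-independent risk of death `p`, a flat prior on (0, 1), and a binomial likelihood with the 43 deaths among 1723 cases. The flat prior and the grid approximation are assumptions chosen for simplicity, not a recommendation.

```python
import math

DEATHS, CASES = 43, 1723  # counts after preprocessing, as quoted above

def grid_posterior(deaths, cases, n_grid=20000):
    """Posterior weights over p on a uniform grid, for a flat prior."""
    grid = [(i + 0.5) / n_grid for i in range(n_grid)]
    log_post = [deaths * math.log(p) + (cases - deaths) * math.log(1.0 - p)
                for p in grid]
    peak = max(log_post)  # subtract the peak to avoid underflow in exp
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]

def quantile(grid, probs, q):
    """Smallest grid point whose cumulative posterior mass reaches q."""
    cumulative = 0.0
    for p, w in zip(grid, probs):
        cumulative += w
        if cumulative >= q:
            return p
    return grid[-1]

grid, probs = grid_posterior(DEATHS, CASES)
posterior_mean = sum(p * w for p, w in zip(grid, probs))
credible_95 = (quantile(grid, probs, 0.025), quantile(grid, probs, 0.975))
print(posterior_mean, credible_95)
```

With a flat prior this posterior is a Beta(44, 1681) distribution, so the grid result can be checked against its mean of 44/1725. Extending this to a risk that depends on age would require a prior over the curve parameters, which this sketch does not attempt.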

Rigor
If any insurance company decides to make financial bets and calculate safety margins with a similar curve, it seems reasonable to first test other functions. Unlike most of the functions tested here, such functions should be able to keep an always-positive first derivative over the domain from 0 to 120+ years while having a negative second derivative at elderly ages, i.e., be potentially concave there.
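One way to check a candidate function for this property is to evaluate its first and second derivatives numerically over the age range. The sketch below uses a logistic curve purely as an illustration; its `midpoint` and `steepness` values are hypothetical and were not fitted to any data, and nothing here says whether this curve is among the functions the script tests.

```python
import math

def logistic(age, midpoint=72.5, steepness=0.12):
    """Illustrative risk curve; parameters are hypothetical, not fitted."""
    return 1.0 / (1.0 + math.exp(-steepness * (age - midpoint)))

def derivatives(f, x, h=1e-3):
    """First and second derivatives of f at x, by central finite differences."""
    first = (f(x + h) - f(x - h)) / (2.0 * h)
    second = (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)
    return first, second

ages = range(0, 121, 5)
always_increasing = all(derivatives(logistic, a)[0] > 0 for a in ages)
concave_ages = [a for a in ages if derivatives(logistic, a)[1] < 0]
print(always_increasing, concave_ages)
```

A logistic curve is increasing everywhere and concave above its midpoint, so with these parameters the concave ages are those from 75 upward.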

@yuryatin
Owner

The issue is closed.

@yuryatin
Owner

It seems I need to do something first before closing this issue :-)

@yuryatin yuryatin reopened this Apr 11, 2020