Covid19

Works on different public datasets

The purpose of this notebook is to provide an easy-to-run tool for gaining better insights than those found in the press. It works with the following datasets:

1. ISCIII Dataset

Dataset information

CCAA: region code
FECHA: date of the record
CASOS: new infections detected (mainly empty)
PCR+: cases (see CASOS) detected via a positive PCR test
TestAc+: similar to PCR+, but for cases detected via a rapid antibody test
Hospitalizados: people hospitalized with COVID-19
UCI: hospitalized people requiring intensive care (ICU)
Fallecidos: deceased people
Recuperados: recovered people

Some considerations

Separator: the field separator has switched between ',' and ';' across updates.
Data is updated by appending new records day by day (one per autonomous region), so the last records correspond to the most recent information.
DISCLAIMER: at some points, figures reported by the media differ from the data in this file. Moreover, other aspects of the dataset may change over time (such as the separator described above or the addition of new columns).
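Because the separator changes between updates, it helps to let pandas detect it instead of hard-coding it. The sketch below is illustrative and not the notebook's own loader: `load_isciii` is a hypothetical name, and it assumes FECHA is written day-first (e.g. `01/04/2020`). Passing `sep=None` together with `engine="python"` makes `pandas.read_csv` sniff the delimiter.

```python
import io

import pandas as pd


def load_isciii(source) -> pd.DataFrame:
    """Hypothetical loader: read the ISCIII CSV whether it uses ',' or ';'."""
    # sep=None asks pandas to sniff the delimiter; this requires the python engine.
    df = pd.read_csv(source, sep=None, engine="python")
    # Assumption: FECHA is day-first. Parse it so records sort chronologically.
    df["FECHA"] = pd.to_datetime(df["FECHA"], dayfirst=True)
    return df


if __name__ == "__main__":
    sample = "CCAA;FECHA;PCR+\nMD;01/04/2020;100\nMD;02/04/2020;150\n"
    print(load_isciii(io.StringIO(sample)))
```

The same call works unchanged for the comma-separated variant of the file.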

Reconstruct time series

Most of the work on this data aims to fix the following issues:

Fill the CASOS column. It can be calculated as the day-by-day difference of PCR+.
Reconstruct the time series. At the moment, the features are daily snapshots, and no features represent the increments between dates. Reconstructing the time series therefore means computing those increments as new features, each reflecting the change with respect to the previous day:
- IncHospitalizados
- IncUCI
- IncFallecidos
- IncRecuperados
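Both fixes reduce to a per-region, day-over-day difference. Here is a minimal pandas sketch, assuming the frame has the columns listed above and a parsed FECHA column. The function name `reconstruct` is hypothetical, and the new columns follow an assumed `Inc<column>` naming pattern. CASOS is only filled where it is empty.

```python
import pandas as pd

# Snapshot columns whose day-to-day increments become new "Inc..." features.
SNAPSHOT_COLS = ["Hospitalizados", "UCI", "Fallecidos", "Recuperados"]


def reconstruct(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: derive CASOS and the Inc* features per region."""
    # Sort so that diff() compares each day with the previous day of the same region.
    out = df.sort_values(["CCAA", "FECHA"]).copy()
    by_region = out.groupby("CCAA")

    # New cases = today's PCR+ minus yesterday's PCR+ (NaN on each region's first day).
    new_cases = by_region["PCR+"].diff()
    if "CASOS" in out:
        out["CASOS"] = out["CASOS"].fillna(new_cases)  # keep any reported values
    else:
        out["CASOS"] = new_cases

    for col in SNAPSHOT_COLS:
        out["Inc" + col] = by_region[col].diff()
    return out
```

Grouping by CCAA matters: a plain `diff()` over the whole frame would subtract one region's figures from another's at every region boundary.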

Data exploration

The notebook provides some helper functions to print the relevant information:

res = ...# Reconstructed dataframe
loc = 'MD'
current_date='2020-05-05'
displayInfo(res, location=loc, date=current_date)
showStats(res, location=loc, feature='CASOS', aggregate=False)

showStats(res, location='MD', feature='CASOS', aggregate=False)

showStats(res, location=loc, feature='IncFallecidos', aggregate=False)

# Diagnosed people minus recovered
showStats(res, location=loc, feature='ActiveCases')
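The ActiveCases feature is described only as "diagnosed people minus recovered". The sketch below assumes that "diagnosed" means the PCR+ column; the notebook may define it differently (for example, by also including TestAc+). `add_active_cases` and `series_for` are hypothetical helpers, not the notebook's `showStats`. They produce the date-indexed series such a plot would use.

```python
import pandas as pd


def add_active_cases(df: pd.DataFrame, diagnosed_col: str = "PCR+") -> pd.DataFrame:
    """Hypothetical: ActiveCases = diagnosed (assumed PCR+) minus Recuperados."""
    out = df.copy()
    out["ActiveCases"] = out[diagnosed_col] - out["Recuperados"]
    return out


def series_for(df: pd.DataFrame, location: str, feature: str) -> pd.Series:
    """Hypothetical: one region's feature as a date-indexed, sorted series."""
    rows = df.loc[df["CCAA"] == location]
    return rows.set_index("FECHA")[feature].sort_index()
```

For example, `series_for(add_active_cases(res), "MD", "ActiveCases")` returns the active-case curve for Madrid, which could then be plotted with `.plot()`.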

2. MoMo Dataset

The MoMo dataset focuses on mortality data. It is well prepared, and no extra preprocessing is needed to obtain good insights. Its features are:

ambito: nacional or ccaa (national or regional)
cod_ambito: empty if nacional. Region ISO code if regional
cod_ine_ambito: INE region code
nombre_ambito: Name of the region
cod_sexo: INE Sex code. man: 1, woman: 6
nombre_sexo: Sex name (hombres, mujeres)
cod_gedad: Age group: menos_65, 65_74, mas_74
nombre_gedad: name of age group
fecha_defuncion: date of death
defunciones_observadas: number of observed deaths (including delay corrections)
defunciones_observadas_lim_inf: lower limit of the confidence interval
defunciones_observadas_lim_sup: upper limit of the confidence interval
defunciones_esperadas: expected deaths
defunciones_esperadas_q01: 1st percentile of expected deaths
defunciones_esperadas_q99: 99th percentile of expected deaths
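A common way to use these columns is excess mortality, computed as observed minus expected deaths. The following sketch is a hypothetical helper (`excess_deaths`), not part of the notebook. It filters by scope and, optionally, by sex and age group, then sums the excess per date. If the file also contains aggregate rows covering all sexes or ages, select a single stratum explicitly to avoid double counting.

```python
import pandas as pd


def excess_deaths(momo: pd.DataFrame, ambito: str = "nacional",
                  cod_sexo=None, cod_gedad=None) -> pd.Series:
    """Hypothetical: observed minus expected deaths, summed per date."""
    rows = momo[momo["ambito"] == ambito]
    if cod_sexo is not None:
        rows = rows[rows["cod_sexo"] == cod_sexo]  # e.g. 1 (hombres) or 6 (mujeres)
    if cod_gedad is not None:
        rows = rows[rows["cod_gedad"] == cod_gedad]  # e.g. "mas_74"
    excess = rows["defunciones_observadas"] - rows["defunciones_esperadas"]
    # Add up the selected strata for each date of death.
    return excess.groupby(rows["fecha_defuncion"]).sum()
```

A positive value means more deaths were recorded on that date than the model expected.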

3. European Centre for Disease Prevention and Control

As stated on the ECDC website, the downloadable data file is updated daily and contains the latest available public data on COVID-19. Each row contains the number of new cases reported per day and per country. The data may be used in line with ECDC's copyright policy.

Columns

dateRep: date of the record (format DD/MM/YYYY)
day: day component of dateRep
month: month component of dateRep
year: year component of dateRep
cases: number of new cases
deaths: number of new deaths
countriesAndTerritories: name of the country or territory
geoId: two-letter geolocation identifier
countryterritoryCode: three-letter country code
popData2018: population in 2018
continentExp: continent related to that country
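Because dateRep is a DD/MM/YYYY string and cases are daily counts, a typical first step is to parse the date, accumulate cases per country, and normalize by population. This is a sketch under those assumptions. `prepare_ecdc`, `cumCases` and `casesPer100k` are hypothetical names, not columns of the ECDC file.

```python
import pandas as pd


def prepare_ecdc(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical: parse dates, add cumulative cases and cases per 100k people."""
    out = df.copy()
    # dateRep is documented as DD/MM/YYYY, so give the format explicitly.
    out["dateRep"] = pd.to_datetime(out["dateRep"], format="%d/%m/%Y")
    out = out.sort_values(["geoId", "dateRep"])
    # Running total of new cases within each country.
    out["cumCases"] = out.groupby("geoId")["cases"].cumsum()
    # Multiply before dividing to keep round numbers exact where possible.
    out["casesPer100k"] = out["cumCases"] * 100_000 / out["popData2018"]
    return out
```

Normalizing by popData2018 makes countries of very different sizes comparable on the same chart.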

RaulRC/Covid-19