The purpose of this notebook is to provide an easy-to-run tool for gaining better insights than those found in the press. We will work with files from the following sources:
- Instituto de Salud Carlos III
- ISCIII Daily Mortality Monitoring MoMo
- European Centre for Disease Prevention and Control
- CCAA: region code
- FECHA: date of the record
- CASOS: new infections detected (mostly empty)
- PCR+: related to CASOS; new infections detected via a positive PCR test
- TestAc+: similar to PCR+, but detected via a quick antibody test
- Hospitalizados: people hospitalized with COVID-19
- UCI: hospitalized people who require intensive care
- Fallecidos: deceased people
- Recuperados: recovered people
Some considerations
- Separator: it changes from time to time, usually switching between ',' and ';'.
- The data is updated by appending new records (one per autonomous region) each day, so the last records hold the most recent information.
- DISCLAIMER: figures reported in the media differ at some points from those in this dataset. Moreover, other aspects of the dataset may change over time (such as the separator described above or the addition of new columns).
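Because the separator may flip between ',' and ';', a loader can pick it from the header line instead of hard-coding it. The following is a minimal sketch using only the standard library; the helper names `detect_separator` and `read_rows` and the sample rows are illustrative assumptions, not part of the notebook.

```python
import csv
import io

def detect_separator(header_line, candidates=(",", ";")):
    """Return the candidate separator that occurs most often in the header line."""
    return max(candidates, key=header_line.count)

def read_rows(text):
    """Parse CSV text into a list of dicts, whatever separator this release uses."""
    header_line = text.splitlines()[0]
    reader = csv.DictReader(io.StringIO(text), delimiter=detect_separator(header_line))
    return list(reader)

# Made-up sample rows, one per separator variant.
comma_version = "CCAA,FECHA,Fallecidos\nMD,2020-05-05,10\n"
semicolon_version = "CCAA;FECHA;Fallecidos\nMD;2020-05-05;10\n"
print(read_rows(comma_version) == read_rows(semicolon_version))
```

If the file is loaded with pandas (an assumption about the notebook), `pandas.read_csv(path, sep=None, engine="python")` offers similar automatic detection.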
Most of the work performed on this data aims to fix some issues:
- Fill the CASOS column. This can be calculated as the day-by-day difference in PCR+.
- Reconstruct the time series. At the moment, the features reflect a snapshot for each day, but no features represent the increments between dates. For this purpose, the reconstruction of the time series consists of calculating those increments as new features. The following features then reflect the variation with respect to the previous day:
  - IncHospitalizados
  - IncUCI
  - IncFallecidos
  - IncRescuperados
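The increment reconstruction described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the notebook's actual implementation: the function name `add_increments` is hypothetical, FECHA is assumed to be an ISO date string, the counts are assumed to be cumulative integers, and the sample values are made up.

```python
def add_increments(records, features):
    """Return copies of the records with an Inc<feature> field per feature.

    Each record is assumed to be a dict with 'CCAA', 'FECHA' (ISO date string)
    and cumulative integer counts for every name in `features`.
    """
    previous = {}  # last record seen for each region
    result = []
    # ISO dates sort correctly as strings, so this orders each region chronologically.
    for rec in sorted(records, key=lambda r: (r["CCAA"], r["FECHA"])):
        out = dict(rec)
        last = previous.get(rec["CCAA"])
        for feat in features:
            # The first day of a region has no previous day, so its increment is None.
            out["Inc" + feat] = None if last is None else rec[feat] - last[feat]
        previous[rec["CCAA"]] = rec
        result.append(out)
    return result

# Made-up sample values for two regions over two days.
sample = [
    {"CCAA": "MD", "FECHA": "2020-05-04", "Hospitalizados": 100, "Fallecidos": 10},
    {"CCAA": "MD", "FECHA": "2020-05-05", "Hospitalizados": 108, "Fallecidos": 13},
    {"CCAA": "CT", "FECHA": "2020-05-04", "Hospitalizados": 50, "Fallecidos": 5},
    {"CCAA": "CT", "FECHA": "2020-05-05", "Hospitalizados": 57, "Fallecidos": 7},
]
for row in add_increments(sample, features=("Hospitalizados", "Fallecidos")):
    print(row["CCAA"], row["FECHA"], row["IncHospitalizados"], row["IncFallecidos"])
```

Calling the same function with `features=("PCR+",)` yields the day-by-day difference that could be used to fill CASOS. On a pandas dataframe, a comparable result can be obtained with `groupby('CCAA')[column].diff()` after sorting by date.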
Some functions are provided to print the information that is relevant to you:
res = ...  # Reconstructed dataframe
loc = 'MD'
current_date = '2020-05-05'
displayInfo(res, location=loc, date=current_date)
showStats(res, location=loc, feature='CASOS', aggregate=False)
showStats(res, location='MD', feature='CASOS', aggregate=False)
showStats(res, location=loc, feature='IncFallecidos', aggregate=False)
# Diagnosed people minus recovered
showStats(res, location=loc, feature='ActiveCases')
The MoMo dataset focuses on mortality data. It is well prepared, and no extra preprocessing is needed to obtain good insights. It has the following features:
- ambito: nacional or ccaa (national or regional)
- cod_ambito: empty if nacional; region ISO code if regional
- cod_ine_ambito: INE region code
- nombre_ambito: name of the region
- cod_sexo: INE sex code (man: 1, woman: 6)
- nombre_sexo: sex name (hombres, mujeres)
- cod_gedad: age group (menos_65, 65_74, mas_74)
- nombre_gedad: name of the age group
- fecha_defuncion: date of death
- defunciones_observadas: number of deaths observed (including delay corrections)
- defunciones_observadas_lim_inf: lower limit of the confidence interval
- defunciones_observadas_lim_sup: upper limit of the confidence interval
- defunciones_esperadas: expected deaths
- defunciones_esperadas_q01: 1st percentile of expected deaths
- defunciones_esperadas_q99: 99th percentile of expected deaths
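As an illustration of how these features can be combined, the sketch below compares observed deaths with the expected value and its 99th percentile for a single row. The function name `excess_mortality`, the definition of excess as observed minus expected, and the sample numbers are assumptions made for this example; they are not taken from the MoMo methodology.

```python
def excess_mortality(row):
    """Compare observed and expected deaths for one MoMo row (a dict of strings)."""
    observed = float(row["defunciones_observadas"])
    expected = float(row["defunciones_esperadas"])
    q99 = float(row["defunciones_esperadas_q99"])
    return {
        # Assumption: excess is simply observed minus expected deaths.
        "excess": observed - expected,
        # True when the observed value exceeds the 99th percentile of the expected deaths.
        "above_q99": observed > q99,
    }

# Made-up values, formatted as strings like fields read from a CSV file.
example_row = {
    "defunciones_observadas": "150",
    "defunciones_esperadas": "100",
    "defunciones_esperadas_q99": "130",
}
print(excess_mortality(example_row))
```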
As stated at the link, the downloadable data file is updated daily and contains the latest available public data on COVID-19. Each row contains the number of new cases reported per day and per country. You may use the data in line with ECDC’s copyright policy.
- dateRep: date of the record (format DD/MM/YYYY)
- day: day component of dateRep
- month: month component of dateRep
- year: year component of dateRep
- cases: number of new cases
- deaths: number of new deaths
- countriesAndTerritories: name of the country
- geoId: two-letter geolocation identifier
- countryterritoryCode: three-letter country code
- popData2018: population as of 2018
- continentExp: continent of that country
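To show how these fields fit together, the sketch below parses dateRep using the DD/MM/YYYY format listed above and normalizes new cases by popData2018. The function name `parse_record`, the per-100,000 normalization, and the sample values are assumptions for illustration only. Rows with an empty popData2018 are not handled here.

```python
from datetime import datetime

def parse_record(row):
    """Parse one ECDC row (a dict of strings) into a date and a population-normalized rate."""
    date = datetime.strptime(row["dateRep"], "%d/%m/%Y").date()
    cases = int(row["cases"])
    population = int(row["popData2018"])
    return {
        "date": date,
        # New cases per 100,000 inhabitants, based on the 2018 population.
        "cases_per_100k": cases * 100_000 / population,
    }

# Made-up values, formatted as strings like fields read from a CSV file.
example_row = {"dateRep": "05/05/2020", "cases": "500", "popData2018": "1000000"}
print(parse_record(example_row))
```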