Skip to content

RaulRC/Covid-19

Repository files navigation

Covid19

Works on different public datasets

The purpose of this notebook is to provide an easy to run tool in order to have better insigts that those found in the press. Here we are going to handle with different files:

1. ISCIII Dataset

Dataset information

  • CCAA: region code
  • FECHA: date of the record
  • CASOS: new infections detected (mainly empty)
  • PCR+: referring to the CASOS means the new infections detected via a positive PCR test
  • TestAc+: Similar to the PCR+. Quick test focused on Antibodies
  • Hospitalizados: hospitalized people related with Covid-19
  • UCI: hospitalized people who require Intensive Care Units
  • Fallecidos: deceased people
  • Recuperados: recovered people

Some considerations

  • Separator: usually changes switching between ',' and ';'.
  • Data is being updated via appending the new data (one record for each autonomous region) day by day, so the last records correspond with the more recent information.
  • DISCLAIMER: data provided from media differs from data displayed at some points. Moreover, several other factors could change in this dataset (as the described separator or the addition of new columns).

Reconstruct time series

Most of the works performed on this data pointed to fix some issues:

  • Fill CASOS column. This could be calculated based on the difference between PCR+ day by day
  • Reconstruct the time series. At the moment, the features reflect the snapshot of the day by day, but there are no features representing the increments between dates. For this pupose, the reconstruction of the time series consist on calculate those increments on new features. In this manner, the following features will reflect the variation for those data respect the day before:
    • IncHospitalizados
    • IncUCI
    • IncFallecidos
    • IncRescuperados

Data exploration

You will find some functions in order to print information relevant for you.

res = ...# Reconstructed dataframe
loc = 'MD'
current_date='2020-05-05'
displayInfo(res, location=loc, date=current_day)
showStats(res, location=loc, feature='CASOS', aggregate=False)

General information

showStats(res, location='MD', feature='CASOS', aggregate=False)

Information by region: new cases

showStats(res, location=loc, feature='IncFallecidos', aggregate=False)

Information by region: deceases

# Diagnosed people minus recovered
showStats(res, location=loc, feature='ActiveCases')

Information by region: deceases

2. MoMo Dataset

MoMo dataset is aimed to mortality data. Is well prepared and no extra preprocessing is needed in order to have good insights. Here we have the features:

  • ambito: nacional or ccaa (national or regional)
  • cod_ambito: empty if nacional. Region ISO code if regional
  • cod_ine_ambito: INE region code
  • nombre_ambito: Name of the region
  • cod_sexo: INE Sex code. man: 1, woman: 6
  • nombre_sexo: Sex name (hombres, mujeres)
  • cod_gedad: Age group: menos_65, 65_74, mas_74
  • nombre_gedad: name of age group
  • fecha_defuncion: decease date
  • defunciones_observadas: number of deceases observed (included delay corrections)
  • defunciones_observadas_lim_inf: inferior limit of confidence interval
  • defunciones_observadas_lim_sup: superior limit of confidence interval
  • defunciones_esperadas: expected deceases
  • defunciones_esperadas_q01: Percentil 1 of expecteds
  • defunciones_esperadas_q99: Percentil 99 of expecteds

Deceases data

3. European Centre for Disease Prevention and Control

As is said at the link, the downloadable data file is updated daily and contains the latest available public data on COVID-19. Each row/entry contains the number of new cases reported per day and per country. You may use the data in line with ECDC’s copyright policy.

Columns

  • dateRep: date of the record (format DD/MM/YYYY)
  • day: disaggregated data from dateRep
  • month: disaggregated data from dateRep
  • year: disaggregated data from dateRep
  • cases: number of new cases
  • deaths: number of new deceases
  • countriesAndTerritories: name of the country
  • geoId: two letter geolocation identifier
  • countryterritoryCode: three letter country code
  • popData2018: population accounted at 2018
  • continentExp: continent related to that country

European cases and deaths