Add readme #1

Open
wants to merge 29 commits into
base: main
Changes from 22 commits
29 commits
321b0b8
add first version of readme
Nov 27, 2023
ed8d959
add first version of README
Nov 28, 2023
7fbc417
[DATALAD] Recorded changes
Nov 28, 2023
7c9c432
[DATALAD] Recorded changes
Nov 28, 2023
aa2b06e
[DATALAD] Recorded changes
Nov 28, 2023
868d03c
[DATALAD] added README
Nov 28, 2023
9960b71
adding parts from UNFCCC repo and automatically created DataLad READM…
Nov 28, 2023
6ce811d
Update README.md
crdanielbusch Nov 29, 2023
89002fc
Update README.md
crdanielbusch Nov 29, 2023
fcbd988
Update README.md
crdanielbusch Nov 29, 2023
3890456
Update README.md
crdanielbusch Nov 29, 2023
32a6aab
Update README.md
crdanielbusch Nov 29, 2023
349cd7e
Update README.md
crdanielbusch Nov 29, 2023
7e99679
Update README.md
crdanielbusch Nov 29, 2023
bae14ec
Update README.md
crdanielbusch Nov 29, 2023
a39a747
Update README.md
crdanielbusch Nov 29, 2023
f0b82e7
Update .gitignore
crdanielbusch Nov 29, 2023
44f6c01
Update README.md
crdanielbusch Nov 29, 2023
b8ce4b5
Update README.md
crdanielbusch Nov 29, 2023
c1aff32
Update README.md
crdanielbusch Nov 29, 2023
cbd3b37
Update README.md
crdanielbusch Nov 29, 2023
2e972e5
test
Nov 29, 2023
e95393f
Apply suggestions from code review
crdanielbusch Nov 29, 2023
03d11b0
Merge remote-tracking branch 'origin/main' into add-readme
Nov 29, 2023
1f27624
small changes to readme
Nov 29, 2023
ecbf4c7
[DATALAD RUNCMD] Read data for v230913.
Nov 30, 2023
2c18ba9
[DATALAD RUNCMD] Read data for v230428.
Nov 30, 2023
06f653f
Apply suggestions from code review
crdanielbusch Nov 30, 2023
9251287
formatting and small additions
Nov 30, 2023
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,6 +1,8 @@
__pycache__
venv
.doit.db.*
.doit.db
.DS_Store
.idea
*.ipynb
.ipynb_checkpoints
207 changes: 207 additions & 0 deletions README.md
@@ -0,0 +1,207 @@
# Global CO2 from Cement Production Dataset

This repository downloads the Andrew dataset on global CO2 emissions from cement production from [Zenodo](https://zenodo.org/records/831454).
The dataset is converted to the PRIMAP2 format and provided in the CSV-based interchange format and the netCDF-based native PRIMAP2 format. Several versions of the dataset are available.

test
## Description

This repository downloads data on global CO2 emissions from cement production from [Zenodo](https://zenodo.org/records/831454).
The downloaded dataset can then be converted into CSV (.csv file extension) or NetCDF (.nc file extension) format. Converted data are available for the following versions:

| Version | Source |
|---------|--------|
| v231016 | [Zenodo](https://zenodo.org/records/10008931) |
| v230913 | [Zenodo](https://zenodo.org/records/8339353) |
| v230428 | [Zenodo](https://zenodo.org/records/7875557) |
| v220915 | [Zenodo](https://zenodo.org/records/7081360) |
| v220516 | [Zenodo](https://zenodo.org/records/6553090) |

The data management tool [DataLad](http://docs.datalad.org/en/stable/) is used to version control the data sets.
Commands to run the scripts are executed via the pydoit package.

## DataLad datasets and how to use them

This repository is a [DataLad](https://www.datalad.org/) dataset. It provides
fine-grained data access down to the level of individual files, and allows for
tracking future updates. In order to use this repository for data retrieval,
[DataLad](https://www.datalad.org/) is required. It is a free and open source
command line tool, available for all major operating systems, and builds on
Git and [git-annex](https://git-annex.branchable.com/) to allow sharing,
synchronizing, and version controlling collections of large files.

## Installation

- Install DataLad according to the [DataLad handbook](https://handbook.datalad.org/en/latest/intro/installation.html). It is recommended to install it globally.
- Install [Python](https://www.python.org).
- Install [pydoit](https://pydoit.org/install.html).

## Getting Started

### Clone the repository

A DataLad dataset can be `cloned` by running
```
datalad clone <url>
```
Do not use **git clone** to download the repository! A plain Git clone does not give DataLad the
information it needs to run the program. Once a dataset is cloned, it is a lightweight directory on your local machine.
At this point, it contains only small metadata and information on the identity
of the files in the dataset, but not the actual *content* of the (sometimes large)
data files.


### Easy Access
Users who simply want to retrieve the dataset have the option to access both the
original and extracted files with
```
datalad get <filename>
```
This command will trigger a download of the files, directories, or subdatasets
you have specified.

For example, the CSV file for the 2023/09/13 release can be downloaded with
```
datalad get extracted_data/v230913/Robbie_Andrew_Cement_Production_CO2_230913.csv
```
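
If you script the retrieval, the path can be assembled from the version string. The sketch below assumes that every release follows the naming pattern of the v230913 example above, which this README does not state explicitly. `extracted_csv_path` and `datalad_get_command` are hypothetical helper names and are not part of the repository.

```python
from pathlib import PurePosixPath


def extracted_csv_path(version: str) -> PurePosixPath:
    """Build the relative path of the converted CSV file for a release.

    Assumption: all releases follow the pattern of the v230913 example.
    `version` is the six-digit date string, e.g. "230913".
    """
    filename = f"Robbie_Andrew_Cement_Production_CO2_{version}.csv"
    return PurePosixPath("extracted_data") / f"v{version}" / filename


def datalad_get_command(version: str) -> list[str]:
    """Return the `datalad get` call as an argument list for subprocess.run."""
    return ["datalad", "get", str(extracted_csv_path(version))]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is downloaded here.
    print(" ".join(datalad_get_command("230913")))
```

The argument list can be passed to `subprocess.run(..., check=True)` from inside the cloned dataset once you have checked that the file name matches the release you need.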
### Stay up-to-date

DataLad datasets can be updated. The command `datalad update` will *fetch*
updates and store them on a different branch (by default
`remotes/origin/master`). Running

```
datalad update --merge
```

will *pull* available updates and integrate them in one go.

### Find out what has been done

DataLad datasets contain their history in the ``git log``. By running ``git log``
(or a tool that displays Git history) in the dataset or on specific
files, you can find out what has been done to the dataset or to individual
files by whom, and when.

## Contributing

For those who wish to contribute to the repository, below we go through the key commands you will need to use.

#### Set up the virtual environment with doit
```
doit setup_env
```
#### <a name="download"></a>Download the version from the command line.
This will download all files for a specific version from Zenodo as they are. Note that this version must already be in `versions.py`; if you want to add a new version, see the section on adding a new version below.
```
doit download_version --version <YYMMDD>
```
#### <a name="convert"></a>Convert the data sets into CSV and NetCDF files.
```
doit read_version --version <YYMMDD>
```
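
When both steps are needed for the same release, they can be chained from a small script. The following is only a sketch: it assumes `doit` is on the `PATH` and that it is run from the repository root, and `doit_commands` and `run_pipeline` are hypothetical names that do not exist in the repository.

```python
import subprocess


def doit_commands(version: str) -> list[list[str]]:
    """Return the two doit calls for one release: download first, then read."""
    return [
        ["doit", "download_version", "--version", version],
        ["doit", "read_version", "--version", version],
    ]


def run_pipeline(version: str) -> None:
    """Run both steps in order and stop at the first failure."""
    for command in doit_commands(version):
        # check=True raises CalledProcessError if a step exits with an error,
        # so the read step never runs on an incomplete download.
        subprocess.run(command, check=True)
```

Calling `run_pipeline` with the same version value you would pass on the command line runs the download and then the conversion.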

## <a name="newversion"></a> How to add a new version


To add a new version, go to **versions.py** in the **src** directory and create a new entry in the
`versions` dictionary. Fill in all the required information, following the previous entries.
For example, the key _v230913_ in the _versions_ dictionary describes the 13-Sep-2023 release.
````python
versions = {
    "v230913": {
        "date": "13-Sep-2023",
        "ver_str_long": "version 230913",
        "ver_str_short": "230913",
        "folder": "v230913",
        "transpose": False,
        "filename": "0. GCP-CEM.csv",
        "ref": "10.5281/zenodo.8339353",
        "ref2": "10.5194/essd-11-1675-2019",
        "title": "Global CO2 emissions from cement production",
        "institution": "CICERO - Center for International Climate Research",
        "filter_keep": {},
        "filter_remove": {},
        "contact": "johannes.guetschow@climate-resource.com",
        "comment": (
            "Published by Robbie Andrew, converted to PRIMAP2 format by "
            "Johannes Gütschow"
        ),
        "unit": "kt * CO2 / year",
        "country_code": True,
    },
}
````
Then run the two commands `download_version` and `read_version` as described in the sections on [downloading](#download) and [converting](#convert) above.
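
Before running the commands, it can help to check that a new entry is complete. The sketch below assumes that the keys of the v230913 entry are the required ones; the repository may treat some of them as optional. `REQUIRED_KEYS` and `missing_keys` are hypothetical names and are not part of `versions.py`.

```python
# Assumption: the keys of the v230913 example entry are all required.
REQUIRED_KEYS = {
    "date", "ver_str_long", "ver_str_short", "folder", "transpose", "filename",
    "ref", "ref2", "title", "institution", "filter_keep", "filter_remove",
    "contact", "comment", "unit", "country_code",
}


def missing_keys(entry: dict) -> list[str]:
    """Return the required keys that are absent from `entry`, sorted by name."""
    return sorted(REQUIRED_KEYS - set(entry))
```

An empty list means the entry has every key used in the example; any names returned still need to be filled in.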

## Help
Show all doit commands
```
doit help
```
See a list with possible doit commands specific to this repository
```
doit list
```

Get help on a specific command

```
doit help <command>
```

## Contributing
### Repository structure
- **.datalad/** contains the config file for DataLad.
- **downloaded_data/** contains the original data from Zenodo.
- **extracted_data/** contains the data in .csv and .nc format.
- **literature/** contains a link to the publication by Robbie M. Andrew. It can be downloaded with the _datalad get_ command.
- **src/**
  - **download_version.py** downloads files from Zenodo for a given version. The version is taken from the command line using _argparse_.
  - **download_version_datalad.py** calls datalad to run the data reading function.
  - **helper_functions.py** contains a function to map country codes.
  - **read_version.py** reads the data for a given version and saves it in the [PRIMAP2](https://primap2.readthedocs.io/en/stable/) native and interchange formats.
  - **read_version_datalad.py** calls datalad to run the data reading function.
  - **versions.py** contains a dictionary with metadata for each release. This file should be updated when [adding a new version](#a-namenewversiona-how-to-add-a-new-version).
- **dodo.py** defines pydoit commands.
- **pyproject.toml** configuration file
- **requirements.txt** requirements
- **requirements_dev.txt** development requirements
- **setup.cfg** requirements
- **setup.py** installs python packages

### Make sure to correctly set up the DataLad siblings
Git repositories can configure clones of a dataset as _remotes_ in order to fetch, pull, or push from and to them. A `datalad sibling` is the equivalent of a git clone that is configured as a remote.

**Query information** about all known siblings with
```
datalad siblings
```

**Add a sibling** to allow pushing to GitHub
```
datalad siblings add --dataset . --name <name> --url git@github.com:JGuetschow/Global_CO2_from_cement_production.git
```
SSH access is needed to run this command. Note that `name` can be freely chosen (we tend to just use "github" for GitHub siblings).


**Push to the GitHub repository**
```
datalad push --to <name>
```

### Issues
There are always open issues regarding coding, some of them easy to resolve, some harder.

### Your ideas
Contributing is of course not limited to the categories above. If you have ideas for improvements, just open an issue or a discussion page to discuss your idea with the community.


### Technical HowTo for contributors
Because this is a DataLad repository that uses both GitHub and gin, the process of contributing code and data is a bit different from
pure Git repositories. As the data is only stored on gin, the gin repository is the source to start
from. As gin currently has a problem with forks (the annexed data is not
forked), we have to use branches for development. To contribute, you
first need to contact the maintainers to get write access to the gin repository.
You have to clone the repository using SSH to be able to push to it.
For that, you first need to store your public SSH key on the gin server
(settings -> SSH Keys).

### Instructions for merge requests
Once you have everything set up, you can create a new branch and work there.
When you're done, create a pull request to integrate your work into the main
branch. This should be done first on GitHub to allow for discussion and review (gin servers don't have the same review features). Afterwards, the changes
can actually be merged on gin (so that the annex is merged properly too).