TorchGeo 0.5.0 Release Notes

0.5.0 encompasses over 8 months of hard work and new features contributed by 20 users from around the world. Below, we detail specific features worth highlighting.

Highlights of this release

New command-line interface

TorchGeo has always had tight integration with PyTorch Lightning, including datamodules for common benchmark datasets and trainers for most computer vision tasks. TorchGeo 0.5.0 introduces a new command-line interface for model training based on LightningCLI. It can be invoked in two ways:

# If torchgeo has been installed
torchgeo
# If torchgeo has been installed, or if it has been cloned to the current directory
python3 -m torchgeo

It supports configuration via command-line arguments or YAML/JSON config files. Valid options are listed in the help messages:

# See valid stages
torchgeo --help
# See valid trainer options
torchgeo fit --help
# See valid model options
torchgeo fit --model.help ClassificationTask
# See valid data options
torchgeo fit --data.help EuroSAT100DataModule

Using the following config file:

trainer:
  max_epochs: 20
model:
  class_path: ClassificationTask
  init_args:
    model: "resnet18"
    in_channels: 13
    num_classes: 10
data:
  class_path: EuroSAT100DataModule
  init_args:
    batch_size: 8
  dict_kwargs:
    download: true

we can see the script in action:

# Train and validate a model
torchgeo fit --config config.yaml
# Validate-only
torchgeo validate --config config.yaml
# Calculate and report test accuracy
torchgeo test --config config.yaml
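
Since JSON config files are also accepted, the same settings could be written as JSON instead of YAML. The following is a minimal sketch that rebuilds the configuration above as a Python dictionary and writes it to disk; the file name config.json is an arbitrary choice made for this illustration:

```python
import json

# Same settings as the YAML config above, expressed as a Python dict
config = {
    "trainer": {"max_epochs": 20},
    "model": {
        "class_path": "ClassificationTask",
        "init_args": {"model": "resnet18", "in_channels": 13, "num_classes": 10},
    },
    "data": {
        "class_path": "EuroSAT100DataModule",
        "init_args": {"batch_size": 8},
        "dict_kwargs": {"download": True},
    },
}

# Serialize to JSON; "config.json" is an assumed file name
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```

The resulting file should then be usable in place of the YAML version, for example with torchgeo fit --config config.json.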

It can also be imported and used in a Python script if you need to extend it to add new features:

from torchgeo.main import main

main(["fit", "--config", "config.yaml"])

See the Lightning documentation for more details.

Self-supervised learning and Landsat

Self-supervised learning has become a dominant technique for model pre-training, especially in domains (like remote sensing) that are rich in data but lacking in large labeled datasets. The 0.5.0 release adds powerful trainers for the following SSL techniques:

BYOL [1]
MoCo [1, 2, 3]
SimCLR [1, 2]

large unlabeled datasets for multiple satellite platforms:

SeCo [1]
SSL4EO-L [1]
SSL4EO-S12 [1]

and the first-ever models pre-trained on Landsat imagery. See our SSL4EO-L paper for more details.

Utilities for splitting GeoDatasets

In prior releases, the only way to create train/val/test splits of GeoDatasets was to use a Sampler roi. This limited the types of splits you could perform, and was unintuitive for users coming from PyTorch, where a dataset can be split into multiple datasets. TorchGeo 0.5.0 introduces new splitting utilities for GeoDatasets in torchgeo.datasets, including:

random_bbox_assignment: randomly assign each scene to a different split
random_bbox_splitting: randomly split each scene and assign each half to a different split
random_grid_cell_assignment: overlay a grid and randomly assign each grid cell to a different split
roi_split: split using a roi, just like with a Sampler
time_series_split: split along the time axis

Splitting with a Sampler roi is not yet deprecated, but users are encouraged to adopt the new dataset splitting utility functions.
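
To make two of these strategies more concrete, here is a pure-Python sketch of the ideas behind scene-level random assignment and splitting along the time axis. It is not TorchGeo's implementation and does not call the TorchGeo API: Scene, assign_scenes, and split_by_time are hypothetical names introduced only for this illustration, and the time split is simplified to assigning whole scenes by their start time.

```python
import random
from typing import NamedTuple


class Scene(NamedTuple):
    """Hypothetical stand-in for one indexed scene (name plus time range)."""

    name: str
    mint: float
    maxt: float


def assign_scenes(scenes, fractions, seed=0):
    """Randomly assign whole scenes to splits in the given proportions."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(scenes)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    splits, start, cumulative = [], 0, 0.0
    for i, fraction in enumerate(fractions):
        cumulative += fraction
        if i == len(fractions) - 1:
            end = len(shuffled)  # last split takes whatever remains
        else:
            end = round(cumulative * len(shuffled))
        splits.append(shuffled[start:end])
        start = end
    return splits


def split_by_time(scenes, cutoff):
    """Put scenes that start before the cutoff in one split, the rest in another."""
    before = [s for s in scenes if s.mint < cutoff]
    after = [s for s in scenes if s.mint >= cutoff]
    return before, after


# Ten toy scenes, each covering one unit of time
scenes = [Scene(f"scene_{i}", float(i), float(i) + 1.0) for i in range(10)]
train, val, test = assign_scenes(scenes, [0.8, 0.1, 0.1])
older, newer = split_by_time(scenes, cutoff=5.0)
```

The library utilities operate on GeoDataset objects and return new datasets rather than lists; consult the API documentation for their exact signatures.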

GeoDatasets now accept lists as input

Previously, each GeoDataset accepted a single root directory as input. Now, users can pass one or more directories, or a list of files they want to include. At first glance, this doesn't seem like a big deal, but it actually opens a lot of possibilities for how users can construct GeoDatasets. For example, users can use custom filters:

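import glob

from torchgeo.datasets import Landsat8

files = []
for file in glob.glob("*.tif"):
    # get_cloud_cover is a placeholder for your own logic, e.g. reading
    # the pixel QA band or a metadata file and returning a percentage
    cloud_cover = get_cloud_cover(file)
    if cloud_cover < 20:  # select images with minimal cloud cover
        files.append(file)
ds = Landsat8(files)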

or use remote files from S3 buckets or Azure Blob Storage. Basically, as long as GDAL knows how to read the file, TorchGeo supports it, wherever the file lives.
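
As a rough illustration of the remote-file case, GDAL exposes remote storage through virtual file system prefixes such as /vsis3/ and /vsicurl/. The sketch below converts URLs into such paths; to_gdal_path is a hypothetical helper and the bucket and host names are made up. Whether a given path can actually be opened depends on your GDAL build and credentials, which this example does not check.

```python
from urllib.parse import urlparse


def to_gdal_path(url: str) -> str:
    """Map a URL to a GDAL virtual file system path; other paths pass through."""
    parsed = urlparse(url)
    if parsed.scheme == "s3":
        # s3://bucket/key -> /vsis3/bucket/key
        return f"/vsis3/{parsed.netloc}{parsed.path}"
    if parsed.scheme in ("http", "https"):
        # Plain HTTP(S) files are read through /vsicurl/
        return f"/vsicurl/{url}"
    return url  # assume a local path


# Hypothetical file locations, for illustration only
urls = [
    "s3://my-bucket/scenes/a.tif",
    "https://example.com/scenes/b.tif",
    "data/c.tif",
]
files = [to_gdal_path(u) for u in urls]
```

A list like files could then be passed to a GeoDataset in the same way as the local file list above.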

Note that some datasets may not support a list of files if you also want to download the dataset automatically, because we need to know which directory to download to.

Building a community

With over 50 contributors from around the world, we needed a better way to discuss ideas and share announcements. TorchGeo now has a public Slack channel! Join us and say hello 👋

Now that the majority of the features we've needed have been implemented, one of our goals for the next release is to improve our documentation and tutorials. Expect to see TorchGeo tutorials at all the popular ML/RS conferences next year! We're excited to meet our users in person and learn more about their unique use cases and needs.

Backwards-incompatible changes

GeoDataset: first parameter renamed from root to paths (#1442, #1597)
Trainers: many parameters renamed (#1541)
FAIR1M datamodule: *_split_pct parameters removed (#1275)
Inria datamodule: *_split_pct parameters removed (#1540)
SemanticSegmentationTask: changes to weights parameter (#1046)
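
For code written against earlier releases, the GeoDataset rename is mostly a matter of changing one keyword. As a rough sketch, a helper like the following could translate old keyword arguments before they are passed to a GeoDataset constructor. migrate_kwargs is a hypothetical name, not part of TorchGeo, and it only addresses the root to paths rename listed above:

```python
def migrate_kwargs(kwargs: dict) -> dict:
    """Return a copy of kwargs with the pre-0.5 `root` key renamed to `paths`."""
    updated = dict(kwargs)  # leave the caller's dict untouched
    if "root" in updated:
        if "paths" in updated:
            raise TypeError("pass either 'root' or 'paths', not both")
        updated["paths"] = updated.pop("root")
    return updated


old_style = {"root": "data/landsat", "res": 30}
new_style = migrate_kwargs(old_style)
```

Because the rename is listed for GeoDataset only, a helper like this should not be applied blindly to other dataset classes.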

Dependencies

Drop Python 3.7 and 3.8 support following NEP 29 (#1058, #1246)
Dependencies now listed in pyproject.toml (#1446)
Drop upper bounds on dependencies (#1480)
Lightly: new required dependency (#1252, #1285)
Lightning: extra dependencies now required (#1559)
Omegaconf: no longer a dependency (#1559)
Pandas: now supports v2.1 (#1537)
Pandas: new required dependency (#1586)
Scikit-Learn: no longer a dependency (#1063)
TorchMetrics: now supports v1 (#1465)

Datamodules

New datamodules:

EuroSAT 100 (#1130)
FireRisk (#1265)
L7 Irish (#1197)
L8 Biome (#1200)
SeCo (#1168)
SKIPP'D (#1267)
SSL4EO-L (#1332)
SSL4EO-L Benchmark (#1338)
SSL4EO-S12 (#1151)
SustainBench (#1253)

Changes to existing datamodules:

FAIR1M: add val/test splits, drop split parameters (#1275)
Inria: add val split, drop split parameters (#654, #1540)
RESISC45: better normalization (#1349)
So2Sat: support RGB-only mode (#1283)
So2Sat: control size of validation dataset (#1283)

New base classes:

BaseDataModule (#1260)

Changes to existing base classes:

GeoDataModule: automatically infer epoch length (#1257)
BaseDataModule: better error messages (#1307, #1441)

Datasets

New datasets:

BioMassters (#1560)
EuroSAT 100 (#1130)
FireRisk (#1265)
L7 Irish (#1197)
L8 Biome (#1200)
LandCover.ai Geo (#1126)
MapInWild (#1096, #1131)
NLCD (#1244)
PASTIS (#315)
Rwanda Field Boundary (#1574)
SeasoNet (#1466)
SKIPP'D (#1267, #1548)
SSL4EO-L (#1332, #1424)
SSL4EO-L Benchmark (#1338, #1431)
SSL4EO-S12 (#1151)
SustainBench (#1253)
Western USA Live Fuel Moisture (#1262)

Changes to existing datasets:

CDL: add years parameter (#1337)
CDL: add classes parameter (#1392)
CDL: map class labels to ordinal numbers (#1364, #1368)
CDL: return figure (#1369)
CMS Mangrove Canopy: return figure (#1369)
DFC2022: avoid interpolation in colormap (#1372)
FAIR1M: add val/test splits (#1275)
FAIR1M: add download support (#1275)
Inria: add validation split (#654, #1540)
SeCo: add seasons parameter (#1168)
SeCo: faster initialization (#1168)
SeCo: support new directory structure (#1235)
So2Sat: add version 3 (#1086, #1283)
UCMerced: fix image shape bug (#1238)
USAVars: return lat/lon of centroid (#1240)
USAVars: convert image to float32 (#1433)
USAVars: download from Hugging Face (#1453)

Changes to existing base classes:

GeoDataset: accept list of files or directories (#1427, #1442, #1597)
GeoDataset: add files property (#1442, #1597)
Intersection/UnionDataset: fix crs/res propagation (#1341, #1344)
RasterDataset: add dtype attribute (#1149)
RasterDataset: allow sampling outside bounds of image (#1329, #1344)

New utility functions:

Add utilities to split GeoDatasets (#536, #866)
BoundingBox has a new split function (#866)

Models

Changes to existing models:

RCF: add empirical sampling mode (#1339)

New pre-trained model weights:

GASSL (#1325)
SSL4EO-L (#1482)

Changes to existing pre-trained model weights:

SeCo: fix weight loading (#1234, #1593)

Samplers

Changes to existing samplers:

GridGeoSampler: don't change stride of last patch (#1245, #1329)

Trainers

New trainers:

MoCo (#1285, #1357)
Pixelwise Regression (#849, #1241, #1306)
SimCLR (#1252, #1357)

Changes to existing trainers:

Add ability to freeze backbones and decoders (#1290)
Fix support for datasets without a plot method (#1551, #1585)
BYOL: add random season contrast (#1168)
Classification: add class weights for cross entropy loss (#1592)
Semantic Segmentation: add class weights for cross entropy loss (#1221)
Semantic Segmentation: add support for pre-trained model weights (#1046)
Semantic Segmentation: fix ignore index weighting (#1245, #1331)

New base classes:

BaseTask (#1393, #1541)

Transforms

New transforms:

Random Grayscale (#1301)

Scripts

New scripts:

Add command-line script (#228, #1237, #1352, #1559)

Documentation

CDL: fix documented data source (#1248)
UCMerced: fix documented dataset size (#1291)
Remove buggy benchmarking tutorial (#1521)

Testing

Add Python 3.11 tests (#1180)
Ensure that none of our minimum version tests are skipped (#1276, #1587)
Improve CI concurrency robustness (#1412, #1423)
Test fewer models in trainers to avoid exceeding RAM (#1377)
Windows CI: replace pacman with choco (#1266)

Contributors

This release is thanks to the following contributors:

@AABNassim
@adamjstewart
@adrianboguszewski
@adriantre
@ashnair1
@briktor
@burakekim
@calebrob6
@dkosm
@estherrolf
@isaaccorley
@nilsleh
@nsutezo
@ntw-au
@pmandiola
@shradhasehgal
@Tarandeep97
@urbanophile
@wangyi111
@yichiac
