v0.5.0
TorchGeo 0.5.0 Release Notes
0.5.0 encompasses over 8 months of hard work and new features contributed by 20 users from around the world. Below, we detail specific features worth highlighting.
Highlights of this release
New command-line interface
TorchGeo has always had tight integration with PyTorch Lightning, including datamodules for common benchmark datasets and trainers for most computer vision tasks. TorchGeo 0.5.0 introduces a new command-line interface for model training based on LightningCLI. It can be invoked in two ways:
# If torchgeo has been installed
torchgeo
# If torchgeo has been installed, or if it has been cloned to the current directory
python3 -m torchgeo
It supports command-line configuration or YAML/JSON config files. Valid options can be found from the help messages:
# See valid stages
torchgeo --help
# See valid trainer options
torchgeo fit --help
# See valid model options
torchgeo fit --model.help ClassificationTask
# See valid data options
torchgeo fit --data.help EuroSAT100DataModule
Using the following config file:
trainer:
  max_epochs: 20
model:
  class_path: ClassificationTask
  init_args:
    model: "resnet18"
    in_channels: 13
    num_classes: 10
data:
  class_path: EuroSAT100DataModule
  init_args:
    batch_size: 8
  dict_kwargs:
    download: true
we can see the script in action:
# Train and validate a model
torchgeo fit --config config.yaml
# Validate-only
torchgeo validate --config config.yaml
# Calculate and report test accuracy
torchgeo test --config config.yaml
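Since JSON config files are also supported, the same settings could be written as JSON instead of YAML. The sketch below builds the equivalent dictionary in Python and saves it to disk. The file name config.json and the temporary directory are assumptions made for illustration; the keys mirror the YAML config above.

```python
import json
import tempfile
from pathlib import Path

# Same settings as the YAML config, expressed as a nested dictionary
config = {
    "trainer": {"max_epochs": 20},
    "model": {
        "class_path": "ClassificationTask",
        "init_args": {"model": "resnet18", "in_channels": 13, "num_classes": 10},
    },
    "data": {
        "class_path": "EuroSAT100DataModule",
        "init_args": {"batch_size": 8},
        "dict_kwargs": {"download": True},
    },
}

# Hypothetical output location: a fresh temporary directory
config_path = Path(tempfile.mkdtemp()) / "config.json"
config_path.write_text(json.dumps(config, indent=2))

# Read the file back to confirm that it round-trips unchanged
loaded = json.loads(config_path.read_text())
print(config_path)
```

The resulting file would then be passed in the same way as the YAML version, for example torchgeo fit --config config.json.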
It can also be imported and used in a Python script if you need to extend it to add new features:
from torchgeo.main import main
main(["fit", "--config", "config.yaml"])
See the Lightning documentation for more details.
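When calling main from Python, it can be convenient to assemble the argument list programmatically. The following is a minimal sketch, not part of TorchGeo: build_cli_args is a hypothetical helper, and the dotted override keys assume the usual LightningCLI convention of addressing nested options as --section.option=value.

```python
from typing import Dict, List


def build_cli_args(stage: str, config: str, overrides: Dict[str, object]) -> List[str]:
    """Build an argument list for the CLI (hypothetical helper, not a TorchGeo API)."""
    args = [stage, "--config", config]
    # Overrides are placed after --config so that they take precedence over the file
    for key, value in overrides.items():
        args.append(f"--{key}={value}")
    return args


args = build_cli_args(
    "fit",
    "config.yaml",
    {"trainer.max_epochs": 5, "data.init_args.batch_size": 16},
)
print(args)

# With TorchGeo installed, the list could then be passed on:
# from torchgeo.main import main
# main(args)
```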
Self-supervised learning and Landsat
Self-supervised learning has become a dominant technique for model pre-training, especially in domains (like remote sensing) that are rich in data but lacking in large labeled datasets. The 0.5.0 release adds powerful trainers for SSL techniques, large unlabeled datasets for multiple satellite platforms, and the first ever models pre-trained on Landsat imagery. See our SSL4EO-L paper for more details.
Utilities for splitting GeoDatasets
In prior releases, the only way to create train/val/test splits of GeoDatasets was to use a Sampler roi. This limited the types of splits you could perform, and was unintuitive for users coming from PyTorch where the dataset can be split into multiple datasets. TorchGeo 0.5.0 introduces new splitting utilities for GeoDatasets in torchgeo.datasets, including:
- random_bbox_assignment: randomly assigns each scene to a different split
- random_bbox_splitting: randomly split each scene and assign each half to a different split
- random_grid_cell_assignment: overlay a grid and randomly assign each grid cell to a different split
- roi_split: split using a roi just like with Sampler
- time_series_split: split along the time axis
Splitting with a Sampler roi is not yet deprecated, but users are encouraged to adopt the new dataset splitting utility functions.
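To make the idea of scene-level assignment concrete, here is a conceptual sketch in plain Python. It is not TorchGeo's implementation and does not use the TorchGeo API: assign_scenes and the BBox tuple are hypothetical names, and the real utilities operate on GeoDataset objects rather than lists of tuples. The sketch only illustrates the principle behind random assignment, where whole scenes are shuffled and handed to splits according to the requested fractions.

```python
import random
from typing import List, Sequence, Tuple

# (minx, maxx, miny, maxy, mint, maxt): a stand-in for a scene's bounding box
BBox = Tuple[float, float, float, float, float, float]


def assign_scenes(
    scenes: Sequence[BBox], fractions: Sequence[float], seed: int = 0
) -> List[List[BBox]]:
    """Randomly assign whole scenes to splits according to the given fractions."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(scenes)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    splits = []
    start = 0
    for i, frac in enumerate(fractions):
        if i == len(fractions) - 1:
            # The last split takes whatever remains so that no scene is dropped
            end = len(shuffled)
        else:
            end = start + round(frac * len(shuffled))
        splits.append(shuffled[start:end])
        start = end
    return splits


# Ten hypothetical scenes laid out side by side along the x axis
scenes = [(float(i), float(i + 1), 0.0, 1.0, 0.0, 1.0) for i in range(10)]
train, val, test = assign_scenes(scenes, [0.6, 0.2, 0.2])
print(len(train), len(val), len(test))
```

Because each scene lands in exactly one split, no scene appears in more than one of the train, val, and test sets. Splitting within a scene or along the time axis follows the same pattern but partitions each bounding box instead of the list of scenes.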
GeoDatasets now accept lists as input
Previously, each GeoDataset accepted a single root directory as input. Now, users can pass one or more directories, or a list of files they want to include. At first glance, this doesn't seem like a big deal, but it actually opens a lot of possibilities for how users can construct GeoDatasets. For example, users can use custom filters:
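import glob

from torchgeo.datasets import Landsat8

files = []
for file in glob.glob("*.tif"):
    # check pixel QA band or metadata file
    # (get_cloud_cover is a hypothetical user-defined helper, not part of TorchGeo)
    cloud_cover = get_cloud_cover(file)
    if cloud_cover < 20:  # select images with minimal cloud cover
        files.append(file)
ds = Landsat8(files)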
or use remote files from S3 buckets or Azure blob storage. Basically, as long as GDAL knows how to read the file, TorchGeo supports it, wherever the file lives.
Note that some datasets may not support a list of files if you also want to automatically download the dataset because we need to know the directory to download to.
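For remote files, GDAL exposes cloud storage through virtual file system prefixes such as /vsis3/ for S3 and /vsiaz/ for Azure blob storage. The sketch below only builds such path strings. gdal_vsi_path is a hypothetical helper, the bucket, container, and file names are made up, and whether a given dataset class accepts these paths may depend on the installed TorchGeo and GDAL versions and on credentials being configured.

```python
def gdal_vsi_path(scheme: str, container: str, key: str) -> str:
    """Build a GDAL virtual file system path (hypothetical helper, not a TorchGeo API)."""
    prefixes = {"s3": "/vsis3", "az": "/vsiaz"}
    if scheme not in prefixes:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return f"{prefixes[scheme]}/{container}/{key.lstrip('/')}"


# Hypothetical bucket, container, and object names
files = [
    gdal_vsi_path("s3", "example-bucket", "landsat/scene_1.tif"),
    gdal_vsi_path("az", "example-container", "landsat/scene_2.tif"),
]
print(files)

# With TorchGeo installed and credentials configured, the list could be
# passed to a GeoDataset in place of local files:
# from torchgeo.datasets import Landsat8
# ds = Landsat8(files)
```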
Building a community
With over 50 contributors from around the world, we needed a better way to discuss ideas and share announcements. TorchGeo now has a public Slack channel! Join us and say hello 👋
Now that the majority of the features we've needed have been implemented, one of our goals for the next release is to improve our documentation and tutorials. Expect to see TorchGeo tutorials at all the popular ML/RS conferences next year! We're excited to meet our users in person and learn more about their unique use cases and needs.
Backwards-incompatible changes
- GeoDataset: first parameter renamed from root to paths (#1442, #1597)
- Trainers: many parameters renamed (#1541)
- FAIR1M datamodule: *_split_pct parameters removed (#1275)
- Inria datamodule: *_split_pct parameters removed (#1540)
- SemanticSegmentationTask: changes to weights parameter (#1046)
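For code that stores GeoDataset keyword arguments in dictionaries or config files, the root to paths rename can be handled with a small migration step. This is a minimal sketch under stated assumptions: migrate_geodataset_kwargs is a hypothetical helper, not part of TorchGeo, and it should only be applied to arguments intended for GeoDataset subclasses, since the rename listed above concerns GeoDataset.

```python
from typing import Any, Dict


def migrate_geodataset_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs with the pre-0.5.0 'root' key renamed to 'paths'.

    Hypothetical helper for illustration; not part of TorchGeo.
    """
    migrated = dict(kwargs)  # leave the caller's dictionary untouched
    if "root" in migrated:
        if "paths" in migrated:
            raise ValueError("both 'root' and 'paths' were given")
        migrated["paths"] = migrated.pop("root")
    return migrated


# Hypothetical arguments written for a release before 0.5.0
old_kwargs = {"root": "data/landsat", "bands": ["B4", "B3", "B2"]}
new_kwargs = migrate_geodataset_kwargs(old_kwargs)
print(new_kwargs)
```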
Dependencies
- Drop Python 3.7 and 3.8 support following NEP 29 (#1058, #1246)
- Dependencies now listed in pyproject.toml (#1446)
- Drop upper bounds on dependencies (#1480)
- Lightly: new required dependency (#1252, #1285)
- Lightning: extra dependencies now required (#1559)
- Omegaconf: no longer a dependency (#1559)
- Pandas: now supports v2.1 (#1537)
- Pandas: new required dependency (#1586)
- Scikit-Learn: no longer a dependency (#1063)
- TorchMetrics: now supports v1 (#1465)
Datamodules
New datamodules:
- EuroSAT 100 (#1130)
- FireRisk (#1265)
- L7 Irish (#1197)
- L8 Biome (#1200)
- SeCo (#1168)
- SKIPP'D (#1267)
- SSL4EO-L (#1332)
- SSL4EO-L Benchmark (#1338)
- SSL4EO-S12 (#1151)
- SustainBench (#1253)
Changes to existing datamodules:
- FAIR1M: add val/test splits, drop split parameters (#1275)
- Inria: add val split, drop split parameters (#654, #1540)
- RESISC45: better normalization (#1349)
- So2Sat: support RGB-only mode (#1283)
- So2Sat: control size of validation dataset (#1283)
New base classes:
- BaseDataModule (#1260)
Changes to existing base classes:
- GeoDataModule: automatically infer epoch length (#1257)
- BaseDataModule: better error messages (#1307, #1441)
Datasets
New datasets:
- BioMassters (#1560)
- EuroSAT 100 (#1130)
- FireRisk (#1265)
- L7 Irish (#1197)
- L8 Biome (#1200)
- LandCover.ai Geo (#1126)
- MapInWild (#1096, #1131)
- NLCD (#1244)
- PASTIS (#315)
- Rwanda Field Boundary (#1574)
- SeasoNet (#1466)
- SKIPP'D (#1267, #1548)
- SSL4EO-L (#1332, #1424)
- SSL4EO-L Benchmark (#1338, #1431)
- SSL4EO-S12 (#1151)
- SustainBench (#1253)
- Western USA Live Fuel Moisture (#1262)
Changes to existing datasets:
- CDL: add years parameter (#1337)
- CDL: add classes parameter (#1392)
- CDL: map class labels to ordinal numbers (#1364, #1368)
- CDL: return figure (#1369)
- CMS Mangrove Canopy: return figure (#1369)
- DFC2022: avoid interpolation in colormap (#1372)
- FAIR1M: add val/test splits (#1275)
- FAIR1M: add download support (#1275)
- Inria: add validation split (#654, #1540)
- SeCo: add seasons parameter (#1168)
- SeCo: faster initialization (#1168)
- SeCo: support new directory structure (#1235)
- So2Sat: add version 3 (#1086, #1283)
- UCMerced: fix image shape bug (#1238)
- USAVars: return lat/lon of centroid (#1240)
- USAVars: convert image to float32 (#1433)
- USAVars: download from Hugging Face (#1453)
Changes to existing base classes:
- GeoDataset: accept list of files or directories (#1427, #1442, #1597)
- GeoDataset: add files property (#1442, #1597)
- Intersection/UnionDataset: fix crs/res propagation (#1341, #1344)
- RasterDataset: add dtype attribute (#1149)
- RasterDataset: allow sampling outside bounds of image (#1329, #1344)
New utility functions:
Models
Changes to existing models:
- RCF: add empirical sampling mode (#1339)
New pre-trained model weights:
Changes to existing pre-trained model weights:
Samplers
Changes to existing samplers:
Trainers
New trainers:
Changes to existing trainers:
- Add ability to freeze backbones and decoders (#1290)
- Fix support for datasets without a plot method (#1551, #1585)
- BYOL: add random season contrast (#1168)
- Classification: add class weights for cross entropy loss (#1592)
- Semantic Segmentation: add class weights for cross entropy loss (#1221)
- Semantic Segmentation: add support for pre-trained model weights (#1046)
- Semantic Segmentation: fix ignore index weighting (#1245, #1331)
New base classes:
Transforms
New transforms:
- Random Grayscale (#1301)
Scripts
New scripts:
Documentation
- CDL: fix documented data source (#1248)
- UCMerced: fix documented dataset size (#1291)
- Remove buggy benchmarking tutorial (#1521)
Testing
- Add Python 3.11 tests (#1180)
- Ensure that none of our minimum version tests are skipped (#1276, #1587)
- Improve CI concurrency robustness (#1412, #1423)
- Test fewer models in trainers to avoid exceeding RAM (#1377)
- Windows CI: replace pacman with choco (#1266)
Contributors
This release is thanks to the following contributors:
@AABNassim
@adamjstewart
@adrianboguszewski
@adriantre
@ashnair1
@briktor
@burakekim
@calebrob6
@dkosm
@estherrolf
@isaaccorley
@nilsleh
@nsutezo
@ntw-au
@pmandiola
@shradhasehgal
@Tarandeep97
@urbanophile
@wangyi111
@yichiac