FLAIR#2 Dataset and Datamodule Integration #2394

MathiasBaumgartinger · 2024-11-05T21:53:20Z

FLAIR#2 dataset

The FLAIR #2 <https://github.com/IGNF/FLAIR-2> dataset is an extensive dataset from the French National Institute of Geographical and Forest Information (IGN) that provides a unique and rich resource for large-scale geospatial analysis.
The dataset is sampled countrywide and is composed of over 20 billion annotated pixels of very high resolution aerial imagery at 0.2 m spatial resolution, acquired over three years and different months (spatio-temporal domains).

FLAIR2 is a dataset for semantic segmentation of aerial images. It contains aerial images, Sentinel-2 images, and masks for 13 classes.
The dataset is split into a training set and a test set.

Dataset features:

* over 20 billion annotated pixels
* aerial imagery
    * 5x512x512
    * 0.2m spatial resolution
    * 5 channels (RGB-NIR-Elevation)
* Sentinel-2 imagery
    * 10-20m spatial resolution
    * 10 spectral bands
    * snow/cloud masks (with 0-100 probability)
    * multiple time steps (T)
    * Tx10xWxH, T, W, H are variable
* label (masks)
    * 512x512
    * 13 classes

Dataset classes:

0: "building",
1: "pervious surface",
2: "impervious surface",
3: "bare soil",
4: "water",
5: "coniferous",
6: "deciduous",
7: "brushwood",
8: "vineyard",
9: "herbaceous vegetation",
10: "agricultural land",
11: "plowed land",
12: "other"  

If you use this dataset in your research, please cite the following paper:

* https://doi.org/10.48550/arXiv.2310.13336

Implementation Details

`NonGeoDataset`, `__init__()`

Following the discussion in #2303, we decided that FLAIR2 will be a NonGeoDataset, at least until the faulty mask data are fixed. Unlike most NonGeoDatasets, FLAIR2 exposes a use_toy and a use_sentinel argument. With the use_toy flag set, the dataset loads the toy data, a small subset of the full data, instead. The use_sentinel argument controls whether a sample includes the augmented Sentinel-2 data provided by the maintainers of FLAIR2.

`_verify`, `_download`, `_extract`

Each combination of split and sample type (i.e. [train, test] x [aerial, sentinel, labels]) is contained in an individual zip download, so download and extraction have to happen multiple times. The toy dataset, on the other hand, is contained in a single zip. Furthermore, mapping the super-patches of the sentinel data to the actual input image requires flair-2_centroids_sp_to_patch.json, which also has to be downloaded as an individual zip.
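
To make the download layout concrete, here is a minimal sketch that enumerates the archives one configuration would need. The `flair_2_{kind}_{split}.zip` pattern is an assumption extrapolated from `flair_2_labels_test.zip` in the test data, `archive_names` is a hypothetical helper, and the toy archive name is a placeholder rather than the real file name.

```python
from itertools import product

SPLITS = ("train", "test")
SAMPLE_TYPES = ("aerial", "sentinel", "labels")
# Zip that ships flair-2_centroids_sp_to_patch.json (name as used in this PR)
CENTROIDS_ARCHIVE = "flair_2_centroids_sp_to_patch.zip"
# Placeholder only: the real toy archive name is not stated in this thread
TOY_ARCHIVE = "toy_dataset.zip"


def archive_names(use_toy: bool = False, use_sentinel: bool = True) -> list[str]:
    """List the zip archives that one dataset configuration needs."""
    if use_toy:
        # The toy data ship as a single archive
        return [TOY_ARCHIVE]
    kinds = [k for k in SAMPLE_TYPES if use_sentinel or k != "sentinel"]
    # Assumed naming pattern: one zip per (sample type, split) pair
    names = [f"flair_2_{kind}_{split}.zip" for split, kind in product(SPLITS, kinds)]
    if use_sentinel:
        # Needed to map sentinel super-patches onto aerial patches
        names.append(CENTROIDS_ARCHIVE)
    return names


names = archive_names()
```

With both flags at their defaults this yields six split/type archives plus the centroids archive, which is why `_download` and `_extract` have to loop.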

`_load_image`, `_load_sentinel`, `_load_target`

For storage reasons, the elevation (5th band) of the image is stored as an unsigned integer, with the original height multiplied by 5. We decided to divide the stored value by 5 to recover the original height, which makes the trained model more usable for other data. See Questions please.
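
As a worked example of the arithmetic, the sketch below encodes and decodes a few heights. Only the factor of 5 and the 8-bit unsigned storage come from the data paper; the rounding and clipping in `encode_height` are assumptions about how the values were produced, and both helper names are hypothetical.

```python
SCALE = 5        # factor named in the data paper
UINT8_MAX = 255  # largest value an 8-bit unsigned integer can hold


def encode_height(height_m: float) -> int:
    """Assumed encoding: scale, round, and clip into the uint8 range."""
    return max(0, min(UINT8_MAX, round(height_m * SCALE)))


def decode_height(stored: int) -> float:
    """Undo the scaling, as done for the 5th band when loading an image."""
    return stored / SCALE


for height in (0.0, 10.0, 25.4, 80.0):
    stored = encode_height(height)
    print(f"{height:5.1f} m -> stored {stored:3d} -> decoded {decode_height(stored):4.1f} m")
```

Under these assumptions the representable range is 0 to 51 m in steps of 0.2 m, so dividing by 5 recovers metres but cannot restore anything that was clipped or rounded away.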

As mentioned previously, additional metadata is needed to get from the sentinel.npy to the actual area. Initially, for debugging reasons, we implemented this to return not the cropped image but the original data and the cropping slices (i.e. indices). Consequently, the images can be plotted in a more meaningful manner. Otherwise, the resolution is so low that one can hardly recognize features. This was crucial for debugging to find the correct logic (the classic y, x instead of x, y ordering mistake). We do not know if this is smart for "production code". See Questions please.
Moreover, the dimensions of the sentinel data, $T \times C \times W \times H$ with $C = 10$, vary in $T$ as well as in $W$ and $H$. This is problematic for the datamodule. We have not done extensive research, but the varying dimensions seem to break the module. Disabling the use_sentinel flag makes the module work.
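
The idea of returning the full super-patch together with its cropping slices can be sketched as follows. This is an illustration only: the (row, column) centroid format, the crop size, and the helper names `crop_slices` and `crop` are assumptions, not the layout of the actual centroids JSON or the code in this PR.

```python
def crop_slices(
    centroid_yx: tuple[int, int], size: int, shape_hw: tuple[int, int]
) -> tuple[slice, slice]:
    """Return (row_slice, col_slice) of a size x size window around a centroid.

    The centroid is given as (y, x), i.e. (row, column). Swapping the two is
    exactly the ordering mistake mentioned above.
    """
    cy, cx = centroid_yx
    height, width = shape_hw
    half = size // 2
    # Clamp the window start so the crop stays inside the super-patch
    y0 = min(max(cy - half, 0), max(height - size, 0))
    x0 = min(max(cx - half, 0), max(width - size, 0))
    return slice(y0, y0 + size), slice(x0, x0 + size)


def crop(grid: list[list[int]], rows: slice, cols: slice) -> list[list[int]]:
    """Apply the slices to a 2D grid stored as nested lists (rows first)."""
    return [row[cols] for row in grid[rows]]


# Toy 6 x 8 "super-patch" whose values encode their own position (10 * y + x)
grid = [[10 * y + x for x in range(8)] for y in range(6)]
rows, cols = crop_slices(centroid_yx=(2, 5), size=2, shape_hw=(6, 8))
patch = crop(grid, rows, cols)
```

Keeping the slices separate from the data lets a plot draw the full super-patch with a rectangle at the cropped region. With an array that has leading time and channel dimensions, the same slices would be applied to the last two axes.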

The labels include values from 1 to 19. The data paper clearly mentions grouping classes $> 13$ into one class "other" due to underrepresentation. We followed this suggestion. Furthermore, the labels were rescaled to the range 0 to 12. See Questions please.
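
A minimal sketch of this remapping, assuming that every raw label of 13 or above ends up in the final class 12 ("other") and that the remaining labels are simply shifted down by one. The helper name `remap_label` is hypothetical.

```python
NUM_CLASSES = 13  # final classes 0..12, with 12 = "other"


def remap_label(raw: int) -> int:
    """Map a raw label (1..19) to a training class (0..12)."""
    if not 1 <= raw <= 19:
        raise ValueError(f"unexpected raw label {raw}")
    # Collapse every label of 13 or above into 13, then shift to start at 0
    return min(raw, NUM_CLASSES) - 1


remapped = [remap_label(value) for value in range(1, 20)]
```

On a whole mask tensor the same operation would be along the lines of `torch.clamp(mask, max=13) - 1`.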

Questions

Do you consider the elevation rescaling a distortion of the dataset? Shall I exclude it? The argument for keeping it is easier reusability on new datasets.

For storage optimization reasons, this elevation information is multiplied by a factor of 5 and encoded as a 8bit unsigned integer datatype.

How shall we load/provide sentinel data? As cropped data or any other way. I do not see the current implementation as fit for production.
- Also, how do we want to plot it? The small red rectangle in the example plot above is the actual region. The low resolution is quite observable there.
Shall we rescale the classes to start from 0? Shall we group the classes as suggested in the data paper?
The integrity check in download_url does not seem to work (in unit tests), why?
- I have to make my own check_integrity call; otherwise it passes, even if the md5s do not match.
The github actions on the forked repo produce a magic ruff error (https://github.com/MathiasBaumgartinger/torchgeo/actions/runs/11687694109/job/32556175383#step:7:1265). Can you help me resolve this mystery?

TODOs/FIXMEs

Extend tests for toy datasets and apply md5 check
Find correct band for plotting sentinel
Datamodule cannot handle sentinel data yet

JacobJeppesen · 2024-11-28T13:00:24Z

Finally got it debugged, and it was a system error. Download, extract, and training on the aerial data seems to all work as it should. Sorry about the noise!

Completely unrelated to this PR, but it was a corruption in one of the zpools in ZFS. So if you start using that at some point and you get weird errors that look like disk errors, but checking the disk says everything is fine, then try checking the status of your zpools. Spent too much time identifying this 😑

adamjstewart · 2024-11-28T13:12:29Z

Can you resolve the merge conflicts so we can run CI on this PR?

MathiasBaumgartinger · 2024-11-28T17:10:27Z

I think everything should be on track so far. I am getting a ruff error for unsorted tuples in the [datasets/datamodules]/__init__.py files. I explicitly tried sorting them with the vscode sort ascending command (8779f89), but apparently this did not work 🤷‍♂️. Let me know if there is anything else I can do for you =)

adamjstewart · 2024-11-28T19:41:24Z

If you use ruff 0.8.0 it will sort __all__ correctly. You can also copy-n-paste the file from the latest version and add the new line.

nilsleh · 2024-11-29T07:03:23Z

torchgeo/datasets/flair2.py

+        """Get statistics (min, max, means, stdvs) for each used band in order.
+
+        Args:
+            split (str): Split for which to get statistics (currently only for train)


For the docstring convention in torchgeo, we do not repeat the type in the docstring; it is only given in the function arguments.

I think this will also resolve the failing docs test
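
For illustration, a docstring that follows this convention could look like the sketch below. The function `get_statistics`, its signature, and its return values are hypothetical placeholders and are not the method from this PR.

```python
def get_statistics(split: str = "train") -> dict[str, list[float]]:
    """Get statistics for each used band in order.

    Args:
        split: split for which to get statistics (currently only "train")

    Returns:
        a dictionary with one list of values per statistic
    """
    if split != "train":
        raise ValueError(f"no statistics available for split {split!r}")
    # Placeholder values for illustration only, not real dataset statistics
    return {"mean": [0.0], "std": [1.0]}
```

The type lives only in the signature (`split: str`), so the `Args:` entry starts directly with the description.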

nilsleh · 2024-11-29T07:04:02Z

torchgeo/datasets/flair2.py

+        return tensor
+
+    def _load_sentinel(self, path: Path) -> Tensor:
+        # FIXME: should this really be returned as a tuple?


This can be removed, right?

nilsleh · 2024-11-29T07:04:57Z

torchgeo/datasets/flair2.py

+            self.root,
+            md5=self.md5s.get(url, None) if self.checksum else None,
+        )
+        # FIXME: Why is download_url not checking integrity (tests run through)?


Is this fixed?

This was still an issue last time I checked. As mentioned in the PR description, when I run the pytests with wrong md5 hashes, the integrity is somehow not checked unless I explicitly call it here again (it is implicitly called in download_url).
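
For reference, an explicit check of the kind described here can be sketched with the standard library alone. This is a standalone stand-in, not torchgeo's `download_url` or `check_integrity`; the helper names `md5_of` and `verify_md5` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: Path, expected: Optional[str]) -> bool:
    """Return True when no checksum is requested or the digests match."""
    if expected is None:
        return True
    return md5_of(path) == expected.lower()


# Demo on a throwaway file: a matching and a deliberately wrong checksum
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "dummy.zip"
    archive.write_bytes(b"hello")
    good = verify_md5(archive, md5_of(archive))
    bad = verify_md5(archive, "0" * 32)
```

A test that passes a wrong hash should see the equivalent of `bad` being False. If it still passes, the check is probably not being reached at all.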

nilsleh · 2024-11-29T07:05:47Z

torchgeo/datasets/flair2.py

+            self.root,
+            md5=self.md5s.get(url, None) if self.checksum else None,
+        )
+        # FIXME: Why is download_url not checking integrity (tests run through)?


how about this FIXME?

I left this here as somewhat of a reminder, because it has the same behavior as the parent class. I.e. currently all tests pass even with wrong md5s.

nilsleh · 2024-11-29T07:06:56Z

torchgeo/datamodules/flair2.py

+        """
+        super().__init__(FLAIR2, batch_size, num_workers, **kwargs)
+
+        self.patch_size = _to_tuple(patch_size)


I think that's a good idea. Could either be included here, or in a separate PR.

nilsleh · 2024-11-29T07:09:23Z

tests/data/flair2/FLAIR2/md5s.txt

@@ -0,0 +1,8 @@
+/home/mathias/Dev/forks/torchgeo/tests/data/flair2/FLAIR2/flair_2_labels_test.zip: b13c4a3cb7ebb5cadddc36474bb386f9


maybe remove the personal directory from this text file.

nilsleh · 2024-11-29T07:11:24Z

torchgeo/datasets/flair2.py

+
+        rgb_indices = [self.all_bands.index(band) for band in self.rgb_bands]
+        # Check if RGB bands are present in self.bands
+        if not all([band in self.bands for band in self.rgb_bands]):


Code coverage indicates that the RGB-bands-missing branch is not being hit, so I think you just need to add a separate plot test similar to

torchgeo/tests/datasets/test_eurosat.py

Line 110 in 2f3e8fd

def test_plot_rgb(self, dataset: EuroSAT, tmp_path: Path) -> None:

for example.

JacobJeppesen · 2024-11-30T14:55:58Z

torchgeo/datamodules/flair2.py

+            K.Normalize(mean=self.mean, std=self.std), data_keys=['image', 'mask']
+        )
+
+        self.augs = augs if augs is not None else self.aug


Is self.augs intended to act on the data somewhere? My immediate thought was that it would be related to the augmentations part of the base datamodule:

torchgeo/torchgeo/datamodules/geo.py

Lines 70 to 79 in fedf993

# Data augmentation
Transform = Callable[[dict[str, Tensor]], dict[str, Tensor]]
self.aug: Transform = K.AugmentationSequential(
    K.Normalize(mean=self.mean, std=self.std), data_keys=None, keepdim=True
)

self.train_aug: Transform | None = None
self.val_aug: Transform | None = None
self.test_aug: Transform | None = None
self.predict_aug: Transform | None = None

But it's not in there, and setting it to an arbitrary value doesn't seem to do anything. Or perhaps I'm missing something?

self.aug is applied here if the split specific augmentations are not specified

That was also my understanding, and perhaps it's a typo where self.augs was intended to be self.aug, such that the augmentations would be applied automatically through the base datamodule. However, with the current implementation, if the user provides augmentations through the augs parameter in the FLAIR2 datamodule, they won't have an effect, as they are being added to self.augs, which doesn't seem to be applied to the data (as far as I can tell). I.e., maybe the intention was self.aug = augs if augs is not None else self.aug(?)
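
The effect of the suspected typo can be reproduced with a toy stand-in. The classes below are hypothetical and much simpler than the real base datamodule (which also considers the split-specific augmentations); they only show that a transform stored under a name the base class never reads is silently ignored.

```python
from typing import Callable, Optional

Transform = Callable[[dict], dict]


def normalize(batch: dict) -> dict:
    """Stand-in for the default normalization transform."""
    return {**batch, "normalized": True}


def flip(batch: dict) -> dict:
    """Stand-in for a user-supplied augmentation."""
    return {**batch, "flipped": True}


class BaseModule:
    """Toy base class: only self.aug is ever applied to a batch."""

    def __init__(self) -> None:
        self.aug: Transform = normalize

    def on_after_batch_transfer(self, batch: dict) -> dict:
        return self.aug(batch)


class BuggyModule(BaseModule):
    def __init__(self, augs: Optional[Transform] = None) -> None:
        super().__init__()
        # Typo: stored under a name the base class never reads
        self.augs = augs if augs is not None else self.aug


class FixedModule(BaseModule):
    def __init__(self, augs: Optional[Transform] = None) -> None:
        super().__init__()
        # Overrides the attribute that the base class actually applies
        self.aug = augs if augs is not None else self.aug


buggy_out = BuggyModule(augs=flip).on_after_batch_transfer({"image": 0})
fixed_out = FixedModule(augs=flip).on_after_batch_transfer({"image": 0})
```

In the buggy variant the batch is still only normalized, while the fixed variant applies the user's transform.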

JacobJeppesen · 2024-12-01T09:05:25Z

torchgeo/datasets/flair2.py

+    def _load_image(self, path: Path) -> Tensor:
+        """Load a single image.
+
+        Args:
+            path: path to the image
+
+        Returns:
+            Tensor: the loaded image
+        """
+        with rasterio.open(path) as f:
+            array: np.typing.NDArray[np.int_] = f.read()
+            tensor = torch.from_numpy(array).float()
+            if 'B05' in self.bands:
+                # Height channel will always be the last dimension
+                tensor[-1] = torch.div(tensor[-1], 5)
+
+        return tensor


Should the bands perhaps be extracted here based on self.bands? E.g., something like

def _load_image(self, path: Path) -> Tensor:
    """Load a single image.

    Args:
        path: path to the image

    Returns:
        Tensor: the loaded image
    """
    with rasterio.open(path) as f:
        array: np.typing.NDArray[np.int_] = f.read()
        tensor = torch.from_numpy(array).float()
        if 'B05' in self.bands:
            # Height channel will always be the last dimension
            tensor[-1] = torch.div(tensor[-1], 5)

        # Extract the bands to be used. E.g., self.bands=("B01", "B02", "B03") will extract the RGB bands.
        tensor = tensor[[int(band[-2:]) - 1 for band in self.bands]]

    return tensor

Then when a user has defined n bands in self.bands, the returned sample will only contain those bands, instead of all five. Perhaps self.bands should also be renamed to self.aerial_bands, and a self.sentinel_bands should be added(?)
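
The index arithmetic in this suggestion can be checked in isolation. The sketch below assumes the five aerial bands are named `B01` to `B05` in channel order (only `B01` to `B03` as RGB and `B05` as height are stated in this thread); `band_indices` and `select_bands` are hypothetical helpers.

```python
# Assumed band names in channel order (B04 as NIR is an assumption)
ALL_BANDS = ("B01", "B02", "B03", "B04", "B05")


def band_indices(bands: tuple[str, ...]) -> list[int]:
    """Convert band names such as "B03" into zero-based channel indices."""
    unknown = [band for band in bands if band not in ALL_BANDS]
    if unknown:
        raise ValueError(f"unknown bands: {unknown}")
    # "B03" -> "03" -> 3 -> index 2
    return [int(band[-2:]) - 1 for band in bands]


def select_bands(image: list, bands: tuple[str, ...]) -> list:
    """Pick channels from a channel-first image stored as a list of channels."""
    return [image[i] for i in band_indices(bands)]


rgb = band_indices(("B01", "B02", "B03"))
subset = select_bands(["r", "g", "b", "nir", "height"], ("B04", "B05"))
```

Note that the height division in the suggested `_load_image` indexes the last channel of the full five-band tensor, so it has to run before the band selection, as it does there.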

MathiasBaumgartinger · 2024-12-09T09:46:02Z

Thanks everyone for reviewing. Tried to apply all suggested changes.

As for the checks, according to the log we face the following error: /home/docs/checkouts/readthedocs.org/user_builds/torchgeo/checkouts/2394/docs/api/datasets.rst:194: ERROR: "csv-table" widths do not match the number of columns in table (10).
I did indeed mess something up here (i.e. something with line endings, separators and quotation marks). However, I have resolved this problem, and the error in the check still seems to persist.

adamjstewart

Does anyone have general opinions on whether we should call this:

FLAIR2()

or:

FLAIR(version=2)

adamjstewart · 2024-12-12T00:20:47Z

docs/api/datasets/non_geo_datasets.csv

@@ -18,6 +18,7 @@ Dataset,Task,Source,License,# Samples,# Classes,Size (px),Resolution (m),Bands
 `FAIR1M`_,OD,Gaofen/Google Earth,"CC-BY-NC-SA-3.0","15,000",37,"1,024x1,024",0.3--0.8,RGB
 `Fields Of The World`_,"S,I",Sentinel-2,"Various","70795","2,3",256x256,10,MSI
 `FireRisk`_,C,NAIP Aerial,"CC-BY-NC-4.0","91,872",7,"320x320",1,RGB
+`FLAIR2`_,S,"IGN, Sentinel-2",OPENLICENSE-2.0,7741,13--18,512x512,0.2--20,"RGB+NIR+NDSM, MSI",


Suggested change

-`FLAIR2`_,S,"IGN, Sentinel-2",OPENLICENSE-2.0,7741,13--18,512x512,0.2--20,"RGB+NIR+NDSM, MSI",
+`FLAIR2`_,S,"IGN, Sentinel-2",OPENLICENSE-2.0,7741,13--18,512x512,0.2--20,"RGB+NIR+NDSM, MSI"

This should fix the doc tests
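
The column mismatch can be reproduced with a quick count. This sketch uses Python's `csv` module as a stand-in for the docs' csv-table parser, on the assumption that both treat quoted commas and a trailing comma the same way. The header and row are copied from the diff above.

```python
import csv

header = "Dataset,Task,Source,License,# Samples,# Classes,Size (px),Resolution (m),Bands"
with_comma = (
    '`FLAIR2`_,S,"IGN, Sentinel-2",OPENLICENSE-2.0,7741,13--18,512x512,0.2--20,'
    '"RGB+NIR+NDSM, MSI",'
)
without_comma = with_comma.rstrip(",")


def column_count(line: str) -> int:
    """Count CSV fields, honouring quoted fields that contain commas."""
    return len(next(csv.reader([line])))


counts = {
    "header": column_count(header),
    "with trailing comma": column_count(with_comma),
    "without trailing comma": column_count(without_comma),
}
print(counts)
```

The trailing comma adds an empty tenth field to a nine-column table, which matches the "(10)" in the reported csv-table error.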

The correct SPDX license identifier is actually etalab-2.0, not OPENLICENSE-2.0. There are apparently a lot of "open license" licenses: https://spdx.org/licenses/

adamjstewart · 2024-12-12T00:27:14Z

tests/data/flair2/data.py

We seem to be creating a lot of fake test data. Is it possible to reduce the amount and still test things appropriately?

adamjstewart · 2024-12-12T00:28:07Z

torchgeo/datamodules/flair2.py

+# FLAIR dataset is released under open license 2.0
+# ..... https://www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf
+# ..... https://ignf.github.io/FLAIR/#FLAIR2


No need for this, it's already included in the docs. The important thing is that this file is MIT licensed like the rest of TorchGeo.

adamjstewart · 2024-12-12T00:31:31Z

tests/datamodules/test_flair2.py

We can delete this file and instead test the dataset using a tests/conf/flair2.yaml file and 1 line of code in tests/trainers/test_segmentation.py. We are actively trying to get rid of tests/datamodules since it doesn't test compatibility.

adamjstewart · 2024-12-12T00:33:05Z

tests/datasets/test_flair2.py

+    def dataset(
+        self, monkeypatch: MonkeyPatch, tmp_path: Path, request: SubRequest
+    ) -> FLAIR2:
+        md5s = {


Could also just skip checksum=True and not bother monkeypatching any MD5s

adamjstewart · 2024-12-12T00:42:12Z

torchgeo/datasets/flair2.py

+        Raises:
+            DatasetNotFoundError
+
+        ..versionadded:: 0.7


Suggested change

-        ..versionadded:: 0.7
+        .. versionadded:: 0.7

adamjstewart · 2024-12-12T00:42:25Z

torchgeo/datasets/flair2.py

+            sentinel_bands: which bands to load from sentinel data (B01, B02, ..., B10)
+
+        Raises:
+            DatasetNotFoundError


Needs a description for when this error is raised

adamjstewart · 2024-12-12T00:44:38Z

torchgeo/datasets/flair2.py

+
+    def _extract(self, file_path: str) -> None:
+        """Extract the dataset."""
+        assert isinstance(self.root, str | os.PathLike)


No need to assert this, it's already checked by mypy

adamjstewart · 2024-12-12T00:45:50Z

torchgeo/datasets/flair2.py

+            a matplotlib Figure with the rendered sample
+        """
+
+        def normalize_plot(tensor: Tensor) -> Tensor:


I would instead use .utils.percentile_normalization

adamjstewart · 2024-12-12T00:46:31Z

torchgeo/datasets/flair2.py

+        if showing_predictions:
+            predictions = sample['prediction'].numpy().astype('uint8').squeeze()
+
+        # Remove none available plots


Suggested change

-        # Remove none available plots
+        # Remove non-available plots

JacobJeppesen · 2024-12-13T09:13:05Z

> Does anyone have general opinions on whether we should call this:
> FLAIR2()
> or:
> FLAIR(version=2)

I think it'd make sense to use FLAIR(version=2), as it seems like each new version is a superset of the previous version. Most users will probably use the latest version, so if they are individual datasets, FLAIR1() and FLAIR2() might end up as somewhat unused datasets once FLAIR3() is released. As I understand #2303 (comment), once version 3 is released, version 1 and 2 data can be directly loaded from version 3 by filtering the files. So the lowest complexity solution would probably be a FLAIR() dataset, where version 3 is the full dataset, version 2 is reduced coverage by filtering files/area, and version 1 is the same reduced coverage, but only aerial.

MathiasBaumgartinger · 2024-12-13T09:31:27Z

> Does anyone have general opinions on whether we should call this:
> FLAIR2()
> or:
> FLAIR(version=2)
>
> I think it'd make sense to use FLAIR(version=2), as it seems like each new version is a superset of the previous version. Most users will probably use the latest version, so if they are individual datasets, FLAIR1() and FLAIR2() might end up as somewhat unused datasets once FLAIR3() is released. As I understand #2303 (comment), once version 3 is released, version 1 and 2 data can be directly loaded from version 3 by filtering the files. So the lowest complexity solution would probably be a FLAIR() dataset, where version 3 is the full dataset, version 2 is reduced coverage by filtering files/area, and version 1 is the same reduced coverage, but only aerial.

I am in contact with @agarioud. If I understand him correctly, I doubt that new datasets will be required to be backward-compatible.

> Unfortunately there is no direct compatibility with FLAIR#1/#2 apart from the aerial images as we reworked the supervision (land cover and now LPIS)

But I agree. Adding versioning to FLAIR() will probably result in less dead code. From a design perspective:

1. We have a parent dataset FLAIR handling all the common logic. By passing a specific version (the default will always be the latest), a new child class is initialized which overrides version-specific logic.
2. We have a parent datamodule FLAIRModule, which passes a version down to the datasets, i.e. probably only a single FLAIRModule is necessary (no inheritance).
3. Points 1 and 2 will be applied to FLAIRToy and FLAIRToyModule too (probably both will inherit from the corresponding non-toy classes).
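
One possible shape for this design, as a sketch only: a shared parent class, version-specific children, and a factory that picks the child from a version number. The class bodies, the `modalities` method, and the `make_flair` factory are hypothetical, and plain classes stand in for the real NonGeoDataset hierarchy.

```python
class FLAIR:
    """Shared logic for all versions (stand-in for the real parent dataset)."""

    version: int = 0

    def __init__(self, root: str = "data") -> None:
        self.root = root

    def modalities(self) -> tuple[str, ...]:
        return ("aerial",)


class FLAIR1(FLAIR):
    version = 1


class FLAIR2(FLAIR):
    version = 2

    def modalities(self) -> tuple[str, ...]:
        # Version-specific override: FLAIR#2 adds Sentinel-2 data
        return ("aerial", "sentinel")


VERSIONS: dict[int, type[FLAIR]] = {1: FLAIR1, 2: FLAIR2}
LATEST = max(VERSIONS)


def make_flair(version: int = LATEST, **kwargs: str) -> FLAIR:
    """Instantiate the child class that matches the requested version."""
    if version not in VERSIONS:
        raise ValueError(f"unsupported FLAIR version {version}; choose from {sorted(VERSIONS)}")
    return VERSIONS[version](**kwargs)


dataset = make_flair()
```

Because the default follows the latest registered version, callers who need stable results would have to pass `version` explicitly.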

agarioud · 2024-12-13T10:15:04Z

The upcoming FLAIR-INC dataset will include all data from FLAIR#1 (aerial images) and FLAIR#2 (which added Sentinel-2 data that were previously NPY files covering larger extents but will now have the same spatial extent as the aerial patches and be in TIFF format). Additionally, it will introduce five new modalities and a second supervision dataset.
If needed, a toy dataset can already be shared with @MathiasBaumgartinger

Backward compatibility is not fully ensured. For example, aerial images will have one less channel, as DSM/DTM has been introduced as a separate modality. Additionally, the supervision dataset regarding land-cover has reordered classes.

Therefore, I also believe a FLAIR() dataset would probably be more efficient?

JacobJeppesen · 2024-12-13T12:37:20Z

> But I agree. Adding versioning to FLAIR() will probably result in less dead code. From a design perspective:
>
> 1. We have a parent dataset FLAIR handling all the common logic. By passing a specific version (the default will always be the latest), a new child class is initialized which overrides version-specific logic.
> 2. We have a parent datamodule FLAIRModule, which passes a version down to the datasets, i.e. probably only a single FLAIRModule is necessary (no inheritance).
> 3. Points 1 and 2 will be applied to FLAIRToy and FLAIRToyModule too (probably both will inherit from the corresponding non-toy classes).

This sounds like a good approach 👍

@agarioud sounds great. Looking forward to the release 🙂

adamjstewart · 2024-12-13T15:17:52Z

It's not really a matter of compatibility. Basically, we either use:

class FLAIR(NonGeoDataset, abc.ABC):
    # shared base class

class FLAIR2(FLAIR):
    # override specific stuff

or:

class FLAIR(NonGeoDataset):
    def __init__(self, version=2, ...):
        if version == 2:
            # override specific stuff

It's more a question of whether the devs think of this as a new version of an existing dataset or a new dataset. We usually go with the former because it offers a bit better reproducibility (if the default version gets changed, then reproducibility is broken). See our EuroSAT dataset for an example of this. I think the only cases where we use version instead are in MoCoTask and SimCLRTask. I'm fine with both solutions, just want to make sure we're all on the same page.

Mathias Baumgartinger and others added 30 commits September 13, 2024 12:14

docs: update recommended strategy for models with input and output (img and msk)

b96c78b

Updates in the custom raster dataset tutorial and the actual file documentation. The previous recommended approach (overriding `__get_item__`) is outdated. Refs: microsoft#2292 (reply in thread)

fix: grammar and formatting

0ce9b78

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

fix: grammar

d48acd7

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

Merge branch 'microsoft:main' into main

8bafb95

feat/WIP: draft for data.py generation file for FLAIR2 dataset

431220d

fix/WIP: fix formatting conventions and populate directories for each type individually

c9d989d

fix/WIP: use alternative hashing algorithm for reproducibility and correct rng ranges

8c16f69

feat/WIP: draft for datasets and docs

3f7fec7

Not fully functioning yet, contains copy paste from other datasets

refactor/WIP: mark TODOS, add some documentation

7c4eaf9

feat/WIP: first draft for test FLAIR2

21e5b08

feat: add FLAIR2 import to __init.py__

541630f

feat/WIP: adds sentinel 2 loading/plotting logic

4c1602f

Additionally, some small code refactors are done

Merge branch 'microsoft:main' into main

da45219

feat/WIP: update flair2 unit tests

4bdbe9a

fix&refactor/WIP: provide correct download address for toy dataset and refine plotting

8d6d687

fix/WIP: properly crop sentinel 2 data

563f7f4

Using the entire sentinel-2 image and a matplotlib patch to debug, otherwise it is really hard to find correct spot due to low resolution

fix/WIP: update cropping slices to match the actual size, fix `_verify()` for sentinel

6aeff90

With the nested dict, it was not possible to download dynamically

feat/WIP: add test data creation of sentinel files

592c336

fix: update tests for FLAIR2 dataset

6d3a189

feat: add dummy data for flair2

9930f6a

fix: properly expose FLAIR2 dataet in __init__.py

ddef891

docs: update documentation of flair2 dataset

63ddc2f

feat: proper integrity checks using correct md5s

ed5ed59

feat: expose an option for using/not using sentinel data

a5af101

feat: add flair2 datamodule

6040cd1

Merge branch 'microsoft:main' into main

2704b3f

feat: new dummy data

96a2207

refactor: syntax for mypy and ruff

6e3301e

fix: bug where sentinel data could be of dimension 0 in first T dimension

09d5fd6

feat: save md5s of newly created zips to txt

3393953

md5s might change due to timestamps, this eases the process of changing md5

Mathias Baumgartinger and others added 3 commits November 28, 2024 15:28

feat: update data generation for 1 to n mapping for sentinel to aerial and minor bug fixes

676db57

feat: 1:n mapping from sentinel to aerial in tests, also build in naming inconsistencies in original dataset

efc4cfd

Naming inconsistencies: `flair-2_centroids_sp_to_patch.json` vs `flair_2_centroids_sp_to_patch.zip`

Merge branch 'main' into main

25b1060

MathiasBaumgartinger changed the title ~~[DRAFT] FLAIR#2 Dataset and Datamodule Integration~~ FLAIR#2 Dataset and Datamodule Integration Nov 28, 2024

Mathias Baumgartinger added 2 commits November 28, 2024 18:00

fix: retry removing changes in the entire file

a33c143

Refs: 1c2ca19

refactor: sorting of __all__ tuple in ascending order (ruff)

8779f89

refactor: ruff 0.8 formatting

68a29e6

Mathias Baumgartinger added 5 commits December 7, 2024 20:59

refactor(test-data): remove personal directories from printed md5s

6ad1c8f

docs(headers): add proper microsoft headings

e8cd3e5

refs: microsoft#2394 (comment)

feat(band-indexing): add sentinel bands and actually apply the chosen bands when indexing, include RGBMissing testcase

f6c52eb

refs: microsoft#2394 (comment), microsoft#2394 (comment)

fix(datamodule): typo self.augs instead of self.aug

61e2f2f

refs: microsoft#2394 (comment)

refactor: ruff changes

f49b84f

fix: initialize elevation and nir_r_g as null

ccb30be

Error: reference before assignment

fix: incorrect stds in pre-computed statistics

d6d62c6

fix: add explicit aerial and sentinel band naming to toy dataset

bf0b7fc

adamjstewart mentioned this pull request Dec 17, 2024

Add BigEarthNetv2 dataset #2371

Open
