The Deep4Dist dataset is a comprehensive, high-resolution remote sensing data product specifically designed for forest disturbance mapping. It comprises approximately 17,500 georeferenced image patches extracted from high-resolution digital orthophotos acquired in Rhineland-Palatinate, Germany. Each image patch measures 500 × 500 pixels at a spatial resolution of 20 cm and contains five spectral channels: red, green, blue, near-infrared (NIR), and a normalized digital surface model (nDSM). Together, these channels capture both spectral and structural information essential for distinguishing various forest disturbance types, including bark beetle damage, clear-cuts, and windthrow events.
Key Features:
- High Resolution: 20 cm spatial resolution enables fine-grained mapping of forest disturbances.
- Multiple Disturbance Classes: Bark beetle damage, clear-cuts, and windthrow.
- Multispectral & Structural Data: Five channels (RGB, NIR, and nDSM) provide detailed spectral and structural insights.
- Large-Scale Coverage: ~17,500 georeferenced image patches support robust statistical analysis and deep learning applications.
- Rigorous Curation: Data were generated from high-resolution digital orthophotos and ground disturbance records, with extensive quality control that included expert-based external validation.
- Deep Learning Ready: The dataset is organized and annotated for direct use in semantic segmentation tasks. Train (~70%), validation (~25%) and test (~5%) splits are provided.
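As a rough guide to split sizes, the approximate percentages above can be converted into patch counts. This is only a back-of-the-envelope sketch: both the total (~17,500) and the percentages are approximate in the description, so the exact counts in the published folders may differ.

```python
# Approximate figures taken from the description; exact counts may differ.
TOTAL_PATCHES = 17_500
SPLIT_PERCENT = {"train": 70, "validation": 25, "test": 5}

# Integer arithmetic avoids floating-point rounding surprises.
approx_counts = {
    name: TOTAL_PATCHES * percent // 100
    for name, percent in SPLIT_PERCENT.items()
}

for name, count in approx_counts.items():
    print(f"{name}: ~{count} patches")
```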
Applications:
Deep4Dist is ideally suited for:
- Developing and validating deep learning models for forest disturbance mapping.
- Investigating the spatial dynamics of forest disturbances.
- Supporting adaptive forest management and conservation strategies.
- Integrating with medium-resolution satellite data for multi-modal forest disturbance mapping.
Class Description:
The classes in the segmentation masks are encoded as integers ranging from 0 to 3:
- 0: Background
- 1: Bark beetle damage
- 2: Clear-cuts
- 3: Windthrow
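The class encoding can be written down as a small lookup table. The sketch below uses it to count pixels per class in a mask. The helper name `class_pixel_counts` and the toy mask are illustrative assumptions; a real mask would first be read from a file in the "mask" subfolder with a raster library of your choice.

```python
from collections import Counter

# Class encoding as stated in the dataset description.
CLASS_NAMES = {
    0: "Background",
    1: "Bark beetle damage",
    2: "Clear-cuts",
    3: "Windthrow",
}

def class_pixel_counts(mask):
    """Count pixels per class name in a 2D mask given as nested lists of ints."""
    counts = Counter(value for row in mask for value in row)
    unknown = set(counts) - set(CLASS_NAMES)
    if unknown:
        raise ValueError(f"unexpected class ids: {sorted(unknown)}")
    return {name: counts.get(class_id, 0) for class_id, name in CLASS_NAMES.items()}

# Toy 3 x 4 mask standing in for a real 500 x 500 tile (hypothetical values).
toy_mask = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 0, 3, 3],
]
counts = class_pixel_counts(toy_mask)
```

Rejecting values outside 0 to 3 is a deliberate choice here: it surfaces a mismatched or corrupted mask early instead of silently ignoring it.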
Metadata Description:
The metadata.csv file contains the following fields:
- tile_name: the name of the corresponding image/mask
- split: the assigned data partition (train, validation, test)
- x_center: the x coordinate of the tile centroid (EPSG:25832)
- y_center: the y coordinate of the tile centroid (EPSG:25832)
- acquisition_date: the aerial image acquisition/flight date
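A minimal sketch of reading these fields with Python's standard csv module is shown below. It assumes that metadata.csv is comma-delimited and has a header row using exactly the field names listed here. The sample rows, including the date format, are hypothetical and serve only to make the example self-contained.

```python
import csv
import io
from collections import defaultdict

def tiles_by_split(csv_file):
    """Group tile names by their assigned split, using the documented columns."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["split"]].append(row["tile_name"])
    return dict(groups)

# Hypothetical rows for illustration; real values come from metadata.csv.
sample = io.StringIO(
    "tile_name,split,x_center,y_center,acquisition_date\n"
    "tile_a,train,400050.0,5500050.0,2021-06-01\n"
    "tile_b,validation,400150.0,5500050.0,2021-06-01\n"
    "tile_c,train,400250.0,5500050.0,2021-06-02\n"
)
groups = tiles_by_split(sample)
```

For the real file, the same function can be called on an open file handle, for example `with open("metadata.csv", newline="") as f: groups = tiles_by_split(f)`.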
Spatial Data Description:
The tile_geometries.gpkg file is a vector file containing the footprint geometry (polygon) of each image sample (EPSG:25832).
Coordinate Reference System:
Both images and masks are georeferenced in EPSG:25832 (DE_ETRS89_UTM32).
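Since each tile measures 500 × 500 pixels at 20 cm, its side length is 500 × 0.20 m = 100 m, and EPSG:25832 coordinates are in metres. The sketch below derives an approximate bounding box from a tile centroid. It assumes an axis-aligned square footprint centred on (x_center, y_center); the function name and the example coordinates are hypothetical, and tile_geometries.gpkg remains the authoritative source for the actual footprints.

```python
TILE_SIZE_PX = 500    # pixels per side, from the dataset description
PIXEL_SIZE_CM = 20    # ground sampling distance in centimetres

# Side length in metres: 500 px * 20 cm / 100 = 100 m.
TILE_SIDE_M = TILE_SIZE_PX * PIXEL_SIZE_CM / 100

def tile_bounds(x_center, y_center):
    """Return (xmin, ymin, xmax, ymax) in EPSG:25832 metres.

    Assumes an axis-aligned square tile centred on the given centroid.
    """
    half = TILE_SIDE_M / 2
    return (x_center - half, y_center - half, x_center + half, y_center + half)

# Hypothetical centroid for illustration only.
bounds = tile_bounds(400050.0, 5500050.0)
```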
Folder Structure:
Each split folder (train, validation, and test) contains two subfolders: "image", which holds the 5-channel aerial images, and "mask", which holds the multiclass segmentation masks.
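Given this layout, image and mask files can be paired per split folder. The sketch below assumes that an image and its mask share the same file name stem, which the tile_name field suggests but the description does not state explicitly. The helper name and the ".tif" extension used in the demonstration are hypothetical.

```python
import tempfile
from pathlib import Path

def pair_images_and_masks(split_dir):
    """Return (image_path, mask_path) pairs matched by file stem in one split folder."""
    split_dir = Path(split_dir)
    masks = {p.stem: p for p in (split_dir / "mask").iterdir() if p.is_file()}
    pairs = []
    for image in sorted((split_dir / "image").iterdir()):
        # Images without a matching mask are skipped rather than raising.
        if image.is_file() and image.stem in masks:
            pairs.append((image, masks[image.stem]))
    return pairs

# Demonstration on a throwaway directory that mimics the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "train"
    (root / "image").mkdir(parents=True)
    (root / "mask").mkdir()
    for name in ("tile_a.tif", "tile_b.tif"):
        (root / "image" / name).touch()
    (root / "mask" / "tile_a.tif").touch()  # tile_b deliberately has no mask
    paired_names = [(img.name, msk.name) for img, msk in pair_images_and_masks(root)]
```

On the real dataset, the function would be called once per split folder, for example `pair_images_and_masks("train")`.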
Additional Resources:
- Code: The GitHub repository (https://github.com/enmanuelrodpau/deep4dist) contains code for model training and dataset validation.
- Model weights: The Hugging Face repository (https://huggingface.co/enmanuelrp/Dee4Dist-ResU-net-34) holds the pretrained model weights.
Acknowledgement: (2025-03-20)