Publication
AGU Fall 2023
Conference paper
Area Sampling for Training Geospatial Foundation Models
Abstract
To train unsupervised geospatial models accurately, the datasets must be diverse and sound. This study presents a novel method that increases the diversity of statistics within geospatial information, providing a more accurate representation of the underlying geographical characteristics. Our approach extracts multiple statistics, including land use, temperature, and precipitation, from specific areas at resolutions finer than the defined tiles. We then cluster areas with similar geographical statistics, which gives a more comprehensive view of the data distribution. To sample representatively from each cluster, we count the data points within each area and establish weighted sampling. To enhance diversity, our method down-weights higher-frequency data points, favoring less frequent data for sampling. This strategy yields a balanced representation across the entire dataset, improving the overall accuracy of the geospatial foundation model. The results of our study demonstrate the potential of optimized geospatial data sampling for a wide array of applications and modeling tasks, ultimately leading to improved model accuracy and broader practicality.
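The down-weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so this example assumes each area's weight is inversely proportional to the size of its cluster; the function names `inverse_frequency_weights` and `sample_areas` and the cluster labels are hypothetical, and the clustering itself is taken as already done.

```python
import random
from collections import Counter


def inverse_frequency_weights(cluster_labels):
    """Return one normalized sampling weight per area, proportional to 1 / cluster size.

    Assumption: inverse cluster size stands in for the paper's unspecified weighting.
    """
    counts = Counter(cluster_labels)
    raw = [1.0 / counts[label] for label in cluster_labels]
    total = sum(raw)
    return [w / total for w in raw]


def sample_areas(cluster_labels, k, seed=0):
    """Draw k area indices (with replacement) using the inverse-frequency weights."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(cluster_labels)
    return rng.choices(range(len(cluster_labels)), weights=weights, k=k)


# Hypothetical cluster assignments for six areas: cluster 0 is common, clusters 1 and 2 are rare.
labels = [0, 0, 0, 0, 1, 2]
weights = inverse_frequency_weights(labels)
picked = sample_areas(labels, k=3000)
```

Under this assumption, every cluster receives the same total sampling mass, so an area in a rare cluster is drawn more often than an area in a common one. Other schemes, such as a softened inverse frequency, would fit the abstract's description equally well.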