Workshop
Computational Sustainability: Promises and Pitfalls from Theory to Deployment
Suzanne Stathatos · Christopher Yeh · Laura Greenstreet · Tarun Sharma · Katelyn Morrison · Yuanqi Du · Chenlin Meng · Sherrie Wang · Fei Fang · Pietro Perona · Yoshua Bengio
Room 238 - 239
Computational sustainability (CompSust) is an interdisciplinary research area that uses computational methods to help address the 17 United Nations Sustainable Development Goals (UN SDGs), including but not limited to hunger and poverty reduction, infrastructure development, and environmental conservation. Computational sustainability is a two-way street: sustainability domains benefit from computational tools and methods, and computational research areas benefit from the unique challenges that arise in attempting to address sustainability problems, including noisy and biased data, complex multi-agent systems, and multi-objective problems. Previous computational sustainability problems have led to new approaches in computer vision, reinforcement learning, multi-agent systems, and decision-focused learning. While computational sustainability problems span many domains, they share common challenges. This workshop will bring the community together to focus on two topics:

1. The path from theory to deployment: Many challenges arise on the path from theory to deployment. This workshop will help researchers navigate this path by bringing together participants and speakers from academia, industry, and non-profits, highlighting successes in going from theory to deployment, and facilitating collaboration.

2. Promises and pitfalls: Advances on ML benchmarks do not always translate to improvements in computational sustainability problems, with contributing factors including low signal-to-noise ratios, ever-changing conditions, and biased or imbalanced data. However, due to the difficulty of publishing negative results, these findings rarely reach the community, leading to duplicated effort and obscuring important gaps in existing methods.

The goals of this workshop are to (i) identify pathways from theory to deployment, including best practices and measures to quantify success, (ii) facilitate discussion and collaboration between participants from academia, industry, and the non-profit sector, and (iii) identify common failure modes and high-impact research directions, including “moonshot” challenges.
Schedule
Fri 6:45 a.m. - 7:00 a.m. | Opening Remarks (Discussion)
Fri 7:00 a.m. - 7:30 a.m. | AI-for-climate: A call for impact-guided innovation (Invited Talk)
Machine learning is increasingly being used to help tackle climate change, from optimizing electrical grids to emulating climate models and monitoring biodiversity. As such applications grow, however, it is becoming clear that high-powered ML tools often fall short. Methods designed using standard benchmarks may fail to capture the constraints or metrics of real-world problems, while a “one size fits all” approach ignores useful auxiliary information in specific applications. In this talk, we show how problem-centered design can lead to ML algorithms that are both methodologically innovative and highly impactful in the fight against climate change.
David Rolnick
Fri 7:30 a.m. - 8:00 a.m. | Rewarding The Wild: Collaborative Machine Learning for the Natural World (Invited Talk)
Nature has been deteriorating at rates unparalleled in human history and the implications are global. Unfortunately, we cannot value what we cannot measure. And we are failing to capture nature’s full contributions to society. In this talk, we argue that machine learning (ML) and specifically paying for forest data can play a significant role in responding to this critical call for action – but only when we develop collaborative algorithms and incentives in co-design with local and Indigenous communities that respect local ‘data’ realities. We will present our work at Gainforest, a global science-based non-profit and currently a Finalist of the $10M XPRIZE Rainforest, and how Gainforest is deploying real-world data payments on the ground in partnership with governments and conservation partners in the Global South to empower affordable top-down and bottom-up monitoring.
David Dao
Fri 8:00 a.m. - 9:00 a.m. | Panel Discussion (Panel with Moderator)
Eric Orenstein · Sherrie Wang · Grant Horn · Caleb Robinson · Emily Aiken · Carla Gomes
Fri 9:00 a.m. - 10:00 a.m. | Collaborathon (Breakout Groups)
Fri 10:00 a.m. - 11:30 a.m. | Lunch (Break)
Fri 11:30 a.m. - 12:30 p.m. | Poster Session (Poster Session)
Fri 12:30 p.m. - 1:00 p.m. | Unlocking the Potential of Planetary-Scale Machine Learning for a Sustainable Future (Invited Talk)
Remote sensing satellites capture peta-scale, multi-modal data covering our dynamic planet across space, time, and spectrum. This rich data source holds immense potential for addressing local and planetary-scale challenges including food insecurity, poverty, climate change, and ecosystem preservation. Fully realizing this potential will require a new paradigm of machine learning approaches capable of tackling the unique character of remote sensing data. Machine learning approaches must be flexible enough to make use of the multi-modal, multi-fidelity satellite data, process meter-scale observations over planetary scales, and generalize to the challenging diversity of remote sensing tasks. In this talk, I will present examples of how we are developing machine learning approaches for planetary data processing, including self-supervised transformers for remote sensing data. I will also demonstrate how treating ML research and deployment as a unified approach instead of siloed steps leads to research advances that result in immediate societal impact, highlighting examples of how we are partnering directly with stakeholders to deploy our innovations in areas of critical need across the globe.
Hannah Kerner
Fri 1:00 p.m. - 1:30 p.m. | Break (Break)
Fri 1:30 p.m. - 2:30 p.m. | Spotlight Talks (Talk)
The best submissions from each of our four categories (Promise, Pitfall, Theory, Deployment) will give lightning talks about their work, followed by a 5-minute Q&A for each.
Fri 2:30 p.m. - 3:00 p.m. | Lessons learned in deploying CV for ecology (Invited Talk)
Good benchmark performance is the first step to impact, but is only a small piece of the complex system necessary to enable computer vision models to be deployed and trusted in sustainability and conservation applications – a system that requires human and computational infrastructure, iterative development, software support and maintenance, and continual quality control. I will speak about lessons learned in deployed computer vision systems for applications in ecology, discussing differences in what is needed for end users with unequal access to resources and expertise, different priorities and risks of failure, and different operational needs from real-time decision support to post-hoc analysis.
Sara Beery
Fri 3:00 p.m. - 3:15 p.m. | Concluding Remarks
Aggregate Representation Measure for Predictive Model Reusability (Poster)
In this paper, we propose a predictive quantifier to estimate the retraining cost of a trained model under distribution shifts. The proposed Aggregated Representation Measure (ARM) quantifies the change in the model's representation from the old to the new data distribution. Before the model is actually retrained, it provides a single concise index of the resources (epochs, energy, and carbon emissions) required for retraining. This enables reuse of a model at a much lower cost than training a new model from scratch. The experimental results indicate that ARM reasonably predicts retraining costs for varying noise intensities and enables comparisons among multiple model architectures to determine the most cost-effective and sustainable option.
Lokesh Vishwesh Sangarya · Richard Bradford · Jung-Eun Kim
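The idea of a single representation-shift index can be sketched as follows. This is a hypothetical illustration, not the authors' ARM implementation: `feature_map` stands in for the model's embedding function, the mean-embedding distance for their shift measure, and the epochs-per-unit-shift factor for their resource calibration.

```python
import numpy as np

def aggregate_representation_measure(feature_map, old_data, new_data):
    """Quantify how much a model's representation shifts from the old to
    the new data distribution (hypothetical sketch, not the authors' ARM)."""
    old_feats = np.stack([feature_map(x) for x in old_data])
    new_feats = np.stack([feature_map(x) for x in new_data])
    # Distance between the mean embeddings of the two distributions.
    return float(np.linalg.norm(old_feats.mean(axis=0) - new_feats.mean(axis=0)))

def predicted_retraining_epochs(arm_score, epochs_per_unit_shift=10.0):
    # Calibrate the index to a resource estimate (epochs here; energy and
    # carbon emissions would be scaled analogously).
    return arm_score * epochs_per_unit_shift
```

With no distribution shift the index is zero and the predicted retraining cost vanishes; larger shifts map monotonically to larger cost estimates.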
AI for Whom? Shedding Critical Light on AI for Social Good (Poster)
In recent years, AI for Social Good (AI4SG) projects have grown in scope and popularity, covering a variety of topics from climate change to education and being the subject of numerous workshops and conferences at a global scale. In the current article, we reflect upon AI4SG, its definition and its current limitations. We propose ways to address these limitations, from connecting with relevant disciplines to a better consideration of the constraints and context of project deployment. We conclude with a proposal to refocus the field of AI4SG around the concept of sustainability from a variety of angles, arguing that this will help the field evolve while taking its own impacts into account.
Nyalleng Moorosi · Raesetje Sefala · Sasha Luccioni
REALLIGHT: DRL based Intersection Control in Developing Countries without Traffic Simulators (Poster)
Effective traffic intersection control is a crucial problem for urban sustainability. State-of-the-art research applying Artificial Intelligence (AI), for example Deep Reinforcement Learning (DRL), to traffic control relies on traffic simulators, ignoring the shortcomings of the simulators used to train the DRL control algorithms. These simulators are limited in capturing fine nuances in traffic flow changes, which can make the trained models unrealistic. This is especially true in developing countries, where traffic flow is non-laned and chaotic, and extremely hard to simulate with standard microscopic-model-based traffic simulation rules. In this paper, we do away with traffic simulators and instead train DRL systems on 40 hours of real traffic data collected by deploying cameras at a busy New Delhi traffic intersection, making intelligent traffic intersection control more realistic for developing countries; hence the system is termed REALLIGHT.
Sachin Chauhan · Rijurekha Sen
Multi-fidelity Bayesian Optimisation of Syngas Fermentation Simulators (Poster)
A Bayesian optimisation approach for maximising the gas conversion rate in syngas fermentation is presented. We have access to an expensive-to-evaluate computational fluid dynamics (CFD) reactor model and a cheap ideal-mixing-based reactor model. The goal is to maximise the gas conversion rate with respect to the input variables. Due to the high cost of the industrial simulator, multi-fidelity Bayesian optimisation is adopted to solve the optimisation problem using both high and low fidelities. We first describe the problem of syngas fermentation, followed by our approach to solving simulator optimisation using multiple fidelities. We discuss concerns regarding significant differences in fidelity cost and their impact on fidelity sampling, and conclude with a discussion on the integration of real-world fermentation data.
Mahdi Eskandari · Lars Puiman · Jakob Zeitler
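The cost-aware logic of multi-fidelity optimisation can be illustrated with a deliberately simplified sketch: screen candidates with the cheap model, then spend the remaining budget evaluating only the most promising inputs on the expensive one. This stands in for the Gaussian-process acquisition in the abstract; `cheap_model`, `expensive_model`, and the cost values are hypothetical.

```python
def multi_fidelity_search(cheap_model, expensive_model, candidates,
                          budget, cheap_cost=1.0, expensive_cost=10.0):
    """Two-fidelity search under an evaluation budget: a low-fidelity
    screen followed by high-fidelity evaluation of the best candidates
    (a simplified stand-in for GP-based multi-fidelity acquisition)."""
    spent = 0.0
    cheap_scores = {}
    for x in candidates:
        if spent + cheap_cost > budget:
            break
        cheap_scores[x] = cheap_model(x)   # low-fidelity screen
        spent += cheap_cost
    best_x, best_y = None, float("-inf")
    # Promote the best low-fidelity candidates to the high-fidelity model.
    for x in sorted(cheap_scores, key=cheap_scores.get, reverse=True):
        if spent + expensive_cost > budget:
            break
        y = expensive_model(x)             # high-fidelity evaluation
        spent += expensive_cost
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y
```

A real multi-fidelity BO loop would replace the greedy ranking with a surrogate model and a cost-weighted acquisition function, but the budget accounting is the same.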
Data-Driven Traffic Reconstruction for Identifying Stop-and-Go Waves (Poster)
Identifying stop-and-go events (SAGs) in traffic flow presents an important avenue for advancing data-driven research for climate change mitigation and sustainability, owing to their substantial impact on carbon emissions, travel time, fuel consumption, and roadway safety. In fact, SAGs are estimated to account for 33-50% of highway driving externalities. However, insufficient attention has been paid to precisely quantifying where, when, and how often these SAGs take place, which is necessary for downstream decision-making, such as intervention design and policy analysis. A key challenge is that the data available to researchers and governments are typically sparse and aggregated to a granularity that obscures SAGs. To overcome such data limitations, this study thus explores the use of traffic reconstruction techniques for SAG identification. In particular, we introduce a kernel-based method for identifying spatiotemporal features in traffic and leverage bootstrapping to quantify the uncertainty of the reconstruction process. Experimental results on California highway data demonstrate the promise of the method for capturing SAGs. This work contributes to a foundation for data-driven decision-making to advance the sustainability of traffic systems.
Shreyaa Raghavan · Edgar Ramirez Sanchez · Cathy Wu
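A one-dimensional toy version of kernel-based reconstruction and wave detection might look as follows. The Gaussian smoothing, the 20 mph threshold, and the dip-interval definition are illustrative assumptions, not the paper's spatiotemporal method.

```python
import numpy as np

def detect_stop_and_go(speeds, kernel_width=2, slow=20.0):
    """Reconstruct a smooth speed profile with a Gaussian kernel (a 1-D
    stand-in for spatiotemporal kernel reconstruction) and return the
    (start, end) index ranges where reconstructed speed drops below
    `slow`, i.e. candidate stop-and-go waves."""
    t = np.arange(-3 * kernel_width, 3 * kernel_width + 1)
    kernel = np.exp(-0.5 * (t / kernel_width) ** 2)
    kernel /= kernel.sum()
    smooth = np.convolve(np.asarray(speeds, float), kernel, mode="same")
    below = smooth < slow
    events, start = [], None
    for i, flag in enumerate(below):
        if flag and start is None:
            start = i                    # wave begins: speed dips below threshold
        elif not flag and start is not None:
            events.append((start, i))    # wave ends: speed recovers
            start = None
    if start is not None:
        events.append((start, len(below)))
    return events
```

On sparse real data, the same reconstruct-then-threshold pattern would operate on a space-time speed field, with bootstrapping over the reconstruction to attach uncertainty to each detected wave.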
Is the Facebook Ad Algorithm a Climate Discourse Influencer? (Poster)
Sponsored climate discourse, driven by both climate contrarians and advocates, influences public attitudes towards climate change. We present an experimental study suggesting that the Facebook advertisement algorithm also influences climate discourse. The algorithm preferentially delivers ads to Facebook audiences in certain locations and demographics, at least partially based upon the ad image. Further, the algorithm is biased in terms of how it delivers ads featuring images of non-renewable sources of energy, and does not always fulfill targeting intentions as requested. This may result in inadvertent manipulation of ad delivery with consequences for climate discourse and algorithmic fairness.
Aruna Sankaranarayanan · Erik Hemberg · Piotr Sapiezynski · Una-May O'Reilly
AI-Assisted System to Detect and Track Communal Roosts in Weather Radar Data (Poster)
We have developed an AI-assisted system to annotate communal roosts of birds and bats in weather radar data. This system comprises detection, tracking, confounder filtering, and human screening components. We have deployed this system to gather information on swallows from 612,786 scans taken from 12 radar stations around the Great Lakes over 21 years. The 15,628 annotated roost signatures have uncovered population trends and phenological shifts in swallows and martins. These species are rapidly declining aerial insectivores, and the data gathered has facilitated crucial sustainability analyses. While human screening is still required with the deployed system, we estimate that the screening process is approximately 7× faster than manual annotation. Furthermore, we found that incorporating temporal signals enhances the deployed detector's performance, increasing the mean average precision (mAP) from 48% to 56%. Our ongoing work aims to expand the analysis to bird and bat roosts at a continental scale.
Wenlong Zhao · Gustavo Perez · Zezhou Cheng · Maria Belotti · Yuting Deng · Victoria Simons · Elske Tielens · Jeffrey Kelly · Kyle Horton · Subhransu Maji · Daniel Sheldon
Focus on What's Important! Inspecting Variational Distributions for Gaussian Processes for better AQ Station Deployment (Poster)
In urban locales, the intricate dynamics of air quality indicators such as Particulate Matter (PM2.5) and Carbon Monoxide (CO) necessitate sophisticated modeling for precise prediction and monitoring. However, monitoring stations are sparse, and effective placement is a key problem in the domain. This study explores a novel approach utilizing Variational Multi-Task Gaussian Processes (VMTGP) endowed with a Spectral Mixture (SM) kernel to model the spatiotemporal distribution of these pollutants in Beijing, which beats the state-of-the-art Gaussian Process techniques on this dataset in the exact MTGP case. However, our innovation lies in an in-depth examination of the variational distribution of the inducing points, which are critical for scalability and accurate approximations in GP models. Through an empirical lens, we observe a pronounced clustering of inducing points around certain monitoring stations, hinting at a higher information content in these locales. Our findings underscore the inherent value in exploiting the clustering phenomenon of inducing points, opening up new vistas for enhancing the efficacy and interpretability of multi-task learning paradigms in air quality forecasting. This insight holds promise for developing more robust and localized air quality prediction models, crucial for urban planning and public health policy formulations, and adaptively deciding the most effective locations for placing AQ monitoring stations.
Progyan Das · Mihir Agarwal
Segment Any Stream: Scalable Water Extent Detection with the Segment Anything Model (Poster)
The accurate detection of water extent in streams and rivers is pivotal to understanding inland water hydrodynamics and terrestrial-aquatic interactions of biogeochemical cycles, in particular bank erosion and the resulting transfer of nutrient elements such as phosphorus (P). This is highly relevant to the United Nations Sustainable Development Goals (SDGs), notably sustainable land use (Goal 15) and climate change mitigation (Goal 13), as well as national to regional water quality efforts, notably the Mississippi River Basin nutrient reduction targets set by the U.S. Environmental Protection Agency (EPA). Prior studies have employed a variety of computational methods, ranging from hand-crafted decision rules based on spectral indices to advanced image segmentation techniques. However, these methods are limited in their generalizability when implemented in new regions. Furthermore, the recent development of vision foundation models such as the Segment Anything Model (SAM) has brought about opportunities for water extent detection due to their exceptional generalization capabilities. SAM has few-shot or even zero-shot generalization, which emerged from pretraining a large model with an enormous amount of data. Nevertheless, the adaptation of these models remains challenging due to the computational overhead of fully fine-tuning the entire model and the potential degradation in their emergent capabilities when the data is out-of-distribution. Taking these desiderata into account, this work proposes Segment Any Stream (SAS), which employs the Low-Rank Adaptation (LoRA) method to perform low-rank updates on a pretrained SAM with a small amount of curated high-resolution aerial imagery to map the water extents in the Mackinaw watershed, a HUC-8 in central Illinois. Through our experiments, we show that SAS is lightweight yet highly effective: it enables efficient fine-tuning on a single consumer-grade GPU while achieving a high IoU of 0.76. This research highlights a generalizable framework for repurposing foundation models in computer vision to support river/stream segmentation. We believe this framework can benefit the accurate and scalable quantification of streambank erosion as assessed by bank migration and width changes over time, a significant source of sediment and nutrient losses in agricultural landscapes, and an important indicator of a variety of aspects of SDG. Code and data will be released upon paper acceptance.
Haozhen Zheng · Chenhui Zhang · Kaiyu Guan · Yawen Deng · Sherrie Wang · Bruce Rhoads · Andrew J Margenot · Shengnan Zhou · Sheng Wang
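The LoRA mechanism referenced in the abstract, a frozen pretrained weight plus a trainable low-rank update, can be sketched in a few lines of numpy. This is a generic illustration of LoRA, not the authors' SAS code; the rank, scaling, and initialization choices here are illustrative.

```python
import numpy as np

class LoRALinear:
    """A frozen linear layer with a trainable low-rank update:
    y = (W + (alpha/rank) * B @ A) x. Only A and B are trained, so the
    number of trainable parameters is tiny compared to full fine-tuning."""
    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = weight                                    # frozen, (out, in)
        self.A = rng.normal(0, 0.01, (rank, weight.shape[1]))   # trainable
        self.B = np.zeros((weight.shape[0], rank))              # trainable, init 0
        self.scale = alpha / rank

    def __call__(self, x):
        # With B initialized to zero, the adapted layer exactly reproduces
        # the pretrained model at the start of fine-tuning.
        return x @ (self.weight + self.scale * self.B @ self.A).T
```

Applying such updates only to selected attention projections is what keeps SAM fine-tuning feasible on a single consumer-grade GPU.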
Cooperative Logistics: Can Artificial Intelligence Enable Trustworthy Cooperation at Scale? (Poster)
Cooperative Logistics studies the setting where logistics companies pool their resources together to improve their individual performance. Prior literature suggests carbon savings of approximately 22%. If attained globally, this equates to 480,000,000 tonnes of CO2. Whilst well-studied in operations research, industrial adoption remains limited due to a lack of trustworthy cooperation. A key remaining challenge is fair and scalable gain sharing (i.e., how much should each company be fairly paid?). This paper introduces the novel algorithmic challenges that Cooperative Logistics offers AI, and novel applications of AI towards Cooperative Logistics. We further present findings from our initial experiments.
Stephen Mak · Tim Pearce · Matthew Macfarlane · Liming Xu · Michael Ostroumov · Alexandra Brintrup
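The fairness half of the gain-sharing question is classically answered by the Shapley value, sketched below by exact enumeration. The factorial cost of this enumeration is exactly the scalability challenge the abstract raises; this is a textbook illustration, not the authors' method.

```python
from itertools import permutations
from math import factorial

def shapley_gains(companies, coalition_value):
    """Fair gain sharing via the Shapley value: each company's payout is
    its average marginal contribution over all orders in which the
    coalition could form. Exact enumeration, so only practical for a
    handful of companies."""
    shares = {c: 0.0 for c in companies}
    for order in permutations(companies):
        members = frozenset()
        for c in order:
            with_c = members | {c}
            # Marginal contribution of c given who joined before it.
            shares[c] += coalition_value(with_c) - coalition_value(members)
            members = with_c
    n_orders = factorial(len(companies))
    return {c: total / n_orders for c, total in shares.items()}
```

For a two-company pool where A alone saves 1, B alone saves 3, and together they save 6, the Shapley split is 2 for A and 4 for B, and the shares always sum to the grand-coalition gain.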
A table is worth a thousand pictures: Multi-modal contrastive learning in house burning classification in wildfire events (Poster)
Wildfires have increased in frequency and duration over the last decade in the Western United States. This not only poses a risk to human life, but also results in billions of dollars in private and public infrastructure damages. As climate change potentially worsens the frequency and severity of wildfires, understanding their risk is critical for human adaptation and optimal fire prevention techniques. However, current fire spread models are often dependent on idealized fire and soil parameters, hard to compute, and not predictive of property damage. In this paper, we use a Dual Encoder (DE), a model with image and text embeddings that represents both modalities in the same latent space, to predict which houses will burn down in the event of wildfires. Our results indicate that the DE model achieves better performance than the image-only and text-only baselines (i.e., ResNet50 and XGBoost). Moreover, in line with other models in the literature, it also outperforms these baselines in low-data regimes.
Iván Higuera-Mendieta · Jeff Wen · Marshall Burke
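The shared-latent-space training objective behind dual encoders is typically a symmetric contrastive (InfoNCE) loss; a generic CLIP-style sketch is below. The temperature value and the use of cosine similarity are common conventions, not details taken from this paper.

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss: pull each image embedding toward its paired
    text/tabular embedding (the diagonal of the similarity matrix) and push
    it away from all other pairs in the batch. A generic dual-encoder
    sketch, not the authors' code."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (batch, batch) similarities

    def xent_diag(l):
        # Cross-entropy with the matching pair on the diagonal as the label.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))
```

Correctly paired batches should score a much lower loss than mismatched ones, which is what drives the two encoders into a common latent space during training.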
Joint time–frequency scattering-enhanced representation for bird vocalization classification (Poster)
Neural Networks (NNs) have been widely used in passive acoustic monitoring. Typically, audio is converted into a Mel Spectrogram as a preprocessing step before being fed into NNs. In this study, we investigate the Joint Time-Frequency Scattering transform as an alternative preprocessing technique for analyzing bird vocalizations. We highlight its superiority over the Mel Spectrogram because it captures intricate time-frequency patterns and emphasizes rapid signal transitions. While the Mel Spectrogram often gives similar importance to all sounds, the scattering transform better differentiates between rapid and slow variations. We use a Convolutional Neural Network architecture and an attention-based transformer. Our results demonstrate that both NN architectures can benefit from this enhanced preprocessing, as the scattering transform provides a more discriminative representation of bird vocalizations than the traditional Mel Spectrogram.
Yimeng Min · Carla Gomes
Solving Satisfiability Modulo Counting Problems in Computational Sustainability with Guarantees (Poster)
Many real-world problems in computational sustainability require tight integration of symbolic and statistical AI. Interestingly, Satisfiability Modulo Counting (SMC) captures a wide variety of such problems. SMC searches for policy interventions to control probabilistic outcomes. Solving SMC is challenging because of its highly intractable nature (NP^PP-complete), as it incorporates both statistical inference and symbolic reasoning. Previous research on SMC solving lacks provable guarantees and/or suffers from sub-optimal empirical performance, especially when combinatorial constraints are present. We propose XOR-SMC, a polynomial algorithm with access to NP oracles, to solve highly intractable SMC problems with constant approximation guarantees. XOR-SMC transforms the highly intractable SMC into satisfiability problems, replacing the model counting in SMC with SAT formulae subject to randomized XOR constraints. Experiments on important SMC problems in computational sustainability demonstrate that XOR-SMC finds solutions close to the true optimum, outperforming several baselines that struggle to find good approximations for the intractable model counting in SMC.
Jinzhao Li · Nan Jiang · Yexiang Xue
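The randomized-XOR idea can be demonstrated on a toy scale: each random parity constraint halves the solution set in expectation, so if solutions still survive after m XORs, the formula likely has on the order of 2^m models. The brute-force enumeration below is purely illustrative of that counting trick, not the paper's algorithm, which delegates the search to SAT/NP oracles.

```python
import random
from itertools import product

def xor_survival_rate(satisfies, n_vars, n_xors, trials=30, seed=0):
    """Fraction of trials in which at least one satisfying assignment of
    `satisfies` also satisfies `n_xors` random XOR (parity) constraints.
    A high rate suggests the formula has at least ~2^n_xors models."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        # Each constraint: parity of a random subset of variables must
        # equal a random bit.
        xors = [([rng.random() < 0.5 for _ in range(n_vars)], rng.random() < 0.5)
                for _ in range(n_xors)]
        for assignment in product([0, 1], repeat=n_vars):
            if not satisfies(assignment):
                continue
            if all(sum(a for a, keep in zip(assignment, mask) if keep) % 2 == parity
                   for mask, parity in xors):
                survived += 1
                break  # one surviving model is enough for this trial
    return survived / trials
```

With 4 variables and a formula that is always true (16 models), solutions survive every trial with no XORs but almost never survive 10 of them, separating "many models" from "few models" without ever counting exactly.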
Model Evaluation for Geospatial Problems (Spotlight)
Geospatial problems often involve spatial autocorrelation and covariate shift, which violate the independent, identically distributed assumption underlying standard cross-validation. In this work, we establish a theoretical criterion for unbiased cross-validation, introduce a preliminary categorization framework to guide practitioners in choosing suitable cross-validation strategies for geospatial problems, reconcile conflicting recommendations on best practices, and develop a novel, straightforward method with both theoretical and empirical guarantees.
Jing Wang · Tyler Hallman · Laurel Hopkins · John Kilbride · W. Douglas Robinson · Rebecca Hutchinson
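A common baseline remedy for the i.i.d. violation described above is spatially blocked cross-validation, where nearby points are forced into the same fold. The quantile-grid blocking below is a generic sketch of that idea, not the paper's proposed method.

```python
import numpy as np

def spatial_block_folds(coords, n_blocks_x=3, n_blocks_y=3):
    """Assign samples to cross-validation folds by spatial grid blocks so
    that spatially autocorrelated neighbors share a fold. Quantile-based
    edges keep the blocks roughly balanced in sample count."""
    coords = np.asarray(coords, float)
    x_edges = np.quantile(coords[:, 0], np.linspace(0, 1, n_blocks_x + 1))
    y_edges = np.quantile(coords[:, 1], np.linspace(0, 1, n_blocks_y + 1))
    x_bin = np.clip(np.searchsorted(x_edges, coords[:, 0], side="right") - 1,
                    0, n_blocks_x - 1)
    y_bin = np.clip(np.searchsorted(y_edges, coords[:, 1], side="right") - 1,
                    0, n_blocks_y - 1)
    return x_bin * n_blocks_y + y_bin  # one fold id per sample
```

Whether blocking of this kind yields an unbiased error estimate for a given prediction target is precisely the question the abstract's theoretical criterion addresses.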
Moving targets: When does a poverty prediction model need to be updated? (Spotlight)
A key challenge in the design of effective social protection programs is determining who should be eligible for program benefits. In low- and middle-income countries, one of the most common criteria is a Proxy Means Test (PMT): a rudimentary application of machine learning that uses a short list of household characteristics to predict whether each household is poor, and therefore eligible, or non-poor, and therefore ineligible. Using nationwide survey data from six low- and middle-income countries, this paper documents an important weakness in this use of machine learning: the accuracy of the PMT prediction algorithm decreases steadily over time, by roughly 1.5-1.9 percentage points per year. We illustrate the implications of this finding for real-world anti-poverty programs, which typically update the PMT model only every 5-8 years, and then show that the aggregate effect can be decomposed into two forces: "model decay" caused by model drift, and "data decay" caused by changing household characteristics. Our final set of results shows how an understanding of these forces can be used to optimize data collection policies and improve the efficiency of social protection programs.
Emily Aiken · Tim Ohlenburg · Joshua Blumenstock
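One plausible accounting for the two-force decomposition is to compare a stale model and a freshly retrained model on the same new data: the gap retraining closes is model decay, and the rest is data decay. This arithmetic sketch is an assumption about how such a decomposition could be framed, not necessarily the authors' exact definition.

```python
def decompose_decay(acc_old_on_old, acc_old_on_new, acc_new_on_new):
    """Split the accuracy drop of a stale PMT model into 'model decay'
    (recoverable by retraining on fresh data) and 'data decay' (the drop
    that persists even after retraining, driven by changing household
    characteristics). A plausible sketch, not the paper's formula."""
    total_decay = acc_old_on_old - acc_old_on_new
    model_decay = acc_new_on_new - acc_old_on_new   # closed by retraining
    data_decay = total_decay - model_decay          # remains after retraining
    return total_decay, model_decay, data_decay
```

For example, a model that scored 0.90 when trained, scores 0.80 on this year's households, and would score 0.86 if retrained has lost 10 points in total: 6 recoverable by retraining and 4 attributable to the changing data.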
Unsupervised Domain Adaptation in the Real World: A Case Study (Spotlight)
In real-world applications of machine learning, adaptation to new domains (e.g. new regions, new populations, new sensors, or new points in time) has been shown to be an ongoing challenge. In unsupervised domain adaptation, the assumption is that the user has access to a large labeled set of source domain data, and the goal is to adapt to a new target domain without the use of any labeled target data. The open question is how unlabeled samples from the target domain should be incorporated into the model training process. In this work we document our experiences applying recently proposed unsupervised domain adaptation techniques for object detection to a novel application domain: counting fish in sonar video. We find that: (i) prior works that show progress on standard domain adaptation benchmark datasets do not necessarily translate to our domain, (ii) validation methods are often unrealistic in these prior works, and (iii) higher-complexity (in terms of implementation and parameters) techniques work better. We aim for this work to be a useful guide for other practitioners looking to use unsupervised domain adaptation techniques in real-world applications.
Justin Kay · Suzanne Stathatos · Grant Horn · Sara Beery · Pietro Perona · Siqi Deng · Erik Young
Satellite Imagery and AI: A New Era in Ocean Conservation, from Research to Deployment and Impact (Spotlight)
Illegal, unreported, and unregulated (IUU) fishing poses a global threat to ocean habitats. Publicly available satellite data offered by NASA and the European Space Agency provide an opportunity to actively monitor this activity. Effectively leveraging satellite data for maritime conservation requires highly reliable machine learning models operating globally with minimal latency. This paper introduces three specialized computer vision models designed for various sensors across VIIRS, Sentinel-1, and Sentinel-2. It also presents best practices for developing and delivering real-time computer vision services for conservation. These models have been deployed in Skylight, a real-time maritime monitoring platform, which is provided at no cost to users worldwide.
Patrick Beukema · Favyen Bastani · Piper Wolters · Henry Herzog · Joseph Ferdinando