NIPS 2012


Workshop

Human Computation for Science and Computational Sustainability

Theodoros Damoulas · Thomas Dietterich · Edith Law · Serge Belongie

Emerald Bay 4, Harveys Convention Center Floor (CC)

http://www.cs.cornell.edu/~damoulas/Site/HCSCS.html

Researchers in several scientific and sustainability fields have recently achieved exciting results by involving the general public in the acquisition of scientific data and the solution of challenging computational problems. One example is the eBird project (www.ebird.org) of the Cornell Lab of Ornithology, where field observations uploaded by bird enthusiasts are providing continent-scale data on bird distributions that support the development and testing of hypotheses about bird migration. Another example is the FoldIt project (www.fold.it), where volunteers interacting with the FoldIt software have been able to solve the 3D structures of several biologically important proteins.

Despite these early successes, the involvement of the general public in these efforts poses many challenges for machine learning. Human observers can vary hugely in their degree of expertise. They conduct observations when and where they see fit, rather than following carefully designed experimental protocols. Paid participants (e.g., from Amazon Mechanical Turk) may not follow the rules or may even deliberately mislead the investigators.

A related challenge is that problem instances presented to human participants can vary in difficulty. Some instances (e.g., of visual tasks) may be impossible for most people to solve. This leads to a bias toward easy instances, which can confuse learning algorithms.

A third issue with crowdsourcing is that in many of these problems, there is no available ground truth because the true quantities of interest are only indirectly observed. For example, the BirdCast project seeks to model the migration of birds. However, the eBird reports only provide observations of birds on or near the ground, rather than in migratory flight (which occurs predominantly at night). In such situations, it is hard to evaluate the accuracy of the learned models, because predictive accuracy does not guarantee that the values of latent variables are correct or that the model is identifiable.

This workshop will bring together researchers at the interface of machine learning, citizen science, and human computation. The goals of the workshop are i) to identify common problems, ii) to propose benchmark datasets, common practices and improved methodologies for dealing with such phenomena, iii) to identify methods for evaluating such models in the absence of ground truth, iv) to share approaches for implementing and deploying citizen science and human computation projects in scientific and sustainability domains, and v) to foster new connections between the scientific, sustainability, and human computation research communities.

There will be two Best Contribution awards ($250 book vouchers) for the oral and/or poster presentations, sponsored by the Institute for Computational Sustainability (www.cis.cornell.edu/ics).

We welcome submissions* related to (but not limited to) the following topics:

• Biases, probabilistic observation processes, noise processes, and other imperfections in citizen science and human computation
• Novel citizen science and human computation projects in science and sustainability
• Human-machine interactions and human computation in science and sustainability
• Novel modeling paradigms for citizen science and human computation projects in science and sustainability
• Methods for recruitment, retention, and modeling of human participants
• Dataset shift and domain adaptation methods for citizen science and human computation projects
• Spatio-temporal active learning and general inference techniques for citizen science and human computation projects

*Every submission should include both a paper and a poster. The paper should use the standard NIPS format (final format, not an anonymized submission) and be at most 4 pages; the poster should be in portrait orientation, up to 33" x 47" (A0).

Expected outcomes of this workshop include: a list of open problems, acquaintance with relevant work across scientific domains, a 5-year research roadmap, new collaborations between participants, and broader participation in Computational Sustainability and Human Computation research. In addition, we will explore avenues for organizing a future machine learning competition on a large-scale human computation (HC) problem, to foster and strengthen the community around a prototypical HC effort.
