In an effort to catalog insect biodiversity, we propose a new large dataset of hand-labelled insect images, the BIOSCAN-1M Insect Dataset. Each record is taxonomically classified by an expert and has associated genetic information, including raw nucleotide barcode sequences and assigned barcode index numbers, which are genetic-based proxies for species classification. This paper presents a curated million-image dataset, primarily to train computer-vision models capable of providing image-based taxonomic assessment; however, the dataset also presents compelling characteristics, the study of which would be of interest to the broader machine learning community. Owing to its biological nature, the dataset exhibits a characteristic long-tailed class-imbalance distribution. Furthermore, taxonomic labelling is a hierarchical classification scheme, presenting a highly fine-grained classification problem at lower levels. Beyond spurring interest in biodiversity research within the machine learning community, progress on creating an image-based taxonomic classifier will also further the ultimate goal of all BIOSCAN research: to lay the foundation for a comprehensive survey of global biodiversity. This paper introduces the dataset and explores the classification task through the implementation and analysis of a baseline classifier. The code repository of the BIOSCAN-1M Insect Dataset is available at https://github.com/zahrag/BIOSCAN-1M
Author Information
Zahra Gharaee (University of Waterloo)
ZeMing Gong (Simon Fraser University)
Nicholas Pellegrino (University of Waterloo)
Nicholas Pellegrino is a doctoral student in Systems Design Engineering at the University of Waterloo. He is supervised by Prof. Paul Fieguth and associated with the Vision and Image Processing (VIP) Lab and the Statistical Image Processing (SIP) Lab. His main research focus is on machine vision, specifically object recognition. In support of the BIOSCAN program, associated with the International Barcode of Life project, Nicholas has undertaken the task of taxonomic order-level insect image classification. Broadly, this research will enable a far more extensive and detailed understanding of global biodiversity and the interactions between species and ecosystems. During his master's degree (also in Systems Design Engineering at the University of Waterloo), Nicholas was associated with PhotoMedicine Labs and was co-supervised by Dr. Parsin Haji Reza and Prof. Paul Fieguth. His main research contributions were in the areas of signal processing, multi-spectral unmixing, and chromophore-selective PARS® imaging with applications in PARS histology and ophthalmology. For his master's degree, Nicholas was awarded the Alumni Gold Medal, which recognizes the top graduating master's student across the whole university for academic achievement. Nicholas graduated from Mechatronics Engineering at the University of Waterloo in 2019, completed his master's degree with PhotoMedicine Labs in 2022, and is now completing a PhD at the Vision and Image Processing Lab.
Iuliia Zarubiieva (University of Guelph)
Joakim Bruslund Haurum (Aalborg University & Pioneer Centre for AI)
Scott Lowe (Vector Institute)
Jaclyn McKeown (University of Guelph)
Chris Ho (University of Guelph)
Joschka McLeod (University of Guelph)
Yi-Yun Wei (University of Guelph)
Jireh Agda (University of Guelph)
Sujeevan Ratnasingham (University of Guelph)
Dirk Steinke (University of Guelph)
Angel Chang (Simon Fraser University)
Graham Taylor (University of Guelph / Vector Institute)
Paul Fieguth (University of Waterloo)
More from the Same Authors
-
2020 : Building LEGO using Deep Generative Models of Graphs »
Rylee Thompson · Graham Taylor · Terrance DeVries · Elahe Ghalebi -
2021 Spotlight: Habitat 2.0: Training Home Assistants to Rearrange their Habitat »
Andrew Szot · Alexander Clegg · Eric Undersander · Erik Wijmans · Yili Zhao · John Turner · Noah Maestre · Mustafa Mukadam · Devendra Singh Chaplot · Oleksandr Maksymets · Aaron Gokaslan · Vladimír Vondruš · Sameer Dharur · Franziska Meier · Wojciech Galuba · Angel Chang · Zsolt Kira · Vladlen Koltun · Jitendra Malik · Manolis Savva · Dhruv Batra -
2021 : Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI »
Santhosh Kumar Ramakrishnan · Aaron Gokaslan · Erik Wijmans · Oleksandr Maksymets · Alexander Clegg · John Turner · Eric Undersander · Wojciech Galuba · Andrew Westbury · Angel Chang · Manolis Savva · Yili Zhao · Dhruv Batra -
2021 : An Empirical Study of Neural Kernel Bandits »
Michal Lisicki · Arash Afkanpour · Graham Taylor -
2022 : Fifteen-minute Competition Overview Video »
Dhruv Batra · Manolis Savva · Zsolt Kira · Vincent-Pierre Berges · Karmesh Yadav · Angel Chang · Andrew Szot · Alexander Clegg · Aaron Gokaslan -
2023 : Bandit-Driven Batch Selection for Robust Learning under Label Noise »
Michal Lisicki · Mihai Nica · Graham Taylor -
2023 : Zero-shot Clustering of Embeddings with Pretrained and Self-Supervised Learnt Encoders »
Scott Lowe · Joakim Bruslund Haurum · Sageev Oore · Thomas Moeslund · Graham Taylor -
2023 : Towards Stable Preferences for Stakeholder-aligned Machine Learning »
Haleema Sheraz · Stefan C Kremer · Gus Skorburg · Graham Taylor · Walter Sinnott-Armstrong · Kyle Boerstler -
2023 : HomeRobot: Open-Vocabulary Mobile Manipulation »
Sriram Yenamandra · Arun Ramachandran · Karmesh Yadav · Austin Wang · Mukul Khanna · Theophile Gervet · Tsung-Yen Yang · Vidhi Jain · Alexander Clegg · John Turner · Zsolt Kira · Manolis Savva · Angel Chang · Devendra Singh Chaplot · Dhruv Batra · Roozbeh Mottaghi · Yonatan Bisk · Chris Paxton -
2023 : Zero-shot Clustering of Embeddings with Self-Supervised Learnt Encoders »
Scott Lowe · Joakim Bruslund Haurum · Sageev Oore · Thomas Moeslund · Graham Taylor -
2023 : BarcodeBERT: Transformers for Biodiversity Analysis »
Pablo Millan Arias · Niousha Sadjadi · Monireh Safari · ZeMing Gong · Austin T. Wang · Scott Lowe · Joakim Bruslund Haurum · Iuliia Zarubiieva · Dirk Steinke · Lila Kari · Angel Chang · Graham Taylor -
2022 Spotlight: Lightning Talks 6A-4 »
Xiu-Shen Wei · Konstantina Dritsa · Guillaume Huguet · ABHRA CHAUDHURI · Zhenbin Wang · Kevin Qinghong Lin · Yutong Chen · Jianan Zhou · Yongsen Mao · Junwei Liang · Jinpeng Wang · Mao Ye · Yiming Zhang · Aikaterini Thoma · H.-Y. Xu · Daniel Sumner Magruder · Enwei Zhang · Jianing Zhu · Ronglai Zuo · Massimiliano Mancini · Hanxiao Jiang · Jun Zhang · Fangyun Wei · Faen Zhang · Ioannis Pavlopoulos · Zeynep Akata · Xiatian Zhu · Jingfeng ZHANG · Alexander Tong · Mattia Soldan · Chunhua Shen · Yuxin Peng · Liuhan Peng · Michael Wray · Tongliang Liu · Anjan Dutta · Yu Wu · Oluwadamilola Fasina · Panos Louridas · Angel Chang · Manik Kuchroo · Manolis Savva · Shujie LIU · Wei Zhou · Rui Yan · Gang Niu · Liang Tian · Bo Han · Zhongcong XU · Guy Wolf · Yingying Zhu · Brian Mak · Difei Gao · Masashi Sugiyama · Smita Krishnaswamy · Rong-Cheng Tu · Wenzhe Zhao · Weijie Kong · Chengfei Cai · WANG HongFa · Dima Damen · Bernard Ghanem · Wei Liu · Mike Zheng Shou -
2022 Spotlight: MultiScan: Scalable RGBD scanning for 3D environments with articulated objects »
Yongsen Mao · Yiming Zhang · Hanxiao Jiang · Angel Chang · Manolis Savva -
2022 Competition: Habitat Rearrangement Challenge »
Andrew Szot · Karmesh Yadav · Alexander Clegg · Vincent-Pierre Berges · Aaron Gokaslan · Angel Chang · Manolis Savva · Zsolt Kira · Dhruv Batra -
2022 Poster: MultiScan: Scalable RGBD scanning for 3D environments with articulated objects »
Yongsen Mao · Yiming Zhang · Hanxiao Jiang · Angel Chang · Manolis Savva -
2022 Poster: Logical Activation Functions: Logit-space equivalents of Probabilistic Boolean Operators »
Scott Lowe · Robert Earle · Jason d'Eon · Thomas Trappenberg · Sageev Oore -
2021 : DeepRNG: Towards Deep Reinforcement Learning-Assisted Generative Testing of Software »
Chuan-Yung Tsai · Graham Taylor -
2021 : Neural Structure Mapping For Learning Abstract Visual Analogies »
Shashank Shekhar · Graham Taylor -
2021 Poster: Habitat 2.0: Training Home Assistants to Rearrange their Habitat »
Andrew Szot · Alexander Clegg · Eric Undersander · Erik Wijmans · Yili Zhao · John Turner · Noah Maestre · Mustafa Mukadam · Devendra Singh Chaplot · Oleksandr Maksymets · Aaron Gokaslan · Vladimír Vondruš · Sameer Dharur · Franziska Meier · Wojciech Galuba · Angel Chang · Zsolt Kira · Vladlen Koltun · Jitendra Malik · Manolis Savva · Dhruv Batra -
2021 Poster: Brick-by-Brick: Combinatorial Construction with Deep Reinforcement Learning »
Hyunsoo Chung · Jungtaek Kim · Boris Knyazev · Jinhwi Lee · Graham Taylor · Jaesik Park · Minsu Cho -
2021 Poster: Parameter Prediction for Unseen Deep Architectures »
Boris Knyazev · Michal Drozdzal · Graham Taylor · Adriana Romero Soriano -
2020 Poster: Instance Selection for GANs »
Terrance DeVries · Michal Drozdzal · Graham Taylor -
2020 Poster: MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation »
Saim Wani · Shivansh Patel · Unnat Jain · Angel Chang · Manolis Savva -
2020 Session: Orals & Spotlights Track 08: Deep Learning »
Graham Taylor · Mario Lucic -
2019 Poster: Understanding Attention and Generalization in Graph Neural Networks »
Boris Knyazev · Graham Taylor · Mohamed Amer -
2017 : Poster spotlights »
Hiroshi Kuwajima · Masayuki Tanaka · Qingkai Liang · Matthieu Komorowski · Fanyu Que · Thalita F Drumond · Aniruddh Raghu · Leo Anthony Celi · Christina Göpfert · Andrew Ross · Sarah Tan · Rich Caruana · Yin Lou · Devinder Kumar · Graham Taylor · Forough Poursabzi-Sangdeh · Jennifer Wortman Vaughan · Hanna Wallach -
2015 : Learning Multi-scale Temporal Dynamics with Recurrent Neural Networks »
Graham Taylor -
2011 Workshop: Big Learning: Algorithms, Systems, and Tools for Learning at Scale »
Joseph E Gonzalez · Sameer Singh · Graham Taylor · James Bergstra · Alice Zheng · Misha Bilenko · Yucheng Low · Yoshua Bengio · Michael Franklin · Carlos Guestrin · Andrew McCallum · Alexander Smola · Michael Jordan · Sugato Basu -
2011 Poster: Facial Expression Transfer with Input-Output Temporal Restricted Boltzmann Machines »
Matthew D Zeiler · Graham Taylor · Leonid Sigal · Iain Matthews · Rob Fergus -
2010 Poster: Pose-Sensitive Embedding by Nonlinear NCA Regression »
Graham Taylor · Rob Fergus · George Williams · Ian Spiro · Christoph Bregler -
2008 Poster: The Recurrent Temporal Restricted Boltzmann Machine »
Ilya Sutskever · Geoffrey E Hinton · Graham Taylor -
2006 Poster: Modeling Human Motion Using Binary Latent Variables »
Graham Taylor · Geoffrey E Hinton · Sam T Roweis -
2006 Spotlight: Modeling Human Motion Using Binary Latent Variables »
Graham Taylor · Geoffrey E Hinton · Sam T Roweis