
Affinity Workshop
New in ML 2
Haozhe Sun · Wenzhuo Liu · Joseph Pedersen

Tue Dec 07, 11:00 AM – 02:33 PM (PST)

Is this your first time attending a top conference? Have you ever wanted your own work to be recognized by this huge and active community? Do you run into difficulties polishing your ideas, experiments, or paper writing? Then this session is exactly for you!

This year, we are organizing the New in ML workshop, co-located with NeurIPS 2021. We are targeting anyone who has not yet published a paper at the NeurIPS main conference. We have invited top researchers to review your work and share their experience with you. The best papers will receive oral presentations!

Our biggest goal is to help you publish papers at next year's NeurIPS conference and, more generally, to provide the guidance you need to contribute to ML research fully and effectively!

Tue 11:00 a.m. – 12:00 p.m. Invited Talk (Keynote): Oriol Vinyals (DeepMind)

Tue 12:00 p.m. – 12:15 p.m. Live Q&A session: Oriol Vinyals (DeepMind)

Tue 12:15 p.m. – 12:25 p.m. Contributed Talk (Oral): MAML is a Noisy Contrastive Learner
Chia-Hsiang Kao · Wei-Chen Chiu · Pin-Yu Chen
Model-agnostic meta-learning (MAML) is one of the most popular and widely adopted meta-learning algorithms, achieving remarkable success in various learning problems. Yet, with the unique design of nested inner-loop and outer-loop updates, which govern the task-specific and meta-model-centric learning respectively, the underlying learning objective of MAML remains implicit, impeding a more straightforward understanding of it. In this paper, we provide a new perspective on the working mechanism of MAML. We discover that MAML is analogous to a meta-learner using a supervised contrastive objective, in which query features are pulled towards the support features of the same class and pushed against those of different classes; this contrastiveness is verified experimentally via an analysis based on cosine similarity. Moreover, we reveal that the vanilla MAML algorithm has an undesirable interference term originating from the random initialization and the cross-task interaction. We therefore propose a simple but effective technique, the zeroing trick, to alleviate this interference. Extensive experiments on the miniImagenet and Omniglot datasets demonstrate the consistent improvement brought by our technique, validating its effectiveness.

Tue 12:25 p.m. – 12:30 p.m. Live Q&A session: MAML is a Noisy Contrastive Learner

Tue 12:30 p.m. – 12:40 p.m. Contributed Talk (Oral): MAPLE: Microprocessor A Priori for Latency Estimation
Saad Abbasi · Alexander Wong · Mohammad Javad Shafiee
Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption, so neural architecture search (NAS) algorithms take these two constraints into account when generating a new architecture. However, efficiency metrics such as latency are typically hardware dependent, requiring the NAS algorithm to either measure or predict each architecture's latency; measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process. Here we propose MAPLE (Microprocessor A Priori for Latency Estimation), which does not rely on transfer learning or domain adaptation but instead generalizes to new hardware by incorporating prior hardware characteristics during training. MAPLE takes advantage of a novel quantitative strategy to characterize the underlying microprocessor by measuring relevant hardware performance metrics, yielding a fine-grained and expressive hardware descriptor. Moreover, MAPLE benefits from the tightly coupled I/O between the CPU and GPU: it predicts DNN latency on GPUs while measuring performance hardware counters from the CPU that feeds the GPU. Using this quantitative strategy as the hardware descriptor, MAPLE can generalize to new hardware via a few-shot adaptation strategy: with as few as 3 samples it exhibits a 3% improvement over state-of-the-art methods requiring as many as 10 samples, and increasing the few-shot adaptation samples to 10 improves accuracy over the state of the art by 12%. Furthermore, MAPLE exhibits 8–10% better accuracy, on average, than relevant baselines at any number of adaptation samples. The proposed method provides a versatile and practical latency prediction methodology, inferring DNN run time on multiple hardware devices without imposing any significant overhead for sample collection.

Tue 12:40 p.m. – 12:45 p.m. Live Q&A session: MAPLE: Microprocessor A Priori for Latency Estimation

Tue 12:45 p.m. – 12:55 p.m. Contributed Talk (Oral): XCI-Sketch: Extraction of Color Information from Images for Generation of Colored Outlines and Sketches
V MANUSHREE · Sameer Saxena · Parna Chowdhury · MANISIMHA VARMA MANTHENA · Harsh Rathod · Ankita Ghosh · Sahil Khose
Sketches are a medium for conveying a visual scene from an individual's creative perspective, and the addition of color substantially enhances the overall expressivity of a sketch. This paper proposes two methods to mimic human-drawn colored sketches using the Contour Drawing Dataset. Our first approach renders colored outline sketches by applying image processing techniques aided by k-means color clustering. The second method uses a generative adversarial network to develop a model that can generate colored sketches from previously unobserved images. We assess the results through quantitative and qualitative evaluations.

Tue 12:55 p.m. – 1:00 p.m. Live Q&A session: XCI-Sketch: Extraction of Color Information from Images for Generation of Colored Outlines and Sketches

Tue 1:00 p.m. – 1:10 p.m. Contributed Talk (Oral): Guided Evolution for Neural Architecture Search
Vasco Lopes · Bruno Degardin · Luís A. Alexandre
Neural Architecture Search (NAS) methods have been successfully applied to image tasks with excellent results. However, NAS methods are often complex and tend to converge to local minima as soon as generated architectures seem to yield good results. In this paper, we propose G-EA, a novel approach for guided evolutionary NAS. The rationale behind G-EA is to explore the search space by generating and evaluating several architectures in each generation at the initialization stage using a zero-proxy estimator, where only the highest-scoring network is trained and kept for the next generation. This evaluation at the initialization stage allows continuous extraction of knowledge from the search space without increasing computation, so the search can be efficiently guided. Moreover, G-EA forces exploitation of the most performant networks through descendant generation, while forcing exploration through parent mutation and by favouring younger architectures over older ones. Experimental results demonstrate the effectiveness of the proposed method: G-EA achieves state-of-the-art results in the NAS-Bench-201 search space on CIFAR-10, CIFAR-100, and ImageNet16-120, with mean accuracies of 93.98%, 72.12%, and 45.94%, respectively.

Tue 1:10 p.m. – 1:15 p.m. Live Q&A session: Guided Evolution for Neural Architecture Search

Tue 1:15 p.m. – 2:15 p.m. Invited Talk (Keynote): Yale Song (Microsoft Research)

Tue 2:15 p.m. – 2:30 p.m. Live Q&A session: Yale Song (Microsoft Research)

Tue 2:30 p.m. – 2:33 p.m.
Closing remarks: Joseph Pedersen

Semantic Code Classification for Automated Machine Learning (Poster)
Polina A. Guseva · Anastasia Drozdova
Generating a complete machine learning pipeline from a short natural-language description is a notoriously difficult task, because the code is generally much longer than the description and draws on information from many different parts of it. Using an intermediate representation might help, since specific actions require specific information. In this work, we present a semantic code classification task and a way to represent a machine learning pipeline as a sequence of such semantic classes. Finally, we discuss methods for solving the semantic code classification problem on the Natural Language to Machine Learning (NL2ML) dataset.

Multiple Instance Learning for Brain Tumor Detection with Magnetic Resonance Spectroscopy Data (Poster)
Diyuan Lu · Nenad Polomac · Iskra Gacheva · Elke Hattingen · Jochen Triesch
Magnetic resonance spectroscopy (MRS) is a common tool for brain tumor detection. To help automate and improve upon today's clinical practice, we apply deep learning (DL) to distinguish between patients with and without tumors. Two problems generally arise in the application of DL to medical diagnosis. First, training data may be scarce, as it is limited by the number of patients who have acquired the medical condition in question. Second, the training data may be corrupted by various types of noise, including labeling noise. Both problems are prominent in our data set; furthermore, a varying number of spectra are available for the different patients. We address these issues by treating the task as a multiple instance learning (MIL) problem: we aggregate multiple spectra from the same patient into a "bag" for classification, and apply data augmentation techniques to increase the amount of available training data. To achieve permutation invariance when processing bags of spectra, we propose two approaches: (1) applying min-, max-, and average-pooling to the features of all samples in a bag, and (2) applying an attention mechanism. We test both approaches on two neural network structures, a multi-layer perceptron (MLP) and an Inception variant. We demonstrate that classification performance improves significantly when training on multiple instances rather than single spectra. We also propose a simple data augmentation method, over-sampling instances from each patient to generate bags for MIL, and show that it further improves performance. Finally, we demonstrate that our proposed model outperforms manual classification by neuroradiologists on most performance metrics.

A Data-driven Markov Chain Model for COVID-19 Transmission in South Korea (Poster)
Sujin Ahn · Minhae Kwon
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2; COVID-19) transmits rapidly between people. Mathematical modeling of infectious diseases can inform policy responses by capturing the ongoing pattern of COVID-19. We introduce an epidemic model with additional states, such as vaccinated and isolated states, to bring the model closer to reality. A data-driven epidemic model based on a Markov chain is a desirable approach to the challenge of inferring the latent states. To this end, we take advantage of the reported data and the underlying Markov chain dynamics. To verify our model, we set initial values for each state and estimated the state values by fitting a COVID-19 dataset from South Korea. The investigation confirms that the proposed model can successfully estimate all states.

MAML is a Noisy Contrastive Learner (Poster)
Chia-Hsiang Kao · Wei-Chen Chiu · Pin-Yu Chen
(Abstract as in the oral session above.)

Guided Evolution for Neural Architecture Search (Poster)
Vasco Lopes · Bruno Degardin · Luís A. Alexandre
(Abstract as in the oral session above.)

MAPLE: Microprocessor A Priori for Latency Estimation (Poster)
Saad Abbasi · Alexander Wong · Mohammad Javad Shafiee
(Abstract as in the oral session above.)

XCI-Sketch: Extraction of Color Information from Images for Generation of Colored Outlines and Sketches (Poster)
V MANUSHREE · Sameer Saxena · Parna Chowdhury · MANISIMHA VARMA MANTHENA · Harsh Rathod · Ankita Ghosh · Sahil Khose
(Abstract as in the oral session above.)
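As an illustrative aside to the XCI-Sketch abstract above: the k-means color-clustering step it mentions can be sketched in plain Python. This is a minimal sketch under our own assumptions; the `kmeans_colors` and `quantize` helpers, the squared-Euclidean distance, and all parameter values are ours, not the authors' implementation.

```python
import random

def kmeans_colors(pixels, k=3, iters=20, seed=0):
    """Cluster RGB pixel tuples into k representative colors with plain k-means."""
    rng = random.Random(seed)
    centers = rng.sample(pixels, k)
    for _ in range(iters):
        # Assign each pixel to its nearest center (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in pixels:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[j].append(p)
        # Move each non-empty center to the mean of its assigned pixels.
        for i, c in enumerate(clusters):
            if c:
                centers[i] = tuple(sum(ch) / len(c) for ch in zip(*c))
    return centers

def quantize(pixels, centers):
    """Replace every pixel with its nearest cluster center (color quantization)."""
    out = []
    for p in pixels:
        j = min(range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
        out.append(centers[j])
    return out
```

On a real image one would run this over all pixels and use the quantized colors to fill the extracted outlines.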
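The search loop described in the G-EA abstract above can also be illustrated with a toy sketch: each generation spawns several candidate architectures, all are scored by a cheap proxy at initialization, and only the highest-scoring one is kept (in the real method, only that network would be trained). Everything concrete here, including the architecture encoding, the proxy, and the mutation operator, is a stand-in of ours, not the authors' zero-proxy estimator.

```python
import random

def guided_evolution(proxy_score, init, mutate, generations=20, children=8, seed=0):
    """Toy guided-evolution loop: score all children with a cheap proxy,
    keep only the best candidate (the parent is retained as a fallback,
    so the proxy score never decreases across generations)."""
    rng = random.Random(seed)
    parent = init
    for _ in range(generations):
        candidates = [mutate(parent, rng) for _ in range(children)] + [parent]
        parent = max(candidates, key=proxy_score)
    return parent

# Stand-in problem: an "architecture" is a tuple of layer widths, and the
# proxy (our placeholder, not a real zero-cost estimator) prefers widths
# summing to a target budget.
TARGET = 64
proxy = lambda arch: -abs(sum(arch) - TARGET)

def mutate(arch, rng):
    """Perturb one randomly chosen layer width by +/-4 (kept positive)."""
    i = rng.randrange(len(arch))
    a = list(arch)
    a[i] = max(1, a[i] + rng.choice([-4, 4]))
    return tuple(a)

best = guided_evolution(proxy, (8, 8, 8), mutate, generations=50)
```

Because the parent is always among the candidates, the loop is greedy but monotone in the proxy score; the real G-EA additionally favours younger architectures to keep exploring.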
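The two bag-aggregation approaches in the multiple-instance-learning abstract above (min/max/average pooling versus an attention mechanism) can be sketched on plain feature lists. Both function names and the softmax-weighted form of the attention pooling are our own simplifications, not the paper's network code.

```python
import math

def pool_bag(bag):
    """Concatenate min-, max- and mean-pooled values per feature dimension.
    The result is invariant to the order of instances in the bag."""
    feats = []
    for d in zip(*bag):  # one tuple per feature dimension
        feats.extend([min(d), max(d), sum(d) / len(d)])
    return feats

def attention_pool(bag, scores):
    """Weighted average of instances using softmax(scores) weights:
    a minimal stand-in for attention-based MIL pooling."""
    m = max(scores)  # subtract max for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi / z * x[j] for wi, x in zip(w, bag))
            for j in range(len(bag[0]))]
```

In the paper these poolings sit on top of learned per-spectrum features (from an MLP or Inception variant); here they are applied to raw vectors purely for illustration.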
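The compartment structure in the COVID-19 abstract above (standard susceptible/infected/recovered states extended with vaccinated and isolated states) can be sketched as one discrete-time update. The compartment names, transition structure, and every rate value below are illustrative assumptions of ours, not the fitted data-driven model from the paper.

```python
def step(state, beta=0.3, gamma=0.1, nu=0.05, q=0.2):
    """One discrete-time update of an SIR-style model with Vaccinated (V)
    and Isolated (Q) compartments. beta: transmission rate, gamma: recovery
    rate, nu: vaccination rate, q: isolation rate (all illustrative)."""
    S, I, R, V, Q = (state[k] for k in "SIRVQ")
    N = S + I + R + V + Q
    new_inf = beta * S * I / N   # susceptibles infected this step
    new_vac = nu * S             # susceptibles vaccinated
    new_iso = q * I              # infected moved into isolation
    return {
        "S": S - new_inf - new_vac,
        "I": I + new_inf - new_iso - gamma * I,
        "R": R + gamma * (I + Q),     # both I and Q recover at rate gamma
        "V": V + new_vac,
        "Q": Q + new_iso - gamma * Q,
    }

# Simulate 10 steps from an almost fully susceptible population of 1000.
state = {"S": 990.0, "I": 10.0, "R": 0.0, "V": 0.0, "Q": 0.0}
for _ in range(10):
    state = step(state)
```

Every flow leaving one compartment enters another, so the total population is conserved; the data-driven part of the paper amounts to estimating rates like these from reported case counts.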