Affinity Workshop
Women in Machine Learning
Mariam Arab · Konstantina Palla · Sergul Aydore · Gloria Namanya · Beliz Gunel · Kimia Nadjahi · Soomin Aga Lee

Mon Nov 28 05:30 AM -- 11:00 AM (PST) @ Hall I-2

Register for our workshop on 11/28/2022 here
Join us for a night of food, music, and networking on 11/27/2022 and RSVP here

The Women in Machine Learning (WiML) workshop started in 2006 as a way of creating connections within the small community of women working in machine learning to encourage mentorship, networking, and the interchange of ideas. The workshop has attracted representatives from academia and industry, whose talks showcase some of the cutting-edge research done by women. In addition to technical presentations and discussions, the workshop aims to incite debate on future research avenues and career choices for machine learning professionals.

 Mon 5:30 a.m. - 6:30 a.m. Breakfast (Break) 🔗 Mon 6:30 a.m. - 6:45 a.m. Opening remarks - Senior Program Chair (Opening remarks) 🔗 Mon 6:45 a.m. - 7:00 a.m. D&I Chair remarks (Remarks) 🔗 Mon 7:00 a.m. - 7:10 a.m. Contributed talk (Tejaswi Kasarla) - "Maximum Class Separation as Inductive Bias in One Matrix" (Contributed talk)    Maximizing the separation between classes constitutes a well-known inductive bias in machine learning and a pillar of many traditional algorithms. By default, deep networks are not equipped with this inductive bias and therefore many alternative solutions have been proposed through differential optimization. Current approaches tend to optimize classification and separation jointly: aligning inputs with class vectors and separating class vectors angularly. This work proposes a simple alternative: encoding maximum separation as an inductive bias in the network by adding one fixed matrix multiplication before computing the softmax activations. The main observation behind our approach is that separation does not require optimization but can be solved in closed-form prior to training and plugged into a network. We outline a recursive approach to obtain the matrix consisting of maximally separable vectors for any number of classes, which can be added with negligible engineering effort and computational overhead. Despite its simple nature, this one matrix multiplication provides real impact. We show that our proposal directly boosts classification, long-tailed recognition, out-of-distribution detection, and open-set recognition, from CIFAR to ImageNet. Tejaswi Kasarla 🔗 Mon 7:10 a.m. - 7:20 a.m. Contributed talk (Taiwo Kolajo) - "Pre-processing of Social Media Feeds based on Integrated Local Knowledge Base" (Contributed talk)    Most of the previous studies on the semantic analysis of social media feeds have not considered the issue of ambiguity that is associated with slangs, abbreviations, and acronyms that are embedded in social media posts. 
These noisy terms have implicit meanings and form part of the rich semantic context that must be analysed to gain complete insights from social media feeds. This paper proposes an improved framework for pre-processing of social media feeds for better performance. To do this, the use of an integrated knowledge base (ikb) which comprises a local knowledge source (naijalingo), urban dictionary and internet slang was combined with the adapted Lesk algorithm to facilitate semantic analysis of social media feeds to resolve the ambiguity in the usage of slangs/acronyms/abbreviations. Experimental results showed that the proposed approach performed better than existing methods when it was tested on three machine learning models, which are support vector machines, multilayer perceptron, and convolutional neural networks. The framework had an accuracy of 94.07% on a standardized dataset, and 99.78% on localised dataset when used to extract sentiments from tweets. The improved performance on the localised dataset reveals the advantage of integrating the use of local knowledge sources into the process of resolving social media feeds particularly in handling slangs/acronyms/abbreviations that have contextually rooted meanings. Taiwo Kolajo 🔗 Mon 7:20 a.m. - 7:55 a.m. Invited talk (Dr Alice Oh) - " The importance of multiple languages and multiple cultures in NLP research" (Invited talk)    Among the thousands of human languages used throughout the world, NLP researchers have so far focused on only a handful. This is understandable from the perspective that resources and researchers are not readily available for all languages, but nevertheless it is a profound limitation of our research community, one that must be addressed. I will discuss research on Korean and other low- to medium-resource languages and share the interesting findings that extend beyond the linguistic differences. 
I will share our work on ethnic bias in BERT language models in six different languages which particularly illustrates the importance of studying multiple languages. I will describe our efforts in building a benchmark dataset for Korean and the main challenge of building the dataset when the sources of data are much smaller compared to English and other major languages. I will also share some preliminary results of working with non-native speakers who can potentially contribute to research in low-resource languages. Through this talk, I hope to inspire NLP researchers, myself included, to actively engage in a diverse set of languages and cultures. Alice Oh 🔗 Mon 7:55 a.m. - 8:15 a.m. Coffee Break (Break) 🔗 Mon 8:15 a.m. - 8:25 a.m. WiML President Remarks (Remarks) 🔗 Mon 8:25 a.m. - 9:00 a.m. Invited talk (Raesetje Sefala) - "Constructing visual datasets to answer research questions" (Invited talk) Raesetje Sefala 🔗 Mon 9:00 a.m. - 9:10 a.m. Contributed talk (Pascale Gourdeau) - When are Local Queries Useful for Robust Learning? (Contributed talk)    Distributional assumptions have been shown to be necessary for the robust learnability of concept classes when considering exact-in-the-ball robust risk and access to random examples by Gourdeau et al. (2019). In this paper, we study learning models where the learner is given more power through the use of \emph{local} queries and give the first \emph{distribution-free} algorithms that perform robust empirical risk minimization (ERM) for this notion of robustness. The first learning model we consider uses local membership queries (LMQ), where the learner can query the label of points near the training sample. We show that, under the uniform distribution, LMQs do not increase the robustness threshold of conjunctions and any superclass, e.g., decision lists and halfspaces. 
Faced with this negative result, we introduce the local \emph{equivalence} query oracle, which returns whether the hypothesis and target concept agree in the perturbation region around a point in the training sample, as well as a counterexample if it exists. We show a separation result: on one hand, if the query radius $\lambda$ is strictly smaller than the adversary's perturbation budget $\rho$, then distribution-free robust learning is impossible for a wide variety of concept classes; on the other hand, the setting $\lambda=\rho$ allows us to develop robust ERM algorithms. We then bound the query complexity of these algorithms based on online learning guarantees and further improve these bounds for the special case of conjunctions. We finish by giving robust learning algorithms for halfspaces with margins on both $\{0,1\}^n$ and $\mathbb{R}^n$. Pascale Gourdeau 🔗 Mon 9:10 a.m. - 9:20 a.m. Contributed talk (Annie S Chen) - "You Only Live Once: Single-Life Reinforcement Learning" (Contributed talk)    Reinforcement learning algorithms are typically designed to learn a performant policy that can repeatedly and autonomously complete a task, usually starting from scratch. However, in many real-world situations, the goal might not be to learn a policy that can do the task repeatedly, but simply to perform a new task successfully once in a single trial. For example, imagine a disaster relief robot tasked with retrieving an item from a fallen building, where it cannot get direct supervision from humans. It must retrieve this object within one test-time trial, and must do so while tackling unknown obstacles, though it may leverage knowledge it has of the building before the disaster. We formalize this problem setting, which we call single-life reinforcement learning (SLRL), where an agent must complete a task within a single episode without interventions, utilizing its prior experience while contending with some form of novelty. 
SLRL provides a natural setting to study the challenge of autonomously adapting to unfamiliar situations, and we find that algorithms designed for standard episodic reinforcement learning often struggle to recover from out-of-distribution states in this setting. Motivated by this observation, we propose an algorithm, Q-weighted adversarial learning (QWALE), which employs a distribution matching strategy that leverages the agent's prior experience as guidance in novel situations. Our experiments on several single-life continuous control problems indicate that methods based on our distribution matching formulation are 20-60% more successful because they can more quickly recover from novel states. Annie Chen 🔗 Mon 9:20 a.m. - 11:20 a.m. Mentorship session & Lunch (Break) Participating Mentors: Adam Roberts, Stephanie Hyland, Bianca Zadrozny, Mercy Asiedu, Franziska Boenisch, Eleni Triantafillou, Isabela Albuquerque, Yisong Yue, Amy Zhang, Zelda Mariet, Tristan Naumann, Danielle Belgrave, Shakir Mohamed, Tong Sun, Gintare Karolina Dziugaite, Samy Bengio, Rianne van den Berg, Maja Rudolph, Luisa Cutillo, Ioana Bica, Clara Hu, Rosanne Liu, Noga Zaslavsky, Jennifer Wei, Alice Oh, Erin Grant, Sasha Luccioni, Michela Paganini, Mounia Lalmas-Roelke, Claire Vernade, Alekh Agarwal, Neema Mduma, Vinod Prabhakaran, Savannah Thais, Sarah Brown, Sima Behpour, Jonathan Frankle, Ce Zhang, Rose Yu, Jessica Schrouff, Bo Li, and Katherine Heller. 🔗 Mon 11:20 a.m. - 11:55 a.m. Invited talk (Dr Bianca Zadrozny) - "Machine Learning for Climate Risk" (Invited talk) Bianca Zadrozny 🔗 Mon 11:55 a.m. - 12:05 p.m. 
Contributed talk (Elizabeth Bondi-Kelly) - "Human-AI Interaction in Selective Prediction Systems" (Contributed talk)    Recent work has shown the potential benefit of selective prediction systems that can learn to defer to a human when the predictions of the AI are unreliable, particularly to improve the reliability of AI systems in high-stakes applications like healthcare or conservation. However, most prior work assumes that human behavior remains unchanged when they solve a prediction task as part of a human-AI team as opposed to by themselves. We show that this is not the case by performing experiments to quantify human-AI interaction in the context of selective prediction. In particular, we study the impact of communicating different types of information to humans about the AI system's decision to defer. Using real-world conservation data and a selective prediction system that improves expected accuracy over that of the human or AI system working individually, we show that this messaging has a significant impact on the accuracy of human judgements. Our results study two components of the messaging strategy: 1) Whether humans are informed about the prediction of the AI system and 2) Whether they are informed about the decision of the selective prediction system to defer. By manipulating these messaging components, we show that it is possible to significantly boost human performance by informing the human of the decision to defer, but not revealing the prediction of the AI. We therefore show that it is vital to consider how the decision to defer is communicated to a human when designing selective prediction systems, and that the composite accuracy of a human-AI team must be carefully evaluated using a human-in-the-loop framework. Elizabeth Bondi-Kelly 🔗 Mon 12:05 p.m. - 12:15 p.m. Contributed talk (Gowthami Somepalli) - "Investigating Reproducibility from the Decision Boundary Perspective." 
(Contributed talk)    The superiority of neural networks over classical linear classifiers stems from their ability to slice image space into complex class regions. While neural network training is certainly not well understood, existing theories of neural network training primarily focus on understanding the geometry of loss landscapes. Meanwhile, considerably less is known about the geometry of class boundaries. The geometry of these regions depends strongly on the inductive bias of neural network models, which we do not currently have the tools to analyze rigorously. In this study, we use empirical tools to study the geometry of class regions and try to answer the question - Do neural networks produce decision boundaries that are consistent across random initializations? Do different neural architectures have measurable differences in inductive bias? Gowthami Somepalli 🔗 Mon 12:15 p.m. - 12:35 p.m. Coffee break (Break) 🔗 Mon 12:35 p.m. - 1:10 p.m. Invited talk (Dr Hima Lakkaraju) - "A Brief History of Explainable AI: From Simple Rules to Large Pretrained Models" (Invited talk)    As predictive models are increasingly being employed to make consequential decisions in various real-world applications, it becomes important to ensure that relevant stakeholders and decision makers correctly understand the functionality of these models so that they can diagnose errors and potential biases in them, and decide when and how to employ these models. To this end, recent research in AI/ML has focused on developing techniques which aim to explain complex models to relevant stakeholders. In this talk, I will give a brief overview of the field of explainable AI while highlighting our research in this area. 
More specifically, I will discuss our work on: (a) developing inherently interpretable models and post hoc explanation methods, (b) identifying the vulnerabilities and shortcomings of these methods, and addressing them, (c) evaluating the reliability (correctness, robustness, fairness) and human understanding of the explanations output by these methods, and (d) theoretical results on unifying these methods. I will conclude this talk by shedding light on some exciting future research directions – e.g., rethinking model explainability as a (natural language) dialogue between humans and AI, and redesigning explainable AI tools to cater to large pretrained models. Himabindu Lakkaraju 🔗 Mon 1:10 p.m. - 2:10 p.m. Panel Discussion (Discussion Panel) 🔗 Mon 2:10 p.m. - 2:20 p.m. Closing remarks 🔗 Mon 2:30 p.m. - 4:00 p.m. Affinity Joint Poster Session (Poster Session) 🔗 - Virtual Affinity Poster Session (Topia Poster Session) The Virtual Affinity Poster Session will be held on Monday 5 Dec (or Tuesday 6 Dec for far eastern timezones, check the link for your time). 🔗 - When are Local Queries Useful for Robust Learning? (Poster) Distributional assumptions have been shown to be necessary for the robust learnability of concept classes when considering exact-in-the-ball robust risk and access to random examples by Gourdeau et al. (2019). In this paper, we study learning models where the learner is given more power through the use of \emph{local} queries and give the first \emph{distribution-free} algorithms that perform robust empirical risk minimization (ERM) for this notion of robustness. The first learning model we consider uses local membership queries (LMQ), where the learner can query the label of points near the training sample. We show that, under the uniform distribution, LMQs do not increase the robustness threshold of conjunctions and any superclass, e.g., decision lists and halfspaces. 
Faced with this negative result, we introduce the local \emph{equivalence} query oracle, which returns whether the hypothesis and target concept agree in the perturbation region around a point in the training sample, as well as a counterexample if it exists. We show a separation result: on one hand, if the query radius $\lambda$ is strictly smaller than the adversary's perturbation budget $\rho$, then distribution-free robust learning is impossible for a wide variety of concept classes; on the other hand, the setting $\lambda=\rho$ allows us to develop robust ERM algorithms. We then bound the query complexity of these algorithms based on online learning guarantees and further improve these bounds for the special case of conjunctions. We finish by giving robust learning algorithms for halfspaces with margins on both $\{0,1\}^n$ and $\mathbb{R}^n$. Pascale Gourdeau · Varun Kanade · Marta Kwiatkowska · James Worrell 🔗 - Natural language processing for automated information extraction of cancer parameters from free-text pathology reports (Poster) A cancer pathology report is a valuable medical document that provides information for prognosis, personalised treatment plan and patient management. Developing countries still use the unstructured (free text) reporting format. However, this reporting format has been associated with several limitations arising from variations in the quality of reporting. Manual information extraction and report classification can be intrinsically complex and resource intensive, given a free-text format. Extracting information from these reports into a structured layout is also essential for research, auditing, and cancer incidence reporting. 
This study aimed to develop and evaluate strategies for extracting relevant information to classify cancer pathology reports and to develop a rule-based function to automatically extract cancer prognostic parameters from these reports and transform them into structured data to uncover the trend of the parameters over the years. We retrieved colorectal and prostate cancer diagnostic cases from the National Health Laboratory Services. TM and ML algorithms were used for data preprocessing, visualisations, feature selections, text classification and performance evaluation. Secondly, we developed a rule-based NLP algorithm that retrieved and extracted important prognostic parameters from the reports to explore their trends. Results showed inconsistencies and incompleteness in reporting each year and throughout the study period. The findings also indicate that the developed rule-based function achieved highly accurate annotation for all the parameters extracted, with performance measures ranging from 83% to 100%. The trend analysis result showed significant trends comparable to previous studies. In conclusion, we developed reproducible frameworks using NLP and ML algorithms that can form the basis for future studies in South Africa. Our study bridged the gap between data availability and actionable knowledge. Okechinyere Achilonu 🔗 - Discriminative Candidate Selection for Image Inpainting (Poster) Within the field of Cultural Heritage, image inpainting is a conservation process that fills in missing or damaged parts of an artwork to present a complete image. Multi-modal diffusion models have brought photo-realistic results on image inpainting where content can be generated by using descriptive text prompts. However, these models fail to produce content consistent with a particular painter’s artistic style and period, being unsuitable for the reconstruction of fine arts and requiring laborious expert judgement. 
Moreover, generative models produce many plausible outputs for a given prompt. This work presents a methodology to improve the inpainting of fine art by automating the selection process of inpainted candidates. We propose a discriminator model that processes the output of inpainting models and assigns a probability that indicates the likelihood that the restored image belongs to a certain painter. Lucia Cipolina Kun · Simone Caenazzo · Sergio Manuel Papadakis 🔗 - Determination of Neural Network Parameters for Path Loss Prediction in Very High Frequency Wireless Channel (Poster) It is very important to understand the input features and the neural network parameters required for optimal path loss prediction in wireless communication channels. In this paper, an extensive investigation was conducted to determine the most appropriate neural network parameters for path loss prediction in the Very High Frequency (VHF) band. Field measurements were conducted in an urban propagation environment to obtain relevant geographical and network information about the receiving mobile equipment and quantify the path losses of radio signals transmitted at 189.25 MHz and 479.25 MHz. Different neural network architectures were trained with varying kinds of input parameters, number of hidden neurons, activation functions, and learning algorithms to accurately predict corresponding path loss values. At the end of the experiments, the performance of the developed Artificial Neural Network (ANN) models was evaluated using the following statistical metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Standard Deviation (SD) and Regression coefficient (R). 
Results obtained show that the ANN model that yielded the best performance employed four input variables (latitude, longitude, elevation, and distance), nine hidden neurons, hyperbolic tangent sigmoid (tansig) activation function, and the Levenberg-Marquardt (LM) learning algorithm with MAE, MSE, RMSE, SD and R values of 0.58 dB, 0.66 dB, 0.81 dB, 0.56 dB and 0.99 respectively. Finally, a comparative analysis of the developed model with Hata, COST 231, ECC-33 and Egli models showed that the ANN-based path loss model has better prediction accuracy and generalization ability than the empirical models. Abigail Jefia · Segun Popoola · Aderemi A. Atayero 🔗 - DEVELOPMENT OF PREDICTIVE MODEL FOR SURVIVAL OF PAEDIATRIC HIV/AIDS PATIENTS IN SOUTH WESTERN NIGERIA USING DATA MINING TECHNIQUES (Poster) Introduction: Disease epidemics are common in developing nations, especially in Sub-Saharan Africa, where Human Immunodeficiency Virus/Acquired Immunodeficiency Syndrome (HIV/AIDS) is the most prevalent of all. HIV/AIDS has a devastating impact on its carriers, most especially children (1>age≥15). To promote “wellbeing for all at all ages”, which is one of the seventeen sustainable development goals (SDGs) adopted by the General Assembly of the United Nations, there is a need to pay grave attention to the survival of HIV/AIDS pediatric patients, as a child’s death is emotionally and physically challenging for the mourning parents. This paper identifies survival variables for HIV/AIDS pediatric patients who are receiving antiretroviral drugs in Southwestern Nigeria; predictive models were developed and compared in order to select the more suitable one. Methodology: Pediatric HIV/AIDS patients’ data (216) were collected from two health institutions and preprocessed, and the 10-fold cross validation technique was used to partition the datasets into training and testing data. 
Predictive models were developed using supervised learning techniques (Naïve Bayes’ and Multi-Layer Perceptron (MLP)) and the Waikato Environment for Knowledge Analysis (WEKA) was used to simulate the models. CD4 count, Viral Load, Opportunistic infections and Nutritional status were used as the independent variables for the prediction. Results: MLP achieved Accuracy 99.7%, MAER 0.022, RMSE 0.0962, RAE 4.48%, ROC Area 0.992, and LOB 0.008; Naïve Bayes’ achieved Accuracy 81.02%, MAER 0.2025, RMSE 0.2920, RAE 40.92%, ROC Area 0.993, and LOB 0.007. Keywords: Naïve Bayes’, MLP, Mean absolute error (MAER), Root mean square error (RMSE), Root absolute error (RAE), Recall Area (ROC) References (1) Giaquinto, C., Rage, E., Giarcoment, V., Rampson, O., and Elia, D.R (1998). Mother to Child Transmission Current Knowledge and on Going Studies. International Journal of Gynaecology Obstetician, 68: 161-165 Olutola Agbelusi 🔗 - Modelling non-reinforced preferences using selective attention (Poster) How can artificial agents learn non-reinforced preferences to continuously adapt their behaviour to a changing environment? We decompose this question into two challenges: (i) encoding diverse memories and (ii) selectively attending to these for preference formation. Our proposed non-reinforced preference learning mechanism using selective attention, Nore, addresses both by leveraging the agent’s world model to collect a diverse set of experiences which are interleaved with imagined roll-outs to encode memories. These memories are selectively attended to, using attention and gating blocks, to update the agent’s preferences. We validate Nore in a modified OpenAI Gym FrozenLake environment (without any external signal) with and without volatility under a fixed model of the environment, and compare its behaviour to Pepper, a Hebbian preference learning mechanism. We demonstrate that Nore provides a straightforward framework to induce exploratory preferences in the absence of external signal. 
Noor Sajid · Panagiotis Tigas · Zafeirios Fountas · Qinghai Guo · Alexey Zakharov · Lancelot Da Costa 🔗 - Data Analysis and Machine Learning for Speech Music Playlist Generation (Poster) Digital media content is abundantly available due to technological developments. Users’ attention is very precious for systems providing such content. Therefore, they try to recommend content that is relevant or interesting to the user. Music streaming services have similar opportunities and challenges. In music streaming services, the so-called playlist generation is responsible for selecting and for sequentially arranging pieces of music. The background to the current work is larger-scale research aiming at mixed speech-music playlist generation. As part of this, a large radio broadcast dataset was collected: both audio files and features extracted from those. The aim of this paper is, besides analyzing the scientific literature on playlist generation, to analyze this dataset using different technologies and tools. In the first step, a star schema was created and populated from the raw data. This allows efficient, interactive analysis with business intelligence tools, such as Tableau. In the next step, patterns were searched for using database and business intelligence tools. Significant features and patterns have been found, e.g., for channels of different types (pop music, classical music, speech channel). Also, daily and weekly temporal patterns, e.g., for speech ratio and silence ratio, have been found for the major channel types. Emotionally loaded words, according to the WordNet Affect library, have also been analyzed. The analysis showed how different emotions mix with each other and which channel types provide content for different emotions. 
The major patterns are summarized, and conclusions are drawn for a customized, automated speech-music playlist. The final goal is to create recommendations for mixed speech-music playlist generation; the current step is data analysis of radio recordings. Maikey Khorani 🔗 - Meta Optimal Transport (Poster) We study the use of amortized optimization to predict optimal transport (OT) maps from the input measures, which we call Meta OT. It is useful when repeatedly solving similar OT problems between different measures because it leverages the knowledge and information present from past problems to rapidly predict and solve new problems. Otherwise, standard methods ignore the knowledge of the past solutions and suboptimally re-solve each problem from scratch. We demonstrate that Meta OT models surpass the standard convergence rates of log-Sinkhorn solvers in the discrete setting and convex potentials in the continuous setting. We evaluate on transport settings between images and spherical data, and show significant improvement in the computational time of standard OT solvers. Brandon Amos · Samuel Cohen · Giulia Luise · Ievgen Redko 🔗 - Single-modality and joint fusion deep learning for diabetic retinopathy diagnosis (Poster) The current study evaluated and compared single-modality and joint fusion deep learning approaches for automatic binary classification of diabetic retinopathy (DR) using seven convolutional neural network models (VGG19, ResNet50V2, DenseNet121, InceptionV3, InceptionResNetV2, Xception, and MobileNetV2) over two datasets: APTOS 2019 blindness detection and Messidor-2. 
The empirical evaluations used (1) six performance metrics (accuracy, sensitivity, specificity, precision, F1-score, and area under the curve), (2) the Scott-Knott Effect Size difference (SK ESD) statistical test to rank and cluster the models based on accuracy, and (3) the Borda count voting method to rank the best models figuring in the first SK ESD cluster, based on sensitivity, specificity, precision, F1-score, and area under the curve. Results showed that the single-modality DenseNet121 and InceptionV3 were the top-performing and less sensitive approaches, with an accuracy of 90.63% and 75.25%, respectively. The joint fusion strategy outperformed single-modality techniques across the two datasets, regardless of the modality used, because of the additional information provided by the preprocessed modality to the Fundus. The Fundus modality was the most favorable modality for DR diagnosis using the seven models. Furthermore, the joint fusion VGG19 model performed best with an accuracy of 97.49% and 91.20% over APTOS19 and Messidor-2, respectively; as the VGG19 model was fine-tuned in comparison to the remaining six models. In comparison with state-of-the-art models, Attention Fusion, and Cascaded Framework, joint fusion VGG19 ranks below the Attention Fusion network and outperforms the Cascaded Framework on the Messidor dataset by 5.6% and 8%, respectively. Sara El-Ateif · Ali Idri 🔗 - Multi-Armed Bandit Problem with Temporally-Partitioned Rewards (Poster) There is a rising interest in industrial online applications where data becomes available sequentially. Inspired by the recommendation of playlists to users where their preferences can be collected during the listening of the entire playlist, we study a novel bandit setting, namely Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB), in which the stochastic reward associated with the pull of an arm is partitioned over a finite number of consecutive rounds following the pull. 
This setting, unexplored so far to the best of our knowledge, is a natural extension of delayed-feedback bandits to the case in which rewards may be dilated over a finite-time span after the pull instead of being fully disclosed in a single, potentially delayed round. We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW, which exploit the partial information disclosed by the reward collected over time. We show that our algorithms provide better asymptotical regret upper bounds than delayed-feedback bandit algorithms when a property characterizing a broad set of reward structures of practical interest, namely α-smoothness, holds. We also empirically evaluate their performance across a wide range of settings, both synthetically generated and from a real-world media recommendation problem. Giulia Romano · Andrea Agostini · Francesco Trovò · Nicola Gatti · Marcello Restelli 🔗 - The use of Region-based Convolutional Neural Network Model for Analysing Unmanned Aerial Vehicle Remote Sensing (Poster) Object detection is a fundamental task in geospatial research. Researchers have used approaches such as computer vision and image processing, which are time consuming and cumbersome. These approaches have inadvertently hindered the processing of geospatial imagery products both in real time and offline. However, Artificial Intelligence offers great processing power to overcome these limitations through improved classification efficiency. This research work employs the Faster Region-based Convolutional Neural Network (Faster R-CNN) to detect objects in Unmanned Aerial Vehicle (UAV) images. The methodology entailed capturing UAV data, annotating and preparing training and validation datasets for various object classes using the PASCAL VOC standard, training the Faster R-CNN model, and validating the trained model with qualitative and quantitative approaches. 
The experiment yielded a mean average precision of 0.87 over all sampled test images when classifying and localizing objects. It was therefore concluded that deep learning can be used by geospatial analysts to solve visual recognition problems. Esther Oduntan 🔗 - Kernel Density Bayesian Inverse Reinforcement Learning (Poster) Inverse reinforcement learning (IRL) is a powerful framework for learning the reward function of an RL agent by observing its behavior. The earliest IRL algorithms were used to infer point estimates of the reward function, but these can be misleading when several reward functions can accurately describe an agent's behavior. In contrast, a Bayesian approach to IRL models a distribution over possible reward functions that explain the set of observations, alleviating the shortcomings of learning a single point estimate. However, most Bayesian IRL algorithms estimate the likelihood using a Q-value function that best approximates the long-term expected reward for a given state-action pair. This can be computationally demanding because it requires solving a Markov Decision Process (MDP) in every iteration of Markov chain Monte Carlo (MCMC) sampling. In response, we introduce kernel density Bayesian inverse reinforcement learning (KD-BIRL), a method that (1) uses kernel density estimation for the likelihood, leading to theoretical guarantees on the resulting posterior distribution, and (2) decouples the number of times Q-learning is required from the number of iterations of MCMC sampling. Aishwarya Mandyam · Didong Li · Diana Cai · Andrew Jones · Barbara Engelhardt 🔗 - Multimodal Checklists for Fair Clinical Decision Support (Poster) Machine learning algorithms trained on biased data can often replicate or exacerbate existing data biases. Current clinical risk score systems embed race into the basic data used to individualize risk assessments.
Algorithms that adopt these clinical support scores may similarly propagate the embedded biases. In this work, we focus on improving the fairness of clinical decision support checklist models in a multimodal setting. Our previous work has established that medical checklists can be learned directly from health data with fairness constraints, i.e., the false positive rate for any subgroup - like Black women - should not be over 20% of that in any other group. This initial work focused purely on tabular data. However, medical data is inherently multimodal. The fusion of multiple data sources such as vitals, labs, and clinical notes can be essential for training intervention prediction models. Other work has demonstrated that multimodal learning can be difficult for deep neural networks that greedily over-optimize to a single input stream. In comparison with high-capacity models, we hope to investigate the behavior of multimodal fusion in the relatively simpler checklist models. [See full abstract in pdf] Qixuan Jin · Marzyeh Ghassemi 🔗 - Investigating Reproducibility from the Decision Boundary Perspective. (Poster) The superiority of neural networks over classical linear classifiers stems from their ability to slice image space into complex class regions. While neural network training is certainly not well understood, existing theories of neural network training primarily focus on understanding the geometry of loss landscapes. Meanwhile, considerably less is known about the geometry of class boundaries. The geometry of these regions depends strongly on the inductive bias of neural network models, which we do not currently have the tools to analyze rigorously. In this study, we use empirical tools to study the geometry of class regions and try to answer two questions: Do neural networks produce decision boundaries that are consistent across random initializations? Do different neural architectures have measurable differences in inductive bias?
Gowthami Somepalli · Arpit Bansal · Liam Fowl · Ping-yeh Chiang · Yehuda Dar · Richard Baraniuk · Micah Goldblum · Tom Goldstein 🔗 - DEVELOPMENT OF A MODIFIED LIKELIHOOD RATIO MODEL FOR HANDWRITING IDENTIFICATION IN FORENSIC SCIENCE (Poster) The Likelihood Ratio (LR) is a means of quantifying the strength of evidence in a forensic investigation. Existing methods for estimating the LR in handwriting identification employ nuisance parameters, resulting in a high rate of inconclusiveness and disagreement among forensic investigators. Currently, LR procedures rely on the choice of appropriate denominators, which limits the repeatability and reproducibility of the estimated LR. Therefore, this study proposed a modified LR devoid of nuisance parameters and capable of generating consistent estimates. A total of 230 document writers were purposively selected to produce 10-page true and disguised documents over a period of six months. A similar procedure was carried out to produce forged documents for the corresponding true counterparts. Otsu’s method was used to preprocess the data, while Sobel edge detection was used to segment the handwriting. The C-means algorithm was used to cluster handwriting into characters based on segmented words. Local binary patterns were used to extract features from the clustered characters, and the extracted features were fed into a Back Propagation Neural Network (BPNN) to learn the handwriting pattern. An exhaustive mapping algorithm with a bias function was developed to replace the hitherto randomly selected denominator for the LR estimation. The derived handwriting pattern followed a normal distribution. The improved model had a 0.0% inconclusive rate for KDE and LoR, as against 22.2% inconclusiveness, which is the minimum reported in the literature. The modified likelihood ratio produced consistent forensic estimates in terms of reproducibility and repeatability, devoid of nuisance parameters.
This modified likelihood ratio will give reliable estimates for forensic investigations. Adeyinka Abiodun · Sesan Adeyemo · Adegboyega Adebayo 🔗 - Self-Supervised Graph Representation Learning for chip design-partitioning on multi-FPGA platforms (Poster) Integrated circuits (ICs) are used in virtually all electronic equipment and have become inseparable parts of modern societies. Due to the continuously shrinking time-to-market, there is a constant need to optimize the various design, validation and manufacturing processes involved in the development of these ICs. Pre-silicon emulation is one such complex process that involves partitioning the design and mapping it on multi-FPGA platforms so that testing and software development on these chips can be accelerated. We present a novel design-partitioning algorithm that modifies the GAP graph-partitioning architecture (Azade Nazi et al.) to handle multiple hard constraints. We then use a constrained greedy algorithm to map the partitions obtained onto the multi-FPGA platform. Our experiments on 3 chip designs showed either comparable or improved results compared to the current manual and heuristic-based process. We observe a tremendous improvement in the time required to map the design onto the hardware platform, which averaged 2 minutes compared to the 3-4 weeks required using the manual process. Divyasree T · Chiranjeevi Kunapareddy · Vikas Akalwadi · Rahul Govindan · Balaji G 🔗 - Hierarchically Clustered PCA and CCA via a Convex Clustering Penalty (Poster) We introduce an unsupervised learning approach that combines the truncated singular value decomposition with convex clustering to estimate within-cluster directions of maximum variance/covariance (in the variables) while simultaneously hierarchically clustering (on observations).
In contrast to previous work on joint clustering and embedding, our approach has a straightforward formulation, is readily scalable via distributed optimization, and admits a direct interpretation as hierarchically clustered principal component analysis (PCA) or hierarchically clustered canonical correlation analysis (CCA). Through numerical experiments and real-world examples relevant to precision medicine, we show that our approach outperforms traditional and contemporary clustering methods on underdetermined problems ($p \gg N$ with tens of observations) and scales to large datasets (e.g., $N=100,000$; $p=1,000$) while yielding interpretable dendrograms of hierarchical per-cluster principal components or canonical variates. Amanda Buch · Conor Liston · Logan Grosenick 🔗 - Deep Metric Learning to predict cardiac pressure with ECG (Poster) An objective assessment of intrathoracic pressures remains an important goal for patients with heart failure. However, the gold standard for estimating central hemodynamic pressures is an invasive procedure where a pressure transducer is inserted into a great vessel and threaded into the right heart chambers. Approaches that leverage non-invasive signals – such as the electrocardiogram (ECG) – have the promise to make the routine estimation of cardiac pressures feasible in both the inpatient and outpatient settings. In this study, we leverage Deep Metric Learning (DML) to estimate intracardiac pressures from the 12-lead ECG. DML objectives learn an embedding that preserves the inherent distance between ECGs, where similar (positive) samples lie closest together in the representation space. We use the dynamic time warping distance between two ECGs to define the positive samples. Our preliminary results show that deep metric learning improves downstream cardiac pressure inference with a limited number of labeled ECGs.
Hyewon Jeong · Marzyeh Ghassemi · Collin Stultz 🔗 - Spatial clustering with random partitions on ovarian cancer data (Poster) Spatial transcriptomics (ST) data and single-cell (SC) data provide valuable information for the study of cancer tissues. Our specific goal here is to develop a statistical model and inference approach using both ST and SC data to help the understanding of the tumor micro-environment. For example, is the boundary between tumor and stromal cells identifiable? What is the interaction between immune and tumor cells? Some recently proposed methods, like SpaGCN, use graph convolutional networks to do spatial clustering. Building on SpaGCN as a reference solution, we propose a statistical inference pipeline based on random partition models, to implement uncertainty quantification and joint inference on cell-types and immune profiles. Yunshan Duan · Peter Mueller · Wenyi Wang · Shuai Guo 🔗 - Object Segmentation of Cluttered Airborne LiDAR Point Clouds (Poster) Airborne topographic LiDAR is an active remote sensing technology that emits near-infrared light to map objects on the Earth's surface. Derived products of LiDAR are suitable to service a wide range of applications because of their rich three-dimensional spatial information and their capacity to obtain multiple returns. However, processing point cloud data still requires a large effort in manual editing. Certain human-made objects are difficult to detect because of their variety of shapes, irregularly-distributed point clouds, and a low number of class samples. In this work, we propose an end-to-end deep learning framework to automate the detection and segmentation of objects defined by an arbitrary number of LiDAR points surrounded by clutter. Our method is based on a light version of PointNet that achieves good performance on both object recognition and segmentation tasks. The results are tested against manually delineated power transmission towers and show promising accuracy.
Mariona Carós · Ariadna Just · Santi Seguí · Jordi Vitria 🔗 - Identifying Disparities in Sepsis Treatment using Inverse Reinforcement Learning (Poster) Sepsis is a severe reaction by the human body to infection and is associated with significant morbidity and mortality. Advances in the scale and granularity of electronic health record data offer the opportunity to apply reinforcement learning to understand clinician diagnostic and treatment policies for this complex condition, which can be used to understand the factors that drive disparities in sepsis care. The fundamental problem in using RL to model sepsis is that the reward function is unknown and involves tradeoffs between competing outcomes. In this work, we develop an inverse reinforcement learning (IRL) model to learn a reward function for patients being treated for sepsis, then leverage offline RL to map state-action pairs from retrospective data, thereby learning the expert policy. We will apply this approach to two large and independent datasets: the part of the MIMIC-IV data covering sepsis patients admitted to the ICU, and the clinical data warehouse of the Mass General Brigham healthcare system, which has detailed data from arrival in the emergency room until hospital discharge across 12 hospitals in the New England area from 2015 through the present. With the learned policy, we will identify whether policies differ by gender and race/ethnicity subgroups, and finally, we will attempt to identify changes in recorded physician policies before and after the introduction of the national treatment guidelines. We hope this approach could help us understand the differential treatment policy across the subgroups of sepsis patients.
Hyewon Jeong · Taylor Killian · Sanjat Kanjilal · Siddharth Nayak · Marzyeh Ghassemi 🔗 - Multi Mix Mask – RCNN (M3RCNN) for Instance Intervertebral Disc Segmentation (Poster) Medical imaging and computer technologies have revolutionized healthcare by improving diagnostic accuracy and increasing patient safety and comfort by making massive amounts of medical images available. Deep learning methods perform well in computer vision when labeled training data is abundant. In the practice of medical imaging, the labeling or segmentation of images is performed manually. However, manual medical image segmentation has two significant drawbacks: long delineation times and questionable reproducibility. To address these drawbacks, we developed an automated intervertebral disc instance segmentation approach that uses both T1 and T2 images to address data limitation and computational time issues and to improve the algorithm's generalization. We propose Multi Mix Mask-RCNN (M3RCNN), a deep learning segmentation network based on Mask-RCNN. Our method uses a mixed optimization and training data system, employing Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Adam) with T1 and T2. Compared to segmentation methods that were commonly used in the past, the proposed method significantly improved both processing time and segmentation results. Malinda Vania · Sunghoon Lim 🔗 - The Lean Data Scientist: Recent Advances towards Overcoming the Data Bottleneck (Poster) Machine learning (ML) is revolutionizing the world, affecting almost every field of science and industry. A major bottleneck in many ML applications is obtaining data, and the rise of deep learning has further exacerbated this issue. Contemporary state-of-the-art models involve millions (or even billions) of parameters and require massive amounts of data to train. Thus, the dominant learning paradigm today is based on creating a new (large) dataset whenever facing a novel task.
While this approach has resulted in significant advances, it has a major drawback: collecting large, high-quality datasets is often very demanding in terms of time and human resources (and in some cases it is impossible, e.g., rare disease detection). Moreover, while there has been much effort suggesting workarounds to this data-bottleneck problem, they are scattered across many different sub-fields, often unaware of one another. We aim to bring order to this area by presenting a simple yet comprehensive taxonomy of ways to tackle the data bottleneck (see Figure below). We survey major research directions and organize them into a taxonomy in a practitioner-centric manner. Our emphasis is not on covering methods in depth; rather, we discuss the main ideas behind various methods, the assumptions they make and their underlying concepts. For each topic, we mention several important or interesting works, and refer to surveys where possible. We wish to first raise awareness of the methods that already exist, to encourage more efficient use of data. In addition, we hope the taxonomy will also reveal gaps in current techniques and suggest novel research directions that could inspire new, less data-hungry learning methods. Chen Shani · Jonathan Zarecki · Dafna Shahaf 🔗 - Towards an automatic classification for software requirements written in Spanish (Poster) Machine Learning (ML) algorithms have become a powerful instrument in requirements classification. Several studies have implemented these techniques (Pérez-Verdejo et al., 2020), from traditional ML algorithms to Transformers, the state of the art in Natural Language Processing (NLP). Nevertheless, most of this research focuses on English requirements, with less attention to other languages. Spanish is currently the second mother tongue in the world by number of speakers (Instituto Cervantes, 2021); hence, it is important to expand the knowledge of classification performance for requirements written in Spanish.
The present work aims to investigate which combinations of text vectorization techniques with ML algorithms perform best for requirements classification, using two Spanish datasets from different sources for training and testing the models. Maria Isabel Limaylla Lunarejo 🔗 - Exploiting Pretrained Biochemical Language Models for Targeted Drug Design (Poster) The development of novel compounds targeting proteins of interest is one of the most important tasks in the pharmaceutical industry. Deep generative models have been applied to targeted molecular design and have shown promising results. Recently, target-specific molecule generation has been viewed as a translation between the protein language and the chemical language. However, such a model is limited by the availability of interacting protein–ligand pairs. On the other hand, large amounts of unlabelled protein sequences and chemical compounds are available and have been used to train language models that learn useful representations. In this study, we propose exploiting pretrained biochemical language models to initialize (i.e. warm start) targeted molecule generation models. We investigate two warm start strategies: (i) a one-stage strategy where the initialized model is trained on targeted molecule generation and (ii) a two-stage strategy containing a pre-finetuning on molecular generation followed by target-specific training. We also compare two decoding strategies to generate compounds: beam search and sampling. The results show that the warm-started models perform better than a baseline model trained from scratch. The two proposed warm-start strategies achieve similar results to each other with respect to widely used metrics from benchmarks. However, docking evaluation of the generated compounds for a number of novel proteins suggests that the one-stage strategy generalizes better than the two-stage strategy. 
Additionally, we observe that beam search outperforms sampling in both docking evaluation and benchmark metrics for assessing compound quality. Gökçe Uludoğan · Arzucan Özgür · Elif Ozkirimli · Kutlu Ülgen · Nilgün Karalı 🔗 - Curriculum learning for improved femur fracture classification: scheduling data with prior knowledge and uncertainty (Poster) An adequate classification of proximal femur fractures from X-ray images is crucial for the treatment choice and the patients’ clinical outcome. We rely on the commonly used AO system, which describes a hierarchical knowledge tree classifying the images into types and subtypes according to the fracture’s location and complexity. We propose a method for the automatic classification of proximal femur fractures into 3 and 7 AO classes based on a Convolutional Neural Network (CNN). As is well known, CNNs need large and representative datasets with reliable labels, which are hard to collect for the application at hand. In this paper, we design a curriculum learning (CL) approach that improves over the basic CNN's performance under such conditions. Our novel formulation unites three curriculum strategies: individually weighting training samples, reordering the training set, and sampling subsets of data. The core of these strategies is a scoring function ranking the training samples. We define two novel scoring functions: one from domain-specific prior knowledge and an original self-paced uncertainty score. We perform experiments on a clinical dataset of proximal femur radiographs. The curriculum improves proximal femur fracture classification up to the performance of experienced trauma surgeons. The best curriculum method reorders the training set based on prior knowledge, resulting in a classification improvement of 15%.
Using the publicly available MNIST dataset, we further discuss and demonstrate the benefits of our unified CL formulation for three controlled and challenging digit recognition scenarios: with limited amounts of data, under class-imbalance, and in the presence of label noise. Amelia Jiménez-Sánchez · Diana Mateus · Sonja Kirchhoff · Chlodwig Kirchhoff · Peter Biberthaler · Nassir Navab · Miguel A. González Ballester · Gemma Piella 🔗 - Shortcuts in Public Medical Image Datasets (Poster) Artificial Intelligence (AI) is a promising field for medical imaging algorithms. Medical institutions are starting to integrate AI systems for screening and computer-aided diagnosis. However, recent studies show that even with high performance on the existing data, algorithms can learn “shortcuts,” like visibility of medical tools, and fail to generalize. We identify as shortcuts the presence of chest drains and images containing text for pneumothorax and breast cancer classification, respectively. The model for pneumothorax classification achieved an Area Under the Curve (AUC) of 0.93, 0.89 and 0.96 for the baseline set, the set without drains, and the set with chest drains, respectively. The model for breast cancer classification achieved an AUC of 0.78. This performance dropped to 0.682 when the images that contain text were removed. The degradation in performance showcases the risk of these models being clinically deployed. In future work, we plan to investigate automatic ways to identify and avoid learning such shortcuts. In particular, we will research the use of meta-data to improve the robustness of AI algorithms. Amelia Jiménez-Sánchez · Andreas Skovdal · Frederik Bechmann Faarup · Kasper Thorhauge Grønbek · Veronika Cheplygina 🔗 - Adaptively Identifying Patient Populations With Treatment Benefit in Clinical Trials (Poster) We study the problem of adaptively identifying patient subpopulations that benefit from a given treatment during a confirmatory clinical trial.
This type of adaptive clinical trial, often referred to as an adaptive enrichment design, has been thoroughly studied in biostatistics with a focus on a limited number of subgroups (typically two) which make up (sub)populations, and a small number of interim analysis points. In this paper, we aim to relax classical restrictions on such designs and investigate how to incorporate ideas from the recent machine learning literature on adaptive and online experimentation to make trials more flexible and efficient. We find that the unique characteristics of the subpopulation selection problem -- most importantly that (i) one is usually interested in finding subpopulations with any treatment benefit (and not necessarily the single subgroup with largest effect) given a limited budget and that (ii) effectiveness only has to be demonstrated across the subpopulation on average -- give rise to interesting challenges and new desiderata when designing algorithmic solutions. Building on these findings, we propose AdaGGI and AdaGCPI, two meta-algorithms for subpopulation construction, which focus on identifying good subgroups and good composite subpopulations, respectively. We empirically investigate their performance across a range of simulation scenarios and derive insights into their (dis)advantages across different settings. [Extended abstract in PDF] Alicia Curth · Alihan Hüyük · Mihaela van der Schaar 🔗 - Improving Robustness to Distribution Shift with Methods from Differential Privacy (Poster) As machine learning models become widely considered in safety-critical settings, it is important to understand when models may fail after deployment. One cause of model failure is distribution shift, where the training and test data distributions differ. In this paper we investigate the benefits of training models using methods from differential privacy (DP) toward improving model robustness.
We compare the performance of DP-trained models to standard empirical risk minimization (ERM) across a variety of possible distribution shifts - specifically covariate and label shifts. We find that DP models consistently have a lower generalization gap across various types of shifts and shift severities, as well as a higher absolute test performance in label shift. Neha Hulkund 🔗 - Quantifying Gender Bias in Hindi Language Models (Poster) The gender bias present in the data on which language models are trained gets reflected in the systems that use these models. Therefore, it is important to address and mitigate the bias present in these models. While extensive research is being done on this for the English language, work in other languages, especially the Indian languages, is relatively nascent. Because English is a non-gendered language, the methodologies cannot be directly translated to other languages. Spoken by more than 600 million people, Hindi is the third most spoken language in the world. It is therefore essential to address bias in this language. In our paper, we measure gender bias associated with occupations in Hindi language models. The major contribution is the creation of a corpus to evaluate gender bias in Hindi. Using this corpus, we evaluate the gender bias present in Hindi language models. Our results indicate a presence of bias in these systems. Neeraja Kirtane · V MANUSHREE · Aditya Kane 🔗 - De novo PROTAC design using graph-based deep generative models (Poster) PROteolysis TArgeting Chimeras (PROTACs) are an emerging therapeutic modality that degrade a protein of interest (POI) by marking it for degradation by the proteasome. They often take on a three-component structure consisting of two binding domains and a linker. While a promising modality, it can be challenging to predict whether a new PROTAC will lead to protein degradation as that is dependent on the cooperation of all subunits to form a successful ternary structure.
As such, PROTACs remain a laborious and unpredictable modality to design because the functionalities of each component are highly interdependent. Recent developments in artificial intelligence suggest that deep generative models can assist with de novo design of molecules displaying desired properties, yet their application to PROTAC design remains largely unexplored. Additionally, while previous AI-based approaches have optimized the linker component given two active domains, generative models have not yet been applied to optimization of the other two – the warhead and E3 ligand. Here, we show that a graph-based deep generative model (DGM) can be used to propose novel PROTAC structures. The DGM follows the approach of GraphINVENT, a gated-graph neural network which iteratively samples an action space and formulates a sequence of steps to build a new molecular graph. We also demonstrate that this model can be guided towards the generation of PROTACs that are predicted to degrade a POI through policy gradient reinforcement learning. Rewards during RL are applied based on a boosted tree surrogate model that predicts a PROTAC's degradation potential for a POI, showing that a nonlinear scoring function can fine-tune a deep molecular generative model towards desired properties. Using this approach, we achieve a model where activity against IRAK3 (a pseudokinase implicated in oncologic signaling) is predicted for >80% of sampled PROTACs after RL, compared to 50% predicted activity before any fine-tuning. Divya Nori · Connor Coley · Rocío Mercado 🔗 - Self-Contained Entity Discovery from Captioned Videos (Poster) This paper introduces the task of visual named entity discovery in videos without the need for task-specific supervision or task-specific external knowledge sources. Assigning specific names to entities (e.g. faces, scenes, or objects) in video frames is a long-standing challenge. 
Commonly, this problem is addressed as a supervised learning objective by manually annotating faces with entity labels. To bypass the annotation burden of this setup, several works have investigated the problem by utilizing external knowledge sources such as movie databases. While effective, such approaches do not work when task-specific knowledge sources are not provided and can only be applied to movies and TV series. In this work, we take the problem a step further and propose to discover entities in videos from videos and corresponding captions or subtitles. We introduce a three-stage method where we (i) create bipartite entity-name graphs from frame-caption pairs, (ii) find visual entity agreements, and (iii) refine the entity assignment through entity-level prototype construction. To tackle this new problem, we outline two new benchmarks, SC-Friends and SC-BBT, based on the Friends and Big Bang Theory TV series. Experiments on the benchmarks demonstrate the ability of our approach to discover which named entity belongs to which face or scene, with an accuracy close to a supervised oracle, just from the multimodal information present in videos. Additionally, our qualitative examples show the potential challenges of self-contained discovery of any visual entity for future work. Melika Ayoughi · Paul Groth · Pascal Mettes 🔗 - Reduce False Negative in Distant supervised learning using Dependency tree-LSTM to Construct a Knowledge Graph (Poster) Knowledge Graphs (KG) are a fundamental part of a wide variety of NLP applications, improving the accuracy and explainability of knowledge. Constructing Knowledge Graphs from unstructured text involves the entity detection task and the extraction of semantic relations. A relation, called a triple, is the smallest part of the knowledge graph. A triple consists of the subject of the relation, the relation itself, and the object of the relation.
However, extracting semantic relations has many difficulties. Supervised relation extraction (RE) requires huge amounts of labelled data, which is labour-intensive and time-consuming to produce. Some studies have proposed distant supervision (DS). This method generates KG triplets based on the co-occurrence of entities in a sentence. In other words, any sentence containing an entity pair is assumed to express a relation. However, these methods struggle to obtain high-quality relations, suffering from False Negatives (FN) and False Positives (FP). In our paper, we used a new encoder-decoder model and a multilayer perceptron to detect FN in two popular DS datasets (NYT10, GIDS); the possible FN are unlabeled, and a model using Tree Bi-LSTM was trained to allocate new labels to improve previous research results. To summarise, our core contributions are: (i) an encoder constructed based on entity importance in the distantly supervised RE dataset; (ii) a model to detect false negatives; and (iii) an algorithm to predict relations using a combination of a dependency tree and a tree Bi-LSTM. The result is a 25% improvement in comparison to previous models. The false negative detector filters FN samples from N with logits larger than a threshold θ. The model discovered 6,324 FN samples from NYT10, which refer to 4,153 entity pairs; and 324 FN samples from GIDS, which refer to 285 entity pairs. The average precision is 92.0. For further research, we will try to reduce FP in distant supervised learning. Samira Korani 🔗 - Break the bottleneck of AI deployment at the edge. (Poster) Today, the three main AI challenges in the industry are the quality and quantity of data required to create a performant AI model, the speed at which models can run at the edge, and the cost of edge solutions. To address these challenges, OpenVINO™ has developed an ecosystem that covers data management, retraining, optimization, and deployment. The Dataset Management Framework (Datumaro) builds, analyzes, and manages datasets.
OpenVINO training extensions (OTE) create a suitable environment for training new computer vision models with efficient architectures using custom datasets, preserving data distribution, and achieving the best possible results for deploying models at the edge. For different use cases in industry and healthcare where there is insufficient data to initiate the supervised AI development process, OpenVINO released a new unsupervised anomaly detection library called Anomalib. The library offers several ready-to-use state-of-the-art anomaly detection algorithms from the literature and additional tools that facilitate end-to-end training and deployment pipeline. OpenVINO also provides three optimization tools that optimize deep learning models for faster, less memory-intensive execution: i) Model Optimizer (MO) swiftly converts models from various frameworks to the OpenVINO format, thereby enhancing the model's performance on Intel® processors. ii) The Post-Training Optimization Tool (POT) enables users to further accelerate the inference speed of OpenVINO format models by applying post-training quantization. iii) Neural Network Compression Framework (NNCF) integrates with PyTorch or TensorFlow training pipeline to quantize and compress models during or after training, increasing overall processing speed by 3.6x compared to the original FP16 model (SSD-300). Overall, OpenVINO has an ecosystem to facilitate a complete workflow, from data collection to model deployment, achieving high accuracy and being optimized for Intel® processors and accelerators. Paula Ramos · Helena Kloosterman · Samet Akcay · Yu-Chun Liu · Raymond Lo 🔗 - Towards probabilistic end-to-end Deep Learning Weather Forecasting: Spatio-Temporal Temperature Forecasting using Normalizing Flows (Poster) Forecasting of climate variables has been a long-standing problem in the community. There exist many Numerical Weather Prediction (NWP) methods for computing weather forecasts. 
The main disadvantages of NWPs are their slow convergence time and high computational cost. Recently, a Deep Generative-based model (DGM) outperformed SOTA NWP predictions on precipitation nowcasting, not only in terms of quantifiable scores and convergence time, but also in a qualitative study conducted among meteorologists. In contrast to deterministic forecasting models, DGMs allow for uncertainty estimates from ensembles of future predictions. In this work, we use Conditional Autoregressive Normalizing Flows (CANFs) [1, 2] for forecasting temperature frames from the ERA5 dataset. We motivate the use of Normalizing Flows over GANs due to their advantages in training stability, invertibility and convergence time. Christina Winkler 🔗 - Bias Assessment of Text-to-Image Models (Poster) While Text-to-Image models such as Dall-E and Stable Diffusion are becoming increasingly adept at creating new content, they also open doors to new types of harms and biases. We describe some of the challenges raised by deploying these models, provide a case study for bias assessment, and discuss future avenues of exploration in the field. Sasha Alexandra Luccioni · Clémentine Fourrier · Nathan Lambert · Unso Eun Seo Jo · Irene Solaiman · Helen Ngo · Nazneen Rajani · Giada Pistilli · Yacine Jernite · Margaret Mitchell 🔗 - SO(3) Equivariant Framework for Spatial Networks (Poster) Representation learning on spatial networks is emerging as a distinct area in machine learning and is attracting much attention in diverse domains. Some applications include molecular graphs for drug discovery and screening, brain networks for neuroscience, social networks for recommender systems, etc. However, most proposed approaches either use only spatial or network data that cannot distinguish certain types of graphs and limit their network expressivity or are tied to specific input data domains and network architectures. 
To address this gap, we introduce an equivariant message-passing network architecture that simultaneously leverages the spatial and network properties. In addition, we propose to take advantage of the geometric representations and extend the classical scalar features with 3D vectors. To exploit the spatial and network features, we present a Spatial Vector Neuron, which can be easily incorporated into existing graph neural network architectures and allows the model to scale up by stacking more layers for larger receptive fields. A comprehensive set of experiments on both synthetic and real-world datasets demonstrates the strength of our proposed method and the potential of geometric representation learning for Spatial Networks. Sarp Aykent · Tian Xia 🔗 - Protein Structure Ranking with Atom-level Geometric Representation Learning (Poster) Thanks to recent technical advances in geometric deep learning, especially the successful application of GNNs to model graph structures, the study of non-Euclidean data has seen sharply growing popularity over the last few years. Proteins, as the building blocks for all living organisms, play an important role in different domains. The unique 3D structure of a protein determines its function. Although several works have achieved promising breakthroughs in the modeling of protein 3D structures, a modeled structure is not guaranteed to be the same as the native structure of the protein. Thus, assessing the quality of a given protein structure is crucial. This work takes advantage of the atom-level structural representations of the protein 3D graphs and presents a generic equivariant message passing based graph neural network. We demonstrate the robustness of the proposed framework through extensive experiments and show the potential of geometric graph representation learning for future works. 
Tian Xia · Sarp Aykent 🔗 - Detecting Synthetic Opioids with NQR Spectroscopy and Complex-Valued Signal Denoising (Poster) Dangerous synthetic opioids (e.g., fentanyl) are currently synthesized abroad and shipped into the United States illegally via international mail. They are largely responsible for the overdose crisis in the United States, which has been declared a public health emergency. One factor contributing to the influx of these drugs is the low risk of detection when mailed in small quantities. The goal of our research is to slow the passage of synthetic opioids into the United States by developing a technology capable of detecting them in unopened packages using Nuclear Quadrupole Resonance (NQR) spectroscopy along with complex-valued and real-valued neural networks for signal denoising and classification to improve detection. Amber Day · Natalie Klein · Michael Malone · Harris Mason · Sinead Williamson 🔗 - Identifying fine climatic parameters for high maize yield using pattern mining: case study from Benin (West Africa) (Poster) Climate change is significantly affecting crop yields in sub-Saharan Africa. This impact is associated with the inability of farmers to control climatic conditions. Therefore, an accurate prediction of crop yields is necessary to help farmers make good decisions. This paper highlights links between climatic parameters and maize yield in Benin to ensure early yield prediction using pattern mining. The datasets used contain climate and maize yield data over the last 26 years in 5 districts with synoptic stations located in two agro-climatic zones (Sudanian and Sudano-Guinean) in Benin. To find association rules, climate variables were aggregated with yield using “year” and “districts” variables and through supervised machine learning models: Support vector machine, K Nearest Neighbour, Artificial Neural Network, Decision Tree, and Recurrent Neural Network. 
The decision tree technique provided good accuracy (R2 = 0.998, MSE = 0.021, MAE = 0.0008). However, the model obtained is not easily interpretable. We then used it to augment the dataset to apply an association rules algorithm: the frequent pattern growth algorithm. This allows us to build relationships easily interpretable by the general public. Results showed that most of the rules obtained in both agro-climatic zones are associations between the minimum and maximum temperature, humidity, sunstroke, rainfall, and evapotranspiration. Moreover, the highest maize yield is obtained by combining the medium values of these parameters. The best trends are observed in the Sudano-Guinean zone for medium values of minimum temperature, rainfall, evapotranspiration, maximum temperatures, and humidity. In the Sudanian zone, the high maize yield is observed for medium values of minimum temperature, maximum temperature, and maximum humidity. The identified association rules demonstrated a reliable and promising approach to optimize maize yield. Souand Peace Gloria TAHI · Vinasetan Ratheil Houndji · Castro Hounmenou · Romain Glèlè Kakaï 🔗 - GraphFramEx: Towards Systematic Evaluation of Explainability Methods for Graph Neural Networks (Poster) As one of the most popular machine learning models today, graph neural networks (GNNs) have attracted intense interest recently, and so has their explainability. Users are increasingly interested in a better understanding of GNN models and their outcomes. Unfortunately, today's evaluation frameworks for GNN explainability often rely on a few inadequate synthetic datasets, leading to conclusions of limited scope due to a lack of complexity in the problem instances. As GNN models are deployed to more mission-critical applications, we are in dire need of a common evaluation protocol of explainability methods of GNNs. 
In this paper, we propose, to the best of our knowledge, the first systematic evaluation framework for GNN explainability, considering explainability on three different "user needs". We propose a unique metric that combines the fidelity measures and classifies explanations based on their quality of being sufficient or necessary. We scope ourselves to node classification tasks and compare the most representative techniques in the field of input-level explainability for GNNs. For the inadequate but widely used synthetic benchmarks, surprisingly shallow techniques such as personalized PageRank have the best performance for a minimum computation time. But when the graph structure is more complex and nodes have meaningful features, gradient-based methods are the best according to our evaluation criteria. However, none dominates the others on all evaluation dimensions and there is always a trade-off. We further apply our evaluation protocol in a case study for fraud explanation on eBay transaction graphs to reflect the production environment. Kenza Amara · Rex Ying · Ce Zhang 🔗 - Using Visual Similarity to Navigate eCommerce Inventory (Poster) Please refer to the attached document for the abstract Shubhangi Tandon · Christopher Miller · Selcuk Kopru · Senthilkumar Gopal 🔗 - Mixture of Gaussian Processes with Probabilistic Circuits for Multi-Output Regression (Poster) A Gaussian process (GP) is a non-parametric model capitalizing on probabilistic inference and can be used for Bayesian optimization. It is a powerful function approximator. Compared to neural networks, which are ‘black box’ function approximators, a GP can perform accurate probabilistic inference and quantify the uncertainty of prediction, so it is widely used in classification and regression tasks in the field of machine learning. However, a GP usually has a large computational complexity, which limits its application to large-scale datasets. 
Besides, most of the current GP-based approaches only focus on single-output regression, i.e., the dependent variable is univariate, and do not easily extend to multi-output regression tasks. In order to tackle the above issues, we propose an expert-based approximation method to develop a learnable model that can be applied to large-scale datasets and perform multi-output regression tasks. We propose the multi-output mixture of Gaussian processes (MOMoGP), which employs a deeply structured mixture of single-output GPs encoded via a Probabilistic Circuit. This allows one to accurately capture correlations between multiple outputs without introducing a cubic cost in the number of output dimensions. By recursively partitioning the covariate space and the output space, posterior inference in our model reduces to inference on single-output GP experts, which only need to be conditioned on a small subset of the observations. Mingye Zhu · Zhongjie Yu · Martin Trapp · Arseny Skryagin · Kristian Kersting 🔗 - Synthetic Data Augmentation for Time Series Forecasting (Poster) In this study, we propose a method to automatically generate synthetic training data that can be applied to a wide range of Time Series Forecasting (TSF) tasks. TSF is the task of predicting a future sequence from a given sequence. We generate multiple data patterns with different functions with the expectation of modeling general waveform characteristics such as periodicity and continuity that appear in time series data. Specifically, we automatically generate many periodic waveforms such as sine waves and square waves, and waveforms with irregular peaks by randomly changing parameters such as periods. We compared the performance of sequence prediction with and without the addition of the synthetic data. We used the dataset ETT (Electricity Transformer Temperature), one of the standard benchmarks for TSF tasks with constant sampling periods. 
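The waveform generation described above can be sketched roughly as follows. This is a minimal illustration, not the authors' generator: the function name `make_synthetic_series`, the period range, and the peak amplitudes are all assumptions chosen for the example.

```python
import numpy as np


def make_synthetic_series(n_series, length, rng):
    """Generate sine, square, and irregular-peak waveforms with random periods.

    Hypothetical helper: the period range (8..63 steps) and the peak
    amplitudes are illustrative choices, not values from the abstract.
    """
    t = np.arange(length)
    series = []
    for i in range(n_series):
        period = int(rng.integers(8, 64))      # random period in time steps
        phase = rng.uniform(0.0, 2.0 * np.pi)  # random phase shift
        base = np.sin(2.0 * np.pi * t / period + phase)
        kind = i % 3                           # cycle through the three families
        if kind == 0:
            wave = base                                # sine wave
        elif kind == 1:
            wave = np.where(base >= 0.0, 1.0, -1.0)    # square wave
        else:
            wave = np.zeros(length)                    # irregular peaks
            n_peaks = max(1, length // period)
            peaks = rng.choice(length, size=n_peaks, replace=False)
            wave[peaks] = rng.uniform(0.5, 1.5, size=n_peaks)
        series.append(wave)
    return np.stack(series)


rng = np.random.default_rng(0)
synthetic = make_synthetic_series(n_series=6, length=128, rng=rng)
print(synthetic.shape)
```

Series produced this way would be appended to the real training set; how they are windowed into input and target sequences depends on the forecasting model and is not specified here.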
We used the Mean Squared Error (MSE) as an evaluation metric for the prediction performance. We compared three cases with different training data: the entire training data, 50% (208 samples) of the entire training data, and 50% of the entire training data plus 208 samples of synthetic data. The MSE on the test data was 0.2779, 0.2784, and 0.2724, respectively, with the addition of the generated data having the smallest value and slightly higher prediction performance. Kasumi Ohno · Kohei Makino · Makoto Miwa · Yutaka Sasaki 🔗 - Task-conditioned modelling of drug-target interactions (Poster) HyperNetworks have established themselves as an effective technique to rapidly adapt parameters in neural networks. Recently, HyperNetworks conditioned on descriptors of tasks have improved multi-task generalization in various domains, such as personalized federated learning and neural architecture search. Compelling results were achieved in few- and zero-shot settings, attributed to the increased information sharing by the HyperNetwork. With the rise of new diseases, fast discovery of drugs is needed, which requires proteochemometric models that can generalize drug-target interaction predictions in low-data scenarios. State-of-the-art methods apply a few fully-connected layers to concatenated learned embeddings of the protein target and drug molecule. In this work, we develop a task-conditioned HyperNetwork approach for the proteochemometrics problem in drug discovery. We show that predictive performance improves on, or is competitive with, previous methods when the parameters of the fully-connected layers processing the molecule embedding are predicted from the protein embedding. Furthermore, we extend our approach to also learn all parameters of a graph neural network as the molecular encoder using a particular weight initialization scheme. 
Our experiments with this extended architecture contribute new insights to the machine learning field, as HyperNetworks have rarely been applied to learn graph neural networks. Emma Svensson · Pieter-Jan Hoedt · Sepp Hochreiter · Günter Klambauer 🔗 - Can deep learning models understand natural language descriptions of patient symptoms following cataract surgery? (Poster) Demands on the healthcare system are rising, and healthcare staff are a finite resource. This study set out to explore which machine learning techniques are best able to understand clinical conversations and enable automation of activity previously performed by clinical staff. We used patient descriptions of their symptoms recorded during a cataract surgery follow-up conversation with our autonomous clinical assistant, Dora, which asks patients symptom-based questions to elicit postoperative concerns. We compare the ability of different machine learning techniques to understand patients’ descriptions of their symptoms and show how state-of-the-art natural language classifiers have the ability to understand routine patient intents and benefit clinical medicine. The training and test datasets were collected from two non-overlapping patient populations who used Dora as part of ethically approved research studies across 3 diverse UK hospitals. The training dataset was augmented with members of the public using the Dora platform. All participants consented to their data being used and the data was fully anonymised. The datasets consist of transcribed utterances of patients describing their symptoms in response to questions like “is your eye red?”. Each utterance was labelled with an intent from a total of 24 different intents. Two ophthalmologists independently labelled the dataset and resolved conflicts to establish ground truth labels. 
We compared 5 different deep learning and traditional machine learning models; the optimal hyperparameters for each were determined using a grid search and 4-fold cross-validation on the training set. The models were then trained on the entire training dataset and their performance was evaluated on the test set. The best performing model was the Dual Intent and Entity Transformer (DIET) classifier using word embeddings from BERT. Mohita Chowdhury · Oliver Gardiner · Ernest Lim · Aisling Higham · Nick de Pennington 🔗 - Hyperbolic Image Segmentation (Poster) For image segmentation, the current standard is to perform pixel-level optimization and inference in Euclidean output embedding spaces through linear hyperplanes. In this work, we show that hyperbolic manifolds provide a valuable alternative for image segmentation and propose a tractable formulation of hierarchical pixel-level classification in hyperbolic space. Hyperbolic Image Segmentation opens up new possibilities and practical benefits for segmentation, such as uncertainty estimation and boundary information for free, zero-label generalization, and increased performance in low-dimensional output embeddings. Mina Ghadimi Atigh · Julian Schoep · Erman Acar · Nanne van Noord · Pascal Mettes 🔗 - Maximum Class Separation as Inductive Bias in One Matrix (Poster) Maximizing the separation between classes constitutes a well-known inductive bias in machine learning and a pillar of many traditional algorithms. By default, deep networks are not equipped with this inductive bias and therefore many alternative solutions have been proposed through differential optimization. Current approaches tend to optimize classification and separation jointly: aligning inputs with class vectors and separating class vectors angularly. This work proposes a simple alternative: encoding maximum separation as an inductive bias in the network by adding one fixed matrix multiplication before computing the softmax activations. 
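To make the "one fixed matrix" idea concrete, here is a minimal sketch of one well-known recursive simplex construction that yields k+1 unit vectors in k dimensions with pairwise cosine similarity -1/k. It is offered as an illustration under that assumption; the function name `max_separation_matrix` is hypothetical, and the authors' exact formulation may differ.

```python
import numpy as np


def max_separation_matrix(k):
    """Return a k x (k+1) matrix whose columns are unit vectors with
    pairwise cosine similarity -1/k (vertices of a regular simplex).

    Illustrative recursive construction; not taken from the source.
    """
    if k == 1:
        return np.array([[1.0, -1.0]])
    prev = max_separation_matrix(k - 1)             # shape (k-1, k)
    top = np.concatenate([[1.0], -np.ones(k) / k])  # first row, shape (k+1,)
    scale = np.sqrt(1.0 - 1.0 / k**2)
    bottom = np.concatenate([np.zeros((k - 1, 1)), scale * prev], axis=1)
    return np.vstack([top, bottom])                 # shape (k, k+1)


# Example: 10 classes need a 9 x 10 matrix.
P = max_separation_matrix(9)
features = np.random.default_rng(0).normal(size=(4, 9))  # stand-in network outputs
logits = features @ P                                    # fixed, non-trainable projection
print(logits.shape)
```

In this sketch a network for C classes would output C-1 dimensions, and the fixed matrix maps them to C logits before the softmax, so no extra parameters are trained.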
The main observation behind our approach is that separation does not require optimization but can be solved in closed-form prior to training and plugged into a network. We outline a recursive approach to obtain the matrix consisting of maximally separable vectors for any number of classes, which can be added with negligible engineering effort and computational overhead. Despite its simple nature, this one matrix multiplication provides real impact. We show that our proposal directly boosts classification, long-tailed recognition, out-of-distribution detection, and open-set recognition, from CIFAR to ImageNet. Tejaswi Kasarla · Gertjan Burghouts · Max van Spengler · Elise van der Pol · Rita Cucchiara · Pascal Mettes 🔗 - CLOOME: Contrastive Learning for Molecule Representation with Microscopy Images and Chemical Structures (Poster) Contrastive learning for self-supervised representation learning has brought a strong improvement to many application areas, such as computer vision and natural language processing. With the availability of large collections of unlabeled data in vision and language, contrastive learning of language and image representations has shown impressive results. The contrastive learning methods CLIP and CLOOB have demonstrated that the learned representations are highly transferable to a large set of diverse tasks when trained on multi-modal data from two different domains. In drug discovery, similar large, multi-modal datasets comprising both cell-based microscopy images and chemical structures of molecules are available. However, contrastive learning has not yet been used for this type of multi-modal data in drug discovery, although transferable representations could be a remedy for the time-consuming and cost-expensive label acquisition in this domain. In this work, we present a contrastive learning method for image-based and structure-based representations of small molecules for drug discovery. 
Our method, Contrastive Leave One Out boost for Molecule Encoders (CLOOME), is based on CLOOB and comprises an encoder for microscopy data, an encoder for chemical structures and a contrastive learning objective. On the benchmark dataset "Cell Painting", we demonstrate the ability of our method to learn transferable representations by performing linear probing for activity prediction tasks. Additionally, we have shown that CLOOME allows retrieving the corresponding applied molecule given a query microscopy image, an unsolvable task for human experts. Ana Sánchez Fernández · Elisabeth Rumetshofer · Sepp Hochreiter · Günter Klambauer 🔗 - Boosting Multi-modal Contrastive Learning with Modern Hopfield Networks and InfoLOOB (Poster) CLIP yielded impressive results on zero-shot transfer learning tasks and is considered as a foundation model like BERT or GPT3. CLIP vision models that have a rich representation are pre-trained using the InfoNCE objective and natural language supervision before they are fine-tuned on particular tasks. Though CLIP excels at zero-shot transfer learning, it suffers from an explaining away problem, that is, it focuses on one or few features, while neglecting other relevant features. This problem is caused by insufficiently extracting the covariance structure in the original multi-modal data. We suggest to use modern Hopfield networks to tackle the problem of explaining away. Their retrieved embeddings have an enriched covariance structure derived from co-occurrences of features in the stored embeddings. However, modern Hopfield networks increase the saturation effect of the InfoNCE objective which hampers learning. We propose to use the InfoLOOB objective to mitigate this saturation effect. We introduce the novel “Contrastive Leave One Out Boost” (CLOOB), which uses modern Hopfield networks for covariance enrichment together with the InfoLOOB objective. 
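The difference between the two objectives named above can be sketched in a few lines. This is a simplified, one-directional NumPy illustration, assuming L2-normalized embeddings and a temperature `tau`; the function names are hypothetical and the full CLOOB objective (including the Hopfield retrieval step) is not reproduced here. The only change from InfoNCE to InfoLOOB in this sketch is that the positive pair is left out of the denominator.

```python
import numpy as np


def _log_sum_exp(a, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def info_nce(x, y, tau=0.1):
    """InfoNCE: the positive pair is part of the denominator."""
    s = x @ y.T / tau                      # pairwise similarities, shape (N, N)
    return float(np.mean(_log_sum_exp(s, axis=1) - np.diag(s)))


def info_loob(x, y, tau=0.1):
    """InfoLOOB ("leave one out"): the positive pair is excluded
    from the denominator, so only negatives are summed."""
    s = x @ y.T / tau
    negatives = np.where(np.eye(len(s), dtype=bool), -np.inf, s)
    return float(np.mean(_log_sum_exp(negatives, axis=1) - np.diag(s)))


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
x /= np.linalg.norm(x, axis=1, keepdims=True)
y = x + 0.1 * rng.normal(size=(8, 16))     # noisy "paired" embeddings
y /= np.linalg.norm(y, axis=1, keepdims=True)
print(info_nce(x, y), info_loob(x, y))
```

Because the InfoNCE term is bounded below by zero, it flattens out once positives dominate the denominator, whereas the leave-one-out variant has no such floor; this is one way to read the saturation effect the abstract refers to.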
In experiments we compare CLOOB to CLIP after pre-training on the Conceptual Captions and the YFCC dataset with respect to their zero-shot transfer learning performance on other datasets. CLOOB consistently outperforms CLIP at zero-shot transfer learning across all considered architectures and datasets. Andreas Fürst · Elisabeth Rumetshofer · Johannes Lehner · Viet T. Tran · Fei Tang · Hubert Ramsauer · David Kreil · Michael Kopp · Günter Klambauer · Angela Bitto · Sepp Hochreiter 🔗 - Global-based Deep Q-Network for Molecule Generation (Poster) In this work, we propose the Global-based Deep Q-Network for Molecule Generation (mol-GDQN). Explicitly, the proposed mol-GDQN designs a novel molecular drug by iteratively adding or deleting atoms, bonds or chemical fragments from a lead molecule until reaching a molecular structure with all target properties. We formulate the drug design problem as a Markov Decision Process (MDP), where the RL-agent is defined as the molecule generator. Instead of defining the agent's observations using only local information, we pass a global observation of the molecule as input to the agent at each iteration. We define the global observations of the dynamically changing molecule using a novel GNN variant that is based on a global message passing schema. According to the Markov property, the global observations of the molecule make the agent's actions independent of the previous molecule states, contrary to the local observations. Therefore, when receiving global observations of the molecule, the defined RL-agent takes better actions, leading to greater property improvement compared to the local-based approaches. Using the proposed global-based GNN variant, we further define a global observation of the original lead molecule and pass it as additional input to the RL-agent. 
The obtained results showed that when the RL-agent receives a global observation of the lead molecule throughout the generation process, it is able to preserve the properties of the lead molecule in the newly generated molecules. Asmaa Rassil · Hiba Chougrad · Hamid Zouaki 🔗 - A Semantically Conditioned Code-Mixed Natural Language Generation for Task-Oriented Dialog (Poster) One of the most challenging problems in the area of Natural Language Processing and Artificial Intelligence is automatically generating coherent and understandable language for humans, also known as Natural Language Generation (NLG). It is also a crucial component in task-oriented dialog (TOD) systems. In TOD systems, the NLG module converts a dialog state represented in a semantic form into a natural language response. In many studies, deep learning-based models such as Convolutional Neural Networks (CNNs), Long Short-Term Memory Networks (LSTMs), and encoder-decoder transformer architectures have been widely used to map the dialog state to natural language. Most NLG research has focused on monolingual data, with a majority of the corpora in the English language. However, in several multilingual regions of the world, such as India, it is natural for speakers to produce utterances and responses which are code-mixed. So NLG systems must be trained to deliver a code-mixed multilingual output. In our work, we propose a semantically conditioned IndicBART (SC-IndicBART) for code-mixed languages and evaluate it using the existing SOTA NLG models. Suman Dowlagar · Radhika Mamidi 🔗 - Multi-task Representation Learning for Renewable Energy Systems (Poster) The alarming rise in global surface temperatures calls for an urgent shift towards renewable sources for power generation. 
According to the 2021 global road map for Sustainable Development Goal 7, i.e., affordable and clean energy, and the Paris Agreement on climate change, there is a need to triple the global renewable power capacity by 2030 and reach net zero emissions by 2050. This increase in capacity also requires an improvement in the accuracy of renewable power forecasting. A reliable power forecast is essential to reduce operational costs and improve the power grid’s safety and maintenance. Renewable power generation forecasts using machine learning are typically implemented as single-task learning (STL) models, where a separate model is trained for each solar or wind park. In recent years, transfer learning has been gaining popularity in these systems, as it can be used to transfer the knowledge gained from source parks to a target park. However, in transfer learning, there is a need to determine the most similar source park(s) among the existing parks. This similarity determination using historical power measurements is challenging when the target park has limited to no historical data samples. Therefore, we propose a simple multi-task learning (MTL) architecture that initially learns a common representation of input weather features among the source tasks using a Unified Autoencoder (UAE) and then learns the task-specific information utilizing a Task Embedding (TE) layer in a neural network. Chandana Priya Nivarthi · Stephan Vogt · Bernhard Sick 🔗 - Modern Hopfield Networks for Iterative Learning on Tabular Data (Poster) While Deep Learning excels in structured data as encountered in vision and natural language processing, it has failed to meet expectations on tabular data. For tabular data, Support Vector Machines (SVMs), Random Forests, and Gradient Boosting are the best performing techniques, with Gradient Boosting in the lead. 
Recently, we have seen a surge of Deep Learning methods that were tailored to tabular data but still underperform compared to Gradient Boosting on small-sized datasets. We suggest "Hopular", a novel Deep Learning architecture for medium- and small-sized datasets, where each layer is equipped with continuous modern Hopfield networks. The modern Hopfield networks use stored data to identify feature-feature, feature-target, and sample-sample dependencies. Hopular's novelty is that every layer can directly access the original input as well as the whole training set via stored data in the Hopfield networks. Therefore, Hopular can step-wise update its current model and the resulting prediction at every layer like standard iterative learning algorithms. In experiments on small-sized tabular datasets with fewer than 1,000 samples, Hopular surpasses Gradient Boosting, Random Forests, SVMs, and in particular several Deep Learning methods. In experiments on medium-sized tabular data with about 10,000 samples, Hopular outperforms XGBoost, CatBoost, LightGBM and a state-of-the-art Deep Learning method designed for tabular data. Thus, Hopular is a strong alternative to these methods on tabular data. Bernhard Schäfl · Lukas Gruber · Angela Bitto · Sepp Hochreiter 🔗 - Delirium Prediction using Long Short-Term Memory (LSTM) in the Electronic Health Record (Poster) Delirium is an acute decline in cognitive function leading to confusion, which occurs in 29%–65% of hospitalized elderly patients. Previous studies have applied machine learning to predict delirium; however, existing models do not account for temporal data. We propose a method to capture temporal correlations using an LSTM-based model to predict new-onset delirium. We extracted data for all adult patients who had a CAM assessment between January 1, 2018 and October 1, 2021 at Vanderbilt University Medical Center. We developed a deep learning model with 2 parts: a fixed-length LSTM-based model and a machine learning model. 
We compared its performance with machine-learning-only models (logistic regression, random forest, support vector machine, neural network, and LightGBM). We calculated SHapley Additive exPlanations to gauge the feature impact. A total of 331,489 records from 34,035 patients (896 features) were included. The LSTM-based deep learning model achieved an AUC of 0.952 [0.950, 0.955] and F1 of 0.759 [0.755, 0.765], which showed a significant improvement compared to using machine learning only (p=.001). Leveraging LSTM to develop a deep learning model to capture temporal trends can significantly improve the prediction of new-onset delirium, providing an algorithmic basis for the subsequent development of clinical decision support tools for proactive delirium interventions. Siru Liu · Joseph Schlesinger · Allison McCoy · Thomas Reese · Bryan Steitz · Elise Russo · Adam Wright 🔗 - Evaluating and Improving Robustness of Self-Supervised Representations to Spurious Correlations (Poster) Recent empirical studies have found inductive biases in supervised learning toward simple features that may be spuriously correlated with the label, resulting in suboptimal performance on minority subgroups. Despite the growing popularity of methods which learn representations from unlabeled data, it is unclear how potential spurious features may be manifested in the learnt representations. In this work, we explore whether recent Self-Supervised Learning (SSL) methods would produce representations which exhibit similar behaviors under spurious correlation. First, we show that classical approaches in combating spurious correlations, such as dataset re-sampling during SSL, do not consistently lead to invariant representation. Second, we find that spurious information is represented disproportionately heavily in the later layers of the encoder. 
Motivated by these findings, we propose a method to remove spurious information from these representations during pre-training, by pruning or re-initializing later layers of the encoder. We find that our method produces representations which outperform the baseline on three datasets, without the need for group or label information during SSL. Kimia Hamidieh · Haoran Zhang · Marzyeh Ghassemi 🔗 - FACTORS INFLUENCING POSTGRADUATE STUDENTS' ACADEMIC PERFORMANCE: MACHINE LEARNING APPROACH. (Poster) An important determinant of a good tertiary institution is the academic performance of its students and the results produced. Research has revealed several factors that contribute to the academic performance of students. This research therefore used the available factors to develop a framework that decision-makers can use to predict the academic performance of prospective postgraduate students. The study aims to develop a model that predicts postgraduate students’ performance using decision tree algorithms. The dataset used in this study was obtained from the postgraduate school of the University of Ibadan and its Computer Science department. The datasets were adequately pre-processed and major attributes contributing to the prediction of postgraduate students' performance were determined using seven (7) different ranking evaluators. RandomTree, RepTree and J48 decision tree algorithms were applied to the pre-processed dataset, and rules were generated from the optimal algorithm. The results indicated the best attributes for predicting postgraduate students’ performance. They also showed that students from Federal undergraduate schools were more likely to finish their postgraduate course with a Ph.D. grade than their counterparts from state and private schools. 
The J48 algorithm proved to be a better predictor for an imbalanced dataset, while the RandomTree algorithm outperformed J48 on a balanced dataset. To improve the predictive ability of this model, more data on prospective students should be collected, including students’ sociological background, personality, expectations, and previous academic performance. Other data mining methods such as Naïve Bayes, Neural Networks, and Support Vector Machines can be trained on the datasets and their results compared for better predictive models. Ayodele Awokoya 🔗 - A Simple Phoneme-based Error Simulator for ASR Error Correction (Poster) Despite the recent advances brought by deep neural networks, the real-world applications of Automatic Speech Recognition (ASR) inevitably suffer from various errors mostly caused by incorrectly captured phonetic features. This is of particular consequence in our work, which involves the transcription of real patient clinical conversations. In this work, we aim to fix noisy off-the-shelf ASR transcriptions in low-data settings by building a simple phoneme-based error simulator that can generate large amounts of training data for post-editing ASR error correction systems. To demonstrate the efficacy of our simulated errors, we conduct experiments with different error correction architectures: our own multi-task trained dual-decoder transformer model that performs both error detection and error correction, and two state-of-the-art grammatical error correction models. All these models improved in performance (by 0.3-1.4% WER) when pretrained on our simulated errors. Also, increasing the amount of simulated data in pretraining, from 0 to 1x and 10x the size of LibriSpeech, improves performance in the error correction task, regardless of the model structure. We are currently working to develop more domain-specific data to further improve transcriptions in clinical settings. 
Mohita Chowdhury · Oliver Gardiner · Yishu Miao 🔗 - Deep Learning methods for biotic and abiotic stresses detection in fruits and vegetables: state of the art and perspectives (Poster) Deep Learning (DL), a type of Machine Learning, has gained significant interest in many fields, including agriculture. This paper aims to shed light on deep learning techniques used in agriculture for abiotic and biotic stresses detection in fruits and vegetables, their benefits, and the challenges faced by users. Scientific papers were collected from Web of Science, Scopus, Google Scholar, Springer, and the Directory of Open Access Journals (DOAJ) using combinations of specific keywords such as: ‘Deep Learning’ OR ‘Artificial Intelligence’ in combination with ‘fruit disease’, ‘vegetable disease’, ‘fruit stress’, OR ‘vegetable stress’ following PRISMA guidelines. From the initial 818 papers identified using the keywords, 132 were reviewed after excluding books, reviews, and irrelevant papers. The recovered scientific papers were from 2003 to 2022; ninety-three percent of them addressed biotic stress on fruits and vegetables. The most common biotic stresses on species are fungal diseases (grey spots, brown spots, black spots, downy mildew, powdery mildew, and anthracnose). Few studies were interested in abiotic stresses (nutrient deficiency, water stress, light intensity, and heavy metal contamination). Deep Learning and Convolutional Neural Network were the most used keywords, with GoogleNet (18.28%), ResNet50 (16.67%), and VGG16 (16.67%) the most used architectures. Fifty-two percent of the data used to compile these models come from the fields, followed by data obtained online. We provide the gaps and some perspectives from the reviewed papers. Precision problems due to unbalanced classes and the small size of some databases were analyzed. The results suggest that further work should be done to improve the performance of the models. 
SETON CALMETTE ARIANE HOUETOHOSSOU · Vinasetan Ratheil Houndji · Castro Hounmenou · Rachidatou SIKIROU · Romain Glèlè Kakaï 🔗 - Follow the Flow: An Affective Computing Interface for the On-Line Detection of Flow Mental State (Poster) Flow is a valuable mental state for achieving high sports performance, defined as an emotion with high valence and high arousal levels. However, a viable detection system that could provide information in real time has not yet been established. The work presented here aims to create an online flow detection framework. A supervised machine learning model will be trained to predict valence and arousal levels on existing databases and freshly collected physiological data. As a final result, the definition of the minimally expensive (both in terms of sensors and time) amount of data needed to predict a flow state will enable the creation of a real-time flow detection interface. Elena Sajno · G. Riva · Nicole Novielli 🔗 - Under-Counted Tensor Completion with Neural Network-based Side Information Learner (Poster) Under-counted data often arise in disciplines such as ecology and epidemiology. For example, in epidemiology, the cases of an infectious disease (e.g., COVID-19) may always be under-counted due to the existence of symptom-free patients and the lack of testing. Estimating the unobserved true counts from the under-counted data is therefore a well-motivated task. A recent work has addressed the under-counting effect in matrix-type count data by employing a Poisson-Binomial matrix completion model. The model also learns the probability of detecting a count via a linear function of some side information. The model was shown to be effective for a couple of matrix completion tasks. Nonetheless, there exist a number of challenges. First, the model cannot directly handle under-counted data represented by more than two aspects. 
Second, the linear function incorporated in the model is not general enough to capture any unknown nonlinear relationship that may occur between the side information and the probability of detection. Third, there is no theoretical understanding of the properties of such a Poisson-Binomial model for handling under-counted data. In this work, we address these challenges by proposing a tensor-completion based framework. Our model is based on low-rank Poisson tensor decomposition combined with a nonlinear function modeling the probability of detection. To learn the model parameters, we design a joint low-rank tensor completion and neural network learning algorithm. Furthermore, we derive theoretical conditions under which the model parameters can be recovered, leveraging the low-rank tensor structure and the similarity of the detection probabilities. Simulations and real-data experiments support our theoretical claims. Shahana Ibrahim · Xiao Fu · Rebecca Hutchinson · Eugene Seo 🔗 - Detecting State Changes in Dynamic Neuronal Networks (Poster) Our brain function depends on temporal modulation of neuronal networks. Neuronal developmental disorders (NDDs) such as autism spectrum disorder and Rett syndrome are associated with certain patterns of modulation. Our research goal is to identify state changes in neuronal activity associated with the onset of NDDs. Yiwei Gong · Sinead Williamson 🔗 - Learning to Defer in Ranking Systems (Poster) Information retrieval and ranking systems mediate access to information by directing users’ attention when they issue search queries. Rankings produced by such systems often have high real-world impact: for example, in an online recruitment context, a recruiter’s “attention” translates to economically consequential opportunities. Training robust and fair ranking and retrieval systems is hence important to ensure that attention is allocated fairly. 
In general, these systems work by predicting a “relevance” score indicating how relevant an item is to a query issued. In our work, we first inspect regions of low performance in existing algorithms for ranking and content recommendation that dominate commercial use of machine learning. We find that low performance in ranking quality is closely related to higher uncertainty in relevance scores. Our proposed framework then consists of a set of expert models and a defer-vs-no-defer prediction model. The expert models either utilize more information about the user (e.g., their demographic characteristics) or incorporate more context around the search query (e.g., explicitly asking for more words in the query) to improve performance. The deferral decision is made using uncertainty in ranking scores. The utility and fairness of our approach will be benchmarked in multiple setups: from ranking using past user interactions with information retrieval systems as well as online hiring and health information ranking settings. Aparna Balagopalan · Haoran Zhang · Elizabeth Bondi-Kelly · Thomas Hartvigsen · Marzyeh Ghassemi 🔗 - Categorizing Online Harassment on Twitter using Graph Convolutional Networks (Poster) Online platforms and social media are places where people increasingly express themselves freely. Twitter is one of the social media platforms that attract many daily users. When users can express themselves freely, various tones can be seen in their posts. Harassment is one of the consequences of these platforms. Text categorization and classification is a task that aims to solve this problem. Many studies applied classical machine learning methods and recent deep neural networks to categorize text. However, only a few studies have explored graph convolutional neural networks alongside classical approaches to categorize harassment in Tweets. In this work, we first propose using Graph Convolutional Networks (GCN) for the tweet categorization task. 
Second, we explore this categorization task using classical machine learning approaches and compare the results with the GCN model. Third, we show the effectiveness of the GCN model in performing this task by feeding half of the dataset to the model and still obtaining good performance, above 91%, for categorizing all different types. In addition, we used different embedding approaches to find the best representation for the dataset in each model. We used classical machine learning approaches, including Logistic Regression, Gaussian Naïve Bayes, Decision Trees, Random Forests, Linear Support Vector Machines, Gaussian SVM, Polynomial SVM, Multi-Layer Perceptron, and AdaBoost methods. Finally, we use a collection of English tweets as our dataset when running the experiments. We applied TF-IDF vectors and Word2Vec embeddings as features in these classical machine learning approaches. In our experiments with classical approaches, we achieved accuracies above 0.80 for detecting sexism and sexual harassment types in the data. Mozhgan saeidi 🔗 - A Domain-Oblivious Approach for Learning Concise Representations of Filtered Topological Spaces for Clustering (Poster) Persistence diagrams have been widely used to quantify the underlying features of filtered topological spaces in data visualization. In many applications, computing distances between diagrams is essential; however, computing these distances has been challenging due to the computational cost. In this paper, we propose a persistence diagram hashing framework that learns a binary code representation of persistence diagrams, which allows for fast computation of distances. This framework is built upon a generative adversarial network (GAN) with a diagram distance loss function to steer the learning process. Instead of using standard representations, we hash diagrams into binary codes, which have natural advantages in large-scale tasks. 
The training of this model is domain-oblivious in that it can be computed purely from synthetic, randomly created diagrams. As a consequence, our proposed method is directly applicable to various datasets without the need for retraining the model. These binary codes, when compared using fast Hamming distance, better maintain topological similarity properties between datasets than other vectorized representations. To evaluate this method, we apply our framework to the problem of diagram clustering and we compare the quality and performance of our approach to the state of the art. In addition, we show the scalability of our approach on a dataset with 10k persistence diagrams, which is not possible with current techniques. Moreover, our experimental results demonstrate that our method is significantly faster with the potential of less memory usage, while retaining comparable or better quality comparisons. This work has been published in VIS 2021 and TVCG. Yu Qin · Brittany Terese Fasy · Carola Wenk · Brian Summa 🔗 - A Recommendation System in Task-Oriented Doctor-Patient Interactions (Poster) The research on doctor-patient interactions is gaining popularity, thanks to the recent advances in deep learning models that can handle unconstrained input. Most systems developed for doctor-patient interactions employ a task-oriented dialog (TOD) system that solves a patient's particular task, such as diagnosis, monitoring, assistance, and counseling. A modern TOD system is based on frame-based architecture. In frame-based architecture, the frames consist of slots that are filled with values elicited from the user. The conversations between the system and the patient will flow towards completing the frame. The frame specific to clinical interactions mostly records the patient's demographic and medical history. However, the information recorded in a frame might be insufficient if the patient forgets to mention a few symptoms. 
Then how do we get the complete information from the user? In this paper, we present a novel task, symptom recommendation, whose goal is to automatically remind the patient of symptoms by learning directly from a doctor-patient conversational dataset. For this task, we gathered a real-life dataset with doctor-patient dialogs involving different medical specializations. We also experiment with multiple popular recommendation system (RS) models. Suman Dowlagar · Radhika Mamidi 🔗 - Explaining Predictive Uncertainty by Looking Back at Model Explanations (Poster) Predictive uncertainty estimation of pre-trained language models is an important measure of how much people can trust their predictions. However, little is known about what makes a model prediction uncertain. Explaining predictive uncertainty is an important complement to explaining prediction labels in helping users understand model decision making and gaining their trust in model predictions, but it has been largely ignored in prior work. In this work, we propose to explain the predictive uncertainty of pre-trained language models by extracting uncertain words from existing model explanations. We find that the uncertain words are those identified as making negative contributions to prediction labels, while actually explaining the predictive uncertainty. Experiments show that uncertainty explanations are indispensable to explaining models and helping humans understand model prediction behavior. Hanjie Chen · Wanyu Du · Yangfeng Ji 🔗 - DaME: Data Mapping Engine for Financial Services (Poster) In modern financial services, data engineers seek access to the end-to-end integrated data flow to make business decisions; this data otherwise exists in silos and hence cannot be used for effective analysis and inferencing. This work introduces a system called Data Mapping Engine (DaME) to realize the full potential across data sources and provide a comprehensive analysis of their relatedness. 
In the case of financial services, knowing the data mapping between contracts and invoices can provide insights into potential risks. However, based on client engagements, we have observed key challenges in data mapping across various industries. First, there is a Lack of Standardization: across multiple organizations within the same industry, different entity definitions are used for synonymous terms with similar meanings. Second, there is Manual Mapping and Maintenance: data engineers manually define mappings between data warehouses, which is prone to errors and counterproductive. Lastly, due to Lack of Governance, over time different departments may digress from the business processes that are essential for maintaining and monitoring contract data. Hence, there is a need for a trustful, automated data mapping engine that can connect the different data types across tables using natural language processing and Human-in-the-Loop (HIL). SHUBHI ASTHANA · Ruchi Mahindru 🔗 - Improved Text Classification via Test-Time Augmentation (Poster) Test-time augmentation (TTA), the aggregation of predictions across transformed examples of test inputs, is an established technique to improve the performance of image classification models. Importantly, TTA can be used to improve model performance post-hoc, without additional training. Although TTA can be applied to any data modality, it has seen limited adoption in NLP due in part to the difficulty of identifying label-preserving transformations. In this paper, we present augmentation policies that yield significant accuracy improvements with language models. A key finding is that augmentation policy design (for instance, the number of samples generated from a single, non-deterministic augmentation) has a considerable impact on the benefit of TTA. Experiments across a binary classification task and dataset show that test-time augmentation can deliver consistent improvements over current state-of-the-art approaches. 
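The aggregation step that defines test-time augmentation can be sketched in a few lines. The following is a minimal illustration only, not the authors' augmentation policies: `toy_classifier`, `random_word_dropout`, and `tta_predict` are hypothetical names, the word-dropout augmentation is an assumed stand-in for a label-preserving transformation, and mean-pooling of class probabilities is one common aggregation choice.

```python
import random


def toy_classifier(text):
    """Hypothetical stand-in for a trained model: returns [P(negative), P(positive)]."""
    words = text.lower().split()
    positive_words = {"good", "great", "fine"}
    hits = sum(w in positive_words for w in words)
    p_pos = (hits + 1) / (len(words) + 2)  # smoothed fraction of positive words
    return [1.0 - p_pos, p_pos]


def random_word_dropout(text, rng, drop_prob=0.2):
    """Assumed non-deterministic augmentation: drop each word with probability drop_prob."""
    words = text.split()
    kept = [w for w in words if rng.random() >= drop_prob]
    return " ".join(kept) if kept else text  # never return an empty input


def tta_predict(predict_proba, text, augment, n_samples=8, seed=0):
    """Average class probabilities over the original input and n_samples augmented copies."""
    rng = random.Random(seed)
    variants = [text] + [augment(text, rng) for _ in range(n_samples)]
    all_probs = [predict_proba(v) for v in variants]
    n_classes = len(all_probs[0])
    return [sum(p[c] for p in all_probs) / len(all_probs) for c in range(n_classes)]


if __name__ == "__main__":
    sentence = "the service was good and the food was great"
    print(tta_predict(toy_classifier, sentence, random_word_dropout))
```

Here `n_samples` plays the role of the number of samples drawn from a single non-deterministic augmentation, the design parameter the abstract highlights; no retraining of the classifier is involved.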
Helen Lu · Divya Shanmugam · Harini Suresh · John Guttag 🔗 - Model Interpretation based Sample Selection in Large-Scale Conversational Assistants (Poster) Natural language understanding (NLU) models are a core component of large-scale conversational assistants. Collecting training data for these models through manual annotations is slow and expensive, which impedes the pace of model improvement. We present a three-stage approach to address this challenge: First, we identify a large set of relatively infrequent utterances from live traffic where the users implicitly communicated satisfaction with a response (such as by not interrupting), along with the existing model outputs as candidate annotations. Second, we identify a small subset of these utterances using Integrated Gradients based importance scores computed with the current models. Finally, we augment our training sets with these utterances and retrain our models. We demonstrate the effectiveness of our approach in a large-scale conversational assistant, processing billions of utterances every week. By augmenting our training set with just 0.05 more utterances through our approach, we observe statistically significant improvements for infrequent tail utterances: a 1.03% reduction in NLU error rate in offline experiments, and a 1.23% reduction in defect rates in online A/B tests. Kiana Hajebi 🔗 - Human-AI Interaction in Selective Prediction Systems (Poster) Recent work has shown the potential benefit of selective prediction systems that can learn to defer to a human when the predictions of the AI are unreliable, particularly to improve the reliability of AI systems in high-stakes applications like healthcare or conservation. However, most prior work assumes that human behavior remains unchanged when humans solve a prediction task as part of a human-AI team as opposed to by themselves. We show that this is not the case by performing experiments to quantify human-AI interaction in the context of selective prediction. 
In particular, we study the impact of communicating different types of information to humans about the AI system's decision to defer. Using real-world conservation data and a selective prediction system that improves expected accuracy over that of the human or AI system working individually, we show that this messaging has a significant impact on the accuracy of human judgements. Our results study two components of the messaging strategy: 1) Whether humans are informed about the prediction of the AI system and 2) Whether they are informed about the decision of the selective prediction system to defer. By manipulating these messaging components, we show that it is possible to significantly boost human performance by informing the human of the decision to defer, but not revealing the prediction of the AI. We therefore show that it is vital to consider how the decision to defer is communicated to a human when designing selective prediction systems, and that the composite accuracy of a human-AI team must be carefully evaluated using a human-in-the-loop framework. Elizabeth Bondi-Kelly · Raphael Koster · Hannah Sheahan · Martin Chadwick · Yoram Bachrach · Taylan Cemgil · Ulrich Paquet · Krishnamurthy Dvijotham 🔗 - User-interactive, On-demand Cycle-GAN-Based Super Resolution and Focus Recovery on Whole Slide Images (WSI) (Poster) The whole slide image (WSI) systems recently approved by the FDA for primary diagnosis have opened up new possibilities for digital pathology to have an increased impact in clinical care and research. However, the process of scanning slides, including failures of autofocus algorithms, combined with data storage and management needs, represents a much more complex, expensive, and resource-intensive workflow than the simple glass slide and microscope workflow, which hinders adoption. 
This project aimed to make progress toward the whole slide image acquisition issues by 1) creating 'on-demand' resolution recovery algorithms that could generate high-resolution image outputs from low-resolution WSI image inputs in areas of user interest, and 2) automatically detecting and correcting regions of poor focus, eliminating the need for QC personnel and high re-scan rates. To address both of these goals, we used unsupervised Cycle-Consistent Adversarial Networks (Cycle-GAN), which are designed to work without paired training data. More specifically, we propose two networks, SR-CycleGAN and refocus-CycleGAN, to achieve up to 4-times super resolution and dynamic focus recovery in whole slide images without any a priori knowledge, and demonstrate the generalizability of the models across multiple tissue types. We also present a deployment pipeline for practical scenarios, where users can interactively choose ROIs for on-demand super-resolution, and focus quality can be automatically detected and corrected in WSIs. Huimin Zhuge · Brian Summa · J.Quincy Brown 🔗 - Self Supervised Learning in Microscopy (Poster) We use SSL as a tool to explore a public microscopy dataset consisting of multi-channel (3+), non-composite microscopy images, where each channel represents a different stain. We present an extensible pipeline to perform self-supervised training using a contrastive learning approach. The workflow involves three parts: (1) A pre-training phase, which allows us to run several state-of-the-art SSL algorithms such as SimCLR 2, MOCO, and PIRL and compare them on loss function, quality of embeddings under linear evaluation, and compute requirements (time to convergence and speed of convergence). (2) We then do full-tuning on the training dataset by deploying the model weights from the pre-training phase and evaluate the classification task for the labeled dataset. 
We learn some key insights and takeaways, including the amount of information in each image channel, channel-wise performance, class or siRNA relations, and well, plate and stain relations. (3) Finally, we perform large-scale visualization with domain experts and analyze any latent biological information that was detected by the SSL models. We conclude that SSL cannot replace the existing supervised and unsupervised learning algorithms for our dataset; instead, it can be used as a tool to expedite workflows and bolster the existing learning methodologies by providing a more meaningful starting point. However, SSL algorithms are heavily reliant on and constrained by the quantity and quality of data available. In spite of the limitations, they provide an efficient, reusable and cost-effective means to derive meaning out of unlabeled data without the intervention of a domain expert. Because SSL experimentation can be a time-consuming process, we would like to highlight the pros and cons of this learning paradigm based on our work and dataset, and to help future researchers make educated decisions on what dataset features and tasks make SSL more suitable. Aastha Jhunjhunwala · Siddha Ganju 🔗 - Machine Learning for the detection of diabetic retinopathy (Poster) Diabetic retinopathy (DR) is one of the most common causes of vision loss. Although DR is preventable and curable at an early stage, most diabetic patients are diagnosed very late because the clinical detection method can be tedious and time-consuming and may require highly technical analysis. Therefore, it is very important to detect diabetic retinopathy at an early stage to seek treatments and prevention measures. In this work, Machine Learning models based on a Convolutional Neural Network (CNN) and Random Forest were developed to detect retinopathy in diabetic patients. The dataset comprises 5 classes and 20,000 images obtained from Kaggle. 
The 5 disease severity levels were defined as: 0 for ‘no apparent retinopathy’, 1 for ‘mild’, 2 for ‘moderate’, 3 for ‘severe non-proliferative DR’, and 4 for ‘proliferative DR’. The dataset was divided into a training set and a testing set in a 70%/30% split. The training data is then fed into the model for training, and performance on the test set is compared with performance on the training set to check correctness. On this evaluation, the CNN has an accuracy of 73.44%, and Random Forest has an accuracy of 68.75%. A system integrating the CNN model (since it outperforms Random Forest) was also built. This work has narrowed the gap between clinical and machine learning methods for identifying diabetic retinopathy. The use of machine processes to diagnose diabetic retinopathy has given the medical sector a substantial improvement, and it is also helping to reduce the rate at which diabetic people lose their vision. Francisca Oladipo · Taiwo Amusan 🔗 - Topic: Building Identification In Aerial Imagery using Deep learning (Poster) Building identification is an important task for urban planning and settlement tracking, and can also help to supplement the limited data in developing countries where census data is inadequate and infrequent. Several Deep Learning architectures such as the Fully Convolutional Network (FCN), U-Net and DeepLab can be used to perform building identification in such scenarios where census data is limited, and have given promising results. However, most of these architectures have some drawbacks, such as poor edge detection, necessitating the use of very large training datasets, which in turn consumes a lot of computational resources. 
Additionally, there is a challenge when it comes to adapting these trained models to other domains, i.e., a model trained in one region performs poorly on other regions. With several highly performing semantic segmentation architectures being developed and published, comparison studies which help us choose the best architecture for a specific task are of vital importance. In this research, we carry out building identification using semantic segmentation to classify a given pixel as building or non-building. We use the diverse Inria aerial image labeling benchmark dataset (Maggiori, Emmanuel, et al., 2017). We intend to conduct a qualitative and quantitative comparative study of the semantic segmentation architectures that use encoder-decoder architecture, multitask learning, domain adaptation and architectures that use encoder-decoder architectures as their backbone. While comparative studies have been done (e.g., Hu, Junxing, et al., 2019), they are not exhaustive, as they consider specific architectures, for instance encoder-decoder, and do not cover newer architectures. In our work, we also look at other factors that affect model performance such as edge detection, the effect of hyperparameter tuning, and transfer learning. Proscovia Nakiranda 🔗 - Dynamic Head Pruning in Transformers (Poster) The transformer architecture has become a dominant paradigm in an overwhelming number of natural language processing (NLP) tasks. Its headline feature, the multi-head attention (MHA) mechanism, is remarkable at capturing pertinent relations within input sequences, but at the cost of high quadratic complexity in compute and memory. We address this by pruning attention heads. Despite existing work on this line of pruning, there is ambiguity as to whether there is an optimal strategy to create a pruned transformer model based on head importance. 
Our initial aim is to evaluate multiple pruning techniques to understand the aspects of a method that generally lead to a better trade-off in run-time speed and accuracy of the pruned model. A key constraint to note, however, is that due to the self-attention operation carried out in transformer heads, their importance is input dependent to a large extent. In the current design, heads that may be salient for particular inputs are permanently lost, which means that pruned models will rarely be able to restore original levels of accuracy. This prompts the question: can we dynamically determine the pruning configuration of the model based on inputs during run time? We try to achieve this by introducing a novel technique to carry out dynamic head pruning in transformers. Prisha Satwani · yiren zhao · Vidhi Lalchand · Robert Mullins 🔗 - Mobile-PDC: High-Accuracy Plant Disease Classification for Mobile Devices. (Poster) Cassava is a staple crop that is important for food safety in parts of Africa. A key challenge in growing the crop is that it is highly sensitive to diseases. Today, experts primarily diagnose these diseases by moving to different parts of the country while visually assessing the state of health of the crops, which is a cumbersome and erratic process. Nevertheless, state-of-the-art deep transfer learning models that can aid the automated diagnosis of these diseases exist. However, these models cannot be deployed on mobile devices because of the limited memory and computational capacity of these devices and there is not enough network coverage to service them from the cloud. 
To address this issue, we present knowledge distillation as a technique that can be used to build accurate plant disease classification models that are compatible with the capabilities of mobile devices. We train new Mobile-PDC Plant Disease Classification models that have the same classification accuracy as state-of-the-art PDC models, but are much smaller in size and fit on mobile devices. Our Mobile-PDC models have the MobileNet structure, which makes them compatible with multiple mobile devices. Our experiments demonstrate that we can compress 91.2% of the original state-of-the-art PDC models without losing accuracy. Samiiha Nalwooga · Henry Mutegeki 🔗 - CAM-GAN: Continual Adaptation Modules for Generative Adversarial Networks (Poster) We present a continual learning approach for generative adversarial networks (GANs), by designing and leveraging parameter-efficient feature map transformations. Our approach is based on learning a set of global and task-specific parameters. The global parameters are fixed across tasks whereas the task-specific parameters act as local adapters for each task, and help in efficiently obtaining task-specific feature maps. Moreover, we propose an element-wise addition of residual bias in the transformed feature space, which further helps stabilize GAN training in such settings. Our approach also leverages task similarities based on the Fisher information matrix. Leveraging this knowledge from previous tasks significantly improves the model performance. In addition, the similarity measure also helps reduce the parameter growth in continual adaptation and helps to learn a compact model. In contrast to the recent approaches for continually-learned GANs, the proposed approach provides a memory-efficient way to perform effective continual data generation. 
Through extensive experiments on challenging and diverse datasets, we show that the feature-map-transformation approach outperforms state-of-the-art methods for continually-learned GANs, with substantially fewer parameters. The proposed method generates high-quality samples that can also improve the generative-replay-based continual learning for discriminative tasks. Sakshi Varshney · Vinay Verma · Srijith PK · Piyush Rai · Lawrence Carin 🔗 - Modeling Sharing Time Of Fake And Real News (Poster) Viral spread of misinformation on social media is one of the largest threats to national security, and accurately predicting what impacts sharing time can help combat viral spread. This work aims to predict the time it takes users to share misinformation on social media. Survival analysis is a statistical analysis method aimed at predicting the time-to-event and is often used to predict survival time for patients from the moment of a disease diagnosis. It differs from other statistical methods in that it also considers the data where the event (death/sharing) never occurs (censored data). This work applies survival analysis to time to sharing of information and misinformation on social media, using a dataset gathered by Joy et al., 2021. The dataset contains the user-based, news-based, and network-based information related to the news item exposure, as well as the time taken for a user to share the news in cases where they shared it. We compare several survival analysis methods for predicting the time until a user shares with several baseline regression models using various performance metrics. We use survival time estimates from the best performing model to compare survival time for fake and real news, and finally compute the weighted importance of each covariate on the survival time for a user. 
Maya Zeng · Cooper Doe · Vladimir Knezevic · Francesca Spezzano · Liljana Babinkostova 🔗 - Interaction Classification with Key Actor Detection in Multi-Person Sport Videos (Poster) Interaction recognition from multi-person videos is a challenging yet essential task in computer vision. Often the videos depict actions with multiple actors involved, some of whom participate in the main event, while the rest are present in the scene without being part of the actual event. This paper proposes a model to tackle the problem of interaction recognition from multi-person videos. Our model consists of a Recurrent Neural Network (RNN) equipped with a time-varying attention mechanism. It receives scene features and localized actor features to predict the interaction class. Additionally, the attention model identifies the people responsible for the main event. We chose penalty classification from ice hockey broadcast videos as our application. These videos are multi-person and depict complex interactions between players in a non-laboratory recording setup. We evaluate our model on a new dataset of ice hockey penalty videos and report 93.93% classification accuracy. We include a qualitative analysis of the attention mechanism by visualizing the attention weights. Farzaneh Askari · Rohit Ramaprasad · James Clark · Martin Levine 🔗 - Estimating Fairness in the Absence of Ground-Truth Labels (Poster) In a post-deployment setting, in the absence of ground-truth labels and possibly in the presence of distribution shift, how might we estimate fairness performance? We focus on two main questions: first, we evaluate how existing performance estimation methods might extend to fairness metric estimation; and second, we show initial attempts at identifying a method which most effectively estimates fairness performance. 
For the first question, in addition to extending the implementations of existing methods, we determine criteria for how well these extensions work in a fairness context; for the second question, we apply these criteria to discuss how one method might work better than others. Michelle Bao · Jessica Dai · Keegan Hines · John Dickerson 🔗 - Motor Imagery ECoG Signal Classification With Optimal Selection Of Minimum Electrodes (Poster) Brain-Computer Interface (BCI), based on motor imagery, translates the motor intention into a control signal by classifying the electrophysiological patterns of different imagination tasks using ECoG, which can capture a broader range of frequencies showing better sensitivity and higher input quality than EEG. However, with ECoG being an invasive technique, there may be some utility in developing an ECoG bidirectional classifier that reduces the number of implanted electrodes. The present study aims to develop an ECoG signal classifier that achieves high accuracy with a limited number of electrodes. Tuga Yousif · Shubham Kumar · Ruoqi Huang 🔗 - A Noether's theorem for gradient flow: Continuous symmetries of the architecture and conserved quantities of gradient flow (Poster) The loss landscape of deep learning seems hopelessly complex. Most structures discovered in this landscape have been empirical observations. The loss landscape is determined by the model architecture and the dataset. Yet, the interplay between architecture and the structure of local minima is not well understood. We uncover a key part of this relation. First, we show that even nonlinear neural networks admit a large group of continuous symmetries which keep the loss invariant. These symmetries show that many local minima are valleys, having directions in the parameter space where the loss remains invariant. Additionally, we show that these symmetries imply certain quantities are conserved during gradient flow. 
We derive the explicit form of these conserved quantities using a Noether's theorem for gradient flow. These conserved quantities allow us to define coordinates along the valley of a local minimum. These symmetries can be used to create ensembles of trained models from a single trained model. Bo Zhao · Iordan Ganev · Robin Walters · Rose Yu · Nima Dehmamy 🔗 - Generating High-Quality Emotion Arcs Using Emotion Lexicons (Poster) Automatically generated emotion arcs that capture how an individual or a population feels towards a product or entity over time are widely used in industry and across research disciplines. However, there is little work on evaluating the generated arcs. This is in part due to the difficulty of establishing the true (gold) emotion arc. Our work, for the first time, systematically and quantitatively evaluates automatically generated emotion arcs. We also compare two common ways of generating emotion arcs: lexicon-based methods and machine learning models for sentiment analysis. Along the way, we systematically study the relationship between the quality of an emotion lexicon and the quality of the emotion arc that can be generated with it. We show that despite being markedly poor at the instance level, the lexicon-only method has extremely high predictive power when it comes to aggregating information from hundreds of instances and generating emotion arcs. This work has widespread implications for commercial development, as well as research in psychology, public health, digital humanities, etc. that values simple interpretable methods without the need for domain-specific training data, programming expertise, and high-carbon-footprint neural models. Daniela Teodorescu · Saif Mohammad 🔗 - Exposure Fairness in Music Recommendation (Poster) As recommender systems play a larger and larger role in our interactions with online content, the biases that plague these systems grow in their impact on our content consumption and creation. 
This work focuses on the mitigation of one such bias, popularity bias, as it relates to music recommendation. We formulate the problem of music recommendation as that of automatic playlist continuation. In order to harness the power of graph neural networks (GNNs), we define our recommendation space as a bipartite graph with songs and playlists as nodes and edges between them indicating a song being contained in a playlist. Then, we implement PinSage, a state-of-the-art graph-based recommender system, to perform link prediction. Finally, we integrate an individual fairness framework into the training regime of PinSage to learn fair representations which can be used to generate relevant recommendations. Rebecca Salganik · Fernando Diaz · Golnoosh Farnadi 🔗 - DeepWear: Towards an Automated Textiles Materials Classification using a Taxonomy-based ML Approach (Poster) Global sustainability has become an urgent call. Garments and textiles are ubiquitous in our daily lives; however, tons of garments end up in landfill every year. The fashion industry is undergoing a significant change to help textile materials to be reused, repaired and recycled in a sustainable manner. Textiles need to be traced back to their original forms so that recycling can be guaranteed. Yet, textiles are mostly sorted manually, as automatic identification of textile materials is challenging and we lack a low-cost and effective technique for identifying textiles. Our proposed model looks at this textile classification problem from a very different angle: we use simple, ubiquitous RGB images of textile garments. Shu Zhong · Miriam Ribul · Youngjun Cho · Marianna Obrist 🔗 - Revisiting Graph Neural Network Embeddings (Poster) Current attempts to improve the effectiveness of Graph Neural Networks (GNNs) work on pre-existing embedding datasets. 
The state of the art is determined based on these limited datasets without consideration of how these architectures perform on other embeddings using the same underlying dataset. Existing dataset embeddings rarely reflect rich features from the dataset and instead utilise pre-existing feature extraction methods. This means that the performance of the different models on these datasets does not always reflect the best performance that could be achieved on that dataset. Beyond this, when looking at different embeddings of the same underlying dataset, we see significant variation in the performance of the architectures. We explore new dataset embeddings to test existing GNNs on differing embeddings and propose methods of transfer learning and mixed network architectures to generalise current GNN classification to the underlying dataset, not the embeddings. These new techniques allow for existing powerful classification techniques to utilise context that exists between items in a dataset. Skye Purchase · yiren zhao · Robert Mullins 🔗 - Estimating the Treatment Effect of Antibiotics Exposure on the Risk of Developing Anti-Microbial Resistance (Poster) Incorporating the propensity of an antibiotic engendering resistance to itself and other antibiotics is a potentially useful strategy for preventing antimicrobial resistance (AMR). However, prospective studies have been difficult to generalize to outpatients and retrospective studies are prone to design errors and model misspecification. To address this gap, we apply causal inference with Targeted Maximum Likelihood Estimation using machine learning, to data from the Electronic Health Record to define the antibiotic use (Treatment) - resistance (Outcome) relationship for common outpatient therapies used to treat urinary tract infection (UTI). We used a total of 20 covariates (Confounder) to adjust for the confounding effect of treatment on outcome. 
By estimating the treatment effect of antibiotics on future resistance, we expect to derive clinical correlations which help build a decision support tool. Estimating the effect of antibiotic treatment will help clinicians design better care plans for patients by choosing the best antibiotics that minimize the risk of future AMR events. Hyewon Jeong · Kexin Yang · Ziming Wei · Yidan Ma · Intae Moon · Sanjat Kanjilal 🔗 - Can we explain Aha! moments in artificial agents? (Poster) During the learning process, a child develops a mental representation of the task he or she is learning. A Machine Learning algorithm also develops a latent representation of the task it learns. We investigate the development of the knowledge construction of an artificial agent through the analysis of its behavior, i.e., its sequences of moves while learning to perform the Tower of Hanoï (TOH) task. We position ourselves in the field of explainable reinforcement learning for developmental robotics, at the crossroads of cognitive modeling and explainable AI. Our main contribution proposes a 3-step methodology named Implicit Knowledge Extraction with eXplainable Artificial Intelligence (IKE-XAI) to extract the implicit knowledge, in the form of an automaton, encoded by an artificial agent during its learning. We showcase this technique to solve and explain the TOH task when researchers have access only to moves that represent observational behavior, as in human-machine interaction. Therefore, to extract the agent's acquired knowledge at different stages of its training, our approach combines: first, a Q-learning agent that learns to perform the TOH task; second, a trained recurrent neural network that encodes an implicit representation of the TOH task; and third, an XAI process using a post-hoc implicit rule extraction algorithm to extract finite state automata. We propose using graph representations as visual and explicit explanations of the behavior of the Q-learning agent. 
Our experiments show that the IKE-XAI approach helps in understanding the development of the Q-learning agent's behavior by providing a global explanation of its knowledge evolution during learning. IKE-XAI also allows researchers to identify the agent’s Aha! moment by determining from what moment the knowledge representation stabilizes and the agent no longer learns. Ikram Chraibi Kaadoud · Adrien Bennetot · Barbara Mawhin · Vicky Charisi · Natalia Diaz-Rodriguez 🔗 - Multimodal Deep Learning for Weapon Detection (Poster) See attached PDF. Parie Desai · Prajwal Saokar · William Wansing 🔗 - The Role of Expert-driven Prompt Engineering for Fine-grained Zero-shot Classification in Fashion (Poster) E-commerce fashion retailers face the increasingly challenging task of tracking fast-changing trends in delivering content-specific recommendations. A key differentiator among competitors in this space is the ability to identify fine-grained fashion attributes that distinguish these trends. Such fine-grained attributes go deeper than, e.g., the colors, fabrics, and sizes of clothing that are easily captured in retailers' catalogs, and involve nuanced attributes, such as the specific sleeve style of a garment, its intended occasion, or whether it belongs to a trending style. In order to classify these attributes via standard methods, we require large amounts of fine-grained, accurately labeled data, which is expensive and labor intensive to obtain. We use CLIP to carry out zero-shot classification for fine-grained fashion attributes, where text prompts represent the classes. We demonstrate that for fine-grained fashion attributes, expert-driven prompts deliver higher accuracy than coarse, naive prompts. Because CLIP depends on joint text-image similarity, we hypothesize that adding detailed fashion descriptions, like the captions included in the CLIP training set, leads to more well-defined embedding spaces and higher classification accuracy. 
Dhanashree Balaram · Matthew Nokleby · Thiyagarajan Ramanathan · Ajitesh Gupta Gupta · Ravi Kannan 🔗 - Explaining complex system of multivariate times series behavior (Poster) Complex systems represented by multivariate time series are ubiquitous in many applications, especially in industry. Understanding a complex system, its states and their evolution over time is a challenging task. This is due to the permanent change of contextual events internal and external to the system. We are interested in representing the evolution of a complex system in an intelligible and explainable way based on knowledge extraction. We propose XR-CSB (eXplainable Representation of Complex System Behavior) based on three steps: (i) a time series vertical clustering to detect system states, (ii) an explainable visual representation using unfolded finite-state automata and (iii) an explainable pre-modeling based on an enrichment via exploratory metrics. Four representations adapted to the expertise level of domain experts for acceptability issues are proposed. Experiments show that XR-CSB is scalable. Qualitative evaluation by experts of different expertise levels shows that XR-CSB meets their expectations in terms of explainability, intelligibility and acceptability. Ikram Chraibi Kaadoud · Lina Fahed · Tian Tian · Yannis Haralambous · Philippe Lenca 🔗 - Computational models of Language Variation in Literary Narratives (Poster) The availability of massive text datasets is promising for understanding linguistic variation on a large scale. Literary authors often have distinctive writing styles; this style changes over time, with the genre of the text, and even through the course of a single novel. This work aims to develop techniques that can model stylistic variation in character voices within literary texts. A non-trivial problem here is that of reliably attributing quotations within a novel to the characters that utter them, a task called quotation attribution. 
It is particularly challenging in literary texts because of the large amount of variation in narrative style and structure, and the lack of annotated datasets in this domain to train models. We have currently annotated a set of 25 full-length English language novels for various aspects of quotation and coreference within them, and this is by an order of magnitude the largest such dataset for literary texts. A preliminary stylometric classification model achieves an average accuracy of 0.60 on this dataset. We are currently working on improving this model with contextual features obtained using PLLMs that are fine-tuned for character identification and quotation attribution in a semi-supervised setup. The resulting quotation attribution model, when applied to a large-scale corpus of literary novels, can be used to analyse several questions of interest regarding the choices authors make when writing their characters, and how this varies based on the demographic characteristics of the author, across different decades, and across genres. Do female authors write female characters with more or less stylistic distinctiveness compared with male authors? Do certain authors write more “balanced” characters across the board? How does this change across decades and centuries? Our work has the potential to answer these questions in a data-driven manner, and shed light on various biases, implicit and explicit, that exist in the literary canon. Krishnapriya Vishnubhotla 🔗 - Graph Transformer Networks for Nuclear Proliferation Detection in Urban Environments (Poster) A network of sensors deployed in urban environments continuously monitors for the presence of radioactive isotopes, whether routine (e.g., medical procedures) or nefarious (e.g., nuclear proliferation). 
Unattended radiological sensor networks must take advantage of contextual data (open-source and historical sensor signals) to anticipate background isotope signatures across locations and sensors to mitigate nuisance alarms. In our approach, we develop novel graph transformer networks to predict radiological sensor and isotope alerts with signals extracted from historical time series and context from nearby radiation sources. Our Dynamic Graph Transformer (DGT) models exceed the basic capability of Graph Neural Networks to analyze patterns in graph structure over time and predict links between nodes (i.e., sensors, hospitals, and construction sites), by learning from dynamic, time-dependent relationships. We extend the pre-existing state-of-the-art dynamic graph models, TGN and RENet, by incorporating Transformers for radiological sensor signal modeling and develop three model architectures. First, DGT-Continuous learns complex relationships between nodes from a sequence of time-stamped edges, and outputs the predicted probability of future edges between nodes. DGT-Discrete learns from a series of graph snapshots representing the relationship between nodes in the previous 24 hours, and predicts the graph snapshot for the next interval. We have two variations of this model: DGT-D/G incorporates global context and DGT-D/GL incorporates local and global context. We pretrain the DGT models for use in two downstream tasks: (1) predicting the total number of alerts (regression) and (2) forecasting if an alert from each isotope and sensor will occur in the next time step. We leverage data collected from five sensors in Washington, DC between Oct. 2019 and Dec. 2020, and rely on traffic patterns between potential sources of radiation (hospitals and construction sites), to p Anastasiya Usenko · Yasanka Horawalavithana · Ellyn Ayton · Joon-Seok Kim · Svitlana Volkova 🔗 - Automated Staging of Breast Cancer Histopathology Images Using Deep Learning. 
(Poster) Cancer is the second deadliest disease in the US. Each year, breast cancer alone causes the deaths of over 47,000 people. A key tool in the diagnosis and classification of cancer lies in the realm of histopathology images, in which a trained specialist, known as a pathologist, examines chemically stained biopsy samples under high magnification to diagnose cancerous tissue. We have developed a model to automatically diagnose breast cancer images found on The Cancer Genome Atlas (TCGA) based solely on their digital pathology image. In addition to identifying high-risk patients, this model may help those with low-grade cancer avoid undergoing costly treatments and surgeries. The problem is formulated as a multiclass classification ranging from in situ to metastatic breast cancer stages (I, II, III, and IV) using H&E (Hematoxylin and Eosin) stained images. The model was developed using a pipeline as detailed in the full abstract. Each digital Whole Slide Image (WSI) is assigned to a training, validation or testing set. The image is then converted into the Hue, Saturation, and Value (HSV) colorspace, as the color of tissue regions lies primarily in the purple-blue region of the Hue spectrum. From these regions, 256x256 pixel tiles are sampled and passed through a VGG-16 model to determine whether they are from benign or malignant tissue. If a tile is classified as being cancerous, it is then passed through a ResNet-18 model for stage classification. Finally, the WSI is classified using the maximum probability of all the individual tiles extracted. WSIs contain numerous morphological indicators of cancer severity which can differentiate benign and aggressive cancers. Leveraging deep learning models to analyze WSIs will assist pathologists to better diagnose cancer and improve patient outcomes. 
Angela Crabtree · Narmada Naik · Kevin Matlock 🔗 - Gaussian Process parameterized Covariance Kernels for Non-stationary Regression (Poster) A large cross-section of Gaussian process literature uses universal kernels like the squared exponential (SE) kernel along with automatic relevance determination (ARD) in high dimensions. The ARD framework in covariance kernels operates by pruning away extraneous dimensions through contracting their inverse-lengthscales. This work considers probabilistic inference in the factorised Gibbs kernel (FGK) [Gibbs, 1998] and the multivariate Gibbs kernel (MGK) [Paciorek, 2003] with input-dependent lengthscales. These kernels allow for non-stationary modelling where samples from the posterior function space "adapt" to the varying smoothness structure inherent in the ground truth. We propose parameterizing the lengthscale function of the factorised and multivariate Gibbs covariance function with a latent Gaussian process defined on the same inputs. For large datasets, we show how these non-stationary constructions are compatible with sparse inducing variable formulations for regression. Experiments on synthetic and real-world spatial datasets for precipitation modelling and temperature trends demonstrate the feasibility and utility of the approach. Vidhi Lalchand · Talay Cheema · Laurence Aitchison · Carl Edward Rasmussen 🔗 - Heart Disease Prediction Using Machine Learning Techniques (Poster) Heart disease is one of the major life-threatening diseases in the world. As estimated by the World Health Organization report, about 17 million people die every year as a result of this disease, and it is projected to affect almost 23.6 million people by the year 2030. Heart disease refers to diseases in the heart and the blood vessels. Due to the importance and effect of this disease, early detection is crucial to reducing its effect on mankind. 
The expense and unavailability of healthcare in several places has been a major bottleneck in tackling this disease. This work adopts four machine learning algorithms, namely K-Nearest Neighbor, Decision Trees, Naïve Bayes, and Logistic Regression, for the prediction of heart disease. Five different datasets from the UCI Machine Learning Repository, with a total of 1190 records and 11 features, were used; 272 duplicate records were removed during data cleaning. The records had 79% males with ages from 28 to 77 and 21% females with ages from 30 to 76. The dataset was split using a Stratified K-Fold over 5 folds for cross-validation and the accuracy of every fold was calculated using the area under the Receiver Operating Characteristic curve. The model was implemented in the Python programming language because of its vast and easy-to-use libraries. The result of the developed system is a classification of the output into the absence of heart disease (0) and the presence of heart disease (1). It is available on https://hallp.herokuapp.com. The developed system showed that KNN had the highest accuracy of 91.6%, followed by Naïve Bayes and Logistic Regression with 88% and Decision Trees with 77.2% accuracy. The developed system supports faster early prediction of heart disease because it computes quickly, is always available, and is stress-free to use. Asegunloluwa Babalola · Tekena Solomon 🔗 - Multi-group Reinforcement Learning for Electrolyte Repletion (Poster) Most off-policy reinforcement learning methods for specifying treatment policies in EHR data have a heterogeneous patient population as well as different complications that are generally not considered in identifying optimal treatment policies, as patient subgroups are hard to model. 
In this work, we use multi-group Gaussian process regression in a fitted Q-iteration framework to model diverse patient subgroups and adapt the optimal policies in a personalized manner as we approximate these functions across the full patient population. We apply our multi-group reinforcement learning (MGRL) model in specifying optimal treatment policies in recommending electrolyte repletion to ICU patients with several comorbidities in different ethnic groups. When utilized in clinical settings, we show that these policies learn interpretable differences in the datasets for the distinct patient subgroups. Promise Ekpo · Barbara Engelhardt 🔗 - Trading off Utility, Informativeness, and Complexity in Emergent Communication (Poster) Emergent communication research often focuses on optimizing task-specific utility as a driver for communication. However, human languages appear to evolve under pressure to efficiently compress meanings into communication signals by optimizing the Information Bottleneck tradeoff between informativeness and complexity. In this work, we study how trading off these three factors (utility, informativeness, and complexity) shapes emergent communication, including how it compares to human communication. To this end, we propose Vector-Quantized Variational Information Bottleneck (VQ-VIB), a method for training neural agents to compress inputs into discrete signals embedded in a continuous space. We train agents via VQ-VIB and compare their performance to previously proposed neural architectures in grounded environments and in a Lewis reference game. Across all neural architectures and settings, taking into account communicative informativeness benefits communication convergence rates, and penalizing communicative complexity leads to human-like lexicon sizes while maintaining high utility. Additionally, we find that VQ-VIB outperforms other discrete communication methods. 
This work demonstrates how fundamental principles that are believed to characterize human language evolution may inform emergent communication in artificial agents. Mycal Tucker · Julie A Shah · Roger Levy · Noga Zaslavsky 🔗 - Mitigating Online Grooming with Federated Learning (Poster) The rise in screen time and the isolation brought by the different containment measures implemented during the COVID-19 pandemic have led to an alarming increase in cases of online grooming. Online grooming is defined as all the strategies used by predators to lure children into sexual exploitation. Previous attempts made on the detection of grooming in the industry and academia rely on accessing and monitoring users’ private conversations through the training of a model centrally or by sending personal conversations to a global server. We introduce a first, privacy-preserving, cross-device, federated learning framework for the early detection of sexual predators, which aims to ensure a safe online environment for children while respecting their privacy. Empirical evaluation on a real-world dataset indicates that the performance of our framework is as good as the performance of a centrally trained model. Khaoula Chehbouni · Gilles Caporossi · Reihaneh Rabbany · Martine De Cock · Golnoosh Farnadi 🔗 - Towards Private and Fair Federated Learning (Poster) Existing bias mitigation algorithms in machine learning (ML) based decision-making systems assume that the sensitive attributes of the user are available to a central entity. This violates the privacy of the users. Achieving fairness in Federated Learning (FL), which intends to protect the raw data of the users, is a challenge as the bias mitigation algorithms inherently require access to sensitive attributes. We work towards resolving the conflict of privacy and fairness by combining FL with Secure Multi-Party Computation and Differential Privacy. 
In our work, we propose methods to train group-fair models in cross-device FL under complete privacy guarantees. We demonstrate the effectiveness of our solution on two real-world datasets in achieving group fairness. Sikha Pentyala · Nicola Neophytou · Anderson Nascimento · Martine De Cock · Golnoosh Farnadi 🔗 - Generalized PTR: User-Friendly Recipes for Data-Adaptive Algorithms with Differential Privacy (Poster) The "Propose-Test-Release" (PTR) framework is a classic recipe for designing differentially private (DP) algorithms that are data-adaptive, i.e., those that add less noise when the input dataset is "nice". We extend PTR to a more general setting by privately testing data-dependent privacy losses, rather than local sensitivity, hence making it applicable beyond the standard noise-adding mechanisms, e.g. to queries with unbounded or undefined sensitivity. We demonstrate the versatility of generalized PTR using linear regression as a case study. Rachel Redberg · Yuqing Zhu · Yu-Xiang Wang 🔗 - Characteristics of White Helmets Disinformation vs COVID-19 Misinformation (Poster) Disinformation campaigns and misinformation hinder the very foundations of democracy and can negatively influence public opinion. It is critical to understand how they spread to enable mitigations to be developed. In this analysis, we contrast shared and unique misinformation spread patterns in different settings, comparing the rapid spreading of multiple misinformation narratives regarding the global COVID-19 pandemic to disinformation campaigns against a specific organization (White Helmets). We use two datasets for our analyses. The first is a Twitter dataset that was collected from March 7th to April 19th, 2020, to observe the early response to the COVID-19 pandemic and included 40 narratives, with 250,202 posts from 197,715 users. 
COVID-19 Misinformation has been spread through many narratives as we observe in this dataset: false cures, origin of the virus, weaponization of the virus, nature of the virus, emergency responses, etc. The second is a dataset collected from April 1st, 2018, to June 6th, 2019, that encompasses 48 unique narratives among two social media platforms – Twitter (167,017 posts from 56,679 users) and YouTube (15,567 posts from 9,176 users) – that target the reputation of the White Helmets (Syrian Civil Defense) organization. The organization has been a target of disinformation campaigns that have been launched against them in order to change public opinion about them. Anika Halappanavar · Maria Glenski 🔗 - Biomedical Word Sense Disambiguation with Contextualized Representation Learning (Poster) Contextualized word embedding has been shown to carry useful semantic information to improve the final results of various Natural Language Processing (NLP) tasks. However, it is still challenging to integrate these embeddings with the information of the knowledge base. This integration is helpful in NLP tasks, specifically in the lexical ambiguity problem. Word Sense Disambiguation (WSD) is one of the main problems in the core of the Natural Language Processing domain. Text representation is a critical component of all WSD models, which encodes the text and information to find the best meaning to disambiguate the text. Contextual embedding representations of words are shown to successfully encode all different meanings of a word. In this work, we propose a new embedding approach that considers not only the information from the context, but also the information from the knowledge base. We present C-KASE (Contextualized Knowledge base Aware Sense Embedding), a novel approach to producing sense embeddings for the lexical meanings within a lexical knowledge base that lies in a comparable space to that of contextualized word vectors. 
C-KASE representations enable a simple 1-Nearest-Neighbor algorithm to outperform state-of-the-art models in the English Word Sense Disambiguation task. Since this embedding is specified for each knowledge base, it also outperforms in other similar tasks, i.e., Wikification and Named Entity Recognition. In our experiments, we provide proper settings for the C-KASE representation to be comparable in both supervised and knowledge-based approaches. The results of comparing our method with current state-of-the-art methods show the efficiency of our method. Mozhgan Saeidi 🔗 - Model Understanding and Debugging at The Level of Subpopulation (Poster) Understanding machine learning (ML) models has been of paramount importance when making decisions with societal impacts such as transport control, financial activities, and medical diagnosis. While local explanation techniques are popular methods to interpret ML models on a single instance, they do not scale to the understanding of a model's behavior on the whole dataset. In this workshop, I want to present two papers we published recently about model understanding and debugging at the level of subpopulation. The first approach is an interactive visualization widget embedded in the Jupyter notebook environment to guide users to explore subpopulations where the local explanations (e.g., LIME, SHAP, etc.) tend to have the same patterns. Based on interactive clustering, users can select and create subpopulations for inspection in the user interface rendered in a notebook cell. Our widget enables flexible input and output. Besides conventional clicking or brushing selection, we allow users to create a subpopulation by calling a Python function to select instances in the interactive user interface. Users can also output intermediate analysis results as a DataFrame in Python for further inspection. In the second approach, we introduce an error analysis tool that helps people semantically understand errors in NLP models. 
This tool automatically discovers semantically-grounded subpopulations with high error rates in the context of a human-in-the-loop workflow. It enables model developers to learn more about their model errors through discovered subpopulations, validate the sources of errors through interactive analysis on the discovered subpopulations, and then test hypotheses about model errors by defining custom subpopulations. With the help of these tools, I believe model developers can gain a better understanding of their model behaviors, especially anomalous and error behaviors, so that they can develop actionable insights to further improve their models. Jun Yuan 🔗 - Fair Active learning by exploiting causal data structure (Poster) Machine learning (ML) is increasingly being used in high-stakes applications impacting society. Prior evidence suggests that these models may learn to rely on “shortcut” biases or spurious correlations. Therefore, it is important to ensure that ML models do not propagate biases found in training data. Further, collecting accurately labeled data can be very challenging and costly. In this work, we design algorithms for fair active learning that carefully select data points to be labeled by exploiting their underlying causal structure so as to balance model accuracy and fairness. We look into a pool-based setup, where the learner has access to a small pool of labeled and a large pool of unlabelled data, both of which have the same biased distribution. We look at two cases of confounding bias where (a) bias is available and (b) bias is unknown or unavailable. For each class, we try to sample from the interventional distribution to eliminate the effect of bias on the acquired data points. Exploiting the causal structure of the underlying data, the approach first involves expressing the interventional distribution as a simple weighted KDE to generate sampling weights. 
In each iteration, we generate weights for all labeled data samples and then batch sample unlabelled points from kernels centered on labeled samples with probability w_n, ensuring diversity of the collected samples. We compare our method against the popular active learning baselines based on (a) Uncertainty, (b) Density and (c) Diversity. We also compare our method against models that implicitly regularise for fairness while acquiring randomly or based on the entropy of the sample. We show that on the synthetically generated biased datasets, our method outperforms the baselines by a huge margin on unbiased test sets, implying that the model learned by acquiring actively based on the causal structure of the data is unbiased. We wish to further extend the results to large datasets and deep learning models. Sindhu C M Gowda · Haoran Zhang · Marzyeh Ghassemi 🔗 - Preference-Aware Constrained Multi-Objective Bayesian Optimization (Poster) Many optimization problems involve performing expensive simulations to evaluate the quality of the input values in terms of multiple objectives and the feasibility of the input values in terms of various constraints. Our goal is to approximate the optimal Pareto set over the small fraction of feasible input values by minimizing the number of simulations. The key scalability challenges include the huge design space, the large number of objectives and constraints, and the small fraction of feasible inputs, which can be identified only after performing expensive simulations. Additionally, in various cases, the practitioner prefers specific objectives over others. We propose a novel and efficient preference-aware constrained multi-objective Bayesian optimization approach referred to as PAC-MOO to address these challenges. 
The key idea is to learn surrogate models for both output objectives and constraints and select the candidate circuit for simulation in each iteration that maximizes the information gained about the optimal constrained Pareto front while factoring in the objective preferences. Our experiments on two real-world analog circuit design optimization problems demonstrate the efficacy of PAC-MOO over prior methods. Alaleh Ahmadianshalchi · Syrine Belakaria · Janardhan Rao Doppa 🔗 - Preliminary Study for Impact of Social Media Networks on Traffic Prediction (Poster) While smart cities have the required infrastructure for traffic prediction, underdeveloped (like in Latin America) cities lack the budget and technology to perform an accurate model. Social networks have been shown to predict online behavior and interactions, but their prediction capabilities are still unknown. The hypothesis of this work is that social networks can aid in predicting vehicular traffic when data is scarce due to a lack of resources. This paper proposes a method with social network analysis to aid in the lack of data due to the minimal amount of traffic sensors. The Twitter API was used to download a network of users that follow traffic update accounts and then, use a model of information diffusion (independent cascade model) to retrieve a variable that holds a metric of how the information regarding current traffic has traveled through the network. Finally, an updated traffic dataset with the new social network variable is used to train and test an LSTM neural network to show if the new variable can be a predictor for traffic. Results show that a deterministic independent cascade model ran on a New York City-based 2-tier social network marginally improved the prediction by 0.4%. This proposal will be replicated in other information diffusion models like Bass, stochastic Independent Cascade, and agent-based. 
Furthermore, the deep learning methodology will be extended to hold spatio-temporal variables. The main contributions to date of this ongoing work are: (1) a systematic literature review presenting a gap in novel traffic prediction methods for underdeveloped cities, (2) a preliminary study for traffic prediction in cities with ITS that cannot hold a significant amount of sensor data, and (3) a proposition of future research venues where this method can be applied. Valeria Laynes Fiascunari · Luis Rabelo 🔗 - Explanation-Guided Learning for Human-AI collaboration (Poster) Ensuring machines remain beneficial to humans requires that machine learning systems are still able to communicate their inner workings such that another observer can infer its reasoning and intent/s. This process, known as explainability, is crucial in helping shape our relationship with machine learning systems. Despite the advantages of existing approaches to implement explainability in machine learning systems and learn through more natural interactions with humans and other agents, current algorithms generally (1) are not evaluated in teamwork and human decision-making scenarios and (2) often require large numbers of examples on how to solve a task. These are both crucial aspects for humans to operate alongside machine learning systems, especially in interactive settings. To address the above-mentioned limitations, in our work we conducted three studies centered around first, understanding the role of explanations in human-machine teamwork, second, exploring human learning from intelligent systems using machine-generated explanations, and thirdly, incorporating human explanations into machine learning. In presenting our computational models around these aspects, we hope to advance our knowledge and understanding of different facets of explainable agency in machine learning and enable successful human-AI partnership and knowledge transfer. 
Silvia Tulli 🔗 - Trust Me Not: Trust Scoring for Continuous Model Monitoring (Poster) Continuous monitoring of trained ML models to determine when their predictions should and should not be trusted is essential for their safe deployment. Such a framework ought to be high-performing, explainable, post-hoc and actionable. We propose TRUST-LAPSE, a "mistrust" scoring framework for continuous model monitoring. We assess the trustworthiness of each input sample's model prediction using a sequence of latent-space embeddings. Specifically, (a) our latent-space mistrust score estimates mistrust using distance metrics (Mahalanobis distance) and similarity metrics (cosine similarity) in the latent-space and (b) our sequential mistrust score determines deviations in correlations over the sequence of past input representations in a non-parametric, sliding-window based algorithm for actionable continuous monitoring. We evaluate TRUST-LAPSE via two downstream tasks: (1) distributionally shifted input detection and (2) data drift detection, across diverse domains (audio and vision) using public datasets, and further benchmark our approach on challenging, real-world electroencephalograms (EEG) datasets for seizure detection. Our latent-space mistrust scores achieve state-of-the-art results with AUROCs of 84.1 (vision), 73.9 (audio), 77.1 (clinical EEGs), outperforming baselines by over 10 points. We expose critical failures in popular baselines that remain insensitive to input semantic content, rendering them unfit for real-world model monitoring. We show that our sequential mistrust scores achieve high drift detection rates: over 90% of data streams show < 20% error for all domains. Through extensive qualitative and quantitative evaluations, we show that our mistrust scores are flexible, robust and explainable for easy adoption into practice and can even quantify models’ generalization capabilities or lack thereof. 
Nandita Bhaskhar · Daniel Rubin · Christopher Lee-Messer 🔗 - Multispectral Masked Autoencoder for Remote Sensing Representation Learning (Poster) Remote sensing data plays an important role in monitoring global-scale challenges. To achieve automated analysis of it, learning useful features from the vast amount of unlabeled data is the key. Based on the unique characteristics of RS data (multispectrum, large resolution, dense object and complex background), we propose a multispectrum masked autoencoder framework to learn RS representation in a self-supervised way and verify its performance by transfer learning to a sense classification task, which achieves the best top-1 accuracy. Yibing Wei · Zhicheng Yang · Hang Zhou · Mei Han · Pedro Morgado · Jui-Hsin Lai 🔗 - Learning Pedestrian Behaviour for Autonomous Vehicle Interactions (Poster) Autonomous vehicles (AVs), also called "self-driving cars", that are appearing on the roads need a better understanding of pedestrians' social behaviour, especially in urban areas. Previous work showed that pedestrians may take advantage of autonomous vehicles by intentionally and constantly stepping in front of AVs, hence preventing them from making progress on the roads. This inability of current AVs to read the intention of other road users, predict their future behaviour and interact with them is known as "the big problem with self-driving cars". A comprehensive review of existing pedestrian models for AVs, ranging from low-level sensing, detection and tracking models to high-level interaction and game theoretic models of pedestrian behaviour, found that the lower-level models are accurate and mature enough to be deployed on AVs but more research is needed in the higher-level models. Hence, in this work, we focus on modelling, learning and operating pedestrian behaviour on self-driving cars. 
Game theory is a framework that has been widely used to model decision-making between rational agents, especially in economics and in multi-agent systems coordination. We here propose a game theory model, a discrete sequential model for negotiations between an AV and a pedestrian at an unsignalized intersection. To validate this model, we ran several experiments with human participants to infer the utility parameters using Gaussian process regression. We also learned from current pedestrian-vehicle interactions using a large-scale dataset from real-world human road crossings at an intersection. Moreover, we recently developed the first mathematical model of proxemics and trust concept for interactions between self-driving cars and pedestrians. We now plan to implement this model on OpenPodcar, a low-cost and open source autonomous vehicle research platform that we developed and that will be used for real-world tests. Fanta Camara 🔗 - Comparing neural population responses based on pairwise $p$-Wasserstein distance between topological signatures (Poster) Real-world data are often encoded in high-dimensional representations. Moreover, it is often unclear which coordinates and metrics can be meaningfully justified. Topological properties are well-suited for characterizing the structure of such high-dimensional point-cloud data: they are generalized to high-dimensional surfaces; they are also invariant under different coordinates and robust to the choice of metrics. Our work aims to compare point-clouds based on their topological properties and is motivated by emerging open problems in neuroscience to analyze the high-dimensional neural population response. A crucial gap in related works is that they have not considered how these neural population responses can be appropriately compared, which is key to understanding neural representations. We develop a topology-based approach and apply it to compare neural population responses in the mouse retina to different visual stimuli. 
We use nonlinear dimensionality reduction to obtain a lower-dimensional neural manifold of retinal ganglion cell population activity. Topological features are then extracted using persistent homology and represented as persistence diagrams. Finally, we compute the pairwise p-Wasserstein distance between these persistence diagrams. Our experiments show that in terms of topological structures, the neural population response to low-frequency gratings is significantly different from other types of flow stimuli, informing further neuroscientific investigations into this selective preference. Moreover, the p-Wasserstein distance induces a metric space of persistence diagrams where standard statistical objects are well-defined, allowing statistical inference on a distribution of persistence diagrams for the respective neural population responses, such as the expected diagram and the variance over the diagrams. The proposed approach can be used to compare neural population responses arising from a variety of artificial and biological neural networks. Liu Zhang · Fei Han · KELIN XIA 🔗 - Adversarial Analysis of Fake News Detectors (Poster) In recent years, machine learning models have been developed to mitigate the problem of fake news. dEFEND[2], a state-of-the-art natural language processing (NLP) model, uses news contents, comments, and the relation between the two to detect fake news. We aim to expose vulnerabilities in the model so that it can be strengthened against attempts to use manipulated data to mislead it. Attacks on fake news detection models are a growing concern and active area of research. One product of this is MALCOM[1], a GAN-based malicious comment generator that reportedly forces fake or real classifications with success rates upwards of 93%. MALCOM generates stylistically similar and topic-relevant comments to the input text, alleviating common problems with attacks on NLP models (e.g., producing nonsensical examples). 
However, it is possible to detect that these comments are computer-generated. We instead use real comments from the same dataset, so that they are indistinguishable from the rest. This approach aims to match Le et al.’s[1] results in a less complex and computationally expensive way. Using the FakeNewsNet dataset[3], we develop an attack by grouping articles and their preexisting comments into topics, then computing their similarity, or “distance,” from each other. Using this, we identify both generic and topic-specific comments that can sway dEFEND’s classification of an article. For comparison, we implement CopyCat, a baseline attack used by Le et al.[1] that “randomly retrieves a comment from a relevant article in the train set which has the target label.” Preliminary results show that our novel attack techniques outperform our implementation of CopyCat in most cases, as measured by attack success rate, i.e., the percentage of time an attack fools dEFEND into misclassifying an article. An ongoing area of research is creating a defense to mitigate these attacks, e.g., by filtering comments post-training, based on properties identified as being adversarial. Annat Koren · Hunter Ireland · Sandra Luo · Eryn Jagelski-Buchler · Edoardo Serra · Francesca Spezzano 🔗 - Fast Parameter Tuning for Rule-base Planners towards Human-like Driving (Poster) Selection of parameters decides behaviors of a planner in an autonomous driving system. This paper presents a learning-based framework that preserves the reliability and interpretability of rule-based planners while achieving human driving styles via selecting optimal parameters. The framework optimizes parameters of a planner to minimize the difference between human driving plans and autonomous driving plans. The difference is measured by a critic derived from human driving demonstrations via an inverse reinforcement learning inspired method. 
The automatically tuned planner achieves a human-like balance between fast and comfortable driving experiences compared to empirical parameters. The parameter tuning time is reduced by 95.25% on a parallel computing architecture compared to that of manual tuning. The merits of the learning-based critic for human-like driving and the extremely high efficiency allow the large-scale deployment of rule-based planners in autonomous driving. Shu Jiang · Szu-Hao Wu 🔗 - Model Averaging to Learn Bayesian Network Structures with Non-Linear Structured Representations (Poster) attached Charupriya Sharma 🔗 - Augmenting Driver Decision-Making Using Meta-Inverse Reinforcement Learning (Poster) Understanding human-driver decision making (such as acceleration or deceleration) in complex traffic environments is imperative for improving driver assistance systems and accelerating the development of autonomous vehicles. In many cases a human driver performs different tasks alongside driving simultaneously and the current real-world data might not be capturing all the possible driving decisions that a human driver could potentially undertake in certain traffic scenarios. Many situations are safety-critical and capturing the data in a naturalistic traffic environment or in a controlled experiment would be hard. It would also be hard to independently capture data on such driving scenarios due to the rarity of the situations in the existing datasets and the amount of data collection efforts that will be required to procure more data. Therefore, to be able to capture the rarer and unseen driving tasks and to be able to augment the existing information or data on the different types of driving decisions that a driver undertakes on-road, we propose to use a meta-inverse reinforcement learning-based approach. 
Mayuree Binjolkar · Yana Sosnovskaya 🔗 - Explaining black-box models in natural language through fuzzy linguistic summaries - Bipolar Disorder case study (Poster) Combining several methods such as neural networks, explainable AI, and fuzzy linguistics using data from patients with bipolar disorder, we can obtain really interesting results. Our current work called PLENARY (explaining bLack-box models in Natural Language thRough fuzzY linguistic summaries) could help to comprehend how acoustic parameters could affect a patient's condition. Our predictive model is generated by neural networks and it is based on two levels of labels associated with the data. Explanations of that model are developed using Shapley Additive exPlanations (SHAP). The obtained results are visualizations of the acoustic parameters' impact on all types of labels. The last step is translating those explanation results into natural language using fuzzy linguistic summarization. Olga Kaminska · Katarzyna Kaczmarek-Majer 🔗 - Physics-Constrained Deep Learning for Climate Downscaling (Poster) The availability of reliable, high-resolution climate and weather data is important to inform long-term decisions on climate adaptation and mitigation and to guide rapid responses to extreme events. Forecasting models are limited by computational costs and therefore often predict quantities at a coarse spatial resolution. Statistical downscaling can provide an efficient method of upsampling low-resolution data. In this field, deep learning has been applied successfully, often using methods from the super-resolution domain in computer vision. Despite often achieving visually compelling results, such models often violate conservation laws when predicting physical variables. In order to conserve important physical quantities, we develop methods that guarantee physical constraints are satisfied by a deep downscaling model while also increasing their performance according to traditional metrics. 
We introduce two ways of constraining the network: A renormalization layer added to the end of the neural network and a successive approach that scales with increasing upsampling factors. We show the applicability of our methods across different popular architectures and upsampling factors using ERA5 reanalysis data. Paula Harder · Qidong Yang · Venkatesh Ramesh · Prasanna Sattigeri · Alex Hernandez-Garcia · Campbell Watson · Daniela Szwarcman · David Rolnick 🔗 - Graph Convolutional Neural Network-based Quality Assessment of Light Field Images (Poster) Unlike regular images that represent only light intensities, 4D Light Field images (LFI) carry information about the intensity of light in a scene, including the direction light rays are traveling in space. This allows for a richer representation of our world, but requires large amounts of data that need to be processed and compressed before being transmitted to the viewer. Since these techniques may introduce distortions, the design of Light Field Image Quality Assessment (LF-IQA) methods is essential. Most LF-IQA methods based on traditional Convolutional Neural networks (CNN) have limitations, i.e. they cannot increase the receptive field of a neuron-pixel to model non-local image features. To overcome this challenge, in this work, we propose a novel no-reference LF-IQA method which is based on Deep Graph Convolutional Neural Network (GCNN). To implement graphs, one of the biggest challenges is to prepare the input, i.e., keeping only the important nodes while reducing the computational cost. Another challenge is that every image generates a different-sized graph, which can become a problem for training. The third challenge is that, since a 4D LFI is represented by a 2D-plane plenoptic function, multiple 2D representations can be used to generate graphs. It is important to only incorporate the right representation of a LFI that helps converge the network well. 
In this proposal, we intend to investigate all of the challenges mentioned above in terms of solutions. Our method not only takes into account both LF angular and spatial information, but also learns the order of pixel information. Specifically, the method is composed of one input layer that takes a pair of graphs and their corresponding subjective quality scores as labels, 4 GCNN layers, fully connected layers, and a regression block for quality prediction. Our aim is to develop the quality prediction method with maximum accuracy for distorted LF content. Sana Alamgeer 🔗 - Erased text retrieval from historical palimpsest manuscripts using deep autoregressive priors (Poster) Historical palimpsests are manuscripts that, at some point in time, were erased and overwritten with a newer text. Currently, there is significant interest in discovering and studying these erased texts since they can contain previously unknown written work. The value of discovering these previously unknown works in some of these documents can be compared to finding new archaeological sites, leading to new insights. In our work, we propose to use a Bayesian approach for reconstructing the erased text from the palimpsest manuscript by using multispectral imaging and an autoregressive generative network. We formulate the problem as a Blind Source Separation problem where the erased and foreground inks and parchment were mixed together by some unknown mixing process. An autoregressive network is used as a spatial prior for undertext script, and multispectral imaging allows the presentation of signals in different modalities to decrease ambiguity during the reconstruction. We assume that the erased text script can be identified from the unprocessed palimpsest, such that its counterpart can be found among the other old but “clean” manuscripts for the training of a generative network that would serve as a prior. 
The choice of using the autoregressive network is motivated by the fact that it has a dynamic scope of view, which is more suitable for continuous signals, such as handwriting, compared to other generative models, such as GANs, diffusion models, or score networks. Since the optimization process would happen directly in the pixel space rather than to hidden parameters, the usual deep learning algorithms, such as stochastic gradient descent, would be stuck in local minima before arriving at a meaningful solution. Therefore, for our problem, we apply annealed Langevin dynamics sampling with better convergence properties for non-convex problems. Anna Starynska · David Messinger 🔗 - Mask R-CNN model for banana diseases segmentation (Poster) Early detection of banana diseases is necessary to develop the effective control plans and minimize quality and financial losses. Fusarium Wilt Race 1 and Black Sigatoka diseases are among the most harmful banana diseases globally. In this study, we propose a model based on the Mask R-CNN architecture to effectively segment the damage of these two banana diseases. We also include a CNN model for classifying these diseases. We used an image dataset of 6000 banana leaves and stalks collected in the field. In our experiment, Mask R-CNN achieved a mean Average Precision of 0.04529, while CNN model achieved an accuracy of 33%. The Mask R-CNN model was able to accurately segment areas where the banana leaves and stalk were affected by Black Sigatoka and Fusarium Wilt Race 1 diseases in the image dataset. This model can assist farmers to take required measures for early controlling and minimizing the harmful effects of these diseases and rescue their yields. Neema Mduma · Christian A. 
Elinisa 🔗 - Security, IP protection, Privacy on Federated Learning and Machine Learning Edge Devices (Poster) Neural networks (NNs) on edge devices have experienced rapid adoption in many security-critical applications, including autonomous cars, facial recognition, surveillance, medical devices, drones, and robotics, making the associated security and privacy issues an urgent and severe concern. On the privacy side, for example, the leakage of a patient’s genomic information from medical devices, of users’ locations from autonomous cars, and of confidential information in smart cities and smart homes may result in substantial economic losses to data owners and endanger their lives in extreme cases. On the security side, if an autonomous car misclassifies a stop sign as an 80 km/h speed-limit sign, a crash may result. In facial and fingerprint recognition, an unauthorized person can gain access, and in skin cancer screening, skin lesion images can be misdiagnosed [1,2]. On the other hand, the unprecedented success of NNs is largely supported by the subsequent advances in specialized hardware (HW) and their usage in tackling data-intensive computational workloads [2,3]. Therefore, when considering the general concept of security and privacy in NNs, it is impossible to ignore that the HW itself is a key factor in the equation. Furthermore, the assumption that HW is trustworthy and that the security effort need only encompass networks and software (SW) is no longer valid, because attacks mounted on HW offer the adversary capabilities that bypass SW constraints. Therefore, in our research, we study all possible vulnerable spots and security threats of NN systems and develop novel HW solutions for designing trustworthy and secure NN systems. 
Mahdieh Grailoo 🔗 - SOIL MINERAL DEFICIENCY DETECTION USING A DEEP LEARNING ALGORITHM COMMONLY KNOWN AS CONVOLUTIONAL NEURAL NETWORKS (Poster) The agricultural sector is one of the leading economic sectors in Africa, providing employment to 60% of the population, about 70% of whom are women (Women, Agriculture and Work in Africa, 2018), making it a primary source of food and income for their families. It is also estimated that agriculture contributes 15% of Africa’s Gross Domestic Product. However, this contribution has kept on deteriorating in the last five years. Despite the agricultural growth and consequent improvement in Africa, plant yield from agriculture is still poor due to low nutrient content in the soil. Results from the FAO food trials show a decline in crop yields due to low content of nitrogen and phosphorus, which are among the fundamental nutrients. The yield deficit rose from 5% to 15% between 1975 and 2005, and this led to a rise in cases of malnutrition. According to FAO data for 2010, around 73% of the people lived on less than two dollars per day, almost 28% did not consume enough calories, and 24% of the children under five were underweight. Of 925 million hungry people in the world, 239 million lived in sub-Saharan Africa. Notably, 80% of arable land in Africa has low soil fertility, and significant amounts of nutrients are lost every year due to unsustainable soil management practices. Since no efforts have been made to avert the nutrient crisis in Africa and farmers still find it difficult to identify the nutrient content of their land, I propose SoMitLab, a simple and novel solution that will enable farmers to detect the mineral content of their soil, mainly nitrogen, phosphorus and potassium, since these are the fundamental nutrients needed by almost all plants to produce the best yields in their gardens. 
Results will be delivered in real time using a smartphone with an attached sampling device. The work will be done in two phases: the first covers soil sample extraction, preparation and image capture, while the second is mainly image analysis and reporting. Jean Amukwatse 🔗 - P53 in Ovarian Cancer: Heterogenous Analysis of KeyBERT, BERTopic, PyCaret and LDAs methods (Poster) In recent times, researchers with computational backgrounds have found it easier to relate to Artificial Intelligence by advancing the transformer model and unstructured medical data. This paper explores the heterogeneity of KeyBERT, BERTopic, PyCaret and LDAs as key phrase generators and topic model extractors, with P53 in ovarian cancer as a use case. PubMed abstracts on mutant p53 were first extracted with the Entrez-global database and then preprocessed with regex. KeyBERT was used to extract keyphrases, and BERTopic modelling was used for removing the related themes. PyCaret was further used for unigram topics and LDAs for examining the interaction among the topics in the word corpus. Lastly, the Jaccard similarity index was used to check the similarity among the four methods. The results showed that no relationship exists with KeyBERT, which had a score of 0.0, while a relationship exists among the three other topic models, with scores of 0.095, 0.235, 0.4 and 0.111. Based on the result, it was observed that keywords, keyphrases, similar topics, and entities embedded in the data could be used in a closely related framework, which can give insights into medical data for modelling. Mary Adewunmi · Richard Oveh · Christopher Yeboah · Solomon Olorundare · Ezeobi Peace 🔗 - Graph-Transformer for Cross-lingual Plagiarism Detection (Poster) The vast amount of multilingual textual data on the internet has led to cross-lingual plagiarism, which has become a severe problem in areas such as education, literature, and science. Cross-lingual plagiarism refers to plagiarism by translation. 
It is plagiarism where the source text is in one language while the plagiarized text is in another. Because of the increasing menace to the academic world from this kind of plagiarism, it has become crucial to find techniques to detect cross-lingual plagiarism. Current approaches use different methods to estimate the similarities; they usually employ syntactic and lexical properties, external Machine Translation systems, or similarities with a multilingual set of documents. However, most of these methods are conceived for literal plagiarism, such as copy and paste, and their performance is diminished when handling complex cases of plagiarism, including paraphrasing. In this work, we propose a graph-based approach that represents text fragments in different languages using multilingual knowledge graphs. An effective way to represent knowledge graphs is by using Graph Neural Networks, which usually compute the representation of each node based on its neighboring nodes. However, this local propagation interrupts efficient global communication, which becomes problematic at larger graph sizes. To address this limitation, we put forward a new graph representation method based on the Transformer architecture that uses explicit relation encoding and provides a more efficient way to build global graph representations. Experimental results in Arabic–English, French–English, and Spanish–English plagiarism detection indicate that our graph transformer approach outperforms the state-of-the-art cross-lingual plagiarism detection approaches. Moreover, it proves effective in paraphrasing plagiarism cases and provides exciting insights on the use of knowledge graphs in a language-independent model. Oumaima Hourrane 🔗 - Deep Multi-Modal Structural Equations For Causal Effect Estimation With Unstructured Proxies (Poster) Estimating the effect of an intervention while accounting for confounding variables is a key task in causal inference. 
Oftentimes, the confounders are unobserved, but we have access to large amounts of unstructured data (images, text) that contain valuable proxy signal about the missing confounders. This paper demonstrates that leveraging unstructured data that is often left unused by existing algorithms improves the accuracy of causal effect estimation. Specifically, we introduce deep multi-modal structural equations, a generative model in which confounders are latent variables and unstructured data are proxy variables. This model supports multiple multi-modal proxies (images, text) as well as missing data. We empirically demonstrate on tasks in genomics and healthcare that our approach corrects for confounding using unstructured inputs, potentially enabling the use of large amounts of data that were previously not used in causal inference. Shachi Deshpande · Kaiwen Wang · Dhruv Sreenivas · Zheng Li · Volodymyr Kuleshov 🔗 - A recommendation system for technology intelligence based on multiplex networks (Poster) In order to identify technological opportunities and threats that may affect the future development and survival of companies, the scientific experts of these companies must scan and monitor developments in the external environment using a structured process called "technology intelligence." Technology intelligence is the collection and delivery of technological information as part of a structured process through which an organization/company develops awareness of technological threats and opportunities for its experts. This includes the experts' constant search for key information from the Internet, provided by various sources, to stay abreast of current developments in their area of interest and remain competitive. This information, which comes from data sources on the Internet, is stored as structured "documents." Therefore, one problem that Technology Intelligence faces is the real-time recommendation of relevant documents to experts. 
In Technology Intelligence, all of this information extracted from the Internet is associated with various processing steps that must be permanently continued or repeated for each new subject of investigation in order to obtain valid results. Thus, the need for an efficient recommendation system on the crawled data becomes clear. Foutse Yuehgoh 🔗 - Pre-processing of Social Media Feeds based on Integrated Local Knowledge Base (Poster) Most of the previous studies on the semantic analysis of social media feeds have not considered the issue of ambiguity that is associated with slangs, abbreviations, and acronyms that are embedded in social media posts. These noisy terms have implicit meanings and form part of the rich semantic context that must be analysed to gain complete insights from social media feeds. This paper proposes an improved framework for pre-processing of social media feeds for better performance. To do this, the use of an integrated knowledge base (ikb) which comprises a local knowledge source (naijalingo), urban dictionary and internet slang was combined with the adapted Lesk algorithm to facilitate semantic analysis of social media feeds to resolve the ambiguity in the usage of slangs/acronyms/abbreviations. Experimental results showed that the proposed approach performed better than existing methods when it was tested on three machine learning models, which are support vector machines, multilayer perceptron, and convolutional neural networks. The framework had an accuracy of 94.07% on a standardized dataset, and 99.78% on localised dataset when used to extract sentiments from tweets. The improved performance on the localised dataset reveals the advantage of integrating the use of local knowledge sources into the process of resolving social media feeds particularly in handling slangs/acronyms/abbreviations that have contextually rooted meanings. 
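As a rough illustration of the pre-processing idea in the abstract above, the sketch below resolves an ambiguous slang term by choosing the sense whose gloss shares the most words with the surrounding post, which is the core of a simplified Lesk algorithm. The toy `KNOWLEDGE_BASE`, its entries, the stop-word list, and the function names are hypothetical stand-ins for illustration only; this is not the authors' adapted Lesk implementation or their integrated knowledge base.

```python
import re

# Hypothetical toy knowledge base: each slang term maps to candidate senses,
# each with a plain-language replacement and a short gloss (definition).
KNOWLEDGE_BASE = {
    "lit": [
        {"meaning": "exciting", "gloss": "a party or event that is fun and exciting"},
        {"meaning": "drunk", "gloss": "someone who had too much alcohol to drink"},
    ],
}

# Small illustrative stop-word list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "and", "is", "of", "or", "that", "the", "to", "too", "was", "who"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def resolve(term, context_tokens):
    """Return the sense of `term` whose gloss overlaps most with the context.

    Terms missing from the knowledge base are returned unchanged. Ties go to
    the first listed sense.
    """
    senses = KNOWLEDGE_BASE.get(term)
    if not senses:
        return term
    context = set(context_tokens) - {term}
    best = max(senses, key=lambda s: len(context & set(tokenize(s["gloss"]))))
    return best["meaning"]


def normalize(post):
    """Replace every known slang term in a post with its best-matching sense."""
    tokens = tokenize(post)
    return " ".join(resolve(t, tokens) for t in tokens)


if __name__ == "__main__":
    print(normalize("the party was lit and fun"))
    print(normalize("he had too much to drink, he was lit"))
```

The same term resolves differently depending on its context, which is the behaviour a downstream sentiment classifier would benefit from; a real system would need a far larger knowledge base and a more careful overlap measure.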
Taiwo Kolajo · Olawande Daramola · Ayodele Adebiyi 🔗 - Attention-Augmented ST-GCN for Efficient Skeleton-based Human Action Recognition (Poster) Graph convolutional networks (GCNs) have achieved promising performance in skeleton-based human action recognition by modeling a sequence of skeletons as a spatio-temporal graph. Each human body skeleton is modeled as a graph which encodes the natural physical structure of human body joints and their spatial connections, while the temporal dynamics of each action are represented by a sequence of temporally connected skeletons. Most of the recently proposed GCN-based deep neural networks process all the body skeletons in a sequence depicting the performed action. This is not efficient in terms of memory consumption and computation time. Considering that not all body skeletons in a temporal sequence are equally important for recognizing the performed action, processing only a subset of the most informative body skeletons is a large step towards increasing the computational efficiency of both training and inference processes. Our goal is to increase computational efficiency while performing on par with, or even better than, the state-of-the-art models that use all the body skeletons in a sequence for action recognition. In this regard, we propose an attention-augmented ST-GCN method, called TA-GCN, for skeleton-based human action recognition. Our proposed method measures the importance of each skeleton in a sequence using a trainable temporal attention module (TAM) placed early in the network architecture, and therefore increases the computational efficiency in both training and testing phases by automatically selecting a subset of the most informative skeletons to be processed for feature extraction and classification. Negar Heidari · Alexandros Iosifidis 🔗 - Leveraging artificial intelligence for automatic depression detection using speech recognition. 
(Poster) Depression is a common mental disorder that affects more than 264 million people worldwide. Between 76% and 85% of people in low and middle-income countries receive no treatment for their disorder (P. S. Wang et al., 2017). There are many barriers to effective treatment, such as social stigma, lack of resources, and a shortage of trained professionals employed in mental health facilities, to mention but a few. This study aims to investigate how machine learning algorithms can be used to create self-help applications that detect depression from vocal acoustic features and suggest self-help remedies to bridge the treatment gap. Hewitt Tusiime · Alvin Nahabwe · Julius Kimuli · Grace Babirye 🔗 - Shapelet Guided Counterfactual Explanation Generation for Black-Box Time Series Classifiers (Poster) We present a model-agnostic method for generating realistic explanations for time series. Tina Han · Jette Henderson 🔗 - Weakly Supervised Medical Image Segmentation with Soft Labels and Noise Robust Loss (Poster) Acquiring large sets of expert-labeled annotations for medical images is expensive, and many of the available datasets contain noisy labels. Training deep learning models with incorrect and noisy labels may introduce bias to the system, which could lead to false diagnoses in medical applications. Therefore, there is a need for models that are robust against label noise. In this study, we propose to use soft labels in addition to an adapted noise-robust loss to learn from weak labels. Our experiments show the proposed method is effective for highly noisy segmentation labels. Banafshe Felfeliyan · Abhilash Rakkunedeth · Jacob Jaremko · Janet Ronsky 🔗 - human trafficking detection using lockstep behaviour methods (Poster) Our research seeks an unsupervised automated method to help uncover human trafficking networks among tweets related to OnlyFans. 
The goal of our method is to detect coordinated behaviour (lockstep) among authors by looking into what ties them together; in our case we investigate certain features extracted from the tweets as well as the structure of the tweet itself. Maricarmen Arenas · Reihaneh Rabbany · Golnoosh Farnadi 🔗 - Improving Induced Valence Recognition by Integrating Acoustic Sound Semantics in Movies (Poster) Every sound event that we receive and produce every day carries certain emotional cues. Recently, developing computational methods to recognize induced emotion in movies using content-based modeling has been gaining more attention. Most of the existing works treat this as a task of multimodal audio-visual modeling; while these approaches are promising, this type of holistic modeling underestimates the impact of various semantically meaningful events designed in movies. Specifically, acoustic sound semantics such as human sounds in movies can significantly direct the viewer’s attention to emotional content in movies. This work explores the use of a cross-modal attention mechanism in modeling how the verbal and non-verbal human sound semantics affect induced valence jointly with conventional audio-visual content-based modeling. Our proposed method integrates both self and cross-modal attention into a feature-based transformer (Fea-TF CSMA) architecture, which obtains 49.74% accuracy with frame-wise prediction on seven-class valence classification on the COGNIMUSE movie. Further analysis reveals insights about the effect of human verbal and non-verbal acoustic sound semantics on induced valence. Shreya Upadhyay · Bo-Hao Su · Chi-Chun Lee 🔗 - Efficient Hospital Management via Length of Stay prediction using Domain Adaptation (Poster) Inpatient length of stay (LoS) is an important managerial metric which, if known in advance, can be used to efficiently plan admissions, allocate resources and improve patient care. 
Using historical patient data and machine learning techniques, LoS prediction models can be developed. Ethically, these models cannot be used for patient discharge in lieu of unit heads, but they are of utmost necessity for hospital management systems for effective hospital planning. Thus, the design of the prediction system should be adapted to a true hospital setting. In this study, we predict early hospital LoS at the granular level of admission units by applying transfer learning to leverage information learned from a potential source domain. Time-varying data from 110,079 and 60,492 patient admissions to 8 and 9 ICU units were extracted from the eICU and MIMIC-IV databases, respectively. These were fed into a Long Short-Term Memory network and a fully connected network to train a source domain model, whose weights were transferred either partially or fully to initiate training in the target domains. Shapley Additive exPlanations (SHAP) algorithms were used to study the effect of weight transfer on model explainability. Compared to the two benchmark models, the proposed weight transfer model showed statistically significant gains in prediction accuracy (between 1% and 5%) as well as computation time (up to 2 hours) for some of the target domains. The proposed method thus provides an optimal clinical decision support system for hospital management that can ease the burden of data access via ethics committees, computation infrastructure and time. Lyse Naomi Wamba · Nyalleng Moorosi · Elaine Nsoesie · Frank Rademakers · Bart DeMoor 🔗 - Reinforcement Learning for Cost to Serve (Poster) In the retail industry, electronic commerce (e-commerce) has grown quickly in the last decade and has further accelerated as a result of movement restrictions during the pandemic. 
While working with logistics and retail industry business collaborators, we found that the cost of delivery of products from the most opportune node in the supply chain (a quantity called the cost-to-serve or CTS) is a key challenge. We find that a reinforcement learning (RL) formulation is able to exceed the performance of the state of the art rule based policies, while being significantly faster than traditional optimisation approaches such as mixed-integer linear programming. We hypothesize that scaling up the RL based methodology will have a significant impact on the operating margins of retailers in the `new normal'. Pranavi Pathakota · Kunwar Zaid · Hardik Meisheri · Harshad Khadilkar 🔗 - Dual Channel Training of Large Action Spaces in Reinforcement Learning (Poster) The ability to learn robust policies while generalizing over large action spaces is an open challenge for intelligent systems, especially in noisy environments that face the curse of dimensionality. In this paper, we present a novel framework to efficiently learn action embeddings that simultaneously allow us to reconstruct the original action as well as to predict the expected future state. We use an encoder-decoder architecture for action embeddings with a dual channel loss that balances between action reconstruction and state prediction accuracy. The trained decoder is then utilized in conjunction with a standard reinforcement learning algorithm that produces actions in the embedding space. Our architecture is able to solve a 2D maze environment with up to 2^12 discrete noisy actions. Empirical results show that the model results in cleaner action embeddings, and the improved representations help learn better policies with earlier convergence. 
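One way to read the dual channel loss described above is as a weighted sum of an action-reconstruction term and a next-state-prediction term. The sketch below computes such a loss for a single sample in plain Python. The weighting parameter `alpha`, the choice of cross-entropy for the discrete action and mean squared error for the state, and all function names are assumptions made for illustration; the abstract does not specify the authors' exact formulation.

```python
import math


def cross_entropy(logits, target_index):
    """Cross-entropy of a softmax over `logits` against the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def mean_squared_error(predicted, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def dual_channel_loss(action_logits, true_action, predicted_state,
                      true_next_state, alpha=0.5):
    """Balance action reconstruction against next-state prediction.

    alpha = 1.0 trains the embedding purely to reconstruct the action;
    alpha = 0.0 trains it purely to predict the next state.
    """
    reconstruction = cross_entropy(action_logits, true_action)
    prediction = mean_squared_error(predicted_state, true_next_state)
    return alpha * reconstruction + (1.0 - alpha) * prediction


if __name__ == "__main__":
    loss = dual_channel_loss(
        action_logits=[0.0, 0.0],      # decoder output over 2 discrete actions
        true_action=0,
        predicted_state=[1.0, 2.0],    # predicted next state (e.g. 2D position)
        true_next_state=[1.0, 4.0],
        alpha=0.5,
    )
    print(round(loss, 4))
```

In a trained system both terms would be back-propagated through a shared action embedding, so the choice of `alpha` controls how much the embedding geometry reflects action identity versus action effect.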
Pranavi Pathakota · Hardik Meisheri · Harshad Khadilkar 🔗 - Robustness in Weighted Networks (Poster) In the last two decades, network science has become a strategic field of research thanks to both the increased availability of large datasets and the strong development of high-performance computing technologies and methodologies. Despite the great amount of work on community detection in networks, one important question that still needs to be addressed is the statistical validation of the results for weighted networks (graphs). This work presents a Machine Learning approach to test whether community structures detected by algorithms in weighted graphs are statistically significant or a result of chance. This is achieved by investigating the stability of a clustering against random perturbations of the structure of the graph. We identify a null model, define a perturbation strategy, and then define a testing procedure based on functional data analysis tools. Our overall approach has been tested on both simulated and real networks. We also explore the robustness of compressed networks. Compressing a large network into a super node representation has been shown to speed up community detection algorithms. In this work we apply our robustness testing procedure for weighted graphs to compressed networks. We show that a super node network representation preserves the robustness property of a network. 
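To make the notion of clustering stability in the abstract above more concrete, the sketch below computes the variation of information (VI) between two partitions of the same node set, for instance the communities found before and after a random perturbation of the graph. Using VI as the stability measure is an assumption for illustration, since the abstract does not name the measure the authors use, and the perturbation and community detection steps themselves are left out.

```python
import math
from collections import Counter


def variation_of_information(labels_a, labels_b):
    """Variation of information between two partitions of the same nodes.

    labels_a[i] and labels_b[i] are the community labels of node i in the
    two partitions. The result is 0 when the partitions are identical up to
    relabeling and grows as they disagree more. Natural logarithms are used.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same set of nodes")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        # VI = H(A|B) + H(B|A), accumulated over each occupied cell
        vi -= p_ab * (math.log(p_ab / p_a) + math.log(p_ab / p_b))
    return vi


if __name__ == "__main__":
    original = [0, 0, 1, 1]
    relabeled = [1, 1, 0, 0]   # same communities, different label names
    scrambled = [0, 1, 0, 1]   # communities cut across the original ones
    print(variation_of_information(original, relabeled))
    print(variation_of_information(original, scrambled))
```

Repeating this comparison over increasing perturbation levels, for both the observed graph and a null model, would yield two stability curves that a testing procedure could then compare.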
Luisa Cutillo · Valeria Policastro · Annamaria Carissimo 🔗 - Mapping Slums with Machine Learning and Medium-Resolution Satellite Imagery (Poster) Keywords: Slums, Deprived Areas, Machine learning, Weak Labels, Gridded Population Data, Earth Observation, Remote Sensing, Sentinel-2, Sustainable Development Goals. This abstract presents: - The motivation and challenges of mapping slums and estimating the population living in these communities. - Our approach and results: 1) using weak labels and medium-resolution imagery to map slums; 2) employing Gridded Population Data to estimate the population living in slums. Agatha Mattos · Michela Bertolotto · Gavin McArdle 🔗 - Investigating the Effects of Environmental Factors on the Detection of Laryngeal Cancer from Speech Signals Using Machine Learning. (Poster) Approximately 2000 people in the UK are diagnosed with laryngeal cancer each year. One of the initial symptoms patients often present with is a change in voice. We propose that an AI system may be able to distinguish laryngeal cancer patients from non-cancer patients using speech signals. Such a system would be able to classify and prioritise high-risk patients to ensure more appropriate allocation of resources. To be implemented in a healthcare setting, this type of system would need to be robust to the environmental factors that may affect the speech recordings, such as background noise. In early work we have shown that the addition of background noise reduces the precision of classifiers, and as such the system would not relieve the burden on healthcare systems and may, in fact, increase it. In future work we plan to create an AI system that will be robust to environmental factors (such as background noise) so that the system will be usable by patients in non-specialist recording environments. 
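A common way to study the effect of background noise described above is to mix a noise recording into clean speech at a controlled signal-to-noise ratio (SNR) and then re-evaluate the classifier. The sketch below shows that mixing step in plain Python on lists of samples. The abstract does not say how noise was added in the authors' experiments, so the SNR-based scaling, the function names, and the toy signals are all assumptions for illustration.

```python
import math


def mean_power(signal):
    """Average power (mean of squared samples) of a signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR.

    SNR in decibels is 10 * log10(speech power / noise power), so the noise
    is multiplied by the gain that makes this ratio equal to `snr_db`.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise signal has zero power")
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """Recover the SNR of a mixture, given the clean speech it was built from."""
    residual = [m - s for s, m in zip(speech, mixture)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


if __name__ == "__main__":
    speech = [1.0, -1.0, 1.0, -1.0]   # toy stand-in for a speech waveform
    noise = [0.5, 0.5, -0.5, -0.5]    # toy stand-in for background noise
    for snr in (20, 10, 0):
        noisy = mix_at_snr(speech, noise, snr)
        print(snr, round(measured_snr_db(speech, noisy), 6))
```

Sweeping the SNR from high to low and recording classifier precision at each level gives a simple robustness curve for a given recording environment.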
Mary Paterson · Luisa Cutillo · James Moor 🔗 - 3D-LatentMapper: View Agnostic Single-View Reconstruction of 3D Shapes (Poster) Computer graphics, 3D computer vision and robotics communities have produced multiple approaches to represent and generate 3D shapes, as well as a vast number of use cases. However, single-view reconstruction remains a challenging topic, which can unlock various interesting use cases such as interactive design. In this work, we propose a novel framework that leverages the intermediate latent spaces of Vision Transformer (ViT) and a joint image-text representational model, CLIP, for fast and efficient Single View Reconstruction (SVR). More specifically, we propose a novel mapping network architecture that learns a mapping between deep features extracted from ViT and CLIP, and the latent space of a base 3D generative model. Unlike previous work, our method enables view-agnostic reconstruction of 3D shapes, even in the presence of large occlusions. We use the ShapeNetV2 dataset and perform extensive experiments with comparisons to SOTA methods to demonstrate our method's effectiveness. Alara Dirik · Pinar Yanardag 🔗 - Toward Qualitative Mechanical Problem-Solving using Hybrid AI (Poster) Qualitative mechanical problem-solving (QMPS) is central to human-level intelligence. Humans use their capacity for such problem-solving for tasks as routine as hanging a picture frame on a wall, as well as for more sophisticated tasks in demanding jobs that pay well in today’s economy (e.g. in emergency medicine, plumbing, and the use of hydraulic machinery). Unfortunately, AIs and robots of today lack the capacity in question. This work takes a step towards addressing this deficiency using hybrid AI techniques that include advanced automated reasoning and natural language processing (NLP). 
Shreya Banerjee · Selmer Bringsjord · Naveen Govindarajulu 🔗 - Fair Targeted Immunization with Dynamic Influence Maximization (Poster) The argument for targeted immunization has been prevailing since the Covid-19 pandemic. However, sophisticated techniques to identify “superspreaders” for targeted vaccination may lead to inequalities in vaccine distribution and immunity from Covid-19 between social communities. This is particularly poignant in social networks which demonstrate homophily: our tendency to interact more with those with whom we share similar demographics. If our contact networks similarly show that we move in close communities, can we ensure that targeted immunization does not benefit one community over another? Here, we answer this question by applying group fairness constraints, ensuring immunity is balanced among different sub-populations, to an Influence Maximization (IM) task. IM is a technique which identifies the most influential members of a social network, those who are responsible for the greatest spread of e.g. disease or information. Previous works have demonstrated the equivalence of outbreak minimization and IM to detect superspreaders, and shown that homophilic social networks lead to a more unbalanced spread of information. Whilst the fair IM problem has been approached from a time-critical perspective, no attempt has yet been made to achieve group fairness on dynamic social networks. Here, we propose a novel method for applying fairness constraints to IM on dynamic and homophilic social networks to detect superspreaders. Nicola Neophytou · Golnoosh Farnadi 🔗 - You Only Live Once: Single-Life Reinforcement Learning (Poster) Reinforcement learning algorithms are typically designed to learn a performant policy that can repeatedly and autonomously complete a task, usually starting from scratch. 
However, in many real-world situations, the goal might not be to learn a policy that can do the task repeatedly, but simply to perform a new task successfully once in a single trial. For example, imagine a disaster relief robot tasked with retrieving an item from a fallen building, where it cannot get direct supervision from humans. It must retrieve this object within one test-time trial, and must do so while tackling unknown obstacles, though it may leverage knowledge it has of the building before the disaster. We formalize this problem setting, which we call single-life reinforcement learning (SLRL), where an agent must complete a task within a single episode without interventions, utilizing its prior experience while contending with some form of novelty. SLRL provides a natural setting to study the challenge of autonomously adapting to unfamiliar situations, and we find that algorithms designed for standard episodic reinforcement learning often struggle to recover from out-of-distribution states in this setting. Motivated by this observation, we propose an algorithm, Q-weighted adversarial learning (QWALE), which employs a distribution matching strategy that leverages the agent's prior experience as guidance in novel situations. Our experiments on several single-life continuous control problems indicate that methods based on our distribution matching formulation are 20-60% more successful because they can more quickly recover from novel states. Annie Chen · Archit Sharma · Sergey Levine · Chelsea Finn 🔗 - Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time (Poster) Distribution shifts occur when the test distribution differs from the training distribution, and can considerably degrade performance of machine learning models deployed in the real world. While recent works have studied robustness to distribution shifts, distribution shifts arising from the passage of time have the additional structure of timestamp metadata. 
Real-world examples of such shifts are underexplored, and it is unclear whether existing models can leverage trends in past distribution shifts to reliably extrapolate into the future. To address this gap, we curate Wild-Time, a benchmark of 7 datasets that reflect temporal distribution shifts arising in a variety of real-world applications, including drug discovery, patient prognosis, and news classification. On these datasets, we systematically benchmark 13 approaches with various inductive biases. We evaluate methods in domain-generalization, continual learning, self-supervised learning, and ensemble learning, which leverage timestamps to extract the common structure of the distribution shifts. We extend several domain-generalization methods to the temporal distribution shift setting by treating windows of time as different domains. Finally, we propose two evaluation strategies to evaluate model performance under temporal distribution shifts---evaluation with a fixed time split (Eval-Fix) and evaluation with a data stream (Eval-Stream). Eval-Fix, our primary evaluation strategy, aims to provide a simple evaluation protocol for the broader machine learning community, while Eval-Stream serves as a complementary benchmark for continual learning approaches. Our experiments demonstrate that existing methods are limited in tackling temporal distribution shift: across all settings, we observe an average performance drop of 20% from in-distribution to out-of-distribution data. Caroline Choi · Huaxiu Yao · Yoonho Lee · Pang Wei Koh · Chelsea Finn 🔗 - Probabilistic Querying of Continuous-Time Sequential Events (Poster) In this work, we focus on the problem of answering probabilistic queries in the context of continuous-time event data, focusing in particular on neural autoregressive models. 
Probabilistic queries have potentially broad applications and can include queries such as “the probability of an event of type A occurring before type B” or “the probability of at least one event of type A occurring before time T”. Given that computation of such query probabilities is usually intractable and lacks analytical forms in general, we propose a general theoretical framework and a novel marginalization scheme that enables us to leverage importance sampling to answer these queries in a computationally efficient manner for any black-box autoregressive model. We evaluate our approach by presenting results with multiple real-world continuous-time event datasets, and demonstrate that our approach can be significantly more computationally efficient than the naive estimation. Please refer to the PDF for the one-page extended abstract for additional details. Alex Boyd · Yuxin Chang · Stephan Mandt · Padhraic Smyth 🔗 - Resume Parsing using an ensemble of CNN, Bi-LSTM and CRF in a Hard Voting Predictive Approach (Poster) In this study, we propose a neural network ensemble approach using Bi-LSTM, CNN and CRF combined with hard voting to parse resume and select the candidate that is best fit for the job. For the data sets, we collected about 7000 resumes in .pdf and .docx from kaggle and Teledom Nigeria. Due to the unstructured format of resumes, some preprocessing steps would be taken before they can be used. The first step is the extraction of the plain text whereby the resumes which are in .pdf, .docx, and .doc formats are converted into text format using Apache Tika Server. The second step is to remove the Stop Words whereby the many unwanted lines, punctuations, bullets, etc, are removed by using string replacement method and regular expressions. After this is done, then the Segmentation is carried out. The segmentation model is created using a Convolutional Neural Network (CNN) architecture, this will return the sentences with the correct label. 
In this step, the model divides a whole resume into different segments such as personal, educational, occupational, skills, publications, certifications, etc. The next task is to extract useful information from each segment. The base model for this stage is an ensemble of CRF, Bi-LSTM and CNN, and a hard voting approach will be employed. The idea is to use a single unified predictive model in place of separate classification models by selecting the predicted class that receives the most votes for each class label. This study is ongoing. Scholastica N. Mallo · Francisca Nonyelum Ogwueleka · Philip Odion · Martins E. Irhebhude 🔗 - Transformers for Synthesized Speech Detection (Poster) As voice synthesis systems and deep learning tools continue to improve, so does the possibility that synthesized speech can be used for nefarious purposes. We need methods that can analyze an audio signal and determine if it is synthesized. In this paper, we investigate three transformers for synthesized speech detection: Compact Convolutional Transformer (CCT), Patchout faSt Spectrogram Transformer (PaSST), and Self-Supervised Audio Spectrogram Transformer (SSAST). We show that each transformer independently detects synthesized speech well. Finally, we explore pretraining a transformer on a large-scale audio classification dataset and finetuning it for synthesized speech detection. We demonstrate that pretraining on a large dataset of audio signals that includes both speech and non-speech signals (such as music and animal noises) can improve synthesized speech detection. Evaluated on the ASVspoof2019 dataset, our approach successfully detects synthesized speech and achieves 92% or higher for all metrics considered. Emily Bartusiak 🔗 - Polynomials in Bayesian Problems (Poster) Kindly refer to PDF file attached. Thank you. 
Lilian Wong · Evans Harrell 🔗 - Estimating Uncertainty in Safety-Critical Deep Learning Models (Poster) This paper creates a regression model using the “Deep Ensemble” technique for predicting the Remaining Useful Life (RUL) of an aircraft engine, which is a safety-critical application. The run-to-failure turbo engine degradation dataset has been used, which is widely considered a benchmark dataset for aero engine predictive maintenance work. This paper identifies a gap in the previous works and provides a solution to it. The previous works focused only on developing predictive models using classical Neural Network architectures for estimating the RUL of the turbo engines; however, those works did not estimate the uncertainty of the predictive models. Since it is critical to know how certain the model is about its own prediction, this project addresses that shortfall. As classical Neural Network architectures do not provide any mechanism to obtain the uncertainty from the model, a probabilistic approach has been taken to modify the existing Neural Network architecture to obtain uncertainty estimates. The probabilistic method “Deep Ensemble” is used and Negative Log Likelihood (NLL) is applied as a training criterion for the model. This model is run on a simulated dataset from four different fleets of aero engines and the model’s prediction error is evaluated using Root Mean Square Error (RMSE) and Coefficient of Determination (R2). Various experiments showed that the model is only confident when the prediction error is low; on the other hand, when the error rate is high, the uncertainty estimate is also high, which means that the model is aware of its own uncertainty. Two out of the four sets of fleet data have higher prediction error and correspondingly higher uncertainty values, which provides a useful insight into the confidence of the model, which is critical for decision making in these types of safety-critical applications. 
Oishi Deb 🔗 - Adaptive Temporal Pattern Matching (Poster) We propose a multi-horizon forecasting approach that accurately models the underlying patterns on different time scales. Our approach is based on the transformer architecture, which, across a wide range of domains, has demonstrated significant improvements over other architectures. Several approaches focus on integrating a temporal context into the query-key similarity of the attention mechanism of transformers to further improve their forecasting quality. In this paper, we provide several extensions to this line of work. We propose an adaptive temporal-aware attention that dynamically learns the ideal temporal context length for each forecasting time point. This allows the model to seamlessly switch between different time scales as needed, hence providing users with a better forecasting model. Our experiments on several real-world datasets demonstrate significant performance improvements over existing state-of-the-art methodologies. Sepideh Koohfar 🔗 - Respiratory Conditions (EIPH, PLH, and Mucus) in Racehorses (Poster) The horse racing industry is a multi-billion dollar industry and thousands of people's livelihoods depend on it. Respiratory diseases and conditions in racehorses are common and noteworthy. The strenuous activity performed by racehorses can exacerbate respiratory diseases. Respiratory conditions such as Exercise Induced Pulmonary Haemorrhage (EIPH), Pharyngeal Lymphoid Hyperplasia (PLH) and mucus accumulation can all affect race performance and lead to poor health outcomes. From our previous univariate analysis, we found that EIPH levels differed significantly among the different race tracks within this study. Since this produced a significant and worthwhile result, we would like to better understand the reasoning behind this difference. 
We were able to gain significant insight through more traditional statistical methods such as PCA, ordinal logistic regression, MANOVA and bootstrapping for non-parametric summary statistics. For EIPH, PLH, and the mucus score, disease severity is expressed through a grading system. For this particular study, each disease state was graded independently by 3 veterinarians. This categorical grading system leads naturally to the use of a logistic regression model. Ordinal logistic regression can be used to help predict the probability of falling in a certain disease grade given a set of predictor variables. Ordinal logistic regression takes into account the ordered nature of the response variable. For all three disease states (EIPH, PLH, and mucus accumulation), this is the ideal regression model. For this analysis, we used principal components (PCs) to further describe our data, to reduce our dataset, and to find a worthwhile relationship among the factors. This analysis was done in the hope of finding a combination of factors that would explain the different levels of the disease state of EIPH. An excellent benefit of using PCA is that it does not require the data to be normally distributed. Allison Fisher · Warwick Bayly · Sierra Shoemaker · Julia Bagshaw · Yuan Wang · Macarena Sanz 🔗 - Shared Hardware, Shared Baselines: An Offline Robotics Benchmark (Poster) Ask 10 robotics researchers what the state-of-the-art learning algorithm is for manipulation, and you'll get 10 different algorithms. Why do we have these fundamental disagreements on which methods work the best? Robotics as a field struggles to compare results between labs due to the wide variety of experimental conditions. In addition, methods are sensitive to specific implementations and hyperparameters, which makes it difficult for a researcher to implement competitive baselines in their own setting. 
Finally, the difficulties of purchasing, building, and installing hardware and software infrastructure make it challenging if not impossible for newcomers to contribute to the field. It is clear that for robotics research to advance we need a way to democratize, benchmark, and pool engineering resources. Our solution is the Offline Robotics Benchmark, which includes not only a large-scale manipulation dataset but also the hardware on which benchmark users can test their own methods now and going forward. Initial benchmark users have contributed open-source implementations, which can be used as baselines in future work without needing to rerun any of these approaches. Gaoyue Zhou · Victoria Dean 🔗 - Decomposed Linear Dynamical Systems (dLDS) for learning the latent components of neural dynamics (Poster) Learning interpretable representations of neural dynamics at a population level is a crucial step to understanding how neural activity relates to perception and behavior. Models of neural dynamics often focus on either low-dimensional projections of neural activity or on dynamical systems models. While both approaches seek to represent low-dimensional geometric structures, we currently lack methods that integrate the manifold hypothesis directly into a dynamical systems model, thus maintaining both model capacity and interpretability. Here, we discuss how these two approaches are interrelated by considering dynamical systems as representative of flows on a low-dimensional manifold. We propose a new decomposed dynamical system model (dLDS), that can describe complex non-stationary and nonlinear dynamics of time-series data as a sparse combination of simpler, more interpretable components, chosen from a dictionary of linear dynamical systems (LDSs). 
The decomposed nature of the dynamics in our model generalizes over previous approaches and enables modeling of overlapping and non-stationary drifts in the dynamics, as well as dynamics with different speeds or orientations. Our model-learning provides an avenue by which we can estimate dynamical systems that are locally linear at each point, but whose parameters change over time, and which are thus able to approximate nonlinear dynamics by treating the nonlinearity as a temporal non-stationarity. First, we demonstrate our model in a synthetic experiment where we recover efficient representations of an LDS with time-varying speeds and rotations, and contrast our results with existing similar models. Next, we apply our model to the FitzHugh-Nagumo and Lorenz attractors, showing that it manages to identify meaningful dynamical components that indicate different sides of the Lorenz spirals. When applying it to C. elegans neural recordings, we were able to illustrate a diversity of dynamics that was obscured in previous similar models. Noga Mudrik · Yenho Chen · Eva Yezerets · Christopher Rozell · Adam Charles 🔗 - Probabilistic Interactive Segmentation for Medical Images (Poster) Deep learning models have been very successful at performing medical imaging tasks such as segmentation or registration. However, training these models requires substantial amounts of labeled data, most often annotated manually. Segmenting new medical images to create labeled training data is a tedious and time-consuming process for human annotators, particularly for 3D modalities involving sets of images. Existing frameworks for interactive segmentation have focused on minimizing initial user interaction and training domain-specific models with limited generalizability. Most interactive segmentation systems have two stages: first, the user provides initial input to seed a rough predicted segmentation, and then they provide additional feedback to refine the segmentation over multiple iterations. 
When segmenting objects or modalities not seen in the training data, these systems may require the user to make many corrections to clarify their target if the initial predicted segmentation focuses on the wrong object or they wish to segment multiple noncontiguous objects. We propose a probabilistic interactive segmentation system to help human annotators quickly and accurately segment new medical images. At each iteration this system takes in an input image and the partial segmentation completed so far, and probabilistically predicts a next step for the segmentation, i.e., a larger partial segmentation. We focus on predicting several possible segmentations, to enable the user to quickly choose the correct next step in ambiguous situations. Hallee Wong · John Guttag · Adrian Dalca 🔗 - Evaluation of Active Learning and Domain Adaptation on Health Data (Poster) Machine learning (ML) uses data to make decisions and predictions. Labeled data is necessary for ML to understand how previous decisions and predictions have been made, particularly in healthcare settings. Unfortunately, such data is prohibitively expensive and requires subject-specific expertise. Active learning poses the possibility of achieving accurate ML models with a lower requirement of labeled data. Dataset shifts also pose a challenge to the performance of ML systems for healthcare. Domain adaptation aims to mitigate the effects of dataset shifts. This work applies existing active learning and domain adaptation techniques in the context of healthcare data to evaluate the specific accuracy of general solutions. eICU is a labeled dataset from intensive care units across the United States, and MIMIC-III and MIMIC-IV are both labeled datasets from hospital admissions to Beth Israel Deaconess Medical Center in Boston, MA. All three of these datasets have shifts that we investigate. 
Overall, this research reports on a series of tests with existing active learning and domain adaptation techniques to evaluate appropriate future uses of these methods in the field of ML for healthcare. Kristina Holsapple · Haoran Zhang · Marzyeh Ghassemi 🔗 - Towards interpretable health monitoring and service anomaly detection in the cloud (Poster) We propose MetricSound, a reliable and interpretable ML-based monitoring framework for data center incident detection. It is capable of distilling the representation of service health status and is scalable to high volumes of real-time system metrics with linear complexity on counter dimensions. Our key insight is to bridge the gap between non-parametric statistical methods and ML model-based methods. Specifically, the proposed approach uses unsupervised outlier detection algorithms to extract useful representations of time series and project them on an ECOD anomaly space. Our approach then stacks anomalies at different granularities and builds a supervised classifier with focal loss for unbalanced labels. It then uses Bayesian optimization and recursive feature elimination based on Shapley values for more robust service failure detection. Furthermore, since the method establishes a one-to-one mapping from raw metric to anomaly score, once the model predicts failures, the Shapley values are used to interpret the outcome and the correlation between metrics, and to pinpoint the low-level counters/resources that contribute to the incident. MetricSound has been tested on incidents from a commercial cloud provider’s data and achieves more than 92% precision, 89% accuracy, and 90% F1 score across 1-month metric data. Since it provides an interpretation of health status for each diagnosis, we conducted some case studies (for OS upgrade failures, and high availability database backup issues) and showed its capability to identify the right set of root causes. 
Unlike existing solutions targeted toward a specific system, it is faster, more interpretable, and more general. While challenges may vary for different cloud providers and services, we foresee that this general model for machine health can lead to cost savings, reduced human effort, and better customer experience. Yueying (Lisa) Li · G. Edward Suh · Christina Delimitrou 🔗 - Hearing Touch: Using Contact Microphones for Robot Manipulation (Poster) Humans manipulate objects using all of their senses, including sound and touch: audio can indicate whether or not the door has been unlocked or an egg has been properly cracked. Prior work has shown that humans can use auditory feedback alone to categorize types of events and infer continuous aspects of these events, such as the length of a wooden dowel being struck [1]. However, microphones remain underexplored in robotics, especially their potential as tactile vibration sensors. In this work, we investigate contact audio as an alternative tactile modality for complex manipulation tasks that are challenging from vision alone. Contact microphones record vibrations of anything in direct contact at a high frequency (1000 times higher frequency than the next common tactile sensor [2]). This makes them well-suited to use as tactile sensors when interacting with objects in manipulation. Furthermore, contact audio is immune to many aspects of environment variation that vision is plagued by, such as lighting and color variation, making it promising for transfer learning and multi-task settings that are common in robotics. Shaden Alshammari · Victoria Dean · Tess Hellebrekers · Pedro Morgado · Abhinav Gupta 🔗 - Adapting the Function Approximation Architecture in Online Reinforcement Learning (Poster) One of the main learning tasks in Reinforcement Learning (RL) is to approximate the value function – a mapping from the present observation to the expected sum of future rewards. 
Neural network architectures for value function approximation typically impose sparse connections with prior knowledge of observational structure. When this structure is known, architectures such as convolutions, transformers, and graph neural networks can be inductively biased with fixed connections. However, there are times when observational structure is unavailable or too difficult to encode as an architectural bias – for instance, relating sensors that are randomly dispersed in space. Yet in all of these situations it is still desirable to approximate value functions with a sparsely-connected architecture for computational efficiency. An important open question is whether equally-useful representations can be constructed when observational structure is unknown – particularly in the incremental, online setting without access to a replay buffer. Our work is concerned with how an RL system could construct a value function approximation architecture in the absence of observational structure. We propose an online algorithm that adapts connections of a neural network using information derived strictly from the learner’s experience stream, using many parallel auxiliary predictions. Auxiliary predictions are specified as General Value Functions (GVFs) [11], and their weights are used to relate inputs and form subsets we call neighborhoods. These represent the input of fully-connected, random subnetworks that provide nonlinear features for a main value function. We validate our algorithm in a synthetic domain with high-dimensional stochastic observations. Results show that our method can adapt an approximation architecture without incurring substantial performance loss, while also discovering a local degree of spatial structure in the observations without prior knowledge. 
John Martin · Joseph Modayil · Fatima Davelouis · Michael Bowling 🔗 - FMAM: A novel Factorization Machine based Attention Mechanism for Forecasting Time Series Data (Poster) Transformer-based architectures gained popularity due to their exceptional performance in the natural language processing domain. Gradually, we have seen widespread use of Transformer-based architectures in other domains such as vision and time series. However, a well-known bottleneck of Transformers on long sequences is the self-attention mechanism, which is very costly to compute for such sequences. Therefore, the performance of Transformers is greatly affected when dealing with long sequences, and most real-world time series data contain long sequences. To overcome this problem, various approaches have been adopted. Among them, various modifications of the vanilla Transformer and sparse attention techniques are worth mentioning. To solve this problem, we propose a novel attention mechanism inspired by the Factorization Machine. In this paper, we show that instead of computing the exact attention values, we can learn a function that computes the approximate attention for long sequences and thus predicts long time series sequences faster. In particular, we aim to develop a novel attention mechanism that takes advantage of the existing attention mechanism in Transformers and makes them more efficient by learning approximate attention without affecting the performance of the Transformers much. Fahim Azad 🔗 - 11:00 a.m. Sponsor Talks 🔗