Affinity Workshop
WiML Workshop 1
Soomin Aga Lee · Meera Desai · Nezihe Merve Gürel · Boyi Li · Linh Tran · Akiko Eriguchi · Jieyu Zhao · Salomey Osei · Sirisha Rambhatla · Geeticka Chauhan · Nwamaka (Amaka) Okafor · Mariya Vasileva

Wed Dec 08 06:00 PM -- 11:00 PM (PST)

WiML’s purpose is to enhance the experience of women in machine learning. Our flagship event is the annual Women in Machine Learning (WiML) Workshop, typically co-located with NeurIPS. We also organize an “un-workshop” at ICML, as well as small events at other machine learning conferences such as AISTATS, ICLR, etc.

Our mission is to enhance the experience of women in machine learning, and thereby:

Increase the number of women in machine learning
Help women in machine learning succeed professionally
Increase the impact of women in machine learning in the community

Toward this goal, we create opportunities for women to engage in substantive technical and professional conversations in a positive, supportive environment (e.g. annual workshop, small events, mentoring program). We also work to increase awareness and appreciation of the achievements of women in machine learning (e.g. directory and profiles of women in machine learning). Our programs help women build their technical confidence and their voice, and our publicity efforts help ensure that women in machine learning and their achievements are known in the community.

Wed 6:00 p.m. - 7:00 p.m. Preworkshop social - South West Garden (firepits #1 and 2), Gather.Town (Social)
Wed 7:00 p.m. - 7:20 p.m. WiML Opening remarks (Remarks) Boyi Li · Mariya Vasileva
Wed 7:20 p.m. - 7:30 p.m. D&I Remarks (Remarks) Danielle Belgrave
Wed 7:30 p.m. - 8:05 p.m. Invited talk - Machine Learning as a Service: The Challenges of Serving diverse client Distributions (Talk) Sunita Sarawagi
Wed 8:05 p.m. - 8:15 p.m. Live Q&A with Sunita Sarawagi (Q&A) Sunita Sarawagi
Wed 8:15 p.m. - 8:40 p.m. Contributed talk #1 – Regret minimization in heavy-tailed bandits (Talk) Shubhada Agrawal
Wed 8:45 p.m. - 9:45 p.m. Poster Session #1 - North Side, Gather.Town (Posters)
Wed 10:15 p.m. - 10:50 p.m. Invited talk – Learning physics models that generalize (Talk) Meire Fortunato
Wed 10:50 p.m. - 11:00 p.m. Live Q&A with Meire Fortunato (Q&A) Meire Fortunato

- Regret Minimization in Heavy-Tailed Bandits (Poster) link » We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm-distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded support reward distributions or distributions that belong to a single parameter exponential family. We work under the much weaker assumption that the moments of order $(1 + \epsilon)$ are uniformly bounded by a known constant $B$, for some given $\epsilon > 0$. We propose an optimal algorithm that matches the lower bound exactly in the first-order term. We also give a finite time bound on its regret. We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding.
To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size. We hence provide a controlled trade-off between statistical optimality and computational cost. Link » Shubhada Agrawal · Sandeep Juneja · Wouter Koolen 🔗 - Flood Segmentation on Sentinel-1 SAR Imagery with Semi-Supervised Learning (Poster) link » Floods wreak havoc throughout the world, causing billions of dollars in damages, and uprooting communities, ecosystems and economies. The NASA Impact Emerging Techniques in Computational Intelligence (ETCI) competition on Flood Detection tasked participants with predicting flooded pixels after training with synthetic aperture radar (SAR) images in a supervised setting. We propose a semi-supervised learning pseudo-labeling scheme that derives confidence estimates from U-Net ensembles, thereby progressively improving accuracy. Concretely, we use a cyclical approach involving multiple stages: (1) training an ensemble of multiple U-Net architectures with the provided high-confidence hand-labeled data and generating pseudo labels (low-confidence labels) on the entire unlabeled test dataset; (2) filtering the generated labels for quality; and (3) combining the generated labels with the previously available high-confidence hand-labeled dataset. This assimilated dataset is used for the next round of training ensemble models. This cyclical process is repeated until the performance improvement plateaus. Additionally, we post-process our results with Conditional Random Fields. Our approach sets a new state of the art on the Sentinel-1 dataset for the ETCI competition with 0.7654 IoU, a substantial improvement over the 0.60 IoU baseline. Our method, which we release with all the code including trained models, can also be used as an open science benchmark for the Sentinel-1 released dataset.
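For readers unfamiliar with the metric behind the 0.7654 and 0.60 figures above, here is a minimal sketch of intersection over union (IoU) for binary flood masks. It is an illustration only, not the competition's scoring code; the function name `iou`, the toy masks, and the convention for two empty masks are assumptions.

```python
def iou(pred, target):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0  # pixel flooded in both masks
            union += 1 if (p or t) else 0          # pixel flooded in either mask
    # Assumed convention: two entirely empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 2x3 masks (hypothetical data): 1 = flooded pixel, 0 = dry pixel.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]

print(iou(pred, target))  # 2 pixels shared / 4 pixels flagged in either mask = 0.5
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no flooded pixel.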
Link » Siddha Ganju · Sayak Paul 🔗 - Generating Thermal Human Faces for Physiological Assessment using Thermal Sensor Auxiliary Labels (Poster)  link »    Thermal images reveal medically important physiological information about human stress, signs of inflammation, and emotional mood that cannot be seen on visible images. Providing a method to generate thermal faces from visible images would be highly valuable for the telemedicine community in order to show this medical information. To the best of our knowledge, there are limited works on visible-to-thermal (VT) face translation, and many current works go the opposite direction to generate visible faces from thermal surveillance images (TV) for law enforcement applications. As a result, we introduce favtGAN, a VT GAN which uses the pix2pix image translation model with an auxiliary sensor label prediction network for generating thermal faces from visible images. Since most TV methods are trained on only one data source drawn from one thermal sensor, we combine datasets from faces and cityscapes. These combined data are captured from similar sensors in order to bootstrap the training and transfer learning task, especially valuable because visible-thermal face datasets are limited. Experiments on these combined datasets show that favtGAN demonstrates an increase in SSIM and PSNR scores of generated thermal faces, compared to training on a single face dataset alone. Link » Catherine Ordun · Sanjay Purushotham · Edward Raff 🔗 - Gaussian Process Bandits with Aggregated Feedback (Poster)  link » We consider the continuum-armed bandits problem, under a novel setting of recommending the best arms within a fixed budget under aggregated feedback. This is motivated by applications where the precise rewards are impossible or expensive to obtain, while an aggregated reward or feedback, such as the average over a subset, is available. 
We constrain the set of reward functions by assuming that they are drawn from a Gaussian Process and propose the Gaussian Process Optimistic Optimisation (GPOO) algorithm. We adaptively construct a tree with nodes as subsets of the arm space, where the feedback is the aggregated reward of representatives of a node. We propose a new simple regret notion with respect to aggregated feedback on the recommended arms. We provide theoretical analysis for the proposed algorithm, and recover single point feedback as a special case. We illustrate GPOO and compare it with related algorithms on simulated data. Link » Mengyan Zhang · Russell Tsuchida · Cheng Soon Ong 🔗 - Privacy-Preserving Federated Multi-Task Linear Regression: A One-shot Linear Mixing Approach Inspired by Graph Regularization (Poster) link » We investigate multi-task learning (MTL), where multiple learning tasks are performed jointly rather than separately to leverage their similarities and improve performance. We focus on the federated multi-task linear regression setting, where each machine possesses its own data for individual tasks and sharing the full local data between machines is prohibited. Motivated by graph regularization, we propose a novel fusion framework that only requires a one-shot communication of local estimates. Our method linearly combines the local estimates to produce an improved estimate for each task, and we show that the ideal mixing weight for fusion is a function of task similarity and task difficulty. A practical algorithm is developed and shown to significantly reduce mean squared error (MSE) on synthetic data, as well as improve performance on a real-world income prediction task. Link » Harlin Lee 🔗 - Machine Learning API in NASA’s WorldView Satellite Image Search System (Poster) link » SpaceML, an extension of the Frontier Development Lab, is a volunteer-based developer community that builds open-source computing tools for the betterment of space science and exploration.
NASA’s WorldView Satellite Image Search System is a SpaceML project that was born out of the all-too-familiar issue of having abundant data but no efficient way of studying it. For example, if a researcher needed to find a specific type of hurricane cloud pattern, they would input one instance and the system would return all the recorded cases of that phenomenon. My team was assigned to develop the system’s RESTful Application Programming Interface (API), which would handle user requests and communicate with the data store to perform image similarity searches. We have accomplished our goal of building an API that allows any user service to perform similarity searches on satellite imagery by utilizing scalable, high-performance, and developer-friendly frameworks and a machine learning integration. Link » Kai Priester 🔗 - Interpretable Machine Learning with Symbolic Regression (Poster) link » The interest in interpretable machine learning has grown significantly in the last few years. In this work, we present an approach belonging to the family of symbolic regression, which considers models as mathematical formulas, thus creating new features from input variables as well as possible interactions between variables. Symbolic regression inherently allows for interpretability by searching for models that are usually much simpler than random forests or neural networks, and whose formula is explicit. Our approach, called Zoetrope Genetic Programming (ZGP), combines the advances in symbolic regression with sparse linear regression, and builds models that are interpretable and perform well. We demonstrate this performance on a benchmark of 97 regression datasets, comparing ZGP with other state-of-the-art classical regression and symbolic regression algorithms.
Link » Aurélie Boisbunon · Ingrid Grenet · Marc Schoenauer 🔗 - Visual Question Answering (VQA) Models for Hypothetical Reasoning (Poster)  link » In this work, we propose a novel vision-language question answering task for ‘what-if’ reasoning over images. We set up a synthetic corpus based on the CLEVR (Johnson et al., 2017a) dataset which is carefully crafted to ensure minimal biases, support explainable model development and yet diverse. We set up several baselines based on existing architectures to gain insights about their ability to perform hypothetical reasoning. In future, we would like to develop better vision-language models to tackle the hypothetical reasoning problem. Link » Shailaja Sampat 🔗 - Identifying Hijacked Reviews (Poster)  link »    Fake reviews and review manipulation are growing problems on online marketplaces globally. Review Hijacking is a new review manipulation tactic in which unethical sellers “hijack” an existing product page (usually one with many positive reviews), then update the product details like title, photo, and description with those of an entirely different product. With the earlier reviews still attached, the new item appears well-reviewed. However, there are no public datasets of review hijacking and little is known in the literature about this tactic. Hence, this paper proposes a three-part study: (i) we propose a framework to generate synthetically labeled data for review hijacking by swapping products and reviews; (ii) then, we evaluate the potential of both a Twin LSTM network and BERT sequence pair classifier to distinguish legitimate reviews from hijacked ones using this data; and (iii) we then deploy the best performing model on a collection of 31K products (with 6.5 M reviews) in the original data, where we find 100s of previously unknown examples of review hijacking. 
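Step (i) of the review-hijacking study above, generating synthetically labeled data by swapping products and reviews, can be pictured with the short sketch below. It is a minimal illustration under stated assumptions: the function `make_pairs`, the toy product dictionary, and the rule of drawing one mismatched review per legitimate review are all hypothetical, since the abstract does not specify how the swapping was done.

```python
import random


def make_pairs(products, seed=0):
    """Build (description, review, label) triples; label 1 marks a swapped pair."""
    rng = random.Random(seed)
    ids = list(products)
    pairs = []
    for pid in ids:
        description, reviews = products[pid]
        for review in reviews:
            # Legitimate pair: the review was written for this product.
            pairs.append((description, review, 0))
            # Synthetic "hijacked" pair: same description, review from another product.
            other = rng.choice([o for o in ids if o != pid])
            pairs.append((description, rng.choice(products[other][1]), 1))
    return pairs


# Hypothetical toy catalogue: product id -> (description, list of reviews).
products = {
    "kettle": ("Electric kettle, 1.7 L", ["Boils fast", "Lid feels sturdy"]),
    "cable": ("USB-C cable, 2 m", ["Charges my phone quickly"]),
    "lamp": ("LED desk lamp", ["Bright and dimmable"]),
}

pairs = make_pairs(products)
for description, review, label in pairs:
    print(label, "|", description, "|", review)
```

Pairs built this way could then be fed to any sequence-pair classifier; the sketch needs at least two products, each with at least one review.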
Link » Monika Daryani · James Caverlee 🔗 - Car Damage Detection and Patch-to-Patch Self-supervised Image Alignment (Poster) link » Most computer vision applications aim to identify pixels in a scene and utilize them for diverse purposes. One intriguing application is car damage detection for insurance carriers, which aims to detect all car damage by comparing pre-trip and post-trip images and requires two components: (i) car damage detection and (ii) image alignment. First, we implemented a Mask R-CNN model to detect car damage on custom images. For image alignment, we propose a novel Patch-to-Patch, SimCLR-inspired approach that finds perspective transformations between custom pre- and post-rental car images, as an alternative to traditional methods. Link » Hanxiao Chen 🔗 - Classification of Shoulder Impingement Syndrome using Transfer Learning model (Poster) link » Shoulder impingement syndrome refers to the inflammation and irritation of the rotator cuff. It presents as an insidious onset of sharp, anterolateral shoulder pain produced during elevation and eased on lowering the arm, in the presence of positive Neer and Hawkins-Kennedy tests, and typically occurs in patients over age 40. It is a common musculoskeletal condition that affected 7% to 26% of individuals before COVID-19 and has increased as individuals engage in less physical activity during home quarantine. In recent years, transfer learning methods have shown effective results in assisting radiologists and surgeons in classifying shoulder impingement syndrome. This abstract provides a brief overview of the most effective models used to classify shoulder impingement from the MURA-v1.1 dataset, namely the VGG16 and ResNet models. Although shoulder impingement syndrome has been known for a long time, it remains a poorly understood musculoskeletal condition.
We propose that, regardless of the dataset size, a transfer learning model can serve as a baseline that reuses previously gained knowledge to achieve highly accurate results. On four randomly selected XR shoulder images, the VGG16 model predicted Normal at 79.99% and 96.89% and Abnormal at 74.66% and 90.47%. The ResNet model has a more complex architecture than VGG16; its classification accuracy nonetheless depends on hyperparameter tuning, including the number of epochs and the batch size. On four randomly selected XR shoulder images, the ResNet model predicted Normal at 97.78% and 91.22% and Abnormal at 81.36% and 78.62%. Both models thus achieve > 70% accuracy on these images using transfer learning, with no clinical testing involved other than medical images of shoulder impingement syndrome contributed by an anonymous volunteer patient. Link » Raquel Marasigan 🔗 - Comparative Analysis of Machine Learning Techniques for Breast Cancer Detection (Poster) link » Death from cancer is one of humanity's biggest problems: there are many ways to reduce its occurrence, but there is still no cure. The death rate from breast cancer is increasing significantly with the rapid growth of the population; thus, effective diagnosis of cancer is important. Breast cancer is one of the major causes of cancer-related deaths amongst women globally. Survival rates differ across the numerous health treatments. Therefore, data analysis approaches employed to detect and treat breast cancer have to be improved to facilitate quick treatment and achieve more reliable outcomes. This study conducted a comparative analysis of machine learning techniques for breast cancer detection. The analysis used the Wisconsin datasets from the online UCI Machine Learning Repository.
First, feature selection was carried out with the Particle Swarm Optimization (PSO) algorithm, which picked relevant features from the raw dataset to reduce noise and produced a reduced dataset. Three machine learning classifiers, namely support vector machine (SVM), artificial neural networks (ANNs), and decision tree (DT), were then used to analyze the reduced dataset and simulate the model. The performance metrics used for evaluating the model include precision, sensitivity, specificity, accuracy, F-score, false acceptance rate, error rate, and false rejection rate. The model was simulated using Matlab 2015. The evaluation reveals that ANNs achieved the highest accuracy, sensitivity, precision, F-score, and recall of 97.13%, 99.10%, 96.49%, 97.77%, and 99.09% respectively, and ANNs also produced the lowest false acceptance rate, error rate, and false rejection rate of 0.0450, 0.0666 and 0.0090 respectively. Link » Jesutofunmi Afolayan 🔗 - Importance of Data Re-Sampling and Dimensionality Reduction in Predicting Students’ Success (Poster) link » We present the importance of data pre-processing in predicting students’ success. We implemented Principal Component Analysis (PCA) for dimensionality reduction to achieve better model performance. A data re-sampling technique was also utilized to handle the imbalanced class problem, which is one of the significant issues in effective classification in Educational Data Mining due to the nature of the data from educational settings. We also performed a comparative analysis of the impacts of Random Under-Sampling (RUS), Random Over-Sampling (ROS), and Synthetic Minority Over-Sampling Technique (SMOTE) on the imbalanced dataset used in this study.
SMOTE and PCA techniques application offer better performance compared to RUS and ROS with PCA. Support Vector Machine had the best accuracy value of 0.94 after the application of SMOTE and PCA. The application of PCA on the imbalanced data also positively affected the accuracy of the models used in this study. We used other performance metrics to evaluate our models: Kappa, Area Under Curve, and Precision-Recall curve. Our finding shows that the predictive models can predict student success with the application of PCA and data re-sampling techniques. Link » Eluwumi Buraimoh · Ritso Ajoodha · Kershree Padayachee 🔗 - Adversarial Training for Improving Model Robustness? Look at Both Prediction and Interpretation (Poster)  link »    Neural language models show vulnerability to adversarial examples which are semantically similar to their original counterparts with a few words replaced by their synonyms. A common way to improve model robustness is adversarial training which follows two steps—collecting adversarial examples by attacking a target model, and fine-tuning the model on the augmented dataset with these adversarial examples. The objective of traditional adversarial training is to make a model produce the same correct predictions on an original/adversarial example pair. However, the consistency between model decision-makings on two similar texts is ignored. We argue that a robust model should behave consistently on original/adversarial example pairs, that is making the same predictions (what) based on the same reasons (how) which can be reflected by consistent interpretations. In this work, we propose a novel feature-level adversarial training method named FLAT. FLAT aims at improving model robustness in terms of both predictions and interpretations. FLAT incorporates variational word masks in neural networks to learn global word importance and play as a bottleneck teaching the model to make predictions based on important words. 
FLAT explicitly targets the vulnerability problem caused by the mismatch between model understandings of the replaced words and their synonyms in original/adversarial example pairs by regularizing the corresponding global word importance scores. Experiments show the effectiveness of FLAT in improving the robustness with respect to both predictions and interpretations of four neural network models (LSTM, CNN, BERT, and DeBERTa) to two adversarial attacks on four text classification tasks. The models trained via FLAT also show better robustness than baseline models on unforeseen adversarial examples across different attacks. Link » Hanjie Chen · Yangfeng Ji 🔗 - Syntax-enhanced Dialogue Summarization using Syntax-aware information (Poster) link » During the COVID-19 pandemic, virtual conversation tools like Zoom have become indispensable. With this much demand, dialogue summarization has emerged as a means to summarize the dialogues. There are two challenges in dialogue summarization: first, multiple speakers with different textual styles participate in a dialogue, and second, dialogue structures are informal (e.g., slang, colloquial representation). To address these challenges, we investigated the relationship between textual styles and representative attributes of utterances. [1] proposed that the types (e.g., intent or role of a speaker) of sentences from speakers are associated with different syntactic structures, such as part-of-speech (POS) tagging. This is derived from the fact that different speaker roles are characterized by different syntactic structures. In essence, the uttered text has a unique representation from each speaker, like a voiceprint (i.e., identity information from the human voice [2]).
Based on this prior research, we began our study with the assumption that because syntactic structures tend to be associated with a representative of a sentence uttered from speakers, these structures would help distinguish the different styles of utterances. In this work, we propose a novel abstractive dialogue summarization model for use in a daily conversation setting, characterized by an informal style of text that employs multi-task learning to learn linguistic information and dialogue summarization simultaneously. Link » Seolhwa Lee · Kisu Yang · Chanjun Park · Heuiseok Lim 🔗 - COVID-Net Clinical ICU: Enhanced Prediction of ICU Admission for COVID-19 Patients via Explainability and Trust Quantification (Poster)  link » The COVID-19 pandemic continues to have a devastating global impact, and has placed a tremendous burden on struggling healthcare systems around the world. Given the limited resources, accurate patient triaging and care planning is critical in the fight against COVID-19, and one crucial task within care planning is determining if a patient should be admitted to a hospital’s intensive care unit (ICU). Motivated by the need for transparent and trustworthy ICU admission clinical decision support, we introduce COVID-Net Clinical ICU, a neural network for ICU admission prediction based on patient clinical data. Driven by a transparent, trust-centric methodology, the proposed COVID-Net Clinical ICU was built using a clinical dataset from Hospital Sírio-Libanês comprising of 1,925 COVID-19 patients, and is able to predict when a COVID-19 positive patient would require ICU admission with an accuracy of 96.9% to facilitate better care planning for hospitals amidst the on-going pandemic. We conducted system-level insight discovery using a quantitative explainability strategy to study the decision-making impact of different clinical features and gain actionable insights for enhancing predictive performance. 
We further leveraged a suite of trust quantification metrics to gain deeper insights into the trustworthiness of COVID-Net Clinical ICU. By digging deeper into when and why clinical predictive models make certain decisions, we can uncover key factors in decision making for critical clinical decision support tasks such as ICU admission prediction and identify the situations under which clinical predictive models can be trusted for greater accountability. Link » Audrey Chung · Mahmoud Famouri · Andrew Hryniowski · Alexander Wong 🔗 - Using computer vision to measure spatial-temporal change of building conditions in neighborhoods with street view imagery (Poster)  link » Neighborhood environments play a significant role in shaping the well-being of individuals and communities, consequently contributing to inequality in the United States. Drawing on a dataset of street view images of buildings in five U.S. cities collected over 13 years, we train a convolutional neural network to estimate the amount of deterioration in the building shown in each image. We then use the trained model to detect trends in the level of building blight in Boston from 2007 to 2017. Our results show that the changes in upkeep correspond to overall economic trends, specifically the Great Recession and the subsequent recovery. Link » Evelyn Fitzgerald · Tingyan Deng · Lijing Wang 🔗 - Polaris: accurate spot detection for biological images with deep learning and weak supervision (Poster)  link »    Recent advances in imaging and machine learning have increased our ability to capture information about biological systems in the form of images. Therefore, images have the potential to be a universal data type for biology. A common and challenging computational task required for the analysis of biological images is fluorescent spot detection. 
This problem is challenging to solve with supervised learning methods because the notion of ground truth is ambiguous — most images contain too many spots for humans to manually curate. Moreover, expert human annotators disagree significantly on the number and location of spots in images. In this work, we present a weakly supervised approach to spot detection that addresses these challenges to reliable spot detection. Rather than manually annotating each spot, we fine tune a collection of classical spot detection algorithms on a set of images to create a set of annotations. We then perform generative modeling to create a consensus annotation set which is then used to train a deep learning model for spot detection. We show that when trained in this fashion, our deep learning model outperforms deep learning models trained with an annotation set from a single classical algorithm and has spot detection capabilities that generalize to image sets from a wide range of assays. When paired with our deep learning-based methods for cell segmentation and tracking, this spot detection method can be applied to the analysis of a number of live cell reporters and end-point spatial-omics assays. To improve accessibility, we have developed an image analysis pipeline, called Polaris, for singleplex and multiplex spatial transcriptomics image sets. Importantly, this paradigm of using weakly supervised learning to create consensus training data would be expected to improve the performance of any deep learning model for spot detection, regardless of model architecture, because it improves the accuracy of the training annotation set. Link » Emily Laubscher · William Graf · David Van Valen 🔗 - Drought and Nitrogen Induced Stress Identification for Maize Crop using Deep Learning deployed on Unmanned Aerial Vehicles (Drones) (Poster)  link » Maize (Zea mays L.) 
constitutes 36% (782 metric tonnes) of the global grain production and is one of the most versatile crops that grows under varied climatic conditions, making it a staple food in most countries. It contributes nearly 9% to the Indian food basket and more than 100 billion INR to the agricultural GDP. Due to climate change and growing demand, food safety and security are greatly affected. We endeavour to develop a deep learning-based technique that can aid farmers in improving yield by identifying stress (drought and nitrogen-based). Concretely, we aim to develop an end-to-end pipeline in conjunction with farmers and agricultural researchers to (1) perform data collection of RGB and multispectral data using drones at various stages of growth and stress, (2) propose deep learning-based methods that can identify various kinds of stress and recommend required action, and (3) work with the farmers and agricultural researchers to deploy this technology to aid their production. Link » Tejasri Nampally · G Ujwal Sai · Siddha Ganju · Ajay Kumar · Balaji Banothu 🔗 - Scene statistics and noise determine the relative arrangement of receptive field mosaics (Poster) link » Many sensory systems utilize parallel ON and OFF pathways that signal stimulus increments and decrements, respectively. These pathways consist of ensembles or grids of ON and OFF detectors spanning sensory space. Yet, encoding by opponent pathways raises a question: How should grids of ON and OFF detectors be arranged to optimally encode natural stimuli? We investigated this question using an artificial neural network model of the retina guided by efficient coding theory. Specifically, we optimized spatial receptive fields and contrast response functions to encode natural images given noise and constrained firing rates. We find that the optimal arrangement of ON and OFF receptive fields exhibits a transition between aligned and antialigned grids.
The preferred phase depends on detector noise and the statistical structure of the natural stimuli. These results reveal that noise and stimulus statistics produce qualitative shifts in neural coding strategies and provide theoretical predictions for the configuration of opponent pathways in the nervous system. Link » Na Young Jun 🔗 - Predictive classification of clinical ball catching trials with recurrent neural networks (Poster)  link »    Motor disturbances arising from neurodegenerative and neurodevelopmental disorders, such as Spinocerebellar Ataxias and Autism Spectrum Disorders, can strongly affect a patient’s quality of life. A classification of clinical catching trials might give insight into the existence of pathological alterations in the relation of arm and ball movements. Accurate, but also early decisions are required to classify a catching attempt before the catcher's first ball contact. To ensure clinically valuable results, we postulate a confidence threshold of 75%. Hence, three competing objectives have to be optimized at the same time: accuracy, earliness and decision-making confidence. Here we propose a coupled classification and prediction approach for early time series classification: a predictive, generative recurrent neural network (RNN) forecasts the next data points of ball trajectories based on already available observations; a discriminative RNN continuously generates classification guesses based on the available data points and the unrolled sequence predictions. We compare our approach, which we refer to as predictive sequential classification (PSC), to state-of-the-art sequence learners, including long short-term memory networks (LSTMs) and temporal convolutional networks (TCNs). On this hard real-world task we can consistently demonstrate the superiority of PSC over all other models in terms of accuracy and confidence with respect to earliness of recognition. 
Specifically, PSC is able to confidently classify the success of catching trials as early as 123 milliseconds before the first ball contact. Our findings show that PSC with its two-model design can simultaneously optimize accuracy, earliness and confidence of decision-making, thus constituting a promising approach for early time series classification, when accurate and confident decisions are required. Link » Jana Lang 🔗 - A Neural Network Ensemble Approach to System Identification (Poster) link » We present a new algorithm for learning unknown governing equations from trajectory data, using an ensemble of neural networks. Given samples of solutions $x(t)$ to an unknown dynamical system $\dot{x}(t)=f(t,x(t))$, we approximate the function $f$ using an ensemble of neural networks. We express the equation in integral form and use the Euler method to predict the solution at every successive time step, using at each iteration a different neural network as a prior for $f$. This procedure yields $M-1$ time-independent networks, where $M$ is the number of time steps at which $x(t)$ is observed. Finally, we obtain a single function $f(t,x(t))$ by neural network interpolation. Unlike our earlier work, where we numerically computed the derivatives of data and used them as targets in a Lipschitz regularized neural network to approximate $f$, our new method avoids numerical differentiation, which is unstable in the presence of noise. We test the new algorithm on multiple examples both with and without noise in the data. We empirically show that generalization and recovery of the governing equation improve by adding a Lipschitz regularization term in our loss function and that this method improves on our previous one, especially in the presence of noise, when numerical differentiation provides low quality target data. Finally, we compare our results with the method proposed by Raissi, et al. arXiv:1801.01236 (2018).
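The time-stepping idea in the abstract above can be pictured with a forward Euler update, $x(t_{k+1}) \approx x(t_k) + \Delta t \, f(t_k, x(t_k))$. The sketch below is a minimal illustration under stated assumptions, not the authors' method: a known scalar right-hand side stands in for the neural networks, and the names `euler_rollout` and `one_step_residuals` are hypothetical. With M observation times there are M-1 steps, which is consistent with the M-1 networks mentioned above. A per-step residual like the one computed here is the kind of quantity a per-step model could be fitted to reduce, although the actual loss is not given in the abstract.

```python
def euler_rollout(f, x0, ts):
    """Integrate dx/dt = f(t, x) with forward Euler on the time grid ts."""
    xs = [x0]
    for k in range(len(ts) - 1):
        dt = ts[k + 1] - ts[k]
        xs.append(xs[k] + dt * f(ts[k], xs[k]))
    return xs


def one_step_residuals(f, xs, ts):
    """Mismatch between each observed x(t_{k+1}) and its one-step Euler prediction."""
    return [
        xs[k + 1] - (xs[k] + (ts[k + 1] - ts[k]) * f(ts[k], xs[k]))
        for k in range(len(ts) - 1)
    ]


def f(t, x):
    # Stand-in dynamics (an assumption for illustration): exponential decay.
    return -x


ts = [0.1 * k for k in range(11)]  # M = 11 observation times
xs = euler_rollout(f, 1.0, ts)     # synthetic "observations"
residuals = one_step_residuals(f, xs, ts)

print(len(residuals))  # M - 1 = 10 one-step residuals
print(xs[-1])          # close to 0.9 ** 10, the Euler estimate of x(1)
```

Because the observations here were generated by the same Euler rule, every residual is zero; with real, noisy trajectory data the residuals would be nonzero and no finite-difference derivative of the data is required to compute them.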
Link » Elisa Negrini · Giovanna Citti 🔗 - Two Sides of the Same Coin: Heterophily and Oversmoothing in Graph Convolutional Neural Networks (Poster)  link »    In node classification tasks, graph convolutional neural networks (GCNs) may perform poorly in graphs where neighbors have different features/classes (heterophily) and when stacking multiple layers (oversmoothing). These two seemingly unrelated problems have been studied mostly independently, but there is recent empirical evidence that solving one problem may benefit the other. In this work, going beyond empirical observations, we aim to: (1) propose a unified theoretical perspective to analyze the heterophily and oversmoothing problems, (2) identify the common causes of the two problems based on our theoretical insights, and (3) propose simple yet effective strategies to address the common causes. In our theoretical analysis, we view the changes of node representations at different layers as their "movements" in the latent space, and we prove that, under certain conditions, movements of nodes towards the other class lead to a non-decreasing misclassification rate of node labels. We show that the common causes of the heterophily and oversmoothing problems are those which trigger the nodes to move towards the other class, which include the relative degree of a node (compared to its neighbors) and the heterophily level of a node's neighborhood. Under certain conditions, our analysis suggests that: (1) nodes with high heterophily have a higher misclassification rate; (2) even with low heterophily, degree disparity between nodes can influence the moving dynamics of nodes and result in a pseudo-heterophily situation, which helps to explain oversmoothing. Based on our insights, we propose simple modifications to the GCN architecture---i.e., degree corrections and signed messages---which alleviate the causes of these issues. 
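The two node-level quantities named in the abstract above can be illustrated with a small sketch. The definitions used here are assumptions chosen for illustration (heterophily as the fraction of neighbors with a different label, relative degree as a node's degree divided by the mean degree of its neighbors); the paper's exact definitions may differ, and the function names are hypothetical.

```python
def node_heterophily(adj, labels):
    """Fraction of each node's neighbors whose label differs from its own."""
    result = {}
    for v, nbrs in adj.items():
        if not nbrs:
            result[v] = 0.0  # isolated node: no neighbors to disagree with
            continue
        differing = sum(labels[u] != labels[v] for u in nbrs)
        result[v] = differing / len(nbrs)
    return result

def relative_degree(adj):
    """Degree of each node divided by the mean degree of its neighbors (assumed definition)."""
    result = {}
    for v, nbrs in adj.items():
        if not nbrs:
            result[v] = 0.0
            continue
        mean_nbr_degree = sum(len(adj[u]) for u in nbrs) / len(nbrs)
        result[v] = len(nbrs) / mean_nbr_degree
    return result

# Toy path graph 1 - 0 - 2 - 3 with two classes.
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
het = node_heterophily(adj, labels)
rel = relative_degree(adj)
```

In this toy graph, nodes 0 and 2 sit on the class boundary and get heterophily 0.5, while the leaf node 1 has a relative degree below 1 because its only neighbor has a higher degree.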
Link » Yujun Yan · Milad Hashemi · Kevin Swersky · Yaoqing Yang · Danai Koutra 🔗 - Feedforward Omnimatte (Poster)  link » In digital media, a standard concept is the idea of layers. Layers enable artists to manipulate independent groups of objects in a disentangled way and organize a scene from back to front. Manually extracting these layers can be time-consuming, since it involves repeatedly segmenting groups of content over many frames. In [1], Lu et al. coin the problem of automatically decomposing a video into these layers as creating an Omnimatte. More specifically, given an initialization of each layer with a sequence of object masks, everything associated with that object must be attached to the same layer. This includes effects such as shadows and reflections. The current method of extracting Omnimattes takes approximately two hours per video and must be optimized from scratch for every new video. Motivated by the challenge of more quickly producing Omnimattes, we have designed and tested a train-and-evaluate network that generates Omnimattes for new videos with a single forward pass. Our network builds on the idea of learned gradient descent, a setup which has also been applied to generate multi-plane images used to render novel views of a scene. Initial results show that this approach can generate meaningful decompositions of videos into foreground and background layers. Link » Sharon Zhang · Jonathan Huang · Vivek Rathod 🔗 - Predicting Fake News and Real News Spreaders' Influence (Poster)  link »    The spread of misinformation across social media is one of the biggest national security threats in the 21st century. Previous research has been successful at identifying misinformation spreaders on Twitter based on user demographics and past tweet history, and others have been relatively successful at predicting the number of retweets of a given tweet. 
However, the problem of predicting the number of retweets of news articles tweeted by a specific user, which determines the impact of the initial tweet containing misinformation, has not yet been tackled. We use data from FakeNewsNet, containing a list of 43,119 known fake news spreaders and 135,234 real news spreaders, together with the past 500 tweets of each user, to build user profiles and predict the number of retweets a news article tweet will receive. We present a Random Forest classifier that categorizes the number of retweets a news tweet will receive into 5 ranges using user profile characteristics and information about past tweets. This model achieved a weighted F1 score of up to 0.931 on the real news dataset and 0.853 on the fake news dataset, higher than existing models. We show that fake news retweet prediction is difficult because of the low variance in user characteristics among fake news spreaders, and we suggest that graph-based models could predict retweets of fake news more accurately. Link » Amy Zhang · Aaron Brookhouse · Francesca Spezzano 🔗 - Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies (Poster)  link »    Gender is widely discussed in the context of language tasks and when examining the stereotypes propagated by language models. However, current discussions primarily treat gender as binary, which can perpetuate harms such as the cyclical erasure of non-binary gender identities. These harms are driven by model and dataset biases, which are consequences of the non-recognition and lack of understanding of non-binary genders in society. In this paper, we explain the complexity of gender and language around it, and survey non-binary persons to understand harms associated with the treatment of gender as binary in English language technologies. 
We also detail how current language representations (e.g., GloVe, BERT) capture and perpetuate these harms and related challenges that need to be acknowledged and addressed for representations to equitably encode gender information. Link » Anaelia Ovalle 🔗 - Learning Aerodynamics and Instrument behavior to Fly in Dangerous Conditions (Poster)  link » Flying an aircraft from takeoff to landing typically involves multiple tasks that take place in a large environment. Different areas in this environment have different effects on the behavior of the aircraft, for example, when flying at different altitudes with different speeds. In this large environment with continuous action space, fully observable states are not available due to, for example, atmosphere changes due to humidity but also limited equipment onboard. Within this large environment, besides flying typical tasks, accident prevention and recovery is essential for safety. All these flying conditions form a diverse task space and complex challenge that can be approached with deep reinforcement learning by using simulations. Existing research that uses reinforcement learning to learn to fly an aircraft or Unmanned Aerial Vehicles focuses usually on a subset of this task space. We propose a method based on on-policy reinforcement learning to cover a majority of flying tasks and potential accidents by leveraging the aerodynamics learned by the agent during simple tasks. To create these specific generally capable agents, safety is of higher importance for this problem compared to other applications. When the agent cannot participate in a task a possible flying path should still be found. An agent is trained using an on-policy framework with the main focus on understanding the dynamics in the environment on basic flying tasks. We show that this agent has high performance in zero shot and fine tuning on a diverse set of failures. 
Furthermore, we propose a system for assessing the probability of being in a certain time frame before being in an unsafe state. Link » Cynthia Koopman · David Zammit Mangion · Alexiei Dingli 🔗 - Effectiveness of Transformers on Session-Based Recommendation (Poster)  link »    Recommender systems (RecSys) are the engine of the modern internet and play a critical role in driving user engagement while helping users to find relevant items that match their preferences learned from their historical interactions on other items. In many recommendation domains such as news, e-commerce, and streaming video services, users might be untraceable/anonymous, their histories can be short and users can have rapidly changing tastes. Providing recommendations based purely on the interactions within the current session is an important and challenging problem. The field of NLP has evolved significantly over the past decade, particularly due to the increased usage of deep learning. The state-of-the-art NLP approaches have inspired RecSys practitioners and researchers to adapt those architectures, especially for sequential and session-based recommendation problems. In our work, we investigate the effectiveness of the transformer-based architectures for next-click predictions using the short user sequences for session-based recommendation tasks. In addition, we explored if combining different transformer architectures and training techniques such as MLM (masked language modeling), PLM (permutation LM), and RTD (the Replacement Token Detection) could be beneficial in session-based recommendation tasks with short sequences of interactions. To effectively bridge the gap between modern NLP and sequential and session-based recommendation tasks, we developed a RecSys framework built upon direct integration with the Hugging Face (HF) Transformers library. 
We conducted experiments using our developed framework and the results showed that training XLNet with RTD, to our knowledge a novel combination, led to an improvement of +14.15% NDCG@20 and +9.75% NDCG@20 across REES46 and YOOCHOOSE e-commerce datasets, respectively, relative to the best baseline. Our framework provides the necessary functionalities to use Transformers to build sequential and session-based models. Link » SARA RABHI · Ronay Ak · Gabriel Moreira · Jeong Min Lee · Even Oldridge 🔗 - Decaying Clipping Range in Proximal Policy Optimization (Poster)  link » Proximal Policy Optimization (PPO) [1] is among the most widely used algorithms in reinforcement learning, which achieves state-of-the-art performance in many challenging problems. The keys to its success are the reliable policy updates through the clipping mechanism and the multiple epochs of minibatch updates. The aim of this research is to give new simple but effective alternatives to the former. For this, the new methods that we propose include linear, exponential and Z-shaped curve clipping range reduction throughout the training, as well as a moving average approach. With these, we would like to provide higher exploration at the beginning and stronger restrictions at the end of the learning phase. We investigate their performance in several classical control and locomotive robotic simulation environments in which we test and compare the performance of the alternative algorithms. These include the solution of simpler classical control tasks in the OpenAI Gym environments and slightly more complex continuous control tasks in the Box2D simulator, along with locomotive robotic control problems in the PyBullet environments. In our analysis, we conclude that the examined PPO algorithm can be successfully applied in all nine environments studied, which shows its power and provides insight into why it is so popular nowadays. 
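The decaying clipping range idea described in the abstract above can be sketched as simple schedules. This is an illustration under assumptions, not the author's implementation: the functional forms, the start and end values (0.2 and 0.05), and the names `linear_clip`, `exponential_clip`, and `clip_ratio` are hypothetical, and the Z-shaped and moving-average variants are not shown.

```python
import math

def linear_clip(step, total_steps, eps_start=0.2, eps_end=0.05):
    """Clipping range that decreases linearly from eps_start to eps_end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return eps_start + (eps_end - eps_start) * frac

def exponential_clip(step, total_steps, eps_start=0.2, eps_end=0.05):
    """Clipping range that decays geometrically from eps_start to eps_end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return eps_start * (eps_end / eps_start) ** frac

def clip_ratio(ratio, eps):
    """PPO-style clipping of the probability ratio to [1 - eps, 1 + eps]."""
    return min(max(ratio, 1.0 - eps), 1.0 + eps)

# Early in training the ratio may move further from 1 than late in training.
early = clip_ratio(1.5, linear_clip(0, 100))
late = clip_ratio(1.5, linear_clip(100, 100))
```

A wide range early on permits larger policy updates (more exploration), while the narrower range near the end restricts updates, which matches the motivation stated in the abstract.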
Furthermore, our proposed clipping range strategies, which are designed to further refine this state-of-the-art method, are able to achieve better results in several cases than the original constant approach, especially the exponential and Z-shaped declining strategies. However, the OpenAI Gym Box2D environments show that these approaches are not always successful, which is not surprising since there is usually no general solution for all situations. Nevertheless, they are promising alternatives to the constant clipping method. Link » Mónika Farsang 🔗 - Reading the Road: Leveraging Meta-Learning to Learn Other Driver Behavior (Poster)  link » A significant challenge for autonomous vehicles today is learning to drive with other drivers. This stems from the need to accurately model other drivers' actions, which is especially difficult because of how many unique driving styles can exist. We present a meta-learning approach: by treating the experience of driving with others as tasks, the deep learning model creates a unique representation for each experience. During test time, the model can quickly adapt to unseen driving styles with only a few updates. We present promising initial results: the meta-learning model outperforms reinforcement learning baselines, even accounting for exposure to new drivers. Link » Anat Kleiman · Ryan Adams 🔗 - Identifying ATT&CK Tactics in Android Malware Control Flow Graph Through Graph Representation Learning and Interpretability (Poster)  link »    To mitigate a malware threat, it is important to understand the malware’s behavior. The MITRE ATT&CK ontology specifies an enumeration of tactics, techniques, and procedures (TTP) that characterize malware. However, there are no automated procedures that, given a malware executable, characterize which part of the execution flow is connected with a specific TTP. 
This paper provides an automation methodology to locate TTP in a sub-part of the control flow graph that describes the execution flow of a malware executable. This methodology merges graph representation learning and tools for machine learning explanation. Link » Christine patterson · Edoardo Serra 🔗 - Self-Supervised Visual Representation Learning for Time-series Clustering (Poster)  link »    Self-supervised and transfer learning have been demonstrated to lead to more generalized solutions than supervised learning regimes, recently reinforced by advances in both computer vision and natural language processing. Here, we present a simple yet effective method that learns meaningful representations for 1D time-series data through their 2D visual patterns without any external supervision. This is motivated by two factors: 1) supervision-disability, either due to lack of labelled data or lack of supervisory signal such as for exploratory data analysis, and 2) human-basis, emulating a data scientist's visual perception to obtain visualization-based insights from data that is not inherently of 2D/image type. These are named Learned Deep Visual Representations (LDVR) for time-series. We first convert 1D time-series signals into 2D images, followed by self-supervised contrastive learning using pre-trained 2D CNNs to obtain time-series representations. The generalizability is demonstrated through diverse time-series datasets for the unsupervised task of clustering, where no prior knowledge of instance labels is utilized. The learnt representations lend themselves to more meaningful time-series clusters, validated through quantitative and qualitative analyses on the UCR time-series benchmark. 
Link » Gaurangi Anand · Richi Nayak 🔗 - The impact of weather information on machine-learning probabilistic electricity demand predictions (Poster)  link »    Accurate electricity demand prediction is an essential task for supporting power balance, energy trading, and demand-side management in power systems. Extreme weather events, such as winter cold spells or summer heatwaves, can result in unprecedented peak demands due to sudden heating or cooling needs. In those cases, the point demand prediction presenting a single possibility is not sufficient. The system needs to adopt the probabilistic forecast which produces the whole load probability distribution to assess diverse grid scenarios and future uncertainty. This work examines the impact of weather information on machine-learning probabilistic electricity demand predictions. The case study is performed on six European countries involving a great diversity of weather conditions, heating, and cooling needs. Link » Yifu Ding 🔗 - The Two-sample Problem in High Dimension: A Ranking-based Method (Poster)  link » In this work, we propose a general framework for testing the equality of two unknown probability distributions, when considering two independent i.i.d. random samples, that are valued on the (same) measurable space. While there exists a long-standing literature for the univariate setting, this problem remains a subject of research for both the multivariate and nonparametric frameworks. Indeed, the increasing ability to collect large, even massive data, that is possibly biased due to the collection process for instance, and of various structure, has strongly defied classical modelings, in particular in applied fields such as in biomedicine (e.g. clinical trials, genomics), in marketing (e.g. A/B testing), in economics, etc. The present method generalizes a particular class of permutation statistics known as 'two-sample linear rank statistics'. 
We overcome the lack of natural order in the multivariate feature space thanks to the comparison of the univariate 'projected' observations using a scoring function valued in the real line. In particular, our method consists of a two-stage procedure. (i) 'Maximization of the rank statistic': on the first half of each sample, we optimize a tailored version of the two-sample rank statistic over the class of scoring functions by means of ranking-based algorithms. (ii) 'Two-sample homogeneity test': for a given level of test, the univariate rank test is performed on the remaining observations that have been scored with the optimal function of step (i). We accompany our method with theoretical guarantees and with various numerical experiments that intend to model complex structures of data to incidentally question both existing and present statistical tests. Link » Myrto Limnios · Stephan Clémençon · Nicolas Vayatis 🔗 - Causal meta-learning by making informative interventions about the functional form (Poster)  link » Please see the submitted one-page pdf abstract. Link » Chentian Jiang · Chris Lucas 🔗 - A Data-driven Approach to Infer Latent Dynamics of COVID-19 Transmission Model (Poster)  link »    Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2; COVID-19) has rapidly spread across the world since 2019. With the emergence of viral variants, the importance of mathematical modeling of infectious diseases for understanding the ongoing outbreak is highlighted. In this work, we propose a data-driven epidemic model based on the Markov chain including a vaccinated and isolated group. We fit daily reported data to our model to find the underlying dynamics parameters of the Markov chain. We aim to estimate latent state values by taking advantage of the officially reported data and the underlying Markov chain dynamics. 
We confirmed that the proposed model is able to successfully estimate all states by fitting it to COVID-19 data from South Korea. Link » Sujin Ahn · Minhae Kwon 🔗 - Maintenance planning framework using online and offline deep reinforcement learning (Poster)  link » Cost-effective asset management is an area of interest across several industries, for example, manufacturing, transportation, and infrastructure. This paper develops a deep reinforcement learning (DRL) framework to automatically learn an optimal rehabilitation policy for continuously deteriorating water pipes. We approach the problem of rehabilitation planning in online and offline DRL settings. In online DRL, the agent interacts with a simulated environment of multiple pipes with distinct length, material, and failure rate characteristics. We train the agent using deep Q learning (DQN) to learn an optimal policy with minimal average costs and maximum reliability level for assets under consideration. In offline learning, the agent uses the entire DQN replay dataset to learn an optimal policy via the conservative Q-learning algorithm without further interactions with the environment. We demonstrate that DRL-based policies show improvements over standard preventive and corrective planning approaches. Additionally, learning from the fixed DQN replay dataset surpasses vanilla DQN, which learns from online interactions with the environment. The results suggest that the existing deterioration profiles of water pipes, consisting of large and diverse state and action trajectories, can be used to learn rehabilitation policies in offline DRL settings. Link » Zaharah Bukhsh · Nils Jansen 🔗 - Machine Learning-based Mobility Assessment from Passively Sensed Digital Biomarkers (Poster)  link »    Mobility limitations are associated with poor clinical outcomes, including higher mortality and disability rates, especially in older adults. 
Early detection of decline in mobility is of great importance for clinical practice, as it can still be stabilized or even reversed in the early stages. Mobile sensing offers a great range of sources, such as GPS, accelerometer, and gyroscope, that can be used to implement mobility measures. Unlike traditional assessment tools, these technologies allow passively observing patients’ functions in real-life settings. Therefore, the purpose of this study is to develop a machine learning-based model to follow up on patients’ mobility over time from passively sensed mobility descriptor biomarkers and socio-demographic data of patients. The WHODAS 2.0 Questionnaire is used as a mobility measurement tool, which queries whether the individual had difficulty performing a set of tasks over the past 30 days. Using these scores as target outcomes, we define a pipeline that performs feature encoding for the daily information by applying Time2Vec, followed by an LSTM encoder for the 30-day embedded input sequence. A feed-forward layer on top of the LSTM outputs concatenated with demographic data is then used to get the predictions. Moreover, since the temporal data is regularly sampled but frequently missing, probabilistic generative models will be used to perform data imputation. Link » Emese Sükei · Pablo Olmos 🔗 - Soil Moisture Estimation with cycleGANs for Time-series Gap Filing (Poster)  link »    Fast soil moisture content (SMC) mapping is necessary to support water resource management and to understand crops' growth, quality and yield. Here, Earth Observation (EO) plays a key role due to its ability to monitor large areas in almost real time at low cost. This study aims to explore the possibility of taking advantage of freely available Sentinel-1 (S1) and Sentinel-2 (S2) EO data for the simultaneous prediction of SMC with a cycle-consistent adversarial network (cycleGAN) for time-series gap filling. 
The proposed methodology first learns a latent low-dimensional representation of the satellite images, then trains a simple machine learning model on top of these representations. Specifically, we present an efficient framework for extracting latent features from S1 and S2 imagery. Link » Nataliia Efremova · Esra Erten 🔗 - Social Representation of Political Inclination of Users (Poster)  link »    The unprecedented adoption of social media for communicating political views has created widespread opportunities to study, in real time, the opinions of enormous numbers of people who have been labeled as politically active individuals. However, the absence of methods to distinguish between users of conflicting political alignments, and the absence of distinct signals at the very base level (i.e., for each individual), may, in the aggregate, obscure extreme partisan differences in ideologies that are important and associated with a particular political strategy. In this paper, we present a preliminary yet ground-breaking study of the social representation of news articles on Twitter. Existing methods for predicting the political inclination of users use techniques which rely on hand-engineering and use a small number of candidate predictors based on domain knowledge. Moreover, the literature consists of comparative analyses of extreme ideology with neutral, small-scale personality analysis, and no mapping between network analysis and political inclination. In this work, we perform diverse feature analysis and a comparative linguistic analysis of the aggregate and top left- and right-biased user base, study personality traits on a large scale using the IBM Watson Personality Insights API, perform network analysis using the follower-followee and reply graphs, and then build a set of machine learning models that are able to automatically detect the bias associated with a tweet. 
Link » Anjali Jha 🔗 - Parkinson’s Disease Detection using Imputed Multimodal Datasets (Poster)  link » Parkinson's disease is a progressive nervous system disorder that affects movement, often including tremors. Work has been done on developing systems for early-stage detection of Parkinson’s disease. During our literature survey we observed that there are many available datasets, but most are small and often have multiple samples from the same subject. The current state-of-the-art models either assume the source of the multi-modality to be the same, or they lack deployability in low-income areas because they depend on means of data collection that require expensive equipment. In our goal to build a more generalizable hybrid model, we decided to tackle this problem. We propose to utilize information from 3 datasets available in the public domain. Link » Hetvi Jethwani · Bhumika Chopra 🔗 - Combining semantic search and twin product classification for recognition of purchasable items in voice shopping (Poster)  link » For virtual assistants like Alexa and Google Home, the accuracy of the online shopping component via voice commands is particularly important and may have a great impact on customer trust. To ensure good customer experience, our work focuses on the problem of detecting if an utterance contains actual and purchasable products, thus referring to a shopping-related intent. A typical Spoken Language Understanding [4] architecture consists of an intent classifier and a slot detector. Intent classification identifies the user’s intent from a set of pre-defined intents and slot labeling extracts token sequences that are relevant for the fulfillment of the user’s request. For example, if the user says ‘Buy toilet paper’ the intent is BuyItem and the item slot is toilet paper. Buy is not important to fulfill the user’s request and is therefore not part of the slot. 
To understand if an item is purchasable on the connected e-commerce platform, one needs to check if the item is part of the platform’s product catalog. Searching through billions of products to check if a detected slot is a purchasable item is prohibitively expensive. To overcome this problem, we present a framework that (1) uses a retrieval module [3] that returns the most relevant products with respect to the detected slot, and (2) combines it with a twin network [1] [2] that decides if the detected slot is indeed a purchasable item or not. Figure 1 shows the architecture of the classifier. We show that the classifier outperforms a typical slot detector approach, with a gain of +81% in accuracy and +41% in F1 score. Passing the whole utterance on the left of the twin network instead of only the ItemName candidate and using an online contrastive loss function resulted in the best performance. For the retrieval module, we experimented with different numbers of matching products returned by semantic search and show that using the top five most relevant product names yields the best results. Link » Dieu Thu Le · Verena Weber 🔗 - Efficient evaluation metrics for evaluating the performance of GANs Architecture (Poster)  link » The significance of Generative Adversarial Networks (GANs) cannot be overemphasized, especially their adoption in computer vision [1] applications such as image generation, image-to-image translation, facial attribute manipulation and similar domains. A GAN is a generative model in machine learning. The architecture (Figure 1) is made of two networks: Generator and Discriminator [2]. The generator's function is to create an object that is as close as possible to the real data, using a random noise variable as the input. The discriminator, on the other hand, must be able to differentiate the data coming from the generator from the actual data. 
The advantage of GANs over other generative models, such as variational autoencoders, is that they can handle sharp estimated density functions, generate desired samples efficiently, and eliminate deterministic bias. However, despite the great successes achieved, applying GANs to real-world problems poses significant challenges. Link » Ramat Salami · Sakinat Folorunso 🔗 - Automated deep lineage tree analysis using a Bayesian single cell tracking approach (Poster)  link » Single-cell methods are beginning to reveal the intrinsic heterogeneity in cell populations. However, it remains challenging to quantify single-cell behaviour from time-lapse microscopy data, owing to the difficulty of extracting reliable cell trajectories and lineage information over long time-scales and across several generations. To address this challenge, we developed a hybrid deep learning and Bayesian cell tracking approach to reconstruct lineage trees from live-cell microscopy data (Figure 1). We implemented a residual U-Net model coupled with a cell state CNN classifier to allow accurate instance segmentation of the cell nuclei. To track the cells over time and through cell divisions, we developed a Bayesian cell tracking methodology that uses input features from the images to enable the retrieval of multi-generational lineage information from a corpus of thousands of hours of live-cell imaging data. Using our approach, we extracted 20,000+ fully annotated single-cell trajectories from over 3,500 hours of video footage, organised into multi-generational lineage trees spanning up to 8 generations and fourth cousin distances. Benchmarking tests against other tracking algorithms, including lineage tree reconstruction assessments, demonstrate that our approach yields high-fidelity results with our data, with minimal requirement for manual curation. 
To demonstrate the robustness of our minimally supervised cell tracking methodology, we retrieve cell cycle durations and their extended inter- and intra-generational family relationships in 5,000+ fully annotated cell lineages without any manual curation. Our analysis expands the depth and breadth of investigated cell lineage relationships in approximately two orders of magnitude more data than in previous studies of cell cycle heritability. Link » Kristina Ulicna · Giulia Vallardi · Guillaume Charras · Alan R Lowe 🔗 - Targeted active semi supervised learning for new customers in virtual assistants (Poster)  link » For virtual assistants, new customers play a special role and pose a specific challenge. New users interact with the device more naturally and conversationally as they are not yet aware which commands work and which don’t. Data from new customers is therefore especially valuable to improve spoken language understanding (SLU) systems. In this work, we improve intent classification [5] in SLU for new customers. Most studies in the literature on semi-supervised learning focus on general accuracy improvement [1] [2] [3] [4]. In contrast, we concentrate on data from new customers and use the framework to improve the experience not only for new but for all customers in a large-scale setting. We employ a self-training framework that combines targeted active and semi-supervised learning to incorporate new customers’ utterances into the training set. The first step is the identification of problematic utterances that new customers often use and that differ from utterances more experienced customers say frequently. To do so, we project frictional utterances from both cohorts into an embedding space using BERT and topic modelling, then use density clustering with topic guidance to identify the areas that are representative of utterances from new customers. After the identification, we combine active and semi-supervised learning in two phases. 
In phase I (Active Learning), we annotate problematic utterances from new customers that were identified in the first step to get a correct interpretation. We then use the model trained on the added active learning dataset as the teacher model [6] for phase II (Semi-supervised Learning). We use a retrieval module to retrieve utterances that are similar to the ones selected in phase I, and use the teacher model to provide pseudo labels, which are then added to the training set. Results for two languages show offline improvements of 100-300 bps and improvements in online friction rates of up to 100 bps overall for new customers. Link » Dieu Thu Le · Verena Weber 🔗 - Evaluating the Impact of Embedding Representations on Deception Detection (Poster)  link » Contextualized word embeddings underpin most state-of-the-art machine learning models used in natural language processing, natural language understanding, and more. With the ever-increasing number of models, there are a multitude of options to choose from. It can be difficult to assess the strengths, weaknesses, or biases of these models; for the average ML practitioner, pretrained embeddings can seem like black boxes, because pretraining requires significant computational resources and time and removes the practitioner's ability to control or identify biases in the training data that can affect downstream behavior. In this ongoing work, we evaluate the extent to which the choice of pre-trained embeddings impacts downstream performance on a deception detection task. We evaluate the impact on deception detection performance of seven variations of four popular text embedding models: ALBERT (base v2 and XXLarge v2), BERT (base cased, base uncased, and multilingual), DistilBERT (base), and RoBERTa (base). 
We leverage the huggingface[3] library to train models for each embedding variation that use a consistent architecture: an embedding layer, a varying number of LSTM layers, and a final output layer that classifies social media posts as deceptive or credible. We use a dataset of social news posts from Twitter and Reddit in 2016, used previously for deception detection evaluations[1,2], that contains 40k posts linked to credible sources and 55k posts linked to deceptive sources (e.g., sources that share clickbait, propaganda, disinformation). We reserved 20% of posts for testing and 10% for validation, tuned hyperparameters using grid search, and trained for 100 epochs. Link » Ellyn Ayton · Maria Glenski 🔗 - Commit-Checker: A human-centric approach for adopting bug inducing commit detection using machine learning models (Poster)  link »    When developing new software, testing can take up to half of the resources. Although much work has been done to automate different parts of testing, fixing a bug after it is discovered is still a costly task that developers need to do. Recently, the research community has proposed different methodologies to detect Just-in-Time (JIT) bugs at the commit level. But all of this work, including the state-of-the-art technique of Kamei et al., does not provide a real-time solution to the problem. In our study, we focused on finding usability patterns that can give developers real-time support, so that they are warned that their commit may be bug-inducing before they perform the commit operation. With that in mind, we built a VSCode IDE plug-in that warns the developer when they try to perform a commit operation. Additionally, we built an executable tool for developers who prefer to use a command-line interface. 
For the machine learning model to detect a bug-inducing commit, we followed the work of Kamei et al. and applied Random Forest, Decision Tree and Logistic Regression as the machine learning algorithms. Following Catalino et al., we first select the most essential metrics based on their information gain and build a model that reaches 80% accuracy and a 58% F1 score, achieved with Logistic Regression. Our plug-in and command-line tool help developers detect bug-inducing commits. After building the VSCode plug-in, we provided a configuration option along with the plug-in that developers can use for customization. Developers can select their preferred features and algorithms to train their own models, compare their accuracy, and use their own customized models in the plug-in. This option also works for plug-ins for JetBrains IDEs such as PyCharm and WebStorm. Finally, we performed a user study with developers working in the software industry to validate the usage of our tools. Link » Naz Zarreen Oishie 🔗 - Getting Started with Model Cards (Poster)  link »    Tech companies are proficient at creating and maintaining product specifications. With machine learning applications being increasingly deployed in the real world, it is important to be transparent about scope and limitations. A straightforward way to do that is through model cards [1]. A model card is a short document that records model behavior, context, intended use, and details about the data. The goal of our cross-functional effort is to make it easy to create model cards. We share our experience with creating model cards for two kinds of machine learning models: medical image classification [3, 4] and credit default prediction [5]. We describe the experiments we performed, the tools we used, and the datasets we analyzed [3, 4, 5]. We also propose a model card template created using a user experience design tool. 
We describe some of the challenges we encountered, to make visible what wide adoption of model cards by practitioners would take. We ask and provide some answers to questions like: What data will you need? How time- and resource-intensive will the experiments be? How should the experiment results be interpreted and communicated? Link » Maitreyi Chitale · Anoush Najarian · Helen Chigirinskaya · Louvere Walker-Hannon · Sindhuja Parimalarangan · Rajasi Desai 🔗 - Using Embeddings to Estimate Peer Influence on Social Networks (Poster)  link » We address the problem of using observational data to estimate peer contagion effects, that is, the influence of treatments applied to individuals in a network on the outcomes of their neighbours. A main challenge to such estimation is that homophily - the tendency of connected units to share similar latent traits - acts as an unobserved confounder for contagion effects. Informally, it's hard to tell whether your friends have similar outcomes because they were influenced by your treatment, or whether it's due to some common trait that caused you to be friends in the first place. Because these common causes are not usually directly observed, they cannot be simply adjusted for. We describe an approach to perform the required adjustment using node embeddings learned from the network itself. The main aim is to perform this adjustment non-parametrically, without functional form assumptions on either the process that generated the network or the treatment assignment and outcome processes. The key questions we address are: How should the causal effect be formalized? And, when can embedding methods yield causal identification? Link » Irina Cristali · Victor Veitch 🔗 - Sequential Decision Making with Limited Resources (Poster)  link » Offline reinforcement learning methods have been used to learn policies from observational data for recommending treatment of chronic diseases and interventions in critical care. 
In these formulations, treatments can be recommended for each patient individually without regard to treatment availability because resources are plentiful and patients are independent of one another. However, in many decision-making problems, such as recommending care in resource-poor settings, the space of available actions is constrained and the policy must take these constraints into account. We consider the problem of learning policies for personalized treatment when there are limited resources and actions taken for one patient affect the actions available for other patients. One such sequential decision-making problem is hospital bed assignment. Hospitals are complex systems, in which not only the medical care, but also the physical hospital environment affect patients’ outcomes. For Clostridioides difficile infection (CDI), one of the most common healthcare-acquired infections, the history of a patient’s bed and room can contribute to their risk of infection because C. difficile spores can linger on surfaces. We consider the problem of assigning patients to hospital beds with the objective of reducing the incidence of CDI while taking into account the limited availability of beds. Our algorithm first learns a Q-function for assigning beds to an individual patient, ignoring bed availability. We use this Q-function to assign patients to beds in order of their risk level, taking the highest-value action among those available for each patient. We test our algorithm on simulated data as well as a real dataset of hospitalizations from a large urban hospital. Link » Hallee Wong · Maggie Makar · Aniruddh Raghu · John Guttag 🔗 - Mitigating Overlap Violations in Causal Inference with Text Data (Poster)  link » Consider the problem of estimating the causal effect of some attribute of a text document; for example: what effect does writing a polite vs. rude email have on response time? 
To estimate a causal effect from observational data, we need to control for confounding by adjusting for a set of covariates X that includes all common causes of the treatment and outcome. For this adjustment to work, the data must satisfy overlap: the probability of treatment should be bounded away from 0 and 1 for all levels of X. In the text setting, we can try to satisfy the requirement that we adjust for all common causes by adjusting for all the text. However, when the treatment is an attribute of the text, this violates overlap. The main goal of this paper is to develop an alternative approach that allows us to adjust for a “part” of the text that is large enough to control for confounding but small enough to avoid overlap violations. We propose a procedure that can identify and throw away the part of the text that is only predictive of the treatment. This information is not necessary to control for confounding (it does not affect the outcome) and so can be safely removed. On the other hand, if the removed information was necessary for perfect treatment prediction, then overlap will be recovered. We adapt deep models and propose a learning strategy to recognize multiple representations with different prediction properties. The procedure explicitly divides a (BERT) embedding of the text into one piece relevant to the outcome and one relevant to the treatment only. A regularization term is included to enforce this structure. Early empirical results show that our method effectively detects an appropriate confounding variable and mitigates the overlap issue. Link » Lin Gui · Victor Veitch 🔗 - Interpretable & Hierarchical Topic Models using Hyperbolic Geometry (Poster)  link »    Topic Models (TM) are statistical models to learn latent topics present in a collection of text documents. These topics are usually not independent and represent concepts that are related hierarchically. Flat TMs such as LDA fail to capture this inherent hierarchy. 
To overcome these limitations, Hierarchical Topic Models (HTMs) have been proposed that discover latent topics while preserving the inherent hierarchical structure between different topics (for example, aspect hierarchies in reviews and research topic hierarchies in academic repositories). Despite showing great promise, state-of-the-art HTMs fail to capture coherent hierarchical structures. Also, the number of topics in each level is usually unknown and is determined empirically, wasting a lot of time and resources. Finally, HTMs have very long training times, making them unsuitable for real-time production environments. Thus, there is a need for HTMs that (i) offer good hierarchical structures and more meaningful and interpretable topics and (ii) can automatically find the number of topics at each level without multiple training iterations. In our work, we address the problems mentioned above by utilizing properties of hyperbolic geometry, which has been successfully applied in learning hierarchical structures such as ontologies in knowledge bases and latent hierarchies between words. Our initial experiments have yielded promising results: the training time has been reduced from weeks to less than an hour, and the quantitative metrics have improved, with significantly better hierarchical structures. We have attached an abstract with a brief overview of our approach and results. Link » Simra Shahid · Tanay Anand · Nikaash Puri · Balaji Krishnamurthy 🔗 - As easy as APC: Leveraging self-supervised learning in the context of time series classification with varying levels of sparsity and severe class imbalance (Poster)  link »    High levels of sparsity and strong class imbalance are ubiquitous challenges that often occur simultaneously in real-world time series data. While most methods tackle each problem separately, our proposed approach handles both in conjunction, while imposing fewer assumptions on the data. 
In this work, we propose leveraging a self-supervised learning method, specifically Autoregressive Predictive Coding (APC), to learn relevant hidden representations of time series data in the context of both missing data and class imbalance. We apply APC using either a GRU or GRU-D encoder on two real-world datasets, and show that applying one-step-ahead prediction with APC improves the classification results in all settings. In fact, by applying GRU-D - APC, we achieve state-of-the-art AUPRC results on the Physionet benchmark. Link » Fiorella Wever 🔗 - SPP-EEGNET: An Input-Agnostic Self-supervised EEG Representation Model for Inter-Dataset Transfer Learning (Poster)  link »    Self-supervised contrastive learning learns inherent features from unlabeled data. In this work, we develop an EEG feature extractor model and train it on a contrastive learning task. Such a model can then be transferred to any new EEG dataset without the need for modifying the target dataset's original dimensionality. The proposed method shows great potential to improve the performance of downstream classification tasks. Link » Xiaomin Li · Vangelis Metsis 🔗 - Maintaining fairness across distribution shifts: do we have viable solutions for real-world applications? (Poster)  link » Fairness and robustness are often considered orthogonal dimensions along which to evaluate machine learning. Recent evidence has, however, shown that fairness guarantees are not transferable across environments. In healthcare settings, this can result in, e.g., a model that performs fairly (according to a selected metric) in hospital A showing unfairness when deployed in hospital B. Here we illustrate how fairness metrics may change under distribution shift using two real-world applications in Electronic Health Records (EHR) and Dermatology. We further show, through a causal analysis, that clinically plausible shifts simultaneously affect multiple parts of the data generation process. 
Such complex shifts invalidate most assumptions required by current mitigation techniques, which typically target either covariate or label shift. Our work hence exposes a technical gap in addressing a realistic problem, and we hope to elicit further research at the intersection of fairness and robustness in real-world applications. Link » Jessica Schrouff · Natalie Harris · Sanmi Koyejo · Ibrahim Alabdulmohsin · Eva Schnider · Diana Mincu · Christina Chen · Awa Dieng · Yuan Liu · Vivek Natarajan · Katherine Heller · Alexander D'Amour 🔗 - How Much Data Analytics is Enough?: The ROI of Machine Learning Classification and its Application to Requirements Dependency Classification (Poster)  link » Machine Learning (ML) can substantially improve the efficiency and effectiveness of organizations and is widely used for different purposes within Software Engineering. However, the selection and implementation of ML techniques rely almost exclusively on accuracy criteria. Thus, for organizations wishing to realize the benefits of ML investments, this narrow approach ignores crucial considerations around the anticipated costs of the ML activities across the ML life-cycle, while failing to account for the benefits that are likely to accrue from the proposed activity. We present findings for an approach that addresses this gap by enhancing the accuracy criterion with return on investment (ROI) considerations. Specifically, we analyze the performance of two state-of-the-art ML techniques, Random Forest and Bidirectional Encoder Representations from Transformers (BERT), based on accuracy and ROI for two publicly available data sets. In particular, we compare decision-making on requirements dependency extraction (i) exclusively based on accuracy and (ii) extended to include ROI analysis. As a result, we propose recommendations for selecting ML classification techniques based on the degree of training data used. 
Our findings indicate that considering ROI as an additional criterion can drastically influence ML selection when compared to decisions based on accuracy as the sole criterion. Link » Gouri Deshpande · Guenther Ruhe 🔗 - Combining Transfer Learning And Transformer Attention Mechanism to Increase Aqueous Solubility Prediction Performance (Poster)  link » In this study, we propose a deep learning architecture that employs both a text representation of molecules and a graph-based strategy with an attention mechanism to learn and predict aqueous solubility. The core contributions of this work are as follows. (1) We treat aqueous solubility prediction as a translation problem. Our architecture follows an encoder-decoder design. However, in order to learn a latent representation, our main encoder consists of two subencoders, i.e., a graph encoder and an encoder that employs a Transformer. We call this architecture M2M. (2) To address the limited availability of high-quality data and to increase aqueous solubility prediction performance, transfer learning is incorporated. We first pretrain the model on a pKa dataset that consists of more than 6000 chemical compounds. Then, the learned knowledge is transferred to a smaller water solubility dataset. The final architecture is called TunedM2M. (3) We demonstrate that the proposed method outperforms the state-of-the-art approaches, obtaining an RMSE of 0.587 during both cross-validation and a test on an independent dataset. To be more precise, the model is evaluated on molecules downloaded from the Online Chemical Database and Modeling Environment (OCHEM). Beyond aqueous solubility prediction, the strategy presented in this work may be useful for modeling any kind of (chemical or biological) property for which there is a limited amount of data available for model training. 
Link » Magdalena Wiercioch 🔗 - Automatic Curricula via Expert Demonstrations (Poster)  link »    We propose Automatic Curricula via Expert Demonstrations (ACED), a reinforcement learning (RL) approach that combines the ideas of imitation learning and curriculum learning in order to solve challenging robotic manipulation tasks with sparse reward functions. Curriculum learning solves complicated RL tasks by introducing a sequence of auxiliary tasks with increasing difficulty, yet how to automatically design effective and generalizable curricula remains a challenging research problem. ACED extracts curricula from a small amount of expert demonstration trajectories by dividing demonstrations into sections and initializing training episodes to states sampled from different sections of demonstrations. Through moving the reset states from the end to the beginning of demonstrations as the learning agent improves its performance, ACED not only learns challenging manipulation tasks with unseen initializations and goals, but also discovers novel solutions that are distinct from the demonstrations. In addition, ACED can be naturally combined with other imitation learning methods to utilize expert demonstrations in a more efficient manner, and we show that a combination of ACED with behavior cloning allows pick-and-place tasks to be learned with as few as 1 demonstration and block stacking tasks to be learned with 20 demonstrations. Link » Siyu Dai · Andreas Hofmann · Brian Williams 🔗 - Across the Pond and Back: Evaluation of News Deception Detection Approaches Across Natural and Synthetic Regional Dialects (Poster)  link » The need for deceptive news detection on social media platforms has grown significantly in the last several years. In 2018 about half of consumers expected the news they received on social media to be largely inaccurate and surveys from 2021 have revealed that 57% of adults would like to see steps taken to restrict the spread of false information online [1]. 
Although several approaches have been proposed, most rely on standard performance metrics and test data sets which do not effectively capture underlying biases or model dependencies; they answer the question of how the model is performing but not why or under what circumstances the model will perform in this way. Recent work has also shown that models display biases towards certain dialects, e.g., “California English” [2]. Standard test data used to evaluate model performance is not likely to be representative across variations, such as regional or dialectal differences in language, that the model will encounter when deployed in a real-world setting. Link » Robin Cosbey · Maria Glenski 🔗 - Leveraging Resource Allocation and Approximation for Faster Hyperparameter Exploration (Poster)  link »    The test accuracy of machine learning models, such as deep neural networks, critically depends on hyperparameters related to model size (e.g., number of hidden layers) and model optimizer (e.g., learning rate). There has been much research on designing parallel Hyperparameter Exploration (HPE) algorithms, which concurrently train models configured with different hyperparameters and search for the best configuration. However, existing HPE algorithms have two limitations. First, most algorithms are synchronous parallel, and the exploration procedure can be significantly slowed down when candidate models have skewed training speeds. Second, existing algorithms are agnostic to resource availability and are not resilient to severe and dynamically changing resource constraints. In this study, we propose an HPE algorithm to overcome these limitations. First, our design exploits nearly lossless approximate training techniques (i.e., mixed-precision quantization and low-rank gradient compression) to reduce resource requirements and fit within runtime resource constraints. In addition, approximation speeds up slow-training models with reduced compute and network complexity. 
Second, our algorithm dynamically re-distributes compute and network resource allocation based on the significance of candidate models for maximal resource efficiency. Experiments show that our design achieves 1.5-3.2x speedup over existing synchronous and asynchronous HPE algorithms on real HPE traces. Link » Lynn Liu 🔗 - Machine learning powered quantitative histologic assessment of disease severity in ulcerative colitis (Poster)  link »    Abstract PDF file uploaded. Link » Kathleen Sucipto · Maryam Pouryahya · Victoria Mountain 🔗 - Application of an interpretable graph neural network to predict gene expression in histopathological images (Poster)  link »    Abstract uploaded as a pdf Link » Judy Shen · Victoria Mountain · Maryam Pouryahya 🔗 - Graph Neural Networks for automated histologic scoring of NASH liver biopsy (Poster)  link » Abstract pdf uploaded Link » Maryam Pouryahya · Victoria Mountain 🔗 - Improved robustness to disfluencies in RNN-Transducer based Speech Recognition (Poster)  link » Automatic Speech Recognition (ASR) based on Recurrent Neural Network Transducers (RNN-T) is gaining interest in the speech community. We investigate data selection and preparation choices aiming for improved robustness of RNN-T ASR to speech disfluencies with a focus on partial words. For evaluation we use clean data, data with disfluencies and a separate dataset with speech affected by stuttering. We show that after including a small amount of data with disfluencies in the training set the recognition accuracy on the tests with disfluencies and stuttering improves. Increasing the amount of training data with disfluencies gives additional gains without degradation on the clean data. We also show that replacing partial words with a dedicated token helps to get even better accuracy on utterances with disfluencies and stutter. The evaluation of our best model shows 22.5% and 16.4% relative WER reduction on those two evaluation sets. 
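For readers unfamiliar with the metric quoted above, relative WER reduction compares a model's word error rate (WER) with that of a baseline. The following is a minimal sketch in plain Python: it computes WER as word-level edit distance divided by reference length, then the relative reduction. The transcripts and the "baseline" and "improved" hypotheses are invented for illustration and are not taken from the poster.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which the new WER improves on the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical utterance in which a partial word ("we") trips up the baseline.
reference = "what is the weather today"
baseline_hyp = "what is the we weather to day"
improved_hyp = "what is the weather to day"

baseline = word_error_rate(reference, baseline_hyp)
improved = word_error_rate(reference, improved_hyp)
reduction = relative_wer_reduction(baseline, improved)
print(f"baseline WER={baseline:.2f}, improved WER={improved:.2f}, "
      f"relative reduction={reduction:.1%}")
```

A relative reduction is reported against the baseline's own error rate, so the same percentage can correspond to very different absolute WER changes on different test sets.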
Link » Tina Raissi 🔗 - "We Don't Talk Anymore?": An analysis of cross-cutting political discussion on Reddit (Poster)  link » Discussion across political divides is vital to sustain a functional democracy. We live in a world of unprecedented connection: high-speed Internet on our affordable handheld devices transcends geographic borders. Yet, an open question remains as to whether this heightened state of connectivity is indeed beneficial, or drives far-reaching wedges into the sphere of politics. However, there is a lack of significant research exploring the degree to which the topic under discussion influences participants to engage in cross-cutting political conversation in online communities. I explore whether the propensity for cross-cutting political discussion is dependent on the nature of the topic. I approach this question by comparing conversation patterns on Reddit's Change My View, a community created for the specific purpose of being challenged by those with differing opinions. The polarised subreddits analysed in this study were identified through tapping into the hivemind of Reddit's 'Against Hate Subreddits', a space where users flag potentially problematic communities. Structural topic models were used to extract underlying topics from a corpus of over 369,000 posts across these 18 communities over the three-year period of 2018-2021. The first research question employs structural topic models to explore whether there is a topical difference between opinions put forward to be challenged on Change My View and those discussed on 18 polarised subreddits. The second research question explores whether there is a topical difference not just in engagement, but also in the willingness to change one's perspective. The findings indicate a relationship between the nature of the topic and the propensity for cross-cutting political discussion. 
However, there seems to be no association between whether a topic is more likely to enter cross-cutting political discussion, and the propensity for those who do enter to change their viewpoint. Link » Dulshani Withana Thanthri Gamage 🔗 - A Graph Perspective on Neural Network Dynamics (Poster)  link »    Attached is a pdf file with the full abstract. Link » Fatemeh Vahedian · Ruiyu Li · Puja Trivedi · Di Jin · Danai Koutra 🔗 - Propagation on Multi-relational Graphs for Node Regression (Poster)  link »    Recent years have witnessed a rise in real-world data captured with rich structural information that can be conveniently depicted by multi-relational graphs. While inference of continuous node features across a simple graph is rather under-studied in current relational learning research, we go one step further and focus on the node regression problem on multi-relational graphs. We take inspiration from the well-known label propagation algorithm, which aims at completing categorical features across a simple graph, and propose a novel propagation framework for completing missing continuous features at the nodes of a multi-relational and directed graph. Our multi-relational propagation algorithm is composed of iterative neighborhood aggregations which originate from a relational local generative model. Our findings show the benefit of exploiting the multi-relational structure of the data in the node regression task. Link » Eda Bayram 🔗 - The Many Hats We Wear as Machine Learning Practitioners for Marine Mammal Conservation (Poster)  link » Please see document attached Link » Louisa van Zeeland · Gracie Ermi 🔗 - Self-supervised pragmatic reasoning (Poster)  link »    Models of context-sensitive communication often use the Rational Speech Act framework (RSA; Frank & Goodman, 2012), which formulates listeners and speakers in a cooperative reasoning process. 
Large-scale applications of RSA have relied on training models to imitate human behaviors using contextually grounded datasets, but collecting such data can be costly. Here, we propose a new approach to scalable pragmatics, building upon recent theoretical results (Zaslavsky et al., 2020) that characterize pragmatic reasoning in terms of general information-theoretic principles. Specifically, we propose an architecture and learning process in which agents acquire pragmatic policies via self-supervision instead of imitating human data. This work suggests a new principled approach for equipping artificial agents with pragmatic skills via self-supervision, which is grounded both in pragmatic theory and in information theory. Link » Jennifer Hu · Roger Levy · Noga Zaslavsky 🔗 - Graph Convolutional Networks for Multi-modality Movie Scene Segmentation (Poster)  link » A typical movie scene is comprised of a number of different shots, edited together to form a narrative thread. The intricate transitions between different shots within a movie scene allow filmmakers to tell the story or convey a message in a clear and vivid manner. As a result of the complexity in the interactions between individuals and their actions within a movie scene, a major challenge in movie semantic understanding is that of scene segmentation, where the goal is to identify the individual scenes within a movie. A key part of the challenge with scene segmentation is the fact that a movie scene may be comprised of multiple uncut shots filmed over an uninterrupted period of time, leading to a visually discontinuous yet semantically coherent segment. Therefore, while separating a movie into individual shots can be accomplished based on visual continuity between frames, separating a movie into individual scenes requires a much deeper understanding of the semantics of a film and the relationship between shots that are semantically consistent but physically distinct. 
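The contrast drawn above between shot boundaries and scene boundaries can be made concrete. The following is a minimal sketch of shot-boundary detection from visual continuity alone, assuming each frame is summarised by a normalised colour histogram. The toy histograms, the threshold of 0.5 and the function names are illustrative assumptions, not the authors' method; grouping the resulting shots into scenes is the harder semantic step the poster addresses and is not shown.

```python
from typing import List, Sequence


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Half the L1 distance between two normalised histograms (range 0..1)."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def detect_shot_boundaries(frames: List[Sequence[float]],
                           threshold: float = 0.5) -> List[int]:
    """Return the indices i at which frame i starts a new shot.

    A boundary is declared when consecutive frames differ by more than
    `threshold`, i.e. when visual continuity is broken.
    """
    return [i for i in range(1, len(frames))
            if histogram_distance(frames[i - 1], frames[i]) > threshold]


# Toy 3-bin histograms (hypothetical): two dark frames, two bright frames,
# then a cut back to the dark setting.
frames = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]
boundaries = detect_shot_boundaries(frames)
print(boundaries)  # [2, 4]
```

In this toy input the first and last shots look alike but are separated by a visually different shot, so a purely visual detector cannot tell whether all three belong to one scene.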
Link » Yaoxin Li · Alexander Wong · Mohammad Javad Shafiee 🔗 - AI-Driven Predictive Analytics to Inform Nuclear Proliferation Detection in Urban Environments (Poster)  link » Unattended radiological sensor networks must take advantage of contextual data (e.g., open-source data) in addition to historical sensor signals to anticipate nuclear isotope signatures and mitigate nuisance alarms in urban environments. To address these challenges, we have developed novel AI-driven predictive analytics, using machine learning and deep learning models, to predict radiological isotope signatures by learning from historical sensor data covering 9 months in DC and 7 months in Fairfax in 2019 and 2020. Our sensor data includes alerts for three medical signatures (Tc-99m, I-131, and 511 from Positron Emission Tomography (PET)) and one industrial isotope (Cs-137). Our AI-driven analytics leverage historical data [1] to anticipate the number of alarms per isotope per sensor in the next hour across nine sensors in two locations: Fairfax, VA and Washington, DC. We design experiments to contrast the performance of state-of-the-art ML models (Logistic Regression, Random Forest, ARIMA, SVM, and K-Nearest Neighbors) with deep learning models that rely on Long Short-Term Memory (LSTM) [2] and Transformer [3] architectures. In Table 1, we present experimental results of the LSTM model compared to the two top-performing ML models, Logistic Regression (LR) and Random Forest (RF), over two locations: Fairfax, VA (4 sensors over 7 months) and Washington, DC (5 sensors over 9 months). The LSTM model outperformed the top ML models for the Fairfax, VA location and met or exceeded the performance of the top ML models for the Washington, DC location across several metrics. 
Link » Anastasiya Usenko · Ellyn Ayton · Svitlana Volkova 🔗 - Do You See What I See: Using Augmented Reality and Artificial Intelligence (Poster)  link » We explore the challenges of real-world applications of augmented reality (AR) and artificial intelligence (AI) through experiments that demonstrate interactions with an augmented world. We interrogate applications of AR and AI, their limitations, and social impacts amplified by the pandemic. We share code, explore ethical considerations, and outline future projects. We conduct three experiments to apply the theory of pose estimation with deep learning [1][2][3][4]. In our first experiment, we implement AR using segmentation to augment the scene captured by a laptop webcam. In the second experiment, we perform keypoint estimation using a deep neural network for pose estimation. In the last experiment, we implement pose estimation against different backgrounds. These experiments enable us to reflect on how context matters [5]. We ask who is at the table when applications are built and deployed, and invite you too to reflect on the challenges associated with using AR and AI and our responsibilities as builders of tech. References 1. Carlo Tomasi and Takeo Kanade. Detection and Tracking of Point Features. Carnegie Mellon University Technical Report CMU-CS-91-132, 1991. 2. Coelho, T., Calado, P., Souza, L., Ribeiro-Neto, B., and Muntz, R. (2004). “Image Retrieval Using Multiple Evidence Ranking”. IEEE Transactions on Knowledge and Data Engineering, 16(4):408–417. 3. Ni, Jianjun & Khan, Zubair & Wang, Shihao & Wang, Kang & Haider, Syed. (2016). Automatic detection and counting of circular shaped overlapped objects using circular hough transform and contour detection. 2902-2906. 4. Xiao, Bin, Haiping Wu, and Yichen Wei. “Simple baselines for human pose estimation and tracking.” Proceedings of the European Conference on Computer Vision (ECCV). 2018. 5. A view to a brawl. (2013). Science Node. 
https://sciencenode.org/feature/What%20does%20violence%20look%20like.php Link » Shruti Karulkar · Louvere Walker-Hannon · Sarah Mohamed 🔗 - Measuring the Cause and Effect in Scientific Productivity: A Case Study of the ACL Community (Poster)  link » Causality aims to connect the dots between cause and effect beyond a simple correlation. We rely on causal analysis as a tool to describe research influences and trends in the computational linguistics (CL) community. Specifically, we aim to draw connections about research productivity based on a scientist's research portfolio in the area of CL. Studying these research trends is a valuable way to gain insights on a particular discipline and explain research dynamics within and across fields. Link » Jasmine Eshun · Maria Glenski · Svitlana Volkova 🔗 - Data Efficient Domain Adaptation using FiLM (Poster)  link » Generative Adversarial Networks (GANs) have been a cornerstone in the machine learning and computer vision community due to their ability to generate high-quality realistic images by learning the underlying complex distributions in the training data. On the other hand, adaptation of generative models by retraining on the target dataset is currently computationally expensive. Therefore, there exists a need for data-efficient transfer of generative models. We hypothesize that approaches which seek to align the source and target distributions generally tend to overfit the target data. Alternatively, we propose linear modulation of features in a non-linear network (FiLM), which allows fine-grained control over the learned parameters during retraining. Originally introduced for visual reasoning, FiLM learns scaling and bias parameters, gamma and beta respectively, that help close the divergence between the source and target distributions. Using a DCGAN, we obtain results on the MNIST digits dataset while adapting it to the Rotated MNIST dataset. 
As future work, we hope to explore the performance of FiLM in our proposed GAN structure in conjunction with further fine-tuning and manipulation of batchnorm statistics. Link » Sinjini Mitra · Rushil Anirudh · Jayaraman Thiagarajan · Pavan Turaga 🔗 - Exploiting Hyperdimensional Computing and Probabilistic Inference for Reasoning Across Levels of Abstraction in Dynamic Biosignal-Based Applications (Poster)  link » Hyperdimensional computing (HDC) has recently emerged as a well-suited approach for efficient biosignal processing and classification. Inspired by the understanding that the brain's computations rely on massive circuits of neurons and synapses, HDC operates on pseudo-random hypervectors (HVs) and a set of well-defined arithmetic operations. The traits of HDC have been most notably leveraged by electromyogram (EMG)-based hand gesture classification, often used for prosthetic interfaces. Previous works have shown that HDC can classify hand-gestures with over 90% accuracy by projecting the EMG signals onto >1000-dimensional bipolar HVs representing its spatiotemporal properties, and performing nearest-neighbor search on prototype class HVs learned from data. However, to augment the functionality of HDC beyond static input-output mappings, higher level-of-abstraction representation and reasoning capabilities are needed. We propose a hybrid scheme that can hierarchically represent the different levels of abstraction of the application: at the lowest level it relies on HDC and other machine learning models to encode and classify spatiotemporal features from biosensors; at the high level it relies on a Dynamic Bayesian Network (DBN) to probabilistically encode the temporal relations between user intent and the information provided by the layer below. 
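The HDC classification step summarized in the abstract above (bipolar hypervectors, prototype class hypervectors, nearest-neighbor search) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the author's pipeline: the dimensionality of 2000, the bind-then-bundle encoding of quantized channel readings, the toy "fist"/"open" data, and all function names are hypothetical.

```python
import random

DIM = 2000  # assumed dimensionality; the abstract only states ">1000"


def random_hv(rng, dim=DIM):
    """Draw a pseudo-random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Bind two hypervectors by elementwise multiplication."""
    return [x * y for x, y in zip(a, b)]


def bundle(hvs):
    """Superpose hypervectors by elementwise majority vote (ties go to +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*hvs)]


def similarity(a, b):
    """Normalized dot product, in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def encode(sample, channel_hvs, level_hvs):
    """Bind each channel to its quantized level, then bundle across channels."""
    return bundle([bind(channel_hvs[c], level_hvs[level])
                   for c, level in enumerate(sample)])


def train(examples, channel_hvs, level_hvs):
    """Build one prototype hypervector per class from its encoded examples."""
    encoded = {}
    for sample, label in examples:
        encoded.setdefault(label, []).append(
            encode(sample, channel_hvs, level_hvs))
    return {label: bundle(hvs) for label, hvs in encoded.items()}


def classify(sample, prototypes, channel_hvs, level_hvs):
    """Nearest-neighbor search over the class prototypes."""
    query = encode(sample, channel_hvs, level_hvs)
    return max(prototypes, key=lambda label: similarity(query, prototypes[label]))


def demo():
    rng = random.Random(0)
    channel_hvs = [random_hv(rng) for _ in range(5)]  # 5 hypothetical EMG channels
    level_hvs = [random_hv(rng) for _ in range(4)]    # 4 quantization levels
    examples = [
        ([3, 3, 0, 0, 1], "fist"), ([3, 2, 0, 0, 1], "fist"), ([3, 3, 1, 0, 1], "fist"),
        ([0, 0, 3, 3, 2], "open"), ([0, 1, 3, 3, 2], "open"), ([0, 0, 2, 3, 2], "open"),
    ]
    prototypes = train(examples, channel_hvs, level_hvs)
    # An unseen reading that resembles the "fist" examples.
    return classify([3, 3, 0, 1, 1], prototypes, channel_hvs, level_hvs)
```

Calling `demo()` classifies an unseen reading against the two prototypes. The sketch covers only the static input-output mapping; the Dynamic Bayesian Network layer proposed in the abstract is not modeled.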
Link » Laura Isabel Galindez Olascoaga 🔗 - Strategic Clustering (Poster)  link »    We study the problem of clustering in the context of social networks, where people can have preferences and incentives over the groups that they may be a part of. As clustering is increasingly used for grouping large data that pertains to people, we ask how much does the quality of the clustering --- typically measured by the conductance, or by the number of edges cut, or the average distance to the centers --- deteriorate if the nodes are strategic and can change clusters? And among reasonable utilities for the nodes, which one hurts quality the least? We investigate these questions both theoretically, by studying the equilibria of hedonic games (simplified clustering games with unconstrained number of clusters), and experimentally, by measuring the quality of pure Nash equilibria of more realistic clustering games. We introduce a new utility function for the nodes which we call closeness, and which we believe is an attractive alternative to previously studied node utilities. We study the properties of the closeness utility theoretically and demonstrate experimentally its advantages over other established utilities such as the modified fractional utility. Finally, we present a polynomial-time algorithm which, given a clustering with optimal quality, finds another clustering with better average utility, and in fact the one that maximizes the ratio of the gain in average utility over the loss in quality. Link » Ana Stoica · Christos Papadimitriou 🔗 - Deep Generative Models for Task-Based fMRI Analysis (Poster)  link »    While functional magnetic resonance imaging (fMRI) remains one of the most widespread and important methods in basic and clinical neuroscience, the data it produces---time series of brain volumes---continue to pose daunting analysis challenges. 
The current standard ("mass univariate") approach involves constructing a matrix of task regressors, fitting a separate general linear model at each volume pixel ("voxel"), computing test statistics for each model, and correcting for false positives post hoc using bootstrap or other resampling methods. Despite its simplicity, this approach has enjoyed great success over the last two decades due to: 1) its ability to produce effect maps highlighting brain regions whose activity significantly correlates with a given variable of interest; and 2) its modeling of experimental effects as separable and thus easily interpretable. However, this approach suffers from several well-known drawbacks, namely: inaccurate assumptions of linearity and noise Gaussianity; a limited ability to capture individual effects and variability; and difficulties in performing proper statistical testing secondary to independently fitting voxels. In this work, we adopt a different approach, modeling entire volumes directly in a manner that increases model flexibility while preserving interpretability. Specifically, we use a generalized additive model (GAM) in which the effects of each regressor remain separable, the product of a spatial map produced by a variational autoencoder and a (potentially nonlinear) gain modeled by a covariate-specific Gaussian Process. The result is a model that yields group-level effect maps comparable or superior to the ones obtained with standard fMRI analysis software while also producing single-subject effect maps capturing individual differences. This suggests that generative models with a decomposable structure might offer a more flexible alternative for the analysis of task-based fMRI data. 
Link » Daniela de Albuquerque · Jack Goffinet · Rachael Wright · John Pearson 🔗 - How we browse: Measurement and analysis of digital behavior (Poster)  link » Accurately analyzing and modeling online browsing behavior plays a key role in understanding interactions between users and technology. In this work, we design and conduct a user study to collect browsing data from 31 participants continuously for 14 days, along with self-reported browsing patterns. We combine self-reports and observational data to provide an up-to-date measurement study of online browsing behavior. We use these data to empirically address the following questions: (1) Do structural patterns of browsing differ across demographic groups and types of web use? (2) Do people have correct perceptions of their behavior online? (3) Does the length of time that people are observed relate to changes in their browsing behavior? In response to these questions, we find significant differences in level of activity based on user age, but not based on race or gender. We also find that users have significantly different behavior on Security Concerns websites, which may enable new behavioral methods for automatic detection of security concerns online. We find that users significantly overestimate the time they spend online, but have relatively accurate perceptions of how they spend their time online. We find no significant changes in behavior over the course of the study, which may indicate that observation had no effect on behavior or that users were consciously aware of being observed throughout the study. Link » Yuliia Lut · Rachel Cummings 🔗 - Augment Your Deterministic Model with Monte Carlo Dropout to Combat Noisy Labels (Poster)  link » It is inevitable that AI systems today are trained on inaccurate, missing, or wrong label information for classification, detection, or segmentation tasks. 
However, without accurate labels as ground truth, AI algorithms, especially those powered by deep neural networks, tend to perform badly. This is the infamous phenomenon called the memorization effect of deep learning, where deep nets tend to learn from clean labels first, gradually adapt to noisy labels, and eventually overfit to completely random noise. Such a property of deep learning can cause poor generalization on test sets. Here we present a thorough study of augmenting deterministic models with Monte Carlo Dropout when training with both synthetic and real-world label noise settings. We investigate the classification efficacy, network sparsity and neuron responsiveness on label noise simulated via a class-conditional transition matrix and examine the method’s effectiveness on a real-world dataset containing human annotation noise. Link » Li Chen · Ilknur Kaynar Kabul 🔗 - Occlusion-Aware Crowd Navigation Using People as Sensors (Poster)  link » Navigating crowded, partially occluded environments is an open challenge for mobile robots. In dense crowds, spatial occlusions are inevitable due to limited sensor field of view (FOV) and obstructing obstacles. Prior work shows the efficacy of using observed interactions between human agents to make inferences about potential obstacles in occluded spaces. The observed humans act as an additional sensor to complement traditional sensors that form the incomplete map. Extending this idea, we propose a deep reinforcement learning (RL) planner that incorporates occlusion-aware features to encourage proactive avoidance of occluded obstacles and agents. We empirically demonstrate that our RL framework successfully avoids collisions with occluded agents by extracting informative features from observed interactions. To the best of our knowledge, this is the first study to exploit social inference in crowds for collision avoidance of occluded dynamic agents. 
Link » Ye-Ji Mun · Masha Itkina · Katherine Driggs-Campbell 🔗 - Active Noise Cancellation for Spatial Computing (Poster)  link »    We propose a noisy-label resilient model-agnostic training framework named Active Noise Cancellation (ANC) for semantic segmentation in spatial computing. In the presence of noisy labels, which arise due to measurement errors, crowdsourcing, insufficient expertise and so on, deep learning tends to have poor generalization on test data. In spatial computing, noisy labels largely come from crowdsourcing and measurement error. The ANC framework is a training paradigm to improve label quality during batch update, detect the unreliable pixel labels and filter them during training. We demonstrate the effectiveness of our proposed framework on two satellite image datasets for building footprint detection. As a result, our method produces better intersection over union (IoU), precision, recall and F1 score when training with noisy masks. Link » Li Chen · David Yang · Xiang Gao · Ilknur Kaynar Kabul 🔗 - Physics-assisted Machine Learning (Poster)  link » Pdf attached Link » Abhilasha Katariya · Jin Ye 🔗 - An Interpretable Approach to Hateful Meme Detection (Poster)  link »    Hateful memes are an emerging method of spreading hate on the internet, relying on both images and text to convey a hateful message. We take an interpretable approach to hateful meme detection, using machine learning and simple heuristics to identify the features most important to classifying a meme as hateful. In the process, we build a gradient-boosted decision tree and an LSTM-based model that achieve comparable performance (73.8 validation and 72.7 test auROC) to the gold standard of humans and state-of-the-art transformer models on this challenging task. 
Link » Tanvi Deshpande 🔗 - Self-Supervision for Scene Graph Embeddings (Poster)  link » Scene graph embeddings are used in applications such as image retrieval, image generation and image captioning. Many of the models for these tasks are trained on large datasets such as Visual Genome, but the collection of these human-annotated datasets is costly and onerous. We seek to improve scene graph embedding representation learning by leveraging the already available data (e.g. the scene graphs themselves) with the addition of self-supervision. In self-supervised learning, models are trained for pretext tasks which do not depend on manual labels and use the existing available data. However, it is largely unexplored in the area of image scene graphs. In this work, starting from a baseline scene graph embedding model trained on the pretext task of layout prediction, we propose several additional self-supervised pretext tasks. The impact of these additions is evaluated on a downstream retrieval task that was originally associated with the baseline model. Experimentally, we demonstrate that the addition of each task individually and cumulatively improves on the retrieval performance of the baseline model, resulting in near saturation when all are combined. Link » Brigit Schroeder · Adam Smith · Subarna Tripathi 🔗 - Solving the super rural and super dense delivery with asset-light programs (Poster)  link » Speedy fulfillment and distribution isn't just a "nice to have": it's the expectation of every online shopping experience. Companies like Amazon strive to optimize their expansion strategies to satisfy customers with the fastest delivery at a reasonable cost. In addition to being a key to customer satisfaction, last mile delivery is both the most expensive and most challenging part of the shipping process. As a share of the total cost of shipping, last mile delivery costs are substantial, comprising about 53% of the overall fulfillment costs. 
As such, it's become the first place we're looking to implement new technologies and drive process improvements, especially for the most challenging geographies. The low density and wide geographical spread of rural customers make it financially infeasible to invest in centralized facilities. The high real estate cost within urban areas and a capped labor and resource pool also constrain service scaling. The last mile network will need to evolve to include lower cost nodes that enable cost-effective delivery in these extreme geographies. We develop innovative approaches that leverage methods based on risk pooling as well as resource sharing to use resources effectively. Link » Jin Ye · Abhilasha Katariya 🔗 - Application of a Bayesian CAR Prior to Analyzing Ancient Statistical Records of the Inca Empire (Poster)  link » We use a Bayesian model to analyze ancient khipu, the knotted cords of yarn used by the Incas to record state statistics and preserve control over one of the largest empires in the Western Hemisphere. In order to explore the degree of Inca influence on information storage and dissemination in the peripheral regions and any significant provincial differences, we developed a Bayesian statistical model that enables us to quantify the uncertainties among the unknown observations recorded in the Khipu Research Database (Urton and Brezine, 2005). We use the Bayesian conditional autoregressive (CAR) prior to incorporate spatial correlations among adjacent locations, allowing us to impute the locations for khipus with unknown locations. The results bolster our hypothesis of differences between the samples from the coastal regions associated with diverse cultures subordinate to the Incas and produce a consistent pattern along the coast. 
By utilizing such variables as types of knot, cord directionality, and colors in our multivariate model, we draw further implications of potential regional markers and distribution of power and control across 15th- and 16th-century Latin America. Link » Anastasiya Travina 🔗 - Topological characterizations of neuronal fibers and its implications in comparing brain connectomes (Poster)  link » Our brain consists of approximately 100 billion neurons that form functional networks across different brain regions. Brain functions involve complex interactions between these regions that are poorly understood and lack quantitative characterizations. Such complex data analysis requires the integration of multiple imaging modalities and methods. In this work, we apply topological data analysis to diffusion magnetic resonance imaging (dMRI) and show that it has great potential in comparing brains. This is motivated by our previous research work that combines features from neuronal fibers along with the brain lesion segmentation mask, thereby improving the performance of a patch-based neural network. The novel use of tractographic features appears to be promising in the overall survival prediction of brain tumor patients. However, previous studies ignore the geometry and topology of the white matter fibers. To address this, we propose a novel and efficient algorithm to model high-level topological structures of neuronal fibers using the construct of the Reeb graph. Tractography generates complex neuronal fibers in 3D that exhibit the geometry of white matter pathways in the brain. However, most tractography analysis methods are time consuming and intractable. We develop a computational geometry-based tractography representation that simplifies the connectivity of white matter fibers into a graph-based mathematical model. We present an application of the Reeb graph model integrated with a machine learning model to classification tasks from Alzheimer's studies. 
Experimental results are reported on the ADNI (http://adni.loni.usc.edu/) dataset. Link » Shailja Shailja · B.S. Manjunath 🔗 - Bayesian Network Structure Learning with Structured Representations (Poster)  link » A Bayesian network (BN) is a probabilistic graphical model that consists of a directed acyclic graph (DAG), where each node is a random variable and attached to each node is a conditional probability distribution (CPD), and is used for knowledge representation and modelling causal relationships between variables. A BN can be learned from data using the well-known score-and-search approach, and within this approach a key consideration is how to simultaneously learn the global structure in the form of the underlying DAG and the local structure in the CPDs. Several useful forms of local structure have been identified in the literature, but thus far the score-and-search approach has only been extended to handle local structure in the form of context-specific independence and linear models. In this work, we show how to extend the score-and-search approach to Multivariate Adaptive Regression Splines (MARS), which are regression models represented as piecewise spline functions. MARS models capture non-linear relationships without imposing a non-linear representation over the global structure. We provide an effective algorithm to score a candidate MARS using the widely used BIC score and we provide pruning rules that allow the search to successfully scale to large networks without overfitting. Our empirical results provide evidence for the success of our approach to learning Bayesian networks that incorporate MARS relations. Link » Charu Sharma · Peter van Beek 🔗 - Towards Automated Evaluation of Explanations in Graph Neural Networks (Poster)  link » Explaining Graph Neural Networks predictions to end users of AI applications in easily understandable terms remains an unsolved problem. 
In particular, we do not have well developed methods for automatically evaluating explanations, in ways that are closer to how users consume those explanations. Based on recent application trends and our own experiences in real world problems, we propose automatic evaluation approaches for GNN Explanations. Link » Vanya BK · Balaji Ganesan · Devbrat Sharma · Arvind Agarwal 🔗 - Model-Free Learning for Continuous Timing as an Action (Poster)  link » In systems where RL algorithms can be readily integrated (e.g. robotics, gaming, and finance), there is often little inherent cost to making numerous observations or actions. However, there are also several real-world settings in which constant observational or interventional access to the system cannot be taken for granted. In this work, we propose a new setting in reinforcement learning: the timing-as-an-action setting. Here, agents choose not only the action that they normally would, but also a duration associated with that action. By augmenting existing policy gradient algorithms, we demonstrate how to jointly learn actions and their durations. Specifically, we create an additional policy network for the duration that takes both the action and the state observation as input. We consider several parameterizations of these durations, from discrete categorical distributions of varying granularity to different types of continuous distributions. Experiments are conducted on OpenAI simulators modified for the timing-as-an-action setting. Overall, we find that certain continuous parameterizations have significant advantages over discrete parameterization of durations, while others get stuck in local minima. More broadly, we note that the marginal benefit of learning durations likely depends on the nature of the environment and its sensitivity to small changes in timing. 
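The duration policy described in the timing-as-an-action abstract above (a second policy that takes both the state observation and the chosen action as input) can be sketched roughly as follows. This is a hedged illustration, not the authors' implementation: the linear-softmax policies, the discrete set of candidate durations, the class name `TimedPolicy`, and the random initialization are all assumptions, and the continuous parameterizations mentioned in the abstract are not shown.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, inputs):
    """Apply a weight matrix (list of rows) to an input vector."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


class TimedPolicy:
    """Hypothetical two-headed policy: choose an action, then its duration."""

    def __init__(self, state_dim, n_actions, durations, rng):
        self.n_actions = n_actions
        self.durations = list(durations)  # discrete categorical parameterization
        self.rng = rng
        self.action_w = [[rng.gauss(0.0, 0.1) for _ in range(state_dim)]
                         for _ in range(n_actions)]
        # The duration head sees the state concatenated with a one-hot action.
        self.duration_w = [[rng.gauss(0.0, 0.1) for _ in range(state_dim + n_actions)]
                           for _ in self.durations]

    def act(self, state):
        action_probs = softmax(linear(self.action_w, state))
        action = self.rng.choices(range(self.n_actions), weights=action_probs)[0]

        one_hot = [1.0 if i == action else 0.0 for i in range(self.n_actions)]
        duration_probs = softmax(linear(self.duration_w, list(state) + one_hot))
        idx = self.rng.choices(range(len(self.durations)), weights=duration_probs)[0]

        # Joint log-probability log pi(a | s) + log pi(d | s, a); a policy
        # gradient method would differentiate this quantity.
        log_prob = math.log(action_probs[action]) + math.log(duration_probs[idx])
        return action, self.durations[idx], log_prob
```

A caller would hold the returned action for the returned duration before observing the environment again, and would use the joint log-probability in whichever policy gradient update it augments.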
Link » Helen Zhou · David Childers · Zachary Lipton 🔗 - Accurate Multi-Endpoint Molecular Toxicity Predictions in Humans with Contrastive Explanations (Poster)  link » Explainable machine learning (XML) for molecular toxicity prediction appears promising for efficient drug development and drug safety. A predictive ML model of toxicity can reduce experimental cost and time, while also mitigating ethical concerns by significantly reducing animal and human testing. In this work, we use a deep learning framework for modeling in vitro, in vivo and human toxicity data simultaneously. Two different input representations of molecules are considered: molecular fingerprints and pretrained SMILES embeddings. A multi-task deep learning model accurately predicts toxicity for all endpoints, including in humans, as indicated by the area under the Receiver Operating Characteristic curve and balanced accuracy. To provide confidence and explain the model's predictions, we adapt a post-hoc contrastive explanation method that returns pertinent positive and negative features. The resulting pertinent features correspond well to known mutagenic and reactive toxicophores, such as unsubstituted bonded heteroatoms, aromatic amines, and Michael acceptors. Toxicophore recovery by pertinent feature analysis captures more in vitro and in vivo endpoints, and indeed uncovers a bias in known toxicophore data towards in vitro and in vivo experimental data, thus demonstrating the efficacy of our proposed approach. Link » Bhanushee Sharma · Vijil Chenthamarakshan · Amit Dhurandhar · James Hendler · Jonathan S. Dordick · Payel Das 🔗 - Graph Representation Learning on Trajectory-Encoded Volumetric Heatmaps for Human Motion Generation (Poster)  link » The increased popularity of Deep Learning and the growing availability of large volumes of human motion capture data have both contributed to a rise in data-driven statistical approaches to human motion modelling. 
Among these methods, those that aim to exploit these data collections to generate new plausible and realistic motions can be found at the intersection of graphics and computer vision, with tasks such as motion synthesis and motion prediction being researched and applied in animation, games and robotics, among others. Most methods proposed so far take a deterministic approach to modelling human motion, and, albeit successful, they are limited to outputting a single predicted motion sequence and do not consider the highly stochastic nature of human motion. To overcome this lack of motion diversity, probabilistic methods have lately received a lot of attention as they have the potential to account for the multi-modality of human motion. For these reasons, and based on the successful use of heatmaps to represent 3D human poses and motion trajectories in pose estimation and action recognition, in this research we aim to study the validity and effectiveness of such representation for the tasks of motion prediction and synthesis, and thus propose a novel framework whose goal is to learn both spatial and temporal dependencies of human motion and generate new sequences that are valid, diverse and realistic. Link » Michelle Wu · zhidong xiao · Hammadi Nait-charif 🔗 - A Vision-Based Gait Analysis Framework for Predicting Multiple Sclerosis (Poster)  link » Multiple Sclerosis (MS) is one of the most common neurological conditions worldwide whose prevalence is now greatest among people 50-60 years of age. While clinical presentations of MS are highly heterogeneous, mobility limitations are one of the most frequent symptoms. In contrast to the monitoring of most underlying manifestations of MS, which require neurological examinations by a trained practitioner, gait can be quickly and remotely monitored. 
In this work, we propose VGA4MS (vision-based gait analysis framework for MS prediction), a deep learning (DL) based methodology to distinguish gait strides of individuals with MS from those of healthy controls, so as to generalize across different walking tasks and subjects, on the basis of characteristic 3D joint key points extracted from multi-view digital camera videos. This is the first attempt to demonstrate the potential of vision-based DL for MS research. Given that digital cameras are the only required equipment, this can be employed in the home environment of elderly individuals for regular gait monitoring, and thus is crucial for early intervention and hence more efficient MS treatment. Link » Rachneet Kaur 🔗 - Opening the Black Box: High-dimensional Safe Policy Search via Sim-to-real (Poster)  link » Bayesian Optimization (BO) is a class of sample-efficient methods for optimizing expensive-to-evaluate black-box functions with a wide range of applications, e.g. in robotics, system design and parameter optimization. However, BO is known to be difficult to scale to high dimensions (d > 20). In order to scale the method and take advantage of its benefits, we propose the SafeOpt-HD pipeline, which identifies relevant domain regions for a given objective and restricts the BO search to this preprocessed domain. By employing cheap (and potentially inaccurate) simulation models, we perform offline computations using genetic search algorithms to only consider domain subspaces that are likely to contain optimal policies for a given task, thus significantly reducing domain size. Our approach can be combined with any known safe BO method, such as SafeOpt, to obtain a safe Bayesian optimization algorithm that is applicable to problems with large input dimensions. 
To alleviate the issues due to sparsity in the non-uniform preprocessed domain, we propose an approach to systematically generate new input parameters with desirable properties. We evaluate the effectiveness of our proposed approach by optimizing a 48-dimensional policy to perform full position control of a quadrotor, while guaranteeing safety. Link » Aneri Muni 🔗 - Depth without the Magic: Inductive Biases of Natural Gradient Descent (Poster)  link » In gradient descent, changing how we parametrize the model can lead to very different optimization trajectories and even to qualitatively different optima. Exploiting only the form of over-parametrization, gradient descent alone can produce a surprising range of meaningful behaviours: identify sparse classifiers or reconstruct low-rank matrices without the need for explicit regularisation. This implicit regularisation has been hypothesised to be a contributing factor to good generalisation in deep learning. However, natural gradient descent with an infinitesimally small learning rate is invariant to parameterization: it always follows the same trajectory and finds the same optimum. The question naturally arises: what happens if we eliminate the role of parameterization? Which solution will be found, and what new properties emerge? We characterise the behaviour of natural gradient flow in linearly separable classification under logistic loss and discover new invariance properties. Some of our findings extend to nonlinear neural networks with sufficient but finite over-parametrization. In addition, we demonstrate experimentally that there exist learning problems where natural gradient descent cannot reach good test performance, while gradient descent with the right architecture can. 
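The invariance claim in the natural gradient abstract above can be checked on a toy problem. The sketch below is an assumption-laden illustration rather than the authors' experiment: it fits a single Bernoulli probability under cross-entropy loss, once parameterized directly by p and once by the logit theta, and compares plain and Fisher-preconditioned (natural) gradient steps. The function names, starting point, target and step size are all hypothetical choices.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def dloss_dp(p, target):
    """Derivative w.r.t. p of the cross-entropy between Bernoulli(target) and Bernoulli(p)."""
    return -(target / p) + (1.0 - target) / (1.0 - p)


def run(steps, lr, natural, use_logit, p0=0.2, target=0.7):
    """Return the fitted probability after `steps` updates of size `lr`."""
    p = p0
    theta = math.log(p0 / (1.0 - p0))  # logit of the starting probability
    for _ in range(steps):
        if use_logit:
            p = sigmoid(theta)
            grad = dloss_dp(p, target) * p * (1.0 - p)  # chain rule: dp/dtheta = p(1-p)
            fisher = p * (1.0 - p)                      # Fisher information in theta
            theta -= lr * (grad / fisher if natural else grad)
        else:
            grad = dloss_dp(p, target)
            fisher = 1.0 / (p * (1.0 - p))              # Fisher information in p
            p -= lr * (grad / fisher if natural else grad)
    return sigmoid(theta) if use_logit else p


def gaps(steps=1000, lr=1e-3):
    """Gap between the two parameterizations, for natural and plain gradient descent."""
    natural_gap = abs(run(steps, lr, True, True) - run(steps, lr, True, False))
    plain_gap = abs(run(steps, lr, False, True) - run(steps, lr, False, False))
    return natural_gap, plain_gap
```

With a small step size, the two natural gradient runs should end at nearly the same probability, differing only by discretization error, while the two plain gradient runs end far apart. This mirrors the small-learning-rate invariance the abstract refers to, in one dimension only.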
Link » Anna Mészáros · Anna Kerekes · Ferenc Huszar 🔗 - Accelerating Symmetric Rank 1 Quasi-Newton Method with Nesterov's Gradient (Poster)  link » Second order methods have been shown to have better convergence than first order methods in several highly non-linear problems. However, the computational cost incurred has been a major drawback, and thus quasi-Newton methods have been popularly used. Among the quasi-Newton methods, the BFGS method is widely used in training neural networks. Recently, NAQ was proposed to accelerate the BFGS method using Nesterov's accelerated gradient and momentum terms. In this study, we explore whether Nesterov's acceleration can be applied to other quasi-Newton methods as well. Thus, this paper proposes Nesterov-accelerated LSR1 (L-SR1-N) and momentum-accelerated LSR1 (L-MoSR1) methods for training neural networks. Link » Indrapriyadarsini Sendilkkumaar · Hiroshi Ninomiya · Takeshi Kamio · Hideki Asai 🔗 - TaxonBags: Clustering and Vote for Precise Metagenomic Taxonomic Classification (Poster)  link » TaxonBags: Clustering and Vote for Precise Metagenomic Taxonomic Classification Link » Induja Chandrakumar 🔗 - Transformer-based Self-Supervised Learning for Medical Images (Poster)  link » Medical tasks often lack large amounts of labeled data, so a self-supervised learning approach can be very helpful in retrieving useful information without supervision. Current best-performing self-supervised methods use vision transformers, which let them build meaningful global-scale connections between embeddings and activation maps for different classes. Inspired by the DINO approach, we tested its performance on two medical problems: pneumothorax detection and tissue semantic segmentation. The method uses self-distillation with no labels, where two models with identical architectures are trained alongside each other but have different parameters. 
Mariia Dobko · Mariia Kokshaikyna

- Fixed Neural Network Steganography: Train the images, not the network (Poster)
Recent attempts at image steganography make use of advances in deep learning to train an encoder-decoder network pair to hide and retrieve secret messages in images. These methods are able to hide large amounts of data, but also incur high decoding error rates (around 20%). We propose a novel algorithm for steganography that takes advantage of the fact that neural networks are sensitive to tiny perturbations. Our method, Fixed Neural Network Steganography (FNNS), reliably achieves 0% error when hiding up to 3 bits per pixel (bpp) of secret information in images, and yields significantly lower error rates than prior state-of-the-art methods when hiding more than 3 bpp. FNNS also successfully evades existing statistical steganalysis systems and can be modified to evade neural steganalysis systems as well. Recovering every bit correctly, up to 3 bpp, enables novel applications, e.g. those requiring encryption. We introduce one specific use case for facilitating anonymized and safe image sharing.
Varsha Kishore · Xiangyu Chen · Yan Wang · Boyi Li · Kilian Weinberger

- Apple - Machine Learning at Apple (Lizi Ottens) (Sponsor Talk)
Lizi Ottens

- Capital One - AI & ML at Capital One (Cat Posey) (Sponsor Talk)
Jeffrey Cooke

- DeepMind - Machine Learning at DeepMind (Mihaela Rosca, Feryal Behbahani, Kate Parkyn) (Sponsor Talk)
James Robson

- Waymo - Machine Learning for Autonomous Driving at Waymo (Chen Wu) (Sponsor Talk)
Chen Wu

- SambaNova Systems - ML Accelerators & Performance (Qinghua Li) (Sponsor Talk)
Qinghua Li

- Microsoft - Advancing real-world few-shot learning with the new ORBIT dataset (Daniela Massiceti) (Sponsor Talk)
Daniela Massiceti

- D. E. Shaw Research - Machine Learning Initiatives at D. E. Shaw Research (Jocelyn Sunseri) (Sponsor Talk)
Jane Chen

#### Author Information

##### Salomey Osei (University of Deusto)

Salomey is a research assistant at DeustoTech, University of Deusto. She is also a researcher at Masakhane and the research lead of unsupervised methods for Ghana NLP. She has been involved as a co-organiser with a number of organizations, such as Black in AI, Women in Machine Learning (WiML), and Women in Machine Learning and Data Science (WiMLDS). She is also passionate about mentoring students, especially women in STEM, and her long-term goal is to share her knowledge with others by lecturing.

##### Geeticka Chauhan (Massachusetts Institute of Technology)

I work on Natural Language Processing and Machine Learning for Health.

#### More from the Same Authors

• 2021 : Fixed Neural Network Steganography: Train the images, not the network »
Varsha Kishore · Xiangyu Chen · Yan Wang · Boyi Li · Kilian Weinberger
• 2021 Affinity Workshop: WiML Workshop 4 »
Soomin Aga Lee · Meera Desai · Nezihe Merve Gürel · Boyi Li · Linh Tran · Akiko Eriguchi · Jieyu Zhao · Salomey Osei · Sirisha Rambhatla · Geeticka Chauhan · Nwamaka (Amaka) Okafor · Mariya Vasileva
• 2021 Affinity Workshop: Black in AI Workshop »
Irene Nandutu · Hameed Abdul-Rashid · Foutse Yuehgoh · Mírian Silva · Salomey Osei · Victor Silva
• 2021 Affinity Workshop: WiML Workshop 3 »
Soomin Aga Lee · Meera Desai · Nezihe Merve Gürel · Boyi Li · Linh Tran · Akiko Eriguchi · Jieyu Zhao · Salomey Osei · Sirisha Rambhatla · Geeticka Chauhan · Nwamaka (Amaka) Okafor · Mariya Vasileva
• 2021 Affinity Workshop: WiML Workshop 2 »
Soomin Aga Lee · Meera Desai · Nezihe Merve Gürel · Boyi Li · Linh Tran · Akiko Eriguchi · Jieyu Zhao · Salomey Osei · Sirisha Rambhatla · Geeticka Chauhan · Nwamaka (Amaka) Okafor · Mariya Vasileva
• 2021 : WiML Opening remarks »
Boyi Li · Mariya Vasileva
• 2020 Poster: How does This Interaction Affect Me? Interpretable Attribution for Feature Interactions »
Michael Tsang · Sirisha Rambhatla · Yan Liu
• 2020 Poster: Provable Online CP/PARAFAC Decomposition of a Structured Tensor via Dictionary Learning »
Sirisha Rambhatla · Xingguo Li · Jarvis Haupt
• 2020 Affinity Workshop: Black in AI »
Victor Silva · Flora Ponjou Tasse · Krystal Maughan · Eric Maigua · Charles Earl · Nwamaka (Amaka) Okafor · Ignatius Ezeani · Oloruntobiloba Olatunji · Foutse Yuehgoh · Salomey Osei · Ezinne Nwankwo · Joyce D. Williams
• 2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen (Vincent) Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova
• 2019 Poster: Positional Normalization »
Boyi Li · Felix Wu · Kilian Weinberger · Serge Belongie
• 2019 Spotlight: Positional Normalization »
Boyi Li · Felix Wu · Kilian Weinberger · Serge Belongie