The Data Science and Quantum Information Science division focuses on sharing knowledge, state-of-the-art tools, techniques, research, and emerging industry trends in machine learning applications. Machine learning tools are becoming prevalent, with a growing impact across industries. Reflecting this increased relevance, the division aims to create a forum for exchanging experiences and discussing research topics, and to inspire collaborations by bringing together people from academia and industry.
Data Science in Real Life: What Can Possibly Go Wrong?
Data Science is now very popular, and many companies are making large investments in this area. Universities and bootcamps keep adding programs to prepare more and more data scientists, and many people with careers in other fields elect to learn the skills and join the “Sexiest job of the 21st Century” (Harvard Business Review, October 2012). In some cases the investment pays off nicely, in others not so much. Even though everything should work fine in theory, many unexpected problems are encountered in practice. In this talk we will touch on various aspects of the data science process, including data and model quality, model deployment and updates, and point out ways to mitigate some of the potential problems.
Please see below for the detailed schedule.
Abstract Number: ANPA2023-N00075
Presenting Author: Shree Bhattarai
Presenter's Affiliation: Torqata Data and Analytics
Title: Building end-to-end machine learning pipeline to predict tire attributes using Kubeflow
Location: Florida International University, FL, USA
Abstract: When a customer visits a tire shop to purchase or replace a tire, the most important factors are whether (i) the size of the tire fits their vehicle, (ii) the tire design suits the surface they drive on (smooth road, rough or muddy road, etc.), and (iii) the tire is suitable for the season (winter, summer, etc.). This information is not readily available for all tires. Our goal was therefore to build an ML classification model that uses information collected from web searches to extract these attributes.
In this presentation, we will describe our approach to designing an ML pipeline with Kubeflow on Google Cloud Platform to predict the three tire attributes above. We will also explain how Kubeflow helps make the deployment of machine learning workflows on containerized applications simple, portable, and scalable.
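The pipeline described above can be pictured as a chain of small, independent stages. The sketch below is plain Python and does not use the Kubeflow API. The stage names (`collect_text`, `featurize`, `classify_season`), the sample descriptions, and the keyword rule that stands in for a trained classifier are all hypothetical and are not taken from the talk. In Kubeflow, each such stage would typically be packaged as its own containerized component.

```python
from typing import Callable, Dict, List


def collect_text(tire_name: str) -> str:
    # Hypothetical stand-in for the web-search collection step: returns canned text.
    samples = {
        "example-tire-a": "Studded winter tire built for snow and ice traction",
        "example-tire-b": "Summer performance tire for dry, smooth highways",
    }
    return samples.get(tire_name, "")


def featurize(text: str) -> List[str]:
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    tokens = (tok.strip(".,;:!?") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def classify_season(tokens: List[str]) -> str:
    # Keyword rule used only as a placeholder for a trained classifier.
    if "winter" in tokens or "snow" in tokens:
        return "winter"
    if "summer" in tokens:
        return "summer"
    return "unknown"


def run_pipeline(tire_name: str, stages: List[Callable]) -> object:
    # Feed each stage's output into the next, as a pipeline orchestrator would.
    value: object = tire_name
    for stage in stages:
        value = stage(value)
    return value


STAGES = [collect_text, featurize, classify_season]
predictions: Dict[str, object] = {
    name: run_pipeline(name, STAGES)
    for name in ("example-tire-a", "example-tire-b", "example-tire-c")
}
print(predictions)
```

Keeping each stage a pure function of its input is what makes it straightforward to move a stage into its own container later without touching the others.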
Abstract Number: ANPA2023-N00079
Presenting Author: Svetlana Levitan (Invited)
Presenter's Affiliation: Walgreens Boots Alliance
Title: Data Science in Real Life: What Can Possibly Go Wrong?
Location: Virtual Presentation
Abstract: Data Science is now very popular, and many companies are making large investments in this area. Universities and bootcamps keep adding programs to prepare more and more data scientists, and many people with careers in other fields elect to learn the skills and join the “Sexiest job of the 21st Century” (Harvard Business Review, October 2012). In some cases the investment pays off nicely, in others not so much. Even though everything should work fine in theory, many unexpected problems are encountered in practice. In this talk we will touch on various aspects of the data science process, including data and model quality, model deployment and updates, and point out ways to mitigate some of the potential problems.
Abstract Number: ANPA2023-N00080
Presenting Author: Puskar Bhattarai
Presenter's Affiliation: Washington University in St. Louis
Title: Promises of imaging physics and machine learning in mental disorders
Location: Virtual Presentation
Abstract: Magnetic resonance imaging (MRI) and positron emission tomography (PET) are among the greatest inventions of physics and high-level engineering design. MRI provides anatomical and functional information based on proton interactions with a strong magnetic field and radiofrequency pulses. PET maps the spatial distribution of positron-emitting radionuclides by capturing the gamma rays emitted upon the annihilation of electron-positron pairs, allowing the quantification of various physiological and molecular processes. These non-invasive techniques have revolutionized many fields, including neuroimaging of mental disorders. However, MRI and PET neuroimaging data are highly complex and difficult to interpret. This makes it challenging to draw the clinical conclusions that are critical for clinical diagnostics, prognostics, and precision medicine. To overcome such barriers, modern machine learning methods have shown great promise in characterizing neuroimaging data. The large-scale resources in mental disorders involve images from multiple sites, scanners, protocols, and diverse populations. One of the biggest challenges in using such research resources is that quantitative measures are not easily reproducible and are highly sensitive to PET and MRI acquisition differences and other sources of variance. Our team has been at the forefront of building machine learning and statistical harmonization methods to study brain patterns in various diseases, including Alzheimer’s disease and schizophrenia.
We will highlight various statistical harmonization and machine learning approaches, such as control-based regression, supervised learning, semi-supervised learning, feature importance, and dimensionality reduction, in the context of MRI and PET data. We will show their broad applications in detecting subtle structural and functional changes in the brain and identifying novel relationships between these changes and behavioral, clinical, and genetic measures. Overall, our findings demonstrate that physics-based imaging technologies integrated with advanced machine learning approaches provide a multidisciplinary perspective for understanding neurobiological mechanisms in health and disease, and aid in developing future computer-aided therapeutic efforts in mental disorders.
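One simple reading of control-based harmonization is to estimate each site's offset from healthy controls only, then subtract that offset from every subject scanned at the site. The sketch below illustrates that idea under strong assumptions: a single scalar measure, a purely additive site effect, and at least one control per site. The function names and all data values are made up for illustration, and this is not presented as the authors' method.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Each record: (site, is_control, measured_value). Values are illustrative only.
Record = Tuple[str, bool, float]


def site_offsets_from_controls(records: List[Record]) -> Dict[str, float]:
    """Estimate an additive offset per site using control subjects only."""
    by_site: Dict[str, List[float]] = defaultdict(list)
    for site, is_control, value in records:
        if is_control:
            by_site[site].append(value)
    site_means = {site: mean(vals) for site, vals in by_site.items()}
    # Reference level: the unweighted mean of the per-site control means.
    grand_mean = mean(list(site_means.values()))
    return {site: m - grand_mean for site, m in site_means.items()}


def harmonize(records: List[Record]) -> List[Record]:
    """Subtract each site's control-derived offset from all of its subjects."""
    offsets = site_offsets_from_controls(records)
    return [(site, ctrl, value - offsets[site]) for site, ctrl, value in records]


records: List[Record] = [
    ("A", True, 10.0), ("A", True, 12.0), ("A", False, 8.0),
    ("B", True, 14.0), ("B", True, 16.0), ("B", False, 12.0),
]
harmonized = harmonize(records)
for row in harmonized:
    print(row)
```

Estimating the offsets from controls alone keeps disease-related differences out of the site correction, which is the motivation for restricting the fit to that group.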
Abstract Number: ANPA2023-N00077
Presenting Author: Kamal R Dhakal
Presenter's Affiliation: AbbVie Inc
Location: Virtual Presentation
Abstract: Glaucoma, a multifactorial neurodegenerative disease, is characterized by retinal ganglion cell loss during disease progression, which results in thinning of the retinal nerve fiber layer (RNFL). Automated segmentation of the retinal layers often fails in diseased eyes, so manual correction is necessary, which is time-consuming and prone to errors and subjective bias. Erroneous segmentation of the RNFL alters the measurements and adversely affects the assessment of therapeutic results. Here, we developed a deep learning (DL) model to accurately segment and quantify RNFL thickness from optical coherence tomography (OCT) images of an experimentally induced chronic ocular hypertension nonhuman primate (OHT NHP) glaucoma model.
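Once a layer has been segmented, its thickness is commonly read off per image column (A-scan) by counting the pixels labelled as that layer and scaling by the axial pixel size. The sketch below shows only this quantification step, assuming a 2D binary mask stored as nested lists. The toy mask and the 2.0 µm pixel size are placeholders, and the function names are hypothetical; none of this is taken from the authors' pipeline.

```python
from typing import List


def column_thickness_um(mask: List[List[int]], axial_um_per_pixel: float) -> List[float]:
    """Thickness per column: number of layer pixels times the axial pixel size."""
    if not mask:
        return []
    n_cols = len(mask[0])
    return [
        sum(row[col] for row in mask) * axial_um_per_pixel
        for col in range(n_cols)
    ]


def mean_thickness_um(mask: List[List[int]], axial_um_per_pixel: float) -> float:
    """Average of the per-column thicknesses across the scan width."""
    per_column = column_thickness_um(mask, axial_um_per_pixel)
    return sum(per_column) / len(per_column) if per_column else 0.0


# Toy 4x4 mask: 1 marks pixels assigned to the layer, 0 marks everything else.
toy_mask = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
AXIAL_UM_PER_PIXEL = 2.0  # placeholder value, not a real scanner setting

print(column_thickness_um(toy_mask, AXIAL_UM_PER_PIXEL))
print(mean_thickness_um(toy_mask, AXIAL_UM_PER_PIXEL))
```

Because thickness is derived directly from the mask, any mislabelled pixel shifts the measurement, which is why segmentation errors propagate into the thickness values.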
Abstract Number: ANPA2023-N00076
Presenting Author: Kirti Bir Rajguru
Presenter's Affiliation: Department of Applied Sciences and Chemical Engineering, IOE, Pulchowk Campus
Location: Virtual Presentation
Abstract: The performance of electrochemical double-layer capacitors (EDLCs) is evaluated by the capacitance of their activated carbon (AC) electrodes. The capacitance of AC electrodes is influenced by many factors, such as precursor type, activation method, pore structure, surface chemistry, and electrolytic properties. In this paper, we present a comparative study of machine-learning-based prediction of the capacitance of AC electrodes prepared from different precursors.
In this study, different machine learning (ML) models were used to efficiently predict the specific capacitance, surface area, mesopore volume, and total pore volume of activated carbon derived from different biomass sources. The ML models were trained on a dataset of experimental and synthetic data that included the activation temperature, methylene blue number, and iodine number of the activated carbon. The best-performing ML model was the random forest, which had an R² score of 0.968 for specific capacitance. The analysis revealed that temperature was the most significant factor in predicting capacitance. The results of this study can be used to optimize the production of activated carbon and improve its performance in energy storage applications.
Keywords: Machine Learning, Activated Carbon, Energy storage
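The R² score reported in the abstract above is the coefficient of determination, defined as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it from observed and predicted values. The numbers are made up for illustration and are not data from the study, and the function is a plain-Python stand-in rather than the tooling the authors used.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviation of observations from their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Illustrative values only (e.g. specific capacitance in arbitrary units).
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 205.0, 245.0]
print(round(r2_score(observed, predicted), 3))
```

A score of 1 means the predictions match the observations exactly, while a score of 0 means the model does no better than always predicting the mean of the observed values.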
Abstract Number: ANPA2023-N00078
Presenting Author: Ghanashyam Khanal
Presenter's Affiliation: Citigroup Inc.
Title: From Physicist to Quant: a Phase Transition
Location: Virtual Presentation
Abstract: A quant's primary responsibility is to build financial models using various numerical methods to help the business desk make decisions on the millions of transactions it makes every day. In this presentation I will talk about my transition from a PhD in physics to my current role as a quant at a major US financial institution. I will draw parallels between the two positions, point out where they differ, and give ideas on how one can prepare for such a transition.