During my tenure in the lab, I focused on leveraging Large Language Models for temporal decision-making in health datasets. The collaborative and supportive environment inspired me to tackle challenging problems during COVID-19, pushing me to excel. I am deeply grateful to my teammates for their invaluable knowledge and support, which played a crucial role in my growth as a researcher.
WashKaro is a free Android-based app which aims to spread awareness among the mass regarding Tuberculosis. The app provides a plethora of features like: Human curated information Success Stories Multilingual text and speech support Chatbot
Ridam Pal, Tavpritesh Sethi
We developed Strainflow for SARS-CoV-2 genome sequences, where sequences were treated as documents with words (codons) to learn the codon context of 0.9 million spike genes using the skip-gram algorithm.
Sargun Nagpal, Ridam Pal, Ashima, Ananya Tyagi, Sadhana Tripathi, Aditya Nagori, Saad Ahmad, Hara Prasad Mishra, Tavpritesh Sethi, Rintu Kutum
EvidenceFlow is a web app that tracks trends in COVID-19 research. Built on WHO-approved literature, it uses fancy charts and network analysis to show how research focus changes over time. This helps users understand current and upcoming trends in COVID-19 research.
Ridam Pal, Pradeep Singh, Akshaya Devadiga
The COVID-19 Infodemic had an unprecedented impact on health behaviors and outcomes at a global scale. While many studies have focused on a qualitative and quantitative understanding of misinformation, including sentiment analysis, there is a gap in understanding the emotion-carriers of misinformation and their differences across geographies. In this study, we characterized emotion carriers and their impact on vaccination rates in India and the United States. A manually labelled dataset was created from 2.3 million tweets and collated with three publicly available datasets (CoAID, AntiVax, CMU) to train deep learning models for misinformation classification. Misinformation labelled tweets were further analyzed for behavioral aspects by leveraging Plutchik Transformers to determine the emotion for each tweet. Time series analysis was conducted to study the impact of misinformation on spatial and temporal characteristics. Further, categorical classification was performed using transformer models to assign categories for the misinformation tweets. Word2Vec+BiLSTM was the best model for misinformation classification, with an F1-score of 0.92. The US had the highest proportion of misinformation tweets (58.02%), followed by the UK (10.38%) and India (7.33%). Disgust, anticipation, and anger were associated with an increased prevalence of misinformation tweets. Disgust was the predominant emotion associated with misinformation tweets in the US, while anticipation was the predominant emotion in India. For India, the misinformation rate exhibited a lead relationship with vaccination, while in the US it lagged behind vaccination. Our study deciphered that emotions acted as differential carriers of misinformation across geography and time. These carriers can be monitored to develop strategic interventions for countering misinformation, leading to improved public health.
Read moreRidam Pal, Sanjana S, Deepak Mahto, Kriti Agrawal, Gopal Mengi, Sargun Nagpal, Akshaya Devadiga, Tavpritesh Sethi
Social media plays a pivotal role in disseminating news globally and acts as a platform for people to express their opinions on various topics. A wide variety of views accompany COVID-19 vaccination drives across the globe, often colored by emotions that change along with rising cases, approval of vaccines, and multiple factors discussed online.
Read moreHarshita Chopra, Aniket Vashishtha, Ridam Pal, Ashima Ashima, Ananya Tyagi, Tavpritesh Sethi
Evidence from peer-reviewed literature is the cornerstone for designing responses to global threats such as COVID-19. In massive and rapidly growing corpuses, such as COVID-19 publications, assimilating and synthesizing information is challenging. Leveraging a robust computational pipeline that evaluates multiple aspects, such as network topological features, communities, and their temporal trends, can make this process more efficient.
Read moreRidam Pal, Harshita Chopra, Raghav Awasthi, Harsh Bandhey, Aditya Nagori, Amogh Gulati, Ponnurangam Kumaraguru, Tavpritesh Sethi
The global efforts to control COVID-19 are threatened by the rapid emergence of novel SARS-CoV-2 variants that may display undesirable characteristics such as immune escape, increased transmissibility or pathogenicity. Early prediction for emergence of new strains with these features is critical for pandemic preparedness. We present Strainflow, a supervised and causally predictive model using unsupervised latent space features of SARS-CoV-2 genome sequences. Strainflow was trained and validated on 0.9 million sequences for the period December, 2019 to June, 2021 and the frozen model was prospectively validated from July, 2021 to December, 2021. Strainflow captured the rise in cases 2 months ahead of the Delta and Omicron surges in most countries including the prediction of a surge in India as early as beginning of November, 2021. Entropy analysis of Strainflow unsupervised embeddings clearly reveals the explore-exploit cycles in genomic feature-space, thus adding interpretability to the deep learning based model. We also conducted codon-level analysis of our model for interpretability and biological validity of our unsupervised features. Strainflow application is openly available as an interactive web-application for prospective genomic surveillance of COVID-19 across the globe.
Read moreSargun Nagpal, Ridam Pal, Ashima, Ananya Tyagi, Sadhana Tripathi, Aditya Nagori, Saad Ahmad, Hara Prasad Mishra, Rintu Kutum, Tavpritesh Sethi
The global efforts to control COVID-19 are threatened by the rapid emergence of novel SARS-CoV-2 variants that may display undesirable characteristics such as immune escape, increased transmissibility or pathogenicity. Early prediction for emergence of new strains with these features is critical for pandemic preparedness. We present Strainflow, a supervised and causally predictive model using unsupervised latent space features of SARS-CoV-2 genome sequences. Strainflow was trained and validated on 0.9 million sequences for the period December, 2019 to June, 2021 and the frozen model was prospectively validated from July, 2021 to December, 2021. Strainflow captured the rise in cases 2 months ahead of the Delta and Omicron surges in most countries including the prediction of a surge in India as early as beginning of November, 2021. Entropy analysis of Strainflow unsupervised embeddings clearly reveals the explore-exploit cycles in genomic feature-space, thus adding interpretability to the deep learning based model. We also conducted codon-level analysis of our model for interpretability and biological validity of our unsupervised features. Strainflow application is openly available as an interactive web-application for prospective genomic surveillance of COVID-19 across the globe.
Read moreSargun Nagpal, Ridam Pal, Ashima, Ananya Tyagi, Sadhana Tripathi, Aditya Nagori, Saad Ahmad, Hara Prasad Mishra, Rintu Kutum, Tavpritesh Sethi
The global efforts to control COVID-19 are threatened by the rapid emergence of novel SARS-CoV-2 variants that may display undesirable characteristics such as immune escape or increased pathogenicity. Early prediction of emerging strains could be vital to pandemic preparedness but remains an open challenge. Here, we developed Strainflow, to learn the latent dimensions of 0.9 million high-quality SARS-CoV-2 genome sequences, and used machine learning algorithms to predict upcoming caseloads of SARS-CoV-2. In our Strainflow model, SARS-CoV-2 genome sequences were treated as documents, and codons as words to learn unsupervised codon embeddings (latent dimensions). We discovered that codon-level changes lead to a change in the entropy of the latent dimensions. We used a machine learning algorithm to find the most relevant latent dimensions called Dimensions of Concern (DoCs) of SARS-CoV-2 spike genes, and demonstrate their potential to provide a lead time for predicting new caseloads in several countries. The DoCs capture codons associated with global Variants of Concern (VOCs) and Variants of Interest (VOIs), and may be surveilled to predict country-specific emergence and spread of SARS-CoV-2 variants.
Read moreSargun Nagpal, Ridam Pal, Ashima, Ananya Tyagi, Sadhana Tripathi, Aditya Nagori, Saad Ahmad, Hara Prasad Mishra, Rintu Kutum, Tavpritesh Sethi
Ridam Pal, Harshita Chopra, Raghav Awasthi, Harsh Bandhey, Aditya Nagori, Amogh Gulati, Ponnurangam Kumaraguru, Tavpritesh Sethi
Ridam Pal, Harshita Chopra, Raghav Awasthi, Harsh Bandhey, Aditya Nagori, Amogh Gulati, Ponnurangam Kumaraguru, Tavpritesh Sethi
The COVID-19 pandemic has put immense pressure on health systems which are further strained due to the misinformation surrounding it. Under such a situation, providing the right information at the right time is crucial. There is a growing demand for the management of information spread using Artificial Intelligence. Hence, we have exploited the potential of Natural Language Processing for identifying relevant information that needs to be disseminated amongst the masses. In this work, we present a novel Cross-lingual Natural Language Processing framework to provide relevant information by matching daily news with trusted guidelines from the World Health Organization. The proposed pipeline deploys various techniques of NLP such as summarizers, word embeddings, and similarity metrics to provide users with news articles along with a corresponding healthcare guideline. A total of 36 models were evaluated and a combination of LexRank based summarizer on Word2Vec embedding with Word Mover distance metric outperformed all other models. This novel open-source approach can be used as a template for proactive dissemination of relevant healthcare information in the midst of misinformation spread associated with epidemics.
Read moreRidam Pal, Rohan Pandey, Vaibhav Gautam, Kanav Bhagat, Tavpritesh Sethi
The relationship between meteorological factors such as temperature and humidity with COVID-19 incidence is still unclear after 6 months of the beginning of the pandemic. Some literature confirms the association of temperature with disease transmission while some oppose the same. This work intends to determine whether there is a causal association between temperature, humidity and Covid-19 cases. Three different causal models were used to capture stochastic, chaotic and symbolic natured time-series data and to provide a robust & unbiased analysis by constructing networks of causal relationships between the variables. Granger-Causality method, Transfer Entropy method & Convergent Cross-Mapping (CCM) was done on data from regions with different temperatures and cases greater than 50,000 as of 13th May 2020. From the Granger-Causality test we found that in only Canada, the United Kingdom, temperature and daily new infections are causally linked. The same results were obtained from Convergent Cross Mapping for India. Again using Granger-Causality test, we found that in Russia only, relative humidity is causally linked to daily new cases. Thus, a Generalized Additive Model with a smoothing spline function was fitted for these countries to understand the directionality. Using the combined results of the said models, we were able to conclude that there is no evidence of a causal association between temperature, humidity and Covid-19 cases.
Read moreRaghav Awasthi, Aditya Nagori, Pradeep Singh, Ridam Pal, Vineet Joshi, Tavpritesh Sethi
The flood of conflicting COVID-19 research has revealed that COVID-19 continues to be an enigma. Although more than 14,000 research articles on COVID-19 have been published with the disease taking a pandemic proportion, clinicians and researchers are struggling to distill knowledge for furthering clinical management and research. In this study, we address this gap for a targeted user group, i.e. clinicians, researchers, and policymakers by applying natural language processing to develop a CovidNLP dashboard in order to speed up knowledge discovery. The WHO has created a repository of about more than 5000 peer-reviewed and curated research articles on varied aspects including epidemiology, clinical features, diagnosis, treatment, social factors, and economics. We summarised all the articles in the WHO Database through an extractive summarizer followed by an exploration of the feature space using word embeddings which were then used to visualize the summarized associations of COVID-19 as found in the text. Clinicians, researchers, and policymakers will not only discover the direct effects of COVID-19 but also the systematic implications such as the anticipated rise in TB and cancer mortality due to the non-availability of drugs during the export lockdown as highlighted by our models. These demonstrate the utility of mining massive literature with natural language processing for rapid distillation and knowledge updates. This can help the users understand, synthesize, and take pre-emptive action with the available peer-reviewed evidence on COVID-19. Our models will be continuously updated with new literature and we have made our resource CovidNLP publicly available in a user-friendly fashion at http://covidnlp.tavlab.iiitd.edu.in/.
Read moreRaghav Awasthi, Ridam Pal, Harshita Chopra, Harsh Bandhey, Pradeep Singh, Aditya Nagori, Suryatej Reddy, Amogh Gulati, Ponnurangam Kumaraguru, Tavpritesh Sethi
The flood of conflicting COVID-19 research has revealed that COVID-19 continues to be an enigma. Although more than 14,000 research articles on COVID-19 have been published with the disease taking a pandemic proportion, clinicians and researchers are struggling to distill knowledge for furthering clinical management and research. In this study, we address this gap for a targeted user group, i.e. clinicians, researchers, and policymakers by applying natural language processing to develop a CovidNLP dashboard in order to speed up knowledge discovery. The WHO has created a repository of about more than 5000 peer-reviewed and curated research articles on varied aspects including epidemiology, clinical features, diagnosis, treatment, social factors, and economics. We summarised all the articles in the WHO Database through an extractive summarizer followed by an exploration of the feature space using word embeddings which were then used to visualize the summarized associations of COVID-19 as found in the text. Clinicians, researchers, and policymakers will not only discover the direct effects of COVID-19 but also the systematic implications such as the anticipated rise in TB and cancer mortality due to the non-availability of drugs during the export lockdown as highlighted by our models. These demonstrate the utility of mining massive literature with natural language processing for rapid distillation and knowledge updates. This can help the users understand, synthesize, and take pre-emptive action with the available peer-reviewed evidence on COVID-19. Our models will be continuously updated with new literature and we have made our resource CovidNLP publicly available in a user-friendly fashion at http://covidnlp.tavlab.iiitd.edu.in/.
Read moreRaghav Awasthi, Ridam Pal, Harshita Chopra, Harsh Bandhey, Pradeep Singh, Aditya Nagori, Suryatej Reddy, Amogh Gulati, Ponnurangam Kumaraguru, Tavpritesh Sethi
The COVID-19 pandemic has uncovered the potential of digital misinformation in shaping the health of nations. The deluge of unverified information that spreads faster than the epidemic itself is an unprecedented phenomenon that has put millions of lives in danger. Mitigating this ‘Infodemic’ requires strong health messaging systems that are engaging, vernacular, scalable, effective and continuously learn the new patterns of misinformation.
Read moreRohan Pandey, Vaibhav Gautam, Ridam Pal, Harsh Bandhey, Lovedeep Singh Dhingra, Himanshu Sharma, Chirag Jain, Kanav Bhagat, Arushi Arushi, Lajjaben Patel, Mudit Agarwal, Samprati Agrawal, Rishabh Jalan, Ayush Garg, Akshat Wadhwa, Vihaan Misra, Yashwin Agrawal, Bhavika Rana, Ponnurangam Kumaraguru, Tavpritesh Sethi