Onkar Pandit

Hi there! I’m a Senior Applied Scientist at Inception Institute of AI, UAE, where I have the incredible opportunity to develop Large Language Models for Arabic and Hindi. We proudly open-sourced some of the first and best Arabic and Hindi LLMs: meet Jais and Nanda.

I’ve also worked on enhancing math and reasoning abilities in LLMs and recently tackled fascinating challenges in weather and climate prediction. Right now, I’m diving into an exciting problem for the Oil & Gas industry, designing a domain-specific LLM and pushing the boundaries of Large Multi-modal Models.

I earned my Ph.D. in Computer Science from the University of Lille, France, under the guidance of Dr. Pascal Denis and Prof. Liva Ralaivola, as a member of the Magnet team at INRIA, Lille.

Education

Ph.D. in Computer Science Dec. 2017 – Sept. 2021

Université de Lille and INRIA, Lille, France.

M.Tech. in Electrical Engineering Jul. 2010 – Jun. 2012

Indian Institute of Technology (IIT), Kanpur, India.

B.Tech. in Electronics and Telecommunication Engineering Jul. 2006 – Jun. 2010

Shri Guru Gobind Singhji Institute of Engineering and Technology, Nanded, India.

Employment

Senior Applied Scientist Apr. 2023 – Present

Inception Institute of AI, UAE.

Project Scientist Jul. 2016 – Nov. 2017

Indian Statistical Institute, Kolkata, India.

Senior Member Technical Staff Jul. 2012 – Jun. 2016

Oracle India Pvt. Ltd., Bangalore, India.

Research Publications

Integrating Contextual and Commonsense Information for Automatic Discourse Understanding: Contributions to Temporal Relation Classification and Bridging Anaphora Resolution.

Ph.D. Dissertation.

Bilingual Adaptation of Monolingual Foundation Models

Gurpreet Gosal, Yishi Xu, Gokul Ramakrishnan, Rituraj Joshi, Avraham Sheinin, Zhiming (Charles) Chen, Biswajit Mishra, Natalia Vassilieva, Joel Hestness, Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Onkar Pandit, Satheesh Katipomu, Samta Kamboj, Samujjwal Ghosh, Rahul Pal, Parvez Mullah, Soundar Doraiswamy, Mohamed El Karim Chami, Preslav Nakov.

ICML 2024 Workshop on Foundation Models in the Wild.

Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models

Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Alham Fikri Aji, Zhengzhong Liu, Andy Hock, Andrew Feldman, Jonathan Lee, Andrew Jackson, Preslav Nakov, Timothy Baldwin, Eric Xing.

arXiv:2308.16149, 2023.

Probing for Bridging Inference in Transformer Language Models.

Onkar Pandit, Yufang Hou.

Annual Conference of the North American Chapter of the Association for Computational Linguistics 2021 (NAACL 2021).

Integrating knowledge graph embeddings to improve mention representation for bridging anaphora resolution.

Onkar Pandit, Pascal Denis, Liva Ralaivola.

Workshop on Computational Models of Reference, Anaphora and Coreference, COLING 2020.

Learning Rich Event Representations and Interactions for Temporal Relation Classification.

Onkar Pandit, Pascal Denis, Liva Ralaivola.

European Symposium on Artificial Neural Networks (ESANN-2019).

CNN for Text-Based Multiple Choice Question Answering.

Akshay Chaturvedi, Onkar Pandit, Utpal Garain.

Annual Meeting of the Association for Computational Linguistics (ACL 2018).

Context Sensitive Lemmatization Using Two Successive Bidirectional Gated Recurrent Networks.

Abhisek Chakrabarty, Onkar Pandit, Utpal Garain.

Annual Meeting of the Association for Computational Linguistics (ACL 2017).

Identification of Reader Specific Difficult Words by Analyzing Eye Gaze and Document Content.

Utpal Garain, Onkar Pandit, Olivier Augereau, Ayano Okoso, Koichi Kise.

International Conference on Document Analysis and Recognition (ICDAR-2017).
