Onkar Pandit

Hi there! I’m a Senior Applied Scientist at the Inception Institute of AI, UAE, where I have the incredible opportunity to develop Large Language Models for Arabic and Hindi. We proudly open-sourced some of the first and best Arabic and Hindi LLMs: meet Jais and Nanda.

I’ve also worked on enhancing math and reasoning abilities in LLMs and recently tackled fascinating challenges in weather and climate prediction. Right now, I’m diving into an exciting problem for the Oil & Gas industry, designing a domain-specific LLM and pushing the boundaries of Large Multi-modal Models.

I earned my Ph.D. in Computer Science from the University of Lille, France, under the guidance of Dr. Pascal Denis and Prof. Liva Ralaivola, as a member of the Magnet team at INRIA, Lille.

Education

  • Ph.D. in Computer Science Dec. 2017 – Sept. 2021

    Université de Lille and INRIA, Lille, France.

  • M.Tech. in Electrical Engineering Jul. 2010 – Jun. 2012

    Indian Institute of Technology (IIT), Kanpur, India.

  • B.Tech. in Electronics and Telecommunication Engineering Jul. 2006 – Jun. 2010

    Shri Guru Gobind Singhji Institute of Engineering and Technology, Nanded, India.

Employment

  • Senior Applied Scientist Apr. 2023 – Present

    Inception Institute of AI, UAE.

  • Project Scientist Jul. 2016 – Nov. 2017

    Indian Statistical Institute, Kolkata, India.

  • Senior Member Technical Staff Jul. 2012 – Jun. 2016

    Oracle India Pvt. Ltd., Bangalore, India.

Research Publications

  1. Integrating Contextual and Commonsense Information for Automatic Discourse Understanding: Contributions to Temporal Relation Classification and Bridging Anaphora Resolution.

    Ph.D. Dissertation.


  1. Bilingual Adaptation of Monolingual Foundation Models

    Gurpreet Gosal, Yishi Xu, Gokul Ramakrishnan, Rituraj Joshi, Avraham Sheinin, Zhiming (Charles) Chen, Biswajit Mishra, Natalia Vassilieva, Joel Hestness, Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Onkar Pandit, Satheesh Katipomu, Samta Kamboj, Samujjwal Ghosh, Rahul Pal, Parvez Mullah, Soundar Doraiswamy, Mohamed El Karim Chami, Preslav Nakov.

    ICML 2024 Workshop on Foundation Models in the Wild.

  2. Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models

    Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Alham Fikri Aji, Zhengzhong Liu, Andy Hock, Andrew Feldman, Jonathan Lee, Andrew Jackson, Preslav Nakov, Timothy Baldwin, Eric Xing.

    arXiv:2308.16149, 2023.

  3. Probing for Bridging Inference in Transformer Language Models.

    Onkar Pandit, Yufang Hou.

    Annual Conference of the North American Chapter of the Association for Computational Linguistics 2021 (NAACL 2021).

  4. Integrating knowledge graph embeddings to improve mention representation for bridging anaphora resolution.

    Onkar Pandit, Pascal Denis, Liva Ralaivola.

    Workshop on Computational Models of Reference, Anaphora and Coreference, COLING 2020.

  5. Learning Rich Event Representations and Interactions for Temporal Relation Classification.

    Onkar Pandit, Pascal Denis, Liva Ralaivola.

    European Symposium on Artificial Neural Networks (ESANN-2019).

  6. CNN for Text-Based Multiple Choice Question Answering.

    Akshay Chaturvedi, Onkar Pandit, Utpal Garain.

    Association for Computational Linguistics (ACL-2018).

  7. Context Sensitive Lemmatization Using Two Successive Bidirectional Gated Recurrent Networks.

    Abhisek Chakrabarty, Onkar Pandit, Utpal Garain.

    Association for Computational Linguistics (ACL-2017).

  8. Identification of Reader Specific Difficult Words by Analyzing Eye Gaze and Document Content.

    Utpal Garain, Onkar Pandit, Olivier Augereau, Ayano Okoso, Koichi Kise.

    International Conference on Document Analysis and Recognition (ICDAR-2017).