Onkar Pandit

Hi! I am an Applied Scientist at Inception Institute of AI, UAE, where I work on developing an Arabic Large Language Model. I am happy to share that we recently open-sourced Jais, an Arabic LLM.

I graduated with a Ph.D. in Computer Science from the University of Lille, France. I was fortunate to have Dr. Pascal Denis, Prof. Liva Ralaivola, and Prof. Marc Tommasi as my Ph.D. advisors. During this period, I was part of the lovely Magnet team at INRIA, Lille, France.

Prior to joining the doctoral program, I worked as a Project Scientist with Prof. Utpal Garain at his wonderful NLP lab at the Indian Statistical Institute, Kolkata, India. Before that, I spent four delightful years at Oracle India Pvt. Ltd., Bangalore, India, as a software developer.

Education

  • Ph.D. in Computer Science Dec. 2017 – Sept. 2021

    Université de Lille and INRIA, Lille, France.

  • M.Tech. in Electrical Engineering Jul. 2010 – Jun. 2012

    Indian Institute of Technology (IIT), Kanpur, India.

  • B.Tech. in Electronics and Telecommunication Engineering Jul. 2006 – Jun. 2010

    Shri Guru Gobind Singhji Institute of Engineering and Technology, Nanded, India.

Employment

  • Applied Scientist Apr. 2023 – Present

    Inception Institute of AI, UAE.

  • Project Scientist Jul. 2016 – Nov. 2017

    Indian Statistical Institute, Kolkata, India.

  • Senior Member Technical Staff Mar. 2016 – Jun. 2016

    Oracle India Pvt. Ltd., Bangalore, India.

  • Member Technical Staff Jul. 2012 – Feb. 2016

    Oracle India Pvt. Ltd., Bangalore, India.

  • Project Associate Jun. 2011 – Jun. 2012

    Indian Institute of Technology (IIT), Kanpur, India.

Research Publications

  1. Integrating Contextual and Commonsense Information for Automatic Discourse Understanding: Contributions to Temporal Relation Classification and Bridging Anaphora Resolution.

    Ph.D. Dissertation.


  1. Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models

    Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Alham Fikri Aji, Zhengzhong Liu, Andy Hock, Andrew Feldman, Jonathan Lee, Andrew Jackson, Preslav Nakov, Timothy Baldwin, Eric Xing.

    arXiv:2308.16149, 2023.

  2. Probing for Bridging Inference in Transformer Language Models.

    Onkar Pandit, Yufang Hou.

    Annual Conference of the North American Chapter of the Association for Computational Linguistics 2021 (NAACL 2021).

  3. Integrating knowledge graph embeddings to improve mention representation for bridging anaphora resolution.

    Onkar Pandit, Pascal Denis, Liva Ralaivola.

    Workshop on Computational Models of Reference, Anaphora and Coreference, COLING 2020.

  4. Learning Rich Event Representations and Interactions for Temporal Relation Classification.

    Onkar Pandit, Pascal Denis, Liva Ralaivola.

    European Symposium on Artificial Neural Networks (ESANN-2019).

  5. CNN for Text-Based Multiple Choice Question Answering.

    Akshay Chaturvedi, Onkar Pandit, Utpal Garain.

    Association for Computational Linguistics (ACL-2018).

  6. Context Sensitive Lemmatization Using Two Successive Bidirectional Gated Recurrent Networks.

    Abhisek Chakrabarty, Onkar Pandit, Utpal Garain.

    Association for Computational Linguistics (ACL-2017).

  7. Identification of Reader Specific Difficult Words by Analyzing Eye Gaze and Document Content.

    Utpal Garain, Onkar Pandit, Olivier Augereau, Ayano Okoso, Koichi Kise.

    International Conference on Document Analysis and Recognition (ICDAR-2017).