Publications

* denotes equal contribution


  1. Prompt Composition Technique for Code-Switched Tasks

    EMNLP 2022 (long paper)

    Code-switched (CS) data is ubiquitous in today’s globalized world, but the dearth of annotated datasets in code-switching poses a significant challenge for learning diverse tasks across different language pairs. Parameter-efficient prompt-tuning approaches conditioned on frozen language models have shown promise for transfer learning in limited-resource setups. In this paper, we propose a novel instance-based prompt composition technique, PRO-CS, for CS tasks that combines language and task knowledge. We compare our approach with prompt-tuning and fine-tuning for code-switched tasks on 10 datasets across 4 language pairs. Our model outperforms the prompt-tuning approach by significant margins across all datasets and outperforms or remains on par with fine-tuning while using just 0.18% of total parameters. We also achieve competitive results when compared with the fine-tuned model in the low-resource cross-lingual and cross-task setting, indicating the effectiveness of our approach in incorporating new code-switched tasks.
  2. Zero-shot Cross-lingual Open Domain Question Answering
    Sumit Agarwal, Suraj Tripathi, Teruko Mitamura, and Carolyn Rose.

    In Proceedings of the Workshop on Multilingual Information Access (MIA) @ NAACL 2022.

    People speaking different languages search for information in a cross-lingual manner. They tend to ask questions in their own language and expect the answer in the same language, even when the evidence lies in another language. In this paper, we present our approach to this task of cross-lingual open-domain question answering. Our proposed method employs a passage reranker, the fusion-in-decoder technique for generation, and a Wikidata-entity-based post-processing system to tackle the inability to generate entities across all languages. Our end-to-end pipeline shows improvements of 3 and 4.6 points on F1 and EM metrics respectively, compared with the baseline CORA model on the XOR-TyDi dataset. We also evaluate the effectiveness of our proposed techniques in the zero-shot setting using the MKQA dataset and show a 5-point F1 improvement for high-resource and a 3-point improvement for low-resource zero-shot languages. Our team CMUmQA’s submission to the MIA shared task ranked 1st in the constrained setup on the dev set and 2nd on the test set.
  3. R3: Refined Retriever-Reader Pipeline for Multidoc2dial

    In DialDoc @ ACL 2022.

    In this paper, we present our submission to the DialDoc shared task based on the MultiDoc2Dial dataset. MultiDoc2Dial is a conversational question answering dataset that grounds dialogues in multiple documents. The task involves grounding a user’s query in a document and then generating an appropriate response. We propose several improvements over the baseline’s retriever-reader architecture to aid in modeling goal-oriented dialogues grounded in multiple documents. Our proposed approach employs sparse representations for passage retrieval, a passage re-ranker, the fusion-in-decoder architecture for generation, and a curriculum learning training paradigm. Our approach shows a 12-point improvement in BLEU score compared to the baseline RAG model.
  4. Input-conditioned Convolution Filters for Feature Learning

    Suraj Tripathi, Saurabh Tripathi, Abhay Kumar, and Chirag Singh.

    In Proceedings of the 18th International Conference on Advances in Mobile Computing & Multimedia 2020.

    We propose a novel framework that combines an input-conditioned filter generation module with a decoder-based network to incorporate the contextual information present in images into Convolutional Neural Networks (CNNs). In contrast to traditional CNNs, we do not employ the same set of learned convolution filters for all input image instances. Our proposed decoder network serves to reduce the transformation present in the input image by learning to construct a representative image of the input image class. Our input-aware framework with joint supervision, when combined with techniques inspired by multi-instance learning and max-pooling, results in a transformation-invariant neural network. We investigated the performance of our proposed framework on three MNIST variations, which cover both rotation and scaling variance, and achieved 0.98% error on MNIST-rot-12k, 1.12% error on Half-rotated MNIST, and 0.68% error on Scaling MNIST, which is significantly better than the state-of-the-art results. Our proposed model also showed consistent improvement on the CIFAR dataset. We use visualization to further demonstrate the effectiveness of our input-aware convolution filters. Our convolution filter generation framework can also serve as a plugin for any CNN-based architecture and enhance its modeling capacity.
  5. Stance Detection in Code-Mixed Hindi-English Social Media Data using Multi-Task Learning

    In Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA @ NAACL-HLT 2019.

    Social media sites like Facebook, Twitter, and other microblogging forums have emerged as platforms for people to express their opinions and views on different issues and events. It is often observed that people take a stance that is in favor of, against, or neutral towards a particular topic. Assessing the stance taken by an individual has become significantly more important with the growth of online social platforms. An automatic stance detection system infers the user’s stance by analyzing standalone texts against a target entity. Due to the limited contextual information a single sentence provides, solving this task effectively is challenging. In this paper, we introduce a Multi-Task Learning (MTL) based deep neural network architecture for automatically detecting the stance present in a code-mixed corpus. We apply our approach to a Hindi-English code-mixed corpus with the target entity “Demonetisation.” Our best model achieved a stance prediction accuracy of 63.2%, a 4.5% overall accuracy improvement over current supervised classification systems developed using the benchmark dataset for code-mixed stance detection.
  6. Speech Emotion Recognition Using Spectrogram & Phoneme Embedding

    In INTERSPEECH 2018.

    This paper proposes a speech emotion recognition method based on phoneme sequences and spectrograms. Both phoneme sequences and spectrograms retain the emotional content of speech, which is lost if the speech is converted into text. We performed various experiments with different kinds of deep neural networks using phonemes and spectrograms as inputs. Three of those network architectures are presented here; they achieved better accuracy than state-of-the-art methods on a benchmark dataset. A combined phoneme-and-spectrogram CNN model proved most accurate at recognizing emotions on IEMOCAP data. We achieved more than a 4% increase in overall accuracy and average class accuracy compared to the existing state-of-the-art methods.
© Copyright 2022 Suraj Tripathi.