Nishant Balepur

Ph.D. Student in Computer Science at University of Maryland, College Park


Email: nbalepur[at]umd[dot]edu

Hi! My name is Nishant and I’m a second-year Ph.D. student at the University of Maryland, where I am fortunate to be advised by Professors Jordan Boyd-Graber and Rachel Rudinger. I am graciously supported by the NSF GRFP and a Cohere For AI Research Grant.

I semi-jokingly say that I work on bullying (evaluating flaws) and babysitting (alignment) in LLMs. I’m currently excited about these three research questions:

  1. How can we better synthesize factual sources? [topic mining (ACL’23), expository text (EMNLP’23), fact transfer (EMNLP’23), debatable queries (NAACL’25)]
  2. How can we teach models to help users? [flashcards (EMNLP’24), mnemonics (EMNLP’24), personalized dpo]
  3. How can we build evaluations to expose model/dataset flaws? [process of elimination (ACL’24), mcqa artifacts (ACL’24), benchmark cheating (ACL’24), mcqa plausibility (EMNLP’24), reverse qa (NAACL’25), mcqa is flawed]

I’m generally interested in research that is useful (helps users) and fun (with entertaining outputs to look at). If you’re interested in similar problems, don’t hesitate to reach out!

And if you’ve seen another “Balepur, N” during your literature search, you may be looking for my sister 😛


📝 Selected Publications

2025

  1. Preprint
    Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above
    Nishant Balepur, Rachel Rudinger, and Jordan Lee Boyd-Graber
    2025

2024

  1. EMNLP 2024
    A SMART Mnemonic Sounds like “Glue Tonic”: Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick
    Nishant Balepur, Matthew Shu, Alexander Hoyle, and 4 more authors
    In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024
  2. ACL 2024
    Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?
    Nishant Balepur, Abhilasha Ravichander, and Rachel Rudinger
    In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024
    Best Paper Award (4%) and Oral (7%) at MASC-SSL 2024

🥳 Research Highlights

Mar 24, 2025 Humbled to be invited for talks at Imperial College London on building Helpful QA systems (slides) and Google Translate’s Reading Group on improving MCQA evals (slides)
Feb 28, 2025 Two exciting life updates! I plan to join: 1) Ai2 (Semantic Scholar) as a summer intern working on helpful scientific QA systems; and 2) NYU as a visiting researcher (AY 25-26) with Eunsol Choi to live with my amazing partner in NYC :)
Feb 20, 2025 Tired of LLM evaluations with multiple choice questions? Our new position paper discusses these flaws and how insights from education can make evaluations more meaningful
Jan 22, 2025 Two papers at NAACL 2025 (main)! One work shows LLMs are surprisingly weak at generating accurate questions, while my Adobe internship project builds a multi-agent summarization model for debatable queries
Jan 20, 2025 New preprint on improving personalization in DPO! Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas

😔 Negative Results

Feb 13, 2025 One paper got bad reviews in December ARR
Dec 19, 2024 Didn’t get intern/fellow offers after interviewing at Meta, Cohere, and Anthropic
Jun 15, 2024 KAR³L is on its fourth resubmission 🫡
Apr 15, 2024 One paper not committed to ACL 2024
Feb 15, 2024 Two papers not committed to NAACL 2024
Feb 10, 2024 Banned on r/ACT for trying to advertise our KAR³L user study 😭
Oct 6, 2023 One paper rejected from EMNLP 2023
Mar 20, 2023 My first ever review score of 1 received on an ARR submission