Nishant Balepur
Ph.D. Student in Computer Science at University of Maryland, College Park

Email:
nbalepur[at]umd[dot]edu
Hi! My name is Nishant and I’m a second-year Ph.D. student at the University of Maryland, where I am fortunate to be advised by Professors Jordan Boyd-Graber and Rachel Rudinger. I am graciously supported by the NSF GRFP and a Cohere For AI Research Grant.
I semi-jokingly say that I work on bullying (evaluating flaws) and babysitting (alignment) in LLMs. I’m currently excited about three research questions, listed in descending order of excitement:
- How can we teach models to help users? [flashcards (EMNLP’24), mnemonics (EMNLP’24), personalized dpo]
- How can we build evaluations to expose model/dataset flaws? [process of elimination (ACL’24), mcqa artifacts (ACL’24), benchmark cheating (ACL’24), mcqa plausibility (EMNLP’24), reverse qa (NAACL’25), mcqa just generally sucks]
- How can we better synthesize factual sources? [topic mining (ACL’23), expository text (EMNLP’23), fact transfer (EMNLP’23), debatable queries (NAACL’25)]
I’m generally interested in research that is helpful (useful for humans) and fun (resulting in papers that are entertaining to read). If you’re interested in similar problems, don’t hesitate to reach out!
And if you’ve seen another “Balepur, N” during your literature search, you may be looking for my sister 😛
📝 Selected Publications
🥳 Research Highlights
- Apr 5, 2025: Excited to give an oral presentation on why MCQA sucks at MASC-SLL 2025. Also humbled to win a best paper award!
- Mar 24, 2025: Humbled to be invited for talks at Imperial College London on building Helpful QA systems (slides) and Google Translate’s Reading Group on improving MCQA evals (slides)
- Feb 28, 2025: Two exciting life updates! I plan to join: 1) Ai2 (Semantic Scholar) as a summer intern working on helpful scientific QA systems; and 2) NYU as a visiting researcher (AY 25-26) with Eunsol Choi to live with my amazing partner in NYC :)
- Feb 20, 2025: Tired of LLM evaluations with multiple choice questions? Our new position paper discusses these flaws and how insights from education can make evaluations more meaningful
- Jan 22, 2025: Two papers at NAACL 2025 (main)! One work shows LLMs are surprisingly weak at generating accurate questions, while my Adobe internship project builds a multi-agent summarization model for debatable queries
😔 Negative Results
- Feb 13, 2025: One paper got bad reviews in December ARR
- Dec 19, 2024: Didn’t get intern/fellow offers after interviewing at Meta, Cohere, and Anthropic
- Jun 15, 2024: KAR³L is on its fourth resubmission 🫡
- Apr 15, 2024: One paper not committed to ACL 2024
- Feb 15, 2024: Two papers not committed to NAACL 2024
- Feb 10, 2024: Banned on r/ACT for trying to advertise our KAR³L user study 😭
- Oct 6, 2023: One paper rejected from EMNLP 2023
- Mar 20, 2023: Received my first ever review score of 1 on an ARR submission