Nishant Balepur
Ph.D. Student in Computer Science at University of Maryland, College Park

Email:
nbalepur[at]umd[dot]edu
Hi! My name is Nishant and I’m a second-year Ph.D. student at the University of Maryland, where I am fortunate to be advised by Professors Jordan Boyd-Graber and Rachel Rudinger. I am graciously supported by the NSF GRFP and a Cohere For AI Research Grant.
I semi-jokingly say that I work on bullying (evaluating flaws) and babysitting (alignment) in LLMs. I’m currently excited about three research questions:
- How can we better synthesize factual sources? [topic mining (ACL’23), expository text (EMNLP’23), fact transfer (EMNLP’23), debatable queries (NAACL’25)]
- How can we teach models to help users? [flashcards (EMNLP’24), mnemonics (EMNLP’24), personalized dpo]
- How can we build evaluations to expose model/dataset flaws? [process of elimination (ACL’24), mcqa artifacts (ACL’24), benchmark cheating (ACL’24), mcqa plausibility (EMNLP’24), reverse qa (NAACL’25), mcqa is flawed]
I’m generally interested in research that is useful (helps users) and fun (with entertaining outputs to look at). If you’re interested in similar problems, don’t hesitate to reach out!
And if you’ve seen another “Balepur, N” during your literature search, you may be looking for my sister 😛
📝 Selected Publications
🥳 Research Highlights
Mar 24, 2025 | Humbled to be invited for talks at Imperial College London on building Helpful QA systems (slides) and Google Translate’s Reading Group on improving MCQA evals (slides) |
Feb 28, 2025 | Two exciting life updates! I plan to join: 1) Ai2 (Semantic Scholar) as a summer intern working on helpful scientific QA systems; and 2) NYU as a visiting researcher (AY 25-26) with Eunsol Choi to live with my amazing partner in NYC :) |
Feb 20, 2025 | Tired of LLM evaluations with multiple choice questions? Our new position paper discusses these flaws and how insights from education can make evaluations more meaningful |
Jan 22, 2025 | Two papers at NAACL 2025 (main)! One work shows LLMs are surprisingly weak at generating accurate questions, while my Adobe internship project builds a multi-agent summarization model for debatable queries |
Jan 20, 2025 | New preprint on improving personalization in DPO! Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas |
😔 Negative Results
Feb 13, 2025 | One paper got bad reviews in December ARR |
Dec 19, 2024 | Didn’t get intern/fellow offers after interviewing at Meta, Cohere, and Anthropic |
Jun 15, 2024 | KAR³L is on its fourth resubmission 🫡 |
Apr 15, 2024 | One paper not committed to ACL 2024 |
Feb 15, 2024 | Two papers not committed to NAACL 2024 |
Feb 10, 2024 | Banned on r/ACT for trying to advertise our KAR³L user study 😭 |
Oct 6, 2023 | One paper rejected from EMNLP 2023 |
Mar 20, 2023 | My first ever review score of 1 received on an ARR submission