Nishant Balepur
Ph.D. Student in Computer Science at University of Maryland, College Park
Email:
nbalepur[at]umd[dot]edu
Hi! My name is Nishant and I’m a third-year Ph.D. candidate at the University of Maryland, where I am fortunate to be advised by Professors Jordan Boyd-Graber and Rachel Rudinger. I’m currently interning with Ai2 to personalize ScholarQA and visiting NYU as a researcher with Eunsol Choi.
I want to build LLMs that actually help people, not just say what people want to hear. As a result, I often think about better ways to design evaluations offline and to capture downstream human feedback online. I’m currently excited about three research questions, in descending order of excitement:
- How can we build systems that help users? [flashcards (EMNLP’24), memorable study aids (EMNLP’24), personalized dpo (ACL’25), multi-step plans (EMNLP’25)]
- How can we rigorously evaluate model flaws? [eliminative reasoning (ACL’24), mcqa artifacts (ACL’24), benchmark cheating (ACL’24), mcqa plausibility (EMNLP’24), abductive reasoning (NAACL’25), mcqa edu theory (ACL’25), agents, reasoning LLM shortcuts]
- How can we synthesize factual sources? [topic mining (ACL’23), expository text (EMNLP’23), fact transfer (EMNLP’23), debatable queries (NAACL’25)]
I’m generally interested in research that is helpful for humans and fun to read. If you’re interested in similar problems, don’t hesitate to reach out!
And if you’ve seen another “Balepur, N” during your literature search, you may be looking for my sister 😛
📝 Selected Publications
2025
- ACL 2025: “Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above.” In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2025. Oral at ACL 2025; Best Paper Award and Oral (1.5%) at MASC-SLL 2025.
🥳 Research Highlights
| Date | News |
|---|---|
| Aug 26, 2025 | We release AstaBench at Ai2, a more rigorous evaluation suite for AI agents. Check out our technical report and leaderboard! |
| Aug 20, 2025 | One paper accepted to EMNLP! We build an interface that helps users solve complex questions with plans, and show that predicting which plans help humans is difficult for humans, reward models, and agents! |
| May 15, 2025 | Two papers accepted to ACL 2025! We design a simple technique to improve DPO’s personalization and make our case for why MCQA is a terrible evaluation format (oral!) |
| May 6, 2025 | I passed my thesis proposal so I’m Ph-Done! (with being a regular student as I am now a candidate 🤓☝️). Fun fact: my sister and I proposed our theses on the same day 😁 |
| Apr 5, 2025 | Excited to give an oral presentation on why MCQA sucks at MASC-SLL 2025. Also humbled to win a best paper award! |
| Mar 24, 2025 | Humbled to be invited to give talks at Imperial College London on building Helpful QA systems (slides) and at Google Translate’s Reading Group on improving MCQA evals (slides) |
😔 Negative Results
| Date | News |
|---|---|
| Aug 8, 2025 | One paper got bad reviews at EMNLP 2025, then was desk-rejected from AAAI 2025 (never adding an Appendix again smh) |
| Jul 7, 2025 | One paper rejected from COLM 2025 💪 |
| Jun 11, 2025 | Our Schmidt Science Expression of Interest for AI Safety in the Inference-Time Compute Paradigm was rejected |
| Feb 13, 2025 | One paper got bad reviews in December ARR |
| Dec 19, 2024 | Didn’t get intern/fellow offers after interviewing at Meta, Cohere, and Anthropic |
| Jun 15, 2024 | KAR³L is on its fourth resubmission 🫡 |
| Apr 15, 2024 | One paper not committed to ACL 2024 |
| Feb 15, 2024 | Two papers not committed to NAACL 2024 |
| Feb 10, 2024 | Banned on r/ACT for trying to advertise our KAR³L user study 😭 |
| Oct 6, 2023 | One paper rejected from EMNLP 2023 |
| Mar 20, 2023 | My first ever review score of 1, received on an ARR submission |