Nishant Balepur
Ph.D. Student in Computer Science at University of Maryland, College Park

Email:
nbalepur[at]umd[dot]edu
Hi! My name is Nishant and I’m a third-year Ph.D. candidate at the University of Maryland, where I am fortunate to be advised by Professors Jordan Boyd-Graber and Rachel Rudinger. I’m currently interning with Ai2 to personalize ScholarQA and visiting NYU as a researcher with Eunsol Choi.
I mostly work on alignment (rewards beyond preferences) and evaluation (metrics beyond correctness) to make LLM responses more helpful. I’m currently excited about three research questions, in descending order of excitement:
- How can we build systems that help users? [flashcards (EMNLP’24), memorable study aids (EMNLP’24), personalized dpo (ACL’25), multi-step plans (EMNLP’25)]
- How can we rigorously build evaluations to expose model flaws? [eliminative reasoning (ACL’24), mcqa artifacts (ACL’24), benchmark cheating (ACL’24), mcqa plausibility (EMNLP’24), abductive reasoning (NAACL’25), mcqa generally sucks (ACL’25)]
- How can we synthesize factual sources? [topic mining (ACL’23), expository text (EMNLP’23), fact transfer (EMNLP’23), debatable queries (NAACL’25)]
I’m generally interested in research that is useful for humans and fun to read. If you’re interested in similar problems, don’t hesitate to reach out!
And if you’ve seen another “Balepur, N” during your literature search, you may be looking for my sister 😛
📝 Selected Publications
🥳 Research Highlights
- Aug 26, 2025: We release AstaBench at Ai2, a more rigorous evaluation suite for AI agents. Check out our technical report and leaderboard!
- Aug 20, 2025: One paper accepted to EMNLP! We build an interface that helps users solve complex questions with plans, and show that predicting which plans help humans is difficult for humans, reward models, and agents!
- May 15, 2025: Two papers accepted to ACL 2025! We design a simple technique to improve DPO’s personalization and make our case for why MCQA is a terrible evaluation format (oral!)
- May 6, 2025: I passed my thesis proposal, so I’m Ph-Done! (with being a regular student, as I am now a candidate 🤓☝️). Fun fact: my sister and I proposed our theses on the same day 😁
- Apr 5, 2025: Excited to give an oral presentation on why MCQA sucks at MASC-SLL 2025. Also humbled to win a best paper award!
- Mar 24, 2025: Humbled to be invited for talks at Imperial College London on building Helpful QA systems (slides) and at Google Translate’s Reading Group on improving MCQA evals (slides)
😔 Negative Results
- Aug 8, 2025: One paper got bad reviews at EMNLP 2025, then was desk-rejected from AAAI 2025 (never adding an Appendix again smh)
- Jul 7, 2025: One paper rejected from COLM 2025 💪
- Jun 11, 2025: Our Schmidt Science Expression of Interest for AI Safety in the Inference-Time Compute Paradigm was rejected
- Feb 13, 2025: One paper got bad reviews in December ARR
- Dec 19, 2024: Didn’t get intern/fellow offers after interviewing at Meta, Cohere, and Anthropic
- Jun 15, 2024: KAR³L is on its fourth resubmission 🫡
- Apr 15, 2024: One paper not committed to ACL 2024
- Feb 15, 2024: Two papers not committed to NAACL 2024
- Feb 10, 2024: Banned from r/ACT for trying to advertise our KAR³L user study 😭
- Oct 6, 2023: One paper rejected from EMNLP 2023
- Mar 20, 2023: Received my first ever review score of 1 on an ARR submission