Dobariya, Om, and Akhil Kumar. “Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy (Short Paper).” arXiv, 2025, https://doi.org/10.48550/arXiv.2510.04950.
Spurred discussion in the prompt-engineering community: social media posts about the implications of the findings drew over a million views within a week of publication.
Designed and conducted the study testing how prompt politeness levels affect LLM accuracy.
Created a dataset of 50 base multiple-choice questions rewritten into 250 variants across five politeness tones.
Implemented the experimental pipeline with ChatGPT-4o and applied paired t-tests for statistical validation (a minimal sketch of this comparison appears below).
Found that impolite prompts consistently outperformed polite ones, highlighting the need to examine pragmatic features of prompts and their effects on LLM responses.
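Illustrative only, not drawn from the paper's code: a minimal Python sketch of the kind of paired comparison described above, assuming hypothetical per-question correctness arrays for two politeness tones; the variable names and data are invented for illustration.

    # Illustrative sketch only (not the authors' code): paired t-test over
    # hypothetical per-question correctness under two politeness tones.
    from scipy.stats import ttest_rel

    # 1 = model answered the base question correctly, 0 = incorrectly,
    # recorded for the same questions under a polite and an impolite rewrite.
    polite_scores = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
    impolite_scores = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]

    # Paired test: each pair is one base question answered under both tones.
    t_stat, p_value = ttest_rel(impolite_scores, polite_scores)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")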
Kumar, Akhil, et al. “Evaluation of LLMs for Process Model Analysis and Optimization.” arXiv, 2025, https://doi.org/10.48550/arXiv.2510.07489.
Designed and implemented the comparative evaluation framework for benchmarking four large language models on BPMN process-analysis tasks.
Developed a five-criterion scoring matrix (syntax error detection, logical error detection, semantic reasoning depth, optimization quality, and BPMN diagramming) with a consistent 0–3 evaluation scale.
Conducted the experiments and quantitative analysis, compiled the results (Table 4 of the paper), and identified cross-model differences in reasoning accuracy and diagram generation.