CoderUJB: An Executable and Unified Java Benchmark for Practical Programming Scenarios

Zeng, Zhengran; Wang, Yidong; Xie, Rui; Ye, Wei; Zhang, Shikun

Abstract: In the evolving landscape of large language models (LLMs) tailored for software engineering, the need for benchmarks that accurately reflect real-world development scenarios is paramount. Current benchmarks are either too simplistic or fail to capture the multi-tasking nature of software development. To address this, we introduce CoderUJB, a new benchmark designed to evaluate LLMs across diverse Java programming tasks that are executable and reflective of actual development scenarios, acknowledging Java's prevalence in real-world software production. CoderUJB comprises 2,239 programming questions derived from 17 real open-source Java projects and spans five practical programming tasks. Our empirical study on this benchmark investigates the coding abilities of various open-source and closed-source LLMs, examining how continued pre-training on code in specific programming languages and instruction fine-tuning affect their performance. The findings indicate that while LLMs exhibit strong potential, challenges remain, particularly in non-functional code generation (e.g., test generation and defect detection). Importantly, our results advise caution when applying language-specific continued pre-training and instruction fine-tuning, as these techniques could hinder model performance on certain tasks, suggesting the need for more nuanced strategies. CoderUJB thus marks a significant step towards more realistic evaluations of programming capabilities in LLMs, and our study provides valuable insights for the future development of these models in software engineering.

Comments:	11 pages, 4 figures, accepted at ISSTA 2024
Subjects:	Software Engineering (cs.SE)
MSC classes:	68N30 (Primary) 68T20 (Secondary)
ACM classes:	D.2.0
Cite as:	arXiv:2403.19287 [cs.SE]
	(or arXiv:2403.19287v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2403.19287

