WebGen-Bench
ImagesTextsMITIntroduced 2025-05-06
WebGen-Bench
WebGen-Bench is created to benchmark LLM-based agent's ability to generate websites from scratch. The dataset is introduced in WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch. It contains 101 instructions and 647 test cases. It also has a training set of 6667 instructions, named WebGen-Instruct.
The code for evaluation as well as the training code and data are released at WebGen-Bench (Github)
Citation
If you find our project useful, please cite:
@misc{lu2025webgenbenchevaluatingllmsgenerating,
title={WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch},
author={Zimu Lu and Yunqiao Yang and Houxing Ren and Haotian Hou and Han Xiao and Ke Wang and Weikang Shi and Aojun Zhou and Mingjie Zhan and Hongsheng Li},
year={2025},
eprint={2505.03733},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.03733},
}