Dheeraj Baiju

We introduce DynEval, a dynamic framework for evaluating text-to-image (T2I) generation that jointly assesses prompt alignment and image quality. To enable scalable training, we construct GenDB and DynEvalInstruct, two large-scale datasets containing generated prompt–image pairs and structured evaluation instructions. By distilling a strong multimodal teacher model into compact 2B- and 4B-parameter evaluators, DynEval achieves higher correlation with human judgments than existing T2I evaluators. It also provides fine-grained diagnostic feedback on generation failures.