-
Codifying the Judge: Scalable Evaluation via Program Distillation
Under Submission
Tzu-Heng Huang*, Shengqi Qiu*, Frederic Sala
[PDF]
[CODE]
[PROJECT PAGE]
-
Test-Time Scaling Makes Overtraining Compute-Optimal
Under Submission
Nicholas Roberts, Sungjun Cho, Zhiqi Gao, Tzu-Heng Huang, Albert Wu, Gabriel Orlanski, Avi Trost, Kelly Buchanan, Aws Albarghouthi, Frederic Sala
[PDF]
[X POST]
-
RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning
Under Submission
Tzu-Heng Huang, Sirajul Salekin, Javier Movellan, Frederic Sala, Manjot Bilkhu
[PDF]
[X POST]
-
WARP: Weight-Space Analysis for Recovering Training Data Portfolios
ICML'26 Weight-Space Symmetries: from Foundations to Practical Applications (WSS) Workshop
Tzu-Heng Huang*, Aditya Goyal*, John Cooper, Frederic Sala
[PDF]
[CODE]
-
Evaluating Sample Utility For Efficient Data Selection by Mimicking Model Weights
ICML'26
ICML'25 Unifying Data Curation Frameworks Across Domains (DataWorld) Workshop (Oral)
Tzu-Heng Huang, Manjot Bilkhu, John Cooper, Frederic Sala, Javier Movellan
[PDF]
[CODE]
[X POST]
-
CARE: Confounder-Aware Aggregation for Reliable LLM Evaluation
ICML'26
Jitian Zhao, Changho Shin, Tzu-Heng Huang, Satya Sai Srinath Namburi GNVV, Frederic Sala
[PDF]
[CODE]
[X POST]
-
Time to Impeach LLM-as-a-Judge: Programs are the Future of Evaluation
ICML'25 Programmatic Representations for Agent Learning (PRAL) Workshop
Tzu-Heng Huang, Harit Vishwakarma, Frederic Sala
[PDF]
[CODE]
[X POST]
-
Shrinking the Generation-Verification Gap by Scaling Compute for Verification
NeurIPS'25
ICML'25 Efficient Systems for Foundation Models (ES-FoMo III) Workshop
ICML'25 Multi-Agent Systems in the Era of Foundation Models: Opportunities, Challenges and Futures (MAS) Workshop
Jon Saad-Falcon, E. Kelly Buchanan, Mayee F. Chen, Tzu-Heng Huang, Brendan McLaughlin, Tanvir Bhathal, Shang Zhu, Ben Athiwaratkun, Frederic Sala, Scott Linderman, Azalia Mirhoseini, Christopher Ré
[PDF]
[CODE]
[BLOG]
[X POST]
-
From Many Voices to One: A Statistically Principled Aggregation of LLM Judges
NeurIPS'25 Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling (LLM-Eval) Workshop
NeurIPS'25 Reliable ML from Unreliable Data (Reliable ML) Workshop
Jitian Zhao, Changho Shin, Tzu-Heng Huang, Satya Sai Srinath Namburi GNVV, Frederic Sala
[PDF]
[CODE]
[X POST]
-
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training
ICML'25 Unifying Data Curation Frameworks Across Domains (DataWorld) Workshop
ICML'25 Data in Generative Models (The Bad, the Ugly, and the Greats) (DIG-BUGS) Workshop
Albert Ge, Tzu-Heng Huang, John Cooper, Avi Trost, Ziyi Chu, Satya Sai Srinath Namburi GNVV, Ziyang Cai, Kendall Park, Nicholas Roberts, Frederic Sala
[PDF]
[X POST]
-
The ALCHEmist: Automated Labeling 500x CHEaper Than LLM Data Annotators
NeurIPS'24 (Spotlight)
Tzu-Heng Huang, Catherine Cao, Vaishnavi Bhargava, Frederic Sala
[PDF]
[CODE]
[BLOG]
[X POST]
-
MoRe Fine-Tuning with 10x Fewer Parameters
ICML'24 Efficient Systems for Foundation Models (ES-FoMo) Workshop
ICML'24 Foundation Models in the Wild (FM-Wild) Workshop
Wenxuan Tan, Nicholas Roberts, Tzu-Heng Huang, Jitian Zhao, John Cooper, Samuel Guo, Chengyu Duan, Frederic Sala
[PDF]
[CODE]
-
Train 'n Trade: Foundations of Parameter Markets
NeurIPS'23
Tzu-Heng Huang, Harit Vishwakarma, Frederic Sala
[PDF]
[X POST]
-
Geometry-Aware Adaptation for Pretrained Models
NeurIPS'23
Nicholas Roberts, Xintong Li, Dyah Adila, Sonia Cromp, Tzu-Heng Huang, Jitian Zhao, Frederic Sala
[PDF]
[CODE]
[X POST]
-
Multimodal Data Curation via Object Detection and Filter Ensembles
ICCV'23 Towards the Next Generation of Computer Vision Datasets (TNGCV) Workshop
1st place on the DataComp leaderboard (small-scale filtering track)
Tzu-Heng Huang*, Changho Shin*, Sui Jiet Tay, Dyah Adila, Frederic Sala
[PDF]
[X POST]
-
ScriptoriumWS: A Code Generation Assistant for Weak Supervision
ICLR'23 Deep Learning for Code (DL4C) Workshop
2023 Midwest Machine Learning Symposium
Tzu-Heng Huang, Catherine Cao, Spencer Schoenberg, Harit Vishwakarma, Nicholas Roberts, Frederic Sala
[PDF]
[CODE]
-
AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels
NeurIPS'22
Nicholas Roberts, Xintong Li, Tzu-Heng Huang, Dyah Adila, Spencer Schoenberg, Cheng-Yu Liu, Lauren Pick, Haotian Ma, Aws Albarghouthi, Frederic Sala
[PDF]
[CODE]
[BLOG]
[X POST]
-
Key Sensor Discovery for Quality Audit of Air Sensor Networks
MobiSys'20
Tzu-Heng Huang, Cheng-Hsien Tsai, Man-Kwan Shan
[PDF]