-
Codifying the Judge: Scalable Evaluation via Program Distillation
Under Submission
Tzu-Heng Huang*, Shengqi Qiu*, Frederic Sala
-
Test-Time Scaling Makes Overtraining Compute-Optimal
Under Submission
Nicholas Roberts, Sungjun Cho, Zhiqi Gao, Tzu-Heng Huang, Albert Wu, Gabriel Orlanski, Avi Trost, Kelly Buchanan, Aws Albarghouthi, Frederic Sala
[PDF]
[X POST]
-
RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning
Under Submission
Tzu-Heng Huang, Sirajul Salekin, Javier Movellan, Frederic Sala, Manjot Bilkhu
[PDF]
[X POST]
-
WARP: Weight-Space Analysis for Recovering Training Data Portfolios
ICML'26 Weight-Space Symmetries: from Foundations to Practical Applications (WSS) Workshop
Tzu-Heng Huang*, Aditya Goyal*, John Cooper, Frederic Sala
[CODE]
-
Evaluating Sample Utility For Efficient Data Selection by Mimicking Model Weights
ICML'26
ICML'25 Unifying Data Curation Frameworks Across Domains (DataWorld) Workshop (Oral)
Tzu-Heng Huang, Manjot Bilkhu, John Cooper, Frederic Sala, Javier Movellan
[PDF]
[CODE]
[X POST]
-
CARE: Confounder-Aware Aggregation for Reliable LLM Evaluation
ICML'26
Jitian Zhao, Changho Shin, Tzu-Heng Huang, Satya Sai Srinath Namburi GNVV, Frederic Sala
[PDF]
[CODE]
[X POST]
-
Time to Impeach LLM-as-a-Judge: Programs are the Future of Evaluation
ICML'25 Programmatic Representations for Agent Learning (PRAL) Workshop
Tzu-Heng Huang, Harit Vishwakarma, Frederic Sala
[PDF]
[CODE]
[X POST]
-
Shrinking the Generation-Verification Gap by Scaling Compute for Verification
NeurIPS'25 &
ICML'25 Efficient Systems for Foundation Models (ES-FoMo III) Workshop &
ICML'25 Multi-Agent Systems in the Era of Foundation Models: Opportunities, Challenges and Futures (MAS) Workshop
Jon Saad-Falcon, E. Kelly Buchanan, Mayee F Chen, Tzu-Heng Huang, Brendan McLaughlin, Tanvir Bhathal, Shang Zhu, Ben Athiwaratkun, Frederic Sala, Scott Linderman, Azalia Mirhoseini, Christopher Re
[PDF]
[CODE]
[BLOG]
[X POST]
-
From Many Voices to One: A Statistically Principled Aggregation of LLM Judges
NeurIPS'25 Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling Workshop &
NeurIPS'25 Reliable ML from Unreliable Data Workshop
Jitian Zhao, Changho Shin, Tzu-Heng Huang, Satya Sai Srinath Namburi GNVV, Frederic Sala
[PDF]
[CODE]
[X POST]
-
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training
ICML'25 Unifying Data Curation Frameworks Across Domains (DataWorld) Workshop &
ICML'25 Data in Generative Models (The Bad, the Ugly, and the Greats) (DIG-BUGS) Workshop
Albert Ge, Tzu-Heng Huang, John Cooper, Avi Trost, Ziyi Chu, Satya Sai Srinath Namburi GNVV, Ziyang Cai, Kendall Park, Nicholas Roberts, Frederic Sala
[PDF]
[X POST]
-
The ALCHEmist: Automated Labeling 500x CHEaper Than LLM Data Annotators
NeurIPS'24 (Spotlight)
Tzu-Heng Huang, Catherine Cao, Vaishnavi Bhargava, Frederic Sala
[PDF]
[CODE]
[BLOG]
[X POST]
-
MoRe Fine-Tuning with 10x Fewer Parameters
ICML'24 Efficient Systems for Foundation Models (ES-FoMo) Workshop &
ICML'24 Foundation Models in the Wild Workshop
Wenxuan Tan, Nicholas Roberts, Tzu-Heng Huang, Jitian Zhao, John Cooper, Samuel Guo, Chengyu Duan, Frederic Sala
[PDF]
[CODE]
-
Train 'n Trade: Foundations of Parameter Markets
NeurIPS'23
Tzu-Heng Huang, Harit Vishwakarma, Frederic Sala
[PDF]
[X POST]
-
Geometry-Aware Adaptation for Pretrained Models
NeurIPS'23
Nicholas Roberts, Xintong Li, Dyah Adila, Sonia Cromp, Tzu-Heng Huang, Jitian Zhao, Frederic Sala
[PDF]
[CODE]
[X POST]
-
Multimodal Data Curation via Object Detection and Filter Ensembles
ICCV'23 Towards the Next Generation of Computer Vision Datasets (TNGCV) Workshop
1st place on the DataComp leaderboard (small-scale filtering track)
Tzu-Heng Huang*, Changho Shin*, Sui Jiet Tay, Dyah Adila, Frederic Sala
[PDF]
[X POST]
-
ScriptoriumWS: A Code Generation Assistant for Weak Supervision
ICLR'23 Deep Learning for Code (DL4C) Workshop &
2023 Midwest Machine Learning Symposium
Tzu-Heng Huang, Catherine Cao, Spencer Schoenberg, Harit Vishwakarma, Nicholas Roberts, Frederic Sala
[PDF]
[CODE]
-
AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels
NeurIPS'22
Nicholas Roberts, Xintong Li, Tzu-Heng Huang, Dyah Adila, Spencer Schoenberg, Cheng-Yu Liu, Lauren Pick, Haotian Ma, Aws Albarghouthi, Frederic Sala
[PDF]
[CODE]
[BLOG]
[X POST]
-
Key Sensor Discovery for Quality Audit of Air Sensor Networks
MobiSys'20
Tzu-Heng Huang, Cheng-Hsien Tsai, Man-Kwan Shan
[PDF]