
Model Folding is a data-free model compression technique that merges structurally similar neurons across layers, reducing model size without fine-tuning or training data. It preserves data statistics using
Models trained with SGD tend to have correlated patterns or similar parameters in weight space. The top-right plot in the following figure shows
Inspired by the structural similarities in pre-trained models, we propose model folding, which clusters similar neurons instead of zeroing them out. We also propose two data-free repair algorithms to correct the BatchNorm statistics of a folded model.
Folding a model takes only three steps: cluster, merge, and repair. No data and no fine-tuning are required.
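The cluster and merge steps can be illustrated on a toy two-layer ReLU MLP. The following is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (`kmeans_labels`, `fold_linear_pair`, `mlp`) are hypothetical, plain k-means on each neuron's incoming weights and bias stands in for the clustering, merged neurons take the mean of their incoming weights and the sum of their outgoing weights, and the repair step is omitted because this toy network has no BatchNorm.

```python
import numpy as np


def kmeans_labels(x, k, iters=20):
    """Lloyd's k-means on the rows of x with deterministic farthest-point init."""
    idx = [0]
    d = ((x - x[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(d.argmax())  # row farthest from all centers chosen so far
        idx.append(nxt)
        d = np.minimum(d, ((x - x[nxt]) ** 2).sum(axis=1))
    centers = x[idx].copy()
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def fold_linear_pair(w1, b1, w2, k):
    """Fold the hidden layer of y = w2 @ relu(w1 @ x + b1) to at most k neurons."""
    # Cluster: describe each hidden neuron by its incoming weights and bias.
    feats = np.concatenate([w1, b1[:, None]], axis=1)
    labels = kmeans_labels(feats, k)
    _, labels = np.unique(labels, return_inverse=True)  # drop empty clusters
    n_clusters = labels.max() + 1
    # One-hot assignment matrix: rows are neurons, columns are clusters.
    assign = np.zeros((len(labels), n_clusters))
    assign[np.arange(len(labels)), labels] = 1.0
    counts = assign.sum(axis=0)
    # Merge: average incoming weights, sum outgoing weights within a cluster.
    w1_f = (assign.T @ w1) / counts[:, None]
    b1_f = (assign.T @ b1) / counts
    w2_f = w2 @ assign
    return w1_f, b1_f, w2_f


def mlp(x, w1, b1, w2):
    return w2 @ np.maximum(w1 @ x + b1, 0.0)


# Toy network whose 8 hidden neurons are 4 exact duplicate pairs.
rng = np.random.default_rng(0)
base_w, base_b = rng.normal(size=(4, 3)), rng.normal(size=4)
w1, b1 = np.repeat(base_w, 2, axis=0), np.repeat(base_b, 2)
w2 = rng.normal(size=(2, 8))
w1_f, b1_f, w2_f = fold_linear_pair(w1, b1, w2, k=4)
x = rng.normal(size=3)
print(np.abs(mlp(x, w1, b1, w2) - mlp(x, w1_f, b1_f, w2_f)).max())
```

When the merged neurons are exact duplicates, as in this toy case, folding changes the output only by floating-point rounding. Real networks contain neurons that are merely similar, so merging shifts the activation statistics, which is what the repair step is meant to correct.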

We compared several variants of model folding against other SOTA methods. The abbreviations used are:
Fold-Naïve: model folding without repair
Fold-R: model folding with repair
Fold-AR: model folding with approximate repair
Fold-DIR: model folding with deep inversion repair
SP L1/L2: L1/L2 structured pruning
IFM: Iterative Feature Merging
INN: Integral neural networks
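For contrast with folding, the structured magnitude pruning baseline (SP L1/L2) can be sketched on the same kind of two-layer network. This is an illustrative sketch only: the function name `prune_linear_pair` and its parameters are hypothetical, and the criterion assumed here (keep the hidden neurons whose incoming weight vectors have the largest L1 or L2 norm) is one common form of structured magnitude pruning, not necessarily the exact configuration used in the experiments.

```python
import numpy as np


def prune_linear_pair(w1, b1, w2, k, norm_order=1):
    """Keep the k hidden neurons with the largest incoming-weight norm.

    norm_order=1 gives L1 structured pruning, norm_order=2 gives L2.
    """
    norms = np.linalg.norm(w1, ord=norm_order, axis=1)  # one norm per neuron
    keep = np.sort(np.argsort(norms)[-k:])              # indices of survivors
    # Removing a neuron drops its row in w1/b1 and its column in w2.
    return w1[keep], b1[keep], w2[:, keep]


w1 = np.array([[1.0, 0.0], [0.0, 0.1], [3.0, -4.0], [0.5, 0.2]])
b1 = np.zeros(4)
w2 = np.arange(8.0).reshape(2, 4)
w1_p, b1_p, w2_p = prune_linear_pair(w1, b1, w2, k=2, norm_order=1)
```

Unlike folding, the discarded neurons contribute nothing to the compressed network, even when they are close copies of neurons that were kept.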

Comparison with IFM and structured magnitude pruning. Model folding, when tested on ResNet18 (top row) and VGG11-BN (bottom row) trained on CIFAR10 (left column) and ImageNet (right column), outperforms IFM at higher sparsity and as dataset difficulty increases.

Comparison of model folding with IFM and INN using ResNet18 on CIFAR10. In the original experiment defined in the IFM and INN papers, where only the last two blocks of ResNet18 are pruned, folding is significantly better than INN; it matches the performance of IFM at lower sparsities and becomes significantly better at higher sparsities.

Performance of structured pruning methods on LLaMA-7B without post-tuning, showing perplexity on WikiText2 and zero-shot performance across tasks. The "Average" is computed over four tasks. "Wanda_sp" represents an adapted Wanda method for structured pruning. Despite not using data or fine-tuning, model folding achieves comparable performance to data-driven methods.
@inproceedings{wang2025forget,
  title = {Forget the Data and Fine-tuning!\\Just Fold the Network to Compress},
  author = {Dong Wang and Haris \v{S}iki\'{c} and Lothar Thiele and Olga Saukh},
  booktitle = {Proceedings of the International Conference on Learning Representations (ICLR)},
  year = {2025},
  url = {https://openreview.net/forum?id=W2Wkp9MQsF}
}
Model Folding Team w/ ❤️