Elias Frantar, Dan Alistarh
We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
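The 2:4 and 4:8 semi-structured patterns mentioned above constrain each small group of consecutive weights (4 or 8) to contain a fixed number of zeros (2 or 4), which hardware such as NVIDIA's sparse tensor cores can exploit. The sketch below illustrates what a 2:4 mask looks like using plain magnitude pruning; it is NOT the SparseGPT algorithm itself, which instead solves a Hessian-based weight reconstruction problem. The function name is illustrative only.

```python
import numpy as np

def prune_2_4_magnitude(w: np.ndarray) -> np.ndarray:
    """Zero the 2 smallest-magnitude weights in every group of 4.

    This is simple magnitude pruning in the 2:4 pattern, shown only
    to illustrate the sparsity structure -- SparseGPT additionally
    updates the remaining weights to compensate for pruning error.
    """
    assert w.size % 4 == 0, "weight count must be divisible by 4"
    groups = w.reshape(-1, 4)
    # Per group of 4, find the indices of the 2 largest-magnitude entries.
    keep = np.argsort(np.abs(groups), axis=1)[:, 2:]
    mask = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    # Keep exactly 2 of every 4 weights; the rest become zero.
    return (groups * mask).reshape(w.shape)

w = np.arange(1.0, 9.0).reshape(2, 4)   # [[1,2,3,4],[5,6,7,8]]
print(prune_2_4_magnitude(w))           # [[0,0,3,4],[0,0,7,8]]
```

Every matrix pruned this way is exactly 50% sparse, which is why the 2:4 and 4:8 rows in the table below are directly comparable to the 50% unstructured rows.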
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | PIQA | Accuracy | 81.07 | OPT-175B |
| Question Answering | PIQA | Accuracy | 80.63 | SparseGPT (175B, 50% Sparsity) |
| Question Answering | PIQA | Accuracy | 79.54 | SparseGPT (175B, 4:8 Sparsity) |
| Question Answering | PIQA | Accuracy | 79.54 | SparseGPT (175B, 2:4 Sparsity) |
| Question Answering | PIQA | Accuracy | 54.73 | OPT-175B (50% Sparsity) |
| Question Answering | StoryCloze | Accuracy | 79.82 | OPT-175B |
| Question Answering | StoryCloze | Accuracy | 78.87 | SparseGPT (175B, 50% Sparsity) |
| Question Answering | StoryCloze | Accuracy | 77.02 | SparseGPT (175B, 4:8 Sparsity) |
| Question Answering | StoryCloze | Accuracy | 76.19 | SparseGPT (175B, 2:4 Sparsity) |
| Question Answering | StoryCloze | Accuracy | 47.1 | OPT-175B (50% Sparsity) |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 43.94 | OPT-175B |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 41.3 | SparseGPT (175B, 50% Sparsity) |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 39.85 | SparseGPT (175B, 4:8 Sparsity) |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 38.99 | SparseGPT (175B, 2:4 Sparsity) |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 25.6 | OPT-175B (50% Sparsity) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 71.04 | OPT-175B |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 69.65 | SparseGPT (175B, 50% Sparsity) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 68.35 | SparseGPT (175B, 4:8 Sparsity) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 67.08 | SparseGPT (175B, 2:4 Sparsity) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 28.03 | OPT-175B (50% Sparsity) |
| Language Modelling | LAMBADA | Accuracy | 79.47 | SparseGPT (175B, 2:4 Sparsity) |
| Language Modelling | LAMBADA | Accuracy | 78.77 | SparseGPT (175B, 4:8 Sparsity) |
| Language Modelling | LAMBADA | Accuracy | 76.51 | SparseGPT (175B, 50% Sparsity) |
| Language Modelling | LAMBADA | Accuracy | 75.59 | OPT-175B |
| Language Modelling | LAMBADA | Accuracy | 0.02 | OPT-175B (50% Sparsity) |
| Language Modelling | WikiText-2 | Test perplexity | 8.21 | SparseGPT (175B, 50% Sparsity) |
| Language Modelling | WikiText-2 | Test perplexity | 8.34 | OPT-175B |
| Language Modelling | WikiText-2 | Test perplexity | 8.45 | SparseGPT (175B, 4:8 Sparsity) |
| Language Modelling | WikiText-2 | Test perplexity | 8.73 | SparseGPT (175B, 2:4 Sparsity) |
| Language Modelling | WikiText-2 | Test perplexity | 234.77 | OPT-175B (50% Sparsity) |