pruning for sparsity – it turns out some LLMs work just as well after 60% of their weights are set to zero (though this likely doesn’t hold for models trained Chinchilla-optimally)
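
To make "set 60% of the weights to zero" concrete, here is a minimal sketch of one common way to do it: magnitude pruning, which zeroes the entries with the smallest absolute value. The note does not say which pruning method was used, so the criterion is an assumption here, and the `magnitude_prune` helper and the random toy weights are hypothetical names for illustration only.

```python
import random


def magnitude_prune(weights, sparsity=0.6):
    """Return a copy of a flat list of weights in which the fraction
    `sparsity` of entries with the smallest magnitude is set to zero.

    Assumption: magnitude is the pruning criterion; the note itself
    does not name a method.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = round(sparsity * len(weights))  # number of entries to zero out
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Toy stand-in for one layer's weights (not real model weights).
random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(100)]
w_sparse = magnitude_prune(w, sparsity=0.6)
zero_fraction = sum(1 for x in w_sparse if x == 0.0) / len(w_sparse)
```

This only shows the zeroing step. Whether a model still works as well afterwards has to be checked by evaluating the pruned model, which this sketch does not attempt.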

www.joshbeckman.org/notes/452253632