Starred immediately.

This is exactly the kind of practical quantization work that makes running longer-context models on consumer GPUs actually feasible. Looking forward to seeing it generalized beyond the one model.

Great stuff, g023.
by santander_cl
|
Apr 4, 2026, 12:18:53 AM