Discussion about this post

Billy Newport

I wonder about the DeepSeek stuff. They keep using the same approach, whether for KV compression or now manifold compression; it's like using the DCT for JPEG compression. Yes, they use less memory, and yes, they take less compute to train and run inference, but the resolution is limited for complex problems. JPEG vs. RAW: the hard stuff needs RAW…
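A toy sketch of the JPEG-vs-RAW analogy, assuming nothing about how DeepSeek's KV or manifold compression actually works: it only shows why keeping the lowest-frequency DCT coefficients is cheap for smooth inputs and lossy for detailed ones. The signals, the `keep=8` cutoff, and the helper names are illustrative choices. Real JPEG quantizes 2-D DCT coefficients of 8x8 blocks rather than simply dropping them, so the 1-D truncation here is a simplification.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a 1-D sequence (the transform family JPEG uses)."""
    n = len(x)
    coeffs = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * s)
    return coeffs

def idct(c):
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(c)
    return [
        c[0] * math.sqrt(1 / n)
        + sum(c[k] * math.sqrt(2 / n) * math.cos(math.pi * k * (i + 0.5) / n)
              for k in range(1, n))
        for i in range(n)
    ]

def compress(x, keep):
    """Lossy step: keep the `keep` lowest-frequency coefficients, zero the rest."""
    c = dct(x)
    return idct([v if k < keep else 0.0 for k, v in enumerate(c)])

def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

n = 64
# "Easy" input: one slow cosine (exactly DCT basis vector k=3).
smooth = [math.cos(math.pi * 3 * (i + 0.5) / n) for i in range(n)]
# "Hard" input: the same slow cosine plus fine detail at basis vector k=40.
detailed = [s + 0.5 * math.cos(math.pi * 40 * (i + 0.5) / n)
            for i, s in enumerate(smooth)]

err_smooth = rmse(smooth, compress(smooth, keep=8))
err_detailed = rmse(detailed, compress(detailed, keep=8))
print(f"smooth:   {err_smooth:.3f}")    # 0.000: nothing lost
print(f"detailed: {err_detailed:.3f}")  # 0.354: the fine detail is gone entirely
```

Both inputs cost the same 8 coefficients out of 64, but only the smooth one survives intact; the detail sits above the cutoff and is discarded, which is the "resolution" limit the comment describes.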

