Sitting on my toilet and just took a big #Chamath. Dude, next article, can you tell us the art of how one moves from tech dork to trying to be cool? Once a tool, always a tool. What SPAQ are you going to pawn off on us or what pedophile grifter are you going to kiss up too to serve the interests of your big ass Indian nose. Who is uglier? You or eyeballs Patel?
Now the question is, as someone who only owns index funds but has been using AI for years , and came to the conclusion that Claude is far better than anything else, should I buy shares?
The Pluralis number is the one I'd hold most loosely: 20x versus 5x, so 5.5 years to catch up. Extrapolating two exponentials to their crossing point is where most forecasts like this go to die, because the faster rate is usually fast precisely because it's early.
A 20x annual rate is mostly a property of being small, not of decentralized training itself. Early systems post huge growth off a low base, harvesting gains that were cheap because no one had taken them yet, and the rate decays as the level rises. here the decay has a specific cause: the binding constraint for distributed training is communication, not compute, synchronising gradients across hundreds of cities on ordinary internet, and that cost grows with model size and node count rather than shrinking.
So the 20x is borrowed from the regime where models are small enough that communication hasn't bitten yet. Push toward frontier scale and the constraint that's mild today becomes the wall. The crossing point recedes as you approach it, because the trailing line slows exactly where it would need to hold its pace. The buried assumption is that the growth rate is independent of the level, and for this technology the rate is a function of the level.
the science is detailed out in this paper presented at NeurIPS 23 https://arxiv.org/abs/2506.01260 "Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism"
The current Pluralis-8B run has an overhead of 1.6x over the centralized settings https://agora.pluralis.ai/
Sitting on my toilet and just took a big #Chamath. Dude, next article, can you tell us the art of how one moves from tech dork to trying to be cool? Once a tool, always a tool. What SPAQ are you going to pawn off on us or what pedophile grifter are you going to kiss up too to serve the interests of your big ass Indian nose. Who is uglier? You or eyeballs Patel?
Now the question is, as someone who only owns index funds but has been using AI for years , and came to the conclusion that Claude is far better than anything else, should I buy shares?
the cost to find edge cases drops when models stop cutting corners
Claude is a better product across the board, I'm not surprised the market values it higher than ChatGPT.
The Pluralis number is the one I'd hold most loosely: 20x versus 5x, so 5.5 years to catch up. Extrapolating two exponentials to their crossing point is where most forecasts like this go to die, because the faster rate is usually fast precisely because it's early.
A 20x annual rate is mostly a property of being small, not of decentralized training itself. Early systems post huge growth off a low base, harvesting gains that were cheap because no one had taken them yet, and the rate decays as the level rises. here the decay has a specific cause: the binding constraint for distributed training is communication, not compute, synchronising gradients across hundreds of cities on ordinary internet, and that cost grows with model size and node count rather than shrinking.
So the 20x is borrowed from the regime where models are small enough that communication hasn't bitten yet. Push toward frontier scale and the constraint that's mild today becomes the wall. The crossing point recedes as you approach it, because the trailing line slows exactly where it would need to hold its pace. The buried assumption is that the growth rate is independent of the level, and for this technology the rate is a function of the level.
That's how Pluralis is addressing the communication bottleneck https://pluralis.ai/docs/agora-system/overview/
the science is detailed out in this paper presented at NeurIPS 23 https://arxiv.org/abs/2506.01260 "Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism"
The current Pluralis-8B run has an overhead of 1.6x over the centralized settings https://agora.pluralis.ai/