We are about to split into two civilizations: those who own their intelligence, and those who rent it.

A 70B-parameter model running on a 128GB Apple laptop is likely sufficient for continuously learning, human-level intelligence. A trillion-parameter superintelligence will never run on your local machine. Both of these things are true simultaneously, and the gap between them is not a temporary engineering problem waiting to be solved. It is a permanent feature of physics, and it will reshape society more profoundly than the internet did.

Here is why the 70B ceiling is higher than people think. The human brain has roughly 86 billion neurons. Learning does not, for the most part, grow new ones; it reweights existing connections. A static 70B model is a snapshot frozen at training time. A continuously learning 70B model is a living system doing exactly what your brain does: reshaping itself from experience, every day. The parameter count becomes a vessel that is constantly being reformed. Size stops being the variable. Temporal depth of adaptation becomes the variable.

A 128GB M-series MacBook has unified memory shared across the CPU, GPU, and Neural Engine, with bandwidth on the order of 500 GB/s on current Max-class chips. A 70B model in 4-bit quantization fits in about 38GB, leaving substantial room for context, memory buffers, and lightweight gradient updates. For the first time in history, the continuous learning loop can close locally, in real time, on a device you own.
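The memory arithmetic above can be sketched in a few lines. The effective bits-per-weight (4-bit quantization plus per-group scales) and the total RAM figure are illustrative assumptions, not measurements of any particular model or machine:

```python
# Back-of-envelope memory budget for a local 70B model on a 128GB machine.
# Parameter count, bits-per-weight, and overheads are illustrative assumptions.

GB = 1e9  # decimal gigabytes, as hardware marketing figures use

def weight_footprint_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed to hold the quantized weights alone, in GB."""
    return n_params * bits_per_param / 8 / GB

params = 70e9
# ~4-bit weights plus quantization scales, assumed ~4.3 bits effective
weights_gb = weight_footprint_gb(params, 4.3)

total_ram_gb = 128
headroom_gb = total_ram_gb - weights_gb  # context, KV cache, adapter updates

print(f"weights ~ {weights_gb:.1f} GB, headroom ~ {headroom_gb:.1f} GB")
```

At these assumed figures the weights come to roughly 38 GB, leaving around 90 GB of unified memory for everything else, which is the headroom the argument depends on.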

Now for the hard ceiling at the top. A 1 trillion parameter model at aggressive 2-bit quantization requires roughly 250GB just to hold the weights, before activations, before the KV cache, before any actual compute happens. No consumer device on any foreseeable roadmap touches this. But memory size is not even the binding constraint. LLM inference is almost entirely limited by how fast you can stream weights from memory to compute units: generating each token requires moving essentially all of the weights once. Even at the best consumer memory bandwidth, that puts a floor of half a second to several seconds under every token.

Then there is heat. A laptop sustains 20 to 40 watts. Dense superintelligence inference requires hundreds of kilowatts and active liquid cooling. This is not an engineering gap closing over time. The requirements of the largest models are diverging from consumer hardware, not converging toward it.
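The bandwidth ceiling can be made concrete with the same kind of sketch. The bound below treats decoding as purely memory-bound (each token streams all weights once); the model sizes and the 500 GB/s figure are assumptions for illustration:

```python
# Bandwidth-bound lower bound on decode latency: generating one token requires
# streaming essentially every weight from memory once, so
#   time_per_token >= weight_bytes / memory_bandwidth.
# Model sizes and the bandwidth figure below are illustrative assumptions.

GB = 1e9

def seconds_per_token(n_params: float, bits_per_param: float,
                      bandwidth_gb_s: float) -> float:
    """Lower bound on per-token latency for memory-bound decoding."""
    weight_bytes = n_params * bits_per_param / 8
    return weight_bytes / (bandwidth_gb_s * GB)

# 70B at 4-bit on a hypothetical ~500 GB/s laptop: roughly 0.07 s per token
print(seconds_per_token(70e9, 4, 500))

# 1T at 2-bit on the same machine: ~0.5 s per token as an absolute best case,
# before activations, KV-cache traffic, or any compute at all
print(seconds_per_token(1e12, 2, 500))
```

The asymmetry is the point: the 70B case lands comfortably inside interactive speed, while the trillion-parameter case starts at half a second per token under the most generous assumptions and only gets worse from there.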

What emerges is a permanent three-tier structure: Tier 1, small models embedded in every phone and appliance; Tier 2, human-level models running continuously on hardware you own; Tier 3, superintelligence confined to industrial datacenters.

The separation is not just technical. It is political. Tier 2 democratizes human-level reasoning. Anyone with capable hardware gets a private, persistent, unkillable cognitive partner that knows their history and can never be revoked. Tier 3 concentrates superhuman reasoning in whoever controls the infrastructure. The most consequential design decisions of the next decade will not be about model architecture or benchmark scores. They will be about which capabilities live in which tier, and who gets to decide. That question is already being answered, mostly without public debate, mostly by the people who benefit most from keeping superintelligence behind a paywall and a terms-of-service agreement.