Tesla should have a stake in xAI: here is why
Ultimately, both companies are working on the same problem
“In order to succeed in robotics, a company must have three capabilities: Dexterity, AI and scale.” Elon Musk
In a post on X, Elon Musk challenges the Grok team at xAI to develop Grok 5 to the point where it can play the video game League of Legends like a human.
That means the AI must play under human constraints:
It can only look at the monitor through a camera, seeing no more than a person with 20/20 vision would.
Its reaction latency and click rate can be no faster than a human's.
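To make these two constraints concrete, here is a minimal Python sketch of a gate that an agent's controller could consult before each click. The class name `HumanActionGate` and the limits used in the demo (a 0.25 s reaction floor and 5 actions per second) are illustrative assumptions, not figures from Musk's post or from xAI.

```python
import time
from typing import Callable, Optional


class HumanActionGate:
    """Hypothetical throttle that keeps an agent's actions within human-like limits.

    min_reaction_s: minimum delay between seeing an event and acting on it (assumed value).
    max_actions_per_s: cap on clicks/keypresses per second (assumed value).
    """

    def __init__(self, min_reaction_s: float, max_actions_per_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.min_reaction_s = min_reaction_s
        self.min_interval_s = 1.0 / max_actions_per_s
        self.clock = clock  # injectable so the gate can be tested with a fake clock
        self.last_action_t: Optional[float] = None

    def may_act(self, event_seen_t: float) -> bool:
        """Return True (and record the action) if reacting now to an event seen at event_seen_t is allowed."""
        now = self.clock()
        # Rule 1: no reacting faster than the human reaction-time floor.
        if now - event_seen_t < self.min_reaction_s:
            return False
        # Rule 2: no clicking faster than the human click-rate ceiling.
        if self.last_action_t is not None and now - self.last_action_t < self.min_interval_s:
            return False
        self.last_action_t = now
        return True


if __name__ == "__main__":
    # Demo with a simulated clock; the event happens at t = 0.0 s.
    now = [0.0]
    gate = HumanActionGate(min_reaction_s=0.25, max_actions_per_s=5, clock=lambda: now[0])
    now[0] = 0.10
    print(gate.may_act(event_seen_t=0.0))  # False: faster than the reaction floor
    now[0] = 0.30
    print(gate.may_act(event_seen_t=0.0))  # True
    now[0] = 0.40
    print(gate.may_act(event_seen_t=0.0))  # False: only 0.1 s since the last click
    now[0] = 0.55
    print(gate.may_act(event_seen_t=0.0))  # True
```

Injecting the clock keeps the gate deterministic to test; in a live agent the default `time.monotonic` would be used.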
According to Shen Zhouran, an xAI employee, this requires the following capabilities:
Recognize a computer interface from a video stream, without APIs
Reason through complex situations within tight time limits
Execute actions on a computer without APIs
Do all of this in under 150 ms
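The "under 150 ms" requirement is an end-to-end deadline on a perceive, reason, act loop. A minimal sketch, assuming hypothetical `perceive`, `reason` and `act` callables (none of these names come from xAI), shows one way to enforce it: time the cycle and drop any plan that is ready too late. The per-stage timings in the demo are made up for illustration.

```python
import time
from typing import Any, Callable, Optional

BUDGET_S = 0.150  # the "under 150 ms" end-to-end target


def run_step(perceive: Callable[[], Any],
             reason: Callable[[Any], Any],
             act: Callable[[Any], None],
             clock: Callable[[], float] = time.monotonic,
             budget_s: float = BUDGET_S) -> Optional[float]:
    """Run one perceive -> reason -> act cycle; return its latency, or None if it blew the budget.

    A plan that is ready after the deadline is discarded rather than executed,
    because in a real-time game a stale action can be worse than no action.
    """
    start = clock()
    observation = perceive()      # e.g. parse a camera frame of the screen (hypothetical)
    plan = reason(observation)    # e.g. decide where to click (hypothetical)
    if clock() - start > budget_s:
        return None               # too late: drop the stale plan
    act(plan)                     # e.g. move the mouse and click (hypothetical)
    return clock() - start


if __name__ == "__main__":
    # Demo with a simulated clock so the timings are deterministic.
    t = [0.0]

    def fake_clock() -> float:
        return t[0]

    def perceive() -> str:
        t[0] += 0.04              # 40 ms to parse the frame (assumed)
        return "frame"

    def reason(obs: str) -> tuple:
        t[0] += 0.08              # 80 ms to decide (assumed)
        return ("click", 640, 360)

    def act(plan: tuple) -> None:
        t[0] += 0.01              # 10 ms to send the click (assumed)

    latency = run_step(perceive, reason, act, clock=fake_clock)
    print(f"{latency * 1000:.0f} ms")  # 130 ms
```

Dropping late plans is only one possible policy; a real agent might instead fall back to a cheaper reflex action when reasoning overruns.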
Why is this a big deal?
1. Perception
Grok would be able to read camera streams, extract information, remember what it has seen and locate the exact pixels to act on.
2. Speed
High action throughput. This matters especially for reinforcement learning (RL): more actions per second means more experience per hour, which speeds up learning.
3. Reasoning
High-speed reasoning
A long context window and coherence over long sequences
Reasoning under uncertainty
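The RL point under Speed is largely arithmetic: an agent only learns from the steps it actually takes. A minimal sketch, assuming every step costs a fixed latency and that parallel copies of the environment scale perfectly (an idealization that ignores training compute), shows how step latency caps the experience collected per hour. The function name and the 300 ms and 64-copy figures are illustrative assumptions.

```python
def experience_per_hour(step_latency_ms: int, parallel_envs: int = 1) -> int:
    """Agent steps collected per hour, assuming each step takes step_latency_ms
    and environments run fully in parallel (an idealized assumption)."""
    ms_per_hour = 3_600_000
    return (ms_per_hour // step_latency_ms) * parallel_envs


print(experience_per_hour(150))      # 24000 steps/hour at 150 ms per step
print(experience_per_hour(300))      # 12000: twice the latency, half the experience
print(experience_per_hour(150, 64))  # 1536000 with 64 parallel copies
```

Integer milliseconds are used so the arithmetic is exact.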
Implications
Read and understand visual input without an API
Navigate a computer interface without an API
Reason and plan under tight time constraints and uncertainty while staying coherent
Musk’s challenge isn’t just a gaming stunt. It has ripple effects for Tesla and the broader AI ecosystem.
Real-time visual understanding in unstructured settings
Strategic planning in real time, from pixels to action. For example, Optimus sees a loose screw and reasons that it should be fixed. It is given no explicit instruction; it reasons from common sense about how a factory should work.
Execution that generalizes beyond mouse clicks to all kinds of dexterous tasks
Tactical knowledge that would help FSD reason in complex, unstructured environments. For example, FSD sees a large vehicle parked on the street corner, recognizes that its view is obstructed, and infers an appropriate action, such as creeping forward slowly until it can see more.
Fog of war: hidden elements demand reasoning under uncertainty while staying coherent
Talent
Economics
xAI is the brain; Tesla builds the body and the dexterity.

Humans don't need APIs to navigate the environment. Neither do robots. The question is whether vision is enough.