Challenged RLC-AI performance claims before NVIDIA/Humain use
Situation
Peter personally interrogated the claimed 9-10% RLC-AI performance advantage by going directly to Damen Knight (the engineer who ran the benchmarks) and Max Spevack. He discovered that the gains largely disappear when the benchmarking code is properly optimized (uses torch.compile, etc.). He then asked Damen to evaluate whether Brian's marketing write-up is accurate or misleading, observing that it 'makes it sound awesome instead of pointless for data center deployments.'
Reasoning
RLC Pro AI GA targets Feb 25 and the board meeting is today, so performance claims will be part of the narrative to both audiences. Google explicitly wants RLC-AI benchmarks, so if the claims don't hold up, CIQ's credibility with a major customer is at stake. Peter values accuracy over hype: the 9-10% number applies only to unoptimized code and largely disappears once the benchmarking code is optimized. By going straight to the engineer who did the work, he bypassed telephone-game distortion. The framing ('makes it sound awesome instead of pointless') shows he's trying to make the messaging defensible, not kill it.
Additional Context
Damen's benchmarks ran on A30 GPUs in his home lab. The gains come from unoptimized code paths; once torch.compile and vLLM CUDA graphs are used, RLC-AI is roughly equivalent to Ubuntu. Hardware matters significantly, and results vary across GPU types. The Google roadmap meeting happened the day before, and Google requested RLC-AI benchmarks there. The board meeting is happening today (Feb 24).
Observed Evidence
Peter initiated the investigation: 'I want to get a better understanding of the performance testing.' Damen confirmed: 'Later, when we optimised our benchmarking code, most of those gains disappeared.' Peter acknowledged: 'Yup. And lots of places where Ubuntu outperforms us.' He then asked Damen to audit Brian's write-up. Separately, he told Bjorn that Google wants RLC-AI benchmarks.
Related Context
slack
I want to get a better understanding of the performance testing... where that 9-10% number comes from, what circumstances it applies to
slack
So the last bullet point, when we optimized our benchmarking code... we were equivalent, yes?
slack
Now that I understand your results I think his write up says the same thing, just makes it sound awesome instead of pointless for data center deployments.
slack
And they want RLC-AI benchmarks.
Outcome
No outcome recorded yet.
Decision ID: 549863b3-5361-433f-bb21-1d4e2429fb2e