• 1 Post
  • 95 Comments
Joined 2 years ago
Cake day: June 6th, 2023

  • Not somebody who knows a lot about this stuff, as I’m a bit of an AI Luddite, but I know just enough to answer this!

    “Tokens” are essentially just a unit of work – instead of operating directly on the user’s raw text, the model first “tokenizes” the input, breaking it into chunks (words or word fragments) that each map to an ID the actual ML model can process efficiently. The model then spits out a token or a series of tokens as its response, which are expanded back into text, or whatever the output of the model is.

    I think tokens are used as the benchmark unit because most models use them, and use them in a similar way, so they’re the lowest-level common unit of work that lets you compare across devices and models.
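
    To make the idea concrete, here’s a toy sketch of tokenization in Python. The vocabulary and the greedy longest-match strategy are made up for illustration – real models use learned vocabularies (e.g. BPE) with tens of thousands of entries – but the shape is the same: text in, integer token IDs out, and the model only ever sees the IDs.

    ```python
    # Toy vocabulary: maps known sub-word pieces to integer IDs.
    # (Hypothetical example; real tokenizers learn this from data.)
    VOCAB = {"un": 0, "believ": 1, "able": 2, "token": 3, "s": 4, "<unk>": 5}

    def tokenize(text):
        """Greedily match the longest known piece at each position."""
        ids = []
        for word in text.lower().split():
            while word:
                for end in range(len(word), 0, -1):
                    piece = word[:end]
                    if piece in VOCAB:
                        ids.append(VOCAB[piece])
                        word = word[end:]
                        break
                else:
                    # No known piece starts here: emit <unk> and move on.
                    ids.append(VOCAB["<unk>"])
                    word = word[1:]
        return ids

    def detokenize(ids):
        """Map token IDs back to text (the 'expand back' step)."""
        inverse = {v: k for k, v in VOCAB.items()}
        return "".join(inverse[i] for i in ids)

    print(tokenize("unbelievable tokens"))  # -> [0, 1, 2, 3, 4]
    ```

    A “tokens per second” figure is then just how many of those IDs the model can consume or emit per second, which is why it works as a rough cross-model unit of throughput.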


  • Agreed! I’m just not sure TOPS is the right metric for a CPU, due to how different the CPU data pipeline is from a GPU’s. Bubbly/clear instruction streams are one thing, but the mix of instruction types in a calculation also affects how many instructions can be run per clock cycle pretty significantly, whereas on matrix-optimized silicon it’s a lot more fair to generalize over a bulk workload.

    Generally, I think it’s fundamentally hard to produce a single number that represents CPU performance across different workloads.