Yeah, we built a custom chatbot that uses the GPT API in conjunction with our content (docs, forum, etc.) to hopefully give good answers. It's a custom retrieval-augmented generation system; we could plug any LLM into it.
As for your concern: yes, the GPU has to unpack those bytes back into floats, depending on your shader of course. But there's specialized circuitry for exactly that on the die, since packed colors are a pretty common thing. And for good reason: we have to submit new vertex buffers to the GPU each frame, and sending any kind of data to the GPU is usually the bottleneck, so we try to save as much space as possible.
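To make the size argument concrete, here's a sketch (not our actual API, just an illustration) of packing an RGBA color into a single 32-bit value instead of four 32-bit floats, plus what the GPU's normalized-attribute unpacking effectively does on the way back:

```java
public class PackedColor {
    // Pack four floats in [0,1] into one int, 8 bits per channel (RGBA8888).
    // 4 bytes per vertex color instead of 16 -- a 4x saving on that attribute.
    static int pack(float r, float g, float b, float a) {
        int ri = (int) (r * 255f) & 0xFF;
        int gi = (int) (g * 255f) & 0xFF;
        int bi = (int) (b * 255f) & 0xFF;
        int ai = (int) (a * 255f) & 0xFF;
        return (ri << 24) | (gi << 16) | (bi << 8) | ai;
    }

    // Roughly what the GPU does for a normalized unsigned-byte attribute:
    // byte / 255 -> float in [0,1]. Shown here for the red channel only.
    static float unpackRed(int packed) {
        return ((packed >>> 24) & 0xFF) / 255f;
    }

    public static void main(String[] args) {
        int packed = pack(1f, 0.5f, 0f, 1f);
        System.out.println(Integer.toHexString(packed)); // ff7f00ff
        System.out.println(unpackRed(packed));           // 1.0
    }
}
```

The round trip loses a little precision (8 bits per channel instead of 32), which is almost always fine for colors but would not be for, say, positions.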
You'll find that many other APIs that abstract the GPU actually use packed colors, for similar reasons.