Franz (our AI Game master) can sometimes feel like magic. He can remember something from a long time ago, perform actions like generating characters and monsters, and write emotional narratives. All of that is made possible by LLMs (large language models), and the way we’ve composed them inside of Friends & Fables.
In this post, we’ll give you a simplified explanation of how Franz (and the LLMs that power him) actually work. This won’t be a perfectly technical breakdown (we’re going to say “words” instead of “tokens”, etc.), because our goal is to help build intuition. We want you to understand enough about LLMs to see how they shape your adventures, so that you can get the most out of Franz!
At their core, LLMs take some existing text and try to predict the most likely words that come next. In the example above, you can see that given “The wizard casts”, the LLM ranks “fireball” as more likely than “a ritual”.
Now, if we change the last word to “drinks” instead of “casts”, “a potion” becomes the most likely phrase. “The wizard drinks fireball” is probably less likely than “The wizard drinks a potion”! This next-word prediction is the core building block of how LLMs work!
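To make that concrete, here’s a toy sketch in Python. The probability numbers are made up for illustration (a real model learns them from training data), but the idea is the same: given some text, look up which continuation is most likely.

```python
# Toy next-word prediction: a made-up probability table, not a real model.
next_word_probs = {
    "The wizard casts": {"fireball": 0.62, "a ritual": 0.21, "a potion": 0.03},
    "The wizard drinks": {"a potion": 0.71, "a ritual": 0.05, "fireball": 0.02},
}

def predict_next(prompt: str) -> str:
    """Return the highest-probability continuation for a prompt."""
    candidates = next_word_probs[prompt]
    return max(candidates, key=candidates.get)

print(predict_next("The wizard casts"))   # fireball
print(predict_next("The wizard drinks"))  # a potion
```

Notice that swapping one word in the prompt (“casts” → “drinks”) completely changes which continuation wins, exactly as in the example above.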
When you talk to Franz, ChatGPT, or any other LLM-powered chat service, what’s happening is more of the same next-word prediction, repeated over and over until it has written a whole message. In this next example, when the user sends a new message, everything in the white box is the “context”, and the LLM just has to predict the most likely response. Since the user is talking about Italian foods and asking for a haiku, the LLM is much more likely to generate a haiku about Italian food than anything else.
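That “repeating over and over” part can be sketched as a simple loop: predict one word, append it to the context, and predict again until the model signals it’s done. The hard-coded table below stands in for a real model, purely for illustration.

```python
# Toy generation loop: predict a word, append it, repeat until a stop marker.
# The "model" here is a hard-coded lookup table standing in for a real LLM.
toy_model = {
    "The wizard": "casts",
    "The wizard casts": "fireball",
    "The wizard casts fireball": "<end>",
}

def generate(context: str) -> str:
    """Repeatedly append the predicted next word until the model stops."""
    while True:
        next_word = toy_model[context]
        if next_word == "<end>":
            return context
        context = context + " " + next_word

print(generate("The wizard"))  # The wizard casts fireball
```

Every word the model emits becomes part of the input for predicting the next one, which is why LLM text generation is often called “autoregressive”.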
When you send a message to an LLM, it doesn’t really have “memory” like your computer has. You can think of an LLM like a program. You put some text in, and you get some text out. Each time you want to get text out, you have to reconstruct the text that goes in.
Now, that doesn’t mean that LLM-powered apps can’t have memory. We can use normal computer memory to save the chat history, and any time someone sends a message, we can fetch the chat history from the computer’s memory and send it to the LLM as the context.
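Here’s a minimal sketch of that pattern: the app keeps the history in ordinary program memory and rebuilds the full context on every turn. The `call_llm` function is a hypothetical stand-in for a real LLM API call, which is stateless: it only knows what you send it.

```python
# Sketch: the app remembers, the LLM doesn't. History lives in normal
# program memory; each turn we reconstruct the whole context from scratch.
chat_history: list[str] = []

def call_llm(context: str) -> str:
    # Hypothetical stand-in for a real (stateless) LLM API call.
    return f"(reply to: {context.splitlines()[-1]})"

def send_message(user_message: str) -> str:
    chat_history.append(f"User: {user_message}")
    context = "\n".join(chat_history)  # rebuild the full input every time
    reply = call_llm(context)
    chat_history.append(f"Franz: {reply}")
    return reply
```

The key design point is that “memory” is just text the application chooses to include in the context each time.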
Whether or not an LLM “remembers” something depends on whether or not the application you are using has included that information in context and sent it to the LLM.
Most LLM providers (OpenAI, Google, Anthropic, etc.) charge for usage by the amount of text you send in the context window: the more text you send as context, the more expensive each response is.
Imagine you’ve had a weeks-long conversation with an LLM and your chat history is now the length of a book. If you sent the entire chat history as context every single time you wanted a response, each response could cost a lot of money. If you sent only the last 5–10 messages, it would be much, much cheaper, but the LLM would have “forgotten” anything you said before.
That’s why most LLM-powered applications use a variety of techniques to summarize and condense information over time, so that costs don’t balloon out of control but it still feels like the LLM is “remembering” things.
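One common version of this trick looks like the sketch below: send a running summary of older messages plus only the most recent ones. The `summarize` helper and `RECENT_WINDOW` size are hypothetical; in practice, summarizing is often done by another LLM call.

```python
# Sketch of a common cost-control technique: summary + recent window.
RECENT_WINDOW = 10  # illustrative; real apps tune this

def summarize(messages: list[str]) -> str:
    # Hypothetical stand-in: a real app would ask an LLM to condense these.
    return f"[summary of {len(messages)} earlier messages]"

def build_context(history: list[str]) -> str:
    """Keep recent messages verbatim; condense everything older."""
    if len(history) <= RECENT_WINDOW:
        return "\n".join(history)
    older, recent = history[:-RECENT_WINDOW], history[-RECENT_WINDOW:]
    return "\n".join([summarize(older)] + recent)
```

This keeps the context (and cost) roughly constant as the conversation grows, at the price of losing some detail from older messages.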
So now that you know how LLMs work, what’s the big takeaway? How is this going to help you get more out of Franz?
The big idea is that what you put in is what you get out.
The quality of your input will greatly affect the quality of your output. In Friends & Fables, we try to automatically do as much of this management for you as possible. It’s why Franz creates long term memories, rewrites his plot & plan, and performs research steps on every turn. Since Franz 2.0, we’ve made the context as transparent as we can, so you can see what goes in as context and have the chance to edit any inaccuracies.
If your context is full of irrelevant information or confusing instructions that don’t apply to the current scene, the quality of Franz’s response might be worse. If you keep your context relevant, the quality of Franz’s response should be better.
If you want to learn more about how Franz works, check out the next post: “How Franz Works”