If Paddington Bear was an MLE...
Paddington eyed the tray hungrily. There was half a grapefruit in a bowl, a plate of bacon and eggs, some toast, and a whole pot of marmalade, not to mention a large cup of tea. “Is all that for me?” he exclaimed. - A Bear Called Paddington, by Michael Bond
I developed marmalade after building multiple retrieval and recommendation systems with text embeddings. It can be difficult to internally visualize and communicate to others how a given passage of text will be translated into tokens, especially when context limits mean either naive truncation or more involved chunking approaches.
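To make the difference between naive truncation and chunking concrete, here is a minimal sketch. It assumes a toy whitespace "tokenizer" (real tokenizers split into subwords), and the helper names `truncate`, `chunk_tokens`, and `pad` are hypothetical, chosen only for illustration.

```python
def truncate(tokens, max_len):
    """Naive truncation: keep the first max_len tokens and drop the rest."""
    return tokens[:max_len]


def chunk_tokens(tokens, max_len, overlap=0):
    """Sliding-window chunking: windows of up to max_len tokens,
    each sharing `overlap` tokens with the previous window."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be >= 0 and smaller than max_len")
    stride = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


def pad(tokens, max_len, pad_token="[PAD]"):
    """Right-pad a short chunk so every chunk has the same length."""
    return tokens + [pad_token] * (max_len - len(tokens))


# Toy whitespace split standing in for a real tokenizer (an assumption).
text = "Paddington eyed the tray hungrily and asked is all that for me"
tokens = text.split()

truncated = truncate(tokens, 5)
chunks = chunk_tokens(tokens, max_len=5, overlap=2)
padded_last = pad(chunks[-1], 5)

print(truncated)
for c in chunks:
    print(c)
print(padded_last)
```

Truncation silently discards everything after the fifth token, while the overlapping windows cover the whole passage and repeat two tokens at each boundary so that context is not cut mid-thought.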

My aim is to make a web utility for developers, engineers, and product owners building features around text embeddings. Think Regex101, Carbon, or Favicon. Here are a few suggested use cases:
- Select a sample text and configure the tokenizer and chunking strategies to understand the mechanics.
- Provide your own text (just copy/paste) and see how it “looks” tokenized and chunked based on your chosen settings.
- Debug your semantic search application whenever you aren’t getting the results you expect.
A key feature of the implementation is that it runs completely client-side, without any backend services, so users can enter their own text without privacy concerns.
The metaphor
As for the marmalade metaphor and the overall design of the app, it is in reference to Paddington Bear, who is known for his love of marmalade sandwiches and those sweet “sticky chunks”. Also, he’s got “padding” right in his name!

I first developed this metaphor while building a demo notebook for teammates working on an Extractive Q&A feature. (Extractive Q&A specifies an explicit answer to a question within a source text, rather than generating or summarizing an answer.) Later, while leading the development of an internal, domain-specific semantic search application, I found I often needed to show others how a certain text passage would “look” to a tokenizer. Also, my kids have always loved the Paddington stories, so why not make something they would like too?
It is also a funny artifact that the terms “marmalade” and “paddington” both require multiple tokens for many tokenizers, such as classic BERT-based models.
E.g., “marmalade” is tokenized as "mar" · "mal" · "ade"
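The split above can be reproduced with a small sketch of greedy longest-match-first subword tokenization, the approach WordPiece tokenizers use. The vocabulary here is a hypothetical toy set, not the real BERT vocabulary, and the `wordpiece` function is an illustrative helper rather than any library's API. In WordPiece output, continuation pieces carry a `##` prefix, so the pieces print as `mar`, `##mal`, `##ade`.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a single word into subword pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches, so the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"mar", "##mal", "##ade", "bear"}

marmalade_pieces = wordpiece("marmalade", TOY_VOCAB)
bear_pieces = wordpiece("bear", TOY_VOCAB)
tea_pieces = wordpiece("tea", TOY_VOCAB)

print(marmalade_pieces)
print(bear_pieces)
print(tea_pieces)
```

A word that is in the vocabulary stays whole, a word that is not gets broken into the longest pieces available, and a word with no matching pieces falls back to the unknown token.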