---
date: 2024-01-07
title: Let's Create a Data Format
series:
  name: "SDBD: Creating a Data Format"
  number: 1
---
I have a problem. I want to be able to transfer self-contained binary data with metadata through a variety of protocols with no knowledge of the binary data's format or the protocol being used for transfer.

Or in other words, I want to be able to send files anywhere without losing the filename.

That's a bit simpler than my actual goal, but I think this is a problem every software developer has considered at some point. We've all asked the question, "Why isn't the filename attached to the file?" or, slightly more advanced, "Why isn't the file format attached to the file?"

The answer isn't all that complicated.
* Any file transfer protocol ever invented can pass the filename with the file
* File extensions are Good Enough for identifying the file format
* We have good tools for guessing the format if the filename is missing
* As human beings, we can use context to guess the format and "fix" the extension

But I'm going to declare that Good Enough isn't good enough. Perhaps the filename and format are the most useful metadata for a file, but they're not the only useful metadata. Preserving them also depends on the transfer protocol. What if I don't want to rely on a specific protocol?

And so, knowing full well that this is likely to go nowhere and that solutions to this problem almost certainly already exist, I'm going to set out to create a new data format that encapsulates data and metadata into a single file.

## The hard problem
The first thing to do is give my new format a name. After some deliberation, I'm going to settle on Self-described Binary Document, or SDBD for short. It contains arbitrary binary data. It's a document and not a file because it could live anywhere. And the whole purpose is to let the document describe its own contents. Now that I've tackled [one of the hard problems](https://martinfowler.com/bliki/TwoHardThings.html), the rest should be easy.

The next step is to talk about how we talk about the format. How is it structured and what concepts do we use to build it?