Succinct references

Saving is, for the most part, a prerequisite for sharing and search. A thing can be neither distributed nor retrieved without some representation of it. And those representations must be succinct if sharing and search are to be effective.

In "On Exactitude in Science", Jorge Luis Borges describes the absurdity of perfect, one-to-one representation. Here is the story in full:

…In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast Map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.

Imagine trying to share a map as large as the territory it depicts; it wouldn't work. What does work is saving succinct references to actual things. The consequence? We share and search promiscuously. But what happens when we save something we discover online today? What exactly gets saved?

This is an important question for us—saving is the first part of the save-share-search loop that we're in the process of modernising. So we've been formulating an answer. What gets saved can be broken down into four parent categories:

  • Basic info: core identifying elements of the thing

  • Metadata: descriptive data about the thing's creation and capture

  • Contextualising info: general and user-specific data that adds meaning

  • Technical data: system-level info, item properties and constraints

Within each of these four parent categories are sub-categories of information that make up the body of a succinct reference.

Within the basic info category, the following is saved:

  • URL / CID: a mostly machine-readable identifier for the thing (a web address or a content identifier)

  • Title / name: a short string of natural language

  • Description: a brief summary of the thing and/or its purpose

  • Language: the primary language in which the thing is written or presented

Within the metadata category, the following is saved:

  • Timestamp: the date and time at which the thing was saved

  • Source: where the thing was found (e.g. a website or app)

  • User: the entity or account that saved the thing

  • Device: the hardware or platform used to save the thing

Within the contextualising info category, the following is saved:

  • Ontology: annotations that describe what the thing fundamentally is

  • Semantics: annotations that allocate meaning to the thing

  • Relationships: connections to other things (e.g. items, people or projects)

  • Actions: intended uses or next steps associated with the thing

Within the technical data category, the following is saved:

  • Provenance: the thing's origin and history

  • Versioning: the current state plus past usage and modifications made

  • Attributes: observable specifications and properties derived via analysis

  • Constraints: limitations governing the thing's use or access
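The four categories and sixteen fields above map naturally onto a nested record. Below is a minimal Python sketch of one possible shape; the class and field names (`SuccinctReference`, `BasicInfo`, `identifier` and so on) are our own hypothetical choices, as is storing relationships and provenance as plain string lists. It illustrates the schema and is not a published format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class BasicInfo:
    # Core identifying elements of the thing
    identifier: str            # URL or CID
    title: str
    description: str = ""
    language: str = "en"       # assumed: a language tag such as "en"


@dataclass
class Metadata:
    # Descriptive data about the thing's capture
    timestamp: datetime
    source: str                # e.g. a website or app
    user: str                  # entity or account that saved it
    device: str                # hardware or platform used


@dataclass
class ContextualisingInfo:
    ontology: list[str] = field(default_factory=list)       # what the thing is
    semantics: list[str] = field(default_factory=list)      # meaning allocated to it
    relationships: list[str] = field(default_factory=list)  # hypothetical ids of related items, people, projects
    actions: list[str] = field(default_factory=list)        # intended uses or next steps


@dataclass
class TechnicalData:
    provenance: list[str] = field(default_factory=list)     # origin and history
    version: int = 1                                         # current state
    history: list[str] = field(default_factory=list)        # past usage and modifications
    attributes: dict[str, str] = field(default_factory=dict)
    constraints: list[str] = field(default_factory=list)


@dataclass
class SuccinctReference:
    basic: BasicInfo
    metadata: Metadata
    # Optional categories default to empty, so a save can be enriched later.
    context: ContextualisingInfo = field(default_factory=ContextualisingInfo)
    technical: TechnicalData = field(default_factory=TechnicalData)


ref = SuccinctReference(
    basic=BasicInfo(identifier="https://example.com/essay", title="An essay"),
    metadata=Metadata(
        timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc),
        source="example.com",
        user="alice",
        device="laptop/browser",
    ),
    context=ContextualisingInfo(ontology=["essay"], actions=["read later"]),
)
```

Giving the contextualising and technical categories empty defaults is a design choice in this sketch: it lets a save be recorded with only basic info and metadata and filled out afterwards.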

Enumerated like this, it seems as if a lot of information gets captured when we save a thing. But it isn't that much, really:

  • An arbitrary, multi-thousand-word essay on the web (ignoring image content, formatting, and the enveloping browser elements that deliver it) comes to 10,000 to 20,000 characters, or roughly 10 to 20 KB of plain text at about one byte per character

  • A relatively minimal set of information about a saved thing using the schema above, by contrast, comes to 1,000 to 2,000 characters, or 1 to 2 KB of text: roughly an order of magnitude smaller
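One way to sanity-check the 1 to 2 KB figure is to encode a reference as compact JSON and count the bytes. The sketch below assumes JSON as the serialisation format and uses made-up field values; a real reference could carry more or less.

```python
import json

# A hypothetical, fairly minimal reference with every field of the schema filled in.
reference = {
    "basic": {
        "url": "https://example.com/essays/on-maps",
        "title": "On maps and territories",
        "description": "An essay on why representations must be smaller than what they represent.",
        "language": "en",
    },
    "metadata": {
        "timestamp": "2024-05-01T09:30:00Z",
        "source": "example.com",
        "user": "alice@example.com",
        "device": "desktop / Firefox",
    },
    "context": {
        "ontology": ["essay"],
        "semantics": ["cartography", "representation"],
        "relationships": ["project:reading-list", "person:bob"],
        "actions": ["read later", "share with team"],
    },
    "technical": {
        "provenance": {"found_via": "https://news.example.com/item/123"},
        "versioning": {"current": 1, "history": []},
        "attributes": {"content_type": "text/html", "word_count": 3000},
        "constraints": ["personal use only"],
    },
}


def size_in_bytes(obj) -> int:
    """Length of obj's compact JSON encoding, in UTF-8 bytes."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


ESSAY_BYTES_LOW = 10_000  # lower end of the essay estimate above (plain text)

ref_bytes = size_in_bytes(reference)
print(f"reference: {ref_bytes} bytes")
print(f"essay is at least {ESSAY_BYTES_LOW / ref_bytes:.0f}x larger")
```

Compact separators strip the whitespace that pretty-printing would add, so the count reflects the information itself rather than its layout.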

Of course, both the essay and the saved info themselves contain succinct references to other things (e.g. prior ideas behind the essay, or compliance frameworks governing the saved data). They're part of a greater, deeper, civilisation-wide web. But I suspect you see the point.

We save succinct references to found things, and the elements of those succinct references are what enable us to share things with our peers and search across distributed networks for both new and old things.

When sharing, we can quickly identify relevant content using the basic info and metadata, while the contextualising info provides rich background for why the content matters. For instance, when sharing a research paper, a user can easily explain its relevance using ontology and semantics and note any intended actions.
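As an illustration of sharing, here is a hypothetical helper that turns a saved reference into a short message: the title, URL and capture metadata identify the thing, while ontology, semantics and actions explain why it matters. The dictionary layout and the `share_summary` name are assumptions made for this sketch.

```python
def share_summary(ref: dict) -> str:
    """Compose a short, human-readable share message from a saved reference."""
    basic, meta = ref["basic"], ref["metadata"]
    ctx = ref.get("context", {})
    # Basic info and metadata identify the thing and where it came from.
    # [:10] keeps the date part of an ISO 8601 timestamp (assumed format).
    lines = [
        f"{basic['title']} <{basic['url']}>",
        f"Saved {meta['timestamp'][:10]} from {meta['source']}",
    ]
    # Contextualising info explains why it matters and what to do with it.
    tags = ctx.get("ontology", []) + ctx.get("semantics", [])
    if tags:
        lines.append("Context: " + ", ".join(tags))
    if ctx.get("actions"):
        lines.append("Next steps: " + "; ".join(ctx["actions"]))
    return "\n".join(lines)


paper = {
    "basic": {"url": "https://example.org/papers/42", "title": "A hypothetical paper on graph search"},
    "metadata": {"timestamp": "2024-05-01T09:30:00Z", "source": "example.org"},
    "context": {
        "ontology": ["research paper"],
        "semantics": ["graph search", "heuristics"],
        "actions": ["cite in draft", "discuss with the team"],
    },
}
print(share_summary(paper))
```

A reference saved without any contextualising info still produces a usable two-line message; the context lines simply appear once annotations exist.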

When searching, these elements enable powerful, context-aware queries. Users can search not just by keywords, but by concepts (using ontology and semantics), intended use (via stated actions), or technical specifications. This allows for precise retrieval of saved items, such as finding a specific algorithm based on its conceptual relevance, intended application, or implementation characteristics.
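A context-aware query can be sketched as filtering saved references on more than their text. The hypothetical `search` function below does a naive linear scan with exact matches on concepts (ontology or semantics), intended actions and technical attributes, plus a case-insensitive keyword match on the title and description. The data layout is assumed, and a real system would presumably use indexes and ranking rather than a scan like this.

```python
from typing import Iterable, Optional


def search(
    refs: Iterable[dict],
    *,
    keyword: Optional[str] = None,
    concept: Optional[str] = None,
    action: Optional[str] = None,
    attributes: Optional[dict] = None,
) -> list:
    """Return the references that satisfy every filter that was supplied."""
    results = []
    for ref in refs:
        basic = ref["basic"]
        ctx = ref.get("context", {})
        attrs = ref.get("technical", {}).get("attributes", {})
        # Keyword: plain substring match on the basic info.
        text = f"{basic.get('title', '')} {basic.get('description', '')}".lower()
        if keyword is not None and keyword.lower() not in text:
            continue
        # Concept: match against ontology and semantics annotations.
        if concept is not None and concept not in ctx.get("ontology", []) + ctx.get("semantics", []):
            continue
        # Intended use: match against stated actions.
        if action is not None and action not in ctx.get("actions", []):
            continue
        # Technical specifications: every requested attribute must match exactly.
        if attributes and any(attrs.get(k) != v for k, v in attributes.items()):
            continue
        results.append(ref)
    return results


saved = [
    {
        "basic": {"title": "Dijkstra's algorithm explained"},
        "context": {"ontology": ["tutorial"], "semantics": ["shortest path", "graphs"], "actions": ["implement"]},
        "technical": {"attributes": {"language": "Rust"}},
    },
    {
        "basic": {"title": "A* search in practice"},
        "context": {"ontology": ["blog post"], "semantics": ["shortest path", "heuristics"], "actions": ["read later"]},
        "technical": {"attributes": {"language": "Python"}},
    },
    {
        "basic": {"title": "Quicksort notes"},
        "context": {"ontology": ["notes"], "semantics": ["sorting"], "actions": ["implement"]},
        "technical": {"attributes": {"language": "Rust"}},
    },
]

hits = search(saved, concept="shortest path", action="implement")
print([h["basic"]["title"] for h in hits])
```

Here, asking for shortest-path material you intend to implement returns only the Dijkstra entry, even though the A* entry shares the concept: the stated action is what separates them.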

Ultimately, saving succinct references transforms sharing and searching from simple content-matching tasks into rich, context-driven processes. Yet, despite its fundamental role in our digital lives, something as simple as saving is not a solved problem. There's little consensus on what gets saved, how it's stored, or the actions that are available to different parties downstream.

We're aiming to change this. Asking "What gets saved?" is just a small step in a longer journey, one in which more of the contacts that leave traces get saved, and more of what gets saved is shareable and searchable.

Subscribe to Subset and never miss a post.