My Archivist

AI tooling isn't just leaking into communication infrastructure and patterns; it's flooding them. We recently learned about a little project called Distill. In "Hacking our way to better team meetings", Werner Vogels describes its value proposition, and the problem it addresses is the sort of fundamental issue catalysing the flood as a whole:

The distraction of looking down to jot down notes or tapping away at the keyboard can make it hard to stay engaged in the conversation, as it forces us to make quick decisions about what details are important, and there’s always the risk of missing important details while trying to capture previous ones. Not to mention, when faced with back-to-back-to-back meetings, the challenge of summarizing and extracting important details from pages of notes is compounding – and when considered at a group level, there is significant individual and group time waste in modern business with these types of administrative overhead.

Faced with these problems on a daily basis, my team – a small tiger team I like to call OCTO (Office of the CTO) – saw an opportunity to use AI to augment our team meetings. They have developed a simple, and straightforward proof of concept for ourselves, that uses AWS services like Lambda, Transcribe, and Bedrock to transcribe and summarize our virtual team meetings. It allows us to gather notes from our meetings, but stay focused on the conversation itself, as the granular details of the discussion are automatically captured (it even creates a list of to-dos).

In its simplest form, Distill takes an audio file, transcribes it, passes the transcript to an AI model for summarisation, and returns the output. In more detail (as described by Vogels, but itemised here for readability and followed by a rough code sketch):

  • First, we upload an audio file of our meeting to an S3 bucket.

  • Then an S3 trigger notifies a Lambda function, which initiates the transcription process.

  • An EventBridge rule is used to automatically invoke a second Lambda function when any Transcribe job beginning with summarizer- has a newly updated status of COMPLETED.

  • Once the transcription is complete, this Lambda function takes the transcript and sends it with an instruction prompt to Bedrock to create a summary.

  • In our case, we’re using Claude 3 Sonnet for inference, but you can adapt the code to use any model available to you in Bedrock.

  • When inference is complete, the summary of our meeting — including high-level takeaways and any to-dos — is stored back in our S3 bucket.
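Purely to make that flow concrete, here is a rough sketch of what the two Lambda functions might look like. It is not Distill's actual source: the bucket names, file layout, audio format and prompt are assumptions, and the model ID is simply Claude 3 Sonnet's identifier on Bedrock.

```python
import json
import urllib.parse

import boto3

transcribe = boto3.client("transcribe")
bedrock = boto3.client("bedrock-runtime")
s3 = boto3.client("s3")

OUTPUT_BUCKET = "distill-demo-output"  # assumed bucket for transcripts and summaries
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"  # Claude 3 Sonnet on Bedrock


def start_transcription(event, context):
    """First Lambda: triggered by S3 when an audio file lands in the input bucket."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])  # e.g. "team-sync.mp3"

    # The job name is prefixed with "summarizer-" so the EventBridge rule can match it.
    transcribe.start_transcription_job(
        TranscriptionJobName=f"summarizer-{key}",
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        MediaFormat="mp3",
        LanguageCode="en-US",
        OutputBucketName=OUTPUT_BUCKET,
        OutputKey=f"transcripts/{key}.json",
    )


def summarize(event, context):
    """Second Lambda: triggered by EventBridge when a summarizer-* job is COMPLETED."""
    job_name = event["detail"]["TranscriptionJobName"]
    audio_key = job_name.removeprefix("summarizer-")

    # Fetch the transcript that Transcribe wrote to the output bucket.
    raw = s3.get_object(Bucket=OUTPUT_BUCKET, Key=f"transcripts/{audio_key}.json")["Body"].read()
    transcript = json.loads(raw)["results"]["transcripts"][0]["transcript"]

    prompt = (
        "Summarise this meeting transcript. Include high-level takeaways "
        f"and a list of to-dos.\n\n{transcript}"
    )
    response = bedrock.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    summary = json.loads(response["body"].read())["content"][0]["text"]

    # Store the summary back alongside the transcript.
    s3.put_object(
        Bucket=OUTPUT_BUCKET,
        Key=f"summaries/{audio_key}.txt",
        Body=summary.encode("utf-8"),
    )
```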

This evoked a speculative capability we've been considering for Subset. Something that eliminates the need to:

  • Look up referenced items, mid-communication, during a call

  • Send a follow-up containing links to discussed items after a session

It has two variants. The first provides basic post-meeting recommendations. The second provides live recommendations from real-time audio. The requirements for this speculative Subset capability are straightforward:

  1. At least two people in a digital meeting with live audio

  2. The meeting is social and peer-to-peer, not the usual white-collar meeting theatre

  3. Subset has advanced enough to enable automatic querying of the things a person has saved and to facilitate pattern-based sharing with contacts

  4. The participants in the call have curated and saved a rich set of items related to the themes of the conversation that is likely to unfold

With all that in place, let's sketch out our speculative capabilities in practice.

The first variant is close to what Distill does. It takes a complete audio recording of a meeting, transcribes it, and eventually returns an output. In our case, however, the output is not a summary. Instead, it is a collection of saved items: any items explicitly referenced during the conversation, plus any items from a user's collection of saved things that are meaningfully close to the topics of the discussion. Essentially, the audio is a token stream used to query a collection of saved items for similarly meaningful things.
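As a minimal sketch of that query step, assume each saved item already carries a normalised embedding computed at save time; the SavedItem shape, the MiniLM model and the simple top-k approach are illustrative choices, not Subset's actual internals.

```python
from dataclasses import dataclass

import numpy as np
from sentence_transformers import SentenceTransformer


@dataclass
class SavedItem:
    title: str
    url: str
    embedding: np.ndarray  # unit-normalised vector, computed when the item was saved


model = SentenceTransformer("all-MiniLM-L6-v2")


def explicitly_referenced(transcript: str, saved: list[SavedItem]) -> list[SavedItem]:
    """Crude pass for items mentioned by title during the call."""
    lowered = transcript.lower()
    return [item for item in saved if item.title.lower() in lowered]


def items_for_transcript(transcript: str, saved: list[SavedItem], top_k: int = 10) -> list[SavedItem]:
    """Return the saved items most semantically similar to the conversation."""
    query = model.encode(transcript, normalize_embeddings=True)
    scored = [(float(np.dot(query, item.embedding)), item) for item in saved]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:top_k]]
```

The output surfaced to the user would be the union of both lists: the things they named out loud, plus the things they didn't name but probably meant.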

The second variant builds on the first. But instead of a single capture-query-response-compile-surface loop, there are many. The audio is converted to a query in real time, as it's being captured. The results are immediately surfaced to the user, who leverages them however they please: perhaps sharing them, perhaps using them to inform part of their own dialogue or parse the ideas of another, perhaps marking them for deeper investigation later.
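One way to picture those loops, reusing items_for_transcript and SavedItem from the previous sketch: transcribe_chunk and surface below are hypothetical callables standing in for a local speech-to-text step and whatever UI the call client exposes.

```python
from collections import deque
from typing import Callable, Iterable


def live_recommendations(
    audio_chunks: Iterable[bytes],
    saved: list[SavedItem],
    transcribe_chunk: Callable[[bytes], str],  # hypothetical local speech-to-text
    surface: Callable[[SavedItem], None],      # hypothetical hook into the call UI
    window_size: int = 6,
) -> None:
    recent = deque(maxlen=window_size)  # rolling window of the latest utterances
    surfaced: set[str] = set()          # don't re-surface the same item twice

    for chunk in audio_chunks:          # e.g. a few seconds of audio at a time
        recent.append(transcribe_chunk(chunk))
        window_text = " ".join(recent)

        # Each window is its own capture-query-surface pass.
        for item in items_for_transcript(window_text, saved, top_k=3):
            if item.url not in surfaced:
                surfaced.add(item.url)
                surface(item)
```

The rolling window keeps each query anchored to what's being said right now rather than the whole meeting, which is what makes the recommendations feel live.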

The first variant can probably be built right now by snapping together existing components from current cloud and AI ecosystems. The second variant is a little more far-fetched. The required responsiveness demands local storage, local AI processing, and some smart manoeuvring to interoperate with the current state of different platforms, browsers and devices.

Both are eminently feasible examples of tools for third places that would reduce the toil associated with curation. Let's dub them MARC—an acronym for My Archivist, as well as shorthand for machine-readable cataloguing.
