Everything Everywhere All at Once - an impromptu session
20 min
Speaker A
0:00
modify GitHub to track things I care about.
0:02
I just get what the boffins in San Francisco think I should have.
0:06
And what they think I should have is more LLMs, which honestly is fine.
0:10
But my point is that it's not a place where I can make decisions about the product that I'm using.
0:15
And so the vision here with Cambria was every team should be able to make decisions about the software and to change it and customize it to their needs without losing the ability to correspond and even collaborate in real time across versions of the software.
Speaker B
0:31
Right?
Speaker A
0:32
So that's the vision behind the lenses.
0:33
We did this in 2020.
0:35
We wrote some papers about the problems.
0:37
Because we're a weird research lab, we were like, great, we know how this could be.
0:44
We built a pretty nice prototype.
0:45
It's actually running on the website.
0:47
If you go to inkandswitch.com/cambria, the code is all live.
0:50
And then we were like, cool, we know how to do this.
0:53
I hope somebody does.
0:55
And that was 5 years ago, 6 years ago now.
0:56
And that was the last time we touched it.
0:58
So that's where we were.
1:00
And I think the story picks up from here.
1:03
OK.
Speaker C
1:06
So, on this little text demo and what Peter was saying: about a month ago, when we were talking about all the standard.site stuff, the Markdown stuff, and lexicons, I kind of revived some of this idea.
1:25
All of these formats are defined in terms of a thing that I called relational text, which is best understood not as a format but as a meta-format: the union of all text formats that exist, or might exist, on Earth.
1:43
And so I've basically defined lenses into and out of each of the formats. Well, they're invertible, so there's really just one lens.
1:50
But anyway.
1:53
So there's just one lens defined per format, into relational text.
2:00
And then you can invert that.
2:02
And so you get conversion between all of these different formats in like a clean path.
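The hub-and-spoke arrangement described above can be sketched in a few lines. This is a minimal sketch, not the speaker's implementation: it assumes a toy "relational text" made of plain text plus a list of (start, end, mark) spans, and two toy formats (Markdown `**bold**` and HTML `<b>`). The names `RelText`, `FORMAT_LENSES`, and `convert` are hypothetical. Each format gets one invertible lens (parse into the hub, render back out), and any-to-any conversion is one lens in followed by another lens's inverse out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RelText:
    """Toy hub: plain text plus (start, end, mark) spans over it."""
    text: str
    spans: list = field(default_factory=list)


def _parse_delimited(src: str, pattern: str, mark: str) -> RelText:
    """Strip delimiter markup, recording where each marked run lands in the text."""
    out, spans, pos = [], [], 0
    for m in re.finditer(pattern, src):
        out.append(src[pos:m.start()])
        start = sum(len(s) for s in out)
        out.append(m.group(1))
        spans.append((start, start + len(m.group(1)), mark))
        pos = m.end()
    out.append(src[pos:])
    return RelText("".join(out), spans)


def _render_delimited(rt: RelText, open_: str, close: str) -> str:
    """Wrap each span in delimiters (assumes non-overlapping spans)."""
    out, pos = [], 0
    for start, end, _mark in sorted(rt.spans):
        out.append(rt.text[pos:start] + open_ + rt.text[start:end] + close)
        pos = end
    out.append(rt.text[pos:])
    return "".join(out)


# One invertible lens per format: (into the hub, back out of the hub).
FORMAT_LENSES = {
    "markdown": (lambda s: _parse_delimited(s, r"\*\*(.+?)\*\*", "bold"),
                 lambda rt: _render_delimited(rt, "**", "**")),
    "html": (lambda s: _parse_delimited(s, r"<b>(.+?)</b>", "bold"),
             lambda rt: _render_delimited(rt, "<b>", "</b>")),
}


def convert(src: str, from_fmt: str, to_fmt: str) -> str:
    """Any-to-any conversion: one lens into the hub, another lens's inverse out."""
    into_hub, _ = FORMAT_LENSES[from_fmt]
    _, out_of_hub = FORMAT_LENSES[to_fmt]
    return out_of_hub(into_hub(src))


print(convert("a **bold** move", "markdown", "html"))  # a <b>bold</b> move
```

With N formats this needs N lenses rather than N(N-1) pairwise converters, which is the payoff of routing everything through one hub.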
2:09
So I've been talking to various people about this.
2:13
And one of those people: I saw Aaron's chive.pub and layers.pub, and I'm like, I need to talk to Aaron because he's working on some interesting related stuff.
2:23
And I'll just sort of characterize it: we had a 30-minute impromptu chat.
2:29
And about 15 minutes into the chat, Aaron was on a different planet.
2:32
And he was just like, I think there's a general way to solve this.
2:37
Because I had built something really specific to text formats, based on the AT Proto facet
2:44
lexicons.
2:45
And from working with Peter and everything, I had a sense that there was a general way to approach this for lexicons.
2:54
But I know a lot about text formats.
2:56
And I'm like, I'm going to solve this piece.
2:59
And then 12 hours later, Aaron dropped the most insane stuff that I've ever seen.
3:05
So yeah, please come up.
Speaker B
3:10
Thanks.
3:11
Yeah, so just for background, I actually work a lot with text annotation.
3:17
That's where the layers.pub stuff was coming from.
3:23
And a big interest of mine is very similar to this.
3:27
Like I want to be able to kind of represent all possible annotations on text that we tend to use.
3:34
So I want to be able to talk about links out to external knowledge graphs.
3:41
I want to be able to talk about links to video bounding boxes over frames.
3:49
I want to be able to talk about links out to neural recording data.
3:57
So I work with a variety of people in our med center.
4:00
I'm at the University of Rochester.
4:02
I'm an associate professor
4:03
there.
4:05
And so this is where kind of the layers.pub stuff came from.
4:10
Just kind of as background.
4:13
I also tend to work with dependent type theories quite a bit.
4:21
And so I sort of was noticing some similarities basically with some of the stuff that I know from that literature.
4:32
And so specifically, it seemed to me that the lens stuff could be generalized out and sort of abstracted another level up effectively.
4:47
So when we're working with, say, JSON Schemas and mapping between those JSON Schemas, we specify a particular lens on the JSON Schema.
5:02
But what if we want to be able to do, you know, a mapping out to a SQL database or something like that?
5:09
And what if we want to specify lenses in a generic way, right?
5:17
Specify these transformations declaratively, and then, what if we could compile that declaration down into one of these sorts of mappings, in such a way that we're not restricted to a particular schema, but we're also not super restricted in the nature of the data that we're working with?
5:47
It also seems like we want something that allows us to do transformations on recursively defined data types, which is super important in the context of AT Proto in particular, since we can define such data types there.
6:04
But we also, of course, need to be able to handle things like tables, not just trees.
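To make the trees-versus-tables point concrete, here is a minimal sketch (not Panproto's API) of an invertible transformation between a recursively defined type, a comment thread whose nodes hold their replies, and a flat table of rows with a parent pointer, the way a SQL table would store it. The field names `id`, `text`, `replies`, and `parent` are hypothetical.

```python
from typing import Optional


def tree_to_rows(node: dict, parent: Optional[str] = None) -> list:
    """Flatten a recursive {id, text, replies: [...]} tree into table rows."""
    rows = [{"id": node["id"], "text": node["text"], "parent": parent}]
    for child in node.get("replies", []):
        rows.extend(tree_to_rows(child, node["id"]))
    return rows


def rows_to_tree(rows: list) -> dict:
    """Rebuild the tree from rows; child order follows row order."""
    nodes = {r["id"]: {"id": r["id"], "text": r["text"], "replies": []} for r in rows}
    root = None
    for r in rows:
        if r["parent"] is None:
            root = nodes[r["id"]]
        else:
            nodes[r["parent"]]["replies"].append(nodes[r["id"]])
    return root


thread = {"id": "a", "text": "root", "replies": [
    {"id": "b", "text": "reply", "replies": [
        {"id": "c", "text": "nested", "replies": []}]},
    {"id": "d", "text": "another", "replies": []}]}

rows = tree_to_rows(thread)
assert rows_to_tree(rows) == thread  # the round trip loses nothing
```

The round trip is exact here because every node carries an explicit `replies` list; a tree that omitted empty lists would come back with `[]` filled in, a small example of a lens normalizing data on the way through.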
6:09
And so the upshot is that there's this nice generalization of these sorts of declarative transformations that is relatively tractable, that doesn't put you fully into super fancy, you know, dependent type theory math stuff, but that still gets you the layer of abstraction up that you want. It comes from the literature on what are called generalized algebraic theories. Roughly, if we want to specify associativity laws, right, all we really need are universal quantifiers.
7:00
We don't need kind of the full power of, say, first-order logic.
7:04
These are kind of the algebraic counterpart to dependent type theories.
7:07
I don't know if this is going to ring true for anyone, but they're basically as abstract as we might want to go, in the sense that they handle a very large variety of schema languages, right?
7:24
So, you know, AT Proto, JSON Schema, SQL, and we can always compile down to the schema itself.
7:39
The basic idea is that these are often called theories, and the lower-level things, the schemas themselves, are thought of as models of these theories.
7:50
So they're basically things that conform to the constraints of the schema language in some sense.
7:59
The schema languages have three parts.
8:03
Basically, they have what are called sorts.
8:05
They have operations, which are basically relations on the sorts, and then they have equations, which basically specify the relevant constraints.
8:12
That then determines kind of what a schema is allowed to look like.
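As a toy illustration of sorts, operations, and equations, the sketch below encodes the theory of reflexive graphs (sorts `V` and `E`; operations `src`, `tgt`: E → V and `refl`: V → E; equations src(refl(v)) = v and tgt(refl(v)) = v, universally quantified over v) and checks whether a finite model satisfies it. The dictionary encoding and the name `satisfies` are assumptions made for illustration; a real generalized algebraic theory also allows sorts that depend on terms, which this sketch does not model.

```python
# A theory: sorts, typed operations, and universally quantified equations.
THEORY = {
    "sorts": {"V", "E"},
    "ops": {"src": ("E", "V"), "tgt": ("E", "V"), "refl": ("V", "E")},
    # Each equation: (sort of the quantified variable, lhs path, rhs path).
    # A path lists operations applied left to right; [] is the identity.
    "equations": [("V", ["refl", "src"], []), ("V", ["refl", "tgt"], [])],
}


def satisfies(theory: dict, model: dict) -> bool:
    """Check a finite model: every op is well-typed and every equation holds."""
    carriers, funcs = model["carriers"], model["funcs"]
    if set(carriers) != theory["sorts"]:
        return False
    # Each operation must send every element of its domain into its codomain.
    for op, (dom, cod) in theory["ops"].items():
        if any(funcs[op].get(x) not in carriers[cod] for x in carriers[dom]):
            return False

    def run(path, x):
        for op in path:
            x = funcs[op][x]
        return x

    # "For all x of this sort" becomes enumeration over the finite carrier.
    return all(run(lhs, x) == run(rhs, x)
               for sort, lhs, rhs in theory["equations"]
               for x in carriers[sort])


good = {
    "carriers": {"V": {1, 2}, "E": {"e", "r1", "r2"}},
    "funcs": {
        "src": {"e": 1, "r1": 1, "r2": 2},
        "tgt": {"e": 2, "r1": 1, "r2": 2},
        "refl": {1: "r1", 2: "r2"},
    },
}
# Same data, but refl(1) is the non-loop edge e, so tgt(refl(1)) = 2, not 1.
bad = {"carriers": good["carriers"],
       "funcs": {**good["funcs"], "refl": {1: "e", 2: "r2"}}}
print(satisfies(THEORY, good), satisfies(THEORY, bad))  # True False
```

Only universal quantifiers appear in the equations, which is what makes checking them a simple enumeration over a finite model.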
8:17
In addition, the schema can be thought of as kind of determining what the data is allowed to look like.
8:23
So you can think of the set of all possible pieces of data as satisfying the relevant constraints of the schema.
8:34
The upshot of that is that you can define mappings between schema theories, like mappings between AT Proto and SQL, and then lower them in a programmatic way to mappings between a schema specified in AT Proto and a schema specified in SQL. Those can then be lowered in very much the same way to mappings between data specified in that AT Proto schema and data specified in that particular SQL schema.
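The last lowering step, from a schema mapping to a data mapping, can be illustrated with the pullback construction from functorial data migration (often written Δ_F): given a mapping F from schema S to schema T that sends each S table to a T table and each S column to a T column, any T instance becomes an S instance by reading each S table and column through F. This is a minimal sketch assuming that construction and a hypothetical toy encoding (`mapping`, `pull_back`, and all table and column names are illustrative); it is not a description of Panproto's internals.

```python
# Schema mapping F: S -> T. S is the target app's narrow schema, T the richer
# source schema. All table and column names are hypothetical.
F = {
    "tables": {"Note": "Post"},
    "columns": {"Note.text": "Post.body"},
}


def pull_back(mapping: dict, t_instance: dict) -> dict:
    """Delta_F: turn a T-instance into an S-instance by reading through F."""
    for s_col, t_col in mapping["columns"].items():
        s_table = s_col.split(".")[0]
        # F must send a column of table X to a column of table F(X).
        if t_col.split(".")[0] != mapping["tables"][s_table]:
            raise ValueError(f"{s_col} -> {t_col} does not respect tables")
    return {
        "rows": {s: set(t_instance["rows"][t]) for s, t in mapping["tables"].items()},
        "cols": {s: dict(t_instance["cols"][t]) for s, t in mapping["columns"].items()},
    }


posts = {
    "rows": {"Post": {"p1", "p2"}},
    "cols": {
        "Post.body": {"p1": "hello", "p2": "world"},
        "Post.lang": {"p1": "en", "p2": "en"},  # not in F's image, so dropped
    },
}
notes = pull_back(F, posts)
print(notes["cols"])  # {'Note.text': {'p1': 'hello', 'p2': 'world'}}
```

Data the mapping never reads (`Post.lang` here) simply disappears, which is exactly the kind of loss that the complement, discussed later in the talk, is meant to account for.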
9:20
Yeah.
9:20
And so that's, I think, like the high-level idea.
9:25
The next relevant thing is that at least some of this can be done in an automated way by basically looking for structural similarities between the schemas.
9:39
That's kind of the basic idea.
9:42
Often that's not gonna get you all the way there, right?
9:45
Like, you will often need to nudge it and say, well, yeah, I know there are a few different ways I could do this data transformation.
9:57
So what I need to do is add a constraint that says, no, you need to do the data transformation in this particular way.
10:05
So there is the ability to kind of like give some semantic information to the transformations that is not kind of implied by structural matching.
10:19
This is done in a sort of ML-like expression language within the particular library that is underpinning some of this.
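The interplay between automatic structural matching and a human nudge can be sketched as follows. The heuristic (match fields by type, prefer an identical name, otherwise require a unique candidate) and the names `match_fields` and `hints` are hypothetical stand-ins; the library's own ML-like expression language for these constraints is not reproduced here.

```python
def match_fields(source: dict, target: dict, hints: dict = None) -> dict:
    """Map each target field to a source field of the same type.

    source/target map field name -> type name. hints map target field ->
    source field and override the structural search. Raises LookupError
    when a match is ambiguous or missing: that is where a human nudges.
    """
    hints = hints or {}
    mapping = {}
    for t_field, t_type in target.items():
        if t_field in hints:
            mapping[t_field] = hints[t_field]
            continue
        candidates = [s for s, s_type in source.items() if s_type == t_type]
        if t_field in candidates:      # same name and same type: take it
            mapping[t_field] = t_field
        elif len(candidates) == 1:     # the only structural fit
            mapping[t_field] = candidates[0]
        else:
            raise LookupError(f"{t_field}: candidates {candidates}, add a hint")
    return mapping


source = {"title": "string", "subtitle": "string", "servings": "integer"}
target = {"name": "string", "yield": "integer"}

try:
    match_fields(source, target)
except LookupError as err:
    print(err)  # name: candidates ['title', 'subtitle'], add a hint

print(match_fields(source, target, hints={"name": "title"}))
```

Structure alone settles `yield` (only one integer field exists), but it cannot choose between `title` and `subtitle`; one explicit constraint resolves the ambiguity without hand-writing the whole mapping.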
10:28
Yeah.
10:30
Okay.
Speaker C
10:31
There you go.
10:32
So the thing that Aaron built is called Panproto.
10:36
It's at panproto.dev.
10:38
And since working on things around Cambria, some of the ideas there, and some of this text transformation stuff, I think this is hugely exciting. Nick's demo is entirely powered by Panproto.
11:04
The text transformation stuff is entirely powered by Panproto.
11:10
All this stuff is defined in terms of lenses.
11:14
And there's a bunch of really nice properties that these lenses have.
11:18
So as we come down to like troglodyte programmer level, one of the big challenges with lexicons is that we spend a lot of time as software engineers discussing what the shape of some JSON thing should be so that we can coordinate.
11:41
And it's a lot, like it's a lot of time.
11:43
It's a lot of energy.
11:44
And, to go back to my, you know, global minima case:
11:48
It actually means that we build less expressive software and we build, you know, we're not using folksonomies and like situated software.
11:58
We're building to these standards defined by standards bodies.
12:04
And I think by having these translation layers, we can just dispense with all of that and say, I'm just going to build my schema.
12:12
So I can diverge.
12:14
And then when I want to reconverge with the network, I just build a lens to reconverge, essentially.
12:24
Now, there's a piece of the lenses that's really interesting.
12:28
When you lens from one record to another, you can hold onto the complement: you can tell which stuff wasn't translated in that conversion, and you can actually compute it automatically by doing a round-trip lensing.
12:45
So, you know, if you go from A to B and then back from B to A, the stuff that's missing, the diff between A and A prime, is called the complement.
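The round-trip recipe for the complement can be sketched with two functions standing in for a lens: `get` projects a record of schema A onto schema B, and `put_back` rebuilds an A record from a B record alone, with nothing stored on the side. The complement is whatever differs between A and A' = put_back(get(A)). The record shapes, field names, and the fill-with-None strategy are hypothetical.

```python
def get(rec_a: dict) -> dict:
    """A -> B: a hypothetical lens that keeps only title and ingredients."""
    return {"title": rec_a["title"], "ingredients": rec_a["ingredients"]}


def put_back(rec_b: dict) -> dict:
    """B -> A using only what B carries: fields B lacks come back as None."""
    return {"title": rec_b["title"], "ingredients": rec_b["ingredients"],
            "calories": None, "notes": None}


def complement(rec_a: dict) -> dict:
    """Fields of A that did not survive the round trip A -> B -> A'."""
    a_prime = put_back(get(rec_a))
    return {k: v for k, v in rec_a.items() if a_prime.get(k) != v}


recipe = {"title": "Panna cotta", "ingredients": ["cream", "sugar", "gelatin"],
          "calories": 450, "notes": "chill overnight"}
print(complement(recipe))  # {'calories': 450, 'notes': 'chill overnight'}
```

A lens that keeps this complement can hand it back to `put_back` on the return trip, which is how a round trip can be made lossless even though B has no room for those fields.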
12:59
And if you minimize the complement, then you know the best path through the network, and these things are composable.
13:10
And so it's inductive.
13:13
It's not necessarily a proof, but like you can actually get quite close to knowing that you have faithfully translated all of this data.
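Composition and path choice can be sketched the same way. Here each lens is modeled only by the set of fields it preserves (a deliberate simplification, since real lenses also rename and restructure), composing lenses intersects those sets, and the best route between two schemas is the one whose composite loses the fewest fields. The graph `LENS_GRAPH`, the schema names, and the field sets are all hypothetical.

```python
# Each edge is a lens between two schemas, modeled as the fields it preserves.
LENS_GRAPH = {
    ("cooklang", "hub"): {"title", "ingredients", "steps", "timers"},
    ("hub", "site"): {"title", "ingredients", "steps"},
    ("cooklang", "wikitext"): {"title", "steps"},
    ("wikitext", "site"): {"title", "steps", "ingredients"},
}


def paths(src, dst, seen=()):
    """Yield every simple path from src to dst through the lens graph."""
    if src == dst:
        yield [dst]
        return
    for a, b in LENS_GRAPH:
        if a == src and b not in seen:
            for rest in paths(b, dst, seen + (src,)):
                yield [src] + rest


def preserved(path, fields):
    """Composition: a field survives only if every lens on the path keeps it."""
    kept = set(fields)
    for a, b in zip(path, path[1:]):
        kept &= LENS_GRAPH[(a, b)]
    return kept


def best_path(src, dst, fields):
    """Pick the path whose composite lens has the smallest complement."""
    return min(paths(src, dst),
               key=lambda p: len(set(fields) - preserved(p, fields)))


fields = {"title", "ingredients", "steps", "timers"}
print(best_path("cooklang", "site", fields))  # ['cooklang', 'hub', 'site']
```

Counting lost fields is only a proxy, in line with the "not necessarily a proof" caveat: it says how much structure survives, not that the surviving values were translated faithfully.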
13:21
And so you can kind of automate a lot of these translation pieces.
13:27
And I think it's just gonna massively reduce the coordination costs that we pay and massively increase the sort of expressivity.
13:37
So I've managed to break a thing that I wanted to show.
13:41
I built a thing called Panna Cotta, which is a recipe site.
13:47
It has 100-some-odd thousand recipes pulled from Wikibooks, from the Cooklang community, and from another online cookbook repository.
14:02
And basically, I built it in the last day and a half.
14:06
It lenses in WikiText, Cooklang, some other formats.
14:12
The renderer only renders one lexicon.
14:16
That is the org.panna cotta lexicon that I've defined based on my needs.
14:23
And then it just lenses in recipes.
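A lens from one of those source formats into a single site schema might start like the sketch below. It handles only Cooklang's ingredient markup (`@name{quantity%unit}`, `@multi word name{}`, or a bare `@word`) and leaves cookware (`#`) and timers (`~`) untouched in the step text. The output record shape and the function name `cooklang_to_record` are hypothetical stand-ins, since the site's actual lexicon fields are not given here.

```python
import re

# Cooklang ingredients: "@name{quantity%unit}", "@multi word name{}", or "@word".
INGREDIENT = re.compile(r"@([^@#~{}]+?)\{([^}]*)\}|@(\w+)")


def cooklang_to_record(title: str, source: str) -> dict:
    """Lens a small subset of Cooklang into a hypothetical recipe record."""
    ingredients, steps = [], []
    for line in source.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        for m in INGREDIENT.finditer(line):
            name = (m.group(1) or m.group(3)).strip()
            qty, _, unit = (m.group(2) or "").partition("%")
            # Quantities stay strings; parsing "1/2" or "2-3" is out of scope.
            ingredients.append({"name": name, "quantity": qty or None,
                                "unit": unit or None})
        # Step text keeps the ingredient names and drops the @{...} markup.
        steps.append(INGREDIENT.sub(lambda m: (m.group(1) or m.group(3)).strip(), line))
    return {"title": title, "ingredients": ingredients, "steps": steps}


src = """
Crack @eggs{3} into a bowl and add @salt.
Whisk in @whole milk{250%ml}.
"""
rec = cooklang_to_record("Scrambled eggs", src)
print(rec["steps"])  # ['Crack eggs into a bowl and add salt.', 'Whisk in whole milk.']
```

Because the renderer only ever sees the one target shape, adding a new source format means writing one more lens like this rather than touching the renderer.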
14:25
And it's honestly the most expressive recipe interface that I have seen.
14:34
And I think there's just so much opportunity here.
14:38
So I think that's everything.
14:42
Peter?
14:43
Yeah.
Speaker A
14:46
So I hope and imagine some people here who fought the standards wars over schemas in the past are like, you cowards.
15:00
You know?
15:00
Like, there is a right schema and we just need to find it.
15:04
And you people are giving up the good fight and just abandoning the field.
15:09
And I think my conclusion— personal opinion here— but in my experience, what I would offer as a maxim is that any schema that is useful to you will not be general enough to apply to anyone else.
15:30
And the converse is also true, which is that any schema that is general enough to apply to everyone will not be useful to you.
15:38
And I think recipes are a great example of this: the kind of schema you need to represent a recipe if you're building a nutritional information site, and you want to do, you know, calorie counting and these kinds of things, is radically different from what you want if you're just trying to transcribe someone's recipe box index cards for sharing with their family.
15:58
And, you know, that's true in every field and in every domain.
16:01
And I'm not saying there isn't value in microformats, like detecting phone numbers and, you know, handles and websites and things.
16:08
But as a generalization at larger scales, I believe this is true.
16:11
And I think the only real path forward is to abandon the project of trying to get everybody to agree on a schema, and to recognize that there is a panoply of schemas out there, and the right schema for your app depends on the things you care about.
16:28
And so it's really about recognizing that.
16:30
And the very exciting thing about this Panproto-style approach, particularly this idea of the complement, is that if you're bringing in data, a static analysis comparing the two schemas will tell you, in a principled way, what data you expect that the schema you're bringing in lacks, so you can then either prompt the user or the system or compensate in other ways.
16:53
And you're not just YOLOing, you know, one JSON into another.
16:56
You can actually say, I want this nutritional information.
17:01
This particular data format doesn't provide it.
17:04
And so I will need to go and augment this data from some other source or throw an error or prompt the user or otherwise deal with it.
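The static check described here, comparing what a source schema provides against what the target expects before touching any data, might look like the sketch below. The schema encoding (field name mapped to a required flag), the field names, the `augment` callback, and the lookup table are hypothetical; they stand in for the options mentioned: augment from another source, prompt the user, or throw an error.

```python
from typing import Callable, Optional


def missing_fields(source_schema: dict, target_schema: dict) -> set:
    """Required target fields the source schema cannot supply, found without any data."""
    return {f for f, required in target_schema.items()
            if required and f not in source_schema}


def import_record(record: dict, source_schema: dict, target_schema: dict,
                  augment: Optional[Callable[[str, dict], object]] = None) -> dict:
    """Copy shared fields; fill each gap via `augment`, or raise if there is none."""
    out = {f: record[f] for f in target_schema if f in source_schema and f in record}
    for name in sorted(missing_fields(source_schema, target_schema)):
        if augment is None:
            raise ValueError(f"source schema lacks required field {name!r}")
        out[name] = augment(name, record)  # e.g. another data source, or a user prompt
    return out


# Schemas map field name -> required? (hypothetical encoding)
cooklang_schema = {"title": True, "ingredients": True}
nutrition_schema = {"title": True, "ingredients": True, "calories": True}
print(missing_fields(cooklang_schema, nutrition_schema))  # {'calories'}

# Hypothetical lookup table standing in for "some other source".
CALORIE_TABLE = {"Panna cotta": 450}
record = {"title": "Panna cotta", "ingredients": ["cream", "sugar", "gelatin"]}
filled = import_record(record, cooklang_schema, nutrition_schema,
                       augment=lambda name, rec: CALORIE_TABLE[rec["title"]])
```

The gap is known from the schemas alone, before any record arrives, so the choice between augmenting, prompting, and failing can be made once per source rather than discovered per record.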
17:10
And I think that that is actually incredibly exciting and builds on the work that we'd done in our earlier kind of like real-time editor space.
17:17
So I'm very excited about this and I think it's the right path forward.
Speaker C
17:23
Actually, just to build on that, I think there's a really important piece here.
17:29
This doesn't obviate the need for consensus building, right?
17:37
It just moves it from arguing about the shape of some JSON to collaborating and figuring out what's the thing at the social level that we're actually trying to build.
17:49
So yeah.
Speaker A
17:51
My comment is exactly on that.
Speaker C
17:55
Regardless of how we got to this point, right now the minimum amount of content needed to be displayed in the Atmosphere for most consumers is a created-at date.
18:06
And a text field.
18:08
So this means that all of the other types of content that exist, whether it's markdown—
Speaker A
18:14
God forbid— faceted text, rich text, composed chunks of blocks of content across any type of lexicon, whether it's an event or a recipe or something, all of that now can be distilled into a universal view.
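The "minimum viable view" point can be made concrete with a sketch: a handful of per-type lenses that distill records into just a created-at timestamp and a text field. `app.bsky.feed.post` is a real lexicon whose records carry `text` and `createdAt`; the other type names and their fields are hypothetical placeholders, as are `VIEW_LENSES` and `universal_view`.

```python
from typing import Callable

# Per-type lenses into the universal view. Only app.bsky.feed.post is a real
# lexicon here; the other type names and fields are hypothetical.
VIEW_LENSES: dict[str, Callable[[dict], str]] = {
    "app.bsky.feed.post": lambda r: r["text"],
    "example.recipe": lambda r: f"{r['title']}: " + ", ".join(r["ingredients"]),
    "example.event": lambda r: f"{r['name']} at {r['startsAt']}",
}


def universal_view(record: dict) -> dict:
    """Distill any record type that has a lens into {createdAt, text}."""
    to_text = VIEW_LENSES.get(record["$type"])
    if to_text is None:
        raise KeyError(f"no lens for {record['$type']}")
    return {"createdAt": record["createdAt"], "text": to_text(record)}


recipe = {"$type": "example.recipe", "createdAt": "2025-01-01T00:00:00Z",
          "title": "Panna cotta", "ingredients": ["cream", "sugar", "gelatin"]}
print(universal_view(recipe)["text"])  # Panna cotta: cream, sugar, gelatin
```

A client that only knows the two-field view can then show recipes, events, and posts side by side, while apps that understand a richer schema keep reading the full record.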
18:31
And even outside of lexicons, this applies to RSS and Atom feeds and other types of structured data that can be decomposed, you know, with libraries like this, specifically Panproto.
18:46
And it allows us to think more universally about that view, that interface and shape that we all want to present to the user, which is more user-focused and, I'm gonna say, more human-focused, because we're really trying to convey meaningful information.
19:02
It also means that agentic tooling can take, you know, some of this content
19:08
and adapt it in whatever way you feel most comfortable with, or whatever is determined to be best or most optimal for an agent to start using.
19:20
That Lexicon Garden transmogrify function is also available agentically.
Speaker C
19:24
So now within tooling, you can, you know, discover lexicons through XRPC calls.
19:30
You can create new ones, you can convert dynamic content and create examples.
19:36
And you can even transcode, transcribe, whatever trans word we'd like to use to migrate this content into different views and then eventually write that.
19:47
And I think that's really powerful.
19:49
Something that Brian mentioned a while ago was this concept of an internal