Aaron Steven White 0:01 All right. 0:02 Hi, I'm so happy to be here. 0:04 As Maria was saying, this is so energizing, and, I don't know, the vibe is just great. 0:11 So I am Aaron White. 0:14 I'm an associate professor at the University of Rochester. 0:18 I am in the Department of Linguistics. 0:20 I also have a cross-appointment in the Department of Computer Science. 0:27 So I'm actually coming from a somewhat similar background to Maria, in the sense that I work in computational linguistics and natural language processing, with a main focus on extracting structured information from large document collections and doing things like summarization on top of it. 0:49 So what I'm going to talk about today is Chive, which is a platform for decentralized e-prints on atproto. 0:59 And by platform here, I mean a set of lexicons and an AppView, of course. 1:04 So, data sovereignty and discoverability have historically been at odds for scholarly work. 1:13 Everyone is used to this in the context of traditional journal- or conference-based publishing, where we frequently give up our rights to control our scholarly products in perpetuity for the sake of reaching a wider audience and gaining endorsement based on our peers' expert reviews. 1:31 And those expert reviews are really, really important. 1:35 We need those kinds of endorsement mechanisms. 1:39 But the question is whether we really need to give up data sovereignty to get them. 1:43 The same is true of more modern methods of sharing work: I can put my papers, data, code, etc. 1:55 up on my website, over which I have full control, right? 1:59 But the only way people are going to discover them is if they know who I am and, furthermore, think to go to my website to check up on what I've been up to.
2:09 So often we turn to platforms like arXiv, or to more bespoke field-specific repositories that are nonetheless still not really owned by us, to enable discovery. 2:20 So I think atproto really helps us resolve this tension. 2:24 Sorry, this is giving a little bit of feedback. 2:27 It does this by enabling discoverability without the loss of data sovereignty. 2:32 And so I think what we can do is retain the benefits of the full control we have over our scholarly websites, in the form of our PDSs, while also supporting the kind of rich metadata and interop that atproto enables. 2:52 And so I think the real challenge here is designing a metadata system that's flexible enough for arbitrary scholarly products. 3:03 The tension I think we have to resolve is that we want rich metadata to support discoverability, but the metadata needs to be generic enough that any scholarly field could use it, right? 3:16 As a computational linguist, I need to be able to use this, but a cell biologist also needs to be able to use this. 3:24 And part and parcel of this kind of genericity is that the metadata system needs to be able to evolve as fields split and merge, and as new entities and concepts are discovered in disparate fields, right? 3:38 So even if we could fully engineer a system for every kind of existing field, which I actually don't think is possible or the right way to go, it's still going to go stale almost immediately. 3:55 And so this is the tension that Chive aims to resolve. 4:03 The idea is to develop a combination of lexicons and an AppView that supports discoverability while also being extensible through a community governance system. 4:15 The backbone of this approach is a self-describing knowledge graph. 4:21 I'm not going to talk about the self-description aspects of the knowledge graph too much, but I'm happy to if people are interested.
4:31 This self-describing knowledge graph itself lives in a community-editable PDS, and the AppView makes it straightforward to clone it into your own PDS. 4:42 You can clone pieces of the graph via the AppView into your PDS, or the entire graph if you want. 4:50 I'm not going to show the actual cloning feature. 4:53 I can if people are interested, in the question period or after. 4:59 But there are various ways of navigating neighborhoods in the graph, picking radii around nodes, doing cloning, and things like that. 5:11 So the graph has nodes both for every metadata value that the system works with and, this is the important part, for every metadata field that takes on those values. 5:26 This is supported by the self-description aspects of the graph. 5:31 What this means in practice is that the community can modify the set of values that some field takes on. 5:39 For instance, one use case is deleting, merging, or splitting academic fields and then connecting them to each other via various kinds of relations. 5:53 But the community can also expand the dimensions along which scholarly products are themselves categorized. 6:00 Like I said, this is all done through a Wikipedia-style governance system that allows for node proposals. 6:10 I'm showing you a node proposal here; these can introduce arbitrary node types. 6:15 It also allows for edge proposals, which link existing nodes through typed edges. 6:22 The types for the edges, like everything in Chive, are themselves nodes. 6:28 So those are also expandable. 6:30 The whole system is set up in such a way that there are, I think, maybe three hard-coded enums. 6:40 Anything that you would normally have as an enum, like your little dropdowns listing the edge types or the node types, 6:48 is a node in the graph, and those are all community-extensible.
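To make the idea of a graph in which field types and edge types are themselves nodes a little more concrete, here is a minimal TypeScript sketch. It is an illustration under stated assumptions, not Chive's actual lexicons or AppView code: the `GraphNode` and `GraphEdge` shapes, the `KnowledgeGraph` class, and ids such as `type:field` and `rel:narrower` are all hypothetical, and plain strings stand in for the AT URIs a real PDS would use. The `neighborhood` method shows one plausible reading of "picking radii around nodes" for partial cloning: a breadth-first walk that collects every node within a given number of hops.

```typescript
// Hypothetical record shapes for illustration; NOT Chive's real schemas.
interface GraphNode {
  id: string;    // plain string here; an AT URI in a real PDS
  label: string;
  kind: string;  // id of another node describing this node's type
}

interface GraphEdge {
  source: string;
  target: string;
  relation: string; // id of a node describing the edge type
}

class KnowledgeGraph {
  private nodes = new Map<string, GraphNode>();
  private edges: GraphEdge[] = [];

  addNode(node: GraphNode): void {
    this.nodes.set(node.id, node);
  }

  addEdge(edge: GraphEdge): void {
    // Edge types are nodes too, so an edge must reference three known nodes.
    for (const id of [edge.source, edge.target, edge.relation]) {
      if (!this.nodes.has(id)) throw new Error(`unknown node: ${id}`);
    }
    this.edges.push(edge);
  }

  // Targets reachable from `source` through edges of one relation type.
  related(source: string, relation: string): string[] {
    return this.edges
      .filter((e) => e.source === source && e.relation === relation)
      .map((e) => e.target);
  }

  // Breadth-first neighborhood of `start` within `radius` hops,
  // ignoring edge direction (hypothetical basis for partial cloning).
  neighborhood(start: string, radius: number): Set<string> {
    const seen = new Set<string>([start]);
    let frontier = [start];
    for (let hop = 0; hop < radius; hop++) {
      const next: string[] = [];
      for (const id of frontier) {
        for (const e of this.edges) {
          const other =
            e.source === id ? e.target : e.target === id ? e.source : undefined;
          if (other !== undefined && !seen.has(other)) {
            seen.add(other);
            next.push(other);
          }
        }
      }
      frontier = next;
    }
    return seen;
  }
}

// Bootstrapping: the "node type" node describes itself.
const g = new KnowledgeGraph();
g.addNode({ id: "type:node-type", label: "Node type", kind: "type:node-type" });
g.addNode({ id: "type:field", label: "Academic field", kind: "type:node-type" });
g.addNode({ id: "type:edge-type", label: "Edge type", kind: "type:node-type" });
g.addNode({ id: "rel:narrower", label: "has narrower field", kind: "type:edge-type" });
g.addNode({ id: "field:linguistics", label: "Linguistics", kind: "type:field" });
g.addNode({ id: "field:complin", label: "Computational linguistics", kind: "type:field" });
g.addNode({ id: "field:syntax", label: "Syntax", kind: "type:field" });
g.addEdge({ source: "field:linguistics", target: "field:complin", relation: "rel:narrower" });
g.addEdge({ source: "field:linguistics", target: "field:syntax", relation: "rel:narrower" });
```

Because `rel:narrower` is just another node, adding a new relation type in this sketch is an ordinary node addition rather than a schema change, which is the property the talk emphasizes.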
6:55 This extensible knowledge graph is related to scholarly products in two main ways. 7:03 This is how we support discoverability. 7:07 One is rich text and the other is annotations. 7:11 Rich text is basically an enriched version of the rich text you're familiar with from things like Bluesky. 7:18 In addition to things like user mentions and hashtags and all the things you're used to, it can refer to knowledge graph nodes such as fields, but also entities, events, organizations, and all kinds of other things. 7:36 This is going to become relevant later when I talk about the collections feature. 7:42 It also supports arbitrary Markdown, including code blocks with the standard set of syntax highlighters, and inline and display LaTeX math. 7:57 Rich text is available in most places that take textual input, including reviews and even titles. 8:04 I'm not really going to talk much about the review system here, but this shows you what it looks like. 8:12 Reviews are fully threaded, in the standard way that you would want them to be threaded. 8:17 And like I said, they support rich text, so you can actually do discovery on reviews and on what reviews are referring to, etc. 8:28 Right, so that's rich text. 8:29 Annotations are the other way that we connect a product to the knowledge graph: they link parts of that product, in this case just text spans, to a comment, which can itself contain rich text. 8:46 That's the thing on the left here. 8:48 Or they link a span to a node in the knowledge graph, or actually to various kinds of external knowledge graphs like Wikidata. 8:57 This is showing a case where we're linking to Wikidata. 9:03 And a variety of other knowledge graphs are supported for various parts of the system.
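Bluesky's rich text attaches "facets" to byte ranges of a string: each facet has an `index` with `byteStart` (inclusive) and `byteEnd` (exclusive), measured in UTF-8 bytes, plus a list of features such as `app.bsky.richtext.facet#mention` or `app.bsky.richtext.facet#tag`. The sketch below assumes Chive's knowledge-graph references work along similar lines; the `example.richtext.facet#node` feature type, the `facetFor` helper, and the `node:` id are hypothetical placeholders, not Chive's real schema. It shows the step that is easiest to get wrong: converting a character position in a JavaScript string into UTF-8 byte offsets.

```typescript
// UTF-8 byte length of a JS (UTF-16) string, computed per code point.
function utf8Length(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const cp = s.codePointAt(i)!;
    if (cp > 0xffff) i++; // astral code point: skip its low surrogate
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

interface ByteSlice {
  byteStart: number; // inclusive
  byteEnd: number;   // exclusive
}

// Mention and tag mirror Bluesky's facet features; the node feature is hypothetical.
type Feature =
  | { $type: "app.bsky.richtext.facet#mention"; did: string }
  | { $type: "app.bsky.richtext.facet#tag"; tag: string }
  | { $type: "example.richtext.facet#node"; node: string };

interface Facet {
  index: ByteSlice;
  features: Feature[];
}

// Build a facet covering the first occurrence of `substring` in `text`.
function facetFor(text: string, substring: string, feature: Feature): Facet {
  const charStart = text.indexOf(substring);
  if (charStart < 0) throw new Error(`substring not found: ${substring}`);
  const byteStart = utf8Length(text.slice(0, charStart));
  return {
    index: { byteStart, byteEnd: byteStart + utf8Length(substring) },
    features: [feature],
  };
}

// A title with a non-ASCII character, so byte and character offsets differ.
const title = "Na\u00efve models of clause embedding";
const facet = facetFor(title, "clause embedding", {
  $type: "example.richtext.facet#node",
  node: "node:clause-embedding", // hypothetical knowledge graph node id
});
```

Here "clause embedding" starts at character 16 but byte 17, because the "ï" takes two bytes in UTF-8. The same byte-offset convention would be one reasonable way to anchor annotation spans too, though the talk doesn't say how Chive encodes them.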
9:10 For instance, for organizations, we support linking to ROR, among other things. 9:19 I'm currently also thinking about ways to integrate richer annotation structures from the Layers lexicons. 9:25 So stay tuned for that. 9:26 And I'd love to talk about Layers with anyone who's interested in linguistic annotation. 9:32 There are docs up, and you can follow Layers at layers.pub. 9:38 There are a few ways that we actually use these links to support discovery. 9:44 You can think of all of these as different ways of querying the knowledge graph and its links out to papers. 9:51 The most basic case is that the AppView automatically populates a trending page with the fields you publish in. 9:58 This trending page is computed directly from the fields associated with the researcher's papers. 10:03 Okay, I will finish. 10:05 It also shows a separate set of followed fields that can be added in the profile settings. 10:12 So yeah, you can also follow fields. 10:21 That's kind of the idea. 10:23 A more controlled way to use the knowledge graph is faceted search. 10:27 This allows you to pick selected values from a set of facets. 10:31 These facets are themselves all nodes. 10:33 They can be extended by the community. 10:35 You can say exactly what values they take on by hooking the facets up to those values via edges. 10:45 And the final way is via the collections feature, which we're working on some Semble interop for. 10:53 Basically, this allows you to put arbitrary knowledge graph nodes (authors, institutions, entities, events, etc.) into a collection and then follow all of the activity related to those nodes. 11:06 For instance, if you've got a bunch of authors in there, you'll see all of the papers that those authors produced.
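Here is a minimal sketch of two of the queries described above: ranking the fields a researcher publishes in by how many of their papers carry each field, and filtering papers by facet selections. The `Paper` shape, the `facet:field` id, the `did:example` identifiers, and the matching rule (a paper must have at least one selected value in every facet that has a selection, the usual AND-across-facets, OR-within-a-facet convention) are all assumptions for illustration; the talk doesn't specify how Chive's AppView computes either query, and a real trending page may also weight by recency.

```typescript
// Hypothetical, simplified paper record; NOT Chive's actual schema.
interface Paper {
  uri: string;
  authors: string[];                // author DIDs
  facets: Record<string, string[]>; // facet node id -> value node ids
}

// Rank the fields attached to a researcher's papers by paper count.
function fieldsForResearcher(papers: Paper[], did: string): string[] {
  const counts = new Map<string, number>();
  for (const paper of papers) {
    if (!paper.authors.includes(did)) continue;
    for (const field of paper.facets["facet:field"] ?? []) {
      counts.set(field, (counts.get(field) ?? 0) + 1);
    }
  }
  // Most frequent first; ties broken alphabetically for a deterministic order.
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([field]) => field);
}

// Assumed semantics: AND across facets, OR within a facet; empty selections are ignored.
function facetedSearch(papers: Paper[], selection: Record<string, string[]>): Paper[] {
  return papers.filter((paper) =>
    Object.entries(selection).every(([facet, wanted]) => {
      const have = paper.facets[facet] ?? [];
      return wanted.length === 0 || wanted.some((value) => have.includes(value));
    }),
  );
}

const papers: Paper[] = [
  {
    uri: "paper:1",
    authors: ["did:example:alice"],
    facets: { "facet:field": ["field:complin"], "facet:method": ["method:corpus"] },
  },
  {
    uri: "paper:2",
    authors: ["did:example:alice", "did:example:bob"],
    facets: { "facet:field": ["field:complin", "field:syntax"], "facet:method": ["method:experiment"] },
  },
  {
    uri: "paper:3",
    authors: ["did:example:bob"],
    facets: { "facet:field": ["field:syntax"], "facet:method": ["method:corpus"] },
  },
];
```

With this toy data, `fieldsForResearcher(papers, "did:example:alice")` ranks `field:complin` (two papers) ahead of `field:syntax` (one), and selecting `field:syntax` together with `method:corpus` matches only `paper:3`. Because facets are ordinary graph nodes in Chive, a community-added facet would simply show up as a new key in a record like `facets` here.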
11:11 If you've got institutions, you'll see all of the papers produced by any author associated with that institution in the knowledge graph. 11:19 If you have entities in there, like some concept from Wikidata that we pulled into the knowledge graph, you'll see anything that mentions it, including reviews. 11:34 So reviews show up in these feeds alongside papers, etc. 11:39 Yeah, so I will finish up with that. 11:44 Chive is currently in open alpha, so you can actually go to staging.chive.pub and that will let you in. 11:58 So the open alpha is on staging at the moment. 12:02 The actual chive.pub site is still closed. 12:05 There's still an alpha gate on it, but I'm just working out a few little kinks, mainly around integrated bug reporting and things like that, and then it will be fully open. 12:18 But yeah, please try it out. 12:20 Thanks. 12:21 One question. 12:22 We have time for one question. 12:24 Yeah, so I'll ask a question. 12:27 I was curious whether everything you showed is on Lexicon, even the annotations, for example? 12:43 Are they Lexicon? 12:44 Yeah, everything's a Lexicon. 12:45 And all of it is open source. 12:47 So you can see how the lexicons are structured. 12:51 All the XRPC methods are up there. 12:53 There's a massive number of them, though I have stripped some of them out now that the alpha gate is coming down. 13:00 But yeah, it's all up there. 13:01 Amazing. 13:02 Yeah, interop, definitely, because there are more annotation lexicons out there. 13:05 So yeah, we'll talk about that later. 13:07 Cool. 13:07 Cool. 13:08 Thank you very much, Aaron. 13:10 Thanks.