Speaker B 0:01 So yeah, while Matt and Rowan are setting up here, I just want to introduce them because, yeah, it's not every day that you get to see people visiting from the future, but you're about to see that happen right now. 0:14 We're very fortunate to be visited by some researchers who are coming to tell us about the future of science publishing and communication. 0:21 It's a special day today. 0:23 Rowan is the CEO and founder of Curvenote, 0:27 where he leads the development of tools designed to free scientific communication from PDFs. 0:33 Rowan is also the founder of the Continuous Science Foundation, a nonprofit organization for the next generation of science communication. 0:41 CSF is developing standards to support a more continuous and composable model of science. 0:46 Rowan's also visiting his alma mater today. 0:48 He has a PhD from UBC in computational geophysics, 0:53 and has received multiple awards for innovative research dissemination and open education resources, including Visible Geology, an interactive geoscience modeling application used by over 1 million students globally. 1:05 Matt Akamatsu is an assistant professor of biology at the University of Washington in Seattle. 1:10 He's also the co-founder of the Discourse Graphs Project, supported by the Chan Zuckerberg Initiative and Navigation Fund. 1:18 They're creating systems and software tools for modular research in the age of AI. 1:23 Matt is a recipient of the 2020 Porter Prize for Research Excellence from the American Society for Cell Biology, the K99 Pathway to Independence Award, and the Experimental Foundations Beyond the Journal Award. 1:36 Yeah, Matt received his PhD in biophysics at Yale, did a postdoc at Berkeley, and started his lab at UW. 1:44 Take it away, guys. Rowan Cockett 1:48 Excellent. 1:49 Thank you so much. 
1:49 I'm, like, absolutely thrilled and excited to talk to you today about the movement towards modular open science, and some of the work that we're doing at Continuous Science Foundation and Curvenote towards these goals. 2:05 And so today, science is trapped in this sort of paper-shaped box. 2:09 We're communicating our discoveries using print-era concepts that constrain the way that we communicate, evaluate, and ultimately build towards scientific knowledge and progress. 2:22 But the science that's being produced has changed so massively with larger datasets, larger-scale collaborations, continuous sharing of computational notebooks and graphs. 2:34 The way that we communicate science has really not kept up with that change. 2:40 Our mission at the Continuous Science Foundation is to help move the open science movement from the sole focus on reading and access, the open access movement, to a future that's more focused on utility and reuse, that's built on these same foundations of openness, and designed for modern discovery patterns, where the code, data, protocols, methods are available, they're interconnected, they're modular, they can be composed together through remixing and importing somebody else's work. 3:18 And that really is the ethos of science: building upon other people's work, standing on the shoulders of giants. 3:28 There's an illusion published by Louis Albert Necker in 1832. 3:33 He was a crystallographer. 3:34 And this is the Necker cube. 3:36 And it's 12 lines on a page. 3:39 But if you stare at the bottom, the left side sort of pops out and it changes and it flips orientation. 3:45 If you look at the top, again, there's this perceptual reversal. 3:49 It has the same underlying data. 3:52 But different interpretations, different stories that you can tell about fitting those data points together. 4:00 And this is the way that I like to think about modular science. 
4:04 We have these Lego bricks, these modules of science, the components, the figures, the equations, the datasets, the scripts, the visualization routines, those protocols, questions, claims, evidence, and they're in these different spaces. 4:18 They can live on the floor, scattered. 4:21 They can be sorted into buckets or assembled on a shelf as sort of a published paper. 4:29 And so the assembly into a paper, it tells a story. 4:33 It's that narrative that introduces a new concept. 4:36 It changes minds. 4:38 The components of scientific discourse are bundled together into that narrative, that traditional way of publishing, this composed artifact, that paper, that story. 4:50 But our scientific norms, our sharing infrastructure, and the standards today, they demand that you glue that trophy together when you share it. 5:01 That 8-page paper with its 4 paneled figures, a zip file supplement, it's all fused together. 5:09 And if you want to reuse a figure or a dataset, you have to sort of chisel out that figure. 5:14 You have to take a screenshot of it, copy that dataset, and as soon as you take it out of that piece of paper, that figure gets amnesia. 5:22 It forgets all about its attribution, its trust, its licensing. 5:27 It loses its context. 5:28 It loses that license and that credit behind it. 5:32 All of that rich metadata that our scientific commons are designed to support. 5:38 And so these were really the challenges that prompted the first wave of modular science. 5:44 Many people recognized that it's not enough to share an advertisement of your experiment. 5:49 You should share the data and the methods and the code underlying it. 5:55 And so the prescription in 2010 for modular science was pretty simple. 6:01 You just break it into pieces and you put it into buckets. 6:05 And you put it in the appropriate repository for that component. 6:10 So on protocols.io, you share your protocols. 
6:12 Figures go to Figshare. 6:14 Software Heritage is the place that you share your code. 6:17 A specialized data repository for genomics or microscopy data. 6:21 And Zenodo can sort of take up all of the rest. 6:24 And so these are the buckets of modular science. 6:28 The digital repository is for archiving, tagging, and organizing your data, protocols, code, and figures. 6:36 And they really are actually excellent buckets. 6:39 They exist fit for purpose for science. 6:42 But they're largely designed for after the work is done, when you're starting to organize it, package it up, and put it in these places. 6:50 So tidying up your workspace, putting the blocks away. 6:54 And pointing to them from your glued-together trophy on a shelf. 7:00 And they're structured, they're organized, they strive to be these sort of tidy, curated spaces. 7:06 You fill out your intake form, specify your authors, license, description, your ORCID, your ROR, your funders, all of these references, and it's really a very high bar to share work. 7:17 And if you have some visualization code or a scratch script that you really know is the secret of how your work hangs together, the bar for sharing that is so high. 7:33 So all of those small things sort of get lost. 7:36 They don't fit together on the shelf with that glued-together trophy, and they don't really amount to enough to justify the investment to formally request bucket status. 7:48 And it's the aggregation of all of these small components, the workflows and scripts of science fitting together, 7:56 that's where the tacit knowledge of science lives: all of those workflow scripts, of saying, yeah, I've seen that before. 8:04 I'll grab you the notebook or I'll grab you that script, or I've seen that last year and that's a dead end. 
8:10 And so these buckets are both that very high bar, and they're not fully aligned with science and with research, which is often much less structured than the scientific method laid out on a piece of paper step by step. 8:27 That's how it works. 8:28 It's much more fluid, much more serendipitous. 8:31 You need to access all of the pieces to build new things, and you sort of dump the buckets out onto the floor, and this is the space to build. 8:40 To experiment, to support that serendipity of science. 8:45 Maybe the equivalent of leaving that petri dish out and Fleming discovering penicillin in 1928. 8:53 Or maybe you have slightly better lab hygiene protocols and setups and software so that you don't step on the pieces of Lego on the floor, but have that goal to access those pieces readily. 9:08 And build them and put them together so that you can pull data from someone's lab in Tokyo, pull a protocol from Germany, and remix them with your ideas to build something new, to learn something new, to discover something new. 9:23 And so these are the 3 states of scientific components and modularity. 9:30 They're on the floor, that chaos, that serendipity, that remixing. 9:33 In the buckets, organized by type: protocol and data repositories. 9:37 That's that first wave of modular science that was really tackling that problem. 9:42 Or they're on the shelf. 9:43 That composed artifact. 9:45 That paper. 9:46 That story. 9:47 And all of these are right. 9:50 They're correct. 9:51 They're good ways that are fit for purpose for that task. 9:55 And they're the different sides of that Necker cube. 9:58 That illusion that flips and pops in and out for the right time that you need it. 10:04 For teaching or learning or discovering or building. 10:10 And so I think this is what the next wave of modular science is about. 
10:17 It's not just breaking it apart to share, which is somewhat unnatural, especially if you have these smaller artifacts that don't fit on a shelf behind that trophy. 10:30 It's thinking about those aggregate components that are the connective space of science. 10:36 And sharing those— there's such a high bar for sharing those in this first wave. 10:43 And it ends in this disconnection, this constellation of research artifacts where you have to tell a fable to imagine how it's all connected together instead of actually connecting it together. 10:58 And the second wave of modular science is about improving that packaging experience, working with the researcher as they're on the floor, as they're building. 11:08 That is where the modularity emerges. 11:11 You don't write a manuscript top to bottom, 10,000 unbroken words, one after another, with perfect figures. 11:17 You iterate on the figures. 11:19 You iterate one panel at a time, one script at a time. 11:22 Asking for feedback in this modular, composable way with collaborators. 11:28 And building up from that modular work. 11:31 And it's a connected graph of discourse or computational narratives supported by modern tooling like Obsidian or MyST Markdown that can define the context of that data and how it travels together. 11:44 And the result of all of this is this multiplicative, additive, compounding effect of how these things come together. 11:54 And so I think it's also not worth losing sight of the end goal of this, which is actually to tell some of these new stories. 12:04 These compositions are the point of science. 12:07 And maybe we should aspirationally have these be less like trophies on a shelf behind glass, but shared a little bit more continuously, a little bit earlier. 
12:17 And here, there's this Lego teapot that is sort of crying out from the shelf, of how teapots are associated with the Lego bricks on the shelf, that you actually want to pull these out and put them together and try something new. 12:32 And so that is a very different thought of modular science than this first wave. 12:40 So instead of thinking about the figures off on their own, we're thinking about the structure of these artifacts inside. 12:47 So not import figure, but import figure from a paper. 12:51 And that can compose all the way down so that you can get inside of these artifacts and bring modularity with you as it's traveling throughout the ecosystem. 13:05 And so I just want to give a flavor of the syntax of how we're thinking about modular science in this second wave. 13:15 And it really does— I think the Lego metaphor works well here because it does have these specific ways that science fits together. 13:23 This is the schemas and the standards and the pieces behind research, of how they click together, that you can make very different things. 13:31 And there isn't glue here. 13:33 You're also able to take them apart and put them into different compositions. 13:41 And that is the goal of a project out of the Continuous Science Foundation called the Open Exchange Architecture, or OXA. 13:49 These are new standards for modular and composable scientific content. 13:54 And you can learn more about that at oxa.dev. 13:58 And this is a large project that's bringing together a lot of existing players in the scientific ecosystem. 14:06 All the way from authoring tools like Quarto or MyST Markdown that people might have started to use, to publishers like eLife for traditional manuscripts or openRxiv for preprints, as well as licensing organizations like Creative Commons, who are involved in bringing all of these together into a modern, computationally complete, modular, and composable format that can actually support our scientific narratives. 
14:34 And so, some of the principles that we have for OXA: of course, it's open. 14:40 It's a JSON schema-based representation. 14:43 It's a tree-based representation of the scientific content, specifically. 14:48 And so, this is anything from the paragraphs, citations, figures, equations, or code. 14:53 It is extensible to these sort of new types of data, and composable, so that you can bring one person's work and compose it inside of yours. 15:01 Typed, very similar to the atproto schemas. 15:06 And thinking about that interoperability and modularity. 15:09 And throughout all of this, really thinking about computation as a first-class citizen within the schema itself. 15:19 It's also going through— we're thinking at this stage a lot about the governance and how we're actually building this up. 15:26 And so there's an RFC process that it actually goes through, and we're beginning this movement, I think, in the right way from that perspective. 15:34 So that we're actually building this up and being fit for purpose all the way through. 15:39 And it also, of course, has an atproto representation. 15:44 So you can transfer out of this schema— sorry, tree-based schema, which is the native representation 15:53 of OXA, to something that's a little closer to what atproto works with, which is a facet-based piece where you can actually have some of those annotations, which might be a citation or an additional abbreviation or something that is deeply nested inside of the schema, but can be distributed from sort of a social perspective. 16:14 And so there's a lot of thought going in here, and this is definitely early stages. 16:19 And so we're inviting a lot of people to come be involved in the definition of the schema as it starts to exist. 16:28 And I want to just give a couple different flavors, with my Curvenote hat on, about some of the case studies that we're bringing to bear with these ideas. 
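To make the idea concrete, here is a minimal sketch of what a tree-based, typed JSON representation of scientific content could look like. The node type names, fields, and DOI are illustrative assumptions for this sketch, not the published OXA schema (see oxa.dev for the actual standards):

```python
import json

# Hypothetical sketch of a tree-based, typed content schema in the
# spirit of OXA. The "type" names and fields here are illustrative
# assumptions, not the real schema -- see oxa.dev.
article = {
    "type": "article",
    "children": [
        {
            "type": "paragraph",
            "children": [
                {"type": "text", "value": "As shown in "},
                # A citation is a typed node nested inside the tree, so
                # it keeps its identity when the tree is recomposed.
                {"type": "cite", "doi": "10.0000/example.doi"},
                {"type": "text", "value": ", the spacing is consistent."},
            ],
        },
        {
            # Composability: a figure imported from another work carries
            # its source, license, and credit along with it.
            "type": "figure",
            "source": {"doi": "10.0000/example.doi", "id": "fig-3"},
            "license": "CC-BY-4.0",
        },
    ],
}

def nodes_of_type(node, node_type):
    """Walk the tree and collect every node of a given type."""
    found = [node] if node.get("type") == node_type else []
    for child in node.get("children", []):
        found.extend(nodes_of_type(child, node_type))
    return found

# The tree serializes to plain JSON, and any consumer can pull out the
# typed pieces -- figures, citations -- without screen-scraping a PDF.
print(len(nodes_of_type(article, "cite")))
print(nodes_of_type(article, "figure")[0]["license"])
```

Because every component is a typed node rather than flattened text, a figure extracted from this tree never "gets amnesia": its license and source travel with it when it is composed into someone else's document.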
16:38 And so Elemental Microscopy is a journal that we're working with that's trying to do something a little bit more exciting than just a picture on a page with a blowout of that 1.5 terabytes of data: you should actually be able, similar to Google Maps, to just zoom into it directly as you're reading. 17:02 We have all of these sorts of visuals out there already. 17:06 Scientists are using them in their day-to-day work. 17:08 So we should be able to compose those into the ways that we are sharing 17:12 science, and this is speaking to that modularity as well. 17:16 That 1.5 terabytes doesn't live inside of the paper. 17:20 It does live in its appropriate bucket, but it can be composed into the way that we share and read and communicate science. 17:28 So thinking about not a constellation of research objects where we're sort of painting the stars, but actually showing you how it all fits together. 17:41 One of the other projects that Curvenote is working on is a partnership with openRxiv. 17:48 openRxiv is the parent organization around bioRxiv and medRxiv, which are the largest biomedical preprint servers. 17:56 And we are currently reprocessing 8.1 terabytes of articles to this new format. 18:04 And so this is something that's ongoing right now. 18:06 There's about half a million preprints that are coming together. 18:10 And this is going to support more exciting reading experiences where you can hover over a figure and bring that in. 18:18 You can dive into that right away. 18:20 And so this is within the context of a single article, that you can bring that modular piece to bear in context and bring that forward in your way of working. 18:33 But that also should be able to exist across research articles. 18:41 And so some research articles, if you hover over them, are closed access and they're not that exciting. 
18:47 But if they're open access or a preprint, you should be able to bring the figure from that to bear right as you're reading. 18:55 And so really thinking again about this modular, composable way of working that really can improve the reading experience of a researcher. 19:04 Like, if you have a reference that says, "See Figure 3 in Someone et al.," you have to scroll to the bottom of the page, download the PDF, jump through a paywall, bring it up in your reference manager, scroll to page 50, and actually get the figure there. 19:20 And by the time you've got there, you've lost all of your context, and machines often can't follow all of that work either. 19:30 And so this, I think, is thinking about that next wave of modular science, both from the user experience standpoint, the schemas that are supporting it, as well as the ways that the community can get involved throughout. 19:46 And so not thinking about trophies on a shelf that are glued together, but pieces that you can take, and those are a starting point for the next analysis. 19:59 Lowering the barrier to entry for inclusion in these buckets, so that you can maybe package these up with the pieces that you are sharing and still be indexed in these different ways, as well as supporting researchers as they're working on the floor with better tooling, better graph-based ways of working, better computational notebooks that actually support the organization and communication of all of these results. 20:24 And so that is, I think, the way this new wave of modular science is not saying one of these ways of working is better than the other, but it's flipping that perspective back and forth between all of these different ways of working all at once. 20:40 And so I think the other really exciting piece that was talked about at the start as well, is this is our time. 20:49 This is our moment in this space. 
20:52 And there is so much going on that we can actually effect massive change in the way that science is done. 20:59 So I am Rowan Cockett. 21:01 I'm the CEO of Curvenote and also one of the co-founders of the Continuous Science Foundation. 21:05 And follow me on Bluesky. 21:08 So thank you very much. Matt Akamatsu 21:41 Hello. 21:52 Yes, great. 22:20 Well, thank you, Rowan, for such an inspiring talk. 22:24 Rowan and I first met about this time last year at a Continuous Science Foundation workshop that Rowan organized in Banff, in which we converged on this idea of modular research as a new paradigm for open science. 22:40 And so I'll show today how our research lab at the University of Washington uses modular science principles for interoperable and extensible open research practices. 22:53 Why don't I turn this down a little bit. 22:56 There's a little bit of an echo. 23:06 Or is it this? 23:08 Turn that off. 23:09 Better? 23:10 Better, yes. 23:11 So my name is Matt Akamatsu. 23:13 I'm an assistant professor of biology at the University of Washington. 23:17 And today I'll show how discourse graphs help our lab to enable modular and interoperable research practices. 23:27 So I started my lab at the University of Washington about 3.5 years ago. 23:31 Here we are on the right. 23:33 And I've also co-founded the Discourse Graphs Project, which is an open science and collaboration tooling software development platform and organization. 23:43 And through these two organizations, we design, build, and test new ways to do scientific research that enable better, faster research collaboration and sharing of our research outputs in real time. 24:00 These projects are motivated by many of the systemic barriers to research collaboration that Rowan just described, that have been exacerbated now that generative AI is a regular participant in the scientific research and publishing process. 
24:16 That is best exemplified by this scandal that broke out sometime last year at the International Conference on Learning Representations, where it became apparent that many of the submissions, as well as reviews for these conference papers on machine learning, were entirely generated by low-quality AI that didn't make any sense. 24:44 And so that caused this researcher to say, we're entering an era in which distinguishing real science from low-quality AI will be essential. 24:54 An AI tool did an analysis of these reviews and submissions that indicated that over 20% of the submissions were entirely AI-generated, and another large percentage of them were moderately AI-generated. 25:11 And so it says it seems that 21% of these reviews may be AI, but that leads to the same type of question. 25:18 How strongly should we trust this particular report and this tool? 25:23 So the caveat: LLM-generated text detection is not perfect. 25:27 So this is illustrating that we need some better grounded piece of information, some evidence, to help us to weigh how much we should trust this measurement, and by extension, how much we should trust these scientific research submissions that up until now have been the substrate for how you share your research communication. 25:49 But that's under tremendous strain because of large language models. 25:54 So we wonder, in this day and age, what is the role of source code for scientific communication in the age of AI? 26:03 And I pose this question tongue in cheek because, as many of you know, the role of source code itself has changed quite a bit over the past year in programming, because it now can be freely generated by AI and regenerated from specific instructions. 26:18 So we expect the same transformation may occur for scientific research communication. 
26:24 Where up until now, the state of the art has been this long-form narrative PDF that's dozens of pages long, compiles 5 years' worth of work and several dozen results into this single document with a linear authorship list. 26:40 Likely, we'll need some other substrate for communicating our research that is smaller than this long-form journal PDF, but larger than the individual raw data or text. 26:53 And our principles of operation, as Rowan introduced, are that we'd like this system to be reusable so that we can build upon different elements of the research process and synthesize them into living theories of knowledge about our favorite corner of the natural world. 27:12 We need each element to have clear provenance and attribution, and we'd love for it to center human decision-making. 27:19 In an era when the AI scientists are getting increasingly powerful. 27:24 Absent a deliberate design for collaboration, we might find ourselves out of a job for the fun parts of doing science. 27:32 Therefore, our goal is to decouple research into modular elements that each can be reused, cited, and synthesized. 27:40 Those atomic elements are questions, claims, and evidence. 27:45 Which we connect into a graph. 27:48 This data model comes from the science philosopher Stephen Toulmin, who argues that science is less a series of monolithic truths and more a series of claims that are supported by more or less evidence. 28:01 And over time, you gather more evidence, some of the claims go away, and new claims arise. 28:06 And so we'd like to be able to capture not only the claims, but the evidence that supports them so that we can update them. 28:12 And this particular project is the product of a 5.5-year collaboration between myself and human-computer interaction researcher Joel Chan at the University of Maryland. 
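The question/claim/evidence graph described here can be sketched as a small typed data structure. This is an illustrative sketch only, using simple dicts and made-up node IDs; it borrows the DNA structure example from later in the talk and is not the actual Discourse Graphs format:

```python
# Illustrative sketch of the question/claim/evidence model as a typed
# graph. Node IDs, fields, and relation names are assumptions for
# illustration, not the real Discourse Graphs plugin format.
nodes = {
    "Q1": {"type": "question", "text": "What is the 3D structure of DNA?"},
    "C1": {"type": "claim", "text": "DNA is a double helix."},
    "C2": {"type": "claim", "text": "DNA is a triple helix."},
    "E1": {"type": "evidence",
           "text": "X-ray diffraction shows a helical cross pattern.",
           "author": "Franklin"},
}

# Typed edges: claims answer questions; evidence supports or opposes claims.
edges = [
    ("C1", "answers", "Q1"),
    ("C2", "answers", "Q1"),
    ("E1", "supports", "C1"),
    ("E1", "opposes", "C2"),
]

def evidence_for(claim_id):
    """All evidence nodes supporting a claim -- which doubles as an
    attribution trail back to whoever produced that evidence."""
    return [nodes[src] for src, rel, dst in edges
            if rel == "supports" and dst == claim_id]

for ev in evidence_for("C1"):
    print(ev["author"], "->", ev["text"])
```

A claim with an empty `evidence_for` list is, in Toulmin's terms, an untested claim; as evidence accumulates on the edges, the relative support for competing claims can be re-queried at any time.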
28:24 And together we build both the information model and plugins to popular note-taking software so that we can run our labs on this modular research paradigm and test what works, what genuinely helps the advancement of our lab's research. 28:42 So to this end, we've built Discourse Graphs, which are both a protocol and an app for interoperable knowledge exchange. 28:49 Here's the information model. 28:51 We take our scientific arguments and projects and decouple them into questions, claims, and evidence, and connect them into a graph. 29:00 The question is the unknown that you'd like to make known with your research project. 29:04 The claim is the current answer to the research question, which wants to be supported by one or more lines of evidence. 29:13 Each piece of evidence is a specific observation from your work with an outcome, and it's grounded in a single study or experiment, or at least a published research article. 29:25 The article grounds the evidence. 29:27 The evidence supports or opposes the claim. 29:30 And the claim, when incomplete, always motivates a new experiment that you want to do, driving the research cycle forward. 29:39 I'll give a historical example. 29:42 70 years ago, there were two competing claims for what is the three-dimensional structure of DNA. 29:50 Is it a double helix, or is it a triple helix? 29:52 Where are the negatively charged phosphates in this DNA helix? 29:57 And there was no evidence to distinguish between these two competing claims until Rosalind Franklin's famous X-ray diffraction experiment, which showed this very characteristic diffraction pattern that gave us some information about what the spacing is likely to be between these DNA bases. 30:17 And that supported the double helix model for the structure of DNA. 30:25 So in this discourse graphs framing, Rosalind Franklin could have posted this single result to the group, to the group's personal data server, I suppose, and said, this is a result. 
30:40 It's reproducible. 30:41 I'm going to tell you the observation. 30:44 I'm going to give you the associated figure and a little methods context. 30:49 "Let's figure out what it means together." And in order to strengthen their claim that DNA has a double helix structure, Watson and Crick would have needed to cite Franklin's piece of evidence, otherwise it would remain an untested claim. 31:03 And what I love about this model is that it creates an automatic attribution structure that appropriately credits Franklin's contribution to this discovery. 31:16 Over the course of just making a stronger scientific argument. 31:21 Alternatively, in a world absent evidence, the modelers could instead request that somebody do this DNA diffraction experiment because it's important for their model. 31:31 And then an experimentalist might claim that experiment, carry it out, tell them what the result was. 31:37 And then the modelers get some credit for their intellectual contribution to the idea, 31:42 and the experimentalists get their credit for doing the work. 31:46 In both cases, the attribution trail isn't some after-the-fact accounting. 31:51 Instead, it's baked into the ongoing research process. 31:56 So we got very excited about this model and whether it might support our researchers' everyday work and also allow for this natural sort of self-organizing system for modular science. 32:09 So we've spent the last number of years developing open source plugins for some of our favorite note-taking apps, like Obsidian or Roam Research, in which I'm presenting, in order to make life easier for the researchers in our labs. 32:24 These plugins help grad students, postdocs, and undergraduates to orient their projects in context with open questions and key hypotheses, and keep them on track towards the question that they decided they wanted to answer. 
32:38 They allow them to inventory super early stage results as soon as they think they might have an observation, share them with the team every week in our group meetings, and then synthesize those results into scientific arguments and scientific stories in context with the published literature, so that they have rigorous and updatable stories of the state of knowledge for their research field. 33:04 And so we've found that running our labs on this information model has enabled large-scale collaboration, coordination, and sensemaking among members of our lab. 33:18 So here's a couple examples showing how using discourse graphs as our lab's graph-based shared lab notebook, meetings documentation, and storyboard for our manuscripts and stories has led to a richly interconnected graph of questions, claims, and evidence that our lab runs on. 33:42 In this visualization, each line represents a single reference of one person's question, claim, or evidence by another person in our 10-person lab, growing over the 3.5 years since our lab's creation, which shows how around 3,500 of these cross-author references have been generated, indicating that we are building off of each other's intellectual work in real time. 34:14 This is both evidence from the published literature as well as new evidence from our ongoing experiments and simulations, 34:21 or even building theories of knowledge where one person's evidence supports somebody else's hypothesis. 34:29 Here's another example showing that we're not only sharing knowledge, but we're also distributing the work, the requested experiments. 34:39 In this plot, we show the transfer of a requested experiment. 34:45 Somebody says, you know, this analysis experiment is a good idea, somebody should do it. 34:51 I don't have time to do it right now, but either me in the future or an undergraduate will come in, see this dashboard, claim it, and say that this is how I can contribute to important problems in the lab. 
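The request, claim, and complete flow just described can be sketched as a tiny state machine over a shared dashboard. The states, field names, and the example experiment below are assumptions for illustration, not the actual plugin implementation:

```python
# Minimal sketch of the requested-experiment workflow described above:
# someone posts a request, someone else claims it, and a new
# observation completes it. States and fields are illustrative
# assumptions; the example experiment title is hypothetical.
def request_experiment(board, title, requested_by):
    board.append({"title": title, "status": "requested",
                  "requested_by": requested_by, "claimed_by": None,
                  "result": None})

def claim_experiment(board, title, claimed_by):
    for item in board:
        if item["title"] == title and item["status"] == "requested":
            item["status"] = "claimed"
            item["claimed_by"] = claimed_by

def complete_experiment(board, title, result):
    for item in board:
        if item["title"] == title and item["status"] == "claimed":
            item["status"] = "completed"
            item["result"] = result  # the new observation closes the issue

board = []
request_experiment(board, "Titrate actin concentration", "PI")
claim_experiment(board, "Titrate actin concentration", "undergrad")
complete_experiment(board, "Titrate actin concentration",
                    "Assembly rate saturates above 2 uM")

# The attribution trail is baked into the record itself: who asked,
# who did the work, and what came of it.
item = board[0]
print(item["requested_by"], "->", item["claimed_by"], ":", item["status"])
```

Because every transition is recorded on the item itself, credit for the idea (the requester) and credit for the work (the claimer) fall out of the process automatically, rather than being reconstructed after the fact.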
35:03 And so this diagram shows the transfer of experiments that were requested and then claimed, often by a different person in the lab, some of which culminate in a new observation that completes the experiment. 35:18 I have the result, I have the analysis, here's the outcome, we can close this issue. 35:25 And the other thing I love about this transfer is that a number of these posts and claimings and completions of experiments were done entirely without my knowledge until they were completed, which as the principal investigator gives me a lot of assurance that researchers are getting the mentorship that they need from each other, finding important work to do, and having a very clear trail of who did the work, in a self-organizing fashion. 35:55 So in this case, something like 30% of these issues, these requests for experiments, were claimed, 40% of them yielded a result, and 15% of those happened between different lab members. 36:09 So with this proof of concept within our own labs, we now seek to bring these tools in a pilot program to other research labs throughout North America. 36:19 So we're halfway through a user pilot supported by the Chan Zuckerberg Initiative and the Navigation Fund, in which we develop these plugins so that they are easy to use for these research labs and organizations, 36:35 so that they may better track their experiments, post research updates, and synthesize their work into their ongoing manuscripts, etc. 36:46 What we found so far is that these discourse graphs help researchers, for example, synthesize results into scientific stories in context with the literature. 36:57 First, in this nonlinear format akin to the floor of Lego bricks that Rowan just described. 
37:04 Here we're looking at an open canvas where the researcher, in Obsidian, can share these individual experiments, results (meaning evidence), and claims in this open graph, along with whatever notes he might want to include, and connect them into this nonlinear graph. 37:26 He calls this his murder board, or his whodunit, pertaining to his research question, grounded in the individual evidence from the literature and from his ongoing experimental observations. 37:41 And he uses it to plan the right experiment to do next and to communicate his findings to the rest of his lab. 37:51 We're making it possible for him to share individual results with collaborators or wider networks of researchers. 37:58 But it's also possible to compile these individual results into a figure for a traditional manuscript if that's what you need to do. 38:09 So here's an example from our lab where an undergraduate is compiling the figures for his manuscript, starting with the question. 38:16 Each panel of the figure corresponds to one piece of evidence. 38:20 It's an observation that you put there because it strengthens the overall claim you're trying to make in the figure. 38:26 So using discourse graphs demystifies the process of compiling, or recompiling, your scientific narrative from the underlying evidence. 38:36 We've found so far that discourse graphs help researchers in the pilot labs and in our own labs to do the following. 38:44 It helps them think like a scientist. 38:46 People report, through qualitative research that we're doing with Joel's lab, that they have found huge improvements in their thinking and doing of science. 38:56 And even when they're not using the plugin, it helps keep them in the frame of mind, they call it the Discourse Graphs frame of mind, where they remember their goal is to be addressing questions by providing evidence that supports or opposes their hypotheses.
39:10 It also helps them feel like a scientist. They say that they now have the confidence to share a result without having to claim that they have the be-all, end-all interpretation of that result. 39:21 They say: this is the observation, let's all figure out together what you think it means. 39:25 I'm just an undergraduate. 39:26 And so now they have the confidence to share their results much earlier than they would have otherwise, given that they're sharing within the trusted environment of their lab or their collaborators. 39:37 Here's another researcher saying that they now feel motivated because they have permission to use the words "hypothesis" and "supporting my hypotheses," since those are now baked into the way they share their work. 39:52 So now I can use those words, and that makes it much easier and makes me feel like I'm a scientist. 40:00 And so it feels like true progress, where I'm not just collecting data, I'm generating novel observations, instead of just wondering where to start. 40:10 And very excitingly, we have an early indication that some researchers feel that documenting their research in discourse graphs better reflects the true nature of doing science than writing up narratives and making slideshows. 40:23 So they say converting to this kind of structure is maybe more aligned with what the actual process of answering research questions is like. 40:30 That can be complementary to writing up your long-form narratives, but may end up being more useful and truer to the process of science that people actually want to be carrying out, particularly in collaborations. 40:47 I'm also excited about how having this modular graph structure makes it easy for our research knowledge base to interact with AIs, so that they can give us attributable, synthesized information from our lab knowledge base.
41:06 So here I plugged our lab graph into our favorite AI, Claude, gave it some instructions for how to navigate our graph, and asked it to give me an update on recent results, so that we can have a real-time view of the key results in the lab and a type of real-time research assessment. 41:25 So it says: here are the recent results from the lab, and I'll give you some links. 41:29 I'll tell you who did it. 41:31 I'll tell you how important it is. 41:33 And we can vibe code any arbitrary visualization or dashboard that we think might be relevant. 41:41 And this, I think, reflects the heterogeneity of algorithms that ATproto enables, too. 41:48 Once you and your community decide what the relevant metrics are, or what's the most important thing, then you can surface for your community the highest-relevance items, in this case, individual results connected to the key questions, motivated by a piece of evidence that I collected in one of my postdoc papers, and connected to an undergraduate's result as well, so that you can freely navigate through the graph and then also contribute to it. 42:23 We now can share our research not only in long-form narratives, but also in these modular research elements, which we call the evidence bundle. 42:34 The evidence bundle is a data object that contains the observation, the key figure, and a little bit of methods context, and optionally the raw data and the plots that led to the creation of this figure, which get remixed and reused in different contexts. 42:51 The evidence bundle was born in my lab notebook in Roam, and I use our Discourse Graphs software to inventory it and update it over time. 43:01 It then becomes a persistent object in the lab's graph with enough information for the rest of the lab to understand it, which we can then connect into these mini discourse graphs that support our ongoing hypotheses.
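As described, the evidence bundle is a small data object: observation, key figure, minimal methods context, and optionally the raw data and source plots. A rough sketch of such an object might look like the following; the field names, example values, and the `is_shareable` check are all illustrative assumptions, not the prototype's actual format:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EvidenceBundle:
    """A modular research element: enough context to be understood,
    remixed, and reused outside its original lab notebook."""
    observation: str                  # what was seen, stated plainly
    key_figure: str                   # path or URL to the main figure
    methods_context: str              # just enough methods to interpret it
    author: str                       # ideally a persistent identifier
    raw_data: Optional[str] = None    # optional pointer to the dataset
    source_plots: list = field(default_factory=list)

    def is_shareable(self) -> bool:
        """Minimal completeness check before the bundle leaves the lab."""
        return bool(self.observation and self.key_figure
                    and self.methods_context and self.author)

bundle = EvidenceBundle(
    observation="Placeholder: observed X under condition Y",
    key_figure="figures/fig1a.png",
    methods_context="Placeholder: instrument, n replicates",
    author="did:example:researcher",
)
print(bundle.is_shareable())  # -> True
```

The design choice the talk emphasizes is that the bundle is self-describing: because it carries its own figure, methods context, and author identity, it can move between the lab notebook, the lab graph, a repository, and a feed without losing provenance.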
43:15 We put this in a repository somewhere, and then we can use it to compile either a traditional narrative or multiple narratives in different venues for different audiences. 43:26 And then we can also post it on Bluesky. 43:30 Right now, as a prototype, this evidence bundle lives in GitHub, but there's no reason it couldn't live on a personal data server and then take advantage of the many apps and tools that many of you in the audience are building. 43:43 And I would love to do that together. 43:46 So, we are very excited about the promise of modular research as a way to support people's ongoing science. 43:54 We think that for it to take hold, we need the attribution of your modular research contributions to be more valuable, useful, and accurate than the current method of counting up the authorships in journal articles. 44:11 So we are working on that technology and on demonstrating the ability to do modular research attribution. 44:21 So here you might have two labs that are working on a similar question. 44:25 We'd like to make it possible, if they each have their own discourse graphs, for each person to be able to submit one result and have it cited by the other lab or their larger community. 44:36 And one lab can request an experiment. 44:39 Each result has agreed-upon metadata, 44:42 in a schema of the type that Rowan just described, so that you can attribute each person's contribution in a way that initially requires some access control, so that people are willing to share their work individually, and then is more widely shareable. 45:02 We're making this possible in a workshop, Catalyzing Modular Interoperable Research Attribution, in Ireland in June, where we're bringing together tool builders and researchers to make a proof-of-concept system by which you can trade individual research elements.
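In practice, the "agreed-upon metadata" for trading results between labs could amount to little more than a required-field check plus an access-control flag. This is a sketch under that assumption; the field names below are hypothetical and are not the draft peer-to-peer schema or ATproto lexicon mentioned in the talk:

```python
# Hypothetical required fields for a shared result record; the actual
# draft schema/lexicon from the attribution workshop may differ.
REQUIRED = {"id", "author", "created_at", "observation", "license"}

def validate(record: dict) -> list:
    """Return the sorted list of missing required fields (empty = valid)."""
    return sorted(REQUIRED - record.keys())

record = {
    "id": "result-001",
    "author": "did:example:lab-a-member",   # persistent identifier
    "created_at": "2025-06-01T00:00:00Z",   # timestamp for provenance
    "observation": "Placeholder observation text",
    "license": "CC-BY-4.0",
    "visibility": "lab-only",               # access control before wider sharing
}
print(validate(record))       # -> [] (record is complete)
print(validate({"id": "x"}))  # -> the four missing fields, sorted
```

The `visibility` field mirrors the point about access control: a record can be complete and attributable while still restricted to the lab, and "flipping the switch" to public sharing changes only that one field, not the record's identity or provenance.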
45:26 And Rowan and I have co-founded a community of organizations called the Kairos Network, which is meant to support different tool-building organizations with shared values that are committed to interoperability, so that modular research can be carried out, with the hope that it becomes a persistent and sustainable source of support for this new tool-building ecosystem. 45:53 So in summary, by decoupling research into its atomic elements, questions, claims, and evidence, and connecting them into a graph, 46:03 we enable trainees, experts, and AIs alike to make discrete contributions to shared research collaborations in a way where you can see the direct provenance, you can attribute each person's contribution, and you can build upon it in a way that leaves space for each researcher to be recognized, allows them to work collaboratively with AIs and with each other, and preserves the most fun parts of doing research together. 46:36 Thank you. Matt Akamatsu 46:48 Okay, that was excellent. Speaker C 46:49 Can you hear me? 46:50 Is this mic— So, um, any questions? Matt Akamatsu 46:54 And Rowan? 46:55 Okay, uh, start, go across. 47:17 On how you can try and kind of sell the solution to these funding agencies and get buy-in from them? Speaker C 47:22 Because I think if we can get— Matt Akamatsu 47:23 if we can persuade them to change things, that would probably get momentum going in the ecosystem. Speaker C 47:30 Can I just, before— just for the streaming: the question was about whether we can get the research funding organizations on board. Matt Akamatsu 47:40 Yeah, so I do think there is movement among the research funding agencies. 47:47 The Gates Foundation came out with some changes about a year ago, focusing more on preprints than the sort of journal artifact. 47:56 That was followed by Howard Hughes Medical Institute about a month and a half ago, and then the Michael J. 48:03 Fox Foundation just two weeks ago changed its policy.
48:07 So there is actually movement happening in that space. 48:11 One of the things that we do at Curvenote is run internal workspaces for Howard Hughes Medical Institute, for example, and a large part of that is actually compliance dashboards for open access, so that researchers can have different metrics for how research is being assessed, metrics that can skew towards data, or evidence, or the actual mechanisms and process of science, rather than the end artifact. 48:42 So I think there's room to be optimistic up and down the stack of research, all the way up to the funders at the end. Speaker C 48:55 Okay, who's next? 48:56 Moving along. 48:57 Yes? Matt Akamatsu 49:26 A lot of scientific research is actually closed access. 49:29 More of it recently is moving towards open access, but there are still a lot of paywalls in this space. 49:36 From my specific focus, I'm looking forward: the future is open, the future is open access, and we're building these modular tools around that and using them to reinforce practices that allow people to share openly as well. 49:55 And so I think that's my goal personally. 49:59 [INAUDIBLE] So I think also, and maybe Matt, you want to talk about this, because I think there is that early stage where you are in a much more trusted environment and you have these sort of circles of trust that you want to build. 50:23 I think it's not necessarily appropriate to share everything openly all the time for everyone. 50:30 And so I definitely think that we need these ways of working that support tight-knit, trusted communities. 50:37 And that is especially important. Speaker C 50:41 Yeah, we think you can take these individual panels from an existing figure, from an existing published paper, and reuse them for your own purposes in your collaboration's knowledge graph, as a sort of fair reuse example.
50:59 And as Rowan was saying, more and more of these published papers, even if they end up in a closed journal, have an open preprint alternative. 51:06 And so the more accessible each preprint becomes, the more it becomes the genuinely useful unit of knowledge. Matt Akamatsu 51:14 [AUDIENCE] In this model of science, what does correcting errors look like? Speaker C 51:26 Like if you have a graph of knowledge that you produce, and you go back and realize, three nodes deep, that an undergrad with all the right intentions messed something up, and so this figure's wrong. Matt Akamatsu 51:36 What does correcting that look like downstream? Speaker C 51:38 Yeah, for us, it's making sure you have the right timestamp and making sure that you can see which reference you made. 51:47 We haven't found it necessary to have some sort of auto-update yet. 51:51 A notification is enough at this stage. 51:55 So for the current scope that we're looking at, we've found it sufficient to have this persistent object, and you get an update if it changes. 52:04 Because we're not trying to algorithmically generate conclusions yet, I think so long as there are timestamps and you can scroll forward and backward in the history of the result, that will get us a lot of the way there. 52:29 [AUDIENCE] Across the United States. 52:35 I'm curious how you interweave conversation with what you're doing with discourse graphs and how that works. 52:43 I see there's a lot here for how you train researchers how to think and how to share things, not just to share but how to share. Matt Akamatsu 52:53 You're really embedding all the attribution requirements, which is awesome. 52:56 So I'm just curious, like, how do people talk in this lab? 53:00 And I don't know if you guys have any reflections on how they talk in this lab versus how people talk in other labs that you've been in that didn't use this strategy.
Speaker C 53:09 Yeah, so one part is in structuring the content so that it's clear there should be a place for methods context, and then you share it and talk about it in a way that you think other people in your lab will be able to understand. 53:22 And right now, in a relatively freeform manner, you can also include comments, in a way that we're just sort of hacking in locally for Roam Research. 53:34 But I think it's important to have the conversation and those comments connected into the backstory. 53:43 So right now, for every app that we use, we just allow the sort of threaded conversation to be associated. 53:51 And it's almost like an extended part of the lab notebook that you eventually close, and you can see the history, even though most people are going to, just like in Wikipedia, want to see the current state. 54:03 [INAUDIBLE] Yeah, yeah, right now what we find is that, for example, there was an undergraduate who joined the lab looking for a project to work on, and he had heard in lab meeting from this other postbac researcher and thought, I like how this person works. 54:25 Let me go see if they have any open issues. 54:28 And that is an excuse for me to go message him on Slack and then talk to him in person and get more of the in-person mentorship. 54:35 So the matching happened through the app, and much of the mentorship was still live, in person, 54:41 informal. 55:05 Yeah, so the question is, are there plans to allow people to publish into the ATproto system? 55:11 And yes, we're designing the systems for people to be able to share from one graph to another and then more publicly as well. 55:19 So we have a draft schema for sharing peer-to-peer, and we have a draft lexicon, I learned yesterday, that isn't in use yet.
55:30 But the hope is that during this in-person modular attribution workshop, we make it technically possible and show examples of people being able to share, keeping in mind that the majority of use cases at first are peer-to-peer. 55:47 So, so long as there's some access control between people's PDSs, I think it'll take off, and then you flip a switch when they're ready to share more openly. Matt Akamatsu 56:02 So, as with any system like this, at some point there are malicious actors, and there are motivations for that: scarcity of jobs, reputation, etc. 56:19 And so, you know, stealing attribution and so on. I mean, there are mechanisms people have tried to use for this going back to Fortran, right? 56:28 So I guess what I haven't heard is anything about authentication and security and adversary models for when somebody tries to game this new system. 56:42 And you know, you don't want to do what the web unfortunately did, which was build it and then bolt the security on from the outside. Speaker C 56:49 So have you thought about that? 56:51 What are you doing? 56:53 Yeah, I mean, I think that's what ATproto can be so good for, if you can tie this— Security thing. 56:59 Oh, yeah. 56:59 Yeah, so the question is, what do you do about security? 57:02 What do you do about attempts to game the system, the kind that have been tried with the current version of publishing and that people will eventually try in this system as well? 57:15 A persistent identifier for you and your history of work, I think, goes a long way. 57:20 Seeing the extent to which your work is reproduced by others in other contexts, and having that be a signal that some communities elevate, I think will help a lot as well.
57:29 And for us, having the sharing be with your trusted communities, whom you have a social connection to through conferences, etc., makes it more likely that you will have some social cohesion as you eventually share to a public platform that lacks context. 57:49 But I think having these expanding circles of communities and trust, who are genuinely interested in addressing the research question together, has been a missing part of the first generation of platforms and sharing. Matt Akamatsu 58:02 Yeah, one thing that I'll maybe add to that: some of the attacks we see today are hallucinated references that are just completely made up. 58:13 And with this better-networked attribution, you can actually hover over something and see whether that thing exists. 58:23 Just that, better formats, wipes out a whole side of the attack. 58:29 And then I think, again, leaning on some of the expertise that has built up in the social media space around moderation will help; 58:37 I think that is more mature than what is in the scientific space right now. Speaker C 58:43 Okay, that's all we've got time for right now. 58:45 Thank you very much for your questions.