Dr. Sean Jungbluth 0:00 Thank you everyone for the opportunity to present here today. 0:04 I'm very happy to be at the ATmosphere Conference and at the first ATProto Science Conference and tell you about my work. 0:09 So I'm Sean Jungbluth. 0:10 I'm a researcher at the San Francisco State Estuary Marine Science Center. 0:15 I'm an adjunct professor there and I'm a former Berkeley Lab transplant. 0:19 I'm a biologist, oceanographer, and computational person. 0:22 And I'm telling you about BioKia, this company that I founded, and this first product that we're trying to build called Agentis. 0:29 So what do we do? 0:30 I'm working with a lot of insect enthusiasts and we are running around California and we are sampling all the different parts and we are taking it back into the laboratory. 0:41 And in my group, we have robotics there where we're looking at these things in very, very high throughput, cataloging all the diversity of life that's in California. 0:50 And then I'm leveraging the power of scientific storytelling through LLMs and trying to bring all this information to light. 0:59 And so BioKia, the company, is the engine and the product, Agentis, is the output. 1:04 So what does a user journey look like here in the context of an ATProto connection? 1:09 So for the use case, imagine that you have researchers who are traveling to the Santa Monica Mountains to sample some dirt and then trying to understand the scientific life that's inside of it. 1:20 And so first the folks will ingest the data into our system. 1:24 We'll analyze the data using canonical tools. 1:27 We'll draft a manuscript, a very bare-bones sort of data descriptor manuscript, and then we also have a reviewer engine built into this as well. 1:35 And then the final piece is to broadcast this information to the masses. 1:41 And so this is just a quick mock-up of what the portal looks like right now. 
1:46 We're going to sign in with our ATProto account, giving everybody universal access to these decentralized identities. 1:52 The first portal that you might get into is this upload-data type of portal where you're uploading your sequence data or images of plants or insects, anything that is collected by the scientists doing this sort of work. 2:07 And then once you've uploaded your data, which could be, you know, plants, DNA sequence data and the like, plus metadata that helps, we grind that through our analysis workflows and essentially give you a readout of the diversity information inside of your samples. 2:24 From there, because this is not ultra-complex science, I feel confident in making use of AI to create some draft narratives from these things. 2:34 And so you can imagine this rapid dissemination of work through LLMs, where instead of having data products that are essentially waiting for analysis, we have a rapid ability to essentially communicate this in language that humans tend to agree upon. 2:53 All right, so once the papers are published, what does this look like? 2:57 This is just kind of a glimpse into our current imaginative process here, but we think that AI reviewers and human reviewers are both going to be required. 3:06 We know AI is already a commodity, and so we can review papers with the push of a button essentially, but I do truly think we will always need humans in the loop who know what these things are and know what they're doing. And of the different scoring categories here, novelty, for example, we probably want scored by a human more so than a robot. 3:26 So just kind of a glimpse into how we're approaching this. 3:32 And so what do you do with all this information? 3:35 So I'm an editor at two scientific journals and it's totally broken and we can drink beers and pour one out for that broken system. 
3:42 And so the next system is this more public one that I think could be built on AT Protocol and one that shows reviews in a transparent format. 3:51 You know, we can share these things. 3:52 They're immutable. 3:53 They're reproducible. 3:54 They can be cited themselves. 3:56 And then further, researchers can start to take credit for these things. 4:01 And this is a part of the scientific discourse that has been almost completely missed. 4:10 And so for the science that I do, the folks that collect these samples tend to do these fun excursions where they go to the California coast, for example. 4:19 And so what I'm imagining for something akin to the CHI ePrints, or my version of that, are these interactive story maps that can be automatically generated from these datasets. 4:32 And these are these nice digital artifacts that folks can explore and they're interactive and they're meant to help tell a story of this science. 4:42 And then from there, what can we do? 4:44 Well, we can hopefully amplify this stuff through platforms like Bluesky. 4:49 And there's a bunch of different teasers here for how this can all be connected, some of them getting ahead of the point here. 4:56 But you can advertise on Bluesky that you completed a review. 5:00 You can highlight through HypeGen, which we haven't talked about here, but Joel Chan was talking about that in a previous meeting that I had with him. 5:08 And we have automatic hypothesis generation that can be captured. 5:13 And then also a hat tip to Semble, which I love. 5:16 I'm imagining this could be a way that folks can build communities and try to gather trails of evidence that folks can then pursue with downstream work. 5:28 Further, Matt, so it's very clear that the discourse graphs can play a huge role here, right? 5:35 We all do science and we all appreciate how science is conducted. 
5:39 And this is the framework that I think can be very useful for naturalists because, for example, there's a whole lot of ancestral knowledge and local knowledge that doesn't exist online. 5:51 And I think that a combination of discourse graphs and Semble and the rails that ATProto represents can really capture this. 6:00 And so we're leaving the era of traditional publishing. 6:04 We're no longer going to be looking at static PDFs and Word docs. 6:07 Everything's going to be more universal. 6:09 We're going to have submissions that accept raw data itself instead of your written publications. 6:17 The AI process is going to be faster. 6:19 It's painfully slow right now. 6:21 It's also going to continue to leverage human experts. 6:26 The output format, you know, dead paywalled text is old. 6:30 Out with it. 6:31 We are more excited about interactive story maps and knowledge graphs that are amenable to large-scale ML applications. 6:37 Certainly the future. 6:38 And then in terms of data provenance, clearly we have a much more connected system here, 6:43 where the decentralized AT Protocol can help link all these things together. 6:48 And folks can get credit for their reviews, their data production, everything that we agree is required for science. 6:57 All right, so last slide. 6:58 So the future ahead. 6:59 And I don't know where we jump in this loop, but let's pretend we're starting at the top here. 7:04 So we're a bunch of bug nerds. 7:06 We're running around California. 7:07 We're collecting butterflies and all these things. 7:10 And let's say we take those back into the laboratory and we look at the DNA, and we take pictures of these things, and then we get to the bottom part here where we publish them with these sort of little interactive story maps. 
7:21 And then from there, we can have these automated hooks to produce hypotheses from these datasets, and with folks online and through connections with Semble, we can eventually get to a point at which you would say, aha, time for field campaign 2.0. 7:36 We're going to pursue this next set of hypotheses. 7:39 And in total, I guess I feel that this is the scientific loop that is possible if we build this and with the community that we're hoping to see. 7:56 Any questions for Sean? 7:58 Thank you, Sean. 8:04 Random question. Speaker B 8:05 Are you using IGSNs at all, the International Geo Sample Number identifiers, in what you're doing? Dr. Sean Jungbluth 8:12 Just curious. Speaker B 8:13 Yeah. 8:13 Yes. 8:13 Cool. 8:14 Awesome. Dr. Sean Jungbluth 8:14 Yep. 8:15 Trying to connect in all of the relevant, you know, keys from everybody's databases. 8:20 Nice. 8:20 Okay. 8:22 More questions. Speaker B 8:27 Thank you for shouting out discourse graphs. 8:30 So what kind of schema do you think helps put guardrails on the generation of the initial manuscript from the data? Dr. Sean Jungbluth 8:41 Hmm. 8:45 I guess I want to understand, do you feel like there will be too many folks that are able to contribute? 8:51 Or I guess I'm trying to understand your... Speaker B 8:53 Yeah, like going from the data analysis to the scientific story, maybe that's where you can plug in discourse graphs as well, 8:59 and do this claim-based review the same way that QED, this other AI review program, does. 9:10 Yeah. 9:11 If you keep the claim structure and each line of evidence as something that people can look back to, and that gets regenerated into these different narratives, then maybe it'll be more auditable. Dr. Sean Jungbluth 9:22 Perfect. 9:22 Yeah, I think integration at multiple points with the tools that you're building up is the way to go. 9:35 All right. 9:36 Let's thank Sean again. 9:37 And yeah, Alex, feel free to come up.