Jay Patel 0:00 All right, that's like 24 of the timer. 0:06 All right, welcome to my talk. 0:07 I'm presenting Crowdsourcing Research Synthesis on ATProto: Envisioning an Inclusive Future. 0:13 And I'm Jay Patel from the University of Maryland. 0:18 So I want to preface this with the motivation for this lightning talk. 0:23 Back in summer 2025, I was doing my first systematic review. 0:26 So who's familiar with the idea of a systematic review? 0:29 In research? 0:30 Okay, who's familiar with the idea of any kind of review, research synthesis? 0:34 Very good, okay.

0:36 So a systematic review is a very systematic way of assessing how much evidence there is out there in the broader research literature, and it involves 4 key steps: searching for relevant preprints and papers through keyword and natural language searches; screening for relevant reports, so deciding which of the papers you found are relevant to your questions; appraising, or evaluating, the study quality; and synthesis. 1:01 After appraising for study quality, you might filter out a few more articles, and then you synthesize the evidence you have into some sort of prose that is suitable for publication. 1:15 So this is what I've been working on for the past year, and the question I was really trying to answer was: how suitable are large language models for assessing research reports?

1:25 While I was doing this, I noticed a few recurring problems that are not exactly novel, but I really felt them in my bones. 1:31 One is just that it's very slow to do this. 1:33 It takes about 1 to 2 days to read a paper very deeply and be able to say something smart about it. 1:39 There are many citations for that claim. 1:41 It's also very tedious after you learn it, after you go through the first dozen or so papers. 1:46 And if you want evidence of that, you can just ask me, because I've done this hundreds of times. 1:51 It's also a very costly procedure: with a trained graduate student and maybe one other person working on it, it takes quite a lot of money, and there are only a few citations for this sort of thing, but estimates range between $50,000 and $100,000 USD. 2:06 Actually, in pharmacology, in pharma companies, it'll cost about $141,000 per systematic review. 2:15 So that's quite a lot. 2:17 And sadly, this is also error-prone. 2:21 93% of systematic reviews have at least one error in the search process. 2:25 About 59% of the sampled ones have at least one error in the extraction process, extracting key details of methods and analyses. 2:34 And when it comes to appraisal, evaluating the study quality, well, there are very many problems there; I would call that FUBAR, and I guess you know that acronym. 2:43 There are also a number of other problems, like very often using small, homogeneous teams, which limits the quality of your systematic review. 2:53 Ideally, you'd use large, heterogeneous teams so they could get through the work more quickly, tackle the problem from multiple frames using different kinds of knowledge and skills, and use different connections to make sure you have a really great product at the end of it. 3:07 And even more problematically, many of these systematic reviews, because they take so long, are dead on arrival. 3:13 They're rarely updated after publication and are very troublesome to maintain, even when you have a fancy sort of living web page.
3:22 So what we would ideally have is solutions to these 4 key problems. 3:30 The first is volume and velocity: we would increase the speed at which we do reviews so that they take weeks, not years. 3:34 We would improve the veracity, or accuracy, and I propose we do that with crowdsourcing, increasing the team size and ensuring that certain team members are always engaged with error detection and error correction. 3:48 We would hopefully increase the diversity of perspectives and contributors on our teams. 3:54 That would allow us to reframe things, make suggestions, and make interesting connections. And then we'd also have living systematic reviews, which helps with providing timely data to the people who might benefit from it. 4:06 And we kind of do this already with open source software development, right?

4:11 So how do we do this? 4:11 Do we use some kind of Shiny app with our open science repos? 4:15 Do we use journal-specific tools, journalware? 4:18 This question was actually asked on Bluesky quite recently by a user who asked, what is the best platform to host something like this, meaning a living systematic review? 4:27 Is it Shiny? 4:28 Is it a website? 4:30 Well, one other user had this interesting idea of a wiki, a more dynamic approach where multiple researchers can edit and attach new evidence, and then reviewers and various other people can make approvals, rejections, and modifications. 4:44 So something like Wikipedia, but with additional guardrails.

4:47 The nice thing is that the ATProto system is kind of a full-ish stack right now. 4:54 We have all the necessary tooling to do an MVP. 4:58 So suppose you have a bunch of research posters, a bunch of preprints, a bunch of fully polished papers; you discourse-graph them into their questions, claims, and evidence and push those data into symbol collections and marginal annotations. 5:13 Then you could probably layer on some kind of UI where users can validate the different extracted discourse graph nodes and agree, 5:22 and then publish that into a Leaflet blog post, and Leaflet has recently been working on creating sort of living documents, living posts.

5:35 So if we do this, I think the near-term goal could be to leverage the existing apps and create the MVP. 5:41 Hopefully that can be done within a few weeks to months. 5:44 In the medium term, I think it'd be cool to query data across multiple crowdsourced reviews, 5:48 and also to scaffold what's called primary research, which would be not reviews but primary experiments, surveys, and interviews. 5:56 And in the very long term, it'd be very nice if we could build interesting visual dashboards and data stories that would appeal more to the public, more for public engagement, rather than just creating typical research reports for scientists.

6:11 So there are a fair number of things that we can do, and I'd like to get your help with this. 6:16 If you're a builder, can you help me build the missing pieces? 6:18 If you're a scholar or someone else, can you help me test drive the MVP and spread the word? 6:23 And whoever else is out here besides a builder, a scholar, or an advocate, you can also help test drive the MVP, spread the word, and get involved. 6:34 We'd really love your advice here.
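To make the MVP pipeline described above a bit more concrete, here is a minimal sketch of how one extracted discourse-graph node (a question, claim, or evidence snippet from a paper) might be written into an ATProto repository using the @atproto/api TypeScript client. The collection NSID `com.example.discourse.node`, the record fields, and the credentials are illustrative assumptions, not an existing lexicon from the talk.

```typescript
// Minimal sketch: writing one discourse-graph node into an ATProto repo.
// Assumes a hypothetical lexicon `com.example.discourse.node`; a real MVP
// would define and validate its own lexicon schema.
import { AtpAgent } from '@atproto/api'

async function publishDiscourseNode() {
  const agent = new AtpAgent({ service: 'https://bsky.social' })

  // Placeholder credentials; in practice these come from the contributor's account.
  await agent.login({ identifier: 'reviewer.example.com', password: 'app-password' })

  // One extracted node: a claim linked back to the paper it came from.
  const record = {
    $type: 'com.example.discourse.node',    // hypothetical record type
    nodeType: 'claim',                      // 'question' | 'claim' | 'evidence'
    text: 'LLMs do not yet match expert appraisal on every trial.',
    source: { doi: '10.1234/example.doi' }, // the preprint or paper this was extracted from
    createdAt: new Date().toISOString(),
  }

  const res = await agent.com.atproto.repo.createRecord({
    repo: agent.did!,                       // write into the logged-in user's repo
    collection: 'com.example.discourse.node',
    record,
  })

  console.log('created node at', res.data.uri)
}

publishDiscourseNode().catch(console.error)
```

A validation UI like the one described in the talk could then list these records, let other contributors attach their own approval or annotation records that reference each node's URI, and finally feed the agreed-upon nodes into a living Leaflet post.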
6:36 I'd like to thank my research mentor, Joel Chan, for conceptual guidance here, and Ronan for motivating this talk and bringing me out here. 6:45 And finally, I'm available on ORCID and Bluesky. 6:48 Thank you.

Speaker B 6:57 Thank you, Jay.

Jay Patel 6:58 Sure.

Speaker B 6:59 Questions? 7:00 Anyone? 7:02 Maybe I'll ask a question about the comparison: what's your sort of mental model for where ATProto fits in versus AI? 7:14 How do you see the two combining?

Jay Patel 7:19 I'm agreeable to both approaches. 7:22 I haven't thought about AI as much, because I think simply increasing the team size would be such a great benefit to the problem at hand. 7:31 I know that for systematic reviews, there's an incredible amount of work being done right now on screening papers, extracting bits of questions, claims, and evidence, and also doing the synthesis. 7:41 I'm more in favor of using it for search, screening, and extracting questions, claims, and evidence, 7:49 and a little bit less so for synthesis, because that's way more complicated and way more fun. 7:53 As an amateur, I've tried to do that myself and it's not been as successful. 7:58 But certainly for search: in fact, I couldn't have done part of my systematic review without incorporating some AI-assisted elements into my workflow. 8:07 It really would not have been possible. 8:08 So I strongly advocate for using the right AI programs to help with search and screening, although my systematic review is so small that I can do the screening myself manually. 8:19 And then when it comes to appraisal, I have a certain way of doing it that I like, and so I feel comfortable just doing that myself. 8:26 But I'm open to anyone who might be able to automate custom-built checklists and things using some sort of AI agent. 8:34 So that is, yeah, that's quite doable. 8:38 Cool.

Speaker B 8:39 Anyone else?

Audience Member 8:44 You mentioned using high-quality LLMs. 8:47 Did you benchmark maybe mid- or low-quality ones and see how the results change?

Jay Patel For the systematic review that I was conducting, I did not do the benchmarking myself. 8:59 I was merely reviewing the benchmark experiments that others have done. 9:04 And what I noted from that, which is actually helpful for this sort of meta project, is that there is a smattering of results. 9:11 The more you focus these foundation models, the better they do: fine-tuning data, better prompts, better context engineering, those things all help. 9:23 But I think none of these models achieves expert human performance on every trial. 9:28 So it would really be beneficial to have humans and AI agents work together across the systematic review process. 9:36 Certainly I see them being very useful for appraisal, right? 9:39 Let's say you have a long checklist that's useful for evaluating the methods or analyses. 9:44 Considering certain LLMs for that, I think that's perfectly fine. 9:48 Yeah.

Speaker B 9:50 Any more questions? 9:51 Maybe a quick one while Sophie sets up, or we'll just move to the next talk. 9:58 Okay. 9:58 Well, let's thank Jay again.
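As a footnote on the final appraisal point: automating a custom-built appraisal checklist with an LLM, as suggested in the Q&A, could look roughly like the sketch below. It assumes the OpenAI Node SDK; the model name, the three checklist items, and the plain-text output are placeholder assumptions, and, as noted in the talk, a human appraiser would still review the drafted answers rather than accept them as-is.

```typescript
// Minimal sketch: running a custom appraisal checklist over a paper's methods
// section with an LLM. Checklist items and model name are illustrative placeholders.
import OpenAI from 'openai'

const client = new OpenAI() // reads OPENAI_API_KEY from the environment

const checklist = [
  'Is the sample size justified (e.g., via a power analysis)?',
  'Are the outcome measures clearly defined and pre-registered?',
  'Are the statistical analyses appropriate for the study design?',
]

async function appraise(methodsText: string) {
  const results: { item: string; answer: string }[] = []

  for (const item of checklist) {
    const completion = await client.chat.completions.create({
      model: 'gpt-4o-mini', // placeholder model name
      messages: [
        {
          role: 'system',
          content:
            'Answer yes, no, or unclear, then give a one-sentence justification quoting the text.',
        },
        {
          role: 'user',
          content: `Checklist item: ${item}\n\nMethods section:\n${methodsText}`,
        },
      ],
    })
    results.push({ item, answer: completion.choices[0].message.content ?? '' })
  }

  // A human appraiser reviews these drafted answers before any are accepted.
  return results
}

appraise('…paste the methods section here…').then(console.log)
```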