Francisco Carvalho 0:00 I think I'm going to talk here. 0:05 Yeah, I hope you guys don't mind the Twitter aspect. 0:07 This is all obviously very translatable to Bluesky. 0:13 Yeah, my talk's about, as you can see, digital anthropology. 0:18 I'm going to tell you a story I had fun writing last night. 0:24 Okay, my name is Francisco. 0:27 I have some background in AI safety and in building tools. I've worked at METR and the EU Commission, and I run my own small R&D org called Epistemic Garden, where we build AI tools for community flourishing. 0:43 Epistemic Garden. 0:45 Okay, so I spend a lot of time on Twitter. 0:49 There's this part of Twitter that I love, and you could describe them as intellectual hippies. 0:56 I think of them as a pre-paradigmatic research scene online. 1:01 These are people with varied interests, all of which are mine too. 1:09 There have potentially been many waves. 1:12 This is how you can think of them. 1:15 But the anthropology has always been done manually, in pictures like this. 1:21 This one. 1:26 Oh yeah, with this I want to make the argument that it's a generative scene that ends up being very influential. 1:35 You know the motto of the conference? 1:36 I'm relatively confident my friends came up with it way back when. 1:43 The Department of War: I know it's not a great slide, but again, I'm making the argument that memetically this group of people is kind of upstream of a lot of stuff. 1:54 Okay, so I love these people, I love this space online, and I wanted to understand what makes it tick, what makes it work, and I wanted to make it go even faster. 2:07 But Twitter data is closed, oh no. 2:10 So I started something called the Community Archive. 2:13 Because if I moved to Bluesky (and I love Bluesky), my people wouldn't be there. 2:18 So I started this thing, and I got a few hundred people to upload their archives. 2:23 You need to download them from Twitter, and it's a whole laborious process.
2:28 But ultimately, we end up with a really nice dataset to run experiments on and build prototypes with. 2:33 A few influential people are among the most-followed accounts. 2:38 patio11 is an ex-Stripe person who's a famous blogger. 2:41 Emmett Shear was CEO of OpenAI for like a second. 2:47 And people did build all sorts of tools and prototypes. 2:51 These are a few that I gathered: you have a Google Trends type thing. 2:55 Someone built a tracker for the viral spread of ideas. 3:03 I have a BangerBot over there. 3:04 Someone built people search on the archive. 3:11 Many tools. 3:14 But the tools don't answer my questions. 3:17 I wanted to understand the stories and the flow of ideas in this community. 3:23 So I tried semantic clustering first. 3:26 You embed all the tweets, you try to cluster them, and you get groups of tweets that are a little bit too broad. 3:35 For example, you might end up with a cluster of broadly meditation-themed things, but I want stories. 3:46 I want threads of narrative that are much more specific than that. 3:50 I examined the meditation cluster and found one really cool story. 3:58 Thich Nhat Hanh, the Vietnamese Dharma teacher, has this saying: "the next Buddha is a Sangha." 4:09 Over time people remixed it, and the Dharma-oriented people made it "the next Buddha is an internet community." 4:22 Fun. 4:23 That's a story I found in the data, one that you can tell. 4:28 But semantic clustering didn't work well for doing this in an automated, exhaustive way. 4:32 So I tried building strands instead, which worked better. 4:37 So I'm going to tell you a little bit about strands. 4:41 They work better. 4:43 We took this tweet about community building, the original one, and we found tweets causally downstream of it and tweets topically upstream of it. 4:56 Like I was saying, strands are centered on one tweet, like causal light cones: downstream and upstream, backwards and forwards in time.
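The embed-then-cluster step described above can be sketched as below. This is a minimal illustration, not the pipeline used in the talk: it assumes each tweet has already been turned into a vector by some embedding model (the vectors here are made-up toy values), and it uses plain k-means with cosine similarity as one common clustering choice. It also hints at why the clusters come out broad: k-means groups by overall topic, so every meditation-flavored tweet lands together regardless of which story it belongs to.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def kmeans(vectors, k, iterations=20):
    """Cluster embedding vectors with k-means, assigning by cosine similarity.

    Centroids start from a deterministic farthest-first pick so results are
    reproducible. Returns one integer cluster label per input vector.
    """
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next centroid: the vector least similar to every centroid chosen so far.
        nxt = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(list(nxt))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy embeddings: two meditation-y tweets and two startup-y tweets.
tweets = {
    "the next Buddha is a Sangha": [0.9, 0.1, 0.0],
    "jhanas changed my life": [0.8, 0.2, 0.1],
    "ship fast, talk to users": [0.1, 0.9, 0.2],
    "fundraising is a distraction": [0.0, 0.8, 0.3],
}
labels = kmeans(list(tweets.values()), k=2)
for text, label in zip(tweets, labels):
    print(label, text)
```

Both meditation tweets share a cluster even though they belong to different stories, which is the granularity problem that motivated strands.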
5:07 This is my working diagram, but it's what it says on the tin. 5:15 Oh yeah, I guess I could explain. 5:17 You have a tweet; it gets quoted a bunch of times in the future, or it gets replied to, and all of those are causally downstream. 5:26 And you could implement the causally upstream part in many ways. 5:28 In my case, I just used semantic similarity, and it's close enough. 5:32 We still get pretty cool results. 5:35 Oh yeah, I forgot I had slides explaining exactly this. 5:38 Quotes, enriched with semantic search, and then you get all the threads from the quotes and the replies and so on, 5:46 and you end up with something that's kind of shaped like this. 5:49 I analyzed a bunch of them manually just to make sure the quality was right. 5:55 Then I automated it and got 250 of them. 6:01 But which tweets do we base the strands on? 6:05 Well, bangers, of course. 6:08 Bangers are tweets with lots of quote tweets over time, 6:12 from the same community. 6:15 That kind of implies that they're good enough to cite over and over and to use as building blocks in future arguments. 6:23 It works pretty well. 6:25 These are some of the bangers. 6:29 If you're familiar with the community, you'll recognize these as pretty canonical tweets that end up being referenced over and over. 6:37 We found 250 of them. 6:40 This is a little semantic atlas of the strands that we found. 6:47 And you can ask: can you braid the strands together? 6:52 Can we make a full picture? 6:54 Can we make a fuller map? 6:57 I tried getting Claude Code to make sense of them, because there are 250 of them. 7:03 As visual support: this is the chaos of the dataset, these are the individual strands, and this is what I'm hoping for. 7:10 It's not perfect, but it's a start. 7:12 It works fine. 7:15 It still requires a lot of human care.
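Banger selection, as described above, reduces to counting quote tweets. The sketch below is a hypothetical version of that heuristic, not the actual Community Archive code: the `Tweet` record, the `community` set of archive accounts, and the `min_span_days` threshold standing in for "over time" are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Tweet:
    id: str
    author: str
    created: date
    quoted_id: Optional[str] = None  # id of the tweet this one quotes, if any


def find_bangers(tweets, community, top_n=250, min_span_days=30):
    """Return ids of the most-quoted tweets, most-quoted first.

    Hypothetical heuristic: only quotes written by accounts in `community`
    count, and a tweet's quotes must be spread over at least `min_span_days`
    days, so a tweet quoted many times in one afternoon doesn't qualify.
    """
    quote_dates = defaultdict(list)
    for t in tweets:
        if t.quoted_id is not None and t.author in community:
            quote_dates[t.quoted_id].append(t.created)

    scored = []
    for tweet_id, dates in quote_dates.items():
        if (max(dates) - min(dates)).days >= min_span_days:
            scored.append((len(dates), tweet_id))

    # Most quotes first; ties broken by id so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tweet_id for _, tweet_id in scored[:top_n]]


# Toy archive with hypothetical accounts.
community = {"alice", "bob", "carol"}
archive = [
    Tweet("1", "alice", date(2020, 3, 1)),
    Tweet("2", "bob", date(2020, 3, 2), quoted_id="1"),
    Tweet("3", "carol", date(2021, 6, 9), quoted_id="1"),
    Tweet("4", "outsider", date(2021, 7, 1), quoted_id="1"),  # not in community: ignored
    Tweet("5", "carol", date(2020, 4, 1)),
    Tweet("6", "alice", date(2020, 4, 1), quoted_id="5"),
    Tweet("7", "bob", date(2020, 4, 1), quoted_id="5"),  # same-day burst
]
print(find_bangers(archive, community))
```

Restricting quotes to community accounts is what keeps the ranking about this scene rather than about general virality; the time-spread filter is one possible way to favor tweets that keep getting cited as building blocks.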
7:17 I wouldn't trust it; I wouldn't present the results without already having the tacit knowledge that comes from being on Twitter this long, but I think we can get there. 7:29 So what can we say about the big history of the scene? 7:32 Well, these are walls of text, so I won't bore you. 7:36 But the TL;DR is that there's this guy Visa who figures out a bunch of really good posting norms that people end up adopting over time. 7:47 During COVID, lots of rationalist-y, Dharma-y flavored people join Twitter. 7:53 They find Visa's posting norms, and it creates this really generative intellectual scene where people who were previously pretty rationalist-minded, pretty left-brained, end up discovering all of these spiritual traditions, therapy modalities, and embodied practices. 8:14 That was kind of the golden age, but things keep going. 8:19 These are the implications of that big left-brain, right-brain merge, and this is kind of the era we've been in for the past couple of years, 8:29 which I guess is characterized by community building in real life and by data infrastructure, like the Community Archive. 8:42 So we found a bunch of stories. 8:45 We had Claude Code stitch them together. 8:48 I have a personal theory of the scene. 8:53 I don't think we're going to go through it in depth. 8:58 I'll just say that I think the core loop here is this: people who are high in epistemic rigor, who have the tools of rationality like Bayesian thinking and systems thinking, look at these old traditions, try them out, and explain them in rationalist terms, and that makes them more accessible to new people who are more left-brained.
9:29 Having access to these practices, they end up with more nervous-system capacity and executive function, which lets them relate to people better and do more projects; out of that grows community, friendship, and flourishing, all supported by technical infrastructure like Twitter, Bluesky, and the tools people may build on top. 9:53 This is the same diagram with an overwhelming number of icons added. 9:57 I don't think it's worth looking at. 9:59 There's still work to do. 10:03 There are a few key events that I preregistered, expecting to find them in the data, that I could not find. 10:11 And that's fine. 10:14 For example, the first time the community met in person was at this thing called Vibecamp, but Brooke, the founder, isn't in the Community Archive, so we couldn't find the exact strand for that. 10:25 Similarly with jhanas, a kind of meditation practice that became really big recently: the main people aren't there, so it was hard to find the strands for exactly that. 10:37 And Fractal NYC is an in-person community where the people are in fact present, but for one reason or another we still didn't find a great strand. 10:49 Bless you. 10:51 If the data... yeah, okay. 10:53 Obviously, if we had complete data this wouldn't be a problem, but the people I care about are still on Twitter. 10:59 We were talking about this on the panel just earlier. 11:02 It's kind of a socio-technical problem. 11:04 Maybe we need to throw a better party, even though the house is really nice. 11:10 Exactly. 11:12 Yeah, so what's next for me? 11:15 I'm building tools for community flourishing, to get 100x more serendipity if we can. 11:23 Right now, the next thing I'm doing is a kind of P2P network of permissioned data between people, to find opportunities between us. 11:35 Ronan's agent talks to my agent. 11:36 Maybe I know someone who wants to fund Ronan. 11:39 Maybe I'm trying to get rid of a couch and Ronan needs a couch.
11:43 And the goal is to have this big ecosystem of AI fairies conspiring in your favor. 11:50 This is where the fairies live. 11:54 Yeah, this is Epistemic Garden. 11:56 I'm Francisco. 11:58 You can find me on Substack. 11:59 I publish lab notes on my work fairly regularly. 12:01 And thank you. 12:04 Thank you guys for paying attention.

Speaker B 12:10 Thanks, Francisco. 12:11 You're welcome. 12:12 Any questions?

Francisco Carvalho 12:18 It's a lot of information.

Speaker C 12:23 This was really fun. 12:25 You talked a little bit about limitations, but what do you wish you could build that you feel like you can't? 12:30 Where are you limited? 12:32 You were saying there were some events you expected to see that you didn't, but are there bigger things where you're like, I wish I could have this, but I don't have it yet?

Francisco Carvalho 12:42 I think the amount of data is definitely one, right? 12:47 Just being comprehensive across all the people who might matter. 12:50 Another is permissioned data: for the opportunity mining I was talking about, I don't want to be public about everything, but for people I meet once or twice at a conference, or friends of friends, I wouldn't mind sharing. 13:04 There's a lot of information that would make it easy to coordinate with other people that I wouldn't be able to post on Twitter.

Speaker B 13:14 I just have to say, if I had to summarize what I learned today in one sentence, it's that Bluesky is a nice house, but it needs to throw a better party. 13:22 I think you can teach us how to throw good parties. 13:24 This is cool stuff. 13:25 Thanks. 13:26 We should jam on it. 13:28 Any other questions?

Speaker D 13:39 You talked briefly about assembling threads from posts. 13:44 I was wondering if you could elaborate a bit more on that process.
13:47 It was kind of unclear to me how you take one post and turn it into this whole... how do you find posts that form some sort of story?

Francisco Carvalho 13:56 Yeah, yeah, yeah. 13:58 So I think the core insight is that first we pick the central posts well, because they're posts that are very likely to have mattered. 14:14 They have lots of quote tweets. 14:16 They're cited lots of times. 14:17 And then the actual way we build the strands is: you get the post. 14:27 Then you take all the times it was quoted, all the posts that quote the original one. 14:34 Then we take all of those. 14:37 We do semantic search on them and find, say, the 50 most semantically similar posts, so that we have some chance of getting posts from the past, because otherwise it would all be in the future. 14:52 Then for each of those, we get all their threads, all their reply trees, and that gives us something that looks kind of like this. 15:02 At first, when you asked your question, I thought you were asking about how to stitch the strands together, but was that not the question? 15:14 Yeah. 15:15 Yeah, so this...

Speaker B 15:16 Maybe we can do that part offline, because we're a bit behind schedule. 15:19 OK. 15:20 If anyone has a last quick question... otherwise we'll move on. 15:31 So we'll move to Billy for the last talk. 15:35 Thanks again, Francisco.

Francisco Carvalho 15:37 You're welcome. 15:37 Thank you.
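The strand-building steps from the answer above (start from a banger, collect its quote tweets, add the most semantically similar posts so the strand can reach into the past, then pull in each post's thread and reply tree) can be sketched as follows. Everything here is hypothetical: the `Post` record, the precomputed `embedding` vectors, and scoring each candidate by its maximum cosine similarity to any seed post are assumptions. The talk only says that semantic search was used, returning something like 50 posts.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Post:
    id: str
    embedding: List[float]             # hypothetical precomputed embedding
    quoted_id: Optional[str] = None    # post this one quotes, if any
    reply_to_id: Optional[str] = None  # post this one replies to, if any


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_strand(root_id, posts, k=50):
    """Return the set of post ids in the strand centered on `root_id`."""
    by_id = {p.id: p for p in posts}
    children = defaultdict(list)
    for p in posts:
        if p.reply_to_id:
            children[p.reply_to_id].append(p.id)

    # 1. Causally downstream: the root plus every post that quotes it.
    seeds = {root_id} | {p.id for p in posts if p.quoted_id == root_id}

    # 2. Topically upstream: the k non-seed posts most similar to any seed.
    #    Semantic search ignores time, so this can reach back before the root.
    def score(p):
        return max(cosine(p.embedding, by_id[s].embedding) for s in seeds)

    candidates = sorted((p for p in posts if p.id not in seeds), key=score, reverse=True)
    seeds |= {p.id for p in candidates[:k]}

    # 3. Expand each seed with its thread (ancestors) and reply tree (descendants).
    strand = set()
    for s in seeds:
        cur = by_id.get(s)
        while cur is not None and cur.id not in strand:  # walk up the reply chain
            strand.add(cur.id)
            cur = by_id.get(cur.reply_to_id) if cur.reply_to_id else None
        stack = [s]
        while stack:  # walk down the reply tree
            node = stack.pop()
            strand.add(node)
            stack.extend(children[node])
    return strand


posts = [
    Post("root", [1.0, 0.0]),                            # the banger
    Post("q1", [0.9, 0.1], quoted_id="root"),            # quotes the banger
    Post("r1", [0.0, 1.0], reply_to_id="q1"),            # reply under the quote
    Post("old_parent", [0.0, 1.0]),                      # start of an older thread
    Post("old", [0.95, 0.05], reply_to_id="old_parent"), # earlier, on-topic post
    Post("noise", [0.0, 1.0]),                           # unrelated post
]
print(sorted(build_strand("root", posts, k=1)))
```

With `k=1`, the on-topic older post and its thread parent join the strand alongside the quote and its reply, while the unrelated post stays out. In a real archive `k` would be larger (the talk mentions around 50), and the reply walk assumes reply chains contain no cycles.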