Sophie Greenwood 0:00 Amazing. 0:00 All right. 0:02 Hello, everyone. 0:03 My name is Sophie. 0:05 Today I'm going to be presenting my research with my advisor Nikhil Garg, studying social media through the ATmosphere. 0:12 And I'm so excited to be here today and meet everyone whose presence I recognize from online. 0:20 Cool to see you all in person. 0:23 Yeah, so just a brief background. 0:25 Nikhil and I are researchers at Cornell Tech. 0:28 And our research area is studying algorithms and society. 0:31 So the social aspects of recommendation systems and feeds: understanding dynamics and how to improve systems along social axes, understanding social media as attention markets, and studying human-AI collaboration. 0:50 Today in our talk, I'm going to be talking first about our work aiding and studying scientific communication on Academic Bluesky. 0:58 Next, I'll talk about what we believe is the potential for studying novel feed experiments through the custom feed ecosystem, as well as novel interfaces through the AT Protocol. 1:09 So I'll start with Academic Bluesky. 1:12 So the backstory for this project is really that Nikhil and I had been doing a project researching fairness in recommendation. 1:18 It was a theory project, which is my original research area. 1:21 And we had worked with a master's student who built a recommender for arXiv preprints. 1:26 And we used this to simulate market outcomes in this recommendation system. 1:32 And we were curious, what happens in real data? 1:34 How can we study the fairness outcomes in attention markets on a real deployed system? 1:39 How do we experiment? 1:41 And what this really looks like in the literature is either you run a lab experiment, where you build some sort of simulated recommender, you get people to interact with it, and you see their behavior. 1:53 Another option is to deploy browser extensions.
1:57 So on Twitter, you get someone to download your extension and engage with Twitter, and you algorithmically re-rank the posts that Twitter is returning. 2:07 And this also has a lot of friction, because you have to recruit participants and get them to download the extension and engage with it. 2:16 And then the final option is to collaborate with a platform. 2:19 But these opportunities are few and far between. 2:24 So suddenly, in November 2024, many academics were using Bluesky and in particular looking for a personalized academic experience, which is something that we were really excited about building. 2:37 And so Nikhil had this idea: let's build a custom feed for academics on Bluesky. 2:44 And so we had a little hackathon and built Paper Skygest, which is a custom feed for academics showing posts about papers from accounts you're following. 2:52 So it's a very simple algorithm. 2:55 And our objective here is both to support academic research discovery and academics on Bluesky, and also, in terms of our research, to enable novel experiments to study open research questions. 3:10 So yeah, we launched last March, a year ago, and we've seen sustained usage over that time. 3:19 And also very positive testimonials from people saying that this has really improved their experience using Bluesky as academics, being able to see filtered academic content in one place that's personalized to them. 3:35 And with the data that we've collected from Paper Skygest, we can look at interesting questions here. 3:41 So one plot from our paper is a histogram of arXiv categories across arXiv posts on Bluesky that aren't associated with a bot account, which we've labeled. 3:56 And so this is the distribution across all posts. 4:00 But then you can ask, what is the distribution of posts that we sent to users on Paper Skygest when they requested the feed?
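A minimal sketch of the "very simple algorithm" described above (posts about papers from accounts you follow), assuming a post counts as "about a paper" when it links to arXiv or a DOI. That rule, the `Post` type, and the `paper_feed` function are hypothetical illustrations; the talk does not specify how Paper Skygest actually detects paper posts.

```python
import re
from dataclasses import dataclass

# Hypothetical rule for "a post about a paper": it links to arXiv or a DOI.
# The real Paper Skygest criteria are not described in the talk.
PAPER_LINK = re.compile(r"(arxiv\.org/abs/|doi\.org/)", re.IGNORECASE)


@dataclass
class Post:
    author: str
    text: str
    created_at: float  # Unix timestamp


def paper_feed(posts, following, limit=50):
    """Return the newest paper posts written by accounts in `following`."""
    matches = [
        p for p in posts
        if p.author in following and PAPER_LINK.search(p.text)
    ]
    # Reverse-chronological order, newest first.
    return sorted(matches, key=lambda p: p.created_at, reverse=True)[:limit]


posts = [
    Post("alice", "New preprint: https://arxiv.org/abs/2501.00001", 300.0),
    Post("bob", "Lunch was great today", 200.0),
    Post("carol", "Our paper is out https://doi.org/10.1000/xyz", 100.0),
    Post("dave", "Read this: https://arxiv.org/abs/2501.00002", 400.0),
]

# The viewer follows alice, bob and carol, but not dave.
feed = paper_feed(posts, following={"alice", "bob", "carol"})
print([p.author for p in feed])
```

Here bob is dropped because his post has no paper link, and dave is dropped because the viewer does not follow him.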
4:09 You can see that it's a bit higher on AI and LG, which is machine learning, as well as computer vision and linguistics. 4:22 And then similarly, for the posts that people interact with, you can also look at what people are interacting with in the content that Skygest has returned. 4:32 So that's just an example of one of the figures we have in our paper and the kinds of questions that we can now look at with this data of people using our feed. 4:41 And there's a variety of spinoff projects that we're working on here. 4:45 One is doing social science analysis of what people are talking about and working on on Bluesky, 4:53 and leveraging emerging NLP tools like sparse autoencoders to do topic analyses. 5:00 We're trying to extend Paper Skygest to include trending content and more sophisticated recommendation to aid discovery and discourse. 5:09 And we're also running experiments on Paper Skygest to A/B test different algorithms and answer interesting science of science questions. 5:21 And this leads to the novel feed experiments section of the talk. 5:25 So we've been piloting experiments on Paper Skygest, in particular including reposts versus not including reposts. 5:34 But one thing that I'm really excited to talk about today is some research that we've been doing. 5:40 Oh, this is out of order. 5:42 But yes, we did have the pilot experiment, which shows some weak results. 5:49 But I'm excited to talk about this project we're doing in collaboration with Andra, who runs the trending news feed, and Grace, who hosts the trending news feed. 5:59 So this is a really cool feed that includes posts sourced from around 300 verified news organizations and has around 10,000 daily users, 6:12 and we're excited to answer some interesting research questions.
6:16 So there's this exciting paper from 2006, commonly known as the Music Lab experiment, which studies rich-get-richer effects, which I think got mentioned on a panel earlier today. It basically shows that little random disturbances in what content gets exposure early on in the process can result in drastically different long-term outcomes, and can change whether the highest quality posts are the ones being shown. 6:52 So we want to understand to what extent initial randomness and account size produce these rich-get-richer effects in a real social setting, because that was a lab experiment. And also, how can we solve this? 7:04 How can we mitigate these effects, try to show posts that are truly the quality content that people want to see, and smooth out this randomness? 7:15 And the bigger picture here, the reason we're excited about custom feed experiments on Bluesky, is that there's a bunch of new directions for feed experiments. 7:23 There are new stakeholders: feed creators, who are community members passionate about showing their community the content that it wants to see. And potentially the algorithms that are good for different communities are heterogeneous, because different communities want to see different things. 7:40 As with this Music Lab extension that we're thinking about, it's a new opportunity to ask these age-old social media questions in new settings and deployments. 7:52 And yeah, as I was saying, there are new questions about how this is heterogeneous across communities. 8:00 And finally, I want to talk about novel interfaces through the AT Protocol. 8:05 So there's a couple of things that we've been working on here.
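The rich-get-richer effect mentioned above can be illustrated with a small toy simulation. This is an illustrative sketch only, not the Music Lab design or the planned trending news feed experiment: it assumes each new viewer picks a post with probability proportional to `quality * (1 + current_count)`, and the function name `simulate` and all parameters are hypothetical.

```python
import random


def simulate(qualities, steps, seed):
    """Toy cumulative-advantage model (an assumption, not the Music Lab design).

    Each step, one viewer picks a post with probability proportional to
    quality * (1 + engagement so far), so early random picks compound.
    """
    rng = random.Random(seed)
    counts = [0] * len(qualities)
    for _ in range(steps):
        weights = [q * (1 + c) for q, c in zip(qualities, counts)]
        chosen = rng.choices(range(len(qualities)), weights=weights, k=1)[0]
        counts[chosen] += 1
    return counts


# Four posts of identical quality: any difference in outcome is pure luck.
qualities = [1.0, 1.0, 1.0, 1.0]
runs = [simulate(qualities, steps=1000, seed=s) for s in range(5)]
for seed, counts in enumerate(runs):
    print(f"seed={seed} engagement per post: {counts}")
```

Comparing the printed counts across seeds shows how much the final ranking of equally good posts can depend on early random draws, which is the kind of randomness the talk proposes to measure and smooth out.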
8:08 In particular, we've been thinking about how we can give people a customization experience on their Paper Skygest, as well as this project led by Kenny Peng from my lab on extending outside of the linear feed structure to trails and these post-linear feeds. 8:31 So, the customization interface here. 8:33 Custom feeds give users algorithmic choice, and there's been a lot of interest, I think, in academia and industry in giving users more agency over their algorithms and their online experiences. 8:45 But this leads to a lot of research questions, such as: how much granularity do users want? 8:51 Is choosing between different feeds that are created by other people the sweet spot, or do people want more fine-grained control? 9:01 What types of controls do they like? 9:02 Do they like to control who they're looking at, or do they want to see different mixtures? 9:09 And is there some sort of consensus about what's good for people, or are there different types of users? 9:15 And so to that end, we built this customization interface, which, for the sake of smoothness, I will probably just show the screenshots of instead of the actual demo. 9:24 But find me after. 9:25 I'll show the demo. 9:27 But basically, this is a configuration editor, which is similar to, I think, tools that others have built. 9:36 On the one hand, you can tune your Skygest experience, and it will responsively show you a preview of what the Skygest will look like. 9:47 So one thing we're really excited about here is building good defaults for people to use and then seeing which defaults people gravitate toward: what do people want, and what do people end up using here? 10:02 But then we hope to give more flexibility, including these follow weights. 10:07 So, can you put different weights or inclusion rules on the accounts you follow?
10:12 So for example, if I'm following a bunch of academics but I want my Paper Skygest to mostly be specifically the econ CS academics, then I could filter down to that. 10:24 And then there are different weights, both in terms of global likes as well as interactions from people I'm following. 10:31 And then, oh yeah, this is just showing what the time decay does. 10:36 And then also filtering down to specific categories of content. 10:40 So here I've filtered to the AI category. 10:44 These are trending things in the last 72 hours in AI arXiv posts. 10:53 And yeah, you can imagine swapping these out for sparse autoencoders or other topics. 11:01 I think the Chai talk had a bunch of topics and stuff that we could maybe integrate with. 11:08 And then how am I doing for time? 11:09 Sorry. 11:12 2 minutes? 11:13 OK, amazing. 11:13 I have time for this. 11:14 So yeah. 11:15 So this last thing is a project I'm really excited about that is led by Kenny Peng at Cornell Tech. 11:22 And this is, yeah, just a really cool project. 11:28 So we all know that algorithmic feeds are flawed in many ways. 11:33 There are filter bubbles, there's doomscrolling. 11:36 People often feel trapped by their algorithms. 11:41 And there's extensive research on improving feed algorithms and understanding the ways in which they fail, 11:47 and on improving fairness and improving diversity. 11:52 But Kenny's idea is, why do we have to restrict ourselves to this linear feed interface? 11:58 So the original vision of the web was to browse hypertext. 12:02 That's where HTTP comes from. 12:04 And this goes all the way back to the 1950s, the vision of Vannevar Bush of navigating trails of information: how can we switch between different topics and make connections? 12:20 And there's a connection here to the work that Semble is doing, also building trails.
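The customization controls described earlier (follow weights, weights on global likes versus likes from people you follow, time decay, a category filter, and a 72-hour trending window) could be combined into a single ranking score roughly as follows. This is a sketch under stated assumptions: the `FeedConfig` fields, the default values, and the multiplicative scoring formula are all hypothetical and are not taken from the Paper Skygest implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    author: str
    category: str              # e.g. an arXiv category such as "cs.AI"
    age_hours: float
    global_likes: int
    likes_from_following: int


@dataclass
class FeedConfig:
    # All fields and defaults are hypothetical illustration values.
    follow_weights: dict = field(default_factory=dict)  # author -> weight (default 1.0)
    global_like_weight: float = 0.1
    following_like_weight: float = 1.0
    half_life_hours: float = 24.0       # time decay: score halves every 24 h
    categories: frozenset = frozenset() # empty means "all categories"
    max_age_hours: float = 72.0         # trending window


def score(post, cfg):
    """Assumed formula: author weight * engagement * exponential time decay."""
    if cfg.categories and post.category not in cfg.categories:
        return 0.0
    if post.age_hours > cfg.max_age_hours:
        return 0.0
    author_weight = cfg.follow_weights.get(post.author, 1.0)
    engagement = (
        1.0
        + cfg.global_like_weight * post.global_likes
        + cfg.following_like_weight * post.likes_from_following
    )
    decay = 0.5 ** (post.age_hours / cfg.half_life_hours)
    return author_weight * engagement * decay


def rank(posts, cfg):
    """Return posts with a positive score, highest score first."""
    scored = [(score(p, cfg), p) for p in posts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for s, p in scored if s > 0]


cfg = FeedConfig(
    follow_weights={"econcs_prof": 2.0},          # up-weight econ CS accounts
    categories=frozenset({"cs.GT", "cs.AI"}),
)
candidates = [
    Candidate("econcs_prof", "cs.GT", 24.0, 10, 2),
    Candidate("ml_prof", "cs.LG", 1.0, 100, 5),   # filtered out by category
    Candidate("ai_prof", "cs.AI", 0.0, 20, 0),
    Candidate("old_prof", "cs.AI", 100.0, 500, 9),  # outside the 72 h window
]
ranked = rank(candidates, cfg)
print([p.author for p in ranked])
```

A multiplicative score is only one possible design. It makes a follow weight of 0 act as an exclusion rule, which matches the "weights or inclusion" framing in the talk, but an additive or learned score would be equally plausible.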
12:25 And so while Semble assembles trails using human curation, Kenny's idea is building trails for interacting with social media through new LLM technology, to basically generate these trails algorithmically. 12:47 So he uses sparse autoencoders, which basically generate interpretable labels for different pieces of content. 12:54 And excitingly, he implemented this on about a week's worth of Bluesky data. 12:59 So you should all check this out. 13:00 I'll show a demo in a second. 13:02 But if you want to pull it up, this is called SkyTrails. 13:05 And this is basically his Bluesky browser. 13:13 So if you open SkyTrails, you can see a random selection of the trails that are available. 13:21 And I'm just going to click into the P.G. Wodehouse and Jeeves trail. 13:24 And here are a bunch of posts related to this topic. 13:27 And as I'm scrolling, I see, okay, now I'm interested in discussions about ChatGPT. 13:32 So I click on this trail and I'm led to the trail about ChatGPT. 13:37 I can scroll down here and then, oh, this one is really interesting and esoteric: "seeking or suggesting alternatives." 13:43 I'm now on this trail. 13:45 I can scroll down. 13:46 Oh, "stylish and fashionable aesthetics." 13:48 So I scroll here, and now you can see how I've really navigated the space of content. 13:53 I've found things that cut across these concepts, that you maybe wouldn't have expected are even being discussed, or that there's certainly not going to be a custom feed for. What was the last one? Seeking or suggesting alternatives. 14:09 But maybe it's something I'm interested in. 14:10 And so there's this ability to quickly navigate the world of content, find new interests, and find new discussions that people are having. 14:20 And so that's the idea of post-linear feeds. 14:24 So yeah, with that, thank you to our wonderful set of collaborators, including folks in this room, like Andra. 14:32 And yeah, thank you for listening.
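The trail navigation shown in the demo can be sketched as a simple index from labels to posts. This assumes each post already carries a set of interpretable labels (in SkyTrails these come from sparse autoencoders; here they are hard-coded), and the function names `build_trails` and `next_trails` and the post IDs are hypothetical. The label strings are the ones mentioned in the demo.

```python
from collections import defaultdict


def build_trails(post_labels):
    """Invert post -> labels into label -> posts, one trail per label."""
    trails = defaultdict(list)
    for post_id, labels in post_labels.items():
        for label in labels:
            trails[label].append(post_id)
    return dict(trails)


def next_trails(post_labels, trails, current):
    """Trails reachable from `current`: other labels on the posts in this trail."""
    reachable = set()
    for post_id in trails.get(current, []):
        reachable.update(post_labels[post_id])
    reachable.discard(current)
    return sorted(reachable)


# Hypothetical post IDs; the labels per post are assumed to be precomputed.
post_labels = {
    "post1": {"P.G. Wodehouse and Jeeves"},
    "post2": {"P.G. Wodehouse and Jeeves", "discussions about ChatGPT"},
    "post3": {"discussions about ChatGPT", "seeking or suggesting alternatives"},
    "post4": {"seeking or suggesting alternatives", "stylish and fashionable aesthetics"},
}

trails = build_trails(post_labels)
print(next_trails(post_labels, trails, "P.G. Wodehouse and Jeeves"))
print(next_trails(post_labels, trails, "discussions about ChatGPT"))
```

Each click in the demo corresponds to one `next_trails` hop: a post that carries two labels acts as the bridge between two trails.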
14:35 And I also want to shout out the STec labeler from the Social Technologies Lab, which is the account activity labeler, which I think is quite popular and very effective, and I'm a big fan. 14:46 So that's from another group at Cornell Tech. 14:49 So thank you, everyone.

Speaker B 14:56 Amazing. 14:56 Thank you, Sophie. 14:58 Questions, anyone? 15:05 It's not really a question, sorry, but [inaudible]. 15:13 I like the SkyTrails. 15:16 I like the lateral movement through topic space, the Memex-like interface. 15:23 Yeah. 15:29 You mentioned earlier the possibility of doing science of science using the data from your Skygest tool. 15:36 I'm curious, are you actively working in that area? 15:39 What do you have planned? 15:40 Because as someone who's just dipping his toes into scientometrics, you know, this is very exciting and I'm wondering what you're thinking of.

Sophie Greenwood 15:50 Yeah, thank you. 15:50 I think our interest is the science of academic communication specifically. 15:56 So I think our angle will be looking at the trending algorithm or the different algorithm choices we make on Paper Skygest, and how those choices, especially interpretable design decisions, impact discourse and engagement and who's following who and stuff like that. 16:19 So that's, I think, the angle that we're interested in, as well as these more observational questions across Bluesky: as we're ingesting the firehose, what are we seeing about what people are talking about and engaging with?

Speaker C 16:38 Great talk. 16:38 Thanks. 16:40 One thing I think a lot about with custom feeds in the network and incentives is, so you can do these studies where you're like, how does it impact individuals' behaviors through what feeds they see? 16:51 But a lot of what feeds are doing is they're setting the rules of the game.
16:54 And, you know, if certain kinds of content are getting a lot of traction or visibility, or people are interacting, even liking or following, getting these social interactions and not just social media interactions, that will incentivize people to create different kinds of content and get this big loop going. 17:12 And so you change what kinds of content are available through that. 17:16 But that's hard to do. The exciting thing with Bluesky, the app and the protocol, is that people can create new things and do experiments pretty easily for individuals, but it's harder for groups. 17:27 Do you have any ideas on how to suss that out or try to get signal through, especially if it's only impacting a smaller part of the overall graph or network?

Sophie Greenwood 17:38 Yeah, that's a really interesting question. 17:39 I guess there are two aspects to my answer, though. 17:43 The first is that this is one of the things we're excited about with the trending news feed. 17:48 So we did a pilot experiment, basically putting a lot of the weight, these raised multipliers, on the reposts, and seeing how that changed what the content was. 18:00 So we ended up seeing that a lot more smaller news organizations were rising to the top, and you ended up seeing an amplification: you created a higher weight on reposts, and then you saw a lot more reposts relative to the other feed. 18:15 But I think maybe your question is more, okay, the trending news feed is an outlier in terms of feed size, so how can we actually study how these feeds are impacting things on a bigger scale when they're small? 18:31 And I have not actually thought about that question, and I'll think more about it. 18:34 Thank you. 18:36 Hi. 18:37 So apologies in advance if my question is completely out of nowhere, because, I'm really sorry, I joined your talk late.
18:43 But I'm wondering, like, especially looking at the kind of, like, trails of conversations and topics that, you know, users and people are kind of posting about, I'm wondering if you've done