Jim Calabro 0:01 All right, I'm going to go real fast because we've got a lot to get through. 0:04 But I'm Jim. 0:05 I run the platform team at Bluesky. 0:07 The platform team is the team that runs our infrastructure, our data centers, and our cloud stuff, and we write a lot of backend code as well. 0:14 And yeah, we have a lot to get through. 0:15 So I'm going to go super fast, but hit me up after if you have any questions or want to talk more. 0:19 Who am I? 0:19 I'm Jim. 0:20 I live in Boston. 0:21 I run the platform team, as I said. 0:22 I've been at Bluesky for about a year. 0:23 And I'm here today to share some information with you on how we do stuff, our atmospheric systems, our AppView. 0:30 I wanna talk about what's going well, what could be improved, and maybe give some recommendations or at least some food for thought for you as well. 0:36 All right, let's get into it. 0:38 Who's this for? 0:39 So there's a lot of people here who are in the weeds of AT Proto. 0:42 A lot of people have different needs and wants out of AT Proto. 0:44 And so this is really a talk for people who wanna achieve high scale, such as running a big whole-world AppView like the Bluesky AppView, right? 0:52 Another persona might be running a large fleet of PDSs. EuroSky is coming online. 0:58 That's awesome. 0:59 BlackSky, like there's so much movement on this and it's really cool. 1:02 And, well, okay, not all projects are in this category. 1:06 Totally rad. 1:07 It's super awesome. 1:09 And yeah, I'll tell you a little bit about what we do. 1:10 This is just like a fact dump. 1:12 So we're just gonna get through it real quick. 1:13 PDS fleet: we run about 110. 1:16 Wow, that's small. 1:16 We run about 110 rented bare-metal cloud hosts in US East and US West. 1:22 They range from 16 to 64 cores depending on what year we spun them up. 1:26 They have 256 gigs of RAM. 1:28 They have about a gig up, a gig down, and they have various disk configs.
1:32 This is kind of fun and goofy. 1:33 I learn about new disk configs every now and then. 1:37 Most of them are XFS set up in RAID 1. 1:40 One of them, I learned, was running ZFS. 1:42 That's a nifty little experiment. 1:43 It's been live for a long time. 1:44 That's cool. 1:45 We also run one, against our will, on Ceph, so that's kind of cool. 1:51 We had an emergency situation where we had to lift and shift a PDS, and all we had was a Ceph array in our data centers. 1:57 That was right after Christmas. 1:59 They cost about $600 a month each. OVH and i3d are the two suppliers that we use there. 2:05 And I'd say, you know, one thing on this, if you are looking to set up a big fleet of PDSs: we are way over-provisioned on these things. 2:12 So here's a btop, and that is not showing up at all, but you can see we're using somewhere around like 5% CPU, 5% of the RAM, so that's like 12.5 gigs. 2:22 It's doing relatively little network I/O as well, even though you do kind of want to have at least a little bit of a beefy setup there. 2:32 Most of it actually is disk storage. 2:33 We're kind of coming up on actually starting to fill up disks. 2:36 We're going to have to expand and stuff. 2:38 And so you can do a lot with a little with the PDS. 2:40 I'll also say we have about 500,000 users per host. 2:44 I'll say also that was my production PDS, so like that's dapperling. 2:47 There's 500,000 users running on that box. 2:49 There's no special sauce. 2:50 We're just running the open source code. 2:52 There's nothing behind the scenes. 2:53 There's no rate limit bypass tokens or anything. 2:56 It's just the code. 2:58 And yeah, I'll put one note on that. 3:00 We have our own auth server, but our config looks like this. 3:04 We run an HAProxy on each one. 3:06 There's 16 PDS containers. 3:07 Each one of them has, sorry, each user has a SQLite database, and we back those SQLite databases up with Rclone once a day and we run Litestream for live replication.
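As a rough aside, the figures quoted so far can be turned into per-host and per-user numbers. This is only a back-of-the-envelope sketch: the constants are the approximate values from the talk, the function names are made up for illustration, and it assumes every host costs the same and carries the quoted user count.

```python
# Approximate figures quoted in the talk; none of these are exact.
HOSTS = 110                 # rented bare-metal PDS hosts
COST_PER_HOST_USD = 600     # per host, per month
USERS_PER_HOST = 500_000    # "about 500,000 users per host"
RAM_GB = 256                # RAM per host
RAM_UTILIZATION = 0.05      # "around like 5%" as read off btop


def fleet_monthly_cost_usd(hosts: int = HOSTS, cost: int = COST_PER_HOST_USD) -> int:
    """Monthly rental bill for the whole fleet."""
    return hosts * cost


def cost_per_user_month_usd(cost: int = COST_PER_HOST_USD,
                            users: int = USERS_PER_HOST) -> float:
    """Hosting cost of one user for one month on a full host."""
    return cost / users


def ram_in_use_gb(ram_gb: int = RAM_GB, utilization: float = RAM_UTILIZATION) -> float:
    """RAM actually in use on one host at the quoted utilization."""
    return ram_gb * utilization


if __name__ == "__main__":
    print(f"fleet cost:    ${fleet_monthly_cost_usd():,} / month")
    print(f"cost per user: ${cost_per_user_month_usd():.4f} / month")
    print(f"RAM in use:    {ram_in_use_gb():.1f} GB of {RAM_GB} GB")
```

At these numbers a user costs a small fraction of a cent per month in rented hardware, which is the sense in which the hosts are over-provisioned on CPU and RAM while disk is the resource that fills up first.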
3:17 We have some Redis, some Datadog for monitoring, Tailscale for network auth. 3:21 It's all fully automated, zero-touch provisioning. 3:24 So you just run one Ansible command, boom, you got a new one. 3:26 It's really fast to stand up new ones. 3:29 The second major topic is our PoPs. 3:31 A PoP is a point of presence, and it's basically a colocation center, a small data center. 3:36 We run our own hardware that we own; we bought it. 3:39 We run the relay in our data centers, and we run the AppView, primarily powered by ScyllaDB, 3:44 in the data centers. 3:45 We run Discover in the data centers. 3:47 There's a few other things. 3:48 Our search cluster's in there. 3:50 Yeah, there's two of them, one in California, one in Ashburn, Virginia. 3:55 And there's about 80 very large servers in each that we own and operate. 4:00 We have super duper fast networks, and it's easy to add more. 4:04 Super fast disks, and a lot of them. 4:06 It's really high bandwidth, and we have active-active everything. 4:09 So there's two ISPs. 4:10 There's two copies of pretty much everything. 4:12 And yeah, we want that 4:14 high degree of redundancy so we can provide really solid service. 4:17 Here's some pictures. 4:18 This is the Posting Factory. 4:20 You can see me and Austin. 4:21 Austin's over there down in the back. 4:23 My friend Patrick's actually behind Austin. 4:25 Sorry, Patrick. 4:26 But yeah, this is one of them. 4:27 This is what a data center looks like. 4:29 So you get a cage that's like your full suite, and in it you put a bunch of racks. 4:33 Here's one of the racks. 4:34 Within the rack you have a couple of switches, and then you have a bunch of compute servers. 4:38 And then you have two of those, right? 4:40 So two of everything. 4:41 That's kind of what it looks like. 4:43 It's pretty bog standard. 4:45 Next is our AWS account.
4:46 I'm not gonna talk too, too much about AWS, but it's where we run a bunch of singleton stuff. There's lots of stuff in there, and some of it is super important. 4:52 Some of it's kind of chill. 4:54 PLC is obviously really important. 4:55 The main bsky.app website, like the HTML, the CSS, the JavaScript, the assets, is served out of there. 5:01 A few other things. 5:02 And we have a ton of Postgres in there. 5:03 Postgres is really hard to run. 5:05 It's really annoying. 5:06 RDS is great. 5:06 It's very expensive. 5:08 What's going well? 5:11 PoPs are goated. 5:11 PoPs are the GOAT. 5:12 They're super duper cheap and they're extremely high performance. 5:15 The PDS fleet is working quite well actually. 5:18 It's really easy to add more servers. 5:20 So as we're growing, we can just chuck new servers up. 5:24 They're very reasonably priced. 5:26 AWS is AWS, it's fine. 5:28 Yeah, RDS is really so good. 5:30 It's worth every penny. 5:31 Renting GPUs is quite convenient. 5:32 We do run a bunch of GPUs up there for various things. 5:35 Besides that, it is very, very expensive. 5:39 To do some very rough napkin math, and I'm really not gonna get into this too, too much: it costs us probably about 800 grand a year to run the PoPs, amortizing for depreciation of the assets over 4 years. 5:50 The roughly equivalent AWS install is literally impossible because our total switching capacity in the PoPs is just shocking. 5:56 Like, it's crazy. 5:58 You couldn't do our PoP setup in AWS, but if you did, it'd probably be about 10x that. 6:04 So about $8 million a year. 6:06 And that's with heavily negotiated long-term reservations; probably about $14 million if you were doing on-demand. 6:11 Heavy asterisk, I vibe coded all that. 6:13 So yeah, vibe finance. 6:17 That being said, the PoPs are an absolute shitload of work. 6:20 It requires deep expertise.
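For reference, the napkin math quoted above works out as follows. The three dollar figures are the speaker's own rough estimates ("heavy asterisk"), not audited numbers, and the function names are hypothetical helpers for this sketch.

```python
# Rough annual figures as quoted in the talk (speaker's own estimates).
POP_ANNUAL_USD = 800_000               # PoPs, hardware depreciated over 4 years
AWS_RESERVED_ANNUAL_USD = 8_000_000    # hypothetical AWS equivalent, long-term reservations
AWS_ON_DEMAND_ANNUAL_USD = 14_000_000  # hypothetical AWS equivalent, on-demand pricing


def multiple_of_pop_cost(annual_usd: int) -> float:
    """How many times more expensive an option is than running the PoPs."""
    return annual_usd / POP_ANNUAL_USD


def annual_difference_usd(annual_usd: int) -> int:
    """Extra yearly spend of an option compared with running the PoPs."""
    return annual_usd - POP_ANNUAL_USD


if __name__ == "__main__":
    for label, cost in (("reserved", AWS_RESERVED_ANNUAL_USD),
                        ("on-demand", AWS_ON_DEMAND_ANNUAL_USD)):
        print(f"AWS {label}: {multiple_of_pop_cost(cost):.1f}x the PoPs, "
              f"${annual_difference_usd(cost):,} more per year")
```

The comparison covers hardware and hosting spend only; it does not price in the engineering time that the PoPs demand.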
6:21 Austin has gone absolutely crazy on trying to do our reprovisioning of all this stuff and make it sane and make it easy to work with. 6:29 It has really slow iteration cycles. 6:31 Once you're getting new hardware, it takes a while to get it online unless you have excellent operational practices. 6:35 Again, kudos, Austin. 6:38 RAM and storage prices are also up and to the right, unfortunately. 6:40 And so we bought a bunch of stuff thanks to Jazz, like here-ish. 6:46 Yeah, about here. 6:47 So yeah, Jazz is the GOAT. 6:52 So what's next? 6:55 And I'm running out of time, so I'm gonna go fast. 6:57 PoPs: make them easier to operate. 6:59 As I said, Austin's been doing yeoman's work on this. 7:02 Increase our compute density as well. 7:04 Previously, we basically were assigning one service to a box, and oftentimes the service would need less than 1% of one of the CPUs, and each one of the boxes has 256. 7:14 And so I'm gonna say the evil Kubernetes word. 7:18 So yeah, trying to improve our density there. 7:21 We're improving our provisioning, making it faster to get new stuff online. 7:24 We migrated our network architecture from a single switch into a spine-and-leaf Clos network topology. 7:30 It's really fun. 7:31 It's like a network of networks, essentially. 7:33 This is what everybody does as well. 7:34 A lot of people do this at least. 7:36 Kubernetes for compute density. 7:38 The net result is higher engineering velocity and robust, high-availability systems, and we can actually reclaim a lot of cloud spend and bring that back to our PoPs. 7:47 We're also going to improve the PDS hosting in some way. 7:49 We're still talking about this, but when you have 110 servers that you rent, those servers are bare-metal servers. 7:55 They're ours. 7:56 They're not virtual. 7:57 I literally have IPMI login on all those things. 8:01 They all fail independently and OVH is not sending their best.
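To put a number on what independent failures mean for a fleet of this size, here is a small sketch. It assumes failures are independent and arrive at a constant rate per host, so the fleet-wide time between failures is the per-host figure divided by the host count. The 3-year per-host MTBF is a made-up illustrative value, not a measurement from the talk.

```python
HOURS_PER_YEAR = 24 * 365


def fleet_mtbf_hours(per_host_mtbf_hours: float, hosts: int) -> float:
    """Expected time between failures anywhere in a fleet of identical hosts.

    Assumes independent failures at a constant per-host rate, so the
    fleet failure rate is simply hosts / per_host_mtbf_hours.
    """
    if hosts < 1:
        raise ValueError("hosts must be at least 1")
    return per_host_mtbf_hours / hosts


def expected_failures_per_year(per_host_mtbf_hours: float, hosts: int) -> float:
    """Expected number of host failures per year across the fleet."""
    return HOURS_PER_YEAR / fleet_mtbf_hours(per_host_mtbf_hours, hosts)


if __name__ == "__main__":
    per_host = 3 * HOURS_PER_YEAR  # hypothetical: each host fails about once in 3 years
    for hosts in (1, 10, 110):
        days = fleet_mtbf_hours(per_host, hosts) / 24
        per_year = expected_failures_per_year(per_host, hosts)
        print(f"{hosts:>3} hosts: a failure every {days:7.1f} days, {per_year:5.1f} per year")
```

Under that assumption, a host that individually fails once every three years turns into a page roughly every ten days at 110 hosts, which is why on-call load grows with fleet size even when each machine is reasonably reliable.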
8:05 And so as you add more servers, the mean time between failures across your fleet decreases, meaning your on-call burden goes up a lot. 8:11 You're at the mercy of your hosting providers. 8:13 When a server goes down, we're waiting for like 6 hours to get notice from OVH. 8:17 And in the meantime, it's like, okay, we can restore to a different server or we're just gonna eat it. 8:22 And so that sucks. 8:24 Shared storage also is a big thing that we're talking about. 8:26 SQLite on the server is rough. 8:28 PDS is like the best case of this, but I am kind of a SQLite hater. 8:32 So I'm just gonna leave it at that and we will chat. 8:35 Yeah, boo you. 8:37 So I'm thinking about virtual PDS, the shared storage, whatever that looks like. 8:41 I'm just kind of hand waving. 8:42 And then more interesting PDS implementations are coming online. 8:46 I'd love to talk about it if you have weird PDS ideas. 8:49 Advice or lessons learned. 8:51 I'm gonna start with the do-nots. 8:53 You must have two of everything. 8:55 Single points of failure will die and you will be sad and your users will be sad. 8:59 Reputation is hard-earned and quickly lost. 9:02 Skip the SQLite layer in your AppView. 9:05 Go with my personal favorite, MySQL, or Postgres if you don't like good things. 9:11 And then only move past it when you're sure you need to. 9:14 Start simple, basically. 9:16 Don't accept local maximums. 9:17 You can do hard stuff. 9:18 You can do great things. 9:20 Those are the do-nots. 9:21 Now dos. 9:22 First is think really hard about your data access patterns. 9:24 So we've really optimized the absolute daylights out of the Bluesky data plane. 9:29 You want to have this notion of mechanical sympathy. 9:31 Be in tune with your hardware and try and optimize the shit out of it because you'll pay for it otherwise in dollars and also your sanity. 9:40 Bloom filters are your friend. 9:42 Memcached is your friend. 9:43 Redis is not your friend.
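As an illustration of the Bloom filter point: a Bloom filter is a compact bit array that can say "definitely not present" or "possibly present", which lets a service skip a database read for keys that were never written. The sketch below is a generic textbook version built on the standard library; the class name and the key format are hypothetical, and it is not a description of how the Bluesky data plane implements this.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sized from an expected item count and target error rate."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        if expected_items <= 0 or not 0.0 < false_positive_rate < 1.0:
            raise ValueError("need expected_items > 0 and 0 < false_positive_rate < 1")
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(8, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self._bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means present or a false positive.
        return all(self._bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    seen = BloomFilter(expected_items=1_000)
    seen.add("like:alice:post-123")                   # hypothetical key format
    print(seen.might_contain("like:alice:post-123"))  # True
    print(seen.might_contain("like:bob:post-999"))    # almost certainly False
```

The trade-off is that items cannot be removed and a small fraction of lookups for absent keys still fall through to the database, so the filter has to be sized for the expected key count up front.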
9:45 You should have elastic compute and storage, even on-prem. 9:47 You should use cattle, not pets. 9:49 That kind of comes back to two of everything, right? 9:51 You should have very, very thorough observability, be able to answer any question about your systems, and do it before you have an outage. 9:58 So here's some recs on that. 9:59 And then finally, one more of the dos, real quick. 10:03 Build a team that's very strong operationally. 10:05 You can't do it alone. 10:06 And then come talk to us. 10:07 Let's organize. 10:08 Like, Brian posted a while ago about, what does the NANOG of AT Proto look like? 10:11 Let's talk about it. 10:12 You know, I posted a minute ago about, like, L7 BGP. 10:16 Let's go talk to each other and figure out the right way to do this. 10:19 And there's no silver bullet. 10:19 It's really hard work. 10:22 That's it.