Bonus Drop #69 (2024-11-24): Blue Skies, Four Squares, And Federated Blogs
Oh The (Foursquare) Places You’ll Go (In Maine); WhiteWind; My Own Private PDS
Today’s Bonus Drop explores spatial data visualization using R and Foursquare’s recent data drop, decentralized blogging over ATproto, and setting up a personal data server.
TL;DR
(This is an AI-generated summary of today’s Drop using Ollama + llama 3.2 and a custom prompt.)
- Foursquare’s open-source POI dataset (docs.foursquare.com/data-produ…)
- WhiteWind, a decentralized blogging platform built on AT Protocol for Bluesky integration (whtwnd.com/hrbrmstr.dev/3lbp5x…)
- Set up and run your own Personal Data Server (PDS) for Bluesky, with practical implementation steps (github.com/bluesky-social/pds)
Oh The (Foursquare) Places You’ll Go (In Maine)
Foursquare recently open-sourced over 100 million points of interest (POI). It’s a great gesture, but be warned: the data is not fully normalized or validated.
I needed to make at least one map for the 30-Day Map Challenge, and wanted to see what this data looked like (especially since I helped build it back in the day when I was foolish enough to “check-in” at locations via the Foursquare app).
Getting the data is easy. You will need the AWS CLI, and can make an unsigned request to retrieve the Parquet files:
```bash
$ aws s3 sync s3://fsq-os-places-us-east-1/release/dt=2024-11-19/places/parquet/ --no-sign-request ./places/

$ tree -h places
[4.0K]  places
├── [434M]  places-00000.snappy.parquet
├── [434M]  places-00001.snappy.parquet
├── [434M]  places-00002.snappy.parquet
├── [434M]  places-00003.snappy.parquet
├── [434M]  places-00004.snappy.parquet
├── [434M]  places-00005.snappy.parquet
├── [434M]  places-00006.snappy.parquet
├── [434M]  places-00007.snappy.parquet
├── [434M]  places-00008.snappy.parquet
├── [434M]  places-00009.snappy.parquet
├── [434M]  places-00010.snappy.parquet
├── [434M]  places-00011.snappy.parquet
├── [434M]  places-00012.snappy.parquet
├── [434M]  places-00013.snappy.parquet
├── [434M]  places-00014.snappy.parquet
├── [434M]  places-00015.snappy.parquet
├── [434M]  places-00016.snappy.parquet
├── [434M]  places-00017.snappy.parquet
├── [434M]  places-00018.snappy.parquet
├── [434M]  places-00019.snappy.parquet
├── [434M]  places-00020.snappy.parquet
├── [434M]  places-00021.snappy.parquet
├── [434M]  places-00022.snappy.parquet
├── [434M]  places-00023.snappy.parquet
└── [434M]  places-00024.snappy.parquet
```
I did this on my home server, and I’m being very lazy today and working from a comfy chair three floors up and many square feet removed from said box. While I would normally use {duckdbfs} to do data ops on those files, I refuse to suffer the extra few seconds of query delay, so you get SQL:
```sql
COPY (
  FROM read_parquet('./foursquare/places/*.parquet')
  SELECT
    name,
    latitude,
    longitude,
    post_town,
    fsq_category_labels
  WHERE
    (lower(region) = 'maine' OR upper(region) = 'ME')
    AND country = 'US'
) TO '4sqme.json' (FORMAT JSON);
```
A quick scp to my laptop and we can now get down to bidnez.
I always have Maine geo data handy, so we’ll pull in the border and counties:
```r
library(sf)
library(stringi)
library(ggfx)       # with_shadow()
library(ggthemes)   # scale_color_tableau()
library(hrbrthemes) # theme_ipsum_gs()
library(tidyverse)  # dplyr, purrr, ggplot2

me_counties <- read_sf("~/Data/me-counties.geojson")
me_border <- read_sf("~/Data/me-border.geojson")
```
Now we’ll read in the Foursquare points from Maine:
```r
jsonlite::stream_in(file("~/Data/4sqme.json")) |>
  filter(
    !is.na(longitude),
    !is.na(latitude)
  ) |>
  st_as_sf(
    coords = c("longitude", "latitude"),
    crs = st_crs(me_counties)
  ) -> me4sq
```
As noted, the data is a bit janky, so we need to make sure the points are actually in my state, drop any POI without a category, and keep only the top-level category so I can use a decent palette:
```r
me4sq |>
  st_filter(
    me_border,
    .predicate = st_within
  ) |>
  filter(
    lengths(fsq_category_labels) > 0
  ) |>
  mutate(
    top_level = fsq_category_labels |>
      map_chr(\(.x) .x[[1]]) |>
      stri_replace_all_regex("> .*", "") |>
      stri_trim_both()
  ) -> actually_in_maine
```
Finally, we plot it all (ref. section header):
```r
ggplot() +
  with_shadow(
    geom_sf(
      data = me_border,
      fill = "white"
    ),
    x_offset = -2,
    y_offset = -2
  ) +
  geom_sf(
    data = me_counties,
    fill = NA,
    size = 1/4
  ) +
  geom_sf(
    data = actually_in_maine,
    aes(color = top_level),
    size = 1/4,
    alpha = 1/6,
    show.legend = FALSE
  ) +
  scale_color_tableau() +
  coord_sf(datum = NA) +
  facet_wrap(~top_level, ncol = 5) +
  labs(
    title = "Oh The Places You'll Go (In Maine)!",
    subtitle = "Locations plotted from top-level categories in the Foursquare Places open data."
  ) +
  theme_ipsum_gs(grid = "") +
  theme(
    strip.text.x.top = element_text(hjust = 0.5)
  )
```
I’m pretty sure Foursquare released this to gin up sales for their higher-quality premium data available via their API. But, it’s hard to complain about having some free geo data to play with.
WhiteWind
WhiteWind is a free blogging platform built on the AT Protocol (atproto) that integrates with Bluesky accounts. The platform enables users to publish markdown-formatted content while maintaining complete control over their data through personal data servers (PDS).
The project uses a mixed technology stack with Go and TypeScript. The backend implements XRPC API functionality in Go, while the frontend utilizes Next.js. Development environments are containerized using a Go-based devcontainer configured for TypeScript development.
A primary attribute of WhiteWind is its decentralized approach to data storage and user management. Content lives on each user’s personal data server (PDS), which means the WhiteWind service itself has no direct control over modifying, hiding, or deleting user content. This tracks with ATproto’s core principles of decentralized account management and user-controlled data storage.
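Since WhiteWind posts are just atproto records, you can pull them straight out of an author’s PDS with a plain, unauthenticated XRPC call. A hedged sketch, assuming the account lives on bsky.social and that WhiteWind stores entries in a `com.whtwnd.blog.entry` collection:

```bash
# Hedged sketch: list WhiteWind blog entries straight from a PDS.
# Assumptions: the repo lives on bsky.social, and WhiteWind keeps
# entries in the "com.whtwnd.blog.entry" collection.
curl -s "https://bsky.social/xrpc/com.atproto.repo.listRecords?repo=hrbrmstr.dev&collection=com.whtwnd.blog.entry" | \
  jq -r '.records[] | "\(.value.title): \(.uri)"'
```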
The project is under active development with rapid architectural changes. While formal contribution guidelines and documentation are still in development, the project welcomes both pull requests and bug reports from the community.
You can read this section on WhiteWind.
My Own Private PDS
I’ve been holding off pointing to the Bluesky repo that lets you run your own Personal Data Server (PDS) until I had a chance to set it up (to make sure it was straightforward enough).
Gosh, they made it pretty much painless. Just grab the installer.sh as they tell you to in that repo, run it the way they said to, and you’ll be up and running in no time. You just need a domain, an IP address, and some scant system resources.
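For reference, the install dance is roughly this (from memory; double-check the bluesky-social/pds README for the current steps before running anything as root):

```bash
# Fetch and run the installer as root (verify the URL and steps against
# the bluesky-social/pds README before trusting this from-memory version).
wget https://raw.githubusercontent.com/bluesky-social/pds/main/installer.sh
sudo bash installer.sh
```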
You can use a cheap VPS if you want to only run a PDS, but I’ve got some beefy cloud boxes and ended up co-hosting it with some other apps. I’m also running Ubuntu 24.04. Both of those items changed up a few things.
First, I had to modify the script to think 24.04 was OK to use (it is). That’s just changing an if statement high up in the script.
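I won’t reproduce the real check here, but the shape of the tweak is something like this (illustrative only; the variable name is made up and the actual guard in installer.sh differs in detail):

```bash
# ILLUSTRATIVE ONLY: the variable and exact test in installer.sh differ;
# the point is just to add "24.04" to the allow-list of Ubuntu releases.
if [[ "${DISTRO_VERSION}" != "20.04" && \
      "${DISTRO_VERSION}" != "22.04" && \
      "${DISTRO_VERSION}" != "24.04" ]]; then  # <- 24.04 added
  echo "ERROR: unsupported Linux distribution or release" >&2
  exit 1
fi
```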
Then, I had to exit the script at the point where it generated the Caddy config and pulled down the Docker Compose YAML (ugh) file. I added the Caddy config to my own Caddy setup and deleted the Caddy portion from the Docker Compose file.
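If you’d rather script that last bit of surgery than hand-edit YAML, something like this works (assumes mikefarah’s yq v4 and that the bundled service is literally keyed `caddy` in the compose file):

```bash
# Remove the bundled Caddy service from the compose file, in place.
# Assumptions: mikefarah yq v4; the service key is named "caddy";
# the file is "compose.yaml" (adjust if the repo names it differently).
yq -i 'del(.services.caddy)' compose.yaml
```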
I ran the remainder of the script, and it worked great!
```
========================================================================
PDS installation successful!
------------------------------------------------------------------------
Check service status : sudo systemctl status pds
Watch service logs   : sudo docker logs -f pds
Backup service data  : /pds
PDS Admin command    : pdsadmin

Required Firewall Ports
------------------------------------------------------------------------
Service                Direction  Port  Protocol  Source
-------                ---------  ----  --------  ----------------------
HTTP TLS verification  Inbound    80    TCP       Any
HTTP Control Panel     Inbound    443   TCP       Any

Required DNS entries
------------------------------------------------------------------------
Name                   Type       Value
-------                ---------  ---------------
pds.rudis.dev          A          104.225.216.74
*.pds.rudis.dev        A          104.225.216.74

Detected public IP of this server: 104.225.216.74

To see pdsadmin commands, run "pdsadmin help"
========================================================================
```
I did a test post from Bash:
```bash
ACCESS_JWT=$(
  curl -s -X POST "https://pds.rudis.dev/xrpc/com.atproto.server.createSession" \
    -H "Content-Type: application/json" \
    -d '{"identifier": "bob.pds.rudis.dev", "password": "yes-there-is-a-password"}' | \
    jq -r .accessJwt
)

curl -X POST "https://pds.rudis.dev/xrpc/com.atproto.repo.createRecord" \
  -H "Authorization: Bearer ${ACCESS_JWT}" \
  -H "Content-Type: application/json" \
  -d '{
    "repo": "bob.pds.rudis.dev",
    "collection": "app.bsky.feed.post",
    "record": {
      "text": "Hello from my self-hosted PDS!",
      "createdAt": "2024-11-24T13:43:10Z"
    }
  }'
```
and, it worked (sort of):
I still need to figure out that “invalid handle” message, and have the Bluesky network crawl the PDS (if I want it federated…not sure I do).
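For the record, flipping that federation switch appears to be a one-liner (hedged: subcommand and relay host are as I understand them from the pds repo docs):

```bash
# Ask the main Bluesky relay to start crawling this PDS (opt-in federation).
# Hedged: subcommand name and relay hostname per my read of the pds repo docs.
sudo pdsadmin request-crawl bsky.network

# Raw XRPC equivalent, aimed at the relay:
curl -X POST "https://bsky.network/xrpc/com.atproto.sync.requestCrawl" \
  -H "Content-Type: application/json" \
  -d '{"hostname": "pds.rudis.dev"}'
```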
I gave ATFile a go with it, too:
```
$ atfile upload bw-shield.png
Uploading '/Users/hrbrmstr/Documents/bw-shield.png'...
---
Uploaded: 🖼️ bw-shield.png
↳ Blob: pds.rudis.dev/xrpc/com.atproto…
  Key: 3lbp6kujvqk2a
↳ URI: atfile://did:plc:ktycg4pjzupqr5755su5mz6j/3lbp6kujvqk2a
```
And, that also worked: pds.rudis.dev/xrpc/com.atproto…
The PDS supports arbitrary Blob storage, is authenticated, and has a well-defined protocol for interacting with these cryptographically signed records. Sounds like a great new data toy to play with! I may try to put the Markdown for all the Drops into it and provide a way to list them and view them.
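For instance, stuffing an arbitrary file into the blob store is a single authenticated call. A sketch, reusing the ACCESS_JWT from the session dance above:

```bash
# Upload raw bytes to the PDS blob store via com.atproto.repo.uploadBlob,
# reusing ACCESS_JWT from the earlier createSession call. Note: blobs that
# are never referenced by a record are eligible for garbage collection.
curl -s -X POST "https://pds.rudis.dev/xrpc/com.atproto.repo.uploadBlob" \
  -H "Authorization: Bearer ${ACCESS_JWT}" \
  -H "Content-Type: image/png" \
  --data-binary @bw-shield.png | jq .
```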
FIN
We all will need to get much, much better at sensitive comms, and Signal is one of the only ways to do that in modern times. You should absolutely use that if you are doing any kind of community organizing (etc.). Ping me on Mastodon or Bluesky with a “🦇?” request (public or faux-private) and I’ll provide a one-time use link to connect us on Signal.
Remember, you can follow and interact with the full text of The Daily Drop’s free posts on Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev
☮️
GitHub - bluesky-social/pds: Bluesky PDS (Personal Data Server) container image, compose file, and documentation