https://github.com/itaru2622/bluesky-selfhost-env
this repository aims to make self-hosting a bluesky environment easy, with:
- configurable hosting domain: easy to tune via an environment variable (DOMAIN)
- reproducibility: all configs and operations are disclosed, including reverse proxy rules and patches to sources.
- simplicity: all bluesky components run on one host, via docker-compose.
- less remapping: rules among FQDN <=> reverse proxy <=> docker-container are kept as simple as possible, for easy understanding and tuning.
currently, the latest release is 2024-12-22, based on the bluesky-social sources as of 2024-12-22.
as listed below, most features work as expected in the self-hosted environment.
unfortunately, not all features work; some of the reasons are described in bluesky-social/atproto#2334
test results with 'asof-2024-06-02' and later:
- ok: create account on pds (via social-app, bluesky API).
- ok: basic usage of social-app
- ok: sign-in, edit profile, post/repost articles, search posts/users/feeds, like/follow.
- ok: receive notifications when others like/follow
- ok: subscribe/unsubscribe labeler in profile page.
- ok: report to labeler for any post.
- not yet: DM(chat) with others.
- ok: integration with feed-generator. NOTE: it has some delay; reload on social-app to see updates.
- ok: moderation with ozone.
- ok: sign-in and configure labels on ozone-UI.
- ok: receive reports sent by users.
- ok: assign a label to a post/account on the ozone UI, then events are published to subscribeLabels.
- ok: the view of the post changes on social-app according to label assignments, when using the workaround tool.
- NOTE: without the workaround tool, the view does not change. refer to bluesky-social/atproto#2552
- ok: subscribe events from pds/bgs(relay)/ozone by firehose/websocket.
- ok: subscribe events from jetstream, since 2024-10-19r1
- not yet: others.
the instructions below assume the self-hosting domain is mysky.local.com (defined in the Makefile).
you can change the domain name via environment variables as below:
# 1) set domain name for self-hosting bluesky
export DOMAIN=whatever.yourdomain.com
# 2) set the asof date, to distinguish docker images / their sources.
# 2024-12-22 (the latest prebuild, in %Y-%m-%d format), or latest (following the usual docker image tag naming).
export asof=2024-12-22
# 3) set email addresses.
# 3-1) EMAIL4CERTS: passed to Let's Encrypt for signing certificates.
export EMAIL4CERTS=your@mail.address
# for self-signed certificates, use the value below (`internal` is a reserved keyword).
# using `internal` is recommended to avoid hitting rate limits, until you are sure your self-hosting setup is ready.
export EMAIL4CERTS=internal
# 3-2) PDS_EMAIL_SMTP_URL: for PDS, like smtps://youraccount:your-app-password@smtp.gmail.com
export PDS_EMAIL_SMTP_URL=smtps://
# 3-3) FEEDGEN_EMAIL: for feed-generator account in bluesky
export FEEDGEN_EMAIL=feedgen@example.com
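to keep these settings across shells, one option is to store the exports in a small file and source it before running make (a minimal sketch; the file name ./myenv.sh is just an example and not part of this repo):
# store your settings once (values below are placeholders following the examples above)
cat > ./myenv.sh <<'EOF'
export DOMAIN=whatever.yourdomain.com
export asof=2024-12-22
export EMAIL4CERTS=internal
export PDS_EMAIL_SMTP_URL=smtps://youraccount:your-app-password@smtp.gmail.com
export FEEDGEN_EMAIL=feedgen@example.com
EOF
# load them in any new shell before running make
source ./myenv.sh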
## install tools, if you don't have them yet.
apt install -y make pwgen
(cd ops-helper/apiImpl ; npm install)
(sudo curl -o /usr/local/bin/websocat -L https://github.com/vi/websocat/releases/download/v1.13.0/websocat.x86_64-unknown-linux-musl; sudo chmod a+x /usr/local/bin/websocat)
# 4) check your configuration from an operations point of view.
make echo
# 5) generate secrets for bluesky containers, and check their values:
make genSecrets
- create DNS A-records in your self-hosting network.
at least the following two A-records are required.
refer to the appendix for a sample DNS server (bind9) configuration.
- ${DOMAIN}
- *.${DOMAIN}
- generate and install a CA certificate (for private/closed networks and other cases using self-signed certificates).
- after generation, copy the crt and key as ./certs/root.{crt,key}
- note: don't forget to install root.crt on your host machine and in your browser.
the easiest way to get a self-signed CA certificate is below.
# get and store self-signed CA certificate into ./certs/root.{crt,key}, by using caddy.
make getCAcert
# install CA cert on host machine.
make installCAcert
# don't forget to install the certificate in your browser.
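to confirm what was generated, the CA certificate can be inspected with openssl (a quick sanity check; assumes openssl is installed on the host):
openssl x509 -in ./certs/root.crt -noout -subject -issuer -enddate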
# check DNS server responses for your self-host domain
dig ${DOMAIN}
dig any.${DOMAIN}
# start containers for test
make docker-start f=./docker-compose-debug-caddy.yaml services=
# test HTTPS and WSS with your docker environment
curl -L https://test-wss.${DOMAIN}/
websocat wss://test-wss.${DOMAIN}/ws
# test whether the reverse proxy mapping works as expected for bluesky
# these should be routed to the PDS
curl -L https://pds.${DOMAIN}/xrpc/any-request | jq
curl -L https://some-hostname.pds.${DOMAIN}/xrpc/any-request | jq
# these should be routed to social-app
curl -L https://pds.${DOMAIN}/others | jq
curl -L https://some-hostname.pds.${DOMAIN}/others | jq
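if the JSON responses are hard to read, printing only the HTTP status code shows where a request lands (a sketch reusing the URLs above; the exact codes depend on which test backend answers):
curl -s -o /dev/null -w '%{http_code}\n' -L https://pds.${DOMAIN}/xrpc/any-request
curl -s -o /dev/null -w '%{http_code}\n' -L https://pds.${DOMAIN}/others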
# stop test containers, without persisting data
make docker-stop-with-clean f=./docker-compose-debug-caddy.yaml
=> if the tests pass, go ahead; otherwise, check your environment.
first, this section describes deploying bluesky with prebuilt images.
later sections describe how to build the images from sources yourself.
# 0) pull prebuilt docker images from docker.io, to skip building images locally.
make docker-pull
# 1) deploy required containers (database, caddy etc).
make docker-start
# wait until the log messages become silent.
# 2) deploy bluesky containers(plc, bgs, appview, pds, ozone, ...)
make docker-start-bsky
# the operation below is no longer needed, thanks to patching/152-indigo-newpds-dayper-limit.diff
# 3) set the bgs perDayLimit parameter via the REST API.
# ~~~ make api_setPerDayLimit ~~~
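to see whether the containers came up, plain docker commands work alongside the make targets (a sketch; container names depend on the compose project):
docker ps --format 'table {{.Names}}\t{{.Status}}'
# anything listed by the next command has exited and its logs deserve a look
docker ps --filter 'status=exited' --format '{{.Names}}'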
# 1) check if social-app is ready to serve.
curl -L https://social-app.${DOMAIN}/
# 2) create account for feed-generator
make api_CreateAccount_feedgen
# 3) start bluesky feed-generator
make docker-start-bsky-feedgen FEEDGEN_PUBLISHER_DID=did:plc:...
# 4) announce the existence of the feed (via scripts/publishFeedGen.ts on feed-generator).
make publishFeed
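as a check that the feed-generator answers through the reverse proxy, its describeFeedGenerator endpoint can be queried (a sketch; the hostname feed-generator.${DOMAIN} is an assumption based on this repo's FQDN <=> container naming rule):
curl -L https://feed-generator.${DOMAIN}/xrpc/app.bsky.feed.describeFeedGenerator | jq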
# 1) create account for ozone service/admin
# you need to use a valid email address, since ozone/PDS sends an email with a confirmation code.
make api_CreateAccount_ozone email=your-valid@email.address.com handle=...
# 2) start ozone
# ozone uses the same DID for OZONE_SERVER_DID and OZONE_ADMIN_DIDS, per [HOSTING.md](https://github.com/bluesky-social/ozone/blob/main/HOSTING.md)
make docker-start-bsky-ozone OZONE_SERVER_DID=did:plc: OZONE_ADMIN_DIDS=did:plc:
# 3) start the workaround tool to index label assignments into the appview DB via subscribeLabels.
# ./ops-helper/apiImpl/subscribeLabels2BskyDB.ts --help
./ops-helper/apiImpl/subscribeLabels2BskyDB.ts
# 4) [occasionally required] update the DidDoc before signing in to ozone (required since asof-2024-07-05)
# first, request and receive a PLC signing token by email
make api_ozone_reqPlcSign handle=... password=...
# update the DidDoc with the above token
make api_ozone_updateDidDoc plcSignToken= handle=... ozoneURL=...
# 5) [optional] add a member to the ozone team (i.e. assign a role to a user):
# valid roles are: tools.ozone.team.defs#roleAdmin | tools.ozone.team.defs#roleModerator | tools.ozone.team.defs#roleTriage
make api_ozone_member_add role= did=did:plc:
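to confirm ozone emits label events, its subscribeLabels stream can be watched directly, similar to the firehose checks above (a sketch; the hostname ozone.${DOMAIN} is an assumption following this repo's naming rule):
websocat "wss://ozone.${DOMAIN}/xrpc/com.atproto.label.subscribeLabels?cursor=0"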
make docker-start-bsky-jetstream
in your browser, access https://social-app.${DOMAIN}/
such as https://social-app.mysky.local.com/
refer to the screenshots for the UI operations to create an account / sign in on your self-hosted bluesky.
# subscribe to almost all collections from jetstream
websocat "wss://jetstream.${DOMAIN}/subscribe?wantedCollections=app.bsky.actor.profile&wantedCollections=app.bsky.feed.like&wantedCollections=app.bsky.feed.post&wantedCollections=app.bsky.feed.repost&wantedCollections=app.bsky.graph.follow&wantedCollections=app.bsky.graph.block&wantedCollections=app.bsky.graph.muteActor&wantedCollections=app.bsky.graph.unmuteActor"
# choice 1) shut down containers but keep their data.
make docker-stop
# choice 2) shut down containers and clean their data
make docker-stop-with-clean
export u=foo
make api_CreateAccount handle=${u}.pds.${DOMAIN} password=${u} email=${u}@example.com resp=./data/accounts/${u}.secrets
# then, to create more accounts, just re-assign $u and re-run the above command (`!make` re-runs the last make command from shell history), like below.
export u=bar
!make
export u=baz
!make
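to create several test accounts in one go, the same make target can be wrapped in a small shell loop (a sketch following the handle/password/email pattern above):
for u in foo bar baz; do
  make api_CreateAccount handle=${u}.pds.${DOMAIN} password=${u} email=${u}@example.com resp=./data/accounts/${u}.secrets
done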
after configuring the parameters and optional env vars, operate as below:
# get sources from all repositories
make cloneAll
# create work branches and stay on them for all repositories (repos/*; optional but recommended for safety.)
make createWorkBranch
then build docker images as below:
# 0) apply the minimum patches required to build images, regardless of self-hosting.
# as described in https://github.com/bluesky-social/atproto/discussions/2026 for feed-generator/Dockerfile etc.
# NOTE: this operation checks out a new branch before applying the patch, and stays on that new branch
make patch-dockerbuild
# 1) build images from the original sources
make build DOMAIN= f=./docker-compose-builder.yaml
# the operation below is now obsolete and unsupported because it is fragile (high cost, low return). the patch below also has no effect on PDS scale-out (multiple PDS domains).
# ~~ 2) apply the optional patch for self-hosting, and re-build the image ~~
# ~~ 'optional' means applying this patch is not mandatory to get a self-hosting environment. ~~
# ~~ NOTE: this operation checks out a new branch before applying the patch, and stays on that new branch ~~
#
# ~~ make _patch-selfhost-even-not-mandatory ~~
# ~~ make build services=social-app f=./docker-compose-builder.yaml ~~
when you set the fork_repo_prefix variable before cloneAll,
this operation registers your remote fork repository with git remote add fork ....
you then have additional easy operations against multiple repositories, as below.
export fork_repo_prefix=git@github.com:YOUR_GITHUB_ACCOUNT/
make cloneAll
# manage (push and pull) branches and tags for all repos with a single operation against your remote fork repositories.
make exec under=./repos/* cmd='git push fork branch'
make exec under=./repos/* cmd='git tag -a "asof-XXXX-XX-XX" '
make exec under=./repos/* cmd='git push fork --tags'
# push something on justOneRepo to your fork repository.
make exec under=./repos/justOneRepo cmd='git push fork something'
# refer to the Makefile for details and samples.
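to confirm the fork remote was registered in every repository, the same exec pattern can be reused (a quick check; each repo should list both origin and fork):
make exec under=./repos/* cmd='git remote -v'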
- get all env vars in docker-compose
# names and their values
_yqpath='.services[].environment, .services[].build.args'
_yqpath='.services[].environment'
# lists of var=val
cat ./docker-compose-builder.yaml | yq -y "${_yqpath}" \
| grep -v '^---' | sed 's/^- //' | sort -u -f
# output in yaml
cat ./docker-compose-builder.yaml | yq -y "${_yqpath}" \
| grep -v '^---' | sed 's/^- //' | sort -u -f \
| awk -F= -v col=":" -v q="'" -v sp=" " -v list="-" '{print sp list sp q $1 q col sp q $2 q}' \
| sed '1i defs:' | yq -y
# list of names
cat ./docker-compose-builder.yaml | yq -y "${_yqpath}" \
| grep -v '^---' | sed 's/^- //' | sort -u -f \
| awk -F= '{print $1}' | sort -u -f
- env vars regarding {URL | DID | DOMAIN} == mapping rules in docker-compose
# get {name=value} of env vars regarding { URL | DID | DOMAIN }
cat ./docker-compose-builder.yaml | yq -y .services[].environment \
| grep -v '^---' | sed 's/^- //' | sort -u -f \
| grep -e :// -e did: -e {DOMAIN}
# get names of env vars regarding { URL | DID | DOMAIN }
cat ./docker-compose-builder.yaml | yq -y .services[].environment \
| grep -v '^---' | sed 's/^- //' | sort -u -f \
| grep -e :// -e did: -e {DOMAIN} \
| awk -F= '{print $1}' | sort -u -f \
| tee /tmp/url-or-did.txt
- get mapping rules in reverse proxy (caddy )
# dump the rules; no good way to convert them into an easily readable format yet...
cat config/caddy/Caddyfile
- files related to env vars in sources
# files named *env*
find repos -type f | grep -v -e /.git/ | grep -i env \
| grep -v -e .jpg$ -e .ts$ -e .json$ -e .png$ -e .js$
# files containing 'export'
find repos -type f | grep -v /.git/ | xargs grep -l export \
| grep -v -e .js$ -e .jsx$ -e .ts$ -e .tsx$ -e .go$ -e go.sum$ -e go.mod$ -e .po$ -e .json$ -e .patch$ -e .lock$ -e .snap$
- get all env vars from source code
# the quick way: search the whole repos tree
_files=repos
# or, more precisely: enumerate the files to search for env vars
_files=`find repos -type f | grep -v -e '/.git' -e /__ -e /tests/ -e _test.go -e /interop-test-files -e /testdata/ -e /testing/ -e /jest/ -e /node_modules/ -e /dist/ | sort -u -f`
# for javascript families, from process.env.ENVNAME
grep -R process.env ${_files} \
| cut -d : -f 2- | sed 's/.*process\.//' | grep '^env\.' | sed 's/^env\.//' \
| sed -r 's/(^[A-Za-z_0-9\-]+).*/\1/' | sort -u -f \
| tee /tmp/vars-js1.txt
# for javascript families, from envXXX('MORE_ENVNAME'); refer to atproto/packages/common/src/env.ts for envXXX
grep -R -e envStr -e envInt -e envBool -e envList ${_files} \
| cut -d : -f 2- \
| grep -v -e ^import -e ^export -e ^function \
| sed "s/\"/'/g" \
| grep \' | awk -F\' '{print $2}' | sort -u -f \
| tee /tmp/vars-js2.txt
# for golang from EnvVar(s): []string{"ENVNAME", "MORE_ENVNAME"}
grep -R EnvVar ${_files} \
| cut -d : -f 3- | sed -e 's/.*string//' -e 's/[,"{}]//g' \
| tr ' ' '\n' | grep -v ^$ | sort -u -f \
| tee /tmp/vars-go.txt
# for docker-compose, from services[].environment
echo ${_files} \
| tr ' ' '\n' | grep -v ^$ | grep -e .yaml$ -e .yml$ | grep compose \
| xargs yq -y .services[].environment | grep -v ^--- | sed 's/^- //' \
| sed 's/: /=/' | sed "s/'//g" \
| sort -u -f \
| awk -F= '{print $1}' | sort -u -f \
| tee /tmp/vars-compose.txt
# merge into a single unique list
cat /tmp/vars-js1.txt /tmp/vars-js2.txt /tmp/vars-go.txt /tmp/vars-compose.txt | sort -u -f > /tmp/envs.txt
# pick env vars related to mapping {URL, ENDPOINT, DID, HOST, PORT, ADDRESS}
cat /tmp/envs.txt | grep -e URL -e ENDPOINT -e DID -e HOST -e PORT -e ADDRESS
- find {URL | DID | bsky } near env names in sources
find repos -type f | grep -v -e /.git -e __ -e .json$ \
| xargs grep -R -n -A3 -B3 -f /tmp/envs.txt \
| grep -A2 -B2 -e :// -e did: -e bsky
- find bsky.{social,app,network} in sources (to check for hard-coded domains/FQDNs)
find repos -type f | grep -v -e /.git -e /tests/ -e /__ -e Makefile -e .yaml$ -e .md$ -e .sh$ -e .json$ -e .txt$ -e _test.go$ \
| xargs grep -n -e bsky.social -e bsky.app -e bsky.network -e bsky.dev
this hack uses the result (/tmp/envs.txt) of the above as input.
# create a table showing { env x container => value } with an ops-helper script.
cat ./docker-compose-builder.yaml | ./ops-helper/compose2envtable/main.py -l /tmp/envs.txt -o ./docs/env-container-val.xlsx
this self-hosting env tries to treat self-signed certificates as ordinary trusted certificates by installing them into the containers. the expected behavior is: by sharing /etc/ssl/certs/ca-certificates.crt among all containers, each container treats the certificates in ca-certificates.crt as trusted.
unfortunately, this approach works in some containers but not all. it seems to depend on the distribution (debian/alpine/...) and language (java/nodejs/golang), and no consistent rule could be found in the actual behaviors. therefore, all of the methods below are used together for safety when self-signed certificates are in use.
- the host deploys /etc/ssl/certs/ca-certificates.crt into containers by volume mount.
- env vars for self-signed certificates are defined per language, such as GOINSECURE and NODE_TLS_REJECT_UNAUTHORIZED (see the sketch after this list).
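as a concrete illustration of the second bullet, the per-language settings look roughly like this (a sketch, not necessarily the exact set this repo injects; these are standard Node.js/Go/OpenSSL environment variables):
# trust the shared CA bundle explicitly
export NODE_EXTRA_CA_CERTS=/etc/ssl/certs/ca-certificates.crt   # Node.js: add extra trusted CAs
export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt         # OpenSSL/Go: point at the CA bundle
# or relax verification as a last resort
export NODE_TLS_REJECT_UNAUTHORIZED=0                           # Node.js: skip TLS certificate checks
export GOINSECURE='*.mysky.local.com'                           # Go: allow insecure fetches for matching module paths during builds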
screenshots: create account | sign-in
components | url (origin) |
---|---|
atproto | https://github.com/bluesky-social/atproto.git |
indigo | https://github.com/bluesky-social/indigo.git |
social-app | https://github.com/bluesky-social/social-app.git |
feed-generator | https://github.com/bluesky-social/feed-generator.git |
pds | https://github.com/bluesky-social/pds.git |
ozone | https://github.com/bluesky-social/ozone.git |
did-method-plc | https://github.com/did-method-plc/did-method-plc.git |
jetstream | https://github.com/bluesky-social/jetstream.git |
other dependencies:
components | url (origin) |
---|---|
reverse proxy | https://github.com/caddyserver/caddy (official docker image of caddy:2) |
DNS server | bind9 or others, such as https://github.com/itaru2622/docker-bind9.git |
description of test network:
DOMAIN for self-hosting: mysky.local.com
IP:
- docker host for selfhost: 192.168.1.51
- DNS server: 192.168.1.27
- DNS forwarders: 8.8.8.8 (upstream DNS server; dns.google)
DNS A-Records:
- mysky.local.com : 192.168.1.51
- *.mysky.local.com : 192.168.1.51
the above would be described in the bind9 configuration files as below:
::::::::::::::
/etc/bind/named.conf
::::::::::::::
include "/etc/bind/rndc.key";
controls {
inet 127.0.0.1 allow { 127.0.0.1; } keys { "rndc-key"; };
};
options {
directory "/etc/bind";
// UDP 53, from any
listen-on { any; };
// HTTP 80, from any
listen-on port 80 tls none http default { any; };
listen-on-v6 { none; };
forwarders { 8.8.8.8 ; }; # dns.google.
allow-recursion { any; };
allow-query { any; };
allow-query-cache { any; };
allow-transfer { any; };
};
zone "local.com" { type master; file "zone-local.com"; allow-query { 0.0.0.0/0; }; allow-update { 0.0.0.0/0; }; allow-transfer { 0.0.0.0/0; }; };
::::::::::::::
/etc/bind/zone-local.com
::::::::::::::
$ORIGIN .
$TTL 259200 ; 3 days
local.com IN SOA local.com. root.local.com. (
2024022809 ; serial
3600 ; refresh (1 hour)
900 ; retry (15 minutes)
86400 ; expire (1 day)
3600 ; minimum (1 hour)
)
NS local.com.
A 192.168.1.27
$ORIGIN local.com.
$TTL 3600 ; 1 hour
mysky A 192.168.1.51
$ORIGIN mysky.local.com.
* A 192.168.1.51
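to verify the zone answers as intended, query the DNS server directly (a quick check; both queries should return 192.168.1.51):
dig @192.168.1.27 mysky.local.com +short
dig @192.168.1.27 whatever.mysky.local.com +short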
cf. the simplest way to use the above DNS server (192.168.1.27) temporarily is
to add it to /etc/resolv.conf as below on all testing machines
(docker host, client machines running a browser):
nameserver 192.168.1.27
special thanks to prior work on self-hosting.
- https://github.com/ikuradon/atproto-starter-kit/tree/main
- bluesky-social/atproto#2026 and https://syui.ai/blog/post/2024/01/08/bluesky/
hacks in bluesky: