feat: dockerize and improve logging and resilience of validator (#361)
* override default endpoint

* set the endpoint default correctly

* set the endpoint default correctly, and get rid of extra init

* set the customized endpoint

* oy the network var doing this

* set the endpoint default correctly, and get rid of extra init

* try to not set network param.

* fix log dumps

* don't call config file this way, it's already here

* use network, no more chain_endpoint

* attractive logs

* use main and test for now, too many rpc timeouts for masa

* iconic logging.

* distinguish among responses from masa-ai as to tweet verification (fail, 429, network error, etc.)

* deal with missed validation attempts fairly.

* even better logging

* whose tweet got validated

* top 10 bottom 10

* miner tweet validation checks

* update deps

* any words match the query

* don't try to ping validators

* search through hashtags for query terms check

* handle none response correctly

* ooops all tweets

* better logging

* per miner response metrics

* restore correct checking validation

* fix config call

* fix random tweet checker

* zero for miners with bad tweets

* log the checking of the tweets

* Temporarily saving logging improvements

* feat: add docker-compose and start.sh

* exclude .bittensor/ from .gitignore

* rm old docker

* publish image

* readd dockerfile

* updates to set up tee worker under docker

* additional docker related scripts

* log miner checks

* use tweets in the state

* restore

* don't serve the axon actually

* filter out baddies

* log miner pass/fail tweet id check

* log tweet validation fails and passes

* granular tweet validation check

* ping only miners at init

* init versions

* fix uid pool back

* fix uid versions init

* use protocol api for tweet validation

* use new way (oracle api) to validate tweets for now

* ping the axons

* fix url for protocol api to default to correct one

* just ping miners for uncalled versions

* validate correctly with protocol

* only count tweets once that pass both id and spot check

* log all the validation tests

* log all the validation tests better

* log all the validation tests a touch better

* log miner scores at weight set

* log deduped tweets

* reset uncalled uids correctly

* fix check valid id

* cpu only for torch

* don't wait for initialization

* updates to starting up docker containers

* docker setup touchups

* no this must not switch off (wait for inclusion)

* make masa backup

* raise timeout to get tweets from protocol, catch weight set issue and die

* print out tweets we're sending to protocol (random sample)

* fix strict check

* details on validated tweet

* don't worry about checking for 150 scored uids except in main

* fix: prefer python over python3

* fix: python3 to python and defaults in dockerfile

* set timeParsed

* update config.json settings for testnet

* fix: update docker compose to repo

* fix: dedicated services for miner / validator for supporting running side by side with .env

* update config.json settings for testnet

* fix: tests

* debug logs for weights setting

* don't use use_torch()

* fix: env example and compose edit

* fix: bumps to 1.5.0

* fix: vali test

* fix: env example

* fix: makefile to main

* fix: remove duplicate poetry file

* fix: makefile

* fix: pyproject file back

* fix: add to pyproject

* fix: put pyproject back

---------

Co-authored-by: Grant Foster <grantdfoster@gmail.com>
5u6r054 and grantdfoster authored Feb 25, 2025
1 parent df562f9 commit fa4692c
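Several entries in the commit message ("any words match the query", "search through hashtags for query terms check") describe a relevance check on the tweets miners return. That code is not shown on this page, so what follows is only a minimal Python sketch under stated assumptions: `query_terms` and `tweet_matches_query` are hypothetical names, and a tweet is assumed to be a dict with `text` and `hashtags` keys.

```python
from __future__ import annotations

import re
from collections.abc import Iterable


def query_terms(query: str) -> set[str]:
    """Split a search query into lowercase word terms, dropping '#' and punctuation."""
    return {term.lower() for term in re.findall(r"\w+", query)}


def tweet_matches_query(tweet: dict, query: str) -> bool:
    """Return True if ANY query term appears as a word in the text or as a hashtag.

    Assumed (hypothetical) schema: tweet["text"] is a str or None,
    tweet["hashtags"] is an iterable of str such as "#Bittensor", or None.
    """
    terms = query_terms(query)
    if not terms:
        return False
    # Whole-word matching, so "ai" does not match inside "said".
    text_words = {w.lower() for w in re.findall(r"\w+", tweet.get("text") or "")}
    hashtags: Iterable[str] = tweet.get("hashtags") or []
    tag_words = {tag.lstrip("#").lower() for tag in hashtags}
    # "Any words match": a single overlapping term is enough.
    return bool(terms & (text_words | tag_words))
```

Whole-word matching is a design choice of this sketch; the commit does not say whether the validator matches words or substrings.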
Showing 31 changed files with 1,977 additions and 483 deletions.
15 changes: 14 additions & 1 deletion .env.example
@@ -1,3 +1,16 @@

ENV=dev
NETUID=165
SUBTENSOR_NETWORK=test
SUBTENSOR_ADDRESS=wss://test.finney.opentensor.ai:443

WALLET_NAME=miner
HOTKEY_NAME=miner_1

VALIDATOR_WALLET_NAME=validator
VALIDATOR_HOTKEY_NAME=validator_1

VALIDATOR_API_HOST=127.0.0.1
VALIDATOR_API_PORT=8000
ORACLE_BASE_URL=http://127.0.0.1:8080/api/v1
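The example file now carries the network, wallet, validator API, and oracle settings. How the neurons read them is not part of this commit; a minimal Python sketch, assuming the variables are read straight from the process environment and that the example values act as defaults (both assumptions), could look like this. `Settings` and `load_settings` are hypothetical names.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    env: str
    netuid: int
    subtensor_network: str
    subtensor_address: str
    validator_api_host: str
    validator_api_port: int
    oracle_base_url: str


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to the .env.example values."""
    return Settings(
        env=environ.get("ENV", "dev"),
        netuid=int(environ.get("NETUID", "165")),
        subtensor_network=environ.get("SUBTENSOR_NETWORK", "test"),
        subtensor_address=environ.get("SUBTENSOR_ADDRESS", "wss://test.finney.opentensor.ai:443"),
        validator_api_host=environ.get("VALIDATOR_API_HOST", "127.0.0.1"),
        validator_api_port=int(environ.get("VALIDATOR_API_PORT", "8000")),
        # Strip a trailing slash so paths can be appended as f"{base}/..." safely.
        oracle_base_url=environ.get("ORACLE_BASE_URL", "http://127.0.0.1:8080/api/v1").rstrip("/"),
    )
```

Accepting a plain mapping keeps the loader testable without touching the real environment.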


139 changes: 66 additions & 73 deletions .github/workflows/docker-publish.yml
@@ -1,90 +1,83 @@
name: Docker Build and Publish
name: 'Build and Publish Images to Docker Hub'

on:
push:
branches: [ "dockerize" ]
release:
types: [published]
branches:
- fix/double-connection
tags:
- 'v*' # Only trigger on version tags

jobs:
check-and-build:
build-and-publish:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
strategy:
matrix:
image: [subtensor, subnet, miner, validator, protocol]
timeout-minutes: 240 # Increased timeout for ARM64 builds
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Checkout
uses: actions/checkout@v4

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

- name: Set up QEMU
uses: docker/setup-qemu-action@v3
with:
platforms: linux/amd64,linux/arm64

- name: Login to Docker Hub
uses: docker/login-action@v3
with:
fetch-depth: 0
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}

- name: Cache last successful build info
uses: actions/cache@v3
# Tag generation with latest and release handling
- name: Generate Docker metadata
id: meta
uses: docker/metadata-action@v5
with:
path: last_successful_build_${{ matrix.image }}.txt
key: ${{ runner.os }}-last-build-${{ matrix.image }}-${{ github.sha }}
restore-keys: |
${{ runner.os }}-last-build-${{ matrix.image }}-
images: masaengineering/masa-bittensor
tags: |
# Always push latest
type=raw,value=latest
# Branch builds with timestamp
type=ref,event=branch,suffix=-{{date 'YYYYMMDDHHmmss'}}
# SHA with timestamp
type=sha,format=short,prefix=sha-,suffix=-{{date 'YYYYMMDDHHmmss'}}
# Version tags (v1.2.3 -> 1.2.3, latest)
type=semver,pattern={{version}},value=${{ github.ref_name }}
type=semver,pattern={{major}}.{{minor}},value=${{ github.ref_name }}
type=semver,pattern={{major}},value=${{ github.ref_name }}
- name: Check for changes
id: check_changes
# Debug step to see what tags will be used
- name: Debug Docker Tags
run: |
if [ -f last_successful_build_${{ matrix.image }}.txt ]; then
LAST_SUCCESSFUL_SHA=$(cat last_successful_build_${{ matrix.image }}.txt)
if [ "${{ matrix.image }}" == "subtensor" ]; then
CHANGED=$(git diff --name-only $LAST_SUCCESSFUL_SHA HEAD -- docker/subtensor)
else
CHANGED=$(git diff --name-only $LAST_SUCCESSFUL_SHA HEAD -- docker/${{ matrix.image }} **/*.py)
fi
if [ -n "$CHANGED" ]; then
echo "Changes detected for ${{ matrix.image }}. Building image."
echo "changed=true" >> $GITHUB_OUTPUT
else
echo "No changes detected for ${{ matrix.image }}. Skipping build."
echo "changed=false" >> $GITHUB_OUTPUT
fi
else
echo "No previous successful build found for ${{ matrix.image }}. Building image."
echo "changed=true" >> $GITHUB_OUTPUT
fi
echo "Tags to be used:"
echo "${{ steps.meta.outputs.tags }}"
echo "Is this a release? ${{ startsWith(github.ref, 'refs/tags/v') }}"
- name: Log in to GitHub Container Registry
if: steps.check_changes.outputs.changed == 'true' || github.event_name == 'release'
uses: docker/login-action@v2
- name: Build and push
uses: docker/build-push-action@v5
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
context: .
file: ./Dockerfile
platforms: linux/amd64,linux/arm64
push: true
tags: ${{ steps.meta.outputs.tags }}
cache-from: type=gha,scope=${{ github.ref_name }}
cache-to: type=gha,mode=max,scope=${{ github.ref_name }}

- name: Build and push image
if: steps.check_changes.outputs.changed == 'true' || github.event_name == 'release'
# Verify the push was successful
- name: Verify Push
run: |
if [ "${{ matrix.image }}" == "subtensor" ]; then
CONTEXT="./docker/subtensor"
else
CONTEXT="."
fi
TAG=${{ github.event_name == 'release' && github.event.release.tag_name || github.ref_name }}
docker build -t ghcr.io/masa-finance/masa-bittensor/${{ matrix.image }}:$TAG -f docker/${{ matrix.image }}/Dockerfile $CONTEXT
docker push ghcr.io/masa-finance/masa-bittensor/${{ matrix.image }}:$TAG
- name: Mark successful build
if: steps.check_changes.outputs.changed == 'true' || github.event_name == 'release'
run: echo ${{ github.sha }} > last_successful_build_${{ matrix.image }}.txt
echo "Verifying pushed images..."
for tag in $(echo "${{ steps.meta.outputs.tags }}" | tr '\n' ' '); do
echo "Checking tag: $tag"
docker pull $tag
done
display-tags:
needs: check-and-build
runs-on: ubuntu-latest
steps:
- name: Display image tags
# Announce the release in the logs
- name: Announce Release
if: startsWith(github.ref, 'refs/tags/v')
run: |
TAG=${{ github.event_name == 'release' && github.event.release.tag_name || github.ref_name }}
echo "The following images may have been built and pushed:"
echo "ghcr.io/masa-finance/masa-bittensor/subtensor:$TAG"
echo "ghcr.io/masa-finance/masa-bittensor/subnet:$TAG"
echo "ghcr.io/masa-finance/masa-bittensor/miner:$TAG"
echo "ghcr.io/masa-finance/masa-bittensor/validator:$TAG"
echo "ghcr.io/masa-finance/masa-bittensor/protocol:$TAG"
echo "🎉 Released Agent Arena Subnet version ${GITHUB_REF#refs/tags/v}"
echo "Published tags:"
echo "${{ steps.meta.outputs.tags }}"
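The `Generate Docker metadata` step decides which tags the `Build and push` step publishes. As a rough illustration only, the Python sketch below approximates how the `type=raw,value=latest` and `type=semver` rules expand a ref name. `expected_tags` is a hypothetical helper, not part of docker/metadata-action, and it deliberately ignores the branch and SHA rules, pre-release versions, and the action's special handling of `0.x` versions.

```python
from __future__ import annotations

import re

# Plain vMAJOR.MINOR.PATCH refs only; pre-releases and build metadata are out of scope.
SEMVER_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def expected_tags(ref_name: str, image: str = "masaengineering/masa-bittensor") -> list[str]:
    """Approximate the 'latest' and semver tags the workflow requests for a ref name."""
    tags = [f"{image}:latest"]  # type=raw,value=latest has no enable condition
    match = SEMVER_TAG.match(ref_name)
    if match:  # semver rules are skipped when the ref is not a version
        major, minor, patch = match.groups()
        tags += [
            f"{image}:{major}.{minor}.{patch}",  # pattern={{version}}
            f"{image}:{major}.{minor}",  # pattern={{major}}.{{minor}}
            f"{image}:{major}",  # pattern={{major}}
        ]
    return tags
```

Under this reading, pushing tag `v1.5.0` yields `latest`, `1.5.0`, `1.5` and `1`, while a push to the `fix/double-connection` branch still moves `latest`, since the raw rule applies to every triggering event.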
2 changes: 2 additions & 0 deletions .gitignore
@@ -160,3 +160,5 @@ cython_debug/
 #.idea/
 
 testing/
+
+.bittensor/
54 changes: 54 additions & 0 deletions Dockerfile
@@ -0,0 +1,54 @@
# Build stage for compiling dependencies
FROM --platform=linux/amd64 python:3.12-slim as builder

# Upgrade pip
RUN pip install --no-cache-dir --upgrade pip

# Install build dependencies
RUN apt-get update && apt-get install -y \
    git \
    curl \
    build-essential \
    pkg-config \
    libssl-dev \
    && rm -rf /var/lib/apt/lists/*

# Install Rust
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"

# Set working directory
WORKDIR /app

# Copy requirements
COPY requirements.txt .

# Install CPU-only PyTorch first to avoid duplicate installations
RUN pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install remaining dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Final stage
FROM --platform=linux/amd64 python:3.12-slim

# Install runtime dependencies only
RUN apt-get update && apt-get install -y \
    libssl-dev \
    git \
    && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy installed packages from builder
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin

# Set environment variables
ENV PYTHONUNBUFFERED=1 \
    PYTHONPATH=/app \
    USE_TORCH=1

# Command to run the application
CMD ["sh", "-c", "python -m neurons.${ROLE}"]
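The final stage's `CMD` expands `ROLE` through `sh -c` at container start, so an unset or misspelled `ROLE` only shows up as a Python module error (an empty `ROLE` becomes `python -m neurons.`). A minimal sketch of a fail-fast alternative, assuming a hypothetical launcher script and an allow-list limited to the two entry points the Makefile runs (`neurons/miner.py` and `neurons/validator.py`):

```python
from __future__ import annotations

import os
import runpy
import sys

# Hypothetical allow-list; the Makefile only runs neurons/miner.py and neurons/validator.py.
ALLOWED_ROLES = {"miner", "validator"}


def resolve_module(role: str | None) -> str:
    """Map the ROLE value to a module path, rejecting empty or unknown roles."""
    role = (role or "").strip()
    if role not in ALLOWED_ROLES:
        raise ValueError(f"ROLE must be one of {sorted(ALLOWED_ROLES)}, got {role!r}")
    return f"neurons.{role}"


def main() -> None:
    """Validate ROLE, then run the module as `python -m neurons.<role>` would."""
    try:
        module = resolve_module(os.environ.get("ROLE"))
    except ValueError as err:
        sys.exit(str(err))  # non-zero exit with a readable message
    runpy.run_module(module, run_name="__main__", alter_sys=True)
```

A launcher like this would be invoked as `python launcher.py` with `ROLE=validator` set. The final stage shown above does not `COPY` the repository source, so the `neurons` package still has to reach `/app` another way, for example through a volume mount.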
33 changes: 16 additions & 17 deletions Makefile
@@ -7,12 +7,11 @@ NETWORK ?= main
 
 # Network-specific configurations
 ifeq ($(NETWORK),test)
-SUBTENSOR_NETWORK = network test
-SUBTENSOR_CHAIN = chain_endpoint wss://test.finney.opentensor.ai
+SUBTENSOR_CHAIN = network test
 NETUID = 165
 else ifeq ($(NETWORK),main)
-SUBTENSOR_NETWORK = network finney
-SUBTENSOR_CHAIN = chain_endpoint wss://entrypoint-finney.masa.ai
+SUBTENSOR_CHAIN = network finney
+# SUBTENSOR_CHAIN = network wss://entrypoint-finney.masa.ai
 NETUID = 42
 else
 $(error Invalid network specified. Use NETWORK=test or NETWORK=main)
@@ -26,45 +25,45 @@ list-wallets:
 	btcli wallet list
 
 overview-all:
-	btcli wallet overview --all --subtensor.$(SUBTENSOR_NETWORK)
+	btcli wallet overview --all --subtensor.$(SUBTENSOR_CHAIN)
 
 balance-all:
-	btcli wallet balance --all --subtensor.$(SUBTENSOR_NETWORK)
+	btcli wallet balance --all --subtensor.$(SUBTENSOR_CHAIN)
 
 list-subnets:
-	btcli subnets list --subtensor.$(SUBTENSOR_NETWORK)
+	btcli subnets list --subtensor.$(SUBTENSOR_CHAIN)
 
 register-miner:
-	btcli subnet register --wallet.name miner --wallet.hotkey default --subtensor.$(SUBTENSOR_NETWORK) --netuid $(NETUID)
+	btcli subnet register --wallet.name miner --wallet.hotkey default --subtensor.$(SUBTENSOR_CHAIN) --netuid $(NETUID)
 
 register-validator:
-	btcli subnet register --wallet.name validator --wallet.hotkey default --subtensor.$(SUBTENSOR_NETWORK) --netuid $(NETUID)
+	btcli subnet register --wallet.name validator --wallet.hotkey default --subtensor.$(SUBTENSOR_CHAIN) --netuid $(NETUID)
 
 register-validator-root:
-	btcli root register --wallet.name validator --wallet.hotkey default --subtensor.$(SUBTENSOR_NETWORK)
+	btcli root register --wallet.name validator --wallet.hotkey default --subtensor.$(SUBTENSOR_CHAIN)
 
 stake-validator:
-	btcli stake add --wallet.name validator --wallet.hotkey default --subtensor.$(SUBTENSOR_NETWORK) --netuid $(NETUID)
+	btcli stake add --wallet.name validator --wallet.hotkey default --subtensor.$(SUBTENSOR_CHAIN) --netuid $(NETUID)
 
 boost-root:
-	btcli root boost --netuid $(NETUID) --increase 1 --wallet.name validator --wallet.hotkey default --subtensor.$(SUBTENSOR_NETWORK)
+	btcli root boost --netuid $(NETUID) --increase 1 --wallet.name validator --wallet.hotkey default --subtensor.$(SUBTENSOR_CHAIN)
 
 set-weights:
-	btcli root weights --subtensor.$(SUBTENSOR_NETWORK)
+	btcli root weights --subtensor.$(SUBTENSOR_CHAIN)
 
 run-miner:
 	@echo "Running miner on $(NETWORK)net (netuid: $(NETUID))"
-	python3 neurons/miner.py --netuid $(NETUID) --subtensor.$(SUBTENSOR_NETWORK) --subtensor.$(SUBTENSOR_CHAIN) --wallet.name miner --wallet.hotkey default --axon.port 8091 --neuron.debug --logging.debug --blacklist.force_validator_permit
+	python neurons/miner.py --netuid $(NETUID) --subtensor.$(SUBTENSOR_CHAIN) --wallet.name miner --wallet.hotkey default --axon.port 8091 --neuron.debug --logging.debug --blacklist.force_validator_permit
 
 run-validator:
 	@echo "Running validator on $(NETWORK)net (netuid: $(NETUID))"
-	python3 neurons/validator.py --netuid $(NETUID) --subtensor.$(SUBTENSOR_NETWORK) --subtensor.$(SUBTENSOR_CHAIN) --wallet.name validator --wallet.hotkey default --axon.port 8092 --neuron.info --logging.info --neuron.axon_off
+	python neurons/validator.py --netuid $(NETUID) --subtensor.$(SUBTENSOR_CHAIN) --wallet.name validator --wallet.hotkey default --axon.port 8092 --neuron.info --logging.info --neuron.axon_off
 
 hyperparameters:
-	btcli subnets hyperparameters --subtensor.$(SUBTENSOR_NETWORK) --netuid $(NETUID)
+	btcli subnets hyperparameters --subtensor.$(SUBTENSOR_CHAIN) --netuid $(NETUID)
 
 metagraph:
-	btcli subnets metagraph --subtensor.$(SUBTENSOR_NETWORK) --netuid $(NETUID)
+	btcli subnets metagraph --subtensor.$(SUBTENSOR_CHAIN) --netuid $(NETUID)
 
 test-miner:
 	pytest -s -p no:warnings tests/test_miner.py
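After this change each target passes a single `--subtensor.$(SUBTENSOR_CHAIN)` argument, which expands to `--subtensor.network test` or `--subtensor.network finney`. For scripting outside `make`, a minimal Python sketch of the same selection could look like this; `validator_command` is a hypothetical helper that mirrors the `run-validator` recipe and the `NETWORK ?= main` default from the hunk header.

```python
from __future__ import annotations

import shlex

# Mirrors the ifeq blocks in the Makefile (NETWORK=test | NETWORK=main).
NETWORKS = {
    "test": {"subtensor_network": "test", "netuid": 165},
    "main": {"subtensor_network": "finney", "netuid": 42},
}


def validator_command(network: str = "main") -> list[str]:
    """Build the argv that `make run-validator NETWORK=<network>` executes."""
    if network not in NETWORKS:
        raise ValueError("Invalid network specified. Use NETWORK=test or NETWORK=main")
    cfg = NETWORKS[network]
    return [
        "python", "neurons/validator.py",
        "--netuid", str(cfg["netuid"]),
        "--subtensor.network", cfg["subtensor_network"],
        "--wallet.name", "validator",
        "--wallet.hotkey", "default",
        "--axon.port", "8092",
        "--neuron.info", "--logging.info", "--neuron.axon_off",
    ]
```

For example, `shlex.join(validator_command("test"))` gives a shell-safe string for logging the exact command before running it.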
4 changes: 2 additions & 2 deletions config.json
@@ -21,8 +21,8 @@
 "timeout": 10
 },
 "synthetic": {
-"timeout": 10,
-"sample_size": 5,
+"timeout": 60,
+"sample_size": 20,
 "blocks": 1
 },
 "healthcheck": {
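The `synthetic` block now allows a 60-second timeout and a sample of 20 instead of 5. How the validator consumes these keys is not part of this diff; the sketch below assumes `synthetic` is a top-level key of `config.json` (the hunk does not show its parent) and uses hypothetical helpers `spot_check_sample` and `synthetic_timeout` to read them.

```python
from __future__ import annotations

import json
import random


def spot_check_sample(config_text: str, tweets: list[dict], seed: int | None = None) -> list[dict]:
    """Draw up to synthetic.sample_size distinct tweets for spot-checking."""
    synthetic = json.loads(config_text)["synthetic"]
    # Never request more items than exist; random.sample raises ValueError otherwise.
    k = min(int(synthetic["sample_size"]), len(tweets))
    return random.Random(seed).sample(tweets, k)


def synthetic_timeout(config_text: str) -> float:
    """Seconds to wait on a synthetic request, read from the same block."""
    return float(json.loads(config_text)["synthetic"]["timeout"])
```

Passing an explicit `seed` makes the sample reproducible in tests; in production one would normally leave it as `None`.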