
speed up get_datasite_states #528

Merged
merged 12 commits into from
Feb 4, 2025

@abyesilyurt commented Jan 28, 2025

Preliminary work on identifying sync bottlenecks on both the client and the server side.

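Benchmark `hash_dir`:

```python
from pathlib import Path

from syftbox.lib.hash import hash_dir
from syftbox.lib.profiling import pyspy

# Local datasites folder to hash; adjust to your own SyftBox directory
path = Path("/Users/azizwork/SyftBox/datasites")


@pyspy()
def profile_hash_dir():
    # Single pass; increase the range to profile several runs
    for _ in range(1):
        hash_dir(path, root_dir=path)


if __name__ == "__main__":
    profile_hash_dir()
```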
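These benchmarks rely on the repo's `pyspy()` decorator, which presumably requires py-spy to be installed. As a rough stdlib-only alternative, here is a sketch of a decorator factory with the same `@...()` call shape that profiles each call with `cProfile` and prints the functions with the highest cumulative time. The name `cprofile`, its `top` parameter and the `last_report` attribute are hypothetical and not part of SyftBox.

```python
import cProfile
import functools
import io
import pstats
from typing import Callable, TypeVar

T = TypeVar("T")


def cprofile(top: int = 10) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Hypothetical stand-in for @pyspy(): profile each call with cProfile."""

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            profiler = cProfile.Profile()
            profiler.enable()
            try:
                return func(*args, **kwargs)
            finally:
                profiler.disable()
                buf = io.StringIO()
                # Sort by cumulative time so slow call chains surface first
                stats = pstats.Stats(profiler, stream=buf)
                stats.sort_stats("cumulative").print_stats(top)
                # Keep the last report on the wrapper for later inspection
                wrapper.last_report = buf.getvalue()
                print(wrapper.last_report)

        wrapper.last_report = ""
        return wrapper

    return decorator


if __name__ == "__main__":

    @cprofile(top=5)
    def busy() -> int:
        return sum(i * i for i in range(100_000))

    busy()
```

To try it on the benchmark, decorate `profile_hash_dir` with `@cprofile()` in place of `@pyspy()`. Unlike py-spy, which samples the running process from outside, cProfile instruments every Python function call, so it adds overhead and can exaggerate the cost of many small calls.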

Benchmark `get_datasite_states`:

```python
import sqlite3
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

from fastapi import Request

from syftbox.lib.profiling import pyspy
from syftbox.server.api.v1.sync_router import get_db_connection, get_file_store
from syftbox.server.db.file_store import FileStore
from syftbox.server.models.sync_models import FileMetadata
from syftbox.server.settings import ServerSettings, get_server_settings


def get_datasite_states(
    conn: sqlite3.Connection,
    file_store: FileStore,
    server_settings: ServerSettings,
    email: str,
) -> Dict[str, List[FileMetadata]]:
    # Fetch every file visible to `email` in a single call instead of
    # looping over all datasites and computing each one's state separately
    file_metadata = file_store.list_for_user(None, email)

    # Group the flat result into a dict of datasite -> list of files
    datasite_states: Dict[str, List[FileMetadata]] = defaultdict(list)
    for metadata in file_metadata:
        user_email = metadata.path.root
        datasite_states[user_email].append(metadata)

    return dict(datasite_states)


@pyspy()
def main_loop():
    for _ in range(1000):
        main()


def main():
    # Set up the dependencies FastAPI would normally inject
    request = Request(scope={"type": "http", "method": "POST", "path": "/sync/datasite_states"})
    request.state.server_settings = ServerSettings(data_folder=Path("/Users/azizwork/Workspace/syft/server_data"))
    email = "aziz@openmined.org"  # Replace with actual email

    conn_gen = get_db_connection(request)
    store_gen = get_file_store(request)
    try:
        conn = next(conn_gen)
        file_store = next(store_gen)
        server_settings = get_server_settings(request)
        get_datasite_states(conn, file_store, server_settings, email)
    finally:
        # Close the dependency generators so their cleanup code runs and
        # connections do not pile up across the 1000 iterations
        store_gen.close()
        conn_gen.close()


if __name__ == "__main__":
    main_loop()
```
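Judging from the benchmark, the faster `get_datasite_states` issues one `file_store.list_for_user(None, email)` call and groups the results by datasite, rather than querying each datasite in turn. A minimal sketch of the two query shapes, assuming a hypothetical in-memory `file_metadata` table with a single `path` column whose first component is the datasite email (this is not SyftBox's actual schema, and the function names are illustrative):

```python
import sqlite3
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List


def make_db() -> sqlite3.Connection:
    # Hypothetical schema: one row per file, path relative to the datasites root
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file_metadata (path TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO file_metadata (path) VALUES (?)",
        [
            ("alice@example.com/public/a.txt",),
            ("alice@example.com/public/b.txt",),
            ("bob@example.com/app_data/c.csv",),
        ],
    )
    return conn


def states_per_datasite(conn: sqlite3.Connection, datasites: List[str]) -> Dict[str, List[str]]:
    # One query per datasite: N round trips for N datasites
    states: Dict[str, List[str]] = {}
    for datasite in datasites:
        prefix = f"{datasite}/"
        # substr() rather than LIKE, so "_" and "%" in emails are not wildcards
        cur = conn.execute(
            "SELECT path FROM file_metadata WHERE substr(path, 1, ?) = ? ORDER BY path",
            (len(prefix), prefix),
        )
        states[datasite] = [row[0] for row in cur]
    return states


def states_single_query(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    # One query for all rows, then group in Python by the first path component
    grouped: Dict[str, List[str]] = defaultdict(list)
    for (path,) in conn.execute("SELECT path FROM file_metadata ORDER BY path"):
        grouped[PurePosixPath(path).parts[0]].append(path)
    return dict(grouped)


if __name__ == "__main__":
    db = make_db()
    print(states_single_query(db))
```

Both functions return the same mapping for datasites that have files; the per-datasite version also produces empty lists for datasites without any, which the single-query version simply omits. The real `list_for_user` presumably also filters rows by what `email` is allowed to read, which this sketch leaves out.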


@abyesilyurt left a comment


Some changes need to be reverted before merging.

syftbox/client/base.py (outdated, resolved)
syftbox/client/plugins/sync/manager.py (outdated, resolved)
syftbox/server/users/auth.py (2 threads, outdated, resolved)
@abyesilyurt changed the title from "WIP: benchmark sync" to "WIP: speed up get_datasite_states" on Jan 30, 2025
@abyesilyurt changed the title from "WIP: speed up get_datasite_states" to "speed up get_datasite_states" on Feb 4, 2025
@abyesilyurt marked this pull request as ready for review on Feb 4, 2025, 11:50
@abyesilyurt merged commit a35fddc into main on Feb 4, 2025
14 checks passed