A Node.js-based microservice that serves the danbooru2019 dataset metadata over an API, with batch creation and training/validation/test data splitting.
```shell
docker run -d -v metadbpath:/db/metadata.sqlite3 -p 3939:8000 r1cebank/danbooru-meta-api
```
You can generate the metadata database from the metadata repo: https://github.com/r1cebank/danbooru-meta
Get stats for the current metadata.

Response:

```json
{
  "num_posts": 3708763,
  "num_ratings": 3,
  "num_tags": 400147
}
```
Get detailed info for a tag.

Response:

```json
{
  "category": 0,
  "id": 566835,
  "name": "multiple_girls"
}
```
Get random posts, from `start` to `end`, with the given size.

Response:

```json
{
  "count": 64,
  "result": [
    {
      "file_ext": "png",
      "file_size": 810555,
      "height": 1000,
      "id": 2431883,
      "location": "717/1574717.png",
      "md5": "bbd9f630aa2b72d909960633ccd94951",
      "pixiv_id": 40421351,
      "post_id": 1574717,
      "rating": "s",
      "source": "http://i1.pixiv.net/img11/img/hentay-shrimp/40421351_big_p0.png",
      "tags": [
        1821,
        13200,
        6059
      ]
    },
    ...
  ]
}
```
Get regular batches with basic pagination.

Response:

```json
{
  "count": 64,
  "result": [
    {
      "file_ext": "png",
      "file_size": 810555,
      "height": 1000,
      "id": 2431883,
      "location": "717/1574717.png",
      "md5": "bbd9f630aa2b72d909960633ccd94951",
      "pixiv_id": 40421351,
      "post_id": 1574717,
      "rating": "s",
      "source": "http://i1.pixiv.net/img11/img/hentay-shrimp/40421351_big_p0.png",
      "tags": [
        1821,
        13200,
        6059
      ]
    },
    ...
  ]
}
```
Create a batch.

Example body:

```json
{
  "batch_size": 64,
  "validation_split": 10,
  "test_split": 10,
  "seed": 123131231
}
```

- `batch_size`: size of each batch
- `validation_split`: validation data percentage split
- `test_split`: test data percentage split
- `seed`: seeds the RNG during batch creation, making the results reproducible

Response:

```json
{
  "id": "df31e073520a47ffbeaec9d7ad74ebb2"
}
```
List all the batches in the system.

Response:

```json
{
  "id": [
    "9ff686fcdce64aeb8d207f31abef0b0b",
    "e771b3c36a3b499bb91471bf9a176364"
  ]
}
```
Get a specific training batch for a batch id.

Response:

```json
{
  "count": 64,
  "result": [
    {
      "file_ext": "png",
      "file_size": 810555,
      "height": 1000,
      "id": 2431883,
      "location": "717/1574717.png",
      "md5": "bbd9f630aa2b72d909960633ccd94951",
      "pixiv_id": 40421351,
      "post_id": 1574717,
      "rating": "s",
      "source": "http://i1.pixiv.net/img11/img/hentay-shrimp/40421351_big_p0.png",
      "tags": [
        1821,
        13200,
        6059
      ]
    },
    ...
  ]
}
```
Get a specific test batch for a batch id.

Response:

```json
{
  "count": 64,
  "result": [
    {
      "file_ext": "png",
      "file_size": 810555,
      "height": 1000,
      "id": 2431883,
      "location": "717/1574717.png",
      "md5": "bbd9f630aa2b72d909960633ccd94951",
      "pixiv_id": 40421351,
      "post_id": 1574717,
      "rating": "s",
      "source": "http://i1.pixiv.net/img11/img/hentay-shrimp/40421351_big_p0.png",
      "tags": [
        1821,
        13200,
        6059
      ]
    },
    ...
  ]
}
```
Get a specific validation batch for a batch id.

Response:

```json
{
  "count": 64,
  "result": [
    {
      "file_ext": "png",
      "file_size": 810555,
      "height": 1000,
      "id": 2431883,
      "location": "717/1574717.png",
      "md5": "bbd9f630aa2b72d909960633ccd94951",
      "pixiv_id": 40421351,
      "post_id": 1574717,
      "rating": "s",
      "source": "http://i1.pixiv.net/img11/img/hentay-shrimp/40421351_big_p0.png",
      "tags": [
        1821,
        13200,
        6059
      ]
    },
    ...
  ]
}
```
Get info about a specific batch id.

Response:

```json
{
  "test_batches": 2897,
  "total_batches": 28974,
  "train_batches": 23180,
  "validation_batches": 2897
}
```
When `POST /posts/batch` is requested, the app queries the metadata database for the number of available posts, then runs a few simple partitioning steps:
- Post ids are broken into partition `(start, end)` tuples; the number of partitions is `total_posts / batch_size * 2`
- A random id is chosen from each partition, which yields batches of post ids
- The batches are then shuffled
- Based on the validation and test splits, batches are picked from the list and assigned to each category
- The final result is stored in an in-memory hashmap on the server, and an id pointing to the result is returned to the user
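The steps above can be sketched roughly as follows. This is an illustrative reconstruction, not the service's actual source: `rng` is assumed to be a seeded generator returning floats in `[0, 1)`, and the exact partition arithmetic may differ.

```javascript
// Hypothetical reconstruction of the batch-creation steps.
function createBatches(totalPosts, batchSize, validationSplit, testSplit, rng) {
  // 1. Break the post-id space into (start, end) partitions.
  const numPartitions = Math.floor((totalPosts / batchSize) * 2);
  const span = Math.floor(totalPosts / numPartitions);

  // 2. Pick a random id from each partition.
  const picks = [];
  for (let p = 0; p < numPartitions; p++) {
    picks.push(p * span + Math.floor(rng() * span));
  }

  // Group the picked ids into batches of `batchSize`.
  const batches = [];
  for (let i = 0; i < picks.length; i += batchSize) {
    batches.push(picks.slice(i, i + batchSize));
  }

  // 3. Shuffle the batches (Fisher-Yates).
  for (let i = batches.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [batches[i], batches[j]] = [batches[j], batches[i]];
  }

  // 4. Assign batches to validation / test / train by percentage split.
  const nVal = Math.floor((batches.length * validationSplit) / 100);
  const nTest = Math.floor((batches.length * testSplit) / 100);
  return {
    validation: batches.slice(0, nVal),
    test: batches.slice(nVal, nVal + nTest),
    train: batches.slice(nVal + nTest),
  };
}
```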
You can provide a seed with each random request to make the randomization predictable; this is useful for reproducing results and ensuring the same batches are always created.
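For example, a small seeded PRNG such as mulberry32, shown here as a hypothetical stand-in for whatever generator the service uses, produces an identical sequence for an identical seed:

```javascript
// mulberry32: a tiny 32-bit seeded PRNG (illustrative stand-in; the
// service's actual generator may differ).
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

const a = mulberry32(123131231);
const b = mulberry32(123131231);
console.log(a() === b() && a() === b()); // same seed, same sequence
```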
Because we use SQLite as the data backend (mainly for portability), we execute many queries with each API call. Some might call this inefficient, but with SQLite it is actually efficient, as explained here: https://www.sqlite.org/np1queryprob.html