Data pipeline overview
The collection target queue is set up and the collection is started:
# queue the day's collection targets, then run the collection
session = FacebookCollection(date_stamp)
session.create_target_queue()
session.collect()
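The internals of FacebookCollection aren't shown on this page; as a rough sketch of the queue-then-drain pattern the snippet implies (every helper and name below is a placeholder, not the repository's actual code):

# hypothetical sketch of the queue-drain pattern, not the real class
from collections import deque

def load_targets(date_stamp):
    # placeholder: return the day's collection targets
    return []

class FacebookCollection:
    def __init__(self, date_stamp):
        self.date_stamp = date_stamp
        self.targets = deque()

    def create_target_queue(self):
        # enqueue one entry per collection target
        self.targets.extend(load_targets(self.date_stamp))

    def collect(self):
        # drain the queue, collecting each target's data in turn
        while self.targets:
            target = self.targets.popleft()
            ...  # fetch the target's data from the API and store it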
The calls to predict begin the analysis pipeline stages. Each analysis runs locally and its result is uploaded to the S3 bucket:
# run the monthly ('mau') and daily ('dau') active-user analyses
mau_key = r_analysis_wrapper.predict(date_stamp, 'mau')
r_analysis_wrapper.predict(date_stamp, 'dau')
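predict's internals also aren't documented here; one plausible shape, assuming the analysis is an R script run as a subprocess and the output is uploaded with boto3 (the script name, bucket, and key layout are illustrative assumptions):

# hypothetical sketch, not the repository's actual wrapper
import subprocess
import boto3

BUCKET = 'example-analysis-bucket'  # assumed bucket name

def predict(date_stamp, model):
    # run the local R analysis for the given model ('mau' or 'dau')
    subprocess.run(['Rscript', f'{model}_model.R', date_stamp], check=True)
    # upload the result and return its S3 key for indexing
    result_file = f'{model}_{date_stamp}.csv'    # assumed output naming
    key = f'results/{date_stamp}/{result_file}'  # assumed key layout
    boto3.client('s3').upload_file(result_file, BUCKET, key)
    return key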
Once the analysis has completed successfully, the index file is updated; the website frontend reads this index to provide the list of available analyses:
# update the model index so the new analysis is discoverable
s3_bucket = S3Bucket()
index = ModelIndexFile(s3_bucket)
index.add_latest(date_stamp, mau_key)
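The index format isn't specified on this page; a minimal sketch, assuming the index is a JSON document in the bucket mapping date stamps to result keys (the key name, JSON shape, and direct boto3 calls are assumptions; the real class wraps S3Bucket instead):

# hypothetical sketch, the real ModelIndexFile likely differs
import json
import boto3

INDEX_KEY = 'model_index.json'  # assumed index object name

class ModelIndexFile:
    def __init__(self, bucket_name):
        self.s3 = boto3.client('s3')
        self.bucket = bucket_name

    def add_latest(self, date_stamp, result_key):
        # read the current index, record the newest analysis, write it back
        body = self.s3.get_object(Bucket=self.bucket, Key=INDEX_KEY)['Body'].read()
        index = json.loads(body)
        index['latest'] = date_stamp
        index.setdefault('analyses', {})[date_stamp] = result_key
        self.s3.put_object(Bucket=self.bucket, Key=INDEX_KEY,
                           Body=json.dumps(index).encode('utf-8'))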