This repo is for testing some ideas on storing historical information and doing analytics on citybikes data.
$ pip install -r requirements.txt
First, let's fire up a producer for bike share information, by using citybikes hyper publisher.
$ hyper publisher
The producer will open a tcp socket at port 5555, so we can connect our data collector to
$ python collect.py
This will create a citybikes.db
sqlite file. Run aggregated queries like
$ sqlite3 stats.db << EOF
SELECT count(*) as activity, network_tag
FROM stats
GROUP BY entity_id
ORDER BY activity DESC
LIMIT 25
EOF
$ sqlite3 stats.db << EOF
SELECT count(*) as activity, entity_id, latitude, longitude, network_tag
FROM stats
GROUP BY entity_id
ORDER BY activity DESC
LIMIT 25
EOF
$ sqlite3 stats.db << EOF
SELECT count(*) as activity, entity_id, latitude, longitude, network_tag
FROM stats
WHERE network_tag = 'bicing'
GROUP BY entity_id
ORDER BY activity DESC
LIMIT 25
EOF
Exporting data requires duckdb installed on the system.
$ bash export.sh quack stats.db stats.duck
$ bash export.sh parquet stats.duck --from 2024-11-01 --to 2024-11-15 --network bicing > bicing.parquet
$ bash export.sh csv stats.duck --from 2024-11-01 --to 2024-11-15 --network bicing > bicing.csv
$ bash export.sh parquet stats.duck -o world.parquet
Use the plot script to visualize information. This script requires duckdb and a python environment with matplotlib.
duckdb -s "COPY(\
select tag, nuid, name, bikes, extra.ebikes, bikes::int-extra.ebikes::int as normal, free, timestamp \
from read_parquet('cb.parquet') \
where tag='bicing' and nuid='100' \
) TO '/dev/stdout' WITH (FORMAT 'csv', HEADER)" | python plot.py -s 5min - -p Blues
duckdb -s "COPY(\
select tag, nuid, name, bikes, extra.ebikes, bikes::int-extra.ebikes::int as normal, free, timestamp \
from read_parquet('cb.parquet') \
where tag='citi-bike-nyc' and nuid='66ddbd20-0aca-11e7-82f6-3863bb44ef7c' \
) TO '/dev/stdout' WITH (FORMAT 'csv', HEADER)" | python plot.py -s 5min -
duckdb -s "COPY(\
select tag, nuid, name, bikes, extra.ebikes, bikes::int-extra.ebikes::int as normal, free, timestamp \
from read_parquet('cb.parquet') \
where tag='bicing' \
) TO '/dev/stdout' WITH (FORMAT 'csv', HEADER)" | python plot.py -s 5min -