-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathappendix-supervision1.qmd
92 lines (63 loc) · 3.77 KB
/
appendix-supervision1.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
---
title: "Fortnightly Update - 10 June 2024"
---
## Summary
This week has been about getting back on track after an extended period away from the dissertation. It has mainly revolved around refocusing, rescoping, reacquainting and rekindling motivation.
## Accomplishments
* **Results:**
* [x] recreated single student graph timetable - see [poc-1-basic](/model/poc1-basic-phl/poc-1-basic.qmd)
* [x] created SQL to csv pipeline
* [x] created cypher queries
* **Project Management**
* [x] new github repo (currently private) <https://github.com/zoonalink/graph-project>
* [x] draft readme
* [x] folder/file structure
* [x] weekly update template
* **Data Collection:**
* [x] SQL to CSV files for single student
* [x] raw SQL data tables
* **Analysis / Wrangling:**
* [x] Transformation in SQL query
* **Model Development:**
* **Investigation**
* [x] pipelines to automate data flow - prefect
* [x] how to represent time in graph
## Issues/Blockers
* **Technical:**
* Original servers were deleted and access to new servers was gone
* Some original work no longer works
* **Methodological:**
* I need to ensure that scope does not creep.
* I need to stay focused and not start solving problems which are out of scope or are interesting.
* **Data-Related:**
* develop a robust anonymisation process
* representing time - see [representing time](/model/poc2-time/representing-time.qmd)
## Next Steps
### Weekly Goals
1. Bigger cohort of data (MSc DataScience?)
2. Explore time in Graph
3. Data pipeline documentation and steps (anonymisation)
* **Project Management** Project tasks - planning, admin.
* [ ] add supervisor to github repo
* **Data:**
* [ ] Anonymisation or generation
* [ ] MSc Data Science cohort isolated into separate csv files
* [ ] splitting data into single rows
* [ ] Pipeline steps plotted
* **Analysis / Wrangling:**
* [ ] More advanced cypher queries
* **Modeling:**
* [ ] Comparing different representations of time
* **Validation:**
* **Deadlines:**
## Post-Meeting Notes
We met on 13 June 2024 for approximately 45 minutes. I showed Xiaodong what I have been up to referencing my working files, poc files, and some code. I also showed what I currently have in my free instance of neo4j aura - which was MSc DS students activities and rooms. Staff and Students to be loaded but I want to ensure my anonymisation function is tested first.
We discussed what my aims are with this project, clarifying that it is not building a timetable schedule based on graph structure; it is also not a timetable optimiser. Instead it is a data engineering project that ultimately explores whether representing timetables in graph format can bring opportunities for reporting and insight which is currently difficult to produce using the current system.
We agreed to meet in two weeks' time. I will schedule a meeting for us.
## Additional Notes
**Summary from intial meeting**
I met with Xiaodong Li my supervisor on Wednesday 01 May 2024 where we introduced ourselves and our backgrounds. We discussed my proposal which Xiaodong had read in advance of our meeting.
I showed Xiaodong my work so far, most of it completed in January which included proof-of-concept data engineering steps to extract, transform and load data from a relational database to a graph structure. I showed the steps I had taken, the challenges encountered and the possible opportunities of graph data structures which I am hoping to explore in more detail.
We discussed project scope and outcomes and recognised that I need to ensure that I keep within scope.
The next few weeks will require attention on other matters (work, taught modules, etc.) but I will look to pick up project work soon.
If possible, Xiaodong was going to investigate Neo4j and graph to get a bit of context.