This project takes a statistical approach to a problem in mathematics: how long messages survive on a directed network.
In brief, we have a directed network, and every second, three rules are applied:
- Every message already on the network sends a copy of itself to all available neighbors of its current node,
- Some number of additional messages are injected into the network, and
- Any messages sent/injected to the same node collide, and those copies are deleted from the network.
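
The three rules above can be sketched as a single update step. This is only a minimal illustration, not the project's actual implementation, and it rests on several assumptions that the description leaves open: the network is stored as a dictionary of out-neighbors, every out-neighbor counts as "available", a message leaves its current node once it has sent its copies, and a node that receives two or more copies in the same second ends up with none. The names `step`, `graph`, `occupied`, and `injected` are hypothetical.

```python
from collections import Counter


def step(graph, occupied, injected):
    """Apply one second of the three rules (a sketch under stated assumptions).

    graph:    dict mapping each node to a list of its out-neighbors
    occupied: set of nodes that currently hold a message
    injected: iterable of nodes that receive a new message this second
    Returns the set of nodes holding a message after collisions are resolved.
    """
    arrivals = Counter()

    # Rule 1: every message sends a copy to each out-neighbor of its node.
    # Assumption: the message does not also remain at its current node.
    for node in occupied:
        for neighbor in graph.get(node, []):
            arrivals[neighbor] += 1

    # Rule 2: additional messages are injected into the network.
    for node in injected:
        arrivals[node] += 1

    # Rule 3: copies arriving at the same node collide and are all deleted,
    # so only nodes with exactly one arrival keep a message.
    return {node for node, count in arrivals.items() if count == 1}


# Usage: node 0 sends copies to nodes 1 and 2 while a new message is
# injected at node 1, so the two copies at node 1 collide.
graph = {0: [1, 2], 1: [2], 2: [0]}
occupied = step(graph, {0}, injected=[1])
```

Calling `step` repeatedly and recording when each message's last copy disappears would give the survival times that the rest of the project analyzes.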
The goal of this project is to formulate a conjecture about which structural properties of a network influence the survival time of messages on that network, and to support this conjecture with statistical evidence. I believe that this can be achieved by applying a branch of statistics known as survival analysis.
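
As a rough illustration of what survival analysis offers here, the Kaplan-Meier estimator turns a list of observed lifetimes into an estimated survival curve, even when some observations are censored. The sketch below assumes that a message is "censored" if the simulation ends while it is still alive. That framing, and the names `kaplan_meier`, `durations`, and `observed`, are illustrative assumptions rather than the project's actual design.

```python
import numpy as np


def kaplan_meier(durations, observed):
    """Estimate the survival function with the Kaplan-Meier estimator.

    durations: time at which each message died or was last seen alive
    observed:  True if the death was observed, False if censored
               (assumed here to mean the simulation ended first)
    Returns (event_times, survival), where survival[i] estimates the
    probability of surviving beyond event_times[i].
    """
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=bool)

    # Only times at which at least one death was observed change the curve.
    event_times = np.unique(durations[observed])

    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)              # still alive just before t
        deaths = np.sum((durations == t) & observed)  # observed deaths at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)

    return event_times, np.array(survival)


# Usage with made-up lifetimes in seconds. The third and fifth messages
# were still alive when observation stopped, so they are censored.
times, surv = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
```

Comparing curves like this across networks with different structural properties is one possible way to gather the statistical evidence described above.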
The intended audience of this project is twofold:
- Employers who want to see a demonstration of skills relevant to data science, and
- Mathematicians, especially my former collaborators, who may want to understand this work.
This is a challenging project, and when it is complete, it will demonstrate several skills:
- Experience with Python libraries used in data science, namely NumPy and scikit-learn,
- The ability to handle network data,
- Study design and the generation of synthetic data,
- Survival analysis,
- Research skills, and
- Persistence, in that I have continually come back to this project despite getting stuck many times.