fix: move project over to veracity repo

veracity · Nov 16, 2022 · 4abba67
1 parent 02b6956

Showing 12 changed files with 1,446 additions and 1 deletion.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,131 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+/.vscode
diff --git a/README.md b/README.md
@@ -1 +1,57 @@
-# KnowledgeGraphGenerator-
+# Knowledge Graph Generator
+
+Knowledge Graph Generator is a way to create a node graph that visualizes connections between data. The program is written in Python 3.9.2 and mainly uses pandas for data management. It has a simple UI (currently with some bugs) that lets you define data, nodes, and edges, and then generate the output files. The program itself does not visualize the data; it generates the files needed by a visualization tool like [Gephi](https://gephi.org/).
+
+## Running the program
+
+You have two options for running the program: run the executable from the releases, or run it with Python.
+
+### Running with Python
+
+First, install the required libraries by running `python -m pip install -r requirements.txt`. After that, you can run the program with `python main.py`.
+
+## How it works
+
+The program is divided into 5 sections:
+* [Data](#data)
+* [Data settings](#data-settings)
+* [Node settings](#node-settings)
+* [Edge settings](#edge-settings)
+* [Generate Graph](#generate-graph)
+
+### Data
+
+This section lets you choose which files to import. You can import multiple files, but the current version only uses the first one. Also, removing a file from the data section does not yet remove the nodes and edges connected to that file.
+
+Rule of thumb: only use 1 file (as of the beta version).
+
+### Data settings
+
+This section lets you define the column types. To set the types for a file's columns, click the file in the data pane on the left. The settings menu is a little buggy, so you might need to click the data file a few times before the column settings behave properly. Only a few types are available right now: *integer*, *float*, *string*, and *boolean*. All columns are read as strings when the program launches, so change the column types before you use the data if you want to do more data processing on it later.
+
+### Node settings
+
+This section lets you define which nodes you want. Select the data file you want in the pane on the left, and the columns you can add as nodes show up at the bottom. The select node pane lists the columns that you can set as nodes. When you check a column off as a node, it appears in the node pane on the left.
+
+### Edge settings
+
+This section lets you define edges between the nodes you have added. This menu is also a little buggy, so you might need to click a node a few times before it behaves properly. The menu sits under the node pane (left) and the edge pane (right). It shows your selected node at the bottom and a box for selecting other nodes on the right. Below that, you can choose whether the edge should be directional.
+
+### Generate Graph
+
+In this section you define the output files for the program by setting the paths for the nodeFile and edgeFile. Be warned that setting a node or edge file resets that file to 0 bytes, so make sure you do not overwrite any files you want to keep. Once both paths are set, hit the *Generate Graph* button to generate the files. This may take some time depending on the size of the data and the number of nodes and edges, but it usually finishes in under 15 seconds.
+
+## Known Bugs
+
+These are the known bugs:
+* Settings menus do not act correctly until they have been clicked a few times.
+* Edges are only removed visually, but are not actually removed.
+* Nodes are only removed visually, but are not actually removed.
+
+## Plan moving forward
+
+The plan is to continue fixing and developing the frontend, hopefully switching over to [eel](https://github.com/python-eel/Eel) (a Python library for using HTML and JS as the GUI for apps), and to add more options for storing metadata in the nodes and edges.
+
+## How to contribute
+
+If you find something you want to change, please feel free to create a pull request with your changes. If you do not have the time to implement the changes yourself, you can open an issue so that we can add it to the development plan.
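Review note on the README's "Data settings" section: it says every column is read as a string and can then be switched to *integer*, *float*, *string*, or *boolean*. The conversion code is not part of the files shown here, so the following is only a minimal sketch of what such a step could look like. The names `CONVERTERS`, `parse_bool`, and `convert_column` are hypothetical, and the boolean and decimal-comma conventions are assumptions.

```python
from typing import Callable, Dict, List, Union

Value = Union[int, float, str, bool]


def parse_bool(text: str) -> bool:
    """Assumed convention: 'true', '1' and 'yes' are True (case-insensitive)."""
    return text.strip().lower() in {"true", "1", "yes"}


# The four type names come from the README; the converters are assumptions.
CONVERTERS: Dict[str, Callable[[str], Value]] = {
    "integer": int,
    "float": lambda text: float(text.replace(",", ".")),  # accept a decimal comma
    "string": str,
    "boolean": parse_bool,
}


def convert_column(values: List[str], column_type: str) -> List[Value]:
    """Convert every string in a column to the selected type."""
    convert = CONVERTERS[column_type]
    return [convert(value) for value in values]


ages = convert_column(["31", "45"], "integer")
scores = convert_column(["1,5", "2.25"], "float")
flags = convert_column(["True", "no"], "boolean")
print(ages, scores, flags)
```

A lookup table keeps the set of supported types in one place, so adding a new type would mean adding one entry rather than another branch.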
diff --git a/data/data.py b/data/data.py
@@ -0,0 +1,73 @@
+import pandas as pd
+
+class Data:
+    __path: str
+    __name: str
+    __type: str
+    __df: pd.DataFrame
+    __loaded: bool
+    __error: bool
+
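+    def __init__(self, path: str) -> None:
+        self.__path = path
+        fileName = path.split('/')[-1]
+        # Split on the last dot so "a.b.csv" gets type "csv"; a name without a dot gets type "".
+        self.__name, self.__type = fileName.rsplit('.', 1) if '.' in fileName else (fileName, '')
+        self.__loaded = False
+        self.__error = False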
+
+    @property
+    def path(self) -> str:
+        return self.__path
+
+    @property
+    def name(self) -> str:
+        return self.__name
+
+    @property
+    def type(self) -> str:
+        return self.__type
+
+    @property
+    def df(self) -> pd.DataFrame:
+        return self.__df
+
+    @property
+    def loaded(self) -> bool:
+        return self.__loaded
+
+    @property
+    def error(self) -> bool:
+        return self.__error
+
+    def __str__(self) -> str:
+        return f"path: {self.__path}, name: {self.__name}, type: {self.__type}"
+
+    def __repr__(self) -> str:
+        return self.__str__()
+
+    def __eq__(self, __o: object) -> bool:
+        if not isinstance(__o, Data): return False
+        if (self.__path != __o.path): return False
+        return True
+
+    def __hash__(self) -> int:
+        return hash(self.__path)
+
+    def loadData(self) -> None:
+        self.__error = False
+        self.__loaded = False
+        try:
+            if self.__type == "csv":
+                # Try ';'-separated first; a single column suggests ',' is the delimiter.
+                self.__df = pd.read_csv(self.__path, delimiter=';', decimal=',', dtype='string')
+                if self.__df.shape[1] == 1:
+                    self.__df = pd.read_csv(self.__path, delimiter=',', decimal='.', dtype='string')
+            elif self.__type == "xlsx":
+                self.__df = pd.read_excel(self.__path)
+            elif self.__type == "json":
+                self.__df = pd.read_json(self.__path)
+            self.__loaded = self.__type in ("csv", "xlsx", "json")
+        except Exception as e:
+            print(f"could not load {self.__path}: {e}")
+            self.__error = True
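Review note on `loadData`: the CSV branch guesses the delimiter by parsing with `;` first and re-parsing with `,` when only one column comes back. Below is a minimal standalone sketch of that fallback. It deliberately swaps pandas for the standard-library `csv` module so it runs without dependencies, and the helper name `read_rows_guess_delimiter` is hypothetical.

```python
import csv
import io
from typing import List


def read_rows_guess_delimiter(text: str) -> List[List[str]]:
    """Parse with ';' first; fall back to ',' if only one column was found."""
    rows = list(csv.reader(io.StringIO(text), delimiter=';'))
    if rows and len(rows[0]) == 1:
        # One column usually means the delimiter guess was wrong.
        rows = list(csv.reader(io.StringIO(text), delimiter=','))
    return rows


semicolon_rows = read_rows_guess_delimiter("a;b\n1;2\n")
comma_rows = read_rows_guess_delimiter("a,b\n1,2\n")
print(semicolon_rows)
print(comma_rows)
```

One limitation of this heuristic, here and in `loadData`: a file that really has a single column is parsed a second time with `,`, which only matters if its values contain commas.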
diff --git a/data/dataManager.py b/data/dataManager.py
@@ -0,0 +1,88 @@
+from typing import Set, Union
+from data.data import Data
+from data.edge import Edge
+from data.node import Node
+from data.edgeDef import EdgeDef
+from data.nodeDef import NodeDef
+
+import pandas as pd
+
+class DataManager:
+    __nodeDefs: Set[NodeDef]
+    __edgeDefs: Set[EdgeDef]
+    __nodes: Set[Node]
+    __edges: Set[Edge]
+    __data: Set[Data]
+
+    def __init__(self) -> None:
+        self.__nodeDefs = set()
+        self.__edgeDefs = set()
+        self.__nodes = set()
+        self.__edges = set()
+        self.__data = set()
+        return
+
+    @property
+    def data(self) -> Set[Data]:
+        return self.__data
+
+    @property
+    def nodeDefs(self) -> Set[NodeDef]:
+        return self.__nodeDefs
+
+    @property
+    def edgeDefs(self) -> Set[EdgeDef]:
+        return self.__edgeDefs
+
+    def addData(self, data: Data) -> None:
+        self.__data.add(data)
+        return
+
+    def findData(self, path: str, name: str, type: str) -> Union[Data, None]:
+        # Return the first matching data file, or None when nothing matches.
+        matches = (d for d in self.__data if d.name == name and d.path == path and d.type == type)
+        return next(matches, None)
+
+    def removeData(self, data: Data) -> None:
+        self.__data.remove(data)
+        return
+
+    def addNodeDef(self, d: NodeDef) -> None:
+        self.__nodeDefs.add(d)
+        return
+
+    def removeNodeDef(self, d: NodeDef) -> None:
+        self.__nodeDefs.remove(d)
+        return
+
+    def findNodeDef(self, field: str) -> Union[NodeDef, None]:
+        # Return the node definition for the given column, or None if there is none.
+        for n in self.__nodeDefs:
+            if n.field == field:
+                return n
+        return None
+
+    def addEdgeDef(self, d: EdgeDef) -> None:
+        self.__edgeDefs.add(d)
+        return
+
+    def removeEdgeDef(self, d: EdgeDef) -> None:
+        self.__edgeDefs.remove(d)
+        return
+
+    def generateData(self) -> None:
+        source = next(iter(self.__data), None)  # only one data file is used for now
+        for d in self.__nodeDefs: d.createNodes(source)
+        for d in self.__edgeDefs: d.createEdges(source)
+
+    def generateNodeFile(self, path: str) -> None:
+        for d in self.__nodeDefs:
+            self.__nodes.update(d.nodes)
+        df = pd.DataFrame.from_records([n.as_dict for n in self.__nodes])
+        df.to_csv(path, index=False, sep=';', decimal='.')
+
+    def generateEdgeFile(self, path: str) -> None:
+        for d in self.__edgeDefs:
+            self.__edges.update(d.edges)
+        df = pd.DataFrame.from_records([e.as_dict for e in self.__edges])
+        df.to_csv(path, index=False, sep=';', decimal='.')
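Review note on `generateNodeFile` / `generateEdgeFile`: both collect objects into a set (which drops duplicates), turn each one into a dict via `as_dict`, and write a `;`-separated file. The `Node` and `Edge` classes are not part of the files shown here, so this is only a sketch with a hypothetical stand-in class `DemoNode`. The `Id` and `Label` column names are assumptions, and the standard-library `csv` module is used in place of pandas so the example has no dependencies.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Set, TextIO


@dataclass(frozen=True)
class DemoNode:
    """Hypothetical stand-in for data.node.Node (not shown in this diff)."""
    id: str
    label: str

    @property
    def as_dict(self) -> Dict[str, str]:
        # Column names are an assumption for illustration.
        return {"Id": self.id, "Label": self.label}


def write_nodes(nodes: Set[DemoNode], out: TextIO) -> None:
    """Write one ';'-separated row per unique node, sorted for stable output."""
    writer = csv.DictWriter(out, fieldnames=["Id", "Label"], delimiter=';', lineterminator='\n')
    writer.writeheader()
    for node in sorted(nodes, key=lambda n: n.id):
        writer.writerow(node.as_dict)


# The duplicate "Alice" entry collapses because frozen dataclasses are hashable by value.
nodes = {DemoNode("1", "Alice"), DemoNode("1", "Alice"), DemoNode("2", "Bob")}
buffer = io.StringIO()
write_nodes(nodes, buffer)
print(buffer.getvalue())
```

De-duplication through a set only works if the element class defines `__eq__` and `__hash__` consistently, as `Data` does in `data/data.py`; sorting before writing keeps the output file stable between runs, which iterating a set directly does not guarantee.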