parsed mds by for chatbot parser script for account.md, connecting.md, compiling_your_software.md (FOR REVIEW ONLY, NO MERGE!) #664
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
# Chatbot parser | ||
|
||
`chatbot_parser.py` is a script that transforms the markdown sourcefiles into a structured directory for a chatbot to be trained on. | ||
|
||
## Generated file structure | ||
|
||
This directory structure is written as a subdirectory of `parsed_mds`. In `parsed_mds`, two subdirectories can be found: | ||
|
||
- `generic` contains the parts of the markdown sources that were non-OS-specific | ||
- `os_specific` contains the parts of the markdown sources that were OS-specific | ||
|
||
Within `os_specific`, a further distinction is made for each of the three possible operating systems included in the documentation. | ||
|
||
These subdirectories then contain a subdirectory for each individual markdown sourcefile. In the file-specific subdirectories, further divisions are made according to the titles and subtitles found in that markdown sourcefile. | ||
|
||
Finally, each of these subtitle-specific subdirectories contains a `.txt` file with the (processed) plaintext of that section and at the end a reference link to the corresponding part of the documentation website on <docs.hpc.ugent.be>. | ||
|
||
Any chance you can also make a script (or an option) to dump the whole structure in a JSON list of dicts with e.g. the following metadata:
Perhaps also the title as separate metadata instead of part of the text.
|
||
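A minimal sketch of what such a JSON dump could look like, assuming the layout described above (a `parsed_mds` root containing nested directories that end in `.txt` files). The metadata list in the comment is not shown, so the field names used here (`path`, `category`, `section`, `text`) and both function names are hypothetical, and taking the section title from the parent directory name is an assumption about how the parser names its directories.

```python
import json
from pathlib import Path


def collect_sections(parsed_root: Path) -> list[dict]:
    """Return one dict per .txt file found anywhere under parsed_root."""
    records = []
    for txt_file in sorted(parsed_root.rglob("*.txt")):
        parts = txt_file.relative_to(parsed_root).parts
        records.append({
            # Path of the section file relative to the root, with "/" separators.
            "path": "/".join(parts),
            # First directory level, e.g. "generic" or "os_specific" (hypothetical field).
            "category": parts[0] if len(parts) > 1 else "",
            # Assumption: the directory holding the file is named after the (sub)title.
            "section": txt_file.parent.name,
            # Full processed text, including whatever reference link the parser appended.
            "text": txt_file.read_text(encoding="utf-8"),
        })
    return records


def dump_sections(parsed_root: Path, output_file: Path) -> int:
    """Write all collected records to output_file as a JSON list; return the count."""
    records = collect_sections(parsed_root)
    output_file.write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return len(records)
```

Calling `dump_sections(Path("parsed_mds"), Path("parsed_mds.json"))` would write one entry per section file. Splitting the title or the reference link out of `text` is left open here, since that depends on how the parser formats each file.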
## Requirements | ||
|
||
- The required Python packages are listed in `requirements.txt` | ||
- [Pandoc](https://pandoc.org/installing.html) must be installed and must be added to the system PATH | ||
|
||
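The PATH requirement for Pandoc can be checked up front. The following is a small sketch using the standard library's `shutil.which`; the helper name `missing_tools` is hypothetical and is not part of `chatbot_parser.py`.

```python
import shutil


def missing_tools(tools: list[str]) -> list[str]:
    """Return the names in `tools` that cannot be found on the system PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

`missing_tools(["pandoc"])` returns an empty list when Pandoc is installed and on the PATH. It does not check the Python packages listed in `requirements.txt`.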
## Usage | ||
|
||
The script can be run in a shell environment with the following command: | ||
|
||
```shell | ||
python chatbot_parser.py | ||
``` |
Is there a way to force this script to run on e.g. every commit?
It would be nice to see (in the PR) how any changes to the (real) md sources are reflected in the bot sources.
And maybe replace `for a chatbot to be trained on` with `as input for a chatbot`. It's very uncertain if this will ever be used for any training/finetuning.

Sure, we can set up a GitHub Actions workflow for this I think...
Not sure we can/should do that as a part of this PR though (since the workflow would need write access to the PR branch to update files)
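
One way to sidestep the write-access concern would be for a workflow to regenerate the output into a scratch directory and only report how it differs from the committed `parsed_mds`. Below is a minimal sketch of that comparison step, assuming both trees exist on disk; the function names are hypothetical, and how the parser would be pointed at a scratch directory is not covered here.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old_root: Path, new_root: Path) -> dict[str, list[str]]:
    """Compare two directory trees and list added, removed and modified files."""
    old, new = tree_digest(old_root), tree_digest(new_root)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

A check built on this would only need read access: it could print the three lists in the job log, or fail when any of them is non-empty, so that stale bot sources show up in the PR.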