- AUTHORS
* Anna Delgado Tejedor (s162178)
* Balbina Virgili Rocosa (s181866)
* Valentijn Floris Broeken (s172092)
- IMPLEMENTATION
All of our code has been written in Python 3, except for the scripts that obtain the BLASTp results used to verify the reliability of the implemented method, which were written in Bash.
- PROJECT STRUCTURE
.
│
├── Report.pdf
├── README.txt
│
└── Code
    ├── benchmarking.py
    ├── benchmarking_permutations.py
    ├── lsh.py
    ├── main.py
    ├── ProteinsManager.py
    ├── ResultsDB.py
    ├── UniprotDB.py
    ├── uniProtein.py
    │
    ├── Uniprot_DB.sqlite
    ├── Results_DB.sqlite
    │
    ├── lsh.pickle
    ├── minhashes.pickle
    │
    ├── Ecolx.xml
    ├── PseA7.xml
    ├── uniprot_sprot.xml
    │
    ├── all_results_nofilter.txt
    └── query_match_identity_alignmentLength.txt
To run the main program, execute the following command from the Code directory:
$ python3 main.py
This starts our interactive command-line program. Its main functionalities are the following:
1-. Load Database or L
2-. Delete Database or D
3-. Calculate LSH or C
4-. Recalculate LSH or RC
5-. Query LSH or Q
6-. Query All LSH or A
7-. Read BLAST or B
8-. Compare Results or R
9-. Save LSH or S
10-. Load LSH or LL
11-. Exit or X
Most of these options are self-explanatory; for more details on each functionality, see our report (Report.pdf).
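As a rough guide to what Calculate LSH, Query LSH, Save LSH and Load LSH involve, the following is a minimal sketch of MinHash-based locality-sensitive hashing over protein sequences. It is not taken from lsh.py: it assumes proteins are compared as sets of overlapping amino-acid k-mers, and all names (shingles, MinHashLSH, save_index, load_index, etc.) and parameters (k = 3, 100 hash functions, 20 bands) are hypothetical. Saving with Python's pickle module is only assumed from the lsh.pickle and minhashes.pickle file names.

```python
import pickle
import random
import zlib
from collections import defaultdict

# Hypothetical parameters; lsh.py may use different values.
K = 3                  # k-mer (shingle) length
NUM_HASHES = 100       # MinHash signature length
BANDS = 20             # LSH bands; rows per band = NUM_HASHES // BANDS
PRIME = 2_147_483_647  # Mersenne prime 2**31 - 1 for the hash family


def shingles(sequence, k=K):
    """Set of overlapping k-mers; a sequence shorter than k is one shingle."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)} or {sequence}


def minhash_signature(kmers, coeffs):
    """For each hash h(x) = (a*x + b) mod PRIME, the minimum over all k-mers."""
    # zlib.crc32 gives the same value in every run, unlike hash() on str.
    ids = [zlib.crc32(kmer.encode()) for kmer in kmers]
    return [min((a * x + b) % PRIME for x in ids) for a, b in coeffs]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of equal MinHash values estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


class MinHashLSH:
    def __init__(self, num_hashes=NUM_HASHES, bands=BANDS, seed=0):
        if num_hashes % bands:
            raise ValueError("num_hashes must be divisible by bands")
        rng = random.Random(seed)
        self.coeffs = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
                       for _ in range(num_hashes)]
        self.bands = bands
        self.rows = num_hashes // bands
        self.signatures = {}             # accession -> signature
        self.buckets = defaultdict(set)  # (band, band values) -> accessions

    def _band_keys(self, signature):
        for band in range(self.bands):
            start = band * self.rows
            yield band, tuple(signature[start:start + self.rows])

    def insert_signature(self, accession, signature):
        self.signatures[accession] = signature
        for key in self._band_keys(signature):
            self.buckets[key].add(accession)

    def add(self, accession, sequence):
        signature = minhash_signature(shingles(sequence), self.coeffs)
        self.insert_signature(accession, signature)

    def query(self, accession):
        """Proteins sharing at least one band, best estimated match first."""
        signature = self.signatures[accession]
        candidates = set()
        for key in self._band_keys(signature):
            candidates |= self.buckets.get(key, set())
        candidates.discard(accession)
        scored = [(c, estimated_jaccard(signature, self.signatures[c]))
                  for c in candidates]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)


def to_state(index):
    """Only built-in containers, so unpickling needs no class lookup."""
    return {"coeffs": index.coeffs, "bands": index.bands,
            "signatures": index.signatures}


def from_state(state):
    index = MinHashLSH(num_hashes=len(state["coeffs"]), bands=state["bands"])
    index.coeffs = state["coeffs"]
    for accession, signature in state["signatures"].items():
        index.insert_signature(accession, signature)
    return index


def save_index(index, path):
    with open(path, "wb") as fh:
        pickle.dump(to_state(index), fh)


def load_index(path):
    with open(path, "rb") as fh:
        return from_state(pickle.load(fh))


if __name__ == "__main__":
    index = MinHashLSH()
    index.add("P1", "MKTAYIAKQRQISFVKSHFSRQ")  # made-up sequences
    index.add("P2", "MKTAYIAKQRQISFVKSHFSRE")  # P1 with its last residue changed
    index.add("P3", "GGGGWWWWCCCCHHHHPPPP")    # unrelated to P1
    print(index.query("P1"))
```

Two proteins become candidates only if every row of at least one band agrees, so sequences with a large k-mer overlap (P1 and P2) are found, while unrelated ones (P3) are never compared in detail. The bucket table is rebuilt from the signatures on loading, so the pickled data contains only plain lists and dictionaries.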
The most useful happy paths (sequences of commands) are the following:
A: Without precomputed results
> Delete Database
> Load Database
> Ecolx.xml
> Load Database
> PseA7.xml
> Calculate LSH
> Save LSH
> 0
> Query LSH
> Q9EXN6
> Read BLAST
> all_results_nofilter.txt
> Compare Results
> Exit
B: With precomputed database
> Calculate LSH
> Save LSH
> 0
> Query LSH
> Q9EXN6
> Compare Results
> Exit
C: With precomputed results [RECOMMENDED FOR TESTING]
> Load LSH
> 0
> Query LSH
> Q9EXN6
> Compare Results
> Exit
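All three paths end with Compare Results, which checks the LSH candidates against the BLASTp results. Since the exact metrics and file layout are described in the report rather than here, the following is only a sketch under assumptions: it treats each row of the BLAST results file as whitespace-separated with the query and matched IDs in the first two columns (the layout of BLAST's tabular -outfmt 6 output; all_results_nofilter.txt may differ), and reports precision and recall. The functions normalize_id, read_blast_hits and compare are hypothetical, and the accessions HYP1 to HYP3 are made up.

```python
import io
from collections import defaultdict


def normalize_id(identifier):
    """'sp|Q9EXN6|NAME_ECOLI' -> 'Q9EXN6'; plain accessions are unchanged."""
    parts = identifier.split("|")
    return parts[1] if len(parts) >= 3 else identifier


def read_blast_hits(lines):
    """Map each query accession to the set of accessions BLAST matched.

    Assumes whitespace-separated rows whose first two columns are the
    query and subject IDs; blank lines and '#' comments are skipped.
    """
    hits = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        query, subject = normalize_id(fields[0]), normalize_id(fields[1])
        if query != subject:  # ignore self-hits
            hits[query].add(subject)
    return hits


def compare(lsh_candidates, blast_hits):
    """Precision and recall of the LSH candidates against the BLAST hits."""
    predicted, expected = set(lsh_candidates), set(blast_hits)
    true_positives = len(predicted & expected)
    return {
        "true_positives": true_positives,
        "precision": true_positives / len(predicted) if predicted else 0.0,
        "recall": true_positives / len(expected) if expected else 0.0,
    }


if __name__ == "__main__":
    # Made-up rows: query, subject, % identity, alignment length.
    sample = io.StringIO(
        "Q9EXN6\tsp|HYP1|HYP1_ECOLI\t85.2\t120\n"
        "Q9EXN6\tHYP2\t40.0\t300\n"
        "Q9EXN6\tQ9EXN6\t100.0\t350\n"
    )
    hits = read_blast_hits(sample)
    print(compare(["HYP1", "HYP3"], hits["Q9EXN6"]))
    # → {'true_positives': 1, 'precision': 0.5, 'recall': 0.5}
```

Precision tells how many of the LSH candidates BLAST also considers related, and recall how many of the BLAST hits the LSH step managed to retrieve.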
To run the benchmarking scripts, execute the following commands from the Code directory:
$ python3 benchmarking.py
$ python3 benchmarking_permutations.py
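Which parameters the benchmarking scripts vary is explained in the report. If, as is usual for banded MinHash LSH, the permutations cover the number of bands b and the rows per band r (b·r hash functions in total), the standard formula P(s) = 1 - (1 - s^r)^b gives the probability that two proteins with Jaccard similarity s share at least one band and therefore become candidates; the similarity at which this S-curve rises most steeply is approximately (1/b)^(1/r). The sketch below tabulates both for a hypothetical grid of (b, r) values; the function names are illustrative only.

```python
from itertools import product


def candidate_probability(s, bands, rows):
    """P(two sets with Jaccard similarity s share at least one LSH band)."""
    return 1.0 - (1.0 - s ** rows) ** bands


def approx_threshold(bands, rows):
    """Similarity where the S-curve rises most steeply, about (1/b)^(1/r)."""
    return (1.0 / bands) ** (1.0 / rows)


if __name__ == "__main__":
    similarities = [0.2, 0.4, 0.6, 0.8]
    header = " ".join(f"P(s={s})" for s in similarities)
    print(f"bands rows threshold {header}")
    # Hypothetical grid; each (bands, rows) pair needs bands * rows hashes.
    for bands, rows in product([10, 20, 50], [2, 5, 10]):
        probs = " ".join(f"{candidate_probability(s, bands, rows):8.3f}"
                         for s in similarities)
        print(f"{bands:5d} {rows:4d} {approx_threshold(bands, rows):9.3f} {probs}")
```

More rows per band raise the threshold and reduce false candidates, while more bands lower it and reduce missed pairs, which is the trade-off a parameter benchmark would expose.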
** WARNING **
We have NOT included the complete Swiss-Prot XML file with all the proteins, as it is larger than 6 GB. It can be downloaded from:
https://www.uniprot.org/uniprot/
To make things easier, we have included the pre-loaded databases containing the information of all the proteins used in our experiments: Uniprot_DB.sqlite and Results_DB.sqlite.
We have also saved the already calculated LSH results for all the proteins stored in these databases.
Therefore, you can directly run path C to quickly see the results we obtained.
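Because the full Swiss-Prot XML file is larger than 6 GB, loading it with Load Database is only practical with a streaming parser. The following is a minimal sketch, not the code of UniprotDB.py, that reads UniProt XML entry by entry with the standard library's xml.etree.ElementTree.iterparse and stores each primary accession and sequence with sqlite3. The function load_uniprot_xml and the proteins table are hypothetical; the schema of Uniprot_DB.sqlite may differ.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

NS = "{http://uniprot.org/uniprot}"  # default namespace of UniProt XML


def load_uniprot_xml(source, conn):
    """Stream <entry> elements from a UniProt XML file into SQLite.

    `source` is a file name or a binary file object. Returns the number
    of entries stored in the hypothetical `proteins` table.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS proteins ("
                 "accession TEXT PRIMARY KEY, sequence TEXT NOT NULL)")
    stored = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != NS + "entry":
            continue
        # findtext only looks at direct children of <entry>;
        # the first <accession> is the primary one.
        accession = elem.findtext(NS + "accession")
        sequence = "".join((elem.findtext(NS + "sequence") or "").split())
        if accession and sequence:
            conn.execute("INSERT OR REPLACE INTO proteins VALUES (?, ?)",
                         (accession, sequence))
            stored += 1
        elem.clear()  # drop the finished entry's children and text
    conn.commit()
    return stored


if __name__ == "__main__":
    # A made-up, minimal entry in the UniProt XML layout.
    sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot">
    <accession>EXAMPLE1</accession>
    <sequence length="10">MKTAYIAKQR</sequence>
  </entry>
</uniprot>"""
    with sqlite3.connect(":memory:") as conn:
        print(load_uniprot_xml(io.BytesIO(sample), conn))  # → 1
```

With the downloaded file, `load_uniprot_xml("uniprot_sprot.xml", conn)` would process it one entry at a time instead of building the whole document tree in memory.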