-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathunderstanding_results.tex
194 lines (147 loc) · 11.3 KB
/
understanding_results.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
\section{Understanding Results}
ks.py optionally produces five different outputs. They are:
\begin{enumerate}
\item Bucket Convergence History
\item Feature Kernel Sub problem Status
\item Solution File
\item Feature Kernel Cache File
\item Console Log
\end{enumerate}
\subsubsection{Bucket Convergence History}
This output logs the convergence history of the bucket iteration. It is configured using the parameter
\textbf{DEBUG}, see Table \ref{tab:params}. When this parameter is enabled will log any successful bucket iteration into a CSV file containing the following
values:
\begin{table}[h]
\centering
\caption{Debug File Parameters}
\begin{tabular}{|l|l|l|}
\hline
Value & Description & Type \\
\hline
\hline
bucket & current bucket id & Integer\\
\hline
iteration & current iteration & Integer\\
\hline
value & objective function & Variable\footnotemark\\
\hline
time & required time, in seconds & Integer \\
\hline
nodes & number of explored nodes & Integer\\
\hline
kernel\_size & number of variable in the kernel & Integer\\
\hline
bucket\_size & number of variable in the current bucket & Integer \\
\hline
\end{tabular}
\end{table}
\footnotetext{Depends on the specific instance}
This file contains various useful clues that will help to understand if the current configuration is good or not. For example is possible to view how the
objective value varies, with the buckets and the time. This should help to address some issues in the bucket structure. Also the number of explored nodes, especially
if correlated with the required time can show if the configuration is fine or not. Those data can be collected with data from other executions
to understand if the issue in finding a good solution comes from a bad configuration or if the Kernel Search itself is not the right method to solve it.
Some issues that can be found, and addressed using this data are:
\begin{itemize}
\item Bucket to small: in this case the objective value does not change by a relevant amount for a lot of iterations.
\item Bucket to large: in this case the required time for each single problem is to large and the number of explored nodes is very high
\item Mip Gap to small: long execution time and small improvements in the objective function can also mean that the Mip gap is to small. Try to increase it and let
be Kernel Search to improve this value not Gurobi
\item Too many iterations: it is possible that after a certain point the value of the objective function is stuck to a certain value, this means the ks.py reached a
local minima and increasing the number of iteration cannot lead to a better solution.
\item Bucket method is wrong: it is possible that many sub problems are infeasible (they are not shown into the list), in this case is possible that a different
bucket or kernel builder can produce better results
\end{itemize}
This are only some of the possible usage of this output data. Depending on the specific instance and values of this data many other information can be gained, especially
when considering also other outputs.
\subsubsection{Feature Kernel Sub problem Status}
This output log the final status of each random sub problem. It is configured using the parameter \textbf{LOG\_FILE}, see Table \ref{tab:params}. When this output
is enabled it will produce a CSV file made of this fields:
\begin{table}[h]
\centering
\caption{Feature Kernel Log Fields}
\begin{tabular}{|l|l|l|}
\hline
Field & Description & Type \\
\hline
\hline
Iteration & Iteration Index & Integer \\
\hline
Problem Size & Variables in current sub problem & Integer \\
\hline
Status & Status of the sub problem & Integer (see \ref{tab:fk-status}) \\
\hline
\end{tabular}
\end{table}
\begin{table}[h]
\centering
\caption{Feature Kernel Status}
\begin{tabular}{|l|l|}
\hline
Value & Description \\
\hline
\hline
& problem infeasible \\
\hline
0 & relaxation feasible, integer infeasible \\
\hline
1 & integer feasible \\
\hline
2 & time out \\
\hline
\end{tabular}
\label{tab:fk-status}
\end{table}
Table \ref{tab:fk-status} show the meaning of the various final statuses. This information combined with the size of the
sub problem can provide big hints into the problem. It is possible to understand an approximated threshold of both continuous
and integer feasibility. This value is partially influenced by the timeout values and by the randomness of the method, but if
the number of random sub problems is high enough conclusions from this method can be considered quite correct.
It is possible to define the Continuous Feasibility Threshold (CFT) as the average value of the size of sub problems feasible in the continuous but
not in the integer domain divided by the size of the whole model. Analogously it is possible to define Integer Feasibility Threshold (IFT)
as the average value of the size of sub problems feasible in the integer domain but domain divided by the size of the whole model. Those two values
can provide a good idea about the problem. For example a very high CFT and IFT means that the sub problems hardly lead to feasible solution, so only a
sub problem very close to the whole problem can be feasible. In this case probably Kernel Search is not the right method and is very unlikely that
changing some parameters can lead to improvements. On the other side if CFT and IFT are small that means that for this particular problem Kernel Search
can provide a good method to find in a quick time a good solution.
It is possible that none of the sub problems results to be continuous or integer feasible (sometimes both) in this case the relative index is simply
undefined.
You should also check for \emph{time out} statuses. This means that the given time is not enough and that should be increased. You should start by setting
\emph{min} and \emph{max} to some standard values like 10 seconds and 30 seconds then start adjusting those values accordingly to the final status.
The general rule of the thumb is to increase the \emph{max} time if many problems and with \emph{timeout}, then when the difference between \emph{max} and \emph{min}
becomes larger then the number of iteration it is possible to increase \emph{min} time. If you start going to high
probably Feature Kernel is not the right method.
The initial count should be set to a value that is not too large nor too small, like 50. Once you start getting some useful results, like a uniform enough distribution
of the various statuses, you can consider to increase this number in order to create a large pool of sub problems and create a meaningful initial kernel.
\subsubsection{Solution File}
This output generate a \href{https://en.wikipedia.org/wiki/Sol_(format)}{SOL} file, an open format used to describe the solution of mathematical programming
problem. This file is automatically used by ks.py to load a previous solution on the first iteration of successive run, if present. This file can also be
used to export and use the final solution.
\subsubsection{Feature Kernel Cache File}
This output generate a cache file for the Feature Kernel sub problems. In some cases FK is may take a very long time. This is not an issue by itself, but
when the bucket configuration happens to be wrong (this happens more often then not) recomputing the some initial kernel over and over becomes annoying and
time wasteful. To address this issue you can set the cache file and once the initial kernel is satisfying you can (without removing the cache file or touching
this parameter) set \textbf{COUNT} to zero. This will create zero sub problem and simply load the previously found and then build the initial kernel from there.
Because those sub problems are randomly generated a configuration cannot be better than the other so
the cache file accumulates any generated model, so if you run FK six times with \textbf{COUNT} set to fifty
the cache file will contain three hundred sub problems. This trick allows to create the initial kernel once, without the need to recompute it for each different
bucket configuration.
\subsubsection{Console Log}
This output is mainly composed of the Gurobi logs, and some ks.py error messages. At the current time those data can only be accessed as screen log and
the only way to store them all is by piping the stdout into a file. This must be done by the parent program.
The Gurobi log contains some of the most useful information. In fact this log contains a detailed convergence history for each Sub problem, both during
the initial kernel generation and during the bucket iterations. A number of very useful information can be obtained looking at this logs:
\begin{itemize}
\item The MIP Gap is too large: the problem reaches a good solution quickly but then it remains stuck and requires a long time to improve the solution
\item The Buckets are misconfigured: this happens when:
\begin{itemize}
\item it requires a long time to find that this sub problem is infeasible
\item it takes a long time to solve a sub problem
\item the solution slowly converges
\end{itemize}
\item Time Limit is to small: some problem cannot be solved because the time limit is too small for this specific instance
\end{itemize}
\subsubsection{Final Guidelines on Outputs}
Except for \emph{solution} and \emph{cache} file that are used to export a result, the other outputs are useful only to guide you to find a
better configuration and to understand if the instance is suitable for Kernel Search or not. In general it is a good idea to keep any output
enabled and take a look at all of them to better understand how to proceed.
My final advice is to start solving some \emph{easy} benchmark instance before solving your real problems, this will give you a bit of experience
in understanding the results from ks.py.