This repository presents a hybrid approach to the Graph Coloring Problem (GCP) that combines Reinforcement Learning (RL) with Tabucol, a tabu-search algorithm designed specifically for the GCP, and enhances the combination with Graph Neural Networks (GNNs).
Let G = (V, E) be an undirected graph with N = |V| vertices. A proper coloring assigns a color to each vertex so that no two adjacent vertices share the same color.
The chromatic number of G, written χ(G), is the minimum number of colors required for a proper coloring, and computing it is the goal of the GCP.
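Equivalently, in standard notation (stated here only for reference):

$$
\chi(G) \;=\; \min\bigl\{\, k \in \mathbb{N} \;:\; \exists\, c : V \to \{1,\dots,k\} \ \text{such that}\ c(u) \neq c(v)\ \text{for every edge}\ \{u,v\} \in E \,\bigr\}
$$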
self.action_space = spaces.Tuple((spaces.Discrete(self.N), spaces.Discrete(self.k)))  # an action is a (node, color) pair: pick a node and assign it one of k colors
The default per-step reward is -1e-4. The overall reward consists of two parts: an RL reward and a heuristic (Tabucol) reward.
The RL reward considers two main factors: the number of conflicts and the number of colors used.
The reward function based on these two factors is shown below.
# 2 factors: the number of conflicts and the number of colors used
del_conflicts = conflicts - new_conflicts              # f(s) - f(s')
del_colors_used = self.n_colors_used - n_colors_used   # C(s) - C(s')
# reward = -lambda*(f(s)-f(s')) + mu*(C(s)-C(s'))
# The color factor is excluded here (mu = 0)
_lambda = 0.01
_mu = 0
self.cur_reward = -_lambda * del_conflicts + _mu * del_colors_used
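In the snippet above, `conflicts` and `new_conflicts` correspond to f(s) and f(s'), the numbers of conflicting edges (edges whose endpoints share a color) before and after the action. A minimal sketch of such a counter, with the function name and graph representation chosen for illustration rather than taken from the repository:

```python
def count_conflicts(edges, coloring):
    """Number of conflicting edges, i.e. edges whose endpoints share a color."""
    return sum(1 for u, v in edges if coloring[u] == coloring[v])

# Example: a triangle colored with only two colors has exactly one conflict.
edges = [(0, 1), (1, 2), (0, 2)]
coloring = [0, 1, 0]
assert count_conflicts(edges, coloring) == 1
```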
The heuristic (Tabucol) reward is calculated from the difference between the number of conflicts before and after running Tabucol on the current coloring.
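As a rough sketch, reusing the `count_conflicts` helper above (here `tabucol` stands in for the C++ Tabucol routine, and the sign convention, positive when Tabucol removes conflicts, is an assumption):

```python
conflicts_before = count_conflicts(edges, coloring)
improved = tabucol(edges, coloring, k)                  # assumed interface to the tabu-search heuristic
conflicts_after = count_conflicts(edges, improved)
heuristic_reward = conflicts_before - conflicts_after   # > 0 when Tabucol reduces conflicts
```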
The final reward combines these two components as follows.
The node and color features are concatenated to form combined (node, color) pair representations. These combined features are passed through multiple GNN layers, specifically Graph Convolutional Network (GCN) layers, within our Q-Network. In these GCN layers, message passing aggregates and updates the feature representations based on the graph's connectivity, capturing both local and global structural information and yielding an updated feature graph that reflects the interactions between nodes and colors.
The GNN layers are followed by fully connected layers, and the final output layer produces a Q-value for each (node, color) pair, representing the expected future reward of selecting that action in the current state.
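A minimal sketch of such a Q-Network using PyTorch Geometric's `GCNConv`; the feature dimensions, layer count, and the way color information is encoded per node are illustrative assumptions, not the repository's actual architecture:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv


class QNetwork(nn.Module):
    def __init__(self, node_feat_dim, color_feat_dim, n_colors, hidden_dim=64):
        super().__init__()
        # GCN layers perform message passing over the graph's connectivity.
        self.gcn1 = GCNConv(node_feat_dim + color_feat_dim, hidden_dim)
        self.gcn2 = GCNConv(hidden_dim, hidden_dim)
        # Fully connected head: one Q-value per color for every node.
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_colors),
        )

    def forward(self, node_feats, color_feats, edge_index):
        # Concatenate node and color features into combined (node, color) representations.
        x = torch.cat([node_feats, color_feats], dim=-1)
        x = torch.relu(self.gcn1(x, edge_index))   # message passing / feature aggregation
        x = torch.relu(self.gcn2(x, edge_index))
        return self.head(x)                        # shape [N, n_colors]: Q-value for each (node, color) pair
```

The `[N, k]`-shaped output lines up with the `Tuple(Discrete(N), Discrete(k))` action space above: entry `(i, c)` is the estimated value of recoloring node `i` with color `c`.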
Comparison of chromatic numbers produced by our model (RLTB), a GNN method, and two other heuristic algorithms on easy instances of DIMACS graphs
RLTB consistently outperformed existing heuristic algorithms such as Tabucol and a greedy algorithm, as well as GNN-based methods, on the easier DIMACS graph instances. RLTB determined the chromatic number for all easy DIMACS graphs except the queen11_11 graph.
Overall, RLTB has demonstrated strong and consistent performance, particularly on small and medium-sized graphs. However, as the order and size of the graphs increase, its accuracy tends to decrease.
Our project is primarily developed in Python, with the Tabucol heuristic implemented in C++.
The reinforcement learning framework is built using CleanRL and extensively utilizes PyTorch.
Seok-Jin Kwon / IyLias