Presented to the Graduate Program of Computer Science and Data Analytics of the School of Information Technology and Engineering
A full understanding of the local Azerbaijani web space is necessary to analyze information flow patterns and influences in the local network and to assess Azerbaijan's dependency on external sources in the event of cyber-attacks or national emergencies. Developing this knowledge, and making local data collection more efficient, requires a web crawler followed by a graph-based analysis. The goal of this research is to build a large graph of the Azerbaijani web and to analyze its links and most influential nodes. Specifically, the study aims to develop a catalog of local websites, create a web crawler that visits each web page and its outgoing links, construct an informative graph-based visualization, and apply a ranking algorithm to measure influence scores. A multiprocessing program written in Go is developed to crawl the database of local web pages supplied by the Ministry of Communication & Information Technologies. The program consists of a master, multiple concurrent workers, and a Postgres database. The constructed graph consists of nodes representing web pages and edges representing the links between them. A page ranking algorithm is implemented to measure the importance of nodes. The observations show that the graph is not very strongly connected and that governmental web pages are the most linked-to, owing to redirections to various services.
Keywords: web crawling, graph theory, big data, page ranking, multiprocessing
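The pipeline summarized above (a master dispatching pages to concurrent workers, a graph built from outgoing links, and a ranking step) can be illustrated with a minimal sketch. It is not the implementation developed in this work: the in-memory `fakeWeb` map stands in for real HTTP fetches and for the Postgres database, the `.example` page names are hypothetical, and `crawl`, `pageRank`, the damping factor of 0.85, and the iteration count are assumptions chosen for illustration. The ranking step is a standard iterative PageRank, which may differ from the variant used in the study.

```go
package main

import (
	"fmt"
	"sync"
)

// Edge is a directed link from one page to another.
type Edge struct{ From, To string }

// fakeWeb is a hypothetical stand-in for fetching a page and
// extracting its outgoing links; no network or database is used.
var fakeWeb = map[string][]string{
	"gov.example":      {"services.example", "news.example"},
	"news.example":     {"gov.example"},
	"shop.example":     {"gov.example", "news.example"},
	"services.example": {"gov.example"},
}

// crawl plays the master: it feeds the catalog of pages to nWorkers
// goroutines and collects the edges they report.
func crawl(catalog []string, nWorkers int) []Edge {
	jobs := make(chan string)
	results := make(chan Edge)
	var wg sync.WaitGroup
	for i := 0; i < nWorkers; i++ {
		wg.Add(1)
		go func() { // worker: one edge per outgoing link
			defer wg.Done()
			for page := range jobs {
				for _, link := range fakeWeb[page] {
					results <- Edge{page, link}
				}
			}
		}()
	}
	go func() { // dispatch jobs, then close results once workers finish
		for _, page := range catalog {
			jobs <- page
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()
	var edges []Edge
	for e := range results {
		edges = append(edges, e)
	}
	return edges
}

// pageRank runs a fixed number of power iterations over the edge list.
// Rank held by pages without outgoing links is spread over all nodes.
func pageRank(edges []Edge, damping float64, iters int) map[string]float64 {
	out := map[string][]string{}
	nodes := map[string]bool{}
	for _, e := range edges {
		out[e.From] = append(out[e.From], e.To)
		nodes[e.From], nodes[e.To] = true, true
	}
	n := float64(len(nodes))
	rank := map[string]float64{}
	for p := range nodes {
		rank[p] = 1 / n
	}
	for i := 0; i < iters; i++ {
		dangling := 0.0
		for p := range nodes {
			if len(out[p]) == 0 {
				dangling += rank[p]
			}
		}
		next := map[string]float64{}
		for p := range nodes {
			next[p] = (1-damping)/n + damping*dangling/n
		}
		for p, links := range out {
			share := damping * rank[p] / float64(len(links))
			for _, q := range links {
				next[q] += share
			}
		}
		rank = next
	}
	return rank
}

func main() {
	catalog := []string{"gov.example", "news.example", "shop.example", "services.example"}
	ranks := pageRank(crawl(catalog, 3), 0.85, 50)
	for page, r := range ranks {
		fmt.Printf("%-18s %.3f\n", page, r)
	}
}
```

In this toy graph the hypothetical governmental page receives links from every other page and therefore ends up with the highest score, which mirrors the kind of observation reported above without reproducing any of the study's data. Unbuffered channels keep the sketch short; a production crawler would also need politeness delays, deduplication, and persistent storage.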