Solving the k-dominating set problem on very large-scale networks
Computational Social Networks volume 7, Article number: 4 (2020)
Abstract
The well-known minimum dominating set problem (MDSP) aims to construct the minimum-size subset of vertices in a graph such that every other vertex has at least one neighbor in the subset. In this article, we study a general version of the problem that extends the neighborhood relationship: two vertices are called neighbors of each other if there exists a path of no more than k edges between them. The problem, called the “minimum k-dominating set problem” (MkDSP), becomes the classical dominating set problem if k is 1 and has important applications in monitoring large-scale social networks. We propose an efficient heuristic algorithm that can handle real-world instances with up to 17 million vertices and 33 million edges. This is the first time such large graphs have been solved for the minimum k-dominating set problem.
Introduction
Problem context and definition
The well-known minimum dominating set problem (MDSP) deals with determining the smallest dominating set of a given graph \(G=(V, E)\). A dominating set is a subset of the vertex set V such that each vertex in V is a member of the dominating set or is adjacent to a member of the dominating set. The MDSP has a rich range of applications: it is used in the study of social networks [1,2,3], the design of wireless sensor networks [4], protein interaction networks [5, 6], and covering codes [7].
In a recent industrial application, the authors were confronted with a more general variant of the MDSP which has so far received only limited attention in the academic literature. We take the viewpoint of a company that runs a very large social network in which users can be modeled as nodes and the relationships among users can be modeled as edges. One of the important tasks of the company is monitoring all the activities (conversations, interactions, etc.) of the network users to detect anomalies such as cheating or spreading fake news. With millions of users, it is impossible to observe all users in the network. A potential solution is to construct a subset of users that can represent key properties of the network. The classical dominating set could be a good candidate. At the scale of a social network, however, it is still too expensive to rely on a dominating set because its size could be large. Therefore, we consider a general version of the dominating set, named the k-dominating set \(D_k\), which is defined as follows: each vertex either belongs to \(D_k\) or is connected to at least one member of \(D_k\) through a path of no more than k edges. The classical minimum dominating set corresponds to the special case \(k=1\). For \(k > 1\), every 1-dominating set is also a k-dominating set, so the cardinality of a minimum k-dominating set is no greater than that of a minimum 1-dominating set: \(|D_k| \le |D_1|\). The monitoring cost of the network is therefore reduced.
It should be noted that the value of k should be selected carefully. If k is too large, the users in the resulting dominating set cannot be representatives of the original graph. But if k is too small, the monitoring cost would be very high due to the large size of the dominating set. In our application, k is in general set to 3. Figure 1 illustrates the solutions of the MkDSP (the k-dominating sets consist of the black nodes) in the cases of \(k = 1\) and \(k = 3\).
In this paper, we aim to construct the minimum k-dominating set of a graph. The problem is called the minimum k-dominating set problem (abbreviated as MkDSP). Its application in determining a good approximation of large-scale social networks can also be found in [8]. The variant which requires the vertices in the k-dominating set to be connected can be used to form the backbone of an ad hoc wireless network, as mentioned in [9, 10].
The MDSP is proved NP-complete [8]; thus the MkDSP is clearly NP-hard because it reduces to the classical MDSP when \(k = 1\). For the remainder of the paper, we introduce the following notation. If u is a vertex in the k-dominating set, and v is connected to u through a path with no more than k edges, we say that u k-dominates (or covers) v, or that v is k-dominated (or covered) by u. Where there is no ambiguity, we drop the prefix k for short. We call a vertex of the dominating set a k-dominating vertex, or dominating vertex for short. A vertex is a k-covered or k-dominated vertex if it is covered by a dominating vertex. The problem can be modeled as the following mixed-integer linear programming (MILP) model,
\[\min \sum_{v \in V} z_v \qquad (1)\]
\[\text{s.t.} \quad \sum_{v \in {\mathcal {N}}(u,k)} z_v \ge 1 \quad \forall u \in V \qquad (2)\]
\[z_v \in \{0, 1\} \quad \forall v \in V \qquad (3)\]
where \(z_v\) is the binary variable representing whether the vertex v belongs to the k-dominating set, i.e., \(z_v = 1\) if and only if \(v \in D_k\). The objective (1) is to minimize the number of vertices in \(D_k\), while constraints (2) ensure that each vertex u is covered by at least one dominating vertex. Here, \({\mathcal {N}}(u,k)\) denotes the set of vertices that can cover u, i.e., the vertices connected to u through a path with no more than k edges. The cardinality of \({\mathcal {N}}(u,k)\) plays an important role in the complexity analysis of the algorithms presented in the next sections. In general, we denote \(n_k = |{\mathcal {N}}(u,k)|\); its value can be estimated on average by \(n_k = ({\overline{d}}^{k + 1} - 1) / ({\overline{d}} - 1) \approx {\mathcal {O}}({\overline{d}}^k)\), where \({\overline{d}}\) is the average degree of vertices in the graph and is equal to \(2|E|/|V|\). The most efficient way to compute \({\mathcal {N}}(u, k)\) is the breadth-first search algorithm, which has a complexity of \({\mathcal {O}}(n_k)\).
It can be seen that both the number of binary variables and the number of constraints in the MILP model above are equal to the size of the vertex set V. This is a very large number for graphs arising in the context of social networks. Modeling and solving such a big formulation appears to be an impossible task for current MILP tools and computing capacity.
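The k-neighbor search and the size estimate above can be illustrated with a minimal sketch. It assumes the graph is stored as a plain adjacency dictionary; the function names `k_neighbors` and `estimated_ball_size` and the toy path graph are illustrative choices, not part of the original implementation.

```python
from collections import deque

def k_neighbors(adj, u, k):
    """Return N(u, k): every vertex within k edges of u, including u itself."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == k:          # do not expand beyond depth k
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)

def estimated_ball_size(avg_degree, k):
    """Geometric-series estimate n_k = (d^(k+1) - 1) / (d - 1)."""
    if avg_degree == 1:
        return k + 1
    return (avg_degree ** (k + 1) - 1) / (avg_degree - 1)

# Toy path graph 0-1-2-3-4 (hypothetical data, for illustration only).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
ball = k_neighbors(path, 0, 2)
```

The estimate grows geometrically with k, which is why repeated k-neighbor searches become the dominant cost on graphs with a large average degree.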
Literature review
The literature has attempted to deal with the MDSP. The most efficient exact method for the problem and its variants is presented in [11], where a branch-and-reduce method is developed. Although this method can provide an optimal solution, it handles only small instances defined on graphs with a few hundred vertices in acceptable running time. Several efforts have been made to design faster exact algorithms. Grandoni [12] proposes an algorithm running in \({\mathcal {O}}(1.9053^n)\) time, while Rooij and Bodlaender [11] propose an algorithm running in \({\mathcal {O}}(1.4969^n)\) time and polynomial space.
The MDSP can also be tackled by existing approaches proposed to solve its variants. The most popular variant of the problem associates a weight with each vertex of the graph and is called the minimum weight dominating set (MWDS) problem (Ugurlu et al. [13]). The objective function seeks to minimize the total weight, regardless of the cardinality of the dominating set. The best metaheuristic in terms of solution quality for the MWDS was recently introduced in [14]. It is a hybrid metaheuristic combining a tabu search with an integer programming solver. The MILP solver is used to solve subproblems in which only a part of the decision variables, selected relative to the search history, are left free. The authors also introduce an adaptive penalty to promote the exploration of infeasible solutions during the search, enhance the algorithm with perturbations and node elimination procedures, and exploit richer neighborhood classes. The performance of the method is investigated on small and medium-size instances with up to 4000 vertices. For massive graphs, Wang et al. [15] develop a local search algorithm called FastMWDS. Two new ideas are used in FastMWDS. First, a new fast construction procedure with four reduction rules is proposed. This procedure includes three parts: reducing, constructing, and shrinking. After this construction procedure, the size of massive graphs is reduced. Second, a new configuration checking with multiple values is designed, which can exploit the relevant information of the current solution.
Regarding the MkDSP, a number of variants of this problem have been proposed and studied. As most of the related problems studied in the literature arise in the context of wireless networks, the dominating set in these works is usually required to be connected. The problem can be solved in polynomial time on several restricted graph classes such as distance-hereditary graphs [16], HT-graphs [17], and graphs with bounded treewidth [18]. Hardness results and approximation algorithms are introduced in [19, 20]. Two approximation algorithms are also developed to solve the minimum 2-connected k-dominating set problem in [9]. The first one is a greedy algorithm using an ear decomposition of 2-connected graphs. The second one is a three-phase algorithm that can be used to handle disk graphs only. Rieck et al. [10] propose a distributed algorithm to provide approximate solutions. The algorithm is tested on a small graph with only several hundred vertices.
To the best of our knowledge, the only work that proposes efficient algorithms to solve the MkDSP in the context of a large social network has been recently published by Campan et al. [8]. The MkDSP is first converted to the classical minimum dominating set problem by adding edges connecting vertices that are not adjacent but whose distance does not exceed k. The MkDSP can then be solved by directly applying one of three greedy algorithms that work for the MDSP. The performance of the algorithms is tested on medium-size real social networks with up to 36,000 vertices and 200,000 edges. However, as shown in the experimental section, the method proposed in [8] cannot provide any solution for instances with millions of vertices and edges in acceptable running time.
Problem challenges and contributions
One of the challenges in solving the MkDSP is to determine the domination relation between pairs of vertices. In general, this leads to a procedure that we call k-neighbor search, which computes the set \({\mathcal {N}}(u,k)\) for a vertex u and is very expensive on massive graphs with \(k > 1\). As a consequence, approaches proposed in the literature that precompute the dominating set of every vertex are infeasible in the context of massive graphs. For example, the method proposed in [14] uses a decomposition method to tackle the MWDS, solves multiple subproblems, each corresponding to an MILP, and then uses several local search operators. To speed up the local search procedure, the set \({\mathcal {N}}(u,k)\) for every vertex u in the graph has to be precomputed. The multiple MILP programs and the precomputation of \({\mathcal {N}}(u,k)\) make the algorithm slow on very large-scale graphs. Similarly, although the algorithm proposed in [15] can handle large-scale instances in the context of social networks, it works only in the case where \(k = 1\). Applying this algorithm to our problem with \(k > 1\) is not practical because, when k increases, the algorithm gets stuck as it has to iteratively compute the set \({\mathcal {N}}(u,k)\) for every vertex u.
The MkDSP can be converted to a classical dominating set problem by inserting additional edges into the graph G that join two non-adjacent vertices if the number of edges on the shortest path between them is not greater than k. This polynomial-time conversion procedure allows any efficient algorithm proposed for the 1-dominating set problem to be used to solve the k-dominating set problem. The idea is proposed in [8]. However, inserting edges significantly increases the degree of the vertices in the graph, leading to poor computational speed when tackling large-scale social networks with millions of vertices, as shown in the experiments in the "Experimental results" section.
In this paper, we consider the MkDSP in the context of social networks. Our main contribution is an algorithm that can efficiently solve the MkDSP. The novel features of our method are (i) a preprocessing phase that reduces the graph’s size; (ii) a construction phase with different greedy algorithms; and (iii) a post-optimization phase that removes redundant vertices. In all phases, we also use techniques to reduce the number of times the k-neighbor set of a vertex must be computed, which is very expensive on graphs arising in social networks.
We have investigated the performance of our method on different sets of graphs which are classified mainly by the size of their vertex set. A graph is labeled as large if it has more than 100 thousand vertices and as small if it has fewer than 10 thousand vertices; the remaining cases are of medium size. The obtained results show the performance of our method. It outperforms the algorithm currently used by the company mentioned above in terms of solution quality. It can also handle real large-scale instances with up to 17 million vertices that the algorithm proposed in [8] could not. Finally, it is worth noting that an extended abstract of this paper is published in [21]. In the current work, we describe in more detail the main sections including the literature review, heuristic method, and experimental results. In particular, we add a section to show the hardness of the problem and carry out more experiments to analyze the performance of the methods.
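The conversion described above amounts to building the k-th power of the graph. The following sketch, written under the assumption of an adjacency-dictionary representation, shows how quickly the edge count grows even on a tiny example; `power_graph` and the toy path graph are hypothetical names and data used only for illustration.

```python
from collections import deque

def k_neighbors(adj, u, k):
    """Vertices within k edges of u (including u), by depth-limited BFS."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == k:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)

def power_graph(adj, k):
    """Adjacency of G^k: u and v are adjacent iff 1 <= dist(u, v) <= k in G."""
    return {u: k_neighbors(adj, u, k) - {u} for u in adj}

# Toy path graph 0-1-2-3-4 (hypothetical data).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
powered = power_graph(path, 2)
edges_before = sum(len(nbrs) for nbrs in path.values()) // 2
edges_after = sum(len(nbrs) for nbrs in powered.values()) // 2
```

A 1-dominating set of the powered graph is a k-dominating set of the original graph, but every vertex now stores roughly \(n_k\) neighbors instead of \({\overline{d}}\), which is the memory and time overhead the paragraph above refers to.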
Solution methods
In this section, we describe in detail an efficient algorithm for large-scale MkDSP instances. Our heuristic consists of three phases: a preprocessing phase that reduces the graph size, a construction phase that builds a k-dominating set, and a post-optimization phase that shrinks this set by removing redundant vertices.
Preprocessing phase
As mentioned above, the first phase of our algorithm reduces the size of the original graph. We extend the reduction rules in [15] to the k-dominating set by finding structures that we call k-isolated clusters. A k-isolated cluster is a connected component whose vertices are all k-dominated by a single vertex. If there exists a vertex \(v \in V\) such that \({\mathcal {N}}(v, k) = {\mathcal {N}}(v, k + 1)\), the set \({\mathcal {N}}(v, k)\) is a k-isolated cluster associated with v. We can remove the vertices belonging to this k-isolated cluster from G and add vertex v to the k-dominating set. Algorithm 1 describes our reduction rule on small and medium-size graphs. To estimate the complexity of Algorithm 1, it is easy to see that the for loop in Line 2 has \(|V|\) steps and, in each step, the \((k+1)\)-neighbor and k-neighbor sets are calculated. Therefore, the complexity of Algorithm 1 in the worst case is \({\mathcal {O}}(|V|n_{k + 1})\).
Algorithm 1 does not work on large graphs due to the expensive cost of the k-neighbor search \({\mathcal {N}}(v, k)\). As a consequence, on massive graphs with more than 100,000 vertices, we implement a modified version of Algorithm 1 that is shown in Algorithm 2. The idea is based on the observation that, if \({\mathcal {N}}(v, k) \ne {\mathcal {N}}(v, k + 1)\), it is highly likely that \({\mathcal {N}}(u, k)\) is not an isolated cluster for every \(u \in {\mathcal {N}}(v, k + 1)\). We can thus skip the isolated cluster check on \({\mathcal {N}}(u, k)\). In Algorithm 2, for each vertex v, the variable f[v] is set to False if \({\mathcal {N}}(v, k)\) has a high probability of not being an isolated cluster. If a vertex is marked False, it is not checked against the condition in Line 7, which avoids computing k-neighbor searches. The complexity of Algorithm 2 is \({\mathcal {O}}(|V|n_{k+1}/n_k)\). More precisely, the for loop in Line 5 repeats \(|V|\) times and there are \(|V|/n_k\) vertices for which we need to compute the (\(k+1\))-neighbor set, each in \({\mathcal {O}}(n_{k + 1})\).
k-dominating set construction phase
To begin this subsection, we introduce the greedy heuristic that is currently used by our partner mentioned in the first section. The idea originates from the observation that a higher-degree vertex tends to dominate more vertices. Thus, the vertices in the graph are first sorted in descending order of their degree and then each vertex in the resulting list is considered in turn. If the vertex v under consideration is uncovered, it is added to the k-dominating set \(D_k\) and all members of the k-neighbor set \({\mathcal {N}}(v, k)\) are marked as covered. This greedy heuristic is denoted as \(\text{HEU}_1\) and is shown in Algorithm 3.
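The reduction rule can be sketched as follows. This is a minimal illustration of the test \({\mathcal {N}}(v, k) = {\mathcal {N}}(v, k + 1)\), not a transcription of Algorithm 1; the adjacency-dictionary input, the function names, and the toy graph are assumptions made for the example.

```python
from collections import deque

def k_neighbors(adj, u, k):
    """Vertices within k edges of u (including u), by depth-limited BFS."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == k:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)

def reduce_isolated_clusters(adj, k):
    """Return (forced, remaining): vertices forced into D_k and the reduced graph."""
    removed, forced = set(), []
    for v in adj:
        if v in removed:
            continue
        ball_k = k_neighbors(adj, v, k)
        # Equality means nothing lies at distance k + 1, so the ball is a
        # whole connected component that v alone k-dominates.
        if ball_k == k_neighbors(adj, v, k + 1):
            forced.append(v)
            removed |= ball_k
    # Whole components were removed, so no surviving edge points to a removed vertex.
    remaining = {u: adj[u] for u in adj if u not in removed}
    return forced, remaining

# Toy graph (hypothetical): a star {0, 1, 2} and a path 3-4-5-6-7.
toy = {0: [1, 2], 1: [0], 2: [0],
       3: [4], 4: [3, 5], 5: [4, 6], 6: [5, 7], 7: [6]}
forced_k1, remaining_k1 = reduce_isolated_clusters(toy, 1)
forced_k2, remaining_k2 = reduce_isolated_clusters(toy, 2)
```

On the toy graph the star is removed for \(k = 1\), and the path is additionally removed for \(k = 2\) because its middle vertex reaches both ends within two edges.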
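The flag-based variant can be sketched in the same spirit. Since the pseudocode of Algorithm 2 is not reproduced here, the sketch below is only one plausible reading of the description: when a vertex fails the test, every vertex of its \((k+1)\)-neighbor set is flagged and skipped. The names `bfs_depths`, `reduce_flagged`, and the toy graph are hypothetical.

```python
from collections import deque

def bfs_depths(adj, source, limit):
    """Distances from source to every vertex at most `limit` edges away."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        if dist[x] == limit:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def reduce_flagged(adj, k):
    """Return (forced, removed, searches) using the skip flags f[v]."""
    flag = {v: True for v in adj}          # f[v] in the text
    removed, forced, searches = set(), [], 0
    for v in adj:
        if v in removed or not flag[v]:
            continue                        # skipped: no neighbor search at all
        dist = bfs_depths(adj, v, k + 1)
        searches += 1
        if all(d <= k for d in dist.values()):
            # No vertex at distance k + 1, i.e. N(v, k) = N(v, k + 1).
            forced.append(v)
            removed |= set(dist)
        else:
            for u in dist:                  # assume nearby vertices fail too
                flag[u] = False
    return forced, removed, searches

# Toy graph (hypothetical): a star {0, 1, 2} and a path 3-4-5-6-7.
toy = {0: [1, 2], 1: [0], 2: [0],
       3: [4], 4: [3, 5], 5: [4, 6], 6: [5, 7], 7: [6]}
forced, removed, searches = reduce_flagged(toy, 2)
```

In this sketch only three searches are run on eight vertices. The price is that the rule is heuristic: for \(k = 2\), vertex 5 dominates the whole path within two edges, but it is flagged before it is ever tested, so that cluster is not detected.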
The complexity of \(\text{HEU}_1\) is \({\mathcal {O}}(|V|\log |V| + |D_k|n_k)\). First, sorting the vertices in Lines 2–3 costs \({\mathcal {O}}(|V|\log |V|)\). In the for loop in Lines 6–13, a vertex is added to \(D_k\) exactly \(|D_k|\) times. At each addition, we need to compute \({\mathcal {N}}(v, k)\), which runs in \({\mathcal {O}}(n_k)\) (for loop in Lines 9–10). Therefore, the complexity of the for loop in Lines 6–13 is \({\mathcal {O}}(|V| + |D_k|n_k)\).
The heuristic \(\text{HEU}_1\) is fast and can handle very large-scale instances, but such a simple greedy algorithm cannot provide high-quality solutions. To search for better solutions, we now present the second greedy algorithm, called \(\text{HEU}_2\), whose pseudocode is provided in Algorithm 4. This algorithm differs from the first one in the way it treats covered vertices. In \(\text{HEU}_1\), covered vertices are never added to the dominating set, while in \(\text{HEU}_2\) they can still be added if some conditions are satisfied. In Algorithm 4, \(\mathcal {N'}(v, k)\) denotes the set of uncovered vertices in \({\mathcal {N}}(v, k)\). Line 10 in Algorithm 4 indicates that if the vertex v is uncovered or the number of uncovered vertices in \({\mathcal {N}}(v, k)\) is greater than a predefined parameter \(\theta\), vertex v is selected as a dominating vertex. In practice, the operations from Line 6 to Line 16 of \(\text{HEU}_2\) are quite time-consuming. While \(\text{HEU}_1\) has to compute the k-neighbor sets for a number of vertices equal to the size of the dominating set, the operations of Lines 6–16 in \(\text{HEU}_2\) have to compute the k-neighbor sets for every vertex in the graph. To speed up the process, we limit the running time of the operations in Lines 6–16 by the condition in Line 7 using the parameter \(t_{\text{loop}}\). Here, \(t_{6-16}\) is the running time of the for loop in Lines 6–16. If \(t_{\text{loop}}\) is set to a large value, the running time of the algorithm could be very high due to the computation of the k-neighbor sets of all vertices on Line 10. On the other hand, once the running time \(t_{6-16}\) exceeds \(t_{\text{loop}}\), \(\text{HEU}_1\) is applied to the remaining unexplored vertices. This means that if \(t_{\text{loop}}\) is set too small, \(\text{HEU}_2\) behaves almost like \(\text{HEU}_1\), possibly leading to low-quality solutions.
Therefore, the parameter \(t_{\text{loop}}\) should be neither too large nor too small. It should be neither less than \(t_{\text{min}}\) seconds nor greater than \(t_{\text{max}}\) seconds, and is computed as \(t_{\text{loop}} = \max \left( t_{\text{min}}, t_{\text{max}} \cdot |V| / N \right)\) (seconds), where N is approximately the number of vertices in the largest instances. We select the values of \(t_{\text{min}}, t_{\text{max}}\), and N mainly by experiments. In the experiments, we set \(t_{\text{min}} = 400\), \(t_{\text{max}} = 950\), and \(N = 17,000,000\). If the running time of the for loop at Line 6 exceeds \(t_{\text{loop}}\) and there are still uncovered vertices (Line 17), \(\text{HEU}_2\) applies the same strategy as \(\text{HEU}_1\) to the uncovered vertices (Lines 17–18).
The complexity of Algorithm \(\text{HEU}_2\) is \({\mathcal {O}}(|V|\log |V| + |V|n_k)\). The sorting operation in Line 1 runs in \({\mathcal {O}}(|V|\log |V|)\). The for loop in Lines 6–16 runs \(|V|\) times. Each time, if the vertex v under consideration is covered, its k-neighbor set is computed; otherwise, the uncovered subset \(\mathcal {N'}(v, k)\) of \({\mathcal {N}}(v, k)\) is computed. The computations of \({\mathcal {N}}(v, k)\) and \(\mathcal {N'}(v, k)\) have the same complexity \({\mathcal {O}}(n_k)\). Therefore, the main operation in each iteration is the construction of a k-neighbor set, with a complexity of \({\mathcal {O}}(n_k)\) on average.
Experiments show that the performance of the algorithm \(\text{HEU}_2\) heavily depends on the value of \(\theta\). An interesting fact is that \(\text{HEU}_2\) behaves similarly to \(\text{HEU}_1\) if \(\theta\) and \(t_{\text{loop}}\) are set to very large numbers. If the value of \(\theta\) is large enough, \(\text{HEU}_2\) provides the same solutions as \(\text{HEU}_1\), but it is more time-consuming (due to the computation of \(\mathcal {N'}(v, k)\) in Line 10). Therefore, to get better solutions, we execute \(\text{HEU}_2\) with several small integer values of \(\theta\) from 0 to 4 and choose the best result.
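The degree-ordered greedy rule of \(\text{HEU}_1\) can be sketched as below. This is an illustrative reading of the description rather than Algorithm 3 itself; the adjacency-dictionary input, the name `heu1`, and the toy path graph are assumptions.

```python
from collections import deque

def k_neighbors(adj, u, k):
    """Vertices within k edges of u (including u), by depth-limited BFS."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == k:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)

def heu1(adj, k):
    """Greedy k-dominating set: scan by descending degree, add uncovered vertices."""
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    covered, dominating = set(), []
    for v in order:
        if v not in covered:
            dominating.append(v)
            covered |= k_neighbors(adj, v, k)   # one k-neighbor search per pick
    return dominating

# Toy path graph 0-1-2-3-4 (hypothetical data).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
solution_k1 = heu1(path, 1)
solution_k2 = heu1(path, 2)
```

For \(k = 2\) the sketch returns two vertices on the toy path, whereas the single middle vertex would suffice, which illustrates why a plain greedy scan can leave room for improvement.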
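As a small worked example of the time-limit rule, the sketch below evaluates \(t_{\text{loop}}\) for a few graph sizes using the parameter values reported above. The function name `loop_time_limit` and the sample sizes are illustrative assumptions.

```python
def loop_time_limit(num_vertices, t_min=400.0, t_max=950.0, n_ref=17_000_000):
    """t_loop = max(t_min, t_max * |V| / N), in seconds."""
    return max(t_min, t_max * num_vertices / n_ref)

# Sample sizes chosen for illustration only.
limit_largest = loop_time_limit(17_000_000)   # scaled term equals t_max
limit_half = loop_time_limit(8_500_000)       # scaled term is 475 s
limit_small = loop_time_limit(1_000_000)      # scaled term (about 56 s) is below t_min
```

With these values, any graph with fewer than about 7.2 million vertices receives the floor of 400 s, and only the very largest instances approach the 950 s ceiling.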
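The selection rule of \(\text{HEU}_2\) and the sweep over \(\theta\) can be sketched as follows. Because Algorithm 4 is not reproduced here, the loop structure, the fallback step, and the names `heu2` and `best_heu2` are assumptions based on the description above, and the toy graph is hypothetical.

```python
import time
from collections import deque

def k_neighbors(adj, u, k):
    """Vertices within k edges of u (including u), by depth-limited BFS."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == k:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)

def heu2(adj, k, theta, t_loop=float("inf")):
    """Greedy scan that may also pick covered vertices with > theta uncovered k-neighbors."""
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    covered, dominating = set(), []
    start = time.monotonic()
    for v in order:
        # Stop when everything is covered or the time budget is spent.
        if len(covered) == len(adj) or time.monotonic() - start > t_loop:
            break
        ball = k_neighbors(adj, v, k)
        if v not in covered or len(ball - covered) > theta:
            dominating.append(v)
            covered |= ball
    # Fallback with the HEU_1 rule for anything still uncovered.
    for v in order:
        if v not in covered:
            dominating.append(v)
            covered |= k_neighbors(adj, v, k)
    return dominating

def best_heu2(adj, k, thetas=range(5)):
    """Run heu2 for theta = 0..4 and keep the smallest set found."""
    return min((heu2(adj, k, theta) for theta in thetas), key=len)

# Toy path graph 0-1-2-3-4 (hypothetical data).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

In this sketch a very large \(\theta\), or an exhausted time budget, reproduces the \(\text{HEU}_1\) behavior, which matches the observation made in the text.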
Post-optimization phase
The k-dominating set \(D_k\) obtained from algorithm \(\text{HEU}_2\) can contain redundant vertices, i.e., vertices that can be removed while the remaining vertices still k-dominate the graph. We implement a procedure named greedy redundant removal to remove such redundant vertices. The algorithm is shown in Algorithm 5.
The for loop in Lines 4–23 in Algorithm 5 considers every dominating vertex \(v \in D_k\) to check if it is redundant. The variable \({\mathcal {S}}\) takes the value TRUE if v is redundant and FALSE otherwise. If v is not redundant, there exists a vertex u in \({\mathcal {N}}(v, k)\) such that u is not covered by any vertex w in \(D_k \setminus \{v\}\). Instead of computing \({\mathcal {N}}(u, k)\) and checking whether \(w \in {\mathcal {N}}(u, k)\), which is very expensive on large-scale instances, we verify whether \({\mathcal {N}}(u,k_1)\) and \({\mathcal {N}}(w,k_2)\) are not disjoint. Here, \(k_1\) and \(k_2\) are positive integers such that \(k_1 + k_2 = k\).
The sorting operation in Line 1 runs in \({\mathcal {O}}(|D_k|\log |D_k|)\). The for loop in Lines 6–19 repeats \(|D_k|\) times. The for loop in Lines 9–14 performs \(n_k\) iterations in the worst case. Inside this loop, there is a \(k_1\)-neighbor set construction \({\mathcal {N}}(u, k_1)\) in Line 8. To verify the condition in Line 10, we sort the elements of the smaller set and perform a binary search for each element of the larger set in the smaller set. The complexity of this operation is \({\mathcal {O}}(\max \{n_{k_1}, n_{k_2}\} \log (\min \{n_{k_1}, n_{k_2}\}))\), where \(n_{k_1}\) and \(n_{k_2}\) are the cardinalities of the sets \({\mathcal {N}}(u, k_1)\) and \({\mathcal {N}}(w, k_2)\), respectively, leading to the complexity \({\mathcal {O}}(|D_k|^2n_k\max \{n_{k_1}, n_{k_2}\}\log (\min \{n_{k_1},n_{k_2}\}))\) of the whole Algorithm 5. We also note that if we do not split k into \(k_1\) and \(k_2\), the complexity of the algorithm becomes \({\mathcal {O}}(|D_k|^2n_k^2)\).
It can be observed that when the gap between \(k_1\) and \(k_2\) gets larger, the computational cost \(\max \{n_{k_1}, n_{k_2}\}\log (\min \{n_{k_1},n_{k_2}\})\) gets higher. As a result, we set \(\{k_1, k_2\} = \{\lfloor (k + 1)/2 \rfloor, k - \lfloor (k + 1)/2 \rfloor\}\), which guarantees \(|k_1 - k_2| \le 1\). Inside the for loop in Lines 6–19, a number of \(k_2\)-neighbor sets \({\mathcal {N}}(w, k_2)\) are computed while only one \(k_1\)-neighbor set \({\mathcal {N}}(u, k_1)\) must be evaluated. Therefore, it is better if \(k_1 \ge k_2\), and we assign \(k_1 = \lfloor (k + 1)/2 \rfloor\) and \(k_2 = k - k_1\). For example, in the case of \(k = 3\), we set \(k_1 = 2\) and \(k_2 = 1\). The complexity of Algorithm 5 becomes \({\mathcal {O}}(|D_k|^2n_3n_{2}\log ( n_{1})) \approx {\mathcal {O}}(|D_k|^2{\overline{d}}^5\log ({\overline{d}}))\), which is better than \({\mathcal {O}}(|D_k|^2n_3^2) \approx {\mathcal {O}}(|D_k|^2{\overline{d}}^6)\), the complexity of the algorithm if we directly verify the condition \(w \in {\mathcal {N}}(u, k)\). Here, we recall that \({\overline{d}}\) is the average degree of the vertices in the graph.
After finishing the greedy redundant vertex removal, we perform the second post-optimization step by solving MILP programs as follows. We divide the vertices in the obtained k-dominating set \(D_k\) of degree less than a given value \(d_{p}\) into several groups, each containing at most \(n_{p}\) vertices. For such a group B, let X be the set of neighbors of the vertices in B, i.e., \(X = \cup _{v \in B} {\mathcal {N}}(v, 1)\). Let S be the set of vertices that are dominated only by vertices in B and not by ones in \(D_k \setminus B\). We solve the following integer programming problem within a time limit of \(t_{p}\). Since the number of groups is about \(|D_k|/n_p\) and the running time to tackle each group is limited to \(t_p\), the total running time in the worst case is \(t_p|D_k| / (n_p \cdot n_t)\), where \(n_t\) is the number of threads used for this phase.
If the feasible solution \(B'\) is smaller than B, we replace the elements of B in \(D_k\) by \(B'\), i.e., \(D_k = (D_k \setminus B) \cup B'\). The values of \(d_{p}, n_{p}\), and \(t_{p}\) must be carefully selected so that the performance of the algorithm is assured while the running time is kept reasonable. Based on experiments, we use the setting \(n_{p} = 15,000\) and \(t_{p} = 6\) s. The algorithm is first run with \(d_{p} = 500\) and then repeated with \(d_{p} = 5000\) to search for further improvement.
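The disjointness test relies on the fact that, in an unweighted graph, two vertices are within \(k_1 + k_2\) edges of each other exactly when their \(k_1\)- and \(k_2\)-neighbor sets share a vertex. The sketch below illustrates this with Python hash sets, which is a substitution for the sort-and-binary-search comparison used in the paper; `within_k` and the toy graph are hypothetical.

```python
from collections import deque

def k_neighbors(adj, u, k):
    """Vertices within k edges of u (including u), by depth-limited BFS."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == k:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)

def within_k(adj, u, w, k1, k2):
    """True iff dist(u, w) <= k1 + k2, tested via two smaller neighbor sets."""
    ball_u = k_neighbors(adj, u, k1)
    ball_w = k_neighbors(adj, w, k2)
    return not ball_u.isdisjoint(ball_w)

# Toy path graph 0-1-2-3-4 (hypothetical data).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
near = within_k(path, 0, 3, 2, 1)   # distance 3, split as 2 + 1
far = within_k(path, 0, 4, 2, 1)    # distance 4 exceeds 3
```

Both smaller sets have size of order \({\overline{d}}^{k_1}\) and \({\overline{d}}^{k_2}\), so neither search ever explores the full k-neighbor set.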
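Putting the split of k together with the disjointness test gives the following sketch of a greedy redundant-vertex removal. It is an illustration under stated assumptions, not Algorithm 5: the processing order of the dominating vertices is taken as given, the \(k_2\)-neighbor sets are recomputed naively, and the names `split_k` and `remove_redundant` are hypothetical.

```python
from collections import deque

def k_neighbors(adj, u, k):
    """Vertices within k edges of u (including u), by depth-limited BFS."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == k:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)

def split_k(k):
    """k1 = floor((k + 1) / 2), k2 = k - k1, so that k1 >= k2 and k1 - k2 <= 1."""
    k1 = (k + 1) // 2
    return k1, k - k1

def remove_redundant(adj, dominating, k):
    """Drop each vertex whose k-neighbors are all covered by the other kept vertices."""
    k1, k2 = split_k(k)
    kept = list(dominating)
    for v in list(dominating):
        others = [w for w in kept if w != v]
        half_balls = [k_neighbors(adj, w, k2) for w in others]
        redundant = all(
            any(not k_neighbors(adj, u, k1).isdisjoint(ball) for ball in half_balls)
            for u in k_neighbors(adj, v, k)
        )
        if redundant:
            kept = others
    return kept

# Toy path graph 0-1-2-3-4 (hypothetical data).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
pruned = remove_redundant(path, [1, 2], 2)
unchanged = remove_redundant(path, [1, 4], 2)
```

A practical implementation would cache the \(k_2\)-neighbor sets instead of rebuilding them for every candidate; the sketch favors brevity over speed.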
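The grouping step and the worst-case running time bound can be made concrete with a short sketch. The chunking into consecutive groups and the interpretation of \(n_t\) as the number of groups processed in parallel are assumptions, as are the function names and the toy data; the default values of \(n_p\) and \(t_p\) are those reported above.

```python
def group_low_degree(adj, dominating, d_p, n_p):
    """Split dominating vertices of degree < d_p into groups of at most n_p vertices."""
    eligible = [v for v in dominating if len(adj[v]) < d_p]
    return [eligible[i:i + n_p] for i in range(0, len(eligible), n_p)]

def worst_case_seconds(num_dominating, n_p=15_000, t_p=6.0, n_t=4):
    """Upper bound t_p * |D_k| / (n_p * n_t) on the MILP post-optimization time."""
    return t_p * num_dominating / (n_p * n_t)

# Toy star graph with center 0 (hypothetical data).
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
groups = group_low_degree(star, [0, 1, 2, 3], d_p=2, n_p=2)
bound = worst_case_seconds(3_000_000)
```

For a hypothetical dominating set of three million vertices, the bound evaluates to 300 s under these settings, which indicates why a short per-group time limit keeps this step affordable.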
Experimental results
This section presents the results of the proposed methods on graphs of various sizes. Experiments are conducted on a computer with an Intel Core i7-8750H 2.2 GHz running Ubuntu OS. The programming language is Python, using the igraph package to perform graph computations. We use CPLEX 12.8.0 to solve MILP programs. The preprocessing and dominating set construction phases use 1 thread while the MILP solver uses 4 threads.
We test the approaches on three instance classes categorized by the size of their graphs. Small instances are taken from [14], with the number of vertices varying from 50 to 1000. This dataset contains 540 instances. To avoid long result tables, we show results for only five groups, each containing 10 instances with the same vertex and edge numbers. Six medium-size instances are from the Network Data Repository source [22], which are also used by [8] to test their algorithm. The third instance class includes six large instances: two with approximately 17 million vertices and 30 million edges extracted from the data of our partner (socpartner1 and socpartner2) and four from the Network Data Repository source. Table 1 shows the characteristics of the instances, including name, vertex size (column \(|V|\)), and edge size (column \(|E|\)). It also reports the results of the preprocessing phase, including the number of isolated clusters (NoC) and the number of vertices in isolated clusters (NoR), in three cases corresponding to three values of k: 1, 2 and 3.
As can be seen in Table 1, the number of isolated clusters and removed vertices increases when the value of k is higher. On the small graphs, these numbers are all zero except for two classes, s4 and s5, in the case \(k = 3\), where the preprocessing phase can remove 800 and 1000 vertices, respectively. Remarkably, in these cases, all the vertices are removed; hence, the algorithm obtains the optimal solution right after the preprocessing phase. On the medium-size graphs, the preprocessing procedure cannot remove any vertex. But in half of the large instances, the number of isolated clusters and removed vertices is significant.
We compare the performance of four algorithms: the MILP formulation with running time limited to 400 s, the greedy algorithm currently used by our partner \(\text{HEU}_1\), the best algorithm proposed by [8] called \(\text{HEU}_3\), and our new algorithm called \(\text{HEU}_4\), which includes all components mentioned in the last section. For each method, we report the objective value of its solutions (Sol) and the running time (T) in seconds. For the method using the MILP formulation, we also show the gaps (Gap) returned by CPLEX. Because the MILP-based method cannot efficiently handle medium and large graphs, we only present its results on small graphs. In the result tables, the numbers in italic show the best found k-dominating sets over all methods, and the marks “−” denote the instances that cannot be solved by \(\text{HEU}_3\) within a running time of several days or because of an “out of memory” status.
Table 2 shows the experimental results on the small graphs, which are average values over 10 instances. An interesting observation is that the MILP-based method can solve more instances to optimality when k increases. More precisely, it can solve all instances with \(k=3\). Therefore, for exact methods, instances with larger values of k tend to be easier. \(\text{HEU}_1\) is the worst in terms of solution quality, but it is the fastest. Comparing the solution quality of \(\text{HEU}_3\) and \(\text{HEU}_4\), \(\text{HEU}_4\) dominates \(\text{HEU}_3\) in 10 cases while \(\text{HEU}_3\) is better in only one case. \(\text{HEU}_4\) also provides better solutions than the MILP formulation on several instances that cannot be solved to optimality, i.e., when gap values are greater than zero.
Table 3 shows the experiments on the medium-size graphs. The algorithm \(\text{HEU}_4\) performs better than \(\text{HEU}_1\) and \(\text{HEU}_3\) on all instances but one in terms of solution quality. Finally, Table 4 shows the experiments for the large instances. As can be seen, although slower as expected, \(\text{HEU}_4\) still provides significantly better solutions than \(\text{HEU}_1\). The heuristic \(\text{HEU}_3\) runs into trouble on large-scale instances: it cannot give any solution in several days of computation for five out of six instances. This shows the scalability of the new algorithm \(\text{HEU}_4\) compared with \(\text{HEU}_3\). An interesting observation is that when the value of k increases, the running time of the algorithms tends to decrease. An explanation for this phenomenon is that increasing k leads to k-dominating sets of smaller cardinality. More precisely, if the cardinality of the k-dominating set \(D_k\) is smaller, the for loop in Lines 6–16 of Algorithm 4 tends to finish faster because the IF condition on Line 7 halts the loop once every vertex is covered. In the post-optimization phase, the cardinality of \(D_k\) also affects the running time of both steps. For the greedy redundant vertex removal, the number of operations of the for loops in Lines 4–23 and 9–14 of Algorithm 5 is proportional to the cardinality of \(D_k\). For the post-optimization using MILP, the number of programs to solve and their size also depend on the cardinality of \(D_k\). However, this phenomenon is not observed for \(\text{HEU}_4\) on several instances because of the execution of the post-optimization phase with CPLEX, whose running time could depend not only on the size of the dominating sets but also on other unknown characteristics of the input data.
Conclusion
In this paper, we study the k-dominating set problem in the context of very large-scale input data. The problem has important applications in social network monitoring and management. Our main contribution is a new heuristic with three components: the preprocessing phase, the greedy solution construction, and the post-optimization phase. We perform extensive experiments on graphs of vertex size varying from several thousand to tens of millions. The obtained results show that our algorithm provides a better trade-off between solution quality and computation time than existing methods. In particular, it improves the solutions of the method currently used by our industrial partner. All in all, our new algorithm is the state-of-the-art approach for solving the MkDSP on very large-scale social network graphs with millions of vertices and edges.
Availability of data and materials
The data, including two very large-scale instances from the industrial partner, are available upon request.
Abbreviations
MDSP: Minimum dominating set problem
MkDSP: Minimum k-dominating set problem
MILP: Mixed integer linear programming
MWDS: Minimum weight dominating set
References
 1.
Wang F, Du H, Camacho E, Xu K, Lee W, Shi Y, Shan S. On positive influence dominating sets in social networks. Theor Comput Sci. 2011;412(3):265–9.
 2.
Wang G, Wang H, Tao X, Zhang J. Finding weighted positive influence dominating set to make impact to negatives: a study on online social networks in the new millennium. In: Kaur H, Tao X, editors. ICTs and the millennium development goals, vol. 412. Berlin: Springer; 2014. p. 67–80.
 3.
Khomami MMD, Rezvanian A, Bagherpour N, Meybodi MR. Minimum positive influence dominating set and its application in influence maximization: a learning automata approach. Appl Intell. 2018;48:570–93.
 4.
Yua J, Wang N, Wang G, Yu D. Connected dominating sets in wireless ad hoc and sensor networks—a comprehensive survey. Comput Commun. 2013;36(2):121–34.
 5.
Wuchty S. Controllability in protein interaction networks. Proc Natl Acad Sci. 2014;111:7156–60.
 6.
Nacher JC, Akutsu T. Minimum dominating setbased methods for analyzing biological networks. Methods. 2016;102:57–63.
 7.
Östergård PRJ. Constructing covering codes by tabu search. J Comb Des. 1997;5(1):71–80.
 8.
Campan A, Truta TM, Beckerich M. Approximation algorithms for \(d\)hop dominating set problem. In: 12th international conference on data mining. 2016. p. 86–91.
 9.
Li X, Zhang Z. Two algorithms for minimum 2connected \(r\)hop dominating set. Inf Process Lett. 2010;110(22):986–91.
 10.
Michael Q, Rieck SP, Dhar S. Distributed routing algorithms for wireless ad hoc networks using \(d\)hop connected \(d\)hop dominating sets. Comput Netw. 2005;47(6):785–99.
 11.
Rooij JMMv, Bodlaender HL. Exact algorithms for dominating set. Discret Appl Math. 2011;159(17):2147–64.
 12.
Grandoni F. A note on the complexity of minimum dominating set. J Discret Algorithms. 2006;4(2):209–14.
 13.
Ugurlu O, Tanir D. A hybrid genetic algorithm for minimum weight dominating set problem. In: Zadeh L, Yager R, Shahbazova S, Reformat M, Kreinovich V, editors. Recent developments and the new direction in softcomputing foundations and applications, vol. 361., Studies in fuzziness and soft computingBerlin: Springer; 2018. p. 137–48.
 14.
Albuquerque M, Vidal T. An efficient matheuristic for the minimumweight dominating set problem. Appl Soft Comput. 2018;72:527–38.
 15.
Wang Y, Cai S, Chen J, Yin M. A fast local search algorithm for minimum weight dominating set problem on massive graphs. In: Twentyseventh international joint conference on artificial intelligence (IJCAI). 2018. p. 1514–22.
 16.
Brandstädt A, Dragan FF. A lineartime algorithm for connected \(r\)domination and steiner tree on distancehereditary graphs. Networks. 1998;31:177–82.
 17.
Dragan F. Htgraphs: centers, connected \(r\)dominated and steiner trees. Comput Sci J Moldova. 1993;1(2):64–83.
 18.
Borradaile G, Le H. Optimal dynamic program for rdomination problems over tree decompositions. In: 11th international symposium on parameterized and exact computation—IPEC 2016, Aarhus, Denmark. 2016.
 19.
Coelho RS, Moura PFS, Wakabayashi Y. The \(k\)hop connected dominating set problem: hardness and polyhedra. Electron Notes Discret Math. 2015;50:59–64.
 20.
Coelho RS, Moura PFS, Wakabayashi Y. The khop connected dominating set problem: approximation and hardness. J Comb Optim. 2017;34:1060–83.
 21.
Nguyen MH, Hà MH, Hoang DT, Nguyen DN, Dutkiewicz E, Tran T. An efficient algorithm for the kdominating set problem on very largescale networks (extended abstract). In: Tagarelli A, Tong H, editors. Computational data and social networks—8th international conference, CSoNet 2019, Ho Chi Minh City, Vietnam, November 18–20, 2019, proceedings. Lecture notes in computer science, vol. 11917. p. 74–6.
 22.
Rossi RA, Ahmed NK. The network data repository with interactive graph analytics and visualization. In: AAAI’15: proceedings of the twentyninth AAAI conference on artificial intelligence. 2015. p. 4292–3.
Acknowledgements
This work was completed during the research stay of the corresponding author at the Vietnamese Institute for Advanced Studies in Mathematics (VIASM). He wishes to thank this institution for its kind hospitality and support.
Funding
The authors gratefully acknowledge the support from the UTS-VNU Joint Technology and Innovation Research Centre (JTIRC).
Author information
Contributions
The first author provided the ideas and implemented the algorithms. The second and third authors provided the ideas for the algorithms and verified their performance. The last author provided and processed the data. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Nguyen, M.H., Hà, M.H., Nguyen, D.N. et al. Solving the k-dominating set problem on very large-scale networks. Comput Soc Netw 7, 4 (2020). https://doi.org/10.1186/s40649-020-00078-5
Keywords
 k-dominating set problem
 Social networks
 Large-scale networks
 Heuristic method