Do-it-yourself networks: a novel method of generating weighted networks
Network theory is finding applications in the life and social sciences for ecology, epidemiology, finance and social–ecological systems. While there are methods to generate specific types of networks, the broad literature is focused on generating unweighted networks. In this paper, we present a framework for generating weighted networks that satisfy user-defined criteria. Each criterion hierarchically defines a feature of the network and, in doing so, complements existing algorithms in the literature. We use a general example of ecological species dispersal to illustrate the method and provide open-source code for academic purposes.
1. Introduction
Network theory is gaining traction in the life and social sciences, finding applications in the fields of ecology, economics, epidemiology, finance and more recently, in analysing social–ecological systems [1–4]. In ecology, there is growing interest in the role of connectivity between spatial locations on the stability and resilience of those systems [5,6]. Numerous other applications exist for the study of infectious disease spread [7,8], optimal management of spatially distributed species [9], neural communication [10], bank failure [11] and the diffusion of ideas [12]. While there are methods to generate specific types of networks (table 1), these are primarily limited to unweighted networks [1]. In this paper, we present a general framework for generating suites of weightednetworks that meet user-defined criteria, e.g. symmetry, clustering or other metrics measuring network connectivity [3,23]. The framework makes use of a hierarchy of metrics, where each metric specifies a distinct feature that describes the structural properties of the network; the hierarchy gives a natural order in which to consider the addition of new metrics. Our method uses optimization theory to create adjacency matrices that define the relationship between different entities such as spatial locations or the interactions between species, people and institutions. Our work is most akin to the network generation algorithms of Carayol et al. [15] and Small et al. [16], who use genetic algorithms and pay-off functions to optimize the generation process. However, we specifically generate weighted networks and optimize a suite of network metrics rather than node degree. We demonstrate our method using a general example of ecological species dispersal and provide open-source code of our algorithm for academic purposes.1
Table 1.
Common network algorithms in the literature.
2. Terminology
For the purposes of illustrating the algorithm, we consider a ‘habitat graph’ network composed of spatially distinct habitats or patches that are connected by species dispersing among them. However, our method is not limited to spatial networks and can be used to generate weighted networks in other applied fields such as those of species interaction or social networks. See [1,3] for reviews of different types of networks across disciplines.
In our study, each patch is referred to as a node; each connection is referred to as an edge, which allows for the flow of species between patches. An adjacency matrix describes the connections between patches.
Let us construct an adjacency matrix that represents the connectivity between nspatial locations or patches. Patches are connected by the movement of species, people or any flow between spatial locations. For illustrative purposes, we consider the case of a single species dispersing between patches.
Let the entries of the adjacency matrix correspond to the cost of movement between patches where the dispersal of species is symmetric, i.e. species move bidirectionally at the same rate. We assume a weighted network, where the off-diagonal entries of the adjacency matrix are constrained to be real numbers greater than or equal to zero. Increasing the cost of movement between patches to infinity is equivalent to assuming that two patches are not connected.2
Mathematically we define a randomly generated adjacency matrix describing the cost of species movement between patches as:
We constrain the adjacency matrix to possess a spectral radius of r, along with a specific variance (v) and skewness (s) of the associated dominant eigenvector
The spectral radius is the largest (dominant) eigenvalue of the adjacency matrix.4 In network terms, it represents the average distance it takes to traverse across the entire landscape and is therefore a proxy of network connectivity [26]. A low (high) spectral radius indicates a network that is highly (poorly) connected. We formally define the spectral radius as:
Spectral radius alone is insufficient to guarantee a particular network structure. For a specific spectral radius there exists an infinite number of potential network configurations (figure 1). This makes it difficult to confidently generalize conclusions drawn from network process outcomes. Therefore, we further constrain the adjacency matrix (of spectral radius, r) to possess a defined variance and skewness in the associated dominant eigenvector,
In a network context, the dominant eigenvector represents a collection of node connectivity scores or ‘eigenvector centralities’ of the network [29]. Each component in the dominant eigenvector measures the relative contribution of each node to the overall connectivity of the network. Each network consists of nodes with varying degrees of contribution. Some nodes may be high contributors to connectivity while other nodes may be isolated and contribute less. We define highly (weakly) connected nodes within a given network as nodes with a contribution score greater (less) than the mean contribution; where, the mean contribution score is the average of the dominant eigenvector components.
Variance of the dominant eigenvector measures the spread in node contribution across the network and thus quantifies node heterogeneity for a given network. In statistical terms, variance is the mean squared deviation of node contribution to connectivity. A low variance indicates that each node contributes similarly to network connectivity; a high variance is indicative of a wide distribution in node contribution.
Skewness of the dominant eigenvector measures the net proportion of highly to weakly connected nodes in a network. A skewness of zero implies an even proportion of highly and weakly connected nodes. A positive (negative) skewness indicates a greater (lower) number of highly connected to weakly connected nodes. Taken together, spectral radius, variance and skewness of a network highlight unique topological properties of the desired network (figure 1).
Variance (v) and skewness (s) of
Alternatively, one may use other sets of network metrics to constrain the desired properties of the network. Many established network metrics are highly correlated but a clear description of how they hierarchically build upon each other is lacking in the literature [3,30]. By using the moments approach to summarize a network's node connectivity scores, each of our metrics highlights a distinct feature of networks and make hierarchy a non-issue [28].
3. The algorithm
In this section, we outline an algorithm to generate networks that conform to specific values for all three metrics [28]. Let r*, v* and s* serve as user-defined criteria that indicate the desired spectral radius and variance and skewness of the associated dominant eigenvector.5
We generate a random adjacency matrix, A0, to serve as initial conditions. Denote the spectral radius and variance/skewness of the associated eigenvector of A0 as r0, v0 and s0, respectively. In our algorithm, A0 and its associated network metrics are updated according to a numerical hill-climb method.
We choose the entries of A0 such that they minimize the weighted sum of squared differences between the properties of our ‘new’ adjacency matrix and our user-defined criteria.6 This results in the following minimization problem:
The values of ωk are the weights associated with the preferred degree of accuracy of the estimated entry to the predefined user criteria. The larger the weight, the greater the variable contributes to the sum of squares and the closer the estimated value must be to the desired theoretical value. The values of r0, v0and s0 are determined directly from the entries of A0 and are, therefore, functions of aij.
The minimization problem in (3.1) is constrained by (subject to) a set of equations defining the structural properties of the desired adjacency matrix. The constraints form an n2 × n2 system of equations such that:
Collectively, the above equation is referred to as the equality constraints. On the left-hand side, the matrix B refers to the boundary equality conditions matrix. The matrix B is an n2 × n2 matrix that captures the characteristics of the matrix A0. Each row corresponds to an equation describing the connectivity between patches. Each column corresponds to a variable representing an entry in A0. The matrix B is multiplied by an n2 × 1 vector of variables, one for each entry of A0. The solutions are given by an n2 × 1 vector E, often referred to as the equality conditions. Together (3.2) forms a system of at most n2 equations and n2unknowns that capture the desired structural properties of the adjacency matrix. An illustration of the construction of (3.2) for a three-patch system is found in the electronic supplementary material, A. We present the general method below.
Initially define B and E to be a zero matrix and vector, respectively. For the entries of B corresponding to the diagonal and zero off-diagonals of A0, the following must hold:
-
— A patch cannot be connected to itself. For each diagonal entry of A0 there exists a row in B such that the corresponding entry of aii in B is equal to one with all other entries in the row equal to zero. (After matrix multiplication in (3.2) this results in aii = 0).
-
— To specify that two patches are not connected in the adjacency matrix, set the entries of B corresponding to the respective off-diagonal entries of A0 equal to zero. In other words, if aij = aji = 0 in A0, then there exist two rows in B such that the corresponding entries for aij and aji in B should be one while all other entries in both rows are equal to zero.
For the entries of B corresponding to the non-zero off-diagonals of A0, the following must hold:
-
— For each pair of connected patches in A0, there exists a row in B such that the corresponding entries for aij and aji are equal to 1 and −1, respectively. All other entries in the row are 0. This implies symmetry. (After matrix multiplication we obtain aij = aji.)
The minimized values of A0 are fed into a generic hill-climbing numerical algorithm to solve for the optimal solution. The pseudo-code for the method is outlined below:
-
— Define a new adjacency matrix consisting of the minimized values of A0 as
. Let the values of the network metrics associated with be given by , and . -
— Compare the values of each network metric derived from
to r*, v* and s*. One method to do so is to calculate the absolute value of the difference between the desired and derived metrics (e.g. , , ). Another is to calculate the per cent difference between the desired and derived metrics. -
— If all three metric differences fall below a pre-determined tolerance level, the algorithm concludes.
-
— However, if the difference between any of the metrics exceeds a pre-determined tolerance level:
-
(i) Randomize one off-diagonal element of
subject to the assumptions discussed above. -
(ii) Define the perturbed
as new initial conditions for the minimization problem in (3.2). In other words, update A0, r0, v0 and s0 to be the perturbed adjacency matrix . Minimize the sum of squared differences. -
(iii) Update
.
-
The resulting output is an n × n adjacency matrix that is symmetric and possesses a spectral radius, variance and skewness of its associated eigenvector of r*, v* and s*. Figure 2 illustrates available network configurations generated using our method. Other configurations are found in electronic supplementary material, A, including network configurations of 50, 100 and 250 patches. While small compared with neural networks or the World Wide Web, these network sizes are suitable for many empirical social and ecological networks [4,31–33].
Convergence of the algorithm is dependent on several factors. First, for a given spectral radius and number of patches, there exists a lower and upper bound on the feasible values of the variance and skewness of the dominant eigenvector. These are derived in electronic supplementary material, A. Second, while minimizing the weighted sum of squared differences guarantees a concave function, the value of the desired properties of the adjacency matrix (and their associated weights) in (3.1) determine the specific shape of the function. The flatter the function to be minimized, the more difficult it is for the algorithm to converge to a solution. Finally, we cannot rule out the possibility of convergence to a local minima, in contrast with a global minima. However, this is a general problem characteristic of optimization theory [34,35], and formal analysis is left for future work.
Our algorithm was coded in Matlab 2016a. Source code is available in the electronic supplemental material and is free for academic use.7
4. Discussion
By varying initial conditions one may generate an infinite number of weightednetworks with specific structural properties. Defining multiple criteria in the generation process allows one to fine-tune the structure of the resulting network, and explicitly test the effect of network structure on system properties such as stability or resilience.
Further, our method is not limited to spectral radius and eigenvector centrality, but may be applied using any network metric deemed appropriate to describe the connectivity of the system. For example, it is possible to use clustering and other metrics that are employed to analyse social and biological networks (i.e. clustering, degree centrality and betweenness) [1,2,22]. It is feasible to use our algorithm to generate networks with combinations of distances between nodes, node contributions to connectivity and/or clustering.
Nor is our method limited to symmetric, bidirectional dispersal. By changing the system of equations in (3.2), one may allow for unidirectional or asymmetric dispersal between patches. The method can also be extended to multiple species and unweighted networks. For the former, each species has its own adjacency matrix that describes the flow of that species within the network. The latter is intrinsically more difficult. Intuitively one would constrain the entries of the adjacency matrix to zero or one, and (if desired) solve for an adjacency matrix with a particular average, variance and skewness in the number of connections per node. In this case, the minimization problem in (3.1) becomes a nonlinear binary integer problem, which is difficult to solve and is left for future work.8
Like other network generation methods (table 1), there is the possibility that the networks we generate reflect only a small subset of the entire suite of networks that meet our specified criteria. While we know of no way to explicitly test this, we can combine our method for generating weighted networks with those of unweighted networks to more fully ‘paint the picture’ of a network's structure. We may, for example, create a weighted network that has a particular node degree distribution. To do so, one would first generate an unweighted network describing the structural connections between nodes (e.g Erdõs-Réyni [17], scale-free [20], small world [22]). This would have a particular node degree distribution. The binary connected/not connected coefficients could then be fed into the equality constraints of (3.2) to create a weighted network with a specific degree distribution that also meets the desired properties of the weights.
Our method has applications across disciplines. We briefly discuss several examples. In epidemiology, disease propagation is often faster in structured population networks than random ones [20]. More specifically, the structure of the network of contacts has implications for the management of human and animal disease spread [8,36]. In ecology, the dispersal of species plays a distinct role in the persistence and survival of species. Examples include mass and rescue effects [37,38], source–sink dynamics [39,40] and spatial insurance [41,42]. At the core of each of these principles is the underlying structure describing the movement of species between spatial locations. More recent work has identified ‘keystone patches’—patches that contribute strongly to diversity or stability of the entire system [43]. Finally, the management of spatially distributed resources has been a topic in the field of economics [44]. Though much of the literature focuses on fisheries management [9,44,45], applications exist for the spatial management of other renewable resources [46] and invasive species [47].
While we focus the application of our method on spatial networks, it is by no means limited to them. Our method is suitable for generating matrices of species interactions such as competition, parasitism or mutualism. In ecology, one common measure of ecosystem stability is the dominant eigenvalue of the interaction matrix or ‘asymptotic resilience’ [48]. A negative (positive) value indicates a stable (unstable) system, with the magnitude being a loose indicator of how stable the system is. Recent advances have demonstrated that for random interaction matrices the dominant eigenvalue can be a poor indicator of stability compared to other metrics [49]. It would be useful to explicitly test under which species interaction structures the dominant eigenvalue is a good or poor indicator of stability.
Our method complements existing algorithms for generating networks (table 1) while giving researchers precision in the structure of networks they wish to construct. We hope that our algorithm will aid researchers in testing the effects of explicit network structure on model outcomes.