Transportation Geography and Network Science/Small-world network



= Introduction =

A long time ago, people's scope of activity was limited by several factors (e.g., topography, a lack of transportation options, a lack of tools, and a lack of education). The movement of people, commodities, and ideas usually relied on human and animal power. As such, mountains, rivers, and oceans were significant obstacles to mobility. In addition to these physical barriers, citizens of different countries couldn't communicate easily due to language barriers. Therefore, it took a long time to move things and ideas from origin to destination. However, technology has developed significantly since those days. We can travel longer distances easily and quickly thanks to developed transportation networks and transportation modes (e.g., automobiles, airplanes). Also, our ability to communicate with foreigners has improved due to better education and the spread of a few standardized languages. In addition, we can deliver messages nearly instantly via telecommunication and computer networks. The world is getting smaller.

We live in a networked world in which many things are connected with each other, and these connections make our way of life more efficient. Watts and Strogatz indicate that real world networks are neither regular networks nor random networks; they lie somewhere in between. They called this type of network a “small world network”.

= History =

• In 1929: Small world phenomena were first described by Frigyes Karinthy, a Hungarian writer. He discussed small world properties in his book, “Everything is different”.

• In 1967: Stanley Milgram carried out the first experimental study of small world phenomena. In this experiment, he measured the average degree of separation when a letter was delivered between two strangers.

• In 1991: John Guare wrote the play “Six Degrees of Separation”, and the term “Six Degrees of Separation” became popular around the world.

• In 1998: Duncan Watts and Steven Strogatz examined “small world networks” mathematically by analyzing an actor network, a neural network, and a power grid network. They showed that small world properties appear in natural and technological networks as well as in social networks.

= Six Degrees of Separation =

The idea of “six degrees of separation” is that any two individuals in the world are connected within 6 degrees even if they don't know each other. This implies that everyone can be linked to a celebrity (e.g., Madonna, the Dalai Lama, or the Queen) within 6 degrees via a network of friends. This idea was first described by Frigyes Karinthy in 1929, but it was considered folklore until Stanley Milgram conducted the first experimental study of it in 1967. The researchers asked residents of Wichita, Kansas, and Omaha, Nebraska, to send a letter whose destination was a stockbroker in Boston or a student in Massachusetts. When residents didn't know the target person, they had to send the letter to a friend or acquaintance instead of the target person. Although not all letters reached the target person, the average number of degrees for those that did was 5.5. This result suggested that Karinthy's idea was approximately correct. After that, the phenomenon was made widely popular by John Guare's play, “Six Degrees of Separation”, and the parlor game, “Six Degrees of Kevin Bacon”. After Milgram's experiment, several researchers tested the hypothesis of “six degrees of separation” on other types of networks. Although Milgram's experiment was limited to the U.S., more recent studies analyze worldwide networks.

In 2008, Microsoft researchers analyzed 30 billion records of Microsoft Messenger and found that the average degree of separation was 6.6. They also found that 78 percent of pairs of people are linked within 7 degrees, while some pairs are separated by as many as 29 degrees. In 2011, Facebook scientist Lars Backstrom and researchers at the University of Milan analyzed 721 million Facebook users, which is about 10% of the world population. The result revealed that the average degree of separation was 3.74, and 99.6% of pairs of users were linked within 5 degrees of separation. This result indicates that Facebook's network is a smaller world than Microsoft Messenger's network. However, the two results are not directly comparable because a link between two people means something different in each study: Microsoft's research is based on exchanged messages, whereas Facebook's research is based on the “friends” function.

Other results for the average degrees of separation [d] are shown in Table.1. Among the ten networks, E. coli Metabolism has the smallest value, 2.98. In contrast, Power Grid has the largest value, 18.99. This means that two nodes in the power grid network are, on average, about 19 steps apart. This number is much larger than the values for social networks. Based on this result, network structure differs depending on the network type. According to Barabási, the average distance (average degrees of separation) [d] can be estimated for any network if we know the total number of nodes [N] and the average out-degree (coordination number) [k]. The average distance is approximately $$ d \propto \log N / \log k $$, and this equation implies that the denser the network, the smaller the average distance. Table.1 shows the result of the equation. The estimated value [log N / log k] is similar to the average distance [d] in some cases (e.g., Internet, Mobile Phone Calls); however, the agreement is not perfect (e.g., WWW, Power Grid, and Email). For example, in the case of the Internet, the average distance [d] is 6.98 and the estimated value of log N / log k is 6.59. On the other hand, in the case of Email, the average distance is 5.88, but the estimated value of log N / log k is 18.4.
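As a rough worked illustration of this estimate, consider a network of one million nodes in which each node has ten links on average. These values of N and k are hypothetical and are not taken from Table.1.

$$ d \approx \dfrac{\log N}{\log k} = \dfrac{\log 10^{6}}{\log 10} = 6 $$

Under these assumed values the estimate is six steps. Doubling the network to $$ N = 2 \times 10^{6} $$ raises the estimate only to about 6.3, which shows how slowly the average distance grows with network size.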

Barabási mentions that the relationship between the total number of nodes [N] and the average distance [d] differs significantly depending on the spatial dimension. Image 3.10 in Barabási shows this relationship for different dimensions, and the result illustrates that the average distance in a regular network is much larger than in a random network. This is one of the reasons for the difference between the actual average distance [d] and the estimated value [log N / log k]. Also, Barabási points out that the average distance [d] depends on the size of the network. Image 3.11 in Barabási displays the distance distribution of Facebook users for worldwide users and for U.S. users, and the result shows that the average distance between users in the U.S. is smaller than between worldwide users.

= Small world: “Duncan Watts model” =

Network structure is significantly different between regular networks and random networks. Watts and Strogatz state that real world network structure is neither that of a random network nor that of a regular network. Therefore, they note that real world networks lie between the two, and they called this type of network a “small world network”. A small world network built on a ring lattice is shown in Figure.1. The network has $$ N $$ nodes and each node connects to its $$ k $$ nearest neighbors. Each link is then rewired, with probability [p], from a nearest neighbor to a randomly chosen node. In the case of a “regular network”, the probability is zero [p = 0] and all nodes connect only to their nearest neighbors. As the probability increases [0 < p < 1], some links are randomly rewired to other nodes with probability $$ p $$, and such a rewired link is called a “short cut”. When every link is randomly rewired [p = 1], the network is called a “random network”.
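The construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `ring_lattice`, `rewire`, and `small_world` are made up for this example, $$ k $$ is assumed to be even, and self-loops and duplicate links are avoided when a link is rewired.

```python
import random


def ring_lattice(n, k):
    """Regular ring lattice: each of n nodes links to its k nearest neighbors (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire(adj, k, p, rng):
    """Rewire each lattice link with probability p to a randomly chosen node."""
    n = len(adj)
    # Visit links lap by lap: first the nearest neighbors, then the second nearest, ...
    for step in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + step) % n
            if rng.random() < p:
                # Allowed targets: not node i itself and not already linked to i.
                candidates = [t for t in range(n) if t != i and t not in adj[i]]
                if candidates:
                    new_j = rng.choice(candidates)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new_j)
                    adj[new_j].add(i)
    return adj


def small_world(n, k, p, seed=0):
    """Build a ring lattice and rewire it with probability p (seeded for repeatability)."""
    rng = random.Random(seed)
    return rewire(ring_lattice(n, k), k, p, rng)
```

For example, `small_world(20, 4, 0.0)` returns the regular network, `small_world(20, 4, 1.0)` rewires every link, and intermediate values of `p` give networks with a few short cuts. Because each rewiring removes one link and adds one, the total number of links stays at $$ Nk/2 $$.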

Path length [L] and Clustering coefficient [C]
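In order to explain the structural characteristics of a network, Watts and Strogatz used two measures: path length [L] and clustering coefficient [C]. Path length [L] is the average number of links on the shortest path between two nodes. Clustering coefficient [C] is the fraction of the possible links among a node's neighbors that actually exist, averaged over all nodes. If everyone is connected to everyone, C = 1.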

In the regular network, path length [L] is $$ L \sim N / 2k $$ and clustering coefficient [C] is $$ C \sim 3 / 4 $$. On the other hand, in the random network, path length [L] is $$ L \sim \ln N / \ln k $$ and clustering coefficient [C] is $$ C \sim k / N $$. Therefore, a regular network is highly clustered but has a large path length, whereas a random network is poorly clustered but has a small path length. There are several equations for clustering coefficient [C] and path length [L]. According to Newman, the clustering coefficient [C] of the regular network can be estimated by $$ C = \dfrac{3(k-2)}{4(k-1)} $$, which approaches 3/4 as $$ k $$ becomes large. Moreover, when considering the dimension [m] of the lattice, clustering coefficient [C] can be estimated by $$ C = \dfrac{3(k-2m)}{4(k-m)} $$.

Barrat and Weigt proposed formulas for clustering coefficient [C] and path length [L] that take the probability $$ p $$ into account. Path length [L] changes significantly depending on $$ N $$ and $$ p $$.

$$ C(p) = \dfrac{3(k-2)}{4(k-1)}(1-p)^3 $$

$$ L(N, p) = \dfrac{1}{N(N-1)}\sum_{i \neq j} d_{ij} $$

Here $$ d_{ij} $$ is the chemical distance between nodes $$ i $$ and $$ j $$, that is, the minimum number of links that must be traversed to get from one to the other.
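The two measures can be computed directly from their definitions. The sketch below is an illustration only: the function names are made up for this example, the network is stored as a dictionary of neighbor sets, nodes with fewer than two neighbors are assumed to contribute zero to [C], and unreachable pairs are simply left out of the average for [L].

```python
from collections import deque


def ring_lattice(n, k):
    """Regular ring lattice: each of n nodes links to its k nearest neighbors (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def clustering_coefficient(adj):
    """Average over nodes of (links among neighbors) / (possible links among neighbors)."""
    total = 0.0
    for nbrs in adj.values():
        degree = len(nbrs)
        if degree < 2:
            continue  # assumed convention: such nodes contribute 0
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += links / (degree * (degree - 1) / 2)
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length d_ij over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives the minimum number of links to every reachable node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def barrat_weigt_clustering(k, p):
    """Clustering coefficient predicted by the formula C(p) given in the text."""
    return 3 * (k - 2) / (4 * (k - 1)) * (1 - p) ** 3


lattice = ring_lattice(20, 4)
print(clustering_coefficient(lattice), barrat_weigt_clustering(4, 0.0))
print(average_path_length(lattice))
```

For the regular ring lattice with $$ N = 20 $$ and $$ k = 4 $$, the measured clustering coefficient agrees with the formula at $$ p = 0 $$, since both give $$ 3(4-2)/[4(4-1)] = 0.5 $$.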

Regarding the connectivity distribution, each node has the same connectivity [k] in the regular network. However, as the probability $$ p $$ increases, the connectivity distribution becomes non-uniform and broader, while the average connectivity remains $$ k $$.

Small world effect
The “small world effect” means that the average distance [d] increases logarithmically with the total number of nodes [N]. This differs from a regular network, in which the average distance [d] increases linearly with the total number of nodes [N], and it is similar to a random network. However, small world networks are not the same as random networks because they exhibit clustering. Small world networks combine characteristics of both: they are highly clustered like regular networks and have small path lengths like random networks.
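The difference between linear and logarithmic growth can be made concrete with the two scaling estimates for path length given earlier, $$ L \sim N / 2k $$ for a regular network and $$ L \sim \ln N / \ln k $$ for a random network. The sketch below only evaluates these estimates; the value $$ k = 10 $$ and the network sizes are hypothetical, and the function names are made up for this example.

```python
import math


def regular_path_length(n, k):
    """Scaling estimate for a regular ring lattice: grows linearly with n."""
    return n / (2 * k)


def random_path_length(n, k):
    """Scaling estimate for a random network: grows logarithmically with n."""
    return math.log(n) / math.log(k)


# Compare the two estimates as the network grows by factors of ten.
for n in (100, 1_000, 10_000, 100_000):
    print(n, regular_path_length(n, 10), round(random_path_length(n, 10), 2))
```

Each tenfold increase in $$ N $$ multiplies the regular estimate by ten but adds only one step to the logarithmic estimate.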

Relationship between the network structural characteristics [L and C] and probability [p]
Figure.5 shows the relationship between the network structural characteristics [L and C] and probability [p]. When $$ p $$ is small, path length [L] declines rapidly while clustering coefficient [C] stays almost constant. On the other hand, when $$ p $$ is large, path length [L] is almost constant while clustering coefficient [C] declines substantially. The small world regime is the range of $$ p $$ in which $$ C $$ is much larger than in a random network while $$ L $$ is close to the value for a random network.

Watts and Strogatz analyzed three different networks in order to test the idea of the small world network: “film actors”, “power grid”, and “neural network (Caenorhabditis elegans)”. In the film actor network, vertices are actors, and an edge connects two actors who appeared in the same film. In the power grid, vertices are generators, transformers, and substations, and edges are transmission lines. In the C. elegans network, vertices are neurons and edges are connections by synapse or gap junction.

Watts and Strogatz compared the actual values of path length [L] and clustering coefficient [C] with the values for a random network (Table.2). They found that the actual path length [L] is similar to the value in a random network, but the actual clustering coefficient [C] is significantly larger than the value in a random network. Therefore, all three networks show the characteristics of small world networks. They also investigated the time required for an infectious disease to spread, and they found that disease spreads faster in a small world network because of the short cuts in the network. Moreover, the time required for an infectious disease to spread follows a curve similar to that of path length [L]. All these empirical studies suggest that real world networks resemble small world networks.

Classes of small world network
Prizmič mentioned that the connectivity distribution in real world networks follows a power law, using the WWW network as an example. This network shows the “small world effect” because the shortest path between two documents increases logarithmically with the total number of nodes [N]. This means that the shortest path doesn't change significantly even if the size of the web increases. This is one type of small world network.

Amaral et al. identified three classes of small world networks: 1) scale-free networks, 2) broad-scale networks, and 3) single-scale networks.

1)	scale-free networks: “vertex connectivity distribution decays as a power law”

2)	broad-scale networks: “connectivity distribution has a power law regime followed by a sharp cutoff”

3)	single-scale networks: “connectivity distribution with a fast decaying tail”

Examples of scale-free networks include the WWW and the citation network of scientific papers. An example of a broad-scale network is the network of movie actors. Examples of single-scale networks include the power grid, friendships at a junior high school, the neural network, and a polymer chain model.

= Application of small world network to transportation field =

Airport network
Amaral et al. analyzed the airport network in terms of the number of passengers in transit and the amount of cargo. The result shows that the airport network is a single-scale network because its connectivity distribution decays faster than that of a scale-free network. The authors state that capacity limitation is the reason the airline network is not scale-free. The aviation network has a hub-and-spoke structure, and airline companies are likely to connect to hub airports; however, the capacity for handling cargo and passengers is limited at each airport. Therefore, the connectivity distribution doesn't follow a power law.

Boston subway network
Latora and Marchiori analyzed the Boston subway network to see whether it is a small world network by using path length [L] and clustering coefficient [C]. While they could calculate the path length (L = 15.55), they couldn't calculate the clustering coefficient because some nodes connect to only one other node in the network. Therefore, they used an alternative measure, efficiency [E], to characterize small world networks. The efficiency between two nodes is the inverse of the shortest path length $$ d_{ij} $$ between them, so the longer the shortest paths, the smaller the efficiency [E]. $$ E_{glob} $$ is the “efficiency of the whole network” and $$ E_{loc} $$ is the “average efficiency of the sub graph of the neighbors of a node”. The authors mention that a network is a small world network when both $$ E_{glob} $$ and $$ E_{loc} $$ are high. The result (Table.3) illustrated that the Boston subway alone doesn't show the characteristics of a small world network because $$ E_{loc} $$ is small while $$ E_{glob} $$ is large. On the other hand, when the authors add the bus system to the subway network, both $$ E_{glob} $$ and $$ E_{loc} $$ are large. As such, the combined Boston transit network can be modeled as a small world network.

$$ E = \dfrac{1}{N(N-1)}\sum_{i \neq j} (1/d_{ij}) $$
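The efficiency formula above can be evaluated with a breadth-first search from every node. The sketch below is an illustration under stated assumptions rather than the authors' procedure: every link is assumed to have length 1 (the Boston study may have measured distances differently), pairs of nodes with no path between them contribute zero, the network is stored as a dictionary of neighbor sets, and the function names are made up for this example.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Number of links on the shortest path from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_efficiency(adj):
    """E = 1 / (N (N - 1)) * sum over i != j of 1 / d_ij."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for i in adj:
        dist = shortest_path_lengths(adj, i)
        total += sum(1.0 / d for node, d in dist.items() if node != i)
    return total / (n * (n - 1))


def local_efficiency(adj):
    """Average, over nodes, of the efficiency of the subgraph formed by each node's neighbors."""
    total = 0.0
    for nbrs in adj.values():
        # Keep only the links that run between neighbors of this node.
        subgraph = {u: adj[u] & nbrs for u in nbrs}
        total += global_efficiency(subgraph)
    return total / len(adj)


# A three-station line (0 - 1 - 2) versus a fully connected triangle.
line = {0: {1}, 1: {0, 2}, 2: {1}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(global_efficiency(line), local_efficiency(line))
print(global_efficiency(triangle), local_efficiency(triangle))
```

The three-station line has a fairly high global efficiency but zero local efficiency, because removing the middle station leaves its two neighbors disconnected. This is the same kind of pattern, high $$ E_{glob} $$ with low $$ E_{loc} $$, that the text reports for the subway network on its own.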

Autocorrelation statistics on different size of transportation network
[[File:Small world Moran's I and Getis's G.png|thumb|600px|Figure.6 Relationship between autocorrelation statistics [Moran's I and Getis's G] and probability [p] [Derivative work: Image is from Xu and Sui]]]

Xu and Sui introduced a method for analyzing the “small world effect” based on the relationship between small world properties and two network autocorrelation statistics: Moran's $$ I $$ and Getis's $$ G $$.

Moran's $$ I $$ is an autocorrelation statistic: a positive value of Moran's $$ I $$ indicates similarity to neighbors, whereas a negative value of $$ I $$ indicates dissimilarity to neighbors. In terms of Getis's $$ G $$, a high value of $$ G $$ indicates that high values are likely to be clustered together, and a low value of $$ G $$ indicates that low values are clustered together. In the regular network, Moran's $$ I $$ is high and Getis's $$ G $$ is low; however, as the probability [p] increases, Moran's $$ I $$ decreases dramatically and Getis's $$ G $$ increases gradually.

Moreover, the two curves intersect in the range of the small world network. A low value of Getis's $$ G $$ indicates that nodes with low connectivity are likely to be clustered together, while a low value of Moran's $$ I $$ indicates that global correlation is small. This matches the characteristics of a small world network; therefore, the intersection point between Moran's $$ I $$ and Getis's $$ G $$ marks the small world network.
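To make the sign of Moran's $$ I $$ concrete, the sketch below evaluates the standard definition of the statistic on two tiny networks. Several things here are assumptions for illustration and are not taken from Xu and Sui: the weight between two nodes is 1 if they are directly linked and 0 otherwise (a lag distance of one), the node values are arbitrary numbers, and the function name `morans_i` is made up. Getis's $$ G $$ is not computed.

```python
def morans_i(values, adj):
    """Moran's I with weight 1 for each directly linked pair of nodes.

    values: dict mapping node -> attribute value
    adj:    dict mapping node -> set of neighboring nodes
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {node: x - mean for node, x in values.items()}
    # Sum of products of deviations over all linked (ordered) pairs.
    cross = sum(dev[i] * dev[j] for i in adj for j in adj[i])
    weight_sum = sum(len(nbrs) for nbrs in adj.values())
    variance_sum = sum(d * d for d in dev.values())
    return (n / weight_sum) * (cross / variance_sum)


# Values rise steadily along a line of four nodes: neighbors are similar.
line = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(morans_i({0: 1, 1: 2, 2: 3, 3: 4}, line))

# Values alternate around a ring of four nodes: neighbors are dissimilar.
ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(morans_i({0: 1, 1: 0, 2: 1, 3: 0}, ring))
```

The first case gives a positive value and the second a negative value, in line with the interpretation of similarity and dissimilarity to neighbors described above.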

They applied this method to three transportation networks: national (the US interstate and highway network), metropolitan (the road network in the Houston-Galveston area), and intra-city (the Boston subway network). The result showed that Moran's $$ I $$ and Getis's $$ G $$ are significantly influenced by the lag distance. According to Xu and Sui, the lag distance is “the size of neighborhood, namely, how many neighbors are being counted in computing network autocorrelation statistics”. As the lag distance increases, Moran's $$ I $$ decreases while Getis's $$ G $$ increases. Moreover, the two curves intersect where the values of both Moran's $$ I $$ and Getis's $$ G $$ are low. They found that the value at the intersection point is similar for all three transportation networks.

Relationship between congestion factor and network type
Wu et al. investigated the relationship between the congestion factor and traffic volume in three types of network (small world, scale-free, and random). Traffic flow is assigned using the user equilibrium (UE) method, and link capacity is randomly assigned. The network size is $$ 400 \times 400 $$ and the average coordination number [k] is 7.

The result illustrates that when the traffic volume is small, scale-free and small world networks are more congested than random networks. On the other hand, when the traffic volume is large, random networks are more congested than scale-free and small world networks. This is because, in scale-free and small world networks, most of the traffic is concentrated on hub nodes at first, but congestion then grows relatively slowly as traffic volume increases. In random networks, by contrast, the number of congested links grows steadily as the traffic volume increases. Comparing scale-free and small world networks, congestion is more likely to occur in small world networks. Therefore, among the three types of network, scale-free networks can handle the largest traffic volume. Moreover, they analyzed the relationship between the congestion factor and both the rewiring probability [p] and the clustering coefficient [C], and the results show that when the probability [p] and clustering coefficient [C] are large, the congestion factor is small.

= Conclusion =

A “small world network” lies between a perfectly regular network and a random network. Watts and Strogatz explained this type of network mathematically by using two characteristics: path length [L] and clustering coefficient [C]. A small world network is highly clustered like a regular network and has a small path length like a random network. Based on their research, they illustrated that real world biological, technological, and social networks have the features of small world networks. This is an important finding because the small world network has the potential to model real world networks.

Researchers focus on this topic because it is important for understanding communication (e.g., the spread of news, rumors, and fashions). In the medical field in particular, it is necessary to understand how diseases spread (e.g., HIV, SARS). The topic has also been extended to the field of transportation research. Real world network structure is not uniform and includes several types of network, so it is difficult to model real world networks. Research is still in progress, and researchers continue to try to explain these complex networks.

= References =