Transportation Geography and Network Science/Social networks

Social networks are networks in which the vertices are people, and the edges represent some form of social interaction between them, such as friendship. Drawing from graph theory, networks are composed of vertices ("actors") and edges ("ties").

Actors may be individuals or groups, and different types of actors may be included in the same network. When there are two types of actors and ties connect only actors of different types, the result is a bipartite graph. An example would be a network of individuals and the social organizations they belong to. In this case, no ties exist between nodes of a common type (individual to individual, or organization to organization).

Ties represent relationships. Relationships could be friendships, but they could also be professional or business relationships, communication relationships, economic relationships, and so on. The types of relationships that appear will depend upon what aspects of the network the researcher is trying to capture. Ties may also incorporate magnitude or character, and plotting multiple different ties between actors results in multiplex relationships. Multiplex relationships may exist between neighbors who go to dinner parties, do lawncare, or babysit for/with one another. These relationships have different characters but exist between the same actors.

Social Network Models
Social networks are an integral part of a functioning society and economy. A social network can be thought of as a group of individuals together with the connections between those individuals, and such networks spread a wide variety of information and services. For example, an individual searching for a job will search through formal postings, but will also likely let his network of friends and associates know that he is searching, and they in turn may tell him about positions they are aware of. Similarly, an employer searching for a candidate for an open position may ask current employees whether they know of anyone who could be a good fit. Invitations to social gatherings and professional events also often spread by word of mouth, starting from a formal notice sent to a select group of individuals. Because of the extent and impact of social networks, several attempts have been made to model social networks and their structure. The following is a brief overview of the types of problems typically considered in the context of social networks, as well as the theory used to build social network models.

Common Queries
The majority of studies involving social networks can be categorized into one of four potential goals or queries. The first involves the ability to detect communities or groups within larger social networks. The second is the determination of the most influential individuals in a social network for marketing purposes. The third involves the use of incentives in social networks to improve the likelihood of receiving an answer to a posed question. The fourth regards the formation and development of social networks.

Community Detection
Communities are subgroups of larger social networks in which individuals are generally densely connected. An example would be the drama club in a high school. Members of the drama club would likely spend a considerable amount of time with one another, so any individual in the drama club could be expected to be directly connected to most, if not all, of the other members. A second community in the high school might be the football team. Again, members of the football team would likely be directly connected to most or all of the other members of the team. However, one would expect very few direct connections between the drama club and the football team. This illustrates the other distinguishing feature of communities: there tend to be relatively few connections between communities compared with the connections within them. According to Narahari and Narayanam, detecting communities or subgroups in a social network can aid in understanding the larger network.

Discovering Influential Individuals
Imagine you have a new product, and you would like to penetrate as much of the market as you can. To do so, you must first make people aware of your product. One method is to use the so-called 'word of mouth' of existing social networks: you give information, or perhaps free samples, to a few individuals, who then recommend the product to their connections. Those connections may or may not accept the recommendation. If they do, they pass it on in turn, and the recommendations continue through the network. For each set of initial individuals, some subset of the network will end up using the product. If you wish to maximize your profit, you need to find the most influential set of initial individuals to give your free samples to, namely the set that results in the largest subset of the social network using your product. Kempe, Kleinberg, and Tardos suggest using a greedy heuristic together with general threshold and cascade models of diffusion to estimate the k most influential individuals in a social network.
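The greedy idea can be sketched in Python. This is a minimal illustration, not Kempe, Kleinberg, and Tardos's own code. It assumes the independent cascade variant, in which each newly activated individual gets one chance to activate each contact with a fixed probability `p`, and it estimates the expected spread by Monte Carlo simulation. The graph format (a dict mapping each person to the people they can influence) and all function names are hypothetical.

```python
import random


def simulate_cascade(graph, seeds, p, rng):
    """One run of an independent cascade; returns the number of activated people."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            # Each newly activated u gets a single chance to activate each contact.
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def expected_spread(graph, seeds, p, runs, rng):
    """Monte Carlo estimate of the expected number of people eventually reached."""
    return sum(simulate_cascade(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_seeds(graph, k, p=0.1, runs=200, seed=0):
    """Add, k times, the person whose inclusion gives the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for contacts in graph.values() for v in contacts}
    chosen = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for v in sorted(nodes - set(chosen)):
            spread = expected_spread(graph, chosen + [v], p, runs, rng)
            if spread > best_spread:
                best, best_spread = v, spread
        chosen.append(best)
    return chosen
```

For example, with `p=1.0` on the graph `{0: [1, 2, 3, 4], 5: [6]}` every cascade is deterministic. In that case `greedy_seeds(graph, 2, p=1.0, runs=1)` picks node 0 (which reaches five people) and then node 5 (which adds two more). Each greedy step re-estimates the spread for every remaining candidate, so the runtime grows with both the network size and `runs`.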

Query Incentive Networks
Kleinberg and Raghavan initially discussed this problem in the context of internet protocols. The idea concerns decentralized requests for either information goods or services. Decentralized means that the originator of the request does not pose the question to a central index such as Google or a library catalog. Instead, the request is forwarded to a group of peers, with the expectation that they will forward it through their own social networks and report back an answer when one is found. However, forwarding a request and reporting back an answer are assumed to cost some effort. To overcome this, the request must carry a reward for a returned answer or service. In addition, each individual in the answer chain must receive a portion of the reward for their effort, or they are unlikely to make it. Kleinberg and Raghavan suggest that this process occurs in three steps: the propagation of the request, the indication to the requester that answers have been found, and the selection of an answerer and the reporting of the answer back to the requester. It is argued that most individuals associate minimal effort with the first two steps, and so will likely require compensation only if their effort is needed for the third. They suggest that at each level of propagation the reward is reduced by the propagator's desired cut in the event that an answer is found. As a result, at some point the request stops propagating, because the reward remaining for 'child' nodes has fallen to zero. So, if a reward or incentive is not large enough, an answer may never be reached. It is therefore desirable to determine the reward needed to achieve a given probability of obtaining an answer. Raghavan and Kleinberg use game theory to model this process. Each individual is expected to act strategically to maximize their payoff. The network is modeled as an infinite tree with a branching parameter, in which any given node holds the answer with some probability. This formulation admits a unique Nash equilibrium.
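The effect of shrinking rewards can be illustrated with a deliberately simplified model. This is not Kleinberg and Raghavan's game-theoretic analysis. It makes the following hypothetical assumptions:

* every forwarder keeps the same fixed cut `cut`;
* every node has exactly `branching` contacts;
* each node independently holds the answer with probability `p_answer`;
* a node forwards the request only if something would remain for its children after its cut.

```python
def levels_reached(reward, cut):
    """Number of tree levels the request reaches if each forwarder keeps `cut`."""
    if cut <= 0:
        raise ValueError("cut must be positive, otherwise propagation never stops")
    level, offer = 1, reward  # the requester's direct contacts see the full reward
    while offer - cut > 0:    # forward only if something is left for the children
        offer -= cut
        level += 1
    return level


def answer_probability(reward, cut, branching, p_answer):
    """Chance that at least one reached node holds the answer."""
    depth = levels_reached(reward, cut)
    nodes = sum(branching ** level for level in range(1, depth + 1))
    return 1 - (1 - p_answer) ** nodes


def min_reward(target, cut, branching, p_answer, max_reward=10_000):
    """Smallest whole-number reward whose answer probability meets the target."""
    for reward in range(1, max_reward + 1):
        if answer_probability(reward, cut, branching, p_answer) >= target:
            return reward
    return None
```

Under these assumptions, `min_reward(0.5, 3, 2, 0.1)` returns 7. Rewards of 4 to 6 reach only two levels of the tree (6 nodes), which is not enough. A reward of 7 reaches three levels (14 nodes).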

Social Network Formation
Social networks are generally formed through the interactions of a large number of individuals with their own needs and desires. Networks resulting from such interactions are generally considered to be equilibrium networks; however, they are not necessarily efficient networks. Efficient networks are generally thought of as centrally enforced networks. These networks are also frequently referred to as star networks. They are characterized by a central node i to which every other node is connected; in other words, every link in the social network has node i at one end (Jackson and Wolinsky). In contrast, equilibrium networks are highly variable. They depend on assumptions about individual behavior, the mechanisms by which links are formed and severed, and the manner in which links are valued. For example, some authors assume that the number of links between individuals affects the value of a link and of any information that could be shared between the two nodes. Watts explores how different assumptions about the values and costs of indirect and direct connections affect the types of networks that a network formation process reaches as its equilibrium. She concludes that efficient (star) networks are unlikely to result from a formation process in which the value of an indirect link is assumed to be greater than the net value of a direct link, when there are four or more individuals in the network. The probability decreases further as the number of individuals in the network increases. Richards, on the other hand, explores a parameterization of social networks based on levels of leadership, bonding, and diversity, which he compares with small-world models, random graphs, and scale-free graphs.
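The tension between efficiency and equilibrium can be explored numerically with a sketch of Jackson and Wolinsky's connections model. In this model, each individual receives a benefit `delta ** d` from every other individual at distance `d`, so indirect links are worth less than direct ones. Each individual also pays `cost` for every direct link. The stability test uses their notion of pairwise stability: no one wants to drop an existing link, and no pair wants to add a missing one. The function names and the graph format (a dict of neighbor sets) are illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def distances_from(graph, source):
    """Breadth-first search distances from source (unreachable nodes are omitted)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def utility(graph, i, delta, cost):
    """Connections-model payoff: sum of delta**distance to others, minus cost per link."""
    benefit = sum(delta ** d for j, d in distances_from(graph, i).items() if j != i)
    return benefit - cost * len(graph[i])


def total_value(graph, delta, cost):
    """Sum of all individuals' payoffs; the efficient network maximizes this."""
    return sum(utility(graph, i, delta, cost) for i in graph)


def toggled(graph, i, j, add):
    """Copy of graph with the i-j link added (add=True) or removed (add=False)."""
    g = {k: set(nbrs) for k, nbrs in graph.items()}
    if add:
        g[i].add(j)
        g[j].add(i)
    else:
        g[i].discard(j)
        g[j].discard(i)
    return g


def pairwise_stable(graph, delta, cost):
    def u(g, k):
        return utility(g, k, delta, cost)

    for i, j in combinations(list(graph), 2):
        if j in graph[i]:
            g2 = toggled(graph, i, j, add=False)
            # Unstable if either endpoint strictly gains by cutting the link.
            if u(g2, i) > u(graph, i) or u(g2, j) > u(graph, j):
                return False
        else:
            g2 = toggled(graph, i, j, add=True)
            gain_i, gain_j = u(g2, i) - u(graph, i), u(g2, j) - u(graph, j)
            # Unstable if one side strictly gains and the other does not lose.
            if (gain_i > 0 and gain_j >= 0) or (gain_j > 0 and gain_i >= 0):
                return False
    return True


def star(n):
    g = {i: set() for i in range(n)}
    for leaf in range(1, n):
        g[0].add(leaf)
        g[leaf].add(0)
    return g


def complete(n):
    return {i: set(range(n)) - {i} for i in range(n)}
```

Take four individuals and `delta = 0.5`. When `cost = 0.1`, the complete network has the higher total value. When `cost = 0.4`, the star does. At `cost = 0.4` the star is also pairwise stable and the complete network is not, since each person would gain by dropping a link. Other parameter values can make the efficient and the stable networks differ.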

Theoretical Frameworks
According to Narahari and Narayanam, social network analysis is based in techniques developed mainly in four other fields: graph theory, spectral theory, optimization theory, and game theory.

Graph Theory
In general, a graph consists of a number of nodes (or vertices) connected by links (or edges). In the context of social network models, individuals are considered to be nodes, and they are connected to other individuals via links. Three graph constructions are used to model social networks: Erdős-Rényi random graphs, small-world constructions, and scale-free graphs (Richards). Erdős-Rényi random graphs are generated randomly, with a link between any given pair of nodes occurring with equal probability, and have properties similar to local road networks. Small-world graphs, by contrast, tend to have high levels of clustering, with any two nodes linked via a short chain of nodes (Brown et al.). Scale-free graphs tend to exhibit power-law degree distributions, in which a few individuals have many connections and most individuals have few. Brown et al. argue that real-world social networks, as a general rule, exhibit power-law degree distributions, dense clustering, and short chains of connections between any two individuals. Consider, for example, the well-known study by Stanley Milgram, in which the completed chains between two individuals averaged roughly six links.
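The contrast in clustering between random graphs and small-world constructions can be checked directly. The sketch below is an illustration rather than code from the cited sources. It builds an Erdős-Rényi graph G(n, p) and a ring lattice, which is the starting point of the Watts-Strogatz small-world construction before any random rewiring. It then computes the average local clustering coefficient of each. Graphs are stored as dicts of neighbor sets, an assumption of the sketch.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """G(n, p): each of the n(n-1)/2 possible links is present with probability p."""
    rng = random.Random(seed)
    g = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            g[i].add(j)
            g[j].add(i)
    return g


def ring_lattice(n, k):
    """Each node linked to its k nearest neighbors (k/2 on each side); k even, k < n."""
    if k % 2 or k >= n:
        raise ValueError("k must be even and smaller than n")
    g = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            g[i].add(j)
            g[j].add(i)
    return g


def average_clustering(g):
    """Mean over nodes of (links among neighbors) / (possible links among neighbors)."""
    total = 0.0
    for nbrs in g.values():
        d = len(nbrs)
        if d < 2:
            continue  # local clustering is taken as 0 for nodes with fewer than 2 neighbors
        links = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
        total += links / (d * (d - 1) / 2)
    return total / len(g)
```

A ring lattice with `n = 10, k = 4` has an average clustering coefficient of exactly 0.5. The expected clustering of an Erdős-Rényi graph is roughly `p`, which is small for sparse graphs.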

Spectral Theory
Spectral theory is a mathematical theory concerned with decomposing operators, such as matrices or integral operators, into sums of simpler operators, typically expressed through their eigenvalues and eigenvectors. These techniques are particularly helpful in the analysis of matrices and their properties (Spectral Theory). In regard to social networks, a specific subset of spectral theory works with matrix representations of graphs, such as those used to define social networks. Spectral graph theory allows graphs and their properties to be analyzed through the matrix representations of those graphs (Liberty).
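As a small illustration of spectral graph theory, the sketch below uses NumPy (an assumed dependency) to compute the eigenvalues of the graph Laplacian L = D - A. Here A is the adjacency matrix and D is the diagonal matrix of degrees. Two standard facts are used:

* the number of zero eigenvalues equals the number of connected components;
* for a connected graph, the signs of the eigenvector of the second-smallest eigenvalue (the Fiedler vector) suggest a split into two loosely linked groups, which connects spectral methods to community detection.

```python
import numpy as np


def laplacian(adjacency):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix."""
    a = np.asarray(adjacency, dtype=float)
    return np.diag(a.sum(axis=1)) - a


def spectral_summary(adjacency, tol=1e-9):
    """Return (number of connected components, Fiedler vector)."""
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian(adjacency))
    components = int(np.sum(eigenvalues < tol))
    fiedler = eigenvectors[:, 1]  # only meaningful as a split when components == 1
    return components, fiedler
```

For two triangles joined by a single edge, the signs of the Fiedler vector separate the two triangles. Removing the joining edge yields two zero eigenvalues, one per component.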

Optimization Theory
Optimization theory seeks to maximize or minimize objective functions using both linear and nonlinear programming techniques. Frequently, as is the case with social network modeling and analysis, realistic optimization problems do not have closed-form solutions and are too large to solve by straightforward or brute-force computation. As a result, a large set of heuristics has been developed to find near-optimal solutions. One such heuristic, the greedy heuristic, is used by Kempe, Kleinberg, and Tardos in their paper on identifying the set of the k most influential individuals.

Game Theory
Narahari and Narayanam argue that game theory is fundamental to accurately understanding and modeling social networks, because the three theories mentioned so far cannot account for the strategic behavior and decisions of individuals in the social network. Game theory is a mathematical theory of the behavior of a set of players who are assumed to be rational and independent of each other. As such, it can be very useful in modeling the decisions of individuals in social networks, whether they are deciding to form a new link, sever an old one, forward a query, or accept the suggestions made by their peers.

Importance
The importance of understanding social networks is highlighted in a story of community survival. As related by Gans, the primarily Jewish and Italian immigrant community of Boston's West End neighborhood was incapable of even forming an organization to oppose the urban renewal project that ended up razing large parts of their hometown and drastically altering their way of life. While social cohesion was observed to be very strong between persons, this microscopic characteristic disguised the society's larger symptoms of fragmentation into cliques. The absence of even weak ties between these tightly grouped cliques prevented communication of ideas and the call to action between these groups.

The opportunity for profit is also a great driver of efforts to understand social networks. The internet-based social network site Facebook.com was valued by Goldman Sachs at $50 billion in January 2011. Many companies and organizations are trying to leverage social media to help with marketing and demand forecasting, taking advantage of the large and growing amount of data about social interactions available on Facebook and from similar sources.

At the same time, due to developments in network science and computing power, standards are being accepted in the sociology community regarding measures and methods for analysis and comparison of social networks. While academic interest has surrounded the topic for some time, social network problems were often too complex and too large for researchers to handle. The increased demand for understanding, along with the promulgation of tools to investigate social networks, have driven rapid growth in subfields seeking to understand network dynamics, power, prestige, and diffusion across networks, such as would be caused by introduction of new products, new ideas, and valuable information like evacuation procedures.

History in Sociology
Conventional sociology drew individuals out of their social structures in order to study relationships between identifiable variables, thereby defining any "structure" by individual attributes. When looking for reasons to explain behaviors, “norms” were suggested, which were drawn from these categories, or from conventional wisdom or attitudes. However, this analysis at its heart amounts to either attempting to confirm prior expectations, or else correlating categories in a way which doesn’t necessarily correspond to the underlying causes being researched - by analyzing the relationships of variables within pools of individuals with no regard to the individuals' networks or network positions, valuable information that could distinguish individuals from one another and highlight present discrepancies is effectively thrown out.

Considering social networks requires relational and structural analysis between actors, which requires that ties and relationships are incorporated into the analysis. When individuals or groups with sets of categories are considered as structurally related, the norms and causes from these structures make sense. Network analysis studies individuals and their links as a means towards an overarching structural analysis, in which the structure is not simply the sum of its actors but requires understanding of the network as a whole. Likewise, it is noteworthy that structures influence but cannot control individual actions. One can now examine the density of a network or an actor’s position to assist an investigation in ways that simple categories cannot. Individuals (actors) cannot be considered the basic unit of social networks, because an individual is - at least to some extent - a product of the social structure in which she finds herself, which extends far beyond the reach of her individual person.

Interviews and Surveys
Interviews and surveys are common means of data collection for studies in the social sciences. Interviews and surveys can be used to build multiplex, well-defined networks, since they are able to utilize a consistent, transparent measure for evaluating ties and actors. They are insightful to the extent the questions asked mirror reality, and may provide insight into actor bias. On the downside, these methods often tend to be relatively laborious and inaccurate, capable of capturing and modeling only small networks. However, if well-constructed, surveys can avoid or mitigate many shortcomings.

Name generators ask respondents to offer names to fill certain descriptions: “__ is my best friend.” Name generators are a common method of generating data. They are by nature directional, and so produce digraphs. They also tend to be very limited in their ability to find weak ties, due both to survey construction (which tends to focus on an actor's strongest and richest ties) and to respondent memory and bias. Along similar lines, name generators tend to artificially bound an actor's (out-)degree by asking for only a fixed number of names rather than allowing an exhaustive list to be compiled.

Using names from directories means drawing a random sample of last names from a directory, e.g. a phone book, and asking respondents to list the people they know with those names. This is useful for “how many people do you know” studies. Researchers should be cautioned, however, that regional biases in names or populations must be addressed before this method is used and its results extrapolated, especially in studies of this kind.
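The arithmetic behind extrapolating from directory names can be sketched as a simple proportional scale-up. The sketch assumes that a respondent knows people with the sampled names at the same rate as people with any other names. This is exactly the assumption that regional name biases can break. The function name is hypothetical.

```python
def estimate_acquaintances(known_with_sampled_names, population_with_sampled_names, population):
    """Proportional scale-up estimate of how many people a respondent knows.

    If the sampled names cover a fraction f of the population, assume the respondent's
    acquaintances are spread across names in the same proportion, so total = known / f.
    """
    if population_with_sampled_names <= 0:
        raise ValueError("sampled names must cover at least one person")
    return known_with_sampled_names * population / population_with_sampled_names
```

Suppose a respondent knows 3 people among 100 directory entries, in a population of 10,000. The sketch would estimate that this respondent knows about 300 people.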

Snowball sampling asks respondents to pass surveys or some other indicator on to their contacts, who pass it on to their contacts, and so forth. This method can be very enlightening in studies of network dynamics, and it can give the researcher access to populations that would otherwise be out of reach. On the other hand, response rates tend to be low. The initial surveys must also be distributed carefully to avoid many of them getting caught in small, tightly knit populations.
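The wave-by-wave logic of snowball sampling, and the way non-response truncates it, can be simulated on a known network. In this sketch the graph, the fixed `response_rate`, and the function name are hypothetical. A person who does not respond passes the survey on to no one.

```python
import random


def snowball_sample(graph, seeds, waves, response_rate, seed=None):
    """Return everyone the survey reaches within `waves` rounds of referrals."""
    rng = random.Random(seed)
    reached = set(seeds)
    current = list(seeds)
    for _ in range(waves):
        next_wave = []
        for person in current:
            if rng.random() >= response_rate:
                continue  # non-response: this branch of the snowball stops here
            for contact in sorted(graph.get(person, ())):
                if contact not in reached:
                    reached.add(contact)
                    next_wave.append(contact)
        current = next_wave
    return reached
```

With `response_rate=1.0`, the sample is every actor within `waves` steps of the seeds. Lowering the rate shows how quickly coverage shrinks. Starting from seeds inside one dense cluster shows how the sample can stay caught there.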

Direct Observation
Observation requires watching and recording actors objectively, and clearly defining actors and ties. In some social networks, for example, those of animals, this method is the only one available to researchers. Direct observation has the advantage of avoiding inaccuracies due to memory lapse or bias, as well as the inclusion of every observable actor rather than only those who opt to respond to a questionnaire. This method usually allows for collection of reliable, accurate data in a short period of time, although ties that are observed may not be as clearly defined as is possible with a survey, and nuances or multiplexity present in those ties that are discerned may be missed by a distant third party.

Diaries may be kept by a researcher or research subject, with the basic purpose of recording names of people and interactions. These may also serve as name generators for other studies. However, diaries are arduous for research subjects to keep, and fatigue and complacency may bias results.

Archival and Third-Party Records
Archival records include historical records, but may also include electronic chains: e-mail records, internet links, and public-key signing networks. Like direct observation, collection of archival data limits exposure to inaccuracies due to memory lapse, bias, and low response rates. A large amount of data may also be accessed and organized in a very short time. However, similar issues arise in that ties may not offer clear indications of strength or multiplexity. This can be an especially difficult problem with archival records, because the researcher is far removed from the actors and ties being examined. The data used is also usually final, with limited opportunity to add to it or fill in gaps.

Regardless of the method used, the resultant network must be framed and qualified in such a way that data collection methods (and the implications thereof) be made clear.

Social Network Analysis
Since the types and meanings of social networks vary, so do the methods used to analyze and compare them. Some measures transfer well between analyses, while others are better suited (or exclusively meant) for specific types of networks. The methods used should be tailored to suit the researcher's analytical goal. The following is a survey of basic methods and concepts that form a basis for social network analysis.

Network Size
Every network requires a boundary, which must be arbitrary to some extent. “Total network” studies are impractical, although methods exist to estimate the connectedness of very large networks. Sola Pool used names drawn from phone books, as well as diary methods, to estimate the number of people he and other subjects knew by face and name. Stanley Milgram's famous 1967 "Six Degrees" experiment used snowball methods to pass journals from residents of Nebraska and Boston to a Boston stockbroker, with each successful journal traveling across a mean of 5.2 ties to its target.

For studies that require a pre-defined boundary, within which all actors and ties should be examined, geography often provides a suitable guide for partitioning populations. This is not only convenient for researchers, but a 1986 study by the French government concluded that '[Social] circles coalesce in certain places.' While actors within these circles surely participate in ties outside of them and outside of the geographical area of interest, a geographical boundary that is well-drawn, perhaps with the aid of a person who knows the area and its inhabitants, can provide a useful and distinct boundary for the social network to be constructed.

Ego Networks
A common form networks may take is that of the ego network, which comprises a central actor (the ego) and her contacts (alters). While the ties between the ego and alters may be enlightening for some purposes, often more important are the ties between alters. In fact, ego networks are often considered without the ego or his links to the alters, as these tend to be consistent and, as such, unnecessary. (In the case that they are not, of course, including the ego as an actor may still be prudent.)

But how far out from the ego should we look to capture the network that affects him? Mark Granovetter suggests that two degrees, or "friends of friends", should be sufficient for most purposes. Regardless of the extent of the network, ego networks are useful for examining local network properties and their possible influence on an actor.
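Following Granovetter's suggestion of two degrees, an ego network can be extracted by breadth-first search out to a chosen radius. This sketch assumes an undirected graph stored as a dict of neighbor sets. Passing `include_ego=False` reflects the common practice, described above, of dropping the ego and its links. The function name is hypothetical.

```python
from collections import deque


def ego_network(graph, ego, radius=2, include_ego=True):
    """Subgraph induced by everyone within `radius` steps of the ego."""
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the chosen radius
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    members = set(dist) if include_ego else set(dist) - {ego}
    # Keep only the ties among the retained members.
    return {v: graph[v] & members for v in members}
```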

The Strength of Weak Ties (1973)


"The strength of weak ties" is Mark Granovetter's groundbreaking work that shed light on the impact of tie strength on network analysis, with strong implications for network dynamics and community cohesion. Granovetter uses concepts of the strength of ties (a tie that is present is considered either strong or weak, depending on some measure of social proximity of the actors it links) and transitivity of ties to propose a "forbidden triad." The potential tie between two alters to an ego depends upon the strength of each alter's tie to the ego: if one or both are tied only weakly, there is no need for the alters to be tied; if, however, both have strong ties to the ego, there is a great likelihood that they will have met and will have enough in common to have established (at least) a weak tie. The forbidden triad, then, is that which violates this transitivity.

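Given ties labeled as strong or weak, instances of Granovetter's forbidden triad can be listed. The sketch checks every pair of an ego's strong contacts for a missing tie. The input format (lists of undirected strong and weak ties) and the function name are illustrative assumptions.

```python
from itertools import combinations


def forbidden_triads(strong_ties, weak_ties):
    """Return (ego, a, b) where ego is strongly tied to a and b, but a and b share no tie."""
    any_tie = {frozenset(t) for t in strong_ties} | {frozenset(t) for t in weak_ties}
    strong_contacts = {}
    for u, v in strong_ties:
        strong_contacts.setdefault(u, set()).add(v)
        strong_contacts.setdefault(v, set()).add(u)
    found = []
    for ego, contacts in strong_contacts.items():
        for a, b in combinations(sorted(contacts), 2):
            if frozenset((a, b)) not in any_tie:
                found.append((ego, a, b))
    return found
```

If E is strongly tied to both A and B and no A-B tie is recorded, the triad (E, A, B) is reported. Adding even a weak A-B tie removes it from the list.
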
The immediate implication of this premise is that only weak ties may be “bridges,” or unique connections between two groups of actors. Of course, following Milgram's premise of "Six Degrees" (above), no groups will ever be perfectly isolated. However, in the absence of such a direct tie, the path connecting them can be so long that it is negligible in practice. In short, a tie is valuable because of its uniqueness, and transitivity makes strong ties less unique. This capacity to be highly unique is the strength of weak ties.
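Bridges in the graph-theoretic sense are edges whose removal disconnects the two groups they join. They can be found with Tarjan's depth-first low-link method. This is a standard algorithm rather than one taken from Granovetter, who also discusses looser "local bridges." The sketch finds only strict bridges in an undirected simple graph stored as a dict of neighbor sets. It uses recursion, so very large graphs may need an iterative version.

```python
def find_bridges(graph):
    """Edges whose removal disconnects their endpoints (Tarjan's low-link method)."""
    index, low, bridges = {}, {}, []
    counter = [0]

    def dfs(u, parent):
        index[u] = low[u] = counter[0]
        counter[0] += 1
        for v in graph[u]:
            if v not in index:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # v's subtree has no edge back to u or above: u-v is the only link.
                if low[v] > index[u]:
                    bridges.append(tuple(sorted((u, v))))
            elif v != parent:
                low[u] = min(low[u], index[v])

    for node in graph:
        if node not in index:
            dfs(node, None)
    return bridges
```

For two triangles joined by a single edge, that joining edge is the only bridge.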

Drawing from notions of network resilience and reliability, the unique positions that weak ties may occupy indicate their relevance in increasing a network's connectedness and centralization. In ego networks, the absence of weak ties (which may also be caused by a preponderance of strong ties) leads to actor encapsulation. Drawing this phenomenon out to larger networks, an examination of microstructure shows that strong cohesion prevails, but the absence of weak ties to connect these tight clusters of actors results in a very weakly connected set of groups making up the community, which hinders communication and a sense of identity across the network. Taking Boston's West End (above) as a prototype, communities exhibiting this sort of structure are at risk when facing a common threat that requires a unified effort on their part.

Besides community survival, weak ties play a role in the job hunt. It might readily be hypothesized that strong ties are more useful to an ego in making contacts, since alters connected by strong ties have greater motivation to assist the ego. However, empirical evidence supports the idea that network structure has primacy over motivation in gathering references while job hunting.

In terms of idea innovation and adoption, encapsulation again opposes diffusion. Studies of drug prescriptions have shown that “first adopters” to prescribe new drugs are marginal in networks of doctors, while “early adopters” are more centrally located and tied to the first adopters. (That is, they happened on this new idea and spread it well to their contemporaries through their position of prestige and strong ties.) Granovetter suggested diffusing new information starting with people with many weak ties. These will be marginal in a large number of networks, allowing access to more potential “early adopters.”

Density and Multiplexity


Density and multiplexity are measures of network “tightness.”

Density is a network characteristic, and is the ratio of existing links to possible links.


 * $$Density = \frac{2t}{n\left(n-1\right)}$$

where


 * $$t$$ = the total number of existing ties, and
 * $$n$$ = the total number of nodes.
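The density formula translates directly into code for an undirected network, where t counts each unordered pair once. This sketch ignores duplicate entries and self-loops, which is an assumption on its part.

```python
def density(n_nodes, edges):
    """Undirected density 2t / (n(n - 1)): existing ties over possible ties."""
    ties = {frozenset(e) for e in edges if e[0] != e[1]}  # drop duplicates and self-loops
    if n_nodes < 2:
        return 0.0
    return 2 * len(ties) / (n_nodes * (n_nodes - 1))
```

A triangle has density 1.0. A path through four nodes has t = 3 of 6 possible ties, giving 0.5.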

Multiplexity is a measure used to describe relationships between actors that may include different types or strengths of ties. Multiplex relationships may exist, for example, between neighbors, who go to dinner parties, do lawncare, or babysit for/with one another. These relationships have different characters but exist between the same actors. Numerically, multiplexity may be considered to be the ratio of total exchanges to total ties. If it is boiled down to a number, the magnitude of this number characterizes an "average tie strength."


 * $$Multiplexity = \frac{e}{t}$$

where t is as above, and


 * $$e$$ = the total number of exchanges in all ties.
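Multiplexity as defined above can be computed from a list of recorded exchanges, each tagged with the pair of actors involved. The record format `(actor_a, actor_b, kind)` is a hypothetical choice for this sketch. The kind of exchange is kept only for readability and is not used in the ratio.

```python
def multiplexity(exchanges):
    """e / t: number of exchange records divided by number of distinct undirected ties."""
    ties = {frozenset((a, b)) for a, b, _kind in exchanges}
    return len(exchanges) / len(ties) if ties else 0.0
```

Suppose neighbors A and B share dinner parties, lawncare, and babysitting, while neighbors B and C share only dinner parties. Then e = 4 and t = 2, giving a multiplexity of 2.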

Cohesion and Equivalence
Cohesion and equivalence are properties that relate actors and groups of actors to one another.

Cohesion is concerned with cliques, or strongly connected groups of actors, and the overlap between these. Affiliation networks, which are bipartite networks that include some actors that are, for example, individuals and some that are organizations or other meeting groups, are especially able to depict network cohesion. Relevant terms used in examining cohesion are:


 * n-clique – clique with paths of length ≤ n between any two nodes.
 * k-plex – a set of nodes such that each is adjacent to all but k others.

These definitions are applied maximally – that is, n-cliques and k-plexes include as many actors as possible.
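The membership conditions for n-cliques and k-plexes can be checked for a candidate set of actors. This sketch only tests the conditions; it does not search for maximal sets, which is a much harder problem. It measures n-clique distances in the whole graph. It also reads "all but k others" as "all but at most k others," since conventions for k-plexes vary between texts. Function names and the dict-of-sets graph format are assumptions.

```python
from collections import deque


def bfs_distances(graph, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_n_clique(graph, members, n):
    """Every pair of members lies within distance n (measured in the whole graph)."""
    members = list(members)
    for idx, a in enumerate(members):
        dist = bfs_distances(graph, a)
        for b in members[idx + 1:]:
            if dist.get(b, float("inf")) > n:
                return False
    return True


def is_k_plex(graph, members, k):
    """Every member is adjacent to all but at most k of the other members."""
    members = set(members)
    return all(len(members - {m} - graph[m]) <= k for m in members)
```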

Equivalence deals with common types of relations and roles between classes of actors, even in the (possible) absence of direct ties. One example of equivalence is the relationship of doctor to patient. Not all doctors know all patients, not all doctors know each other, and not all patients know each other. Yet the relations between doctors and their patients bear a resemblance to one another.

Regular equivalence requires that every actor must have at least one tie befitting her class, and relate consistently to actors in other classes.

Structural equivalence requires that every actor in a class relates to every other in a certain class in a certain manner.

At first, this latter definition might appear hard to satisfy. However, structural equivalence is possible in a case such as an army at war, in which every member must treat every member of the opposing army as an enemy.
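For a single undirected, unvalued relation, strict structural equivalence has a simple test: two actors are equivalent when they are tied to exactly the same other actors. The sketch below groups actors into such classes. It assumes a dict-of-sets graph and uses hypothetical function names. Regular equivalence is looser and needs more elaborate algorithms.

```python
def structurally_equivalent(graph, a, b):
    """True if a and b have identical ties to every actor other than each other."""
    return graph[a] - {b} == graph[b] - {a}


def equivalence_classes(graph):
    """Group actors into classes of structurally equivalent actors."""
    classes = []
    for actor in graph:
        for cls in classes:
            if structurally_equivalent(graph, actor, cls[0]):
                cls.append(actor)
                break
        else:
            classes.append([actor])
    return classes
```

In a star network, the leaves form one class and the center forms another.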

Equivalence can be tricky to apply. Take an example from teaching: a graduate student might be a teaching assistant, teaching courses while also taking classes. Since the student is sometimes learning and sometimes teaching, he is equivalent to neither students nor teachers. Moreover, a hierarchy that tries to account for graduate students cannot use measures of equivalence defined in this way. In practice, statistical approaches to equivalence, along with cluster analysis, are usually suitable.

Centrality, Power and Prestige
An actor's centrality, power, and prestige are measures that are largely influenced by her network position and relationships. Centrality is a well-defined (albeit variously defined) term in network science. The concepts of power and prestige, by contrast, are particular to social networks, and are subject to nonlinear and confounding influences from surrounding actors.

Centrality concerns an actor's position in a network with respect to all other actors' positions. Depending on the network and the purpose of the research, different measures of centrality may be used. Degree centrality counts the number of ties (in, out, or both) that an actor has, and is therefore a straightforward but local measure. Closeness centrality gives a measure of “inverse remoteness” to other nodes by taking into account the shortest distance between an actor and every other actor in the network. Betweenness centrality is also based on shortest paths, but it counts how often an actor is an intermediary on the geodesics (shortest paths) that connect other actors to one another. It thereby captures the importance of an actor's position to the relationships that take place over the whole network. Flow betweenness requires that ties be weighted by the flow of some good or information along them. It measures how much of the flow between other pairs of actors passes through a given actor, counting flow along all paths rather than only along geodesics.
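Degree, closeness, and betweenness centrality can be computed for an unweighted, undirected network as sketched below. Betweenness uses Brandes' algorithm, a standard method for accumulating each actor's share of shortest paths. Closeness is normalized as (n - 1) divided by the sum of distances and assumes a connected graph. The dict-of-sets graph format and function names are assumptions of the sketch.

```python
from collections import deque


def degree_centrality(graph):
    return {v: len(nbrs) for v, nbrs in graph.items()}


def closeness_centrality(graph):
    """(n - 1) / sum of shortest-path distances; assumes a connected graph."""
    n = len(graph)
    result = {}
    for s in graph:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[s] = (n - 1) / total if total else 0.0
    return result


def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    cb = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # visit nodes from farthest to nearest
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return {v: c / 2 for v, c in cb.items()}  # each pair was counted from both ends
```

On the path a - b - c, b has degree 2, closeness 1.0, and betweenness 1.0, because it lies on the only geodesic between a and c. By comparison, a has closeness 2/3 and betweenness 0.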

Centralization is a measure of a network that combines the centrality of all its actors. It may be normalized and used for comparison of networks.
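One common way to combine actors' centralities is Freeman's degree centralization. It sums how far each actor's degree falls below the maximum degree, then divides by the largest possible value of that sum, (n - 1)(n - 2), which a star attains. The sketch assumes an undirected graph stored as a dict of neighbor sets.

```python
def degree_centralization(graph):
    """Freeman degree centralization of an undirected graph, scaled to [0, 1]."""
    n = len(graph)
    if n < 3:
        return 0.0
    degrees = [len(nbrs) for nbrs in graph.values()]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))
```

A star scores 1 and a complete network scores 0, so networks of different sizes can be compared on the same scale.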

Power and prestige are based on the idea of centrality in a social network, but centrality is often not an adequate measure of these real-world concepts. Organizational social scientists say “there is no power, only powerful ties.” It is also acknowledged that different types of power exist, though there is general agreement that they are inversely related to an actor's vulnerability and dependence. Power or prestige may also be dwarfed (or amplified) by proximity to another actor with higher power or prestige. In short, measures of these factors must account for the power or prestige of nearby actors. The following measures also use directional tie values.

One formulation that takes this into account is Bonacich’s measure of centrality:


 * $$C_i = \sum_j \left(\alpha + \beta C_j\right)r_{ij}$$

where


 * $$\alpha$$ = normalizing parameter
 * $$C_j$$ = centrality of actor j
 * $$r_{ij}$$ = value of i-j relation
 * $$\beta$$ = modeling parameter

The modeling parameter $$\beta$$ is chosen with regard to the situation at hand. $$\beta$$ = 0 means that centrality increases only with the number and magnitude of direct relations. $$\beta$$ > 0 takes alter centrality into account in a positive manner, such that being near powerful or prestigious actors increases one's own power or prestige. $$\beta$$ < 0 takes alter centrality into account in a negative manner, forcing actors to "share the limelight."
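Each C_i depends on the C_j of its neighbors, so Bonacich's measure can be computed by fixed-point iteration. This sketch stores relation values as a dict of dicts `r[i][j]`, in which every actor appears as a key. It also assumes that |β| is small enough for the iteration to converge, that is, below the reciprocal of the largest eigenvalue magnitude of the relation matrix. The function name is hypothetical.

```python
def bonacich_power(r, alpha=1.0, beta=0.0, iterations=1000, tol=1e-12):
    """Iterate C_i = sum_j (alpha + beta * C_j) * r_ij until the values stop changing."""
    actors = list(r)
    c = dict.fromkeys(actors, 0.0)
    for _ in range(iterations):
        new = {i: sum((alpha + beta * c[j]) * value for j, value in r[i].items())
               for i in actors}
        if max(abs(new[i] - c[i]) for i in actors) < tol:
            return new
        c = new
    raise RuntimeError("did not converge; try a smaller |beta|")
```

With β = 0 the result is each actor's total relation value. Consider the path a - b - c with unit ties and β = 0.2. There, b scores about 2.61 and each end scores about 1.52.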

Another valuable measure is that of extended relations, which weights relations and results in a normalized average marginal strength of received ties. This measure sums an actor's relationship as a proportion of its importance to the alter, and is therefore a good measure of prestige:


 * $${ER}_i = \frac{1}{N-1}\sum_j \frac{z_{ij}}{\max_k z_{jk}}$$

where


 * $${ER}_i$$ = extended relation value of actor i, a number between 0 and 1
 * $$z_{ij}$$ = strength of i-j tie
 * $$N$$ = total number of actors in network
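The extended-relations formula can be applied directly to a table of tie strengths. The sketch follows the indices exactly as written above and stores strengths as a dict of dicts `z[i][j]`. It skips alters with no ties to avoid dividing by zero. These choices are assumptions of the sketch.

```python
def extended_relations(z):
    """ER_i = (1 / (N - 1)) * sum_j z_ij / max_k z_jk."""
    n = len(z)
    strongest = {j: max(z[j].values(), default=0) for j in z}  # each alter's strongest tie
    er = {}
    for i in z:
        total = sum(w / strongest[j] for j, w in z[i].items()
                    if j != i and strongest[j] > 0)
        er[i] = total / (n - 1)
    return er
```

Take ties a - b of strength 2 and b - c of strength 4, recorded in both directions. The values are then 0.25 for a, 1.0 for b, and 0.5 for c.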

Power is also considered as a quantity, its calculation taking into account exclusivity and power of ties, whereby power is shared positively between actors:


 * $$p_i = \sum_j \left(z_{ji}/\sum_k z_{jk}\right)p_j$$

where


 * $$p_j$$ = power of actor j
 * $$z_{ij}$$ = strength of i-j tie

As with extended relations, this last measure sums a similar measure of an ego's importance to each alter times each alter's power. This requires $$p_j$$ to compute $$p_i$$, and also vice-versa, so an iterative algorithm must be used.
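The iterative computation can be sketched as a power iteration. The equation only determines the scores up to a common scale factor, so this sketch normalizes them to sum to 1. It also averages each update with the previous scores (a "lazy" iteration). This has the same fixed points but avoids the oscillation that plain iteration shows on networks such as bipartite ones. The sketch assumes a connected network in which every actor has at least one tie. The dict-of-dicts input and the function name are hypothetical.

```python
def power_scores(z, iterations=10_000, tol=1e-12):
    """Solve p_i = sum_j (z_ji / sum_k z_jk) * p_j, with scores normalized to sum to 1."""
    actors = list(z)
    out_total = {j: sum(z[j].values()) for j in actors}
    p = dict.fromkeys(actors, 1.0 / len(actors))
    for _ in range(iterations):
        incoming = dict.fromkeys(actors, 0.0)
        for j in actors:
            for i, w in z[j].items():
                # Actor j passes on power in proportion to the share of its ties going to i.
                incoming[i] += w / out_total[j] * p[j]
        new = {i: 0.5 * p[i] + 0.5 * incoming[i] for i in actors}  # lazy update
        if max(abs(new[i] - p[i]) for i in actors) < tol:
            return new
        p = new
    raise RuntimeError("power iteration did not converge")
```

For symmetric tie strengths, the result is proportional to each actor's total tie strength. With a - b of strength 2 and b - c of strength 4, the scores are 1/6, 1/2, and 1/3.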

Structural Balance
The case of triads is considered, where each tie may be either positive or negative. Up to relabeling of the actors, four triads are possible (with three, two, one, or zero positive ties), and each may be considered either balanced or unbalanced. Balanced triads are either entirely harmonious (all ties positive) or exhibit an "enemy of my enemy is my friend" characteristic (one positive and two negative ties). Unbalanced triads tend toward balance, but do persist. Consider "love triangles" with enmity on only one side (two positive ties and one negative), or three-way races in which each actor competes against both of the others (all ties negative). Naturally, these triads are just building blocks for what readily become very complex structures.
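The balance condition has a compact arithmetic form. Code positive ties as +1 and negative ties as -1; a triad is then balanced exactly when the product of its three signs is positive. The sketch below applies this test to every complete triad in a signed network. The input format (a dict from actor pairs to signs) and the function names are illustrative.

```python
from itertools import combinations


def triad_balanced(s1, s2, s3):
    """A triad of +1/-1 ties is balanced when the product of its signs is positive."""
    return s1 * s2 * s3 > 0


def unbalanced_triads(signed_ties):
    """List every triad whose three ties are all present and whose signs are unbalanced."""
    sign = {frozenset(pair): s for pair, s in signed_ties.items()}
    nodes = sorted({v for pair in signed_ties for v in pair})
    found = []
    for a, b, c in combinations(nodes, 3):
        ties = [sign.get(frozenset(pair)) for pair in ((a, b), (b, c), (a, c))]
        if None not in ties and not triad_balanced(*ties):
            found.append((a, b, c))
    return found
```

A love triangle (two positive ties, one negative) is reported as unbalanced. An "enemy of my enemy" triad (one positive tie, two negative) is not.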

Future Opportunities
The explosion of interest and capability in the study of social networks has allowed a number of fields to be highlighted for future study. Longitudinal studies are possible and feasible with social network sites, e.g. Facebook, and one is being undertaken by the Christakis group at Harvard. The study of power in formal and informal organizations can help optimize hierarchical organizations (such as governments and militaries) into more effective structures. Studies of network dynamics, including innovation adoption, will continue to be useful in marketing as well as politics, and studies of concepts of network equilibria and the "social contract" may impact political science and civics. And of course, sociology will always continue its study of norms, social order, and social action, and the powerful lens of network science will aid the field in its research.