Introduction to the SNA Library for R Carter Butts Department of Social and Decision Sciences and Center for the Computational Analysis of Social and Organizational Systems Carnegie Mellon University v0.0.1, 5/6/00 1. Overview The following is intended to provide a very basic overview of some of the features of the SNA library. As the library itself is a work in progress, there is some chance that later revisions may not correspond to a greater or lesser extent to what is presented here; eventually, this should be rectified by the replacement of this document by a full set of R man pages. In any event, this brief introduction should serve to give newcomers some notion of how to proceed. 1.1 Purpose The purpose of the SNA library is to generate a useful, full-featured toolkit for network analysis within an established statistical computing environment. This library is to be made freely available via copyleft to facilitate use, criticism, and further development of the package by the network community. It is hoped that this toolkit will facilitate the use of statistical network analysis methods by network analysts, and will ease integration of network methods with those of standard statistical analys> is. 1.2 Functionality At present, the SNA library includes the following functions: addisolates - Add isolates to a graph set bbnam - Draw from Butts' (Hierarchical) Bayesian Network Accuracy Model bbnam.probtie - Internal routine for finding tie likelihoods under bbnam betweenness - Find the betweenness centralities of network positions centralgraph - Find the central graph of a graph stack centralization - Generic centralization routine (uses centrality routines) closeness - Find the closeness centralities of network positions cugtest - Generic Conditional Uniform Graph (CUG) test for bigraphic statistics degree - Computes the degree centralities of network positions diag.remove - NAs the diagonals of graphs in a stack event2dichot - Convert observed event matrices to dichotomous matrices gclust.boxstats - Plot statistics associated with clusters gclust.centralgraph - Get central graphs associated with clusters gcor - Computes the correlation between graphs gcov - Computes the covariance between graphs gden - Computes the density of one or more graphs geodist - Finds geodesic distances and shortest paths within a graph gliop - Return a binary operation on GLI values computed on two graphs (for test routines) gplot - Plots graphs using eigenvector projection, classical MDS, or custom coords graphcent - Find the graph centralities of network positions grecip - Computes the reciprocity of one or more graphs gvectorize - Vectorization of adjacency matrices hdist - Computes the Hamming distance between two labeled graphs netcancor - Canonical correlations for network variables (requires mva) netlm - Fits a network OLS regression model (w/CUG or QAP tests) netlogit - Fits a network logit model (w/CUG or QAP tests) npostpred - Take posterior predictive draws for functions of graphs nties - Find the number of ties in a given graph numperm - Get the nth permutation vector, using a periodic numbering scheme plot.bbnam - Plotting for bbnam objects plot.cugtest - Plotting for cugtest objects plot.qaptest - Plotting for qaptest objects print.bbnam - Printing for bbnam objects print.cugtest - Printing for cugtest objects print.netcancor - Printing for netcancor objects print.netlm - Printing for netlm objects print.netlogit - Printing for netlogit objects print.qaptest - Printing for qaptest objects print.summary.bbnam - Printing for bbnam summary objects print.summary.cugtest - Printing for cugtest summary objects print.summary.netcancor - Printing for netcancor summary objects print.summary.netlm - Printing for netlm summary objects print.summary.netlogit - Printing for netlogit summary objects print.summary.qaptest - Printing for qaptest summary objects qaptest - Generic QAP test for bigraphic statistics rgraph - Draws Bernoulli graphs rmperm - Randomly permutes the rows and columns of a graph stack sdmat - Estimate the matrix of structural distances among unlabeled graphs sr2css - Convert a row-wise self-report matrix to a CSS matrix with missing observations stresscent - Find the stress centralities of network positions structdist - Estimate the structural distance between unlabeled graphs summary.bbnam - Detailed printing for bbnam objects summary.cugtest - Detailed printing for qaptest objects summary.netcancor - Detailed printing for netcancor objects summary.netlm - Detailed printing for netlm objects summary.netlogit - Detailed printing for netlogit objects summary.qaptest - Detailed printing for qaptest objects upper.tri.remove - NAs the upper triangles of graphs in graph stacks These functions run the gamut from simple utilities (sr2css, nties) to classical measures (betweenness, degree) to more exotic routines (bbnam, netcancor). While the range of functionality does not rival more established packages such as UCINET, the author has found that having access to the full power of the S language under R in conjunction with these routines generally compensates for the deficit. Hopefully, future development will also serve to improve the functionality of this toolkit. 1.3 Caveats A number of caveats apply to the 0.0.1 release of the SNA library. First and foremost, it must be emphasized that this is alphaware -- it is mostly untested and almost certainly buggy, with all that this condition implies. Researchers are welcome to use the library as they see fit, but are strongly advised to check all results carefully prior to publication. (Consider yourself warned!) A second caveat is that (as should be obvious by now) the SNA library is very poorly documented at present; hopefully,> this will eventually be rectified, but for the moment the user is more less left to fend for him or herself. Finally, the caveat should be offered that this software is made available on an essentially unsupported basis. While the author is certainly interested in hearing about bugs, and will in general try to fix them, he is regrettably unable to provide assistance to users, to fix bugs on demand, or otherwise to supply software support. The author is neither callous nor disinterested in this; he does> , however, have a number of other (rather pressing) demands on his time. 1.4 Development Platform The current version of the SNA library was developed for R version 1.0.0, using the RedHat Linux 6.x port. Given the 1.0 code freeze, it is not expected that R development (which is proceeding rapidly as of the time of this writing) will break the SNA library any time soon. (Or, at least, let's hope not.) The library code should be executable on any R port of the appropriate version, though performance may vary. As an aside, most of the SNA library should be relatively easy to port to S-Plus, a well-known commercial implementation of the S language (of which R is a non-commercial implementation). Such a port has not been attempted by the author, but he would welcome such an effort from any interested parties. 2. Setup Setup of the SNA library is quite straightforward: get R (if you don't already have it) and source the library code. An additional note in this section describes the data format which is expected of the currently implemented routines, which may be of general interest as well. 2.1 Getting R The R statistical environment is freely available from the Comprehensive R Archive Network (CRAN), among other places. Users in the US may access CRAN from its CMU mirror at http://www.stat.cmu.edu/R/CRAN; others may wish to use an alternate site, such as the primary CRAN site (http://cran.r-project.org/) or one of its other mirrors. (See the above URL for the latest listing.) As of the time of this writing, R has been implemented for a variety of UNIX variants (notably including linux), Windows, and the Macintosh, though other platforms may be supported in the future. Users should seek installation and use instructions from the CRAN site; for the uninitiated, a general text on the S langauge may also be helpful. 2.2 Loading the SNA Library (and Related Libraries) This step is quite trivial. Given that the sna.R file has been placed at some path, e.g. /foobar/sna.R, it may be loaded by entering #Load the SNA toolkit as an R source file source("/foobar/sna.R") at the R command prompt. (Obviously, you should replace "/foobar/" with the path in question.) This will load the routines transparently, and afterwards you should be able to invoke any of the standard functions as you would a native R command. (See the R manual for details.) Note that certain routines in the SNA library rely on existing R functions which are not included in the "base" package which is automatically loaded when R is executed. In particular, some (e.g., netcancor) require the "mva" package, which must be loaded manually. To accomplish this, enter the following at the command prompt: #Load the mva (multivariate analysis) library library(mva) This should load the appropriate library, unless it was not installed with R itself (unusual, as it is included by default). The "mva" package can be obtained directly from CRAN if needed, although most users should already have it available. (Note, by the way, that the SNA library cannot be loaded using the "library" command. This is because, technically, it is not an R package as yet. This will be changed in some future version of the toolkit.) 2.3 Data Format and Other Preliminaries Most of the routines in the SNA library require that graphs be stored as square matrices (created using the "matrix" or "array" commands) or as mxnxn arrays, where the first dimension indexes each graph and the second two index the rows and columns (respectively) of the adjacency matrix. In general, the routines in question are fairly intelligent about dealing with user data (e.g., most will automatically recognize and deal with graph stacks as opposed to single graphs) but it is not wise to force the iss> ue where it can be avoided. Many routines also expect dichotomous data (though some don't) and may not check for it; caveat emptor. Those routines sensitive to such things will often accept a boolean parameter indicating whether or not the diagonal should be used, and a mode parameter which will specify whether one wishes the structure to be treated as a digraph or as a simple graph...eventually, this will be documented, but for the present one can get a good idea of what is expected by entering the command and name without the closing parentheses (e.g., "netlogit" instead of "netlogit()") and inspecting the code. Mercifully, intelligent defaults are used wherever possible, though here again the wise will not rely too heavily on this. Another preliminary which will undoubtedly be of interest is the question of loading data into R from an external source. While this is covered in standard R or S references, a brief note will be included here regarding the importation of adjacancy matrices. The process is generally straightforward: given a data file "/foobar/data.file" containing a single matrix, we load it using the command #Read a data file into a matrix, calling it "mygraph" mygraph<-matrix(scan("/foobar/data.file",byrow=TRUE),nrow=n,ncol=n) where mygraph is the name the graph is to take in R and n is the number of rows and columns. Note that this assumes each element of data.file to be seperated with whitespace, with one row per line. Alternative approaches are available; check your R documentation for details. Finally, a note here is in order regarding the computational resources demanded by the sna library. While the specific resources (in terms of storage, memory, and CPU cycles) required to use these tools will vary by platform, the sna library is generally quite resource-intensive. Given this reality, some analyses may require that R be executed with additional heap memory (see the --vsize option), and instant answers should not be expected in all cases. As development of the package continues, it is hoped that substantial efficiency gains will be possible through refinement of the underlying code; at present, however, the author is more concerned with functionality (and correctness!) than speed. 3. Creating Random Networks For the demonstrations which follow, we will need some data on which to practice; in traditional CMU fashion, we will proceed by making it up. (Yes, the author's tongue is firmly in his cheek.) In the process, we shall see some of the SNA library's functionality with respect to random graph generation, as well as some examples of how R may be used together with the SNA library to extend its capabilities in practice. NOTE: Subsequent examples will assume that these examples have already been executed. 3.1. A Random Relation and Some Covariates Our first foray into random data generation with R will consist of a simple binary relation, here imagined to represent a social support network, and a number of fictitious mental health outcomes which are related to the network in question. As indicated previously, our purpose here is to demonstrate how one may use the SNA library (and R generally) to simulate and analyze network data; the reader should not suppose, then, that the empirical content here attributed to the networks in question are anything other than conveniences to aid intuition. With this caveat, in mind, then, let us proceed to the matter of generating our fictitious support network. To begin, let us imagine that we are assessing a population of 25 actors, each of whom may have either a weak, moderate, or strong tendency to send or recieve network ties. To model this tendency, we assign each actor randomly to one of the nine possible states and draw from two binomial distributions as follows: #Create a vector of incoming tie parameters for social support sup.in<-rbinom(25,50,sample(c(0.1,0.5,0.9),25,replace=TRUE)) #Create a vector of outgoing tie parameters for social support sup.out<-rbinom(25,50,sample(c(0.1,0.5,0.9),25,replace=TRUE)) Each actor, then, has two "parameters," one reflecting a tendency to receive ties, and another to send them. For purposes of interpretation, it may be useful to think of these as reflecting the outcome of a hypothetical experiment in which each actor was independently given the chance to nominate or to be nominated by a randomly chosen actor 50 times, with the parameter value being equal to the number of nominations. (If this appears too abstract, however, simply recall that this is an essentially ad hoc mechanism for generating a fictitious network.) From our parameters, we may then draw a matrix of tie probabilities; these will then be used to draw the support network itself, using the rgraph routine. #Draw a matrix of tie probabilities for social support sup.tp<-matrix(nrow=25,ncol=25) for(i in 1:25) for(j in 1:25) sup.tp[i,j]<-rbeta(1,sup.out[i]+sup.in[j]+1,100-(sup.out[i]+sup.in[j])+1) #Model tie probs as betas Note that, in the above, we have elected to use a beta distribution to integrate the above row and column effects into tie probabilities. As it happens, this choice is not entirely arbitrary: there is a useful mathematical relationship between the beta and binomial distributions which, while employed here purely for convenience, is of considerable importance in Bayesian analysis. We shall see this later, when we employ a hierarchical Bayesian model to analyze network data. For the moment, however, our concern is generating a random network from a set of tie probabilities. To accomplish this, we employ the rgraph routine as follows: #Draw the social support graph, using the tie probabilities support<-rgraph(25,tprob=sup.tp) Given the above parameters, rgraph produces a 25-actor random digraphic adjacency matrix with a zero-valued diagonal, using the tie probabilities given by the suptp matrix. If we were so inclined, we could set other options in the above command to draw multiple matrices using the same tie probabilities, to permit values on the diagonal, or to force the matrix to be symmetric (for simple graphs). At the moment, however, we shall keep our support network in its present form, and continue with our example. Having produced a support network, we now consider some fictitious mental health outcomes associated with it. In each of the cases below, we use the rnorm function to create 25 draws from a normal distribution with mean equal to a function of one or more nodal indecies. Later, we shall examine the use of these nodel index functions more closely; for now, simply recognize that we can use properties of individual positions to more or less seamlessly create a set of associated variables for subsequent analysis. #Now for some fictitious mental health outcomes.... sup.outc.1<-rnorm(25,degree(support,cmode="indegree"),2) sup.outc.2<-rnorm(25,stresscent(support),2) sup.outc.3<-rnorm(25,degree(support,cmode="outdegree")-degree(support,cmode="indegree"),1) sup.outc.4<-rnorm(25,0,10) 3.2. Related Relations We have seen, in the previous section, how a simple random process may be employed to create a single relation. Here, we shall create several relations, such that some are related to each other. To start with, we simulate a symmetric "friendship" relation, again using rgraph: #Draw a symmetric friendship network friends<-rgraph(15,tprob=0.3,mode="graph") Note here that we have drawn a single graph of size 15, with tie probability uniformly equal to 0.3. (The rgraph routine is, as we shall see, fairly intelligent about interpreting the tprob argument; it is used as an overall tie probability, a per-matrix tie probability, or a per-tie tie probability, when given as a single number, a vector, or a matrix, respectively.) The mode parameter here tells rgraph to draw a simple graph (that is, a symmetric adjacency matrix) instead of a digraph (which is the default option). Compare this with a simulated "advice" relation: #Draw an unrelated advice network advice<-rgraph(15,tprob=0.2) In this case, we have set a uniform tie probability, but allowed for asymmetric relationships. For a third relation, let us draw a random graph as usual, but then "zero out" the lower triangle to create a hierarchy (ego "has authority over" alter). To do this, we employ rgraph as well as the lower.tri function of R. #Draw yet another network, calling it authority. We'll make it into a hierarchy.... authority<-rgraph(15,tprob=0.3) authority[lower.tri(authority)]<-0 #Zero out the lower triangle Inspection of the "authority" relation should reveal it to be an upper triangular matrix with 0s on the diagonal and an upper triangular density of approximately 0.3. Having drawn this, the last of our three unrelated networks, let us now move to the consideration of relations which are dependent upon those previously created. For the first, we will construct a fictitious "prestige" relation which depends upon both advice and friendship in a reasonably intuitive fashion. As before, we proceed by creating a matrix of tie probabilities which will then be used with rgraph to draw the actual network. Unlike our previous example, however, we here use a logistic density to determine the tie probabilities: #Draw a prestige network which depends upon the first two relations, but not the third pres.tp<-matrix(nrow=15,ncol=15) for(i in 1:15) for(j in 1:15) pres.tp[i,j]<-plogis(2*friends[i,j]*advice[j,i]+5*advice[j,i]-3*friends[i,j]-2*advice[i,j]) prestige<-rgraph(15,m=1,tprob=pres.tp) Again, prestige is a single matrix drawn as a Bernoulli graph with parameter matrix pres.tp; the m parameter, which tells rgraph how many matrices to produce, is given here for illustrative purposes, but could be omitted in this case (since m=1 is the default). Having drawn one such related relation, we now turn to the very different case of a simulated sociometric graph based on advice and authority. Here, we model the ties of the former as being iid normal, with means dependent upon the states of the latter relations. This is accomplished via rnorm, using the following command: #Draw a sociometric power graph which is based on the second and third relations, but not the first perpower<-matrix(rnorm(15^2,2*as.vector(advice)+5*as.vector(authority),3),nrow=15,ncol=15) This completes our set of five related relations. For convenience, we may wish to have all of the above in a single data structure; thus, we here elect to construct a graph stack (array) called rrel, which contains each of our relations. The basic format for rrel, in which the first dimension indexes the graph and the latter two index nodes, is the standard graph stack format used within the SNA library. Creation of the rrel stack is as follows: #For convenience, stick our relations into one stack rrel<-array(dim=c(5,15,15)) rrel[1,,]<-friends rrel[2,,]<-advice rrel[3,,]<-prestige rrel[4,,]<-authority rrel[5,,]<-perpower rrel.names<-c("friends","advice","prestige","authority","perpower") Note that we have also created a vector called rrel.names, containing the names of the relations in question. It is often useful to have such a name vector for data display purposes, as we shall see. 3.3. Some Graph Sets In the above, we created a series of relations which were (in some cases) related to one another. Here, we shall create larger sets of relation, based on a small number of "archetypal" structures. For this example, we shall think of the data as representing the reporting structures within a number of small work teams. The three "archetypes," entered by hand, are: #Enter three archetypal structures team.arch<-array(dim=c(3,7,7)) team.arch[1,,]<-t(matrix(c( 0,1,1,1,1,1,1, 1,0,0,0,0,0,0, 1,0,0,0,0,0,0, 1,0,0,0,0,0,0, 1,0,0,0,0,0,0, 1,0,0,0,0,0,0, 1,0,0,0,0,0,0),nrow=7,ncol=7)) team.arch[2,,]<-t(matrix(c( 0,1,1,0,0,0,0, 0,0,0,1,1,0,0, 0,0,0,0,0,1,1, 0,0,0,0,0,0,0, 0,0,0,0,0,0,0, 0,0,0,0,0,0,0, 0,0,0,0,0,0,0),nrow=7,ncol=7)) team.arch[3,,]<-t(matrix(c( 0,1,1,1,1,1,1, 1,0,1,1,1,1,1, 1,1,0,1,1,1,1, 1,1,1,0,1,1,1, 1,1,1,1,0,1,1, 1,1,1,1,1,0,1, 1,1,1,1,1,1,0),nrow=7,ncol=7)) These three structures -- a two-way star, a two-branching outtree, and a clique -- will serve as the basis for our tie probability matrices. In particular, we shall here assume that while each group's actual structure will loosely follow an archetype, the teams will vary somewhat in their structural characteristics. As a simple model of such a process, we then create Bernoulli graph parameter matrices from the above, allowing a symmetric 0.15 probability of deviation per tie: #Turn the archetypal structures into Bernoulli graph parameter matrices team.arch<-0.85*team.arch+(1-team.arch)*0.15 Given the archetypal parameter matrices, we draw the team structures by randomly assigning each team to an archetype (using the sample function) and then employing rgraph with the appropriate parameter matrix. The procedure is otherwise as per the earlier examples. #Generate a stack of graphs using the above team.arch.key<-sample(c(1,2,3),15,replace=TRUE) teams<-array(dim=c(15,7,7)) for(i in 1:15) teams[i,,]<-rgraph(7,1,tprob=team.arch[team.arch.key[i],,]) In conjunction with the team structures, we now create two covariates; one will be an "experience" variable, associated with the team's "choice" of archetype, and one will be a "performance" variable based on betweenness centralization. As before, we use rnorm to produce the covariates: #Generate an experience covariate of archetype and a performance variable team.exp<-rnorm(15,team.arch.key,0.5) team.perf<-vector() for(i in 1:15) team.perf<-c(team.perf,rnorm(1,10*centralization(teams,betweenness,g=i),1)) The behavior of the centralization function (which takes a centrality routine as one of its parameters) will be discussed later in this document. For the moment, however, we will move on to the generation of yet another set of relations. These relations are composed of two valued graph sets -- "trade" and "diplomacy" -- which are related to each other in multiple, complex ways. The graphs for the trade and diplomacy relations are drawn below, using rnorm, and follow the same procedural form used for the perceived power relation. #Produce two sets of graphs with complex interrelations trade<-array(dim=c(5,25,25)) trade[1,,]<-rnorm(25^2,10000,1000) trade[2,,]<-rnorm(25^2,0.7*trade[1,,],500) trade[3,,]<-rnorm(25^2,-0.5*trade[1,,]+2*trade[2,,],700) trade[4,,]<-rnorm(25^2,4*trade[2,,],300) trade[5,,]<-rnorm(25^2,-1.5*trade[1,,]-0.5*trade[4,,]+10*trade[3,,],3000) diplomacy<-array(dim=c(4,25,25)) diplomacy[1,,]<-rnorm(25^2,5,1) diplomacy[2,,]<-rnorm(25^2,0.00027*trade[1,,]-0.5*diplomacy[1,,],2) diplomacy[3,,]<-rnorm(25^2,-0.0005*trade[2,,]+0.0003*trade[5,,],1) diplomacy[4,,]<-rnorm(25^2,2.5*diplomacy[1,,]-3*diplomacy[2,,]+3*diplomacy[3,,],3) 3.4. Noisy Informants For our last data set, we will create a stack of graphs forming a hypothetical cognitive social structure (CSS) based on the reports of error-prone informants regarding a "real" relation (here taken to be a fictional directed interaction structure). We model our informants as making two sorts of errors -- false positives and false negatives -- independently with error rates which vary by individual. To set the error parameters, we take draws from a beta(2,10) distribution: #Draw the vectors of false positive and false negative error rates inform.ep<-rbeta(10,2,10) inform.em<-rbeta(10,2,10) To model the actual directed interactions, we draw a random digraph with density arbitrarily set to 0.4: #Draw the actual interaction structure inform.act<-rgraph(10,tprob=0.4) Now, for the informant reports. These are easily enough modeled using the real interaction structure and each actor's error rates: #Create a CSS from informant reports informants<-array(dim=c(10,10,10)) for(i in 1:10) informants[i,,]<-rgraph(10,1,tprob=((1-inform.em[i])*inform.act+(1-inform.act)*inform.ep[i])) Observe here that we are able to create the matrix of tie probabilities for each actor using two scalar variables and elementwise multiplication; though a simple operation, it nevertheless serves to illustrate the flexibility of the S language in dealing with data manipulation. 4. Sample Analyses Having created a variety of hypothetical data (illustrating some of the SNA library's random graph generation tools in the process), we will now consider a number of analyses which will hopefully serve to demonstrate some of the capabilities of the toolkit for network analysis. As before, it will not be our primary purpose either to provide extensive documentation on individual commands, nor to provide a tutorial on data analysis; instead, the aim of this document is to give something of the flavor of how the SNA library may be used in practice, along with a rudimentary exhibition of its primary functions. The reader, as always, is encouraged to experiment on his or her own, and to seek out additional sources of information concerining data analysis in the R statistical computing environment. 4.1. Nodal Indecies and Attributional Data The first of the sample analyses we shall consider involves the computation of nodal indecies (NLIs) and their relation to data concerning the properties (or attributes) of individuals. To accomplish this, we shall employ the fictitious social support data set developed in section 3.1 above. It should be noted that, as with all of the analyses conducted here, it is assumed that the reader has performed the necessary data generation steps before proceeding to the analysis. The analyses, similarly, should be performed in order, as most involve the creation of temporary objects which are used subsequently. 4.1.1. Computing Nodal Indecies To begin our analysis of the social support data set, we compute a number of classical centrality measures on the nodes of the support network. Under the SNA library, each centrality measure is implemented seperately; each, however, utilizes a similar command structure, and is designed so as to interface with a number of other commands (most notably centralization) which rely upon them. On the support network, then, we compute each of the currently supported centrality measures: #Compute a number of indecies on the social support data sup.deg.in<-degree(support,cmode="indegree") #Degree ---------- sup.deg.out<-degree(support,cmode="outdegree") sup.deg.free<-degree(support,cmode="freeman") #Could omit the cmode, since this is the default sup.clo.und<-closeness(support,cmode="undirected") #Closeness ---------- sup.clo.dir<-closeness(support,cmode="directed") #Could omit the cmode, since this is the default sup.bet.und<-betweenness(support,cmode="undirected") #Betweenness ---------- sup.bet.dir<-betweenness(support,cmode="directed") #Could omit the cmode, since this is the default sup.str.und<-stresscent(support,cmode="undirected") #Stress ---------- sup.str.dir<-stresscent(support,cmode="directed") #Could omit the cmode, since this is the default sup.grc.und<-graphcent(support,cmode="undirected") #Graph Centrality ---------- sup.grc.dir<-graphcent(support,cmode="directed") #Could omit the cmode, since this is the default sup.ev<-evcent(support) #Eigenvector ---------- Note that each of the above objects now consists of a vector containing the computed centrality scores for the indicated measure; as one might expect, they can be viewed by simply typing the object name, or examined using a routine such as hist or density. For the moment, however, we will not be viewing them directly. Instead, we will focus on their relation to each other, and to the hypothetical mental health outcomes generated earlier. Before proceeding, let us follow an example used in section 3, and bind our indecies together in a single data structure; this will greatly facilitate their analysis. The following commands generate just such a combined data matrix, and names the columns according to the centrality measure employed. #Combine the indecies into a single data structure sup.nod.ind<-cbind(sup.deg.in,sup.deg.out,sup.deg.free,sup.clo.und,sup.clo.dir,sup.bet.und,sup.bet.dir,sup.str.und,sup.str.dir,sup.grc.und,sup.grc.dir,sup.ev) sup.nod.names<-c("sup.deg.in","sup.deg.out","sup.deg.free","sup.clo.und","sup.clo.dir","sup.bet.und","sup.bet.dir","sup.str.und","sup.str.dir","sup.grc.und","sup.grc.dir","sup.ev") colnames(sup.nod.ind)<-sup.nod.names 4.1.2. Working with Nodal Indecies Given that we now have a set of NLIs (here centrality scores) on each node of the support matrix, how are they distributed? R provides a wide range of tools for dealing with such questions, two of which are the quantile and hist commands. Here, we print various useful quantiles of the nodal index distributions, and then plot the associated histograms: #Examine the index distributions apply(sup.nod.ind,2,quantile,prob=c(0,0.05,0.25,0.5,0.75,0.95,1)) #Print quantiles par(mfrow=c(3,4)) for(i in 1:12) hist(sup.nod.ind[,i],xlab=sup.nod.names[i],main=paste("Histogram of",sup.nod.names[i])) #Plot histograms par(mfrow=c(1,1)) Another question we might ask is that of how the NLIs here examined are related. One simple means of approaching this question is to examine the correlation structure among indecies, using the cor command. This can be performed as follows: #Examine the correlation structure among the nodal indecies sup.nod.cor<-cor(sup.nod.ind) #Compute the correlation matrix sup.nod.cor #Display the correlation matrix A quick look at the correlation matrix should confirm that the nodal indecies have substantial first-order relationships with one another. In some cases, this is true by construction: Freeman degree centrality, for instance, is a linear combination of indegree and outdegree. Even across measure types, however, we see a substantial degree of interrelatedness. To further examine these relationships, we conduct a quick principal component analysis on the correlation matrix (removing any zero variance indecies): #Perform a PCA on the nodal indecies sup.nod.cor<-sup.nod.cor[apply(sup.nod.ind,2,var)>0,apply(sup.nod.ind,2,var)>0] #Remove any NAs sup.nod.pca<-eigen(sup.nod.cor) #Extract the eigenstructure colnames(sup.nod.pca$vectors)<-1:dim(sup.nod.pca$vectors)[2] #Label the vectors for convenience sup.nod.pca$vectors[,1:4] #Print the first 4 principal components sup.nod.pca$values[1:4]/sum(sup.nod.pca$values) #Also examine the variance explained #Plot the NLIs on the first two components plot(sup.nod.pca$vectors[,1],sup.nod.pca$vectors[,2],type="n",main="Nodal Indecies on their First Two Principal Components",xlab="PrinComp1",ylab="PrinComp2") text(sup.nod.pca$vectors[,1],sup.nod.pca$vectors[,2],sup.nod.names) #Show a scree plot of variance explained barplot(sup.nod.pca$values/sum(sup.nod.pca$values),xlab="Principal Component",ylab="Fraction of Variance Explained",main="Scree Plot of Nodal Index PCA",names.arg=1:length(sup.nod.pca$values)) The exact results of the PCA will vary, given that they depend upon the randomly generated data. Fairly typical results, however, will show a very strong first component (accounting for close to half of the total variance) which correlates positively with all of the centrality indecies; this component can be thought of as a sort of "generalized centrality" index of sorts, expressing the general tendency of positions to be high or low on multiple measures simultaneously. A common second component (often accounting for 20-30% of the total variance) seperates out measures closely related to clique membership (e.g., graph, closeness, and eigenvector centralities) from those (such as betweenness and stress centralities) in which central actors are generally those whose alters are not tied to one another. In many cases, the third component appears to be related to whether a measure depends on the directed graph, or the underlying simple graph; this component is usually substantially lower in magnitude than the first two (5-10% of the total variance), and the contribution of further components tends to drop off rapidly. Plotting the centrality measures on the first two principal components tends to reveal several fairly distinct groups of measures, along with certain outliers (interestingly, often including indegree). As before, specific results will vary depending on the data, but it is fairly common to find stress, betweenness, and undirected closeness together on the one hand, with directed closeness, directed graph centrality, and indegree together on the other. Such results, of course, may not generalize perfectly to other data sets; it should be borne in mind that the data generated here is not only random, but also constructed from a particular stochastic modeling framework. (Note that we could have performed the above analysis using the prcomp or princomp routines from the mva library, as well; under R, however, the eigen analysis is trivial enough to perform directly with a minimum of fuss. We might also have examined the covariance matrix rather than the correlation matrix (as the latter constitutes an implicit rescaling of the data to constant variance units), though this would have caused our results to be sensitive to the differential scaling of the centrality scores themselves.) 4.2.3. Relating Indecies to Outcomes Having seen how we may compute and examine the properties of nodal indecies, let us now move to the matter of relating index values to outcomes. Recalling our four fictitious mental health outcomes, we may begin by examining scatter plots among the various variables in question. R's pairs command is ideal for this purpose, and hence we employ it here as follows: #View pairs plots pairs(cbind(sup.nod.ind,sup.outc.1,sup.outc.2,sup.outc.3,sup.outc.4)) Clearly, our NLIs and outcome measures are related in a number of respects. While an actual data analysis would undertake a more systematic approach towards discerning and modeling these relations, we shall for the moment content ourselves with regressing the four outcome measures on our various NLIs using the lm function. #Fit naive linear models sup.lm.1<-lm(sup.outc.1~sup.deg.in+sup.deg.out+sup.deg.free+sup.clo.und+sup.clo.dir+sup.bet.und+sup.bet.dir+sup.str.und+sup.str.dir+sup.grc.und+sup.grc.dir+sup.ev) sup.lm.2<-lm(sup.outc.2~sup.deg.in+sup.deg.out+sup.deg.free+sup.clo.und+sup.clo.dir+sup.bet.und+sup.bet.dir+sup.str.und+sup.str.dir+sup.grc.und+sup.grc.dir+sup.ev) sup.lm.3<-lm(sup.outc.3~sup.deg.in+sup.deg.out+sup.deg.free+sup.clo.und+sup.clo.dir+sup.bet.und+sup.bet.dir+sup.str.und+sup.str.dir+sup.grc.und+sup.grc.dir+sup.ev) sup.lm.4<-lm(sup.outc.4~sup.deg.in+sup.deg.out+sup.deg.free+sup.clo.und+sup.clo.dir+sup.bet.und+sup.bet.dir+sup.str.und+sup.str.dir+sup.grc.und+sup.grc.dir+sup.ev) Obviously, our earlier analysis has shown us that our regressors contain substantial multicollinearity; hence, the above model would be quite inappropriate, were we interested in anything other than mere illustration. Nevertheless, we may examine the fits and diagnostic plots in the usual fashion: #Get summaries and diagnostic plots summary(sup.lm.1) #Outcome measure number 1 plot(sup.lm.1) plot(fitted(sup.lm.1),sup.outc.1,main="Fitted Plot for Outcome 1") abline(0,1) hist(resid(sup.lm.1)) summary(sup.lm.2) #Outcome measure number 2 plot(sup.lm.2) plot(fitted(sup.lm.2),sup.outc.2,main="Fitted Plot for Outcome 2") abline(0,1) hist(resid(sup.lm.2)) summary(sup.lm.3) #Outcome measure number 3 plot(sup.lm.3) plot(fitted(sup.lm.3),sup.outc.3,main="Fitted Plot for Outcome 3") abline(0,1) hist(resid(sup.lm.3)) summary(sup.lm.4) #Outcome measure number 4 plot(sup.lm.4) plot(fitted(sup.lm.4),sup.outc.4,main="Fitted Plot for Outcome 4") abline(0,1) hist(resid(sup.lm.4)) Because of the manner in which our dependent variables were constructed, the above regressions should provide fairly reasonable results despite the presence of multicollinearity. Note that the predicted effects and standard errors of the residual are approximately as expected given the constructed variables, and that the fourth outcome measure is unrelated to any of the centrality scores (as it was chosen independently). We might also attempt to fit reduced models of one sort or another, based on additional analyses; for instance, we could use the above PCA to suggest a smaller set of parameters which do a reasonable job of spanning the space of centrality scores without being highly correlated with one another. Repeating the above on such a set for the first outcome measure gives us the following: #Use PCA to suggest a more parsimonious choice of parameters sup.lm.1<-lm(sup.outc.1~sup.deg.in+sup.deg.out+sup.bet.und+sup.str.dir) summary(sup.lm.1) plot(sup.lm.1) plot(fitted(sup.lm.1),sup.outc.1,main="Fitted Plot for Outcome 1") abline(0,1) hist(resid(sup.lm.1)) Another, alternative means of using the PCA is to regress the data on a subset of the set of principal component vectors; this introduces some bias in the result, but does guarantee that the regressors are orthogonal. Again, we examine the first mental health outcome, this time on the first four principal components: #Regress outcome 1 on the first four principal components sup.nod.pca.scores<-princomp(sup.nod.ind[,apply(sup.nod.ind,2,var)>0],scores=TRUE,cor=TRUE)$scores #Use princomp for convenience in extracting scores sup.lm.1<-lm(sup.outc.1~sup.nod.pca.scores[,1:4]) summary(sup.lm.1) plot(sup.lm.1) plot(fitted(sup.lm.1),sup.outc.1,main="Fitted Plot for Outcome 1") abline(0,1) hist(resid(sup.lm.1)) Observe that, in this case, the principal component regression is much less clean than the initial model; clearly, where one has the opportunity to regress a dependent variable on the actual independent variable which created it, the natural regressor trumps the composite. Likewise, there may be difficulty in practice in interpreting the outcome of a PCR, given that the components on which the regression takes place are themselves difficult to interpret. The above analysis, however, does serve to provide something of an idea of the flexibility of nodal data analysis which is possible under the R system. 4.3. Null Hypotheses for Network Statistics Despite its many shortcomings, the classical method of null hypothesis testing can be a powerful heuristic device for the assessment of statistical comparisons. The peculiarities of network data, however, commonly prohibit the use of standard null hypotheses on both theoretical and practical grounds. For this reason, alternative approaches have been developed within the literature on social network analysis; the two approaches supported by the SNA library are QAP permutation testing and testing against the congitional uniform graph (CUG) hypothesis. While QAP and CUG are built into most of the network-level analysis routines, the SNA library also supports the direct testing of QAP and CUG null hypotheses for bigraphic network statistics. In this subsection, then, we will illustrate some of these basic features. 4.3.1. QAP Permutation Testing Hubert's quadratic assignment procedure (QAP), when applied to network data, can be interpreted as comparing a graph-level test statistic to the distribution of that statistic over the test graph's automorphism group. Thus, the QAP null hypothesis is (informally) that the realized value of the statistic is "typical" for that implied by the underlying unlabeled graph (or graphs, in some cases), and hence it cannot be concluded that the particular labeled configuration observed is responsible for the extremity of the observed result. Such a hypothesis is potentially useful when evaluating statistics such as the correlation between two labeled graphs, where one is interested in the degree to which the observed correspondence is unusual controlling for the underlying structures in question. It is more problematic in other contexts (e.g., comparison of unlabeled structures) and completely inapplicable in others (e.g., evaluating differences between reciprocity scores between structures). (The procedure also has the potentially nonobvious property that the significance of any statistic associated with a given graph is constrained by the ratio of the cardinality of that graph's automorphism group to the cardinality of its permutation group; thus it is impossible to attain significance at the 0.05 level (for instance) for any bivariate graph statistic when at least one of the graphs in question has |Aut G|/|V(G)|!>0.05.) Like all permutation tests, it must be remembered that QAP tests evaluate the significance of labelings, not structures per se; this is one of the key elements to consider in determining when to employ QAP versus CUG null hypotheses. For a simple demonstration of the standalone QAP testing functionality of the SNA library, we will show how the qaptest routine may be used to test the observed Hamming distance between relations against the null hypothesis that the distance was drawn from a distribution isomorphic with that which would result from having labeled the relations at random. To start the process, we place our five relations of interest into a single graph stack: #For convenience, stick three of our relations into one stack rrel<-array(dim=c(3,15,15)) rrel[1,,]<-friends rrel[2,,]<-advice rrel[3,,]<-prestige rrel[4,,]<-authority rrel[5,,]<-perpower We now take Hamming distances for each of the first three relations, testing them against a QAP null hypothesis. (Note that this is highly computationally intensive; the reps argument (set to 1000 by default) may be reduced to speed computation at cost of numerical precision.) #Test Hamming distances of various relations against the automorphism hypothesis rrel.frad.hd.qap<-qaptest(rrel,hdist,g1=1,g2=2) #Perform the QAP test rrel.frpr.hd.qap<-qaptest(rrel,hdist,g1=1,g2=3) rrel.adpr.hd.qap<-qaptest(rrel,hdist,g1=2,g2=3) summary(rrel.frad.hd.qap) #Summarize the results summary(rrel.frpr.hd.qap) summary(rrel.adpr.hd.qap) plot(rrel.frad.hd.qap) #Plot the test distributions plot(rrel.frpr.hd.qap) plot(rrel.adpr.hd.qap) The estimated p-values are given by the fraction of draws greater/less than or equal to the test statistic, respectively; significance is interpreted in the usual fashion. Note that the summary and plot methods for qaptest objects provide additional information regarding the distribution of observations under the null hypothesis, which may be useful as a diagnostic device. The qaptest function can be used in conjunction with functions other than hdist: any function which takes a graph stack and returns a single value can in principle be used (e.g., gcor, gcov). Additional parameters to this function can also be passed by specifying them as additional arguments to qaptest. 4.3.2. CUG Tests A somewhat different null hypothesis against which graph statistics may be tested is based on a comparison of the test statistic with the distribution arising from a set of graphs drawn uniformly from a population conditioned on size, density, and/or other factors. Null hypotheses of this type are collectively referred to as conditional uniform graph (or CUG) null hypotheses, and tests based on them are referred to as CUG tests. Note that the CUG null hypothesis is of a very different form than that assumed by QAP: whereas QAP tests for the effect of labeling conditional on structure, CUG tests for the effect of structure conditional on certain basic constraints (generally size and the distribution of tie states). CUG is thus relevant to a number of comparisons for which QAP cannot be sensibly employed (e.g., all permutation invariant graph-level indecies, including reciprocity and centralization measures), and results in a very different interpretation where both can be utilized. Here, we demonstrate the use of cugtest (the standalone CUG test method) on Hamming distances and differences in graph reciprocity. First, in evaluating Hamming distances, we follow a format which is similar to qaptest. (Note that this routine, like qaptest, is computationally expensive.) #Test Hamming distances of various relations against the size/density conditioning hypothesis rrel.frad.hd.cug<-cugtest(rrel,hdist,g1=1,g2=2) #Perform the CUG test rrel.frpr.hd.cug<-cugtest(rrel,hdist,g1=1,g2=3) rrel.adpr.hd.cug<-cugtest(rrel,hdist,g1=2,g2=3) summary(rrel.frad.hd.cug) #Summarize the results summary(rrel.frpr.hd.cug) summary(rrel.adpr.hd.cug) plot(rrel.frad.hd.cug) #Plot the test distributions plot(rrel.frpr.hd.cug) plot(rrel.adpr.hd.cug) As above, we observe that the Hamming distance between advice and friendship is not significant, while those between friendship and prestige and between advice and prestige do differ significantly from what would be expected under the hypothesis that the graphs were drawn randomly from a population of equivalent size and tie probability. Though similar in form, this should not be confused with the result from qaptest indicating a concordance which was surprisingly high given what would be expected under random relabeling of nodes. In addition to evaluating comparisons of metric distance and the like, the cugtest routine can also be employed for evaluation of other graph statistics. Here, we use the gliop (for Graph-Level Index OPerations) function to compute the difference in recpirocity between the pairs of test structures. Although gliop can be used with any binary operator (using the OP="operator" argument), its default behavior is to take the difference between the indicated GLI function computed on two graphs; this is how we use it in this example. #Test reciprocity differences of various relations against the size/density conditioning hypothesis rrel.frad.rec.cug<-cugtest(rrel,gliop,g1=1,g2=2,GFUN=grecip) #Perform the CUG test rrel.frpr.rec.cug<-cugtest(rrel,gliop,g1=1,g2=3,GFUN=grecip) rrel.adpr.rec.cug<-cugtest(rrel,gliop,g1=2,g2=3,GFUN=grecip) summary(rrel.frad.rec.cug) #Summarize the results summary(rrel.frpr.rec.cug) summary(rrel.adpr.rec.cug) plot(rrel.frad.rec.cug) #Plot the test distributions plot(rrel.frpr.rec.cug) plot(rrel.adpr.rec.cug) 4.2. Network Regression 4.2.1. Graph Correlation #Compute graph covariance matrix rrel.cov<-gcov(rrel) rownames(rrel.cov)<-rrel.names colnames(rrel.cov)<-rrel.names rrel.cov #Compute graph correlation matrix rrel.cor<-gcor(rrel) rownames(rrel.cor)<-rrel.names colnames(rrel.cor)<-rrel.names rrel.cor #Network PCA - Analyze the eigenstructure of the graph correlation matrix rrel.pca<-eigen(rrel.cor) colnames(rrel.pca$vectors)<-1:dim(rrel.pca$vectors)[2] rrel.pca$vectors #Examine the principal components rrel.pca$values/sum(rrel.pca$values) #Proportion of variance explained by the above plot(rrel.pca$vectors[,1],rrel.pca$vectors[,2],type="n",main="Related Relations on their First Two Principal Components",xlab="PrinComp1",ylab="PrinComp2") text(rrel.pca$vectors[,1],rrel.pca$vectors[,2],rrel.names) 4.2.2. OLS Network Regression #Regress perceived power on other relations, using the QAP null hypothesis rrel.lm.qap<-netlm(rrel[5,,],rrel[c(1,2,4),,],nullhyp="qap") summary(rrel.lm.qap) #Same as above, but using a CUG w/tie bootstrapping rrel.lm.cug<-netlm(rrel[5,,],rrel[c(1,2,4),,]) summary(rrel.lm.cug) 4.2.3. Logistic Network Regression #Fit a logistic network regression of prestige on other relations, using the QAP null hypothesis rrel.logit.qap<-netlogit(rrel[3,,],rrel[c(1,2,4),,],nullhyp="qap",reps=100) summary(rrel.logit.qap) #Same as above, but using a CUG w/tie bootstrapping rrel.logit.cug<-netlogit(rrel[3,,],rrel[c(1,2,4),,]) summary(rrel.logit.cug) 4.2.4. Canonical Graph Correlation #Find the canonical graph correlations between trade and diplomacy relations traddip.ncc<-netcancor(trade,diplomacy) summary(traddip.ncc) 4.3. Metric Inference for Social Networks 4.3.1. Computing Distance Matrices on Graph Sets #For the labeled case, compute distances directly using hdist team.lab.hd<-hdist(teams) #For the unlabled case it's a bit faster to use sdmat than structdist; here, use monte carlo for speed team.ulab.sd<-sdmat(teams,reps=100) 4.3.2. Cluster Analysis of Social Structures #Perform hierarchical clusterings of the labeled and unlabled team structures using complete linkage team.hier.lab<-hclust(as.dist(team.lab.hd)) team.hier.ulab<-hclust(as.dist(team.ulab.sd)) #Plot the dendogram for the labeled case plot(team.hier.lab) title("Hierarchical Clustering of Labeled Structures") #Plot the dendogram for the unlabeled case plot(team.hier.ulab) title("Hierarchical Clustering of Unlabeled Structures") #Plot the dendogram for the labeled case with a three group division, by archetype plot(team.hier.lab,label=team.arch.key) rect.hclust(team.hier.lab,k=3) title("HC of Labeled Structures w/Archetypes, 3 Group Division") #Plot the dendogram for the unlabeled case, with a three group division, by archetype plot(team.hier.lab,label=team.arch.key) rect.hclust(team.hier.ulab,k=3) title("HC of Unlabeled Structures w/Archetypes, 3 Group Division") #Fit a classical MDS (in 2 dimensions) to the distance data team.mds.lab<-cmdscale(as.dist(team.lab.hd)) team.mds.ulab<-cmdscale(as.dist(team.ulab.sd)) #Plot the MDS model, by archetype (labeled) plot(team.mds.lab,main="2-D MDS of Labeled Structures, by Archetype",type="n") text(team.mds.lab[,1],team.mds.lab[,2],team.arch.key) #Plot the MDS model, by archetype (unlabeled) plot(team.mds.ulab,main="2-D MDS of Unlabeled Structures, by Archetype",type="n") text(team.mds.ulab[,1],team.mds.ulab[,2],team.arch.key) 4.3.3. Relating Structural Distances to Covariates #Plot distributions of experience and performance by clustering, three cluster model par(mfrow=c(2,2)) gclust.boxstats(team.hier.lab,3,team.exp) title(main="Distribution of Experience by Cluster",sub="Labeled Case") gclust.boxstats(team.hier.lab,3,team.perf) title(main="Distribution of Performance by Cluster",sub="Labeled Case") gclust.boxstats(team.hier.ulab,3,team.exp) title(main="Distribution of Experience by Cluster",sub="Unlabeled Case") gclust.boxstats(team.hier.ulab,3,team.perf) title(main="Distribution of Performance by Cluster",sub="Unlabeled Case") par(mfrow=c(1,1)) #Fit simple OLS models of performance and experience on cluster membership type1<-as.numeric(cutree(team.hier.lab,k=3)==1) type2<-as.numeric(cutree(team.hier.lab,k=3)==2) #Type 3 is the reference group team.perf.lm<-lm(team.perf~type1+type2) team.exp.lm<-lm(team.exp~type1+type2) summary(team.perf.lm) summary(team.exp.lm) #Generate distance matrices for experience and performance team.perf.dist<-matrix(nrow=15,ncol=15) for(i in 1:15) for(j in 1:15) team.perf.dist[i,j]<-abs(team.perf[i]-team.perf[j]) team.exp.dist<-matrix(nrow=15,ncol=15) for(i in 1:15) for(j in 1:15) team.exp.dist[i,j]<-abs(team.exp[i]-team.exp[j]) #Look at the correlation structure gcor(team.lab.hd,team.perf.dist,mode="graph") gcor(team.lab.hd,team.exp.dist,mode="graph") gcor(team.ulab.sd,team.perf.dist,mode="graph") gcor(team.ulab.sd,team.exp.dist,mode="graph") #Plot the distance/difference relationships par(mfrow=c(2,2)) plot(jitter(as.vector(team.lab.hd)),jitter(as.vector(team.perf.dist)),xlab="Hamming Distance",ylab="Perf Difference",main="Labeled Dist and Abs Perf Diff") plot(jitter(as.vector(team.lab.hd)),jitter(as.vector(team.exp.dist)),xlab="Hamming Distance",ylab="Exp Difference",main="Labeled Dist and Abs Exp Diff") plot(jitter(as.vector(team.ulab.sd)),jitter(as.vector(team.perf.dist)),xlab="Structural Distance",ylab="Perf Difference",main="Unlabeled Dist and Abs Perf Diff") plot(jitter(as.vector(team.ulab.sd)),jitter(as.vector(team.exp.dist)),xlab="Structural Distance",ylab="Exp Difference",main="Unlabeled Dist and Abs Exp Diff") par(mfrow=c(1,1)) 4.3.4. Structural Analysis of Graph Clusters #Find the central graphs of the three clusters team.cg<-gclust.centralgraph(team.hier.lab,3,teams) #Plot the central graphs using gplot par(mfrow=c(2,2)) for(i in 1:3) gplot(team.cg[i,,],mode="circle") par(mfrow=c(1,1)) #Plot archetypal structures, for comparison par(mfrow=c(2,2)) for(i in 1:3) gplot(event2dichot(team.arch[i,,],method="absolute",thresh=0.5),mode="circle") par(mfrow=c(1,1)) #Get some nodal indecies for the three central graphs cbind(betweenness(team.cg[1,,]),betweenness(team.cg[2,,]),betweenness(team.cg[3,,])) cbind(degree(team.cg[1,,]),degree(team.cg[2,,]),degree(team.cg[3,,])) cbind(closeness(team.cg[1,,]),closeness(team.cg[2,,]),closeness(team.cg[3,,])) cbind(evcent(team.cg[1,,]),evcent(team.cg[2,,]),evcent(team.cg[3,,])) #Now, for some GLIs grecip(team.cg) gden(team.cg) cbind(centralization(team.cg[1,,],degree),centralization(team.cg[2,,],degree),centralization(team.cg[3,,],degree)) cbind(centralization(team.cg[1,,],betweenness),centralization(team.cg[2,,],betweenness),centralization(team.cg[3,,],betweenness)) cbind(centralization(team.cg[1,,],closeness),centralization(team.cg[2,,],closeness),centralization(team.cg[3,,],closeness)) cbind(centralization(team.cg[1,,],evcent),centralization(team.cg[2,,],evcent),centralization(team.cg[3,,],evcent)) 4.4. Bayesian Network Modeling 4.4.1. Setting Priors #Set some realistic priors inform.prior.ep<-cbind(rep(1,10),rep(5,10)) inform.prior.em<-cbind(rep(1,10),rep(5,10)) inform.prior.n<-matrix(rep(0.5,100),nrow=10,ncol=10) diag(inform.prior.n)<-0 #Compare the "real" prior to our "guess" plot((0:10000)/10000,dbeta((0:10000)/10000,2,10),xlab="Error Rate",ylab="Unnormalized Density",type="l",main="Estimated Densities of Two Accuracy Priors") lines((0:10000)/10000,dbeta((0:10000)/10000,1,5),lty=2) 4.4.2. Drawing from the Posterior #Take draws from the posterior inform.post<-bbnam(informants,nprior=inform.prior.n,emprior=inform.prior.em,epprior=inform.prior.ep,quiet=FALSE,draws=500) #Show components of the bbnam object names(inform.post) 4.4.3. Analysis of Posterior Draws #Plot the estimated posterior marginal distributions plot(inform.post) #Get some numerical estimates summary(inform.post) #Compare posterior marginal of the criterion network with the "actual" network inform.post.cg<-centralgraph(inform.post$net) inform.post.mm<-centralgraph(inform.post$net,normalize=TRUE) inform.post.cg-inform.act #Residuals of the central graph estimator sum(abs(inform.post.cg-inform.act)) #Absolute error (Hamming distance) inform.post.mm-inform.act #Residuals of the mean matrix estimator sum(abs(inform.post.mm-inform.act)) #Absolute error #Compare posterior marginals of accuracy parameters with "actual" accuracy inform.post.em.med<-apply(inform.post$em,2,median) inform.post.ep.med<-apply(inform.post$ep,2,median) inform.post.em.med-inform.em #Residuals of the median estimator inform.post.ep.med-inform.ep median(abs(inform.post.em.med-inform.em)) #Median absolute error by actor median(abs(inform.post.ep.med-inform.ep)) #Posterior estimation of GLIs inform.post.den<-npostpred(inform.post,gden) inform.post.recip<-npostpred(inform.post,grecip) inform.post.dc<-npostpred(inform.post,centralization,degree) inform.post.bc<-npostpred(inform.post,centralization,betweenness) par(mfrow=c(2,2)) hist(inform.post.den,xlab="Density",main="Posterior Distribution of Density",xlim=c(0,1)) abline(v=quantile(inform.post.den,c(0.05,0.5,0.95)),lty=c(2,3,2)) hist(inform.post.recip,xlab="Reciprocity",main="Posterior Distribution of Reciprocity",xlim=c(0,1)) abline(v=quantile(inform.post.recip,c(0.05,0.5,0.95)),lty=c(2,3,2)) hist(inform.post.dc,xlab="Centralization",main="Posterior Distribution of Degree Centralization",xlim=c(0,1)) abline(v=quantile(inform.post.dc,c(0.05,0.5,0.95)),lty=c(2,3,2)) hist(inform.post.bc,xlab="Graph Density",main="Posterior Distribution of Betweeness Centralization",xlim=c(0,1)) abline(v=quantile(inform.post.bc,c(0.05,0.5,0.95)),lty=c(2,3,2)) par(mfrow=c(1,1)) #Posterior estimation of NLIs inform.post.deg<-npostpred(inform.post,degree) inform.post.bet<-npostpred(inform.post,betweenness) inform.post.clo<-npostpred(inform.post,closeness) inform.post.evc<-npostpred(inform.post,evcent) apply(inform.post.deg,1,summary) apply(inform.post.bet,1,summary) apply(inform.post.clo,1,summary) apply(inform.post.evc,1,summary) 5. Closing Comments 5.1. Acknowledgments 5.2. References dmulti<-function(x,p){ n<-sum(x) lcomb<-lgamma(n+1) lprob<-0 for(i in 1:length(x)){ lcomb<-lcomb-lgamma(x[i]+1) lprob<-lprob+x[i]*log(p[i]) } lcomb+lprob } x1<-c(425,241,137) x2<-c(435,246,122) p1<-c(1/3,1/3,1/3) p2<-c(1/3,1/3,1/3) p1<-x1/sum(x1) p2<-x2/sum(x2) -2*log(dmulti(x1,p1)) -2*log(dmulti(x2,p2))