Network Science (Human and Social) #2 · 2019. 5. 3. · HUMAN DISEASE NETWORK

Post on 10-Aug-2021


Network Science (Human and Social)

#2

Carlos Sarraute, Instituto de Cálculo, April-June 2019

Fundamental Concepts of Graph Theory

COMPONENTS OF A COMPLEX SYSTEM

§ components: nodes, vertices N

§ interactions: links, edges L

§ system: network, graph (N,L)

DIRECTED VS. UNDIRECTED NETWORKS

Undirected

Links: undirected (symmetric).

Graph: a network of undirected links.

Undirected links: coauthorship links, actor network, protein interactions.

Directed

Links: directed (arcs).

Digraph = directed graph.

Directed links: URLs on the web, phone calls, metabolic reactions.

[Figure: an undirected graph and a directed graph, side by side.]

Degree distribution

NODE DEGREE

Node degree: the number of links connected to the node.

Undirected: $k_A = 1$, $k_B = 4$

Directed: in directed graphs we can define an in-degree and an out-degree. The (total) degree is the sum of the in-degree and the out-degree.

$k_C^{in} = 2$, $k_C^{out} = 1$, $k_C = 3$

Source: a node with $k^{in} = 0$; sink: a node with $k^{out} = 0$.

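AVERAGE DEGREE

Undirected:

$$\langle k \rangle \equiv \frac{1}{N} \sum_{i=1}^{N} k_i, \qquad \langle k \rangle \equiv \frac{2L}{N}$$

Directed:

$$\langle k^{in} \rangle \equiv \frac{1}{N} \sum_{i=1}^{N} k_i^{in}, \qquad \langle k^{out} \rangle \equiv \frac{1}{N} \sum_{i=1}^{N} k_i^{out}, \qquad \langle k^{in} \rangle = \langle k^{out} \rangle$$

$$\langle k \rangle \equiv \frac{L}{N}$$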
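The average-degree identities can be checked numerically. The snippet below is a minimal sketch: the edge list is made up for illustration (it is not the graph drawn on the slide), and the same list is read once as undirected links and once as directed links.

```python
from collections import Counter

# Hypothetical edge list (not taken from the slide's figure).
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("B", "E"), ("C", "D")]
nodes = {n for e in edges for n in e}
N, L = len(nodes), len(edges)

# Undirected reading: each link adds 1 to the degree of both endpoints.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
avg_k_undirected = sum(degree.values()) / N   # equals 2L/N

# Directed reading: (u, v) is a link from u to v.
k_out = Counter(u for u, _ in edges)
k_in = Counter(v for _, v in edges)
avg_k_in = sum(k_in.values()) / N             # equals L/N
avg_k_out = sum(k_out.values()) / N           # equals L/N
```

Every link contributes to two degrees in the undirected case, which is where the factor 2 comes from; in the directed case each link contributes one in-degree and one out-degree.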

DEGREE DISTRIBUTION

Degree distribution P(k): the probability that a randomly chosen node has degree k.

$N_k$ = number of nodes with degree k

$P(k) = N_k / N$ ➔ plot

The degree distribution has taken a central role in network theory following the discovery of scale-free networks (Barabási & Albert, 1999). Another reason for its importance is that the calculation of most network properties requires us to know $p_k$. For example, the average degree of a network can be written as

$$\langle k \rangle = \sum_{k=0}^{\infty} k \, p_k$$

We will see in the coming chapters that the precise functional form of $p_k$ determines many network phenomena, from network robustness to the spread of viruses.

Image 2.4a: Degree distribution.

The degree distribution is defined as the ratio $p_k = N_k / N$, where $N_k$ denotes the number of nodes with degree k in a network. For the network in (a) we have N = 4 and $p_1 = 1/4$ (one of the four nodes has degree $k_1 = 1$), $p_2 = 1/2$ (two nodes have $k_3 = k_4 = 2$), and $p_3 = 1/4$ (as $k_2 = 3$). As we lack nodes with degree k > 3, $p_k = 0$ for any k > 3. Panel (b) shows the degree distribution of a one-dimensional lattice. As each node has the same degree k = 2, the degree distribution is a Kronecker delta function, $p_k = \delta(k - 2)$.

Image 2.4b

In many real networks, the node degree can vary considerably. For example, as the degree distribution (a) indicates, the degrees of the proteins in the protein interaction network shown in (b) vary between k = 0 (isolated nodes) and k = 92, which is the degree of the largest node, called a hub. There are also wide differences in the number of nodes with different degrees: as (a) shows, almost half of the nodes have degree one (i.e. $p_1 = 0.48$), while there is only one copy of the biggest node, hence $p_{92} = 1/N = 0.0005$. (c) The degree distribution is often shown on a so-called log-log plot, in which we either plot $\log p_k$ as a function of $\log k$ or, as we did in (c), use logarithmic axes.
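As a rough illustration of $p_k = N_k / N$, the sketch below recomputes the values quoted in the caption of Image 2.4a. The figure is not reproduced in this transcript, so the edge set is an assumption: it is one set of links consistent with the degrees $k_1 = 1$, $k_2 = 3$, $k_3 = k_4 = 2$.

```python
from collections import Counter

# Assumed edge set consistent with the caption's degrees
# (k1 = 1, k2 = 3, k3 = k4 = 2); the figure itself is not available here.
edges = [(1, 2), (2, 3), (2, 4), (3, 4)]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
# Note: isolated nodes (degree 0) never appear in an edge list and would
# have to be added to `degree` explicitly.

N = len(degree)
N_k = Counter(degree.values())               # N_k: number of nodes with degree k
p = {k: N_k[k] / N for k in sorted(N_k)}     # p_k = N_k / N
avg_k = sum(k * pk for k, pk in p.items())   # <k> = sum over k of k * p_k
```

Here `p` comes out as `{1: 0.25, 2: 0.5, 3: 0.25}`, matching the caption, and `avg_k` agrees with $2L/N = 8/4 = 2$.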


Adjacency matrix

Section 2

• Represents links as a matrix
– $A_{ij} = 1$ if node i has a link to node j, and $A_{ij} = 0$ otherwise
– $A_{ii} = 0$ unless the graph has self-loops
– $A_{ij} = A_{ji}$ if the graph is undirected, or if i and j have a reciprocal link

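Adjacency matrix example

$$A = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \end{pmatrix}$$

[Figure: directed graph on nodes 1 to 5.]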

Node degrees from the adjacency matrix

Out-degree: $k_i^{out} = \sum_{j=1}^{n} A_{ij}$

Example: for the out-degree of node 3, sum the 3rd row of the example matrix A: $\sum_{j=1}^{n} A_{3j} = 2$

In-degree: $k_j^{in} = \sum_{i=1}^{n} A_{ij}$

Example: for the in-degree of node 3, sum the 3rd column of the example matrix A: $\sum_{i=1}^{n} A_{i3} = 1$
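The row and column sums can be written out directly. This is a minimal sketch using the five-node example matrix from the slides; the helper names `out_degree` and `in_degree` are made up for illustration.

```python
# Adjacency matrix of the five-node directed example (A[i][j] = 1 if i -> j).
A = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]

def out_degree(A, node):
    """Sum of the node's row (nodes are numbered from 1, as on the slides)."""
    return sum(A[node - 1])

def in_degree(A, node):
    """Sum of the node's column."""
    return sum(row[node - 1] for row in A)

k_out_3 = out_degree(A, 3)   # 3rd row
k_in_3 = in_degree(A, 3)     # 3rd column
```

For node 3 this gives an out-degree of 2 (links to nodes 2 and 4) and an in-degree of 1 (the link from node 2).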

Edge list

• Edge list
– 2, 3
– 2, 4
– 3, 2
– 3, 4
– 4, 5
– 5, 2
– 5, 1

Adjacency list

• Adjacency list
– Easier to use for networks that are large and sparse
– Makes it easy to retrieve the neighbors of a node

• 1:
• 2: 3 4
• 3: 2 4
• 4: 5
• 5: 1 2
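One way to go from the edge list to the adjacency list is sketched below. The function name `to_adjacency_list` is hypothetical; the edge list is the one given on the slide, read as (source, target) pairs.

```python
# Edge list from the slide: (source, target) pairs.
edge_list = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]

def to_adjacency_list(edges, nodes):
    """Map each node to the sorted list of nodes it points to."""
    adj = {n: [] for n in nodes}   # keeps nodes with no outgoing links, e.g. node 1
    for u, v in edges:
        adj[u].append(v)
    return {n: sorted(vs) for n, vs in adj.items()}

adjacency = to_adjacency_list(edge_list, nodes=range(1, 6))
```

Storage grows with the number of links rather than with N², which is why this representation suits large, sparse networks, and looking up the neighbors of a node is a single dictionary access.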

Adjacency matrix example

  a b c d e f g h
a 0 1 0 0 1 0 1 0
b 1 0 1 0 0 0 0 1
c 0 1 0 1 0 1 1 0
d 0 0 1 0 1 0 0 0
e 1 0 0 1 0 0 0 0
f 0 0 1 0 0 0 1 0
g 1 0 1 0 0 1 0 0
h 0 1 0 0 0 0 0 0

[Figure: graph on nodes a to h.]

Real networks are sparse

Section 3

COMPLETE GRAPH

The maximum number of links in a network with N nodes is

$$L_{max} = \binom{N}{2} = \frac{N(N-1)}{2}$$

A graph with $L = L_{max}$ links is called a complete graph; its average degree is $\langle k \rangle = N - 1$.
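A quick numerical check of the complete-graph formulas, as a sketch. The function names are made up for illustration, N = 5 is an arbitrary choice, and `math.comb` needs Python 3.8 or later.

```python
from math import comb

def l_max(n):
    """Maximum number of links among n nodes: C(n, 2) = n(n-1)/2."""
    return comb(n, 2)

def average_degree(n, l):
    """<k> = 2L/N for an undirected network."""
    return 2 * l / n

N = 5
L_complete = l_max(N)                        # 10 links for N = 5
k_complete = average_degree(N, L_complete)   # N - 1 for a complete graph
```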

REAL NETWORKS ARE SPARSE

Most networks observed in real systems are sparse:

$L \ll L_{max}$, $\langle k \rangle \ll N - 1$

Network                   N         L            Lmax          ⟨k⟩
WWW (ND Sample)           325,729   1.4 × 10^6   10^12         4.51
Protein (S. cerevisiae)   1,870     4,470        10^7          2.39
Coauthorship (Math)       70,975    2 × 10^5     3 × 10^10     3.9
Movie Actors              212,250   6 × 10^6     1.8 × 10^13   28.78

(Source: Albert, Barabási, RMP 2002)

ADJACENCY MATRICES ARE SPARSE

BIPARTITE NETWORKS

Section 4

BIPARTITE GRAPH

A bipartite graph is a graph whose nodes can be divided into two disjoint sets U and V such that every link connects a node in U to a node in V; that is, U and V are independent sets.

Examples:

Argentine film actor network

Disease network
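The definition suggests a simple test: try to split the nodes into U and V so that no link stays inside one set. The sketch below does this with a breadth-first two-coloring. The function name `bipartition` and both example graphs are hypothetical; the input is assumed to be an undirected graph stored as a dict that lists every node as a key.

```python
from collections import deque

def bipartition(adj):
    """Return (U, V) if the undirected graph is bipartite, else None."""
    side = {}
    for start in adj:                      # handle every connected component
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in side:
                    side[v] = 1 - side[u]  # neighbors go to the opposite set
                    queue.append(v)
                elif side[v] == side[u]:   # a link inside one set: not bipartite
                    return None
    U = {n for n, s in side.items() if s == 0}
    V = {n for n, s in side.items() if s == 1}
    return U, V

# Hypothetical actor-movie network (placeholder names).
actor_movie = {
    "actor_a": ["movie_x", "movie_y"],
    "actor_b": ["movie_y"],
    "movie_x": ["actor_a"],
    "movie_y": ["actor_a", "actor_b"],
}
# A triangle is the smallest graph that is not bipartite.
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
```

For `actor_movie` the two sets come out as the actors and the movies; for `triangle` the function returns `None`.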

GENE NETWORK AND DISEASE NETWORK

[Figure labels: DISEASOME, PHENOME, GENOME, Disease network, Gene network]

Goh, Cusick, Valle, Childs, Vidal & Barabási, PNAS (2007)

HUMAN DISEASE NETWORK

https://archive.nytimes.com/www.nytimes.com/interactive/2008/05/05/science/20080506_DISEASE.html?ref=health

BIPARTITE NETWORK OF INGREDIENTS AND FLAVORS

Y.-Y. Ahn, S. E. Ahnert, J. P. Bagrow, A.-L. Barabási, Flavor network and the principles of food pairing, Scientific Reports 1, 196 (2011).

Examples of bipartite graphs

• Scientists and papers
• Actors and movies
• Musicians and bands, concerts
• Legislators and laws

PATHS

Section 5

DISTANCE IN A GRAPH: shortest path, geodesic path

The distance (shortest path, geodesic path) between two nodes is defined as the number of edges along the shortest path connecting them.

*If the two nodes are disconnected, the distance is infinity.

In directed graphs each path needs to follow the direction of the arrows. Thus in a digraph the distance from node A to B (on an AB path) is generally different from the distance from node B to A (on a BCA path).

[Figure: an undirected graph and a directed graph on nodes A, B, C, D.]

Network Science: Graph Theory

CALCULATING DISTANCES: BREADTH-FIRST SEARCH

Distance between node 0 and node 4:

1. Start at 0.
2. Find the nodes adjacent to 0. Mark them as at distance 1. Put them in a queue.
3. Take the first node out of the queue. Find the unmarked nodes adjacent to it in the graph. Mark them with the label 2 (in general, one more than the label of the node just taken out). Put them in the queue.
4. Repeat until you find node 4 or there are no more nodes in the queue.
5. The distance between 0 and 4 is the label of 4 or, if 4 does not have a label, infinity.

[Figure: the network with each node labeled by its distance (0 to 4) from the start node, filled in step by step.]
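The breadth-first search procedure can be sketched in a few lines of Python. The graph below is a hypothetical example (the slide's network is not recoverable from the transcript), stored as an adjacency list in which every node appears as a key; the function name `bfs_distance` is also made up.

```python
from collections import deque
from math import inf

def bfs_distance(adj, source, target):
    """Distance from source to target, or inf if target is unreachable."""
    label = {source: 0}                 # start at the source, label 0
    queue = deque([source])
    while queue:
        u = queue.popleft()             # take the first node out of the queue
        if u == target:                 # stop once the target is found
            return label[u]
        for v in adj[u]:
            if v not in label:          # only unmarked neighbors get a label
                label[v] = label[u] + 1
                queue.append(v)
    return inf                          # target never labeled: infinite distance

# Hypothetical network; node 5 is disconnected from the rest.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3], 5: []}

d_0_4 = bfs_distance(adj, 0, 4)
d_0_5 = bfs_distance(adj, 0, 5)
```

Because the queue is first-in, first-out, nodes are labeled in order of increasing distance, so the first label a node receives is its shortest-path distance. The same code works for directed graphs if `adj[u]` lists only the nodes that u points to.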

NETWORK DIAMETER AND AVERAGE DISTANCE

Diameter: $d_{max}$, the maximum distance between any pair of nodes in the graph.

Average path length/distance, $\langle d \rangle$, for a connected graph:

$$\langle d \rangle \equiv \frac{1}{2 L_{max}} \sum_{i, j \neq i} d_{ij}$$

where $d_{ij}$ is the distance from node i to node j.

In an undirected graph $d_{ij} = d_{ji}$, so we only need to count them once:

$$\langle d \rangle \equiv \frac{1}{L_{max}} \sum_{i, j > i} d_{ij}$$

PATHS: SUMMARY

Shortest path: the path with the shortest length between two nodes (distance). Example: $l_{1 \to 5} = 2$, $l_{1 \to 4} = 3$.

Diameter: the longest shortest path in a graph. Example: $l_{1 \to 4} = 3$.

Average path length: the average of the shortest paths for all pairs of nodes. Example:

$$(l_{1 \to 2} + l_{1 \to 3} + l_{1 \to 4} + l_{1 \to 5} + l_{2 \to 3} + l_{2 \to 4} + l_{2 \to 5} + l_{3 \to 4} + l_{3 \to 5} + l_{4 \to 5}) / 10 = 1.6$$

Cycle: a path with the same start and end node.

Self-avoiding path: a path that does not intersect itself.

Eulerian path: a path that traverses each link exactly once.

Hamiltonian path: a path that visits each node exactly once.

[Figures: the same five-node graph (nodes 1 to 5) with each kind of path highlighted.]
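The diameter and average path length can be computed by running a breadth-first search from every node. The sketch below assumes a connected undirected graph. The five-node edge set is an assumption: the figure is not available in this transcript, so the links were chosen to be consistent with the values quoted for the example ($l_{1 \to 5} = 2$, $l_{1 \to 4} = 3$, average 1.6).

```python
from collections import deque
from itertools import combinations

def distances_from(adj, source):
    """BFS labels: distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Assumed edge set, chosen to match the values quoted for the example.
edges = [(1, 2), (2, 3), (2, 5), (3, 4), (4, 5)]
adj = {n: [] for n in range(1, 6)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

dist = {u: distances_from(adj, u) for u in adj}
pairs = list(combinations(adj, 2))            # each pair i < j counted once
diameter = max(dist[i][j] for i, j in pairs)
avg_path_length = sum(dist[i][j] for i, j in pairs) / len(pairs)
```

With N = 5 there are $L_{max} = 10$ pairs, so dividing by `len(pairs)` is the undirected form of the average, with each pair counted once.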

CONNECTIVITY

Section 6

CONNECTIVITY IN UNDIRECTED GRAPHS

Connected (undirected) graph: any two vertices can be joined by a path. A disconnected graph is made up of two or more connected components.

Bridge: a link whose removal disconnects the graph.

Largest component: giant connected component

The rest: isolates

[Figure: two graphs on nodes A, B, C, D, F, G.]
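Connected components can be found by repeatedly exploring from a node that has not been visited yet. The following is a minimal sketch; the function name and the example graph are hypothetical, and the graph is assumed to be undirected with every node listed as a key.

```python
def connected_components(adj):
    """Return the connected components of an undirected graph, largest first."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:                      # explore everything reachable from start
            u = stack.pop()
            component.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Hypothetical graph: {A, B, C, D} is one component, {F, G} another.
adj = {
    "A": ["B"], "B": ["A", "C", "D"], "C": ["B", "D"], "D": ["B", "C"],
    "F": ["G"], "G": ["F"],
}
components = connected_components(adj)
giant = components[0]                     # the largest component
```

A link is a bridge if removing it increases the number of components; in this example the link A-B is one, since A would be left on its own.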

The adjacency matrix of a network with several components can be written in a block-diagonal form, so that nonzero elements are confined to squares, with all other elements being zero:

CONNECTIVITY IN UNDIRECTED GRAPHS: Adjacency Matrix

CONNECTIVITY IN DIRECTED GRAPHS

Strongly connected directed graph: there is a path from each node to every other node and vice versa (e.g. an AB path and a BA path).

Weakly connected directed graph: it is connected if we disregard the edge directions.

Strongly connected components can be identified, but not every node is part of a nontrivial strongly connected component.

[Figure: two directed graphs on nodes A to G.]
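The two notions of connectivity can be told apart with plain reachability checks. This is a sketch under stated assumptions: the helper names and the two small digraphs are made up, and every node must appear as a key in the adjacency dict. A digraph is strongly connected exactly when one start node reaches every node both in the graph and in the graph with all links reversed.

```python
def reachable(adj, source):
    """Set of nodes reachable from source by following link directions."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def reversed_graph(adj):
    """Same nodes, every link u -> v turned into v -> u."""
    rev = {n: [] for n in adj}
    for u, targets in adj.items():
        for v in targets:
            rev[v].append(u)
    return rev

def is_strongly_connected(adj):
    # Every node must be reachable from start, and start from every node.
    start, nodes = next(iter(adj)), set(adj)
    return (reachable(adj, start) == nodes
            and reachable(reversed_graph(adj), start) == nodes)

def is_weakly_connected(adj):
    # Disregard directions: merge each node's out-links and in-links.
    rev = reversed_graph(adj)
    undirected = {n: adj[n] + rev[n] for n in adj}
    return reachable(undirected, next(iter(adj))) == set(adj)

cycle = {"A": ["B"], "B": ["C"], "C": ["A"]}   # hypothetical: A -> B -> C -> A
chain = {"A": ["B"], "B": ["C"], "C": []}      # hypothetical: A -> B -> C
```

The cycle is strongly connected, while the chain is only weakly connected because there is no path back from C to A.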

Clustering coefficient

Section 7

CLUSTERING COEFFICIENT

Clustering coefficient: what fraction of your neighbors are connected to each other?

Node i with degree $k_i$

$C_i$ in [0,1]

Watts & Strogatz, Nature 1998.

SECTION 10

CLUSTERING COEFFICIENT

The local clustering coefficient captures the degree to which the neighbors of a given node link to each other. For a node i with degree $k_i$ the local clustering coefficient is defined as [5]

$$C_i = \frac{2 L_i}{k_i (k_i - 1)} \qquad (19)$$

where $L_i$ represents the number of links between the $k_i$ neighbors of node i. Note that $C_i$ is between 0 and 1:

$C_i = 0$ if none of the neighbors of node i link to each other;

$C_i = 1$ if the neighbors of node i form a complete graph, i.e. they all link to each other (Image 2.7).

In general $C_i$ is the probability that two neighbors of a node link to each other: C = 0.5 implies that there is a 50% chance that two neighbors of a node are linked.

In summary, $C_i$ measures the network's local density: the more densely interconnected the neighborhood of node i, the higher is $C_i$.

The degree of clustering of a whole network is captured by the average clustering coefficient, $\langle C \rangle$, representing the average of $C_i$ over all nodes i = 1, ..., N [5],

$$\langle C \rangle = \frac{1}{N} \sum_{i=1}^{N} C_i \qquad (20)$$

In line with the probabilistic interpretation, $\langle C \rangle$ is the probability that two neighbors of a randomly selected node link to each other.

While Eq. (19) is defined for undirected networks, the clustering coefficient can be generalized to directed and weighted [6,7,8,9] networks as well. Note that in the network literature one also often encounters the global clustering coefficient, defined in Appendix A.

Image 2.15: Clustering Coefficient.

The local clustering coefficient, $C_i$, of the central node with degree $k_i = 4$ for three different configurations of its neighborhood. The clustering coefficient measures the local density of links in a node's vicinity. The bottom figure shows a small network, with the local clustering coefficient of a node shown next to each node. Next to the figure we also list the network's average clustering coefficient $\langle C \rangle$, according to Eq. (20), and its global clustering coefficient C, defined in Appendix A, Eq. (21). Note that for nodes with degrees $k_i = 0, 1$, the clustering coefficient is taken to be zero.
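Eqs. (19) and (20) translate almost directly into code. The sketch below uses a hypothetical five-node network (a central node with four neighbors, two pairs of which are linked); it is not one of the configurations drawn in the image. The graph is stored as a dict of neighbor sets, and nodes with degree 0 or 1 get $C_i = 0$, following the convention stated in the caption.

```python
from itertools import combinations

def local_clustering(adj, i):
    """C_i = 2 L_i / (k_i (k_i - 1)); taken to be 0 when k_i < 2."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # L_i: number of links among the neighbors of i
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    """<C>: the mean of C_i over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

# Hypothetical network: node 0 has four neighbors; only 1-2 and 3-4 are linked.
adj = {
    0: {1, 2, 3, 4},
    1: {0, 2},
    2: {0, 1},
    3: {0, 4},
    4: {0, 3},
}
C_0 = local_clustering(adj, 0)
C_avg = average_clustering(adj)
```

Node 0 has 2 links among the 6 possible pairs of neighbors, so $C_0 = 1/3$; every other node has both of its neighbors linked, so its coefficient is 1, and the average is $13/15$.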



SUMMARY

Section 8

THREE CENTRAL METRICS IN NETWORK SCIENCE

Degree distribution: P(k)

Path length: $\langle d \rangle$

Clustering coefficient: C

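GRAPHS 1

Undirected

$$A_{ij} = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$

$A_{ii} = 0$, $A_{ij} = A_{ji}$

$$L = \frac{1}{2} \sum_{i,j=1}^{N} A_{ij}, \qquad \langle k \rangle = \frac{2L}{N}$$

Examples: actor network, protein-protein interactions

Directed

$$A_{ij} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

$A_{ii} = 0$, $A_{ij} \neq A_{ji}$

$$L = \sum_{i,j=1}^{N} A_{ij}, \qquad \langle k \rangle = \frac{L}{N}$$

Examples: WWW, citation networks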
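The link counts for the undirected and directed cases can be verified on the two four-node matrices of this slide. This is a minimal sketch; the function names are made up for illustration.

```python
A_undirected = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
A_directed = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

def link_count(A, directed):
    """L = sum of A_ij (directed) or half of that sum (undirected)."""
    total = sum(sum(row) for row in A)
    return total if directed else total // 2

def average_degree(A, directed):
    """<k> = L/N (directed) or 2L/N (undirected)."""
    N, L = len(A), link_count(A, directed)
    return L / N if directed else 2 * L / N

L_und = link_count(A_undirected, directed=False)
L_dir = link_count(A_directed, directed=True)
```

Both networks have 4 links. In the symmetric matrix every link appears twice, as $A_{ij}$ and $A_{ji}$, hence the factor 1/2; the average degree comes out as 2 in the undirected case and 1 in the directed case.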

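GRAPHS 2

Unweighted (undirected)

$$A_{ij} = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$

$A_{ii} = 0$, $A_{ij} = A_{ji}$

$$L = \frac{1}{2} \sum_{i,j=1}^{N} A_{ij}, \qquad \langle k \rangle = \frac{2L}{N}$$

Examples: protein-protein interactions, WWW

Weighted (undirected)

$$A_{ij} = \begin{pmatrix} 0 & 2 & 0.5 & 0 \\ 2 & 0 & 1 & 4 \\ 0.5 & 1 & 0 & 0 \\ 0 & 4 & 0 & 0 \end{pmatrix}$$

$A_{ii} = 0$, $A_{ij} = A_{ji}$

$$L = \frac{1}{2} \sum_{i,j=1}^{N} \mathrm{nonzero}(A_{ij}), \qquad \langle k \rangle = \frac{2L}{N}$$

Examples: call graph, metabolic networks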

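GRAPHS 3

Self-interactions

$$A_{ij} = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}$$

$A_{ii} \neq 0$, $A_{ij} = A_{ji}$

$$L = \frac{1}{2} \sum_{i,j=1,\, i \neq j}^{N} A_{ij} + \sum_{i=1}^{N} A_{ii}, \qquad \langle k \rangle = \ ?$$

Examples: protein interaction network, WWW

Multigraph (undirected)

$$A_{ij} = \begin{pmatrix} 0 & 2 & 1 & 0 \\ 2 & 0 & 1 & 3 \\ 1 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 \end{pmatrix}$$

$A_{ii} = 0$, $A_{ij} = A_{ji}$

$$L = \frac{1}{2} \sum_{i,j=1}^{N} \mathrm{nonzero}(A_{ij}), \qquad \langle k \rangle = \frac{2L}{N}$$

Examples: social networks, collaboration networks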

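GRAPHS 4

Complete graph (undirected)

$$A_{ij} = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}$$

$A_{ii} = 0$, $A_{i \neq j} = 1$

$$L = L_{max} = \frac{N(N-1)}{2}, \qquad \langle k \rangle = N - 1$$

Examples: actor network, protein-protein interactions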