519502.prezentacija_webmining

download 519502.prezentacija_webmining

of 107

Transcript of 519502.prezentacija_webmining

  • 8/2/2019 519502.prezentacija_webmining

    1/107

    Mining Social and Semantic Network Data on the Web

    Markus Schatten, PhD

    University of ZagrebFaculty of Organization and Informatics

    May 4, 2011

  • 8/2/2019 519502.prezentacija_webmining

    2/107

    Introduction

    Web 2.0, Semantic Web, Web 3.0

    Network science

    Data collection

    Semistructured data

    XPath

    Regular expressionsAjax

    APIs

    Data analysis

    Network matricesFolksonomy model

    Logic programming

    Case study - CROSBI

    2 of 54

  • 8/2/2019 519502.prezentacija_webmining

    3/107

    Outline

    Introduction

    Web 2.0, Semantic Web, Web 3.0

    Network science

    Data collection

    Data analysis

    Case study - CROSBI

    3 of 54

  • 8/2/2019 519502.prezentacija_webmining

    4/107

    Questions

    Why mining social and semantic network data?

    How to collect these data? What are the common obstacles and how to solve them?

    How to analyze these data?

    4 of 54

  • 8/2/2019 519502.prezentacija_webmining

    5/107

    Outline

    Introduction

    Web 2.0, Semantic Web, Web 3.0

    Network science

    Data collection

    Data analysis

    Case study - CROSBI

    5 of 54

  • 8/2/2019 519502.prezentacija_webmining

    6/107

    Web 2.0, Semantic Web, Web 3.0

    Why should one bother with these buzzwords?

    6 of 54

  • 8/2/2019 519502.prezentacija_webmining

    7/107

    Why mining (social) web data?

    Millions of people,

    7 of 54

  • 8/2/2019 519502.prezentacija_webmining

    8/107

    Why mining (social) web data?

    Millions of people, on thousands of sites,

    7 of 54

  • 8/2/2019 519502.prezentacija_webmining

    9/107

    Why mining (social) web data?

    Millions of people, on thousands of sites, across hundreds ofdiverse cultures,

    7 of 54

  • 8/2/2019 519502.prezentacija_webmining

    10/107

    Why mining (social) web data?

    Millions of people, on thousands of sites, across hundreds ofdiverse cultures, leave trails

    7 of 54

  • 8/2/2019 519502.prezentacija_webmining

    11/107

    Why mining (social) web data?

    Millions of people, on thousands of sites, across hundreds ofdiverse cultures, leave trails about various aspects of their lives...

    7 of 54

  • 8/2/2019 519502.prezentacija_webmining

    12/107

    Why mining (social) web data?

    Millions of people, on thousands of sites, across hundreds ofdiverse cultures, leave trails about various aspects of their lives...

    Social systems are systems of communication

    (Niklas Luhman)

    7 of 54

  • 8/2/2019 519502.prezentacija_webmining

    13/107

    Why mining (social) web data?

    Millions of people, on thousands of sites, across hundreds ofdiverse cultures, leave trails about various aspects of their lives...

    Social systems are systems of communication

    (Niklas Luhman)

    By analyzing the trails (the results of social systems structural

    coupling) we can analyze the social system itself:

    7 of 54

  • 8/2/2019 519502.prezentacija_webmining

    14/107

    Why mining (social) web data?

    Millions of people, on thousands of sites, across hundreds ofdiverse cultures, leave trails about various aspects of their lives...

    Social systems are systems of communication

    (Niklas Luhman)

    By analyzing the trails (the results of social systems structural

    coupling) we can analyze the social system itself:politics, culture,

    media, business, technology, science ...

    7 of 54

  • 8/2/2019 519502.prezentacija_webmining

    15/107

    Outline

    Introduction

    Web 2.0, Semantic Web, Web 3.0

    Network science

    Data collection

    Data analysis

    Case study - CROSBI

    8 of 54

  • 8/2/2019 519502.prezentacija_webmining

    16/107

    The new science of networks

    source: The New York Times (April 3rd, 1933., p. 17).

    source: An Attraction Network in a Fourth Grade Class (Moreno, Who shall survive?, 1934).

    9 of 54

  • 8/2/2019 519502.prezentacija_webmining

    17/107

    Political and financial networks

    Mark Lombardi (1980s & 1990s)

    10 of 54

  • 8/2/2019 519502.prezentacija_webmining

    18/107

    Terrorist networks

    11 of 54

  • 8/2/2019 519502.prezentacija_webmining

    19/107

    Board of directors membership networks

    source: http://theyrule.net

    12 of 54

    http://theyrule.net/http://theyrule.net/
  • 8/2/2019 519502.prezentacija_webmining

    20/107

    On-line social networks

    13 of 54

  • 8/2/2019 519502.prezentacija_webmining

    21/107

    Internet

    source: Bill Cheswick http://www.cheswick.com/ches/map/gallery/index.html

    14 of 54

    http://www.cheswick.com/ches/map/gallery/index.htmlhttp://www.cheswick.com/ches/map/gallery/index.html
  • 8/2/2019 519502.prezentacija_webmining

    22/107

    Air traffic networks

    source: Northwest Airlines WorldTraveler Magazine

    15 of 54

  • 8/2/2019 519502.prezentacija_webmining

    23/107

    Railway networks

    source: TRTA, March 2003 - Tokyo rail map

    16 of 54

  • 8/2/2019 519502.prezentacija_webmining

    24/107

    Semantic networks

    source: http://wordnet.princeton.edu/man/wnlicens.7WN

    17 of 54

    http://wordnet.princeton.edu/man/wnlicens.7WNhttp://wordnet.princeton.edu/man/wnlicens.7WN
  • 8/2/2019 519502.prezentacija_webmining

    25/107

    Gene networks

    source: http://www.zaik.uni-koeln.de/bioinformatik/regulatorynets.html.en

    18 of 54

    http://www.zaik.uni-koeln.de/bioinformatik/regulatorynets.html.enhttp://www.zaik.uni-koeln.de/bioinformatik/regulatorynets.html.en
  • 8/2/2019 519502.prezentacija_webmining

    26/107

    Food chain networks

    source: http://marinebio.org/Oceans/Biotic-Structure.asp19 of 54

    http://marinebio.org/Oceans/Biotic-Structure.asphttp://marinebio.org/Oceans/Biotic-Structure.asp
  • 8/2/2019 519502.prezentacija_webmining

    27/107

    On-line social and semantic networks

    Where can we find networks? social networking sites

    20 of 54

  • 8/2/2019 519502.prezentacija_webmining

    28/107

    On-line social and semantic networks

    Where can we find networks? social networking sites

    forums, buletinboards, discussion groups, mailing lists

    20 of 54

  • 8/2/2019 519502.prezentacija_webmining

    29/107

    On-line social and semantic networks

    Where can we find networks? social networking sites

    forums, buletinboards, discussion groups, mailing lists

    blogs, microblogs, vlogs

    20 of 54

  • 8/2/2019 519502.prezentacija_webmining

    30/107

    On-line social and semantic networks

    Where can we find networks? social networking sites

    forums, buletinboards, discussion groups, mailing lists

    blogs, microblogs, vlogs

    wikis, semantic wikis

    20 of 54

  • 8/2/2019 519502.prezentacija_webmining

    31/107

    On-line social and semantic networks

    Where can we find networks? social networking sites

    forums, buletinboards, discussion groups, mailing lists

    blogs, microblogs, vlogs

    wikis, semantic wikis social tagging

    20 of 54

  • 8/2/2019 519502.prezentacija_webmining

    32/107

    On-line social and semantic networks

    Where can we find networks? social networking sites

    forums, buletinboards, discussion groups, mailing lists

    blogs, microblogs, vlogs

    wikis, semantic wikis social tagging

    RSS feeds, RDF repositories, metadata collections

    20 of 54

  • 8/2/2019 519502.prezentacija_webmining

    33/107

    On-line social and semantic networks

    Where can we find networks? social networking sites

    forums, buletinboards, discussion groups, mailing lists

    blogs, microblogs, vlogs

    wikis, semantic wikis social tagging

    RSS feeds, RDF repositories, metadata collections...

    20 of 54

  • 8/2/2019 519502.prezentacija_webmining

    34/107

    On-line social and semantic networks

    Where can we find networks? social networking sites

    forums, buletinboards, discussion groups, mailing lists

    blogs, microblogs, vlogs

    wikis, semantic wikis social tagging

    RSS feeds, RDF repositories, metadata collections...

    A social network can be found on any application where users

    communicate.

    20 of 54

  • 8/2/2019 519502.prezentacija_webmining

    35/107

    On-line social and semantic networks

    Where can we find networks? social networking sites

    forums, buletinboards, discussion groups, mailing lists

    blogs, microblogs, vlogs

    wikis, semantic wikis social tagging

    RSS feeds, RDF repositories, metadata collections...

    A social network can be found on any application where users

    communicate.A semantic network can be found in any specification where

    concepts are meaningfully connected.

    20 of 54

  • 8/2/2019 519502.prezentacija_webmining

    36/107

    Outline

    Introduction

    Web 2.0, Semantic Web, Web 3.0

    Network science

    Data collectionSemistructured dataXPathRegular expressionsAjaxAPIs

    Data analysis

    Case study - CROSBI21 of 54

  • 8/2/2019 519502.prezentacija_webmining

    37/107

    Common obstacles in data collection

    Semi-structured data (HTML,

    XHTML, XML, RDF, OWL ...)

    22 of 54

  • 8/2/2019 519502.prezentacija_webmining

    38/107

    Common obstacles in data collection

    Semi-structured data (HTML,

    XHTML, XML, RDF, OWL ...)

    The need for pattern matching

    22 of 54

  • 8/2/2019 519502.prezentacija_webmining

    39/107

    Common obstacles in data collection

    Semi-structured data (HTML,

    XHTML, XML, RDF, OWL ...)

    The need for pattern matching

    Client-side content generation

    22 of 54

  • 8/2/2019 519502.prezentacija_webmining

    40/107

    Semistructured data and XPath

    query language for semistructured data

    part of the W3C standard

    uses path expressions to navigate through documents

    has a standard library with over 100 useful functions

    Example:

    /html/body/div[@id=background]/div/div[@id=mainBar]/div/p[last()-1]

    23 of 54

  • 8/2/2019 519502.prezentacija_webmining

    41/107

    XPather (Firefox add-on)

    24 of 54

  • 8/2/2019 519502.prezentacija_webmining

    42/107

    Regular expressions

    Useful for extracting various formatted data and searching for

    keywordsExample (date matching)

    27-9-2001 [0-9]{1,2}[0-9]{1,2}[1-2][0-9]{3}

    25 of 54

  • 8/2/2019 519502.prezentacija_webmining

    43/107

    Client side generated content

    Some sites use Ajax to generate data on page load or user

    interaction (e.g. mouse click, mouse over etc.)

    26 of 54

  • 8/2/2019 519502.prezentacija_webmining

    44/107

    Client side generated content

    Some sites use Ajax to generate data on page load or user

    interaction (e.g. mouse click, mouse over etc.) In combination with obfuscated JavaScript code this is hard to

    scrape!

    Current solutions:

    Browser scripting (can be slow and cumbersome) - e.g. withselenium

    26 of 54

  • 8/2/2019 519502.prezentacija_webmining

    45/107

    Client side generated content

    Some sites use Ajax to generate data on page load or user

    interaction (e.g. mouse click, mouse over etc.) In combination with obfuscated JavaScript code this is hard to

    scrape!

    Current solutions:

    Browser scripting (can be slow and cumbersome) - e.g. withselenium

    Headless browser - e.g. mechanize, spidermonkey, HtmlUnit

    26 of 54

  • 8/2/2019 519502.prezentacija_webmining

    46/107

    Client side generated content

    Some sites use Ajax to generate data on page load or user

    interaction (e.g. mouse click, mouse over etc.) In combination with obfuscated JavaScript code this is hard to

    scrape!

    Current solutions:

    Browser scripting (can be slow and cumbersome) - e.g. withselenium Headless browser - e.g. mechanize, spidermonkey, HtmlUnit

    26 of 54

  • 8/2/2019 519502.prezentacija_webmining

    47/107

    Open APIs

    Most popular social web sites have an open API that allow you

    to connect your application to their site (Facebook, Twitter,

    Wikipedia, ...)

    27 of 54

    O API

  • 8/2/2019 519502.prezentacija_webmining

    48/107

    Open APIs

    Most popular social web sites have an open API that allow you

    to connect your application to their site (Facebook, Twitter,

    Wikipedia, ...)

    Still such APIs often have drawbacks for network datacollection:

    27 of 54

    O API

  • 8/2/2019 519502.prezentacija_webmining

    49/107

    Open APIs

    Most popular social web sites have an open API that allow you

    to connect your application to their site (Facebook, Twitter,

    Wikipedia, ...)

    Still such APIs often have drawbacks for network datacollection: Limited access rights (e.g. Facebook)

    27 of 54

    O API

  • 8/2/2019 519502.prezentacija_webmining

    50/107

    Open APIs

    Most popular social web sites have an open API that allow you

    to connect your application to their site (Facebook, Twitter,

    Wikipedia, ...)

    Still such APIs often have drawbacks for network datacollection: Limited access rights (e.g. Facebook) Limited queries per time interval (e.g. Twitter) etc.

    27 of 54

    O tli

  • 8/2/2019 519502.prezentacija_webmining

    51/107

    Outline

    Introduction

    Web 2.0, Semantic Web, Web 3.0

    Network science

    Data collection

    Data analysisNetwork matrices

    Folksonomy modelLogic programming

    Case study - CROSBI

    28 of 54

  • 8/2/2019 519502.prezentacija_webmining

    52/107

    Networks are most often analyzed using graph theory:

    DefinitionA graphG is the pair(N, E) wherebyN represents the set of verticles ornodes, andE N N the set of edges connecting pairs fromN.

    29 of 54

  • 8/2/2019 519502.prezentacija_webmining

    53/107

    Networks are most often analyzed using graph theory:

    DefinitionA graphG is the pair(N, E) wherebyN represents the set of verticles ornodes, andE N N the set of edges connecting pairs fromN.

    DefinitionA directed graph or digraphG is the pair(N, A), wherebyN represents

    the set of nodes, andA N N the set of ordered pairs of elementsfromN that represent the set of graph arcs.

    29 of 54

  • 8/2/2019 519502.prezentacija_webmining

    54/107

    Networks are most often analyzed using graph theory:

    DefinitionA graphG is the pair(N, E) wherebyN represents the set of verticles ornodes, andE N N the set of edges connecting pairs fromN.

    DefinitionA directed graph or digraphG is the pair(N, A), wherebyN represents

    the set of nodes, andA N N the set of ordered pairs of elementsfromN that represent the set of graph arcs.

    DefinitionA valued or weighted graph GV is the triple(N, A, V) wherebyNrepresents the set of nodes or verticles, A N N the set of pairs of

    elements fromN that represent the set of graph arcs, and V : N R afunction that attaches values or weights to nodes.

    29 of 54

  • 8/2/2019 519502.prezentacija_webmining

    55/107

    DefinitionA bipartite graphG = (N, E) is a graph for which set of nodes thereexists an partitionN = {X, Y}, such that every arc has one end in X, aand the other in Y .

    30 of 54

  • 8/2/2019 519502.prezentacija_webmining

    56/107

    DefinitionA bipartite graphG = (N, E) is a graph for which set of nodes thereexists an partitionN = {X, Y}, such that every arc has one end in X, aand the other in Y .

    DefinitionA multigraphG = (N, E) is a graph in whichE is a multiset, e.g. therecan exist multiple edges between two nodes.

    30 of 54

    Matrix representation

  • 8/2/2019 519502.prezentacija_webmining

    57/107

    Matrix representation

    DefinitionLetG be a graph defined with the set of nodes {n1, n2, ..., nm} and edges{e1, e2, ..., el}. For every i,j (1 i m and1 j m) we define

    aij =

    1, if there is an edge between nodes ni and nj

    0, otherwise

    Matrix A = [aij] is then the adjacency matrix of graphG. The matrix isymmetric since if there is an edge between nodes ni and nj then clearly

    there is also an edge between nj and ni. Thus A = [aij] = [aji].

    31 of 54

    Simple graph

  • 8/2/2019 519502.prezentacija_webmining

    58/107

    Simple graph

    32 of 54

    Digraph

  • 8/2/2019 519502.prezentacija_webmining

    59/107

    Digraph

    33 of 54

    Digraph

  • 8/2/2019 519502.prezentacija_webmining

    60/107

    Digraph

    34 of 54

    Weighted graph

  • 8/2/2019 519502.prezentacija_webmining

    61/107

    Weighted graph

    35 of 54

    Bipartite graph

  • 8/2/2019 519502.prezentacija_webmining

    62/107

    Bipartite graph

    36 of 54

    Multigraph

  • 8/2/2019 519502.prezentacija_webmining

    63/107

    Multigraph

    37 of 54

    Matrix algebra and interpretation

  • 8/2/2019 519502.prezentacija_webmining

    64/107

    g p

    Matrices can be transposed, multiplied ...

    38 of 54

    Matrix algebra and interpretation

  • 8/2/2019 519502.prezentacija_webmining

    65/107

    g p

    Matrices can be transposed, multiplied ...

    Examples:P - process flow matrix (p p, p - number of processes)

    38 of 54

    Matrix algebra and interpretation

  • 8/2/2019 519502.prezentacija_webmining

    66/107

    g p

    Matrices can be transposed, multiplied ...

    Examples:P - process flow matrix (p p, p - number of processes)W - co-worker matrix (w w, w - number of workers)

    38 of 54

    Matrix algebra and interpretation

  • 8/2/2019 519502.prezentacija_webmining

    67/107

    g p

    Matrices can be transposed, multiplied ...

    Examples:P - process flow matrix (p p, p - number of processes)W - co-worker matrix (w w, w - number of workers)O - process responsibility matrix (p z)

    38 of 54

    Matrix algebra and interpretation

  • 8/2/2019 519502.prezentacija_webmining

    68/107

    Matrices can be transposed, multiplied ...

    Examples:P - process flow matrix (p p, p - number of processes)W - co-worker matrix (w w, w - number of workers)O - process responsibility matrix (p z)WW - co-workers of co-workers

    38 of 54

    Matrix algebra and interpretation

  • 8/2/2019 519502.prezentacija_webmining

    69/107

    Matrices can be transposed, multiplied ...

    Examples:P - process flow matrix (p p, p - number of processes)W - co-worker matrix (w w, w - number of workers)O - process responsibility matrix (p z)WW - co-workers of co-workers

    PT - inversed process flow

    38 of 54

    Matrix algebra and interpretation

  • 8/2/2019 519502.prezentacija_webmining

    70/107

    Matrices can be transposed, multiplied ...

    Examples:P - process flow matrix (p p, p - number of processes)W - co-worker matrix (w w, w - number of workers)O - process responsibility matrix (p z)WW - co-workers of co-workers

    PT - inversed process flow

    OTO - matrix of workers who are responsible for the same

    processes

    38 of 54

    Matrix algebra and interpretation

  • 8/2/2019 519502.prezentacija_webmining

    71/107

    Matrices can be transposed, multiplied ...

    Examples:P - process flow matrix (p p, p - number of processes)W - co-worker matrix (w w, w - number of workers)O - process responsibility matrix (p z)WW - co-workers of co-workers

    PT - inversed process flow

    OTO - matrix of workers who are responsible for the same

    processes

    PO - matrix of workers who are responsible for the forthcoming

    processes

    38 of 54

    Matrix algebra and interpretation

  • 8/2/2019 519502.prezentacija_webmining

    72/107

    Matrices can be transposed, multiplied ...

    Examples:P - process flow matrix (p p, p - number of processes)W - co-worker matrix (w w, w - number of workers)O - process responsibility matrix (p z)WW - co-workers of co-workers

    PT - inversed process flow

    OTO - matrix of workers who are responsible for the same

    processes

    PO - matrix of workers who are responsible for the forthcoming

    processesPTO - matrix of workers who are responsible for the previous

    processes

    38 of 54

    Matrix algebra and interpretation

  • 8/2/2019 519502.prezentacija_webmining

    73/107

    Matrices can be transposed, multiplied ...

    Examples:P - process flow matrix (p p, p - number of processes)W - co-worker matrix (w w, w - number of workers)O - process responsibility matrix (p z)WW - co-workers of co-workers

    PT - inversed process flow

    OTO - matrix of workers who are responsible for the same

    processes

    PO - matrix of workers who are responsible for the forthcoming

    processesPTO - matrix of workers who are responsible for the previous

    processes

    ...

    38 of 54

    ACI model (Mika 2007)

  • 8/2/2019 519502.prezentacija_webmining

    74/107

    A - actors

    39 of 54

    ACI model (Mika 2007)

  • 8/2/2019 519502.prezentacija_webmining

    75/107

    A - actors C - concepts

    39 of 54

    ACI model (Mika 2007)

  • 8/2/2019 519502.prezentacija_webmining

    76/107

    A - actors C - concepts

    I - instances

    39 of 54

    ACI model (Mika 2007)

  • 8/2/2019 519502.prezentacija_webmining

    77/107

    A - actors C - concepts

    I - instances

    T A C I - folksonomy

    39 of 54

    ACI model (Mika 2007)

  • 8/2/2019 519502.prezentacija_webmining

    78/107

    A - actors C - concepts

    I - instances

    T A C I - folksonomyA tripartite hypergraph which can be reduced into three bipartite

    graphs:

    39 of 54

    ACI model (Mika 2007)

  • 8/2/2019 519502.prezentacija_webmining

    79/107

    A - actors C - concepts

    I - instances

    T A C I - folksonomyA tripartite hypergraph which can be reduced into three bipartite

    graphs:

    AC - network of actors and concepts

    39 of 54

    ACI model (Mika 2007)

  • 8/2/2019 519502.prezentacija_webmining

    80/107

    A - actors C - concepts

    I - instances

    T A C I - folksonomyA tripartite hypergraph which can be reduced into three bipartite

    graphs:

    AC - network of actors and concepts

    AI - network of actors and instances

    39 of 54

    ACI model (Mika 2007)

  • 8/2/2019 519502.prezentacija_webmining

    81/107

    A - actors C - concepts

    I - instances

    T A C I - folksonomyA tripartite hypergraph which can be reduced into three bipartite

    graphs:

    AC - network of actors and concepts

    AI - network of actors and instances CI - network of concepts and instances

    39 of 54

    Delicious - analysis

  • 8/2/2019 519502.prezentacija_webmining

    82/107

    40 of 54

    Problems with matrix representation

  • 8/2/2019 519502.prezentacija_webmining

    83/107

    41 of 54

    Problems with matrix representation

  • 8/2/2019 519502.prezentacija_webmining

    84/107

    Only the 1-s contain information

    41 of 54

    Sparse matrix

  • 8/2/2019 519502.prezentacija_webmining

    85/107

    42 of 54

    Problems with sparse matrix

  • 8/2/2019 519502.prezentacija_webmining

    86/107

    If A is connected to B, then B is connected to A (undirected

    graphs)

    43 of 54

    Problems with sparse matrix

  • 8/2/2019 519502.prezentacija_webmining

    87/107

    If A is connected to B, then B is connected to A (undirected

    graphs)

    How to find friends of friends?

    43 of 54

    Problems with sparse matrix

  • 8/2/2019 519502.prezentacija_webmining

    88/107

    If A is connected to B, then B is connected to A (undirected

    graphs)

    How to find friends of friends? Is there a path from node A to node B? Which path is that?

    43 of 54

    Problems with sparse matrix

  • 8/2/2019 519502.prezentacija_webmining

    89/107

    If A is connected to B, then B is connected to A (undirected

    graphs)

    How to find friends of friends? Is there a path from node A to node B? Which path is that?

    How to compute the previously outlined graphs?

    43 of 54

    Problems with sparse matrix

  • 8/2/2019 519502.prezentacija_webmining

    90/107

    If A is connected to B, then B is connected to A (undirected

    graphs)

    How to find friends of friends? Is there a path from node A to node B? Which path is that?

    How to compute the previously outlined graphs?

    43 of 54

    Logic programming as a solution

  • 8/2/2019 519502.prezentacija_webmining

    91/107

    Deductive languages like Prolog, FLORA-2

    44 of 54

    Logic programming as a solution

  • 8/2/2019 519502.prezentacija_webmining

    92/107

    Deductive languages like Prolog, FLORA-2 Advantages:

    Declarative problem solving (on a higher level) Very appropriate for expressing mathematical structures

    44 of 54

    Logic programming as a solution

  • 8/2/2019 519502.prezentacija_webmining

    93/107

    Deductive languages like Prolog, FLORA-2 Advantages:

    Declarative problem solving (on a higher level) Very appropriate for expressing mathematical structures

    Disadvantages: Combinatoric explosion (use heuristics and predicate tabling)

    44 of 54

    Examples

  • 8/2/2019 519502.prezentacija_webmining

    94/107

    If A is connected to B, then B is connected to A

    link( A, B ) :- link( B, A ).

    45 of 54

    Examples

  • 8/2/2019 519502.prezentacija_webmining

    95/107

    Friend of a friend

    ff( A, B ) :- link( A, C ), link( C, B ).

    46 of 54

    Examples

  • 8/2/2019 519502.prezentacija_webmining

    96/107

    Is there a path from node A to node B?

    path( A, B ) :- link( A, B ).path( A, B ) :- link( A, C ), path( C, B ).

    47 of 54

    Examples

  • 8/2/2019 519502.prezentacija_webmining

    97/107

    Which path is that?

    whichpath( X, Y, [ X, Y ] ) :- link( X, Y ).

    whichpath( X, Y, [ X, Z | T ] ) :-

    link( X, Z ),

    whichpath( Z, Y, [ Z | T ] ).

    48 of 54

    Outline

  • 8/2/2019 519502.prezentacija_webmining

    98/107

    Introduction

    Web 2.0, Semantic Web, Web 3.0

    Network science

    Data collection

    Data analysis

    Case study - CROSBI

    49 of 54

    Project outline

  • 8/2/2019 519502.prezentacija_webmining

    99/107

    Croatian Scientific Bibliography (http://bib.irb.hr)

    50 of 54

    Project outline

    http://bib.irb.hr/http://bib.irb.hr/
  • 8/2/2019 519502.prezentacija_webmining

    100/107

    Croatian Scientific Bibliography (http://bib.irb.hr) Scientists enter data about their research activities by them

    selves which makes it a social application

    50 of 54

    Project outline

    http://bib.irb.hr/http://bib.irb.hr/
  • 8/2/2019 519502.prezentacija_webmining

    101/107

    Croatian Scientific Bibliography (http://bib.irb.hr) Scientists enter data about their research activities by them

    selves which makes it a social application

    Research hypotheses:

    50 of 54

    Project outline

    http://bib.irb.hr/http://bib.irb.hr/
  • 8/2/2019 519502.prezentacija_webmining

    102/107

    Croatian Scientific Bibliography (http://bib.irb.hr) Scientists enter data about their research activities by them

    selves which makes it a social application

    Research hypotheses:

    1. analyze two social systems (the Croatian scientific community and

    the Croatian public) and see how the most important researchconcepts (as viewed by the scientific community) are interpretedby the public (does the public understand most important conceptsfrom Croatian science in the last decade?).

    50 of 54

    Project outline

    http://bib.irb.hr/http://bib.irb.hr/
  • 8/2/2019 519502.prezentacija_webmining

    103/107

    Croatian Scientific Bibliography (http://bib.irb.hr) Scientists enter data about their research activities by them

    selves which makes it a social application

    Research hypotheses:

    1. analyze two social systems (the Croatian scientific community and

    the Croatian public) and see how the most important researchconcepts (as viewed by the scientific community) are interpretedby the public (does the public understand most important conceptsfrom Croatian science in the last decade?).

    2. analyze how various scientific fields are conceptually and socially

    interrelated (do scientists do interdisciplinary research together ifthey have conceptually connected fields?)

    50 of 54

    Methodology

    http://bib.irb.hr/http://bib.irb.hr/
  • 8/2/2019 519502.prezentacija_webmining

    104/107

    Social systems are represented by their trails: CROSBI as the

    Croatian scientific community, Croatian Wikipedia as the

    Croatian public

    One system can interpret concepts from another if the concept

    exists within its conceptual network (e.g. keyword or wiki page)

    51 of 54

    Data collection

  • 8/2/2019 519502.prezentacija_webmining

    105/107

    CROSBI data has been collected using Scrapy in November

    2010 (a total of 285,234 entries has been collected)

    Wikipedia data has been collected using the Wikipedia API +Scrapy for parsing

    52 of 54

    Results

  • 8/2/2019 519502.prezentacija_webmining

    106/107

    53 of 54

    Bibliography

  • 8/2/2019 519502.prezentacija_webmining

    107/107

    1. Adamic, L.: Why networks are interesting to study, University of Michigan, School of Information,https://open.umich.edu/education/si/si508/fall2008

    2. Barratm M., Barthlemy, M., Vespignani, A.: Dynamical Processes on Complex Networks, Cambridge UniversityPress, 2008.

    3. Carley, K.M.: Dynamic Network Analysis, Summary of the NRC workshop on social network modeling andanalysis, Committee on Human Factors, National Research Council, 133145, Eds: Breiger, R., Carley, K.M., andPattison, P., 2003.

    4. Divjak, B., Lovrencic, A.: Diskretna matematika s teorijom grafova, TIVA & FOI, 2005

    5. Krackhardt, David, and Carley, Kathleen M.: A PCANS Model of Structure in Organization, Proceedings of the1998 International Symposium on Command and Control Research and Technology, Evidence Based Research,Vienna, VA, June 1998

    6. Mika, Peter: Ontologies are us: A unified model of social networks and semantics, Web Semantics: Science,Services and Agents on the World Wide Web 5(1), volume 5, 515, March 2007

    7. Newman, M., Barabsi, A.-L., Watts, D. j.: The Structure and Dynamics of Networks, Princeton University Press,2006.

    8. Various web sites (images, graphs)

    54 of 54

    https://open.umich.edu/education/si/si508/fall2008https://open.umich.edu/education/si/si508/fall2008