The minerLC software aims at enumerating the abstract closed patterns in an attributed graph.
In attributed graphs addressed by minerLC, a vertex is described as an itemset, i.e. a subset of a set of items. Whenever the vertex labels are not itemsets, some discretization has then to be performed as a pre-processing step.
In standard closed itemset mining, a closed pattern is the largest itemset common to an object subset. When the object are the vertices of a graph, such a vertex subset induces a subgraph. In abstract closed pattern mining, this subgraph is reduced to its core subgraph, according to some core definition, as for instance, the k-core defined by [Seidman1983]. An abstract closed pattern (also called a core closed pattern) is then the largest pattern common to the vertices of the core subgraph derived from some vertex subset.
Technically, when considering some itemset q, to obtain an abstract closed pattern we build the vertex subset ext (q) in which q occurs, then compute the core p(ext(q)) of the subgraph induced by ext(q) using the core operator p. Finally, we compute the associated abstract closed pattern by applying the intersection operator int. To sumarize, to any itemset q we associate:
The general idea is that by considering core subgraphs we only consider cohesive subnetworks together with the items shared by their vertices, This is exemplified in the small example that follows in which in the core subgraph all vertices have to belong to a triangle:
This idea has been adaptated to directed graphs, by intoriducing an appropriate core, namely the hub-authority core, and two-mode networks. In the latter case, the two vertex sets V1 and V2 (representing for instance respectively actors and movies) are described according to two different sets of items, and this leads to define bi-patterns. An abstract closed bi-pattern is then an itemset pair (q1, q2) such that q1 is the largest itemset common to all vertices from V1 in the core subgraph and q2 is the largest itemset common to all vertices from V2 in the core subgraph.
Interestingly we may also have bi-patterns in directed networks and even in undirected netwroks by defining a core as a vertex subset pairs. This allows for example to look, in the LAWYERS network presented below, to a core subgraph in which young lawyers from Boston ask for advice from older lawyers: components.
Abstract closed patterns (and bi-patterns) may be ordered by considering that most interesting patterns are unecpected patterns according to some random model. Here we consider the expected size of the core of a subgraph induced by a random vertex subset of size n drawn from the attributed graph. Given a pattern which occurs in n vertices, we may then consider to what extent the core size is larger than the expected value. Expected core sizes and related standard deviations are computed after the enumeration using a different program.
The core subgraph associated to an abstract closed pattern may contain various structural communities. An obvious definition of a structural community is a connected component, but stronger definitions have been proposed by [Palla2007], namely k-communities.
A k-community is made of k-cliques and within the commuty it is always possible to find a path such that two adjacent k-cliques share a ( k-1)-clique.
We may then define a local closed pattern l as the largest itemset common to a structural community e extracted from the core subgraph associated to some pattern q, and enumerate the triples (l,e,q) we call local rules.
Such a rule expresses the specificity of the structural community e in the pattern q core subgraph:
We have implemented various structural communities definitions, including 3-communities.
Example of enumeration of 16-core abstract patterns of the DBLP_P dataset. Core vertices must have at least 16 neighbors in the core subgraph.
Example of computation of expected abstract supports of found patterns using a monte-carlo sampling with size 100 samplings (takes the outpfile generated supra). supportAttendus takes as input the file generated by minerLC with the same parameters.
Example of enumeration of the 'h-a' HA-core abstract patterns in the LAWYERS dataset. The 'h-a' Hub-Authority core of a directed graph is made of a Hub vertex subset together with an Authority vertex subset. The associated core subgraph is such that all Hub vertices have outdegree at least 'h' towards Authority vertices while all Authority vertices have indegree at least 'a' from Hub vertices. Option -o is required to mine directed graphs.
Example of computation of expected abstract supports of found patterns using a monte-carlo simulation with size 100 sampling.
Example of enumeration of 10-core abstract patterns on the DBLP_C dataset. The -l 0.1 means that only patterns whose core subgraph has local modularity at least 0.1 are output.
Example of enumeration of 6-6- BHA core abstract bi-patterns on the two-mode PtoC_BI dataset.
In a two-mode network there are two vertex sets V1 and V2 and the edges relate vertices in V1 to vertices in V2. The h-a BHA core is a pair (H,A) of vertex subsets where H is a subset of V1 while V2 is a subset of V2. The associated core subgraph is made of the edges relating H to A. All H vertices have degree at least 'h' while all A vertices have degree at least 'a'. A bi-pattern is then a pair (q1,q2) where q1 selects objects in V1 while q2 selects objects in V2.
The format of the input file allwong bi-pattern mining in minerLC is particular and described in the section nri files.
|Description / Source||Name||link|
Social network from : The Teenage Friends and Lifestyle Study.
Original datastes from https://www.stats.ox.ac.uk/~snijders/siena/Glasgow_data.htm.
The dataset provided in the nri format derives from the data of the first year of the study. The graph comes from Glasgow-friendship.RData and the labels from Glasgow-substances.RData. Occasional and regular tobacco and drug consumption labels are undifferenciated, as well as alcohol cconsumption once a week and more than once a week.
|S50||S50.nri.gz (0.7 Ko)|
DBLP co-authorship of publications (DBLP_XL) and selection of the publications at the ICDM conferences (DBLP_C) and song preference of LastFM users.
Source code and original datasets from the author's site: http://research.ics.aalto.fi/dmg/lic.tgz
|DBLP_C||DBLP_C.nri (172 Ko)|
|LASTFM||LASTFM.nri.gz (367 Ko)|
|DBLP_XL||DBLP_XL.nri.gz (367 Ko)|
Orginal Co-authorship labelled graph has been enriched by 3 additional labels that collects interest in domains. Those labels were derived from the original ones.
Original dataset avialable from author on request at https://perso.liris.cnrs.fr/marc.plantevit/doku/doku.php?id=data_sets
|DBLP_P||DBLP_P.nri.gz (1.5 Mo)|
Co-authorship Labelled graphs.
Source code and original datasets from author's site : https://code.google.com/archive/p/scpm/
|DBLP_S||DBLP_S.nri.gz (5.6 Mo)|
Social networking, bookmarking, and tagging information from a set of 2K users from Delicious social bookmarking system.(http://www.delicious.com).
Original dataset are downloadable from https://grouplens.org/datasets/hetrec-2011/
|DELICIOUS||DELICIOUS.nri.gz (1.8 Ko)|
Corporate law partnership advice network.
Original dataset downloadable from https://www.stats.ox.ac.uk/~snijders/siena/Lazega_lawyers_data.htm
|LAWYERS||LAWYERS.nri.gz (1.8 Ko)|
Vairiante of the previous Corporate law partnership advice network.
|Two-Mode (bi-pattern mining).
Network relating particpants to campaigns of the deep sea BENTHOS Program.
|Unidrected (bi-pattern mining).
|Directed (bi-pattern mining).
In case of difficulties please contact us at: