MinerLC/MinerLSD: closed pattern miners in attributed graphs

Dominique Bouthinon, Guillaume Santini, Henry Soldano

Overview

Enumerating abstract closed patterns and bi-patterns

The minerLC software aims at enumerating the abstract closed patterns in an attributed graph.

In attributed graphs addressed by minerLC, a vertex is described as an itemset, i.e. a subset of a set of items. Whenever the vertex labels are not itemsets, some discretization has then to be performed as a pre-processing step.

In standard closed itemset mining, a closed pattern is the largest itemset common to an object subset. When the object are the vertices of a graph, such a vertex subset induces a subgraph. In abstract closed pattern mining, this subgraph is reduced to its core subgraph, according to some core definition, as for instance, the k-core defined by [Seidman1983]. An abstract closed pattern (also called a core closed pattern) is then the largest pattern common to the vertices of the core subgraph derived from some vertex subset.

Technically, when considering some itemset q, to obtain an abstract closed pattern we build the vertex subset ext (q) in which q occurs, then compute the core p(ext(q)) of the subgraph induced by ext(q) using the core operator p. Finally, we compute the associated abstract closed pattern by applying the intersection operator int. To sumarize, to any itemset q we associate:

the abstract closed pattern
f(q) = int ( p ( ext(q) ) )
its core subgraph induced by its abstract extension
p ( ext(q) )

The general idea is that by considering core subgraphs we only consider cohesive subnetworks together with the items shared by their vertices, This is exemplified in the small example that follows in which in the core subgraph all vertices have to belong to a triangle:

This idea has been adaptated to directed graphs, by intoriducing an appropriate core, namely the hub-authority core, and two-mode networks. In the latter case, the two vertex sets V1 and V2 (representing for instance respectively actors and movies) are described according to two different sets of items, and this leads to define bi-patterns. An abstract closed bi-pattern is then an itemset pair (q1, q2) such that q1 is the largest itemset common to all vertices from V1 in the core subgraph and q2 is the largest itemset common to all vertices from V2 in the core subgraph.

Interestingly we may also have bi-patterns in directed networks and even in undirected netwroks by defining a core as a vertex subset pairs. This allows for example to look, in the LAWYERS network presented below, to a core subgraph in which young lawyers from Boston ask for advice from older lawyers: components.

Ranking abstract closed patterns and bi-patterns

Abstract closed patterns (and bi-patterns) may be ordered by considering that most interesting patterns are unecpected patterns according to some random model. Here we consider the expected size of the core of a subgraph induced by a random vertex subset of size n drawn from the attributed graph. Given a pattern which occurs in n vertices, we may then consider to what extent the core size is larger than the expected value. Expected core sizes and related standard deviations are computed after the enumeration using a different program.

Computing structural communities associated to abstract closed patterns [Soldano2015]

The core subgraph associated to an abstract closed pattern may contain various structural communities. An obvious definition of a structural community is a connected component, but stronger definitions have been proposed by [Palla2007], namely k-communities.

A k-community is made of k-cliques and within the commuty it is always possible to find a path such that two adjacent k-cliques share a ( k-1)-clique.

We may then define a local closed pattern l as the largest itemset common to a structural community e extracted from the core subgraph associated to some pattern q, and enumerate the triples (l,e,q) we call local rules.

Such a rule expresses the specificity of the structural community e in the pattern q core subgraph:

if l=q the vertices of the community e have nothing more in common than the nodes of the whole core subgraph while
if l contains items that do not belong to q, these items inform us about something specific of the structural community.

We have implemented various structural communities definitions, including 3-communities.

Source files minerLC and minerLSD

This archive contains a Makefile and a directory with a french documentation that describes the formats of input and output files and explains how to compile and run the programs:

minerLC to enumerate close abstract patterns. This program also implements minerLSD algorithm,
supportAttendus to compute expected abstract extensions of abstract patterns,
supportAttendusBi to compute expected abstract extensions of bi-patterns.

Purposes and Command line examples

Single pattern mining of undirected networks [Soldano2014cr], [Soldano2017fcasna]

Example of enumeration of 16-core abstract patterns of the DBLP_P dataset. Core vertices must have at least 16 neighbors in the core subgraph.

bash $> minerLC DBLP_P.nri -d1 16 > DBLP_P.nri.d1_16

Example of computation of expected abstract supports of found patterns using a monte-carlo sampling with size 100 samplings (takes the outpfile generated supra). supportAttendus takes as input the file generated by minerLC with the same parameters.

bash $> supportsAttendus DBLP_P.nri DBLP_P.nri.d1_16 -d1 16 -n 100 > DBLP_P.nri.d1_16_100

Single pattern mining of directed networks [Soldano:2017ab]

Example of enumeration of the 'h-a' HA-core abstract patterns in the LAWYERS dataset. The 'h-a' Hub-Authority core of a directed graph is made of a Hub vertex subset together with an Authority vertex subset. The associated core subgraph is such that all Hub vertices have outdegree at least 'h' towards Authority vertices while all Authority vertices have indegree at least 'a' from Hub vertices. Option -o is required to mine directed graphs.

bash $> minerLC LAWYERS.nri -o -hub_aut1 4 4 > LAWYERS.nri.o_hub_aut1_4_4

Example of computation of expected abstract supports of found patterns using a monte-carlo simulation with size 100 sampling.

bash $> supportsAttendus LAWYERS.nri LAWYERS.nri.o_hub_aut1_4_4 -hub_aut1 4 4 -n 100 > LAWYERS.nri.o_hub_aut1_4_4_100

Single pattern mining of undirected network with local modularity constraint [Atzmueller2018dn]

Example of enumeration of 10-core abstract patterns on the DBLP_C dataset. The -l 0.1 means that only patterns whose core subgraph has local modularity at least 0.1 are output.

bash $> minerLC DBLP_C.nri -lm 0.1 -d1 10 > DBLP_C.nri.lm_0.1_d1_10

Bi-pattern mining of two-mode and directed networks [Soldano2018mn]

Example of enumeration of 6-6- BHA core abstract bi-patterns on the two-mode PtoC_BI dataset.

In a two-mode network there are two vertex sets V1 and V2 and the edges relate vertices in V1 to vertices in V2. The h-a BHA core is a pair (H,A) of vertex subsets where H is a subset of V1 while V2 is a subset of V2. The associated core subgraph is made of the edges relating H to A. All H vertices have degree at least 'h' while all A vertices have degree at least 'a'. A bi-pattern is then a pair (q1,q2) where q1 selects objects in V1 while q2 selects objects in V2.

The format of the input file allwong bi-pattern mining in minerLC is particular and described in the section nri files.

minerLC PtoC.nri -o -hub_aut1 6 6 > PtoC.nri.o_hub_aut1_6_6

Dataset files

Downloadable files should be unziped before use with minerLC programs.

Description / Source	Name	link
Undirected [Michell1996he] Social network from : The Teenage Friends and Lifestyle Study. Original datastes from https://www.stats.ox.ac.uk/~snijders/siena/Glasgow_data.htm. The dataset provided in the nri format derives from the data of the first year of the study. The graph comes from Glasgow-friendship.RData and the labels from Glasgow-substances.RData. Occasional and regular tobacco and drug consumption labels are undifferenciated, as well as alcohol cconsumption once a week and more than once a week.	S50	S50.nri.gz (0.7 Ko)
Undirected [Galbrun2014dm] DBLP co-authorship of publications (DBLP_XL) and selection of the publications at the ICDM conferences (DBLP_C) and song preference of LastFM users. Source code and original datasets from the author's site: http://research.ics.aalto.fi/dmg/lic.tgz	DBLP_C	DBLP_C.nri (172 Ko)
	LASTFM	LASTFM.nri.gz (367 Ko)
	DBLP_XL	DBLP_XL.nri.gz (367 Ko)
Undirected [Prado2013tkde] Orginal Co-authorship labelled graph has been enriched by 3 additional labels that collects interest in domains. Those labels were derived from the original ones. Original dataset avialable from author on request at https://perso.liris.cnrs.fr/marc.plantevit/doku/doku.php?id=data_sets	DBLP_P	DBLP_P.nri.gz (1.5 Mo)
Undirected [Silva2010kdd] Co-authorship Labelled graphs. Source code and original datasets from author's site : https://code.google.com/archive/p/scpm/	DBLP_S	DBLP_S.nri.gz (5.6 Mo)
Undirected Social networking, bookmarking, and tagging information from a set of 2K users from Delicious social bookmarking system.(http://www.delicious.com). Original dataset are downloadable from https://grouplens.org/datasets/hetrec-2011/	DELICIOUS	DELICIOUS.nri.gz (1.8 Ko)
Directed [Lazega1997] Corporate law partnership advice network. Original dataset downloadable from https://www.stats.ox.ac.uk/~snijders/siena/Lazega_lawyers_data.htm	LAWYERS	LAWYERS.nri.gz (1.8 Ko)
Unirected [Lazega1997] Vairiante of the previous Corporate law partnership advice network.	LAWYERS_U
Two-Mode (bi-pattern mining). Network relating particpants to campaigns of the deep sea BENTHOS Program.	PtoC_BI	PtoC_BI.nri.gz
Unidrected (bi-pattern mining).
Directed (bi-pattern mining).

PtoC_BI.nri LAZEGA_BI.nri DBLP_E_BI.nri COEXP_BI.nri

CDF wolfram computable document (interactive)

Hereunder two CDF documents allowing to investigate interactively the 4-4 and 9-9 Hub Authority core closed patterns. In order to "play" the interactive documents you need to download and install the wolfram CDF player from the following URL:
https://www.wolfram.com/cdf-player/

lawer_advice_4_4_1000.cdf [14 Mo]
lawer_advice_9_9_1000.cdf [608 Ko]

References

[Seidman1983]Network structure and minimum degre. Seidman, Stephen B. (1983) Social Networks (5) 269--287.
[Soldano2015]Local knowledge discovery in attributed graphs. Henry Soldano, Guillaume Santini, and Dominique Bouthinon. In Anna Esposito, editor, 27th IEEE International Conference on Tools with Artificial Intelligence, ICTAI, Vietri sul Mare, Italy, November 9-11 2015, pages 250–257. IEEE Computer Society.
[Palla2007]Quantifying social group evolution. Palla, Gergely and Barabasi, Albert-Laszlo and Vicsek, Tamas (2007) Nature (7136) 664--667.
[Soldano2014cr] Graph abstraction for closed pattern mining in attributed network. Henry Soldano and Guillaume Santini. 21st European Conference on Artificial Intelligence (ECAI 2014), Prague, Czech Republic, 849–854, Torsten Schaub and Gerhard Friedrich and Barry O’Sullivan, IOS Press, Frontiers in Artificial Intelligence and Applications, 263, 2014
[Soldano2017fcasna] Formal concept analysis of attributed network.H. Soldano, G. Santini, and D. Bouthinon. Formal Concept Analysis in Social Network Analysis, Lecture Notes in Social Networks, pages 143–170. Springer, 2017.
[Soldano:2017ab] Hub-Authority Cores and Attributed Directed Network Mining.Henry Soldano and Guillaume Santini and Dominique Bouthinon and Emmanuel Lazega. IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI 2017), Boston, MA, USA, IEEE Computer Society, 2017
[Soldano2018mn] Bi-pattern Mining of Two Mode and Directed Networks. Henry Soldano, Guillaume Santini, Dominique Bouthinon, Sophie Bary and Emmanuel Lazega. International workshop on Mining Attributed Networks Lyon, 23 April 2018
[Atzmueller2018dn] MinerLSD: Efficient Local Pattern Mining On Attributed Graphs. Martin Atzmueller, Henry Soldano, Guillaume Santini, Dominique Bouthinon. The Eighth IEEE ICDM Workshop on Data Mining in Networks November 17, 2018, Singapore.
[Michell1996he] Peer pressure to smoke: the meaning depends on the method. Michell, L., and P. West. Health Education Research, 11(1), 39-49
[Galbrun2014dm]Overlapping community detection in labeled graphs.Esther Galbrun, Aristides Gionis, and Nikolaj Tatti (2014). Data Min. Knowl. Discov. 28(5-6): 1586-1610 (2014)
[Prado2013tkde]Mining Graph Topological Patterns: Finding Covariations among Vertex Descriptors. Adriana Prado, Marc Plantevit, Céline Robardet, Jean-François Boulicaut, (2013), IEEE Trans. Knowl. Data Eng. 25(9): 2090-2104.
[Silva2010kdd]Structural correlation pattern mining for large graphs.Arlei Silva, Wagner Meira Jr., Mohammed J. Zaki, (2010), MLG@KDD 2010: 119-126
[Lazega1997]Position in formal structure, personal characteristics and choices of advisors in a law firm: a logistic regression model for dyadic network data. Lazega, Emmanuel and van Duijn, Marijtjen (1997), Social Networks, (4) 375-397.

Contacts

In case of difficulties please contact us at:

guillaume.santini -at- lipn.univ-paris13.fr
henry.soldano -at- lipn.univ-paris13.fr
db -at- lipn.univ-paris13.fr