DeFFeD

Sequence Classification Based on Delta-Free Sequential Patterns

Pierre Holat, Marc Plantevit, Chedy Raïssi, Nadi Tomeh, Thierry Charnois and Bruno Crémilleux

14th IEEE Int. Conf. on Data Mining series (ICDM 2014), IEEE Computer Society Press, Shenzhen, China, December 2014.
To compile DeFFeD from the source code, follow these steps.

  • Download the archive here
  • Extract the content in a folder of your choice
  • Make sure you have the Java JDK properly installed
  • DeFFeD uses the "jargs" library, a convenient command-line option parser. It is packaged in the archive, but you can also get it here
  • In a console, enter the following:
    $ cd ./DeFFeD/src
    $ javac -cp ../lib/jargs.jar:. *.java
    $ jar cfm ../DeFFeD.jar Manifest.txt *.class
    $ cd ..
  • You can now run the resulting JAR file:
    $ java -jar DeFFeD.jar

Usage: DeFFeD [-f, --filename STRING VALUE] [-s, --support DOUBLE VALUE ] [-d, --delta INTEGER VALUE ] [-b, --backward_pruning] [-D,--Debug] [{-v,--verbose}] [-c, -csv COMMA SEPARATED LINES OF CLASSES] [-r,--rules COMMA SEPARATED LINES OF CLASSES]
The -f (filename), -s (support) and -d (delta) options are the main ones for extracting delta-free sequential patterns.
The -c and -r options should only be used for early classification.

Example on the Premier League data set with a minimum support of 0.75 and a delta of 10 (cf. Fig. 4 in the paper):
$ java -jar DeFFeD.jar -f Premier_League.txt -s 0.75 -d 10
Data must be encoded in a specific format:
  • each item (a word, etc.) must be represented as an integer
  • -1 represents the end of an itemset
  • -2 represents the end of a sequence
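The encoding rules above can be illustrated with a short Python sketch. It is not part of DeFFeD; the helper names (build_vocabulary, encode_sequence, decode_line) are hypothetical, and the sketch assumes that tokens are separated by spaces, that each sequence is written on its own line, and that item identifiers start at 1. Check these assumptions against the data sets used in the paper before relying on them.

```python
from typing import Dict, List

# A sequence is a list of itemsets; an itemset is a list of items (e.g. words).
Sequence = List[List[str]]


def build_vocabulary(sequences: List[Sequence]) -> Dict[str, int]:
    """Map each distinct item to a positive integer, in order of first appearance.

    Starting at 1 is an assumption; it simply keeps identifiers clear of
    the reserved markers -1 and -2.
    """
    vocab: Dict[str, int] = {}
    for sequence in sequences:
        for itemset in sequence:
            for item in itemset:
                if item not in vocab:
                    vocab[item] = len(vocab) + 1
    return vocab


def encode_sequence(sequence: Sequence, vocab: Dict[str, int]) -> str:
    """Encode one sequence: -1 closes each itemset, -2 closes the sequence."""
    tokens: List[str] = []
    for itemset in sequence:
        tokens.extend(str(vocab[item]) for item in itemset)
        tokens.append("-1")
    tokens.append("-2")
    # Space-separated tokens are an assumption about the file layout.
    return " ".join(tokens)


def decode_line(line: str) -> List[List[int]]:
    """Parse one encoded line back into a list of integer itemsets."""
    sequence: List[List[int]] = []
    itemset: List[int] = []
    for token in line.split():
        value = int(token)
        if value == -2:      # end of the sequence
            break
        if value == -1:      # end of the current itemset
            sequence.append(itemset)
            itemset = []
        else:
            itemset.append(value)
    return sequence


if __name__ == "__main__":
    data = [[["a", "b"], ["c"]], [["b"], ["a", "c"]]]
    vocab = build_vocabulary(data)
    for seq in data:
        print(encode_sequence(seq, vocab))
```

With the toy data above, the items a, b and c are mapped to 1, 2 and 3, so the first sequence is written as "1 2 -1 3 -1 -2".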
You can see the details of the format in the data sets used in the paper:
The data sets S50TR2SL10IT10K and S100TR2SL10IT10K are generated with the IBM Quest Software. The PremierLeague data set is a collection of sequences of football games played in England in the last 4 years. The ROBOT and PIONEER data sets are downloaded from the UCI Machine Learning Repository.
The DEFT08 data set is available for purchase at the European Language Resources Association.