Danushka Bollegala




 

 Licensing and Intended use

Below you can find implementations of various algorithms and systems proposed in my papers. The software is provided under the modified BSD license. Please feel free to use it in your research work. I would be grateful if you could let me know when you use the software, and cite the appropriate papers. Please also let me know if you find any bugs. Thanks in advance.
 

 Evaluating Sentence Orderings

We introduced a novel measure, average continuity, to evaluate the sentence orderings produced by an automatic multi-document summarization system. Download a Python script that computes average continuity as well as other popular correlation coefficients, such as the Kendall rank correlation coefficient and the Spearman coefficient, with confidence intervals.
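As a rough illustration of the two standard coefficients mentioned above, the following sketch compares a proposed sentence ordering against a reference ordering. It is not the downloadable script: the function names are hypothetical, it assumes both orderings contain the same distinct sentence ids (at least two of them), and it covers neither average continuity nor confidence intervals.

```python
from itertools import combinations


def kendall_tau(reference, proposed):
    """Kendall's tau between two orderings of the same distinct items."""
    # Position of every item in the proposed ordering.
    position = {item: i for i, item in enumerate(proposed)}
    # Walk the reference order and look up where each item was placed.
    ranks = [position[item] for item in reference]
    n = len(ranks)
    # A pair is discordant when the two orderings disagree on its order.
    discordant = sum(1 for a, b in combinations(ranks, 2) if a > b)
    return 1.0 - 4.0 * discordant / (n * (n - 1))


def spearman_rho(reference, proposed):
    """Spearman's rank correlation for two orderings without ties."""
    position = {item: i for i, item in enumerate(proposed)}
    n = len(reference)
    # Sum of squared differences between the positions of each item.
    d_squared = sum((i - position[item]) ** 2
                    for i, item in enumerate(reference))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    reference = [1, 2, 3, 4]   # sentence ids in the reference order
    proposed = [1, 3, 2, 4]    # sentence ids in the system's order
    print(kendall_tau(reference, proposed))
    print(spearman_rho(reference, proposed))
```

Both functions return 1 for identical orderings and -1 for a fully reversed ordering.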
 

 Sequential Greedy Clustering

We proposed a sequential greedy clustering algorithm to efficiently cluster a large number of lexical patterns in our WWW2009 paper. Download the archive and decompress it. Inside you will find a Python script (seqclust.py) and a sample data file. To run the clustering algorithm, use the following command:
$ python seqclust.py -i input_file -c output_file
input_file must be in a sparse matrix format, where each row starts with the row_id, followed by the elements of that row written as column_id:value pairs. A colon separates the column_id from the value. This is the format used by most machine learning classification programs to specify features for training instances (with the row_id replaced by the class label). Neither row ids nor column ids need to be sorted in any particular order. For example, the following line represents the first row of the matrix, in which element (1,2) has the value 5.
1 1:10 2:5 6:10 10:12
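To make the format concrete, here is a minimal sketch of how such a file could be read. It is not taken from seqclust.py: the function names are hypothetical, ids are kept as strings, and values are assumed to be numeric.

```python
from collections import defaultdict


def parse_sparse_row(line):
    """Parse one row such as '1 1:10 2:5 6:10 10:12'."""
    fields = line.split()
    row_id = fields[0]
    values = {}
    for field in fields[1:]:
        # A colon separates the column id from the value.
        column_id, value = field.split(":", 1)
        values[column_id] = float(value)
    return row_id, values


def columns_from_rows(lines):
    """Regroup the parsed rows by column: column_id -> {row_id: value}."""
    columns = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        row_id, values = parse_sparse_row(line)
        for column_id, value in values.items():
            columns[column_id][row_id] = value
    return dict(columns)


if __name__ == "__main__":
    row_id, values = parse_sparse_row("1 1:10 2:5 6:10 10:12")
    print(row_id, values["2"])  # element (1,2) of the example row
```

Grouping the values by column is shown only because the columns are the objects being clustered; the script itself may organize the data differently.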
The proposed algorithm clusters the columns sequentially and writes the column ids in each cluster to the output file in CSV format. A sample data matrix is provided in the above download. To cluster the sample data matrix, execute the following command:
$ python seqclust.py -i wpair.matrix -c clusters.results