Ticket #19 (new task)

Opened 2 years ago

Last modified 1 year ago

identify frequent expressions in documents (after the tokenizer)

Reported by: wiking
Assigned to: somebody
Priority: normal
Milestone: major enhancements
Component: vectorizer
Version:
Severity: normal
Keywords:
Cc:

Description

Implement an Apriori-style algorithm that runs on the token sequence produced by the tokenizer and finds frequent expressions (sequences of words). See also the hitec1 code, and check out Ferenc Bodon's public Apriori implementation (http://www.cs.bme.hu/~bodon/en/apriori/index.html).
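
A minimal sketch of the idea, to make the request concrete. It is neither the hitec1 code nor Bodon's implementation. It assumes that an "expression" is a contiguous word sequence, that support is the number of documents containing the sequence, and that documents arrive as lists of tokens. The function name frequent_expressions and the parameters min_support and max_length are hypothetical. The Apriori property used for pruning: an n-word sequence can only be frequent if both of its (n-1)-word sub-sequences are frequent.

```python
from collections import Counter


def frequent_expressions(documents, min_support=2, max_length=5):
    """Return {ngram_tuple: document_count} for every contiguous word
    sequence found in at least `min_support` documents.

    Hypothetical helper; `documents` is an iterable of token lists.
    """
    documents = [list(tokens) for tokens in documents]
    frequent = {}
    previous = None  # frequent (n-1)-grams from the previous pass

    for n in range(1, max_length + 1):
        counts = Counter()
        for tokens in documents:
            seen = set()  # count each n-gram at most once per document
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                # Apriori pruning: skip candidates whose prefix or suffix
                # of length n-1 was not frequent in the previous pass.
                if previous is not None and (
                    gram[:-1] not in previous or gram[1:] not in previous
                ):
                    continue
                seen.add(gram)
            counts.update(seen)

        current = {g: c for g, c in counts.items() if c >= min_support}
        if not current:
            break  # no frequent n-grams, so no longer ones can exist
        frequent.update(current)
        previous = current

    return frequent


if __name__ == "__main__":
    docs = [
        ["support", "vector", "machine", "learning"],
        ["a", "support", "vector", "machine"],
        ["support", "vector", "classifier"],
    ]
    for gram, count in sorted(frequent_expressions(docs).items()):
        print(" ".join(gram), count)
```

Each pass scans the documents once and only counts candidates that survive pruning, so memory stays bounded by the number of frequent sequences rather than by all n-grams. Whether the vectorizer should then emit these sequences as additional features or replace the single tokens is left open here.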

Change History

12/21/08 13:39:15 changed by wiking

  • component changed from libhitec to vectorizer.