ARco: Association Rules collaborative tool
ARco is a versatile, complete and powerful suite for producing association rules, with a particular focus on gene-expression data, combining both gene-related descriptors and sample metadata.
ARco (short for Association Rules collaborative tool) integrates the typical steps of KDD (knowledge discovery in databases) procedures. The KDD process involves several steps, from selecting appropriate data and encoding it to identifying patterns.
The first step of the KDD process deals with the original data, which must be appropriately selected, filtered and transformed into a transaction file. Each transaction in this file is an experimental observation consisting of a set of items (e.g., items that appear together in the same shopping basket or on the same cash-register ticket; values in a row of a gene-expression file; or a set of keywords belonging to particular proteins).
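As an illustration of how a row of a gene-expression file could become a transaction, here is a minimal sketch. The function name `row_to_transaction`, the `_up`/`_down` item labels and the cut-off values of 1.0 and -1.0 are all hypothetical choices made for this example; they are not taken from ARco.

```python
def row_to_transaction(row, up=1.0, down=-1.0):
    """Turn one row of expression values into a set of items.

    `row` maps a column name to a numeric value (or None if missing).
    The thresholds and item labels are assumptions for illustration only.
    """
    items = set()
    for name, value in row.items():
        if value is None:
            # Incomplete data is simply skipped in this sketch.
            continue
        if value >= up:
            items.add(f"{name}_up")
        elif value <= down:
            items.add(f"{name}_down")
        # Values between the two thresholds produce no item.
    return items


# Hypothetical row: geneC is unchanged, geneD is missing.
row = {"geneA": 2.3, "geneB": -1.7, "geneC": 0.1, "geneD": None}
transaction = row_to_transaction(row)
```

Each row encoded this way would contribute one line to the transaction file.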
In this first step, prior knowledge of the application domain allows the data set to be cleaned and pre-processed by removing or filling in incomplete data, by reducing and transforming the data using the main item features, or by applying dimensionality or variable reduction, invariant representations, etc.
The second step focuses on finding frequent itemsets, that is, collections of items that appear together in different transactions more frequently than a given minimum support threshold. This step fits well with a typical pattern-finding procedure.
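The idea of support can be sketched with a small brute-force counter. This is not ARco's algorithm: it simply enumerates every item combination up to a hypothetical `max_size`, and it treats the threshold as inclusive (support greater than or equal to `min_support`), which is an assumption. The item names and transactions are invented for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: support} for itemsets meeting the support threshold.

    Support is the fraction of transactions that contain the itemset.
    """
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        # Count every sub-collection of this transaction once.
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical transactions mixing gene descriptors and sample metadata.
transactions = [
    {"geneA_up", "geneB_up", "tumor"},
    {"geneA_up", "geneB_up", "tumor"},
    {"geneA_up", "normal"},
    {"geneB_up", "tumor"},
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
```

With these four transactions, `{"geneB_up", "tumor"}` appears in three of them (support 0.75) and is kept, while `{"normal"}` appears in only one (support 0.25) and is dropped. Exhaustive enumeration grows quickly with transaction size, so a practical tool would use a pruning strategy instead.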
In the third step, frequent itemsets are combined to produce a set of association rules whose antecedents imply the consequent with a given probability (confidence).
In the last step, the expert analyses the rules using browsing and filtering procedures. The new knowledge is normally used to fine-tune the parameters and selection criteria of the iterative KDD process.
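The rule-generation step can be sketched as follows, taking confidence to be the standard ratio support(antecedent and consequent) / support(antecedent). The helper names `support` and `rules_from_itemset`, the inclusive `min_confidence` cut-off and the example data are assumptions for illustration, not ARco's implementation.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rules_from_itemset(itemset, transactions, min_confidence):
    """Split one itemset into antecedent -> consequent rules.

    Returns a list of (antecedent, consequent, confidence) tuples whose
    confidence is at least `min_confidence`.
    """
    itemset = frozenset(itemset)
    full_support = support(itemset, transactions)
    rules = []
    for size in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), size):
            antecedent = frozenset(combo)
            antecedent_support = support(antecedent, transactions)
            if antecedent_support == 0:
                continue  # antecedent never occurs, confidence undefined
            confidence = full_support / antecedent_support
            if confidence >= min_confidence:
                rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical transactions.
transactions = [
    {"geneA_up", "geneB_up", "tumor"},
    {"geneA_up", "tumor"},
    {"geneB_up", "tumor"},
    {"geneB_up", "normal"},
]
rules = rules_from_itemset({"geneA_up", "tumor"}, transactions, min_confidence=0.8)
```

Here every transaction containing `geneA_up` also contains `tumor`, so the rule geneA_up -> tumor has confidence 1.0 and is kept. The reverse rule tumor -> geneA_up has confidence 2/3 and is filtered out.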