Biometrical Letters vol. 47(1), 2010, pp. 45-56
WEIGHTED GENERALISED AFFINITY COEFFICIENT IN CLUSTER ANALYSIS OF COMPLEX DATA OF INTERVAL TYPE
Áurea Sousa1, Fernando Nicolau2, Helena Bacelar Nicolau3, Osvaldo Silva1
1Department of Mathematics, University of Azores, 9501-855-Ponta Delgada, Portugal, e-mail: aurea@uac.pt; osilva@uac.pt
2Department of Mathematics, FCT, New University of Lisbon, 2829-516-Caparica, Portugal, e-mail: fcnicolau@gmail.com
3Laboratory of Statistics and Data Analysis, FPCE, University of Lisbon, 1649-013-Lisboa, Portugal, e-mail: hbacelar@fpce.ul.pt
Complex Data Analysis is a relatively new field that provides a range of methods for analysing complex/symbolic data, and can be defined as the extension of standard data analysis to more complex data tables. Complex or Symbolic Data Analysis involves two steps: i) knowledge extraction from large databases, as in Data Mining; and ii) application of new tools to the extracted knowledge in order to extend Data Mining to Knowledge Mining. The weighted generalised affinity coefficient appears to be an appropriate resemblance measure between elements (statistical data units or variables) when dealing with complex data from large databases. In this work we apply two different processes to compute the weighted generalised affinity coefficient when the data units are described by variables whose values are intervals of the real axis.
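As a minimal sketch, and not necessarily either of the two processes applied in this work, an affinity-type resemblance between two interval-valued data units can be illustrated by assuming that values are uniformly distributed within each interval. Under that assumption, the affinity between two intervals A and B reduces to |A ∩ B| / sqrt(|A| · |B|), and the per-variable affinities are combined as a weighted sum. The function names `interval_affinity` and `weighted_affinity`, the uniformity assumption, and the default equal weights are illustrative choices, not taken from the text.

```python
import math


def interval_affinity(a, b):
    """Affinity between two intervals a = (lo, hi) and b = (lo, hi).

    Assumption (hypothetical, for illustration): each interval carries a
    uniform density, so the affinity is overlap / sqrt(len(a) * len(b)).
    """
    len_a = a[1] - a[0]
    len_b = b[1] - b[0]
    if len_a <= 0 or len_b <= 0:
        raise ValueError("intervals must have positive length")
    # Length of the intersection, zero when the intervals are disjoint.
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    return overlap / math.sqrt(len_a * len_b)


def weighted_affinity(unit_k, unit_l, weights=None):
    """Weighted sum of per-variable interval affinities between two units.

    unit_k, unit_l: lists of (lo, hi) intervals, one per variable.
    weights: non-negative weights, one per variable; equal weights summing
    to 1 are assumed when none are given.
    """
    p = len(unit_k)
    if len(unit_l) != p:
        raise ValueError("both units must be described by the same variables")
    if weights is None:
        weights = [1.0 / p] * p
    if len(weights) != p:
        raise ValueError("one weight per variable is required")
    return sum(w * interval_affinity(a, b)
               for w, a, b in zip(weights, unit_k, unit_l))


# Two units described by two interval-valued variables each.
unit_1 = [(0.0, 2.0), (0.0, 1.0)]
unit_2 = [(1.0, 3.0), (0.0, 1.0)]
similarity = weighted_affinity(unit_1, unit_2)
```

With weights that sum to 1, the resulting value lies in [0, 1]: it equals 1 when the two units have identical intervals on every variable and 0 when their intervals are disjoint on every variable.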
We present one example concerned with real data (with a known structure) in the field of Biometry, in which objects are described by variables whose values are intervals, in order to illustrate the effectiveness of Ascendant Hierarchical Cluster Analysis based on the weighted generalised affinity coefficient and classical and/or probabilistic aggregation criteria. In this example, we applied a method of validation to identify the best partitions.
Key words: Cluster Analysis, VL Methodology, Weighted Generalised Affinity Coefficient, Symbolic Data, Measures of Validation.