Pattern recognition seeks to identify and model regularities in empirical data by algorithmic processes. Successful application of the established methods requires good understanding of their behavior and also how well they match the application context. Difficulties can arise from either the intrinsic complexity of a problem or a mismatch of methods to problems. We describe some measures that can characterize the intrinsic complexity of a classification problem and its relationship to classifier performance. The measures revealed that a collection of real-world problems can span an interesting continuum between those easily learnable to those with no learning possible. We discuss our results on identifying the domains of dominant competence of several popular classifiers in this measurement space.
Advances in digital imaging technologies have led to accumulations of large data archives with rich multimedia contents, enabling both targeted pursuits and open-ended explorations of many kinds. A recent example is the Virtual Observatory that supports sharing of diverse and massive databases containing images, spectra, and catalogs among astronomical researchers. To maximize its advantages, flexible and effective data analysis tools that can handle large data volumes, diverse data types, a wide range of objectives, and highly variable demands on speed are in critical need. We discuss our experiences with Mirage (http://www.cs.bell-labs.com/who/tkh/mirage), a prototypical software for interactive pattern discovery, and its applications in the Virtual Observatory. We focus on how to organize the analysis tool to lay a solid foundation for meeting these requirements and enabling continuous growth.
Learning in everyday life is often accomplished by making many random guesses and synthesizing the feedback. Kleinberg's analysis of this process resulted in a new method for classifier design -- stochastic discrimination (SD). The method constructs an accurate classifier by combining a large number of very weak discriminators that are generated essentially at random. An important advantage is that classifiers designed by this way are insensitive to overtraining.
SD is an ensemble learning method in an extreme form. Studies on other ensemble methods for classification have long suffered from the difficulty of modeling the complementary strengths of the components. The SD theory addresses this rigorously via mathematical notions of enrichment, uniformity, and projectability.
In this tutorial we explain these concepts via a simple numerical example, with a focus on a fundamental symmetry in point set covering that is the key observation leading to the foundation of the SD theory. We illustrate, step by step, how the SD principle operates in this example. We then describe and discuss more sophisticated implementations for practical uses. We believe a basic understanding of the SD method will open the way to explorations of a new classifier technology, and lead to developments of better tools for analyzing other ensemble methods.
Multiple classifier systems are of two catagories. Decision optimization methods try to make best use of the decisions of several given, specialized classifiers. Coverage optimization methods try to generate sufficient classifiers to cover all blind spots. The central difficulty in both approaches is how to model and promote complementary advantages of different classifiers. Kleinberg's theory on stochastic discrimination offers a way to analyze the intricate relationship of three key concerns in pattern recognition: discriminative power, complementary information, and generalization power. I present an outline of this theory using a numerical example, with an emphasis on illustrating the symmetry of probabilities in feature or model spaces.
Despite encouragements from recent progress in ensemble approaches, classification methods seem to have reached a plateau of development. I argue that further advances depend on a better understanding of geometrical and topological characteristics of point sets in high-dimensional spaces, the preservation of such characteristics under feature transformations and sampling processes, and their interaction with geometrical models used in classifiers. I describe my attempts to measure such properties from data sets and relate them to classifier accuracies.
Multiple classifier methods are effective solutions to difficult pattern recognition problems. However, empirical successes and failures have not been completely explained. Amid the excitement and confusion, uncertainty persists in the optimality of method choices for specific problems due to strong data dependences of classifier performance. In response to this, I propose that further exploration of the methodology be guided by detailed descriptions of geometrical characteristics of data and classifier models.