## Classification Algorithms

As mentioned above, the computational algorithm must be designed to spot alterations in brain activity that correspond to a global change of state. This activity is represented by the set of binary time series recorded from many neurons, i.e., by spatiotemporal patterns. We therefore face the pattern classification problem mentioned above. To be useful, an algorithm must be optimized to (1) determine the minimal number of hypotheses (possible functional states) concerning the data set; (2) economize on data storage and subsequent data manipulation/calculation; (3) scale with increasing data sets and with the number of functional states; and (4) be robust. The approach to the problem we propose below is based on cluster analysis (Kaufman 1990) and on measures of dissimilarity between time series (see, for example, Kantz 1994; Schreiber 1997; Schreiber and Schmitz 1997).

In the first step, the data set will be split into J short time intervals by shifting a time window of length T. The time scale T can be varied for different purposes, and its choice is a compromise between speed and reliability of the analysis. Each window will be referred to as "an object" or entity, under the assumption that a window encompasses an unchanged functional state. Given a correct hypothesis about the number of clusters, K (e.g., K = 3 for three global functional states: wakefulness, sleep, and an uncertain state), the J different objects must be assigned to the K functional states.
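The windowing step can be sketched as follows. This is a minimal illustration; the function name, the non-overlapping windows, and the binary-matrix data layout are our assumptions for concreteness, not a prescribed implementation:

```python
import numpy as np

def split_into_windows(spikes, T):
    """Split binary spike data (n_neurons x n_samples) into
    J non-overlapping windows ("objects") of length T samples."""
    n_neurons, n_samples = spikes.shape
    J = n_samples // T
    # Each object is an (n_neurons, T) spatiotemporal pattern.
    return [spikes[:, j * T:(j + 1) * T] for j in range(J)]

# Example: 4 neurons, 1000 samples, windows of T = 100 samples -> J = 10 objects
rng = np.random.default_rng(0)
spikes = (rng.random((4, 1000)) < 0.05).astype(int)
objects = split_into_windows(spikes, 100)
```

In practice the window may also be shifted by less than T (overlapping windows), trading extra computation for a finer temporal resolution of state boundaries.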

The algorithm starts with K random clusters and then moves objects between them so that the variance within each cluster is minimal while the variance between clusters is maximal. This can be realized by minimizing a so-called cost function (Schreiber and Schmitz 1997). To implement this function, a measure of dissimilarity between objects must be chosen; it can be determined, for instance, by calculating Euclidean distances between objects in a multidimensional space. Figure C.13 shows a sketch of the average dissimilarity of object j to cluster k (the distance between j and k) and the average dissimilarity within cluster k. The optimization strategy for finding the absolute minimum of the cost function will employ simulated annealing (Kirkpatrick, Gelatt, and Vecchi 1983; Press et al. n.d.), which follows local gradients but can also move against the gradient in order to escape "local minima" that shadow the absolute minimum.
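The annealing scheme can be sketched as follows, assuming a precomputed J x J dissimilarity matrix D. The cost function (sum of average within-cluster dissimilarities), the Metropolis acceptance rule, and the cooling schedule shown here are standard choices, not the specific ones used in the cited work:

```python
import numpy as np

def anneal_clusters(D, K, n_steps=20000, T0=1.0, cooling=0.9995, seed=0):
    """Assign J objects (given their J x J dissimilarity matrix D) to K
    clusters by minimizing the total average within-cluster dissimilarity
    with simulated annealing: uphill moves are accepted with Boltzmann
    probability, allowing escapes from local minima."""
    rng = np.random.default_rng(seed)
    J = D.shape[0]
    labels = rng.integers(0, K, size=J)        # start from K random clusters

    def cost(lab):
        c = 0.0
        for k in range(K):
            idx = np.flatnonzero(lab == k)
            if len(idx) > 1:
                # average pairwise dissimilarity within cluster k
                c += D[np.ix_(idx, idx)].sum() / (2 * len(idx))
        return c

    E, T = cost(labels), T0
    for _ in range(n_steps):
        j = rng.integers(J)                    # pick a random object ...
        new = rng.integers(K)                  # ... and a candidate cluster
        if new == labels[j]:
            continue
        trial = labels.copy()
        trial[j] = new
        dE = cost(trial) - E
        if dE < 0 or rng.random() < np.exp(-dE / T):   # Metropolis rule
            labels, E = trial, E + dE
        T *= cooling                           # annealing (cooling) schedule

    # zero-temperature polish: accept only downhill single-object moves
    improved = True
    while improved:
        improved = False
        for j in range(J):
            for k in range(K):
                trial = labels.copy()
                trial[j] = k
                if cost(trial) < E - 1e-12:
                    labels, E, improved = trial, cost(trial), True
    return labels

# Example: two well-separated groups of five points each
pts = np.vstack([np.zeros((5, 2)), 10 * np.ones((5, 2))])
D = np.linalg.norm(pts[:, None] - pts[None], axis=-1)
labels = anneal_clusters(D, K=2)
```

For this toy matrix the two groups end up in different clusters; with real spike data the outcome depends entirely on the quality of the dissimilarity measure discussed next.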

The algorithm described above works well under the assumption that a correct dissimilarity measure has been determined. For time-series objects, in the simplest case, neuronal firing rates can be used as coordinates in a multidimensional space. However, this measure is rigid (although it has its own advantages), as it takes into account only local oscillatory properties. Another useful procedure is the dissimilarity-matrix calculation introduced by Schreiber and Schmitz (1997), based on the Grassberger-Procaccia cross-correlation sum (Grassberger and Procaccia 1983).
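The simplest measure mentioned above can be sketched directly: each object is reduced to its vector of per-neuron firing rates, and the dissimilarity matrix is the matrix of Euclidean distances between those vectors. This is an illustration of the rate-based option only, not of the cross-correlation-sum measure:

```python
import numpy as np

def firing_rate_dissimilarity(objects):
    """Simplest dissimilarity measure: each neuron's firing rate within a
    window is one coordinate, so every object becomes a point in an
    n_neurons-dimensional space; return Euclidean distances (J x J)."""
    rates = np.array([obj.mean(axis=1) for obj in objects])
    diff = rates[:, None, :] - rates[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Two artificial "states": low-rate and high-rate windows of 3 neurons
low = [(np.random.default_rng(i).random((3, 200)) < 0.02).astype(int)
       for i in range(3)]
high = [(np.random.default_rng(i).random((3, 200)) < 0.30).astype(int)
        for i in range(3)]
D = firing_rate_dissimilarity(low + high)
```

As expected, windows drawn from the same rate regime are much closer to each other than to windows from the other regime, but any temporal structure within the window is discarded, which is exactly the rigidity noted above.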

The classification algorithm given here may be referred to as unsupervised: it relies on the hypothesis of a "good" dissimilarity measure and does not include any supervised optimization. The approach can be upgraded with a supervised training data set, in which the correct classification is known a priori for part of the data and can be used as feedback to improve computational speed and reliability. However, even after tuning, the algorithm may fail, since brain plasticity may occur; such sudden mistakes may likewise be corrected by means of the feedback.
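One way such feedback could be wired in is sketched below: cluster ids are mapped to known state names by majority vote over the labeled part of the data, and the resulting agreement score is the quantity that tuning of T, K, or the dissimilarity measure would try to increase. The function and its signature are our illustration, not part of the cited method:

```python
import numpy as np

def map_clusters_to_states(labels, known_idx, known_states):
    """Supervised feedback sketch: for objects whose true functional
    states are known a priori, map each cluster id to the state that
    occurs most often in it, and report the agreement on that subset."""
    mapping = {}
    for k in set(labels[known_idx]):
        states_in_k = [known_states[i]
                       for i, j in enumerate(known_idx) if labels[j] == k]
        mapping[k] = max(set(states_in_k), key=states_in_k.count)
    agreement = np.mean([mapping[labels[j]] == known_states[i]
                         for i, j in enumerate(known_idx)])
    return mapping, agreement

# Toy check: clusters 0 and 1 line up perfectly with the known labels
labels = np.array([0, 0, 1, 1, 0])
mapping, agreement = map_clusters_to_states(
    labels, [0, 1, 2, 3], ["wake", "wake", "sleep", "sleep"])
```

A drop in the agreement score over time would then signal exactly the plasticity-induced failures mentioned above.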

The basic problem here is the nonstationary nature of brain function. This seems at first glance to be a major obstacle for any time-series analysis. However, a detailed study of the problem indicates two features: First, all functional states are temporary and have essentially different time scales. For example, being awake can last for hours, while cognition can be as short as tens of milliseconds. Second, we may assume that only a limited number of functional states can coexist. These two features allow building a new adaptive algorithm capable of discriminating, in principle, any possible functional state.

There are three main parameters at play. The first is the length of the time window, T; the next is the number of clusters of objects, K, being separated; and the third is the dissimilarity measure. We can start the process of classification with a relatively long T and a small K. Fast processes (functional states) are then eliminated by averaging over the protracted window. Moreover, functional states with intermediate time scales and with a strong influence on others are left out by this very rough classification, since we have split the patterns into a few clusters only. Then, when a first approximation of the cluster boundaries has been determined and can reliably detect the functional states of the top level, a step down can be taken by decreasing the window size T and including finer functional states (increasing K). Moreover, it is possible to work "within" a functional state of the upper level and reject all non-fitting objects. This modification of the algorithm provides scalability and a method for exploring all possible functional states. One problem here is that the deeper we go into the functional-state hierarchy, the heavier the computation needed. However, the main parts of the algorithm can easily be parallelized and hence performed effectively by parallel computers or even by specially designed electronics.
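The coarse-to-fine descent can be sketched as a loop over a schedule of (T, K) pairs, where each level reclassifies only the windows assigned to one chosen state of the level above. The schedule format, the `follow_state` parameter, and the toy rate-based clusterer are our assumptions for illustration; any clustering routine returning one label per object could be substituted:

```python
import numpy as np

def rate_clusters(objects, K):
    """Toy stand-in clusterer: quantile-bin windows by overall firing rate."""
    rates = np.array([o.mean() for o in objects])
    edges = np.quantile(rates, np.linspace(0, 1, K + 1)[1:-1])
    return np.digitize(rates, edges)

def classify_hierarchically(spikes, schedule, cluster_fn, follow_state=0):
    """Coarse-to-fine sketch of the adaptive scheme: start with a long
    window T and few clusters K, then, within the windows assigned to one
    top-level state, reclassify with a shorter T and (possibly) larger K.
    `schedule` is a list of (T, K) pairs ordered from coarse to fine."""
    segment = spikes                     # start from the full recording
    results = []
    for T, K in schedule:                # descend the state hierarchy
        J = segment.shape[1] // T
        objects = [segment[:, j * T:(j + 1) * T] for j in range(J)]
        labels = cluster_fn(objects, K)
        results.append(labels)
        # keep only the windows assigned to the state we follow downward
        keep = [objects[j] for j in range(J) if labels[j] == follow_state]
        if not keep:
            break
        segment = np.concatenate(keep, axis=1)
    return results

# Example: two levels, halving the window size at the second level
rng = np.random.default_rng(0)
spikes = (rng.random((4, 1000)) < 0.1).astype(int)
levels = classify_hierarchically(spikes, [(100, 2), (50, 2)], rate_clusters)
```

Because each level touches only a shrinking subset of the data, and the per-window work is independent, both the growth of computation with depth and the parallelization noted above are visible directly in this structure.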
