## Global Optimization Methods [2829

A much more challenging task than local optimization methods is to find the global minimum of a multi-valley energy landscape as shown in Figure 1. Global Optimization problems involving a given cost function (minimization of energy or maximization of entropy) arise in many simulation problems dealing with nano systems. This subject has received a great deal of attention in recent years, mostly due to the rapid increase in computer power. The symbolic picture shown in Figure 1 provides a rather simple two-dimensional example of the global optimization paradigm. In this figure the sphere symbolizes a molecule the energy, E(x,y), of which changes as the coordinates (x,y) change. The global optimization objective here is to locate the coordinates (x,y) for which the molecule's enerngy, E(x,y), has its absolute minimum. Of course, this figure shows only a portion of the total energy domain.

For a multi-dimensional space, like phase space, the number of local minima tends to increase quite rapidly and possibly exponentially depending on complexity of the molecule under consideration.

A number of computational algorithms have been developed recently for global optimization of molecular simulations. Among those algorithms the simulated annealing and the genetic algorithms have found more applications in structural and dynamic simulation optimization of molecular clusters. Below, we will describe briefly these two methods. Figure 1. A two-dimensional example of the global optimization of a molecule's energy E(x,y). The objective here is to locate the coordinates (x,y) for which the molecule's energy has its absolute minimum.

1. Simulated Annealing Method: The physical process of annealing is how a metallurgist reaches experimentally the strongest state of a metal sample by following the following three steps:

(i) The metal sample is put in an oven and it is heated to high temperatures.

(ii) At high temperatures the metal atoms are randomly distributed and move about at high speeds.

(iii) Then the sample is cooled very slowly to its original temperature. As a result the atoms settle into crystalline structures, minimizing their energy and rendering the metal much stronger than before.

The implementation of simulated annealing as a global optimization technique was first proposed by Kirkpatrick et al. . This method has attracted significant attention due to its suitability for those large scale optimization problems in which a desired global minimum is hidden among many local minima.

Simulated annealing distinguishes between different local optima and is analogous to the physical process of annealing a metal described above. Namely, one starts the simulation at a sufficiently high temperature. Then the temperature is gradually lowered during the simulation until it reaches the global-minimum-energy state (like physical annealing). The rate of decrease of temperature must be kept sufficiently slow so that thermal equilibrium is maintained throughout the simulation. As a result, only the state with the global energy minimum is obtained. Otherwise, if the temperature decrease is rapid (like hot metal quenching), the simulation will be trapped in a state of local minimum energy.

The statistical mechanical basis of simulated annealing is the Metropolis algorithm presented in the previous chapter. According to the canonical ensemble the probability distribution of a state with energy Ej is expressed by the Boltzmann distribution,

where p=1/kBT and with the normalization condition, f Pi (T,e)d ei = 1.

In eq. (15), n(e) represents the number of states with energy Ei which is an increasing function of energy. On the other hand, the Boltzmann factor exp(-j8.E;) is an exponentially decreasing function of E. This gives the probability distribution Pi (T, Ei) a well-known bell-shape functional form as shown in Figure 2.

At high temperatures, p=1/kBT is obviously small, and as a result exp(-8.ei) decreases slowly with ei . This causes the probability, pi (T,e) to have a wide and short bell-shape. On the other hand, at low temperatures ^ is large, and exp(-^.ei) decreases rapidly with ei. So, pi (T,e) has a narrow, but tall bell-shape (see Figure 2).

Figure 2. Symbolic variation of the Boltzmann probability distribution function with changes of temperature.

The process of a simulated annealing global optimization of a molecular simulation should possess the following stages:

1. Start the molecular simulation at a rather high temperature to allow the system to rearrange itself from its status quo.

2. Lower the temperature rather slowly to bring the system into a stable (equilibrium) state. Figure 2. Symbolic variation of the Boltzmann probability distribution function with changes of temperature.

3. Repeat the cycle numerous times so that as many conformations as possible are obtained. Then the global optimum conditions can be found by analysing the results of these conformations.

In simulations it does not matter what dynamics is followed to cross the barriers since the system will be given enough time to come to a stable equilibrium.

The computational code to be written for a simulated annealing should consists of a pair of nested DO-loops. The outer loop varies the temperature according to a "cooling schedule" and the inner loop consist of a Metropolis algorithm at the chosen temperature of the outer loop.

The choice of the cooling schedule is quite important in simulated annealing global optimization. There is a variety of simple cooling schedules to choose from. Amoung them, linear cooling schedule,

and algebraic cooling schedule,

are the most common ones used in practice. Generally, the user may choose an appropriate cooling schedule it by trial and error.

One can also use the adaptive simulated annealing method as proposed by Andersen and Gordon  in which the cooling schedule is chosen from the following evolutionary expression for the temperature:

In this equation T is the temperature, Q is time, v is the velocity of annealing (just a constant coefficient controlling the overall cooling rate), C is the heat capacity and £ is the relaxation time,

coming from the second largest eigenvalue of the transition matrix. To construct the transition matrix, one can use a simple model and classify states according to their energy:

1. First, some energy range is defined within which the total energy of the system is fluctuating.

2. This energy interval is then divided into M equal parts (M could be of the order of 100 or even more). Each interval corresponds to one state meaning that if the energy of the system is between E and Ei+1, we say the system is in the state i (i=1,M).

3. Then the system is allowed to evolve at the temperature T according to some dynamics, be it Langevin dynamics or a Markov process defined by the Metropolis algorithm, or even Newtonian dynamics.

4. Then during some time steps P, say of the order of P= 1000 (this really depends on the size of the system and the complexity of the energy landscape; larger or more complex systems require a larger number of steps), we record the energy of the system, and will increment the matrix element (i,j) of the transition matrix by one unit every time the system goes from state i to state j.

5. If the energy remains in the same interval j, the (j,j) element must of course be incremented by one.

6. At this stage, the elements of each column are rescaled so that their sum is equal to unity. This is the sum rule for the transition matrix as we mentioned in the previous chapter on MC.

7. One now needs to diagonalize this matrix. Its highest eigenvalue must be one, and the corresponding eigenvector contains the equilibrium probabilities for the system to be in each of the M states. From the second eigenvalue one obtains the relaxation time,

The heat capacity may be found from the fluctuations in the energy during these P steps run:

8. Now that all the elements of the cooling equation are known, one can update the temperature according to: T = T - vT/ e4C and rerun the dynamics for another P steps, collect the statistics to calculate C and the new transition matrix, and then recalculate the temperature and so on, until it reaches a desired (low) value where the energy fluctuations are below some tolerance level.

2. Genetic Algorithm: Genetic Algorithm (GA) [35,36] is based on concepts coming from genetics and the "survival of the fittest' idea; being fit meaning having a lower energy. Genes represent the identity of the individuals. In molecular simulation cases, individuals are clusters and their genes are their coordinates. In GA, individuals are mated among themselves, their genes may also mutate, and this way, one obtains a new generation of individuals selected based on their fitness (potential energy).

The mating, also called the crossover operation, consists of taking two individuals at random, combining half the genes of one with half the genes of the other to obtain a new individual. Note that there are many ways to perform this operation, and one may even produce from a single pair, a next generation of many more individuals which are formed from taking many combinations of the genes of their parents. A priori, if we start with N individuals in generation P, one may end up after mating, with say, 20 N individuals in the next generation. One can eventually perform mutations on them. Mutations consist in randomly modifying part of their genes. In our context, mutation means displacing one or more atoms in a cluster at random. Note that mutation is a local operation, whereas mating or crossover is non-local.

Now from the 20 N obtained individuals, one can first perform a quick structure optimization, and then sort their energies and pick the N lowest energy individuals as survivors for the next generation. One must, however, be careful to choose candidates of different geometrical forms so as to keep diversity. It would be completely useless to keep N clusters of the same shape for the next generation, even though their energies were the lowest. Now, one can iterate this process and perform mating, mutation, relaxation and final selection based on shape diversity to obtain a new generation and so on... One is hopeful after few hundred or thousand generations (again this depends on the cluster size and complexity of the problem) the fittest individuals will be found. There is however, no guarantee of the convergence of the process.

In SA or GA, one is just sure to have found a good minimum if not the absolute one.

The choice of genes, mating and mutations in GA and the dynamics in SA depend on the problem and can be tailored to the specific problem at hand, just like sampling in Monte Carlo which is not unique to the Metropolis algorithm.

0 0