Genetic Algorithms, Using JGAP, for the Examinations Timetabling Problem

Hello again,

Today I want to test JGAP, so I need a relatively simple optimization problem. The University of Toronto Examination Timetabling Problem (ETP) is an interesting benchmark problem that is fairly simple to implement in code, based on real universities examination problem instances.

The University of Toronto ETP

The University of Toronto benchmark problems have very simple legality and cost evaluation procedures, so it is very attractive for a realistic test of a framework like JGAP. If you are interested in this problem details, Dr. Rong Qu provides the datasets from her website, as well as a survey paper she has co-authored on the subject.

The ETP problem involves students, time periods and examinations. A candidate examination  timetable is valid only if there is no student who has to sit for more than one examination at the same period. In order to achieve period spread, the evaluation function penalizes examinations scheduled at neighboring periods. Thus, for each student who has to sit for two different examinations, a cost is produced based on the time distance: From one to five periods, the cost weights  are 16, 8, 4, 2 and 1 respectively. So, the total cost for every examinations pair is the sum of the common students times the cost weight corresponding to their scheduled time distance. Finally, the resulting cost is normalized (a.k.a. divided) by the total number of students.

Genetic Algorithms

The Genetic Algorithm (GA) is a population-based search procedure, inspired by the natural evolution process. The GA starts with a population of candidate solutions. Each solution is decomposed into elementary features, named genes, while the sequence of the genes is called a chromosome. All genes take values from a predefined set of values. Each of these values is named allele. From this population, pairs of parents are selected with preference to better individuals, based on the evaluation function.These pairs produce offspring by a recombination of their genes, named crossover. A typical crossover operator (single-point crossover) cuts each parent chromosome at random points and recombines these parts to produce two offspring. After the recombination, a mutation operator is applied. Typical mutation is to have a (very low) probability to flip a random gene to a random allele.The reason for the very low probability arises from the fact that a chromosome has many genes and we have a large population: We don’t want mutation to affect many all individuals and at many genes.

Encoding

The first important choice we have when we’re implementing a GA is the chromosome encoding: How does a chromosome look like, or what are the alleles. There are many options here, we can be creative, but we need to keep in mind that the encoding has to satisfy several major goals: The encoding must not favor the search towards a specific direction of the solution space. The encoding should help the algorithm to avoid illegal solutions. Also, the encoding should not make evaluation difficult. Limiting the search space towards several directions by encoding is obviously bad. If you can’t enter the optimal solution area, how can you find near optimal solutions? Illegal solutions is not generally a problem for the GA, since illegality can be penalized in the evaluation function. Also, there are several meta-heuristics that go through illegal space in order to find a good solution – think Path Relinking, which is usually combined with the evolutionary algorithm Scatter Search.

Without making a big deal about it, we’ll go with a typical ETP encoding: Each allele represents a time period and each gene represents an examination. It’s an almost straightforward representation of the timetable while it does avoid obvious illegal situations. I.e. the timetable representation, to have genes representing time units and examinations being part of the allele, would be a bit difficult to grasp in crossovers. If we used the students as genes, then the allele should represent an individual timetable, which again is difficult to grasp. Such complex allele structures are often presented in the literature with interesting results. At this time through, I don’t want to be creative, I just look forward to get into the code and play with JGAP.

Implementation

At last, I thought I wouldn’t reach that part today with all the babbling above :-). JGAP is available from http://jgap.sourceforge.net with plenty of tutorials, documentation and other resources. We need to extract jgap.jar from the archive and add it to our classpath. Since I already have a Java project available that is able to read Toronto datasets I’ll start from there.

Fitness Function

The first thing we have to implement is the fitness function. This is a skeleton class to implement the Fitness function.

public class CarterFitnessFunction extends FitnessFunction {
	private static final long serialVersionUID = -7426146307775442024L;

	@Override
	protected double evaluate(IChromosome arg0) {
		return 0;
	}

}

The ETP is formulated as a minimization problem, i.e. the objective function is actually a cost function, while JGAP needs a fitness function, which is a maximization/benefit function. I’m going to play with the KFU-93 instance with a best in the area of 12.96 – I actually remember a better number in a paper, like 12.89, but it doesn’t bother me at the moment. Generating random schedules easily creates legal schedules at cost more than 50. What I want to stress in this discussion is the following issue: I need to distinguish good from bad solutions and I need to do that in a uniform way. If I do something like fitness = 200 – cost, a not bad working solution at 13.50 would render to a fitness of 186.5, while the known best is 187.04. What’s important, 187.04-186.5=0.54 is equal to 13.50-12.96, but 0.54 in 12.96 is way more important difference than 0.54 in 187.04. So, we loose information and we need somehow to restore the relative evaluations between different timetables. This issue can have large effect in phenomena like selection pressure.

After having mentioned the fitness versus cost issue, I ‘m going to ignore it for a first version and just use the buggy fitness = 200-cost formula. Also, I’m going to have a worst fitness of 200:

fitness = (cost<200.0) ? 200.0 - cost : 200.0

Also, I am going to reward legal solutions by 2*200, so that legal solutions will be far better than illegal ones.

The resulting fitness function class is the following – this is a highly non-optimized reference implementation:

package ga;
import model.Problem;

import org.jgap.FitnessFunction;
import org.jgap.IChromosome;

public class CarterFitnessFunction extends FitnessFunction {
	private static final long serialVersionUID = -7426146307775442024L;
	Problem p;
	double w[] = {0.0, 16.0, 8.0, 4.0, 2.0, 1.0};

	public CarterFitnessFunction(Problem p) {
		this.p = p;
	}

	private double getCost(IChromosome timetable) {
		double cost = 0.0;
		int numExams = p.E;

		for(int exam1=0; exam1<numExams; exam1++) {
			int period1 = ( (Integer)
				timetable.getGene(exam1).getAllele()).intValue();
			for(int exam2=exam1+1; exam2<numExams; exam2++) {
				int period2 = ( (Integer)
					timetable.getGene(exam2).getAllele()).intValue();
				int d = Math.abs(period1 - period2);
				if(d<6)
					cost += w[d]*p.getConflicts(exam1, exam2);
			}
		}
		return cost/p.S;
	}

	private boolean isLegal(IChromosome timetable) {
		int numExams = p.E;

		for(int exam1=0; exam1<numExams; exam1++) {
			int period1 = ( (Integer)
				timetable.getGene(exam1).getAllele()).intValue();
			for(int exam2=exam1+1; exam2<numExams; exam2++) {
				int period2 = ( (Integer)
					timetable.getGene(exam2).getAllele()).intValue();
				if(period1 == period2) {
					if(p.areExamsInConflict(exam1, exam2))
						return false;
				}
			}
		}
		return true;
	}

	@Override
	protected double evaluate(IChromosome timetable) {
		double cost = getCost(timetable);
		double maxFit = 200.0;
		double fitness;

		fitness = (cost<maxFit) ? maxFit - cost : maxFit;
		if(isLegal(timetable))
			fitness += 2*maxFit;
		return fitness;
	}
}

If you look closely, the 200 limit has been changed with a variable maxFit. Since the Problem class is not presented: Problem.getConflicts() returns the number of common students between two examinations, while Problem.getE() returns the total number of examinations in a problem. Problem.areExamsInConflict() returns true if two exams have common students. In order to get various features of the solution, we use the JGAP IChromosome.getGene(). The Gene.getAllele() returns the value of the gene. In our case, the set of possible alleles is an integer value between 0 and the number of periods. That’s why we convert the allele from Object to Integer.

Basic Genetic Algorithm

Now that we have an encoding and a fitness function, we need to instruct JGAP to implement the genetic algorithm. In JGAP, we need to create a configuration. The following code is used to create a Default GA configuration. It seems that the code comments make it easier to follow what is going on there, so please check them.

	// Create a GA configuration and use our fitness function
	Configuration conf = new DefaultConfiguration();
	FitnessFunction fitnessFunction = new CarterFitnessFunction(p);
	conf.setFitnessFunction(fitnessFunction);

	// Create a sample chromosome so that JGAP can create new ones
	Gene[] sampleGenes = new Gene[p.E];
	for (int i = 0; i < p.E; i++)
		sampleGenes[i] = new IntegerGene(conf, 0, p.P - 1);
	Chromosome sampleChromosome = new Chromosome(conf, sampleGenes);

	// Use the sample chromosome in the configuration
	conf.setSampleChromosome(sampleChromosome);

	// Set the population size
	conf.setPopulationSize(100);

	// Initialize the population with random members
	population = Genotype.randomInitialGenotype( conf );

The above code seems pretty straightforward. I avoided to enable mutation at this point. I suppose one would want to study the available JGAP configurations and what options they provide.

Now that we have the GA configuration in place, we need to start evolving. To evolve a single generation and get the best chromosome, we need a code like this:

	population.evolve();
	IChromosome bestsol = population.getFittestChromosome();
	System.out.println("Best chromosome: " + bestsol.getFitnessValue());

Let’s put it all together in a class. Also, we need to create a way to evolve for a number of generations and report progress at intervals:

package ga;

import model.Problem;

import org.jgap.Chromosome;
import org.jgap.Configuration;
import org.jgap.FitnessFunction;
import org.jgap.Gene;
import org.jgap.Genotype;
import org.jgap.IChromosome;
import org.jgap.InvalidConfigurationException;
import org.jgap.impl.DefaultConfiguration;
import org.jgap.impl.IntegerGene;

public class SimpleGaAlg {
	Problem p;
	Genotype population;

	public void create(Problem p) throws InvalidConfigurationException {
		this.p = p;

		// Create a GA configuration and use our fitness function
		Configuration conf = new DefaultConfiguration();
		FitnessFunction fitnessFunction = new CarterFitnessFunction(p);
		conf.setFitnessFunction(fitnessFunction);

		// Create a sample chromosome so that JGAP can create new ones
		Gene[] sampleGenes = new Gene[p.E];
		for (int i = 0; i < p.E; i++)
			sampleGenes[i] = new IntegerGene(conf, 0, p.P - 1);
		Chromosome sampleChromosome = new Chromosome(conf, sampleGenes);

		// Use the sample chromosome in the configuration
		conf.setSampleChromosome(sampleChromosome);

		// Set the population size
		conf.setPopulationSize(100);

		// Initialize the population with random members
		population = Genotype.randomInitialGenotype( conf );
	}

	public void evolve(int numberGens) {
		int step = 10;
		for(int i=0; i<numberGens; i+=step) {
			for(int s=0; s<step; s++)
				population.evolve();
			IChromosome bestsol = population.getFittestChromosome();
			System.out.println("Iteration " + (i+step) + ", Best chromosome: " + bestsol.getFitnessValue());
		}
	}
}

In order to get the algorithm started, we need to add (i.e. in main) the following code:

	SimpleGaAlg ga = new SimpleGaAlg();
	ga.create(problem);
	ga.evolve(10000);

Conclusions

This is a first attempt to implement Genetic Algorithms using JGAP. The presented algorithm did not reach a legal solution after 1000 generations, but either way, I didn’t expect it to reach good, not even legal solutions. Scheduling problems are in general over-constrained and the random recombination implemented in the genetic algorithm can easily get lost into illegal space. Also, various construction heuristics  for the ETP are described in the literature and many are simple graph theory implementations. Such heuristics can provide a good and legal initial population for the GA to start.

As following actions from here, one would want to learn the JGAP options for various GA methods. Also, GA is easy to implement in parallel/distributed manner, so I would also see what does JGAP offer towards cluster, multicore and manycore computing.

Advertisements

Posted on August 22, 2011, in Optimization and tagged , , . Bookmark the permalink. 2 Comments.

  1. thanks a lot for this explanation
    can you sir please tell me where can i find more examples of the implementation of genetic algorithm with jgap
    cordiale salutations

  2. Hello, Have you checked the tutorials on the jgap web site? They have many examples there. http://jgap.sourceforge.net/

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

%d bloggers like this: