
UNIVERSIDAD PONTIFICIA COMILLAS

ESCUELA TÉCNICA SUPERIOR DE INGENIERÍA (ICAI)

INGENIERÍA SUPERIOR INDUSTRIAL

PROYECTO FIN DE CARRERA

DESARROLLO DE ALGORITMOS

METAHEURÍSTICOS PARA LA

PLANIFICACIÓN DE UN

TALLER FLEXIBLE

SARA LUMBRERAS SANCHO

MADRID, junio de 2006


Submission of the project by the student is hereby authorized:

Sara Lumbreras Sancho

THE PROJECT DIRECTOR

Ángel Sarabia Viejo

Signed: Date:

Approval of the Projects Coordinator

Claudia Meseguer Velasco

Signed: Date:


“The general jobshop problem is a fascinating challenge. It is easy to pose and visualise, but extremely difficult to solve. The problem keeps attracting researchers who cannot believe that so simply structured a problem can be so difficult until they try.”

Richard W. Conway et al. (1967)


Acknowledgements

To everyone who has helped me throughout this long stage as a student. To my parents, to my sister Esther, and to the teachers and supervisors – especially some of them – that I have had from childhood until now. To everyone who has taught me, and keeps teaching me, to learn, to work, to live.

I would also like to acknowledge some specific favours that supported this particular project. Again, to Esther, for her wonderful illustrations. To Paloma, for her private biology lessons. To Josemaría, for listening attentively to every step of progress.

And, above all, to Ángel. You have made me enjoy this immensely.


Abstract

Classical optimization methods are not well suited to the NP-complete problems of scheduling, on which they often consume unacceptable computation times. In these cases it is advisable to apply heuristic techniques, which give up the global optimum in exchange for very satisfactory solutions in affordable times. The sequencing of tasks in a shop is one of the most interesting manifestations of this reality, both for the diversity of environments in which it appears and for its complexity, which has motivated an extensive literature.

The case under study is unusually general, covering most of the possible complexities of deterministic models. The shop is of the flexible type, which implies that an operation can be performed on several different machines, naturally with different processing times. In addition, sequence-dependent changeover times and transportation times that may depend on the batch size or on the particular job have been considered.

The objective of this work is the resolution of the shop problem in this general version, through the development and study of new algorithms capable of solving it in times far below those required by classical methods.

Solely in order to compare the behaviour of the developed algorithms with the results of traditional techniques, a linear programming model was built. The core of the objectives, however, was the development and implementation of the heuristics.

To this end, a codification based on job lists was implemented. Although the fundamental idea turned out not to be original (it is suggested, though not used, in [MESOUGNI 04], whose author did not seem to be aware of its advantages), the implementation and the generalization to a double list, which makes it applicable to the flexible version of the problem, are original. This system has the considerable advantage of codifying only feasible solutions, which results in simpler and more efficient algorithms.


A simulated annealing algorithm has been developed that differs from conventional implementations of this paradigm in its most basic structure, which, as a novelty, is organized in two steps: each iteration first performs an improvement search in the neighbourhood of the last accepted solution; only if this first stage returns no satisfactory result is the possibility of thermodynamic jumps – the basis of annealing – then enabled. In addition, a particularly advantageous neighbourhood definition is proposed, together with two new cooling rules – a fundamental part of this kind of algorithm – the Convection Plan and a Dynamic Rule, based respectively on the industrial applications of annealing and on the dynamic control of the jump probability. Both plans stand out for their efficiency and robustness.

A genetic algorithm was also created, for which versions of the conventional genetic operators (generation, selection, crossover, mutation and immigration) that exploit the advantages of the chosen codification have been developed. Furthermore, a new mechanism is introduced that increases efficiency through an original imitation of the asexual reproduction processes of some insects, and which has been named parthenogenesis.

Three hybrids combining the above algorithms have been designed with the aim of improving their behaviour:

A sequential family hybrid, characterized by applying simulated annealing to a set of solutions selected as the most promising ones of the previous stage.

A hybrid christened non-Darwinian, because it develops the idea that mutations are not really random but advance in directions of improvement.

A compound algorithm that combines the non-Darwinian and the original versions sequentially, improving on the efficiency of all the other algorithms developed.

In addition, a further generalization allows working with random times through the definition of a consistent distribution function, thus making it possible to solve a stochastic shop.


The possibility of failure of one of the machines is also considered, allowing robust solutions to be obtained.

The Theory of Constraints is applied to the solution to obtain supporting data for management decisions. The tool designed identifies bottlenecks for the sequence of tasks to be performed and analyses the temporal evolution of the work in process.

Finally, the quality of the solution obtained is evaluated by comparing it with a bound obtained through a proposed heuristic and with a value computed by fitting a non-parametric distribution (another original contribution of this project) to the results of the evolution of the objective function.


Summary

Classical optimization methods are not suitable for NP-complete scheduling problems, on which they often take unacceptable computation times. In these cases it is advisable to resort to heuristic techniques, which sacrifice global optimality in order to provide very satisfactory solutions in affordable times.

Jobshop scheduling is one of the most interesting manifestations of this fact, not only because of the diversity of situations where it arises but also because of its complexity, which has motivated extensive research. The case studied here is unusually general, covering the majority of the possible complexities of deterministic models. The shop is flexible, which implies that each operation can be performed on several different machines, of course with dissimilar processing times. Sequence-dependent setup times, as well as transportation times that may vary with the batch size or with the particular job, have also been considered.

The objective of this project is to solve the jobshop problem in this general version by developing new algorithms capable of solving it in computation times far below those required by classical methods.

With the sole objective of comparing the performance of these algorithms with that of the traditional approaches, a linear programming model was built. Nevertheless, the core objective was the development and implementation of the metaheuristics.

With this aim, a codification system based on job lists was implemented. The fundamental idea turned out not to be original (it is suggested but not used in [MESOUGNI 04], whose author does not seem to be aware of the advantages of this codification). Nevertheless, both its implementation and its generalization to a double list, which enables its application to the flexible version of the problem, are an innovation. This system shows the considerable advantage of codifying only feasible solutions, which results in simpler and more efficient algorithms.
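As an illustration of why a feasible-only codification simplifies the algorithms, the sketch below decodes a double-list chromosome (a job list plus a machine choice per operation) into a schedule. All names, data shapes and the exact decoding rule are illustrative assumptions, not the implementation developed in this project.

```python
# Hypothetical sketch of a double-list codification; NOT the thesis code.
# job_list: job ids in processing order; the k-th occurrence of job j means
#           the k-th operation of job j (any permutation is feasible).
# machine_choice: for each position of job_list, the machine chosen among
#           the operation's alternatives (flexible jobshop).

def decode(job_list, machine_choice, proc_times):
    """proc_times[job][op][machine] -> processing time on that machine."""
    machine_free = {}          # machine id -> time it becomes available
    job_free = {}              # job id -> completion time of its last operation
    next_op = {}               # job id -> index of its next operation
    for pos, job in enumerate(job_list):
        op = next_op.get(job, 0)
        m = machine_choice[pos]
        start = max(machine_free.get(m, 0), job_free.get(job, 0))
        finish = start + proc_times[job][op][m]
        machine_free[m] = finish
        job_free[job] = finish
        next_op[job] = op + 1
    return max(job_free.values())  # makespan

# Two jobs, two operations each, two machines (times chosen arbitrarily).
times = {0: [{0: 3, 1: 5}, {0: 2, 1: 2}],
         1: [{0: 4, 1: 1}, {0: 3, 1: 6}]}
print(decode([0, 1, 0, 1], [0, 1, 1, 0], times))  # -> 6
```

Because every (job list, machine choice) pair decodes to a valid schedule, operators never need a repair step.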

A Simulated Annealing algorithm was developed that differs from conventional implementations of this paradigm in its fundamental structure, innovatively composed of two steps. Each iteration first searches for better solutions in the neighbourhood of the latest accepted solution. Only if this search returns no satisfactory result is the possibility of thermodynamic jumps – the basis of Simulated Annealing – enabled. In addition, a specially advantageous neighbourhood definition is proposed. Two innovative cooling plans – which are essential to these algorithms – are also introduced. Named the Convection Plan and the Dynamic Rule, they are based respectively on the industrial applications of annealing and on dynamically controlling the jump probability. Both have shown remarkable efficiency and robustness.
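The two-step iteration can be sketched as follows. This is a hedged outline only: the neighbourhood function is abstract, and a plain geometric cooling rule stands in for the Convection Plan and Dynamic Rule, which are not reproduced here.

```python
import math
import random

def two_step_sa(initial, cost, neighbours, t0=10.0, alpha=0.95, iters=200, seed=1):
    """Hedged sketch of the two-step annealing iteration described above.
    neighbours(x) must return a list of candidate solutions."""
    rng = random.Random(seed)
    current, t = initial, t0
    best = current
    for _ in range(iters):
        cands = neighbours(current)
        better = [n for n in cands if cost(n) < cost(current)]
        if better:                      # step 1: plain improvement search
            current = min(better, key=cost)
        else:                           # step 2: thermodynamic jump enabled
            n = rng.choice(cands)
            delta = cost(n) - cost(current)
            if rng.random() < math.exp(-delta / t):
                current = n
        if cost(current) < cost(best):
            best = current
        t *= alpha                      # placeholder geometric cooling
    return best

# Toy 1-D example: minimise (x - 7)^2 over the integers.
sol = two_step_sa(0, lambda x: (x - 7) ** 2, lambda x: [x - 1, x + 1])  # sol == 7
```

The point of the first step is that jumps are only paid for when greedy improvement has stalled.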

A Genetic Algorithm was also created, for which new versions of the conventional genetic operators (generation, selection, crossover, mutation and immigration) were developed to take advantage of the codification system. In addition, a new efficiency-enhancing mechanism, inspired by the asexual reproduction processes of some social insects and named parthenogenesis, is introduced.
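A plausible reading of an asexual-reproduction operator is cloning a fit parent and applying a small perturbation, with no crossover partner. The sketch below is an assumption of how such an operator could look, not the project's parthenogenesis mechanism; the swap-based perturbation is purely illustrative.

```python
import random

def parthenogenesis(individual, rng, swap_prob=0.5):
    """Illustrative asexual-reproduction operator (an assumption, not the
    thesis's exact mechanism): clone a fit parent and apply a small
    perturbation -- here, one swap in its job list -- instead of crossover."""
    child = list(individual)
    if len(child) > 1 and rng.random() < swap_prob:
        i, j = rng.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    return child

rng = random.Random(0)
parent = [0, 1, 0, 1, 2, 2]
child = parthenogenesis(parent, rng)
# The child keeps exactly the same multiset of genes (still a feasible job list).
assert sorted(child) == sorted(parent)
```

With a feasible-only codification, a swap always yields another valid individual, so the operator is cheap.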

Three hybrids that combine the previous algorithms, with the objective of improving their properties, were designed:

A sequential hybrid, which has the peculiarity of applying Simulated Annealing to a family of solutions selected as the most promising ones of the genetic step of the algorithm.

A non-Darwinian version, which assumes that mutations are not random changes but are directed along improvement directions.

A compound algorithm, which combines the non-Darwinian and the original genetic versions sequentially to achieve the best performance among the developed algorithms.
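The sequential family hybrid can be outlined as: run the genetic phase, keep the most promising family, and refine each member with Simulated Annealing. Everything below (function names, the family size, the toy fitness and the trivial "annealer") is illustrative, not the project's code.

```python
def sequential_hybrid(population, fitness, anneal, family_size=4):
    """Sketch of the sequential 'family' hybrid described above: after the
    genetic phase, the most promising individuals are refined one by one
    with Simulated Annealing. `anneal` is assumed to map a solution to an
    improved (or equal) solution."""
    family = sorted(population, key=fitness)[:family_size]  # lower = better
    refined = [anneal(member) for member in family]
    return min(refined, key=fitness)

# Toy usage: fitness is distance to 10; the stand-in 'annealer' moves one
# step closer to the optimum.
pop = [3, 18, 9, 25, 14]
best = sequential_hybrid(pop, lambda x: abs(x - 10),
                         lambda x: x + (1 if x < 10 else -1))  # best == 10
```

The design choice is that annealing, which is expensive, is spent only on the few candidates the genetic phase already rates highly.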

Furthermore, an additional generalization enables the algorithms to work with random time values drawn from a consistent distribution function, therefore solving a stochastic jobshop.
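A minimal sketch of such a stochastic generalization is given below, assuming (following the mention of a Weibull-distributed random component in the figure index) that a Weibull term is added to the nominal time; the exact distribution model used in the project is not reproduced here.

```python
import random

def stochastic_time(nominal, scale=1.0, shape=1.5, rng=random):
    """Illustrative stochastic processing time: a Weibull-distributed random
    component is added to the nominal (deterministic) time. The choice of
    distribution and parameters is an assumption for this sketch."""
    return nominal + rng.weibullvariate(scale, shape)

rng = random.Random(42)
samples = [stochastic_time(10.0, rng=rng) for _ in range(1000)]
# Every realised time is at least the nominal time.
assert min(samples) >= 10.0
```

Evaluating a schedule then means averaging the makespan over many sampled realisations instead of one deterministic pass.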

Machine failure is considered too, enabling the models to achieve robust solutions.

The Theory of Constraints is applied to the solutions to analyse the shop

configuration and support management decisions. The developed tool identifies

bottlenecks in the sequence and analyses the temporal evolution of work in process.
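Bottleneck identification in the spirit of the Theory of Constraints can be sketched as picking the machine with the highest utilisation over the schedule; the schedule representation below (a list of machine/start/finish tuples) is an assumption, not the tool's actual data structure.

```python
def bottleneck(schedule, horizon):
    """Hedged sketch of bottleneck detection: the machine with the highest
    utilisation over the scheduling horizon is flagged as the constraint.
    `schedule` is assumed to be a list of (machine, start, finish) tuples."""
    busy = {}
    for machine, start, finish in schedule:
        busy[machine] = busy.get(machine, 0) + (finish - start)
    util = {m: b / horizon for m, b in busy.items()}
    return max(util, key=util.get), util

# Toy example: M2 is busy the whole horizon, so it is the bottleneck.
ops = [("M1", 0, 4), ("M2", 0, 9), ("M1", 5, 7), ("M2", 9, 10)]
worst, util = bottleneck(ops, horizon=10)  # worst == "M2"
```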

Finally, the quality of the provided solutions is evaluated by comparing them to a bound found through a proposed heuristic and to a value obtained from a non-parametric distribution fit to the algorithm performance data (which constitutes another original contribution of this project).
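A generic stand-in for such a non-parametric fit is a Gaussian kernel density estimate over the observed objective values; the estimator actually proposed in the project is not reproduced here, and the bandwidth rule below is a standard textbook choice, not the thesis's.

```python
import math

def kde(samples, x, bandwidth=None):
    """Minimal Gaussian kernel density estimate, used here only as a generic
    example of a non-parametric density fit."""
    n = len(samples)
    if bandwidth is None:
        mean = sum(samples) / n
        std = (sum((s - mean) ** 2 for s in samples) / n) ** 0.5
        bandwidth = 1.06 * std * n ** -0.2 or 1.0   # Silverman's rule of thumb
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                      for s in samples)

# Illustrative makespan observations from repeated runs (made-up numbers).
data = [101.0, 99.5, 100.2, 98.7, 100.9, 99.1]
# Density near the bulk of the data exceeds density far away from it.
assert kde(data, 100.0) > kde(data, 120.0)
```

Such a fitted density gives an estimate of how much further the objective could plausibly improve, without assuming any parametric family.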


Index

1 MOTIVATION ...................................................................................................................................... 2

1.1 Scheduling or how to avoid missing Saturday’s golf game......................... 2

1.2 The Job Shop Problem (JSP) .............................................................................. 4

1.2.1 Introducing the problem 4
1.2.2 JSP Taxonomy 5

1.3 A little history. A complex problem............................................................ 6

1.4 A Futile Search. Review of the Classical Approaches to the JSP............... 10

1.4.1 Branch & Bound 10
1.4.2 Dynamic Programming 10
1.4.3 Trapped in a Dead End? Project Aims 11

2 QUOD NATURA NON SUNT TURPIA. HEURISTIC METHODS FROM A HOLISTIC POINT OF VIEW............................................................................................................ 13

2.1 Nihil novi sub sole. Basic mechanisms underlying heuristic optimization methods. ............................................................................................................ 14

2.1.1 Global exploration 15
2.1.2 Combination 15
2.1.3 Selection 15
2.1.4 Neighbour search 15

2.2 Mechanisms of Evolution................................................................................ 16

2.2.1 Genetic Algorithms (G.A.) 16
2.2.1.1 Individual 16
2.2.1.2 Chromosome, gene and allele 17
2.2.1.3 Schema 17
2.2.1.4 Selection 18
2.2.1.5 Crossover 18
2.2.1.6 Mutation 19
2.2.1.7 The result 20
2.2.1.8 To Summarize 20
2.2.1.9 Range of possibilities 20
2.2.1.10 Genetic Algorithms and the Jobshop problem. Recent works 21

2.2.2 The brilliance of a diamond. Simulated Annealing (S.A.) 21
2.2.2.1 A thermodynamical basis 22
2.2.2.2 Simulated Annealing and Markov Chains 24
2.2.2.3 The cooling plan 25
2.2.2.4 Simulated Annealing and the Jobshop problem. Recent works 25


2.2.3 Other strokes of inspiration 26
2.2.4 Elitist ants, geniuses, and growing up too fast 27

2.3 Breakdown of some optimization techniques.............................................. 27

2.3.1 Taboo search 28
2.3.2 Simulated Annealing 28
2.3.3 Genetic Algorithms 28
2.3.4 Ant colony optimization 28
2.3.5 Immune Systems Optimization 28
2.3.6 The four elements applied to other classical optimization methods 29

2.4 Conclusions ....................................................................................................... 29

3 JUST TO COMPARE. A LINEAR PROGRAMMING MODEL FOR THE FLEXIBLE JOBSHOP PROBLEM......................................................................................................................... 31

3.1 A more specific definition of the problem .................................................... 31

3.1.1 Considerations 31

3.2 Formulation ....................................................................................................... 32

3.2.1 Parameters 32
3.2.2 Variables 32
3.2.3 Equations 33

3.3 Linearizing the model...................................................................................... 34

3.4 Numerical Results ............................................................................................ 35

3.5 Conclusions ....................................................................................................... 37

4 DESIGNING A CODIFICATION. IMPORTANCE OF THE BASIS. ..................................... 39
4.1.1 The hat and the rabbit: making things appear and disappear 39
4.1.2 Simplifying the calculation 40
4.1.3 Operators design 41
4.1.4 Method selection 41
4.1.5 Conclusions 41

4.2 The codification used: two lists for a schedule ............................................ 41

4.2.1 Introduction: leaving formalities behind 41

4.3 The first idea is not the best ............................................................................ 42

4.4 Calling our magician. Introducing the codification used........................... 43

4.5 Our magician’s communication problems ................................................... 46

4.6 Conclusions ....................................................................................................... 47

5 DATA GENERATION. SOME PRACTICAL ISSUES................................................................ 49

5.1 Basic parameters ............................................................................................... 49


5.2 Time related parameters.................................................................................. 49

5.3 Complexities of a generalised jobshop.......................................................... 50

5.4 Generating random jobshop data .................................................................. 50

5.4.1 Processing time 50
5.4.2 Change, transportation and start-up times 51

5.5 Making this structure useful to work with real data .................................. 51

6 BRIEF CONSIDERATIONS ABOUT JOBSHOP PROBLEM COMPLEXITY....................... 53

6.1 Introduction....................................................................................................... 53

6.2 Linear programming model ........................................................................... 53

6.3 Heuristic algorithm complexity ..................................................................... 54

6.4 Compared problem complexity. Conclusions ............................................. 55

7 A SIMULATED ANNEALING ALGORITHM APPLIED TO THE JOBSHOP PROBLEM ............................................................................................................................................. 58

7.1 The basis. Looking for a good neighbour first ............................................. 58

7.2 Generating a starting point ............................................................................. 61

7.3 The importance of being neighbours............................................................. 61

7.3.1 Machine list minimum movement 62
7.3.2 Job list movement 63
7.3.3 Combining both movements 64
7.3.4 The key factor 64
7.3.5 Advantages of the neighbour creating system compared to classical alternatives 65

7.4 Too many doors to knock................................................................................ 67

7.5 Thermodynamical jumps. Choosing a cooling plan ................................... 69

7.5.1 Cooling plans state of the art 69
7.5.2 Industrial considerations. Interesting ideas about the causes of the success and failure of the described cooling plans 70
7.5.3 Fixing the plan defining parameters 72
7.5.4 Dynamic cooling plan. A different proposal 73

7.6 Computation time vs. Problem Size .............................................................. 75

7.7 Conclusions ....................................................................................................... 77

8 A GENETIC ALGORITHM FOR THE JOBSHOP PROBLEM ................................................. 80

8.1 Creating the first generation ........................................................................... 80

8.1.1 Originating individuals 80
8.1.2 Diversity is the key 81


8.2 Survival of the fittest. ....................................................................................... 83

8.2.1 Natural selection 83
8.2.2 Never taking a step back 84

8.3 Mating season ................................................................................................... 85

8.3.1 Finding a partner 85
8.3.2 Breeding process 85

8.3.2.1 Machine list breeding 85
8.3.2.2 Job list breeding 86

8.4 Exploration enhancing mechanisms.............................................................. 88

8.4.1 Mutation 88
8.4.1.1 Mutation mechanism 88
8.4.1.2 Mutation probability 91

8.4.2 Immigration 92
8.4.2.1 Immigration rate 92
8.4.2.2 Immigration mechanism 94

8.5 Exploitation enhancing mechanisms ............................................................. 94

8.5.1 Introducing a new concept. Asexual reproduction 94
8.5.1.1 Social insects and evolution 94
8.5.1.2 Introducing an asexual system 95

8.6 Computation time vs. Problem size............................................................... 97

8.7 Conclusions ....................................................................................................... 99

9 TURNING THE MIXER ON. HYBRIDS...................................................................................... 101

9.1 First one, then another ................................................................................... 101

9.1.1 When is it time to make the switch? 101
9.1.2 A family of solutions 102

9.2 Non-darwinian G.A. ...................................................................................... 105

9.3 Conclusions ..................................................................................................... 108

10 ANTICIPATING MACHINE FAILURE. AN APPROACH TO THE ROBUST JOBSHOP ............................................................................................................................................ 110

10.1 Introduction.................................................................................................... 110

10.2 Not so many combinations .......................................................................... 110

10.3 Concentrating on what is relevant.............................................................. 112

10.4 Considering a real problem ......................................................................... 114

10.5 A simple example.......................................................................................... 114

11 INTRODUCING RANDOM TIME VALUES. AN APPROACH TO THE STOCHASTIC JOBSHOP PROBLEM .......................................................................................... 118


11.1 Randomness inside the Jobshop ................................................................. 118

11.2 Adapting the algorithm................................................................................ 121

12 FURTHER SUGGESTIONS TO THE JOBSHOP MANAGER. APPLYING T.O.C. .... 123

12.1 Introduction to the Theory of Constraints................................................. 123

12.2 Bottleneck concept applied to Jobshop production ................................. 125

12.2.1 The main difference 125
12.2.2 Traces of a bottleneck 125
12.2.3 T.O.C. Study 125

13 CAN YOU POSSIBLY KNOW HOW FAR YOU ARE FROM YOUR GOAL, IF YOU DO NOT EVEN KNOW WHERE IT IS? ...................................................................................... 131

13.1 An obvious bound. True relaxation............................................................ 132

13.2 Can you estimate it?...................................................................................... 132

13.2.1 Introducing the problems of the data to fit 132
13.2.2 No parameters? A different kind of density distribution 135
13.2.3 Applying non-parametric density function estimation to the data provided by the algorithms 137

13.3 Conclusions .................................................................................................... 138

14 NEXT STEPS....................................................................................................................................... 140

REFERENCES........................................................................................................................................... 143

LIST OF ABBREVIATIONS USED..................................................................................................... 148

A GENETIC EXPRESSION MODELS. AN INTRODUCTION.................................................. 151

B ARTICLE PRESENTED TO C.I.O. 2006 ....................................................................................... 154

C DATA FOR THE LP EXAMPLE ..................................................................................................... 170


Index of Figures

Figure 1. Example of disjunctive graph 5
Figure 2. Computational complexity of the Travelling Salesman Problem 9
Figure 3. Example Gantt diagram 45
Figure 4. A Gantt diagram is understandable by mere inspection 47
Figure 5. LP model complexity 54
Figure 6. Heuristic model complexity 55
Figure 7. Compared complexity 56
Figure 8. Representation of a certain solution 62
Figure 9. The same solution after a machine list movement 62
Figure 10. Initial solution 63
Figure 11. The same solution after a job list movement 64
Figure 12. Neighbourhood size in linear scale 68
Figure 13. Neighbourhood size in logarithmic scale 68
Figure 14. Temperature evolution followed by natural convection processes 71
Figure 15. Comparison of the first order transitory and the geometrical rule 72
Figure 16. Algorithm performance with different cooling plans 74
Figure 17. Temperature evolution with different cooling plans 74
Figure 18. Computation time/Iteration vs. Problem size 76
Figure 19. Curve fitting to Computation time per iteration vs. Problem size 77
Figure 20. Computation time vs. Population size 82
Figure 21. Performance vs. Population size 83
Figure 22. Initial solution 89
Figure 23. Solution after a machine list mutation 89
Figure 24. Initial solution 90
Figure 25. Solution after a job list mutation 91
Figure 26. Performance for several mutation rates 92
Figure 27. Performance for several immigration rates 93
Figure 28. Computation time vs. immigration rate 93
Figure 29. Performance for several parthenogenesis rates 96
Figure 30. Computation time vs. Parthenogenesis rate 97
Figure 31. Computation time vs. Problem size 97
Figure 32. Curve fitting to Computation time vs. Problem size 98
Figure 33. Hybrid performance with 4-member family 103
Figure 34. Hybrid performance with 6-member family 103
Figure 35. Hybrid performance with 12-member family 104


Figure 36. Performance comparison between the first hybrid and the original G.A. .......... 104
Figure 37. Non-darwinian mutation algorithm compared to the conventional G.A. .......... 106
Figure 38. Compared performance between the non-darwinian mutation algorithm and the original G.A. .......... 106
Figure 39. Performance comparison between the conventional G.A., the non-darwinian hybrid and the compound version .......... 107
Figure 40. Performance comparison between the original, the non-darwinian and the compound version .......... 107
Figure 41. Number of possible failures vs. problem size .......... 111
Figure 42. Pareto distribution .......... 113
Figure 43. Resulting solution .......... 115
Figure 44. Resulting schedule after the considered failure .......... 116
Figure 45. Random component from a Weibull distribution .......... 120
Figure 46. Resulting distribution in a specific example .......... 120
Figure 47. Gantt diagram for the optimal schedule obtained .......... 126
Figure 48. W.I.P. evolution for that solution .......... 127
Figure 49. Gantt diagram for the solution .......... 127
Figure 50. Corresponding W.I.P. evolution .......... 128
Figure 51. Example of density function .......... 133
Figure 52. Normal fit .......... 133
Figure 53. Different widely used distributions applied to the data .......... 134
Figure 54. Examples of bins with different sizes .......... 135
Figure 55. Example of non-parametric fit .......... 137


Index of Tables

Table 1. Example of single-point crossover .......... 19
Table 2. Example of two-point crossover .......... 19
Table 3. Numerical values for processing time in the example .......... 36
Table 4. Optimal schedule for the example .......... 36
Table 5. Optimal schedule for the second example .......... 37
Table 6. Example of schedule oriented to machines .......... 42
Table 7. Circular reference in a schedule oriented to machines .......... 43
Table 8. Table specifying the machines that will perform each operation .......... 44
Table 9. Example of codification based on machine schedules .......... 66
Table 10. Neighbour solution .......... 67
Table 11. Results of the curve fitting to computation time per iteration .......... 76
Table 12. Parent 1 .......... 85
Table 13. Parent 2 .......... 86
Table 14. Child 1 .......... 86
Table 15. Child 2 .......... 86
Table 16. Results of the different fits .......... 98
Table 17. Example of performance possibilities for each operation .......... 115


1 The Jobshop Problem.

Motivation.


The Jobshop problem. Motivation 2

1 Motivation

1.1 Scheduling or how to avoid missing Saturday’s golf game

Johnson, a pioneer in the study of scheduling techniques, was said to have started his research on the two-machine flowshop (the problem of scheduling different jobs that have to be processed by two different machines in the same order) after encountering an everyday problem. He wished to play golf one Saturday but, as a single man, had to do a week's laundry before leaving for his tee time. Due to wet weather, the different batches to be washed had to go through the washing machine first and then through the dryer. This forms a two-machine flowshop. His goal was to finish the laundry chores as soon as possible, so he wondered, 'How can I order these batches so that the time I spend washing and drying will be minimized?' According to the story, this was the first time the minimum makespan criterion was formally stated (nevertheless, the concept is as old as mankind). The anecdote concludes with Johnson becoming so intrigued by this problem that he forgot about both the laundry and the golf match and spent the rest of the weekend working on the now famous Johnson algorithm.
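The washing-and-drying problem of the anecdote is exactly the case that Johnson's rule solves in polynomial time. As an illustrative sketch (the rule itself is classical, but this code and the sample batches are invented, not taken from the thesis): batches that wash faster than they dry go first, in increasing washing time, and the rest go last, in decreasing drying time.

```python
def johnson_order(jobs):
    """Johnson's rule for the two-machine flowshop (minimum makespan).

    jobs: list of (a, b) pairs, where a is the time on machine 1
    (the washer) and b the time on machine 2 (the dryer).
    """
    first = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    last = sorted((j for j in jobs if j[0] > j[1]), key=lambda j: -j[1])
    return first + last

def two_machine_makespan(order):
    """Makespan of a given batch order on the two-machine flowshop."""
    t1 = t2 = 0
    for a, b in order:
        t1 += a                # machine 1 works on the batches back to back
        t2 = max(t2, t1) + b   # machine 2 may have to wait for machine 1
    return t2

batches = [(5, 2), (1, 6), (9, 7), (3, 8), (10, 4)]  # hypothetical laundry loads
order = johnson_order(batches)
print(order, two_machine_makespan(order))  # makespan 30
```

For these hypothetical loads, putting the quick-wash batches first keeps the dryer busy from the start, which is the intuition behind the rule.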

Scheduling is one of the vital problems in both the industrial environment and everyday life. It is defined as the process of allocating one or more scarce resources to different activities over a period of time. One or more criteria are established as the goal(s) for the schedule. For instance, one may choose to minimize makespan (the time in which all the work will be finished, as Johnson wanted to do), or to minimize the maximum delay (if we consider that it will be penalized in some way). There is a wide range of possible optimisation criteria that will be explored in the next sections.

A quick tour around some of the different functional areas of a company will show the presence of scheduling in each of them:


The production department needs to know the real sequence of products to be manufactured. Thus, staff have to be aware of the schedule resulting from the Master Production Plan.

The accounting department takes this production schedule and all the due dates into account to predict the revenues and costs the company will incur, as well as to update financial reports and make cash-flow estimates.

The marketing department will determine, for example, whether punctuality is one of the strong or weak points of the company's organization. Knowing the processing time of each product at each moment can allow them to adjust the delivery dates offered to clients.

The purchasing department can make a more accurate forecast of the material requirements and so reduce the cost of raw-material inventories.

As stated in [FRENCH 1990], a compilation of the different strategies to solve scheduling problems, 'Any manufacturing firm not engaged in mass production of a single item will have scheduling problems similar to that of the job shop. Each product will have its own route through the various work areas and machines of the factory. In the clothing industry different styles have different requirements in cutting, sewing, pressing and packaging. In steel mills each size of rod or girder passes through the set of rollers in its own particular order and with its own particular temperature and pressure settings. […] The objective in scheduling will vary from firm to firm and often from day to day. Perhaps the aim would be to maintain an equal level of activity in all departments so that expensive machines are seldom idle. Perhaps it would be to achieve certain contractual target dates or simply to finish all the work as soon as possible.'

Scheduling thus seems to be a problem that concerns every area of a company, and it has been the subject of extensive research for over fifty years. Although the scope of this project will be limited to manufacturing scheduling (and will thus drop such interesting topics as inventory management), the concepts used can easily be generalised to any situation with several tasks to be accomplished under limitations on resources. Adopting manufacturing terminology, a job consists of one or more activities, and a machine is a resource that can perform at most one activity at a time.


1.2 The Job Shop Problem (JSP)

1.2.1 Introducing the problem

The problem was formulated for the first time by Coffman in 1976 [COFFMAN 1976], although more constrained situations had already been the object of research. Informally, the problem can be described as follows. We are given a set of jobs and a set of machines. Each job consists of a chain of operations, each of which needs to be processed during a certain amount of time on a given machine. This duration can be deterministic, in which case we face a Deterministic JSP, or follow a given probability distribution, in the case of a Stochastic JSP. One of the most generally accepted assumptions is that each machine can only process one operation at a time. This is not necessarily the case in practice; nevertheless, the JSP with multitask machines remains practically unexplored. This constraint will thus be assumed in this project.

Sometimes it is useful to represent the problem by Roy and Sussman's 'disjunctive graph model' (1964). The problem is determined by three different sets, which contain the n jobs, the m machines and the o operations defined by the problem. The machine on which each operation must be performed is represented as M_o, and the precedence constraints between operations of the same job are represented as u → v. The graph G(O, A, E) is described as follows:

O = o ∪ {0, N+1}, where 0 and N+1 are dummy operations with null processing time that represent the start and the end of the schedule. The weight of each vertex is given by the processing time t_om.

A = {(u, w) / u, w ∈ o, u → w}
    ∪ {(0, w) / w ∈ o, ∄ v ∈ o : v → w}
    ∪ {(v, N+1) / v ∈ o, ∄ w ∈ o : v → w}    (1-1)

This means that A contains arcs that connect consecutive operations of the same job, as well as arcs from the dummy operation 'start' to the first operation of each job and from the last operation of each job to the dummy operation 'end'.

E = {{o, v} / M_o = M_v}    (1-2)

Thus, edges in E connect operations to be processed by the same machine.


Figure 1. Example of disjunctive graph

Figure 1 shows the disjunctive graph for a jobshop with 3 machines and 3 jobs. Operations 1, 5 and 9 are performed by machine 1, operations 2, 4 and 8 by machine 2, and operations 3, 6 and 7 by machine 3. Operations 0 and 10 are the fictitious initial and final operations. Dotted lines represent edges in E and arrows denote arcs in A.

If the aim of the jobshop is minimizing makespan, the objective value of each feasible solution (given by the combination of the two different ways in which each disjunction, i.e. each edge in E, can be settled) is the length of the longest path in the graph.
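Under this longest-path characterisation, the makespan of a fully settled disjunctive graph can be computed with a single sweep in topological order. The sketch below is illustrative only (the tiny 2-job, 2-machine instance and all names are invented, not taken from the thesis) and assumes every disjunction in E has already been oriented, so the graph is acyclic:

```python
from collections import defaultdict

def makespan(n_ops, arcs, proc_time):
    """Longest path from dummy node 0 to dummy node n_ops+1.

    arcs: directed edges (u, v); every disjunction is assumed already
    oriented, so the graph is acyclic.
    proc_time: processing time of each vertex (0 for the dummies).
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    # Longest-path DP over a topological order (Kahn's algorithm).
    start = [v for v in range(n_ops + 2) if indeg[v] == 0]
    dist = {v: proc_time[v] for v in start}
    queue = list(start)
    while queue:
        u = queue.pop()
        for v in succ[u]:
            cand = dist[u] + proc_time[v]
            if cand > dist.get(v, -1):
                dist[v] = cand
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return dist[n_ops + 1]

# Hypothetical settled graph, operations 1..4:
# Job 1: ops 1 -> 2; Job 2: ops 3 -> 4;
# machine disjunctions settled as 1 -> 3 and 4 -> 2.
proc = {0: 0, 1: 3, 2: 2, 3: 2, 4: 4, 5: 0}
arcs = [(0, 1), (1, 2), (0, 3), (3, 4), (2, 5), (4, 5), (1, 3), (4, 2)]
print(makespan(4, arcs, proc))  # longest 0 -> 5 path: 3 + 2 + 4 + 2 = 11
```

Settling the disjunctions the other way would yield a different acyclic graph and, in general, a different makespan; the optimization problem is choosing the orientation that minimizes this longest path.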

1.2.2 JSP Taxonomy

As presented before, the nature of the processing-time data splits the JSP into the deterministic and the stochastic JSP.

Production systems can also be classified according to the machine environment. When each job requires only one operation, the system is known as single-stage, and it can involve either a single machine or m machines operating in parallel. In the latter case, each machine has the same function. We can consider three different classes of parallel machines. In the case of identical parallel machines, each processing time is independent of the machine performing the job. Uniform parallel machines operate at different speeds but are otherwise identical. When working with unrelated parallel machines, the processing times depend not only on the machine selected but also on the particular job being processed.


There are three main types of multi-stage systems. In a flowshop with s stages, each job is processed through stages 1 to s in that pre-determined order, which is the same for all the jobs. In an openshop, the processing of each job also goes once through each stage, but the sequence of stages through which a job must pass can differ between jobs and forms part of the decision process. In a jobshop, the case that will be studied in this project, each job has a prescribed routing through the stages, but the routing may differ from job to job. A multiprocessor variant of multi-stage systems can also be introduced, in which each stage comprises several parallel machines.

Many other specific formulations appear in the research literature. Some representative cases will now be introduced.

Robust jobshops were developed in an attempt to take into consideration the fact that the assumption that all machines will always be available to process the scheduled jobs is just an idealization. As stated in [JENSEN 1999], 'A quality solution which can be modified easily according to changes in the environment may be more valuable than an optimal solution that does not allow easy modifications'.

When both serial and parallel operations appear, we face an assembly jobshop. Dispatching rules for this kind of model were researched in [REEJA 2000].

Additional constraints can also be considered. For example, a storage limitation leads to buffer-constrained jobshops [HOLTHAUS 2002].

Finally, flexible jobshops are multi-stage, multiprocessor production systems, that is, jobshops with several different machines available to perform each manufacturing stage. This is the particular situation surveyed in this project.

1.3 A little history. A complex problem

Much of the early work on scheduling aimed at finding rules that lead to optimal schedules for some simple jobshop models. One of the first examples was Johnson's paper on the two-machine flowshop [JOHNSON 1954]. According to Conway, Maxwell and Miller, 'This paper is important not only for its own content but also for its influence. […] Johnson's accomplishment is not so much the proposition of an algorithm as it is the offering of a proof that the obvious algorithm is in fact optimal'. Subsequent works studied the problem of scheduling a single machine to minimize the maximum lateness of the jobs, for which an optimal solution is obtained by sequencing the jobs according to the earliest due date (EDD) rule of Jackson [JACKSON 1955], and that of scheduling a single machine to minimize the sum of weighted completion times of the jobs, for which an optimal solution is obtained by sequencing the jobs according to the shortest weighted processing time (SWPT) rule of Smith [SMITH 1956]. These simple problems do not often appear in practice. However, the analysis of these simplistic models provides valuable insights that enable more complex systems to be scheduled. Furthermore, the performance of many production systems is often determined by the quality of the schedules for a bottleneck machine. In these cases, simple models can be useful for scheduling such a machine.
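Both rules reduce to a single sort, which is what makes these one-machine results so attractive. A brief sketch (the job data and helper names are hypothetical; only the EDD and SWPT orderings themselves come from the cited papers):

```python
# Hypothetical jobs: (processing_time, due_date, weight).
jobs = [(3, 10, 2.0), (5, 6, 1.0), (2, 14, 4.0)]

# EDD rule of Jackson: sort by due date; minimizes maximum lateness.
edd = sorted(jobs, key=lambda j: j[1])

# SWPT rule of Smith: sort by processing time / weight; minimizes
# the sum of weighted completion times.
swpt = sorted(jobs, key=lambda j: j[0] / j[2])

def max_lateness(seq):
    """Largest overshoot of a completion time past its due date (may be negative)."""
    t, worst = 0, float("-inf")
    for p, d, _ in seq:
        t += p
        worst = max(worst, t - d)
    return worst

def weighted_completion(seq):
    """Sum of weight * completion time over the whole sequence."""
    t = total = 0
    for p, _, w in seq:
        t += p
        total += w * t
    return total

print(max_lateness(edd), weighted_completion(swpt))  # -1 and 28.0
```

For this invented instance, EDD keeps every job ahead of its due date (maximum lateness -1), while SWPT front-loads the short, heavy job to keep the weighted completion times low.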

Since then, papers have addressed several topics. For an extensive review of the research, the reader is referred to [DUDEK 1992], which gives an exhaustive and well-organised overview of the research conducted from the 1950s to the early 1990s.

When trying to get a general perspective of all that work, it appears that the quest for optimality in minimizing makespan for the n-job m-machine flowshop emerged as a natural continuation of Johnson's paper. The problem already raised interest in the 1950s with Akers' graphical approach [AKERS 1956], and the search for algorithms for such a problem began in the early 1960s, resulting in the work of Dudek and Teuton (1964) [DUDEK 1964] and Smith and Dudek (1967) [SMITH 1967], who used a combinatorial optimization approach. Meanwhile, the branch-and-bound analysts published applications of that approach to the flowshop problem; the first papers were by Ignali (1965) [IGNALI 1965] and Lomnicki (1965) [LOMNICKI 1965]. Additional papers on optimization were published; however, no clearly original, innovative and effective results appeared.

This lack of significant results in optimization was not due to a lack of effort but to a lack of success. It was not until the application of computational complexity theory that the real reason for this lack of success was unveiled.

According to this theory, there are two different kinds of problems depending on the amount of resources required during computation to solve them. The most common resources considered are time (how many steps it takes to solve them) and space (how much memory is necessary). The class P consists of all those decision problems that can be solved on a deterministic sequential machine in an amount of time that is polynomial in the size of the input. The class NP is the set of decision problems that can be solved in polynomial time by a non-deterministic machine and verified in polynomial time on a deterministic machine. These kinds of machines, known as Turing machines, are automatons composed of a tape to be used as memory (on which symbols from some finite alphabet can be stored), a head (to read and write on the tape), a state register and an action table (which tells the machine what symbol to write, how to move the head and which state follows). In a deterministic Turing machine, the current state and the symbol read determine a single next action, while a non-deterministic machine may choose among several possible actions. In a very informal way, the P ≠ NP problem can be formulated as: 'If possible solutions can be verified quickly, can they also be calculated quickly?' An example of this could be determining whether a number is a prime number: just checking that a given non-trivial factor divides it is much easier than finding one.

After years of search, no polynomial algorithm has been found for any NP-complete problem. Nevertheless, there is no definite proof that P ≠ NP, and researchers do not have a common view on the subject. In a 2002 poll of scientists that asked 'Is P = NP?', 61% believed the answer was no, 9% believed the answer was yes, 22% were unsure, and 8% believed the question may be independent of the currently accepted axioms, and so impossible to prove or disprove. A $1,000,000 prize has been offered for a correct solution.

NP-complete (Non-deterministic Polynomial time Complete) problems are the most intractable of all combinatorial optimization problems. The class NP-complete is defined as those problems which are NP-hard and in the class NP. This concept was first introduced by Cook in 1971. A problem is said to be NP-hard if a polynomial-time algorithm for it would make it possible to solve all NP problems in polynomial time; it would be a kind of master key for all NP problems. No such algorithm is known to exist.

Perhaps one of the most well-known problems of this kind is the Travelling Salesman Problem. A figure with the computational complexity of this problem is included to illustrate the ideas exposed.


Figure 2. Computational complexity of the Travelling Salesman Problem (possible routes vs. number of cities to visit)
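The growth shown in Figure 2 is simple arithmetic: fixing the starting city and ignoring the tour direction, an n-city instance has (n-1)!/2 distinct routes. A quick sketch (the city counts chosen are arbitrary):

```python
from math import factorial

def tsp_routes(n_cities):
    """Distinct closed tours: fix the starting city, ignore direction."""
    return factorial(n_cities - 1) // 2

for n in (5, 10, 20, 50):
    print(n, tsp_routes(n))  # 5 cities already give 12 routes
```

Already at 20 cities the count exceeds 10^16, which is why exhaustive enumeration is hopeless for instances of any practical size.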

The number of possible solutions to evaluate in the case of the JSP is presented in section 6, given that this number depends on the solution definition used. Two different definitions are used in this project (that of a linear programming model and that proposed for the heuristic), so their implications on the problem complexity will be analysed once they have been introduced.

One of the first works that applied computational complexity theory to the JSP, concluding that it is NP-complete, was [GONZALEZ 1977]. In fact, the JSP seems to be one of the hardest NP-complete problems.

Dudek, Panwalkar and Smith made this heart-breaking statement in one of their papers years after that search [DUDEK 1992]: 'The consolation for those seeking optimizing algorithms following the NP-completeness shock was the realization that the lack of success in optimizing the general flowshop problem was the impossibility of the task rather than the inadequacy of the effort. It is unfortunate that the theory on NP-completeness did not come several years earlier, before so much effort was expended in the futile search for efficient optimizing algorithms. As members of the group of researchers who searched for optimizing algorithms, we can report that at least it was challenging and interesting even though it was not as productive as had been hoped.'


1.4 A Futile Search. Review of the Classical Approaches to the JSP

This section will present an overview of the two main classical approaches to solving scheduling problems, based on enumerative algorithms: Branch & Bound and Dynamic Programming.

1.4.1 Branch & Bound

The solving process can be conveniently represented by a search tree. Each node of the tree symbolizes a subset of feasible solutions. How the feasible solutions at a node are partitioned into subsets, each corresponding to a descendant node of the search tree, is specified by a branching rule.

For instance, in the case of single-machine scheduling, each solution is defined as a sequence of jobs. For such a problem, commonly used branching rules involve forward sequencing and backward sequencing. In forward sequencing, each node corresponds to the sequences in which the jobs in the first positions are fixed. The first branching creates nodes corresponding to every job that can appear in the first position, the second branching creates nodes that fix the job in the second position, and so on. A backward sequencing is built from the end. That would be the 'Branch' part of the algorithm.

The problem has an objective function to be minimized. A lower bounding scheme associates a lower bound with each node of the search tree. The aim is to eliminate any node for which the lower bound is worse than or equal to the value of the best known feasible solution, since it cannot lead to a better solution. Lower bounds are usually obtained by solving a relaxed problem. The most widely used methods are Lagrangean relaxation and solving the linear programming relaxation of an integer programming formulation. The use of dominance rules can lead to eliminating nodes before calculating any lower bounds. The order in which the nodes of the tree are selected for branching has to be defined; one of the most widely used strategies is to branch from a node with the smallest lower bound.
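These pieces can be made concrete with a small best-first branch-and-bound for sequencing a single machine to minimize maximum lateness, using forward sequencing as the branching rule. The sketch is illustrative only (the instance and the particular lower bound are invented for this example); the bound is valid because every unscheduled job j cannot finish before t + p_j, where t is the completion time of the fixed prefix.

```python
import heapq

def bnb_max_lateness(jobs):
    """Best-first branch & bound for one machine, minimizing max lateness.

    jobs: list of (processing_time, due_date). Each node fixes the jobs
    in the first positions (forward sequencing).
    """
    n = len(jobs)

    def bound(t, lmax, remaining):
        # Valid lower bound: an unscheduled job j finishes no earlier than t + p_j.
        return max([lmax] + [t + jobs[j][0] - jobs[j][1] for j in remaining])

    best, best_seq = float("inf"), None
    # Node: (lower bound, completion time, current max lateness, fixed prefix).
    heap = [(bound(0, float("-inf"), range(n)), 0, float("-inf"), ())]
    while heap:
        lb, t, lmax, seq = heapq.heappop(heap)
        if lb >= best:
            continue  # prune: this subtree cannot beat the incumbent
        remaining = [j for j in range(n) if j not in seq]
        if not remaining:
            best, best_seq = lmax, seq  # complete sequence: new incumbent
            continue
        for j in remaining:  # branch: fix job j in the next position
            t2 = t + jobs[j][0]
            l2 = max(lmax, t2 - jobs[j][1])
            rest = [k for k in remaining if k != j]
            lb2 = bound(t2, l2, rest)
            if lb2 < best:
                heapq.heappush(heap, (lb2, t2, l2, seq + (j,)))
    return best, best_seq

print(bnb_max_lateness([(3, 10), (5, 6), (2, 14)]))  # (-1, (1, 0, 2))
```

On this instance the method reproduces the EDD sequence, as expected for this objective; the value of the exercise is seeing the branching, bounding and pruning steps explicitly.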

1.4.2 Dynamic Programming

In dynamic programming, each partial solution is represented by a state with a certain assigned value of the objective function. When two or more partial solutions achieve the same state, the one with the best value is retained, while the others are dismissed. The states have to be defined in such a way that this elimination is valid. For instance, consider the single-machine sequencing problem again. Partial solutions would be initial partial sequences. A state can be represented by the jobs that are sequenced and the time at which the machine becomes available to process the jobs that remain unprocessed. A decision transforms a partial solution from one state to another. All possible decisions are considered. Applying this method recursively determines the optimal solution value, and backtracking is then used to find the decisions that define that solution.
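For the single-machine example just described, the state reduces to the set of already-sequenced jobs, since the machine's availability time is simply the sum of their processing times. A sketch over bitmask states (illustrative only; the minimize-maximum-lateness objective and the instance are chosen for the example, not taken from the thesis):

```python
from functools import lru_cache

def dp_max_lateness(jobs):
    """Dynamic programming over subsets for one machine (minimize max lateness).

    Two partial sequences reaching the same set of jobs are interchangeable,
    so only the best value per subset is kept (here via memoization).
    """
    n = len(jobs)

    @lru_cache(maxsize=None)
    def best(mask):
        if mask == 0:
            return float("-inf")  # empty schedule: no lateness yet
        # Machine availability time of this state = total work sequenced so far.
        t = sum(jobs[j][0] for j in range(n) if mask & (1 << j))
        # Decision: which job j is the last one of this partial sequence?
        return min(
            max(best(mask ^ (1 << j)), t - jobs[j][1])
            for j in range(n) if mask & (1 << j)
        )

    return best((1 << n) - 1)

print(dp_max_lateness([(3, 10), (5, 6), (2, 14)]))  # -1, matching EDD
```

The recursion visits 2^n states instead of n! sequences, which is exactly the saving that the state-based elimination provides.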

1.4.3 Trapped in a Dead End? Project Aims

As stated above, it is believed that such algorithms cannot efficiently solve (commonly understood as 'in polynomial time') the JSP. Fortunately, that was not the end of the research on scheduling; it just meant a change of view.

New, interesting approaches to combinatorial problems were developed. These methods will be reviewed in the next chapter.

This project is set in motion by all the research efforts that started with Johnson's work and that this chapter has tried to capture. It is born with the ambition of taking advantage of all the already developed heuristic methods and of combining them to create a new, more powerful hybrid one. Afterwards, the performance of this new algorithm will be tested and compared to the currently used ones. If possible, its effectiveness will be studied theoretically.


2 Quod natura non sunt turpia.

Heuristic methods from a holistic point of view


Quod natura non sunt turpia. Heuristic methods from a holistic point of view 13

2 Quod natura non sunt turpia¹. Heuristic methods from a holistic point of view

In the author's opinion, there have been only two main sources of inspiration for mankind. The first one is man's own emotions: feelings have generated every moving masterpiece from the beginning of time. Nature is the second one. A beautifully shaped conch shell can remind us of Dürer's spiral and the golden number, or simply suggest the idea of including it in a still-life painting. The tireless work of ants searching for food can inspire both one of Samaniego's fables and the ant colony optimization techniques that will be presented later.

Having run out of ideas, turning to nature can give us some perspective on how to solve incommensurably more complex optimization problems in a beautifully efficient way.

There is a qualitative difference between the classical approach to optimization and these nature-inspired methods. While the former searches for the global optimum, the latter look for good solutions, with no guarantee of optimality but with affordable computation times. This is the difference between a heuristic, conceived as a series of steps that lead to the optimum, and a metaheuristic method such as the ones that will be used in this project. The concepts of efficiency and effectiveness are the underlying difference between them: a linear programming model of a general jobshop can be effectively solved by the dual simplex method (by following the algorithm steps, it is certain that the global optimum will eventually be reached), but that does not mean that it can be done efficiently (the time spent in the calculation can be longer than what we can afford). Therefore, this approach renounces guaranteed optimality in the solution in order to be able to provide a good solution in a reasonable amount of time. There is an additional difference: nature is not deterministic, so a probabilistic component is introduced. As will be presented later, this helps prevent the algorithm from getting stuck in local optima. The next sections will describe some of these methods: the ones that will be used in this project and some other interesting ones of amazing creativity.

1 'What is natural cannot be bad', Latin proverb.

As stated above, nature is one of the main sources of inspiration for mankind. Fascinating in its perfection, constantly evolving, and always one step ahead of every scientist, it has provided the most inquisitive ones with the most interesting tracks to follow.

This idea has set in motion a series of different heuristic models, which looked at different natural optimizers and tried to capture their essence and translate it into optimization methods.

The intuition underlying this project is that all these diverse interpretations of nature can be split into a discrete (and very limited) set of basic mechanisms. These patterns will be introduced in the next section, and later on they will be identified in the most well-known optimization techniques.

2.1 Nihil novi sub sole². Basic mechanisms underlying heuristic optimization methods.

Continuing with proverbs, this one remarks on the real difficulty of true innovation. As will be shown below, there is nothing new in heuristic methods but a reinterpretation of natural patterns. Moreover, they all sound similar. This section explores the similarity between them, expressed as the basic parts they share. The nomenclature is intended to be general, but the influence of some of the particular methods is obvious in some cases.

2 ‘Nothing new under the sun’, Latin proverb.



2.1.1 Global exploration

This basic tool could be described as a completely random search. Different

solutions, with neither connection between them nor particular search direction, are

proposed with no subsequent treatment.

2.1.2 Combination

Combination is an operator that receives some solutions and returns just one that is

a combination of them. Note that the properties of the output will be a combination of

the inputs only in some cases, depending on the convex nature of the problem.

Moreover, the same operator applied to the same inputs can return different outputs, provided that some degree of randomness is introduced in the process.

As this operator is the most difficult one to apply in practice, a different metaheuristic paradigm, called 'Genetic expression models', originally created in this project, is introduced in the annexes.

2.1.3 Selection

From a set of solutions, choosing some of them and discarding the rest. The

conditions upon which this selection is done are completely free and must, as in the

previous cases, include some randomness in their criterion.

2.1.4 Neighbour search

The surroundings of a given point are inspected, looking for better solutions. Although

there can be some randomness when selecting the points to be analysed, this search

involves a certain level of direction.

These four elements are the basis of all optimization techniques. After presenting

some of the most widely used metaheuristics, they will be identified to illustrate this

idea.



2.2 Mechanisms of Evolution

2.2.1 Genetic Algorithms (G.A.)

Biologists seem to live in constant surprise. It appears as if there were a living being adapted to every habitat on earth. From abyssal fish that bear enormous pressures in an absolutely dark environment to the bacteria living in the sulphuric, extremely hot water of Yellowstone's geysers, life has reached all possible scenarios.

All those creatures came from ancestors that had nothing to do with their present

appearance. Evolution has led to a specialised living being for each possible ecological

niche that in some way is optimum for that specific environment. These methods can

be explored with the aim of developing new optimization techniques. This brilliant

idea was conceived by Holland [HOLLAND 1975] for solving parameter optimization

problems. Genetic algorithms have proven to be robust in the sense that they provide

good solutions on a wide range of problems. In addition, they can be easily modified

with respect to the objective function and constraints.

Generation after generation, the individuals compete with each other. Survival of

the fittest (Darwinian selection) is the main idea. Fitter members of the population are

more likely (but just more likely, in a probabilistic sense) to mate with each other and

have an offspring in subsequent generations. This mating leads to a recombination of

the genetic materials of the best and often leads to a sequence of generations that is

successively fitter.

In addition to the mating mechanism, nature introduces infrequent and completely

random modifications that increase diversity among the population. This is known as

mutation.

To implement the biological metaphor, these ideas are materialised:

2.2.1.1 Individual

Each possible solution to the problem is considered an ‘individual’. It can be ‘normal’

or a ‘monster’ depending on whether it is feasible or not. It will be codified in a way

that is believed to be efficient for the given problem. The algorithm will work with the



codification and not with the solution. This is one more difference between G.A. and

the traditional optimization methods. There will be an initial population whose size is

usually calculated using heuristic rules. This enables the algorithm to work with

several parallel optimization lines and not just one, as happens in the classical

approaches.

2.2.1.2 Chromosome, gene and allele

A chromosome is the definition of each individual, represented as a string. Each

element in the string is called ‘gene’ and is occupied by one of the symbols of a certain

alphabet. Each symbol in this alphabet is called ‘allele’. The size of a chromosome is the

number of genes it contains.

It is necessary to define a useful codification to translate each solution to a string of

symbols. Binary codification is frequently used in general problems because of its

simplicity and the possibility of using already developed genetic operators. Although binary codification has actually been used in some jobshop scheduling work, as in [NAKANO, 1991] or [YAMADA, 1992], this project discards it due to the problems that arise when implementing crossover operators on it. An appropriate codification for the problem will be created instead.

2.2.1.3 Schema

Given the initial alphabet Ω, we will consider an extended alphabet Ω′ constituted by the former plus a new element, the metasymbol '*', which indicates that the position it occupies can hold any of the elements of Ω. All the strings formed over the extended alphabet receive the name of 'schemas'.

The order of a schema is given by the number of fixed positions in the string, and its

length is the difference between the position of the last fixed character and the first one.

The number of individuals that belong to a given schema in each generation evolves

according to its fitness relative to the remaining members of the population, its length

and order.

The so-called Schema Theorem is the only mathematically rigorous support for the

G.A. theory. It was formulated by Holland as:



E[m(s, t+1)] ≥ m(s, t) · ( u(s, t) / f(t) ) · ( 1 − p_c · d(s)/(l − 1) ) · (1 − p_m)^o(s)    (2-1)

• m(s, t) = instances of schema s in the population at time t.

• f(t) = average fitness of the population at time t.

• u(s, t) = average fitness of the instances of s at time t.

• p_c = probability of the single-point crossover operator.

• p_m = probability of the mutation operator.

• l = length of the bit strings.

• o(s) = number of defined (non-*) bits in s.

• d(s) = distance between the leftmost and rightmost defined bits in s.

And, informally, it can be expressed as:

‘Short, low-order, above-average fitness schemata receive exponentially increasing trials in

subsequent generations’

2.2.1.4 Selection

The fittest members of the population will be selected to breed. There are two main

methods to replicate this mechanism. The first one, known as the roulette wheel, is

based on the simple fitness proportional rule. This stipulates that the expected number

of copies of each individual in the next generation is proportional to its fitness. More

specifically, if the population has a size of N members, the selection process will be

carried out by building a roulette wheel with N slots. Each slot corresponds to a

member and its angular width is proportional to the fitness of that individual. The

wheel is spun and fixed pointers select the members that will generate the next

generation. In the second one, known as the tournament, a random permutation of the

existing solutions is obtained. They are then split up into a definite number of groups

and the fittest member of each group is selected.
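Both selection schemes can be sketched in a few lines of Python; the function names and the group_size parameter are illustrative choices, not taken from the project.

```python
import random

def roulette(population, fitness, k):
    # Roulette wheel: each member's slot width is proportional to its fitness.
    return random.choices(population, weights=fitness, k=k)

def tournament(population, fitness, group_size):
    # Tournament: random permutation, split into groups, keep each group's fittest.
    order = random.sample(range(len(population)), len(population))
    winners = []
    for i in range(0, len(order), group_size):
        group = order[i:i + group_size]
        winners.append(population[max(group, key=lambda j: fitness[j])])
    return winners
```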

2.2.1.5 Crossover

It could be defined as a random function that returns a ‘child’ chromosome out of

two ‘parent’ chromosomes by recombining their genetic information in some way.



When using a binary codification, it is easy to implement single or multiple point

operators. For instance, we could have:

Example of single-point crossover between the 3rd and 4th genes:

Parent 1: 110|10011    Child 1: 11001101
Parent 2: 011|01101    Child 2: 01110011

Table 1. Example of single-point crossover

Example of two-point crossover between the 3rd-4th and 6th-7th genes:

Parent 1: 110|110|11    Child 1: 11001111
Parent 2: 011|011|01    Child 2: 01111001

Table 2. Example of two-point crossover
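These operators are straightforward to sketch on binary strings. A minimal Python version, with cut positions given as 0-based slice indices (the cut points are fixed here for illustration, whereas a real G.A. draws them at random):

```python
def single_point(p1, p2, cut):
    # Swap the tails of the two parents after position `cut`.
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def two_point(p1, p2, a, b):
    # Swap the middle segment p[a:b] between the two parents.
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

# The single-point example above: cut after the 3rd gene.
children = single_point("11010011", "01101101", 3)
```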

Other operators have been developed, such as Partially Mapped Crossover (PMC),

from Goldberg and Lingle, or the Reeves operator, which is widely used when

working with numerical strings. More sophisticated ideas have been researched. In

[AGGARWAL, 1996], for example, one of the children is constructed in such a way as to have the best objective function value among the feasible set of children, while the

other is selected with the aim of maintaining diversity.

2.2.1.6 Mutation

Mutation consists in introducing small random changes in some individuals with a

very low probability. That reduces the risk of getting trapped in local optima.

There are variations of this mechanism that include immigration (adding

individuals independent of the current generation) or inversion (modifying all the

genes that lie between two positions randomly selected).



2.2.1.7 The result

Fitter members and schemas spread while the worst solutions disappear, giving

fitter individuals (i.e., solutions with a better value of the objective function) generation

after generation.

2.2.1.8 To Summarize

The process could be expressed in pseudocode as follows.

Initialize population;
Generation = 0;
While (Termination_Criterion == false)
    Generation++;
    Selection(Population);
    Crossover(Population);
    Mutate(Population);
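The loop above can be turned into a small runnable sketch. The version below is written for this summary rather than taken from the project: it solves the toy OneMax problem (maximising the number of 1-bits in a binary chromosome) with roulette selection, single-point crossover and bit-flip mutation.

```python
import random

def run_ga(n_bits=20, pop_size=30, generations=60, p_c=0.8, p_m=0.02, seed=1):
    rng = random.Random(seed)
    fitness = lambda ind: sum(ind)            # OneMax: count the 1-bits
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        # Roulette-wheel selection (fitness-proportional; +1 avoids zero weights).
        parents = rng.choices(pop, weights=[f + 1 for f in fits], k=pop_size)
        nxt = []
        for i in range(0, pop_size, 2):
            a, b = parents[i][:], parents[i + 1][:]
            if rng.random() < p_c:            # single-point crossover
                cut = rng.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for g in range(n_bits):
                    if rng.random() < p_m:    # mutation: flip the gene
                        child[g] = 1 - child[g]
                nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```

Even this bare-bones version typically pushes the population close to the all-ones string within a few dozen generations.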

2.2.1.9 Range of possibilities

There is an enormous range of possible specific implementations of G.A. Every

living being’s reproductive pattern could be studied and translated into an algorithm.

Some successful implementations have actually come from this idea (like the ‘Deer’s

Genetic Algorithm’ used in Jose Luis Gahete’s thesis [GAHETE 2004]). However, it

should be noted that these patterns can be modified in ways that nature could never contemplate. For instance, the best individual(s) of each generation, the ‘genius(es)’, could be passed directly to the next generation and treated as immortal until they are beaten in fitness by other members of the population.



2.2.1.10 Genetic Algorithms and the Jobshop problem. Recent works

In [GONÇALVES 2002], a hybrid G.A. is presented, with the particularity that after

a schedule is obtained a local search heuristic is applied to improve the solution. To

prevent premature convergence, an immigration mechanism is included.

[MESGHOUNI 2004] introduced parallel representation (which could be viewed as

a multi-dimensional matrix where each row corresponds to a different machine) and

developed genetic operators adapted to this codification.

[FAYAD 2003] performed some research on the ‘real world’ J.S.P. – that is, using fuzzy sets to model uncertain processing times and due dates – using G.A. to solve it.

2.2.2 The brightness of a diamond. Simulated Annealing (S.A.)

This time, the inspiration came from the metallurgical processes that underlie the thermal treatment that enhances mechanical resistance in steel. It consists in raising the metal temperature to the fusion point and then allowing it to cool down again. The properties of the solid obtained will highly depend on the speed of this cooling. If it happens very fast (the tempering process), the crystals formed will contain a relatively high number of imperfections. On the contrary, if the process is slow enough, the mineral can shape into big perfect crystals, the ones that achieve an energy minimum (just the amount necessary to keep molecular cohesion), which can be as beautiful as a diamond.

This concept was introduced in statistical mechanics by Metropolis in 1953

[METROPOLIS 1953]. Thirty years later, Kirkpatrick [KIRKPATRICK 1983] and Cerny

[CERNY 1985] used it as a general solution approach for discrete optimization

problems. Quickly after these early works, this approach was applied to a variety of

problems as diverse as VLSI (very-large-scale integrated circuits) design, ([JEPSEN

1983], [KIRKPATRICK 1984], [VECCHI 1983], [ROWEN 1985], pattern

recognition([GEMAN, 1984, [HINTON, 1984]) and code generation [EL GAMAL 1987].



2.2.2.1 A thermodynamical basis

The annealing process is based on the laws of Thermodynamics. According to them,

at a specific temperature t, the energy increment ΔE in the studied thermodynamical

system is a random variable whose distribution function is given by:

P(ΔE > δ; t) = e^(−δ/(k·t))    (2-2)

where k represents the Boltzmann constant.

That is equivalent to:

P(ΔE ≤ δ; t) = 1 − e^(−δ/(k·t))    (2-3)

The probability of a big energy increment is very low, and can be described through

an exponential distribution function, which implies that this increment happening is a

rare event. It is well known that this distribution is linked to the Poisson distribution (the latter gives the probability of a number of occurrences in a given interval, while the exponential models the time between occurrences). This could be an introduction to the role that Markov chains will play in Simulated Annealing.

Metropolis’ algorithm generates a perturbation and calculates the energy increment

that would result from it. If the energy decreases, then the new state is accepted. If it

increases, it can be accepted or not depending on the thermodynamical rule above.

Kirkpatrick adapted it to be an optimization heuristic which could be seen as a

variant of the well-known local search heuristics based on the concept of

neighbourhood. Each solution is assimilated to a point of a thermodynamical system.

In the classical approach, up-hill movements (the result of a neighbour with a higher

level of energy being selected in the random search) are rejected. Consequently, there is

a huge risk of getting trapped in local optima and the final result can depend highly on

the initial point chosen to start the process.

On the contrary, simulated annealing does not disregard this kind of solution but

accepts them according to the thermodynamical probability of the energy increment

they involve.



To develop the metallurgical metaphor, a few more concepts are introduced. The

different ‘states of the system’ represent the space of feasible solutions of the problem.

Energy is equivalent to cost. A change of state is understood as accepting a neighbour

solution. Temperature will be the control parameter of the mechanism, therefore

creating the need for a ‘cooling plan’ that defines its evolution with time. Finally, a

frozen state represents the heuristic final solution.

A possible pseudocode expression could be as follows:

Select an initial solution s0 from the space of feasible solutions S;
Fix the initial temperature to a positive value t0 > 0 and select a cooling plan CP(t);
While (Termination_Criterion == false)
    Select a state s ∈ N(s0) from the neighbourhood of the current one;
    Evaluate the energy increment ΔE = E(s) − E(s0);
    If ΔE < 0 then s0 = s
    Else
        Generate a random number r ∈ [0, 1];
        If r < e^(−ΔE/t) then make the up-hill movement s0 = s
    Change the value of the temperature if necessary: t = CP(t)

Note that, in the algorithm, the Boltzmann constant has disappeared. Its function

would be just scaling the control parameter, which will continue to be referred to as

‘temperature’.
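The scheme above can be sketched as a short Python routine. The toy objective, the integer neighbourhood and the geometric cooling plan below are illustrative choices made for this sketch, not the ones used later in this project.

```python
import math
import random

def simulated_annealing(cost, neighbour, s0, t0=10.0, alpha=0.95, steps=2000, seed=0):
    rng = random.Random(seed)
    s, best, t = s0, s0, t0
    for _ in range(steps):
        s_new = neighbour(s, rng)
        delta = cost(s_new) - cost(s)
        # Down-hill moves are always accepted; up-hill ones with probability e^(-dE/t).
        if delta < 0 or rng.random() < math.exp(-delta / t):
            s = s_new
            if cost(s) < cost(best):
                best = s
        t = max(t * alpha, 1e-6)   # geometric cooling plan, floored near zero
    return best

# Toy usage: minimise (x - 3)^2 over the integers; neighbours are x +/- 1.
best = simulated_annealing(lambda x: (x - 3) ** 2,
                           lambda x, rng: x + rng.choice([-1, 1]),
                           s0=50)
```

A high t0 with alpha close to 1 mirrors the slow cooling of the metallurgical metaphor: early up-hill moves are frequent, and they die out as the system freezes.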



2.2.2.2 Simulated Annealing and Markov Chains

A Markov chain, named after Andrei Markov, is a discrete-time stochastic process

(although some continuous generalizations have been developed) in which the

probability of changing to a specific state depends only on the current state. The chain

is a sequence of random variables. The range of these variables, i.e., the set of their

possible values, is called the state space. At any time n , a finite Markov chain can be

characterized by a matrix of probabilities whose (x, y) element is given by P(X_{n+1} = x | X_n = y), where n represents the time index. If the chain is homogeneous, these probabilities will be independent of n. These kinds of discrete finite Markov

chains can also be described by a directed graph where the arcs are labelled by the

probability of going from the state at one end of the arc to the state at the other.

There is a wide range of practical problems in areas such as inventory management,

queuing systems, maintenance and manpower planning that have been modelled as

Markov chains.

A final class is a subset of states that can be reached from states outside the class,

but there is no external state that can be reached from inside the subset. In an

irreducible Markov chain, it is possible to reach all feasible states from any initial point. That implies that no final class other than the complete chain can be defined. A periodic chain with period δ can be split into δ different sets of states in such a way that the evolution of the chain will pass from s ∈ S1 to s ∈ S2 … to s ∈ Sδ. An irreducible aperiodic chain, that is to say, a chain in which 1 is the greatest common divisor of the lengths of all its circuits, is called ‘ergodic’.

Formally, the simulated annealing algorithm could be modelled as a Markov chain with as many states as there are feasible solutions to the problem. Given a temperature t, the probability of moving from state i to state j can be represented by p_ij. This configures a stationary Markov chain. If it is ergodic, it will converge to a limiting distribution and, as the temperature cools down, get closer to a uniform distribution on the set of optimal solutions.
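The limiting-distribution claim can be illustrated numerically: power iteration on a small ergodic transition matrix converges to its stationary distribution regardless of the (uniform) starting point. The two-state matrix below is invented purely for illustration.

```python
def stationary(P, iters=500):
    # Power iteration: start from the uniform distribution and
    # repeatedly apply the transition matrix (rows sum to 1).
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Toy ergodic two-state chain: row x holds P(next state | current state x).
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary(P)
```

For this matrix the stationary distribution is (5/6, 1/6), which the iteration recovers to machine precision.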

Anily and Federgruen [ANILY, 1987] managed to prove the convergence of this method.

They stated that a simulated annealing mechanism, when using an infinitely long

cooling plan, can be assimilated to a strongly ergodic Markov chain, thus being



asymptotically independent of the starting state and converging to a distribution concentrated on the global minimum.

2.2.2.3 The cooling plan

A good design of the cooling plan is essential to achieve both efficiency and

effectiveness of the algorithm. It has to include the definition of the upper and lower

bounds for the temperature parameter and the rate according to which it is lowered.

If the aim is to get a solution that is independent of the initial state, then the

starting temperature must be high enough to allow all the up-hill movements

proposed. In some cases, the problem provides sufficient information to suitably

estimate this initial value. This happens when the maximum difference between the

values of the cost function of any two points in a neighbourhood is known. This

difference can be taken as an initial temperature high enough to avoid getting trapped in local optima.

Several functions to describe the cooling plan have been developed. Nevertheless,

more sophisticated alternatives do not necessarily lead to better results.

It is necessary to be really careful when designing a cooling plan, and be prepared

to make a study of the performance resulting from several alternatives.

2.2.2.4 Simulated Annealing and the Jobshop problem. Recent works

In [LAARHOVEN 1992] an S.A. approach is introduced to solve the J.S.P. to

minimize makespan. They use the Aarts and Van Laarhoven cooling rule described in

7.5.1, which takes into consideration the variance of the cost values generated. They

chose the length of each Markov chain to be equal to the size of the largest

neighbourhood. In this paper they introduce an algorithm that gives quite accurate

solutions and is approximately polynomial in the size of the problem.

[AIDYN 2003] developed a ‘parallel S.A.’ model in which several solutions are worked on in parallel, similar to the concept of ‘population’ in G.A.s.



2.2.3 Other strokes of inspiration

Genetic Algorithms and Simulated Annealing are not the only approaches that have

looked back into nature with the aim of solving complex optimization problems. As an

illustration, the Ant Colony Optimization techniques will be briefly reviewed.

These algorithms were introduced by Dorigo in 1996 [DORIGO, 1996] and are

inspired by the behaviour of colonies of real ants, in particular, how they forage for

food. One of the main underlying ideas is that these insects can communicate through

indirect means by modifying the concentration of pheromones, some highly volatile

chemicals, in their immediate environment.

The ants visit a solution depending on the distance to the point they are in (the increment or decrement in the objective function) and the pheromone concentration in it. The relative importance of these two factors is fixed through two parameters. Once the ant has completed its tour, the amount of evaporation of the pheromone is calculated, and the chemical track of this latest ant (which will be more intense for better cost values) is added. Ant by ant, the solutions are improved and finally converge to the global optimum.

This basic mechanism can be enhanced, for instance, by introducing a set of ‘elitist ants’ that raise the pheromone concentration in the best solutions found until that moment. This idea was proposed in [WHITE, 2003].
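The two core ACO rules, probabilistic choice and pheromone update, can be sketched as follows. The parameter names alpha, beta and rho follow the usual convention, but the functions themselves are an illustrative simplification, not the algorithm of [DORIGO, 1996] in full.

```python
import random

def choose(options, pheromone, desirability, alpha=1.0, beta=2.0, rng=random):
    # An ant picks an option with probability proportional to
    # pheromone^alpha * desirability^beta.
    weights = [pheromone[o] ** alpha * desirability[o] ** beta for o in options]
    return rng.choices(options, weights=weights, k=1)[0]

def deposit(pheromone, tour, cost, rho=0.5, q=1.0):
    # Evaporation on every edge, then reinforcement proportional to tour quality:
    # cheaper tours lay a more intense chemical track.
    for o in pheromone:
        pheromone[o] *= (1 - rho)
    for o in tour:
        pheromone[o] += q / cost

pher = {"short": 1.0, "long": 1.0}
deposit(pher, ["short"], cost=2.0)
```

After one update the used edge keeps its pheromone level while the unused one evaporates, so subsequent ants are biased towards it; elitist ants would simply run deposit again on the best tour found so far.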

Another example of a biological metaphor used to build an optimization method is artificial immune systems. These mechanisms are motivated by the theory of immunology. The

biological immune system functions to protect the body against pathogens or antigens

that could potentially cause harm. It works by producing antibodies that identify, bind



to, and finally eliminate the pathogens. Even though the number of antigens is far

larger than the number of antibodies, the biological immune system has evolved to deal with them. It will discover the characteristics of the antigens so that, in the future, it can react both to those it has encountered before and to entirely new ones. This idea was introduced by de Castro and Timmis in

[DE CASTRO 2002], and has been already applied to the JSP in recent works such as

[AICKELIN 2004].

Human Guided Search, an interactive approach that takes advantage of human beings’

abilities in fields where they outperform computers, has also been applied to this

problem in [LESH 2003].

More processes in nature could presumably be adapted for optimization purposes,

so there is a wide range of unexplored nature-inspired techniques.

2.2.4 Elitist ants, geniuses, and growing up too fast

Elements such as elitist ants and geniuses are introduced in the general mechanisms

with the aim of speeding up convergence. However, an unbalanced addition of this kind can have harmful effects on the performance of the algorithm.

As expressed in [WHITE, 2003], ‘However, by using elitist ants to reinforce the best tour

the problems now takes advantage of global data with the additional problem of deciding on how

many elitist ants to use. If too many elitist ants are used, then the algorithm can easily become

trapped in local optima’.

This text illustrates the ‘exploration versus exploitation’ dilemma that will be recurrent throughout this project and any other dealing with similar metaheuristic techniques. In some way, it is similar to the problem that arises when a child wants to grow up too fast and burns through life stages too soon, or when we hurry to finish any delicate piece of work. It is vital to find a balance between leaving the algorithm to blindly explore without allowing it to learn and guiding it too much, thereby making it grow too fast so it cannot explore the paths that would lead it to the best solution.

2.3 Breakdown of some optimization techniques

This section describes the heuristic methods already introduced in terms of the four elements presented in section 2.1.



2.3.1 Taboo search

The first solution, completely random, is obtained from a Global exploration

mechanism.

The algorithm then performs neighbour search, which involves direction, until it

finds the solution.

2.3.2 Simulated Annealing

The first solution is provided by a global search method.

The movements according to the Boltzmann equation are an expression of

neighbour search.

2.3.3 Genetic Algorithms

The first generation and the immigrants are obtained by global search.

Natural selection is a very important part of the algorithm.

Children are created from their parents through a combination process.

Mutation can be understood as a kind of neighbour search.

Containing all the basic elements provides GA with a special robustness, and can

make them preferable when the characteristics of the space of solutions are not well

known.

2.3.4 Ant colony optimization

The first solution is obtained by global search.

The algorithm then continues the work by neighbour search.

2.3.5 Immune Systems Optimization

New patterns are constantly being generated by global search.

The designs that are not valid are eliminated in a selection process.

The most successful lymphocytes are copied allowing some random changes in

them: that could be understood as neighbour search. Moreover, the essence of this mechanism is clearly the same as that of the mutation operation.



It seems obvious that the ideas underlying Immune Systems Optimization are

exactly the same as in GA but discarding the combination operation.

2.3.6 The four elements applied to other classical optimization methods

In these classical methods, the degree of randomness has disappeared.

Dynamic programming or Branch & Bound could be understood as an aggregate

of global search that has the peculiarity of being exhaustive and selection, which

eliminates the branches that cannot lead to better results.

Methods like simplex are an expression of a pure neighbour search in which the

level of direction is the maximum possible, completely guiding the algorithm

towards the best solutions.

2.4 Conclusions

Optimization methods share some elemental mechanisms that provide them with

their effectiveness and efficiency: global search, neighbour search, selection and

combination. They all include randomness in some way.

Different methods put different weights on them, and express their essence

differently. That is the most important degree of freedom when designing an

optimization method.

When choosing a method to solve a particular problem, its characteristics can give an indication beforehand of which methods will perform better. These characteristics can be easily understood in terms of the extent to which the four elements are consistent with the problem and the codification used (e.g. does a combination of solutions have anything to do with its inputs? If the fitness value of a combination were always contained in the interval defined by the parents’ fitnesses, then a combination mechanism would make no sense, as no possible improvement could appear).

Once the elemental mechanisms are evaluated, one of the methods that uses them must be selected. The real work then begins: implementing each of these ideas in the way that best suits the peculiarities of the problem.


3. Just to compare.

A linear programming model for

the Jobshop problem




3 Just to compare. A linear programming model for the Flexible

Jobshop problem

Just with the aim of comparing the performance results of the developed

metaheuristic algorithms with the outcome of the classical approaches to the problem,

an L.P. model has been created.

3.1 A more specific definition of the problem

The jobshop problem consists of finding the optimal schedule (according to a chosen criterion) for a set of $n$ jobs, each one composed of $o_i$ operations to be performed on the set of $m$ available machines.

General assumptions are:

A machine can perform at most one operation at a time.

Operations cannot be split: once started, they must be completed before the machine performing them becomes available for another operation.

The description above could belong to a plain Jobshop problem. The following

characteristics determine the peculiarities of the specific problem to be solved.

Each operation can be performed not just on one machine but on a certain set of

them. This is the reason for the ‘Flexible’ adjective of the problem title.

The processing times for an operation on a given machine are independent of the scheduled sequence. However, the concept of setup time is introduced, which varies for each operation depending on the latest operation performed by the machine.

3.1.1 Considerations

The model that will be developed in this project will take into consideration:

The setup time for a machine that was idle before, which differs from machine to machine and may vary depending on the specific job scheduled first on that machine.

Processing times that are different for different machines.

Page 50: DESARROLLO DE ALGORITMOS METAHEURÍSTICOS PARA … · tamaño de lote o del trabajo en particular. El objetivo de este trabajo es la resolución del problema del taller en esta versión

Just to compare. A linear programming model for the Flexible Jobshop problem 32

Setup times for each machine that depend both on the job to be performed and on the latest job performed on it.

3.2 Formulation

The proposed formulation will be introduced following the GAMS structure.

3.2.1 Parameters

$n$: number of jobs to be scheduled.

$o_i$: number of operations that compose job $i$.

$m$: total number of available machines.

$PT_{ijk}$: processing time of job $i$ in its $j$-th operation when it is performed on machine $k$.

$ST0_{ijk}$: setup time when the $j$-th operation of job $i$ is started on machine $k$ and the machine was idle before.

$ST_{ij,xy,k}$: setup time when machine $k$ changes to the $j$-th operation of job $i$ from the $y$-th operation of job $x$.

$P_{ijk}$: binary parameter that takes the value '1' if and only if the $j$-th operation of job $i$ can be performed on machine $k$. Its value is tied to the processing times: the processing time is recorded as '0' when the machine is not able to perform the operation.

$r_{ij}$: order number of operation $j$ of job $i$. In the GAMS model, this is simply introduced as $ord(j)$.

3.2.2 Variables

$M_{ijkl}$: binary variable that takes the value '1' if and only if the $j$-th operation of job $i$ is performed on machine $k$ in the $l$-th place.

$T_{kl}$: time at which machine $k$ finishes its $l$-th process. These variables are defined recursively.

$X_{ij}$: time at which the $j$-th operation of job $i$ is finished. It is defined recursively too.

$Start_k$: binary variable that takes the value '1' if and only if machine $k$ is used.

$Stop_{kl}$: binary variable that takes the value '1' if and only if the $l$-th position is the last one for machine $k$.

3.2.3 Equations

Objective function to be minimized: the total makespan $TM$. It must be greater than or equal to the time at which the last operation of each job is finished:

$TM \ge X_{i,o_i} \qquad \forall i$   (3-1)

Each operation must be performed:

$\sum_{k,l} M_{ijkl} = 1 \qquad \forall i,j$   (3-2)

Recursive definition of $T_{kl}$:

$T_{k0} = 0$   (3-3)

This is a dummy 'start' operation. In the created GAMS model it is not necessary (when the compiler looks up the value of a variable whose index is out of bounds, '0' is used).

$T_{kl} = \max\left\{ T_{k,l-1} + \sum_{i,j} M_{ijk1}\,ST0_{ijk}\;,\; \sum_{i,j} X_{i,a}\,M_{ijkl} \right\} + \sum_{i,j} M_{ijkl}\,PT_{ijk} + \sum_{i,j,x,y} M_{ijkl}\,M_{xy,k,l-1}\,ST_{ij,xy,k}$   (3-4)

where $a = r_{ij} - 1$

The first addend selects the later of the time at which the previous operation of that job was finished and the time at which the previous operation of that machine was finished. The second one accounts for the processing time and the last one for the setup time.

Recursive definition of $X_{ij}$:

$X_{i0} = 0$   (3-5)

$X_{ij} = \max\left\{ X_{i,j-1} + \sum_{k} M_{ijk1}\,ST0_{ijk}\;,\; \sum_{k,l} T_{k,l-1}\,M_{ijkl} \right\} + \sum_{k,l} M_{ijkl}\,PT_{ijk} + \sum_{k,l,x,y} M_{ijkl}\,M_{xy,k,l-1}\,ST_{ij,xy,k}$   (3-6)


The first addend selects the later of the time at which the previous operation of the job finishes and the time at which the previous operation performed by the machine finishes. The second one accounts for the processing time and the last one for the setup time.

Filling consecutive positions: it is necessary to compel the model to fill

consecutive job positions for each machine.

If a machine is started, it has to be stopped sometime:

$\sum_{l} Stop_{kl} = Start_k$   (3-7)

If a machine was started and has not been stopped yet, it is used:

$\sum_{i,j} M_{ijkl} \ge Start_k - \sum_{g=1}^{l-1} Stop_{kg}$   (3-8)

If a machine is started, its first position is full:

$\sum_{i,j} M_{ijk1} = Start_k$   (3-9)

The positions are filled in ascending order:

$\sum_{i,j} M_{ijkl} \le \sum_{i,j} M_{ij,k,l-1}$   (3-10)

3.3 Linearizing the model

Although the model explained here contains some non-linearities in its formulation, the problem can be linearized by introducing a large number of binary variables, at the cost of increasing computation time.

Four non-linearities can be found in the model:

$T_{kl} = \max\left\{ T_{k,l-1} + \sum_{i,j} M_{ijk1}\,ST0_{ijk}\;,\; \sum_{i,j} X_{i,a}\,M_{ijkl} \right\} + \sum_{i,j} M_{ijkl}\,PT_{ijk} + \sum_{i,j,x,y} M_{ijkl}\,M_{xy,k,l-1}\,ST_{ij,xy,k}$   (3-11)


$X_{ij} = \max\left\{ X_{i,j-1} + \sum_{k} M_{ijk1}\,ST0_{ijk}\;,\; \sum_{k,l} T_{k,l-1}\,M_{ijkl} \right\} + \sum_{k,l} M_{ijkl}\,PT_{ijk} + \sum_{k,l,x,y} M_{ijkl}\,M_{xy,k,l-1}\,ST_{ij,xy,k}$   (3-12)

Two of them are products of binary variables. For a product $z = x \cdot y$ of binary variables, the constraints introduced to represent it are:

$z \le x, \qquad z \le y, \qquad z \ge x + y - 1$   (3-13)

The remaining ones are products of a binary and a real variable. To describe it in a simpler way, the problem can be expressed as a variable $v$ that takes the value of another one, $u$, if and only if the binary variable $b$ takes the value 1. With $M$ a sufficiently large constant, this can be introduced into the model as:

$v \le M \cdot b, \qquad v \ge u - M(1-b)$   (3-14)

Thus, the restrictions have been linearized, making the use of an LP solver possible. In addition, the max operator is suppressed: $z = \max\{x, y\}$ is replaced by

$z \ge x, \qquad z \ge y$   (3-15)

3.4 Numerical Results

The model will be run with different data in order to examine the numerical solutions provided.

Two cases will be considered to illustrate the performance of the GAMS program. The first one will be small, so that the optimal solution can be seen intuitively. Afterwards, a complex problem will be analyzed to show the performance of the model at that scale.

Let us start by considering two jobs (A, B) to be performed on two machines (K1, K2). Job A has two operations (j1, j2) and job B has another two (j1, j2). In order to make the example easy to solve, we will not take into account any kind of setup time.


The numerical values of the processing times are:

Task    K1 [Hours]    K2 [Hours]
A.j1    3             6
A.j2    4             -
B.j1    5             4
B.j2    -             5

(A dash marks a machine that cannot perform the operation.)

Table 3. Numerical values for processing time in the example

It can be easily seen that the optimal order of operations will be:

[Gantt chart: machines K1 and K2 each perform two operations (positions L1, L2); time axis in hours.]

Table 4. Optimal schedule for the example

The optimal value was 9 hours.

To get an idea of the complexity of the problem:

Number of equations: 488

Number of single variables: 197

Computation time: 0.62 seconds.

Finally, the model can be tested with an amount of data closer to what a real case would involve. For the sake of simplicity, the time data are not included here but in the final annexes.

This problem is computationally intensive. The optimal solution was found by the NEOS Server (a multi-institutional effort that enables easy access to more than fifty optimization packages; it can be found at http://www-neos.mcs.anl.gov/) after analyzing the problem for more than 24 hours:

[Gantt chart: the operations of jobs A (j1 to j4), B (j1 to j7), C (j1 to j7) and D (j1 to j5) distributed over machines K1, K2 and K3, with the time axis running up to roughly 35 hours.]

Table 5. Optimal schedule for the second example

The optimal value obtained was 35.8 hours.

The problem complexity can be expressed as:

Number of equations: 104,260

Number of single variables: 36,400

Computation time: 23 hours.

3.5 Conclusions

The last of the numerical examples introduced kept GAMS working for more than 20 hours; the resolution of the problem could not be finished locally in time to provide numerical results here.

In the author's opinion, the two most important characteristics of the model in this regard are:

• The large number of binary variables.

• The impossibility of describing the model in a more intuitive way.

This has the consequence of increasing computation time, as shown in the results provided.

Nevertheless, the heuristic algorithms developed in this project overcome these difficulties and achieve considerably better performance.


4 Designing a codification. Importance of the basis.

When starting to implement any of the optimization methods described, the first problem to tackle is how to represent the possible solutions. Although some tools based on binary codification are available and ready to use, with the time savings this represents, they are not always advisable. It is of vital importance to choose the best possible system and not to reduce the range of options to just binary. The complexity and characteristics of the problem that make the different elementary optimization mechanisms (global search, neighbour search, selection and combination) more or less suitable strongly depend on the codification.

As will be shown below, choosing the right codification system has a huge impact on the final outcome. Therefore, a considerable time investment should be made in order to reach a sound decision. This section explores the main reasons for this impact and provides some advice on how to actually choose or design a codification.

4.1.1 The hat and the rabbit: making things appear and disappear

A classical definition of optimization is choosing the best solution among the possible ones. According to this, only two things define an optimization problem: what is possible (delimiting the space of feasible solutions, usually by a series of constraints) and what is considered best (defining an objective function that returns objective values when applied to the feasible solutions).


These two concepts are the sources of complexity of the problem. Thus, if one of them could be reduced or simplified, the problem would be easier to solve. In some cases, difficult problems are relaxed, meaning that some of their constraints are disregarded in order to obtain an approximate solution or valuable information about the exact one. What is proposed here is not a relaxation: all the constraints are taken into account, not explicitly but implicitly.

The main idea is that if a codification makes it impossible to construct an unfeasible solution, then it is not necessary to check any solution for feasibility, because all of them are feasible beforehand. That simplifies the model to be applied and lowers computation time.

Of course, finding such a codification is an ideal situation, but in some cases it is possible to find an intermediate possibility in which a given set of constraints (though not all of them) can be made implicit. The rabbit is inside the hat before the magician pulls it out by the ears, but we cannot see it. We do not have to worry about implicit constraints: they are in there in case we want to pull them out.

The objective is to find a way of providing all the information needed to define a solution, while giving only necessary and sufficient data. Giving more information raises the threat of unfeasibility.

In this project, a codification of this kind has been found. Although it was initially considered an original innovation of this project, the author discovered that it had already been briefly described in [MESOUGHNI 04], but not used. Astonishingly, Mesoughni did not recognise its advantages and continued using the traditional codification depicted below. The codification will be introduced in further sections.

4.1.2 Simplifying the calculation

The second source of complexity in a problem is the peculiarities of the objective

function. Choosing a good codification can make the objective calculations easier. As in

the case of making constraints implicit, that results in simpler models and lower

computation times.


4.1.3 Operators design

When the real work starts, that is, operator design, the codification plays a very important role. Depending on the way the information is coded, the four elementary ideas can be implemented in a more or less intuitive way. That is, the codification should be developed taking into account the possibilities that each of the options presents regarding the different operators.

4.1.4 Method selection

Depending on the codification and the way they are defined, some operators will clearly be more suitable for the problem. For example, if small modifications in the code can represent solutions that are in essence very different, with objective values that have nothing to do with one another, then the neighbour search mechanism makes no sense. As this element is the basis of most methods, it may then be a good idea to spend some time trying to improve the codification system.

4.1.5 Conclusions

Although some ready-to-use tools based on binary codification can be found on the market, spending some time choosing a codification is worthwhile. The benefits of using the right system can be summarised as: making constraints implicit, simplifying the objective calculations, making operator design easier and clarifying the choice of an optimization method.

In scheduling problems, these advantages become even clearer, given the complex nature of the problem. That was the reason for discarding ready-made operators in this project and developing a dedicated codification. In the next section this system will be introduced.

4.2 The codification used: two lists for a schedule

4.2.1 Introduction: leaving formalities behind

When starting to face the problem of how to codify the solutions, it seems a good idea to revisit the linear programming model already developed: the formulation of the objective function and constraints can clarify the concepts prior to taking any decision. In this case, that will not be done.


Looking back at the model developed, intuition suggests that the model is more complicated than the problem. There is a huge number of auxiliary variables, and the constraints that contain them are not obvious. Nevertheless, all the information contained in them could be expressed in a much easier and more informal way, ideally the same one in which we would show the schedule to a colleague.

There is some rigidity in mathematical models that can be bypassed when developing a heuristic algorithm. It is not necessary to explicitly formalize either the constraints or the objective function in a purely mathematical way: understanding the processes that have to be followed and implementing them correctly is enough.

This is a source of informality but also of flexibility, which widens the range of possibilities and enables the use of more efficient definitions. The codification used takes advantage of this peculiar characteristic, leaving formalities behind.

4.3 The first idea is not the best

The most intuitive way of expressing a particular jobshop schedule seems to be specifying the series of operations that each machine must perform. This is the idea that most of the mathematicians and engineers trying to solve the flexible jobshop have used.

For example, the information could be stored in a matrix in which each row corresponds to a machine. The columns then express the operations that the machine will perform, in order. The notation O(3,1) expresses the 3rd job in its 1st operation. The matrix provided is an example of three machines performing three jobs of two operations each. All the operations have to be present in the matrix to make it complete.

M1: O(1,2) O(2,1)

M2: O(2,2) O(1,1)

M3: O(3,1) O(3,2)

Table 6. Example of schedule oriented to machines


It is obvious that this information is sufficient to define a schedule. However, is all of it necessary?

The main disadvantage of this system becomes clear when trying to carry out the example below.

M1 starts with the 2nd operation of job 1, so it will have to wait until the 1st one is finished. M2 starts with the 2nd operation of job 2, so it will also have to wait for the 1st one to be performed. M3 performs the 1st operation of job 3 and continues with the 2nd one. At that point, neither M1 nor M2 can start working.

We have a problem. There is a circular reference between jobs 1 and 2.

M1: O(1,2) O(2,1)

M2: O(2,2) O(1,1)

M3: O(3,1) O(3,2)

Table 7. Circular reference in a schedule oriented to machines

Neither of them can be started, and the schedule is therefore unfeasible.

If this codification is used, then looking for infeasibilities of this kind is compulsory. Note that as the size of the matrix grows, the cost of this checking process increases very fast. Provided that, in each row, the operations are in order (and that seems easy to achieve), each operation has to be compared with every one that appears after it, to check that the operation preceding it (within the particular job it belongs to) is not placed after the one originally being checked.

This process is complicated and time-consuming. It would be very useful to find a

way of making this constraint implicit.

4.4 Calling our magician. Introducing the codification used

As stated in 4.3, giving more information than necessary is a source of infeasibilities. In this case, we can reduce the amount of data to a minimum and make this constraint disappear.


On the one hand, we will have a matrix that stores which machine will perform

each operation. An example that keeps the sizes of the first depicted case is included in

Table 8.

        O1    O2
J1      M2    M1
J2      M1    M2
J3      M3    M3

Table 8. Table specifying the machines that will perform each operation

Then, it is necessary to specify the order in which these operations will be

performed. This point is the basis of the codification developed.

The main idea is to work only with jobs, without caring about operations. That way, when the schedule refers to starting an operation of a certain job, the operation number is redundant, as it is known that it must be the first one that has not been performed yet. This avoids giving unnecessary information and eliminates the possibility of circular references.

A solution can therefore be expressed as a job list, for example 1 3 1 2 2 3. That means that the first operation to be performed is the first one of job 1. The machine in charge of it is M2, which is idle, so it can assume the work. The next one is job 3: M3 is idle too, so it can start from the first moment. Then job 1 comes again: M1 is waiting, but will have to remain idle until the first operation of the job is finished on M2. The 2 afterwards means that the 1st operation of job 2 will be performed on M1 once it finishes O(1,2). When that operation is complete, the 2nd operation of job 2 will be performed on M2, which will be free by then. O(3,2) is performed on M3 just when O(3,1) is finished on the same machine. A Gantt diagram depicting the schedule is presented in Figure 3.

As shown in this example, this codification completely eliminates the possibility of circular references. Moreover, the constraints of the problem have been drastically reduced. Now, every list of integers in which each job number appears as many times as the number of operations that compose the job is a valid job list. The machine matrix only has to respect the capabilities of the machines, which can perform some operations but not the remaining ones. Both controls are now very easy: a few counts in the case of the job list, and a simple check in the machine matrix.
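The two remaining controls can be sketched in a few lines (Python, with illustrative names; this is not the project's code):

```python
from collections import Counter

# A job list is valid iff each job number appears exactly as many times
# as the job has operations.
def valid_job_list(job_list, ops_per_job):
    return Counter(job_list) == Counter(ops_per_job)

# A machine list is valid iff every machine is capable of the operation it
# is paired with; 'capable' maps (job, operation) -> set of machines.
def valid_machine_list(job_list, machine_list, capable):
    seen = Counter()
    for job, mach in zip(job_list, machine_list):
        seen[job] += 1                   # this is the seen[job]-th operation
        if mach not in capable[(job, seen[job])]:
            return False
    return True

ops = {1: 2, 2: 2, 3: 2}
capable = {(1, 1): {2}, (1, 2): {1}, (2, 1): {1}, (2, 2): {2},
           (3, 1): {3}, (3, 2): {3}}
assert valid_job_list([1, 3, 1, 2, 2, 3], ops)
assert not valid_job_list([1, 1, 1, 2, 2, 3], ops)   # wrong counts
assert valid_machine_list([1, 3, 1, 2, 2, 3], [2, 3, 1, 1, 2, 3], capable)
```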

[Gantt chart: operations (1,1), (1,2), (2,1), (2,2), (3,1) and (3,2) drawn as bars over time (min), one row per machine.]

Figure 3. Example Gantt diagram

A small change is introduced. Instead of working with the machine matrix, it is reordered and transformed into a second list: the list of the machines corresponding to the operations shown in the job list. In the case example, the machine list would be 2 3 1 1 2 3. Thus, every solution is encoded in two lists.

The process that converts each pair of a machine list and a job list into a comprehensible schedule is simple. The first step is to identify the operation number that corresponds to each component of the job list. Then, a machine schedule and a job schedule are constructed, taking into account the latest operation scheduled for the job and for the corresponding machine. Processing, change and transportation times are taken from the data matrices. The information is structured from two different perspectives: in the form of a machine schedule (the operations each machine will perform, with their starting and finishing times) or a job schedule (the machines on which each operation will be performed, together with starting and finishing times).
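This decoding step can be sketched as follows (Python, with illustrative names and invented processing times; setup and transportation times are omitted here for brevity, while the project's decoder does include them):

```python
# Minimal decoder from the two lists to a timed schedule. 'pt' maps
# (job, operation, machine) -> processing time.
def decode(job_list, machine_list, pt):
    job_ready = {}       # job  -> finish time of its latest operation
    mach_ready = {}      # mach -> finish time of its latest operation
    op_counter = {}      # job  -> operations already scheduled
    schedule = []
    for job, mach in zip(job_list, machine_list):
        op = op_counter.get(job, 0) + 1      # operation number is implicit
        op_counter[job] = op
        start = max(job_ready.get(job, 0), mach_ready.get(mach, 0))
        finish = start + pt[(job, op, mach)]
        job_ready[job] = mach_ready[mach] = finish
        schedule.append((job, op, mach, start, finish))
    return schedule

# The two-list example of this section with made-up processing times:
pt = {(1, 1, 2): 5, (1, 2, 1): 3, (2, 1, 1): 4, (2, 2, 2): 2,
      (3, 1, 3): 6, (3, 2, 3): 1}
sched = decode([1, 3, 1, 2, 2, 3], [2, 3, 1, 1, 2, 3], pt)
assert sched[2][:4] == (1, 2, 1, 5)      # O(1,2) waits for O(1,1) on M2
assert max(f for *_, f in sched) == 14   # makespan of this schedule
```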


An additional advantage of this codification is that it carries true meaning about the schedule it represents. This implies that similar job lists or machine lists correspond to similar schedules, and thus have similar objective values as well. This is vital, as it gives meaning to the fundamental neighbour search process. Moreover, combinations of two solutions do have something to do with the original ones, but do not necessarily show intermediate objective values (one parent could be better at the beginning of the schedule and the other at the end, so a combination of them could outperform both). This makes a combination mechanism valuable. Taking these ideas into consideration, it is evident that all four elemental optimization tools (given that global search and selection are useful in all situations) are valid to solve the problem if this codification is used.
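As an illustration of why combination is cheap under this codification, here is a sketch of one possible operator (hypothetical, not necessarily the operator developed in this project): copy a prefix of one parent's job list, then place the remaining job occurrences in the order they appear in the other parent. Operation counts are preserved, so the child is feasible by construction.

```python
# Hypothetical one-cut combination of two job lists (illustrative only).
def combine(parent_a, parent_b, cut):
    child = list(parent_a[:cut])
    remaining = list(parent_a[cut:])     # job occurrences still to place
    for job in parent_b:                 # take them in parent_b's order
        if job in remaining:
            remaining.remove(job)
            child.append(job)
    return child

a = [1, 3, 1, 2, 2, 3]
b = [2, 3, 2, 1, 1, 3]
child = combine(a, b, 3)
assert child == [1, 3, 1, 2, 3, 2]      # prefix of a, suffix in b's order
assert sorted(child) == sorted(a)       # operation counts preserved
```

No repair step is needed afterwards, which is exactly the point: the codification keeps every child feasible.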

4.5 Our magician’s communication problems

Interpreting such a schedule is not easy. All the relevant data are present and ordered, but it is difficult (maybe not for some privileged minds) to get a quick idea of how the jobs develop through time. That is to say: encoded solutions store only the minimum necessary data, which must undergo a purely mechanical transformation, involving the time data, in order to become complete schedules. Nevertheless, these schedules are not easily interpretable. For that reason, a graphical output is really valuable.

A Gantt representation expresses the duration of activities as the length of bars in a diagram. A function that constructs this kind of graph from the schedules has been developed. The solution adopted to represent a flexible jobshop was to draw several horizontal bars identified with the available machines. Colours are used to represent jobs, and operations are labelled across the diagram. An example with eight jobs is provided in Figure 4 to illustrate the function.

This graph can be easily understood. The performance of each job can be tracked by simply following the corresponding colour. Idle times are easily localised, and a rough idea of the workload can be obtained by mere inspection.


[Gantt chart: operations (1,1) through (8,2) of the eight jobs drawn as bars over time (min), one row per machine.]

Figure 4: A Gantt diagram is understandable by mere inspection

Such a tool is an indispensable help during the process of developing the algorithm and checking the reasonableness of the solutions provided.

4.6 Conclusions

The codification used is based on two kinds of lists: job lists and machine lists. It has the peculiar advantage of completely eliminating the possibility of encoding unfeasible solutions. Thus, it enables simpler and quicker algorithms to be implemented.

It is easily translated into a schedule, thereby enabling a very easy calculation of the objective function.

It appears to reflect the true nature of solutions and gives meaning to the four elemental optimization mechanisms (global search, selection, neighbour search and combination), so all of them can be applied in the final algorithm.

As schedules are not easy to interpret, a function that returns a graphical output in the form of a Gantt diagram has been developed. This makes it possible to notice interesting characteristics of the schedule by mere inspection.


5 Data generation. Some practical issues

With the aim of making further algorithm explanations clearer, the data used to define the particularities of each jobshop problem are introduced here. This data structure has been created with the intention of easing the calculations and taking advantage of the special features of the proposed codification.

5.1 Basic parameters

The scalars that define the size of the problem are:

n :number of jobs to be scheduled.

o :a vector that stores the number of operations that conform each job in its

n components

m : number of available machines.

5.2 Time related parameters

$PT_{ijk}$: processing time of job $i$ in its $j$-th operation when performed on machine $k$.

$ST0_{ijk}$: setup time when job $i$ starts its $j$-th operation on machine $k$ and this is the first task assigned to the machine.

$ST_{ij,xy,k}$: setup time when machine $k$ changes to the $j$-th operation of job $i$ from the $y$-th operation of job $x$.

$P_{ijk}$: binary parameter that takes the value '1' if and only if the $j$-th operation of job $i$ can be performed on machine $k$. The processing time is recorded as '0' when the machine is not able to perform the operation.

$PList_{ij}$: a vector containing the identifiers of all the machines on which the $j$-th operation of job $i$ can be performed. It is just a different way of expressing $P_{ijk}$, but it turns out to be especially convenient to work with.

$TT_{ijkl}$: transportation time for job $i$ in its $j$-th operation when this operation is performed on machine $k$ and the previous one was done on machine $l$.


5.3 Complexities of a generalised jobshop

The problem that this project aims to solve is a very general version of the jobshop (works do not usually deal with setup, change and transportation times at once, and the flexible jobshop corresponds to the most complex machine configuration).

This generality leads to the need to manage complex data.

For instance, the size of the change-time matrix $ST_{ij,xy,k}$ in the case of a problem with 8 jobs of 5 operations each and 4 machines is 8·5·8·5·4 = 6400 scalars.

Obviously, it is not possible to introduce manually that amount of data to get

valuable information about the algorithm performance. Consequently, a simple tool

that generates coherent data has been developed.

5.4 Generating random jobshop data

The data generated must be coherent with the definitions of $n$, $o$ and $m$. To adjust the level of redundancy among the machines (that is, how many machines will on average be capable of performing each activity), a value of SparcityParameter is defined, where 0 would correspond to a non-flexible jobshop and 1 to all machines being able to perform all the operations.

To give a rough idea of the order of magnitude of the duration of the operations, a TimeParameter is also introduced. The generated time values come from a normal distribution with mean TimeParameter and standard deviation TimeParameter/3. As these values have to be positive, the simulation is repeated whenever a negative value is generated.

5.4.1 Processing time

An empty $n \times \max(o) \times m$ matrix is generated. Iteratively, for each job, operation and machine, random numbers are drawn from a uniform [0,1] distribution. If the random value does not exceed SparcityParameter, a random normal time is included in the matrix. If after that process some operation has not been assigned to any machine, a random one is selected to fill that gap and make the schedule possible.



$PList_{ij}$ is built from $PT_{ijk}$ by searching for the machines on which an operation can be performed and listing them in an ordered way.

5.4.2 Change, transportation and Start-up times

They are generated from $PT_{ijk}$, taking into account the different possible alternatives and applying scaling values that fix their importance relative to processing times.

5.5 Making this structure useful to work with real data

As stated above, the generality of the studied problem has the disadvantage of requiring large amounts of data. Measuring all these different possibilities is not economically viable in a real shop.

Thus, in a real case study this time-simulation tool is also useful. Transportation, change or start-up times can be neglected when the particular characteristics of the case allow that simplification (just by setting their TimeParameter to 0). Alternatively, some of the times can be simulated taking a smaller set of actually measured times as a starting point, assuming the rest will be similar, much in the way bootstrapping techniques work. To do that, TimeParameter should be adjusted for each case.


6 Brief considerations about problem complexity



6 Brief considerations about Jobshop problem complexity

6.1 Introduction

As stated in the introduction section, the main difficulty that J.S.P. resolution poses is the combinatorial explosion of possible solutions. If the problem were to be solved by exhaustive enumeration and evaluation, the computation time would be linked to the total number of solutions and the time spent evaluating each one of them.

This calculation was not included in the introduction because its value depends on what is considered a solution, which is in turn linked to the codification used. Two different systems have been used in this project (the one in the linear programming model and the one proposed for the algorithms), so their implications on problem complexity will be analysed separately.

6.2 Linear programming model

In the L.P. model, each solution can be defined by its decision variables, expressed as:

$M_{ijkl}$: binary variable that takes the value ‘1’ if and only if the $j$-th operation of job $i$ is performed on machine $k$ in the $l$-th place.

The indices take the values:

$i = 1,\ldots,n \qquad j = 1,\ldots,o_i \qquad k \,/\, P_{ijk} = 1 \qquad l = 1,\ldots,p_k$

Where:

$n$: number of jobs to be scheduled.

$o_i$: operations that compose job $i$.

$m$: total number of available machines.



$p_k$: maximum number of operations that can be performed on machine $k$. It is calculated as $p_k = \sum_{ij} P_{ijk}$, where $P_{ijk}$ is a binary parameter that takes the value ‘1’ if and only if the $j$-th operation of job $i$ can be performed on machine $k$.

Then, the total amount of possible combinations is obtained as:

$N = 2^{\sum_{i=1}^{n} \sum_{j=1}^{o_i} \sum_{k / P_{ijk}=1} \sum_{a,b} P_{abk}}$  (6-1)

Computation time is the product of the total number of solutions and the time spent evaluating each one, which is a linear function of $\sum_i o_i$ (see ‘From Code to Schedule’), so its plots would just be shifted on a logarithmic scale. This effect is not included here.

The results for a jobshop with total flexibility among 3 machines and composed of jobs with 4 operations each are shown in Figure 5.

Figure 5. LP model complexity (number of possible solutions vs. number of jobs, logarithmic scale)

6.3 Heuristic algorithm complexity

As explained in ‘Designing a Codification. Importance of the basis’, the proposed codification system has the advantage of reducing the space of solutions to the feasible set. Although combinatorial explosion is also present, computation time will be lower for all problem sizes.



A solution is encoded in the form of a job list and a machine list. The first one is a permutation of the numbers from 1 to $n$ where each number $i$ appears exactly $o_i$ times. Thus, considering just this half of the solution, the total number of possibilities is equal to the number of permutations with repetition:

$N = \dfrac{\left(\sum_i o_i\right)!}{\prod_i o_i!}$  (6-2)

As the corresponding entry of the machine list can take $\sum_k P_{ijk}$ values, the total amount of possible solutions is obtained as:

$N = \dfrac{\left(\sum_i o_i\right)!}{\prod_i o_i!} \cdot \prod_{ij} \sum_k P_{ijk}$  (6-3)

The results for the total amount of solutions are shown in Figure 6.

Figure 6. Heuristic model complexity (number of possible solutions vs. number of jobs, logarithmic scale)
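Expression (6-3) is easy to evaluate numerically. The sketch below checks it under total flexibility, where every operation can run on all $m$ machines; the function and variable names are mine, not the project's.

```python
# A quick check of expression (6-3) under total flexibility.
from math import factorial, prod

def heuristic_count(o, m):
    c = sum(o)                                   # total number of operations
    job_lists = factorial(c) // prod(factorial(oi) for oi in o)
    machine_lists = m ** c                       # m candidates per operation
    return job_lists * machine_lists

# 8 jobs of 4 operations each on 3 fully flexible machines
print(heuristic_count([4] * 8, 3))
```

Even this modest shop already produces an astronomically large count, which is the combinatorial explosion the chapter discusses.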

6.4 Compared problem complexity. Conclusions

When compared, it becomes clear that the codification used mitigates problem complexity to some extent.

Moreover, as stated in ‘Designing a Codification. Importance of the basis’, no checking process is necessary in this case, so the constant $k$ that links problem complexity in number of solutions and in computation time is lower. This effect is not considered in the plots.

Figure 7. Compared complexity of the LP and heuristic models (number of possible solutions vs. number of jobs, logarithmic scale)

The Flexible Jobshop problem is one of the most complex NP problems in terms of the total number of solutions. This is the main obstacle that prevents classical optimization methods from solving it successfully. The proposed heuristic algorithms try to overcome this difficulty and, in addition, considerably reduce the number of possible solutions and the computational effort needed to evaluate them.


7 A simulated annealing algorithm for solving the Jobshop problem



7 A Simulated Annealing algorithm applied to the Jobshop problem

One of the metaheuristic algorithms implemented in this project is Simulated Annealing (S.A.). As stated in the introduction to this method, it is inspired by the metallurgical process that underlies this thermal treatment, which consists in raising the metal temperature to the fusion point and then allowing it to cool down again. The properties of the solid obtained depend strongly on the speed of this cooling. If it happens very fast (the tempering process), the crystals formed will contain a relatively high number of imperfections. On the contrary, if the process is slow enough, the mineral can shape into big, perfect crystals, the ones that achieve an energy minimum.

The fundamentals of this optimization method have already been reviewed, so they will not be presented again. The characteristics of the created algorithm will be introduced in the next sections, highlighting its most interesting features and the results it returns when applied to the selected problem.

7.1 The basis. Looking for a good neighbour first

The proposed S.A. algorithm implements a combination of the mechanism created

by Kirkpatrick and a pure random neighbourhood search.

As described in the introduction section, S.A. relies on a thermodynamical equation,

$P(\Delta E \le \partial;\, t) = 1 - e^{-\partial/(k \cdot t)}$  (7-1)

where the parameter $t$ is usually interpreted as the system temperature.



The probability of a big energy increment is very low, and can be described through an exponential distribution function. Therefore, a possible pseudo-code expression for the classical interpretation of this metaheuristic could be:

Select an initial solution s0 from the space of feasible solutions S;
Fix the initial temperature to a positive value t0 > 0 and select a cooling plan CP(t);
While (Termination_Criterion == false)
    Select another state s ∈ N(s0) from the neighbourhood of the current one;
    Evaluate the energy increment ΔE = E(s) - E(s0);
    If ΔE < 0 then s0 = s
    Else
        Generate a random number r ∈ [0,1];
        If r < e^(-ΔE/t) then make the up-hill movement s0 = s
    Change the value of the temperature if necessary: t = CP(t)

Nevertheless, the algorithm created in this project performs a random search around the current point, evaluating a certain number of neighbours and moving to one of them if its fitness exceeds the local value. Only when the selected neighbours have been searched does the algorithm turn to the classical S.A. thermodynamical equation to escape from the local optimum.

The resulting pseudo-code could be:

Select an initial solution s0 from the space of feasible solutions S;
Fix the initial temperature to a positive value t0 > 0 and select a cooling plan CP(t);
While (Termination_Criterion == false)
    For (Number of neighbours to be searched)
        Select another state s ∈ N(s0) from the neighbourhood of the current one
        Evaluate the energy increment ΔE = E(s) - E(s0)
        If ΔE < 0 then
            s0 = s
            break
    Select another state s ∈ N(s0) from the neighbourhood of the current one;
    Evaluate the energy increment ΔE = E(s) - E(s0);
    If ΔE < 0 then s0 = s
    Else
        Generate a random number r ∈ [0,1];
        If r < e^(-ΔE/t) then make the up-hill movement s0 = s
    Change the value of the temperature if necessary: t = CP(t)

The combination of both mechanisms has proved to be more efficient, as the

number of up-hill movements is minimized.
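The combined mechanism can be exercised on a toy problem. The sketch below minimises a bumpy one-dimensional function over the integers; the jobshop-specific parts (codification, neighbour moves, fitness) are replaced by trivial stand-ins, and all names and parameter values are illustrative, not the project's.

```python
# A runnable sketch of good-neighbour prescan plus thermodynamical jumps.
import math
import random

def energy(s):
    return (s - 7) ** 2 + 3 * math.sin(s)     # toy cost with local bumps

def neighbour(s):
    return s + random.choice([-1, 1])         # minimal movement on the integers

def simulated_annealing(s0, t0=50.0, cooling=0.95, prescan=5, iters=500):
    s, t, best = s0, t0, s0
    for _ in range(iters):
        moved = False
        for _ in range(prescan):              # look for a better neighbour first
            cand = neighbour(s)
            if energy(cand) < energy(s):
                s, moved = cand, True
                break
        if not moved:                         # only then allow a thermodynamical jump
            cand = neighbour(s)
            d_e = energy(cand) - energy(s)
            if d_e < 0 or random.random() < math.exp(-d_e / t):
                s = cand                      # up-hill movement accepted
        if energy(s) < energy(best):
            best = s
        t *= cooling                          # geometric cooling plan
    return best

random.seed(1)
best = simulated_annealing(0)
print(best, round(energy(best), 3))
```

Because the prescan descends whenever a better neighbour exists, up-hill jumps are only attempted at local optima, which is exactly the efficiency argument made above.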

The main concepts of the algorithm will be introduced in the next sections.



7.2 Generating a starting point

The codification used eases the generation of random solutions to the problem. They are encoded in two lists:

A job list that stores the numbers from 1 to $n$, each number $i$ repeated $o_i$ times, its first appearance corresponding to the job's first operation and so on.

A machine list that shows the machines that will perform the operations on the job list.

Thus, the structure of a job list is just a permutation with repetition of the numbers from 1 to $n$. The algorithm builds a vector composed of 1 repeated $o_1$ times, 2 repeated $o_2$ times and so on. Then, a random permutation of it is selected.

Based on the generated job list, machines are randomly selected from $PList_{ij}$ to perform the operations.

This gives rise to a valid solution to the problem. The codification structure makes any verification unnecessary. Thus, in a very simple way, a starting point for the algorithm has been obtained. It is important to notice that L.P. methods spend a considerable amount of time just looking for feasible solutions, so this represents a significant effort saving.

After generating the solution, its fitness is evaluated and stored.
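The starting-point generation described above can be sketched as follows. The `plist` layout (`plist[i][j]` listing the machines able to run operation `j` of job `i`) and the function name are assumptions for illustration.

```python
# Sketch of the random starting point: a job list generated as a permutation
# with repetition, plus a machine list drawn from PList.
import random

def random_solution(o, plist):
    job_list = [i for i in range(len(o)) for _ in range(o[i])]
    random.shuffle(job_list)               # random permutation with repetition
    seen = [0] * len(o)                    # j-th appearance = j-th operation
    machine_list = []
    for i in job_list:
        machine_list.append(random.choice(plist[i][seen[i]]))
        seen[i] += 1
    return job_list, machine_list

plist = [[[0, 2], [1]],                    # job 0: ops (0,0) and (0,1)
         [[2], [0, 1]]]                    # job 1: ops (1,0) and (1,1)
jobs, machines = random_solution([2, 2], plist)
print(jobs, machines)                      # feasible by construction, no checking
```

Every output of this procedure is feasible by construction, which is the point the text makes about not needing a verification step.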

7.3 The importance of being neighbours

One of the most difficult decisions when implementing a S.A. algorithm is choosing the definition of neighbour that will be used, as it constitutes the basis for the good-neighbour search and the thermodynamical jumps. That definition should be coherent with the codification chosen and represent solutions that are similar not only in code but in nature (see section 4).

The choice made here relates to the two halves of the codification: the job and machine lists.



7.3.1 Machine list minimum movement

The minimum movement in the machine list consists of changing just one of the machines to another one that is also capable of performing that operation. That seems intuitive and simple. An example that illustrates the similar nature of two solutions that differ in a minimum movement of their machine lists is shown in Figure 8 and Figure 9.

Figure 8. Representation of a certain solution (Gantt diagram, machine vs. time in minutes)

Figure 9. The same solution after a machine list movement (Gantt diagram)



Both schedules are very similar, and it seems that a change like that could be

considered minimum.

7.3.2 Job list movement

In this case, every movement can be thought of as a permutation. It is not obvious which kind of permutation should be allowed (i.e. a block permutation or a single-operation permutation). This project uses the second definition, and obtains a definition of neighbour alternative to the machine one.

For example, the same schedule could be modified by changing the priority between the first and third operations.

Figure 10. Initial solution (Gantt diagram)

The modification introduced exchanges the priority of the first operations of jobs 6 and 7, returning the result represented in Figure 11.



Figure 11. The same solution after a job list movement (Gantt diagram)

7.3.3 Combining both movements

Considering that both movements are equally important, when searching for a random neighbour there is the same probability of moving away in each of the two directions.

Thus, the process for obtaining a neighbour consists of randomly selecting a direction (machine or job) and then performing a simple movement.
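The combined move can be sketched as follows. Note that the repair step after a job-list swap (re-validating machine choices whose underlying operation changed) is my addition to keep the encoded pair feasible; the text does not specify how this case is handled.

```python
# Sketch of the combined neighbour move: pick a direction at random and
# either swap two job-list entries or reassign one machine.
import random

def valid_machine_list(jl, ml, o, plist):
    """Repair ml so each position holds a machine able to run its operation."""
    seen = [0] * len(o)
    out = []
    for pos, i in enumerate(jl):
        options = plist[i][seen[i]]
        out.append(ml[pos] if ml[pos] in options else random.choice(options))
        seen[i] += 1
    return out

def neighbour(job_list, machine_list, o, plist):
    jl, ml = job_list[:], machine_list[:]
    if random.random() < 0.5:                     # job-list direction
        a, b = random.sample(range(len(jl)), 2)
        jl[a], jl[b] = jl[b], jl[a]
        ml = valid_machine_list(jl, ml, o, plist) # keep the pair consistent
    else:                                         # machine-list direction
        pos = random.randrange(len(ml))
        seen = [0] * len(o)                       # which operation sits at pos?
        for p in range(pos):
            seen[jl[p]] += 1
        ml[pos] = random.choice(plist[jl[pos]][seen[jl[pos]]])
    return jl, ml

o = [2, 2]
plist = [[[0, 2], [1]], [[2], [0, 1]]]
jl, ml = neighbour([0, 1, 0, 1], [0, 2, 1, 1], o, plist)
print(jl, ml)
```

Either branch returns an encoding that still decodes to a feasible schedule, so the neighbourhood stays inside the feasible set, as the codification intends.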

7.3.4 The key factor

As stated in ‘Designing a Codification. Importance of the basis’, choosing the kind of algorithm to solve a given problem should be a sound decision based on the problem's own characteristics and on the codification used, related to the basic optimization mechanisms (global search, neighbour search, selection and combination).

In the particular case of S.A., the first solution is obtained by global search and the rest of the algorithm's performance is based on neighbour search. Consequently, it is extremely important to choose a codification system and a method for obtaining neighbours that respond to the real characteristics of the problem. Neighbours should be different, but very similar in nature, so that the distances between solutions are minimal.

Actually, the definition of distance underlies that of neighbour, as the neighbourhood of a solution should correspond to the set of points immediately close to the one considered. Moreover, it should be possible to obtain any solution from any other just by performing a finite sequence of movements from neighbour to neighbour. The algorithm developed works with a definition of distance as the minimum number of job and machine list changes that have to be performed in order to transform one schedule into another. The large amount of possible combinations makes this calculation computationally expensive, but it is not necessary for the performance of the algorithm: the only thing needed is to create neighbours, and that process is fairly fast and simple with the mechanism described.

The neighbours so obtained respond to these characteristics and play the basic role in the algorithm.

7.3.5 Advantages of the neighbour creating system compared to classical alternatives

It may seem that the distance and neighbour definitions developed here are simple and intuitive. Surprisingly, simplicity and intuitiveness are not always considered a main goal by algorithm designers.

For instance, most heuristics work with a binary codification, for which ready-to-use tools are available to implement the algorithm saving time and effort. If the codification system based on two lists were to be maintained but translated into binary, the commonly used definition of neighbour would be changing one 0 to 1 or vice versa in any of the lists. The implicit distance definition would be the Hamming distance3.

That seems equally straightforward at first sight, but it has the particular disadvantage of spoiling some of the most positive characteristics of the codification system developed. For example, a jobshop constituted by three jobs with two operations each and only one machine (so that only a job list is necessary) could have the following list as a possible solution:

3 The Hamming distance is the number of positions at which two binary chains differ. E.g., the distance between 1 0 0 1 and 1 1 0 0 would be 0+1+0+1 = 2.



1 2 2 1 3 3 → 01 10 10 01 11 11

A simple movement could change the first 0 into a 1:

11 10 10 01 11 11 → 3 2 2 1 3 3 (?)

This is not a feasible solution, as the second operation of job 1 is not performed and job 3 appears once more than needed. Obviously, changing a 1 for a 0 and a 0 for a 1 does not improve the situation:

11 10 10 00 11 11 → 3 2 2 0 3 3 (?)

Thus, a complex checking process would have to be performed after each modification in order to ensure that the solution obtained abides by the constraints of the problem. This definition of neighbour is therefore not useful.

In another case, if the codification used were based on machine schedules (the one that has classically been applied to this problem), a possible solution is the one depicted in Table 9:

M1: O(2,1) O(1,2)

M2: O(2,2) O(1,1)

M3: O(3,1) O(3,2)

Table 9. Example of codification based on machine schedules

and the intuitive definition of neighbour would be moving one operation to a machine that is actually able to perform it, or exchanging two of them; for instance, as expressed in Table 10.



M1: O(1,2) O(2,1)

M2: O(2,2) O(1,1)

M3: O(3,1) O(3,2)

Table 10. Neighbour solution

What is the problem with this definition? Firstly, the exchange above has created a circular reference between jobs 1 and 2 (see section 4). A complex checking process is necessary here too in order to avoid this kind of infeasibility. The definition also has the disadvantage of returning distant relatives: the neighbours so obtained can be very different, which is intuitive, as the process can imply movements in both directions and not simple ones (operations can be exchanged between two different machines, which would mean a movement of two units in the job list direction and another two in the machine list direction according to the selected definitions).

This implies that the mechanism of neighbour search loses some of its sense (neighbours may not be very similar in nature), so it becomes closer to a global search element. This worsens algorithm efficiency and is a threat that should always be considered when designing this kind of heuristic.

7.4 Too many doors to knock

As highlighted above, looking for better solutions in the neighbourhood before allowing up-hill movements enhances algorithm efficiency.

The combinatorial explosion that poses the main difficulty to Jobshop problem resolution also appears in this neighbour calculation.

In an example with $n$ jobs composed of $o_i$ operations each, in which each operation $j$ of job $i$ can be performed on $m_{ij}$ machines, the number of neighbours of a certain point is given by the expression:

$2 \cdot \binom{c}{2} \cdot \prod_{i,j} m_{ij}$  (7-2)

where $c = \sum_i o_i$.

The results for a jobshop with total flexibility among 3 machines and composed of jobs with 4 operations each are shown in Figure 12 and Figure 13.

Figure 12. Neighbourhood size in linear scale (vs. number of jobs)

Figure 13. Neighbourhood size in logarithmic scale (vs. number of jobs)

There are too many doors to knock on, so only a random selection of neighbours will be checked. The choice of this parameter fixes a degree of directionality in the search. If too many neighbours are checked, the algorithm will lose efficiency. However, if only a few of them are analysed there can also be a loss of efficiency, because the opportunity of using interesting information, and its directionality possibilities, is discarded.
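Expression (7-2) can be evaluated directly. In the sketch below the function name is mine, and the shop sizes mirror the text's example (total flexibility on 3 machines, 4 operations per job).

```python
# Evaluating expression (7-2): c = sum_i o_i is the total number of
# operations, m_per_op lists the machine candidates of each operation.
from math import comb, prod

def neighbourhood_size(o, m_per_op):
    c = sum(o)                                # c = sum_i o_i
    return 2 * comb(c, 2) * prod(m_per_op)

for n_jobs in (1, 2, 10):
    o = [4] * n_jobs
    print(n_jobs, neighbourhood_size(o, [3] * sum(o)))
```

For 10 jobs this yields a value of the order of $10^{22}$, consistent with the scale shown in Figure 12.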



7.5 Thermodynamical jumps. Choosing a cooling plan

If a better solution is found among the set of searched neighbours, that kind of search is restarted from the new solution. Only if no improvement is achieved is the mechanism of thermodynamical jumps, the one that characterises S.A., set in motion.

In it, the possibility of up-hill movements is opened, so the algorithm does not get trapped in local optima.

The expression that returns the probability of accepting such a worsening of the solution is given by the Boltzmann equation:

$P(\Delta E \le \partial;\, t) = 1 - e^{-\partial/(k \cdot t)}$  (7-3)

Temperature is the key variable in this expression.

7.5.1 Cooling plans state of the art

Several functions to describe the cooling plan have been developed. The most widely used ones, already mentioned in the introduction section, are described below:

Geometrical rule:

$t_{i+1} = c \cdot t_i, \qquad c \approx 1,\; c < 1$  (7-4)

Hajek plan:

$t_k = \dfrac{c}{\log(1+k)}$  (7-5)

This plan is based on the concept of local optimum depth, defined as the increment in the cost function necessary to escape from the optimum to another optimum state. $c$ is at least as large as the maximum depth and $k$ is the number of iterations performed. Despite its theoretical soundness, it does not perform well in practice.

Aarts & Korst plan:

$t_{k+1} = \dfrac{t_k}{1 + \dfrac{t_k \ln(1+\delta)}{3\sigma_{t_k}}}$  (7-6)

It takes into consideration the standard deviation $\sigma_{t_k}$ of the solutions obtained at each temperature.



Lundy & Mees plan:

$t_{k+1} = \dfrac{t_k}{1 + \beta t_k}$  (7-7)

with relatively small values for $\beta$. They suggest that the termination criterion be established as

$t \le \dfrac{\varepsilon}{\log\left[(S-1)/p\right]}$  (7-8)

where $S$ is defined as the size of the population. This rule returns a solution that differs from the global optimum by less than $\varepsilon$ with probability $p$.
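The simpler of these rules can be written as plain update functions; the parameter values below are illustrative, not taken from the project.

```python
# The cooling rules above as plain functions (illustrative parameters).
import math

def geometric(t, c=0.95):
    return c * t                              # t_{i+1} = c * t_i, with c < 1

def hajek(k, c=100.0):
    return c / math.log(1 + k)                # t_k = c / log(1 + k), k >= 1

def lundy_mees(t, beta=0.001):
    return t / (1 + beta * t)                 # t' = t / (1 + beta * t)

t = 100.0
for _ in range(5):
    t = geometric(t)
print(round(t, 2))                            # 77.38
```

The geometric rule updates the temperature from its previous value, while the Hajek plan is a direct function of the iteration counter; both decrease monotonically.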

In [JOHNSON, 1989], adaptive cooling schedules are tested. These plans take into account the fact that the time spent by the algorithm at very high or very low temperatures seems to be unproductive. Based on a physical analogy, Kirkpatrick, Vecchi and Gelatt argued that more time is needed between these temperatures, when the cost is dropping rapidly, in order to achieve equilibrium. Johnson and his colleagues found that the effect on solutions and the timing of this schedule were indistinguishable from those of a geometric schedule with suitably chosen parameters.

Other ideas that have appeared in research, such as using better-than-random starting solutions and lower-than-normal starting temperatures, have not been successful.

7.5.2 Industrial considerations. Interesting ideas about the causes of the success and failure of the described cooling plans

Since the starting point for all metaheuristics is turning to nature, and the basis for S.A. is the real metallurgical process of annealing, it seems a good idea to study the temperature patterns that industrial annealing applications follow.

Annealing is applied to metals and alloys like steel and copper to enhance ductility and relieve residual stresses. Consulting one of the most prestigious metallurgy publications (The American Manufacturing Society Handbook), one discovers that, although temperature evolution has been deeply studied for the tempering process, there is virtually no research for the annealing case. It is assumed that just leaving the product to cool in the air or inside the oven is enough.



The ‘cooling plan’ for such a simple process, assuming that the cooling takes place sufficiently slowly and that the temperature is uniform across the object, would correspond to the differential equation:

$\dfrac{\partial Q}{\partial t} = C \cdot \dfrac{\partial T_o}{\partial t} = h \cdot S \cdot (T_o(t) - T_e(t))$  (7-9)

where $Q$ represents the evacuated heat, $C$ the object's specific heat, $h$ the convection coefficient, $S$ the object's surface, and $T_o$, $T_e$ the object and environment temperatures respectively.

This pattern can be described as a first order transient process, depicted in Figure 14.

Figure 14. Temperature evolution followed by natural convection processes

The simplest and most intuitive cooling schedule should correspond to this pattern, so imitating it is worth trying. A cooling plan that tried to capture the essence of this process should discretise the expression:

$\dfrac{\partial Q}{\partial t} = C \cdot \dfrac{\partial T_o}{\partial t} = h \cdot S \cdot (T_o(t) - T_e(t))$  (7-10)

The result comes in the form of:

$T_{t+1} = T_t - k \cdot T_t = (1-k) \cdot T_t = c \cdot T_t$  (7-11)



where $c < 1$.

The values for the temperature turn out to be, in the continuous case:

$T(t) = T_e + (T_o - T_e) \cdot e^{-t/\tau}$  (7-12)

and, in the discrete case:

$T_{t+1} = T_t - \dfrac{T_t - T_e}{\tau}$  (7-13)

Both plans are practically identical, as mere inspection of Figure 15 shows.

Figure 15. Comparison of the first order transient and the geometrical rule

Reviewing the existing research on cooling plans, it seems that plans more sophisticated than the geometrical rule do not give rise to better performance. This fact could be related to the metallurgical industry belief that controlling temperature and forcing its evolution to differ from the natural convection pattern would not improve the results of the annealing process.

7.5.3 Fixing the plan-defining parameters

The initial value for temperature should be high enough to allow all up-hill movements to take place. A good way to fix this starting point is to require, for example, that an up-hill jump of 100% of the initial fitness value be allowed with a 99% probability.


P(\Delta E \le \delta \,;\, t) = 1 - e^{-\delta/T} \qquad (7-14)

0.99 = 1 - e^{-\mathit{InitialFitness}/T_o} \qquad (7-15)

T_o = \frac{-\mathit{InitialFitness}}{\ln(0.01)} \qquad (7-16)

The value of c should be calculated assuming that the final value of the temperature is 0, so T_e = 0.

T_{t+1} = T_t - \frac{T_t}{\tau} = T_t \left( 1 - \frac{1}{\tau} \right) \qquad (7-17)

By analogy:

c = 1 - \frac{1}{\tau} \qquad (7-18)

and, if the properties of first order transitory processes are taken into account, τ should be fixed by noting that the time in which temperature descends by about 99.75% is 6·τ. Thus, this parameter controls the number of iterations that the plan will take to be completed.
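These rules can be sketched as follows. The example fitness value (150) and iteration budget are illustrative assumptions; the 6·τ relation is the one stated above.

```python
import math

def initial_temperature(initial_fitness, p_accept=0.99):
    """Eq. (7-16): temperature that accepts an up-hill jump as large as
    the whole initial fitness with probability p_accept."""
    return -initial_fitness / math.log(1.0 - p_accept)

def cooling_factor(n_iterations):
    """c = 1 - 1/tau (Eq. 7-18), with tau set so the plan has descended
    about 99.75% after 6*tau iterations, i.e. 6*tau = n_iterations."""
    tau = n_iterations / 6.0
    return 1.0 - 1.0 / tau

T0 = initial_temperature(150.0)   # e.g. an initial makespan of 150
c = cooling_factor(1000)          # geometric factor for a 1000-iteration run
```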

7.5.4 Dynamic cooling plan. A different proposal

The geometrical rule is simple and gives good results. It fixes temperature, not jump acceptance probability. The proposed alternative plan does exactly the opposite, dynamically adapting temperature to match the desired up-hill movement probability.

From the definition of the probability of accepting an up-hill movement of a given size, relative to the fitness of the latest accepted solution, it computes the temperature. That probability is updated in a geometrical fashion, decreasing in the same way that the geometrical plan developed above controls its duration with the aid of τ.

This plan is not as intuitive as the simple geometrical rule, and in many cases it is less effective. For example, when solving a jobshop with 10 jobs composed of 3, 4, 2, 3, 4, 3, 10, 5, 6 and 2 operations respectively on 3 machines, the results obtained were as represented in Figure 16 and Figure 17.


[Figure omitted: 'Comparative performance' plot; series: Acceptable geometrical rule, Poor geometrical rule, Dynamical plan]

Figure 16. Algorithm performance with different cooling plans

[Figure omitted: 'Resulting cooling plans' plot; series: Acceptable geometrical rule, Poor geometrical rule, Dynamical plan]

Figure 17. Temperature evolution with different cooling plans

It is important to note that the temperature resulting from these plans will differ from its theoretical evolution, given that the plan is frozen while the neighbour search is performed, which flattens the temperature plot. The more neighbour search is done, the more distortion is introduced into the plan. These distortions also have an impact on computing time, increasing the number of inspected and visited solutions (as can be seen in the plots, which show algorithm performance and temperature evolution versus accepted solutions).

Although under some circumstances, with good parameter tuning, the geometrical rule can give the best results in both efficiency and effectiveness, if the parameters are not well chosen the result underperforms the dynamical plan. The values suggested below work adequately in the majority of cases, but there are others in which the parameter tuning is difficult.

Nevertheless, the dynamical plan has proved to be quite robust, as it directly controls the probability of a certain jump. Some values that have recurrently proved adequate are:

P(\Delta E \le 0.1 \,;\, t = 0) = 0.25 \qquad (7-19)

The initial probability of accepting an up-hill movement of 10% is 25%.

P(\Delta E \le 0.1 \,;\, t+1) = c \cdot P(\Delta E \le 0.1 \,;\, t) \qquad (7-20)

c = 1 - \frac{1}{\tau} = 1 - \frac{8}{N_{iterations}} \qquad (7-21)

The probability evolves following a geometrical rule.
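A minimal sketch of this dynamic plan, assuming the Boltzmann acceptance rule P = e^(-ΔE/T) is inverted to recover temperature. The fitness value 150 and the helper names are illustrative assumptions.

```python
import math

def temperature_for(p_uphill, rel_jump, current_fitness):
    """Invert the Boltzmann rule P = exp(-dE/T): the temperature that
    accepts an up-hill jump of rel_jump * current_fitness with
    probability p_uphill."""
    return -(rel_jump * current_fitness) / math.log(p_uphill)

def probability_plan(n_iterations, p0=0.25):
    """Eqs. (7-19)-(7-21): the acceptance probability itself follows a
    geometrical rule with c = 1 - 8/n_iterations."""
    c = 1.0 - 8.0 / n_iterations
    p, plan = p0, []
    for _ in range(n_iterations):
        plan.append(p)
        p *= c
    return plan

plan = probability_plan(1000)
T = temperature_for(plan[0], 0.1, 150.0)   # fitness 150 is illustrative
```

At every accepted solution, the temperature is recomputed from the current target probability, so the schedule adapts to the fitness actually reached.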

As a conclusion, it can be said that the simple geometrical rule can deliver the best performance, but other plans, like the alternative dynamical plan proposed in this project, can be more robust when facing changes in problem size or in the maximum allowed number of iterations.

7.6 Computation time vs. Problem Size

The fact that the starting point for designing a cooling schedule is the maximum allowed number of iterations implies that a simple analysis of computation time vs. problem size cannot be performed easily, as the actual time will depend on the initial assumption of its value. Instead, a sound statistical analysis of variance that takes into account not only problem size but also this initial assumption should be carried out. The latter lies beyond the scope of this project. Nevertheless, the relationship between problem size and time consumed per iteration is shown in Figure 18. Some models have been fitted to these data and represented in Figure 19. The results of these fits are shown in Table 11.


Fit                   Sum of squared error   Adjusted R-square
Linear polynomial     0.057                  0.866
Square polynomial     0.0192                 0.956
9-Degree polynomial   1.04                   0.999
Exponential           0.0159                 0.963

Table 11. Results of the curve fitting to computation time per iteration

[Figure omitted: scatter plot; x-axis: size (number of jobs), y-axis: time (s)]

Figure 18. Computation time/Iteration vs. Problem size


[Figure omitted: fitted curves; series: data, Linear polynomial, Quadratic polynomial, 3-Degree polynomial, Exponential]

Figure 19. Curve fitting to Computation time per iteration vs. Problem size

It can be seen that computation time per iteration can be expressed as a polynomial

function of problem size.

7.7 Conclusions

Simulated Annealing can be applied to the Jobshop problem with good results. Two different elemental mechanisms appear in it:

- The first solution is provided by a global search method.
- The movements according to the Boltzmann equation are an expression of neighbour search.

The developed algorithm performs that neighbour search in two different ways:

- Searching a random set of neighbours looking for better solutions. The size of this set is a very important parameter for the algorithm's performance.
- Up-hill movements are allowed by Boltzmann processes, which characterize S.A. algorithms.

The definition of neighbourhood has to be consistent with the problem nature and the codification used. It is essential to find a process that generates neighbours at a minimum distance from the initial point, with all that this implies, at a low computational cost. Such a process has been described.

The evolution of temperature is the other key aspect of the algorithm. The published research seems to conclude that very sophisticated rules do not give better results. Moreover, they are less intuitive and more difficult to control, and usually a geometrical rule with suitable parameters gives the same results.

Turning back to the industrial applications of annealing, it can be seen that manufacturing engineers do not control temperature evolution in this case, leaving the metal to cool down in the open air or inside the oven. If the cooling takes sufficiently long, the pattern followed by temperature can be approximated by a first order transitory process which, when made discrete, corresponds to a geometrical rule.

The characteristics of first order transitory processes can be applied to obtain the rules for setting the parameters that were deduced above. They have proved to be adequate in most cases, but the performance of the algorithm depends strongly on the parameters, and their tuning is difficult under certain circumstances.

A dynamical plan, which fixes the probability of accepting an up-hill movement of a given height and makes it evolve in a geometrical fashion, is more robust when facing changes. This proposed plan can be outperformed by a geometrical rule with well-chosen parameters, but given that parameter tuning is sometimes difficult, that robustness is a very interesting characteristic.


8 A genetic algorithm for the Jobshop problem


8 A Genetic Algorithm for the Jobshop Problem

As described in section 2.2, Genetic Algorithms (G.A.) imitate the mechanisms of evolution to solve complex optimization problems.

This project successfully applies the theory of these methods, together with new ideas that will be highlighted, to the Jobshop problem, achieving acceptable solutions in terms of both efficiency and effectiveness.

The main elements of the developed algorithm will be presented in the next

sections, explaining their most important features.

8.1 Creating the first generation

8.1.1 Originating individuals

The first generation of individuals represents the initial point for the algorithm. The solutions are generated through a global search method (see 'Heuristic methods from a holistic point of view'), essentially the same as in the S.A. algorithm.

The codification system developed turns out to be very useful for generating random solutions to the problem. As explained in 'Designing a codification. Importance of the basis', each individual is encoded in two lists:


- A joblist that stores the numbers from 1 to n, each number j repeated o_j times (its number of operations); the first appearance of j corresponds to its first operation, and so on.
- A machinelist that shows the machines that will perform the operations on the joblist.

A valid joblist is just a permutation with repetition of the numbers from 1 to n, each repeated o_j times. The algorithm builds a vector composed of 1 repeated o_1 times, 2 repeated o_2 times and so on. Then, a random permutation of it is performed.

Based on the generated joblist, machines are randomly selected from the corresponding capable-machine list PList_ij to perform the operations.
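The generation procedure can be sketched as follows. The `capable` structure is an assumed stand-in for the PList lists of capable machines mentioned in the text, and the tiny example instance is illustrative.

```python
import random

def random_individual(ops_per_job, capable, rng=random.Random(0)):
    """Build a feasible (joblist, machinelist) pair as in section 8.1.1.
    `capable[j-1][o]` lists the machines able to perform operation o of
    job j (an assumed stand-in for the PList structure of the text)."""
    joblist = [job for job, n_ops in enumerate(ops_per_job, start=1)
               for _ in range(n_ops)]
    rng.shuffle(joblist)                  # permutation with repetition
    next_op = {job: 0 for job in range(1, len(ops_per_job) + 1)}
    machinelist = []
    for job in joblist:                   # pick a capable machine per operation
        machinelist.append(rng.choice(capable[job - 1][next_op[job]]))
        next_op[job] += 1
    return joblist, machinelist

# 3 jobs with 2, 1 and 2 operations; here every machine can do everything.
caps = [[["M1", "M2"] for _ in range(n)] for n in (2, 1, 2)]
jobs, machines = random_individual((2, 1, 2), caps)
```

By construction the joblist always contains exactly o_j copies of each job, so no feasibility check is needed afterwards.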

This process produces a valid solution to the problem; the codification structure makes any verification unnecessary. Therefore, a starting point for the algorithm has been obtained in a simple way. It is important to notice that LP methods spend a considerable amount of time just looking for feasible solutions, so this represents a considerable saving of effort.

8.1.2 Diversity is the key

When a population of a given species is small, it is more likely that changes in the environment have a negative impact on it, possibly even leading to its extinction. The explanation for this phenomenon is that there are probably not enough different individuals to leave a considerable set of them resistant to these changes.

As in biological systems, it is important to obtain a population that is sufficiently diverse in order to guarantee the correct performance of the genetic mechanisms. That diversity is linked not only to the randomness of the solution-generating process (guaranteed by the design of the mechanism, as explained above) but also to population size.

The time spent on each iteration of the algorithm is a linear function of population size; thus, increasing it can lead to a loss of efficiency. Nevertheless, populations that are too small imply poor diversity, which causes convergence to sub-optimal points or, if diversity enhancement methods are used, slow convergence, with the subsequent efficiency loss.


The time spent on this G.A. can be estimated as:

t = k \cdot S \cdot N + c

where S represents population size and N the number of iterations performed, as all the operations that compose the algorithm are applied to each member of the population. This fact was verified by measuring computation time for different population sizes.

[Figure omitted: x-axis: Population Size [individuals], y-axis: Computation time for 100 generations [s]]

Figure 20. Computation time vs. Population size

A representative plot of the performance of the algorithm is included in Figure 21. Throughout this entire chapter, the same example case will be solved with different parameter tunings. This case is composed of 3 machines and 10 jobs containing 3, 4, 5, 6, 1, 2, 4, 4, 3 and 1 operations respectively.

The performance tests show that growing population sizes decrease the probability of getting trapped in local optima, as they act as a guarantee of diversity. Computation time for each generation is inevitably higher, but the results can be better.

Of course, this is not a direct relationship, as randomness plays a vital role in both the algorithm performance and the creation of the initial population. In the plot below it is possible to observe that, in some cases, this effect does not appear; this is a consequence of the random construction of the initial population and of subsequent generations. It can be concluded that a higher population size both helps avoid local optima and speeds up convergence, decreasing the dependence of the final result on the initial conditions.

[Figure omitted: 'Performance vs Population Size'; x-axis: Iterations, y-axis: Total makespan; series: population sizes 3, 5, 11, 15, 21, 25, 31]

Figure 21. Performance vs. Population size

Thus, once the maximum computation time for the algorithm has been decided, the population size should be chosen as the maximum value that maintains a prudent number of iterations. The tests performed suggest that acceptable values for the number of iterations are 80-100, independently of the size of the problem.

8.2 Survival of the fittest

8.2.1 Natural selection

The main idea underlying the theory of evolution is that natural selection makes the worst individuals disappear; the best constitute the basis for the following generation.

This principle of survival of the fittest has to be introduced into the genetic algorithm. There are two main methods to replicate this mechanism:


- The first one, known as the roulette wheel, is based on the simple fitness-proportional rule, which stipulates that the expected number of copies of each individual in the next generation is proportional to its fitness. More specifically, if the population has N members, the selection is carried out by building a roulette wheel with N slots. Each slot corresponds to a member and its angular width is proportional to that individual's fitness. The wheel is spun and fixed pointers select the members that will generate the next generation.
- In the second one, known as the tournament, a random permutation of the existing solutions is obtained. They are then split into a definite number of groups and the fittest member of each group is selected.

The expected result is the same for both methods. Nevertheless, the first one seems more efficient in selecting the best individuals, as randomness is introduced only once in the process, while in the tournament method it appears twice. For this reason, the roulette wheel method was chosen.

A dartboard is built, with its N slots proportional to the relative fitness of each individual. Then darts are thrown at random, selecting the individuals that will be used as parents for the following generation.
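The roulette wheel can be sketched as follows, under the assumption that `fitness` is already a maximisation score (for makespan minimisation, e.g. its inverse); all names and the toy values are illustrative.

```python
import random

def roulette_select(population, fitness, n_parents, rng=random.Random(0)):
    """Fitness-proportional selection: slot widths proportional to
    fitness, darts thrown uniformly on the wheel."""
    edges, acc = [], 0.0
    for f in fitness:
        acc += f
        edges.append(acc)                 # cumulative slot boundaries
    parents = []
    for _ in range(n_parents):
        dart = rng.uniform(0.0, acc)
        parents.append(next(ind for ind, edge in zip(population, edges)
                            if dart <= edge))
    return parents

pop = ["a", "b", "c"]
scores = [1.0, 1.0, 8.0]                  # "c" occupies 80% of the wheel
chosen = roulette_select(pop, scores, 1000)
```

Over many darts, each member is selected roughly in proportion to its share of the wheel.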

However, the tournament method is not completely discarded, as it is included in the immigration process described in section 8.4.

8.2.2 Never taking a step back

When implementing a S.A. algorithm, it is easy to see that its main advantage when

compared to non-heuristic methods –allowing up-hill movements- is also a

shortcoming, as the best solution visited can be lost in the process (of course, it can be

stored, but the ways it could lead to do not keep open).

The G.A. developed in this project eliminates this disadvantage, and therefore improves both efficiency and effectiveness, by guaranteeing that the fittest individual in each generation (the queen) does not disappear. It is identified and passed on unchanged to the following generation, somehow extending its life. This individual, the best in fitness terms, is also used in the asexual reproduction process, imitating the role played by the queens of social insects.


8.3 Mating season

8.3.1 Finding a partner

Once the individuals that will constitute the parents for the following generation –

the ones that have survived- have been selected, it is mating time. They look for a

partner randomly, by forming a random queue and forming a couple with the

individual that is closer to them.

Each pair of solutions will combine to form two children that will have similar

characteristics to them, but whose fitness cannot always be expressed as a convex

linear combination of their parents (see ‘Heuristic methods from a holistic point of view’).

8.3.2 Breeding process

The breeding operators described in section 2.2.1.5 are not valid, as the codification used here is completely different, and it is essential to find a mechanism that does not generate infeasible solutions (monsters).

This process is based on the codification structure, that is to say, on the job and machine lists.

8.3.2.1 Machine list breeding

From both lists, a matrix that stores which machine performs each operation is built for each parent. After that, four random points are generated, and the submatrix they define is exchanged between the progenitors. The resulting machine matrices are passed to the offspring. An example depicts the idea in Tables 12 to 15.

O1 O2 O3 O4

J1 M1 M3 M2 M1

J2 M2 M3 M1

J3 M1 M3

Table 12. Parent 1


O1 O2 O3 O4

J1 M2 M1 M1 M2

J2 M3 M2 M3

J3 M3 M2

Table 13. Parent 2

O1 O2 O3 O4

J1 M1 M1 M1 M1

J2 M2 M2 M3

J3 M1 M2

Table 14. Child 1

O1 O2 O3 O4

J1 M2 M3 M2 M2

J2 M3 M3 M1

J3 M3 M3

Table 15. Child 2

The feasibility of the offspring is guaranteed because both parents are valid

solutions.
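A possible sketch of this submatrix exchange on ragged machine matrices, using the machine assignments of Tables 12 and 13. The window-choice details are assumptions, as the text only specifies four random points.

```python
import random

def machine_crossover(m1, m2, rng=random.Random(1)):
    """Exchange the submatrix defined by two random rows and two random
    columns between two machine matrices (one row per job, one entry per
    operation; rows may have different lengths)."""
    c1 = [row[:] for row in m1]
    c2 = [row[:] for row in m2]
    r_lo, r_hi = sorted(rng.sample(range(len(m1)), 2))
    max_ops = max(len(row) for row in m1)
    j_lo, j_hi = sorted(rng.sample(range(max_ops), 2))
    for i in range(r_lo, r_hi + 1):
        # clip the window to rows shorter than the chosen columns
        for j in range(j_lo, min(j_hi + 1, len(m1[i]))):
            c1[i][j], c2[i][j] = m2[i][j], m1[i][j]
    return c1, c2

parent1 = [["M1", "M3", "M2", "M1"], ["M2", "M3", "M1"], ["M1", "M3"]]
parent2 = [["M2", "M1", "M1", "M2"], ["M3", "M2", "M3"], ["M3", "M2"]]
child1, child2 = machine_crossover(parent1, parent2)
```

Every cell of a child comes from one of the parents at the same position, so feasibility is inherited, as the text notes.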

8.3.2.2 Job list breeding

If a bare two-point crossover is applied here (see section 2.2.1.5), the generated children will probably be monsters. The example below illustrates the problem for a jobshop composed of 3 jobs with 3, 3 and 4 operations respectively.


Parent 1: 1 2 2 3 1 3 3 1 2 3
Parent 2: 2 3 3 1 2 2 1 3 1 3
Child 1:  1 2 2 3 1 3 1 3 1 3
Child 2:  2 3 3 1 2 2 3 1 2 3

In child 1, job 1 appears 4 times (once more than necessary) and job 2 only twice (once less than necessary). Child 2 has the opposite problems.

Therefore, subsequent changes have to be introduced to guarantee offspring feasibility. First, the problems of the children generated by two-point crossover are identified as missing or extra operations. The extra ones are suppressed by eliminating their first appearance in the bred block; the missing ones are inserted at the beginning of the bred block, in the same order they followed in the original sequence.

The result in the example would be:

Child 1: 1 2 2 3 1 3 2 3 1 3
Child 2: 2 3 3 1 2 2 1 3 1 3

In this way, all the produced individuals are feasible solutions and they actually

resemble their parents.
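The repair mechanism can be sketched and checked against the example above (cut points chosen so the bred block is the last four positions); the function names are illustrative.

```python
from collections import Counter

def breed_joblists(p1, p2, cut1, cut2):
    """Two-point crossover on job lists with the repair of section
    8.3.2.2: inside the exchanged block, surplus jobs lose their first
    appearances and missing jobs are inserted at the block start, in
    the order they had in the replaced block."""
    required = Counter(p1)               # operations needed per job

    def make_child(base, donor):
        block = donor[cut1:cut2]
        outside = base[:cut1] + base[cut2:]
        counts = Counter(outside) + Counter(block)
        surplus = counts - required      # jobs appearing too often
        missing = required - counts      # jobs appearing too rarely
        kept = []
        for job in block:                # drop the first surplus copies
            if surplus[job] > 0:
                surplus[job] -= 1
            else:
                kept.append(job)
        prefix, need = [], dict(missing)
        for job in base[cut1:cut2]:      # reinsert in the original order
            if need.get(job, 0) > 0:
                prefix.append(job)
                need[job] -= 1
        return base[:cut1] + prefix + kept + base[cut2:]

    return make_child(p1, p2), make_child(p2, p1)

p1 = [1, 2, 2, 3, 1, 3, 3, 1, 2, 3]
p2 = [2, 3, 3, 1, 2, 2, 1, 3, 1, 3]
c1, c2 = breed_joblists(p1, p2, 6, 10)   # bred block: last four positions
# c1 == [1, 2, 2, 3, 1, 3, 2, 3, 1, 3]
# c2 == [2, 3, 3, 1, 2, 2, 1, 3, 1, 3]
```

The sketch reproduces the repaired children listed above, and any repaired child contains exactly the required number of operations per job.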


Selection and combination are the basis of evolution, but other elements are introduced into the algorithm to improve its performance. Looking back at the exploration vs. exploitation dilemma exposed in section 2, they can be classified according to the criterion they enhance. They are introduced in the next sections.

8.4 Exploration enhancing mechanisms

8.4.1 Mutation

Mutation operators are applied to the individuals, introducing random changes with a low probability, thus preventing early convergence and preserving population diversity.

The mutation process used is basically the same as the one developed for neighbour generation in the S.A. algorithm, except that the changes generated do not represent a movement of the minimum distance (see section 7.3, 'The importance of being neighbours') but can give rise to larger distances if the mutation probability is sufficiently high. This essentially imitates the DNA duplication processes in nature, where the probability of introducing a change is fixed for every unitary copy made. The total number of mutations introduced will correspond, on average, to the product of that fixed probability and the chain length.

The mechanism operates on the two halves of the codification: the job and machine lists.

8.4.1.1 Mutation mechanism

8.4.1.1.1 Machine list mutation

Machine list mutation is performed by changing just one of the machines to another one that is also capable of performing that operation, which is intuitive and simple. An example is included in Figure 22 and Figure 23.


[Gantt diagram omitted; x-axis: time (min), y-axis: machine; bars labelled (job, operation)]

Figure 22. Initial solution

The result of changing the machine on which operation 1 of job 8 is performed from 2 to 1 is included in Figure 23.

[Gantt diagram omitted; x-axis: time (min), y-axis: machine; bars labelled (job, operation)]

Figure 23. Solution after a machine list mutation


8.4.1.1.2 Job list mutation

In this case, mutation is structured as a random two-job permutation. For example, the schedule in Figure 24 could be modified by exchanging the priority of two operations, with the result shown in Figure 25. As highlighted above, both mutation processes can happen more than once, depending on the mutation probability introduced.

[Gantt diagram omitted; x-axis: time (min), y-axis: machine; bars labelled (job, operation)]

Figure 24. Initial solution

The modification introduced exchanges the priority of the first operations of jobs 6 and 7, returning the result in Figure 25.


[Gantt diagram omitted; x-axis: time (min), y-axis: machine; bars labelled (job, operation)]

Figure 25. Solution after a job list mutation
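Both operators can be sketched as follows. The `capable` structure and the per-position probability are assumptions consistent with the text; in a full implementation the machine assignments would be kept aligned with their operations after a swap.

```python
import random

def mutate_machines(machinelist, joblist, capable, rate, rng):
    """Section 8.4.1.1.1: each operation may switch, with probability
    `rate`, to another machine capable of performing it."""
    out, next_op = machinelist[:], {}
    for pos, job in enumerate(joblist):
        op = next_op.get(job, 0)
        next_op[job] = op + 1
        if rng.random() < rate:
            out[pos] = rng.choice(capable[job][op])
    return out

def mutate_jobs(joblist, rate, rng):
    """Section 8.4.1.1.2: each position may swap priority with another
    random position (the job counts are preserved)."""
    out = joblist[:]
    for pos in range(len(out)):
        if rng.random() < rate:
            other = rng.randrange(len(out))
            out[pos], out[other] = out[other], out[pos]
    return out

rng = random.Random(3)
caps = {1: [["M1", "M2"], ["M1", "M2"]], 2: [["M1", "M2"]]}
joblist = [1, 2, 1]
machinelist = ["M1", "M1", "M2"]
new_machines = mutate_machines(machinelist, joblist, caps, 0.5, rng)
new_jobs = mutate_jobs(joblist, 0.5, rng)
```

With a per-position probability, the expected number of changes grows with the chain length, as the DNA analogy above suggests.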

8.4.1.2 Mutation probability

The mutation rate has to be tuned with care in order to take advantage of this mechanism. On the one hand, if the selected value is too low, its effects will not be noticed. On the other hand, if it is too high, the algorithm will behave more like a plain global search method (see 'Heuristic methods from a holistic point of view') and will lose efficiency, searching new paths without trying to go further along promising directions. Moreover, the time spent in the mutation process can be expressed as

t = S \cdot s \cdot m \qquad (8-1)

where S represents population size, s is problem size in total number of operations and m symbolizes the mutation rate. Thus, increasing mutation probabilities enlarges computation times. Performance tests return results like the ones included in Figure 26.


[Figure omitted: 'Performance vs Mutation rate'; x-axis: Iterations, y-axis: Total makespan; series: 0%, 0.1%, 1%, 5%, 10%, 25%, 50%]

Figure 26. Performance for several mutation rates

The results suggest that values around 3-6% are preferable.

Nevertheless, the mutation rate does not need to be constant. As it is a method for increasing diversity, it should strengthen when the population becomes uniform. This project fixes a minimum and a maximum value for it and takes an intermediate value according to a linear function of the standard deviation of the fitness values of the parent generation. Boundaries that have proved to give good results are [3%, 6%].
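One possible reading of this linear interpolation is sketched below; normalising the spread by the mean fitness is an assumption of this sketch, as the text does not specify the normalisation.

```python
from statistics import pstdev

def adaptive_mutation_rate(fitness_values, low=0.03, high=0.06):
    """Linear interpolation between the [3%, 6%] boundaries driven by
    the spread of the parent generation's fitness: a uniform population
    (small spread) receives the strongest mutation."""
    mean = sum(fitness_values) / len(fitness_values)
    spread = pstdev(fitness_values)
    diversity = min(spread / abs(mean), 1.0) if mean else 1.0
    return high - (high - low) * diversity

rate_uniform = adaptive_mutation_rate([150.0] * 10)        # no diversity -> 6%
rate_diverse = adaptive_mutation_rate([100.0, 200.0] * 5)  # spread -> lower rate
```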

8.4.2 Immigration

In natural systems, immigration is a vital source of diversity. It is well known that isolated populations tend to become uniform and lose the diversity that is key to facing environmental changes.

8.4.2.1 Immigration rate

The value that establishes the number of immigrants introduced into the population is an important performance parameter of the algorithm. The possible effects of an incorrect choice are exactly the same as for the mutation rate: if it is too low, the mechanism will not be profitable; if it is too high, global search grows in importance, with the subsequent loss of efficiency.


Some examples for different immigration rates for the same problem are displayed

in Figure 27.

[Figure omitted: 'Performance vs Immigration rate'; x-axis: Iterations, y-axis: Total makespan; series: 0%, 5%, 15%, 20%, 40%, 60%]

Figure 27. Performance for several immigration rates

To put the results of the plot in perspective, it is important to notice that computation time depends linearly on the immigration rate, as shown in Figure 28.

[Figure omitted: x-axis: Immigration rate [%], y-axis: Time [s]]

Figure 28. Computation time vs. immigration rate


The tests performed indicate that the immigration rate should take a value between 5 and 15% to obtain the best results. Again, it is possible to set a variable immigration rate; the decision taken in this work was to generate rates that vary between those recommended levels with the standard deviation of the current generation.

8.4.2.2 Immigration mechanism

Immigrants are considered individuals from a different population. They have nothing to do with the current local members, so they are generated randomly in the same way the first generation was.

Once born, having abandoned their land, they are humorously imagined to approach the existing individuals. They form a team while observing the local population to identify its weakest members. Among them, each immigrant selects a victim and attacks it. The combat, which is an implementation of the tournament selection method (see 'Survival of the fittest'), finishes with the immigrant either eliminated or occupying its prey's place.
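The combat can be sketched as follows; `new_individual` and `fitness` are assumed helpers, and higher fitness is taken as better.

```python
import random

def immigration(population, fitness, rate, new_individual,
                rng=random.Random(0)):
    """Section 8.4.2.2: randomly generated immigrants each challenge one
    of the weakest locals (a tournament of size two) and replace it only
    if they are fitter."""
    n_immigrants = max(1, int(rate * len(population)))
    weakest = sorted(range(len(population)),
                     key=lambda i: fitness(population[i]))[:n_immigrants]
    for idx in weakest:
        immigrant = new_individual(rng)
        if fitness(immigrant) > fitness(population[idx]):
            population[idx] = immigrant   # the local member is eliminated
    return population

pop = [1, 5, 9, 2, 7]                     # toy solutions scored by value
result = immigration(pop, fitness=lambda x: x, rate=0.4,
                     new_individual=lambda rng: rng.randrange(10))
```

Because only the weakest locals are challenged, the best members of the population are never lost to immigration.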

8.5 Exploitation enhancing mechanisms

8.5.1 Introducing a new concept. Asexual reproduction

8.5.1.1 Social insects and evolution

The classical definition of fitness in biology has been Wilson's 'size of the offspring generated'. Nevertheless, there are some cases in nature in which individuals seem to have renounced reproducing themselves.

Hymenoptera, a group of social insects that includes bees and ants, have colonies composed of three different categories of members:

Working females (the most numerous), diploid individuals –with a double set of

genetic material-which come from a fecundated ovule (that is, from sexual

reproduction). They have their reproductive system inhibited, so they will never

generate an offspring, but they spend a high amount of their lives looking after

the queen’s children.

Males, haploid individuals (with a single set of genetic material) which come

from an unfertilized ovule (through an asexual reproduction process called

parthenogenesis). If they are lucky, they will have an opportunity to fecundate

the queen in her 'nuptial flight', which usually happens once a year.

A queen, which is entirely devoted to the reproductive process. She stores the

semen of the chosen males in a deposit inside her body and controls the rate of

sperm release in order to define the timing of fecundation and correctly

alternate sexual and asexual reproductive periods.

In these species, the vast majority of males will never have the opportunity to

reproduce, and all working females are sterile. That is to say, the

individuals have a very low, or even null, fitness value.

Does that sound logical? In fact, social insects have spread to almost every habitat

on earth, from rainforests to deserts. They behave like a super-organism whose

coordination is surprisingly complex in matters like foraging (having even

inspired Ant Colony Optimization methods) or nest building. They can definitely not

be considered badly fitted beings. What is the answer, then?

W. D. Hamilton proposed that Wilson's fitness definition was simply wrong. He

introduced the concept of kin selection.

According to him, females share some 50% of their genetic material with the queen,

about 25% with their haploid brothers and a surprising 75% with their sisters

(assuming a 50-50 mix in sexual reproduction and that the males are also sons of the

queen). Therefore, when females look after their sisters they are also preserving their

own genetic material. And 75% is higher than the 50% that a mother would share with

her sexually born child, and much higher than the 25% that spinster aunts share with

their nieces.
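The 75% figure can be checked with a small simulation under the assumptions stated above (haploid father, diploid mother, 50-50 maternal inheritance); the function name and setup are purely illustrative:

```python
import random

def sister_relatedness(n_loci=100_000, seed=1):
    """Monte-Carlo check of the haplodiploid relatedness argument: each
    daughter inherits the father's single allele plus one of the mother's
    two alleles at every locus. Returns the average fraction of allele
    copies that two sisters share."""
    rng = random.Random(seed)
    shared = 0
    for _ in range(n_loci):
        maternal = ("M1", "M2")  # the mother's two distinct allele copies
        sister1 = ("F", maternal[rng.randrange(2)])
        sister2 = ("F", maternal[rng.randrange(2)])
        shared += (sister1[0] == sister2[0]) + (sister1[1] == sister2[1])
    return shared / (2 * n_loci)
```

The paternal halves always match (the father is haploid), and the maternal halves match half the time, giving 0.5 + 0.25 = 0.75 on average.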

Thus, Hamilton's fitness definition was 'the amount of genetically related individuals left'.

This idea of sexual and asexual reproductive patterns could also be applied to G.A.

8.5.1.2 Introducing an asexual system

The mutation process explores random directions indiscriminately. What if more

resources were allocated to exploring only the most promising direction?


In the parthenogenesis mechanism created, the queen reproduces asexually,

generating individuals that could be considered its neighbours in an S.A. context

(actually, the very same code is used). Some results of including this system are

shown in Figure 29.

This time, increasing the rate poses no time problem, as the total computation

time for one iteration can be expressed as:

t = S · s · ((1 − p) · (K_c + m · K_m) + p · K_p)    (8-2)

where S represents population size, s is problem size in total number of

operations, p corresponds to the parthenogenesis rate and m symbolizes the mutation

rate. K_c, K_m and K_p are the time constants for crossover, mutation and

parthenogenesis respectively. Considering K_c > K_p, as the crossover process is

slightly more time consuming, increasing the parthenogenesis rate decreases

computation time linearly to some extent.
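Equation (8-2) can be read directly as a small cost model; the constants used below are hypothetical timing coefficients, not measured values:

```python
def iteration_time(S, s, p, m, K_c, K_m, K_p):
    """Per-iteration computation time model of equation (8-2): the sexually
    produced fraction (1 - p) pays crossover plus mutation, while the
    parthenogenetic fraction p pays only K_p."""
    return S * s * ((1 - p) * (K_c + m * K_m) + p * K_p)
```

Since K_c > K_p, raising p lowers the per-iteration time, e.g. `iteration_time(100, 30, 0.2, 0.05, 2.0, 1.0, 1.5)` is smaller than the same call with p = 0.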

Figure 29. Performance for several parthenogenesis rates (total makespan vs. iterations; rates of 0%, 5%, 10%, 15%, 20%, 40%, 60%, 80% and 90%)


Figure 30. Computation time vs. parthenogenesis rate

The results suggest that parthenogenesis rates around 15% are a good choice.

8.6 Computation time vs. problem size

It is interesting to analyse the evolution of computation time for increasingly large

problem sizes. The performance tests carried out produced the results presented in

Figure 31.

Figure 31. Computation time vs. problem size (number of jobs)


If these results are analysed statistically, several different curves can be fitted to the

data, as shown in Figure 32.

Figure 32. Curve fitting to Computation time vs. Problem Size (linear, quadratic and 9-degree polynomials, and an exponential fit)

The results of the different fits are presented in Table 16.

Fit                     Sum of squared errors    Adjusted R-square

Linear polynomial            30095.904                0.920
Quadratic polynomial          7998.051                0.979
9-degree polynomial              1.360                0.999
Exponential                  15448.775                0.959

Table 16. Results of the different fits

From these results, it can be inferred that computation time grows polynomially

with problem size.
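The goodness-of-fit statistic reported in Table 16 can be computed in a few lines; this is a generic sketch in pure Python (hypothetical helper name), not the exact tool used in the project:

```python
def adjusted_r_square(y, y_fit, n_params):
    """Adjusted R-square, as in Table 16: R-square penalised for the number
    of fitted parameters, so richer models must earn their extra terms."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_fit))  # sum of squared errors
    sst = sum((yi - mean_y) ** 2 for yi in y)              # total sum of squares
    r2 = 1 - sse / sst
    return 1 - (1 - r2) * (n - 1) / (n - n_params - 1)
```

A perfect fit gives exactly 1.0, and the penalty term explains why the 9-degree polynomial's near-perfect score should be read with care: with enough parameters, any small data set can be interpolated.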


8.7 Conclusions

Genetic Algorithms include at least one element of each of the four basic optimization

mechanisms: global search, selection, combination and local search (see 'Heuristic methods

from a holistic point of view').

The tool developed generates a random population (performing a global search

action) to constitute a starting point. The size of this population is one of the most

important factors influencing the performance of the algorithm, with the largest

workable sizes being preferable.

The best solution (the queen) is always passed to the next generation, so the solution

never worsens between consecutive iterations. The fittest individuals are selected, in an

implementation of the roulette-wheel method, to be the parents of the next generation.

This constitutes a selection mechanism. They form couples and breed, combining their

information.

Mutation and immigration are introduced as means to maintain diversity and are

respectively further local and global search systems.

An original metaphor of asexual reproduction in social insects is introduced,

with important positive effects on algorithm performance, as it further explores the

possibilities of the most promising direction at every moment. This can also be

considered local search.

Advisable values for the rates related to these mechanisms, as well as their impact

on computation time, have been analysed and provided.

Having a population of solutions instead of working with a single one, and

implementing all four elemental mechanisms, provide G.A. with a considerable

robustness that, to the author's knowledge, is not matched by other heuristic

methods.




9 Turning the mixer on. Hybrids

After having created a simulated annealing and a genetic algorithm, it is interesting to combine them with the aim of creating hybrids that could show a better performance. There is no guarantee that the resultant amalgamation will surpass its composing elements, but designing a hybrid could be worth trying.

9.1 First one, then another

The most intuitive way of creating a hybrid is performing both algorithms

sequentially. As the G.A. usually becomes exhausted well before the S.A. starts

converging, this is the natural order in which they will be combined.

Both stages of the algorithm are taken directly from the mechanisms developed in

this project, using the values advised for each parameter and the dynamic cooling plan

proposed (because of its robustness) with a lower-than-usual starting temperature.

9.1.1 When is it time to make the switch?

The moment at which the switch is made should not be determined

beforehand. On the contrary, the genetic stage should be allowed to reach a point of

exhaustion, in which no improvement has been made for a long time.

That point is determined by setting two variables. The first defines a

minimum number of iterations that the G.A. must perform before being

allowed to give up. The second is the number of generations with no improvement that

constitutes the exhaustion alarm threshold.


The first is set according to the time allowed to solve the problem (notice that

the time spent in that stage will depend linearly on the number of iterations).

Independently of problem or population size, values bigger than 250 do not seem

efficient. For both conclusions, see the section 'The mechanisms of natural selection applied

to complex problem solving. A Genetic Algorithm for the Jobshop Problem'.

The exhaustion threshold is a more delicate parameter. Experience suggests that values

around 75-100 generations are a good choice.
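The two-variable switching criterion can be sketched as follows; the function and parameter names are illustrative, and the defaults echo the values discussed above:

```python
def should_switch(history, min_iterations=250, stall_limit=75):
    """Decide when to hand over from the G.A. stage to the S.A. stage.

    history: best makespan found at each generation so far. The switch fires
    only after a minimum number of generations, and only once no improvement
    has been seen for stall_limit consecutive generations."""
    if len(history) < min_iterations:
        return False
    best = min(history)
    # Generation at which the incumbent best value was first reached.
    first_reached = min(i for i, v in enumerate(history) if v == best)
    return len(history) - 1 - first_reached >= stall_limit
```

A flat history of 400 identical values triggers the switch, while a history that is still improving at its last generation does not.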

9.1.2 A family of solutions

The first hybrid proposed takes advantage of the robustness that working with

a population of solutions, rather than a single one, confers to G.A. For that reason, the

S.A. leg of the hybrid will be formed by a small elitist population of solutions taken

directly from the most promising ones of the preceding stage.

The size of that population is the third important parameter of this hybrid. If it is

too high, the algorithm can lose efficiency (computation time of this stage is

approximately linearly proportional to family size), but if it is too low, the advantages

of working with a family of solutions can vanish. It is interesting to notice that the best

solution achieved after the full run of the algorithm does not usually come from

the one that was most promising in the previous phase, that one being just a local

optimum. This fact highlights the importance of choosing a set of solutions instead of

only one. Some examples of performance are provided in Figure 33, Figure 34 and

Figure 35.


Figure 33. Hybrid performance with 4-member family (total makespan vs. iterations)

Figure 34. Hybrid performance with 6-member family (total makespan vs. iterations)


Figure 35. Hybrid performance with 12-member family (total makespan vs. iterations)

Experience shows that family sizes between 6 and 12 are a good choice for problem

sizes of around 10 jobs, but they should be bigger for more complex problems.

Although this hybrid guarantees better solutions, as it continues exploring once the

G.A. is exhausted, it is far more time consuming, as the average time spent shows:

Figure 36. Performance comparison between the first hybrid and the original G.A.


9.2 Non-darwinian G.A.

Even nowadays, some biologists find wonderful facts in evolution that natural

selection mechanisms can hardly explain. It is interesting to turn back to the roots of

evolutionary theory in order to revisit the fundamental schools of thought that tried

to explain the evidence of evolution.

Jean-Baptiste Lamarck supported the idea of the inheritance of acquired characteristics.

On the contrary, Alfred Russel Wallace rejected it completely, although even Darwin

would not rule it out. August Weismann, the most prominent "neo-Darwinian" of the

time after Darwin, argued that hereditary material, which he called the germ plasm,

was kept utterly separate from the development of the organism. This was seen by

most biologists as an extreme position, however, and variations of neo-Lamarckism,

orthogenesis ("progressive" evolution) and saltationism (evolution by "jumps" or

mutations) were discussed as alternatives.

Mutation is already implemented in most G.A. models, but the usual mutation

operators follow a pattern of random jumps. Implementing a Lamarckian process, in

which these jumps are oriented towards the direction of improvement, is not

standard. Again, the idea is worth trying.

By means of that mechanism, individuals can develop positive characteristics and

pass them on to their offspring. This therefore takes the form of a local search

mechanism.

This idea is implemented inside the mutation operator, which will now consist of a

short S.A. run, so the chances are that it will actually improve the solution,

and it will diminish the probability of a disadvantageous change that might

spoil some of the computation effort needed to produce the following

generations. Of course, this different mutation operator takes more time than the

usual one, but the time savings mentioned can surpass those efficiency losses.
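A minimal sketch of the Lamarckian mutation operator, assuming the project's own neighbour-generation and makespan-evaluation routines are available as callables (all names here are illustrative):

```python
import math
import random

def lamarckian_mutation(individual, neighbour, makespan, steps=20,
                        temp=5.0, cooling=0.9, rng=random):
    """Non-Darwinian mutation: a short simulated-annealing run replaces the
    usual random jump, so the mutated individual tends to improve before
    being returned to the population."""
    current, current_cost = individual, makespan(individual)
    for _ in range(steps):
        candidate = neighbour(current)
        delta = makespan(candidate) - current_cost
        # Accept improvements always; worsenings with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
        temp *= cooling  # brief cooling schedule within the operator
    return current
```

With a toy neighbour that always improves (decrementing a positive integer cost), the operator walks the individual down to the optimum within its step budget.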

The performance of this second hybrid is compared to the former G.A. developed in

Figure 37.


Figure 37. Non-darwinian mutation algorithm compared to the conventional G.A. (total makespan vs. iterations)

Computation time is higher for the hybrid, although it is faster at the beginning

(when the non-darwinian mutation mechanism has more upside for improvement).

Figure 38. Compared performance (% improvement per second) between the original version (1) and the non-darwinian version (2)

Experience suggests it is better to set two different stages. In the first, that

mechanism operates, while in the second the conventional mutation operator is used

(as the remaining upside has been considerably reduced).


An example of compared results for the first iterations of a performance is shown in

Figure 39.

Figure 39. Performance comparison between the conventional G.A., the non-darwinian hybrid and the compound version (total makespan vs. iterations)

As the figure indicates, the compound algorithm takes advantage of the efficiency of

the non-darwinian hybrid in its first stages, but then goes back to the ordinary mutation

operator in order to avoid getting trapped in local optima and to allow some members of

the population to mutate in disadvantageous directions.

Figure 40. Performance comparison between the original, the non-darwinian and the compound version


The results in computation time are also good, the compound version outperforming both base algorithms.

9.3 Conclusions

In summary, the amalgamations considered perform relatively well when

compared to their compounding parts.

The sequential one shows an excellent ability to escape from local optima. Working

with a family dramatically increases the possibilities of conventional S.A. methods.

Moreover, although some unsuccessful trials of using better-than-normal starting points

and lower-than-normal starting temperatures have been documented in the literature,

this time that strategy has proved very convenient. As a drawback,

computation time is higher than in other cases.

The non-darwinian hybrids proposed, especially the compound one, outperform the

most classical version. It is important to note that the first G.A. developed already

embeds a local search mechanism that focuses on the best directions (the parthenogenesis

system). It may be surprising that including a more sophisticated version of this

same basic element can actually lead to further improvements.




10 Anticipating machine failure. An approach to the robust

Jobshop

10.1 Introduction

One of the most interesting research lines to extend this project would be considering a robust jobshop. These models were developed in an attempt to take into consideration that the assumption that all machines will always be available to process the scheduled jobs is just an idealization. As stated in [JENSEN 1999], 'A quality solution which can be modified easily according to changes in the environment may be more valuable than an optimal solution that does not allow easy modifications'.

The complexities of such a model are quite clear: to return a valuable solution, all

the possible failures should be considered, taking into account the probability of

each one.

10.2 Not so many combinations

Each failure is determined not only by the broken machine but also by the operation

it was performing when the breakdown happened and by the type of failure. The latter

can fall into one of the two categories considered:

The operation has to be reprocessed on another machine.

The job has been entirely spoiled.

The occurrence of a failure is supposed to be infrequent, so the possibility of more

than one failure happening at the same instant is discarded. Although several failures

can happen sequentially, and this situation could actually be studied with the

help of the developed tool, in principle only single failures will be considered.

Thus, the total space of possible breakdowns can be expressed as:

F = 2 · Σ_{k=1}^{m} p_k = 2 · Σ_{k=1}^{m} Σ_{i,j} P_ijk    (10-1)

where:

i = 1, …, n ;  j = 1, …, o_i ;  k = 1, …, m

n : number of jobs to be scheduled.

o_i : number of operations that compose job i.

m : total number of available machines.

p_k : maximum number of operations that can be performed on machine k. It is

calculated as p_k = Σ_{ij} P_ijk, where P_ijk is a binary parameter that takes the value '1' if

and only if the j-th operation of job i can be performed on machine k.

To illustrate problem size, an example where all jobs are composed of four

operations, performed on different machines, is included in Figure 41.

Figure 41. Number of possible failures vs. problem size (number of jobs to schedule)


As the figure shows, the total number of possible failures increases linearly with the

number of jobs if the number of operations per job and the number of machines are kept constant.

The intricacies of solving a robust jobshop do not come from the amount of

potential breakdowns but from the implicit complexity of each individual problem (see

‘Brief considerations about jobshop problem complexity’).
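Equation (10-1) amounts to counting two failure types for every (operation, capable machine) pair. A sketch of that count from a compatibility structure like PList follows; the list-of-sets representation is assumed for illustration, not taken from the project's actual data layout:

```python
def failure_space_size(plist):
    """Size of the breakdown space in equation (10-1): two failure types
    (reprocess elsewhere / job spoiled) for every pair of an operation and
    a machine able to perform it. plist[i][j] is the set of machines that
    can perform operation j of job i."""
    return 2 * sum(len(machines) for job in plist for machines in job)
```

For a toy instance with one 3-operation job whose last operation admits two machines, plus one single-operation job, the count is 2 · (1 + 1 + 2 + 1) = 10.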

10.3 Concentrating on what is relevant

The most classical criterion would be to calculate exhaustively the expected

makespan from the probabilities of each failure and the total makespan resulting from

the recomposed schedule, and then to choose the best option. Combinatorial explosion

makes evaluating all the solutions impossible, so only a

definite set of solutions (the best found by the algorithm) will be analysed.

The proposed algorithm, however, suggests reducing the space of failures

considered to only the most probable ones, according to statistical studies, or to the ones

foreseen to have a bigger impact (inferred, for example, from the sparsity of the generated

matrices, which gives an idea of the redundancy between machines, or from the unitary

machine loads).

In fact, the patterns of machine failure will probably follow the Pareto rule. This

principle (also known as the 80-20 rule, the law of the vital few and the principle of

factor sparsity) states that for many phenomena, 80% of the consequences stem from

20% of the causes. Mathematically, if a set of participants share something, there will

always be a number k between 50 and 100 such that k% is taken by (100 − k)% of the

participants. k may vary from 50, in the case of an equal distribution, to almost 100, in

the case of a very small number of participants taking almost all of the available

resources. The number 80 is not special, but many systems have k lying

around this region of intermediate imbalance. So, if machine failure

follows this rule, studying around 20% of the most important breakdowns, according to

our criteria, will be enough to obtain a solution that is robust in 80% of the cases.
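The proposed reduction can be sketched as keeping the 'vital few' failures until a target share of the total failure probability is covered; the names are illustrative, and the per-failure probabilities are assumed to be given (by statistical studies, as the text suggests):

```python
def vital_few(failure_probs, coverage=0.8):
    """Keep only the most probable failures until the target share of total
    failure probability is covered (the Pareto 'vital few').

    failure_probs: mapping from failure identifier to probability."""
    ranked = sorted(failure_probs.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(failure_probs.values())
    kept, accumulated = [], 0.0
    for failure, prob in ranked:
        if accumulated >= coverage * total:
            break
        kept.append(failure)
        accumulated += prob
    return kept
```

With probabilities 0.5, 0.3, 0.15 and 0.05, the two most probable failures already cover 80% of the mass and are the only ones kept.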


Figure 42. Pareto distribution: 20% of the total possible machine failures account for 80% of the consequences

An alternative and beautiful illustration of this phenomenon is linked to Shannon's

information theory. According to Shannon, a good measure of the information contained in

given data S = {s_1, s_2, …, s_n} can be expressed as what is known as Shannon entropy

(or neg-entropy, as an ordered system provides more information):

I(S) = Σ_{i=1}^{n} p(s_i) · log_2(1 / p(s_i)) = − Σ_{i=1}^{n} p(s_i) · log_2 p(s_i)    (10-2)

The basic unit of information is called the bit and corresponds to the information

conveyed by the occurrence of an event with a 50% probability.

This formula can be applied to the case of a manager who wants to be informed

only if the ratio between real and standard costs (which is known to follow a

N(1, 0.2) distribution and is considered to take only the values 0.96, 0.97, 0.98, 0.99,

1.00, 1.01, 1.02, 1.03 and 1.04) moves away from 1 by more than 0.02 units.

The amount of information that the cost ratio contains is, when applying formula

(10-2) in the case of an equiprobable distribution, log_2 9 ≈ 3.16 bits. If the normal

probabilities are taken into account, this value turns out to be 1.66 bits, which means

that knowing the distribution followed by the data is worth 3.16 − 1.66 = 1.5 bits.
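Equation (10-2) in code form (a generic entropy helper, not the project's tool); for n equiprobable outcomes it returns log_2 n bits, matching the equiprobable case above. Reproducing the 1.66-bit figure would additionally require the exact discretized N(1, 0.2) probabilities, which the text does not list:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy of equation (10-2), in bits.

    probs: probability of each possible value; zero-probability
    entries contribute nothing and are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

For the nine equiprobable cost-ratio values, `shannon_entropy([1/9] * 9)` equals log_2 9 bits; a certain outcome carries 0 bits.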


Coming back to the manager, he would be informed just some 28% of the time,

receiving an amount of information of 3.3 bits. Apart from that, he knows the

distribution, so his total knowledge is 0.28 · 3.3 + 1.5 = 2.42 bits.

Following this criterion, he gets some 2.42/3.16 = 76.5% of the maximum amount of

information in just 28% of the time, so his efficiency increases to 76.5/28 = 273%.

In the same way, the proposed model takes into account only the most probable

failures, dramatically decreasing the amount of resources needed to make the

calculations (for example, to 20%) while obtaining a result that will be the best

one in robustness most of the time (for example, in 80% of the cases).

10.4 Considering a real problem

This project also develops a very useful tool that takes advantage of the positive

features of the algorithms developed to analyse the robustness of a

proposed solution.

The structure of the failure analysis system is as simple as repeating the jobshop

resolution with the new constraints imposed by each breakdown (assuming that the

out-of-order machine will not be available within the time horizon considered, and

noticing that, after its loss, one or more jobs could be impossible to finish), and

recording the worsening of the solution that the failure would lead to.

The model returns an alternative schedule to a given one for each case and computes

its expected fitness. In the case that there is no possible solution, it warns that one or

more jobs cannot be accomplished and returns a list of the conflicting operations and

an alternative schedule for the remaining jobs.
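The structure of the failure-analysis loop can be sketched as follows, where resolve stands for a re-run of any of the developed solvers under the failure's constraints; its signature is assumed here for illustration:

```python
def analyse_failures(failures, resolve):
    """Sketch of the failure-analysis tool: for each breakdown, re-solve the
    jobshop without the broken machine and record the outcome.

    resolve(failure) is assumed to return (schedule, makespan, impossible_ops),
    where impossible_ops lists operations no remaining machine can perform."""
    report = {}
    for failure in failures:
        schedule, makespan, impossible = resolve(failure)
        report[failure] = {
            "schedule": schedule,
            "makespan": makespan,
            "unschedulable": impossible,  # non-empty means a warning is due
        }
    return report
```

Feeding it a stub resolver shows the shape of the report: one entry per considered failure, carrying the alternative schedule, its makespan and the operations that can no longer be performed.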

10.5 A simple example

An 8-job, 3-machine problem is analysed below.

PList, the matrix that stores the possible machines that can perform each operation,

is shown in Table 17:


        Op1    Op2    Op3    Op4

Job1     1      3     2,3
Job2     1      2      1      3
Job3     3      2
Job4     3      2     1,2
Job5     1      1      3      3
Job6     1      3      1
Job7     2
Job8    1,2    1,3

Table 17. Example of performance possibilities for each operation

The pseudo-optimal schedule obtained for these constraints could be represented in

a Gantt diagram as in Figure 43.

Figure 43. Resulting solution (Gantt diagram: operations (job, operation) vs. machine and time in minutes)


If machine 1 breaks down immediately after finishing the first operation of job 5,

spoiling that job completely, the best resulting schedule is represented in Figure 44.

Figure 44. Resulting schedule after the considered failure (Gantt diagram)

Note that only the jobs that can still be performed have been programmed. The

algorithm also returns a list with the operations that can no longer be performed

because no remaining machine has the required capability: (2,3), (5,2) and (6,3). That is

why the number of jobs appearing in Figure 44 is reduced.




11 Introducing random time values. An approach to the Stochastic

Jobshop problem

Throughout all the work presented, the parameters considered have been

deterministic. This section explores the possibility of taking into account the fact that

processing, change, set-up and transportation times are actually random.

11.1 Randomness inside the Jobshop

If we want to introduce randomness into the problem, considering

stochastic time values seems the logical starting point. In principle, this leads to

abandoning optimisation techniques and resorting to simulation tools.

However, in the case of metaheuristic algorithms this is not true: randomness can

be introduced in every part of the mechanism.

In the stochastic model developed, this is accomplished by modifying the function

that evaluates the fitness of each solution. It now performs a sufficient number of

iterations (a few hundred can be enough), generating each time value from a

probability distribution built around the deterministic value. Thus, the

simulation is structured as one resolution of the whole jobshop in which a

simulation loop is performed each time a solution has to be evaluated.

The chosen distribution was the Weibull one, which is often used to model the time

until a given technical device fails.

f(x; k, λ) = (k / λ) · (x / λ)^(k−1) · e^(−(x/λ)^k)    (11-1)

If the failure rate of the device decreases over time, the parameters should be chosen

so that k < 1 (resulting in a decreasing density f). If the failure rate is constant

over time, k = 1, again resulting in a decreasing density f. On the contrary, if it

increases over time, k > 1 and the density f increases towards a maximum and then

decreases forever. Manufacturers will often supply the shape and scale parameters for

the lifetime distribution of a particular device; if they do not, the parameters can be estimated

from statistical failure data.


In this case, the setup, transportation and processing times are simulated from the

initial value of the deterministic parameter using the expression

T = P·(1 + 0.05·(r − E(r)))   (11-2)

where:

T : Simulated time value

P : Deterministic parameter for that value

r : random component of the simulated time, obtained from a Weibull distribution W(1,2) – chosen because its shape seems reasonable – and represented in Figure 45.

A random component with mean 0 keeps the deterministic parameter as the average of the result; since the chosen distribution has a positive mean, it is displaced by subtracting that mean.

As long as 0.05·E(r) < 1, the resulting T will be positive, and this condition holds for the parameters considered.
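As an illustrative sketch of this sampling scheme (in Python with NumPy, rather than the project's MATLAB code; the function and variable names are introduced here for illustration only):

```python
import numpy as np

# Mean of a Weibull variable with scale 1 and shape 2:
# E(r) = Γ(1 + 1/2) = sqrt(pi)/2 ≈ 0.887, matching Figure 45.
E_R = np.sqrt(np.pi) / 2.0

def simulate_time(P, rng, alpha=0.05):
    """Draw one stochastic time value from the deterministic parameter P,
    following T = P * (1 + alpha * (r - E(r))) with r ~ W(1, 2)."""
    r = rng.weibull(2.0)  # shape k = 2, scale 1
    return P * (1.0 + alpha * (r - E_R))

# The random component is centred by subtracting E(r), so the sample mean
# stays close to the deterministic parameter (here P = 15, as in Figure 46).
rng = np.random.default_rng(0)
samples = [simulate_time(15.0, rng) for _ in range(200_000)]
```

Since r ≥ 0 and 0.05·E(r) < 1, every simulated T stays positive.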

In addition, the importance of this random component is made proportional to the size of the deterministic parameter, with 0.05 chosen as an acceptable coefficient: with a very small coefficient the resolution would be essentially the same as the deterministic one, while a very high random component would mean that the problem is not worth solving, as the dispersion of the results would be too high. For example, the resulting distribution for a parameter of 15 is plotted in Figure 46.


Figure 45. Random component from a Weibull distribution (mean = 0.887)

Figure 46. Resulting distribution in a specific example (P = 15, mean = 15)


11.2 Adapting the algorithm

Introducing randomness leads, in principle, to abandoning optimisation techniques in favour of simulation tools. Nevertheless, in this case the adaptation is much simpler: when evaluating the fitness of a solution, inside the function F_CodeToSchedule, the time values are simulated from the deterministic parameters instead of being taken directly. The evaluation is repeated a sufficient number of times – a few hundred are enough, as stated before. The statistical distribution of the results of these iterations is obtained and the algorithm continues to operate as usual with the mean values.
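A minimal sketch of this adaptation (Python; `decode_schedule` stands in for the project's F_CodeToSchedule and `simulate_time` for the sampling of expression (11-2) – both are assumed placeholders here, not the actual implementation):

```python
import statistics

def stochastic_fitness(solution, decode_schedule, simulate_time, n_iter=300):
    """Monte-Carlo fitness: repeat the schedule decoding n_iter times, each
    time drawing every time value from its distribution, and return the mean
    makespan so the algorithm can keep operating on a single value."""
    makespans = [decode_schedule(solution, simulate_time) for _ in range(n_iter)]
    return statistics.mean(makespans)

# Toy demonstration with a degenerate, deterministic "simulator":
mean_val = stochastic_fitness(None, lambda sol, draw: draw(10.0), lambda P: P, n_iter=50)
```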

The computation time, ceteris paribus, is a linear function of the number of iterations performed to evaluate the fitness of a solution, but this number should not be reduced excessively if the accuracy of the result is to be maintained.

Thus, the existing models have been successfully adapted to solve the stochastic

version of the Jobshop problem.


12 Further suggestions to the Jobshop responsible. Applying

T.O.C.

The models developed in this project are more ambitious than merely suggesting pseudo-optimal scheduling solutions. One of the most interesting and original add-ins is the use of T.O.C. (Theory of Constraints) applied to the final schedules to obtain data that can be very useful to highlight the best ways in which the shop machinery could be improved.

12.1 Introduction to the Theory of Constraints

In the 1970s, Eliyahu Goldratt, an Israeli physicist, introduced a new production management system that he called Optimised Production Technology (OPT), popularised in his well-known book ‘The Goal’. It was based on production flow balancing and bottleneck management. Subsequently, he extended that philosophy to all the company subsystems and created the Theory of Constraints, a management system centred on the system’s constraints.

As a starting point, Goldratt defines the goal of a production company as simply earning money – measured through indexes such as net profit, profitability or liquidity. As these parameters are too general and lack operational meaning, three other indexes are presented as more useful: net revenue (money entering the company system), inventories (money retained inside the system) and operating expenses (money invested in transforming inventories into revenues). In order to progress towards its goal, the company should work to increase net revenue and lower operating expenses and inventories.

Consistently with T.O.C., Systems Theory and Game Theory, it can be stated that the bare summation of local objectives in each of the production subsystems does not lead to the global optimum. Moreover, the strength of the production chain is determined by the weakest of its links, and that weakness cannot be compensated by strengthening another subsystem: although statistical fluctuations occur in both directions, only the negative ones – those that cause delays – propagate, because of the dependence relations between steps (if a step is performed faster, its output simply waits to be processed at the next step, whose machinery cannot increase its speed).

Once those ‘weak links’ have been identified, all the management decisions have to be subordinated to them. This gives rise to an iterative process, as the constraints can change as a consequence of the management decisions.

The main principles of this kind of management can be expressed as:

• It is not production capacity that should be balanced, but production flow.

• Bottlenecks (‘weak links’) determine production capacity. The other resources should follow their rhythm, in the same way that a group of boy scouts finishes its walk without stopping to wait for the slowest boys, and without spreading out behind the leader, if the slowest child is placed at the head of the group.

• Non-bottleneck resource usage should be determined by the other system constraints. Otherwise, those resources produce stock that queues up waiting to be processed (and, as stated above, growing inventories prevent the company from advancing towards its goal).

• A bottleneck resource constrains every other resource. ‘One lost hour in a bottleneck resource is one lost hour for the whole system’ [GOLDRATT 84].

• One hour earned in a non-bottleneck resource is an illusion: the inventory produced will simply wait to be processed at the bottleneck.

• Varying batch size can supply the production chain with flexibility and, if batch size is decreased, it can lower processing time.

The implementation of these ideas leads to an increase in net revenue and a decrease in inventories and production costs; consequently, the company moves nearer to its goal of making money.


12.2 Bottleneck concept applied to Jobshop production

12.2.1 The main difference

Although the Theory of Constraints (T.O.C.) body of knowledge was created to be applied to production chains, its generalised concepts can also be applied to the jobshop. The main difference between these two kinds of environment is that, in a jobshop system, neither the work sequence nor the processing times are fixed. That provides the flexibility to successfully perform very different kinds of job, but at the price of losing efficiency.

Variable sequences and processing times result in variable bottlenecks. That is to say, bottlenecks vary with the schedule. And, as the jobshop structure was chosen precisely because flexibility was needed, bottlenecks will actually vary.

This implies that the T.O.C. analysis is more complex, but can also provide

manufacturing managers with very interesting information.

12.2.2 Traces of a bottleneck

What kind of analysis should be performed in order to correctly identify bottlenecks? With the pseudo-optimal schedules that the algorithm returns, it is an easy job.

As Goldratt stated, ‘One lost hour in a bottleneck is one lost hour for the whole system’ [GOLDRATT 84]. That means that a bottleneck resource will correspond to critical activities in a CPM sense. If the schedule is in fact optimal, bottleneck resources will be loaded to their limit, while non-bottleneck ones will carry lower loads. Thus, the first indicator for recognising a bottleneck is that it operates at its maximum load.

It is also important to notice that inventories tend to accumulate just before bottlenecks, waiting to be processed. Consequently, it is important to analyse work-in-process inventories throughout the system to detect these accumulations.

12.2.3 T.O.C. Study

All these variables are easily computed with one of the add-ins of the algorithm, named ‘T.O.C. Study’. When executed, the function returns the total load of each machine, identifies bottlenecks and plots work-in-process inventories, giving a quite useful picture of what is really happening on the shop floor.
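A hedged sketch of what such a study computes (Python, with an assumed (machine, start, end) schedule layout; the project's actual add-in is MATLAB code and may differ):

```python
def toc_study(operations, n_machines):
    """Sketch of a 'T.O.C. Study': compute the unitary charge of each machine
    (busy time divided by the makespan) and identify the bottleneck.

    operations: list of (machine, start, end) tuples; machines numbered from 1.
    """
    makespan = max(end for _, _, end in operations)
    busy = [0.0] * n_machines
    for machine, start, end in operations:
        busy[machine - 1] += end - start
    charges = [b / makespan for b in busy]
    bottleneck = charges.index(max(charges)) + 1  # most heavily loaded machine
    return charges, bottleneck

# Machine 1 works the whole horizon; machine 2 is idle three quarters of it.
charges, bottleneck = toc_study([(1, 0, 10), (1, 10, 20), (2, 0, 5)], 2)
```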

To depict these ideas, an example of output is presented below. First, the optimal schedule for a given problem is shown in the Gantt diagram in Figure 47.

Figure 47. Gantt diagram for the optimal schedule obtained

The T.O.C. Study outputs were:

• MachineCharge: [1.0000 0.8475 0.9904], a vector expressing the unitary charge of each machine.

• BottleNeck: 1, as machine 1 is loaded to its maximum.

A graphical output of W.I.P. (work in process) is also returned in Figure 48.


Figure 48. W.I.P. evolution for that solution (WIP charge on machines 1, 2 and 3)

From the graph in Figure 48, it can be seen that although machine 1 is loaded to its maximum, there are quite high accumulations of WIP before machines 2 and 3. Actually, they are quite balanced, so no important conclusions can be drawn. See Figure 49 for a different example output.

Figure 49. Gantt diagram for the solution


Its results were:

• MachineCharge: [1 0.3768 0.6680]

• BottleNeck: 1

Figure 50. Corresponding W.I.P. evolution (WIP charge on machines 1, 2 and 3)

This time, the identified bottleneck is still machine 1, but the WIP waiting to be processed at it is much larger than the load on the others. That means that, for this particular job load, machine 1 constrains the whole system. Thus, if that job load were representative of the jobshop and the machinery manager were able to buy new machines, the best choice would be machine 1.

Using this kind of graph effectively helps in taking decisions about increasing capacity. However, such studies should be performed on several different schedules that can be considered representative of the future needs of the shop in order to obtain valuable information.

In addition, knowing the schedule’s bottleneck makes it possible to pay more attention to that resource – performing preventive maintenance, for example – given that every delay in that machine causes an overall delay.


As a conclusion, it is useful to study the bottleneck resources of the jobshop in order to pay more attention to them and to assist managers in taking investment decisions. This project supplies a tool that provides the information necessary to correctly identify these bottlenecks.


13 Can you possibly know how far you are from your goal, if you

do not even know where it is?

As stated before, obtaining the efficiency that heuristic methods can provide requires a renunciation: we will never know what the global optimum is. Accepting a solution blindly, without knowing how good it is, seems quite risky. Should we have employed more time or resources trying to improve the current solution? Would that be economically reasonable, or is the effort needed not worth making?

This problem may sound familiar. In branch and bound methods, a bound is assigned to each branch by solving a relaxed problem. As this bound is the best solution the branch could possibly attain, although there is no guarantee that it can actually be reached, the whole branch is eliminated if that value represents no improvement over the best solution found so far. This implies that, even if not all the branches are checked, there is always a bound on the error of the approximate solution.

However, the Jobshop is different. It seems intuitive that the linear programming model developed in this project adds complexity to the non-negligible amount already present in the problem. In the author’s opinion, this is clear because the problem is much easier to understand than the LP model, and that should not happen. One consequence is that the constraints in the model have no straightforward meaning. As a result, there is no sense in trying to solve a relaxed problem in which the binary variables can take values intermediate between 0 and 1, for example: such a problem has no meaning, and its solution is therefore not a valid approximation.

Nevertheless, it is not time to give up. Although this kind of error delimitation has barely been explored in the research literature, this project makes an effort to provide improvement margins for the solutions obtained, by means of a truly relaxed problem and of an innovative statistical analysis.

13.1 An obvious bound. True relaxation

The first thing that can be done is to set two completely obvious limits, corresponding to the solutions of the truly relaxed problem – that is to say, a problem that ignores some of the constraints of the original problem, not of the LP model. In particular, the most suitable constraints to relax are:

All the operations of a job must be performed.

A machine can perform only one operation at a time.

If we consider the problem only from the job perspective, without taking the machines into account, the ideal situation is that each operation is performed on the eligible machine on which it takes the shortest time. Then the completion time of a job is the summation of its operation processing times, and this ideal schedule finishes at the longest job completion time.

Looking only at the machines, another bound can be calculated. The minimum workload of each machine is the summation of the processing times of all the operations that can only be performed on it. That represents a lower limit for each machine, and the maximum over all machines can be used as a second bound.
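Both bounds can be sketched as follows (Python, with an assumed instance layout in which each operation maps its eligible machines to processing times; this is not the project's actual implementation):

```python
def job_bound(jobs):
    """Lower bound ignoring machine capacity: each operation runs on its
    fastest eligible machine; the bound is the longest job completion time.
    jobs: list of jobs, each a list of operations, each operation a dict
    {machine: processing_time} over its eligible machines (assumed layout)."""
    return max(sum(min(op.values()) for op in job) for job in jobs)

def machine_bound(jobs, n_machines):
    """Lower bound ignoring job precedence: each machine must at least process
    the operations that can run only on it; take the maximum over machines."""
    load = [0.0] * n_machines
    for job in jobs:
        for op in job:
            if len(op) == 1:                      # only one eligible machine
                (machine, time), = op.items()
                load[machine - 1] += time
    return max(load)

jobs = [[{1: 3, 2: 5}, {1: 4}],    # job 1: op 1 on m1 (3) or m2 (5); op 2 only on m1 (4)
        [{1: 6}, {2: 2}]]          # job 2: op 1 only on m1 (6); op 2 only on m2 (2)
# job_bound(jobs) == 8 (job 2: 6 + 2); machine_bound(jobs, 2) == 10 (m1: 4 + 6)
```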

Both bounds are very inexact, but definitely more useful than the relaxed MIP.

A second method, based on statistics, is introduced in the next section.

13.2 Can you estimate it?

13.2.1 Introducing the problems of the data to fit

When studying how to fit density functions to data in statistics lessons, it seems that we always know beforehand the kind of function that will suit the data best, because of previous assumptions about the data source or just because the case is obvious. For example, we can be given a set of points with an empirical density function similar to the one in Figure 51.

Figure 51. Example of an empirical density function

At first sight the data seem quite symmetrical and unimodal, so fitting a normal distribution could be a good idea, as shown in Figure 52.

Figure 52. Normal fit to the empirical density function


The results are quite good and it can be inferred that the data actually came from a

N(2.87,0.89) with a 98% confidence level4.

Now it is simple to apply statistical methods to obtain more interesting information.

However… what happens if we are not in that privileged position, that is to say, if we have no idea beforehand of which function could fit the data? In that case parameter estimation is not enough, as we do not know which parameters should be estimated, or how.

That is the case for the objective values obtained throughout the runs of the algorithm. They are not as easy to fit to a specific function as the previous example. To illustrate this, some data from one of the GA models developed in this project are plotted in Figure 53.

Figure 53. Different widely used distributions applied to the data (normal, Weibull, lognormal, gamma and inverse Gaussian fits to the ObjectiveRecord data)

As can be seen from inspection of the figure, no classical distribution fits this data.

4 The confidence level was returned by a Chi-Square test function developed by the author. Both the graphs and the estimation were obtained with the MatLab distribution fitting toolbox. The data for the example were also generated using MatLab.


It seems that its nature (multimodal, asymmetrical, irregularly spaced) will probably not fit any density function, as most heuristic designers have assumed. However, this is not true.

13.2.2 No parameters? A different kind of density distribution

The function that has been successfully used to fit the objective function data differs from the ones represented above in a key aspect: it has no parameters.

Some disadvantageous aspects of classical fitting will be highlighted first, and the

solution provided by non-parametric distribution fitting will then be analysed.

The first issue is that the whole aspect of a histogram may vary depending on how the bin boundaries are selected. Smaller bin sizes might seem to allow a more precise representation, but beyond a certain level they result in empty bins, and it is difficult to determine the optimal smoothing level. To depict this idea, an example is provided in Figure 54.

Figure 54. Examples of bins with different sizes


However, many alternatives are available for avoiding histograms. One of the simplest and best known is the naïve estimator. It approximates the density function without setting bin limits in any way, just by placing a box of width 2h and height (2nh)⁻¹ on each observation and summing all the boxes (which gives a total area of 1). Smoothness is controlled by the choice of h. The naïve estimator has the disadvantage of being discontinuous, with jumps at every step. A formal expression of this estimator is:

f̂(x) = (1/(n·h)) · Σ_{i=1..n} (1/2)·Π((x − x_i)/h) = (1/n) · Σ_{i=1..n} (1/h)·w((x − x_i)/h)

where Π represents the rectangle or boxcar function.

The kernel density estimate overcomes the discontinuity by using an estimation of the form:

f̂(x) = (1/(n·h)) · Σ_{i=1..n} K((x − x_i)/h)

where K(x) is usually chosen as a symmetric probability density function – with the constraints that this imposes, such as ∫ K(x) dx = 1.

The estimation so obtained is continuous and differentiable. Again, h is used as a smoothing parameter or bandwidth. As stated in [AGNEW 04], these estimates suffer from some drawbacks, especially when applied to long-tailed distributions: the estimate tends to be noisy in the tails, where data are usually sparser. This effect can be mitigated by increasing h, but the price to pay is a loss of resolution in the centre of the distribution, missing interesting details such as multiple modes.

Further improvements can be made to this estimator. One of them is adapting the amount of smoothing to the local density of the data. The Nearest Neighbour method controls the level of smoothing through the size of the box required to contain a given number of observations. It computes the distance from each point x to the nearest observation, d(x) = min_i |x − x_i|; a generalisation of this concept uses the k-th nearest neighbour, d_k(x). The resulting estimation expression is:

f̂(x) = (1/(n·d_k(x))) · Σ_{i=1..n} K((x − x_i)/d_k(x))
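As an illustrative sketch of the kernel estimator (Python, taking a standard Gaussian kernel as one common choice of symmetric density K):

```python
import math

def kernel_density(data, h):
    """Return the kernel density estimate
    f(x) = (1/(n*h)) * sum_i K((x - x_i)/h), with K a standard Gaussian."""
    n = len(data)
    def K(u):
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    def f(x):
        return sum(K((x - xi) / h) for xi in data) / (n * h)
    return f

# The estimate is a proper density: it integrates to (approximately) one.
f = kernel_density([84.6, 85.1, 87.0, 88.2, 88.3, 91.5], h=1.0)
area = sum(f(80 + 0.01 * i) * 0.01 for i in range(2000))  # crude integral over [80, 100)
```

Choosing h controls the smoothness of the resulting curve, exactly as described above.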

13.2.3 Applying non-parametric density function estimation to the data provided by the

algorithms

When comparing the results of the fit provided by this kind of estimation with the ones shown in the previous section, it becomes evident that this is a much more suitable approach.

Figure 55. Example of non-parametric fit (cumulative probability of the ObjectiveRecord data)

Moreover, the Kolmogorov-Smirnov test function returns the following information:

P = 0.9000

Maximum difference between the empirical and the estimated CDF: 0.1369

KS statistic values for 90, 95 and 99% confidence levels: [0.3107 0.2591 0.2336]

As can be seen, the approximation is quite valid, so it is possible to estimate its minimum value. It is obtained as the point x for which

∫ from t = 0 to t = x of f̂(t) dt = l

where l is a small parameter that in this case has been fixed to 0.005.
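A minimal sketch of this minimum estimation, assuming the fitted density is available as a callable f (the names and the crude integration scheme are illustrative):

```python
def estimated_minimum(f, lo, hi, l=0.005, dx=0.001):
    """Return the point x at which the cumulative probability of the fitted
    density f, integrated upward from lo, first reaches the small level l."""
    acc = 0.0
    x = lo
    while acc < l and x < hi:
        acc += f(x) * dx      # crude left-Riemann integration of the density
        x += dx
    return x

# Degenerate check: for a uniform density on [0, 1], the l-quantile is l itself.
x_min = estimated_minimum(lambda x: 1.0, 0.0, 1.0)
```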


For the example case, the estimate is 83.86, which means that the best schedule obtained (with a total makespan of 84.64) is, in the worst case, at a 0.91% distance from the optimum.

13.3 Conclusions

The error that heuristic methods incur can be estimated, not only by solving relaxed problems – bearing in mind what relaxation really means in the context of the problem – but also by density function estimation.

For this second method, a non-parametric function is fitted. A function that performs such a fit and runs a Kolmogorov-Smirnov test has been developed; in addition to data evaluating the quality of the fit, it estimates a minimum value.

This way, it is possible to know how far we are from our goal.


14 Next steps

Having accomplished the project objectives, it becomes easier to identify the lines of research that it would be logical to follow. Most of them have already been introduced, but it is interesting to state them briefly and explicitly in order to define future work.

A fuzzy jobshop: this is perhaps one of the most interesting lines of research, as it reflects reality better than other models. The idea is to introduce imprecise time data, preferably modelled as trapezoidal fuzzy numbers, as they are easily understood and defined. As a consequence, the algorithm should be able to work with polygonal fuzzy numbers, which appear when operating on trapezoidal ones. Specific arithmetic operators have already been designed for these numbers, although an efficient algorithm for calculating fuzzy maxima and minima – an essential part of the algorithm – has not been developed yet. The core of the models could remain the same, but the fitness evaluation should give a fuzzy result to be defuzzified according to a consistent function in order to calculate survival probabilities. In addition, the algorithm should be able to dynamically refresh the optimal schedule as time goes by and the information becomes more detailed, decreasing the imprecision in the data.

A more general robust jobshop: in the model developed in this project, it is assumed that a broken-down machine takes longer to repair than the total makespan of the schedule. A possible generalisation could contemplate the machine becoming available again at an intermediate time, before the schedule is completed.


The assembly jobshop remains practically unexplored in the research literature. The complexity of its precedence constraints makes the definition of an ideal codification impossible (in the sense explained in section 4.1.1), so an efficient checking process, as well as a more complex random solution generator, should be developed.

A buffer-constrained jobshop could be considered by further exploiting the already developed T.O.C. tool, as it calculates the amount of work in process waiting to be processed at each machine: solutions that exceed a given level would be eliminated.

Implementation of Genetic Expression models: this new heuristic paradigm – note that it is different from Genetic Algorithms – is introduced in the annexes but has not been implemented. It should be implemented and tested to study its possibilities.


References

[AGGARWAL 1996] Aggarwal, C., J. B. Orlin and R. P. Tai. 1996. 'Optimized Crossover for the Independent Set Problem'. Operations Research, Vol. 45.

[AICKELIN 2004] Aickelin, U., E. Burke and A. Mohamed. 2004. 'Investigating Artificial Immune Systems for Job Shop Rescheduling in Changing Environments'. Poster Proceedings of ACDM 2004, Engineer's House, Bristol, UK.

[AYDIN 2003] Aydin, M. E. and T. C. Fogarty. 2003. 'Modular Simulated Annealing Algorithm for Job Shop Scheduling running on Distributed Resource Machine (DRM)'. South Bank University, SCISM, 103 Borough Road, London, SE1 0AA, UK.

[AKERS 1956] Akers, S. B. 1956. 'A Graphical Approach to Production Scheduling Problems'. Operations Research, 4.

[ANILY 1987] Anily, S. and A. Federgruen. 1987. 'Ergodicity in Parametric Nonstationary Markov Chains: An Application to Simulated Annealing Methods'. Operations Research, Vol. 35.

[CERNY 1985] Cerny, V. 1985. 'A Thermodynamical Approach to the TSP: an Efficient Simulation Algorithm'. J. Optim. Theory Appl. 45, 41-51.

[COFFMAN 1976] Coffman, E. G. 1976. 'Computer and Job-shop Scheduling Theory'. John Wiley, New York.

[DE CASTRO 2002] de Castro, L. N. and J. Timmis. 2002. 'Artificial Immune Systems: A New Computational Intelligence Approach'. Springer-Verlag.

[DORIGO 1996] Dorigo, M., V. Maniezzo and A. Colorni. 1996. 'The Ant System: Optimization by a Colony of Cooperating Agents'. IEEE Transactions on Systems, Man and Cybernetics-Part B, 26(1): 29-41.

[DUDEK 1964] Dudek, R. A. and O. F. Teuton. 1964. 'Development of M-Stage Decision Rule for Scheduling n Jobs through m Machines'. Operations Research, Vol. 12.

[DUDEK 1992] Dudek, R. A., S. S. Panwalkar and M. L. Smith. 1992. 'The Lessons of Flowshop Scheduling Research'. Operations Research, Vol. 40.


[EL GAMAL 1987] El Gamal, A. A., L. A. Hemachandra, I. Sheperling and V. K. Wei. 1987. 'Using Simulated Annealing to Design Good Codes'. IEEE Trans. Inform. Theory 33, 116-123.

[FAYAD 2003] Fayad, C. and S. Petrovic. 2003. 'A Genetic Algorithm for the Real-World Fuzzy Job Shop Scheduling'. Engineering and Physical Sciences Research Council (EPSRC), UK, Grant No. GR/R95319/01. Sherwood Press Ltd, Nottingham.

[FRENCH 1990] French, S. 1990. 'Sequencing and Scheduling. An Introduction to the Mathematics of the Job-Shop'. Ellis Horwood Series in Mathematics and Its Applications. Chelsea College, University of London.

[GEMAN 1984] Geman, S. and D. Geman. 1984. 'Stochastic Relaxation, Gibbs Distribution and the Bayesian Restoration of Images'. IEEE Trans. Pattern Analysis and Machine Intelligence PAMI-6, 721-741.

[GAHETE 2005] Gahete Díaz, J. L. 2005. 'Observatorio de Internet. Modelo de Supervisión, Optimización y Mejora Global del Encauzamiento de Datos entre Sistemas Autónomos'. Thesis.

[GONÇALVES 2002] Gonçalves, J. F., J. J. de Magalhaes and M. Resende. 2002. 'A Hybrid Genetic Algorithm for the Job Shop Scheduling Problem'. AT&T Labs Research Technical Report TD-5EAL6J.

[GONZALEZ 1977] Gonzalez, T. and S. Sahni. 1977. 'Flowshop and Jobshop Schedules: Complexity and Approximation'. Operations Research, Vol. 26.

[GOLDRATT 1982] Goldratt, E. 1982. 'The Goal: A Process of Ongoing Improvement'. North River Press.

[HINTON 1984] Hinton, G. E., T. J. Sejnowski and D. H. Ackley. 1984. 'Boltzmann Machines: Constraint Satisfaction Networks that Learn'. Report CMU-CS-84-119, Department of Computer Science, Carnegie-Mellon University, Pittsburgh.

[HOLLAND 1975] Holland, J. H. 1975. 'Adaptation in Natural and Artificial Systems'. University of Michigan Press, Ann Arbor, MI.

[HOLTHAUS 2002] Holthaus, O. and C. Rajendran. 2002. 'A Study of the Performance of Scheduling Rules in Buffer-constrained Dynamic Flowshops'. International Journal of Production Research, Vol. 40, No. 13, pp. 3041-3052.


[IGNALL 1965] Ignall, E. and L. E. Schrage. 1965. 'Application of the Branch-and-Bound Technique to Some Flowshop Scheduling Problems'. Operations Research, Vol. 13.

[JACKSON 1955] Jackson, J. R. 1955. 'Scheduling a Production Line to Minimize Maximum Tardiness'. Research Report 43, Management Science Research Project, University of California, Los Angeles, USA.

[JENSEN 1999] Jensen, T. and T. K. Hansen. 1999. 'Robust Solutions to Job Shop Problems'. Proceedings of the 1999 Congress on Evolutionary Computation, pp. 1138-1144.

[JEPSEN 1983] Jepsen, D. and D. Gelatt. 1983. 'Macro Placement by Monte Carlo Annealing'. In Proc. International Conference on Computer Design, 495-498, Port Chester, N.Y.

[JOHNSON 1954] Johnson, S. M. 1954. 'Optimal Two- and Three-stage Production Schedules with Setup Times Included'. Naval Research Logistics Quarterly, 1.

[JOHNSON 1989] Johnson, D. S., et al. 1989. 'Graph Partitioning by Simulated Annealing'. Operations Research, Vol. 37.

[KIRKPATRICK 1984] Kirkpatrick, S. 1984. 'Simulated Annealing: Quantitative Studies'. J. Stat. Phys. 34, 975-986.

[KIRKPATRICK 1983] Kirkpatrick, S., C. D. Gelatt and M. P. Vecchi. 1983. 'Optimization by Simulated Annealing'. Science 220, 671-680.

[LESH 2003] Lesh, N., L. B. Lopes, J. Marks, M. Mitzenmacher and G. T. Schafer. 2003. 'Human-Guided Search for Jobshop'. In Proceedings of the 3rd International NASA Workshop on Planning and Scheduling for Space, Houston, Texas.

[LOMNICKI 1965] Lomnicki, Z. 1965. 'A Branch-and-Bound Algorithm for the Exact Solution of the Three-Machine Scheduling Problem'. Operations Research, Vol. 16.

[MESGHOUNI 2004] Mesghouni, K., S. Hammadi and P. Borne. 2004. 'Evolutionary Algorithms for Job-Shop Scheduling'. Int. J. Appl. Math. Comput. Sci., Vol. 14, No. 1, 91-103.


[METROPOLIS 1953] Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller and E. Teller. 1953. 'Equation of State Calculations by Fast Computing Machines'. J. Chem. Phys. 21, 1087-1092.

[NAKANO 1991] Nakano, R. and T. Yamada. 1991. 'Conventional Genetic Algorithm for Job Shop Problems'. Proceedings of the Fourth International Conference on Genetic Algorithms, Morgan Kaufmann, San Mateo, CA, USA, pp. 477-479.

[REEJA 2000] Reeja, M. K. and C. Rajendran. 2000. 'Dispatching Rules for Scheduling in Assembly Jobshops'. International Journal of Production Research, 38(10): 2349-2360.

[ROWEN 1985] Rowen, C. and J. L. Hennessy. 1985. 'A Flexible Logic Implementation System'. In Proceedings 22nd Design Automation Conference, 748-752, Las Vegas, NV.

[SMITH 1956] Smith, W. E. 1956. 'Various Optimizers for Single-stage Production'. Naval Research Logistics Quarterly, 3, 59-66.

[SMITH 1967] Smith, R. D. and R. A. Dudek. 1967. 'A General Algorithm for Solution of the n-Job m-Machine Sequencing Problem of the Flowshop'. Operations Research, Vol. 15.

[VECCHI 1983] Vecchi, M. P. and S. Kirkpatrick. 1983. 'Global Wiring by Simulated Annealing'. IEEE Trans. Computer-Aided Design CAD-2, 215-222.

[WHITE 2003] White, T., S. Kaegi and T. Oda. 2003. 'Revisiting Elitism in Ant Colony Optimization'. Proceedings of GECCO 2003, Genetic and Evolutionary Computation Conference, Chicago, IL, USA, July 12-16, 2003.

[YAMADA 1992] Yamada, T. and R. Nakano. 1992. 'A Genetic Algorithm Applicable to Large-Scale Job-Shop Problems'. In R. Männer et al. (eds.), Parallel Problem Solving from Nature, Amsterdam, pp. 281-290.


List of Abbreviations Used

J.S.P.: Jobshop Problem

T.S.P.: Travelling Salesman Problem

P.: Polynomial-time problems

N.P.: Nondeterministic polynomial-time problems

G.A.: Genetic Algorithm

P.M.C.: Partially Mapped Crossover

S.A.: Simulated Annealing

A.I.S.: Artificial Immune Systems

L.P.: Linear Programming

T.O.C.: Theory of Constraints

W.I.P.: Work in Process


Annexes


A Genetic Expression Models. An Introduction.

As part of a constant search for ways to adapt to the environment, organisms regulate the way in which their genes are expressed. The information contained in the genes of a living being (known as the genotype) does not completely determine its actual characteristics (referred to as the phenotype).

Adaptation occurs through the mechanisms of evolution, which produce permanent changes in the genetic library, but that is not its only form. Highly evolved species such as human beings transform the environment in order to make it more suitable for them. On a smaller scale, cells regulate the expression of the genetic information they carry; that is to say, the proteins encoded in the chromosomes are produced only when they are needed and in the right amounts.

This mechanism is much faster than evolution and represents a dynamic real-time equilibrium in which the concentration of each protein is kept at the necessary level according to external and internal variables. To achieve this, a complex mechanism involving inducers and repressors has developed. Genetic expression is the basis that makes cellular differentiation possible, playing a crucial role in the organism's adaptability and versatility.

This system could be adapted to represent a new paradigm in metaheuristic optimization methods. Some of the ideas that could articulate the metaphor are presented below.

The main actor in the system could be the polymerase enzyme, which under the right conditions binds to a promoter (the DNA region that marks the starting point of a sequence codifying a protein) and transcribes it into messenger RNA. That messenger RNA then leaves the nucleus to be translated into a protein in the ribosomes. Both the transcription and translation processes are regulated; nevertheless, only transcription regulation will be considered for the sake of simplicity. Moreover, just a single cellular function will be taken into account. How suitable a protein is for that role will be determined by the fitness value of the solution it codifies.

An infinite DNA chain codifying different proteins (the solutions to the problem) is the starting point. In diploid organisms there are two different sequences codifying the same character: they are known as alleles. In some cases only one allele is expressed (the dominant one), while in others the result is a combination of both.

Accordingly, a double chain of genes is generated. In some cases a combination of the two is expressed, while in others only the dominant character arises. For optimization purposes, the relative dominance of alleles could depend on the fitness of the solutions the sequences generate: if the fitness values are very different, only the best one is considered, while if they are similar a hybrid of the two arises.

For each sequence, the messenger RNA concentration and the inhibition/induction information are recorded.

An inhibition signal could be triggered when another protein is found to be much more effective. It would decelerate and eventually stop the transcription of that chain, and could even reduce its cellular concentration.

An induction signal is generated in the opposite situation, that is, when the protein is effective. It would set in motion a mechanism that looks for similar proteins and starts transcribing them.

New sequences are tested all the time. This could lead to finding the best way to perform the cellular function under study, which corresponds to the optimal solution of the problem.
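The induction/inhibition loop sketched above can be illustrated with a toy optimizer. Everything concrete here is an assumption made for illustration: the one-max fitness plays the role of "how well a protein performs the cellular function", concentrations are plain floats, and all thresholds and rates are invented; the annex does not fix any of them.

```python
import random

random.seed(0)

# Toy fitness: number of ones in a bit string (one-max), purely illustrative.
def fitness(seq):
    return sum(seq)

N = 12  # length of each coding sequence (hypothetical)
# The "DNA pool": candidate sequences with an associated concentration.
pool = {tuple(random.randint(0, 1) for _ in range(N)): 1.0 for _ in range(20)}

for _ in range(200):
    best_f = max(map(fitness, pool))
    for seq in list(pool):
        if fitness(seq) < 0.7 * best_f:
            pool[seq] *= 0.5              # inhibition: repress weak sequences
            if pool[seq] < 0.1:
                del pool[seq]             # concentration falls below threshold
        else:
            pool[seq] = min(pool[seq] * 1.1, 10.0)   # induction: promote it
            mutant = list(seq)
            mutant[random.randrange(N)] ^= 1         # "similar protein"
            pool.setdefault(tuple(mutant), 1.0)
    if len(pool) > 40:                    # keep the pool bounded
        keep = sorted(pool, key=fitness, reverse=True)[:40]
        pool = {s: pool[s] for s in keep}

best = max(pool, key=fitness)
```

Under these assumptions the pool drifts toward ever fitter sequences, mirroring the claim that the regulation dynamics converge on the best way to perform the function.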


B Article presented to C.I.O. 2006 (Congreso Nacional de Ingeniería de Organización)


B Article presented to C.I.O. 2006

X Congreso de Ingeniería de Organización

Valencia, 7-8 September 2006

Development of genetic, simulated annealing and hybrid algorithms for the scheduling of a flexible shop

Sara Lumbreras Sancho1, Ángel Sarabia Viejo1

1 Dpto. de Organización Industrial. Universidad Pontificia Comillas. C/ Alberto

Aguilera, 25, 28015, Madrid. [email protected], [email protected] .

Abstract

The jobshop problem has been a subject of research for decades, owing to the diversity of situations in which it appears and to the computational complexity it presents. Because of the latter, it is not convenient to apply the classical optimization methods, so efficient in other cases, to its resolution; metaheuristic algorithms are preferred for this kind of problem. This article describes several algorithms created for its resolution, based on models such as genetic algorithms or simulated annealing, but presenting original peculiarities that give them a particular efficiency.

Keywords: genetic algorithm, simulated annealing, sequencing, jobshop

1. Introduction

Classical optimization methods are not suitable for solving the NP-complete problems of scheduling, on which they often consume unacceptable computation times. In these cases it is convenient to apply heuristic techniques, which give up the guarantee of a global optimum in exchange for very satisfactory solutions in affordable times. A selection of these techniques can be found in Pham (1998).

The sequencing of tasks in a shop is one of the most interesting manifestations of this reality, both for the diversity of environments in which it appears and for its complexity, which has motivated the appearance of an extensive literature.


The objective of this work is the development and study of new algorithms capable of optimizing the schedule of a shop in times far below those required by classical methods. These differences in computation time are studied, and the possible deviation from the global optimum is bounded and estimated. Finally, an analysis of the accepted solution is performed to support decision-making regarding possible capacity expansions of the facilities.

2. A generalized shop

The shop problem, known as the jobshop, consists in sequencing the tasks of a set of jobs composed of several distinct operations, using the set of available machines.

The case under study is unusually general, covering most of the possible complexities of deterministic models. The shop is of the flexible type, which means that an operation can be performed on several different machines, naturally with different execution times. In addition to sequence-dependent setup times, transport times between work centers have been considered, which may depend on the batch size or on the particular job.

The objective function to be optimized allows working with simple magnitudes (such as the makespan or the maximum tardiness) or defining a multicriteria function in which it is possible, for example, to establish priorities among customers in order to avoid delaying the most important orders.

In addition, a further generalization has been included that allows working with random times through the definition of a coherent distribution function, thus making it possible to solve a stochastic shop.

The possibility of failure of one of the machines is also considered, allowing robust solutions to be obtained.

3. Comparison with classical methods. Development of a linear programming model

With the sole objective of comparing the results obtained with those that the application of classical methods would produce, a linear programming model was developed and implemented in GAMS which, even for small problem sizes (four jobs of six operations each, to be performed on three machines), required execution times of over 24 hours.

4. Codification used

The matrix codification usually employed in this kind of problem, as in Mesghouni (2004), has the drawback of allowing infeasible sequences. More specifically, circular references between operations can appear, which are computationally very costly to identify and eliminate. Figure 1 shows an apparently normal sequence that could not actually be executed because of a circular reference between jobs 1 and 2.

M1: O(1,2) O(2,1) O(3,3)

M2: O(2,2) O(1,1)

M3: O(3,1) O(3,2)

Figure 1. Example of a circular reference in a sequence

In all the heuristics developed, solutions are codified as a double list:

• A job list, which represents only the jobs and not their operations. When the symbol of a job appears, it means that its first unprocessed operation will start on the corresponding machine as soon as that machine becomes free.

• A machine list, parallel to the previous one, which complements that information with the machine on which the operation symbolized by each element of the job list has to be performed.

From a codified solution it is relatively simple to obtain all the relevant data of the sequence it represents. For example, a valid solution for a shop with three two-operation jobs and three machines could be:

Jobs: 1 3 1 2 2 3

Machines: 2 3 1 1 2 3


and the resulting sequence could be represented by a Gantt chart like the one in figure 2, in which each job has been assigned a color and the labels represent each operation in the format (job, operation) (for example, "(3,2)" would mean the second operation of the third job).

Figure 2. Representation of the example sequence in a Gantt chart

For a solution codified in this way to be feasible, the only requirement on the job list is that it be a permutation with repetition of the job symbols in which each job appears as many times as it has operations. The machine list only has to respect the flexibility of the machines in the shop.

These checks are very simple and can be built into the algorithm as a guide for generating the random solution or solutions taken as the starting point. The double-list structure thus makes infeasibilities impossible, reducing the solution space to feasible solutions only. Furthermore, it considerably simplifies the development of the basic operators of the algorithms and is easy to interpret. As a result, computation times are reduced.
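The double list and its feasibility check can be sketched as follows. The instance data (`proc`, mapping each operation of each job to its capable machines and processing times) and the semi-active decoding used to compute a makespan are illustrative assumptions, not the thesis code:

```python
from collections import Counter

# Hypothetical 3-job, 2-operation, 3-machine flexible instance: proc[j][o][m]
# is the processing time of operation o of job j on machine m (all invented).
proc = {
    1: [{1: 3, 2: 2}, {1: 2, 3: 4}],
    2: [{1: 2, 2: 3}, {2: 2, 3: 1}],
    3: [{3: 2, 2: 4}, {3: 3, 1: 2}],
}

jobs = [1, 3, 1, 2, 2, 3]       # the job list from the article's example
machines = [2, 3, 1, 1, 2, 3]   # the parallel machine list

def feasible(jobs, machines, proc):
    # The job list must contain each job once per operation...
    if Counter(jobs) != Counter({j: len(ops) for j, ops in proc.items()}):
        return False
    # ...and each machine choice must respect the shop's flexibility.
    nxt = {j: 0 for j in proc}
    for j, m in zip(jobs, machines):
        if m not in proc[j][nxt[j]]:
            return False
        nxt[j] += 1
    return True

def makespan(jobs, machines, proc):
    # Each symbol starts the job's next operation as soon as both the chosen
    # machine and the job itself are free (a semi-active decoding).
    mach_free, job_free = {}, {j: 0 for j in proc}
    nxt, end = {j: 0 for j in proc}, 0
    for j, m in zip(jobs, machines):
        finish = max(mach_free.get(m, 0), job_free[j]) + proc[j][nxt[j]][m]
        mach_free[m] = job_free[j] = finish
        nxt[j] += 1
        end = max(end, finish)
    return end
```

With these invented times, the example individual decodes to a makespan of 8, and changing a single machine gene to a work center outside the operation's flexibility makes `feasible` reject it.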


5. An original genetic algorithm

This model applies the usual genetic-algorithm interpretation of the mechanisms of evolution, adapting it to the problem together with new ideas that noticeably improve its results. The following sections detail these peculiarities.

5.1. Creation of the initial generation

The initial population of solutions is the starting point of the algorithm. The codification used greatly simplifies this process:

• The job list is obtained as a random permutation of a string in which the symbol of each job is repeated as many times as it has operations.

• From it, the machine list is built by randomly selecting, for each position, a work center from the set of possible ones.

This process guarantees that all individuals are feasible beforehand, thus avoiding any subsequent checking process.
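The two generation steps above can be sketched directly; the `proc` structure (each operation mapping its capable work centers to processing times) is the same hypothetical layout used throughout these sketches:

```python
import random

def random_individual(proc, rng=random):
    # proc[j] lists the operations of job j; each operation maps the capable
    # work centers to processing times (hypothetical instance structure).
    jobs = [j for j, ops in proc.items() for _ in ops]  # j once per operation
    rng.shuffle(jobs)                                   # random permutation
    nxt = {j: 0 for j in proc}
    machines = []
    for j in jobs:
        machines.append(rng.choice(list(proc[j][nxt[j]])))  # any capable center
        nxt[j] += 1
    return jobs, machines

proc = {1: [{1: 3, 2: 2}, {1: 2}], 2: [{2: 3}, {2: 2, 3: 1}]}  # invented
jobs, machines = random_individual(proc, random.Random(7))
```

By construction every individual is feasible, which is exactly what lets the algorithm skip any later checking step.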

It is important to note that the population size markedly influences the optimization process. On the one hand, the total computation time of the algorithm is proportional to the number of iterations and to the population size. At the same time, and just as in biological populations, diversity is a key factor: the probability of the algorithm getting trapped in local optima grows when the population is small. The tests performed suggest that the best strategy is to fix the execution time and consider a prudent number of generations (100 to 200, regardless of problem size, since this kind of algorithm usually exhausts or stalls its improvement process after a number of iterations of that order). The algorithm then works with the largest possible number of individuals in the population, which guarantees that the available time is exploited optimally in terms of both effectiveness and efficiency.

5.2. Natural selection

The survival of the fittest individuals is modeled through roulette-wheel selection. The best solutions are assigned larger sectors, so that darts thrown at random are more likely to pick them for reproduction.

To guarantee that the best solution obtained so far (called the queen in this model) is never lost, a copy of it passes directly to the next generation without changes, regardless of whether it is also chosen to be a parent.
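The roulette and the elitist copy of the queen can be sketched as below. The exact sector law is an assumption (the article only states that better solutions get larger sectors); here sectors grow linearly as the makespan shrinks, and the fitness values are invented:

```python
import random

def roulette_selection(population, fitnesses, n, rng=random):
    # Roulette wheel for a minimization problem: the sector of each individual
    # grows as its makespan shrinks (linear law assumed for illustration).
    worst = max(fitnesses)
    sectors = [worst - f + 1e-9 for f in fitnesses]
    return rng.choices(population, weights=sectors, k=n)

population = ["s1", "s2", "s3", "s4"]
fitnesses = [10.0, 7.0, 12.0, 9.0]   # invented makespans, lower is better
queen = population[fitnesses.index(min(fitnesses))]
parents = roulette_selection(population, fitnesses, 4, random.Random(3))
# Elitism: a copy of the queen always passes unchanged to the next generation.
next_generation = [queen] + parents[:len(population) - 1]
```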

5.3. Crossover

The individuals selected as parents are paired at random.

Crossover is implemented in two steps, one for each component of the double list. If a simple two-point crossover were applied to the job lists, the offspring could turn out to be monsters, that is, infeasible solutions. Therefore, a simple process identifies the surplus and missing operations in each exchanged block: the former are removed at their first appearance in the block to be exchanged, while the latter are appended at the end, in the order in which they appeared in the original sequence. This check and subsequent modification of the chromosomes is simple and fast, avoiding the infeasibility problems that arise with other crossover operators.

The machine lists are first translated into matrices that store the work center assigned to each job and operation. A randomly chosen submatrix is then exchanged between the two parents.

This process guarantees two feasible offspring chromosomes that are truly a combination of their parents: their characteristics will be similar in the fragments they share, and improvement is possible insofar as the best parts of each parent can be selected for an offspring.
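The repaired two-point crossover on the job lists can be sketched for one child as follows; the cut points and the second parent are invented, and the repair order (drop surplus at first appearance inside the block, append missing in their original order) follows the description above:

```python
from collections import Counter

def job_crossover(p1, p2, a, b):
    # Two-point crossover on job lists: the block p2[a:b] replaces p1[a:b].
    child = p1[:a] + p2[a:b] + p1[b:]
    target = Counter(p1)                 # each job once per operation
    have = Counter(child)
    # Surplus symbols are dropped at their first appearance inside the block...
    for i in range(a, b):
        j = child[i]
        if have[j] > target[j]:
            child[i] = None
            have[j] -= 1
    child = [j for j in child if j is not None]
    # ...and missing symbols are appended at the end, in their original order.
    missing = target - have
    for j in p1:
        if missing[j] > 0:
            child.append(j)
            missing[j] -= 1
    return child

p1 = [1, 3, 1, 2, 2, 3]              # job list from the article's example
p2 = [2, 1, 2, 3, 3, 1]              # second parent (invented)
child = job_crossover(p1, p2, 2, 5)  # cut points chosen for illustration
```

The repair guarantees the child contains exactly the required multiset of job symbols, so the offspring stays feasible by construction.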

5.4. Mutation

Mutation belongs to the set of mechanisms whose purpose is to intensify the exploration of new solutions. Random changes are introduced in the chromosomes, preserving diversity and preventing premature convergence of the algorithm.

Mutation is also implemented in two steps. On the one hand, the machine list can be modified by moving an operation to another work center that is also able to perform it. Changes in the job list are structured as a swap of two of its elements. This definition is intuitive and responds to a minimum-distance concept, guaranteeing that the mutated solution is very similar to the original one.

The mutation probability is a delicate parameter to tune: if it is too low its effects are irrelevant, but if it is too high the algorithm comes to resemble a random search, with the corresponding loss of efficiency. The tests performed suggest that mutation rates between 3 and 6% give the best results. The model developed chooses a value in this interval depending on the standard deviation of the fitness values of the previous generation, using higher values for more uniform populations.
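A sketch of the swap mutation and of the adaptive rate follows. The re-validation of the machine genes after the swap and the exact rate formula (linear in the normalized spread, clipped to the 3-6% band) are assumptions needed to make the description concrete:

```python
import random
import statistics

def swap_mutation(jobs, machines, proc, rng=random):
    # Minimum-distance mutation: swap two elements of the job list (their
    # machine genes travel with them) and re-validate the machine list
    # against the shop's flexibility (re-validation step is an assumption).
    jobs, machines = jobs[:], machines[:]
    i, k = rng.sample(range(len(jobs)), 2)
    jobs[i], jobs[k] = jobs[k], jobs[i]
    machines[i], machines[k] = machines[k], machines[i]
    nxt = {j: 0 for j in proc}
    for p, j in enumerate(jobs):
        allowed = proc[j][nxt[j]]
        if machines[p] not in allowed:
            machines[p] = rng.choice(list(allowed))
        nxt[j] += 1
    return jobs, machines

def adaptive_mutation_rate(fitnesses, low=0.03, high=0.06):
    # Between 3 and 6%, larger for more uniform (low-spread) populations.
    spread = statistics.pstdev(fitnesses) / (abs(statistics.mean(fitnesses)) + 1e-9)
    return high - (high - low) * min(spread, 1.0)

proc = {  # same hypothetical instance structure as the other sketches
    1: [{1: 3, 2: 2}, {1: 2, 3: 4}],
    2: [{1: 2, 2: 3}, {2: 2, 3: 1}],
    3: [{3: 2, 2: 4}, {3: 3, 1: 2}],
}
mut_jobs, mut_machines = swap_mutation([1, 3, 1, 2, 2, 3],
                                       [2, 3, 1, 1, 2, 3], proc,
                                       random.Random(5))
```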

5.5. Immigration

This mechanism also aims at preserving diversity. New random individuals are generated that may or may not be included in the population. In the mechanism designed, each immigrant randomly selects a prey from among the weakest individuals. Each pair then fights a duel in which the probability of surviving is proportional to fitness, which amounts to a particular implementation of the so-called tournament selection method. If the immigrant wins, it takes the place of the original individual.

As with the mutation rate, the number of immigrant candidates must be chosen so that their positive effect on diversity is exploited without going so far as to lose efficiency. The tests performed point to values between 5 and 15% as the most suitable. Again, this value is modified during the evolution of the algorithm, increasing when diversity decreases.

5.6. Asexual component in reproduction

Imitating the reproductive systems of some social insects, which alternate cycles of sexual and asexual reproduction (parthenogenesis), a new mechanism is introduced in this model. A proportion of the new generation is obtained through a process that imitates parthenogenesis: it takes the queen as the starting point and generates a set of solutions modified through the mutation operator. In this way the search is intensified in the most promising regions, increasing efficiency.

As in the previous cases, the percentage of offspring obtained through this process is a parameter that must be selected carefully. This time the problem is the opposite one: if it is too high, the algorithm loses effectiveness and is more easily trapped in local optima. The tests point to 10-15% as the range in which this proportion should lie. Figure 3 illustrates the effect of this rate on the algorithm.

Figure 3. Behavior of the algorithm in its first iterations for different asexual reproduction rates

6. Simulated annealing algorithm

Simulated annealing (S.A.) algorithms imitate the metallurgical annealing processes of steel or copper. The model developed contributes some innovative ideas to the classical implementations of these algorithms.


6.1. Generation of neighboring solutions

In these heuristics it is vitally important to generate neighbors located at a minimum distance from the origin point, so that they are directly related to it and yet can present different fitness values. This is achieved analogously to the procedure described in the section devoted to mutation, preserving the important advantage that no subsequent checks or corrections are necessary.

6.2. Local search mechanism

To improve efficiency, the annealing algorithm contains a local search mechanism which, whenever a solution is selected, evaluates a set of points in its neighborhood and moves to them if an improving direction is found. If an improvement occurs, the local search is repeated without any change in the temperature. If, on the contrary, it is unsuccessful, the classical annealing mechanism applies, in which the probability of a worsening move is determined by its magnitude and by the temperature according to the Boltzmann law.

The size of the set of neighbors to be inspected must be large enough to have an impact on the results, yet small enough not to reduce efficiency. The tests performed recommend values of around 30 points.
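One step of this hybridized annealing can be sketched as follows. The neighbor and fitness functions, the toy problem (minimizing x² over the integers) and the choice of which single worsening candidate to submit to the Boltzmann test are all illustrative assumptions:

```python
import math
import random

def sa_step(current, f_current, temperature, neighbour, fitness,
            n_local=30, rng=random):
    # Inspect ~30 neighbours first; move greedily if any improves (the
    # temperature is then left untouched). Only when no improving direction
    # exists does the classical Boltzmann acceptance of a worsening move apply.
    candidates = [neighbour(current, rng) for _ in range(n_local)]
    best = min(candidates, key=fitness)
    if fitness(best) < f_current:
        return best, fitness(best), False        # improved: do not cool yet
    cand = candidates[0]
    delta = fitness(cand) - f_current            # delta >= 0 here
    if rng.random() < math.exp(-delta / temperature):
        return cand, fitness(cand), True         # accepted uphill move
    return current, f_current, True              # rejected: stay and cool

# Toy run: minimize x**2 over the integers (purely illustrative).
rng = random.Random(0)
x, fx, T = 40, 1600.0, 50.0
for _ in range(200):
    x, fx, cool = sa_step(x, fx, T,
                          neighbour=lambda s, r: s + r.choice((-1, 1)),
                          fitness=lambda s: float(s * s), rng=rng)
    if cool:
        T *= 0.95                                # simple geometric cooling
```

Keeping the temperature frozen during successful local-search moves is what distinguishes this step from plain annealing: descent is free, and only failed exploration spends the cooling budget.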

6.3. Cooling schedules

From the extensive literature on how the temperature should evolve along the algorithm, it can be concluded that overly sophisticated schedules are not efficient: in the great majority of cases they are outperformed by a simple geometric rule with suitable parameters. However, those parameter values are hard to tune and the result is very sensitive to them, so that the geometric rule is often discarded because of this difficulty.

The first of the proposed cooling schedules tries to imitate the one actually chosen in the metallurgical industry (it is curious to note that, while quenching has been studied exhaustively to improve the properties of the final piece, for annealing it is considered sufficient to let the metal cool in air or in the furnace), which turns out to correspond to natural convection. Discretizing this process does lead to the geometric rule, but by taking into account the properties of first-order transients, simple guidelines are obtained for setting the parameters, resulting in efficient and robust algorithms:

c = 1 − 1/τ  (1)

n = 6·τ  (2)

where c is the ratio of the geometric rule, n is the maximum number of iterations allowed and τ is the time constant of the transient, which is used as an intermediate parameter.
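Rules (1) and (2) can be turned into a full temperature schedule with a few lines of code (an illustrative sketch; `t0` is the initial temperature, an input not fixed by the rules themselves):

```python
def geometric_schedule(t0, n):
    """Temperatures for the geometric cooling rule set by rules (1)-(2).

    The time constant is tau = n / 6, so the run spans about six time
    constants, as in a first-order transient; the geometric ratio is
    then c = 1 - 1/tau. `n` is the maximum number of iterations.
    """
    tau = n / 6.0          # rule (2): n = 6 * tau
    c = 1.0 - 1.0 / tau    # rule (1)
    temps = [t0]
    for _ in range(n - 1):
        temps.append(temps[-1] * c)
    return temps
```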

A second alternative is introduced in the form of a dynamic schedule, which sets the temperature taking into account the fitness of the current solution, so that the probability of an uphill jump of a given magnitude decreases according to a geometric rule defined by the simple rules described in (1) and (2). The resulting cooling schedule yields an alternative algorithm that is more efficient in some of the simulated cases.

7. Hybrid algorithms

The positive features of the heuristics described above have been combined to obtain hybrids that are interesting from the standpoints of effectiveness and efficiency.

7.1. Two-phase hybrid

The simplest way to combine the genetic and annealing algorithms described is to run them sequentially. Since it exhausts itself sooner, the genetic algorithm runs first; once its improvement slows down, it hands over to the annealing algorithm. Determining the moment of the switch is not easy. The tests suggest fixing a minimum number of generations that will always be run (around 150-200, regardless of problem size). After that, the exhaustion alarm may fire, and the switch takes place once 50-75 consecutive generations have passed without any improvement.
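The switch criterion just described amounts to a simple predicate (a sketch; the thresholds are the ranges quoted in the text, and the function name is illustrative):

```python
def should_switch(generation, stagnant, min_generations=150, patience=50):
    """Decide when the two-phase hybrid hands over from the genetic
    algorithm to simulated annealing.

    A minimum number of generations (150-200 in the experiments) is
    always run; after that, the switch fires once `patience` (50-75)
    consecutive generations have passed without any improvement.
    """
    return generation >= min_generations and stagnant >= patience
```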

The annealing phase is then carried out on a family of solutions made up of the most promising members of the previous stage. Working with a population rather than with an isolated solution notably improves the effectiveness of the annealing. The family size must be chosen carefully; values in the range 6-12 are preferred for problem sizes of around 10 jobs, but they should be increased for larger problems. Figure 3 shows an example run, while Figure 4 highlights the main drawback of this algorithm: although it guarantees more exploration than the previous ones, its efficiency is much lower.

Figure 3. Example run of the two-phase hybrid with a family of 12 members

Figure 4. Compared efficiency of the original genetic algorithm and the sequential hybrid

7.2. Non-Darwinian mutation

The second proposed hybrid departs from the classical theory of evolution (the standard followed by genetic algorithms) by considering that the random changes introduced as mutations are not random at all: the probability that they result in an improvement is higher than the probability that the result worsens. To this end, a short annealing run is embedded in the mutation operator, which guarantees that the resulting solution has already explored its neighbors and passed through both the local search mechanism described above and the classical annealing method. An example run is shown in Figure 5.
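The operator can be sketched as follows (an illustrative minimal version, not the thesis implementation; the step count and cooling parameters are placeholders, and `fitness` is assumed to be minimized):

```python
import math
import random

def non_darwinian_mutation(individual, fitness, neighbor,
                           steps=10, t0=1.0, c=0.8, rng=random):
    """Mutation operator of the non-Darwinian hybrid (sketch).

    Instead of a blind random change, the offspring runs a short
    annealing episode, so the returned solution has already explored
    its neighborhood and is more likely improved than worsened.
    """
    current, temp = individual, t0
    for _ in range(steps):
        cand = neighbor(current)
        delta = fitness(cand) - fitness(current)
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current = cand
        temp *= c                     # geometric cooling within the episode
    return current
```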

This element intensifies the search in the most promising regions, resulting in an algorithm that is more efficient than any of the previous ones during the first generations, although it is later overtaken by the Darwinian genetic algorithm (once a local optimum has been reached, moving forward requires inspecting new points, i.e. increasing diversity). A third hybrid was therefore designed that switches from the non-Darwinian genetic algorithm to the more classical one after a number of generations (150-200), achieving the best efficiency results of all the models presented. Figure 6 compares the efficiency of the three hybrids developed.

Figure 5. Example run of the first iterations of the original genetic algorithm and the non-Darwinian one

Figure 6. Compared efficiency of the three hybrids developed

Figura 6. Eficiencia comparada entre los tres híbridos desarrollados

8. Other model outputs

Besides returning the data of the solution obtained, the progress of the algorithm over time and statistical information about the evolution of the population, the models developed provide other interesting outputs about the problem.

8.1. Bounding and estimating the error

In addition to an upper bound on the error, an estimate of it is provided, based on fitting a non-parametric distribution to the results obtained throughout the run of the algorithm. Distributions of this kind had never before been applied to the problem of estimating heuristic errors, and the tool yields remarkably good results.

8.2. Application of the Theory of Constraints

The Theory of Constraints (Goldratt (1982)) is embodied in a tool that analyzes the evolution of work in progress, identifies bottlenecks in the shop for the workload considered, and advises where to make new capacity-expansion investments.
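A minimal sketch of the bottleneck-identification step, assuming utilization over the scheduling horizon is the criterion (the text does not specify the exact metric, so this is one plausible reading):

```python
def bottleneck(machine_busy_time, horizon):
    """Flag the constraint machine for a given workload (sketch).

    In the spirit of the Theory of Constraints, the machine with the
    highest utilization over the horizon is reported as the bottleneck,
    i.e. the place where extra capacity investment pays off first.
    """
    util = {m: busy / horizon for m, busy in machine_busy_time.items()}
    return max(util, key=util.get), util
```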

9. Other generalizations of the models: stochastic shop and robust shop

The models developed are ready to work with a stochastic shop, in which the random processing times are simulated from a distribution that combines the deterministic time datum with a random component, proportional to it, drawn from a Weibull distribution.
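A sampling routine of this form might look as follows (a sketch: the shape and scale values are placeholders, since the thesis does not give them in this section):

```python
import random

def stochastic_time(deterministic_time, shape=2.0, scale=0.1, rng=random):
    """Sample a processing time for the stochastic shop (sketch).

    The simulated time combines the deterministic datum with a random
    component proportional to it, drawn from a Weibull distribution,
    as described in the text.
    """
    noise = rng.weibullvariate(scale, shape)   # Weibull(scale, shape) factor
    return deterministic_time * (1.0 + noise)
```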

In addition, the algorithms can consider the failure of one of the machines (finding the optimal alternative task sequence, and indicating which jobs can no longer be completed). Thus, by introducing a list of the most frequent failures together with their respective probabilities, it is possible to find the robust solution for the shop considered, following a process similar to the one presented in Jensen (1999).


References

Goldratt, E. (1982): The Goal: A Process of Ongoing Improvement. North River Press.

Jensen, T. and T. K. Hansen (1999): Robust Solutions to Job Shop Problems. Proceedings of the 1999 Congress on Evolutionary Computation.

Mesghouni, K., S. Hammadi and P. Borne (2004): Evolutionary Algorithms for Job-Shop Scheduling. Int. J. Appl. Math. Comput. Sci., Vol. 14, No. 1, 91-103.

Pham, D. T. and D. Karaboga (1998): Intelligent Optimization Techniques. Springer-Verlag.


C Data for the LP example


The example is composed of four jobs, with 4, 7, 7 and 5 operations respectively. The data that were given the most weight were the processing times, so for the sake of simplicity only these are shown.

The structure of the data is PT(job, operation, machine), and the values are as follows:

PT(:,:,1) =

3 3 14 5 0 0 0

3 0 14 5 5 5 5

3 3 0 5 3 3 14

0 3 14 0 0 0 0

PT(:,:,2) =

4 6 4 5 0 0 0

4 0 4 0 5 0 0

0 6 4 5 4 0 4

4 0 4 5 4 0 0


PT(:,:,3) =

5 2 12 2 0 0 0

5 2 12 2 2 2 2

5 0 12 0 0 2 0

5 2 0 2 5 0 0
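For readers working outside MATLAB, the machine-1 slice above can be loaded and queried like this (a sketch; the accessor name `pt` is illustrative, and only the PT(:,:,1) slice is reproduced for brevity):

```python
# Processing times on machine 1 (the PT(:,:,1) slice above), stored as a
# nested list indexed [job][operation] with zero-based indices.
PT_M1 = [
    [3, 3, 14, 5, 0, 0, 0],
    [3, 0, 14, 5, 5, 5, 5],
    [3, 3, 0, 5, 3, 3, 14],
    [0, 3, 14, 0, 0, 0, 0],
]

def pt(job, operation, table=PT_M1):
    """MATLAB-style accessor: pt(j, o) corresponds to PT(j, o, 1) in the
    listing above (1-based indices, as in the appendix)."""
    return table[job - 1][operation - 1]
```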
