Attaining global optimized solution by applying Q-learning

  • Ghulam Muhammad Ali
  • Asif Mansoor
  • Shuai Liu
  • Jacek Olearczyk
  • Ahmed Bouferguene
  • Mohamed Al-Hussein
  • Department of Civil and Environmental Engineering, University of Alberta, 9105 116 Street NW, Edmonton, T6H 2W2, Canada
  • Campus Saint-Jean, University of Alberta, Edmonton, T6C 4G9, Canada
Cite as
Ali G.M., Mansoor A., Liu S., Olearczyk J., Bouferguene A., Al-Hussein M. (2020). Attaining global optimized solution by applying Q-learning. Proceedings of the 32nd European Modeling & Simulation Symposium (EMSS 2020), pp. 112-119. DOI: https://doi.org/10.46354/i3m.2020.emss.015

Abstract

The assembly of modular construction projects depends primarily on the use of heavy-capacity cranes. The increasing use of heavy cranes is the impetus for optimizing the resources utilized in crane work. One of the major cost drivers is the design optimization process for such resources. Even for an initial mechanical/structural design prior to a prototype, the design phase involving finite element analysis (FEA) is time-consuming and labour-intensive. The greedy algorithm could be an answer to these problems, accomplishing the optimization in a short period of time. In some complex cases, however, the greedy algorithm can become confined to a local optimum and thus overlook the global optimum. To avoid this, a model-free reinforcement learning (RL) algorithm, Q-learning, is employed in this study to evaluate its suitability for the FEA-based design optimization process. A hypothetical structural support design problem is used to formulate a framework for comparing the greedy algorithm and Q-learning. The findings show that Q-learning not only overcomes the local-optimum confinement of the greedy algorithm, but can in fact surpass the greedy algorithm over progressive iterations by refining its policy and reward.
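As a concrete illustration of the contrast described in the abstract, the sketch below compares greedy hill climbing with a tabular Q-learning agent on a toy one-dimensional design space. The reward values, state layout, and learning parameters are illustrative assumptions only; they do not reproduce the paper's FEA-based formulation.

```python
# A minimal, hypothetical sketch (not the authors' FEA formulation): a toy 1-D
# "design space" whose reward has a local optimum at state 2 and the global
# optimum at state 7. Greedy hill climbing stops at the local optimum, while
# tabular Q-learning with epsilon-greedy exploration reaches the global one.
import random

REWARD = [1, 3, 5, 2, 1, 4, 8, 10, 6, 0]   # illustrative objective values
N_STATES = len(REWARD)
ACTIONS = [-1, +1]                          # step left or right in the design space


def greedy_search(start=0):
    """One-step greedy hill climbing: halts at the first local optimum."""
    s = start
    while True:
        neighbours = [s + a for a in ACTIONS if 0 <= s + a < N_STATES]
        best = max(neighbours, key=lambda n: REWARD[n])
        if REWARD[best] <= REWARD[s]:
            return s                        # no improving neighbour
        s = best


def q_learning(episodes=500, steps=20, alpha=0.1, gamma=0.9, eps=0.2):
    """Tabular Q-learning; returns the best state found by the learned policy."""
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        s = random.randrange(N_STATES)
        for _ in range(steps):
            # Epsilon-greedy action selection.
            if random.random() < eps:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])
            s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
            r = REWARD[s2]
            # Q-learning update: Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    # Roll out the learned greedy policy from state 0 and report the best state visited.
    s, trajectory = 0, [0]
    for _ in range(2 * N_STATES):
        a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])
        s = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        trajectory.append(s)
    return max(trajectory, key=lambda v: REWARD[v])


if __name__ == "__main__":
    random.seed(0)
    print("Greedy search stops at state", greedy_search())    # 2 (local optimum)
    print("Q-learning policy reaches state", q_learning())    # 7 (global optimum)
```

Under these assumed rewards, the greedy search terminates as soon as no neighbouring state improves the objective, whereas the Q-learning agent's exploratory updates propagate value back from the higher-reward region and let the learned policy pass through the intermediate low-reward states.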
