Attaining global optimized solution by applying Q-learning

  • Ghulam Muhammad Ali
  • Asif Mansoor
  • Shuai Liu
  • Jacek Olearczyk
  • Ahmed Bouferguene
  • Mohamed Al-Hussein
  • Department of Civil and Environmental Engineering, University of Alberta, 9105 116 Street NW, Edmonton, T6H 2W2, Canada
  • Campus Saint-Jean, University of Alberta, Edmonton, T6C 4G9, Canada
Cite as
Ali G.M., Mansoor A., Liu S., Olearczyk J., Bouferguene A., Al-Hussein M. (2020). Attaining global optimized solution by applying Q-learning. Proceedings of the 32nd European Modeling & Simulation Symposium (EMSS 2020), pp. 112-119. DOI: https://doi.org/10.46354/i3m.2020.emss.015

Abstract

The assembly of modular construction projects depends primarily on the use of heavy-capacity cranes. The increasing use of heavy cranes is the impetus for optimizing the resources utilized in crane work. One of the major cost drivers is the design optimization process for such resources. Even for an initial mechanical/structural design prior to a prototype, the design phase involving finite element analysis (FEA) is time-consuming and labour-intensive. The greedy algorithm could be an answer to these problems, accomplishing the optimization in a short period of time. In some complex cases, however, the greedy algorithm can become confined to a local optimum and thus overlook the global optimum. To avoid this, a model-free reinforcement learning (RL) algorithm, Q-learning, is employed in this study to evaluate its suitability for the FEA-based design optimization process. A hypothetical structural support design problem is used to formulate a framework for comparing the greedy algorithm and Q-learning. The findings show that Q-learning not only overcomes the local-optimum confinement of the greedy algorithm, but can in fact surpass the greedy algorithm over progressive iterations by refining its policy and reward.
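As a concrete illustration of the contrast described in the abstract, the sketch below compares greedy hill climbing with a tabular Q-learning agent on a toy one-dimensional design space. The reward values, state layout, and learning parameters are illustrative assumptions only; they do not reproduce the paper's FEA-based formulation.

```python
# A minimal, hypothetical sketch (not the authors' FEA formulation): a toy 1-D
# "design space" whose reward has a local optimum at state 2 and the global
# optimum at state 7. Greedy hill climbing stops at the local optimum, while
# tabular Q-learning with epsilon-greedy exploration reaches the global one.
import random

REWARD = [1, 3, 5, 2, 1, 4, 8, 10, 6, 0]   # illustrative objective values
N_STATES = len(REWARD)
ACTIONS = [-1, +1]                          # step left or right in the design space


def greedy_search(start=0):
    """One-step greedy hill climbing: halts at the first local optimum."""
    s = start
    while True:
        neighbours = [s + a for a in ACTIONS if 0 <= s + a < N_STATES]
        best = max(neighbours, key=lambda n: REWARD[n])
        if REWARD[best] <= REWARD[s]:
            return s                        # no improving neighbour
        s = best


def q_learning(episodes=500, steps=20, alpha=0.1, gamma=0.9, eps=0.2):
    """Tabular Q-learning; returns the best state found by the learned policy."""
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        s = random.randrange(N_STATES)
        for _ in range(steps):
            # Epsilon-greedy action selection.
            if random.random() < eps:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])
            s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
            r = REWARD[s2]
            # Q-learning update: Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    # Roll out the learned greedy policy from state 0 and report the best state visited.
    s, trajectory = 0, [0]
    for _ in range(2 * N_STATES):
        a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])
        s = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        trajectory.append(s)
    return max(trajectory, key=lambda v: REWARD[v])


if __name__ == "__main__":
    random.seed(0)
    print("Greedy search stops at state", greedy_search())    # 2 (local optimum)
    print("Q-learning policy reaches state", q_learning())    # 7 (global optimum)
```

Under these assumed rewards, the greedy search terminates as soon as no neighbouring state improves the objective, whereas the Q-learning agent's exploratory updates propagate value back from the higher-reward region and let the learned policy pass through the intermediate low-reward states.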
