This paper is available on arXiv under the CC 4.0 DEED license.
Authors:
(1) Leah Chrestien, Czech Technical University in Prague;
(2) Tomáš Pevný, Czech Technical University in Prague, and Gen Digital, Inc.;
(3) Stefan Edelkamp, Czech Technical University in Prague;
(4) Antonín Komenda, Czech Technical University in Prague.
Table of Links
- Abstract & Introduction
- Preliminaries
- Conditions on strictly optimally efficient heuristic
- Loss functions
- Related Work
- Experiments
- Conclusion and References
- Appendix
7 Conclusion
The motivation for this paper stemmed from the observation that even the cost-to-goal, commonly considered the optimal heuristic, fails to guarantee a strictly optimally efficient search. Since a large body of existing research optimizes this quantity, it is effectively unclear what should be optimized. To fill this gap, we have stated necessary and sufficient conditions guaranteeing that the forward search is strictly optimally efficient. These conditions show that the absolute value of the heuristic is not important; what controls efficiency is the ranking of states in the Open list. The ranking can be optimized by minimizing ranking loss functions, but their concrete implementation needs to correspond to the variant of forward search being used. In the case of a mismatch, the resulting heuristic can perform poorly, as demonstrated when the heuristic optimized for greedy best-first search (GBFS) was used with A* search. A further benefit of ranking losses is that, from the point of view of statistical learning theory, they solve a simpler problem than the ubiquitous regression onto the cost-to-goal.
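To make the contrast between the two objectives concrete, the sketch below compares a hinge-style pairwise ranking loss over Open-list pairs with the usual regression onto the cost-to-goal. It is a minimal illustration only, assuming Python with PyTorch; the function names, margin, and toy values are our own and are not prescribed by the paper.

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(f_expand: torch.Tensor,
                          f_keep: torch.Tensor,
                          margin: float = 1.0) -> torch.Tensor:
    """Hinge surrogate on the Open-list ordering: each state that should be
    expanded must rank strictly ahead (lower f-value) of each state that
    should remain in Open. Only the ordering matters, not absolute values.
    For A*, f = g + h; for greedy best-first search, f = h."""
    return F.relu(margin + f_expand - f_keep).mean()

def regression_loss(h_pred: torch.Tensor,
                    cost_to_goal: torch.Tensor) -> torch.Tensor:
    """The ubiquitous alternative: regress the heuristic onto the
    cost-to-goal, which also pins down absolute values and is therefore
    a harder statistical problem than recovering the ranking alone."""
    return F.mse_loss(h_pred, cost_to_goal)

# Toy batch of three (expand, keep) pairs of f-values produced by some
# parametric heuristic; the numbers are arbitrary, for illustration only.
f_expand = torch.tensor([2.0, 5.0, 1.5], requires_grad=True)
f_keep = torch.tensor([2.5, 4.0, 3.0])

loss = pairwise_ranking_loss(f_expand, f_keep)
loss.backward()  # gradients push misordered pairs apart, not toward targets
print(float(loss))
```

Note that the ranking loss is invariant to a constant shift of all f-values, which is precisely why it learns only the ordering of the Open list rather than the absolute heuristic values.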
We do not question the existence of a strictly optimally efficient heuristic. Given our experiments, we believe that if the space of heuristic functions over which the loss is optimized is sufficiently rich, the result will be close enough to the optimum for the needs of the search.
7.1 Acknowledgements
This work has been supported by project numbers 22-32620S and 22-30043S of the Czech Science Foundation and by the OP VVV project CZ.02.1.01/0.0/0.0/16_019/0000765 "Research Center for Informatics". This work has also received funding from the European Union's Horizon Europe Research and Innovation program under grant agreement TUPLES No. 101070149.
References
[1] Forest Agostinelli, Stephen McAleer, Alexander Shmakov, and Pierre Baldi. Solving the Rubik's cube with deep reinforcement learning and search. Nature Machine Intelligence, 1(8):356–363, 2019.
[2] Thomas Anthony, Zheng Tian, and David Barber. Thinking fast and slow with deep learning and tree search. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5360–5370, 2017.
[3] Shahab Jabbari Arfaee, Sandra Zilles, and Robert C Holte. Learning heuristic functions for large state spaces. Artificial Intelligence, 175(16-17):2075–2098, 2011.
[4] Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B Shah. Julia: A fresh approach to numerical computing. SIAM Review, 59(1):65–98, 2017.
[5] Mohak Bhardwaj, Sanjiban Choudhury, and Sebastian Scherer. Learning heuristic search via imitation. In Conference on Robot Learning, pages 271–280. PMLR, 2017.
[6] Blai Bonet and Héctor Geffner. Planning as heuristic search. Artificial Intelligence, 129(1-2):5–33, 2001.
[7] Vladimir Cherkassky, Xuhui Shao, Filip M Mulier, and Vladimir N Vapnik. Model complexity control for regression using VC generalization bounds. IEEE Transactions on Neural Networks, 10(5):1075–1089, 1999.
[8] Leah Chrestien, Tomáš Pevný, Antonín Komenda, and Stefan Edelkamp. Heuristic search planning with deep neural network using imitation, attention and curriculum learning. In PRL-IJCAI Workshop, 2021.
[9] Lukáš Chrpa. Combining learning techniques for classical planning: Macro-operators and entanglements. In ICTAI, pages 79–86. IEEE Computer Society, 2010.
[10] Rina Dechter and Judea Pearl. The optimality of A* revisited. In Michael R. Genesereth, editor, AAAI, pages 95–99. AAAI Press, 1983.
[11] Edsger W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271, 1959.
[12] Stefan Edelkamp. Automated creation of pattern database search heuristics. In Model Checking and Artificial Intelligence, pages 35–50, 2006.
[13] Stefan Edelkamp and Stefan Schrödl. Heuristic Search - Theory and Applications. Academic Press, 2012.
[14] Ariel Felner, Uzi Zahavi, Robert Holte, Jonathan Schaeffer, Nathan Sturtevant, and Zhifu Zhang. Inconsistent heuristics in theory and practice. Artificial Intelligence, 175(9):1570–1603, 2011.
[15] Dieqiao Feng, Carla P. Gomes, and Bart Selman. Solving hard AI planning instances using curriculum-driven deep reinforcement learning. In Christian Bessiere, editor, Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, pages 2198–2205. ijcai.org, 2020.
[16] Patrick Ferber, Malte Helmert, and Jörg Hoffmann. Neural network heuristics for classical planning: A study of hyperparameter space. In ECAI, pages 2346–2353. IOS Press, 2020.
[17] Santiago Franco, Álvaro Torralba, Levi H. S. Lelis, and Mike Barley. On creating complementary pattern databases. In IJCAI, pages 4302–4309, 2017.
[18] Caelan Reed Garrett, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. Learning to rank for synthesizing planning heuristics. In IJCAI, pages 3089–3095, 2016.
[19] Edward Groshev, Aviv Tamar, Maxwell Goldstein, Siddharth Srivastava, and Pieter Abbeel. Learning generalized reactive policies using deep neural networks. In ICAPS, 2018.
[20] Arthur Guez, Théophane Weber, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Rémi Munos, and David Silver. Learning to search with MCTSnets. In International Conference on Machine Learning, pages 1822–1831. PMLR, 2018.
[21] Peter E Hart, Nils J Nilsson, and Bertram Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100–107, 1968.
[22] Patrik Haslum, Adi Botea, Malte Helmert, Blai Bonet, and Sven Koenig. Domain-independent construction of pattern database heuristics for cost-optimal planning. In AAAI, pages 1007–1012, 2007.
[23] Malte Helmert and Gabriele Röger. How good is almost perfect? In AAAI, pages 944–949. AAAI Press, 2008.
[24] Robert C. Holte and Sandra Zilles. On the optimal efficiency of cost-algebraic A*. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, pages 2288–2295. AAAI Press, 2019.
[25] Michael Katz, Shirin Sohrabi, Horst Samulowitz, and Silvan Sievers. Delfi: Online planner selection for cost-optimal planning. IPC-9 planner abstracts, pages 57–64, 2018.
[26] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
[27] Donald E Knuth and Ronald W Moore. An analysis of alpha-beta pruning. Artificial Intelligence, 6(4):293–326, 1975.
[28] Richard E. Korf. Iterative-deepening-A*: An optimal admissible tree search. In IJCAI, pages 1034–1036. Morgan Kaufmann, 1985.
[29] Richard E Korf. Macro-operators: A weak method for learning. Artificial Intelligence, 26(1):35–77, 1985.
[30] Šimon Mandlík, Matěj Račinský, Viliam Lisý, and Tomáš Pevný. JsonGrinder.jl: Automated differentiable neural architecture for embedding arbitrary JSON data. Journal of Machine Learning Research, 23(298):1–5, 2022.
[31] Drew V. McDermott. A heuristic estimator for means-ends analysis in planning. In AIPS, pages 142–149. AAAI, 1996.
[32] David Ménager, Dongkyu Choi, Mark Roberts, and David W. Aha. Learning planning operators from episodic traces. In 2018 AAAI Spring Symposia. AAAI Press, 2018.
[33] Ionut Moraru, Stefan Edelkamp, Santiago Franco, and Moisés Martínez. Simplifying automated pattern selection for planning with symbolic pattern databases. In KI, pages 249–263, 2019.
[34] Christian Muise. Collection of PDDL files of classical problems. https://github.com/AI-Planning/classical-domains, 2023.
[35] XuanLong Nguyen, Martin J Wainwright, and Michael I Jordan. On surrogate loss functions and f-divergences. The Annals of Statistics, pages 876–904, 2009.
[36] Laurent Orseau, Levi Lelis, Tor Lattimore, and Théophane Weber. Single-agent policy tree search with guarantees. Advances in Neural Information Processing Systems, 31, 2018.
[37] Laurent Orseau and Levi HS Lelis. Implementation of the algorithms described in "Policy-guided heuristic search with guarantees" by L. Orseau and L. Lelis, published at AAAI'21. https://github.com/levilelis/h-levin, 2021.
[38] Laurent Orseau and Levi HS Lelis. Policy-guided heuristic search with guarantees. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 12382–12390, 2021.
[39] Arthur L Samuel. Some studies in machine learning using the game of checkers. II – Recent progress. IBM Journal of Research and Development, 11(6):601–617, 1967.
[40] Max-Philipp B. Schrader. gym-sokoban. https://github.com/mpSchrader/gym-sokoban, 2018.
[41] Jendrik Seipp, Florian Pommerening, and Malte Helmert. New optimization functions for potential heuristics. In ICAPS, 2015.
[42] Jendrik Seipp and Silvan Sievers. Fast Downward Stone Soup 2023. IPC 2023 planner abstracts, 2023.
[43] Jendrik Seipp, Álvaro Torralba, and Jörg Hoffmann. PDDL generators. https://doi.org/10.5281/zenodo.6382173, 2022.
[44] William Shen. STRIPS-HGN. https://github.com/williamshen-nz/STRIPS-HGN, 2021.
[45] William Shen, Felipe Trevizan, and Sylvie Thiébaux. Learning domain-independent planning heuristics with hypergraph networks. In ICAPS, volume 30, pages 574–584, 2020.
[46] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of go without human knowledge. Nature, 550(7676):354–359, 2017.
[47] Kristian Smith. Maze creator / solver. https://github.com/ravenkls/Maze-Generator-and-Solver, 2022.
[48] Gustav Sourek, Vojtech Aschenbrenner, Filip Zelezny, Steven Schockaert, and Ondrej Kuzelka. Lifted relational neural networks: Efficient learning of latent relational structures. Journal of Artificial Intelligence Research, 62:69–100, 2018.
[49] Simon Ståhlberg, Blai Bonet, and Hector Geffner. Learning general optimal policies with graph neural networks: Expressive power, transparency, and limits. In Akshat Kumar, Sylvie Thiébaux, Pradeep Varakantham, and William Yeoh, editors, Proceedings of the Thirty-Second International Conference on Automated Planning and Scheduling, ICAPS 2022, Singapore (virtual), June 13-24, 2022, pages 629–637. AAAI Press, 2022.
[50] Ingo Steinwart. Consistency of support vector machines and other regularized kernel classifiers. IEEE Transactions on Information Theory, 51(1):128–142, 2005.
[51] Takeshi Takahashi, He Sun, Dong Tian, and Yebin Wang. Learning heuristic functions for mobile robot path planning using deep neural networks. In ICAPS, volume 29, pages 764–772, 2019.
[52] Álvaro Torralba, Vidal Alcázar, Daniel Borrajo, Peter Kissmann, and Stefan Edelkamp. SymBA*: A symbolic bidirectional A* planner. In International Planning Competition, pages 105–108, 2014.
[53] Sam Toyer, Sylvie Thiébaux, Felipe Trevizan, and Lexing Xie. ASNets: Deep learning for generalised planning. Journal of Artificial Intelligence Research, 68:1–68, 2020.
[54] Vladimir Vapnik. Principles of risk minimization for learning theory. Advances in Neural Information Processing Systems, 4, 1991.
[55] Marin Vlastelica, Michal Rolinek, and Georg Martius. Neuro-algorithmic policies enable fast combinatorial generalization. In International Conference on Machine Learning, pages 10575–10585. PMLR, 2021.
[56] Xuemei Wang. Learning planning operators by observation and practice. In Kristian J. Hammond, editor, AIPS, pages 335–340. AAAI, 1994.
[57] Christopher Wilt and Wheeler Ruml. When does weighted A* fail? In International Symposium on Combinatorial Search, volume 3, 2012.
[58] Christopher Makoto Wilt and Wheeler Ruml. Effective heuristics for suboptimal best-first search. Journal of Artificial Intelligence Research, 57:273–306, 2016.
[59] Yuehua Xu, Alan Fern, and Sungwook Yoon. Learning linear ranking functions for beam search with application to planning. Journal of Machine Learning Research, 10(7), 2009.
[60] Ryo Yonetani, Tatsunori Taniai, Mohammadamin Barekatain, Mai Nishimura, and Asako Kanezaki. Path planning using neural A* search. In International Conference on Machine Learning, pages 12029–12039. PMLR, 2021.
[61] Ziqi Zhang and Florian Geißer. Extending graph neural networks for generalized stochastic planning. In Bridging the Gap Between AI Planning and Reinforcement Learning, 2021.
[62] Tan Zhi-Xuan. PDDL.jl: An Extensible Interpreter and Compiler Interface for Fast and Flexible AI Planning. PhD thesis, Massachusetts Institute of Technology, 2022.
[63] Juntang Zhuang, Tommy Tang, Yifan Ding, Sekhar C Tatikonda, Nicha Dvornek, Xenophon Papademetris, and James Duncan. AdaBelief optimizer: Adapting stepsizes by the belief in observed gradients. Advances in Neural Information Processing Systems, 33:18795–18806, 2020.