This paper is available on arXiv under the CC 4.0 DEED license.
Authors:
(1) Leah Chrestien, Czech Technical University in Prague;
(2) Tomáš Pevný, Czech Technical University in Prague, and Gen Digital, Inc.;
(3) Stefan Edelkamp, Czech Technical University in Prague;
(4) Antonín Komenda, Czech Technical University in Prague.
Table of Links
- Abstract & Introduction
- Preliminaries
- Conditions on strictly optimally efficient heuristic
- Loss functions
- Related Work
- Experiments
- Conclusion and References
- Appendix
7 Conclusion
The motivation for this paper stemmed from the observation that even the cost-to-goal, commonly considered the optimal heuristic, fails to guarantee a strictly optimally efficient search. Since a large body of existing research optimizes this quantity, it is effectively unclear what should be optimized. To fill this gap, we have stated necessary and sufficient conditions guaranteeing that the forward search is strictly optimally efficient. These conditions show that the absolute value of the heuristic is not important; what controls efficiency is the ranking of states in the Open list. The ranking can be optimized by minimizing ranking loss functions, but their concrete implementation needs to correspond to the variant of forward search being used. In the case of a mismatch, the resulting heuristic can perform poorly, as demonstrated when the heuristic optimized for greedy best-first search (GBFS) was used with A* search. A further benefit of ranking losses is that, from the point of view of statistical learning theory, they solve a simpler problem than the ubiquitous regression onto the cost-to-goal.
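To make the contrast between the two objectives concrete, the sketch below compares a hinge-style pairwise ranking loss over Open-list pairs with the usual regression onto the cost-to-goal. It is a minimal illustration only, assuming Python with PyTorch; the function names, margin, and toy values are our own and are not prescribed by the paper.

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(f_expand: torch.Tensor,
                          f_keep: torch.Tensor,
                          margin: float = 1.0) -> torch.Tensor:
    """Hinge surrogate on the Open-list ordering: each state that should be
    expanded must rank strictly ahead (lower f-value) of each state that
    should remain in Open. Only the ordering matters, not absolute values.
    For A*, f = g + h; for greedy best-first search, f = h."""
    return F.relu(margin + f_expand - f_keep).mean()

def regression_loss(h_pred: torch.Tensor,
                    cost_to_goal: torch.Tensor) -> torch.Tensor:
    """The ubiquitous alternative: regress the heuristic onto the
    cost-to-goal, which also pins down absolute values and is therefore
    a harder statistical problem than recovering the ranking alone."""
    return F.mse_loss(h_pred, cost_to_goal)

# Toy batch of three (expand, keep) pairs of f-values produced by some
# parametric heuristic; the numbers are arbitrary, for illustration only.
f_expand = torch.tensor([2.0, 5.0, 1.5], requires_grad=True)
f_keep = torch.tensor([2.5, 4.0, 3.0])

loss = pairwise_ranking_loss(f_expand, f_keep)
loss.backward()  # gradients push misordered pairs apart, not toward targets
print(float(loss))
```

Note that the ranking loss is invariant to a constant shift of all f-values, which is precisely why it learns only the ordering of the Open list rather than the absolute heuristic values.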
We do not question the existence of a strictly optimally efficient heuristic. Given our experiments, we believe that if the space of heuristic functions over which the loss is optimized is sufficiently rich, the result will be close enough to the optimum for the needs of the search.
7.1 Acknowledgements
This work has been supported by project numbers 22-32620S and 22-30043S of the Czech Science Foundation and by the OP VVV project CZ.02.1.01/0.0/0.0/16_019/0000765 "Research Center for Informatics". This work has also received funding from the European Union's Horizon Europe Research and Innovation program under grant agreement TUPLES No. 101070149.
References
[1] Forest Agostinelli, Stephen McAleer, Alexander Shmakov, and Pierre Baldi. Solving the Rubik's cube with deep reinforcement learning and search. Nature Machine Intelligence, 1(8):356–363, 2019.
[2] Thomas Anthony, Zheng Tian, and David Barber. Thinking fast and slow with deep learning and tree search. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5360–5370, 2017.
[3] Shahab Jabbari Arfaee, Sandra Zilles, and Robert C Holte. Learning heuristic functions for large state spaces. Artificial Intelligence, 175(16-17):2075–2098, 2011.
[4] Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B Shah. Julia: A fresh approach to numerical computing. SIAM Review, 59(1):65–98, 2017.
[5] Mohak Bhardwaj, Sanjiban Choudhury, and Sebastian Scherer. Learning heuristic search via imitation. In Conference on Robot Learning, pages 271–280. PMLR, 2017.
[6] Blai Bonet and Héctor Geffner. Planning as heuristic search. Artificial Intelligence, 129(1-2):5–33, 2001.
[7] Vladimir Cherkassky, Xuhui Shao, Filip M Mulier, and Vladimir N Vapnik. Model complexity control for regression using VC generalization bounds. IEEE Transactions on Neural Networks, 10(5):1075–1089, 1999.
[8] Leah Chrestien, Tomáš Pevný, Antonín Komenda, and Stefan Edelkamp. Heuristic search planning with deep neural network using imitation, attention and curriculum learning. In PRL-IJCAI Workshop, 2021.
[9] Lukáš Chrpa. Combining learning techniques for classical planning: Macro-operators and entanglements. In ICTAI, pages 79–86. IEEE Computer Society, 2010.
[10] Rina Dechter and Judea Pearl. The optimality of A* revisited. In Michael R. Genesereth, editor, AAAI, pages 95–99. AAAI Press, 1983.
[11] Edsger W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271, 1959.
[12] Stefan Edelkamp. Automated creation of pattern database search heuristics. In Model Checking and Artificial Intelligence, pages 35–50, 2006.
[13] Stefan Edelkamp and Stefan Schrödl. Heuristic Search - Theory and Applications. Academic Press, 2012.
[14] Ariel Felner, Uzi Zahavi, Robert Holte, Jonathan Schaeffer, Nathan Sturtevant, and Zhifu Zhang. Inconsistent heuristics in theory and practice. Artificial Intelligence, 175(9):1570–1603, 2011.
[15] Dieqiao Feng, Carla P. Gomes, and Bart Selman. Solving hard AI planning instances using curriculum-driven deep reinforcement learning. In Christian Bessiere, editor, Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, pages 2198–2205. ijcai.org, 2020.
[16] Patrick Ferber, Malte Helmert, and Jörg Hoffmann. Neural network heuristics for classical planning: A study of hyperparameter space. In ECAI, pages 2346–2353. IOS Press, 2020.
[17] Santiago Franco, Álvaro Torralba, Levi H. S. Lelis, and Mike Barley. On creating complementary pattern databases. In IJCAI, pages 4302–4309, 2017.
[18] Caelan Reed Garrett, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. Learning to rank for synthesizing planning heuristics. In IJCAI, pages 3089–3095, 2016.
[19] Edward Groshev, Aviv Tamar, Maxwell Goldstein, Siddharth Srivastava, and Pieter Abbeel. Learning generalized reactive policies using deep neural networks. In ICAPS, 2018.
[20] Arthur Guez, Théophane Weber, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Rémi Munos, and David Silver. Learning to search with MCTSnets. In International Conference on Machine Learning, pages 1822–1831. PMLR, 2018.
[21] Peter E Hart, Nils J Nilsson, and Bertram Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100–107, 1968.
[22] Patrik Haslum, Adi Botea, Malte Helmert, Blai Bonet, and Sven Koenig. Domain-independent construction of pattern database heuristics for cost-optimal planning. In AAAI, pages 1007–1012, 2007.
[23] Malte Helmert and Gabriele Röger. How good is almost perfect? In AAAI, pages 944–949. AAAI Press, 2008.
[24] Robert C. Holte and Sandra Zilles. On the optimal efficiency of cost-algebraic A*. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, pages 2288–2295. AAAI Press, 2019.
[25] Michael Katz, Shirin Sohrabi, Horst Samulowitz, and Silvan Sievers. Delfi: Online planner selection for cost-optimal planning. IPC-9 planner abstracts, pages 57–64, 2018.
[26] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
[27] Donald E Knuth and Ronald W Moore. An analysis of alpha-beta pruning. Artificial Intelligence, 6(4):293–326, 1975.
[28] Richard E. Korf. Iterative-deepening-A*: An optimal admissible tree search. In IJCAI, pages 1034–1036. Morgan Kaufmann, 1985.
[29] Richard E Korf. Macro-operators: A weak method for learning. Artificial Intelligence, 26(1):35–77, 1985.
[30] Šimon Mandlík, Matěj Račinský, Viliam Lisý, and Tomáš Pevný. JsonGrinder.jl: Automated differentiable neural architecture for embedding arbitrary JSON data. Journal of Machine Learning Research, 23(298):1–5, 2022.
[31] Drew V. McDermott. A heuristic estimator for means-ends analysis in planning. In AIPS, pages 142–149. AAAI, 1996.
[32] David Ménager, Dongkyu Choi, Mark Roberts, and David W. Aha. Learning planning operators from episodic traces. In 2018 AAAI Spring Symposia. AAAI Press, 2018.
[33] Ionut Moraru, Stefan Edelkamp, Santiago Franco, and Moisés Martínez. Simplifying automated pattern selection for planning with symbolic pattern databases. In KI, pages 249–263, 2019.
[34] Christian Muise. Collection of PDDL files of classical problems. https://github.com/AI-Planning/classical-domains, 2023.
[35] XuanLong Nguyen, Martin J Wainwright, and Michael I Jordan. On surrogate loss functions and f-divergences. The Annals of Statistics, pages 876–904, 2009.
[36] Laurent Orseau, Levi Lelis, Tor Lattimore, and Théophane Weber. Single-agent policy tree search with guarantees. Advances in Neural Information Processing Systems, 31, 2018.
[37] Laurent Orseau and Levi HS Lelis. Implementation of the algorithms described in "Policy-guided heuristic search with guarantees" by L. Orseau and L. Lelis, published at AAAI'21. https://github.com/levilelis/h-levin, 2021.
[38] Laurent Orseau and Levi HS Lelis. Policy-guided heuristic search with guarantees. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 12382–12390, 2021.
[39] Arthur L Samuel. Some studies in machine learning using the game of checkers. II – Recent progress. IBM Journal of Research and Development, 11(6):601–617, 1967.
[40] Max-Philipp B. Schrader. gym-sokoban. https://github.com/mpSchrader/gym-sokoban, 2018.
[41] Jendrik Seipp, Florian Pommerening, and Malte Helmert. New optimization functions for potential heuristics. In ICAPS, 2015.
[42] Jendrik Seipp and Silvan Sievers. Fast Downward Stone Soup 2023. IPC 2023 planner abstracts, 2023.
[43] Jendrik Seipp, Álvaro Torralba, and Jörg Hoffmann. PDDL generators. https://doi.org/10.5281/zenodo.6382173, 2022.
[44] William Shen. STRIPS-HGN. https://github.com/williamshen-nz/STRIPS-HGN, 2021.
[45] William Shen, Felipe Trevizan, and Sylvie Thiébaux. Learning domain-independent planning heuristics with hypergraph networks. In ICAPS, volume 30, pages 574–584, 2020.
[46] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of go without human knowledge. Nature, 550(7676):354–359, 2017.
[47] Kristian Smith. Maze creator / solver. https://github.com/ravenkls/Maze-Generator-and-Solver, 2022.
[48] Gustav Sourek, Vojtech Aschenbrenner, Filip Zelezny, Steven Schockaert, and Ondrej Kuzelka. Lifted relational neural networks: Efficient learning of latent relational structures. Journal of Artificial Intelligence Research, 62:69–100, 2018.
[49] Simon Ståhlberg, Blai Bonet, and Hector Geffner. Learning general optimal policies with graph neural networks: Expressive power, transparency, and limits. In Akshat Kumar, Sylvie Thiébaux, Pradeep Varakantham, and William Yeoh, editors, Proceedings of the Thirty-Second International Conference on Automated Planning and Scheduling, ICAPS 2022, Singapore (virtual), June 13-24, 2022, pages 629–637. AAAI Press, 2022.
[50] Ingo Steinwart. Consistency of support vector machines and other regularized kernel classifiers. IEEE Transactions on Information Theory, 51(1):128–142, 2005.
[51] Takeshi Takahashi, He Sun, Dong Tian, and Yebin Wang. Learning heuristic functions for mobile robot path planning using deep neural networks. In ICAPS, volume 29, pages 764–772, 2019.
[52] Álvaro Torralba, Vidal Alcázar, Daniel Borrajo, Peter Kissmann, and Stefan Edelkamp. SymBA*: A symbolic bidirectional A* planner. In International Planning Competition, pages 105–108, 2014.
[53] Sam Toyer, Sylvie Thiébaux, Felipe Trevizan, and Lexing Xie. ASNets: Deep learning for generalised planning. Journal of Artificial Intelligence Research, 68:1–68, 2020.
[54] Vladimir Vapnik. Principles of risk minimization for learning theory. Advances in Neural Information Processing Systems, 4, 1991.
[55] Marin Vlastelica, Michal Rolinek, and Georg Martius. Neuro-algorithmic policies enable fast combinatorial generalization. In International Conference on Machine Learning, pages 10575–10585. PMLR, 2021.
[56] Xuemei Wang. Learning planning operators by observation and practice. In Kristian J. Hammond, editor, AIPS, pages 335–340. AAAI, 1994.
[57] Christopher Wilt and Wheeler Ruml. When does weighted A* fail? In International Symposium on Combinatorial Search, volume 3, 2012.
[58] Christopher Makoto Wilt and Wheeler Ruml. Effective heuristics for suboptimal best-first search. Journal of Artificial Intelligence Research, 57:273–306, 2016.
[59] Yuehua Xu, Alan Fern, and Sungwook Yoon. Learning linear ranking functions for beam search with application to planning. Journal of Machine Learning Research, 10(7), 2009.
[60] Ryo Yonetani, Tatsunori Taniai, Mohammadamin Barekatain, Mai Nishimura, and Asako Kanezaki. Path planning using neural A* search. In International Conference on Machine Learning, pages 12029–12039. PMLR, 2021.
[61] Ziqi Zhang and Florian Geißer. Extending graph neural networks for generalized stochastic planning. In Bridging the Gap Between AI Planning and Reinforcement Learning, 2021.
[62] Tan Zhi-Xuan. PDDL.jl: An Extensible Interpreter and Compiler Interface for Fast and Flexible AI Planning. PhD thesis, Massachusetts Institute of Technology, 2022.
[63] Juntang Zhuang, Tommy Tang, Yifan Ding, Sekhar C Tatikonda, Nicha Dvornek, Xenophon Papademetris, and James Duncan. AdaBelief optimizer: Adapting stepsizes by the belief in observed gradients. Advances in Neural Information Processing Systems, 33:18795–18806, 2020.