Authors:
(1) Raphaël Millière, Department of Philosophy, Macquarie University ([email protected]);
(2) Cameron Buckner, Department of Philosophy, University of Houston ([email protected]).
Table of Links
2. A primer on LLMs
3. Interface with classic philosophical issues
3.2. Nativism and language acquisition
3.3. Language understanding and grounding
3.5. Transmission of cultural knowledge and linguistic scaffolding
4. Conclusion, Glossary, and References
4. Conclusion
We began this review article by considering the skeptical concern that LLMs are merely sophisticated mimics that memorize and regurgitate linguistic patterns from their training data, akin to the Blockhead thought experiment. Taking this position as a null hypothesis, we critically examined the evidence that could be adduced to reject it. Our analysis revealed that the advanced capabilities of state-of-the-art LLMs challenge many of the traditional critiques aimed at artificial neural networks as potential models of human language and cognition. In many cases, LLMs vastly exceed predictions about the performance upper bounds of non-classical systems. At the same time, however, we found that moving beyond the Blockhead analogy continues to depend upon careful scrutiny of the learning process and internal mechanisms of LLMs, which we are only beginning to understand. In particular, we need to understand what LLMs represent about the sentences they produce, and about the world those sentences describe. Such an understanding cannot be reached through armchair speculation alone; it calls for careful empirical investigation. We need a new generation of experimental methods to probe the behavior and internal organization of LLMs. We will explore these methods, their conceptual foundations, and new issues raised by the latest evolution of LLMs in Part II.
Glossary
Blockhead A philosophical thought experiment introduced by Block (1981), illustrating a hypothetical system that mimics human-like responses without genuine understanding or intelligence. Blockhead's responses are preprogrammed, allowing it to answer any conceivable question by retrieval from an extensive database, akin to a hash table lookup. This system challenges traditional notions of intelligence by exhibiting behavior indistinguishable from a human's while lacking the internal cognitive processes typically associated with intelligence. Blockhead serves as a critical example in discussions about the nature of artificial intelligence, emphasizing the distinction between mere behavioral mimicry and the presence of complex, internal information processing mechanisms as a hallmark of true intelligence. 2, 3, 10, 18, 20
generalization The ability of a neural network model to perform accurately on new, unseen data that is similar but not identical to the data it was trained on. This concept is central to evaluating the effectiveness of a model, as it indicates the extent to which the learned patterns and knowledge can be applied beyond the specific examples in the training dataset. A model that generalizes well maintains high performance when faced with new and varied inputs, demonstrating its adaptability and robustness across a broad range of scenarios. 3, 11–14, 20, 22
logit In the context of Transformer-based LLMs, a logit is the raw output of the model's final layer before it undergoes a softmax transformation to become a probability distribution. Each logit corresponds to a potential output token (e.g., a word or subword unit), and its value indicates the model's preliminary assessment of how likely that token is to be the next element in the sequence, given the input. The softmax function then converts these logits into a probability distribution, from which the model selects the most likely next token during text generation. 7
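The logit-to-probability step described above can be sketched in a few lines of Python; the three logit values here are hypothetical, standing in for a real model's vocabulary-sized output.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution.
    Subtracting the max logit first improves numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print(probs)  # probabilities sum to 1; the highest logit gets the highest probability
```

In practice the logit vector has one entry per token in the model's vocabulary (tens of thousands of entries), but the transformation is the same.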
out-of-distribution (OOD) data In machine learning, OOD data refers to input data that significantly differs from the data the model was trained on. This type of data falls outside the distribution of the training dataset, presenting patterns, features, or characteristics that the model has not encountered during its training phase. OOD data is a critical concept because it challenges the model's ability to generalize and maintain accuracy. Handling OOD data effectively is important for robustness and reliability, especially in real-world applications where the model is likely to encounter a wide variety of inputs. 20
self-attention A mechanism within Transformer-based neural networks that enables them to weigh and integrate information from different positions within the input sequence. In the context of LLMs, self-attention allows each token in a sentence to be processed in relation to every other token, facilitating the understanding of context and relationships within the text. This process involves calculating attention scores that reflect the relevance of each part of the input to every other part, thereby enhancing the model's ability to capture dependencies, regardless of their distance in the sequence. This feature is key to LLMs' ability to handle long-range dependencies and complex linguistic structures effectively. 5–7, 22
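A minimal sketch of the score-then-average computation described above, using plain Python lists for a toy two-token sequence (real implementations operate on learned query, key, and value projections of the token embeddings, which are assumed here as given):

```python
import math

def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors.
    For each query, compute a score against every key (dot product
    scaled by sqrt(d)), softmax the scores into weights, and return
    the weighted average of the value vectors."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Toy 2-token sequence with 2-dimensional queries, keys, and values
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = attention(Q, K, V)
print(out)  # each row is a weighted mixture of the rows of V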
tokenization The process of breaking down text into smaller units, called tokens. These tokens can be words, subwords, characters, or other meaningful elements, depending on the granularity of the tokenization algorithm. The purpose of tokenization is to transform the raw text into a format that can be easily processed and understood by a language model. This step is crucial for preparing input data, as it directly affects the model's ability to analyze and generate language. Tokenization plays a fundamental role in determining the level of detail and complexity a model can capture from the text, but can also have a downstream impact on the model's performance with certain tasks such as arithmetic. 6, 22
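As an illustration of subword tokenization, here is a deliberately simplified greedy longest-match tokenizer with a tiny hand-picked vocabulary; real LLM tokenizers (e.g., BPE or WordPiece variants) learn their vocabularies from data, but the input/output shape is the same.

```python
def tokenize(text, vocab):
    """Greedy longest-match subword tokenization: at each position,
    take the longest vocabulary entry that matches, falling back to
    single characters for unknown material."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character as fallback
            i += 1
    return tokens

# Hypothetical toy vocabulary for illustration
vocab = {"un", "believ", "able", "token", "ization"}
print(tokenize("unbelievable", vocab))  # ['un', 'believ', 'able']
```

Note how a single word becomes several tokens; this granularity is one reason tokenization can affect downstream performance on tasks like arithmetic, where digits may be split unevenly.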
train-test split In machine learning, the train-test split is a method used to evaluate the performance of a model. It involves dividing the available data into two distinct sets: a training set and a test set. The training set is used to train the model, allowing it to learn and adapt to patterns within the data. The test set, which consists of data not seen by the model during its training, is used to assess the model's performance and generalization capabilities. This split is crucial for providing an unbiased evaluation of the model, as it demonstrates how the model is likely to perform on new, unseen data. 11
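A minimal sketch of the split itself, assuming a simple shuffled 80/20 partition (library helpers such as scikit-learn's `train_test_split` do the same job with more options):

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle indices and partition the data into disjoint
    train and test sets."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    indices = list(range(len(data)))
    rng.shuffle(indices)
    n_test = int(len(data) * test_fraction)
    test = [data[i] for i in indices[:n_test]]
    train = [data[i] for i in indices[n_test:]]
    return train, test

data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))  # 80 20
```

The key property is disjointness: no example used to fit the model appears in the set used to evaluate it.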
Transformer A type of neural network architecture introduced by Vaswani et al. (2017), predominantly used for processing sequential data such as text. It is characterized by its reliance on self-attention mechanisms, which enable it to weigh the importance of different parts of the input data. Unlike earlier architectures, Transformers do not require sequential data to be processed in order, allowing for more parallel processing and efficiency in handling long-range dependencies in data. This architecture forms the basis of most LLMs, known for its effectiveness in capturing complex linguistic patterns and relationships. 1, 5–7, 10–12, 19, 21
vector Mathematically, a vector is an ordered array of numbers, which can represent points in a multidimensional space. In the context of LLMs, vectors are used to represent tokens, where each token can map onto a word or part of a word depending on the tokenization scheme. These vectors, known as embeddings, encode the linguistic features and relationships of the tokens in a high-dimensional space. By converting tokens into vectors, LLMs are able to process and generate language based on the semantic and syntactic properties encapsulated in these numerical representations. 3–5, 7, 14–16, 22
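One standard way such numerical representations are compared is cosine similarity; the tiny 4-dimensional embeddings below are hypothetical (real models learn embeddings with hundreds or thousands of dimensions), but they illustrate how geometric proximity can encode semantic relatedness.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: a standard measure
    of similarity between token embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings, chosen so that related words are nearby
cat = [0.9, 0.8, 0.1, 0.0]
dog = [0.8, 0.9, 0.2, 0.1]
car = [0.1, 0.0, 0.9, 0.8]
print(cosine_similarity(cat, dog) > cosine_similarity(cat, car))  # True
```
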
References
Aiyappa, R., An, J., Kwak, H. & Ahn, Y.-Y. (2023), 'Can we trust the evaluation on ChatGPT?'.
Akyürek, E., Akyürek, A. F. & Andreas, J. (2020), Learning to Recombine and Resample Data For Compositional Generalization, in 'International Conference on Learning Representations'.
Alayrac, J.-B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc, K., Mensch, A., Millican, K., Reynolds, M., Ring, R., Rutherford, E., Cabi, S., Han, T., Gong, Z., Samangooei, S., Monteiro, M., Menick, J. L., Borgeaud, S., Brock, A., Nematzadeh, A., Sharifzadeh, S., Bińkowski, M., Barreira, R., Vinyals, O., Zisserman, A. & Simonyan, K. (2022), 'Flamingo: A Visual Language Model for Few-Shot Learning', Advances in Neural Information Processing Systems 35, 23716–23736.
Andreas, J. (2020), Good-Enough Compositional Data Augmentation, in 'Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics', Association for Computational Linguistics, Online, pp. 7556–7566.
Andreas, J. (2022), Language Models as Agent Models, in 'Findings of the Association for Computational Linguistics: EMNLP 2022', Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, pp. 5769–5779.
Anil, R., Dai, A. M., Firat, O., Johnson, M., Lepikhin, D., Passos, A., Shakeri, S., Taropa, E., Bailey, P., Chen, Z., Chu, E., Clark, J. H., Shafey, L. E., Huang, Y., Meier-Hellstern, K., Mishra, G., Moreira, E., Omernick, M., Robinson, K., Ruder, S., Tay, Y., Xiao, K., Xu, Y., Zhang, Y., Abrego, G. H., Ahn, J., Austin, J., Barham, P., Botha, J., Bradbury, J., Brahma, S., Brooks, K., Catasta, M., Cheng, Y., Cherry, C., Choquette-Choo, C. A., Chowdhery, A., Crepy, C., Dave, S., Dehghani, M., Dev, S., Devlin, J., Díaz, M., Du, N., Dyer, E., Feinberg, V., Feng, F., Fienber, V., Freitag, M., Garcia, X., Gehrmann, S., Gonzalez, L., Gur-Ari, G., Hand, S., Hashemi, H., Hou, L., Howland, J., Hu, A., Hui, J., Hurwitz, J., Isard, M., Ittycheriah, A., Jagielski, M., Jia, W., Kenealy, K., Krikun, M., Kudugunta, S., Lan, C., Lee, K., Lee, B., Li, E., Li, M., Li, W., Li, Y., Li, J., Lim, H., Lin, H., Liu, Z., Liu, F., Maggioni, M., Mahendru, A., Maynez, J., Misra, V., Moussalem, M., Nado, Z., Nham, J., Ni, E., Nystrom, A., Parrish, A., Pellat, M., Polacek, M., Polozov, A., Pope, R., Qiao, S., Reif, E., Richter, B., Riley, P., Ros, A. C., Roy, A., Saeta, B., Samuel, R., Shelby, R., Slone, A., Smilkov, D., So, D. R., Sohn, D., Tokumine, S., Valter, D., Vasudevan, V., Vodrahalli, K., Wang, X., Wang, P., Wang, Z., Wang, T., Wieting, J., Wu, Y., Xu, K., Xu, Y., Xue, L., Yin, P., Yu, J., Zhang, Q., Zheng, S., Zheng, C., Zhou, W., Zhou, D., Petrov, S. & Wu, Y. (2023), 'PaLM 2 Technical Report'.
Askell, A., Bai, Y., Chen, A., Drain, D., Ganguli, D., Henighan, T., Jones, A., Joseph, N., Mann, B., DasSarma, N., Elhage, N., Hatfield-Dodds, Z., Hernandez, D., Kernion, J., Ndousse, K., Olsson, C., Amodei, D., Brown, T., Clark, J., McCandlish, S., Olah, C. & Kaplan, J. (2021), 'A General Language Assistant as a Laboratory for Alignment'.
Auersperg, A. M. I. & von Bayern, A. M. P. (2019), 'Who's a clever bird – now? A brief history of parrot cognition', Behaviour 156(5-8), 391–407.
Baier, A. C. (2002), Hume: The Reflective Women's Epistemologist?, in 'A Mind Of One's Own', 2 edn, Routledge.
Bender, E. M., Gebru, T., McMillan-Major, A. & Shmitchell, S. (2021), On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜, in 'Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency', FAccT '21, Association for Computing Machinery, New York, NY, USA, pp. 610–623.
Bender, E. M. & Koller, A. (2020), Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data, in 'Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics', Association for Computational Linguistics, Online, pp. 5185–5198.
Bengio, Y., Ducharme, R. & Vincent, P. (2000), A Neural Probabilistic Language Model, in 'Advances in Neural Information Processing Systems', Vol. 13, MIT Press.
Betker, J., Goh, G., Jing, L., Brooks, T., Wang, J., Li, L., Ouyang, L., Zhuang, J., Lee, J., Guo, Y. et al. (2023), 'Improving image generation with better captions', Computer Science. https://cdn.openai.com/papers/dall-e-3.pdf
Block, N. (1981), 'Psychologism and Behaviorism', The Philosophical Review 90(1), 5–43.
Block, N. (1986), 'Advertisement for a Semantics for Psychology', Midwest Studies in Philosophy 10, 615–678.
Boleda, G. (2020), 'Distributional Semantics and Linguistic Theory', Annual Review of Linguistics 6(1), 213–234.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I. & Amodei, D. (2020), 'Language Models are Few-Shot Learners', arXiv:2005.14165 [cs].
Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y. T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M. T. & Zhang, Y. (2023), 'Sparks of Artificial General Intelligence: Early experiments with GPT-4'.
Buckner, C. (2017), Understanding Associative and Cognitive Explanations in Comparative Psychology, in 'The Routledge Handbook of Philosophy of Animal Minds', Routledge.
Buckner, C. (2021), 'Black Boxes or Unflattering Mirrors? Comparative Bias in the Science of Machine Behaviour', The British Journal for the Philosophy of Science pp. 000–000.
Buckner, C. J. (2023), From Deep Learning to Rational Machines: What the History of Philosophy Can Teach Us about the Future of Artificial Intelligence, Oxford University Press, Oxford, New York.
Butlin, P. (2021), 'Sharing Our Concepts with Machines', Erkenntnis.
Carnie, A. (2021), Syntax: A Generative Introduction, John Wiley & Sons.
Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H. & Bengio, Y. (2014), 'Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation'.
Chollet, F. (2019), 'On the Measure of Intelligence'.
Chomsky, N. (1957), Syntactic Structures, Mouton.
Chomsky, N. (2000), Knowledge of Language: Its Nature, Origin and Use, in R. J. Stainton, ed., 'Perspectives in the Philosophy of Language: A Concise Anthology', Broadview Press, p. 3.
Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S. & Amodei, D. (2017), Deep Reinforcement Learning from Human Preferences, in 'Advances in Neural Information Processing Systems', Vol. 30, Curran Associates, Inc.
Conklin, H., Wang, B., Smith, K. & Titov, I. (2021), Meta-Learning to Compositionally Generalize, in 'Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)', Association for Computational Linguistics, Online, pp. 3322–3335.
Csordás, R., Irie, K. & Schmidhuber, J. (2022), CTL++: Evaluating Generalization on Never-Seen Compositional Patterns of Known Functions, and Compatibility of Neural Representations, in 'Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing', Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, pp. 9758–9767.
Dąbrowska, E. (2015), 'What exactly is Universal Grammar, and has anyone seen it?', Frontiers in Psychology 6.
Firth, J. R. (1957), 'A synopsis of linguistic theory, 1930–1955', Studies in linguistic analysis.
Fodor, J. A. (1975), The Language of Thought, Harvard University Press.
Fodor, J. A. & Pylyshyn, Z. W. (1988), 'Connectionism and cognitive architecture: A critical analysis', Cognition 28(1), 3–71.
Grand, G., Blank, I. A., Pereira, F. & Fedorenko, E. (2022), 'Semantic projection recovers rich human knowledge of multiple object features from word embeddings', Nature Human Behaviour 6(7), 975–987.
Grynbaum, M. M. & Mac, R. (2023), 'The Times Sues OpenAI and Microsoft Over A.I. Use of Copyrighted Work', The New York Times.
Ha, D. & Schmidhuber, J. (2018), 'World Models'.
Harnad, S. (1990), 'The symbol grounding problem', Physica D: Nonlinear Phenomena 42(1), 335–346.
Harris, Z. S. (1954), 'Distributional structure', Word 10, 146–162.
He, Z., Xie, Z., Jha, R., Steck, H., Liang, D., Feng, Y., Majumder, B. P., Kallus, N. & Mcauley, J. (2023), Large Language Models as Zero-Shot Conversational Recommenders, in 'Proceedings of the 32nd ACM International Conference on Information and Knowledge Management', CIKM '23, Association for Computing Machinery, New York, NY, USA, pp. 720–730.
Herbold, S., Hautli-Janisz, A., Heuer, U., Kikteva, Z. & Trautsch, A. (2023), 'A large-scale comparison of human-written versus ChatGPT-generated essays', Scientific Reports 13(1), 18617.
Hochreiter, S. & Schmidhuber, J. (1997), 'Long Short-Term Memory', Neural Computation 9(8), 1735–1780.
Huebner, P. A., Sulem, E., Cynthia, F. & Roth, D. (2021), BabyBERTa: Learning More Grammar With Small-Scale Child-Directed Language, in A. Bisazza & O. Abend, eds, 'Proceedings of the 25th Conference on Computational Natural Language Learning', Association for Computational Linguistics, Online, pp. 624–646.
Hume, D. (1978), A Treatise of Human Nature, 2nd edn, Oxford University Press, Oxford.
Hupkes, D., Giulianelli, M., Dankers, V., Artetxe, M., Elazar, Y., Pimentel, T., Christodoulopoulos, C., Lasri, K., Saphra, N., Sinclair, A., Ulmer, D., Schottmann, F., Batsuren, K., Sun, K., Sinha, K., Khalatbari, L., Ryskina, M., Frieske, R., Cotterell, R. & Jin, Z. (2023), 'A taxonomy and review of generalization research in NLP', Nature Machine Intelligence 5(10), 1161–1174.
Jelinek, F. (1998), Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA, USA.
Jones, C. & Bergen, B. (2023), 'Does GPT-4 Pass the Turing Test?'.
Karhade, M. (2023), 'GPT-4: 8 Models in One; The Secret is Out'.
Kasneci, E., Sessler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gasser, U., Groh, G., Günnemann, S., Hüllermeier, E., Krusche, S., Kutyniok, G., Michaeli, T., Nerdel, C., Pfeffer, J., Poquet, O., Sailer, M., Schmidt, A., Seidel, T., Stadler, M., Weller, J., Kuhn, J. & Kasneci, G. (2023), 'ChatGPT for good? On opportunities and challenges of large language models for education', Learning and Individual Differences 103, 102274.
Keysers, D., Schärli, N., Scales, N., Buisman, H., Furrer, D., Kashubin, S., Momchev, N., Sinopalnikov, D., Stafiniak, L., Tihon, T., Tsarkov, D., Wang, X., van Zee, M. & Bousquet, O. (2019), Measuring Compositional Generalization: A Comprehensive Method on Realistic Data, in 'International Conference on Learning Representations'.
Kheiri, K. & Karimi, H. (2023), 'SentimentGPT: Exploiting GPT for Advanced Sentiment Analysis and its Departure from Current Machine Learning'.
Kim, N. & Linzen, T. (2020), COGS: A Compositional Generalization Challenge Based on Semantic Interpretation, in 'Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)', Association for Computational Linguistics, Online, pp. 9087–9105.
Kripke, S. (1980), Naming and Necessity, Harvard University Press, Cambridge, MA.
Lake, B. & Baroni, M. (2018), Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks, in 'Proceedings of the 35th International Conference on Machine Learning', PMLR, pp. 2873–2882.
Lake, B. M. & Baroni, M. (2023), 'Human-like systematic generalization through a meta-learning neural network', Nature pp. 1–7.
Lake, B. M., Ullman, T. D., Tenenbaum, J. B. & Gershman, S. J. (2017), 'Building machines that learn and think like people', Behavioral and Brain Sciences 40.
Lasnik, H. & Lohndal, T. (2010), 'Government–binding/principles and parameters theory', WIREs Cognitive Science 1(1), 40–50.
Lavechin, M., Sy, Y., Titeux, H., Blandón, M. A. C., Räsänen, O., Bredin, H., Dupoux, E. & Cristia, A. (2023), 'BabySLM: Language-acquisition-friendly benchmark of self-supervised spoken language models'.
LeCun, Y. (n.d.), 'A Path Towards Autonomous Machine Intelligence'.
Lee, N., Sreenivasan, K., Lee, J., Lee, K. & Papailiopoulos, D. (2023), Teaching Arithmetic to Small Transformers, in 'The 3rd Workshop on Mathematical Reasoning and AI at NeurIPS'23'.
Lewkowycz, A., Andreassen, A., Dohan, D., Dyer, E., Michalewski, H., Ramasesh, V., Slone, A., Anil, C., Schlag, I., Gutman-Solo, T., Wu, Y., Neyshabur, B., Gur-Ari, G. & Misra, V. (2022), 'Solving Quantitative Reasoning Problems with Language Models'.
Liang, W., Zhang, Y., Cao, H., Wang, B., Ding, D., Yang, X., Vodrahalli, K., He, S., Smith, D., Yin, Y., McFarland, D. & Zou, J. (2023), 'Can large language models provide useful feedback on research papers? A large-scale empirical analysis'.
Long, B., Goodin, S., Kachergis, G., Marchman, V. A., Radwan, S. F., Sparks, R. Z., Xiang, V., Zhuang, C., Hsu, O., Newman, B., Yamins, D. L. K. & Frank, M. C. (2023), 'The BabyView camera: Designing a new head-mounted camera to capture children's early social and visual environments', Behavior Research Methods.
Macdonald, C. (1995), Classicism Vs. Connectionism, in C. Macdonald & G. F. Macdonald, eds, 'Connectionism: Debates on Psychological Explanation', Blackwell.
Mandelkern, M. & Linzen, T. (2023), 'Do Language Models Refer?'.
Marconi, D. (1997), Lexical Competence, MIT Press.
McCoy, R. T., Yao, S., Friedman, D., Hardy, M. & Griffiths, T. L. (2023), 'Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve'.
McGrath, S., Russin, J., Pavlick, E. & Feiman, R. (2023), 'Properties of LoTs: The footprints or the bear itself?'.
Mikolov, T., Chen, K., Corrado, G. & Dean, J. (2013), 'Efficient Estimation of Word Representations in Vector Space', arXiv:1301.3781 [cs].
Millière, R. (forthcoming), Language Models as Models of Language, in R. Nefdt, G. Dupre & K. H. Jain, eds, 'The Oxford Handbook of the Philosophy of Linguistics', Oxford University Press, Oxford.
Mirchandani, S., Xia, F., Florence, P., Ichter, B., Driess, D., Arenas, M. G., Rao, K., Sadigh, D. & Zeng, A. (2023), 'Large Language Models as General Pattern Machines'.
Mirowski, P., Mathewson, K. W., Pittman, J. & Evans, R. (2023), Co-Writing Screenplays and Theatre Scripts with Language Models: Evaluation by Industry Professionals, in 'Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems', CHI '23, Association for Computing Machinery, New York, NY, USA, pp. 1–34.
Mollo, D. C. & Millière, R. (2023), 'The Vector Grounding Problem'.
Murty, S., Sharma, P., Andreas, J. & Manning, C. D. (2023), 'Grokking of Hierarchical Structure in Vanilla Transformers'.
Ontanon, S., Ainslie, J., Fisher, Z. & Cvicek, V. (2022), Making Transformers Solve Compositional Tasks, in 'Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)', Association for Computational Linguistics, Dublin, Ireland, pp. 3591–3607.
OpenAI (2022), 'Introducing ChatGPT'.
OpenAI (2023a), 'GPT-4 Technical Report'.
OpenAI (2023b), 'GPT-4V(ision) System Card'.
Osgood, C. E. (1952), 'The nature and measurement of meaning', Psychological bulletin 49(3), 197–237.
Pavlick, E. (2023), 'Symbols and grounding in large language models', Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 381(2251), 20220041.
Pearl, L. (2022), 'Poverty of the Stimulus Without Tears', Language Learning and Development 18(4), 415–454.
Piantadosi, S. (2023), 'Modern language models refute Chomsky's approach to language'.
Piantadosi, S. & Hill, F. (2022), 'Meaning without reference in large language models'.
Pinker, S. & Prince, A. (1988), 'On language and connectionism: Analysis of a parallel distributed processing model of language acquisition', Cognition 28(1), 73–193.
Portelance, E. & Jasbi, M. (2023), 'The roles of neural networks in language acquisition'.
Putnam, H. (1975), 'The Meaning of "Meaning"', Minnesota Studies in the Philosophy of Science 7, 131–193.
Qiu, L., Shaw, P., Pasupat, P., Nowak, P., Linzen, T., Sha, F. & Toutanova, K. (2022), Improving Compositional Generalization with Latent Structure and Data Augmentation, in 'Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies', Association for Computational Linguistics, Seattle, United States, pp. 4341–4362.
Quilty-Dunn, J., Porot, N. & Mandelbaum, E. (2022), 'The Best Game in Town: The Re-Emergence of the Language of Thought Hypothesis Across the Cognitive Sciences', Behavioral and Brain Sciences pp. 1–55.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W. & Liu, P. J. (2020), 'Exploring the limits of transfer learning with a unified text-to-text transformer', The Journal of Machine Learning Research 21(1), 140:5485–140:5551.
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. & Chen, M. (2022), 'Hierarchical Text-Conditional Image Generation with CLIP Latents'.
Salton, G., Wong, A. & Yang, C. S. (1975), 'A vector space model for automatic indexing', Communications of the ACM 18(11), 613–620.
Savelka, J., Agarwal, A., An, M., Bogart, C. & Sakr, M. (2023), Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses, in 'Proceedings of the 2023 ACM Conference on International Computing Education Research V.1', pp. 78–92.
Savelka, J., Ashley, K. D., Gray, M. A., Westermann, H. & Xu, H. (2023), Can GPT-4 Support Analysis of Textual Data in Tasks Requiring Highly Specialized Domain Expertise?, in 'Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1', pp. 117–123.
Schmidhuber, J. (1990), Towards Compositional Learning with Dynamic Neural Networks, Inst. für Informatik.
Schut, L., Tomasev, N., McGrath, T., Hassabis, D., Paquet, U. & Kim, B. (2023), 'Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero'.
Searle, J. R. (1980), 'Minds, Brains, and Programs', Behavioral and Brain Sciences 3(3), 417–57.
Shinn, N., Cassano, F., Berman, E., Gopinath, A., Narasimhan, K. & Yao, S. (2023), 'Reflexion: Language Agents with Verbal Reinforcement Learning'.
Smolensky, P. (1988), 'On the proper treatment of connectionism', Behavioral and Brain Sciences 11(1), 1–23.
Smolensky, P. (1989), Connectionism and Constituent Structure, in R. Pfeifer, Z. Schreter, F. Fogelman-Soulié & L. Steels, eds, 'Connectionism in Perspective', Elsevier.
Smolensky, P., McCoy, R., Fernandez, R., Goldrick, M. & Gao, J. (2022a), 'Neurocompositional Computing: From the Central Paradox of Cognition to a New Generation of AI Systems', AI Magazine 43(3), 308–322.
Smolensky, P., McCoy, R. T., Fernandez, R., Goldrick, M. & Gao, J. (2022b), 'Neurocompositional computing in human and machine intelligence: A tutorial'.
Sober, E. (1998), Morgan's canon, in 'The Evolution of Mind', Oxford University Press, New York, NY, US, pp. 224–242.
Sullivan, J., Mei, M., Perfors, A., Wojcik, E. & Frank, M. C. (2021), 'SAYCam: A Large, Longitudinal Audiovisual Dataset Recorded From the Infant's Perspective', Open Mind 5, 20–29.
Tomasello, M. (2009), Constructing a Language, Harvard University Press.
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., Bikel, D., Blecher, L., Ferrer, C. C., Chen, M., Cucurull, G., Esiobu, D., Fernandes, J., Fu, J., Fu, W., Fuller, B., Gao, C., Goswami, V., Goyal, N., Hartshorn, A., Hosseini, S., Hou, R., Inan, H., Kardas, M., Kerkez, V., Khabsa, M., Kloumann, I., Korenev, A., Koura, P. S., Lachaux, M.-A., Lavril, T., Lee, J., Liskovich, D., Lu, Y., Mao, Y., Martinet, X., Mihaylov, T., Mishra, P., Molybog, I., Nie, Y., Poulton, A., Reizenstein, J., Rungta, R., Saladi, K., Schelten, A., Silva, R., Smith, E. M., Subramanian, R., Tan, X. E., Tang, B., Taylor, R., Williams, A., Kuan, J. X., Xu, P., Yan, Z., Zarov, I., Zhang, Y., Fan, A., Kambadur, M., Narang, S., Rodriguez, A., Stojnic, R., Edunov, S. & Scialom, T. (2023), 'Llama 2: Open Foundation and Fine-Tuned Chat Models'.
Tshitoyan, V., Dagdelen, J., Weston, L., Dunn, A., Rong, Z., Kononova, O., Persson, K. A., Ceder, G. & Jain, A. (2019), 'Unsupervised word embeddings capture latent knowledge from materials science literature', Nature 571(7763), 95–98.
Turing, A. M. (1950), 'Computing Machinery and Intelligence', Mind 59(236), 433–460.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł. & Polosukhin, I. (2017), Attention is All you Need, in I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan & R. Garnett, eds, 'Advances in Neural Information Processing Systems 30', Curran Associates, Inc., pp. 5998–6008.
Wallace, E., Wang, Y., Li, S., Singh, S. & Gardner, M. (2019), Do NLP Models Know Numbers? Probing Numeracy in Embeddings, in K. Inui, J. Jiang, V. Ng & X. Wan, eds, 'Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)', Association for Computational Linguistics, Hong Kong, China, pp. 5307–5315.
Wang, L., Lyu, C., Ji, T., Zhang, Z., Yu, D., Shi, S. & Tu, Z. (2023), 'Document-Level Machine Translation with Large Language Models'.
Wang, R., Todd, G., Yuan, E., Xiao, Z., Côté, M.-A. & Jansen, P. (2023), 'ByteSized32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games'.
Warstadt, A. & Bowman, S. R. (2022), What Artificial Neural Networks Can Tell Us about Human Language Acquisition, in 'Algebraic Structures in Natural Language', CRC Press.
Warstadt, A., Mueller, A., Choshen, L., Wilcox, E., Zhuang, C., Ciro, J., Mosquera, R., Paranjabe, B., Williams, A., Linzen, T. & Cotterell, R. (2023), Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora, in A. Warstadt, A. Mueller, L. Choshen, E. Wilcox, C. Zhuang, J. Ciro, R. Mosquera, B. Paranjabe, A. Williams, T. Linzen & R. Cotterell, eds, 'Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning', Association for Computational Linguistics, Singapore, pp. 1–6.
Weaver, W. (1955), Translation, in W. N. Locke & D. A. Booth, eds, 'Machine Translation of Languages', MIT Press, Boston, MA.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q. V. & Zhou, D. (2022), 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models', Advances in Neural Information Processing Systems 35, 24824–24837.
Winograd, T. (1971), 'Procedures as a Representation for Data in a Computer Program for Understanding Natural Language'.
Wittgenstein, L. (1953), Philosophical Investigations, Wiley-Blackwell, New York, NY, USA.
Zeng, A., Attarian, M., Ichter, B., Choromanski, K., Wong, A., Welker, S., Tombari, F., Purohit, A., Ryoo, M., Sindhwani, V., Lee, J., Vanhoucke, V. & Florence, P. (2022), 'Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language'.
Zhang, C., Bengio, S., Hardt, M., Recht, B. & Vinyals, O. (2021), 'Understanding deep learning (still) requires rethinking generalization', Communications of the ACM 64(3), 107–115.
Zhang, T., Ladhak, F., Durmus, E., Liang, P., McKeown, K. & Hashimoto, T. B. (2023), 'Benchmarking Large Language Models for News Summarization'.
Zhou, A., Wang, K., Lu, Z., Shi, W., Luo, S., Qin, Z., Lu, S., Jia, A., Song, L., Zhan, M. & Li, H. (2023), 'Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification'.
This paper is available on arxiv under CC BY 4.0 DEED license.