Nancy Spero | Detail of Codex Artaud XXV
(before you say anything, yes I know Chomsky is still alive.)
This is about the ‘debate’ between Foucault and Chomsky on YouTube. And its continued influence today.
This is bat country.
[Debate Noam Chomsky & Michel Foucault - On human nature (YouTube)]
The 1971 debate between Michel Foucault and Noam Chomsky, immortalized on YouTube, stands as a pivotal moment in intellectual history. This verbal sparring match between two titans of 20th-century thought continues to reverberate through the halls of academia and, increasingly, through the silicon corridors of AI research.
This exchange wasn’t so much a debate as a rich dialogue on language and human nature. Foucault, armed with the analytical toolkit of French post-structuralism, brought to bear his insights into the history of scientific discourse. Chomsky, the architect of modern linguistics, presented his theory of an innate universal grammar, the programme that would go on to lay the groundwork for cognitive science and what Steven Pinker would later dub “the language instinct.”
The Linguistic Tango
The comical linguistic dance of the debate—with its potential for six different permutations of French and English exchange—serves as a fitting metaphor for the complex interplay of ideas at work. This multilingual aspect mirrors the current state of Natural Language Processing (NLP), where models must grapple with the intricacies of multiple languages and the nuances lost (or gained) in translation.
Chomskyan Empiricism vs. Foucauldian Deconstruction
Chomsky’s insistence that linguistics be an empirical science, built on measurable, testable hypotheses about an innate language faculty, stands in stark contrast to Foucault’s more fluid, genealogical and deconstructive approach. This tension between the quantifiable and the qualitative continues to play out in the field of AI, particularly in the development and interpretation of Large Language Models (LLMs).
LLMs and the Etymology of ‘Hallucination’
The rise of LLMs has introduced new terminology, including “hallucination,” used for AI-generated content that is fluent and coherent but factually unsupported. Borrowing a psychiatric term for machine error invites comparison with Michel Foucault’s exploration of madness in his seminal work “History of Madness” (1961).
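Before turning to Foucault, a deliberately naive sketch of that definition in code may help: fluent output flagged only because it contradicts the reference material. The fact store and the generated sentence below are invented for illustration; real systems rely on retrieval and entailment models rather than exact matching.

```python
# Toy illustration of "hallucination": output that reads fluently but is not
# supported by the reference material. The reference fact and the generated
# sentence below are hypothetical examples, not output from any real system.
REFERENCE_FACTS = {
    "debate_year": "1971",
    "debate_participants": {"Noam Chomsky", "Michel Foucault"},
}

def is_unsupported(claimed_year: str) -> bool:
    """Flag a claimed year that contradicts the reference, however fluent the prose."""
    return claimed_year != REFERENCE_FACTS["debate_year"]

generated = "The celebrated Chomsky-Foucault debate was recorded in 1968."
claimed_year = "1968"  # in practice this would be extracted from the generated text
print(is_unsupported(claimed_year))  # True: coherent on its face, factually wrong
```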
Foucault’s “History of Madness”: A Brief Overview
In “History of Madness,” Foucault traces the social and cultural construction of madness from the Middle Ages through the classical age to modernity. He argues that what society considers “madness” is not a fixed, objective reality but a concept shaped by social, cultural, and institutional forces over time.
Key points from Foucault’s analysis:
- The Great Confinement: The 17th-century confinement of the “mad” along with other “unreasonable” members of society.
- Medicalization of Madness: The emergence of psychiatry in the 19th century transformed madness into a medical condition.
- Power and Knowledge: The interplay between power structures and the production of knowledge about madness.
- The “Other”: Madness defined in opposition to reason, serving as the “Other” that helps define what is considered normal or rational.
AI “Hallucinations”: A Modern Parallel
The term “hallucination” in AI contexts shares several interesting parallels with Foucault’s analysis of madness:
- Social Construction: Our understanding of AI “hallucinations” is being socially constructed as we grapple with this new technology.
- Othering: AI hallucinations are defined in opposition to “correct” or “factual” outputs.
- Institutional Power: The definition and treatment of AI hallucinations are shaped by tech companies, researchers, and regulatory bodies.
- Evolving Terminology: The choice of the word “hallucination” anthropomorphizes AI behavior and draws parallels to human mental states.
The Spectrum of Thought: From Word Salad to Coherence
The concept of “word salad” has long been a focal point in psycholinguistics, particularly in the study of thought disorders like schizophrenia. As we venture into the age of Large Language Models, this concept takes on new relevance, offering a unique lens through which to examine both human and artificial language production.
Word Salad in Human Language
In clinical settings, “word salad” refers to a confused or unintelligible mixture of seemingly random words and phrases, most commonly associated with schizophrenia. From a linguistic perspective, word salad represents a breakdown in syntactic structure, semantic coherence, and pragmatic relevance.
The Coherence Spectrum
Rather than a binary distinction between “word salad” and “coherent speech,” it’s more accurate to consider a spectrum of linguistic coherence:
- Pure Word Salad
- Loosely Associated Speech
- Tangential Speech
- Circumstantial Speech
- Normal Speech
- Highly Structured Discourse
Parallels in AI Language Models
The development of LLMs has, somewhat unexpectedly, recapitulated this spectrum of coherence: early n-gram and recurrent models produced something close to AI “word salad” beyond a sentence or two, while today’s large transformer models sustain coherent, nuanced discourse across paragraphs.
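One way to make this spectrum tangible is to score samples from along it with a pretrained language model, on the assumption that perplexity (how “surprised” the model is by a string) loosely tracks how conventional the string is. This is a rough sketch using the Hugging Face transformers library and GPT-2; perplexity is a crude proxy for coherence, not a clinical measure, and the example strings are invented.

```python
# Rough coherence proxy: language-model perplexity over a few invented strings
# sampled from different points on the coherence spectrum. Lower scores mean
# the text is more "expected"; this is illustrative, not a clinical instrument.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model_name = "gpt2"  # any small causal language model would do
tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

def perplexity(text: str) -> float:
    """Return the model's perplexity for a string (lower = more conventional)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())

samples = {
    "word salad":   "purple the of running sky telephone quickly under",
    "tangential":   "I went to the shop because shops remind me of my uncle's boat.",
    "conventional": "I went to the shop to buy bread and milk for breakfast.",
}

for label, text in samples.items():
    print(f"{label:12s} perplexity = {perplexity(text):.1f}")
```

The exact numbers depend on the model; the expected ordering, with the scrambled string scoring far worse than the conventional sentence, is the point.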
The Neurolinguistic Turn
Contemporary neurolinguistics, with its ability to map cerebral activity related to language processing, offers a fascinating counterpoint to both Chomsky’s formalism and Foucault’s historicism. By grounding language in observable brain function, it provides a bridge between the abstract structures of linguistics and the lived experience of language use.
LLMs: A New Frontier in Language and Thought
The development of LLMs represents a significant milestone in the journey that Chomsky and Foucault’s debate set in motion. These models, trained on vast corpora of text, seem to capture something of both the underlying structures that Chomsky posited and the discursive formations that Foucault analyzed.
Key concepts in LLM development that resonate with this intellectual lineage include the following (a brief tokenization sketch follows the list):
- Tokens and tokenization
- Word sense disambiguation
- Context window sizes in language models
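As a small illustration of the first and last items, here is a sketch that assumes the tiktoken library and its cl100k_base encoding (one tokenizer among many; others segment text differently): the sub-word tokens are the units a model actually operates on, and their count, not the word count, is what a fixed context window limits.

```python
# Minimal tokenization sketch. Assumes the tiktoken library; cl100k_base is one
# encoding among many, and different encodings split the same text differently.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "What counts as 'madness' is historically contingent."
token_ids = enc.encode(text)
tokens = [enc.decode([t]) for t in token_ids]

print(tokens)          # the sub-word pieces the model actually sees
print(len(token_ids))  # this count, not the word count, fills the context window
```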
The Spectral Nature of Discourse
The ongoing dialogue between formal linguistics and post-structuralist philosophy—now playing out in the realm of AI—continues to generate new insights and challenges. The “infinite spectrality” of this discourse ensures that the conversation begun by Chomsky and Foucault continues to evolve and inform our understanding of language, thought, and artificial intelligence.
In my previous article, I spoke about Derrida. I love using Derrida to point out holes, so here is an attempt at summarising the above through Derrida’s wordplay: what current NLP can and cannot do with it, and why:
- Linguistic Complexity
- Can state-of-the-art NLP understand? Partially
- Detailed Explanation:
- Modern NLP models can handle complex sentence structures and extensive vocabulary.
- They struggle with Derrida’s intentional ambiguity and unconventional syntax.
- Why?
- NLP models are trained on vast corpora of text, enabling them to process a wide range of linguistic structures.
- Derrida intentionally pushes language to its limits, creating structures that challenge conventional understanding. NLP models, designed to find patterns and clarity, struggle with text that intentionally subverts these goals.
- Contextual Understanding
- Can state-of-the-art NLP understand? Limited
- Detailed Explanation:
- Recent NLP advancements have improved contextual understanding within a given text.
- However, they lack the vast, interconnected knowledge base required for Derrida’s work.
- Why?
- NLP can grasp immediate textual context, but Derrida’s writing requires understanding of extensive philosophical, historical, and cultural contexts. Current models lack the deep, interconnected knowledge and the ability to draw complex inferences across diverse domains that Derrida’s texts demand.
- Intertextuality
- Can state-of-the-art NLP understand? Very Limited
- Detailed Explanation:
- NLP can identify explicit references if they’re in its training data.
- It struggles with Derrida’s complex, implicit dialogues with other texts.
- Why?
- Derrida’s intertextuality goes beyond simple references. It involves reinterpretation and repurposing of ideas in nuanced ways. NLP lacks the ability to “read between the lines” and understand the complex network of textual relationships that Derrida constructs. It can’t grasp how he’s positioning his ideas within a broader intellectual tradition.
- Philosophical Concepts
- Can state-of-the-art NLP understand? Limited
- Detailed Explanation:
- NLP can recognize and categorize philosophical terms.
- It struggles with Derrida’s complex engagements with these concepts.
- Why?
- While NLP can identify philosophical concepts, it lacks the capacity for the critical, reflexive thinking that Derrida’s philosophy demands. It can’t truly grasp how Derrida challenges, redefines, or deconstructs these concepts. The models don’t have the ability to engage in the kind of meta-philosophical analysis that Derrida’s work often requires.
- Intentional Ambiguity
- Can state-of-the-art NLP understand? No
- Detailed Explanation:
- NLP systems are designed to reduce ambiguity.
- Derrida often uses ambiguity as a fundamental feature of meaning.
- Why?
- The goal of most NLP systems, to clarify and disambiguate, is fundamentally at odds with Derrida’s use of ambiguity. NLP models can’t recognize when ambiguity is intentional and meaningful rather than a problem to be solved. They lack the capacity to hold multiple, potentially contradictory meanings simultaneously, which is crucial to understanding Derrida (a small disambiguation sketch after this list makes the contrast concrete).
- Metaphorical Language
- Can state-of-the-art NLP understand? Partially
- Detailed Explanation:
- Advanced NLP can identify and sometimes interpret common metaphors.
- It struggles with Derrida’s complex, original metaphors.
- Why?
- While NLP has improved in handling metaphorical language, Derrida’s use of metaphor is often highly original and deeply intertwined with his philosophical arguments. NLP can’t trace the full implications of these metaphors across texts and within broader philosophical debates. It lacks the creative interpretation skills needed to understand how Derrida extends and subverts traditional metaphors.
- Neologisms and Wordplay
- Can state-of-the-art NLP understand? Limited
- Detailed Explanation:
- NLP can identify new words but struggles to understand their intended meanings.
- It can’t fully grasp Derrida’s complex wordplay.
- Why?
- While NLP can detect unfamiliar words, it can’t understand the complex, multifaceted meanings of Derrida’s neologisms. His wordplay often involves puns, homonyms, and cross-linguistic jokes that require a deep understanding of multiple languages and cultural contexts. NLP can’t appreciate how these linguistic innovations often embody philosophical ideas rather than just describing them.
- Cross-lingual Analysis
- Can state-of-the-art NLP understand? Limited
- Detailed Explanation:
- Multilingual NLP models exist but struggle with Derrida’s cross-lingual wordplay.
- They can’t fully grasp his exploitation of similarities and differences between languages.
- Why?
- Although NLP can work across languages, it lacks the deep understanding of linguistic structures needed to appreciate Derrida’s cross-lingual philosophical arguments. It can’t grasp the nuances of how he uses similarities and differences between languages to make philosophical points. The models don’t have the cultural and linguistic depth to understand the full implications of his cross-lingual analysis.
- Rhetorical Structure
- Can state-of-the-art NLP understand? Partially
- Detailed Explanation:
- NLP can identify common rhetorical devices and broad structural elements.
- It struggles with Derrida’s subversion of traditional structures.
- Why?
- While NLP can recognize standard rhetorical patterns, it can’t appreciate how Derrida often intentionally breaks these conventions for philosophical effect. It lacks the ability to understand how his rhetorical choices embody his philosophical positions on language and meaning. NLP can’t grasp the meta-level at which Derrida’s form often reinforces or challenges his content.
- Interpretative Depth
- Can state-of-the-art NLP understand? Very Limited
- Detailed Explanation:
- NLP can provide surface-level analysis and basic interpretations.
- It lacks the ability for deep, creative interpretation that Derrida’s work demands.
- Why?
- Understanding Derrida requires going beyond the explicit content of the text. It demands active, critical reading that involves challenging assumptions, exploring multiple conflicting interpretations, and situating ideas within broader philosophical debates. NLP fundamentally lacks the capacity for this kind of original, transformative thinking. It can’t engage in the radical questioning and reinterpretation that Derrida both practices and calls for in his readers.
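To make the intentional-ambiguity point above concrete, here is a toy sketch using NLTK’s implementation of the classic Lesk word-sense-disambiguation algorithm; the sentence is an invented gloss on Derrida’s pharmakon, not a quotation. The point is only structural: the algorithm is obliged to return a single WordNet sense, which is exactly the commitment that Derrida’s productive ambiguity refuses.

```python
# Toy word-sense disambiguation with the classic Lesk algorithm (NLTK/WordNet).
# The algorithm must collapse an ambiguous word to exactly one sense, which is
# the move that intentional, productive ambiguity resists.
import nltk
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

from nltk.wsd import lesk

sentence = "The pharmakon is at once the remedy and the poison."
context = sentence.lower().replace(".", "").split()

sense = lesk(context, "remedy")  # returns a single WordNet synset (or None)
print(sense, "-", sense.definition() if sense else "no sense found")
```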
Conclusion: The Ongoing Dialogue
As we continue to develop and refine LLMs, the echoes of the Foucault-Chomsky debate resound with renewed relevance. The tension between formal structures and historical contingencies, between universal grammars and discursive formations, finds new expression in the challenges and possibilities of AI language processing.
The quest to create machines that can truly understand and generate human language forces us to confront fundamental questions about the nature of language, thought, and consciousness—questions that Chomsky and Foucault grappled with half a century ago, and which remain as pressing and puzzling as ever.
In this ongoing dialogue between human and artificial intelligence, between the logical and the discursive, we may yet find new insights into the nature of language and being that neither Chomsky nor Foucault could have anticipated. The development of AI language models has profound implications for our understanding of cognition, creativity, and the nature of intelligence itself.
As we navigate this new frontier, we must remain aware of the power dynamics at play in defining and controlling AI capabilities. The way we frame and understand AI “hallucinations” will have significant implications for AI development, regulation, and integration into society.
The journey from word salad to coherence, observed in both human psychology and AI development, offers a rich terrain for interdisciplinary exploration. As we push the boundaries of AI language capabilities, we may gain new insights not just into artificial intelligence, but into the very nature of human thought and communication.
We find moments where the ideas of Chomsky and Foucault seem to briefly align, what we might playfully term a “Vančura,” borrowing from the chess endgame position of that name, a precise defensive arrangement that holds the balance. These moments of superposition, where empiricism and deconstruction coexist, may offer the most fruitful ground for understanding the complex relationship between language, thought, and technology.
The debate continues, now with silicon interlocutors adding their voices to the conversation, promising new revelations about the nature of language, cognition, and what it means to be human in an increasingly AI-driven world. Once again, I’ve offered no answers, only more questions.
References
- Bedi, G., et al. (2015). “Automated analysis of free speech predicts psychosis onset in high-risk youths”
- Bender, E. M., & Koller, A. (2020). “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data”
- Brown, T., et al. (2020). “Language Models are Few-Shot Learners”
- Chomsky, N. (1957). “Syntactic Structures”
- Covington, M., et al. (2005). “Schizophrenia and the structure of language: The linguist’s view”
- Crawford, K. (2021). “Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence”
- Derrida, J. (1967). “Of Grammatology”
- Derrida, J. (1993). “Specters of Marx”
- Fedus, W., Zoph, B., & Shazeer, N. (2021). “Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity”
- Foucault, M. (1961). “History of Madness”
- Mitchell, M. (2019). “Artificial Intelligence: A Guide for Thinking Humans”
- Pinker, S. (1994). “The Language Instinct”