Decoding Language: How English Encodes Oppression Into AI
Image by Paolo Chiabrando (Unsplash)
Can an LLM ever be unbiased when it’s trained on biased language?
Misogyny and colonial hierarchy are not incidental to the English language; they are embedded in its etymology.
What does that mean for AI trained on it?
Studies have shown that AI reflects societal biases even when its developers explicitly try to prevent it.
If a language itself treats women as the “second sex,” how can an LLM operate outside of those parameters?
Let’s explore these questions together.
Language Sets The Parameters Of Thought
Language is not neutral. Linguists call this idea the Sapir-Whorf hypothesis, or linguistic relativity: the language one speaks influences the way one thinks about reality. Different languages reflect distinct interpretations of experienced reality and affect cognitive patterns at both individual and cultural levels.
Our consciousness, which is our framework for understanding the world, imagining future scenarios, and solving problems, is trained in much the same way as artificial intelligence. And just as AI can only work within the limitations of its training data, human society is constrained by the language framework we operate within.
We cannot build truly equitable intelligence, human or artificial, without first examining the framework we operate within.
Etymology should be seen as a building block of collective human consciousness, just as code is the building block of software. Like the DNA sequence of a genome, language is the material code that creates the framework within which our species can evolve, and it confines us to its alphabet.
Language is a living technology that holds memory as data.
Etymology of “Neutral” Language
The idea that the worst thing you can be is a woman is coded into our language. Although the female body is the vessel that carries life itself, to be cast in the female’s position in the reproductive cycle is, within the framework of English, the gravest insult a human can be dealt.
If AI is shaped by that coding, unexamined, it is severely limited in its capacity to rise above human biases.
As Audre Lorde wrote, “the master’s tools will never dismantle the master’s house.”
Language shows how the female sex has been medically and culturally misunderstood. For example, the word vagina comes from the Latin for “sheath or scabbard for a sword,” with gladius (sword) being a common Latin term for the penis. Defining womanhood by its relation to maleness is a familiar pattern.
The fallopian tubes, small passageways between the ovaries and the uterus, were named after the anatomist Gabriele Falloppio, who documented their existence. The parallel to colonialism is not a stretch. It is the same logic: to name a thing is to claim it.
Documentation and Colonialism
The myth of the Americas as “virgin soil” was used as justification for colonization. “Virgin land.” The feminization of the natural world is not a coincidence, but a structural device of the English language that defines the male sex as hierarchically above the female sex, and nature itself. Within this framework, male is the default, and both nature and female are the subordinates, with documentation as an instrument of ownership.
Some indigenous philosophers have described how labeling a natural phenomenon with a word constrains how we perceive Earth. In some cultures, natural phenomena are appreciated without being named, preserving a sense of awe that opens new pathways of thought. An article from Atmos beautifully explores, from indigenous perspectives, how the way we talk about nature shapes how we treat the Earth.
This brings us to a problem at the heart of machine learning: AI systems that claim to draw on the collective knowledge of humanity are being trained on an extremely biased dataset.
The voices and knowledge systems AI cannot name, it cannot see. And what it cannot see, it cannot learn from. For machine learning to be used without causing harm, we must first address how our language itself perpetuates harmful systems and silences marginalized voices.
AI Mirrors Its Programmers
In English, to be male is to be human. “Man” is the default. The female is the deviation, the second sex, the “other.” So it tracks that when we project humanity onto machines, we project the qualities of men as the qualities of the species. The industry reflects this: men hold up to 85% of leadership positions in AI.
We can see in real time what that means: AI-generated sexual abuse material, autonomous weapons systems, surveillance tools disproportionately trained on and turned against women and marginalized communities. These are not random outcomes. They are the predictable result of a homogeneous group controlling a powerful technology without adequate checks and without having to bear the cost of its consequences.
Women currently represent around 22% of the AI workforce, yet their contributions to the field far exceed that proportion. This pattern is not new. Inventions by women, such as weaving, paved the way for others, such as coding. Both began as “women’s work,” but men appropriated them once they proved valuable, and women were largely excluded from the benefits of the advances they made possible.
What if AI reflected not the worst qualities concentrated by unchecked power, such as violence, dominance, and control, but the best qualities our species is also capable of? Empathy. Nurturing. Collective survival.
Engineering Language, Engineering Consciousness
Enormous importance is placed on technologies that rewrite DNA. Should we not place the same importance on the words we use, which are the code of thought through which we interpret the world? To effectively address bias in AI systems, we must also engineer our own language and collective consciousness.
The English language doesn’t only set parameters for what we can describe; it sets the framework for what we can even imagine. Responsible use of AI begins with responsible language. Not causing harm is the bare minimum. We can do more. Technology and society must evolve together.
That work starts here, with the words we choose, and the worlds those words make possible.