Linguabase

At IDEA, we’ve spent over a decade building something unprecedented: a semantic network that maps how 1.5 million English words connect to each other through 100 million weighted relationships. This massive linguistic database powers our word games—OtherWordly and In Other Words—transforming vocabulary building and critical thinking into engaging gameplay.

The Small World of Language

Our most fascinating discovery: 76% of random English word pairs connect in seven hops or fewer, with an average path length of just 6.43 steps. This “small world” property of language explains why players can intuitively navigate from “Batman” to “inspect” through meaningful connections like vigilante → watchful → circumspect. Nearly any two words connect through chains of meaning—a mathematical property that makes our games possible.
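
A hop count like the one above can be measured with a breadth-first search over the association graph. The sketch below is illustrative only: the graph is a tiny hypothetical sample built around the Batman example, the edges are assumed to be directed, and the name `shortest_path` is ours rather than part of the Linguabase.

```python
from collections import deque

# Hypothetical toy graph: each word maps to a few associated words.
# The real network is far larger; these edges are illustrative only.
TOY_GRAPH = {
    "Batman": ["vigilante", "Gotham"],
    "vigilante": ["watchful", "justice"],
    "watchful": ["circumspect", "alert"],
    "circumspect": ["inspect", "cautious"],
    "Gotham": ["city"],
}


def shortest_path(graph, start, goal):
    """Return the shortest chain of words from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # word -> the word we reached it from
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for neighbor in graph.get(word, []):
            if neighbor in previous:
                continue  # already reached by an equally short or shorter route
            previous[neighbor] = word
            if neighbor == goal:
                # Walk back through the predecessors to rebuild the chain.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


path = shortest_path(TOY_GRAPH, "Batman", "inspect")
hops = len(path) - 1  # number of steps between the two words
```

Because breadth-first search explores words in order of distance, the first chain it finds is a shortest one. Repeating this over many random word pairs is one way an average path length could be estimated.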

Beyond Traditional Resources

Traditional thesauri have served writers well for over a century, but they were built with specific constraints. Writers typically need synonyms for abstract concepts, emotions, and actions: words like “applause” that naturally connect to many alternatives (acclaim, ovation, praise). Concrete objects like “apple” have few true synonyms, so thesaurus editors, facing limited page counts, prioritized what writers needed most. We broke from this tradition entirely.

The Linguabase includes everything people actually say and write: from “ice cream” to “thermodispersion,” from “ghosting” to “crurifragium.” Each term connects to an average of 40 associations (we display 17 in our games), encompassing not just synonyms but categorical relationships (apple → fruit), functional connections (apple → orchard), and cultural associations (apple → pie).
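
One way to picture an entry is as a list of typed, weighted associations from which the strongest are chosen for display. This is a minimal sketch under stated assumptions: the `Association` record, the relationship labels, and the weights are hypothetical, and only the display limit of 17 comes from the text above.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    target: str    # the associated word
    kind: str      # hypothetical label for the relationship type
    weight: float  # hypothetical strength; higher means more closely related


DISPLAY_LIMIT = 17  # the number of associations shown in the games

# Illustrative entry for "apple"; the weights are invented for this example.
APPLE = [
    Association("fruit", "categorical", 0.9),
    Association("orchard", "functional", 0.6),
    Association("pie", "cultural", 0.7),
]


def top_associations(associations, limit=DISPLAY_LIMIT):
    """Return up to `limit` associations, strongest first."""
    return heapq.nlargest(limit, associations, key=lambda a: a.weight)


shown = [a.target for a in top_associations(APPLE, limit=2)]
```

With a limit of 2, this sample entry keeps "fruit" and "pie" and drops "orchard", the weakest of the three.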

Multiple Meanings as Network Bridges

English words often carry multiple meanings, creating natural bridges in our network. We organize these into three distinct types. Homographs are words with identical spelling but entirely different meanings, like “bass” (the low sound) versus “bass” (the fish)—English contains about 1,000-3,000 of these double meanings. Polysemes have multiple related definitions that evolved from the same root: “head” serves as body part, organizational leader, and ship’s bow. Finally, contextual flavors describe how the same core meaning activates different associations: “hiking” can emphasize nature (scenery, wildlife, trails) or exercise (fitness, exertion, calories). These multi-sense words don’t create shortcuts in our network—instead, they offer creative routing options for navigating semantic space.

Scale and Innovation

This scale and depth came from integrating over 70 dictionaries and thesauri, 648,460 Library of Congress book classifications, years of manual curation by our lexicographer and linguistics graduate students, and sophisticated AI validation through 80 million API calls. The result captures connections that neither traditional resources nor raw AI can produce alone—like how “coffee” relates to “fair trade certification” through economics texts or “algorithm” connects to “John Cage” through electronic music compositions.

Transforming How People Learn

The Linguabase transforms vocabulary learning from rote memorization to discovery through meaningful connections. Players navigate semantic pathways, encountering words in context rather than isolation. A player solving a puzzle from “fire” to “cancel” might discover: fire → ember → memory → recall → cancel. Daily players are exposed to thousands of words annually in ways that stick because the connections make intuitive sense.

This approach particularly helps address the vocabulary gap—the 12,000-word difference between the lowest and highest performing English speakers—by making word learning feel like exploration rather than study. The network builds critical thinking through pattern recognition across disciplines and the ability to see connections between seemingly unrelated concepts.

A Decade of Innovation

Built with support from the National Science Foundation (SBIR #2329817), the Linguabase represents a new way of thinking about language. We’ve created what we call “idea-linking games”—where meaning becomes play, where discovering that “salt” connects to “hardware” through crystals → facet → component feels like solving a puzzle.

To appreciate the Linguabase’s scope, consider that Webster’s Third New International Dictionary (1961) required 757 editor-years and $3.5 million (over $50 million in today’s dollars) to compile 476,000 entries. The Oxford English Dictionary took 70 years and thousands of contributors. What previous generations of lexicographers built over decades of sustained effort, we’ve achieved through a novel combination of human expertise and AI capabilities.

For those interested in the technical details of how we mapped the semantic landscape of English, our comprehensive report explores the computational linguistics, network theory, and innovative methods behind this achievement. Organizations interested in licensing this semantic intelligence for their own applications can learn more about partnership opportunities.