Partner with IDEA for Semantic Intelligence

Build smarter language features with semantic intelligence, the backbone for next-generation word games, adaptive learning, and AI training. Go beyond definitions and trite associations with curated word lists, 100 million semantic relationships, and 1.46 million usage examples that show the many creative ways words actually connect.

Break Free from Banal

LLMs democratized language work, but they generate what’s probable, not what’s possible or interesting. They cluster on dominant associations from training data. Ask for words related to “bank” and you’ll get 40 financial terms and 5 river terms—the financial sense dominates because that’s what appeared most often in text. Folk wisdom, experiential connections, and non-dominant word senses are systematically underrepresented.

Your product may need semantic intelligence to reach the next level.

Navigate the Nuances

Language contains thousands of crude and vulgar terms, plus many more that are offensive only in context. “Exotic” works for fruit, not for women. “Buckwheat” suits cereal, not hair. IDEA’s semantic intelligence includes two-tier content filtering: a hard block for offensive terms unsuitable for any public context, and a soft block for suggestive terms like “stroking” or “knockers” that you may want to exclude from automated puzzles while still accepting them as valid user input.
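As a rough illustration of how the two tiers could be applied in a game, here is a minimal Python sketch. The tier names, the word sets, and both helper functions are assumptions made for this example, not IDEA's actual schema or API; the hard-block entry is a placeholder rather than a real term.

```python
from enum import Enum


class FilterTier(Enum):
    CLEAN = "clean"
    SOFT_BLOCK = "soft_block"   # suggestive, context-dependent
    HARD_BLOCK = "hard_block"   # unsuitable for any public context


# Hypothetical tier assignments for illustration only; a real deployment
# would load these from the licensed content-filter layer.
SOFT_BLOCKED = {"stroking", "knockers"}
HARD_BLOCKED = {"<offensive-term>"}  # placeholder, not a real entry


def tier_of(word: str) -> FilterTier:
    """Return the filter tier for a word (case-insensitive lookup)."""
    key = word.lower()
    if key in HARD_BLOCKED:
        return FilterTier.HARD_BLOCK
    if key in SOFT_BLOCKED:
        return FilterTier.SOFT_BLOCK
    return FilterTier.CLEAN


def allowed_in_generated_puzzle(word: str) -> bool:
    """Automated puzzle content excludes both tiers."""
    return tier_of(word) is FilterTier.CLEAN


def accepted_as_user_input(word: str) -> bool:
    """Player input is rejected only when it is hard-blocked."""
    return tier_of(word) is not FilterTier.HARD_BLOCK
```

With this split, a soft-blocked word never appears in a generated level, but a player who types it is not told that a real word is invalid.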

Level Up Your Language

For Game Developers — Build idea-linking games with sense-balanced associations that cover all meanings, not just the dominant one. Use A–E quality grades to tune difficulty: A-connections are obvious, while D-connections require lateral thinking. Add natural-sounding narrative definitions designed for display.

For EdTech Companies — Generate concept maps automatically showing how ideas connect. Use word families for morphology lessons. Sense-separated clouds let students explore vocabulary depth across different meanings of the same word.

For AI/ML Teams — Provide associations that span all senses, counteracting training data clustering. Include experiential/gestalt connections LLMs systematically underweight (crisis → siren, wedding → white). Quality grades distinguish strong from oblique relationships for grounding chain-of-thought reasoning.
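To make the grade-based tuning concrete, here is a small Python sketch that selects associations within a grade band. The `Association` record, its field names, and the sample grades for “bank” are invented for illustration; they are not taken from the Linguabase data.

```python
from dataclasses import dataclass

GRADE_ORDER = "ABCDE"  # A = most obvious, E = most oblique


@dataclass(frozen=True)
class Association:
    target: str  # the associated word
    sense: str   # which sense of the headword it belongs to
    grade: str   # one of "A".."E"


def within_grades(assocs, easiest: str, hardest: str):
    """Keep associations whose grade lies between easiest and hardest, inclusive."""
    lo, hi = GRADE_ORDER.index(easiest), GRADE_ORDER.index(hardest)
    if lo > hi:
        raise ValueError("easiest grade must not come after hardest grade")
    return [a for a in assocs if lo <= GRADE_ORDER.index(a.grade) <= hi]


# Toy associations for "bank"; the senses and grades are made up for this example.
BANK = [
    Association("loan", "finance", "A"),
    Association("river", "riverside", "A"),
    Association("vault", "finance", "B"),
    Association("levee", "riverside", "C"),
    Association("piggy", "finance", "D"),
]

easy_level = within_grades(BANK, "A", "B")
hard_level = within_grades(BANK, "C", "E")
```

An easy level would draw only from the A–B band, while a harder one would draw from C–E. Because each record carries its sense, a level generator could also check that both meanings of the word are represented.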

Built Different: Human Expertise + AI Validation

This isn’t “GPT said these words are related.” IDEA’s semantic intelligence draws from 70+ human-curated sources cross-analyzed over 15 years: professional thesauri (NASA, Getty, Library of Congress), 5,000+ custom word lists from professional lexicographers, and pre-LLM computational linguistics (2.3M supercomputer hours of topic modeling, Word2Vec, 6M+ crossword clues). The Library of Congress alone contributes 648K classifications from 17M books as expansion seeds.

Modern LLMs then validate and rank: 130M API calls for auditing, not generation. The key insight: LLMs are better at recognizing valid relationships than generating them. We provide candidates from diverse sources, then use LLMs to evaluate them. This also made it possible to remove 291K false cognates (dig→digress, pan→pandemic), pairs that look related by spelling and that string similarity alone cannot tell apart from true relatives.

Foundation for Innovation

IDEA’s semantic intelligence powers “In Other Words,” a live iOS game where players navigate semantic space, finding paths like “sugar → peace” through meaning. The Linguabase (1.5M words in the full database, 400K deployed in production) has been tested by real users, not just validated in theory. This infrastructure is available to partners building next-generation language applications.
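One simple way to picture a path like “sugar → peace” is a shortest-path search over a graph of word associations. The sketch below uses plain breadth-first search on a tiny hand-made graph; the edges are invented for illustration, and this is not a description of how the game or the Linguabase API actually computes paths.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest list of words from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # maps each visited word to the word it was reached from
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for neighbor in graph.get(word, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = word
            if neighbor == goal:
                # Walk back from the goal to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Toy directed association graph; every edge here is a made-up example.
TOY_GRAPH = {
    "sugar": ["sweet", "cane"],
    "sweet": ["candy", "gentle"],
    "gentle": ["calm"],
    "calm": ["peace"],
}

path = shortest_path(TOY_GRAPH, "sugar", "peace")
```

On real data the edges could be weighted by quality grade, so that a search prefers strong connections or deliberately routes through oblique ones.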

What You Get

Six data layers, each licensable separately:

Vocabulary — 400K words with difficulty scores, from everyday terms to crossword-worthy rarities
Definitions — Readable 2-3 sentence paragraphs designed for in-game display, not fragmented dictionary entries
Content Filters — Two-tier filtering: hard block (offensive) and soft block (suggestive but context-dependent)
Word Associations — ~50 sense-balanced associations per word, quality-graded A–E, with meanings split by sense
Word Families — 478K morphological groupings (elephant → elephants, elephantine, elephantiasis)
Usage Examples — 1.46 million quotations from literature and open-access sources

Easy to Partner

Data licensing — available as:

Full database license (TSV, SQLite, or JSON) for offline integration
API access with pathfinding, difficulty scoring, and safety checking
Custom exports filtered by vocabulary size, quality grade, or domain

Puzzle licensing — let us generate custom puzzle content for your game mechanics. You define the rules, we create thousands of unique levels tailored to your format.

The dataset is actively maintained—new vocabulary added as words enter mainstream usage, quality grades refined with each pipeline run, ongoing auditing for edge cases. This isn’t a frozen academic dataset.

Licensing revenue directly supports the mission to make language learning engaging for everyone.

Explore semantic intelligence at linguabase.org →
