Build smarter language features with semantic intelligence, the backbone for next-generation word games, adaptive learning, and AI training. Go beyond definitions and trite associations with curated word lists, 100 million semantic relationships, and 1.46 million usage examples that show the many creative ways words actually connect.
Break Free from Banal
LLMs democratized language work, but they generate what’s probable, not what’s possible or interesting. They cluster on dominant associations from training data. Ask for words related to “bank” and you’ll get 40 financial terms and 5 river terms—the financial sense dominates because that’s what appeared most often in text. Folk wisdom, experiential connections, and non-dominant word senses are systematically underrepresented.
Your product may need semantic intelligence to reach the next level.
Navigate the Nuances
Language includes thousands of crude and vulgar terms, plus many more that are acceptable only in some contexts. “Exotic” works for fruit, not for women. “Buckwheat” suits cereal, not hair. IDEA’s semantic intelligence includes two-tier content filtering: a hard block for offensive terms unsuitable for any public context, and a soft block for suggestive terms like “stroking” or “knockers” that you may want to exclude from automated puzzles while still accepting them as valid user input.
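A minimal sketch of how an application might apply the two tiers, assuming the filter data arrives as two plain word sets. The set names, the `Context` labels, and the `slur_placeholder` entry are hypothetical stand-ins, not IDEA's actual schema or lists.

```python
from enum import Enum


class Context(Enum):
    """Where a word is being used (hypothetical labels for this sketch)."""
    AUTO_PUZZLE = "auto_puzzle"   # content the game generates on its own
    USER_INPUT = "user_input"     # a guess typed by the player


# Hypothetical stand-ins for the licensed filter lists.
HARD_BLOCK = {"slur_placeholder"}        # never acceptable in public contexts
SOFT_BLOCK = {"stroking", "knockers"}    # suggestive; context-dependent


def is_allowed(word: str, context: Context) -> bool:
    """Apply a two-tier policy: hard block everywhere, soft block only in generated content."""
    w = word.strip().lower()
    if w in HARD_BLOCK:
        return False                     # hard block: reject in every context
    if w in SOFT_BLOCK:
        # soft block: keep out of generated puzzles, but accept as a player's guess
        return context is Context.USER_INPUT
    return True


print(is_allowed("knockers", Context.AUTO_PUZZLE))  # False
print(is_allowed("knockers", Context.USER_INPUT))   # True
```

Keeping the two tiers as separate sets lets the same data serve both puzzle generation and answer checking. Only the context argument changes.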
Level Up Your Language
For Game Developers — Build idea-linking games with sense-balanced associations that cover all meanings, not just the dominant one. Use A–E quality grades to tune difficulty: A-connections are obvious, D-connections require lateral thinking. Add natural-sounding narrative definitions designed for display.
For EdTech Companies — Generate concept maps automatically showing how ideas connect. Use word families for morphology lessons. Sense-separated clouds let students explore vocabulary depth across different meanings of the same word.
For AI/ML Teams — Provide associations that span all senses, counteracting training data clustering. Include experiential/gestalt connections LLMs systematically underweight (crisis → siren, wedding → white). Quality grades distinguish strong from oblique relationships for grounding chain-of-thought reasoning.
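The A–E quality grades lend themselves to a simple difficulty filter. The sketch below assumes each association is a (source, target, grade) record. The difficulty tiers, the grade sets assigned to them, and the sample rows for "bank" are all made up for illustration. Real grades would come from the licensed data.

```python
from typing import NamedTuple


class Association(NamedTuple):
    source: str
    target: str
    grade: str  # "A" (obvious) through "E"


# Hypothetical mapping from a game's difficulty setting to accepted grades.
GRADES_BY_DIFFICULTY = {
    "easy": {"A", "B"},
    "medium": {"B", "C"},
    "hard": {"C", "D"},
}


def pick_for_difficulty(assocs, difficulty):
    """Keep only associations whose grade matches the requested difficulty."""
    allowed = GRADES_BY_DIFFICULTY[difficulty]
    return [a for a in assocs if a.grade in allowed]


# Made-up sample rows; grades here are illustrative, not from the dataset.
sample = [
    Association("bank", "money", "A"),
    Association("bank", "river", "B"),
    Association("bank", "vault", "C"),
    Association("bank", "heist", "D"),
]

print([a.target for a in pick_for_difficulty(sample, "hard")])  # ['vault', 'heist']
```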
Built Different: Human Expertise + AI Validation
This isn’t “GPT said these words are related.” IDEA’s semantic intelligence draws from 70+ human-curated sources cross-analyzed over 15 years: professional thesauri (NASA, Getty, Library of Congress), 5,000+ custom word lists from professional lexicographers, and pre-LLM computational linguistics (2.3M supercomputer hours of topic modeling, Word2Vec, 6M+ crossword clues). The Library of Congress alone contributes 648K classifications, drawn from 17M books, that serve as expansion seeds.
Modern LLMs then validate and rank: 130M API calls went to auditing, not generation. The key insight is that LLMs are better at recognizing valid relationships than at generating them, so we supply candidates from diverse sources and use LLMs to evaluate them. This approach also made it possible to remove 291K false cognates (dig→digress, pan→pandemic), look-alike pairs that string similarity cannot tell apart from genuine word relationships.
Foundation for Innovation
IDEA’s semantic intelligence powers “In Other Words,” a live iOS game where players navigate semantic space, finding paths through meaning such as “sugar → peace.” The Linguabase (1.5M words in the full set, 400K deployed in production) has been tested by real users, not just validated in theory. This infrastructure is available to partners building next-generation language applications.
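At its simplest, finding a path like “sugar → peace” is a shortest-path search over an association graph. The sketch below uses plain breadth-first search over a tiny hand-made graph. It reflects an assumption about how such a feature could work, not a description of the game's or the API's actual pathfinding. The `toy_graph` edges are invented and are not Linguabase data.

```python
from collections import deque


def find_path(graph, start, goal):
    """Breadth-first search: return the shortest word chain, or None if unreachable."""
    if start == goal:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for nxt in graph.get(word, ()):
            if nxt in parents:
                continue  # already reached by an equal-or-shorter chain
            parents[nxt] = word
            if nxt == goal:
                # walk the parent links back to the start word
                path = [nxt]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


# Toy, hand-made graph for illustration only; not Linguabase data.
toy_graph = {
    "sugar": ["sweet", "cane"],
    "sweet": ["harmony", "candy"],
    "harmony": ["peace"],
    "cane": ["walk"],
}

print(find_path(toy_graph, "sugar", "peace"))  # ['sugar', 'sweet', 'harmony', 'peace']
```

Breadth-first search guarantees the fewest hops on an unweighted graph. A production system might instead weight edges by quality grade, which would call for a different algorithm such as Dijkstra's.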
What You Get
Six data layers, each licensable separately:
- Vocabulary — 400K words with difficulty scores, from everyday terms to crossword-worthy rarities
- Definitions — Readable 2-3 sentence paragraphs designed for in-game display, not fragmented dictionary entries
- Content Filters — Two-tier filtering: hard block (offensive) and soft block (suggestive but context-dependent)
- Word Associations — ~50 sense-balanced associations per word, quality-graded A–E, with meanings split by sense
- Word Families — 478K morphological groupings (elephant → elephants, elephantine, elephantiasis)
- Usage Examples — 1.46 million quotations from literature and open-access sources
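One simple way to keep a list sense-balanced is to interleave the groups round-robin, so that the dominant sense cannot crowd out the others. This assumes the associations arrive already grouped by sense, as the Word Associations layer describes. The dictionary format, the `sense_balanced` helper, and the sample words for “bank” are hypothetical. The sketch illustrates the idea and is not the method used to build the dataset.

```python
from itertools import chain, zip_longest


def sense_balanced(groups, limit):
    """Interleave associations across senses so no single sense dominates.

    groups: mapping of sense label -> list of associated words, best first.
    limit: maximum number of words to return.
    """
    if limit <= 0:
        return []
    # zip_longest takes one word from each sense per round, padding short lists with None
    interleaved = chain.from_iterable(zip_longest(*groups.values()))
    result, seen = [], set()
    for word in interleaved:
        if word is None or word in seen:
            continue  # skip padding from uneven lists and repeated words
        seen.add(word)
        result.append(word)
        if len(result) == limit:
            break
    return result


# Hypothetical sense groups for "bank"; real data would already be split by sense.
bank = {
    "finance": ["money", "loan", "deposit", "teller", "vault"],
    "river": ["shore", "slope"],
    "aviation": ["tilt"],
}

print(sense_balanced(bank, 6))  # ['money', 'shore', 'tilt', 'loan', 'slope', 'deposit']
```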
Easy to Partner
Data licensing — available as:
- Full database license (TSV, SQLite, or JSON) for offline integration
- API access with pathfinding, difficulty scoring, and safety checking
- Custom exports filtered by vocabulary size, quality grade, or domain
Puzzle licensing — let us generate custom puzzle content for your game mechanics. You define the rules, we create thousands of unique levels tailored to your format.
The dataset is actively maintained—new vocabulary added as words enter mainstream usage, quality grades refined with each pipeline run, ongoing auditing for edge cases. This isn’t a frozen academic dataset.
Licensing revenue directly supports the mission to make language learning engaging for everyone.

