Free or licensable English dictionary?


MattDiamond
2004.08.18, 11:40 PM
Long story short: my nifty uDG game idea has hit a road bump. It needs a decent English dictionary built in.

I had two ideas in mind for this. The first was to use the Webster's word list in /usr/share/dict/web2. I wrote a little dictionary class to read it all into memory and perform lookups in it. It reads in fast and works great. But then I noticed that plurals are not represented, so it has "catacomb" but not "catacombs". This will make my game annoying to play. (Imagine playing Boggle and not being allowed to put "s" on the ends of your words! No, my game isn't Boggle; that's just an example.)
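For anyone following along, here is a minimal sketch of the kind of in-memory dictionary class described above, written in Python for brevity. The class name `WordList` is hypothetical, and it assumes a plain one-word-per-line file like web2. Entries are stored exactly as spelled and candidates are lowercased before lookup, so capitalized entries such as proper nouns never match.

```python
class WordList:
    """Hypothetical in-memory dictionary built from a one-word-per-line list."""

    def __init__(self, lines):
        # Keep entries exactly as spelled; skip blank lines.
        self._words = {line.strip() for line in lines if line.strip()}

    @classmethod
    def from_file(cls, path="/usr/share/dict/web2"):
        # Read the whole list into memory once, up front.
        with open(path, encoding="utf-8") as f:
            return cls(f)

    def __contains__(self, candidate):
        # Candidates are folded to lowercase, so capitalized entries
        # (proper nouns in a list like web2) never match.
        return candidate.lower() in self._words

    def __len__(self):
        return len(self._words)
```

A lookup only succeeds for forms the file actually contains, so with web2 loaded this way `"catacomb" in words` succeeds while `"catacombs" in words` does not. That is the plural gap, and it has to be fixed in the word list rather than in the lookup code.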

The next idea, a bit more obvious, was to use the NSSpellChecker class. But it's designed to spell-check a chunk of text, not to do quick lookups for a list of candidates. No matter; it would probably still be fast enough for my needs. But it also accepts common abbreviations, proper names, and any words the user felt like adding to it. This could cause problems in my game, though perhaps I could live with it.

I'd be interested if anyone has knowledge of or experience with third-party dictionaries that are either free or can be commercially licensed. For uDG, free would be best; otherwise my submitted source code has a big hole in it. But I may table this game idea, do something else for uDG, and resurrect it as a commercial product down the road, in which case a commercial license isn't out of the question. So any info would be appreciated.

I know Freeverse has some kind of dictionary for at least one of their games; if I could get them to publish the game my worries would be over. :-)

Thanks.

OneSadCookie
2004.08.19, 12:23 AM
What about ispell? You could even just run the executable via a pipe...
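A rough sketch of the pipe approach, assuming ispell's `-a` pipe mode: it prints a version banner first, then for each input line one result line per word followed by a blank line. The helper names are hypothetical, and only the parser can be exercised without ispell installed. Note that a "+" result means ispell found the word by stripping an affix, which is how a spell checker can accept plurals that a flat word list lacks.

```python
import subprocess

# ispell -a result codes treated here as "accept this word":
#   "*"        found as-is
#   "+ <root>" found by stripping an affix (e.g. a plural via its singular root)
#   "-"        accepted as a compound word
# Anything else ("&" near miss, "?" guess, "#" not found) is a rejection.
ACCEPT_CODES = ("*", "+", "-")


def parse_ispell_output(text):
    """Return one True/False verdict per input line sent to `ispell -a`.

    Skips the version banner (first line, starting with "@(#)") and treats each
    blank-line-terminated group of result lines as the answer for one input line.
    An input line is accepted only if every word on it was accepted.
    """
    lines = text.splitlines()
    if lines and lines[0].startswith("@(#)"):
        lines = lines[1:]
    verdicts, group = [], []
    for line in lines:
        if line.strip() == "":
            verdicts.append(all(result[:1] in ACCEPT_CODES for result in group))
            group = []
        else:
            group.append(line)
    return verdicts


def check_words(words, ispell="ispell"):
    """Hypothetical helper: pipe one word per line through `ispell -a`.

    Assumes an ispell-compatible binary is on the PATH. The leading "^" tells
    pipe mode to treat the rest of the line as text, not as a command.
    """
    feed = "".join("^" + word + "\n" for word in words)
    result = subprocess.run([ispell, "-a"], input=feed,
                            capture_output=True, text=True, check=True)
    return dict(zip(words, parse_ispell_output(result.stdout)))
```

The same caveat about NSSpellChecker applies here: whatever ispell's dictionary accepts (abbreviations, names, personal additions) will be accepted by the game too.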

MattDiamond
2004.08.19, 07:22 AM
Thanks, I'll look into ispell. The licensing is key; obviously I'd like to distribute the binary with my game. I'll read up on it today.

If the ispell licensing precludes using it in a commercial game without releasing the source, then I'm still interested in hearing about alternatives, just in case I'm not sick of the game by the time uDG is over and want to develop it further.

I wish I'd gone the pipe route with my original dictionary; it would have made it easy to drop in ispell. But I figured reading the dictionary into memory would be faster to code and would make errors easier to detect.

Thanks!

MattDiamond
2004.08.19, 01:02 PM
Bingo! I just found the motherlode: word lists with the words classified in all sorts of ways:
http://wordlist.sourceforge.net
Some of the collections are partitioned to make it easy to decide just how comprehensive you want your word list to be. For example, I could make my game accept common words while disallowing really obscure ones that only a die-hard Scrabble player would know. These lists seem to be geared toward spell-checking, though.
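Picking a comprehensiveness cutoff from partitioned lists amounts to a union over the partitions you want. A sketch, assuming each partition has already been read into memory and keyed by a numeric size level where lower means more common (the assumed convention for the SCOWL lists on that site); the function name and level numbers are illustrative only.

```python
def build_word_set(partitions, max_level):
    """Union every partition whose level is at or below max_level.

    partitions: dict mapping a numeric level (lower = more common, by
    assumption) to an iterable of words at that level.
    """
    words = set()
    for level, level_words in partitions.items():
        if level <= max_level:
            # Normalize each entry and skip blank lines.
            words.update(w.strip().lower() for w in level_words if w.strip())
    return words
```

Raising `max_level` admits progressively more obscure words, so the game's vocabulary can be tuned with a single number instead of hand-editing a list.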

Two links away from the above site, hosted by the National Puzzlers' League, you can find more word lists, including ENABLE, a list of words intended as an alternative to "official" Scrabble dictionaries. It follows Scrabble rules but has words longer than 8 letters as well. It has been placed into the public domain! In particular, note the following text in the license:
> Game designers may feel free to incorporate the WORD.LST into their games. Please mention the source and credit us as originators of the list.

I'll probably just use ENABLE for now, unless I find that it takes too long to read in or that the file size makes too big a dent in my game's 10 MB limit. It would not particularly harm my game to limit words to 8 characters, so that's an option too.
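Both concerns are easy to measure before committing. A sketch that applies the optional 8-letter cap and estimates how many bytes the result would occupy as a plain one-word-per-line file; the function names are hypothetical, and the estimate ignores any compression the game's packaging might apply.

```python
def cap_length(words, max_len=8):
    """Keep only words of at most max_len characters, sorted for output."""
    return sorted(w for w in words if len(w) <= max_len)


def file_size_bytes(words):
    """Bytes needed to store the words one per line as ASCII with Unix newlines."""
    return sum(len(w) + 1 for w in words)
```

Comparing `file_size_bytes(all_words)` with `file_size_bytes(cap_length(all_words))` shows how much of the 10 MB budget the cap would actually save.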

There sure are some amazing things to be found on the internet.

FCCovett
2004.08.19, 02:57 PM
Do they say where they got their dictionary from?

It seems they just say they used 12 dictionaries as source.

One would have thought that there is a copyright free dictionary maintained by the government.

MattDiamond
2004.08.19, 10:11 PM
If you dig into the READMEs, most of the dictionaries are attributed. There is a lot of cross-pollination going on between the dictionaries, each with its own methodology for adding some words and removing others. Some of them start life as an old commercial dictionary that has fallen out of copyright. Others take the web2 dictionary that comes with some Unixes, including OS X (it's basically a Webster's from the 1930s), and augment it with specialist words. Or their goal may be a list of more common words for spell-checking, deliberately leaving out the obscure ones that are only used in Scrabble tournaments...

> One would have thought that there is a copyright free dictionary maintained by the government.

If there's a government agency trying to prevent the mangling of the English language, Bush has surely cut its funding by now, judging from his speeches. :-)

slorchy
2004.08.24, 02:59 PM
Try Project Gutenberg; I'm sure they have one or more dictionaries.

davecom
2004.08.24, 11:39 PM
> Do they say where they got their dictionary from?
> It seems they just say they used 12 dictionaries as source.
> One would have thought that there is a copyright free dictionary maintained by the government.

Having the government control the language... Yeah, I'd feel good about that. As if they don't already control enough of our lives.