This utility builds a Lucene index of synonyms from the Prolog database package distributed with WordNet. It parses the Prolog source file and writes a 3MB index with 43,372 documents. Each "document" has a 'word' field holding the target word (such as 'big'), followed by a series of 'syn' fields, one per synonym ('grown', 'large', 'adult'... - 23 or so for 'big').
Phrases (with spaces and hyphens) and non-alphabetic words are stripped out.
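The parsing and filtering described above can be sketched roughly as follows. This is an illustrative sketch, not the utility's actual source: the class and method names (SynLineSketch, parseLine, isAlphabetic, buildSynonyms) are hypothetical, the line layout is inferred from the sample log lines, the sample facts in main are made up, and lowercasing the words is an assumption based on the sample output. It treats words that share a synset id as synonyms and drops anything that is not purely alphabetic.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and structure are assumptions, not the tool's code.
final class SynLineSketch {

    // Assumed layout: s(synset_id,w_num,'word',ss_type,sense_number,tag_count).
    private static final Pattern S_LINE =
        Pattern.compile("^s\\((\\d+),\\d+,'(.*)',[a-z],\\d+,\\d+\\)\\.$");

    // True only if every character is a letter, so phrases (spaces,
    // hyphens, underscores) and words containing digits are rejected.
    static boolean isAlphabetic(String word) {
        if (word.isEmpty()) return false;
        for (int i = 0; i < word.length(); i++) {
            if (!Character.isLetter(word.charAt(i))) return false;
        }
        return true;
    }

    // Returns {synsetId, lowercasedWord}, or null when the line is not an
    // s(...) fact or the word is filtered out.
    static String[] parseLine(String line) {
        Matcher m = S_LINE.matcher(line.trim());
        if (!m.matches()) return null;
        String word = m.group(2);
        if (!isAlphabetic(word)) return null;
        return new String[] { m.group(1), word.toLowerCase(Locale.ROOT) };
    }

    // Groups words by synset id, then maps each word to the other words
    // found in any of its synsets. Words with no synonyms are omitted.
    static Map<String, Set<String>> buildSynonyms(List<String> lines) {
        Map<String, Set<String>> bySynset = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parsed = parseLine(line);
            if (parsed == null) continue;
            bySynset.computeIfAbsent(parsed[0], k -> new LinkedHashSet<>()).add(parsed[1]);
        }
        Map<String, Set<String>> synonyms = new TreeMap<>();
        for (Set<String> group : bySynset.values()) {
            for (String word : group) {
                for (String other : group) {
                    if (!other.equals(word)) {
                        synonyms.computeIfAbsent(word, k -> new TreeSet<>()).add(other);
                    }
                }
            }
        }
        return synonyms;
    }

    public static void main(String[] args) {
        // Made-up facts in the same shape as wn_s.pl lines.
        List<String> sample = Arrays.asList(
            "s(900000001,1,'big',a,1,0).",
            "s(900000001,2,'large',a,1,0).",
            "s(900000002,1,'big',a,2,0).",
            "s(900000002,2,'grown',a,1,0).",
            "s(900000002,3,'grown-up',a,1,0).",
            "s(900000003,1,'quick_fix',n,1,0).");
        System.out.println(buildSynonyms(sample));
    }
}
```

In this sketch each map entry corresponds to one index "document": the key would become the 'word' field and each value a 'syn' field.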
The intent is to build the index once and then, in a separate search tool, offer an option to expand a user's query with the synonyms of each word used. This is for experimental use - such expansion may return "too many" documents.
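One possible shape for that query expansion is sketched below. It is only an illustration under stated assumptions: a plain in-memory Map stands in for the lookup against the synonym index, the class and method names (QueryExpansionSketch, expand) are hypothetical, and the maxSyns cap is an added idea for limiting the "too many documents" effect rather than something the utility provides. The output uses the boolean OR form that Lucene's query parser accepts.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the synonym index lookup.
final class QueryExpansionSketch {

    // Rewrites each whitespace-separated term as an OR group containing the
    // term and at most maxSyns of its synonyms. Terms with no known
    // synonyms are passed through unchanged.
    static String expand(String query, Map<String, List<String>> synonyms, int maxSyns) {
        StringBuilder out = new StringBuilder();
        for (String term : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) continue;
            List<String> syns = synonyms.getOrDefault(term, Collections.<String>emptyList());
            int n = Math.max(0, Math.min(maxSyns, syns.size()));
            if (out.length() > 0) out.append(' ');
            if (n == 0) {
                out.append(term);
            } else {
                out.append('(').append(term);
                for (String syn : syns.subList(0, n)) {
                    out.append(" OR ").append(syn);
                }
                out.append(')');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = new HashMap<>();
        synonyms.put("big", Arrays.asList("large", "grown", "adult"));
        // Prints: (big OR large OR grown) dog
        System.out.println(expand("big dog", synonyms, 2));
    }
}
```

Capping the number of synonyms per term is one simple way to keep an expanded query from matching far more documents than the user intended.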
By default the Prolog file is read from c:/proj/wordnet/prolog/wn_s.pl. You can pass in a different file as an argument on the command line. Output is hardcoded to be a Lucene index named syn_index in the current directory.
C:\proj\tropo_java>java com.tropo.wordnet.Syns2Index
Opening c:/proj/wordnet/prolog/wn_s.pl
2 s(100001742,1,'entity',n,1,11). 0 0 ndecent=0
4 s(100002219,1,'thing',n,12,0). 1 1 ndecent=1
8 s(100002579,2,'nonentity',n,3,0). 5 5 ndecent=1
16 s(100004024,1,'life',n,11,31). 10 7 ndecent=4
32 s(100011413,4,'brute',n,2,0). 23 13 ndecent=7
64 s(100021905,1,'event',n,1,62). 50 28 ndecent=12
128 s(100032210,1,'advent',n,1,2). 99 67 ndecent=24
256 s(100042358,1,'completion',n,1,40). 198 126 ndecent=47
512 s(100065408,2,'return',n,4,3). 377 234 ndecent=107
1024 s(100109417,1,'fine-tooth_comb',n,2,0). 734 463 ndecent=226
2048 s(100206038,2,'quick_fix',n,1,0). 1406 930 ndecent=459
4096 s(100397839,4,'interpretative_dancing',n,1,0). 2677 1890 ndecent=940
8192 s(100805532,2,'gavage',n,1,0). 4911 3724 ndecent=2194
16384 s(101612956,1,'Haliotidae',n,1,0). 8525 6939 ndecent=6254
32768 s(103096786,2,'hydroxyzine',n,1,0). 15286 13240 ndecent=14337
65536 s(106301017,1,'roast',n,1,0). 28778 27352 ndecent=25437
131072 s(112574257,1,'liquefied_petroleum_gas',n,1,0). 50535 52425 ndecent=56968
row=1 doc= Document<Keyword<word:scum> Unindexed<syn:trash>>
row=2 doc= Document<Keyword<word:nard> Unindexed<syn:spikenard>>
row=4 doc= Document<Keyword<word:intromit> Unindexed<syn:admit>>
row=8 doc= Document<Keyword<word:shitter> Unindexed<syn:voider> Unindexed<syn:defecator>>
row=16 doc= Document<Keyword<word:winning> Unindexed<syn:victorious> Unindexed<syn:taking> Unindexed<syn:fetching>>
row=32 doc= Document<Keyword<word:grampus> Unindexed<syn:orca> Unindexed<syn:killer>>
row=64 doc= Document<Keyword<word:chopper> Unindexed<syn:whirlybird> Unindexed<syn:pearly> Unindexed<syn:helicopter> Unindexed<syn:eggbeater> Unindexed<syn:cleaver> Unindexed<syn:chop>>
row=128 doc= Document<Keyword<word:fuchsia> Unindexed<syn:magenta>>
row=256 doc= Document<Keyword<word:adrianople> Unindexed<syn:edirne> Unindexed<syn:adrianopolis>>
row=512 doc= Document<Keyword<word:lack> Unindexed<syn:want> Unindexed<syn:miss> Unindexed<syn:deficiency>>
row=1024 doc= Document<Keyword<word:battler> Unindexed<syn:scrapper> Unindexed<syn:fighter> Unindexed<syn:combatant> Unindexed<syn:belligerent>>
row=2048 doc= Document<Keyword<word:disfavour> Unindexed<syn:dislike> Unindexed<syn:disfavor> Unindexed<syn:disapproval> Unindexed<syn:disadvantage>>
row=4096 doc= Document<Keyword<word:deflect> Unindexed<syn:parry> Unindexed<syn:obviate> Unindexed<syn:distract> Unindexed<syn:deviate> Unindexed<syn:debar> Unindexed<syn:block> Unindexed<syn:bend> Unindexed<syn:avoid> Unindexed<syn:avert>>
row=8192 doc= Document<Keyword<word:collapse> Unindexed<syn:tumble> Unindexed<syn:give> Unindexed<syn:founder> Unindexed<syn:flop> Unindexed<syn:crumple> Unindexed<syn:crumble> Unindexed<syn:crash> Unindexed<syn:crack> Unindexed<syn:burst> Unindexed<syn:break>>
row=16384 doc= Document<Keyword<word:bahrein> Unindexed<syn:bahrain>>
row=32768 doc= Document<Keyword<word:overbearingness> Unindexed<syn:imperiousness> Unindexed<syn:domineeringness>>
Documents : 43,372
Index Size: 3MB
Searching for: word:big
1 total matching documents after 380(ms)
name=word sv="big"
name=syn sv="vauntingly"
name=syn sv="vainglorious"
name=syn sv="swelled"
name=syn sv="prominent"
name=syn sv="openhanded"
name=syn sv="momentous"
name=syn sv="magnanimous"
name=syn sv="liberal"
name=syn sv="large"
name=syn sv="handsome"
name=syn sv="grownup"
name=syn sv="grown"
name=syn sv="giving"
name=syn sv="freehanded"
name=syn sv="crowing"
name=syn sv="braggy"
name=syn sv="bountiful"
name=syn sv="bounteous"
name=syn sv="boastfully"
name=syn sv="boastful"
name=syn sv="bighearted"
name=syn sv="bad"
name=syn sv="adult"