These are my notes on using WS4J to calculate word similarity in Java.
Step 1: Download the jars
Download the following two jars and add them to your project's library path.
jawjaw-1.0.2.jar: https://code.google.com/p/jawjaw/downloads/list
ws4j-1.0.1.jar: https://code.google.com/p/ws4j/downloads/list
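Before running the demo, it can help to confirm that both jars really ended up on the classpath. The following is a minimal sketch, not part of WS4J: the class name ClasspathCheck and the helper isOnClasspath are hypothetical, and the two class names being probed are simply taken from the demo's import list.

```java
class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // by the class loader that loaded ClasspathCheck. The class is not
    // initialized, only looked up.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names copied from the demo program's imports.
        String[] required = {
            "edu.cmu.lti.lexical_db.NictWordNet",
            "edu.cmu.lti.ws4j.impl.WuPalmer"
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? " found" : " MISSING"));
        }
    }
}
```

If either line reports MISSING, the library path of the project is probably not set up the way the demo expects.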
Step 2: Play with the demo program
Code:
package NLP;

import edu.cmu.lti.lexical_db.ILexicalDatabase;
import edu.cmu.lti.lexical_db.NictWordNet;
import edu.cmu.lti.ws4j.RelatednessCalculator;
import edu.cmu.lti.ws4j.impl.HirstStOnge;
import edu.cmu.lti.ws4j.impl.JiangConrath;
import edu.cmu.lti.ws4j.impl.LeacockChodorow;
import edu.cmu.lti.ws4j.impl.Lesk;
import edu.cmu.lti.ws4j.impl.Lin;
import edu.cmu.lti.ws4j.impl.Path;
import edu.cmu.lti.ws4j.impl.Resnik;
import edu.cmu.lti.ws4j.impl.WuPalmer;
import edu.cmu.lti.ws4j.util.WS4JConfiguration;

public class SimilarityCalculationDemo {

    private static ILexicalDatabase db = new NictWordNet();

    /*
    // available options of metrics
    private static RelatednessCalculator[] rcs = { new HirstStOnge(db),
            new LeacockChodorow(db), new Lesk(db), new WuPalmer(db),
            new Resnik(db), new JiangConrath(db), new Lin(db), new Path(db) };
    */

    private static double compute(String word1, String word2) {
        WS4JConfiguration.getInstance().setMFS(true);
        double s = new WuPalmer(db).calcRelatednessOfWords(word1, word2);
        return s;
    }

    public static void main(String[] args) {
        String[] words = {"add", "get", "filter", "remove", "check", "find", "collect", "create"};

        for (int i = 0; i < words.length - 1; i++) {
            for (int j = i + 1; j < words.length; j++) {
                double distance = compute(words[i], words[j]);
                System.out.println(words[i] + " - " + words[j] + " = " + distance);
            }
        }
    }
}
Output:
add - get = 0.3333333333333333
add - filter = 0.4
add - remove = 0.3157894736842105
add - check = 0.2857142857142857
add - find = 0.47619047619047616
add - collect = 0.4
add - create = 0.2857142857142857
get - filter = 0.2857142857142857
get - remove = 0.5
get - check = 0.4
get - find = 0.5
get - collect = 0.5
get - create = 0.5
filter - remove = 0.2857142857142857
filter - check = 0.25
filter - find = 0.2857142857142857
filter - collect = 0.21052631578947367
filter - create = 0.2857142857142857
remove - check = 0.4
remove - find = 0.5
remove - collect = 0.3157894736842105
remove - create = 0.5
check - find = 0.4
check - collect = 0.2857142857142857
check - create = 0.4
find - collect = 0.38095238095238093
find - create = 0.5
collect - create = 0.2857142857142857
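
To get a feel for what these numbers mean: the Wu-Palmer measure is usually defined as 2 * depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest ancestor the two concepts share. The sketch below applies that formula to a tiny hand-made hierarchy. Everything in it is an assumption made for illustration: the taxonomy is invented (it is not WordNet data), the class WuPalmerSketch is hypothetical, and depth is counted with the root at 1, which may not match how WS4J counts internally.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class WuPalmerSketch {

    // Child -> parent links of an invented toy taxonomy (NOT WordNet data).
    private static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("change", "act");
        PARENT.put("get", "act");
        PARENT.put("add", "change");
        PARENT.put("remove", "change");
    }

    // Depth counted in nodes from the root; the root itself has depth 1
    // (an assumed convention for this sketch).
    static int depth(String node) {
        int d = 1;
        while (PARENT.containsKey(node)) {
            node = PARENT.get(node);
            d++;
        }
        return d;
    }

    // Deepest node that is an ancestor of (or equal to) both a and b,
    // or null if the two nodes share no ancestor.
    static String lowestCommonSubsumer(String a, String b) {
        Set<String> ancestorsOfA = new HashSet<>();
        for (String n = a; n != null; n = PARENT.get(n)) {
            ancestorsOfA.add(n);
        }
        for (String n = b; n != null; n = PARENT.get(n)) {
            if (ancestorsOfA.contains(n)) {
                return n;
            }
        }
        return null;
    }

    // Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b)).
    static double wuPalmer(String a, String b) {
        String lcs = lowestCommonSubsumer(a, b);
        if (lcs == null) {
            return 0.0;
        }
        return 2.0 * depth(lcs) / (depth(a) + depth(b));
    }

    public static void main(String[] args) {
        System.out.println("add - remove = " + wuPalmer("add", "remove"));
        System.out.println("add - get = " + wuPalmer("add", "get"));
    }
}
```

In this toy tree, "add" and "remove" share the ancestor "change" at depth 2, so their score is 2 * 2 / (3 + 3); "add" and "get" only meet at the root, so their score is 2 * 1 / (3 + 2). The score is a similarity, so higher values mean the two words are closer, and a word compared with itself scores 1.0.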