In some cases the solver fails to find a solution for my model, although I believe one exists.
So I would like to supply a candidate solution and then check which constraint it violates.
How can I do that with choco-solver?
Using choco-solver 4.10.6.
Forcing a solution
I ended up adding constraints that force the variables to the values of my presumed solution, e.g.:
// constraints to force given solution
vehicle2FirstStop[0].eq(model.intVar(4)).post();
vehicle2FirstStop[1].eq(model.intVar(3)).post();
nextStop[1].eq(model.intVar(0)).post();
nextStop[2].eq(model.intVar(1)).post();
...
and then
model.getSolver().showContradiction();
if (model.getSolver().solve()) { ....
This shows the first contradiction of the presumed solution, e.g.
/!\ CONTRADICTION (PropXplusYeqZ(sum_exp_49, mul_exp_51, ...
So the next step is to find out where terms such as sum_exp_49 come from.
Matching the contradiction terms with the code
Here is a simple workaround that will hopefully provide enough information. We can override the post() and associates() methods of Model so that the Java source file name and line number are dumped whenever a constraint is posted or a variable is created.
Model model = new Model("Vrp1RpV") {
/**
* retrieve the filename and line number of first caller outside of choco-solver from stacktrace
*/
String getSource() {
String source = null;
StackTraceElement[] stackTraceElements = Thread.currentThread().getStackTrace();
// starts from 3: thread.getStackTrace() + this.getSource() + caller (post() or associates())
for (int i = 3; i < stackTraceElements.length; i++) {
// keep rewinding until we get out of choco-solver packages
if (!stackTraceElements[i].getClassName().toString().startsWith("org.chocosolver")) {
source = stackTraceElements[i].getFileName() + ":" + stackTraceElements[i].getLineNumber();
break;
}
}
return source;
}
@Override
public void post(Constraint... cs) throws SolverException {
String source=getSource();
// dump each constraint along source location
for (Constraint c : cs) {
System.err.println(source + " post: " + c);
}
super.post(cs);
}
@Override
public void associates(Variable variable) {
System.err.println(getSource() + " associates: " + variable.getName());
super.associates(variable);
}
};
This will dump things like:
Vrp1RpV2.java:182 post: ARITHM ([prop(EQ_exp_47.EQ.mul_exp_48)])
Vrp1RpV2.java:182 associates: sum_exp_49
Vrp1RpV2.java:182 post: ARITHM ([prop(mul_exp_48.EQ.sum_exp_49)])
Vrp1RpV2.java:182 associates: EQ_exp_50
Vrp1RpV2.java:182 post: BASIC_REIF ([(stop2vehicle[2] = 1) <=> EQ_exp_50])
...
From there it is possible to see where sum_exp_49 comes from.
EDIT: added associates() thanks to @cprudhom's suggestion on https://gitter.im/chocoteam/choco-solver
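The stack-walking idea behind getSource() can be tried in isolation, without choco-solver on the classpath. The following is only a minimal sketch using the JDK: the names CallSite and firstFrameOutside are made up for illustration, and the prefix argument plays the role of "org.chocosolver" above. Instead of starting at a fixed frame index, it skips frames by class name, which is an assumption of this sketch rather than what the code above does.

```java
// Illustrative only: CallSite and firstFrameOutside are hypothetical names,
// not part of choco-solver.
final class CallSite {
    /**
     * Returns "File.java:line" for the nearest stack frame whose class is neither
     * this helper, java.lang.Thread, nor inside skippedPrefix; null if none is left.
     */
    static String firstFrameOutside(String skippedPrefix) {
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            String cls = frame.getClassName();
            boolean internal = cls.equals("java.lang.Thread")   // getStackTrace() itself
                    || cls.equals(CallSite.class.getName())     // this helper
                    || cls.startsWith(skippedPrefix);           // library frames to skip
            if (!internal) {
                return frame.getFileName() + ":" + frame.getLineNumber();
            }
        }
        return null;
    }
}

final class CallSiteDemo {
    public static void main(String[] args) {
        // Prints the source file and line number of this very call.
        System.out.println(CallSite.firstFrameOutside("org.chocosolver"));
    }
}
```

Filtering by class name avoids depending on how many frames sit between the helper and its caller, at the cost of one string comparison per frame.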
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
This is the public class
public class Process {
private String keywordAsString = "";
private String keyword = "";
// ArrayList to hold the letters of the keyword with duplicates removed.
private ArrayList<Integer> keywordAsIntsNoDup = new ArrayList<Integer>(0);
// Map for removing all duplicate letters in the keyword.
private Map<Integer, Integer> keywordLetters = new LinkedHashMap<Integer, Integer>(0);
// ArrayList to hold all 256 ASCII characters (as integers).
private ArrayList<Integer> asciiArray = new ArrayList<Integer>(0);
// ArrayList for storing the message from the file.
ArrayList<Integer> fileMessageAsInteger = new ArrayList<Integer>(0);
// Constructor
public Process() {
}
public void processKeyword(String keyword) {
// Copy incoming keyword String
this.keywordAsString = keyword;
// Pass incoming keyword String to the removeDuplicate method.
// removeDuplicate will first convert the letters to Integers,
// then remove any duplicate letters.
// Store the result in the keywordAsIntsNoDup ArrayList
this.keywordAsIntsNoDup = removeDuplicates(this.keywordAsString);
// Create ArrayList and fill it with all 256 ASCII characters (as integers).
createAsciiArr();
// Remove the keyword letters from the asciiArray.
for (int i=0; i<this.keywordAsIntsNoDup.size(); i++) {
Integer letterToSearchFor = this.keywordAsIntsNoDup.get(i);
if (this.asciiArray.contains(letterToSearchFor))
{
this.asciiArray.remove(letterToSearchFor);
}
}
}// END processKeyword()
public ArrayList<Integer> removeDuplicates(String keyword) {
// Copy incoming keyword String
this.keyword = keyword;
I would really appreciate it if someone could help me; Java is really no piece of cake.
// Loop through the keywordAsIntArray ArrayList, putting each 'letter' of the keyword into the map.
// Duplicate letters will be overridden, so the map will contain the keyword without any duplicates.
for (int i=0; i
// Put the maps' key set (which holds the 'letters') into an ArrayList.
// This will make it easier to put the 'letters' into the Table later.
ArrayList<Integer> keyslist = new ArrayList<Integer>(this.keywordLetters.keySet());
System.out.println("\n" + "map.keySet() from keyslist ArrayList = " + keyslist.toString());
return keyslist;
}
public void createAsciiArr() {
// Use a counting for loop to fill the asciiArray ArrayList
// with the 256 character codes 0-255 (extended ASCII) as integers.
for (int i=0; i<256; i++) {
this.asciiArray.add(i);
}
}// END createAsciiArr()
}// END class
I want to input a String as the keyword and get back hex values as the encrypted code, not integers. I also have more code that I don't really understand; I am really new to Java. Can anyone please help me?
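As a starting point for the two parts of the question that are visible here (removing duplicate letters from the keyword, and showing values as hex rather than as decimal integers), here is a minimal sketch. It does not implement the cipher itself, since that part of the code is not shown; the class name KeywordHex and its method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: KeywordHex is a hypothetical helper, not part of the code above.
final class KeywordHex {
    /** Character codes of the keyword in first-seen order, duplicates removed. */
    static List<Integer> dedupe(String keyword) {
        Set<Integer> seen = new LinkedHashSet<>(); // keeps insertion order, ignores repeats
        for (char c : keyword.toCharArray()) {
            seen.add((int) c);
        }
        return new ArrayList<>(seen);
    }

    /** Formats each code as at least two uppercase hex digits, separated by spaces. */
    static String toHex(List<Integer> codes) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", code));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(dedupe("balloon"))); // prints 62 61 6C 6F 6E
    }
}
```

A LinkedHashSet does the same job as the LinkedHashMap in the question (order-preserving de-duplication) without needing a value for each key.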
I would like to parallelize my program to make it faster. My program looks like this:
Sim1 sim1 = new Sim1();
for(Entry<Integer, HashSet<String>> entry : map_topics_words.entrySet()) {
Integer k = entry.getKey();
Double sim = sim1.prob(word_m, entry.getValue());
sim_avg.put(k, sim);
score += sim;
}
and prob is a method in class Sim1, like this:
public double prob(String w_i, HashSet<String> set_i){
Similarity sim = new Similarity();
double score = 1;
Iterator<String> it = set_i.iterator();
while (it.hasNext()) {
score += sim.computeSim(w_i, it.next());
}
score = score/set_i.size();
return score;
}
and computeSim is a method in class Similarity, like this:
public double computeSim(String w_1, String w_2){
return cmp(w_1,w_2);
}
So I would like to use a thread for the first method and a thread for the second method. I tried several approaches, but none of them worked.
Any help, please
Thank you
You can change the first method as shown below. Using the Executor framework, submit each call to prob() as a Callable task so that it can run on a different thread, then use the returned Future to get the result of that particular call. This requires one extra map from each key to its corresponding Future object. Please see the code below; I hope it helps.
Sim1 sim1 = new Sim1();
Map<Integer, Future<Double>> workerMap = new HashMap<>();
ExecutorService exe = Executors.newCachedThreadPool();
for(Map.Entry<Integer, HashSet<String>> entry : map_topics_words.entrySet()) {
Integer k = entry.getKey();
workerMap.put(k, exe.submit(()->{ //Java 8 lambda
return sim1.prob(word_m, entry.getValue());
}));
}
//This loop is to get the result of prob() method for all the keys and process them further
for(Map.Entry<Integer, HashSet<String>> entry : map_topics_words.entrySet()) {
Integer k = entry.getKey();
try {
Double sim = workerMap.get(k).get();
sim_avg.put(k, sim);
score += sim;
} catch (Exception e) {
e.printStackTrace();
}
}
I suggest reading up on how Callable and Future work in Java. You can also see this link.
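For reference, here is a self-contained variant of the same idea that can be run as is. It is a sketch under assumptions: computeSim below is a hypothetical stand-in (exact string match) because the real cmp() is not shown, and the class name ParallelProb is invented. It also uses a fixed-size pool and shuts it down when done, which the snippet above does not do.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: ParallelProb and this computeSim are stand-ins for the question's classes.
final class ParallelProb {
    // Hypothetical similarity: 1.0 for identical strings, 0.0 otherwise.
    static double computeSim(String a, String b) {
        return a.equals(b) ? 1.0 : 0.0;
    }

    // Same shape as prob() in the question, including the initial score of 1.
    static double prob(String word, Set<String> words) {
        double score = 1;
        for (String other : words) {
            score += computeSim(word, other);
        }
        return score / words.size();
    }

    /** Runs prob() for every topic on a thread pool and collects the results by key. */
    static Map<Integer, Double> scoreAll(String word, Map<Integer, Set<String>> topics) {
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            Map<Integer, Future<Double>> pending = new LinkedHashMap<>();
            for (Map.Entry<Integer, Set<String>> e : topics.entrySet()) {
                pending.put(e.getKey(), pool.submit(() -> prob(word, e.getValue())));
            }
            Map<Integer, Double> result = new LinkedHashMap<>();
            for (Map.Entry<Integer, Future<Double>> e : pending.entrySet()) {
                try {
                    result.put(e.getKey(), e.getValue().get()); // blocks until that task is done
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", ex);
                } catch (ExecutionException ex) {
                    throw new IllegalStateException("prob() failed", ex.getCause());
                }
            }
            return result;
        } finally {
            pool.shutdown(); // let the worker threads exit once all tasks have finished
        }
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> topics = new LinkedHashMap<>();
        topics.put(1, Set.of("a", "b", "c", "d"));
        topics.put(2, Set.of("x"));
        System.out.println(scoreAll("a", topics));
    }
}
```

Whether this is actually faster depends on how expensive computeSim is; for very cheap comparisons the overhead of submitting tasks can outweigh the gain.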
I want to take pinyin (English letters) as input and return Chinese characters that the user can choose from. I have seen this implemented in many places (OS keyboards and various websites), but I can't find a library that does it.
I might even do it myself if it is not too complex and does not require a large amount of data.
The simplest way to do this is to use javachinesepinyin, a lightweight Chinese Pinyin Input Method.
You can find related code here.
private String[] pinyinToWord(String[] o) {
Result ret = null;
try {
ret = ptw.labelStateOfNodes(Arrays.asList(o));
} catch (Exception ex) {
System.out.println(ex.getMessage());
}
Map<Double, String> results = new HashMap<Double, String>();
if (null != ret && ret.states() != null) {
for (int pos = 0; pos < ret.states()[o.length - 1].length; pos++) {
StringBuilder sb = new StringBuilder();
int[] statePath = Viterbi.getStatePath(ret.states(), ret.psai(), o.length - 1, o.length, pos);
for (int state : statePath) {
Character name = ptw.getStateBy(state);
sb.append(name).append(" ");
}
results.put(ret.delta()[o.length - 1][pos], sb.toString());
}
List<Double> list = new ArrayList<Double>(results.keySet());
Collections.sort(list);
Collections.reverse(list);
return results.get(list.get(0)).trim().split(" ");
}
return null;
}
Intro Slides in English: http://docs.google.com/present/edit?id=0AbbbdNFzwcADZGR3Z3N0NG1fMTk4M2hraGZjNmRw&hl=en
Live Demo: http://951438.appspot.com/pinyin.jsp?txt=zhongwenpinyinshurufa
If advanced features are needed, consider using the Rime Input Method Engine or sunpinyin.
FYI, Python Binding for sunpinyin.
I'm using the Stanford Named Entity Recognizer http://nlp.stanford.edu/software/CRF-NER.shtml and it's working fine. This is my code:
List<List<CoreLabel>> out = classifier.classify(text);
for (List<CoreLabel> sentence : out) {
for (CoreLabel word : sentence) {
if (!StringUtils.equals(word.get(AnswerAnnotation.class), "O")) {
namedEntities.add(word.word().trim());
}
}
}
However the problem I'm finding is identifying names and surnames. If the recognizer encounters "Joe Smith", it is returning "Joe" and "Smith" separately. I'd really like it to return "Joe Smith" as one term.
Could this be achieved through the recognizer maybe through a configuration? I didn't find anything in the javadoc till now.
Thanks!
This is because your inner for loop is iterating over individual tokens (words) and adding them separately. You need to change things to add whole names at once.
One way is to replace the inner for loop with a regular for loop with a while loop inside it which takes adjacent non-O things of the same class and adds them as a single entity.*
Another way would be to use the CRFClassifier method call:
List<Triple<String,Integer,Integer>> classifyToCharacterOffsets(String sentences)
which will give you whole entities, which you can extract the String form of by using substring on the original input.
*The models that we distribute use a simple raw IO label scheme, where things are labeled PERSON or LOCATION, and the appropriate thing to do is simply to coalesce adjacent tokens with the same label. Many NER systems use more complex labels such as IOB labels, where codes like B-PERS indicate where a person entity starts. The CRFClassifier class and feature factories support such labels, but they're not used in the models we currently distribute (as of 2012).
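The coalescing step for an IO label scheme can be sketched without the Stanford classes. In the example below, plain parallel arrays of tokens and labels stand in for the CoreLabel objects, and the names EntityMerger and merge are invented for illustration; "O" is assumed to be the background label, as in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: EntityMerger is a hypothetical helper, not a Stanford NER class.
final class EntityMerger {
    /** Joins runs of adjacent tokens that share the same non-"O" label. */
    static List<String> merge(String[] tokens, String[] labels) {
        List<String> entities = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentLabel = "O";
        for (int i = 0; i < tokens.length; i++) {
            if (!labels[i].equals(currentLabel)) {
                flush(entities, current, currentLabel); // the previous run has ended
                currentLabel = labels[i];
            }
            if (!"O".equals(currentLabel)) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(tokens[i]);
            }
        }
        flush(entities, current, currentLabel); // an entity may run to the end of the sentence
        return entities;
    }

    private static void flush(List<String> out, StringBuilder text, String label) {
        if (text.length() > 0) {
            out.add(label + ": " + text);
            text.setLength(0);
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"Joe", "Smith", "lives", "in", "New", "York"};
        String[] labels = {"PERSON", "PERSON", "O", "O", "LOCATION", "LOCATION"};
        System.out.println(merge(tokens, labels)); // [PERSON: Joe Smith, LOCATION: New York]
    }
}
```

Note the final flush after the loop: without it, an entity that ends the sentence would be lost.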
The drawback of the classifyToCharacterOffsets method is that (AFAIK) it only gives you the entity type and the character offsets of each entity, not the underlying tokens or their other annotations.
As proposed by Christopher, here is an example of a loop which assembles "adjacent non-O things". This example also counts the number of occurrences.
public HashMap<String, HashMap<String, Integer>> extractEntities(String text){
HashMap<String, HashMap<String, Integer>> entities =
new HashMap<String, HashMap<String, Integer>>();
for (List<CoreLabel> lcl : classifier.classify(text)) {
Iterator<CoreLabel> iterator = lcl.iterator();
if (!iterator.hasNext())
continue;
CoreLabel cl = iterator.next();
while (iterator.hasNext()) {
String answer =
cl.getString(CoreAnnotations.AnswerAnnotation.class);
if (answer.equals("O")) {
cl = iterator.next();
continue;
}
if (!entities.containsKey(answer))
entities.put(answer, new HashMap<String, Integer>());
String value = cl.getString(CoreAnnotations.ValueAnnotation.class);
while (iterator.hasNext()) {
cl = iterator.next();
if (answer.equals(
cl.getString(CoreAnnotations.AnswerAnnotation.class)))
value = value + " " +
cl.getString(CoreAnnotations.ValueAnnotation.class);
else {
if (!entities.get(answer).containsKey(value))
entities.get(answer).put(value, 0);
entities.get(answer).put(value,
entities.get(answer).get(value) + 1);
break;
}
}
if (!iterator.hasNext())
break;
}
}
return entities;
}
I had the same problem, so I looked it up, too. The method proposed by Christopher Manning is efficient, but the delicate point is to know how to decide which kind of separator is appropriate. One could say only a space should be allowed, e.g. "John Zorn" >> one entity. However, I may find the form "J.Zorn", so I should also allow certain punctuation marks. But what about "Jack, James and Joe" ? I might get 2 entities instead of 3 ("Jack James" and "Joe").
By digging a bit in the Stanford NER classes, I actually found a proper implementation of this idea. They use it to export entities in the form of single String objects. For instance, in the method PlainTextDocumentReaderAndWriter.printAnswersInlineXML, we have:
private void printAnswersInlineXML(List<IN> doc, PrintWriter out) {
final String background = flags.backgroundSymbol;
String prevTag = background;
for (Iterator<IN> wordIter = doc.iterator(); wordIter.hasNext();) {
IN wi = wordIter.next();
String tag = StringUtils.getNotNullString(wi.get(AnswerAnnotation.class));
String before = StringUtils.getNotNullString(wi.get(BeforeAnnotation.class));
String current = StringUtils.getNotNullString(wi.get(CoreAnnotations.OriginalTextAnnotation.class));
if (!tag.equals(prevTag)) {
if (!prevTag.equals(background) && !tag.equals(background)) {
out.print("</");
out.print(prevTag);
out.print('>');
out.print(before);
out.print('<');
out.print(tag);
out.print('>');
} else if (!prevTag.equals(background)) {
out.print("</");
out.print(prevTag);
out.print('>');
out.print(before);
} else if (!tag.equals(background)) {
out.print(before);
out.print('<');
out.print(tag);
out.print('>');
}
} else {
out.print(before);
}
out.print(current);
String afterWS = StringUtils.getNotNullString(wi.get(AfterAnnotation.class));
if (!tag.equals(background) && !wordIter.hasNext()) {
out.print("</");
out.print(tag);
out.print('>');
prevTag = background;
} else {
prevTag = tag;
}
out.print(afterWS);
}
}
They iterate over each word, checking whether it has the same class (answer) as the previous one, as explained before. For this, they take advantage of the fact that expressions considered not to be entities are flagged with the so-called backgroundSymbol (class "O"). They also use the property BeforeAnnotation, which represents the string separating the current word from the previous one. This last point solves the problem I initially raised regarding the choice of an appropriate separator.
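The separator idea can be shown in isolation. In this minimal sketch, the before array is a hypothetical stand-in for the BeforeAnnotation value of each token, and SurfaceForm and join are invented names; the point is only that re-inserting the stored separators reproduces the original spelling of a multi-token entity.

```java
// Illustrative only: SurfaceForm is a hypothetical helper, not a Stanford NER class.
final class SurfaceForm {
    /**
     * Rebuilds the text of tokens[from..to) by inserting, between neighbours,
     * the separator that originally preceded each token.
     */
    static String join(String[] tokens, String[] before, int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < to; i++) {
            if (i > from) {
                sb.append(before[i]); // text between the previous token and this one
            }
            sb.append(tokens[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Tokens of the input "J.Zorn met John Zorn" with the separator before each token.
        String[] tokens = {"J.", "Zorn", "met", "John", "Zorn"};
        String[] before = {"", "", " ", " ", " "};
        System.out.println(join(tokens, before, 0, 2)); // J.Zorn
        System.out.println(join(tokens, before, 3, 5)); // John Zorn
    }
}
```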
Code for the classifyToCharacterOffsets approach:
List<Triple<String, Integer, Integer>> result = classifier.classifyToCharacterOffsets(text);
for (Triple<String, Integer, Integer> triple : result)
{
System.out.println(triple.first + " : " + text.substring(triple.second, triple.third));
}
List<List<CoreLabel>> out = classifier.classify(text);
for (List<CoreLabel> sentence : out) {
String s = "";
String prevLabel = null;
for (CoreLabel word : sentence) {
if(prevLabel == null || prevLabel.equals(word.get(CoreAnnotations.AnswerAnnotation.class)) ) {
s = s + " " + word;
prevLabel = word.get(CoreAnnotations.AnswerAnnotation.class);
}
else {
if(!prevLabel.equals("O"))
System.out.println(s.trim() + '/' + prevLabel + ' ');
s = " " + word;
prevLabel = word.get(CoreAnnotations.AnswerAnnotation.class);
}
}
// prevLabel is still null if the sentence had no tokens
if(prevLabel != null && !prevLabel.equals("O"))
System.out.println(s.trim() + '/' + prevLabel + ' ');
}
I just wrote this small piece of logic and it works fine. It groups adjacent words that have the same label.
Make use of the classifiers already provided to you. I believe this is what you are looking for:
private static String combineNERSequence(String text) {
String serializedClassifier = "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz";
AbstractSequenceClassifier<CoreLabel> classifier = null;
try {
classifier = CRFClassifier
.getClassifier(serializedClassifier);
} catch (ClassCastException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (ClassNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
// classify once and reuse the result for both printing and returning
String tagged = classifier.classifyWithInlineXML(text);
System.out.println(tagged);
// FOR TSV FORMAT //
//System.out.print(classifier.classifyToString(text, "tsv", false));
return tagged;
}
Here is my full code. I use Stanford CoreNLP and wrote an algorithm to concatenate multi-term names.
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import org.apache.log4j.Logger;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
/**
* Created by Chanuka on 8/28/14 AD.
*/
public class FindNameEntityTypeExecutor {
private static Logger logger = Logger.getLogger(FindNameEntityTypeExecutor.class);
private StanfordCoreNLP pipeline;
public FindNameEntityTypeExecutor() {
logger.info("Initializing Annotator pipeline ...");
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
pipeline = new StanfordCoreNLP(props);
logger.info("Annotator pipeline initialized");
}
List<String> findNameEntityType(String text, String entity) {
logger.info("Finding entity type matches in the " + text + " for entity type, " + entity);
// create an empty Annotation just with the given text
Annotation document = new Annotation(text);
// run all Annotators on this text
pipeline.annotate(document);
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
List<String> matches = new ArrayList<String>();
for (CoreMap sentence : sentences) {
int previousCount = 0;
int count = 0;
// traversing the words in the current sentence
// a CoreLabel is a CoreMap with additional token-specific methods
for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
String word = token.get(CoreAnnotations.TextAnnotation.class);
int previousWordIndex;
if (entity.equals(token.get(CoreAnnotations.NamedEntityTagAnnotation.class))) {
count++;
if (previousCount != 0 && (previousCount + 1) == count) {
previousWordIndex = matches.size() - 1;
String previousWord = matches.get(previousWordIndex);
matches.remove(previousWordIndex);
previousWord = previousWord.concat(" " + word);
matches.add(previousWordIndex, previousWord);
} else {
matches.add(word);
}
previousCount = count;
}
else
{
count=0;
previousCount=0;
}
}
}
return matches;
}
}
Another approach to dealing with multi-word entities.
This code combines multiple tokens if they have the same annotation and appear consecutively.
Restriction:
If the same token has two different annotations, the last one will be saved.
private Document getEntities(String fullText) {
Document entitiesList = new Document();
NERClassifierCombiner nerCombClassifier = loadNERClassifiers();
if (nerCombClassifier != null) {
List<List<CoreLabel>> results = nerCombClassifier.classify(fullText);
for (List<CoreLabel> coreLabels : results) {
String prevLabel = null;
String prevToken = null;
for (CoreLabel coreLabel : coreLabels) {
String word = coreLabel.word();
String annotation = coreLabel.get(CoreAnnotations.AnswerAnnotation.class);
if (!"O".equals(annotation)) {
if (prevLabel == null) {
prevLabel = annotation;
prevToken = word;
} else {
if (prevLabel.equals(annotation)) {
prevToken += " " + word;
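} else {
// the label changed without an "O" in between: save the finished entity first
entitiesList.put(prevToken, prevLabel);
prevLabel = annotation;
prevToken = word;
}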
}
} else {
if (prevLabel != null) {
entitiesList.put(prevToken, prevLabel);
prevLabel = null;
}
}
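}
// save an entity that runs to the end of the sentence
if (prevLabel != null) {
entitiesList.put(prevToken, prevLabel);
}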
}
}
return entitiesList;
}
Imports:
Document: org.bson.Document;
NERClassifierCombiner: edu.stanford.nlp.ie.NERClassifierCombiner;