Whenever I try to append a new line to a StringBuilder, I don't get a new line at all. I tried:
errorMessage.append(System.getProperty("line.separator"));
errorMessage.append(System.getProperty("\n"));
errorMessage.append(System.getProperty("\r\n"));
Basically everything within the first three pages of Google results; it's so frustrating. I am implementing it in a for loop like this (I don't know if it helps, but any suggestions are appreciated):
import java.util.HashMap;
import java.util.List;

public String getIDs(HashMap<String, List<Integer>> errorMap) {
    StringBuilder errorMessage = new StringBuilder();
    for (String state : errorMap.keySet()) {
        List<Integer> listofId = errorMap.get(state);
        // only report states that actually have IDs
        if (listofId != null && !listofId.isEmpty()) {
            StringBuilder listOfIds = new StringBuilder();
            for (Integer id : listofId) {
                listOfIds.append(id).append(", ");
            }
            errorMessage.append(state + " Trades: " + listOfIds.toString());
            errorMessage.append("\n");
        }
    }
    return errorMessage.toString();
}
Use
errorMessage.append("\n");
Instead of
errorMessage.append(System.getProperty("\n"));
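You should be using builder.append("\n") directly. "\n" is not a property name, so System.getProperty("\n") returns null, and StringBuilder.append writes a null String as the text "null".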
Also, append returns the builder object itself (method chaining), so you can easily write builder.append("\n").append("text1").append("\n").append("text2")...
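A self-contained sketch of the question's method with these fixes applied, plus a check of what System.getProperty("\n") actually produces. The class name NewlineDemo and the sample data are made up for illustration, and iterating entrySet() instead of keySet() plus get() is my choice, not the original code's.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NewlineDemo {

    // Builds one line per state, e.g. "NY Trades: 1, 2", each terminated by '\n'.
    static String getIDs(Map<String, List<Integer>> errorMap) {
        StringBuilder errorMessage = new StringBuilder();
        for (Map.Entry<String, List<Integer>> entry : errorMap.entrySet()) {
            List<Integer> ids = entry.getValue();
            if (ids == null || ids.isEmpty()) {
                continue; // nothing to report for this state
            }
            StringBuilder listOfIds = new StringBuilder();
            for (Integer id : ids) {
                if (listOfIds.length() > 0) {
                    listOfIds.append(", "); // separator only between IDs
                }
                listOfIds.append(id);
            }
            // append returns the same StringBuilder, so the calls can be chained
            errorMessage.append(entry.getKey()).append(" Trades: ")
                        .append(listOfIds).append('\n');
        }
        return errorMessage.toString();
    }

    public static void main(String[] args) {
        // "\n" is not a property key, so getProperty returns null,
        // and append(String) writes a null reference as the text "null"
        String notAProperty = System.getProperty("\n");
        System.out.println(new StringBuilder().append(notAProperty)); // prints: null

        Map<String, List<Integer>> errors = new LinkedHashMap<>();
        errors.put("NY", Arrays.asList(1, 2));
        errors.put("CA", Arrays.asList(3));
        System.out.print(getIDs(errors));
    }
}
```

If you need the platform separator rather than a plain "\n", System.lineSeparator() (Java 7+) is a shorter equivalent of System.getProperty("line.separator").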
I want to get the longest word in a string in Java. How can I achieve this?
Example: hello jameswangfron
I want to get the longest string, "jameswangfron".
String Text = request.getParameter("hello jameswangfron");
A code example would be appreciated.
public class HelloWorld {
    public static void main(String[] args) {
        String text = "hello jameswangfron";
        String[] textArray = text.split(" ");
        String biggestString = "";
        for (int i = 0; i < textArray.length; i++) {
            // compare with the longest word found so far, not just the previous word
            if (textArray[i].length() > biggestString.length()) {
                biggestString = textArray[i];
            }
        }
        System.out.println("Biggest String : " + biggestString);
    }
}
It prints the following output:
Biggest String : jameswangfron
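For comparison, on Java 8 or later the same search can be written with streams. This is a sketch of an alternative, not what either answer uses; the class and method names are made up. As with any call to split, the separator argument is treated as a regex.

```java
import java.util.Arrays;
import java.util.Comparator;

public class LongestWord {

    // Returns the longest token of data after splitting on the given regex,
    // or "" when splitting yields no tokens (e.g. the input is only separators).
    static String longest(String data, String separatorRegex) {
        return Arrays.stream(data.split(separatorRegex))
                     .max(Comparator.comparingInt(String::length))
                     .orElse("");
    }

    public static void main(String[] args) {
        System.out.println(longest("hello jameswangfron", " ")); // prints: jameswangfron
    }
}
```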
Maybe this will be easier to understand:
public class HelloWorld {
public static void main(String[] args) {
System.out.println(StringManipulator.getMaxLengthString("hello jameswangfron", " "));
}
}
class StringManipulator{
public static String getMaxLengthString(String data, String separator){
String[] stringArray = data.split(separator);
String toReturn = "";
int maxLengthSoFar = 0;
for (String string : stringArray) {
if(string.length()>maxLengthSoFar){
maxLengthSoFar = string.length();
toReturn = string;
}
}
return toReturn;
}
}
But there is a catch. If you look at the split method of the String class, you will see that the separator is actually a regex. From your code, I see that you want to separate the words on a blank space. If you want to search an entire text, you have to pass a regex that matches every kind of word separator.
Here's a tip: if you want your words to be separated by " ", ".", "," (you get the idea), replace the " " passed to getMaxLengthString with the following:
"[^a-zA-Z0-9]"
If you want digits to split up words as well, simply use
"[^a-zA-Z]"
This treats as a separator anything that is NOT a lower-case or upper-case letter. (The ^ character inside the brackets [] means "any character except the ones listed".)
Here is another way of doing this:
"[^\\w]"
\w means word characters (letters, digits and the underscore), so if you negate it with ^ you should be fine. The shorthand \W (written "\\W" in a Java string) is equivalent.
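To see the difference between these separators, here is a small sketch that splits a sample sentence (made up for illustration) three ways. I added a + to each character class so that a run of separators such as ", " counts as one split point; without it, split produces empty strings in between, which is harmless for finding the longest word but noisy otherwise.

```java
import java.util.Arrays;

public class SplitDemo {

    // Splits text using the given regex as the separator (String.split always takes a regex).
    static String[] tokens(String text, String separatorRegex) {
        return text.split(separatorRegex);
    }

    public static void main(String[] args) {
        String text = "Hello, world... jameswangfron!";
        // Only spaces separate words, so punctuation stays attached
        System.out.println(Arrays.toString(tokens(text, " ")));   // [Hello,, world..., jameswangfron!]
        // Any run of characters that are not letters or digits is one separator
        System.out.println(Arrays.toString(tokens(text, "[^a-zA-Z0-9]+"))); // [Hello, world, jameswangfron]
        // \W is shorthand for [^\w]: anything except letters, digits and '_'
        System.out.println(Arrays.toString(tokens(text, "\\W+")));          // [Hello, world, jameswangfron]
    }
}
```

One remaining quirk: a separator at the very start of the text still produces one leading empty string, while trailing empty strings are dropped by split.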
At the moment I am using org.apache.commons.lang.text.StrSubstitutor to do:
Map m = ...
substitutor = new StrSubstitutor(m);
result = substitutor.replace(input);
Given that I want to remove the commons-lang dependency from my project, what would be a working and minimalistic implementation of StrSubstitutor using standard JRE libraries?
Note:
StrSubstitutor works like this:
Map map = new HashMap();
map.put("animal", "quick brown fox");
map.put("target", "lazy dog");
StrSubstitutor sub = new StrSubstitutor(map);
String resolvedString = sub.replace("The ${animal} jumped over the ${target}.");
yielding resolvedString = "The quick brown fox jumped over the lazy dog."
If performance is not a priority, you can use the appendReplacement method of the Matcher class:
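import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrSubstitutor {
    private final Map<String, String> map;
    private static final Pattern p = Pattern.compile("\\$\\{(.+?)\\}");

    public StrSubstitutor(Map<String, String> map) {
        this.map = map;
    }

    public String replace(String str) {
        Matcher m = p.matcher(str);
        // appendReplacement(StringBuilder, ...) needs Java 9+; StringBuffer works on all versions
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String var = m.group(1);
            String replacement = map.get(var);
            // leave unknown variables as they are instead of failing on null
            if (replacement == null) {
                replacement = m.group();
            }
            // quoteReplacement stops '$' and '\' in values from being read as group references
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}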
A more performant but uglier version, just for fun :)
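public String replace(String str) {
    StringBuilder sb = new StringBuilder(str.length());
    char[] strArray = str.toCharArray();
    int i = 0;
    while (i < strArray.length - 1) {
        if (strArray[i] == '$' && strArray[i + 1] == '{') {
            int begin = i + 2;
            int end = begin;
            // scan for the closing brace without running off the end of the array
            while (end < strArray.length && strArray[end] != '}') ++end;
            if (end == strArray.length) {
                // unterminated "${": copy the rest verbatim
                sb.append(strArray, i, strArray.length - i);
                return sb.toString();
            }
            String value = map.get(str.substring(begin, end));
            // leave unknown variables untouched instead of appending "null"
            sb.append(value != null ? value : str.substring(i, end + 1));
            i = end + 1;
        } else {
            sb.append(strArray[i]);
            ++i;
        }
    }
    // copy a trailing single character the loop did not reach
    if (i < strArray.length) sb.append(strArray[i]);
    return sb.toString();
}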
It's about 2x as fast as the regex version and 3x as fast as the Apache Commons version in my tests, so the plain regex approach is actually better optimized than the Apache one. It's usually not worth it, of course; this is just for fun. Let me know if you can optimize it further.
Edit: As #kmek points out, there is a caveat: the Apache version resolves variables transitively. For example, if ${animal} maps to ${dog} and dog maps to Golden Retriever, the Apache version will map ${animal} to Golden Retriever. You should use libraries whenever possible; the solution above is only meant for cases where a special constraint prevents you from using one.
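The transitive behaviour can be approximated by resolving each looked-up value again before inserting it. This is a sketch under my own assumptions: the class name RecursiveSubstitutor and the MAX_DEPTH cycle guard are made up, and commons-lang has its own escaping and cycle handling, which this does not replicate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecursiveSubstitutor {
    private static final Pattern VAR = Pattern.compile("\\$\\{(.+?)\\}");
    private static final int MAX_DEPTH = 10; // arbitrary guard against cycles like loop -> ${loop}

    private final Map<String, String> map;

    public RecursiveSubstitutor(Map<String, String> map) {
        this.map = map;
    }

    public String replace(String str) {
        return replace(str, 0);
    }

    private String replace(String str, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException("Variable nesting too deep (cycle?) in: " + str);
        }
        Matcher m = VAR.matcher(str);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = map.get(m.group(1));
            // unknown variables are kept as-is; known values are resolved again
            String resolved = (value == null) ? m.group() : replace(value, depth + 1);
            m.appendReplacement(sb, Matcher.quoteReplacement(resolved));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("animal", "${dog}");
        map.put("dog", "Golden Retriever");
        // prints: The Golden Retriever barked.
        System.out.println(new RecursiveSubstitutor(map).replace("The ${animal} barked."));
    }
}
```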
There's nothing like this in the JRE that I know of, but writing one is simple enough.
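// map (Map<String, String>) and inputString are assumed to exist in the surrounding code
Pattern p = Pattern.compile("\\$\\{([a-zA-Z]+)\\}");
Matcher m = p.matcher(inputString);
int lastEnd = 0;
while (m.find(lastEnd)) {
    String varName = m.group(1);
    // look up the value in the map; keep the placeholder if the variable is unknown
    String value = map.get(varName);
    String replacement = (value != null) ? value : m.group();
    int start = m.start();
    inputString = inputString.substring(0, start) + replacement + inputString.substring(m.end());
    // continue after the inserted text so it is not substituted again
    lastEnd = start + replacement.length();
    // the string changed, so the old matcher is stale
    m = p.matcher(inputString);
}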
This is of course horribly inefficient; you should probably write the result into a StringBuilder instead of rebuilding inputString on every match.
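Following that suggestion, here is a sketch that keeps the same pattern but copies the text between matches and the looked-up values into one StringBuilder, so the input is scanned once instead of being rebuilt for every variable. SinglePassSubstitutor and substitute are made-up names, and leaving unknown placeholders untouched is my assumption about the desired behaviour.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SinglePassSubstitutor {
    private static final Pattern VAR = Pattern.compile("\\$\\{([a-zA-Z]+)\\}");

    // Copies literal text and looked-up values into one StringBuilder in a single pass.
    static String substitute(String input, Map<String, String> map) {
        Matcher m = VAR.matcher(input);
        StringBuilder sb = new StringBuilder(input.length());
        int lastEnd = 0;
        while (m.find()) {
            sb.append(input, lastEnd, m.start());          // literal text before ${...}
            String value = map.get(m.group(1));
            sb.append(value != null ? value : m.group());  // keep unknown placeholders
            lastEnd = m.end();
        }
        sb.append(input, lastEnd, input.length());         // text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("animal", "quick brown fox");
        map.put("target", "lazy dog");
        // prints: The quick brown fox jumped over the lazy dog.
        System.out.println(substitute("The ${animal} jumped over the ${target}.", map));
    }
}
```

Because values are appended directly rather than passed through appendReplacement, characters like '$' in a value need no quoting here.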
I'm using the Stanford Named Entity Recognizer http://nlp.stanford.edu/software/CRF-NER.shtml and it's working fine. This is my code:
List<List<CoreLabel>> out = classifier.classify(text);
for (List<CoreLabel> sentence : out) {
for (CoreLabel word : sentence) {
if (!StringUtils.equals(word.get(AnswerAnnotation.class), "O")) {
namedEntities.add(word.word().trim());
}
}
}
However, the problem I'm finding is with names and surnames. If the recognizer encounters "Joe Smith", it returns "Joe" and "Smith" separately. I'd really like it to return "Joe Smith" as one term.
Could this be achieved through the recognizer, maybe through a configuration option? I haven't found anything in the Javadoc so far.
Thanks!
This is because your inner for loop is iterating over individual tokens (words) and adding them separately. You need to change things to add whole names at once.
One way is to replace the inner for-each loop with a regular for loop containing a while loop that takes adjacent non-O tokens of the same class and adds them as a single entity.*
Another way would be to use the CRFClassifier method:
List<Triple<String,Integer,Integer>> classifyToCharacterOffsets(String sentences)
which will give you whole entities; you can extract their String form by calling substring on the original input.
*The models that we distribute use a simple raw IO label scheme, where things are labeled PERSON or LOCATION, and the appropriate thing to do is simply to coalesce adjacent tokens with the same label. Many NER systems use more complex labels such as IOB labels, where codes like B-PERS indicate where a person entity starts. The CRFClassifier class and feature factories support such labels, but they're not used in the models we currently distribute (as of 2012).
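The difference between the two label schemes matters when merging tokens. Here is a plain-Java sketch with no Stanford dependency, using hypothetical token and label arrays: under IO, adjacent tokens with the same non-O label are merged, so two different people written next to each other collapse into one entity; under IOB, a B- label always starts a new entity. The class name and the label strings (PERSON, B-PER, I-PER) are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class LabelCoalescer {

    // IO scheme: adjacent tokens with the same non-"O" label form one entity.
    static List<String> coalesceIO(String[] tokens, String[] labels) {
        List<String> entities = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < tokens.length; i++) {
            // current != null means the previous token belongs to an open entity
            if (current != null && labels[i].equals(labels[i - 1])) {
                current.append(' ').append(tokens[i]);
                continue;
            }
            if (current != null) entities.add(current.toString());
            current = labels[i].equals("O") ? null : new StringBuilder(tokens[i]);
        }
        if (current != null) entities.add(current.toString());
        return entities;
    }

    // IOB scheme: "B-X" always starts a new entity; "I-X" continues an open entity of type X.
    static List<String> coalesceIOB(String[] tokens, String[] labels) {
        List<String> entities = new ArrayList<>();
        StringBuilder current = null;
        String currentType = null;
        for (int i = 0; i < tokens.length; i++) {
            String label = labels[i];
            if (label.startsWith("I-") && label.substring(2).equals(currentType)) {
                current.append(' ').append(tokens[i]);
                continue;
            }
            if (current != null) entities.add(current.toString());
            if (label.equals("O")) {
                current = null;
                currentType = null;
            } else {
                // "B-X", or a stray "I-X" with no open entity, starts a new one
                current = new StringBuilder(tokens[i]);
                currentType = label.substring(2);
            }
        }
        if (current != null) entities.add(current.toString());
        return entities;
    }

    public static void main(String[] args) {
        String[] tokens = {"Joe", "Smith", "Ann", "Lee"};
        // IO labels cannot mark where one person ends and the next begins
        System.out.println(coalesceIO(tokens, new String[] {"PERSON", "PERSON", "PERSON", "PERSON"}));
        // IOB labels can: B-PER starts a new entity
        System.out.println(coalesceIOB(tokens, new String[] {"B-PER", "I-PER", "B-PER", "I-PER"}));
    }
}
```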
A possible concern with the classifyToCharacterOffsets method is access to the entity labels; however, the first element of each returned Triple is the entity label, and the other two are the character offsets.
As proposed by Christopher, here is an example of a loop which assembles "adjacent non-O things". This example also counts the number of occurrences.
public HashMap<String, HashMap<String, Integer>> extractEntities(String text) {
    HashMap<String, HashMap<String, Integer>> entities =
            new HashMap<String, HashMap<String, Integer>>();
    for (List<CoreLabel> lcl : classifier.classify(text)) {
        String currentAnswer = null;      // label of the entity being assembled, or null
        StringBuilder value = new StringBuilder();
        for (CoreLabel cl : lcl) {
            String answer = cl.getString(CoreAnnotations.AnswerAnnotation.class);
            if (answer.equals(currentAnswer)) {
                // same label as the previous token: extend the current entity
                value.append(' ').append(cl.getString(CoreAnnotations.ValueAnnotation.class));
                continue;
            }
            // the label changed: count the entity that just ended, if any
            if (currentAnswer != null) {
                countEntity(entities, currentAnswer, value.toString());
            }
            if (answer.equals("O")) {
                currentAnswer = null;
            } else {
                currentAnswer = answer;
                value.setLength(0);
                value.append(cl.getString(CoreAnnotations.ValueAnnotation.class));
            }
        }
        // an entity can run up to the last token of the sentence
        if (currentAnswer != null) {
            countEntity(entities, currentAnswer, value.toString());
        }
    }
    return entities;
}

// Increments the occurrence count of value under the given label.
private static void countEntity(HashMap<String, HashMap<String, Integer>> entities,
                                String answer, String value) {
    HashMap<String, Integer> counts = entities.get(answer);
    if (counts == null) {
        counts = new HashMap<String, Integer>();
        entities.put(answer, counts);
    }
    Integer old = counts.get(value);
    counts.put(value, old == null ? 1 : old + 1);
}
I had the same problem, so I looked into it too. The method proposed by Christopher Manning is efficient, but the delicate point is deciding which kind of separator is appropriate. One could say only a space should be allowed, e.g. "John Zorn" >> one entity. However, I may find the form "J.Zorn", so I should also allow certain punctuation marks. But what about "Jack, James and Joe"? I might get 2 entities instead of 3 ("Jack James" and "Joe").
By digging a bit into the Stanford NER classes, I actually found a proper implementation of this idea. They use it to export entities as single String objects. For instance, in the method PlainTextDocumentReaderAndWriter.printAnswersInlineXML, we have:
private void printAnswersInlineXML(List<IN> doc, PrintWriter out) {
final String background = flags.backgroundSymbol;
String prevTag = background;
for (Iterator<IN> wordIter = doc.iterator(); wordIter.hasNext();) {
IN wi = wordIter.next();
String tag = StringUtils.getNotNullString(wi.get(AnswerAnnotation.class));
String before = StringUtils.getNotNullString(wi.get(BeforeAnnotation.class));
String current = StringUtils.getNotNullString(wi.get(CoreAnnotations.OriginalTextAnnotation.class));
if (!tag.equals(prevTag)) {
if (!prevTag.equals(background) && !tag.equals(background)) {
out.print("</");
out.print(prevTag);
out.print('>');
out.print(before);
out.print('<');
out.print(tag);
out.print('>');
} else if (!prevTag.equals(background)) {
out.print("</");
out.print(prevTag);
out.print('>');
out.print(before);
} else if (!tag.equals(background)) {
out.print(before);
out.print('<');
out.print(tag);
out.print('>');
}
} else {
out.print(before);
}
out.print(current);
String afterWS = StringUtils.getNotNullString(wi.get(AfterAnnotation.class));
if (!tag.equals(background) && !wordIter.hasNext()) {
out.print("</");
out.print(tag);
out.print('>');
prevTag = background;
} else {
prevTag = tag;
}
out.print(afterWS);
}
}
They iterate over each word, checking whether it has the same class (answer) as the previous one, as explained before. For this, they take advantage of the fact that expressions not considered entities are flagged with the so-called backgroundSymbol (class "O"). They also use the property BeforeAnnotation, which holds the string separating the current word from the previous one. This last point solves the problem I initially raised about choosing an appropriate separator.
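Code for the classifyToCharacterOffsets approach:
// each Triple is (entity label, begin offset, end offset) into the original text
List<Triple<String, Integer, Integer>> result = classifier.classifyToCharacterOffsets(text);
for (Triple<String, Integer, Integer> triple : result) {
    System.out.println(triple.first + " : " + text.substring(triple.second, triple.third));
}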
List<List<CoreLabel>> out = classifier.classify(text);
for (List<CoreLabel> sentence : out) {
    String s = "";
    String prevLabel = null;
    for (CoreLabel word : sentence) {
        String label = word.get(CoreAnnotations.AnswerAnnotation.class);
        if (prevLabel == null || prevLabel.equals(label)) {
            // first word, or same label as the previous word: extend the group
            s = s + " " + word.word();
        } else {
            // label changed: print the finished group unless it is background ("O")
            if (!prevLabel.equals("O"))
                System.out.println(s.trim() + '/' + prevLabel);
            s = word.word();
        }
        prevLabel = label;
    }
    // print the last group; prevLabel stays null for an empty sentence
    if (prevLabel != null && !prevLabel.equals("O"))
        System.out.println(s.trim() + '/' + prevLabel);
}
I just wrote a small piece of logic and it's working fine: it groups adjacent words that have the same label.
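The BeforeAnnotation idea from the printAnswersInlineXML excerpt can be tried out without the library. In this sketch, Token is a hypothetical stand-in for CoreLabel (its before field plays the role of BeforeAnnotation), and the token list is hand-made rather than real tokenizer output: entities are grouped by adjacent equal labels and rebuilt with the original separators instead of a hard-coded space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EntityJoiner {

    // Hypothetical stand-in for a CoreLabel: the token text, the original text that
    // preceded it (like BeforeAnnotation), and its NER label.
    static final class Token {
        final String text, before, label;
        Token(String text, String before, String label) {
            this.text = text; this.before = before; this.label = label;
        }
    }

    // Groups adjacent tokens with the same non-background label and rebuilds each
    // entity using the original separators.
    static List<String> entities(List<Token> tokens, String background) {
        List<String> result = new ArrayList<>();
        StringBuilder current = null;
        String prevLabel = background;
        for (Token t : tokens) {
            if (!t.label.equals(background) && t.label.equals(prevLabel)) {
                current.append(t.before).append(t.text);   // same entity continues
            } else {
                if (current != null) result.add(current.toString());
                current = t.label.equals(background) ? null : new StringBuilder(t.text);
            }
            prevLabel = t.label;
        }
        if (current != null) result.add(current.toString());
        return result;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
                new Token("Jack", "", "PERSON"), new Token(",", "", "O"),
                new Token("James", " ", "PERSON"), new Token("and", " ", "O"),
                new Token("Joe", " ", "PERSON"), new Token("Smith", " ", "PERSON"),
                new Token("met", " ", "O"),
                new Token("J.", " ", "PERSON"), new Token("Zorn", "", "PERSON"));
        // prints: [Jack, James, Joe Smith, J.Zorn]
        System.out.println(entities(tokens, "O"));
    }
}
```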
Make use of the classifiers already provided to you. I believe this is what you are looking for:
private static String combineNERSequence(String text) throws IOException, ClassNotFoundException {
    String serializedClassifier = "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz";
    // getClassifier can throw IOException or ClassNotFoundException; let the caller handle them
    // instead of printing the stack trace and then using a null classifier
    AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifier(serializedClassifier);
    // inline XML wraps each run of same-label tokens in one tag, e.g. <PERSON>Joe Smith</PERSON>
    String xml = classifier.classifyWithInlineXML(text);
    System.out.println(xml);
    // FOR TSV FORMAT //
    //System.out.print(classifier.classifyToString(text, "tsv", false));
    return xml;
}
Here is my full code. I use Stanford CoreNLP and wrote an algorithm to concatenate multi-term names.
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import org.apache.log4j.Logger;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
/**
* Created by Chanuka on 8/28/14 AD.
*/
public class FindNameEntityTypeExecutor {
private static Logger logger = Logger.getLogger(FindNameEntityTypeExecutor.class);
private StanfordCoreNLP pipeline;
public FindNameEntityTypeExecutor() {
logger.info("Initializing Annotator pipeline ...");
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
pipeline = new StanfordCoreNLP(props);
logger.info("Annotator pipeline initialized");
}
List<String> findNameEntityType(String text, String entity) {
logger.info("Finding entity type matches in the " + text + " for entity type, " + entity);
// create an empty Annotation just with the given text
Annotation document = new Annotation(text);
// run all Annotators on this text
pipeline.annotate(document);
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
List<String> matches = new ArrayList<String>();
for (CoreMap sentence : sentences) {
int previousCount = 0;
int count = 0;
// traversing the words in the current sentence
// a CoreLabel is a CoreMap with additional token-specific methods
for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
String word = token.get(CoreAnnotations.TextAnnotation.class);
int previousWordIndex;
if (entity.equals(token.get(CoreAnnotations.NamedEntityTagAnnotation.class))) {
count++;
if (previousCount != 0 && (previousCount + 1) == count) {
previousWordIndex = matches.size() - 1;
String previousWord = matches.get(previousWordIndex);
matches.remove(previousWordIndex);
previousWord = previousWord.concat(" " + word);
matches.add(previousWordIndex, previousWord);
} else {
matches.add(word);
}
previousCount = count;
}
else
{
count=0;
previousCount=0;
}
}
}
return matches;
}
}
Another approach to dealing with multi-word entities.
This code combines multiple tokens if they have the same annotation and appear in a row.
Restriction:
If the same entity text appears with two different annotations, only the last one is saved.
private Document getEntities(String fullText) {
    Document entitiesList = new Document();
    NERClassifierCombiner nerCombClassifier = loadNERClassifiers();
    if (nerCombClassifier != null) {
        List<List<CoreLabel>> results = nerCombClassifier.classify(fullText);
        for (List<CoreLabel> coreLabels : results) {
            String prevLabel = null;
            String prevToken = null;
            for (CoreLabel coreLabel : coreLabels) {
                String word = coreLabel.word();
                String annotation = coreLabel.get(CoreAnnotations.AnswerAnnotation.class);
                if (!"O".equals(annotation)) {
                    if (prevLabel == null) {
                        // start of a new entity
                        prevLabel = annotation;
                        prevToken = word;
                    } else if (prevLabel.equals(annotation)) {
                        // same label in a row: extend the entity
                        prevToken += " " + word;
                    } else {
                        // a different label starts right away: save the previous entity first
                        entitiesList.put(prevToken, prevLabel);
                        prevLabel = annotation;
                        prevToken = word;
                    }
                } else if (prevLabel != null) {
                    // an "O" token ends the current entity
                    entitiesList.put(prevToken, prevLabel);
                    prevLabel = null;
                }
            }
            // save an entity that runs to the end of the sentence
            if (prevLabel != null) {
                entitiesList.put(prevToken, prevLabel);
            }
        }
    }
    return entitiesList;
}
Imports:
import java.util.List;
import org.bson.Document;
import edu.stanford.nlp.ie.NERClassifierCombiner;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
I need to have a cross-platform newline reference to parse files, and I'm trying to find a way to do the equivalent of the usual
System.getProperty("line.separator");
but when I try that in J2ME, I get null, so I'm guessing line.separator isn't defined there. Are there any other direct ways to get a universal newline sequence as a String in J2ME?
It seems I forgot to answer my own question. I used a piece of code that lets me pass "\r\n" as the delimiter, with \r and \n each treated as a separate delimiter character (runs of delimiters are collapsed, so empty lines are skipped):
import java.util.Vector;
public class Tokenizer {
public static String[] tokenize(String str, String delimiter) {
StringBuffer strtok = new StringBuffer();
Vector buftok = new Vector();
char[] ch = str.toCharArray(); //convert to char array
for (int i = 0; i < ch.length; i++) {
if (delimiter.indexOf(ch[i]) != -1) { //if i-th character is a delimiter
if (strtok.length() > 0) {
buftok.addElement(strtok.toString());
strtok.setLength(0);
}
}
else {
strtok.append(ch[i]);
}
}
if (strtok.length() > 0) {
buftok.addElement(strtok.toString());
}
String[] splitArray = new String[buftok.size()];
for (int i=0; i < splitArray.length; i++) {
splitArray[i] = (String)buftok.elementAt(i);
}
buftok = null;
return splitArray;
}
}
I don't think "line.separator" is a system property in JME. Take a look at this documentation in the SDN FAQ for MIDP developers: What are the defined J2ME system property names?
Why do you need the line separator anyway? As far as I know, you can use "\n" in JME.
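If the goal is just to parse files whose line endings may be \r\n, \n or \r, the platform separator is not needed at all. Here is a sketch (LineSplitter and splitLines are made-up names) that recognizes all three endings and, unlike the delimiter-set Tokenizer above, keeps empty lines. It sticks to Vector and StringBuffer without generics so it should also compile on CLDC, though I haven't tested it on a device.

```java
import java.util.Vector;

public class LineSplitter {

    // Splits text into lines, accepting "\r\n", "\n" and "\r" as line endings.
    // Empty lines are kept; a final line terminator does not add an extra empty line.
    public static String[] splitLines(String text) {
        Vector lines = new Vector();
        StringBuffer line = new StringBuffer();
        int n = text.length();
        for (int i = 0; i < n; i++) {
            char c = text.charAt(i);
            if (c == '\r' || c == '\n') {
                lines.addElement(line.toString());
                line.setLength(0);
                // treat "\r\n" as a single line break
                if (c == '\r' && i + 1 < n && text.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                line.append(c);
            }
        }
        // the last line has no terminator; add it only if it is non-empty
        if (line.length() > 0) {
            lines.addElement(line.toString());
        }
        String[] result = new String[lines.size()];
        lines.copyInto(result);
        return result;
    }

    public static void main(String[] args) {
        String[] lines = splitLines("first\r\nsecond\n\nfourth\rfifth");
        for (int i = 0; i < lines.length; i++) {
            System.out.println(i + ": [" + lines[i] + "]");
        }
    }
}
```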