Could you please point out where the bug is in my code?
I have a simple text file with the following data structure:
something1
something2
something3
...
It results in a String[] where every element is the last line of the file. I can't find the mistake, but it goes wrong somewhere around the line.setLength(0);
Any ideas?
public String[] readText() throws IOException {
    InputStream file = getClass().getResourceAsStream("/questions.txt");
    DataInputStream in = new DataInputStream(file);
    StringBuffer line = new StringBuffer();
    Vector lines = new Vector();
    int c;
    try {
        while ((c = in.read()) != -1) {
            if ((char) c == '\n') {
                if (line.length() > 0) {
                    // debug
                    //System.out.println(line.toString());
                    lines.addElement(line);
                    line.setLength(0);
                }
            } else {
                line.append((char) c);
            }
        }
        if (line.length() > 0) {
            lines.addElement(line);
            line.setLength(0);
        }
        String[] splitArray = new String[lines.size()];
        for (int i = 0; i < splitArray.length; i++) {
            splitArray[i] = lines.elementAt(i).toString();
        }
        return splitArray;
    } catch (Exception e) {
        System.out.println(e.getMessage());
        return null;
    } finally {
        in.close();
    }
}
I see one obvious error - you're storing the same StringBuffer instance multiple times in the Vector, and then you clear that same instance with setLength(0), so every element ends up showing the buffer's final contents. I'm guessing you want to do something like this
StringBuffer s = new StringBuffer();
Vector v = new Vector();
...
String bufferContents = s.toString();
v.addElement(bufferContents);
s.setLength(0);
// now it's ok to reuse s
...
If your problem is to read the contents of the file into a String[], then you could actually use Apache Commons' FileUtils class to read the lines into a List and then convert that to an array.
List<String> fileContentsInList = FileUtils.readLines(new File("filename"));
String[] fileContentsInArray = new String[fileContentsInList.size()];
fileContentsInArray = (String[]) fileContentsInList.toArray(fileContentsInArray);
In the code that you have specified, rather than setting length to 0, you can reinitialize the StringBuffer.
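For example, here is a minimal sketch of the fixed loop body from the question (either variant works; storing line.toString() is the cheaper of the two):
// Sketch: add an immutable snapshot of the buffer, not the buffer itself,
// so clearing it afterwards cannot affect what is already in the Vector.
if ((char) c == '\n') {
    if (line.length() > 0) {
        lines.addElement(line.toString()); // snapshot the current line
        line.setLength(0);                 // now safe to reuse the buffer
    }
} else {
    line.append((char) c);
}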
Related
I am trying to write a text classifier in Weka with Naive Bayes. I have a collection of Foursquare tips as training data, with close to 500 of them marked as positive and approximately the same number marked as negative in an Excel file. The input file has two columns, the first being the tip text and the second the marked polarity. I am using AFINN-111.txt to add an attribute to enhance the output: it sums the scores of all the polar words in a tip and gives a final score. Here is my entire code:
public class DataReader {
    static Map<String, Integer> affinMap = new HashMap<String, Integer>();

    public List<List<Object>> createAttributeList() {
        ClassLoader classLoader = getClass().getClassLoader();
        initializeAFFINMap(classLoader);
        File inputWorkbook = new File(classLoader
                .getResource("Tip_dataset2.xls").getFile());
        Workbook w;
        Sheet sheet = null;
        try {
            w = Workbook.getWorkbook(inputWorkbook);
            // Get the first sheet
            sheet = w.getSheet(0);
        } catch (Exception e) {
            e.printStackTrace();
        }
        List<List<Object>> attributeList = new ArrayList<List<Object>>();
        for (int i = 1; i < sheet.getRows(); i++) {
            String tip = sheet.getCell(0, i).getContents();
            tip = tip.replaceAll("'", "");
            tip = tip.replaceAll("\"", "");
            tip = tip.replaceAll("%", " percent");
            tip = tip.replaceAll("@", " ATAUTHOR");
            String polarity = getPolarity(sheet.getCell(1, i).getContents());
            int affinScore = 0;
            String[] arr = tip.split(" ");
            for (int j = 0; j < arr.length; j++) {
                if (affinMap.containsKey(arr[j].toLowerCase())) {
                    affinScore = affinScore
                            + affinMap.get(arr[j].toLowerCase());
                }
            }
            List<Object> attrs = new ArrayList<Object>();
            attrs.add(tip);
            attrs.add(affinScore);
            attrs.add(polarity);
            attributeList.add(attrs);
        }
        return attributeList;
    }

    private String getPolarity(String cell) {
        if (cell.equalsIgnoreCase("positive")) {
            return "positive";
        } else {
            return "negative";
        }
    }

    private void initializeAFFINMap(ClassLoader classLoader) {
        try {
            InputStream stream = classLoader
                    .getResourceAsStream("AFINN-111.txt");
            DataInputStream in = new DataInputStream(stream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String str;
            while ((str = br.readLine()) != null) {
                String[] array = str.split("\t");
                affinMap.put(array[0], Integer.parseInt(array[1]));
            }
            in.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<Object>> attrList = new DataReader().createAttributeList();
        new CreateTrainedModel().createTrainingData(attrList);
    }
}
Here is the actual classifier class:
public class CreateTrainedModel {
    public void createTrainingData(List<List<Object>> attrList)
            throws Exception {
        Attribute tip = new Attribute("tip", (FastVector) null);
        Attribute affin = new Attribute("affinScore");
        FastVector pol = new FastVector(2);
        pol.addElement("positive");
        pol.addElement("negative");
        Attribute polaritycl = new Attribute("polarity", pol);
        FastVector inputDataDesc = new FastVector(3);
        inputDataDesc.addElement(tip);
        inputDataDesc.addElement(affin);
        inputDataDesc.addElement(polaritycl);
        Instances dataSet = new Instances("dataset", inputDataDesc,
                attrList.size());
        // Set class index
        dataSet.setClassIndex(2);
        for (List<Object> onList : attrList) {
            Instance in = new Instance(3);
            in.setValue((Attribute) inputDataDesc.elementAt(0), onList.get(0)
                    .toString());
            in.setValue((Attribute) inputDataDesc.elementAt(1),
                    Integer.parseInt(onList.get(1).toString()));
            in.setValue((Attribute) inputDataDesc.elementAt(2), onList.get(2)
                    .toString());
            dataSet.add(in);
        }
        Filter f = new StringToWordVector();
        f.setInputFormat(dataSet);
        dataSet = Filter.useFilter(dataSet, f);
        Classifier model = (Classifier) new NaiveBayes();
        try {
            model.buildClassifier(dataSet);
        } catch (Exception e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        }
        ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(
                "FS-TipsNaiveBayes.model"));
        oos.writeObject(model);
        oos.flush();
        oos.close();
        FastVector fvWekaAttributes1 = new FastVector(3);
        fvWekaAttributes1.addElement(tip);
        fvWekaAttributes1.addElement(affin);
        Instance in = new Instance(3);
        in.setValue((Attribute) fvWekaAttributes1.elementAt(0),
                "burger here is good");
        in.setValue((Attribute) fvWekaAttributes1.elementAt(1), 0);
        Instances testSet = new Instances("dataset", fvWekaAttributes1, 1);
        in.setDataset(testSet);
        double[] fDistribution = model.distributionForInstance(in);
        System.out.println(fDistribution);
    }
}
The problem I am facing is that, for any input, the output distribution is always in the range of [0.52314376998377, 0.47685623001622995], and it always leans more towards the positive than the negative. These figures do not change drastically. Any idea what I am doing wrong?
I didn't read your code, but one thing I can say is that the AFINN score is normalized within a certain range. If your output always leans towards the positive range, then you need to change your classification cost function, because it is overfitting your data.
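As a rough illustration of what normalizing the score could look like (this is my own sketch, not code from the question; the helper name is hypothetical, and the only other assumption is the [-5, 5] per-word range that AFINN-111 uses):
// Hypothetical helper: squash a summed AFINN score into [-1, 1]
// so that long tips with many polar words do not dominate the feature.
static double normalizeAfinnScore(int rawScore, int wordCount) {
    if (wordCount == 0) {
        return 0.0;
    }
    // AFINN-111 assigns each word a score in [-5, 5], so the per-word
    // average is bounded; dividing by 5 maps it into [-1, 1].
    return (rawScore / (double) wordCount) / 5.0;
}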
As shown in the title, I read an OWL file generated by Protege into Jena, modified it by adding some NamedIndividuals, and then wanted to open the modified file in Protege again. Things went well until I opened this OWL file with Protege: Protege just cannot read it!
I tried every format - "RDF/XML", "RDF/XML - ABBREV", "N-TRIPLE", "TTL" - and still nothing.
One piece of good news is that when I use "RDF/XML - ABBREV" and delete all the NamedIndividuals I added, Protege works.
But I want my Individuals!!!
public class Pizza00 {
    public static void main(String[] args) throws IOException {
        String SOURCE = "http://www.seaice.com/ontologies/seaice.owl";
        String NS = SOURCE + "#";
        OntModel m = ModelFactory.createOntologyModel();
        try {
            m.read(new FileInputStream("G:/Protege/owl files/SeaIce.owl"), null);
        } catch (FileNotFoundException e1) {
            e1.printStackTrace();
        }
        reason rec = new reason();
        rec = readFileByLines("G:/data/seaice_property.txt");
        int i;
        OntClass tmp = m.getOntClass(NS + "SeaIceProperty");
        for (i = 1; i <= rec.line - 1; i++) {
            m.createIndividual(NS + rec.s[i], tmp);
        }
        m.write(System.out);
        OutputStream out = new FileOutputStream("G:/Protege/owl files/SeaIce - by jena.owl");
        m.write(out, "RDF/XML-ABBREV", null);
        out.close();
    }

    static class reason {
        String[] s = new String[1000];
        int line;
    }

    public static reason readFileByLines(String fileName) {
        File file = new File(fileName);
        BufferedReader reader = null;
        reason x = new reason();
        try {
            reader = new BufferedReader(new FileReader(file));
            String tempString = null;
            // read one line at a time until readLine returns null (end of file)
            while ((tempString = reader.readLine()) != null) {
                // store the line and advance the line counter
                x.s[x.line] = tempString;
                x.line++;
            }
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e1) {
                }
            }
        }
        return x;
    }
}
As you mention in the comments, the error occurs at this line:
<j.0:SeaIceProperty rdf:about="seaice.com/ontologies/seaice.owl#Optical Band Imagery"/>
And the error reported by Protege is
org.semanticweb.owlapi.rdf.syntax.RDFParserException:
[line=30:column=101] IRI 'seaice.com/ontologies/seaice.owl#Optical
Band Imagery' cannot be resolved against curent base IRI
file:/G:/Protege/owl%20files/SeaIce%20-%20by%20jena.owl
The not-quite IRI seaice.com/ontologies/seaice.owl#Optical Band Imagery is relative, and can't be resolved against the current base IRI, which is a file IRI: file:/G:/Protege/owl%20files/SeaIce%20-%20by%20jena.owl. The easiest way to fix this is to use absolute IRIs when you are creating your named individuals.
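For example, here is a minimal sketch of that fix applied to the question's loop (the underscore convention is my own choice; any space-free local name would do):
// Sketch: make sure each individual gets an absolute IRI with no spaces,
// which are not legal in IRIs, before creating it.
for (i = 1; i <= rec.line - 1; i++) {
    // "Optical Band Imagery" -> "Optical_Band_Imagery"
    String localName = rec.s[i].trim().replace(' ', '_');
    // NS is already absolute: "http://www.seaice.com/ontologies/seaice.owl#"
    m.createIndividual(NS + localName, tmp);
}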
I see that you can't use a StringTokenizer on an array because you can't convert a String to a String[]. After a while I realized that if the inputFromFile method reads the file line by line, I can tokenize it line by line. I just don't know how to do it so that it returns the tokenized version.
I'm assuming that at the line = in.readLine(); line I should put StringTokenizer token = new StringTokenizer(line, ","), but it doesn't seem to be working.
Any help? (I have to tokenize on the commas.)
public class Project1 {
    private static int inputFromFile(String filename, String[] wordArray) {
        TextFileInput in = new TextFileInput(filename);
        int lengthFilled = 0;
        String line = in.readLine();
        while (lengthFilled < wordArray.length && line != null) {
            wordArray[lengthFilled++] = line;
            line = in.readLine();
        } // while
        if (line != null) {
            System.out.println("File contains too many Strings.");
            System.out.println("This program can process only "
                    + wordArray.length + " Strings.");
            System.exit(1);
        } // if
        in.close();
        return lengthFilled;
    } // method inputFromFile

    public static void main(String[] args) {
        String[] numArray = new String[100];
        inputFromFile("input1.txt", numArray);
        for (int i = 0; i < numArray.length; i++) {
            if (numArray[i] == null) {
                break;
            }
            System.out.println(numArray[i]);
        } // for
        for (int i = 0; i < numArray.length; i++) {
            Integer.parseInt(numArray[i]);
        }
    } // main
} // project1
This is what I meant:
while (lengthFilled < wordArray.length && line != null) {
    String[] tokens = line.split(",");
    if (tokens == null || tokens.length == 0) {
        // line without the required token, add the whole line as it is
        wordArray[lengthFilled++] = line;
    } else {
        // add each token into wordArray
        for (int i = 0; i < tokens.length; i++) {
            wordArray[lengthFilled++] = tokens[i];
        }
    }
    line = in.readLine();
} // while
There can be other approaches as well. For instance, you can use a StringBuilder to read everything as one big string and then split it on your required tokens, as sketched below. The above logic is just to point you in the right direction.
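Here is a rough sketch of that alternative (it assumes the whole file fits comfortably in memory and reuses the question's TextFileInput reader):
// Alternative sketch: concatenate all lines into one string, then split once.
StringBuilder sb = new StringBuilder();
String line;
while ((line = in.readLine()) != null) {
    // normalize line breaks to the same delimiter before splitting
    sb.append(line).append(',');
}
String[] tokens = sb.toString().split(",");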
using System;

namespace jagged_array
{
    class Program
    {
        static void Main(string[] args)
        {
            string[][] Members = new string[10][]{
                new string[]{"amit", "amit@gmail.com", "9999999999"},
                new string[]{"chandu", "chandu@gmail.com", "8888888888"},
                new string[]{"naveen", "naveen@gmail.com", "7777777777"},
                new string[]{"ramu", "ramu@gmail.com", "6666666666"},
                new string[]{"durga", "durga@gmail.com", "5555555555"},
                new string[]{"sagar", "sagar@gmail.com", "4444444444"},
                new string[]{"yadav", "yadav@gmail.com", "3333333333"},
                new string[]{"suraj", "suraj@gmail.com", "2222222222"},
                new string[]{"niharika", "niharika@gmail.com", "11111111111"},
                new string[]{"anusha", "anusha@gmail.com", "0000000000"},
            };
            for (int i = 0; i < Members.Length; i++)
            {
                System.Console.Write("Name List ({0}):", i + 1);
                for (int j = 0; j < Members[i].Length; j++)
                {
                    System.Console.Write(Members[i][j] + "\t");
                }
                System.Console.WriteLine();
            }
            Console.ReadKey();
        }
    }
}
The above is the code for my C# console program, in which I used a jagged array and assigned the values manually. Now my requirement is to import the same details into my program from a CSV file (which is at some location on my disk) instead of assigning them manually. How do I do this, and what functions should I use? Please help me with some example. Thank you.
static void Main()
{
    string csv_file_path = @"C:\Users\Administrator\Desktop\test.csv";
    DataTable csvData = GetDataTabletFromCSVFile(csv_file_path);
    Console.WriteLine("Rows count:" + csvData.Rows.Count);
    Console.ReadLine();
}

private static DataTable GetDataTabletFromCSVFile(string csv_file_path)
{
    DataTable csvData = new DataTable();
    try
    {
        using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
        {
            csvReader.SetDelimiters(new string[] { "," });
            csvReader.HasFieldsEnclosedInQuotes = true;
            string[] colFields = csvReader.ReadFields();
            foreach (string column in colFields)
            {
                DataColumn datecolumn = new DataColumn(column);
                datecolumn.AllowDBNull = true;
                csvData.Columns.Add(datecolumn);
            }
            while (!csvReader.EndOfData)
            {
                string[] fieldData = csvReader.ReadFields();
                // Making empty values null
                for (int i = 0; i < fieldData.Length; i++)
                {
                    if (fieldData[i] == "")
                    {
                        fieldData[i] = null;
                    }
                }
                csvData.Rows.Add(fieldData);
            }
        }
    }
    catch (Exception ex)
    {
    }
    return csvData;
}
Treat the CSV file like an Excel workbook and you will find a lot of examples on the web for what you need to do.
ExcelFile ef = new ExcelFile();
// Loads file.
ef.LoadCsv("filename.csv");
// Selects first worksheet.
ExcelWorksheet ws = ef.Worksheets[0];
I won't go into details, but you can read lines of text from a file with File.ReadAllLines.
Once you have those lines, you can split each of them into parts using String.Split (at least this will work if the CSV file contains very simple information, as in your example).
Java ME is quite sparse on features. Please list your favourite utility functions that make using it more like using proper Java, one per answer. Try to make your answers specific to Java ME.
Small Logging Framework
MicroLog
http://microlog.sourceforge.net/site/
Splitting a string
static public String[] split(String str, char c)
{
    int l = str.length();
    if (l == 0)
    {
        return new String[] { "" };
    }
    // First pass: count the delimiters to size the result array.
    int count = 0;
    for (int i = 0; i < l; i++)
    {
        if (str.charAt(i) == c)
        {
            count++;
        }
    }
    int first = 0;
    int last = 0;
    int segment = 0;
    String[] array = new String[count + 1];
    // Second pass: copy out each segment between delimiters.
    for (int i = 0; i < l; i++)
    {
        if (str.charAt(i) == c)
        {
            last = i;
            array[segment++] = str.substring(first, last);
            first = last + 1; // skip past the delimiter itself
        }
        if (i == l - 1)
        {
            array[segment++] = str.substring(first, l);
        }
    }
    return array;
}
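A quick usage example for the helper above:
// Splitting a simple comma-separated record:
String[] parts = split("red,green,blue", ',');
// parts now holds {"red", "green", "blue"}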
Read a line from a reader.
public class LineReader {
    private Reader in;
    private int bucket = -1;

    public LineReader(Reader in) {
        this.in = in;
    }

    public boolean hasLine() throws IOException {
        if (bucket != -1) return true;
        bucket = in.read();
        return bucket != -1;
    }

    // Read a line, removing any \r and \n. Buffers the string.
    public String readLine() throws IOException {
        int tmp;
        StringBuffer out = new StringBuffer();
        // Read in data
        while (true) {
            // Check the bucket first. If empty, read from the input stream.
            if (bucket != -1) {
                tmp = bucket;
                bucket = -1;
            } else {
                tmp = in.read();
                if (tmp == -1) break;
            }
            // If new line, then discard it. If we get a \r, we need to look ahead, so use the bucket.
            if (tmp == '\r') {
                int nextChar = in.read();
                if (nextChar != '\n') bucket = nextChar; // swallows \r\n, but not \r\r
                break;
            } else if (tmp == '\n') {
                break;
            } else {
                // Otherwise just append the character
                out.append((char) tmp);
            }
        }
        return out.toString();
    }
}
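And a short usage sketch (assumes it runs inside an instance method; the resource name is just an example):
// Drain a Reader line by line with the LineReader above.
Reader source = new InputStreamReader(
        getClass().getResourceAsStream("/questions.txt"));
LineReader lr = new LineReader(source);
while (lr.hasLine()) {
    System.out.println(lr.readLine());
}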