I tried the solutions proposed online: saving the file in ANSI, deleting the first line, changing the attributes from real to numeric (as shown below), and even adding a '}' symbol at line 29. I still get the following error in WEKA when I try to import the ARFF file.
Error Message:
Unable to determine structure as arff(Reason:java.io.IOException: } expected at end of enumeration, read Token[EOL],line 29)
ARFF file:
#relation Pilot
#attribute Gender? { Male (Lelaki),Female (Perempuan) }
#attribute Age? numeric
#attribute 1# numeric
#attribute 2# numeric
#attribute 22#{Nothing_to_Carry,Need_to_carry_many_things}
#attribute 14# numeric
#attribute 3# numeric
#attribute 18# numeric
#attribute 17# numeric
#attribute 4# numeric
#attribute 5# numeric
#attribute 15# numeric
#attribute 16# numeric
#attribute 19# {No,Yes}
#attribute 20# {Yes,No}
#attribute 6# numeric
#attribute 7# numeric
#attribute 8# numeric
#attribute 9# numeric
#attribute 11# numeric
#attribute 10# numeric
#attribute 12# numeric
#attribute 13# numeric
#attribute 21#{No,Yes}
#attribute Physical_Disability{Partially_Visually_Impaired,Blind}
#attribute 23#{Yes,Don't_know,No}
#attribute 24#{No,Don't know,Yes}
#attribute 25#{Yes,No}
}
#data
Male,36,2,3,Nothing_to_Carry,1,3,2,3,3,2,1,2,No,Yes,3,5,5,4,5,4,3,3,No,Partially_Visually_Impaired,Yes,No,Yes
Female,44,3,3,Nothing_to_Carry,3,4,3,3,4,3,1,1,No,Yes,4,4,3,2,3,3,4,4,No,Partially_Visually_Impaired,Yes,No,Yes
Male,34,3,4,Nothing_to_Carry,3,3,2,1,4,3,2,1,No,Yes,1,4,3,1,5,3,4,5,No,Blind,Yes,Don't know,Yes
Male,56,1,3,Nothing_to_Carry,3,4,4,4,4,3,3,3,No,Yes,1,5,5,5,3,3,5,1,Yes,Blind,Don't know,Yes,Yes
Male,54,5,5,Nothing_to_Carry,1,1,1,5,5,5,1,5,No,Yes,1,5,5,1,5,1,1,5,Yes,Blind,Yes,No,Yes
Female,39,1,1,Nothing_to_Carry,1,2,1,5,3,5,5,5,Yes,Yes,3,3,5,1,1,5,5,5,Yes,Blind,Yes,Yes,Yes
Male,49,2,3,Nothing_to_Carry,2,2,3,4,4,4,3,3,No,Yes,1,3,3,4,3,3,4,4,No,Partially_Visually_Impaired,No,No,Yes
Male,68,5,4,Nothing_to_Carry,4,4,2,5,2,3,3,3,No,No,1,2,3,1,3,3,3,4,No,Blind,Yes,Don't know,No
Male,44,1,1,Nothing_to_Carry,1,3,3,3,3,3,3,1,No,Yes,1,5,4,4,3,4,2,2,Yes,Blind,Yes,Yes,Yes
Male,45,1,1,Nothing_to_Carry,1,2,1,1,1,1,3,1,No,Yes,5,5,1,5,5,5,5,5,No,Partially_Visually_Impaired,No,No,Yes
Male,59,3,4,Nothing_to_Carry,4,3,3,3,3,3,3,3,No,No,2,1,3,1,4,3,4,2,No,Blind,Yes,Yes,No
Male,38,3,3,Nothing_to_Carry,4,4,3,4,4,3,3,3,No,Yes,4,2,4,1,2,3,3,3,No,Partially_Visually_Impaired,Yes,No,Yes
Male,29,4,2,Nothing_to_Carry,4,4,4,4,3,4,4,3,Yes,Yes,4,3,3,3,3,3,4,3,No,Blind,Yes,No,Yes
}
Please advise...Thank you.
ARFF does not need a closing brace at the end of the attribute section or the data section, so remove them.
Attribute names must start with an alphabetic character.
If a nominal value contains a space, it must be quoted. Here, the values of the Gender? and 24# attributes need quoting, e.g. 'Male (Lelaki)'.
Please also check whether a space is needed between the attribute name and the attribute datatype, even for nominal attributes.
Also make sure that each data line has exactly as many values as there are attributes declared in the attribute section.
If these points do not remove the error, please check the ARFF file format details at http://www.cs.waikato.ac.nz/ml/weka/arff.html
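To make the checklist above concrete, here is a rough Ruby sketch that scans an ARFF file for those problems. It is not Weka's parser, and the function names, the trimmed-down sample header and the attribute name q23 are made up for illustration. It assumes the standard @relation, @attribute and @data keywords, unquoted attribute names, and no commas inside values.

```ruby
# Rough ARFF sanity check: a sketch, not a replacement for Weka's own parser.
# Assumptions: attribute names are unquoted, and no value contains a comma.

# Hypothetical, trimmed-down file with the fixes applied (names are made up).
SAMPLE = <<~ARFF
  @relation Pilot
  @attribute Gender {'Male (Lelaki)','Female (Perempuan)'}
  @attribute Age numeric
  @attribute q23 {Yes,Dont_know,No}
  @data
  'Male (Lelaki)',36,Yes
  'Female (Perempuan)',44,Dont_know
ARFF

# True if a nominal value is safe as written: either quoted, or free of
# whitespace and apostrophes.
def value_ok?(value)
  value.match?(/\A'.*'\z/) || value.match?(/\A".*"\z/) || !value.match?(/[\s']/)
end

# Returns a list of human-readable problems found in the ARFF text.
def arff_problems(text)
  problems = []
  attributes = 0
  in_data = false
  text.each_line.with_index(1) do |raw, lineno|
    line = raw.strip
    next if line.empty? || line.start_with?("%") # blank lines and comments
    if in_data
      # Every data row needs one value per declared attribute.
      count = line.split(",").size
      problems << "line #{lineno}: #{count} values, expected #{attributes}" if count != attributes
    elsif line.match?(/\A@data\z/i)
      in_data = true
    elsif line.match?(/\A@relation\b/i)
      next
    elsif (m = line.match(/\A@attribute\s+([^\s{]+)\s*(.*)\z/i))
      attributes += 1
      name, type = m[1], m[2]
      problems << "line #{lineno}: name #{name} does not start with a letter" unless name.match?(/\A[A-Za-z]/)
      next unless type.start_with?("{") # only nominal types are checked further
      unless type.end_with?("}")
        problems << "line #{lineno}: missing closing }"
        next
      end
      type[1..-2].split(",").map(&:strip).each do |value|
        problems << "line #{lineno}: value #{value} needs quotes" unless value_ok?(value)
      end
    else
      # Anything else in the header, such as a stray }, does not belong there.
      problems << "line #{lineno}: unexpected line #{line}"
    end
  end
  problems
end

problems = arff_problems(SAMPLE)
puts problems.empty? ? "no problems found" : problems
```

To check a real file, pass its contents instead of the sample, e.g. arff_problems(File.read("pilot.arff")), where pilot.arff is a placeholder name.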
Next time consider editing your question instead of writing a question in an answer.
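The problem is the values Don't_know and Don't know in attributes 23# and 24#: change both to Dont_know, in the header and in the data rows. In other words, leave punctuation and spaces out of nominal values. (Weka's ARFF reader treats an unquoted apostrophe as the start of a quoted string, which is why it reaches the end of the line while still expecting the closing }.)
You also need to make the Gender? labels agree with the data. Either change #attribute Gender? { Male(Lelaki),Female(Perempuan) } to #attribute Gender? { Male,Female },
or change the data rows to use the full labels, i.e. Male(Lelaki),36,2,3,Nothing_to_Carry,1,3,... instead of Male,36,2,3,Nothing_to_Carry,1,3...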
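If there are many labels to fix, the renaming can be scripted. This is only a sketch of the "no punctuation, no spaces" rule described above; arff_label and clean_row are hypothetical helper names, and the row splitting assumes that no field contains a comma. The same cleanup has to be applied to the header values and to the data rows so that they still match.

```ruby
# Turn a free-text label into a plain token: whitespace becomes "_",
# and anything other than letters, digits, "_", "." and "-" is dropped.
def arff_label(text)
  text.strip.gsub(/\s+/, "_").gsub(/[^A-Za-z0-9_.-]/, "")
end

# Apply the same cleanup to every field of a comma-separated data row
# (assumes no field contains a comma).
def clean_row(row)
  row.chomp.split(",", -1).map { |field| arff_label(field) }.join(",")
end

puts arff_label("Don't know")                        # prints Dont_know
puts clean_row("Male,34,Blind,Yes,Don't know,Yes")   # prints Male,34,Blind,Yes,Dont_know,Yes
```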
Try opening your .csv dataset in Weka and letting Weka write the .arff file for you.
The steps are:
1. Install Weka and go to the "Experimenter" tab.
2. Go to the "analyze" tab.
3. Select the .csv dataset by opening it through the "file" option.
4. Click the "open explorer" button.
5. Save the opened dataset in .arff format under a name of your choice.
6. Close Weka.
7. Open Weka again and go to the "Explorer" tab.
8. Open the previously saved .arff dataset from there.
After that, this type of error no longer occurs, and you can do preprocessing, classification, and association rules easily in Weka.
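If you would rather script the conversion than click through the GUI, the same idea (generate the ARFF from the CSV instead of writing it by hand) can be sketched in Ruby. This is not what Weka does internally; csv_to_arff and arff_quote are hypothetical names, and the type inference is deliberately naive: a column is numeric only if every non-empty value parses as a number, otherwise it is nominal. I believe Weka accepts a backslash-escaped apostrophe inside a quoted value, but treat that as an assumption and check it against your version.

```ruby
require "csv"

# Quote a name or value unless it is a plain token starting with a letter.
# An apostrophe inside the text is escaped with a backslash (assumed to be
# accepted by Weka).
def arff_quote(text)
  return text if text.match?(/\A[A-Za-z][A-Za-z0-9_.-]*\z/)
  "'" + text.gsub("'") { "\\'" } + "'"
end

# Build ARFF text from CSV text that has a header row.
def csv_to_arff(csv_text, relation)
  table = CSV.parse(csv_text, headers: true)
  lines = ["@relation #{arff_quote(relation)}", ""]
  numeric = {}
  table.headers.each do |name|
    values = table[name].compact.reject(&:empty?)
    # Numeric only if every non-empty value parses as a number.
    numeric[name] = values.all? { |v| Float(v, exception: false) }
    type = if numeric[name]
             "numeric"
           else
             "{" + values.uniq.map { |v| arff_quote(v) }.join(",") + "}"
           end
    lines << "@attribute #{arff_quote(name)} #{type}"
  end
  lines << "" << "@data"
  table.each do |row|
    cells = table.headers.map do |name|
      value = row[name]
      if value.nil? || value.empty?
        "?" # ARFF marker for a missing value
      elsif numeric[name]
        value
      else
        arff_quote(value)
      end
    end
    lines << cells.join(",")
  end
  lines.join("\n") + "\n"
end

# Made-up three-column sample in the spirit of the question's data.
csv = <<~CSV
  Gender,Age,Answer
  Male (Lelaki),36,Yes
  Female (Perempuan),44,Don't know
CSV
puts csv_to_arff(csv, "Pilot")
```

Because the nominal values in the header are collected from the data itself, the header and the rows cannot disagree, which is the main source of the errors discussed above.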
I'm sure this is an issue anyone who uses Stata for publications or reports has run into:
How do you conveniently export your output to something that can be parsed by a scripting language or Excel?
There are a few ado files that do this for specific commands. For example:
findit tabout
findit outreg2
But what about exporting the output of the table command? Or the results of an anova?
I would love to hear about how Stata users address this problem for either specific commands or in general.
After experimenting with this for a while, I've found a solution that works for me.
There are a variety of ADOs that handle exporting specific functions. I've made use of outreg2 for regressions and tabout for summary statistics.
For simpler commands, it's easy to write your own programs that save results automatically to plain text in a standard format. Here are a few I wrote. Note that these both display the results (so they can be saved to a log file) and export them to text files. If you only want the text files, you could remove the di lines and prefix the sum, tab, etc. commands with qui:
* Note: mat2txt is a community-contributed command (ssc install mat2txt).

* sumout varname filename: detailed summary of one variable
cap program drop sumout
program define sumout
    di ""
    di ""
    di "Summary of `1'"
    di ""
    sum `1', d
    qui matrix X = (r(mean), r(sd), r(p50), r(min), r(max))
    qui matrix colnames X = mean sd median min max
    qui mat2txt, matrix(X) saving("`2'") replace
end

* tab2_chi_out var1 var2 filename: two-way tabulation with chi-squared test
cap program drop tab2_chi_out
program define tab2_chi_out
    di ""
    di ""
    di "Tabulation of `1' and `2'"
    di ""
    tab `1' `2', chi2
    qui matrix X = (r(p), r(chi2))
    qui matrix colnames X = chi2p chi2
    qui mat2txt, matrix(X) saving("`3'") replace
end

* oneway_out dv iv filename: one-way ANOVA, with the p-value computed from F
cap program drop oneway_out
program define oneway_out
    di ""
    di ""
    di "Oneway anova with dv = `1' and iv = `2'"
    di ""
    oneway `1' `2'
    qui matrix X = (r(F), r(df_r), r(df_m), Ftail(r(df_m), r(df_r), r(F)))
    qui matrix colnames X = anova_between_groups_F within_groups_df between_groups_df P
    qui mat2txt, matrix(X) saving("`3'") replace
end

* anova_out "anova arguments" filename: anova, with adjusted R-squared
cap program drop anova_out
program define anova_out
    di ""
    di ""
    di "Anova command: anova `1'"
    di ""
    anova `1'
    qui matrix X = (e(F), e(df_r), e(df_m), Ftail(e(df_m), e(df_r), e(F)), e(r2_a))
    qui matrix colnames X = anova_between_groups_F within_groups_df between_groups_df P RsquaredAdj
    qui mat2txt, matrix(X) saving("`2'") replace
end
The question is then how to get the output into Excel and format it. I found that the best way to import the text output files from Stata into Excel is to concatenate them into one big text file and then import that single file using the Import Text File... feature in Excel.
I concatenate the files by placing this Ruby code in the output folder and then running it from my do-file with qui shell cd path/to/output/folder/ && ruby table.rb:
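# Concatenate every output file in this folder into out.txt,
# writing each file's name above its contents.
output = ""
Dir.children(".").each do |file|
  # Skip hidden files, this script, and the combined output itself.
  next if file.start_with?(".") || file == "table.rb" || file == "out.txt"
  # Skip subdirectories; only regular files can be read.
  next unless File.file?(file)
  # Delete stray XML files rather than including them.
  if file =~ /xml/
    File.delete(file)
    next
  end
  output << "\n\n#{file}\n\n" << File.binread(file)
end
File.write("out.txt", output)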
Once I import out.txt into its own sheet in Excel, I use a bunch of Excel's built-in functions to pull the data together into nice, pretty tables.
I use a combination of vlookup, offset, match, iferror, and hidden columns with cell numbers and filenames to do this. The source .txt file is included in out.txt just above the contents of that file, which lets you look up the contents of the file using these functions and then reference specific cells using vlookup and offset.
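For anyone who would rather read out.txt from a script than from Excel, the same "file name above its contents" layout can be parsed directly. The sketch below rests on assumptions that are not confirmed above: that every source file name ends in .txt, that each table is tab-delimited with the column names in its first row, and that no table cell is itself a bare file name. parse_combined and lookup are hypothetical helpers, and the numbers in the sample are made up.

```ruby
# Split the combined text into { "file name" => [[cell, ...], ...] }.
# A line holding nothing but a *.txt name starts a new section.
def parse_combined(text)
  sections = {}
  current = nil
  text.each_line do |raw|
    line = raw.chomp
    if line.match?(/\A\S+\.txt\z/)
      current = sections[line] = []
    elsif current && !line.strip.empty?
      current << line.split("\t")
    end
  end
  sections
end

# Value under a named column in the first data row of one file's table,
# or nil if the column is not there (roughly what vlookup + offset do).
def lookup(tables, file, column)
  header, first_row = tables.fetch(file)
  index = header.index(column)
  index && first_row[index]
end

# Made-up sample in the layout written by table.rb.
sample = "\n\nsum_age.txt\n\n\tmean\tsd\nr1\t46.5\t11.2\n"
tables = parse_combined(sample)
puts lookup(tables, "sum_age.txt", "mean")
```

With a real run you would pass File.read("out.txt") to parse_combined instead of the sample string.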
This Excel business is actually the most complicated part of this system and there's really no good way to explain it without showing you the file, though hopefully you can get enough of an idea to figure it out for yourself. If not, feel free to contact me through http://maxmasnick.com and I can get you more info.
I have found that the estout package is the most developed and has good documentation.
This is an old question and a lot has happened since it was posted.
Stata now has several built-in commands and functions that allow anyone to
export customized output fairly easily:
putexcel
putexcel with advanced syntax
putdocx
putpdf
There are also equivalent Mata functions / classes, which offer greater flexibility:
_docx*()
Pdf*()
xl()
From my experience, there aren't 100% general solutions. Community-contributed commands such as estout are now mature enough to handle most basic operations. That said, if you have something that deviates even slightly from the template you will have to program this yourself.
Most tutorials throw in several packages, where it would indeed be nice to have a single one that exports everything, which is what Max suggests above with his interesting method.
I personally use tabout for summary statistics and frequencies, estout for regression output, and am trying out mkcorr for correlation matrices.
It's been a while, but I believe you can issue a log command to capture the output.
log using c:\data\anova_analysis.log, text
[commands]
log close
I use estpost, part of the estout package, to tabulate results from non-estimation commands. You can then store them and export them easily.
Here's an example:
estpost corr varA varB varC varD, matrix
est store corrs
esttab corrs using corrs.rtf, replace
You can then add options to change formatting, etc.
You can use asdoc, which is available on SSC. To install it:
ssc install asdoc
asdoc works well with almost all Stata commands. Specifically, it produces publication-quality tables for:
summarize command - to report summary statistics
cor or pwcorr command - to report correlations
tabstat - for flexible tables of descriptive statistics
tabulate - for one-way, two-way, three-way tabulations
regress - for detailed, nested, and wide regression tables
table - flexible tables
and many more. You can explore more about asdoc here
https://fintechprofessor.com/2018/01/31/asdoc/