Unable to determine structure as arff in WEKA - attributes

I tried the solutions proposed online: saving the file in ANSI, deleting the first line, changing the attributes from real to numeric (as shown below), and even adding a '}' symbol at line 29. I still get the following error when I try to import the ARFF file into WEKA.
Error Message:
Unable to determine structure as arff(Reason:java.io.IOException: } expected at end of enumeration, read Token[EOL],line 29)
ARFF file:
#relation Pilot
#attribute Gender? { Male (Lelaki),Female (Perempuan) }
#attribute Age? numeric
#attribute 1# numeric
#attribute 2# numeric
#attribute 22#{Nothing_to_Carry,Need_to_carry_many_things}
#attribute 14# numeric
#attribute 3# numeric
#attribute 18# numeric
#attribute 17# numeric
#attribute 4# numeric
#attribute 5# numeric
#attribute 15# numeric
#attribute 16# numeric
#attribute 19# {No,Yes}
#attribute 20# {Yes,No}
#attribute 6# numeric
#attribute 7# numeric
#attribute 8# numeric
#attribute 9# numeric
#attribute 11# numeric
#attribute 10# numeric
#attribute 12# numeric
#attribute 13# numeric
#attribute 21#{No,Yes}
#attribute Physical_Disability{Partially_Visually_Impaired,Blind}
#attribute 23#{Yes,Don't_know,No}
#attribute 24#{No,Don't know,Yes}
#attribute 25#{Yes,No}
}
#data
Male,36,2,3,Nothing_to_Carry,1,3,2,3,3,2,1,2,No,Yes,3,5,5,4,5,4,3,3,No,Partially_Visually_Impaired,Yes,No,Yes
Female,44,3,3,Nothing_to_Carry,3,4,3,3,4,3,1,1,No,Yes,4,4,3,2,3,3,4,4,No,Partially_Visually_Impaired,Yes,No,Yes
Male,34,3,4,Nothing_to_Carry,3,3,2,1,4,3,2,1,No,Yes,1,4,3,1,5,3,4,5,No,Blind,Yes,Don't know,Yes
Male,56,1,3,Nothing_to_Carry,3,4,4,4,4,3,3,3,No,Yes,1,5,5,5,3,3,5,1,Yes,Blind,Don't know,Yes,Yes
Male,54,5,5,Nothing_to_Carry,1,1,1,5,5,5,1,5,No,Yes,1,5,5,1,5,1,1,5,Yes,Blind,Yes,No,Yes
Female,39,1,1,Nothing_to_Carry,1,2,1,5,3,5,5,5,Yes,Yes,3,3,5,1,1,5,5,5,Yes,Blind,Yes,Yes,Yes
Male,49,2,3,Nothing_to_Carry,2,2,3,4,4,4,3,3,No,Yes,1,3,3,4,3,3,4,4,No,Partially_Visually_Impaired,No,No,Yes
Male,68,5,4,Nothing_to_Carry,4,4,2,5,2,3,3,3,No,No,1,2,3,1,3,3,3,4,No,Blind,Yes,Don't know,No
Male,44,1,1,Nothing_to_Carry,1,3,3,3,3,3,3,1,No,Yes,1,5,4,4,3,4,2,2,Yes,Blind,Yes,Yes,Yes
Male,45,1,1,Nothing_to_Carry,1,2,1,1,1,1,3,1,No,Yes,5,5,1,5,5,5,5,5,No,Partially_Visually_Impaired,No,No,Yes
Male,59,3,4,Nothing_to_Carry,4,3,3,3,3,3,3,3,No,No,2,1,3,1,4,3,4,2,No,Blind,Yes,Yes,No
Male,38,3,3,Nothing_to_Carry,4,4,3,4,4,3,3,3,No,Yes,4,2,4,1,2,3,3,3,No,Partially_Visually_Impaired,Yes,No,Yes
Male,29,4,2,Nothing_to_Carry,4,4,4,4,3,4,4,3,Yes,Yes,4,3,3,3,3,3,4,3,No,Blind,Yes,No,Yes
}
Please advise...Thank you.

ARFF files do not need a closing brace at the end of the attribute section or the data section, so remove both.
Attribute names must start with an alphabetic character.
If a nominal value contains a space, it must be quoted. Here, the values of the Gender and 24# attributes need quotes, e.g. 'Male (Lelaki)'.
Check whether a space is needed between the attribute name and the attribute datatype, even for nominal values.
Also make sure that each data line contains exactly as many values as there are attributes declared in the attribute section.
If the error persists after these changes, check the ARFF file format details at http://www.cs.waikato.ac.nz/ml/weka/arff.html
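The quoting rule above can be sketched in Python. This is only an illustration, assuming the standard ARFF `@attribute` keyword and assuming that WEKA accepts a backslash-escaped single quote inside a quoted value; `quote_arff`, `nominal_attribute`, and the attribute name `Q24` are hypothetical names, not part of WEKA.

```python
def quote_arff(value: str) -> str:
    """Quote a name or nominal value if it contains characters that
    ARFF treats specially (spaces, commas, braces, quotes, %)."""
    value = value.strip()
    special = set(" ,{}'\"%\t")
    if value and not any(ch in special for ch in value):
        return value  # plain token, no quoting needed
    # Assumption: backslash-escape backslashes and single quotes,
    # then wrap the whole value in single quotes.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def nominal_attribute(name: str, values: list[str]) -> str:
    """Build one '@attribute' line for a nominal attribute."""
    joined = ",".join(quote_arff(v) for v in values)
    return f"@attribute {quote_arff(name)} {{{joined}}}"


print(nominal_attribute("Gender", ["Male (Lelaki)", "Female (Perempuan)"]))
# Q24 is a hypothetical rename so the name starts with a letter.
print(nominal_attribute("Q24", ["No", "Don't know", "Yes"]))
```

Whatever quoting is used in the header, the same values must appear in the data rows, quoted the same way.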

Next time, consider editing your question instead of writing a question in an answer.
The problem is that you should change Don't_know and Don't know in attributes 23 and 24 to Dont_know; that is, leave out punctuation and spaces.
You also need to either change #attribute Gender? { Male(Lelaki),Female(Perempuan) } to #attribute Gender? { Male,Female }
or make your data look like Male(Lelaki),36,2,3,Nothing_to_Carry,1,3,... instead of Male,36,2,3,Nothing_to_Carry,1,3...

Try opening your .csv dataset in the Weka software. The steps are:
1. Install the Weka software and go to the "Experimenter" tab.
2. Go to the "analyze" tab.
3. Select the .csv dataset by opening it through the "file" option.
4. Click the "open explorer" button.
5. Save the opened output in .arff format under a name of your choice.
6. Close the Weka software.
7. Open the Weka software again and go to the "Explorer" tab.
8. Open the previously saved .arff dataset from there.
After that, this type of error no longer occurs. You can do preprocessing, classification, and association rule mining easily with the Weka software.


How to split json data in PowerApps

I have a JSON-formatted text:
{"ID":"1","name":"yashpal"}
I need to split the data and assign 1 to textbox1 and yashpal to textbox2.
Power Apps currently does not have a general JSON parsing mechanism, but if you know that the text you have will always have the same format, and the 'name' property cannot have double quotes ("), then you can use a regular expression to extract the values, something along the lines of
With(
    Match(
        <<the json text>>,
        "\""ID\"":\""(?<id>[^\""]+)\"",\""name\"":\""(?<name>[^\""]+)\"""),
    UpdateContext({defaultId: id, defaultName: name}))
And you can use the variables defaultId and defaultName as the Default property of 'textbox1' and 'textbox2', respectively.
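For readers who want to try the pattern outside Power Apps, here is a rough Python equivalent of the same regular expression. It is a sketch under the same assumptions as above (fixed key order, no double quotes inside the values); `extract_fields` is a hypothetical helper name, and Python spells named groups as `(?P<name>...)`.

```python
import re

# Same pattern as the Power Apps expression, using Python's named-group syntax.
PATTERN = re.compile(r'"ID":"(?P<id>[^"]+)","name":"(?P<name>[^"]+)"')


def extract_fields(text: str):
    """Return (id, name) if the text has the fixed layout, else None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group("id"), match.group("name")


print(extract_fields('{"ID":"1","name":"yashpal"}'))  # ('1', 'yashpal')
```

If the key order or spacing can vary, the pattern stops matching, which is why this approach only suits text with a known, stable format.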

Convert varchar string to Currency format in db2 SQL

I have a column from which I have to extract a string and then format it as US currency with 2 decimal places.
For example :
Column value : {tag}0000020000890|
From this, I have to match the tag and extract 20000890, and format it to 200,008.90
I extracted that part with the code below:
LTRIM(REGEXP_SUBSTR('match pattern', 1,1,'i',1), '0')
Where match pattern is '\{tag\}(.*?)\|'
With this, I am able to extract 20000890
Then I tried applying the to_char and to_number functions on top of it to format the value as comma-separated currency with 2 decimal places.
to_char(ltrim(Regexp_substr('match pattern',1,1,'i',1),'0'), '99G999G999D99')
But this throws the error below:
Sql error -20447, sqlstate 22007 sqlerrmc 99G999G999D99
Sysibm.Varchar-format
Then I tried,
to_char(to_number(ltrim(Regexp_substr('match pattern',1,1,'i',1),'0')), '99G999G999D99')
But this also throws an error:
Sql error -20476, sqlstate 22018 sqlerrmc DECFLOAT_FORMAT; 99G999G999D99
I'm not sure what causes this error.
The format you are trying to use is supported only from V11.5 onward.
TO_CHAR V11.5
TO_CHAR V11.1
Compare the table "Table 2. Format elements for decimal floating-point to varchar" in both links.
Moreover, you must cast the string to a numeric value in the first parameter of TO_CHAR:
SELECT TO_CHAR(DECFLOAT(REGEXP_SUBSTR(V, '\{tag\}(.*?)\|', 1, 1, 'i', 1)), '99,999,999.99')
FROM (VALUES '{tag}0000020000890|') T(V);
Take a look at VARCHAR_FORMAT. It is the function TO_CHAR is mapped to. The group separator is not G, but "," or ".". Basically, you have to replace your formatting string 99G999G999D99 with something like 99,999,999.99.
The Db2 documentation has more examples on that.
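To make the target transformation from the question concrete, here is a small Python sketch of the same steps: extract the tagged digits, drop the leading zeros, and format the result. It assumes, based on the example in the question (20000890 becoming 200,008.90), that the last two digits are cents; `format_tagged_amount` is a hypothetical name, and this illustrates the arithmetic only, not Db2 behaviour.

```python
import re
from decimal import Decimal


def format_tagged_amount(value: str) -> str:
    """Extract the digits between '{tag}' and '|' and format them as
    a comma-grouped amount with two decimal places."""
    match = re.search(r"\{tag\}(.*?)\|", value, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"no tagged amount in {value!r}")
    cents = int(match.group(1))  # int() drops the leading zeros
    # Assumption: the last two digits are cents, so divide by 100.
    amount = Decimal(cents) / Decimal(100)
    return f"{amount:,.2f}"


print(format_tagged_amount("{tag}0000020000890|"))  # 200,008.90
```

The division by 100 matters: formatting 20000890 directly with a pattern such as 99,999,999.99 would give 20,000,890.00 rather than the 200,008.90 asked for.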

Django-Python decimal number formatting comma issue

I have a model with this decimal field:
product_price = models.DecimalField(max_digits=10, decimal_places=2)
I am getting data from an XML file, and most of the product prices look like this:
847.54
Here a '.' (dot) separates the whole number from the fractional part, which is fine.
However, some product prices look like this:
2,906.69
The number includes an extra comma as a thousands separator, and this causes a problem. I get this error:
django.core.exceptions.ValidationError: ["'2,906.69' value must be a decimal number."]
Should I change my model, or is there another solution?
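You can try removing the comma and converting the string to a float before inserting it into the DB. Note that a Django validator only checks a value and cannot change it, so call the cleaning function on the raw XML value before assigning it to the field, rather than passing it in validators.
Ex:
def cleanDecimal(value):
    # Remove the thousands separator, then parse the number.
    return float(value.replace(",", ""))

product_price = models.DecimalField(max_digits=10, decimal_places=2)
# Before saving, e.g.: product.product_price = cleanDecimal(raw_price)
# (product and raw_price are placeholder names)
Ex:
print(float("2,906.69".replace(",", "")))
2906.69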
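A variant of the same cleanup that parses straight into `Decimal` instead of `float` may suit a `DecimalField` better, since it avoids binary floating-point rounding. This is a sketch in plain Python, independent of Django; `parse_price` is a hypothetical helper, and it assumes the comma is only ever a thousands separator.

```python
from decimal import Decimal, InvalidOperation


def parse_price(raw: str) -> Decimal:
    """Strip thousands separators and return a Decimal with 2 places."""
    cleaned = raw.strip().replace(",", "")
    try:
        # quantize() rounds to two decimal places, matching decimal_places=2.
        return Decimal(cleaned).quantize(Decimal("0.01"))
    except InvalidOperation as exc:
        raise ValueError(f"not a decimal number: {raw!r}") from exc


print(parse_price("2,906.69"))  # 2906.69
print(parse_price("847.54"))    # 847.54
```

The returned value can be assigned to the model field before saving. Prices that use a comma as the decimal mark (for example 2.906,69) would need different handling.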

Reading hex value in datastage

We have a mainframe file which we are trying to read using the Complex Flat File stage. The column has data type PIC X(1), which we read as char(1) and assign to char(10). The problem is that it converts to the value "26" when the value should be 30. The value displayed on the mainframe is x'30' (which seems to be hex). Is there a way to convert the value correctly in DataStage? Right now the following is implemented in the transformer:
DecimalToString(seq(link1.DAY),"suppress_zero")

WEKA linear regression conversion issue

I converted an Excel file to CSV and opened the CSV file in WEKA to classify the data using linear regression, but it doesn't allow me to select the 'linear regression' option under the 'function' branch. This is my file:
#RELATION book
#ATTRIBUTE bookID STRING
#ATTRIBUTE author STRING
#ATTRIBUTE genre STRING
#ATTRIBUTE publisher STRING
#ATTRIBUTE yearPublished NUMERIC
#ATTRIBUTE rating NUMERIC
#DATA
book1, suzzane-collins, horror, scholastic, 2008, 4011425
book2, jay-rowling, fantasy, scholastic, 2004, 1560433
book3, harper-lee, comedy, harper-classics, 2006, 2708232
book4, jane-austen, romance, modern-library, 2008, 1560433
book5, stephenie-meyer, romance, little-brown, 2006, 40114255
book6, john-lewis, thriller, harper-collins, 2002, 352728
book7, margarte, mystery, grand-central, 1964, 780522
book8, George-orwell, humour, nal, 2003, 1679178
book9, markus-zusak, legend, grand-central, 2006, 780522
book10, shel-silverstein, folklore, harper-collins, 1964, 592994
For linear regression, your attributes have to be #NUMERIC. If you want to do the regression based only on the last two attributes, then you will have to specify that (by checking those attributes) in the "Preprocess" tab in Weka so that it only uses the right ones. You can check this example to see what you are doing wrong. They explain how to run basic linear regression in WEKA from scratch.
