I have a Support Vector model in Weka (SMO) and I want to extract knowledge from this output:
=== Classifier model (full training set) ===
SMO
Kernel used:
Puk kernel
Classifier for classes: Positive, Negative
BinarySMO
0.9349 * <0.364865 0 0 1 0 1 0 0 0 1 1 1 0 1 1 1 > * X]
+ 0.743 * <0.486486 0 0 1 0 0 1 0 0 0 0 0 0 1 0 1 > * X]
+ 0.8578 * <0.391892 0 0 1 0 1 1 1 0 1 0 1 0 1 1 1 > * X]
- 0.815 * <0.297297 1 0 1 0 1 0 0 0 1 0 1 0 1 1 0 > * X]
- 0.2347 * <0.391892 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 > * X]
+ 1.1502 * <0.527027 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 > * X]
+ 0.6922 * <0.554054 0 0 1 0 1 1 0 1 1 0 1 0 0 1 1 > * X]
.....
- 0.3291 * <0.594595 1 1 1 1 0 0 1 1 0 0 0 1 0 1 0 > * X]
+ 0.9296 * <0.364865 0 0 1 1 1 0 1 0 1 0 0 0 1 0 1 > * X]
+ 0.6504 * <0.351351 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1 > * X]
- 0.0333 * <0.27027 1 1 0 0 0 1 0 1 0 0 0 1 1 1 1 > * X]
+ 0.0085 * <0.513514 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 > * X]
+ 0.8176 * <0.72973 0 1 1 0 1 1 0 1 0 0 0 1 0 0 1 > * X]
- 0.4812 * <1 1 0 0 1 1 0 1 1 0 0 1 0 0 0 1 > * X]
- 0.3286 * <0.256757 1 0 0 1 0 1 0 0 0 0 1 1 1 1 1 > * X]
.........
- 0.1838 * <0.635135 0 1 0 1 0 1 0 1 1 0 1 0 0 0 0 > * X]
- 0.0976 * <0.189189 1 1 0 1 1 1 0 0 1 1 1 1 0 1 1 > * X]
- 0.0036 * <0.364865 1 1 0 1 0 1 0 1 1 0 1 1 0 1 0 > * X]
- 0.0157 * <0.554054 0 1 0 1 0 1 0 1 1 0 1 1 1 1 1 > * X]
.........
- 0.0167 * <0.621622 0 1 1 0 0 0 1 1 0 1 1 1 0 0 0 > * X]
+ 0.2005 * <0.5 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 > * X]
- 0.589
Number of support vectors: 378
Number of kernel evaluations: 131997 (92.5% cached)
How can I interpret this output?
Thanks in advance
Have a look at SMO's toString() method to see how the output is constructed, and check out the Puk kernel itself (publication) to see how its calculations are done.
The textual output of classifiers is usually only for informative purposes (it is optional and has no impact on a classifier). People usually apply trained models directly to new data rather than trying to understand the output (especially with support vector machines).
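As a rough illustration of how output like this is usually read: each line is a signed coefficient times a kernel evaluation between one support vector and the new instance X, and the trailing number is the bias term. The sketch below is only that, a sketch. It assumes the Puk (Pearson VII) kernel has the form 1 / (1 + 4 * (2^(1/omega) - 1) * ||x - y||^2 / sigma^2)^omega with omega = sigma = 1.0, it copies just three of the 378 support vectors, and the function names are made up for illustration. Which sign of the result maps to which class is not stated in the output, so that is left open.

```python
import math

def puk_kernel(x, y, omega=1.0, sigma=1.0):
    """Pearson VII universal kernel (assumed form and parameters)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    factor = 4.0 * (2.0 ** (1.0 / omega) - 1.0) / (sigma ** 2)
    return 1.0 / math.pow(1.0 + factor * sq_dist, omega)

def decision_value(x, terms, bias):
    """Sum of coefficient * K(support_vector, x) over all terms, plus the bias."""
    return sum(coef * puk_kernel(sv, x) for coef, sv in terms) + bias

# Three terms copied from the listing; the real model has 378 of them.
terms = [
    (0.9349, [0.364865, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1]),
    (0.743, [0.486486, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1]),
    (-0.815, [0.297297, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0]),
]
bias = -0.589

# Reuse the first support vector as a stand-in for a new instance X.
new_instance = terms[0][1]
value = decision_value(new_instance, terms, bias)
print(value)
```

With all 378 terms the same loop would reproduce the model's raw decision value, provided the kernel form and parameters assumed here match the trained model.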
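import numpy as np
from sklearn.cluster import KMeans

# df is assumed to be an existing pandas DataFrame with these columns
df_2D = df[['sepal-length', 'petal-length']]
df_2D = np.array(df_2D)
k_means_2D_model = KMeans(n_clusters=3, max_iter=1000).fit(df_2D)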
Error:
ValueError: Expected 2D array, got 1D array instead:
array=[0 1 0 0 2 2 1 2 1 1 0 1 0 2 0 2 1 2 0 2 2 1 0 1 2 0 0 2 1 0 2 0 1 1 0 0 2
1 2 1 0 0 1 1 1 1 2 1 2 2 0 2 2 2 0 1 1 1 1 0 1 2 1 2 1 2 2 1 2 1 0 0 2 1
1 0 2 0 0 1 1 0 1 2 1 1 0 1 0 1 0 0 0 2 1 1 1 2 0 2 0 0 2 2 0 0 1 2 2 1 0
2 2 1 1 2 0 0 2 2 2 0 0 1 0 1 0 2 2 2 0 0 0 0 2 1 2 2 2 2 1 1 1 2 2 0 0 0
1 0].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
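The array printed in the message is one-dimensional, while scikit-learn estimators expect input of shape (n_samples, n_features). A minimal sketch of what the two suggested reshapes do, using a short made-up array rather than the data from the question:

```python
import numpy as np

labels = np.array([0, 1, 0, 0, 2, 2])  # made-up 1D array, shape (6,)

as_column = labels.reshape(-1, 1)  # 6 samples, 1 feature  -> shape (6, 1)
as_row = labels.reshape(1, -1)     # 1 sample, 6 features  -> shape (1, 6)

print(labels.shape, as_column.shape, as_row.shape)
```

Which of the two is appropriate depends on whether the values are many samples of one feature or one sample with many features.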
I tried to solve it using two for loops and an if statement, but I was unable to get the desired output.
INPUT-
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
thislist=[1]*10
thislist=[thislist]*10
print(thislist)
for i in range(10):
    for j in range(10):
        print(thislist[i][j], end = " ")
    print()
print()
for i in range(10):
    for j in range(10):
        if i>j:
            thislist[i][j]=0
for i in range(10):
    for j in range(10):
        print(thislist[i][j], end = " ")
    print()
This was the output I got:
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
But when I made the list using the method below, I got the desired output.
thislist=[[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1]]
print(thislist)
for i in range(10):
    for j in range(10):
        if i>j:
            thislist[i][j]=0
for i in range(10):
    for j in range(10):
        print(thislist[i][j], end = " ")
    print()
Note: this is the desired OUTPUT-
1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1 1
0 0 0 1 1 1 1 1 1 1
0 0 0 0 1 1 1 1 1 1
0 0 0 0 0 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1
0 0 0 0 0 0 0 1 1 1
0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 0 1
Can someone explain what the difference is between the two code snippets above?
As you pointed out, the problem comes from the way you created your list of lists. In your first example, you do something like this:
list1 = [1]*10
list_of_list1=[list1]*10
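list_of_list1 does not hold ten independent rows. Multiplying a list repeats references, so it holds ten references to the same list1 object. If you modify a value through one row of list_of_list1, the modification shows up in all the rows, because they are all the same list.
To get independent rows, each row has to be a separate list object. You might want to search for more information about references and shallow versus deep copies in Python.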
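A quick way to see the sharing is the `is` operator, which tests whether two names refer to the same object. This is a small sketch with made-up variable names:

```python
row = [1] * 10
grid = [row] * 10                    # ten references to the same inner list

print(grid[0] is grid[1])            # True: both rows are one object
grid[9][0] = 0                       # change "one" row
print(grid[0][0])                    # 0: every row sees the change

fresh = [[1] * 10 for _ in range(10)]  # a new inner list on every iteration
print(fresh[0] is fresh[1])          # False: rows are separate objects
fresh[9][0] = 0
print(fresh[0][0])                   # 1: other rows are unaffected
```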
In the meantime, you can simply try this:
thislist = []
for row in range(10):
    list1 = [1]*10
    thislist.append(list1)
But I usually go with numpy when it is available.
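For what it is worth, here is one way the numpy route could look. It assumes numpy is installed and uses `np.triu`, which keeps the entries on and above the main diagonal and zeroes the rest, matching the condition `i > j` in the question:

```python
import numpy as np

# 10x10 matrix of ones with everything below the diagonal set to 0
grid = np.triu(np.ones((10, 10), dtype=int))

for row in grid:
    print(*row)
```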
I have a dataframe that looks something like this:
empl_ID day_1 day_2 day_3 day_4 day_5 day_6 day_7 day_8 day_9 day_10
1 1 1 1 1 1 1 0 1 1 1
2 0 0 1 1 1 1 1 1 1 0
3 0 1 0 0 1 1 1 1 1 1
4 1 0 1 0 1 1 1 0 1 0
5 1 0 0 1 1 1 1 1 1 1
6 0 0 0 0 1 1 1 1 1 1
As we can see, we have 6 employees, and a value of 1 indicates presence on that day. I want to write Python code that detects 2 consecutive absences, i.e. the pattern 0, 0 for day i and day i+1, within a time-frame of 6 days starting from the day the person begins employment.
For example, employee 1 begins work at the day_1 column, which is the first appearance of 1. So, if we do not observe any consecutive 0, 0 from columns day_1 to day_6, that record should be labeled as '0'. The same is the case for employee 2 (cols: day_3 to day_8), employee 4 (cols: day_1 to day_6) and employee 6 (cols: day_5 to day_10), and they will be labeled as '0'.
However, employee 3 (cols: day_2 to day_7) and employee 5 (cols: day_1 to day_6) contain a 0, 0 pattern after their first presence of 1 within the respective time-frame and thus will be labeled as '1'.
It would be really helpful if someone could help me in formulating a code to achieve the above objective. Thanks in advance!
The result should look something like this:
empl_ID day_1 day_2 day_3 day_4 day_5 day_6 day_7 day_8 day_9 day_10 label
1 1 1 1 1 1 1 0 1 1 1 0
2 0 0 1 1 1 1 1 1 1 0 0
3 0 1 0 0 1 1 1 1 1 1 1
4 1 0 1 0 1 1 1 0 1 0 0
5 1 0 0 1 1 1 1 1 1 1 1
6 0 0 0 0 1 1 1 1 1 1 0
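Check with idxmax and a for loop with shift
s=df.set_index('empl_ID')
# column position of the first 1 in each row
idx=s.columns.get_indexer(s.idxmax(axis=1))
# True where a 0 follows another 0 inside the 6-day window starting at the first 1
l=[(s.iloc[t, x:y].eq(s.iloc[t, x:y].shift())&s.iloc[t, x:y].eq(0)).any() for t, x, y in zip(range(len(s)), idx, idx+6)]
df['Label']=l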
df
empl_ID day_1 day_2 day_3 day_4 ... day_7 day_8 day_9 day_10 Label
0 1 1 1 1 1 ... 0 1 1 1 False
1 2 0 0 1 1 ... 1 1 1 0 False
2 3 0 1 0 0 ... 1 1 1 1 True
3 4 1 0 1 0 ... 1 0 1 0 False
4 5 1 0 0 1 ... 1 1 1 1 True
5 6 0 0 0 0 ... 1 1 1 1 False
[6 rows x 12 columns]
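As a cross-check, the same rule can be written in plain Python without pandas. This is only a sketch: the function name is made up, the window of 6 days is taken from the question, and returning 0 for an employee who is never present is an assumption the question does not cover.

```python
def has_double_absence(days, window=6):
    """Return 1 if two consecutive 0s occur within `window` days of the first 1."""
    if 1 not in days:
        return 0  # assumption: an employee who never shows up gets label 0
    start = days.index(1)                 # first day present
    frame = days[start:start + window]    # the 6-day time-frame
    return int(any(a == 0 and b == 0 for a, b in zip(frame, frame[1:])))

attendance = {
    1: [1, 1, 1, 1, 1, 1, 0, 1, 1, 1],
    2: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0],
    3: [0, 1, 0, 0, 1, 1, 1, 1, 1, 1],
    4: [1, 0, 1, 0, 1, 1, 1, 0, 1, 0],
    5: [1, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    6: [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
}

labels = {emp: has_double_absence(days) for emp, days in attendance.items()}
print(labels)  # {1: 0, 2: 0, 3: 1, 4: 0, 5: 1, 6: 0}
```

The labels agree with the expected result in the question, with 0/1 instead of False/True.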
I have an SPSS data set with 500+ respondents and 18 symptoms that they could have.
Each symptom has its own variable: Symptom01 = 1 means they have symptom 1, Symptom02 = 0 means they don't have symptom 2, and so on.
What I want to know is which combination of 3 symptoms is most frequent in my data set. For example, how many people have symptoms 1, 5 and 6; how many people have symptoms 1, 2 and 3, etc.
It doesn't mean that they only have those symptoms. They could have others. I just want to know which group of 3 symptoms is most frequent in my data set.
It's a lot of combinations so how would you do this?
Can someone help me?
Please note the macro below uses the variable names "Symptom1", "Symptom2", etc. instead of "Symptom01", "Symptom02"...
First creating some sample data to work on:
data list list/Symptom1 to Symptom18.
begin data
1 0 0 1 1 0 0 1 1 0 0 0 0 0 1 1 1 1
1 1 1 1 0 1 0 0 1 0 1 1 0 0 1 1 0 0
0 1 1 0 1 1 1 1 1 1 1 0 1 0 0 1 0 0
1 0 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0
1 0 1 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0
0 0 0 1 1 1 0 0 0 0 1 0 0 0 1 0 0 1
1 0 1 1 1 1 1 0 1 1 0 0 0 1 1 1 0 1
1 0 0 0 1 0 1 1 0 0 0 1 0 1 0 0 1 0
0 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 1 1
1 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0
end data.
Now defining a macro with three loops:
EDIT - this version accounts for repeating combinations of symptoms
define AllCombsOf3 ()
!do !vr1=1 !to 18
!do !vr2=!vr1 !to 18
!do !vr3=!vr2 !to 18
!if (!vr2<>!vr1 !and !vr2<>!vr3) !then
compute !concat("C_",!vr1,"_",!vr2,"_",!vr3)= !concat("Symptom",!vr1)=1 & !concat("Symptom",!vr2)=1 & !concat("Symptom",!vr3)=1 .
!ifend
!doend
!doend
!doend
!enddefine.
Running the macro and displaying wanted results:
AllCombsOf3.
means C_1_2_3 to C_16_17_18.
EDIT 2 - new macro for a four symptom version
define AllCombsOf4 ()
!do !vr1=1 !to 18
!do !vr2=!vr1 !to 18
!do !vr3=!vr2 !to 18
!do !vr4=!vr3 !to 18
!if (!vr2<>!vr1 !and !vr2<>!vr3 !and !vr3<>!vr4) !then
compute !concat("C_",!vr1,"_",!vr2,"_",!vr3,"_",!vr4)=
!concat("Symptom",!vr1)=1 & !concat("Symptom",!vr2)=1 &
!concat("Symptom",!vr3)=1 & !concat("Symptom",!vr4)=1 .
!ifend
!doend !doend !doend !doend
!enddefine.
AllCombsOf4.
means C_1_2_3_4 to C_15_16_17_18.
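If you want to sanity-check the macro's results outside SPSS, the same counting can be sketched in Python with `itertools.combinations`. The rows below are the sample data from above; the variable names are made up, and the script reports raw counts rather than the proportions that `means` shows.

```python
from itertools import combinations

rows = [
    [1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1],
    [1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0],
]

# For every set of 3 distinct symptoms (numbered 1..18), count the
# respondents who have all three, whatever other symptoms they have.
counts = {}
for combo in combinations(range(1, 19), 3):
    counts[combo] = sum(all(r[s - 1] == 1 for s in combo) for r in rows)

# Show the five most frequent combinations.
top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:5]
for combo, n in top:
    print(combo, n)
```

Changing the 3 in `combinations(range(1, 19), 3)` to 4 gives the four-symptom version.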