make lower half of a n*n list zero without using any functions in python - python-3.x

I tried to solve it by using 2 for loops and an if statement . But i was unable to get the desired output.
INPUT-
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
thislist=[1]*10
thislist=[thislist]*10
print(thislist)
for i in range(10):
for j in range(10):
print(thislist[i][j], end = " ")
print()
print()
for i in range(10):
for j in range(10):
if i>j:
thislist[i][j]=0
for i in range(10):
for j in range(10):
print(thislist[i][j], end = " ")
print()
This was the output i got:
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
but when i made a list using the below method i got the desired output.
thislist=[[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1]]
print(thislist)
for i in range(10):
for j in range(10):
if i>j:
thislist[i][j]=0
for i in range(10):
for j in range(10):
print(thislist[i][j], end = " ")
print()
note-This is the desired OUTPUT-
1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1 1
0 0 0 1 1 1 1 1 1 1
0 0 0 0 1 1 1 1 1 1
0 0 0 0 0 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1
0 0 0 0 0 0 0 1 1 1
0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 0 1
Can someone explain whats the difference between the above 2 codes?

As you pointed out, the problem comes from the manner you created your list of list. In your first example, you do something like this:
list1 = [1]*10
list_of_list1=[list1]*10
list_of_list1 is actually a list of shallow copies of the original list1. Then if you modify a value in list_of_list1, the modification will occurs in all the rows of list_of_list1.
The opposit of a shallow copy is a deep copy. You might want to search more info on the Internet about this topic
In the mean time, you can simply try this.
thislist = []
for row in range(10):
list1 = [1]*10
thislist.append(list1)
But I usually go with numpy when it is available.

Related

ValueError in clustering evaluation: Expected 2D array, got 1D array instead

df_2D = df[['sepal-length', 'petal-length']]
df_2D = np.array(df_2D)
k_means_2D_model = KMeans(n_clusters=3, max_iter=1000).fit(df_2D)
Error:
ValueError: Expected 2D array, got 1D array instead:
array=[0 1 0 0 2 2 1 2 1 1 0 1 0 2 0 2 1 2 0 2 2 1 0 1 2 0 0 2 1 0 2 0 1 1 0 0 2
1 2 1 0 0 1 1 1 1 2 1 2 2 0 2 2 2 0 1 1 1 1 0 1 2 1 2 1 2 2 1 2 1 0 0 2 1
1 0 2 0 0 1 1 0 1 2 1 1 0 1 0 1 0 0 0 2 1 1 1 2 0 2 0 0 2 2 0 0 1 2 2 1 0
2 2 1 1 2 0 0 2 2 2 0 0 1 0 1 0 2 2 2 0 0 0 0 2 1 2 2 2 2 1 1 1 2 2 0 0 0
1 0].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Pattern identification in a dataset using python

I have a dataframe that looks something like this:
empl_ID day_1 day_2 day_3 day_4 day_5 day_6 day_7 day_8 day_9 day_10
1 1 1 1 1 1 1 0 1 1 1
2 0 0 1 1 1 1 1 1 1 0
3 0 1 0 0 1 1 1 1 1 1
4 1 0 1 0 1 1 1 0 1 0
5 1 0 0 1 1 1 1 1 1 1
6 0 0 0 0 1 1 1 1 1 1
As we can see we have 6 employees and index 1 indicates their presence for that day. I want to write a code using Python such that I can trace 2 continuous absences i.e. pattern 0 ,0 for day i, day i+1 in a time-frame of 6 days right from the person begins his employment.
For example, employee 1 begins his work at day_1 column, which is his first appearance of 1. So, from columns day_1 to day_6 if we do not observe any continuous 0, 0 that record should be labeled as '0'. Same would be the case for employee 2 (cols: day_3 to day_8), employee 4 (cols: day_1 to day_6) and employee 6 (cols: day_5 to day_10) and they will be labeled as '0'.
However, for employee 3 (cols: day_2 to day_7), employee 6 (cols: day_5 to day_10) they contain a 0, 0 pattern right from their first presence of 1 within the respective time-frame and thus will be labeled as '1'.
It would be really helpful if someone could help me in formulating a code to achieve the above objective. Thanks in advance!
The result should look something like this:
empl_ID day_1 day_2 day_3 day_4 day_5 day_6 day_7 day_8 day_9 day_10 label
1 1 1 1 1 1 1 0 1 1 1 0
2 0 0 1 1 1 1 1 1 1 0 0
3 0 1 0 0 1 1 1 1 1 1 1
4 1 0 1 0 1 1 1 0 1 0 0
5 1 0 0 1 1 1 1 1 1 1 1
6 0 0 0 0 1 1 1 1 1 1 0
Check with idxmcx and for loop with shift
s=df.set_index('empl_ID')
idx=s.columns.get_indexer(s.idxmax(1))
l=[(s.iloc[t, x :y].eq(s.iloc[t, x :y].shift())&s.iloc[t, x :y].eq(0)).any() for t , x ,y in zip(df.index,idx,idx+5)]
df['Label']=l
df
empl_ID day_1 day_2 day_3 day_4 ... day_7 day_8 day_9 day_10 Label
0 1 1 1 1 1 ... 0 1 1 1 False
1 2 0 0 1 1 ... 1 1 1 0 False
2 3 0 1 0 0 ... 1 1 1 1 True
3 4 1 0 1 0 ... 1 0 1 0 False
4 5 1 0 0 1 ... 1 1 1 1 True
5 6 0 0 0 0 ... 1 1 1 1 False
[6 rows x 12 columns]

Python: How do I assign string values to numeric data?

Running the four commands results in below output, from a dataframe called cancer.
$ print("\n target")
$ print(cancer.target)
$ print("\n target_names")
$ print(cancer.target_names)
target
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 0 0 1 0 0 1 1 1 1 0 1 0 0 1 1 1 1 0 1 0 0
1 0 1 0 0 1 1 1 0 0 1 0 0 0 1 1 1 0 1 1 0 0 1 1 1 0 0 1 1 1 1 0 1 1 0 1 1
1 1 1 1 1 1 0 0 0 1 0 0 1 1 1 0 0 1 0 1 0 0 1 0 0 1 1 0 1 1 0 1 1 1 1 0 1
1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 1 0 1 1 0 0 1 1 0 0 1 1 1 1 0 1 1 0 0 0 1 0
1 0 1 1 1 0 1 1 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 1 1 0 1 0 0 0 0 1 1 0 0 1 1
1 0 1 1 1 1 1 0 0 1 1 0 1 1 0 0 1 0 1 1 1 1 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 0 1 0 1 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1
1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 1 1 0 0 0 1 1
1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0
0 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1
1 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 1 1 1 1 1 0 1 1
0 1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1
1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 0 0 1 0 1 0 1 1 1 1 1 0 1 1 0 1 0 1 0 0
1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 0 0 0 0 0 0 1]
target_names
['malignant' 'benign']
How would I be able to assign "malignant" to 0 and "benign" to 1, and vice versa?
You can map it by using a dictionary
target=[0,0,0,0,0,1,1,1,0,1,0,0,1]
d={0:'malignant',1:'benign'}
target=[d[t] for t in target]
print(target)
['malignant', 'malignant', 'malignant', 'malignant', 'malignant', 'benign', 'benign', 'benign', 'malignant', 'benign', 'malignant', 'malignant', 'benign']
What so you mean by assign?
Something like this?
for i in range(len(cancer)):
if target[i]==0:
target[i]=target_names[0]
elif target[i]==1:
target[i]=target_names[1]

Combine columns in pandas to create a new column

Hello I am working on pandas dataframe and I want to create a column combining multiple columns and applying condition on them and I am looking for a smart way to do it.
Suppose the data frame looks as
A B C D
1 0 0 0
0 1 0 0
0 0 1 0
1 0 1 0
1 1 1 0
0 0 1 1
My output column should be as below
A B C D Output_col
1 0 0 0 A
0 1 0 0 B
0 0 1 0 C
1 0 1 0 A_C
1 1 1 0 A_B_C
0 0 1 1 C_D
I can certainly achieve this using below code but then I have to do it for every column.
test['Output_col'] = test.A.apply(lambda x: A if x > 0 else 0)
I was wondering if there is a way where I could achieve this without applying to every column if I have very large number of columns.
Thanks in advance !!
Use DataFrame.apply + join.
Select column names using x.index(
note that axis = 1 is used) + boolean indexing with Series.eq to filter the selected columns :
test['Output_col']=test.apply(lambda x: '_'.join(x.index[x.eq(1)]),axis=1)
print(test)
A B C D Output_col
0 1 0 0 0 A
1 0 1 0 0 B
2 0 0 1 0 C
3 1 0 1 0 A_C
4 1 1 1 0 A_B_C
5 0 0 1 1 C_D
To apply only a list of columns:
my_list_columns=['enter element of your list']
test['Output_col']=test[my_list_columns].apply(lambda x: '_'.join(x.index[x.eq(1)]),axis=1)
print(test)
case to all columns is 0
my_list_columns=['A','B','C','D']
df['Output_col']=df[my_list_columns].apply(lambda x: '_'.join(x.index[x.eq(1)]) if x.eq(1).any() else 'no_value',axis=1)
print(df)
A B C D Output_col
0 1 0 0 0 A
1 0 0 0 0 no_value
2 0 0 1 0 C
3 1 0 1 0 A_C
4 1 0 1 0 A_C
5 0 0 1 1 C_D
Edit: for a subset of columns (I use method 2)
cols = ['A', 'B']
df1 = df[cols]
s = df1.columns + '-'
df['Output_col'] = df1.dot(s).str[:-1]
Out[54]:
A B C D Output_col
0 1 0 0 0 A
1 0 1 0 0 B
2 0 0 1 0
3 1 0 1 0 A
4 1 1 1 0 A-B
5 0 0 1 1
Try this combination of str.replace and dot
df['Output_col'] = df.dot(df.columns).str.replace(r'(?<!^)(?!$)','-')
Out[32]:
A B C D Output_col
0 1 0 0 0 A
1 0 1 0 0 B
2 0 0 1 0 C
3 1 0 1 0 A-C
4 1 1 1 0 A-B-C
5 0 0 1 1 C-D
If you feel uneasy with regex pattern. You may try this way without using str.replace
s = df.columns + '-'
df['Output_col'] = df.dot(s).str[:-1]
Out[50]:
A B C D Output_col
0 1 0 0 0 A
1 0 1 0 0 B
2 0 0 1 0 C
3 1 0 1 0 A-C
4 1 1 1 0 A-B-C
5 0 0 1 1 C-D
This builds off a solution provided by #Jezrael : link
df['Output_col'] = df.dot(df.columns.str.cat(['_']*len(df.columns),sep='')).str.strip('_')
A B C D Output_col
0 1 0 0 0 A
1 0 1 0 0 B
2 0 0 1 0 C
3 1 0 1 0 A_C
4 1 1 1 0 A_B_C
5 0 0 1 1 C_D

How to find which combinations have the most frequency?

I have an SPSS data set with 500+ respondents and 18 symptoms that they could have.
Each symptom has its own variable Symptom01 = 1 means they have the symptom 1 Symptom02 = 0 means they dont have the symptom 2 etc etc
What I want to know is what combination of 3 symptoms is more frequent in my data set. For example how many people have symptom 1, 5 and 6; how many people have symptom 1, 2 and 3, etc.
I doesn't mean that they only have those symptoms. Theey could have others. I just want to know which group of 3 symptoms is more frequent in my dataset.
It's a lot of combinations so how would you do this?
Can someone help me?
Please note the macro below uses the variable names Symptom1, Symptom2 etc' instead of "Symptom01", "Symptom02"...
First creating some sample data to work on:
data list list/Symptom1 to Symptom18.
begin data
1 0 0 1 1 0 0 1 1 0 0 0 0 0 1 1 1 1
1 1 1 1 0 1 0 0 1 0 1 1 0 0 1 1 0 0
0 1 1 0 1 1 1 1 1 1 1 0 1 0 0 1 0 0
1 0 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0
1 0 1 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0
0 0 0 1 1 1 0 0 0 0 1 0 0 0 1 0 0 1
1 0 1 1 1 1 1 0 1 1 0 0 0 1 1 1 0 1
1 0 0 0 1 0 1 1 0 0 0 1 0 1 0 0 1 0
0 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 1 1
1 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0
end data.
Now defining a macro with three loops:
EDIT - this version accounts for repeating combinations of symptoms
define AllCombsOf3 ()
!do !vr1=1 !to 18
!do !vr2=!vr1 !to 18
!do !vr3=!vr2 !to 18
!if (!vr2<>!vr1 !and !vr2<>!vr3) !then
compute !concat("C_",!vr1,"_",!vr2,"_",!vr3)= !concat("Symptom",!vr1)=1 & !concat("Symptom",!vr2)=1 & !concat("Symptom",!vr3)=1 .
!ifend
!doend
!doend
!doend
!enddefine.
Running the macro and displaying wanted results:
AllCombsOf3.
means C_1_2_3 to C_16_17_18.
EDIT 2 - new macro for a four symptom version
define AllCombsOf4 ()
!do !vr1=1 !to 18
!do !vr2=!vr1 !to 18
!do !vr3=!vr2 !to 18
!do !vr4=!vr3 !to 18
!if (!vr2<>!vr1 !and !vr2<>!vr3 !and !vr3<>!vr4) !then
compute !concat("C_",!vr1,"_",!vr2,"_",!vr3,"_",!vr4)=
!concat("Symptom",!vr1)=1 & !concat("Symptom",!vr2)=1 &
!concat("Symptom",!vr3)=1 & !concat("Symptom",!vr4)=1 .
!ifend
!doend !doend !doend !doend
!enddefine.
AllCombsOf4.
means C_1_2_3_4 to C_15_16_17_18.

Resources