if cell 1 contains X, then cell 2 should equal W, if cell 1 contains Z, then look at next criteria - excel

I am trying to transform 8 columns of dummy variables into one column with an 8-level rank.
I am trying to do so with this formula:
=IF(OR(A1="1");"1";IF(OR(B1="1");"2";IF(OR(C1="1");"3";IF(OR(D1="1");"4";IF(OR(E1="1");"5";IF(OR(F1="1");"6";IF(OR(G1="1");"7";IF(OR(H1="1");"8";""))))))))
Here is a view of the table: columns 1 to 8 are the data, and column 9 is what I would like my formula to return:
1 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 1
0 0 0 0 1 0 0 0 5
0 1 0 0 0 0 0 0 2
1 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1 0 7
1 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 1
0 0 0 0 1 0 0 0 5
I have used other Stack Overflow questions as inspiration for the structure.
It does not work: I don't get an error message, but I also don't get the right output.
Can anyone see where the problem arises? It would be much appreciated :)
Best wishes,
Mathilde

Use the MATCH() Function:
=MATCH(1,A1:H1,0)
It appears you use ; instead of , as the argument delimiter. If so, use this:
=MATCH(1;A1:H1;0)
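For readers who want to check the logic outside Excel, here is a minimal Python sketch of what MATCH(1, A1:H1, 0) computes: the 1-based position of the first cell equal to the number 1. The function name first_one_position is hypothetical. The sketch also illustrates one likely reason the nested IF returned blanks, namely that it compares cells against the text "1" rather than the number 1; that explanation is an assumption, since the answer above does not state it.

```python
def first_one_position(row):
    """Return the 1-based position of the first cell equal to the number 1.

    Mirrors MATCH(1, row, 0); returns None where Excel would show #N/A.
    """
    for position, cell in enumerate(row, start=1):
        if cell == 1:
            return position
    return None


# A few rows taken from the table in the question.
rows = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
]
ranks = [first_one_position(row) for row in rows]
print(ranks)

# The text "1" is not equal to the number 1, so nothing matches here.
print(first_one_position(["1", 0, 0]))
```

The same exact-match idea is why comparing a numeric cell with "1" in quotes does not behave like comparing it with 1.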


Parse .log file to collect data after keyword then create a nested dictionary using predefined column names

I am trying to parse a .log file using Python to get status information about the processes of a Linux system. The .log file has many different sections of information; the sections of interest start with "##START-ALLPROCESSES-xxxxxxxx", where x is the epoch date, and end with "##END-ALLPROCESSES-xxxxxxxx". After the start line, each process is listed on its own line with 52 columns. The number of processes may change depending on the info recorded at the time, and there may be multiple sections with this information at different times.
The idea is to open the .log file, find the sections, and then use the xxxxxxxx as the key for a nested dictionary whose inner keys are the predefined column names, filled in with the values from the section, and to do this for every such section found in the .log file. The nested dictionary would look something like this:
[date1-XXXXXX:
[ columnName1: process1,
.
.
.
columnName52: info1
],
.
.
.
[ columnName1: process52,
.
.
.
columName52: info52
]
],
[date2-XXXXXX:
[ columnName1: process1,
.
.
.
columnName52: info1
],
.
.
.
[ columnName1: process52,
.
.
.
columName52: info52
]
]
The data in the .log file looks as follows. There would be multiple sections like this one, each with a different date. Each line starts with the process id and the (process name):
##START-ALLPROCESSES-1676652419
1 (systemd) S 0 1 1 0 -1 4210944 2070278 9743969773 2070 2703 8811 11984 7638026 9190549 20 0 1 0 0 160043008 745 18446744073709551615 187650352414720 187650353516788 281474853505456 0 0 0 671173123 4096 1260 1 0 0 17 0 0 0 2706 0 0 187650353585800 187650353845340 187651263758336 281474853506734 281474853506745 281474853506745 281474853507053 0
10 (rcu_bh) I 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 20 0 1 0 2 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10251 (kworker/1:2) I 2 0 0 0 -1 69238880 0 0 0 0 0 914 0 0 20 0 1 0 617684776 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 1 0 0 0 0 0 0 0 0 0 0 0 0 0
10299 (loop2) S 2 0 0 0 -1 3178560 0 0 0 0 0 24 0 0 0 -20 1 0 10871 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 0 0 169 0 0 0 0 0 0 0 0 0 0
10648 (kworker/2:0) I 2 0 0 0 -1 69238880 0 0 0 0 0 567 0 0 20 0 1 0 663634994 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 2 0 0 0 0 0 0 0 0 0 0 0 0 0
1082 (nvme-wq) I 2 0 0 0 -1 69238880 0 0 0 0 0 0 0 0 0 -20 1 0 109 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 3 0 0 0 0 0 0 0 0 0 0 0 0 0
1095 (scsi_eh_0) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 20 0 1 0 110 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 3 0 0 0 0 0 0 0 0 0 0 0 0 0
1096 (scsi_tmf_0) I 2 0 0 0 -1 69238880 0 0 0 0 0 0 0 0 0 -20 1 0 110 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1099 (scsi_eh_1) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 20 0 1 0 110 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 (migration/0) S 2 0 0 0 -1 69238848 0 0 0 0 0 4961 0 0 -100 0 1 0 2 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 99 1 0 0 0 0 0 0 0 0 0 0 0
1100 (scsi_tmf_1) I 2 0 0 0 -1 69238880 0 0 0 0 0 0 0 0 0 -20 1 0 110 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##END-ALLPROCESSES-1676652419
I have tried it multiple ways, but I cannot seem to get it to work. My last attempt:
columns = ['pid', 'comm', 'state', 'ppid', 'pgrp', 'session', 'tty_nr', 'tpgid', 'flags', 'minflt', 'cminflt', 'majflt', 'cmajflt', 'utime', 'stime',
           'cutime', 'cstime', 'priority', 'nice', 'num_threads', 'itrealvalue', 'starttime', 'vsize', 'rss', 'rsslim', 'startcode', 'endcode', 'startstack', 'kstkesp',
           'kstkeip', 'signal', 'blocked', 'sigignore', 'sigcatch', 'wchan', 'nswap', 'cnswap', 'exit_signal', 'processor', 'rt_priority', 'policy', 'delayacct_blkio_ticks',
           'guest_time', 'cguest_time', 'start_data', 'end_data', 'start_brk', 'arg_start', 'arg_end', 'env_start', 'env_end', 'exit_code' ]
for file in os.listdir(dir):
    if file.endswith('.log'):
        with open(file, 'r') as f:
            data = f.read()
            data = data.split('##START-ALLPROCESSES-')
            data = data[1:]
            for i in range(len(data)):
                data[i] = data[i].split('##END-ALLPROCESSES-')
                data[i] = data[i][0]
                data[i] = re.split('\r', data[i])
                data[i] = data[i][0]
                data[i] = re.split('\n', data[i])
                for j in range(len(data[i])):
                    data[i][j] = re.split('\s+', data[i][j])
                    #print(data[i])
                data[i][0] = str(data[i][0])
            data_dict = {}
            for i in range(len(data)):
                data_dict[data[i][0]] = {}
                for j in range(len(columns)):
                    data_dict[data[i][0]][columns[j]] = data[i][j+1]
            print(data_dict)
I converted the epoch date into a str because I was getting unhashable list errors. That made the epoch date show up as a key, but each column now holds the entire list of 52 values for a process instead of a single value, so I am definitely missing something.
To solve this problem, you could follow these steps:
1. Open the .log file and read the contents.
2. Search for all the sections of interest by finding lines that start with "##START-ALLPROCESSES-" and end with "##END-ALLPROCESSES-".
3. For each section found, extract the epoch date and create a dictionary with an empty list for each of the 52 columns.
4. Iterate over the lines within the section and split each line into the 52 columns using space as a separator. Add the values to the corresponding list in the dictionary created in step 3.
5. Repeat steps 3 and 4 for all the sections found in the .log file.
6. Return the final nested dictionary.
Here is some sample code that implements these steps:
import re

def parse_log_file(log_file_path):
    with open(log_file_path, 'r') as log_file:
        log_contents = log_file.read()
    # Each match starts with the epoch date and runs up to the END marker
    sections = re.findall(r'##START-ALLPROCESSES-(.*?)##END-ALLPROCESSES-', log_contents, re.DOTALL)
    # Placeholder names; substitute the real 52 column names here
    column_names = ['column{}'.format(i) for i in range(1, 53)]
    nested_dict = {}
    for section in sections:
        lines = section.strip().split('\n')
        # The first line of the match is the epoch date
        epoch_date = lines[0].split('-')[-1]
        section_dict = {column_name: [] for column_name in column_names}
        for line in lines[1:]:
            # Note: a process name containing spaces yields more than 52
            # fields and raises an IndexError in the loop below
            values = line.strip().split()
            for i, value in enumerate(values):
                section_dict[column_names[i]].append(value)
        nested_dict['date{}-{}'.format(epoch_date, len(section_dict['column1']))] = section_dict
    return nested_dict
You can call this function by passing the path to the .log file as an argument. It returns a nested dictionary keyed by section, in which each column name maps to the list of that column's values across all processes in the section.
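If one dictionary per process is preferred, as in the structure sketched in the question, a variant along the following lines may be closer to what was asked. It is only a sketch under stated assumptions: the names parse_process_line and parse_log_text are hypothetical, the column names are copied from the question, and the process name is assumed to be the only field wrapped in parentheses (it is cut out first so that names containing spaces do not shift the other columns).

```python
import re

# Column names copied from the question (52 fields per process line).
COLUMNS = """pid comm state ppid pgrp session tty_nr tpgid flags minflt cminflt
majflt cmajflt utime stime cutime cstime priority nice num_threads itrealvalue
starttime vsize rss rsslim startcode endcode startstack kstkesp kstkeip signal
blocked sigignore sigcatch wchan nswap cnswap exit_signal processor rt_priority
policy delayacct_blkio_ticks guest_time cguest_time start_data end_data
start_brk arg_start arg_end env_start env_end exit_code""".split()

# One section: START marker with epoch, body lines, END marker with same epoch.
SECTION_RE = re.compile(
    r'^##START-ALLPROCESSES-(\d+)[ \t]*\r?\n(.*?)^##END-ALLPROCESSES-\1',
    re.DOTALL | re.MULTILINE,
)


def parse_process_line(line):
    """Split one process line into a {column: value} dictionary."""
    pid, _, rest = line.partition(' (')
    # Cut at the last ')' so names such as "tmux: server" stay in one piece.
    comm, _, tail = rest.rpartition(')')
    values = [pid.strip(), comm] + tail.split()
    return dict(zip(COLUMNS, values))


def parse_log_text(text):
    """Return {epoch: [process_dict, ...]} for every ALLPROCESSES section."""
    result = {}
    for epoch, body in SECTION_RE.findall(text):
        result[epoch] = [
            parse_process_line(line)
            for line in body.splitlines()
            if line.strip()
        ]
    return result
```

To use it on a file, read the file contents into a string and pass that string to parse_log_text; all values are kept as strings, so convert them with int() where numbers are needed.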

how to find combinations present in different columns

I have a dataset with sample and shop names. I am trying to figure out a way to calculate the proportion of shops that sell a combination of samples. For example, samples 12, 13 and 22 are available in shops 2, 3 and 4. Likewise, samples 6, 7, 8, 9, 10, 16 and 17 are available in shop 1.
The dataset I have is very large, with 9000 columns and 26 rows; what I show here is just a small excerpt. What I want to do is screen the table for all possible combinations of samples present in shops (count > 0) and print them out in a dictionary, for example sample12_sample13_sample22: [shop2, shop3, shop4], listing all the combinations that are available.
Sorry, I could not figure out how to do this, so I do not have any code right now.
What approach should I use here?
Any help is appreciated.
Thanks!
Name Shop1 Shop2 Shop3 Shop4
Sample1 0 0 0 0
Sample2 0 0 0 0
Sample3 0 0 0 0
Sample4 0 0 0 0
Sample5 0 0 0 0
Sample6 1 0 0 0
Sample7 4 0 0 0
Sample8 12 0 0 0
Sample9 1 0 0 0
Sample10 1 0 0 0
Sample11 0 0 0 0
Sample12 0 5 21 233
Sample13 0 8 36 397
Sample14 0 4 0 0
Sample15 0 0 0 0
Sample16 2 0 0 0
Sample17 17 0 0 0
Sample18 0 0 0 0
Sample19 0 0 0 0
Sample20 0 0 0 0
Sample21 0 0 0 0
Sample22 0 1 20 127
We can melt the frame and then group by twice:
s = df.melt('Name')
s = s[s.value!=0]
s = s.groupby('Name')['variable'].agg([','.join,'count'])
out = s[s['count']>1].reset_index().groupby('join')['Name'].agg(','.join)
out
Out[104]:
join
Shop2,Shop3,Shop4 Sample12,Sample13,Sample22
Name: Name, dtype: object
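The same grouping idea can be sketched without pandas, which may help when checking the result by hand. This is an illustrative standard-library version, not the answer's method: the function name shops_by_sample_combination and the min_shops parameter are hypothetical, and the input is assumed to be a dictionary of the form {sample: {shop: count}}. Samples are grouped by the exact set of shops in which their count is above zero.

```python
from collections import defaultdict


def shops_by_sample_combination(table, min_shops=1):
    """Group samples that are sold in exactly the same set of shops.

    table: {sample_name: {shop_name: count}}
    Returns {"SampleA_SampleB": [shop, ...]}.
    """
    samples_by_shops = defaultdict(list)
    for sample, counts in table.items():
        shops = tuple(shop for shop, n in counts.items() if n > 0)
        # Skip samples sold nowhere (or in fewer shops than requested).
        if len(shops) >= max(min_shops, 1):
            samples_by_shops[shops].append(sample)
    return {
        "_".join(samples): list(shops)
        for shops, samples in samples_by_shops.items()
    }


# A small excerpt of the table from the question.
table = {
    "Sample1": {"Shop1": 0, "Shop2": 0, "Shop3": 0, "Shop4": 0},
    "Sample6": {"Shop1": 1, "Shop2": 0, "Shop3": 0, "Shop4": 0},
    "Sample12": {"Shop1": 0, "Shop2": 5, "Shop3": 21, "Shop4": 233},
    "Sample13": {"Shop1": 0, "Shop2": 8, "Shop3": 36, "Shop4": 397},
    "Sample14": {"Shop1": 0, "Shop2": 4, "Shop3": 0, "Shop4": 0},
    "Sample22": {"Shop1": 0, "Shop2": 1, "Shop3": 20, "Shop4": 127},
}
combos = shops_by_sample_combination(table)
print(combos)
```

Passing min_shops=2 keeps only samples found in more than one shop, which corresponds to the count > 1 filter in the pandas code above.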

J Language rank of power function

t=:1
test=: monad define
t=.y
t=. t, 0
)
testloop=: monad def'test^:y t'
testloop 1
1 0
testloop 2
1 0 0
testloop 10
1 0 0 0 0 0 0 0 0 0 0
In order to simplify this
(testloop 0),(testloop 1), (testloop 2), ...
110100100010000...
I tried
, testloop"0 (i.10)
but it gives
1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0...
It seems like I have a problem with rank, and I can't figure out which one to use.
I would be grateful if you could help me with this issue.
Thank you!
This is not so much a rank problem as the fact that the results are padded with zeros so that the row lengths match.
testloop 1
1 0
testloop 2
1 0 0
testloop"0 [ 1 2
1 0 0
1 0 0
testloop"0 [ 1 2 3
1 0 0 0
1 0 0 0
1 0 0 0
If I redefine your test and testloop to add a different appending digit, we can see how the padding is working.
test2 =: 3 : 0
t=. y
t=. t,2
)
test2loop=: monad def'test2^:y t'
test2loop"0 [1
1 2
test2loop"0 [2
1 2 2
test2loop"0 [ 1 2 NB. 0 padded in first row
1 2 0
1 2 2
test2loop"0 [ 1 2 3 NB. 0's padded in first two rows
1 2 0 0
1 2 2 0
1 2 2 2
To get around the padding issue, I will use each (defined as &.>) so that each result is boxed before the results are combined.
testloop each 1 2 3
+---+-----+-------+
|1 0|1 0 0|1 0 0 0|
+---+-----+-------+
testloop each i. 10
+-+---+-----+-------+---------+-----------+-------------+---------------+-----------------+-------------------+
|1|1 0|1 0 0|1 0 0 0|1 0 0 0 0|1 0 0 0 0 0|1 0 0 0 0 0 0|1 0 0 0 0 0 0 0|1 0 0 0 0 0 0 0 0|1 0 0 0 0 0 0 0 0 0|
+-+---+-----+-------+---------+-----------+-------------+---------------+-----------------+-------------------+
Then use ; to unbox and ravel the results:
; testloop each i. 10
1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
To be honest I would be more inclined to use the fact that complex numbers used as the left argument of # introduce 0's for padding. The number of 0's depends on the imaginary value of the complex number.
1j0 # 1
1
1j1 # 1
1 0
1j2 # 1
1 0 0
test3=: monad def '(1 j. y)#1'
test3 1
1 0
test3 2
1 0 0
test3 1 2
1 0 1 0 0
test3 i. 10
1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
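For readers less familiar with J, the difference between the padded table and the flat result can be sketched in Python. This is only an analogy under my own naming (one_then_zeros, padded_rows and flat_sequence are hypothetical helpers): assembling rows into one rectangular table forces shorter rows to be filled with zeros, while concatenating them one after another does not.

```python
from itertools import chain


def one_then_zeros(n):
    """A 1 followed by n zeros, like testloop n in the question."""
    return [1] + [0] * n


def padded_rows(counts):
    """Rectangular table: shorter rows are zero-filled to the longest row."""
    rows = [one_then_zeros(n) for n in counts]
    width = max(len(row) for row in rows)
    return [row + [0] * (width - len(row)) for row in rows]


def flat_sequence(counts):
    """Rows joined end to end without any fill, like ; testloop each."""
    return list(chain.from_iterable(one_then_zeros(n) for n in counts))


print(padded_rows([1, 2]))
print(flat_sequence(range(4)))
```

In the padded table the fill zeros are indistinguishable from the appended zeros, which is why the rows in the question all looked identical.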

How to find which combinations have the most frequency?

I have an SPSS data set with 500+ respondents and 18 symptoms that they could have.
Each symptom has its own variable: Symptom01 = 1 means they have symptom 1, Symptom02 = 0 means they don't have symptom 2, and so on.
What I want to know is which combination of 3 symptoms is most frequent in my data set. For example, how many people have symptoms 1, 5 and 6; how many people have symptoms 1, 2 and 3, etc.
It doesn't mean that they have only those symptoms; they could have others. I just want to know which group of 3 symptoms is most frequent in my dataset.
It's a lot of combinations, so how would you do this?
Can someone help me?
Please note that the macro below uses the variable names Symptom1, Symptom2, etc. instead of "Symptom01", "Symptom02", ...
First creating some sample data to work on:
data list list/Symptom1 to Symptom18.
begin data
1 0 0 1 1 0 0 1 1 0 0 0 0 0 1 1 1 1
1 1 1 1 0 1 0 0 1 0 1 1 0 0 1 1 0 0
0 1 1 0 1 1 1 1 1 1 1 0 1 0 0 1 0 0
1 0 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0
1 0 1 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0
0 0 0 1 1 1 0 0 0 0 1 0 0 0 1 0 0 1
1 0 1 1 1 1 1 0 1 1 0 0 0 1 1 1 0 1
1 0 0 0 1 0 1 1 0 0 0 1 0 1 0 0 1 0
0 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 1 1
1 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0
end data.
Now defining a macro with three loops:
EDIT - this version accounts for repeating combinations of symptoms
define AllCombsOf3 ()
!do !vr1=1 !to 18
!do !vr2=!vr1 !to 18
!do !vr3=!vr2 !to 18
!if (!vr2<>!vr1 !and !vr2<>!vr3) !then
compute !concat("C_",!vr1,"_",!vr2,"_",!vr3)= !concat("Symptom",!vr1)=1 & !concat("Symptom",!vr2)=1 & !concat("Symptom",!vr3)=1 .
!ifend
!doend
!doend
!doend
!enddefine.
Running the macro and displaying wanted results:
AllCombsOf3.
means C_1_2_3 to C_16_17_18.
EDIT 2 - new macro for a four symptom version
define AllCombsOf4 ()
!do !vr1=1 !to 18
!do !vr2=!vr1 !to 18
!do !vr3=!vr2 !to 18
!do !vr4=!vr3 !to 18
!if (!vr2<>!vr1 !and !vr2<>!vr3 !and !vr3<>!vr4) !then
compute !concat("C_",!vr1,"_",!vr2,"_",!vr3,"_",!vr4)=
!concat("Symptom",!vr1)=1 & !concat("Symptom",!vr2)=1 &
!concat("Symptom",!vr3)=1 & !concat("Symptom",!vr4)=1 .
!ifend
!doend !doend !doend !doend
!enddefine.
AllCombsOf4.
means C_1_2_3_4 to C_15_16_17_18.
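As a cross-check outside SPSS, the same counts can be sketched in Python with itertools.combinations. This is an illustrative alternative, not part of the macro approach: the function name count_symptom_combinations is hypothetical, and each respondent is assumed to be a list of 0/1 values in symptom order. For each respondent it lists the symptoms that are present and counts every subset of the requested size, so a respondent with extra symptoms still contributes to each combination they cover.

```python
from collections import Counter
from itertools import combinations


def count_symptom_combinations(rows, size=3):
    """Count how many respondents have each combination of `size` symptoms.

    rows: iterable of 0/1 lists, one per respondent.
    Returns a Counter keyed by tuples of 1-based symptom numbers.
    """
    counts = Counter()
    for row in rows:
        present = [i for i, value in enumerate(row, start=1) if value == 1]
        counts.update(combinations(present, size))
    return counts


# Tiny made-up example with 4 symptoms and 3 respondents.
rows = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
]
counts = count_symptom_combinations(rows, size=3)
print(counts.most_common(1))
```

Combinations that never occur are simply absent from the Counter (looking them up gives 0), and size=4 gives the four-symptom version.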

fill randomly an array in Excel with 0 and 1 without repeating 1

I want to randomly fill an array in Excel with 0 and 1, but with 1 appearing only once.
I have tried this formula but it fails:
=IF(COUNTIF($K$43:K43;1)=1;1;0)
My table, filled using:
=RANDBETWEEN(0;1)
0 0 0 0 1 0 1 0 1 0 1 0
Result of my formula:
0 0 0 0 1 1 0 0 0 0 0 0
I want the result of 1 just one time:
0 0 0 0 1 0 0 0 0 0 0 0
This is because COUNTIF is still 1 at the stage 0 0 0 0 1 0, so the cell after the first 1 also gets a 1.
Please use the formula below (e.g., if the output is required in L43):
L43=IF(AND(COUNTIF($K$43:K43,1)=1,L42=0),1,0)
Refer my screenshot.
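The intended behaviour, keeping only the first 1 in a row of random 0/1 values, can be sketched in Python to make the logic explicit. The function name keep_first_one is hypothetical, and this is a plain illustration of the idea rather than a translation of the exact formula above: a 1 is emitted only if the current value is 1 and no 1 has been emitted before.

```python
def keep_first_one(bits):
    """Return a copy of bits in which only the first 1 survives."""
    seen_one = False
    result = []
    for bit in bits:
        if bit == 1 and not seen_one:
            result.append(1)
            seen_one = True
        else:
            result.append(0)
    return result


# The random row from the question.
random_row = [0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(keep_first_one(random_row))
```

The seen_one flag plays the role of the extra condition in the corrected formula: it prevents the cells that follow the first 1 from also being set.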
