I have this dataset:
Groups    A  A  B  B
location  a  b  c  d
          3  4  0  5
I also have a transformed version for better clarification:
Groups  location
A       a         3
A       b         4
B       c         0
B       d         5
What I want is a simple matrix filled with binary values.
The function should check, for each row and column, whether the value is the maximum (MAXIFS) of its respective group, and then compare it to the second value, which is the maximum (MAXIFS) of its own group. Therefore the combination b and d has to resolve to 1.
The intended output is as follows:
I have a dataset with a-n locations that are grouped in a-n groups. So group A has the locations a,b; group B the locations c,d. The columns represent different features at each location.
I want to build a matrix out of it, not the "usual" distance matrix but one that incorporates the following rules:
-When building the matrix, the maximum values of each group get compared
-I want to find out whether the value I am looking at is the maximum value in its group and, if so, compare it to the second group's maximum -> if this number is larger -> set it to 1
-This should automatically fill all fields in the matrix
I need this for a network analysis of my data, to remove connections that are not needed.
My current input is somewhat like this:
=IF(AND(>0(MAXIF()=value)>(AND(>0(MAXIF()=value);1;0)
How it looks in Excel:
=IF(AND(A$1<>$A7; A$3>0;(MAXIFS($A$3:$D$3;$A$1:$D$1;A$1)=A$3))<(AND(A$1<>$A7; $C7>0;MAXIFS($C$7:$C$10;$A$7:$A$10;$A7)=$C7));1;0)
However, I think that internally it does not actually compare the values but the TRUE and FALSE results of the two AND expressions. Therefore connections that are smaller than the MAX are getting 1s. My current output:
      A  A  B  B
      a  b  c  d
A  a  0  0  0  0
A  b  0  0  1  0
B  c  0  0  0  0
B  d  1  0  0  0
As you can see, the combination of a and d resolves to 1.
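The suspicion that the formula compares TRUE/FALSE rather than values can be checked outside Excel. The following is a minimal Python sketch, not the original spreadsheet: the helper `and_term` and the dictionaries are illustrative names, and it assumes each `AND(...)` in the formula reduces to "different group, value greater than 0, value equals the group maximum". Comparing the two booleans with `<` gives 1 exactly where the column term is false and the row term is true, which reproduces the output shown above.

```python
# Sample data from the question: location -> group, location -> value
groups = {"a": "A", "b": "A", "c": "B", "d": "B"}
values = {"a": 3, "b": 4, "c": 0, "d": 5}
locations = list(values)

# Largest value per group (the role MAXIFS plays in the formula)
group_max = {}
for loc in locations:
    grp = groups[loc]
    group_max[grp] = max(group_max.get(grp, values[loc]), values[loc])

def and_term(loc, other):
    """Mimic one AND(...): different group, positive, and the group maximum."""
    return (groups[loc] != groups[other]
            and values[loc] > 0
            and values[loc] == group_max[groups[loc]])

# column term < row term: a comparison of two booleans, not of two values
current = [[int(and_term(col, row) < and_term(row, col)) for col in locations]
           for row in locations]

for row, cells in zip(locations, current):
    print(groups[row], row, *cells)
```

Under these assumptions the 1s land on (b, c) and (d, a), the same cells as in the spreadsheet output.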
The output should look like this:
(The matrix is generally 0, but where beacons like d (5) and b (4) meet, it gets a "1", since both are the highest within their group. Only here is there a connection between the two groups.)
      A  A  B  B
      a  b  c  d
A  a  0  0  0  0
A  b  1  0  0  1
B  c  0  0  0  0
B  d  0  1  0  0
I understand the problem but don't know how to fix that.
I'm fairly sure this doesn't work properly, but it may help. I've restructured your data slightly to make it easier to write the formula.
The formula in C3 etc is
=IF(AND(MAXIFS($G$3:$G$6,$A$3:$A$6,$A3)=$G3,MAXIFS($C$7:$F$7,$C$1:$F$1,C$1)=C$7,$G3>MAXIFS($G$3:$G$6,$A$3:$A$6,"<>" & $A3)),1,0)
It just checks whether the value in G is the max for the group in column A, whether the value in row 7 is the max for the group in row 1, and whether the value in G is greater than the max of the other group. If all three are satisfied, it inserts a '1'.
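For comparison, here is a small Python sketch of the symmetric reading given in the question: a cell is 1 only when the row and column locations belong to different groups and each one is the positive maximum of its own group. This is an assumption about the intended rule, not a translation of the formula above (it drops the "greater than the other group's max" check), and all names in it are illustrative.

```python
# Sample data from the question: location -> group, location -> value
groups = {"a": "A", "b": "A", "c": "B", "d": "B"}
values = {"a": 3, "b": 4, "c": 0, "d": 5}
locations = list(values)

# Largest value per group
group_max = {}
for loc in locations:
    grp = groups[loc]
    group_max[grp] = max(group_max.get(grp, values[loc]), values[loc])

def is_group_max(loc):
    """True when the location holds the positive maximum of its group."""
    return values[loc] > 0 and values[loc] == group_max[groups[loc]]

# 1 only where two group maxima from different groups meet
matrix = [[int(groups[row] != groups[col]
               and is_group_max(row)
               and is_group_max(col))
           for col in locations]
          for row in locations]

for row, cells in zip(locations, matrix):
    print(groups[row], row, *cells)
```

With the sample data only the pair b and d is connected, in both directions.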
I'm trying to use the following max function by the group in column A. The value I want in C is the max entry of each group in column B. When column B is normal numbers, this function works. However, I would like column B to be populated by the results of a different function (in this case an =IF function that successfully returns a number). When I do this, I get 0's.
in column C: =MAX(IF($A$2:$A$5=C2,$B$2:$B$5))
Column A  Column B  Column C
Group 1   5         5
Group 1   4         5
Group 2   6         7
Group 2   7         7
Column A  Column B  Column C
Group 1   =IF(...)  0
Group 1   =IF(...)  0
Group 2   =IF(...)  0
Group 2   =IF(...)  0
Any idea what could be going on? Please let me know if I can provide anything additional to help explain.
Thank you!
Reese
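The grouped maximum described in this question can also be sketched in pandas. One possible cause of the zeros, which the post does not confirm, is that the IF formulas return numbers as text; Excel's MAX ignores text values inside an array, so an all-text column would yield 0. The sketch below assumes exactly that situation (column B holding strings) and converts the values to numbers before taking the per-group maximum. The sample values are taken from the first table.

```python
import pandas as pd

# Assumption: column B arrives as text, e.g. "5" instead of 5
df = pd.DataFrame({
    "Column A": ["Group 1", "Group 1", "Group 2", "Group 2"],
    "Column B": ["5", "4", "6", "7"],
})

# Convert text to numbers; anything unparseable becomes NaN
df["Column B"] = pd.to_numeric(df["Column B"], errors="coerce")

# Per-group maximum, repeated on every row of the group
df["Column C"] = df.groupby("Column A")["Column B"].transform("max")

print(df)
```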
I have an Excel question that I cannot answer. Here is my table:
ID Key  Count Unique  Available  Text    Results
1       0                        Text-1  Dupe-Y
2       1             Y          Text-1  Y
3       0                        Text-1  Dupe-Y
4       0                        Text-1  Dupe-Y
5       1             N          Text-2  N
6       1             Y          Text-3  Y
7       0                        Text-2  Dupe-N
8       0             Duplicate  Text-2  Dupe-N
9       0             Duplicate  Text-2  Dupe-N
10      0             Y          Text-2  Dupe-N
ID Key is just a unique key.
Count Unique flags the first time each value in column Text appears. Available can be Y, N, or Duplicate, and Text is the main column I need in order to analyze my table. Results works like this: for the first time each value in Text appears (Count Unique = 1), if there is a value in Available, then that is the value I need; if Count Unique is 0, the result is either Dupe-Y or Dupe-N, depending on the value in Available.
I tried with a formula like this one but got stuck after initial progress. =IF(B2=0,"",IFERROR(IF(COUNTIF(D:D,D2)>1,IF(COUNTIF($D:$D,D2)=1,"",C2),1),1))
Note that the column Results is the one I need to populate with a formula that is not affected by sorting or lack of it.
I assume you already have all those values and just need a formula for the Results column.
My formula will work only if the data is sorted as in your example. If the sorting changes, the formula will fail.
My formula is:
=IF(B2=1;D2;"Dupe-"&RIGHT(G1;1))
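Since the question asks for a result that does not depend on sorting, here is a hedged pandas sketch of the same rule. It assumes, based on the Results column in the table, that every duplicate takes the Available value of the row where its Text first appears (Count Unique = 1); the column names mirror the question's table, and the lookup variable `flag_by_text` is an illustrative name.

```python
import pandas as pd

# Sample table from the question (blank Available cells as empty strings)
df = pd.DataFrame({
    "ID Key": range(1, 11),
    "Count Unique": [0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
    "Available": ["", "Y", "", "", "N", "Y", "", "Duplicate", "Duplicate", "Y"],
    "Text": ["Text-1", "Text-1", "Text-1", "Text-1", "Text-2",
             "Text-3", "Text-2", "Text-2", "Text-2", "Text-2"],
})

# Lookup: Text value -> Available of its first-occurrence row
flag_by_text = df.loc[df["Count Unique"] == 1].set_index("Text")["Available"]

# Attach that flag to every row, regardless of row order
flag = df["Text"].map(flag_by_text)

# First occurrences keep the flag, all other rows get the "Dupe-" prefix
df["Results"] = flag.where(df["Count Unique"] == 1, "Dupe-" + flag)

print(df)
```

Because the lookup is keyed on Text rather than on the row above, shuffling the rows should not change each row's result.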
How can I randomly select a given number of rows in a Python (pandas) dataframe and assign values to them?
Col B contains only 1's and 0's.
Suppose I have a dataframe as below
Col A Col B
A 0
B 0
A 0
B 0
C 0
A 0
B 0
C 0
D 0
A 0
I aim to randomly choose 5% of the rows and change the value of Col B to 1. I saw df.sample(), but that won't allow me to make in-place changes to the column data.
You can try the random library, which has its own sample function.
import random
# Pick 5% of the row positions (size // 20) without replacement
randindx = random.sample(range(dataframe['Col B'].size), dataframe['Col B'].size // 20)
Since you want 5%, you need to divide by 20.
You can first use the sample method to get the random 5% of examples and get hold of their indices like so:
samples_indices = df.sample(frac=0.05, replace=False).index
With the knowledge of the indices, loc method can be used to update the values corresponding to the examples.
df.loc[samples_indices, 'Col B'] = 1
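Put together, the two steps above can be run as follows. This is a minimal sketch: the 100-row dataframe is made up for illustration (with only 10 rows, 5% would round down to no rows at all), and `random_state=0` is added only so the run is repeatable.

```python
import pandas as pd

# Illustrative data: 100 rows, Col B starts as all zeros
df = pd.DataFrame({"Col A": list("ABCD") * 25, "Col B": 0})

# Sample 5% of the rows without replacement and keep their index labels
samples_indices = df.sample(frac=0.05, replace=False, random_state=0).index

# Update Col B in place for exactly those rows
df.loc[samples_indices, "Col B"] = 1

print(df["Col B"].sum())
```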
Key
----------
0 a
1 a
2 b
3 b
4 a
5 c
So far I tried this:
df.groupby(["key1"],).count()
However, it also shows the counts of b and c; I want only the count for a.
Create mask and count by sum:
df["Key"].eq('a').sum()
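As a runnable illustration of the mask-and-sum approach, here is a short sketch that rebuilds the Key column from the question. The variable name `count_a` is illustrative.

```python
import pandas as pd

# Key column as shown in the question
df = pd.DataFrame({"Key": ["a", "a", "b", "b", "a", "c"]})

# eq('a') gives a boolean mask; summing it counts the True entries
count_a = df["Key"].eq("a").sum()

print(count_a)
```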
In an Excel spreadsheet I am working on there are two columns of interest, column B and column E. In column B there are some 0 values, and these are getting carried over to column E by the loop that I am running with respect to column D. I want to write a Python script that ignores these 0s and writes the next most frequent value into column E.
   12NC          ModifiedSOCwrt12NC  SOC
0  232270463903  0                   0
1  232270463903  0                   0
2  232270463903  0                   0
3  232270463903  0                   0
4  232270463903  0                   RC0603FR-0738KL
5  232270463903  0                   RC0603FR-0738KL
6  232270463903  0                   RC0603FR-0738KL
I want to run a loop which picks non-zero values from SOC (column B) and carries it over to ModifiedSOCwrt12NC (column E) based on unique values in Column D.
For example, column B has the values [0, RCK2] in multiple rows that share the same value in column D. The current loop picks the value with the most occurrences in column B and fills it into column E. If there is a tie between the occurrences of 0 and RCK2, it picks 0 as per the ASCII order (which I don't want to happen). I want the code to pick RCK2 and fill it into column E.
Since your data is not accessible, I have created test data similar to yours, shown below.
We can read the data with pandas:
import pandas as pd
df = pd.read_excel("ExcelTemplate.xlsx")
df
   Index  SOC  Index2  12NC
0  YXGMY  0    ZJIZX   23445
1  NQHQC  0    JKJKT   23445
2  MWTLY  0    EFCYD   23445
3  RPQFE  AC   VLOJZ   23445
4  GPLUQ  AC   AKKKG   23445
5  WGYYM  AC   DSMLO   23445
6  XGTAQ  0    ZHGWS   45667
7  AMWDT  0    YROLO   45667
The following code does the summarization:
-Summarize the data by 12NC and SOC and take the count
-Sort by 12NC, count and SOC, with the highest count first
-Take the first value of SOC for each 12NC
-Merge with the original data to create column E
-Export back to Excel
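# Count how often each SOC value occurs within each 12NC
df1 = df.groupby(['12NC', 'SOC'])['Index'].count().reset_index()
# Drop the 0 placeholders, then keep the most frequent SOC per 12NC
top_soc = (df1[df1['SOC'] != 0]
           .sort_values(by=['12NC', 'Index', 'SOC'], ascending=[True, False, True])
           .drop_duplicates(subset=['12NC'], keep='first')[['12NC', 'SOC']]
           .rename(columns={'SOC': 'ModifiedSOCwrt12NC'}))
# Attach the chosen SOC to every row of the original data
df = df.merge(top_soc, on=['12NC'], how='left')
df.to_excel("ExcelTemplate_modifies.xlsx", index=False)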
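To try the same steps without an Excel file, the test data can be built in memory. This is a sketch under a few assumptions: the zeros in SOC are stored as the integer 0, the 0 rows are filtered out before counting (a small variation on the code above), and the helper names `counts` and `top_soc` are illustrative.

```python
import pandas as pd

# Test data from the answer, built in memory instead of read from Excel
df = pd.DataFrame({
    "Index": ["YXGMY", "NQHQC", "MWTLY", "RPQFE", "GPLUQ", "WGYYM", "XGTAQ", "AMWDT"],
    "SOC": [0, 0, 0, "AC", "AC", "AC", 0, 0],
    "Index2": ["ZJIZX", "JKJKT", "EFCYD", "VLOJZ", "AKKKG", "DSMLO", "ZHGWS", "YROLO"],
    "12NC": [23445] * 6 + [45667] * 2,
})

# Ignore the 0 placeholders, then count each remaining SOC per 12NC
counts = (df[df["SOC"] != 0]
          .groupby(["12NC", "SOC"])
          .size()
          .reset_index(name="count"))

# Most frequent SOC first within each 12NC, then keep one row per 12NC
top_soc = (counts
           .sort_values(by=["12NC", "count", "SOC"], ascending=[True, False, True])
           .drop_duplicates(subset=["12NC"], keep="first")[["12NC", "SOC"]]
           .rename(columns={"SOC": "ModifiedSOCwrt12NC"}))

# Left merge so that a 12NC with only zeros gets NaN rather than being dropped
df = df.merge(top_soc, on="12NC", how="left")

print(df)
```

In this data 23445 has three zeros and three AC entries; because zeros are excluded before counting, AC is chosen, while 45667 (zeros only) is left empty.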