Excel - Formula to calculate difference between columns with blank cells - excel

I have an Excel sheet with values similar to the table below.
-------------------------------------
| A | B | C | D | E | F |
-------------------------------------
| 95| | 98| 96| 95| |
-------------------------------------
| 96| 95| | 92| 91| |
-------------------------------------
| 93| | 92| 98| 94| |
-------------------------------------
| 92| 98| | 95| 92| |
-------------------------------------
| 95| | 99| 92| 98| |
-------------------------------------
The formula for F1 should be =(B1-A1)+(C1-B1)+(D1-C1)+(E1-D1)
However, some cells are blank. If a cell is blank, the formula should skip it and use the next non-blank cell.
e.g. F1 should be =(C1-A1)+(D1-C1)+(E1-D1)
and F2 should be =(B2-A2)+(D2-B2)+(E2-D2)
and so on...
Is there a formula to automate this?

The formula:
= (B1-A1) + (C1-B1) + (D1-C1) + (E1-D1)
can also be written as:
= B1 - A1 + C1 - B1 + D1 - C1 + E1 - D1
or
= - A1 + (B1 - B1) + (C1 - C1) + (D1 - D1) + E1
where every intermediate value cancels itself out and only the first and last values remain, leaving this formula:
= - A1 + E1
So the formula reduces to the last non-blank value minus the first non-blank value.
Try this formula:
= INDEX( $A1:$E1, 0, AGGREGATE( 14, 6, COLUMN(1:1) / ( $A1:$E1 <> "" ), 1 ))
- INDEX( $A1:$E1, 0, AGGREGATE( 15, 6, COLUMN(1:1) / ( $A1:$E1 <> "") ,1 ))
See these pages for further explanations on the Worksheet Functions used:
AGGREGATE function, INDEX function.
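The cancellation can also be checked outside Excel. Below is a minimal Python sketch, not part of the original answer: the helper names `pairwise_sum` and `last_minus_first` are made up for illustration, and a blank cell is assumed to be represented as `None`.

```python
def pairwise_sum(row):
    """Sum the differences between consecutive non-blank values."""
    values = [v for v in row if v is not None]
    return sum(b - a for a, b in zip(values, values[1:]))


def last_minus_first(row):
    """Last non-blank value minus first non-blank value (0 for an empty row)."""
    values = [v for v in row if v is not None]
    return values[-1] - values[0] if values else 0


# Columns A:E of the sample sheet; None stands for a blank cell.
rows = [
    [95, None, 98, 96, 95],
    [96, 95, None, 92, 91],
    [93, None, 92, 98, 94],
    [92, 98, None, 95, 92],
    [95, None, 99, 92, 98],
]

for row in rows:
    # Both approaches agree because the intermediate terms cancel.
    assert pairwise_sum(row) == last_minus_first(row)

print([last_minus_first(row) for row in rows])  # [0, -5, 1, 0, 3]
```

The printed list is what column F should hold for the five sample rows.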

Related

How to extract rows with some processing steps using python pandas?

My dataframe:
| query_name | position_description |
|------------|----------------------|
| A1 | [1-10] |
| A1 | [3-5] |
| A2 | [1-20] |
| A3 | [1-15] |
| A4 | [10-20] |
| A4 | [1-15] |
I would like to remove rows that (i) have the same query_name as another row and (ii) have a position_description range that lies entirely within that other row's range.
Desired output:
| query_name | position_description |
|------------|----------------------|
| A1 | [1-10] |
| A2 | [1-20] |
| A3 | [1-15] |
| A4 | [10-20] |
| A4 | [1-15] |
If no more than one row can be contained in another, we can use:
import pandas as pd
from ast import literal_eval

# Turn '[1-10]' into the list [1, 10]; column 0 is the start, column 1 the end
df2 = pd.DataFrame(df['position_description'].str.replace('-', ',')
                     .apply(literal_eval).tolist(),
                   index=df.index).sort_values(0)
print(df2)
    0   1
0   1  10
2   1  20
3   1  15
5   1  15
1   3   5
4  10  20
check = df2.groupby(df['query_name']).shift()
df.loc[~(df2[0].gt(check[0]) & df2[1].lt(check[1]))]
  query_name position_description
0         A1               [1-10]
2         A2               [1-20]
3         A3               [1-15]
4         A4              [10-20]
5         A4               [1-15]
This should work when any number of ranges are contained in other ranges.
First, extract the boundaries:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'query_name': ['A1', 'A1', 'A2', 'A3', 'A4', 'A4'],
    'position_description': ['[1-10]', '[3-5]', '[1-20]', '[1-15]', '[10-20]', '[1-15]'],
})
# Capture the two integers in '[start-end]' as pos_x (start) and pos_y (end)
df[['pos_x', 'pos_y']] = df['position_description'].str.extract(r'\[(\d+)-(\d+)\]').astype(int)
Then we will define the function that can choose what ranges to keep:
def non_contained_ranges(df):
    # Duplicated ranges would be seen as containing one another, so none of
    # them would pass the check. Drop all but one duplicate here.
    df = df.drop_duplicates('position_description', keep='first')
    range_min = df['pos_x'].min()
    range_max = df['pos_y'].max()
    range_size = range_max - range_min + 1
    # One row per range: b[i, p] is 1 where range i covers position p
    b = np.zeros((len(df), range_size))
    for i, (x, y) in enumerate(df[['pos_x', 'pos_y']].values - range_min):
        b[i, x: y+1] = 1.
    # b2[i, j] is True when range j covers a position that range i does not
    b2 = np.logical_and(np.logical_xor(b[:, np.newaxis], b), b).any(axis=2)
    np.fill_diagonal(b2, True)
    # Keep range j only if no other range contains it
    b3 = b2.all(axis=0)
    return df[b3]
If there are N ranges within a group (query_name), this function will do N x N comparisons, using boolean array operations.
Then we can do groupby and apply the function to yield the expected result
df.groupby('query_name')\
  .apply(non_contained_ranges)\
  .droplevel(0, axis=0).drop(columns=['pos_x', 'pos_y'])
Outcome:
  query_name position_description
0         A1               [1-10]
2         A2               [1-20]
3         A3               [1-15]
4         A4              [10-20]
5         A4               [1-15]
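As a cross-check on the N x N comparison idea without NumPy, the same rule can be written as a plain double loop. This is only a sketch under assumptions: ranges are given as (start, end) with inclusive ends, duplicates within a group are collapsed to one row as in the function above, and the name `drop_contained` is hypothetical.

```python
from collections import defaultdict


def drop_contained(rows):
    """rows: list of (query_name, start, end) tuples.

    Returns the rows whose range is not contained in another range of the
    same query_name, in their original order.
    """
    groups = defaultdict(list)
    for name, start, end in rows:
        # Keep only the first occurrence of a duplicated range.
        if (start, end) not in groups[name]:
            groups[name].append((start, end))

    kept = set()
    for name, ranges in groups.items():
        for inner in ranges:
            contained = any(
                outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]
                for outer in ranges
            )
            if not contained:
                kept.add((name, *inner))

    # Preserve input order and emit each surviving range once.
    result, seen = [], set()
    for row in rows:
        if row in kept and row not in seen:
            result.append(row)
            seen.add(row)
    return result


rows = [
    ("A1", 1, 10), ("A1", 3, 5), ("A2", 1, 20),
    ("A3", 1, 15), ("A4", 10, 20), ("A4", 1, 15),
]
print(drop_contained(rows))
```

For the sample data only ("A1", 3, 5) is dropped, because it lies inside ("A1", 1, 10). The two A4 ranges overlap but neither contains the other, so both stay.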

Remove duplicates from multiple column so that output will be unique in excel

I have these 2 columns:
Column 1   Column 2
A          E
B          F
C          B
D          A
           G
I need to compare the two columns so that the output lists each value only once:
Column 3
A
B
C
D
E
F
G
Well if your data is in columns A and B:
| A | B |
|---|---|
| A | E |
| B | F |
| C | B |
| D | A |
|   | G |
You can use this formula, entered in D2 and filled down. It returns #N/A once the unique values run out, and it requires the first row (D1) to hold some text that does not appear in the data:
=IFERROR(INDEX($A$1:$A$4,MATCH(0,COUNTIF($D$1:D1,$A$1:$A$4),0)),INDEX($B$1:$B$5,MATCH(0,COUNTIF($D$1:D1,$B$1:$B$5),0)))
It is an array formula, so it needs to be entered with Ctrl + Shift + Enter.
Result in column D:
| D |
|-------- |
| Unique |
| A |
| B |
| C |
| D |
| E |
| F |
| G |
| #N/A |
This can be done with a formula like this one starting in Cell C2:
=IFERROR(INDEX($A$2:$A$5,MATCH(0,INDEX(COUNTIF($C$1:C1,$A$2:$A$5),,),)),IFERROR(INDEX($B$2:$B$6,MATCH(0,INDEX(COUNTIF($C$1:C1,$B$2:$B$6),,),)),""))
This is a regular (non-array) formula.
Note that although the result looks sorted, no sorting takes place; the values are reported in the order in which they are encountered.
I know this wasn't part of the request, but since I mentioned it: if we wanted the list sorted regardless of the order of the original columns, I think this formula, entered into C2 as an array formula with [Ctrl]+[Shift]+[Enter] and copied down, would work:
=IFERROR(INDEX($A$2:$B$6, SMALL(IF(SMALL(IF(COUNTIF($C$1:C1, $A$2:$B$6)+ISBLANK($A$2:$B$6)=0, COUNTIF($A$2:$B$6, "<"&$A$2:$B$6)+1, ""), 1)=IF(ISBLANK($A$2:$B$6), "", COUNTIF($A$2:$B$6, "<"&$A$2:$B$6)+1), ROW($A$2:$B$6)-MIN(ROW($A$2:$B$6))+1), 1), MATCH(MIN(IF(COUNTIF($C$1:C1, $A$2:$B$6)+ISBLANK($A$2:$B$6)>0, "", COUNTIF($A$2:$B$6, "<"&$A$2:$B$6)+1)), INDEX(IF(ISBLANK($A$2:$B$6), "", COUNTIF($A$2:$B$6, "<"&$A$2:$B$6)+1), SMALL(IF(SMALL(IF(COUNTIF($C$1:C1, $A$2:$B$6)+ISBLANK($A$2:$B$6)=0, COUNTIF($A$2:$B$6, "<"&$A$2:$B$6)+1, ""), 1)=IF(ISBLANK($A$2:$B$6), "", COUNTIF($A$2:$B$6, "<"&$A$2:$B$6)+1), ROW($A$2:$B$6)-MIN(ROW($A$2:$B$6))+1), 1), , 1), 0), 1),"")
I referenced the input columns as a single range, which makes the formula a bit complicated since I had to remove blank values.
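For comparison outside Excel, the difference between "order encountered" and "sorted" can be sketched in a few lines of Python. This is an illustration only, not a translation of the formulas above; the function name `unique_in_order` is hypothetical, and blanks are assumed to be empty strings or `None`.

```python
column_a = ["A", "B", "C", "D"]
column_b = ["E", "F", "B", "A", "G"]


def unique_in_order(*columns):
    """Unique non-blank values, in the order they are first encountered."""
    merged = [v for col in columns for v in col if v not in ("", None)]
    # dict keys keep insertion order, so this drops repeats without sorting.
    return list(dict.fromkeys(merged))


print(unique_in_order(column_a, column_b))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(unique_in_order(column_b, column_a))  # ['E', 'F', 'B', 'A', 'G', 'C', 'D']
print(sorted(unique_in_order(column_b, column_a)))
```

With the columns swapped, the unsorted result is visibly out of alphabetical order, which is the situation the sorted variant is meant to handle.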

How to reference a cell where sheet name is cited as a value from a different workbook?

I have two workbooks: data.xlsx (which is read-only and contains the sheets mainsheet, a, b, c, d...) and result.xlsx (where I will put all my computations and formulas).
data.xlsx!mainsheet contains:
-------
A | B |
-------
1 | c |
2 | b |
3 | a |
.
.
.
-------
and results.xlsx contains
-------------------
| A | B | C |
-------------------
1 |S1 | S2 | Sum |
2 | 3 | 1 | |
3 | 2 | 3 | |
4 | 1 | 2 | |
Values of cell A1 of sheets a, b, c are 10, 5 and 50 respectively.
What should be the formula so that:
Cell C2 should be the sum of the A1 values of sheets a and c
Cell C3 should be the sum of the A1 values of sheets b and a
Cell C4 should be the sum of the A1 values of sheets c and b
So the expected result will be cell C2 = 10+50=60, C3=5+10=15, C4=50+5=55.
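Use VLOOKUP to get the sheet names that need to be summed.
Use INDIRECT to get the values. Since sheets a, b and c are in data.xlsx, the workbook name has to be part of the reference text, and data.xlsx must be open for INDIRECT to resolve it. For C4 (adjust the row for C2 and C3):
=INDIRECT("'[data.xlsx]"&VLOOKUP(A4,[data.xlsx]mainsheet!$A:$B,2,0)&"'!A1")+INDIRECT("'[data.xlsx]"&VLOOKUP(B4,[data.xlsx]mainsheet!$A:$B,2,0)&"'!A1")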
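The two-step lookup (number to sheet name, then sheet name to its A1 value) can be checked against the expected results with a small Python sketch. The dictionaries below simply stand in for the workbooks from the question; the names `mainsheet`, `a1_values` and `row_sum` are made up for illustration.

```python
# mainsheet in data.xlsx: number in column A -> sheet name in column B
mainsheet = {1: "c", 2: "b", 3: "a"}

# Value of cell A1 on each of the sheets a, b, c
a1_values = {"a": 10, "b": 5, "c": 50}

# Columns S1 and S2 of the results sheet, rows 2 to 4
results_rows = [(3, 1), (2, 3), (1, 2)]


def row_sum(s1, s2):
    """Look up both sheet names (the VLOOKUP step), then add their A1 values
    (the INDIRECT step)."""
    return a1_values[mainsheet[s1]] + a1_values[mainsheet[s2]]


sums = [row_sum(s1, s2) for s1, s2 in results_rows]
print(sums)  # [60, 15, 55]
```

These are the values the question expects in C2, C3 and C4.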

Google Sheets How to find the Top 3 closest columns to a given column

I have a table in google sheets like this one,
-------------------
| A | B | C | D |
-------------------
1 |C1 |C2 |C3 |C4 |
2 | 1 | 2 | 1 | 2 |
3 | 2 | 3 | 4 | 3 |
4 | 5 | 7 | 1 | 6 |
-------------------
My goal is to find which 2 of the columns C1, C2, C3 are closest to C4,
by calculating the average difference between each column and column C4,
e.g. column C1 will have an average of abs( ( (1-2)+(2-3)+(5-6) ) /3 )
which is abs( ( (A2-D2)+(A3-D3)+(A4-D4) )/(number of rows) )
I'm using ARRAYFORMULA to get the average difference for one column, and then I drag it horizontally so the As become Bs and so on
=ArrayFormula({A1;abs(average( (checks if there is empty cell) ,$D2:$D-(A2:A) )))})
If I use it in cell Z1, Z1 will show 'C1' and Z2 will show the average difference for column C1,
but I'm not sure how to use a single nested formula to do it for all columns A:C at once, without having to drag it.
For example, if I type =FORMULA(...) in Z1, a table should show up.
Thank you
Try the formula:
=QUERY(ARRAYFORMULA(ABS((ROW(A2:C)*COLUMN(A2:C))^0*D2:D26-A2:C26)),
"select avg(Col"&JOIN("), avg(Col",ArrayFormula(row(INDIRECT("A1:A"&COLUMNS(A2:C)))))&")")
Explanation
(ROW(A2:C)*COLUMN(A2:C))^0*D2:D26 -- repeats column D (C4) across every column of A2:C so it can be compared with each of them
"select avg(Col"&JOIN("), avg(Col"... -- composes the query that takes the average of each column.
Note: in your formula, abs(average( must be replaced with average(abs( so that ABS is applied to each difference before averaging.
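To see why the order of abs and average matters, here is a short Python sketch using the sample table from the question. It is not a translation of the QUERY formula; the function names `mean_abs_diff` and `abs_mean_diff` are hypothetical.

```python
table = {
    "C1": [1, 2, 5],
    "C2": [2, 3, 7],
    "C3": [1, 4, 1],
    "C4": [2, 3, 6],
}


def mean_abs_diff(column, target):
    """average(abs(...)): take the absolute value of each difference first."""
    return sum(abs(a - b) for a, b in zip(column, target)) / len(target)


def abs_mean_diff(column, target):
    """abs(average(...)): signed differences can cancel before abs is applied."""
    return abs(sum(a - b for a, b in zip(column, target)) / len(target))


target = table["C4"]
scores = {
    name: mean_abs_diff(col, target)
    for name, col in table.items()
    if name != "C4"
}

# The two columns with the smallest average absolute difference to C4.
closest_two = sorted(scores, key=scores.get)[:2]
print(closest_two)  # ['C2', 'C1']

# For C3 the two orderings disagree: 7/3 versus 5/3.
print(mean_abs_diff(table["C3"], target), abs_mean_diff(table["C3"], target))
```

For C3 the differences are -1, +1 and -5, so the +1 partly cancels in abs(average(...)) and the column looks closer to C4 than it really is.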

Creating an SQL select statement out of excel values

I have a sheet with 2 columns. I need to CONCATENATE the two cells within each row to create a large WHERE statement in the SQL based off every row. For example:
Where A1 = 'B1' and A2 = 'B2' etc etc.
What do you suggest is the best method to do this? I need to do this across many sheets. Originally I was going to do something like this: C1=CONCATENATE(A1," = ","'",B1,"'") across every row, then CONCATENATE those outputs as well (C1,D1 etc) but just wondering if there are any other options? Would using VBA be easier?
No need to use any functions.
You may do like this,
assuming your excel sheet is like:
| A | B | C | D |
1 | a1 | b1 | | |
2 | a2 | b2 | | |
3 | a3 | b3 | | |
4 | a4 | b4 | | |
Insert a new column between A and B, write =' in cell B1, and drag that cell down to copy the value to every row of your data. Similarly, write ' and in cell D1 and do the same, so the sheet looks like this:
| A | B | C | D |
1 | a1 | =' | b1 |' and|
2 | a2 | =' | b2 |' and|
3 | a3 | =' | b3 |' and|
4 | a4 | =' | b4 |' and|
Now copy and paste these cells into Notepad++, remove the TAB characters and replace each new line (\n) with a space.
So, now you should get string like,
a1='b1' and a2='b2' and a3='b3' and a4='b4' and
You just have to make one minor edit: remove the trailing and, then place the string in your query.
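If scripting is an option, joining the conditions avoids the trailing and altogether. The following is a minimal Python sketch, assuming the two columns have already been read into a list of (column, value) pairs; `build_where` is a hypothetical helper, and doubling single quotes is only basic escaping, so parameterized queries are the safer choice for untrusted values.

```python
def build_where(pairs):
    """Build "col='value' and col='value' ..." from (column, value) pairs."""
    conditions = []
    for column, value in pairs:
        # Double any single quote so it stays inside the SQL string literal.
        escaped = str(value).replace("'", "''")
        conditions.append(f"{column}='{escaped}'")
    # join() puts " and " only between conditions, never at the end.
    return " and ".join(conditions)


pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]
print(build_where(pairs))  # a1='b1' and a2='b2' and a3='b3' and a4='b4'
```

The same function can be called once per sheet if the pairs are collected sheet by sheet.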
