Standard Deviation Excel from two different lists - excel

I am currently trying to find a z score for my values. To start off, I am drawing two separate lists from a different sheet to try to find the standard deviation. Currently I have
(G11-(AVERAGE(INDEX('Asia Last Yields'!DU:DU,
MATCH(B3,'Asia Last Yields'!DT:DT)):INDEX('Asia Last Yields'!DU:DU,
MATCH(B1,'Asia Last Yields'!DT:DT)))-AVERAGE(INDEX('Asia Last Yields'!DL:DL,
MATCH(B3,'Asia Last Yields'!DK:DK)):INDEX('Asia Last Yields'!DL:DL,
MATCH(B1,'Asia Last Yields'!DK:DK)))))/(need the standard dev)
This is basically (original value - mean) right now. As you can see it is pretty complicated right now, but basically my problem is since I am drawing from two lists that do not form an array, I cannot simply just use the Standard Deviation function. Is there anyway to combine the two lists without creating a seperate list within the worksheet?
Let me know if I wasn't clear, and thanks for the help!

Related

How to split a Pandas dataframe into multiple csvs according to when the value of a column changes

So, I have a dataframe with 3D point cloud data (X,Y,Z,Color):
dataframe sample
Basically, I need to group the data according to the color column (which takes values of 0,0.5 and 1). However, I don't need an overall grouping (this is easy). I need it to create new dataframes every time the value changes. That is, I'd like a new dataframe for every set of rows that are followed by and preceded by 5 zeros (because single zeros are sometimes erroneously present in chunks of data that I'm interested in).
Basically, the zero values (black) are meaningless for me; I'm only interested in the 0.5 (red) and 1 values (green). What I want to accomplish is to segment the original point cloud into smaller clusters that I can then visualize. I hope this is clear. I can't seem to find answers to my question anywhere.
First of all, you should understand the for loop well. Python is a great programming language for using the code of any library inside functions and loops. Let's say you have a dataset and you want to navigate and control column a. First, let's start the loop with the "for i in dataset:" code. When you move to the bottom line, you have now specified the criteria you want with the code if "i[a] > 0.5:" in each for loop. Now if the value is greater than 0.5, you can write the necessary codes to create a new dataset with all the data of the row you are in. In terms of personal training, I did not write ready-made code.

Calculate the minimum value of each column in a matrix in EXCEL

Alright this should be a simple one.
I apologize in case it has been already solved, but I can only find posts related to solving this issue with programming languages and not specifically to EXCEL.
Furthermore, I could find posts that address a sub-problem of my question (e.g. regarding limitation of certain EXCEL functions) and should solve/invalidate my request but maybe, just maybe, there is a workaround.
Problem statement:
I want to calculate the minimum value for each column in an EXCEL matrix. Simply enough, I want to input a 2D array (mxn matrix) in a function and output an array with dimension 1xm where each item is the minimum value MIN(nj) of each nj column.
However, I want to solve this with specific constraints:
Avoid using VBA and other non-function scripting: that I could devise myself;
All in one function: what I want to achieve here is to have one and one function only, not split the problem into multiple passages (such as for example copypasting a MIN() function below each column, that wouldn't do it);
The result should be a transposable array (which is already ok, I assume);
Where I am stranded with my solution so far:
The main issue here is that any function I am trying to use takes the entire matrix as a single array input and would calculate the MIN() of the entire matrix, not each column. My current (not working) function for an exemplary 4x4 matrix in range A1:D4 would be as below (the part in bold is where it is clearly not working):
=MIN(INDEX(A1:D4,SEQUENCE(4,4,1,1)))
which ofc does not work, because INDEX() does probably not "understand" SEQUENCE() as an array of items to take into account. Another, not working, way of solving this is to input a series of ranges (A1:A4;B1:B4;C1:C4;D1:D4) so that INDEX() "understands" the ranges as single columns, but ofc does not know and I do not know sincerely how to formulate that. I could use INDIRECT() in some way to reference the array of ranges, but do not know how and could find a way by searching online.
Fundamental question is: can a function, which works with single arrays, also work with multiple arrays? Basically, I do not know how to communicate an EXCEL array formula, that each batch of data I am inputting is a single array and must be evaluated separately (this is very easily solved with for() cycles, I know).
Many thanks for any suggestion and any workaround, any function and solution works as longs as it fits in the constrains defined above (maybe a LAMBA() function? don't know).
This is ofc a simplification of a way more complex problem (I am trying to calculate the annual mean temperature evolution for a specific location by finding the value - for each year from 1950 to 2021 - that is associated to the lat/lon coordinates that are the nearest to the one of the location inputted, given a netCDF-imported grid of time-arrayed data; the MIN() function is used to selected the nearest location, which is then used, via INDEX() to find temp data). I need to do this in one hit (meaning just pasting the function, which evaluates a matrix of data that is referenced by a fixed range), so that I can just use it modularly for other data sets. I already have a working solution, which is "elegant"* enough, but not "elegant"* as the one I could develop solving this issue.
*where "elegant"= it saves me one click every time for 1000+ datasets when applying the function.
If I understand your problem correct then this should solve it:
=BYCOL(A1:D4,LAMBDA(d,MIN(d)))

Selection based on several inputs without extreme duplication

I have a library of data that i need to pull specific rows from, at the moment i have an ID made up of several dropdown menus =$C$2&$F$2... that i compare to an index made up of a combination of column content: =[#Column1]&[#Column2]... that i then use to pull the right data for that instance with VLOOKUP.
Now however i need a much more varied set with more selections, 5 columns worth. That creates 16 sets for every index on the first column and will generate thousands of lines if i am to create one version of every permutation.
The best scenario would be a way to use a modular form of the selections above, if there is any input on X, Y and Z then it functions like now, but if Y and Z are empty it only pulls X. Easy in theory but i dont know the format it will have to take, and it gets even more complicated if i want X and Z for instance, or Y and Z, but still create a neat list of the selections.
An alternative might be a way to pull tables based on a selection, and make one table for every "part" of my query but i cant find a way to do that either.
What i need is any way to pull and combine several rows from a library (based on dropdown or similar input) and assembled in a neat list that i can print.
First post, and thanks in advance =)

Can I use MINIFS or INDEX/MATCH on two non-contiguous ranges...?

Problem is straightforward, but solution is escaping. Hopefully some master here can provide insight.
I have a big data grid with prices. Those prices are ordered by location (rows) and business name (cols). I need to match the location/row by looking at two criteria (location name and a second column). Once the matching row is found (there will always be a match), I need to get the minimum/lowest price from two ranges within the grid.
The last point is the real challenge. Unlike a normal INDEX or MINIFS scenario, the columns I need to MIN aren't contiguous... for example, I need to know what the MIN value is between I4:J1331 and Q4:U1331. It's not an intersection, it's a contiguous set of values across two different arrays.
You're probably saying "hey, why don't you just reorder your table to make them contiguous"... not an option. I have a lot of data, and this spreadsheet is used for a bunch of other stuff. So, I have to work with the format I have, and that means figuring out how to do a lookup/min across multiple non-contiguous ranges. My latest attempt:
=MINIFS(AND($I$4:$J$1331,$K$4:$P$1331),$B$4:$B$1331,$A2,$E$4:$E$1331,$B2)
Didn't work, but it should make it more clear what I'm trying to do. There has GOT to be an easy way to just tell excel "use these two ranges instead of one".
Thanks,
Rick
Figured it out. For anyone else who's interested, there doesn't seem to be any easy way to just "AND" arrays together for a search (Hello MS, backlog please). So, what I did instead was to just create multiple INDEX/MATCH arrays inside of a MIN function and take the result. Like this:
MIN((INDEX/MATCH ARRAY 1),(INDEX/MATCH ARRAY 2))
They both have identical criteria, the only difference is the set of arrays being indexed in each function. That basically gives me this:
MIN((match array),(match array))
And Min can then pull the lowest value from either.
Not as elegant as I'd like... lots of redundant code, but at least it works.
-rt

Iterative/Looped Substitute without VBA

Abridged Question:
If I have a concatenated string of "|#|#|#|...|#|", how can I apply a multiplier to each of the numbers and update the concatenated text? For example, for |4|12|8|, multiply by a factor of 2 and update the concatenated text to |8|24|16|.
Background
I have three columns of interest. The first column contains a date, the second an amount or factor, and the third column concatenates data into the format "|#|#|...|#|" (e.g., |2|5|, |2|5|12|, |4|12|, etc.). At times, a multiplying factor needs to be applied to the concatenated data, and the individual numbers would need to be updated accordingly.
An example would be—
Date Amt Concatenated Data
01/01/18 2 |2|
01/05/18 5 |2|5|
02/06/18 12 |2|5|12|
03/25/18 -3 |4|12|
03/31/18 8 |4|12|8|
04/01/18 F2 |8|24|16| (factor of 2 applied)
04/15/18 12 |8|24|16|12|
04/01/18 F1/4 |2|6|4|3| (factor of 1/4 applied)
With a formula, how can I apply the factor to the concatenated data, and update the individual numbers?
I'm bound by the following conditions:
Excel 2007, so no TEXTJOIN function
No VBA or UDFs (due to security policies)
Individual numbers are dynamic (i.e., I can't use a static value for the "old_text" parameter of the SUBSTITUTE formula)
Amount of individual numbers within concatenated data is also dynamic (may contain one number, or may contain dozens of different numbers)
I can pull out the individual numbers using an array formula. I can even then multiply those numbers by the factor to produce an array result. However, I can't rebuild the concatenated data, because CONCATENATE doesn't work on an array. I've also tried SUBSTITUTE, but I can't iterate through the "|" separators. I can only substitute a given segment (e.g., change all entries of "|2|" to "|4|"). Nesting SUBSTITUTE or using individual columns won't work, since it could potentially involve dozens of instances.
Just to add some info on the concatenated data:
Amt>0, then value is concatenated to the end of the previous concatenated value
Amt<0, begin reducing individual numbers in concatenated value (CV) until reduction amount reached (e.g., for |2|5|12| and Amt=-3, reduce CV to |4|12|, which is -2 from the first segment and -1 from the second segment)
Amt reduction is limited to the sum of the previous CV's individual numbers (e.g., for |4|12|, the reduction cannot exceed 16)
Amt=F#, indicates a multiplying factor, and the CV's numbers need to be updated
The CV has no max (could have dozens to hundreds of individual numbers, with numbers going from 1 to 100,000+), other than any max applied by Excel itself on string length
HIGH LEVEL
Four parts to this solution
They satisfy pre-requisites (2007 compatibility, no VB, no Office 365 requirement, no custom VB functions, provide for complete 'dynamic' nature of variable length of cells to concatenate)
Caveat: to best knowledge / research, there is no parsimonious single-cell function & therefore an interim step has been proposed)
One more caveat: I imagine the simple 'hack' of wrapping a graph around the delimited data is out of question (see 'Other/Various' below ☺)
PARTS 1-4
Accompanying parts 1-4 below are functions which relate to the following screenshot:
I have also uploaded / amended to meet requirements of Google Sheets (see here)
Parts 1 & 2:
Similar in that they rely upon FilterXML technique to count component / terms, and split cells respectively:
Part 1:
=COUNT(2*TRANSPOSE(FILTERXML("<AllText><Num>"&SUBSTITUTE(LEFT(MID(D12,2,LEN(D12)-1),LEN(MID(D12,2,LEN(D12)-1))-1),"|","</Num><Num>")&"</Num></AllText>","//Num")))
Note: google sheets doesn't recognise FilterXML, so have amended technique/functions accordingly. For instance, above can be determined using counta on the split cells in Part 2 (easier / much more simpler than proposed approach above, albeit less robust given any cells lying to the right of the split cells will interfere with ordinary functionality of this approach).
Part 2:
It's either a manual approach, a fancy series of 'mid' &/or substitute / left/right functions, or the following FilterXML code which, per various sources (e.g. here) should be compatible with Excel 2007:
=IF(LEFT(C12,1)="F",1*SUBSTITUTE(C12,"F",""),1)*TRANSPOSE(FILTERXML("<AllText><Num>"&SUBSTITUTE(LEFT(MID(D12,2,LEN(D12)-1),LEN(MID(D12,2,LEN(D12)-1))-1),"|","</Num><Num>")&"</Num></AllText>","//Num"))
Commonality with Part 1 (re: FilterXML) can be seen - the only difference is that the count(Part 1) has been replaced with the transformation (multiplicative factor, as given in O.P's Q).
Part 3
Nothing fancy here - a simple concatenation (which is a far cry from a 'recursive' substitution function, I know, but hey - it does the trick and can always be placed in a mirror copy of the original sheet to avoid space issues/cell interaction issues)
=IF(H12="","",IF(G24="","|","")&G24&H12&"|")
Part 4
Thanks to the number of terms derived in Part 1, an offset function can easily determine the final cell pertaining to the concatenated 'build up' of 'transformed' values (per Part 3):
=OFFSET(H31,0,E31-1,1,1)
OTHER / VARIOUS
Various other proposals and 'workarounds' exist; unfortunately, these appear to fall short in one way or another of the pre-requisites set forth, videlicit:
a) Function/formula based
b) No VB
c) Excel 2007
d) Dynamic (variable/unknown number of terms)
Manual: e.g. function = concatenate(transpose(desired range)), and then components of the concatenate function and pressing F9 to convert to calculated values, which are readily applicable in the concatenate function. Disadvantage: time consuming in relation to 'automated solution' (needs to be done for each applicable toy). Advantage: no additional 'spreadsheet real estate' required, quicker/straightforward implementation in first instance.
Variants of the 'build-up' method: e.g. per Part 3, however, this alone does not ensure for an automated approach across an unknown number of terms in the original concatenated list.
Have mentioned in a previous solution (here), but may be case that you are eligible for Office 365 functionality whilst on a previous version of Excel (see Office Insider here)
Other proposals (above/below in this forum mind you) propose textjoin (so not sure if this is a comprehension issue or what ☺)
And yes, as alluded to at outset, you can easily achieve the desired outcome using a simple graph! Just for fun then, sort the data in reverse order, and include the split/delimited values as a bar graphs' "x-values" (which, by defn. for this type of graph, will now appear along the ordinary Cartesian 'y/vertical' axis)...
Zero points for this but thought it was an interesting discovery on my part!
(and if still in doubt, here's what the 'graph' would look like if I didn't kill everything except for the axis labels...):
Numerous references for relevant other items above, including research areas, as follows:
Excel Champs
StackOverflow - alternative applications for FilterXML
JUST ONE MORE THING...
In true Columbo style modus operandi, other ideas/approaches considered:
Application of pivot table?
Constructing matrices: I got a solution with a series of offset functions, but couldn't think of a feasible way to implement given space issues
Converting the split cells into a long digit through summation: e.g. 8 22 16 = 80 000 + 22 00 + 16. Using a substitute function with text (long digit, "General") I was able to successfully introduce the delimiter character ('|') for pairs of adjacent 'tuples' (e.g. I could get '8|2216', '822|16', but then a 'build up' formula where one cell depends upon converted values of the previous and so one was required once more, which landed me back to the proposal I have set out above
fyi - the matrix consideration only solves tuples of 2, for n-dimensional /combination one would need to 'pass' a string of characters over its mirror copy - e.g. {6,10,22} would pass over {6,10,22}, ignoring duplicate values would yield a trapezium as follows:
6
10
22
6
10
22
6
10
22
after the copy has 'passed' over the original (first row), we have the desired combination (22,10,6) (on the 'diagonal' such a matrix). This is akin to how Fourier Transforms work (kind of); but that aside, it was tempting to construct a matrix like this, but couldn't be bothered at this stage.
Will probably turn out to be a far simpler way that someone comes up with (I won't be the only person surprised based upon the various sources I've considered...)

Resources