Excel array formula anomaly - excel

I have an array formula in Excel that works fine in all cells of the array except when there is a change in the conditional tests, and I'm not sure why.
The array formula is:
{=TEXT(VALUE(Header!$A$2)+VALUE(ReadingID)
*(IF(EventID="2", 1,IF(EventID="4", 1,0))*(VALUE(Header!$N$2)/86400)
+IF(EventID="2", 0, IF(EventID="4", 0, 1))*(VALUE(Header!$M$2)/86400))
, "#.000000")}
Typical values for the cells the formula references:
Header!$A$2 = '43432.40434' # An excel serial date/time number as text.
ReadingID = #incremental numbers as text e.g. '1000', '1001' etc.
EventID = # Values 1 or 2 or 3 or 4 as text.
Header!$M$2 = 60 # as text.
Header!$N$2 = 10 # as text.
The ReadingID and EventID columns are the same size as the array formula column.
Typical results when EventID changes from, say, "2" to "3", are as follows:
ReadingID  EventID  Result        Diff
'1540      '2       43432.582581  0.000116
'1541      '2       43432.582696  0.000115
'1542      '3       43433.475173  0.892477
'1543      '3       43433.475868  0.000695
'1544      '3       43433.476562  0.000694
The Diff column is simply to show the increment from row to row and is consistent either side of the transition in EventID value (e.g. from "2" to "3"). The same anomaly occurs at all points where the EventID value changes (i.e. "1" to "2"; "3" to "4").
The array formula spans several thousand cells and returns the expected result in all other rows, except when EventID changes.
I originally tried an OR function to perform the incremental sum, but that didn't work, hence the nested IF statements.
Can anyone suggest if there is something wrong with the array formula, or how to avoid this rogue result?
NOTE: The data is in text format as it is being imported from elsewhere in CSV format and I would like to preserve the raw import.
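One possible reading of the results above, sketched in Python rather than Excel: the formula multiplies the whole ReadingID by the interval that belongs to the current row's EventID. When the interval switches from Header!$N$2 (10 seconds) to Header!$M$2 (60 seconds), all earlier readings are effectively re-timed at the new interval, which would produce the jump. The function name and its default arguments are illustrative only; the input values are the sample data from the question.

```python
def formula_timestamp(start, reading_id, event_id, fast_seconds=10, slow_seconds=60):
    """Mirror the array formula for one row: start + ReadingID * interval / 86400.

    fast_seconds stands in for Header!$N$2 and slow_seconds for Header!$M$2.
    All inputs arrive as text, as in the imported CSV.
    """
    interval = fast_seconds if event_id in ("2", "4") else slow_seconds
    return float(start) + float(reading_id) * interval / 86400


rows = [("1540", "2"), ("1541", "2"), ("1542", "3"), ("1543", "3"), ("1544", "3")]
results = [format(formula_timestamp("43432.40434", r, e), ".6f") for r, e in rows]
for (reading_id, event_id), result in zip(rows, results):
    print(reading_id, event_id, result)
```

The printed values match the Result column, including the jump at ReadingID 1542, where 1542 is multiplied by 60/86400 instead of 10/86400. This suggests the formula is being evaluated as written rather than misbehaving.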

Related

Sort data contained in blocks in excel

I have a large amount of reference data in Excel, which I am trying to manipulate in a variety of ways. I'm having some problems with the way it is structured and with sorting it into a more manageable format.
Problem number 1:
I have three columns. Column A contains first a date, and then a designator of high or low. Column B contains times, Column C contains heights.
I would like to sort the data by column B (easy enough) EXCEPT I would like the date headings in Column A preserved. It's almost as though I have 365 tables, each with between 3 and 5 pieces of data - I'm looking to sort the 3 - 5 pieces of data within each date only.
This is what I have currently:
I have no problem with manipulating the data some other way first. Ultimately I need to take a batch of data (5 different reference points, each covering 365 days) and develop a process to sanitise it and display it in time order, and also to get it into a usable format for problem 2 (I need to adjust some other data points by the sorted data once I have it).
This is what I would like it to look like (I manually went through each of these blocks and sorted them):
This can be done in Excel with the following formula in cell E2:
=LET(rng, A1:C11, set, FILTER(rng, (INDEX(rng,,1) <>"")),
dates, SCAN("", INDEX(set,,1), LAMBDA(acc, item, IF(ISNUMBER(item), item, acc))),
in, FILTER(HSTACK(dates, set), INDEX(set,,2)<>""), inDates, INDEX(in,,1),
out, REDUCE("", UNIQUE(inDates), LAMBDA(acc, date,
LET(sorted, VSTACK(date, DROP(SORT(FILTER(in, inDates = date),3),,1), {"","",""}),
VSTACK(acc, sorted)
))), IFERROR(DROP(DROP(out,1),-1),"")
)
Here is the output:
You can avoid the IFERROR clean-up, so that only the last row needs to be removed, as follows:
=LET(rng, A1:C11, set, FILTER(rng, (INDEX(rng,,1) <>"")),
dates, SCAN("", INDEX(set,,1), LAMBDA(acc, item, IF(ISNUMBER(item), item, acc))),
in, FILTER(HSTACK(dates, set), INDEX(set,,2)<>""), inDates, INDEX(in,,1),
out, REDUCE("", UNIQUE(inDates), LAMBDA(acc, date,
LET(sorted, VSTACK(HSTACK(date,"",""), DROP(SORT(FILTER(in, inDates = date),3),,1),
{"","",""}), IF(MAX(LEN(acc))=0, sorted, VSTACK(acc, sorted))
))), DROP(out, -1)
)
Explanation
The idea is to carry out the manual steps using Excel functions. The name set is the input data (rng) with the empty rows removed. The name dates is a column with the same number of rows as set, in which every row holds the date of the block it belongs to. The condition SCAN uses to detect a new date is ISNUMBER, because Excel stores dates as numbers while the high/low designators are text. The name in holds the data in the shape we want for sorting and filtering by date: the date header rows are removed and the dates are added as the first column.
Now we use the DROP/REDUCE/VSTACK pattern (see the answer to the question "how to transform a table in Excel from vertical to horizontal but with different length", provided by David Leal) to append the sorted data for each unique date. For each date we add the date as the first row, then the sorted data, and finally an empty row to separate the groups. In the first formula the date row is a single cell, so VSTACK pads it with #N/A to match the three columns. The final clean-up via IFERROR/DROP replaces those #N/A values and removes the first and the last empty rows.
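The same steps can be sketched outside Excel. The following is a minimal Python illustration, not a translation of the formula: it carries the current date forward over the data rows (the SCAN step), drops empty and header rows, and then emits each date followed by its rows sorted by time and a blank separator row (the REDUCE/VSTACK step). The function name and the sample rows are invented for the example, and times are assumed to be "HH:MM" text so that they sort correctly as strings.

```python
from itertools import groupby


def sort_blocks(rows):
    """rows: (col_a, col_b, col_c) tuples; col_a is a number on date header rows."""
    current_date = None
    tagged = []  # (date, designator, time, height), like the name `in` in the formula
    for a, b, c in rows:
        if a == "":
            continue  # empty separator row
        if isinstance(a, (int, float)):
            current_date = a  # date header row: remember it, do not keep it as data
            continue
        tagged.append((current_date, a, b, c))

    out = []
    for date, group in groupby(tagged, key=lambda r: r[0]):
        out.append((date, "", ""))  # date heading
        out.extend(r[1:] for r in sorted(group, key=lambda r: r[2]))  # sort by time
        out.append(("", "", ""))  # blank row between blocks
    return out[:-1]  # drop the trailing blank row


sample = [
    (44927, "", ""), ("High", "12:30", 4.1), ("Low", "06:10", 0.9),
    ("", "", ""),
    (44928, "", ""), ("Low", "18:45", 1.0), ("High", "00:20", 3.8),
]
for row in sort_blocks(sample):
    print(row)
```

Grouping consecutive rows is enough here because each date's rows are contiguous in the input, which plays the role of UNIQUE(inDates) in the formula.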

Determine Size Of Array of Equal Items Excel

I have an array that looks like this
11100100110
essentially, an array of fixed size with each item being a 1 or 0 with the last item always equal to 0.
Consider each set of consecutive 1's to be a "bucket". I'd like a formula to determine the size of each bucket. So the output of this formula for the above sequence should be
312
as an array. Ideally this works in both Excel and Google Sheets.
If you are interested this is the result of a list of stars and bars configurations where the 0's in my sequence represent bars and the 1's represent stars (the final value is a dummy 0 to make things easier to work with). I want the size of each non-empty bucket in a given configuration of stars and bars.
Thanks, in advance.
You could also use the standard method with FREQUENCY, which works in both Excel 365 and Google Sheets:
=FILTER(FREQUENCY(IF(A1:A11=1,ROW(A1:A11)),IF(A1:A11=0,ROW(A1:A11))),FREQUENCY(IF(A1:A11=1,ROW(A1:A11)),IF(A1:A11=0,ROW(A1:A11))))
try:
=INDEX((JOIN(, LEN(SPLIT(A1, 0)))))
update:
=INDEX(IFERROR(1/(1/SUBSTITUTE(FLATTEN(QUERY(TRANSPOSE(IFERROR(1/(1/
LEN(SPLIT(SUBSTITUTE(FLATTEN(QUERY(
TRANSPOSE(A1:K),, 9^9)), " ", ), 0))))),, 9^9)), " ", ))))
Assuming A2:A9 contains the data,
=ARRAYFORMULA(QUERY(FREQUENCY(IF(A2:A9,ROW(A2:A9)),IF(NOT(A2:A9),ROW(A2:A9))),"where Col1>0",))
FREQUENCY(data, classes) counts how many data values fall into each class interval.
The data are the row numbers of the cells that contain 1.
The classes are the row numbers of the cells that contain 0.
QUERY removes the zero counts, which correspond to empty buckets.
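To make the FREQUENCY idea concrete, here is a small Python sketch that emulates it: positions of the 1s are the data, positions of the 0s are the class boundaries, and each count is the number of 1-positions that fall between consecutive 0-positions. The function name is made up for the example; it is an illustration of the counting logic, not of how the spreadsheet engines implement FREQUENCY.

```python
from bisect import bisect_right


def bucket_sizes(bits):
    """Return the sizes of the runs of 1s in a list of 0/1 values ending in 0."""
    ones = [i for i, b in enumerate(bits, start=1) if b == 1]   # "data"
    zeros = [i for i, b in enumerate(bits, start=1) if b == 0]  # "classes"

    counts = []
    seen = 0
    for boundary in zeros:
        upto = bisect_right(ones, boundary)  # 1-positions at or below this boundary
        counts.append(upto - seen)           # 1-positions since the previous boundary
        seen = upto
    # Dropping zero counts mirrors the FILTER / QUERY step.
    return [c for c in counts if c > 0]


sizes = bucket_sizes([int(ch) for ch in "11100100110"])
print(sizes)
```

For the sequence in the question the intermediate counts are 3, 0, 1, 0, 2, and filtering out the zeros leaves the three bucket sizes.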

Filter Multiple Columns between multiple range

I have a very large dataset that contains mostly numerical values. I want to filter on multiple columns, each within a different range. The problem is that the columns and ranges are selected by the user, so the filtered columns and ranges can change each time.
e.g. 0<df[a]<5 & 0<df[b]<10. It could be "a" and "b" and also "c"; it depends entirely on the input.
I want to see how many rows fall in each range for each column: for example, col.a between 0 and 1, 1 and 2, etc. up to 5, and the same for col.b (or any other column) up to e.g. 10.
Because my code is very long, I have included only part of it and explained the variables in the docstring:
# -*- coding: utf-8 -*-
"""
excel_file: DataFrame read from the Excel file
entered_parameters: (list) columns to filter, typed by the user
parameters: column names read from excel_file
limits: (list) upper limits entered by the user, one per entered parameter
ranges: (list) range width (increment), one per entered parameter
boolean_frame: Boolean DataFrame for one entered parameter (column) and one range
total_boolean_frame: the boolean_frames appended together (all ranges up to the limit for one parameter)
total_frame: concat of the total_boolean_frames (all Boolean values by range for all parameters)
"""
import pandas as pd

total_frame = pd.DataFrame()
parameters = [i for i in excel_file.columns if type(i) == str]

# Cumulative row counts: each parameter multiplies the number of rows by its limit.
totalrownumberlist = []
for i, v in enumerate(limits):
    if i == 0:
        totalrownumberlist.append(len(excel_file) * v)
    else:
        totalrownumberlist.append(totalrownumberlist[i - 1] * v)
totalrownumber = totalrownumberlist[-1]

for i, param in enumerate(entered_parameters):
    total_boolean_frame = pd.DataFrame()
    appended_row_num = totalrownumberlist[i]
    if param in parameters:
        # Repeat the block of ranges until this parameter fills totalrownumber rows.
        while appended_row_num <= totalrownumber:
            boolean_frame = pd.DataFrame()
            initial = 0
            while initial < limits[i]:
                boolean_frame[param] = (excel_file[param] >= initial) & (excel_file[param] <= initial + ranges[i])
                # "aralik" labels the range, e.g. "0-1"
                boolean_frame["aralik-%s" % param] = "%s-%s" % (initial, initial + ranges[i])
                initial = initial + ranges[i]
                # DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
                total_boolean_frame = pd.concat([total_boolean_frame, boolean_frame], sort=False, ignore_index=True)
            appended_row_num = appended_row_num + totalrownumberlist[i]
    total_frame = pd.concat([total_frame, total_boolean_frame], axis=1)
Edit: The output should look like this: count(range[0-1] of col.a and range[0-1] of col.b) = 2, together with the average of those rows (the rows where every cell is True, i.e. excel_file[total_frame.all(axis=1)]); count(range[1-2] of col.a and range[0-1] of col.b) = 3, again with the average; count(range[2-3] of col.a and range[0-1] of col.b) = 6 with the average; and so on.
Thanks
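A minimal sketch of one way to get such counts, written in plain Python so it runs without any libraries: each row is assigned to one range per selected column, and rows are grouped by the resulting combination of ranges. The function and variable names are invented for the example, and the ranges are assumed to be half-open (lower bound included, upper bound excluded) so that a value on a boundary is counted only once. In pandas, the same binning is commonly done with pd.cut followed by a groupby.

```python
from collections import defaultdict


def group_by_ranges(rows, columns, limits, steps):
    """Group rows (dicts) by the range each selected column's value falls into.

    columns: keys chosen by the user; limits: upper limit per column;
    steps: range width per column. Rows outside [0, limit) are skipped.
    """
    groups = defaultdict(list)
    for row in rows:
        key = []
        for col, limit, step in zip(columns, limits, steps):
            value = row[col]
            if not (0 <= value < limit):
                break  # outside the requested limits for this column
            low = int(value // step) * step
            key.append((low, low + step))
        else:
            groups[tuple(key)].append(row)
    return groups


data = [
    {"a": 0.5, "b": 0.2},
    {"a": 0.7, "b": 0.9},
    {"a": 1.5, "b": 0.1},
    {"a": 7.0, "b": 0.1},  # a is above its limit, so this row is ignored
]
groups = group_by_ranges(data, ["a", "b"], limits=[5, 10], steps=[1, 1])
for key, members in sorted(groups.items()):
    print(key, len(members))
```

Because each group keeps its rows, an average per combination can be computed from the members of that group.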

Using tbl.Lookup to match just part of a column value

This question relates to the Schematiq add-in for Microsoft Excel.
I'm using =tbl.Lookup(table, columnsToSearch, valuesToFind, resultColumn, [defaultValue]). The values in the valuesToFind column always start with the same 3 characters, followed by varying characters (e.g. 908-123456 or 908-321654, where 908 is always the same).
How can I tell the function to lookup the value based on the first 3 characters only? The expected answer should be the sum of the results of the above, i.e. 500 + 300 = 800
tbl.Lookup() works by looking for an exact match. This helps ensure it's fast, but in this case it means you need an extra step to calculate a column of lookup values, something like this:
A2: =tbl.CalculateColumn(A1, "code", "x => LEFT(x, 3)", "startOfCode")
This will give you a new column that you can use for the columnsToSearch argument. However, tbl.Lookup() also looks for just one match: it doesn't know how to combine values if more than one row in the table matches. So I think you also need one more step to group your table by the first 3 characters of the code, like this:
A3: =tbl.Group(A2, "startOfCode", "amount")
Because tbl.Group() adds values together by default, this will give you a table with a row for each distinct value of startOfCode and the subtotal of amount for each of those values. Finally, you can do the lookup exactly as you requested, which for your input table will return 800:
A4: =tbl.Lookup(A3, "startOfCode", "908", "amount")
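The three steps (derive a prefix column, group and sum, then look up) can be illustrated with a short Python sketch. This does not use Schematiq; the helper names are hypothetical, and the sample rows are assumed from the question (amounts of 500 and 300 under codes starting with 908), with one extra invented row to show that other prefixes are kept apart.

```python
from collections import defaultdict


def add_prefix_column(table, source, new_column, length=3):
    """Like the tbl.CalculateColumn step: add LEFT(source, length) as a new column."""
    return [{**row, new_column: row[source][:length]} for row in table]


def group_sum(table, key_column, value_column):
    """Like the tbl.Group step: one subtotal of value_column per distinct key."""
    totals = defaultdict(int)
    for row in table:
        totals[row[key_column]] += row[value_column]
    return dict(totals)


table = [
    {"code": "908-123456", "amount": 500},
    {"code": "908-321654", "amount": 300},
    {"code": "907-000001", "amount": 50},  # invented row with a different prefix
]
with_prefix = add_prefix_column(table, "code", "startOfCode")
subtotals = group_sum(with_prefix, "startOfCode", "amount")
print(subtotals["908"])  # exact-match lookup on the grouped table
```

Grouping first means the final lookup is again an exact match on a single row, which is the constraint described above.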

Excel: Combining semi-duplicated records (with different columns)?

How would I combine records if specified columns are the same?
Here's what I have, and the result I'm looking for:
It can be done using array formulas if you don't mind them being big and ugly. This example should do what you're looking for. In case of duplicate entries, it simply takes the last defined value (Prog instead of Programmer for Kevin Moss):
Enter the following formula into C11 and D11, then press CTRL+SHIFT+ENTER to apply the array formula. You can then copy the formula to the rows below as needed.
=INDEX((IF((((($A11=$A$2:$A$7)+($B11=$B$2:$B$7))=2)+(C$2:C$7<>""))=2,C$2:C$7,"")),MAX(IF((IF((((($A11=$A$2:$A$7)+($B11=$B$2:$B$7))=2)+(C$2:C$7<>""))=2,C$2:C$7,""))<>"",ROW($A$1:$A$6),0)))
This breaks down what's happening a little bit, but admittedly it's still pretty opaque, sorry:
=INDEX(
    (IF(    # This IF statement collects all entries in a data field for a given Fname/Lname combination
        (((($A11=$A$2:$A$7) + ($B11=$B$2:$B$7))=2) + (C$2:C$7<>""))=2,    # Checks that First and Last Name Match, and Data field isn't empty
        C$2:C$7,    # Return data field if TRUE
        ""          # Return empty if FALSE
    )),
    MAX(    # Take the highest index number, use it to select a row from the result of the IF statement above
        IF((    # This IF statement returns an index number if the data field isn't empty
            IF(    # This IF statement collects all entries in a data field for a given Fname/Lname combination (copied from above)
                (((($A11=$A$2:$A$7)+($B11=$B$2:$B$7))=2)+(C$2:C$7<>""))=2,
                C$2:C$7,
                "")
            )<>"",    # End of conditional statement
            ROW($A$1:$A$6),    # Value if TRUE (ROW used as an incrementing counter)
            0    # Value if FALSE (0 will be ignored in the MAX function that uses this result)
        )
    )
)
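The rule the formula applies (match on first and last name, ignore empty cells, let the last non-empty value win) can be sketched in Python as follows. The function name, field names, and sample rows are invented for the illustration; only the Kevin Moss "Programmer"/"Prog" case comes from the answer above.

```python
def combine_records(records, key_fields):
    """Merge records that share the same key fields; the last non-empty value wins."""
    merged = {}
    for record in records:
        key = tuple(record[field] for field in key_fields)
        target = merged.setdefault(key, {})
        for field, value in record.items():
            # Keep an earlier value unless this record supplies a non-empty one.
            if value != "" or field not in target:
                target[field] = value
    return list(merged.values())


records = [
    {"first": "Kevin", "last": "Moss", "title": "Programmer", "city": ""},
    {"first": "Ann", "last": "Lee", "title": "", "city": "Leeds"},
    {"first": "Kevin", "last": "Moss", "title": "Prog", "city": "York"},
]
combined = combine_records(records, ["first", "last"])
for row in combined:
    print(row)
```

As with the array formula, a later duplicate overrides an earlier one, so Kevin Moss ends up with "Prog" rather than "Programmer".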

Resources