I am working with three datasets in MATLAB, e.g.,
Dates:
There are D dates that are chars each, but saved in a cell array.
{'01-May-2019','02-May-2019','03-May-2019'....}
Labels:
There are 100 labels that are strings each, but saved in a cell array.
{'A','B','C',...}
Values:
[0, 1, 2,...]
This is one row of the Values matrix of size D×100.
I would like the following output in Excel:
date labels Values
01-May-2019 A 0
01-May-2019 B 1
01-May-2019 C 2
till the same date repeats itself 100 times. Then, the next date is added (+ repeated 100 times) onto the subsequent row along with the 100 labels in the second column and new values from 2nd row of Values matrix transposed in third column. This repeats until the date length D is reached.
For the first date, I used:
c_1 = {datestr(datenum(dates(1))*ones(100,1))}
c_2 = labels
c_3 = num2cell(Values(1,:)')
xlswrite('test.xls',[c_1, c_2, c_3])
but, unfortunately, this seemed to have put everything in one column, i.e., the date, then, labels, then, 1st row of values array. I need these to be in three columns.
Also, I think that the above needs to be in a for loop over each day that I am considering. I tried using the table function, but, didn't have much luck with it.
How to solve this efficiently?
You can use repmat and reshape to build your columns and (optionally) add them to a table for exporting.
For example:
dates = {'01-May-2019','02-May-2019'};
labels = {'A','B', 'C'};
values = [0, 1, 2];
n_dates = numel(dates);
n_labels = numel(labels);
dates_repeated = reshape(repmat(dates, n_labels, 1), [], 1);
labels_repeated = reshape(repmat(labels, n_dates, 1).', [], 1);
values_repeated = reshape(repmat(values, n_dates, 1).', [], 1);
full_table = table(dates_repeated, labels_repeated, values_repeated);
Gives us the following table:
>> full_table
full_table =
6×3 table
dates_repeated labels_repeated values_repeated
______________ _______________ _______________
'01-May-2019' 'A' 0
'01-May-2019' 'B' 1
'01-May-2019' 'C' 2
'02-May-2019' 'A' 0
'02-May-2019' 'B' 1
'02-May-2019' 'C' 2
Which should export to a spreadsheet with writetable as desired.
What we're doing with repmat and reshape is "stacking" the values and then converting them into a single column:
>> repmat(dates, n_labels, 1)
ans =
3×2 cell array
{'01-May-2019'} {'02-May-2019'}
{'01-May-2019'} {'02-May-2019'}
{'01-May-2019'} {'02-May-2019'}
We transpose the labels and values so they get woven together (e.g. [0, 1, 0, 1] vs [0, 0, 1, 1]), because reshape reads the elements in column-major order.
If you don't want the intermediate table, you can use num2cell to create a cell array from values so you can concatenate all 3 cell arrays together for xlswrite (or writecell, added in R2019a, the release from which xlswrite is no longer recommended):
values_repeated = num2cell(reshape(repmat(values, n_dates, 1).', [], 1));
full_array = [dates_repeated, labels_repeated, values_repeated];
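The example above repeats a single row of values for every date, whereas the question describes a full D×100 matrix with one row per date. As a rough sketch of the same stacking idea for that case, here it is in Python/NumPy (a different language from the answer, used only for illustration; the 2×3 matrix is made-up data). In MATLAB, the analogous step for a full matrix would presumably be reshape(values.', [], 1) with no repmat on values.

```python
import numpy as np

dates = ["01-May-2019", "02-May-2019"]
labels = ["A", "B", "C"]
# Hypothetical D x N matrix: one row of values per date (made-up numbers).
values = np.array([[0, 1, 2],
                   [3, 4, 5]])

n_dates, n_labels = values.shape

# Each date repeated once per label: d1, d1, d1, d2, d2, d2
dates_repeated = np.repeat(dates, n_labels)
# The whole label list repeated once per date: A, B, C, A, B, C
labels_repeated = np.tile(labels, n_dates)
# Row-major flattening walks the first date's row, then the second date's row
values_flat = values.ravel()

rows = list(zip(dates_repeated.tolist(),
                labels_repeated.tolist(),
                values_flat.tolist()))
for row in rows:
    print(row)
```

Each tuple in rows corresponds to one spreadsheet row (date, label, value).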
I have a dataframe with 500K rows and 7 day columns (D1 to D7), plus a start day and an end day for each row.
For each row, I search for a value (e.g. equal to 0) in the range (startDay, endDay).
For example, for id_1, startDay=1 and endDay=7, so I should search columns D1 to D7.
For id_2, startDay=4 and endDay=7, so I should search columns D4 to D7.
However, I couldn't search a different column range per row successfully.
In addition to the above:
if startDay > endDay, I should see "-999"
else, I need to find the first zero within the day range. For example, for id_3 the first zero is in column D2 (day 2) and the startDay of id_3 is 1, so I want to see 2-1=1 (D2 - startDay)
if I cannot find a 0, I want to see "8"
Here is my data:
import pandas as pd

data = {
'D1':[0,1,1,0,1,1,0,0,0,1],
'D2':[2,0,0,1,2,2,1,2,0,4],
'D3':[0,0,1,0,1,1,1,0,1,0],
'D4':[3,3,3,1,3,2,3,0,3,3],
'D5':[0,0,3,3,4,0,4,2,3,1],
'D6':[2,1,1,0,3,2,1,2,2,1],
'D7':[2,3,0,0,3,1,3,2,1,3],
'startDay':[1,4,1,1,3,3,2,2,5,2],
'endDay':[7,7,6,7,7,7,2,1,7,6]
}
data_idx = ['id_1','id_2','id_3','id_4','id_5',
'id_6','id_7','id_8','id_9','id_10']
df = pd.DataFrame(data, index=data_idx)
What I want to see:
df_need = pd.DataFrame([0,1,1,0,8,2,8,-999,8,1], index=data_idx)
You can create a boolean array that marks, for each row, which 'Dx' columns fall on or after 'startDay' and on or before 'endDay' and hold a value equal to 0. For the first two conditions, you can use np.ufunc.outer with the ufunc being np.less_equal and np.greater_equal, such as:
import numpy as np
arr_bool = ( np.less_equal.outer(df.startDay, range(1,8)) # which columns Dx are on or after startDay
           & np.greater_equal.outer(df.endDay, range(1,8)) # which columns Dx are on or before endDay
           & (df.filter(regex='D[0-9]').values == 0)) # which values in the columns Dx are 0
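To see what the two outer comparisons produce on their own, here is a minimal sketch on a toy input (the three start/end pairs are made up for illustration and are not the question's data). Each row of the result marks the days that lie inside that row's range.

```python
import numpy as np

# Toy values for illustration only
start_day = np.array([1, 4, 2])
end_day = np.array([7, 7, 1])
days = np.arange(1, 8)  # day numbers 1..7, one per Dx column

# outer(a, b)[i, j] applies the comparison to a[i] and b[j]
in_range = (np.less_equal.outer(start_day, days)        # start_day[i] <= days[j]
            & np.greater_equal.outer(end_day, days))    # end_day[i] >= days[j]

print(in_range.astype(int))
# row 0: every day is in range
# row 1: only days 4..7 are in range
# row 2: start is after end, so no day is in range
```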
Then you can use np.argmax to find the first True per row. By adding 1 and subtracting 'startDay', you get the values you are looking for. Then you need to handle the other conditions with np.select, replacing values by -999 if df.startDay >= df.endDay, or by 8 if there is no True in the row of arr_bool, such as:
df_need = pd.DataFrame( (np.argmax(arr_bool , axis=1) + 1 - df.startDay).values,
index=data_idx, columns=['need'])
df_need.need= np.select( condlist = [df.startDay >= df.endDay, ~arr_bool.any(axis=1)],
choicelist = [ -999, 8],
default = df_need.need)
print (df_need)
need
id_1 0
id_2 1
id_3 1
id_4 0
id_5 8
id_6 2
id_7 -999
id_8 -999
id_9 8
id_10 1
One note: to get -999 for id_7, I used the condition df.startDay >= df.endDay in np.select and not df.startDay > df.endDay as in your question. You can change it to a strict comparison, in which case you get 8 instead of -999 for id_7.
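As a self-contained sketch of the whole approach, the steps can be wrapped in one function with a switch between the strict and non-strict comparison. The function name first_zero_offset and its strict and n_days parameters are hypothetical names introduced here, not part of the original answer. Note that np.argmax returns 0 for a row with no True at all, which is why the any() check is needed to assign 8.

```python
import numpy as np
import pandas as pd

data = {
    'D1': [0, 1, 1, 0, 1, 1, 0, 0, 0, 1],
    'D2': [2, 0, 0, 1, 2, 2, 1, 2, 0, 4],
    'D3': [0, 0, 1, 0, 1, 1, 1, 0, 1, 0],
    'D4': [3, 3, 3, 1, 3, 2, 3, 0, 3, 3],
    'D5': [0, 0, 3, 3, 4, 0, 4, 2, 3, 1],
    'D6': [2, 1, 1, 0, 3, 2, 1, 2, 2, 1],
    'D7': [2, 3, 0, 0, 3, 1, 3, 2, 1, 3],
    'startDay': [1, 4, 1, 1, 3, 3, 2, 2, 5, 2],
    'endDay': [7, 7, 6, 7, 7, 7, 2, 1, 7, 6],
}
data_idx = [f"id_{i}" for i in range(1, 11)]
df = pd.DataFrame(data, index=data_idx)


def first_zero_offset(df, strict=True, n_days=7):
    """Hypothetical helper: days from startDay to the first 0 in range."""
    days = np.arange(1, n_days + 1)
    day_cols = [f"D{d}" for d in days]
    start = df["startDay"].to_numpy()
    end = df["endDay"].to_numpy()

    # True where the day is inside [startDay, endDay] and the value is 0
    hit = (np.less_equal.outer(start, days)
           & np.greater_equal.outer(end, days)
           & (df[day_cols].to_numpy() == 0))

    # argmax gives the first True per row (and 0 when the row has none)
    offset = hit.argmax(axis=1) + 1 - start
    invalid = (start > end) if strict else (start >= end)
    result = np.select([invalid, ~hit.any(axis=1)], [-999, 8], default=offset)
    return pd.Series(result, index=df.index, name="need")


need_strict = first_zero_offset(df, strict=True)
need_non_strict = first_zero_offset(df, strict=False)
print(pd.DataFrame({"strict": need_strict, "non_strict": need_non_strict}))
```

With strict=True, the result for id_7 is 8, which is the value listed in df_need in the question; with strict=False it is -999 as in the printed output above.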
I have an excel file that has been uploaded here
http://www58.zippyshare.com/v/99974349/file.html
The formula works great except for a column with descending values.
=INDEX(
INDIRECT("'"&LOOKUP(B5,TblA)&"'!A6:A36"),
LOOKUP(9.99999999999999E+307,
SEARCH("-"&C8&"-","-"&INDIRECT("'"&LOOKUP(B5,TblA)&"'!C6:C36")&"-"),
ROW(C6:C36)-ROW(C6)+1))
Let me explain the excel file.
I have one main sheet 'Report' and 4 other sheets that correspond to 4 age groups: 4.2.0 to 4.7.30, 4.8.0 to 5.1.30, 5.2.0 to 5.7.30 and 5.8.0 to 6.1.30. Depending on the Age (B5) in the sheet 'Report', I select one of the 4 sheets to pick values from. I pick the correct sheet using a Table Name TblA which contains all sheet names and is defined from A24 to B27 in the sheet 'Report'.
In the sample sheet that is uploaded, B5 contains the value 5.7 which means we have to select the sheet 5.2.0 to 5.7.30.
Now from the sheet 5.2.0 to 5.7.30, I have to seek the respective Standard Score (1st column) for every Raw Score entered in 'Report'.
Here are the steps:
A. Enter Raw scores in sheet 'Report' C7 to C15
B. Search Respective sheet depending on age (B5 cell), in our case 5.2.0 to 5.7.30 since age is 5.7
C. Populate Standard score from Raw scores by picking the corresponding column in the 4 sheets. For example, if Raw Score of Col1 is 25 (C7), then pick the Standard score of Col1 from 5.2.0 to 5.7.30 and enter in D7 and so on.
D. This way all standard scores are filled in D7 to D15.
The formula works great except for D13 in sheet 'Report' since if you observe ColD in 5.2.0 to 5.7.30, it is in descending order.
How do I change the formula to accommodate this unique column?
Well, it's not really the order that's causing the error, it's because you don't have any results! The formula you use is trying to find -159- which it cannot find at all in the age sheet. You really need something to look into ranges, so that if you have 159, it will return a positive result when you try to match against 139-160.
I have made a formula building it from smaller ones, but when assembled, the repeating units make it daunting... Also, it's an array formula, so you need to use Ctrl+Shift+Enter for it to work as intended. You can still drag the formula down.
=INDEX(
INDIRECT("'"&LOOKUP($B$5,TblA)&"'!A6:A36"),
IFERROR(
MATCH(
C7,
INDEX(
INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),
0,
MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0)
)*1,
0
),
MATCH(
1,
IF(
1*LEFT(
INDEX(
INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),
0,
MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0)
),
FIND(
"-",
INDEX(
INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),
0,
MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0)
)
)-1
)<=C7,
1,
0
)*
IF(
1*MID(
INDEX(
INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),
0,
MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0)
),
FIND(
"-",
INDEX(
INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),
0,
MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0)
)
)+1,
100
)>=C7,
1,
0
)
,0
)
)
)
The single line version...
=INDEX(INDIRECT("'"&LOOKUP($B$5,TblA)&"'!A6:A36"),IFERROR(MATCH(C7,INDEX(INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),0,MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0))*1,0),MATCH(1,IF(1*LEFT(INDEX(INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),0,MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0)),FIND("-",INDEX(INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),0,MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0)))-1)<=C7,1,0)*IF(1*MID(INDEX(INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),0,MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0)),FIND("-",INDEX(INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),0,MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0)))+1,100)>=C7,1,0),0)))
You can notice that there are some repeating blocks, namely:
INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36")
For the sheet name;
INDEX(
INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),
0,
MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0)
)
Which is a larger block to make the formula a bit more flexible (it automatically picks the correct column e.g. if you change B8 Exclusion to Col1, the formula will automatically adjust itself)
If I call the first Sheet and the second Column, it becomes much shorter and perhaps easier to understand:
=INDEX(
Sheet,
IFERROR(
MATCH(
C7,
Column*1,
0
),
MATCH(
1,
IF(
1*LEFT(
Column,
FIND(
"-",
Column
)-1
)<=C7,
1,
0
)*
IF(
1*MID(
Column,
FIND(
"-",
Column
)+1,
100
)>=C7,
1,
0
)
,0
)
)
)
Or
=INDEX(Sheet,IFERROR(MATCH(C7,Column*1,0),MATCH(1,IF(1*LEFT(Column,FIND("-",Column)-1)<=C7,1,0)*IF(1*MID(Column,FIND("-",Column)+1,100)>=C7,1,0),0)))
Disclaimer: I'm not sure if there is any way to make this even shorter, but I guess that's fine as long as it's working right now ^^
You can download your updated sheet here.
Explanation:
As I mentioned before, the formula is based off several smaller ones and quite a few repeats of those.
INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36")
As you already know (it's a variation of a part of your own formula), this gives the area containing all the different ages. Using it and the below, we get this:
INDEX(
INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),
0,
MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0)
)
Into:
INDEX(
'Sheet'!B6:J36,
0,
MATCH(B7,'Sheet'!B5:J5,0)
)
INDEX will thus look into the range 'Sheet'!B6:J36; the 0 for the row number means it takes all rows, and MATCH(B7,'Sheet'!B5:J5,0) returns the column number by taking the value of B7 (in the case of your spreadsheet, Col1) and looking it up in 'Sheet'!B5:J5, which gives 1. The above will thus return the range 'Sheet'!B6:B36. Let's put it in the formula:
=INDEX(
'Sheet'!A6:A36,
IFERROR(
MATCH(
C7,
'Sheet'!B6:B36*1,
0
),
MATCH(
1,
IF(
1*LEFT(
'Sheet'!B6:B36,
FIND(
"-",
'Sheet'!B6:B36
)-1
)<=C7,
1,
0
)*
IF(
1*MID(
'Sheet'!B6:B36,
FIND(
"-",
'Sheet'!B6:B36
)+1,
100
)>=C7,
1,
0
)
,0
)
)
)
This formula is itself a giant INDEX formula, with range 'Sheet'!A6:A36 and row number as the big IFERROR group. The first part of the IFERROR() gets evaluated first:
MATCH(
C7,
'Sheet'!B6:B36*1,
0
)
This should be easy enough to understand. It looks for the raw score (from C7) in the range we obtained earlier, times 1 to convert everything to numbers (you can't look up a number among text values and expect a match). If there's an exact match of a number, it returns the row number of the found raw score and feeds it to the INDEX(). For example, if the first row is returned, we get:
=INDEX('Sheet'!A6:A36,1)
Which is 'Sheet'!A6. If however there's no match (i.e. the raw score cannot be found), MATCH will return an error. And that's when the second part of the IFERROR comes into play:
MATCH(
1,
IF(
1*LEFT(
'Sheet'!B6:B36,
FIND(
"-",
'Sheet'!B6:B36
)-1
)<=C7,
1,
0
)*
IF(
1*MID(
'Sheet'!B6:B36,
FIND(
"-",
'Sheet'!B6:B36
)+1,
100
)>=C7,
1,
0
)
,0
)
This MATCH tries to find 1 within what seems to be two IFs; the first one being:
IF( 1*LEFT('Sheet'!B6:B36,FIND("-",'Sheet'!B6:B36)-1)<=C7 , 1 , 0)
FIND("-",'Sheet'!B6:B36)-1 gets the position of the last character before the - in the column 'Sheet'!B6:B36.
With those values, this FIND would return:
12-13 -> 2
145-155 -> 3
1567-1865 -> 4
The IF thus becomes:
IF( 1*LEFT('Sheet'!B6:B36,{2,3,4})<=C7 , 1 , 0)
Notice the braces here; they indicate an array and that's why this is an array formula. LEFT then extracts all the characters before the - (remember your other question, I answered with a technique very similar to this):
12-13 -> 2 -> 12
145-155 -> 3 -> 145
1567-1865 -> 4 -> 1567
Which is...
IF( 1*{12,145,1567}<=C7 , 1 , 0)
Again, 1* converts those to actual numbers because LEFT by default returns text. It's important to do this here because we're going to use the comparator <=, so that if the value to the left of the - is less than or equal to C7 (the raw score), the IF returns 1; otherwise it returns 0. Let's say that the raw score was 154. The results would be:
IF( {12,145,1567}<=154 , 1 , 0)
IF( {TRUE,TRUE,FALSE} , 1 , 0)
{1,1,0}
I just realised that the formula can be made a little shorter xD Anyway, we'll see that later. The next IF behaves in a similar fashion, but checks for the value at the right of the -:
IF( 1*MID('Sheet'!B6:B36,FIND("-",'Sheet'!B6:B36)+1,100)>=C7 , 1 , 0)
With...
Value -> FIND(...)+1 (X) -> MID('Sheet'!B6:B36, X, 100)
12-13 -> 4 -> 13
145-155 -> 5 -> 155
1567-1865 -> 6 -> 1865
You can notice that this formula will stop working if you have something more than 100 characters long here. Anyway, the IF thus becomes:
IF( {13,155,1865}>=154 , 1 , 0)
IF( {FALSE,TRUE,TRUE} , 1 , 0)
{0,1,1}
Now that we have these, the MATCH from before becomes:
MATCH( 1 , {1,1,0}*{0,1,1} , 0)
Some simple math makes this into:
MATCH( 1 , {0,1,0} , 0)
And what is the position of the 1 in there? That's right, position 2!
Our original formula thus becomes:
=INDEX( 'Sheet'!A6:A36 , IFERROR( #Error! , 2 ) )
So if nothing was found by the first MATCH, it returns an error (#N/A in this case) and IFERROR returns 2 instead. =INDEX( 'Sheet'!A6:A36 , 2 ) gives 'Sheet'!A7.
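For readers who find the nested formula hard to follow, the walk-through above can be mirrored as a short Python sketch: try an exact match first, and fall back to the "low-high" range test only if that fails. This is an illustration of the logic, not a translation of how Excel evaluates array formulas; the function name find_row is hypothetical, and it assumes cells are strings holding either a single non-negative number or a "low-high" range.

```python
def find_row(raw_score, column):
    """Return the 1-based position of the matching cell, or None.

    Hypothetical helper; `column` is a list of strings such as "20" or "145-155".
    """
    # First pass, like MATCH(C7, Column*1, 0): exact match on plain numbers
    for pos, cell in enumerate(column, start=1):
        if "-" not in cell and float(cell) == raw_score:
            return pos

    # Second pass, like MATCH(1, (low<=C7)*(high>=C7), 0): range test
    for pos, cell in enumerate(column, start=1):
        if "-" in cell:
            low, high = cell.split("-", 1)  # text before and after the "-"
            if float(low) <= raw_score <= float(high):
                return pos

    return None  # plays the role of #N/A


example_column = ["12-13", "145-155", "1567-1865"]
print(find_row(154, example_column))
```

With the three example ranges used in the explanation and a raw score of 154, the sketch returns position 2, matching the MATCH( 1 , {0,1,0} , 0) step.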
And the slightly shorter version is:
=INDEX(INDIRECT("'"&LOOKUP($B$5,TblA)&"'!A6:A36"),IFERROR(MATCH(C7,INDEX(INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),0,MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0))*1,0),MATCH(1,(1*LEFT(INDEX(INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),0,MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0)),FIND("-",INDEX(INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),0,MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0)))-1)<=C7)*(1*MID(INDEX(INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),0,MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0)),FIND("-",INDEX(INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B6:J36"),0,MATCH(B7,INDIRECT("'"&LOOKUP($B$5,TblA)&"'!B5:J5"),0)))+1,100)>=C7),0)))
I actually removed the inner IFs, because (a<=b)*(c>=b) already returns 0s and 1s, since TRUE multiplied by TRUE gives 1 in Excel.
In Excel 2007 the formulae are returning a lot of circular reference warnings, so it may be worth adding a tag for your Excel version. Mine is Excel 2007 but with it the results you want seem achievable as below:
To shorten the formulae and reduce computation I have added in Report C5 "Table" and in D5 =VLOOKUP(B5,TblA,2,1).
I have also inserted a column immediately to the right of ColumnH ("ColD") in 5.2.0 to 5.7.30 and applied Text To Columns on Column H, with - as the delimiter.
I then applied to Report E7 and copied down to E15:
=INDEX(INDIRECT("'"&D$5&"'!A6:A36"),MATCH(C7,INDIRECT("'"&D$5&"'!"&CHAR(ROW()+59)&"6:"&CHAR(ROW()+59)&"36"),0))
and adjusted the 59s to 60s in the last three rows. Such adjustment would not be necessary if ColumnI were moved far enough to the right.
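The CHAR(ROW()+59) part builds the column letter from the row the formula sits in. A small Python sketch of that mapping, assuming "the last three rows" means rows 13 to 15 of the E7:E15 range (the helper name column_letter is hypothetical):

```python
def column_letter(row, offset=59):
    # Excel's CHAR(n) returns the character with code n;
    # chr() gives the same result for these ASCII codes.
    return chr(row + offset)


# Rows 7..12 use offset 59; rows 13..15 use offset 60 (assumed reading
# of "the last three rows"), which skips over the inserted column.
mapping = {row: column_letter(row, 59 if row < 13 else 60)
           for row in range(7, 16)}
print(mapping)
```

So the formula in row 7 reads column B of the age sheet, row 12 reads column G, and rows 13 to 15 read columns I, J and K.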
In E13 I changed the match from exact to next higher (final 0 to -1).
For Figures I cheated and changed K23 in 5.2.0 to 5.7.30 to 22 from 21-22, but such banding could, for other columns, be treated in much the same way as I did for ColD.
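The "next higher" match used for E13 (MATCH with a final argument of -1) finds the smallest value that is greater than or equal to the lookup value, and expects the lookup range to be sorted in descending order. A minimal Python sketch of that behaviour, with made-up upper bounds for illustration (the function name match_next_higher is hypothetical):

```python
def match_next_higher(lookup, descending_values):
    """Mimic MATCH(lookup, range, -1) on a descending list.

    Returns the 1-based position of the smallest value >= lookup,
    or None where Excel would return #N/A.
    """
    position = None
    for i, value in enumerate(descending_values, start=1):
        if value >= lookup:
            position = i      # still at or above the lookup value
        else:
            break             # values only get smaller from here
    return position


# Hypothetical upper bounds of score bands, in descending order
upper_bounds = [160, 138, 120]
print(match_next_higher(159, upper_bounds))
```

A raw score of 159 lands on the band whose upper bound is 160, which is the kind of band lookup the descending column needs.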