I have a table as follows:
ID Start End
AB 001 020
VG 004 098
I want to output a single column of ID series as follows:
ID2
AB001
AB002
AB003
...
AB020
VG001
...
VG097
VG098
I am trying to do this with Power Query in Excel as I cannot use R (the tool will be used by another person without access to R).
I am trying Table.InsertRows and Table.RepeatRows after transposing the table. But I am so far unable to use the Start/End values in my query (the number of IDs may vary) or even incrementing the values. I am quite a noob in this and to this day have worked with only minor manipulations of the GUI functions. Any detailed answer will be highly appreciated.
Thank you for your efforts in advance.
Try this - it generates a list from Start - End for each row, applies the ID prefix, then combines the output:
let
ListFunction = (Start, End, Prefix) =>
let
NewList = List.Transform(List.Numbers(Start, End - Start + 1), each Prefix & Number.ToText(_, "000"))
in
NewList,
Source = #table(type table [#"ID"=text, #"Start"=text, #"End"=text],{{"AB","001","020"},{"VG","004","098"}}),
#"Make Lists" = Table.AddColumn(Source, "NewList", each ListFunction(Number.From([Start]), Number.From([End]), [ID])),
#"Combine Lists" = Table.FromList(List.Combine(#"Make Lists"[NewList]), Splitter.SplitByNothing(),{"ID2"})
in
#"Combine Lists"
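For readers who want to check the logic outside Power Query, here is a minimal sketch of the same expansion in Python. It is only an illustration of the steps above (build the inclusive range, zero-pad to three digits, prepend the prefix, concatenate); the function name expand_ids is made up for this example and is not part of the query.

```python
def expand_ids(rows):
    """Expand (prefix, start, end) text rows into zero-padded IDs, end inclusive."""
    ids = []
    for prefix, start, end in rows:
        # int() mirrors Number.From; the +1 makes the end value inclusive
        for n in range(int(start), int(end) + 1):
            # "03d" mirrors the "000" format string used in the M code
            ids.append(f"{prefix}{n:03d}")
    return ids


rows = [("AB", "001", "020"), ("VG", "004", "098")]
id2 = expand_ids(rows)
print(id2[0], id2[19], id2[20], id2[-1], len(id2))
```

Note that the VG series starts at VG004 here, because the sample table gives 004 as its Start value.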
Summary of problem:
I have a PowerQuery Table in Excel that contains 13 columns. The 13th Column is a custom column "Task Start Week Number". I want the PowerQuery to apply a formula to each of the rows generated for this Query. The formula is as follows:
=IFS(AND('Program Dates'!$B$2<WEEKNUM(New_Items_to_Save[Start Date]),
WEEKNUM(New_Items_to_Save[Start Date])<54),
'Program Dates'!$G$2-('Program Dates'!$D$2-(-53+WEEKNUM(New_Items_to_Save[Start Date]))),
WEEKNUM(New_Items_to_Save[Start Date])<'Program Dates'!$B$2,
'Program Dates'!$G$2-('Program Dates'!$D$2-(-53+WEEKNUM(New_Items_to_Save[Start Date])))+53)
What I've done here is reference a cell which contains the formula, that way I can just run the GetValue() function for a named range. I can't get this to work and I don't know what I'm doing wrong.
Thank you in advance for your help!
Context:
This is the query table I need to add the calculation to.
The last column is the custom column, and those values should be calculated using the following cells:
This is the source of the other info needed to calculate the week number of the program, with reference arrows shown.
Note: The dates referenced in the function have already been converted using the WEEKNUM() operation. I am comparing Week# to Week#, not Date to Week#
Function Logic:
AND: if the date falls within the range of the current year ie. week# is less than 54, but after the start of the program, then perform this calc.
IFS: otherwise, if week# is before the end of the program ie. 2023, then perform this calculation.
Edit:
Here is the PowerQuery function I want to call for each of the new cells in this custom column:
Parameter2 = Date.WeekOfYear(StartWeek)
let
GetWeek = ()
if GetValue("Start_Week") < Parameter2 < 54
then (GetValue("Program_Duration") - GetValue("End_Week") + 53 - Parameter2))
else
(GetValue("Program_Duration") - GetValue("End_Week") + 53 - Parameter2 +53))
in
GetWeek
I don't know if I need the let statement or if I should just put it in a function
f(x) => [equation]
and then call "...each f([column name])" in power query?
I think that there are actually three different parts to your question, and maybe your confusion is coming from combining them all together.
The way I see it is in these parts:
How to create a custom function.
How to apply a function to a new column.
How to apply a function to an existing column.
How to create a custom function
There are two main ways to create a custom function in Power Query:
Using the UI (follow steps here):
Step 1: Write your query
Step 2: Parameterise your query
Step 3: Create your function
Using only code (follow steps here):
Example to filter a table:
let fun_FilterTable = (tbl_InputTable as table, txt_FilterValue as text) as table =>
let
Source = tbl_InputTable,
Filter = Table.SelectRows(Source, each Text.Contains([Column], txt_FilterValue))
in
Filter
in
fun_FilterTable
Example to check if one string contains another:
let fun_CheckStringContains = (txt_String as text, txt_Check as text) as nullable logical =>
let
Source = txt_String,
Check = Text.Contains(Source, txt_Check)
in
Check
in
fun_CheckStringContains
More resources:
Using custom functions
Custom Functions Made Easy in Power BI Desktop
PowerQuery best practices
DataFlow best practices
How to apply a function to a new column
There are also two ways to do this:
Custom Column (follow steps here):
Step 1: Create custom column
Step 2: Add function
Custom Function (follow steps here):
Step 1: Invoke custom function
Sources:
Add a custom column
Using custom functions
Custom Functions Made Easy in Power BI Desktop
How to apply a function to an existing column
There are also two ways to do this (unfortunately, both are only possible in code):
Using Transformation:
Example to uppercase an entire column:
let
Source = Table,
#"Uppercased text" = Table.TransformColumns(Source, {{"Column", each Text.Upper(_), type nullable text}})
in
#"Uppercased text"
Example to add a prefix to all rows in one column:
let
Source = Table,
#"Added prefix" = Table.TransformColumns(Source, {{"Column", each "test_" & _, type text}})
in
#"Added prefix"
Example to coerce column to date in Australian format:
let
Source = Table,
#"Fix date" = Table.TransformColumns(Source, {{"DateColumn", each Date.From(_, "en-AU"), type date}})
in
#"Fix date"
Using Replacement
Example to replace some text:
let
Source = Table,
#"Replaced value" = Table.ReplaceValue(Source, "Admin", "Administrator", Replacer.ReplaceText, {"Column"})
in
#"Replaced value"
Example to replace with values from another column
let
Source = Table,
#"Replaced value" = Table.ReplaceValue(Source, each [FixThisColumn], each [OtherColumn], Replacer.ReplaceText, {"FixThisColumn"})
in
#"Replaced value"
Your Specific Problem
Since you did not provide any sample data, I have created some here. In future, please provide some data in a minimal reproducible example (see here), so that we can easily recreate the scenario from your example.
Data:
ID ProgramStartDate ProgramEndDate
1 1/Jan/2020 1/Dec/2021
2 1/Jan/2022 1/Mar/2023
3 1/Mar/2022 1/Dec/2022
4 1/Sep/2021 1/Dec/2023
5 1/Jan/2023 1/Dec/2023
I think that you should be using a combination of the PowerQuery built-in date functions (see here) and some of the PowerQuery conditional processes (see here).
My code would look something like this:
let
Source = Table.FromColumns({{1,2,3,4,5},{"1/Jan/2020","1/Jan/2022","1/Mar/2022","1/Sep/2021","1/Jan/2023"},{"1/Dec/2021","1/Mar/2023","1/Dec/2022","1/Dec/2023","1/Dec/2023"}},{"ID","ProgramStartDate","ProgramEndDate"}),
fix_Types = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"ProgramStartDate", type date}, {"ProgramEndDate", type date}}),
add_Today = Table.AddColumn(fix_Types, "DateToday", each Date.From(DateTime.LocalNow()), type date),
add_CheckCurrentYear = Table.AddColumn(add_Today, "IsInCurrentYear", each Date.IsInCurrentYear([DateToday]), type logical),
add_CheckProgramRunning = Table.AddColumn(add_CheckCurrentYear, "ProgramIsCurrent", each [DateToday]>[ProgramStartDate] and [DateToday]<[ProgramEndDate], type logical),
add_ConditionalCheck = Table.AddColumn(add_CheckProgramRunning, "DoSomething", each if [IsInCurrentYear] and [ProgramIsCurrent] then "Do Something" else null, type text)
in
add_ConditionalCheck
And the final output would look something like this:
ID  ProgramStartDate  ProgramEndDate  DateToday   IsInCurrentYear  ProgramIsCurrent  DoSomething
1   1/01/2020         1/12/2021       22/12/2022  TRUE             FALSE             null
2   1/01/2022         1/03/2023       22/12/2022  TRUE             TRUE              Do Something
3   1/03/2022         1/12/2022       22/12/2022  TRUE             FALSE             null
4   1/09/2021         1/12/2023       22/12/2022  TRUE             TRUE              Do Something
5   1/01/2023         1/12/2023       22/12/2022  TRUE             FALSE             null
This should help you work towards resolving your issue.
I'm trying to find the number of words in this table:
Download Table here: http://www.mediafire.com/file/m81vtdo6bdd7bw8/Table_RandomInfoMiddle.mat/file
Words are indicated by the "Type" criteria, being "letters". The key thing to notice is that not everything in the table is a word, and that the entry "<missing>" registers as a word. In other words, I need to determine the number of words by counting only "letters", except where the entry is "<missing>".
Here is my attempt (Yet unsuccessful - Notice the two mentions of "Problem area"):
for col=1:size(Table_RandomInfoMiddle,2)
column_name = sprintf('Words count for column %d',col);
MiddleWordsType_table.(column_name) = nnz(ismember(Table_RandomInfoMiddle(:,col).Variables,{'letters'}));
MiddleWordsExclusionType_table.(column_name) = nnz(ismember(Table_RandomInfoMiddle(:,col).Variables,{'<missing>'})); %Problem area
end
%Call data from table
MiddleWordsType = table2array(MiddleWordsType_table);
MiddleWordsExclusionType = table2array(MiddleWordsExclusionType_table); %Problem area
%Take out zeros where "Type" was
MiddleWordsTotal_Nr = MiddleWordsType(MiddleWordsType~=0);
MiddleWordsExclusionTotal_Nr = MiddleWordsExclusionType(MiddleWordsExclusionType~=0);
%Final answer
FinalMiddleWordsTotal_Nr = MiddleWordsTotal_Nr-MiddleWordsExclusionTotal_Nr;
Any help will be appreciated. Thank you!
You can count the unique values in column 1 for the rows where column 2 satisfies some condition using
MiddleWordsType = numel( unique( ...
Table_RandomInfoMiddle{ismember(Table_RandomInfoMiddle{:,2}, 'letters'), 1} ) );
<missing> is a keyword in a categorical array, not literally the string "<missing>". That's why it appears blue and italicised in the workspace. If you want to check specifically for missing values, you can use this instead of ismember:
ismissing( Table_RandomInfoMiddle{:,1} )
I posted question previously as "using “.between” for string values not working in python" and I was not clear enough, but I could not edit, so I am reposting with clarity here.
I have a Data Frame. In [0,61] I have string. In [0,69] I have a string. I want to slice all the data in cells [0,62:68] between these two and merge them, and paste the result into [1,61]. Subsequently, [0,62:68] will be blank, but that is not important.
However, I have several hundred documents, and I want to write a script that executes on all of them. The strings in [0,61] and [0,69] are always present in all the documents, but along different locations in that column. So I tried using:
For_Paste = df[0][df[0].between('DESCRIPTION OF WORK / STATEMENT OF WORK', 'ADDITIONAL REQUIREMENTS / SUPPORTING DOCUMENTATION', inclusive = False)]
But the output I get is: Series([], Name: 0, dtype: object)
I was expecting a list or array with the desired data that I could merge and paste. Thanks.
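If you want to select the rows between two indices (say idx_start and idx_end, excluding these two rows) on column col of the dataframe df, you will want to use the following. Note that .loc slices by label and includes both endpoints, so the slice has to stop at idx_end - 1 (this assumes the default integer index):
df.loc[idx_start + 1 : idx_end - 1, col]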
To find the first index matching a string s, use
idx = df.index[df[col] == s][0]
So for your case, to return a Series of the rows between these two indices, try the following:
start_string = 'DESCRIPTION OF WORK / STATEMENT OF WORK'
end_string = 'ADDITIONAL REQUIREMENTS / SUPPORTING DOCUMENTATION'
idx_start = df.index[df[0] == start_string][0]
idx_end = df.index[df[0] == end_string][0]
# .loc includes both endpoints, so stop one row before the end marker
For_Paste = df.loc[idx_start + 1 : idx_end - 1, 0]
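The question also asks to merge the selected cells and paste the result next to the start marker. A minimal sketch of that last step follows, under some assumptions: the small frame and the "START"/"END" markers are made-up stand-ins for the real documents, the index is the default integer index, and the cells are joined with a single space.

```python
import pandas as pd

# Hypothetical single-column frame standing in for one of the documents.
df = pd.DataFrame({0: ["header", "START", "line a", "line b", "END", "footer"]})

start_string = "START"  # placeholder for the real start marker
end_string = "END"      # placeholder for the real end marker

idx_start = df.index[df[0] == start_string][0]
idx_end = df.index[df[0] == end_string][0]

# .loc slices by label and includes both endpoints, hence the +1 / -1.
between = df.loc[idx_start + 1 : idx_end - 1, 0]

# Merge the selected cells into one string and write it into column 1
# on the start marker's row (column 1 is created if it does not exist).
merged = " ".join(between.dropna().astype(str))
df.loc[idx_start, 1] = merged
print(merged)
```

Because both markers are looked up by value, the same lines should work for every document regardless of where the markers sit in the column.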
I am having trouble figuring out how to extract specific text within a string. My dataset has been pulled from de-identified electronic health records, and contains a list of every medication that our patients have been prescribed. I am, however, only concerned with a specific list of medications, which I have in another table. Within each cell is the name of the medication, dose, and form (Tabs, Caps, etc.) [See image]. Much of this information is not important for my analysis though, and I only need to extract the medication names that match my list. It might also be useful to extract the first word from each string, as it is (in most cases) the name of the medication.
I have examined a number of different methods of pulling substrings, but haven't quite found something that meets my needs. Any help would be greatly appreciated.
Thanks.
Data DRUGS;
infile datalines flowover;
length drug1-drug69 $20;
array drug[69];
input (drug1-drug69)($);
datalines;
AMITRIPTYLINE
AMOXAPINE
BUPROPION
CITALOPRAM
CLOMIPRAMINE
DESIPRAMINE
DOXEPIN
ESCITALOPRAM
FLUOXETINE
FLUVOXAMINE
IMIPRAMINE
ISOCARBOXAZID
MAPROTILINE
MIRTAZAPINE
NEFAZODONE
NORTRIPTYLINE
PAROXETINE
PHENELZINE
PROTRIPTYLINE
SERTRALINE
TRANYLCYPROMINE
TRAZODONE
TRIMIPRAMINE
VENLAFAXINE
AMITRIP
ELEVIL
ENDEP
LEVATE
ADISEN
AMOLIFE
AMOXAN
AMOXAPINE
DEFANYL
OXAMINE
OXCAP
WELLBUTRIN
BUPROBAN
APLENZIN
BUDEPRION
ZYBAN
CELEXA
ANAFRANIL
NORPRAMIN
SILENOR
PRUDOXIN
ZONALON
LEXAPRO
PROZAC
SARAFEM
LUVOX
TOFRANIL
TOFRANIL-PM
MARPLAN
LUDIOMIL
REMERON
REMERONSOLTAB
PAMELOR
PAXIL
PEXEVA
BRISDELLE
NARDIL
VIVACTIL
ZOLOFT
PARNATE
OLEPTRO
SURMONTIL
EFFEXOR
DESVENLAFAXINE
PRISTIQ
;;;;
run;
Data DM4_;
if _n_=1 then set DRUGS;
array drug[69];
set DM4;
do _i = 1 to countw(Description,' ().,');
_med = scan(Description,_i,' ().,');
_whichmed = whichc(_med, of drug[*]);
if _whichmed > 0 then leave;
end;
run;
Data DM_Meds (drop = drug1-drug69 _i _med _whichmed);
Set DM4_;
IF _whichmed > 0 then anti = _med;
else anti = ' ';
run;
This is a fairly common problem with a bunch of possible solutions depending on your needs.
The simplest answer is to create an array, assuming you have a smallish number of medicines. This isn't necessarily the fastest solution, but it would work fairly well and is simple to construct. Just get your drug list into a dataset, transpose it to horizontal (one row with lots of meds), then load it up this way. You iterate over the words in the name of the medicine and see if any of them are in the medicine list - if they are, then bingo, you have your drug! In real use of course drop the drug: variables afterwards.
This works a bit better than the inverse (searching each drug to see if it's in the medicine name) since usually there are more words in the drug list than in the medicine name. The hash solution might be faster, if you're comfortable with hashes (load the drug list into a hash table then use find() to do the same as what whichc is doing here).
data have;
input #1 medname $50.;
datalines;
PROVIGIL OR
ENSURE HIGH PROTEIN OR LIQD
BENADRYL 25 MG OR CAPS
ECOTRIN LOW STRENGTH 81 MG OR TBEC
SPIRONOLACTONE 25 MG PO TABS
NORVASC 5 MG OR TABS
FLUOXETINE HCL 25MG
IBUPROFEN 200MG
NEFAZODONE TABS OR CAPS 20MG
PAXIL (PAROXETINE HCL) 25MG
;;;;
run;
data drugs;
infile datalines flowover;
length drug1-drug19 $20;
array drug[19];
input (drug1-drug19) ($);
datalines;
AMITRIPTYLINE
AMOXAPINE
BUPROPION
CITALOPRAM
CLOMIPRAMINE
DESIPRAMINE
DOXEPIN
ESCITALOPRAM
FLUOXETINE
FLUVOXAMINE
IMIPRAMINE
ISOCARBOXAZID
MAPROTILINE
MIRTAZAPINE
NEFAZODONE
NORTRIPTYLINE
PAROXETINE
PHENELZINE
PROTRIPTYLINE
;;;;
run;
data want;
if _n_ = 1 then set drugs;
array drug[19];
set have;
do _i = 1 to countw(medname,' ().,');
_medword = scan(medname,_i,' ().,');
_whichmed = whichc(_medword, of drug[*]);
if _whichmed > 0 then leave;
end;
run;
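For comparison, here is a rough sketch of the hash-style lookup mentioned above, written in Python rather than SAS purely to show the logic: split the medicine name on the same delimiters, then test each word for membership in the drug list. The names DRUGS and find_drug are invented for this example, the drug set is a small subset, and the upper-casing is an added assumption (whichc in the SAS step is case-sensitive).

```python
import re

# Small illustrative subset of the drug list.
DRUGS = {"FLUOXETINE", "NEFAZODONE", "PAROXETINE"}


def find_drug(medname, drugs=DRUGS):
    """Return the first word of medname found in drugs, or None if there is none."""
    # Split on the same delimiters as the scan() call: space ( ) . ,
    for word in re.split(r"[ ().,]+", medname.upper()):
        if word in drugs:  # set membership plays the role of the hash lookup
            return word
    return None


for name in ["PAXIL (PAROXETINE HCL) 25MG", "FLUOXETINE HCL 25MG", "IBUPROFEN 200MG"]:
    print(name, "->", find_drug(name))
```

As in the data step, the loop stops at the first matching word, so a name containing two listed drugs reports only the first one.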
This should be an easy task for PROC SQL.
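Let's say you have patient information in table A and drug names in table B (long format, not the wide format you gave). Here is code that joins table A to table B into table C wherever the description in A contains a drug name in B. Because it is a LEFT JOIN, rows of A with no matching drug are kept with a missing drug value; use an INNER JOIN instead if you only want the matching rows.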
PROC SQL;
CREATE TABLE C AS SELECT DISTINCT *
FROM A LEFT JOIN B
ON UPCASE(A.description) CONTAINS UPCASE(B.drug);
QUIT;
I have two excel functions that I am trying to convert into R:
numberShares
=IF(AND(N213="BOH",N212="BOH")=TRUE,P212,IF(AND(N213="BOH",N212="Sell")=TRUE,ROUNDDOWN(Q212/C213,0),0))
marketValue
=IF(AND(N212="BOH",N213="BOH")=TRUE,C213*P212,IF(AND(N212="Sell",N213="Sell")=TRUE,Q212,IF(AND(N212="BOH",N213="Sell")=TRUE,P212*C213,IF(AND(N212="Sell",N213="BOH")=TRUE,Q212))))
The cells that are referenced include:
c = closing price of a stock
n = position values of either "buy or hold" or "sell"
p = number of Shares
q = market value, assuming $10,000 initial equity (number of shares*closing price)
and the tops of the two output columns that i am trying to recreate look like this:
output
So far, in R I have constructed a dataframe with the necessary four columns:
data.frame
I just don't know how to write the functions that will populate the number of shares and market value columns. For loops? ifelse?
Again, thank you!!
Convert the AND()'s to infix "&"; the "=" to "=="; and the IF's to ifelse() and you are halfway there. The problem will be in converting your cell references to array or matrix references, and for that task we would have needed a better description of the data layout:
numberShares <-
ifelse( N213=="BOH" & N212=="BOH",
#Perhaps PosVal[213] == "BOH" & PosVal[212] == "BOH"
# ... and very possibly the 213 should be 213:240 and the 212 should be 212:239
P212,
ifelse( N213=="BOH" & N212=="Sell" ,
# ROUNDDOWN(x, 0) truncates, so use floor() here (round() would round to nearest)
floor(Q212/C213),
0))
(You seem to be returning incommensurate values, which seems pretty questionable.) Assuming this is correct code despite my misgivings, the next translation applies the same substitutions to the second formula. Your last IF is missing an else-consequent (Excel returns FALSE in that case); NA is used below as a placeholder for it:
marketValue <-
ifelse( N212=="BOH" & N213=="BOH",
C213*P212,
ifelse( N212=="Sell" & N213=="Sell",
Q212,
ifelse( N212=="BOH" & N213=="Sell",
P212*C213,
ifelse( N212=="Sell" & N213=="BOH",
Q212,
NA))))
(Your testing for AND( .,.)=TRUE is I believe unnecessary in Excel and certainly unnecessary in R.)
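One more point on the data layout: both formulas appear to refer to the previous row of the very columns they fill (P212 and Q212), so each row depends on results computed for the row before it, and a single vectorised ifelse() over whole columns would not see those values. A row-by-row loop handles that. Below is a minimal sketch of such a loop in Python, not R, just to show the recurrence; the function name, the sample prices and the seed row (0 shares, 10,000 in cash) are assumptions for illustration, and positions are assumed to be only "BOH" or "Sell".

```python
import math


def run_positions(close, position, initial_value=10_000):
    """Fill shares and market value row by row from the previous row's results."""
    shares = [0]             # assumed seed: no shares held on the first row
    value = [initial_value]  # assumed seed: initial equity held as cash
    for i in range(1, len(close)):
        held = position[i - 1] == "BOH"
        # Market value: yesterday's shares at today's close if held, else carry cash.
        value.append(shares[i - 1] * close[i] if held else value[i - 1])
        if position[i] == "BOH":
            # Keep the shares if already held, otherwise buy as many whole shares as possible.
            shares.append(shares[i - 1] if held else math.floor(value[i - 1] / close[i]))
        else:
            shares.append(0)
    return shares, value


close = [10, 11, 12, 9, 10]
position = ["Sell", "BOH", "BOH", "Sell", "BOH"]
shares, value = run_positions(close, position)
print(shares)
print(value)
```

The same structure carries over to an R for loop over the rows of the data frame, with the first row seeded by hand.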