KDB: How to match a string of possible dates with rows of a table?

I essentially just want to create a date column that represents the date in the filename.
My table filesInDir has a single column, filepath, with 4 rows:
":..\..\code\products\Q\ExtData\CIBC\availability\Global\EquityOnly\daily\bnyMellon_inventory\push_list_20190314_040253_Equity.csv"
":..\..\code\products\Q\ExtData\CIBC\availability\Global\EquityOnly\daily\bnyMellon_inventory\push_list_20190314_040306_Equity.csv"
":..\..\code\products\Q\ExtData\CIBC\availability\Global\EquityOnly\daily\bnyMellon_inventory\push_list_20190311_040321_Bond.csv"
":..\..\code\products\Q\ExtData\CIBC\availability\Global\EquityOnly\daily\bnyMellon_inventory\push_list_20190312_999999_Cash.csv"
I also have a list of possible dates: 2019.03.12 2019.03.11 2019.03.14. How can I match the list of dates with the rows of the table above, so that I get a new column holding the date value that matched the filepath string?

If all your file paths follow the same format as your example ones, you can create a date column pretty easily:
update date:"D"$8#'103_'filePaths from filesInDir
Then match with your dates using this column.
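To see where the 103 comes from, here is a small Python sketch (Python rather than q, purely for illustration) that measures the prefix shared by the example paths and applies the same drop-then-take slicing as 8#'103_'. It assumes every path has exactly this prefix; PREFIX, OFFSET and date_by_offset are hypothetical names.

```python
from datetime import date, datetime

# Prefix shared by all four example paths ("\\" is one backslash, as in q).
PREFIX = (":..\\..\\code\\products\\Q\\ExtData\\CIBC\\availability\\Global"
          "\\EquityOnly\\daily\\bnyMellon_inventory\\push_list_")
OFFSET = len(PREFIX)  # number of characters that 103_ drops from each path

def date_by_offset(path: str) -> date:
    """Mirror of "D"$8#'103_' for one path: drop OFFSET chars, keep 8, parse."""
    return datetime.strptime(path[OFFSET:OFFSET + 8], "%Y%m%d").date()

paths = [PREFIX + s for s in ("20190314_040253_Equity.csv",
                              "20190314_040306_Equity.csv",
                              "20190311_040321_Bond.csv",
                              "20190312_999999_Cash.csv")]
print(OFFSET)  # 103
print([date_by_offset(p) for p in paths])
```

Any change in folder depth or folder names shifts the offset, which is why this approach only works when all paths share the same format.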

This is how you parse the file path to get the date. Note that "\" is the escape character in q strings, so you need to write "\\" instead, or retrieve the strings directly from a system command.
Create the table:
filesInDir:([]filePaths:(":..\\..\\code\\products\\Q\\ExtData\\CIBC\\availability\\Global\\EquityOnly\\daily\\bnyMellon_inventory\\push_list_20190314_040253_Equity.csv";
":..\\..\\code\\products\\Q\\ExtData\\CIBC\\availability\\Global\\EquityOnly\\daily\\bnyMellon_inventory\\push_list_20190314_040306_Equity.csv";
":..\\..\\code\\products\\Q\\ExtData\\CIBC\\availability\\Global\\EquityOnly\\daily\\bnyMellon_inventory\\push_list_20190311_040321_Bond.csv";
":..\\..\\code\\products\\Q\\ExtData\\CIBC\\availability\\Global\\EquityOnly\\daily\\bnyMellon_inventory\\push_list_20190312_999999_Cash.csv"))
Create the date column:
update date:{"D"$("_"vs last "\\" vs x)[2]} each filePaths from `filesInDir
Can you give an example to illustrate that? It's not clear what you intend and expect to see.
If you just want, say, a "Flag" column indicating whether the date in each record matches one in dateRange, you can simply use in:
dateRange:2019.03.12 2019.03.11 2019.03.14
update match:date in dateRange from `filesInDir

Another approach using 0:
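update date:raze("   D";"_")0:filePaths from filesInDir
This depends on the number of _ characters in your file paths. In the example paths, bnyMellon_inventory and push_list_ put three underscores before the date, so the type string needs three leading spaces to skip the first three fields and parse the fourth as a date.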
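For readers who want to trace the split-based parse and the in flag outside q, here is a Python sketch of the same logic. It is an illustration, not q code; DATE_RANGE, date_from_path and flag_rows are hypothetical names, and it assumes the date is always the third _-separated part of the file name, as in the example paths.

```python
from datetime import date, datetime

# The list of possible dates from the question.
DATE_RANGE = {date(2019, 3, 12), date(2019, 3, 11), date(2019, 3, 14)}

def date_from_path(path: str) -> date:
    """Python analogue of {"D"$("_" vs last "\\" vs x)[2]} for one path."""
    file_name = path.split("\\")[-1]      # last "\\" vs x  -> file name only
    date_field = file_name.split("_")[2]  # ("_" vs ...)[2]  -> "yyyymmdd"
    return datetime.strptime(date_field, "%Y%m%d").date()

def flag_rows(paths):
    """Return (path, date, matched, matched_date) tuples, one per path.

    matched mirrors `date in dateRange`; matched_date is the date itself
    when it is in the list and None otherwise.
    """
    rows = []
    for p in paths:
        d = date_from_path(p)
        hit = d in DATE_RANGE
        rows.append((p, d, hit, d if hit else None))
    return rows
```

Splitting on the file name rather than the full path avoids depending on the underscores in folder names such as bnyMellon_inventory.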

Related

pyspark how to pass the values dynamically to countDistinct

I have a csv file that contains (FileName,ColumnName,Rule and RuleDetails) as headers.
I have multiple rules for column Rule, like NotNull, Max, Min, etc.
For the rule "Unique" there can be multiple columns; I need to pass those columns and perform countDistinct.
If I pass the values dynamically instead of hardcoding them, I get the error below:
AnalysisException: Column '`"SITEID", "ASSETNUM"`' does not exist. Did you mean one of the following? [spark_catalog.maximo_dq.Assets_new.ASSETNUM, spark_catalog.maximo_dq.Assets_new.HasLD, spark_catalog.maximo_dq.Assets_new.SITEID, spark_catalog.maximo_dq.Assets_new.Status, spark_catalog.maximo_dq.Assets_new.SerialNumber, spark_catalog.maximo_dq.Assets_new.Description, spark_catalog.maximo_dq.Assets_new.InstallDate, spark_catalog.maximo_dq.Assets_new.Classification, spark_catalog.maximo_dq.Assets_new.LongDescription];
Similarly, how can I get the count of records that do not match the specified date format?
I need to check how many records in INSTALLDATE are not in the format given in RuleDetails.
Use argument unpacking (the * operator) to pass the values:
from pyspark.sql.functions import countDistinct
unique_cols = ['a', 'b', 'c']  # keep the column names in a list, not in one string
df.select(countDistinct(*unique_cols))
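The error message shows that the whole string "SITEID", "ASSETNUM" was treated as a single column name. A plain-Python sketch (no Spark needed to run it) of splitting such a RuleDetails value into separate names, plus a simple count of values that fail a date format, could look like this. It assumes RuleDetails holds comma-separated, optionally quoted column names and that the date format is written in Python strptime syntax; parse_rule_columns and count_bad_dates are hypothetical helpers.

```python
from datetime import datetime
from typing import Iterable, List, Optional

def parse_rule_columns(rule_details: str) -> List[str]:
    """Split a RuleDetails value such as '"SITEID", "ASSETNUM"' into
    ['SITEID', 'ASSETNUM'] so each name can be passed separately."""
    names = []
    for part in rule_details.split(","):
        name = part.strip().strip('"').strip("'")
        if name:
            names.append(name)
    return names

def count_bad_dates(values: Iterable[Optional[str]], fmt: str) -> int:
    """Count values that do not parse with the strptime format fmt."""
    bad = 0
    for value in values:
        try:
            datetime.strptime(value, fmt)
        except (TypeError, ValueError):  # None, or text in the wrong format
            bad += 1
    return bad

# With Spark available, the parsed list is unpacked into countDistinct:
# df.select(countDistinct(*parse_rule_columns(rule_details)))
```

In Spark itself the date check would be expressed as a column expression rather than a Python loop; the sketch only shows the counting rule.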

Power Query: Split table column with multiple cells in the same row

I have a SharePoint list as a datasource in Power Query.
It has an "AttachmentFiles" column that contains a table; from that table I want the values of the "ServerRelativeURL" column.
I want to split that column so each value in "ServerRelativeURL" gets its own column.
I can get the values with the Expand Table function, but it splits them into multiple rows; I want to keep them in one row.
I only want one row per unique ID.
I can live with a fixed number of columns as there are usually no more than 3 attachments per ID.
I'm thinking that I can add a custom column that refers to "AttachmentFiles ServerRelativeURL Value(1)" but I don't know how.
Can anybody help?
Try this code:
let
fn = (x)=> {x, #table({"ServerRelativeUrl"},List.FirstN(List.Zip({{"a".."z"}}), x*2))},
Source = #table({"id", "AttachmentFiles"},{fn(2),fn(3),fn(1)}),
replace = Table.ReplaceValue(Source,0,0,(a,b,c)=>a[ServerRelativeUrl],{"AttachmentFiles"}),
cols = List.Transform({1..List.Max(List.Transform(replace[AttachmentFiles], List.Count))}, each "url"&Text.From(_)),
split = Table.SplitColumn(replace, "AttachmentFiles", (x)=>List.Transform({0..List.Count(x)-1}, each x{_}), cols)
in
split
I managed to solve it myself.
I added 3 custom columns like this:
CustomColumn1: [AttachmentFiles]{0}
CustomColumn2: [AttachmentFiles]{1}
CustomColumn3: [AttachmentFiles]{2}
And expanded them with only the "ServerRelativeURL" selected.
It would be nice to have a dynamic solution. But this will work fine for now.
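The dynamic version asked for above follows the same idea as the M answer: find the largest number of attachments on any row, name that many columns (url1, url2, ...) and pad shorter rows with nulls. As a language-neutral illustration only (Python, not M), with a hypothetical spread_attachments helper:

```python
from typing import List, Sequence, Tuple

def spread_attachments(rows: Sequence[Tuple[object, List[str]]]):
    """Turn (id, [url, ...]) pairs into one row per id with url1..urlN columns."""
    # The widest row decides how many url columns are needed.
    width = max((len(urls) for _, urls in rows), default=0)
    header = ["id"] + ["url" + str(i) for i in range(1, width + 1)]
    # Pad shorter rows with None, the way Table.SplitColumn fills missing parts with null.
    table = [[row_id] + list(urls) + [None] * (width - len(urls))
             for row_id, urls in rows]
    return header, table
```

Because the column count is computed from the data, a fourth attachment on some ID would add a url4 column instead of being dropped.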

Get last item with date range and name filter in google sheets

I have the below set of records in Google Sheets. I would like to filter the rows with specific name and date range. Once I have the filtered data, I would like to fetch the last row's final amount cell data.
Ex: I would like to fetch the final amount 300 if my date (dd/mm/yyyy) range is 01/01/2016 to 11/06/2016 and the name selection is 'Sandeep'.
As I have experience with SQLite, I inserted the same records into a database and got the expected result using the query below.
select Final from MyTable where Date in (select max(Date) from MyTable WHERE Date BETWEEN '01/01/2016' AND '11/06/2016' and name = "Sandeep")
But I don't know how to use nested select statements in Google Sheets. Any other way of getting the result is fine with me. Please help me get the result explained above.
=QUERY(A1:E50, "select E where A > date '2016-01-01' and A < date '2016-06-11' and B = 'Sandeep' order by A desc limit 1")
Use column IDs (A, B, C, ...) instead of header names such as name or income. Multiple columns can be listed in a single select clause, separated by commas.
Dates in the where clause must be written in yyyy-mm-dd format (regardless of how the dates are formatted in the actual column).
See if this works:
=index(E:E, max(filter(row(A:A), A:A>date(2016, 1, 1), A:A<date(2016, 6, 11), B:B="Sandeep")))
If you want to include start and end date, change > to >= and < to <=.
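The filter-then-take-latest logic of both formulas can be sketched in Python for clarity. The record layout (date, name, final amount) and the sample values are hypothetical, since the original table is not shown; the bounds are strict, matching > and <.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

Record = Tuple[date, str, float]  # (date, name, final amount); hypothetical layout

def last_final_amount(records: Iterable[Record], name: str,
                      start: date, end: date) -> Optional[float]:
    """Final amount of the latest record for name strictly between start and end."""
    matches = [r for r in records if r[1] == name and start < r[0] < end]
    if not matches:
        return None
    # Taking the maximum date plays the role of "order by A desc limit 1".
    return max(matches, key=lambda r: r[0])[2]

# Hypothetical sample data for illustration only.
records = [(date(2016, 2, 1), "Sandeep", 100.0),
           (date(2016, 5, 1), "Sandeep", 300.0),
           (date(2016, 5, 2), "Other", 999.0),
           (date(2016, 7, 1), "Sandeep", 500.0)]
```

Note that the INDEX/FILTER formula picks the last matching row by position, which gives the same answer only when the sheet is sorted by date.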

Excel Power Query - from web with dynamic worksheet cell value

We have a spreadsheet that gets updated monthly, which queries some data from our server.
The query url looks like this:
http://example.com/?2016-01-31
The returned data is in a json format, like below:
{"CID":"1160","date":"2016-01-31","rate":{"USD":1.22}}
We only need the value of 1.22 from the above and I can get that inserted into the worksheet with no problem.
My questions:
1. How can I use a cell value (containing the date) to pass the date parameter [2016-01-31] into the query and display the result in the cell next to it?
2. There's a long list of dates in a column; can this query be filled down automatically for each date?
3. When I load the query result into the worksheet, it always loads in pairs (taking up two cells: one says "Value", the other contains the value, which is 1.22 in my case). Ideally I would only need 1.22, not the title. Can the title be removed? (Deleting it won't work: it gives you "Column 1" instead, or you have to hide the entire row, which messes up the layout.)
I know this is a lot to ask but I've tried a lot of search and reading in the last few days and I have to say the M language beats me.
Thanks in advance.
Convert your Web.Contents() request into a function:
let
    myFunct = (param as date) => let
            // build the URL from the date parameter; "...." stands for the rest of your URL
            x = Web.Contents(.... & Date.ToText(param, "yyyy-MM-dd") & ....)
        in
            x
in
    myFunct
Reference your data request function from a new query and include any transformations you need there (in this case Json.Document, table expansions, and removing extraneous data). Feel free to delete all the extra data here, including columns that just contain the label 'Value'.
Assuming your table of domain values already exists, add a custom column like the following, then expand the resulting column if it returns a table or record:
= myFunct([someparameter])
Edit: I got home and into my bookmarks. Here is a more detailed reference for what you are looking to do: http://datachix.com/2014/05/22/power-query-functions-some-scenarios/
For a table, add a column that fetches the data and parses the JSON:
let
tt=#table(
{"date"},{
{"2017-01-01"},
{"2017-01-02"},
{"2017-01-03"}
}),
add_col = Table.AddColumn(tt, "USD", each Json.Document(Web.Contents("http://example.com/?date="&[date]))[rate][USD])
in
add_col
If you need only one value:
Json.Document(Web.Contents("http://example.com/?date="&YOUR_DATE_STRING))[rate][USD]
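The two steps the M answers chain together (format the date into the URL, then pick rate, USD out of the JSON) can be sketched in Python to show what each part does. This only builds the URL and parses the sample response from the question; it makes no network call, and BASE_URL, rate_url and usd_rate are hypothetical names.

```python
import json
from datetime import date

BASE_URL = "http://example.com/?date="  # taken from the answer; adjust to your server

def rate_url(d: date) -> str:
    """Build the request URL with the date as yyyy-mm-dd."""
    return BASE_URL + d.isoformat()

def usd_rate(json_text: str) -> float:
    """Equivalent of Json.Document(...)[rate][USD] on the response body."""
    return json.loads(json_text)["rate"]["USD"]

sample = '{"CID":"1160","date":"2016-01-31","rate":{"USD":1.22}}'
print(rate_url(date(2016, 1, 31)))  # http://example.com/?date=2016-01-31
print(usd_rate(sample))             # 1.22
```

Returning the bare number rather than a record is what avoids the extra "Value" label cell described in the question.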

Multi-condition lookup with dates and text

I have been melting my brain trying to work out the formula I need for a multi-condition lookup.
I have two data sets, one is job data and the other is contract data.
The job data contains customer name, location of job and date of job. I need to find out if the job was contracted when it took place, and if it was return a value from column N in the contract data.
The problem comes when I try to use the date ranges, as there is frequently more than one contract per customer.
So for example, in my job data:-
CUSTOMER | LOCATION | JOB DATE
Cust A | Port A | 01/01/2014
Cust A | Port B | 01/02/2014
Customer A had a contract in port B that expired on 21st Feb 2014, so here I would want it to return the value from column N in my contract data, as the job was under contract.
Customer A did not have a contract in port A at the time of the job, so I would want it to return 'no contract'.
Contract data has columns containing customer name, port name, and a start and end date value, as well as my lookup category.
I think I need to be using INDEX/MATCH, but I can't seem to get them to work with my date ranges. Is there another type of lookup I can use to get this to work?
Please help, I'm losing the plot!
Thanks :)
You can use two approaches here:
In both the result and source tables, make a helper column that concatenates all three values like this: =A2&B2&C2, so that you get something like 'Cust APort A01/01/2014'. That gives you a unique value by which you can identify the row. You can add a delimiter if needed: =A2&"|"&B2&"|"&C2. Then you can perform VLOOKUP by this value.
You can add a helper column with the row number (1, 2, 3 ...) in the source table. Then you can use =SUMIFS(<row_number_column>,<source_condition_column_1>,<condition_1>,<source_condition_column_2>,<condition_2>,...) to return the row number of the source table that matches all three conditions. You can use this row number to perform INDEX or whatever is needed. But BE CAREFUL: check that there are only unique combinations of all three columns in the source table, otherwise this approach may return wrong results. For example, if the matching conditions are met in rows 3 and 7, it will return 10, which is completely wrong.
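The matching rule the question describes (same customer, same port, start date <= job date <= end date, otherwise 'no contract') can be sketched in Python to make the conditions explicit. It reads job dates as dd/mm/yyyy, consistent with the 21st Feb 2014 example; the contract start date and the "Contract X" value standing in for column N are hypothetical.

```python
from datetime import date
from typing import Iterable, Tuple

Job = Tuple[str, str, date]                  # (customer, location, job date)
Contract = Tuple[str, str, date, date, str]  # (customer, port, start, end, column N value)

def contract_value(job: Job, contracts: Iterable[Contract]) -> str:
    """Return column N of the first contract covering the job, else 'no contract'."""
    customer, location, job_date = job
    for c_customer, c_port, start, end, value in contracts:
        # All three conditions must hold: customer, port, and date inside the range.
        if c_customer == customer and c_port == location and start <= job_date <= end:
            return value
    return "no contract"

# Hypothetical contract: start date and column N value are made up for illustration.
contracts = [("Cust A", "Port B", date(2013, 2, 22), date(2014, 2, 21), "Contract X")]
```

Because the date test is a range check rather than an equality, it handles several contracts per customer as long as their date ranges do not overlap for the same port.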
