Kusto query language split @ character and take last item - azure

If I have a string for example:
"this.is.a.string.and.I.need.the.last.part"
I am trying to get the last part of the string after the last ".", which in this case is "part"
How do I achieve this?
One way I tried was to split the string on ".", which gives me an array back, but then I don't know how to retrieve the last item in the array.
| extend ToSplitstring = split("this.is.a.string.and.I.need.the.last.part", ".")
gives me:
["this", "is","a","string","and","I","need","the","last", "part"]
As a second attempt, I tried this:
| extend ToSubstring = substring(myString, lastindexof(myString, ".")+1)
but Kusto does not have a lastindexof function.
Does anyone have tips?

You can access the last element of the array using the negative index -1.
For example, this:
print split("this.is.a.string.and.I.need.the.last.part", ".")[-1]
returns a table with a single column and a single record, whose value is part.
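For comparison only (this is Python, not KQL), here is the same idea: split on "." and take the last element with index -1, or locate the last "." with str.rfind, which plays the role of the missing lastindexof, and take everything after it. The function names are illustrative, not part of any library.

```python
def last_part_split(s: str, sep: str = ".") -> str:
    # Split into a list and take the last element via a negative index,
    # mirroring split(...)[-1] in KQL.
    return s.split(sep)[-1]


def last_part_rfind(s: str, sep: str = ".") -> str:
    # str.rfind returns the index of the last occurrence, or -1 if absent.
    idx = s.rfind(sep)
    if idx == -1:
        return s  # no separator: the whole string is the "last part"
    return s[idx + len(sep):]


text = "this.is.a.string.and.I.need.the.last.part"
print(last_part_split(text))  # part
print(last_part_rfind(text))  # part
```

Python's str.rpartition(".")[2] gives the same result in one call.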

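You can try the code below and adapt it to your needs. It defines a lastIndexof helper that passes the number of matches (from countof) as indexof's occurrence argument, so indexof returns the position of the last match: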
let lastIndexof = (input:string, lookup: string) {
indexof(input, lookup, 0, -1, countof(input,lookup))
};
your_table_name
| extend ToSubstring = substring("this.is.a.string.and.I.need.the.last.part", lastIndexof("this.is.a.string.and.I.need.the.last.part", ".")+1)
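The lastIndexof helper works because "the last occurrence" is the same as "occurrence number countof(...)". A Python sketch of that counting trick, for illustration only; index_of_occurrence and last_index_of are hypothetical helpers, not KQL or standard-library functions:

```python
def index_of_occurrence(s: str, lookup: str, occurrence: int) -> int:
    """Return the index of the given 1-based occurrence of lookup in s, or -1."""
    start = 0
    idx = -1
    for _ in range(occurrence):
        idx = s.find(lookup, start)
        if idx == -1:
            return -1
        start = idx + len(lookup)  # non-overlapping matches, like str.count
    return idx


def last_index_of(s: str, lookup: str) -> int:
    # Same idea as the KQL helper: ask for occurrence number count(s, lookup).
    # If lookup never appears, the count is 0 and the loop above returns -1.
    return index_of_occurrence(s, lookup, s.count(lookup))


text = "this.is.a.string.and.I.need.the.last.part"
print(text[last_index_of(text, ".") + 1:])  # part
```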

Related

Python: (partial) matching elements of a list to DataFrame columns, returning entry of a different column

I am a beginner in Python and have encountered the following problem: I have a long list of strings (I've included three here as an example):
ENSEMBL_IDs = ['ENSG00000040608',
'ENSG00000070371',
'ENSG00000070413']
which are partial matches of the data in column 0 of my DataFrame genes_df (first 3 entries shown):
genes_list = (['ENSG00000040608.28', 'RTN4R'],
['ENSG00000070371.91', 'CLTCL1'],
['ENSG00000070413.17', 'DGCR2'])
genes_df = pd.DataFrame(genes_list)
The task I want to perform is conceptually not that difficult: I want to compare each element of ENSEMBL_IDs to genes_df.iloc[:,0]. These are partial matches: each element of ENSEMBL_IDs is contained in an entry of column 0 of genes_df, apart from the extra numbers after the period (".XX"). When an element of ENSEMBL_IDs matches an entry in genes_df.iloc[:,0], I want to return the corresponding value stored in column 1 of genes_df: the actual gene name, for example 'RTN4R'.
I want to store these in a list. So, in the end, I would be left with a list like follows:
`genenames = ['RTN4R', 'CLTCL1', 'DGCR2']`
Some info that might be helpful: all of the entries in ENSEMBL_IDs are unique, and all of them are for sure contained in column 0 of genes_df.
I think I am looking for something along the lines of:
genenames = []
for i in ENSEMBL_IDs:
    if i in genes_df.iloc[:,0]:
        genenames.append(...)  # corresponding value in genes_df.iloc[:,1]
I am sorry if the question has been asked before; I kept looking and was not able to find a solution that was applicable to my problem.
Thank you for your help!
Thanks also for the edit; English is not my first language, so the improvements were insightful.
You can get rid of the part after the dot (with str.extract or str.replace) before matching the values with isin:
m = genes_df[0].str.extract('([^.]+)', expand=False).isin(ENSEMBL_IDs)
# or
m = genes_df[0].str.replace(r'\..*$', '', regex=True).isin(ENSEMBL_IDs)  # raw string avoids an invalid "\." escape warning
out = genes_df.loc[m, 1].tolist()
Or use a regex with str.match:
pattern = '|'.join(ENSEMBL_IDs)
m = genes_df[0].str.match(pattern)
out = genes_df.loc[m, 1].tolist()
Output: ['RTN4R', 'CLTCL1', 'DGCR2']
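The isin and str.match approaches return names in the DataFrame's row order. If you need them in the order of ENSEMBL_IDs, one option is a dictionary keyed on the ID without its version suffix. A minimal sketch in plain Python, working directly on genes_list from the question (id_to_name is an illustrative name):

```python
ENSEMBL_IDs = ['ENSG00000040608', 'ENSG00000070371', 'ENSG00000070413']
genes_list = (['ENSG00000040608.28', 'RTN4R'],
              ['ENSG00000070371.91', 'CLTCL1'],
              ['ENSG00000070413.17', 'DGCR2'])

# Map the stable ID (text before the first ".") to the gene name.
id_to_name = {gene_id.split('.')[0]: name for gene_id, name in genes_list}

# Look up each ID in the order given. Every ID is known to be present,
# so a plain dict lookup is safe (use id_to_name.get(i) otherwise).
genenames = [id_to_name[i] for i in ENSEMBL_IDs]
print(genenames)  # ['RTN4R', 'CLTCL1', 'DGCR2']
```

With the DataFrame, the same mapping can be built as dict(zip(genes_df[0].str.split('.').str[0], genes_df[1])).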

I need an integer, but it's a string with a comma

I'm using sqlite3 and trying to get the oid by using the title of the row and then trying to use that oid to update a column in my table.
allOID[0] is a tuple, and when I print its type and allOID itself, I get this:
>>> <class 'tuple'>
>>> [(1,)]
I'm trying to get the integer out of this tuple but the comma is throwing it off and I can't seem to get it.
Here is all of the code being used currently:
c.execute("""SELECT oid FROM books
WHERE title = :title""",
{
'title': title
})
allOID = c.fetchall()
print(type(allOID[0]))
print(allOID)
c.execute("SELECT * FROM books")
c.execute("""UPDATE books SET
rented = :rented
WHERE oid = :oid""",
{
'rented': rentedVar,
'oid': allOID[0]
})
Any help and comments are greatly appreciated!
The comma just indicates that it is a tuple with a single element.
Access it using allOID[0][0].
allOID[0] gets you the tuple out of the list of results; going one level further with allOID[0][0] gets you the first element of the tuple.
For more info, see the docs:
Empty tuples are constructed by an empty pair of parentheses; a tuple with one item is constructed by following a value with a comma (it is not sufficient to enclose a single value in parentheses). Ugly, but effective.
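A self-contained sketch of the fix using an in-memory database. The books table contents and the rent_book helper are hypothetical; it uses fetchone(), which returns a single row tuple such as (1,), and unpacks that one element before passing it to the UPDATE:

```python
import sqlite3


def rent_book(conn, title, rented):
    """Look up a book's oid by title, then update its rented column."""
    c = conn.cursor()
    c.execute("SELECT oid FROM books WHERE title = :title", {"title": title})
    row = c.fetchone()  # a 1-tuple such as (1,), or None if no match
    if row is None:
        raise LookupError(f"no book titled {title!r}")
    (oid,) = row  # unpack the single element; same as row[0]
    c.execute("UPDATE books SET rented = :rented WHERE oid = :oid",
              {"rented": rented, "oid": oid})
    conn.commit()
    return oid


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, rented INTEGER)")
conn.execute("INSERT INTO books (title, rented) VALUES ('Dune', 0)")
conn.commit()

oid = rent_book(conn, "Dune", 1)
print(oid)  # 1

# With fetchall(), the result is a list of tuples, e.g. [(1,)],
# so the integer is two indexes deep.
all_oid = conn.execute("SELECT oid FROM books WHERE title = ?", ("Dune",)).fetchall()
print(all_oid, all_oid[0][0])  # [(1,)] 1
```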

count occurrences of a string in a structure

I have a structure mydata and I need to access one of its fields mydata.myfield, and within that field, access another field mydata.myfield.mysecondfield. In the last field, mydata.myfield.mysecondfield I need to check how many times a particular string ('apple') occurs.
I have tried with:
aaa=unique(mydata.myfield.mysecondfield,'apple')
bbb=cellfun(@(x) sum(ismember(mydata.myfield.mysecondfield,x)),aaa,'un',0)
but I get this error: Attempt to reference field of non-structure array.
The structure contains fields with both strings and numeric values.
The underlying problem may be that your structure is slightly different from how you describe it. Following your question, I created it as follows:
mydata = struct();
mydata.myfield.mysecondfield = {'apple' 'apple' 'orange' 'banana' 'apple' 'kiwi'};
Since I don't get the same error you do, I suspect the underlying types differ slightly. Anyway, with mydata defined as above, if you change your code as follows it should work, but it will return the count of every unique value in the field:
aaa = unique(mydata.myfield.mysecondfield);
bbb = cellfun(@(x) sum(ismember(mydata.myfield.mysecondfield,x)),aaa,'un',0)
bbb =
4×1 cell array
[3]
[1]
[1]
[1]
If you only want to count the number of apple occurrences, you should use the following approach instead:
apple_count = sum(strcmp(mydata.myfield.mysecondfield,'apple')); % 3
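For readers outside MATLAB, a Python analogue of both counts, assuming the struct is modeled as nested dicts and the field holds a list that mixes strings and numbers (the data below is hypothetical; the 42 stands in for a numeric entry):

```python
from collections import Counter

# Hypothetical stand-in for mydata.myfield.mysecondfield.
mydata = {"myfield": {"mysecondfield": ["apple", "apple", "orange", 42,
                                        "banana", "apple", "kiwi"]}}
values = mydata["myfield"]["mysecondfield"]

# Count every distinct string (like unique + cellfun/ismember). Numbers are
# skipped, mirroring strcmp, which returns false for non-text elements.
counts = Counter(v for v in values if isinstance(v, str))

# Count a single string (like sum(strcmp(..., 'apple'))).
apple_count = sum(1 for v in values if v == "apple")

print(sorted(counts.items()))  # [('apple', 3), ('banana', 1), ('kiwi', 1), ('orange', 1)]
print(apple_count)             # 3
```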

How to use LIKE and SUBSTRING in a WHERE clause in SQL

I hope someone can help me understand why the first query returns results but the second does not:
EDIT:
First query:
select name from Items where name like '%abc%'
Second query:
select name from Items where name like substring('''%abc%''',1,10)
Why does the first return results while the second returns nothing, given that
substring('''%abc%''',1,10)='%abc%'
If there is logic behind that, is there another approach to doing something like the second query?
My purpose is to transform a string like '''abc''' into 'abc' so that I can use it in a LIKE statement.
You can concatenate strings to form your LIKE pattern. To trim the first 3 and last 3 characters from a string, use the SUBSTRING and LEN functions. The following example assumes your match string is in a variable called @input that starts and ends with 3 quote marks that need to be removed to find a match:
select name from Items where name like '%' + SUBSTRING(@input, 4, LEN(@input) - 6) + '%'
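The answer above is T-SQL (SUBSTRING, LEN, and + for concatenation). As an illustration only, here is the same idea with Python's built-in sqlite3 module, where strings are concatenated with || and the trimmed term is passed as a bound parameter rather than spliced into the SQL. The table contents and the find_names helper are hypothetical:

```python
import sqlite3


def find_names(conn, raw: str):
    # Strip three quote characters from each end, e.g. "'''abc'''" -> "abc";
    # the slice plays the role of SUBSTRING(@input, 4, LEN(@input) - 6).
    term = raw[3:-3]
    # Build the LIKE pattern inside SQL with ||, binding the term as a parameter.
    rows = conn.execute(
        "SELECT name FROM Items WHERE name LIKE '%' || ? || '%' ORDER BY name",
        (term,),
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (name TEXT)")
conn.executemany("INSERT INTO Items (name) VALUES (?)",
                 [("xabcx",), ("abc",), ("xyz",)])
print(find_names(conn, "'''abc'''"))  # ['abc', 'xabcx']
```

If the term itself may contain % or _, those still act as wildcards; SQLite's LIKE ... ESCAPE clause can be used to match them literally.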

Find index of a specific character in a string then parse the string

I have strings that look like this: [NAME LASTNAME/NAME.LAST@emailaddress/123456678]. I want to parse strings in this format so that I only get NAME LASTNAME. My pseudo-idea is to find the index of the first /, then take the substring from index 1 up to that /. I want to do this in VBScript.
Your way should work. You can also Split() your string on / and just grab the first element of the resulting array:
Const SOME_STRING = "John Doe/John.Doe@example.com/12345678"
WScript.Echo Split(SOME_STRING, "/")(0)
Output:
John Doe
Edit, in response to the comments:
If your string contains the [, you can still Split(). Just use Mid() to grab the first element starting at character position 2:
Const SOME_STRING = "[John Doe/John.Doe@example.com/12345678]"
WScript.Echo Mid(Split(SOME_STRING, "/")(0), 2)
Your idea is good; you should also find the index of "[". That makes the script more robust and flexible. The code below always returns the text between the first occurrence of "[" and the first "/".
var = "[John Doe/John.Doe@example.com/12345678]"
WScript.Echo Mid(var, (InStr(var,"[")+1),InStr(var,"/")-InStr(var,"[")-1)
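A Python sketch of both approaches from this thread (split-then-trim, and slicing between the first "[" and the first "/"), useful for checking the logic outside VBScript; the function names are hypothetical:

```python
def name_by_split(s: str) -> str:
    # Like Mid(Split(s, "/")(0), 2): take the part before the first "/",
    # then drop a leading "[" if there is one.
    first = s.split("/")[0]
    return first[1:] if first.startswith("[") else first


def name_by_index(s: str) -> str:
    # Like the InStr/Mid version: text between the first "[" and the first "/".
    # str.find returns -1 when "[" is missing, so start becomes 0.
    start = s.find("[") + 1
    end = s.find("/")
    return s[start:end] if end != -1 else s[start:]


print(name_by_split("[John Doe/John.Doe@example.com/12345678]"))  # John Doe
print(name_by_index("[John Doe/John.Doe@example.com/12345678]"))  # John Doe
```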
