Python BigQuery script: extract child job information before query execution (python-3.x)

Following this sample, https://cloud.google.com/bigquery/docs/samples/bigquery-query-script, we can get the child job information after the query (the parent job) has executed.
But how do we extract the child job queries before they are executed?
parent_job
"""
-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;
-- Build an array of the top 100 names from the year 2000.
SET top_names = (
SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE year = 2000
);
-- Which names appear as words in Shakespeare's plays?
SELECT
name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
SELECT word
FROM `bigquery-public-data.samples.shakespeare`
);
"""
When this query is executed, it creates two child jobs:
Child job 1:
SELECT STRUCT<ARRAY<STRING>>(( SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100) FROM `bigquery-public-data.usa_names.usa_1910_2013` WHERE year = 2000 )).*;
Child job 2:
SELECT name AS shakespeare_name FROM UNNEST(top_names) AS name WHERE name IN ( SELECT word FROM `bigquery-public-data.samples.shakespeare` )
Is there any way to extract the child job queries before the parent job is executed?
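One rough approach is to split the script client-side into its top-level statements, which approximates (but does not reproduce) the child job queries. This is a heuristic sketch under stated assumptions, not how BigQuery itself plans a script: in the example above the DECLARE statement did not produce a child job, so the sketch drops DECLARE statements; BigQuery rewrote the SET statement into a SELECT STRUCT<...> query, which the sketch does not attempt; and control-flow statements such as IF or LOOP are not interpreted. split_script and candidate_child_statements are hypothetical names. The splitter skips --, # and /* */ comments and ignores semicolons inside '...', "..." and `...` quotes, but it does not handle every GoogleSQL construct (for example, a lone quote inside a triple-quoted string).

```python
from typing import List


def split_script(script: str) -> List[str]:
    """Split a multi-statement SQL script into top-level statements.

    Heuristic sketch: skips --, # and /* */ comments, ignores semicolons
    inside '...', "..." and `...` quotes, and collapses whitespace.
    """
    statements: List[str] = []
    buf: List[str] = []
    quote = None  # quote character we are currently inside, if any
    i, n = 0, len(script)

    def flush() -> None:
        stmt = " ".join("".join(buf).split())  # collapse all whitespace
        if stmt:
            statements.append(stmt)
        buf.clear()

    while i < n:
        ch = script[i]
        if quote:
            buf.append(ch)
            if ch == "\\" and i + 1 < n:  # keep the escaped character as-is
                buf.append(script[i + 1])
                i += 2
                continue
            if ch == quote:
                quote = None
            i += 1
        elif script.startswith("--", i) or ch == "#":
            end = script.find("\n", i)  # line comment: skip to end of line
            i = n if end == -1 else end
        elif script.startswith("/*", i):
            end = script.find("*/", i + 2)  # block comment
            i = n if end == -1 else end + 2
        elif ch == ";":
            flush()  # statement boundary outside quotes and comments
            i += 1
        else:
            if ch in ("'", '"', "`"):
                quote = ch
            buf.append(ch)
            i += 1
    flush()
    return statements


def candidate_child_statements(script: str) -> List[str]:
    """Statements that may become child jobs (DECLAREs dropped, as in the example)."""
    return [s for s in split_script(script) if not s.upper().startswith("DECLARE")]
```

For the script in the question, this yields the SET statement and the final SELECT; the SELECT comes out with the same text as child job 2.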

Related

Reuse code that contains variables in the same Bash script

I have a small block of Bash code like the one below:
#!/bin/bash
query="select * from table where id = ${row_id}"
row_id=1
echo "$query"
row_id=3
echo "$query"
row_id=5
echo "$query"
The expected output is:
select * from table where id = 1
select * from table where id = 3
select * from table where id = 5
But I am not getting the expected output.
I know I am referencing the variable before assigning it.
The idea here is to reuse code instead of writing the same code in many places.
How can I achieve what I want?
You can create a function and call it wherever you need the query, passing the variable to it as an argument:
#!/bin/bash
# Create a function that takes the id as its first argument
# and prints the query with that id filled in
my_function_name() {
    local arg1=$1
    echo "select * from table where id = ${arg1}"
}
# Assign the variable
row_id=1
# Capture the function's output now that the variable is assigned
query=$(my_function_name "$row_id")
echo "$query"
# Assign the variable
row_id=2
# Capture the function's output now that the variable is assigned
query=$(my_function_name "$row_id")
echo "$query"
# Assign the variable
row_id=3
# Capture the function's output now that the variable is assigned
query=$(my_function_name "$row_id")
echo "$query"
You should be getting:
select * from table where id =
select * from table where id =
select * from table where id =
and, as you already mentioned, the reason is:
I know I am referencing variable before assigning it.
One way to implement this:
$ for row_id in 1 3 5;
do
echo "select * from table where id = $row_id";
done
select * from table where id = 1
select * from table where id = 3
select * from table where id = 5
UPDATE
Based on the comment:
Here, if row_id is a random value I get as part of another query, how do I get the correct query as my output?
That is different from the posted question; it is better to define a function:
$ getquery() { echo "select * from table where id = $1"; }
$ getquery $RANDOM
select * from table where id = 12907
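If the query eventually runs from a program rather than being printed, the same "write it once, fill in the id later" idea is usually expressed with a placeholder that the database driver binds at execution time, which also avoids the quoting problems of string interpolation. A minimal sketch using Python's built-in sqlite3 module; the items table, its contents and the get_row function are hypothetical stand-ins for the question's table:

```python
import sqlite3

# Written once; the ? placeholder is filled in each time the query runs.
QUERY = "SELECT id, name FROM items WHERE id = ?"


def get_row(conn: sqlite3.Connection, row_id: int):
    """Run the shared query with row_id bound at execution time."""
    return conn.execute(QUERY, (row_id,)).fetchone()


# Hypothetical sample data so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (3, "b"), (5, "c")])

for row_id in (1, 3, 5):
    print(get_row(conn, row_id))
```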

Azure DevOps API WIQL: get the maximum work item ID

I'm using WIQL to query for a list of work items in Azure DevOps. However, Azure DevOps returns a maximum of 20,000 work items in a single query. If the query results contain more than 20,000 items, an error code is returned instead of the work items. To get a list of all work items matching my query, I modified the query to filter by ID and then loop to build a list of work items with multiple queries. The issue is that there is apparently no way to know when I have reached the end of my loop, because I don't know the maximum work item ID in the system.
# session, base_url and personal_access_token are set up earlier in the script
idlist = []
max_items_per_query = 19000
counter = 0
done = False
while not done:
    wiql = ("SELECT [System.Id] FROM WorkItems "
            "WHERE [System.WorkItemType] IN ('User Story','Bug') "
            "AND [System.AreaPath] UNDER 'mypath' "
            "AND [System.Id] >= {} AND [System.Id] < {}"
            .format(counter, counter + max_items_per_query))
    url = base_url + 'wiql'
    params = {'api-version': '4.0'}
    body = {'query': wiql}
    request = session.post(url, auth=('', personal_access_token), params=params, json=body)
    response = request.json()
    newItems = [w['id'] for w in response['workItems']]
    idlist.extend(newItems)
    counter += max_items_per_query
    if not newItems:
        done = True
This works in most cases, but the loop exits prematurely if it encounters a gap in work item IDs under the specified area path. Ideally, I could make this work if there were a way to query for the maximum work item ID in the system and then exit when the counter reaches that value. However, I can't find a way to do this. Is there a way to query for this number, or another solution that will allow me to get a list of all work items matching specific criteria?
You can use the $top parameter to get the last one.
Something like the query below (this is just a sample; you can extend it with your own conditions), sent with $top = 1:
SELECT [System.Id] FROM workitems WHERE [System.Id] > 0 ORDER BY [System.Id] DESC
This returns the maximum System.Id, because the results are sorted in descending order.
Suggestion:
You can also change your logic to page through the results like this, sending each query with $top = 5000:
SELECT [System.Id] FROM workitems WHERE [System.Id] > 0 ORDER BY [System.Id] ASC
Take the System.Id of the 5000th item; let's assume it is 5029.
The next query would be:
SELECT [System.Id] FROM workitems WHERE [System.Id] > 5029 ORDER BY [System.Id] ASC
This returns the next 5000 items after System.Id 5029.
You can repeat this logic in a loop.
To exit the loop, check the number of items returned in each iteration: if it is less than 5000, you have reached the end.
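The paging loop described above can be sketched in Python as follows. run_wiql is a hypothetical callable you would implement on top of the session.post call from the question (for example, passing $top as a request parameter and returning the IDs from response['workItems']); build_wiql and fetch_all_ids are also illustrative names. Extra conditions such as the work item type and area path filters would be ANDed into the WHERE clause.

```python
from typing import Callable, List

PAGE_SIZE = 5000  # page size used in the answer


def build_wiql(last_id: int) -> str:
    """WIQL for the next page: IDs strictly greater than the last one seen."""
    return (
        "SELECT [System.Id] FROM WorkItems "
        f"WHERE [System.Id] > {last_id} "
        "ORDER BY [System.Id] ASC"
    )


def fetch_all_ids(
    run_wiql: Callable[[str, int], List[int]], page_size: int = PAGE_SIZE
) -> List[int]:
    """Collect all IDs page by page; stop when a page comes back short."""
    ids: List[int] = []
    last_id = 0  # start at "> 0", as in the answer
    while True:
        page = run_wiql(build_wiql(last_id), page_size)  # hypothetical API wrapper
        ids.extend(page)
        if len(page) < page_size:  # short (or empty) page: no more results
            return ids
        last_id = page[-1]  # results are ascending, so this is the page maximum
```

Because each page starts strictly after the last ID seen, gaps in the ID sequence do not end the loop early; only a short page does.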

ClickHouse: search within nested fields

I have a nested field named items.productName, and I want to check whether the product name contains a particular string.
SELECT * FROM test WHERE hasAny(items.productName,['Samsung'])
This works only when the product name is exactly Samsung.
I have tried ARRAY JOIN:
SELECT
*
FROM test
ARRAY JOIN items
WHERE items.productName LIKE '%Samsung%'
This works, but it is very slow (~1 s for 5 million records).
Is there a way to perform a LIKE match within hasAny?
You can achieve this using the arrayFilter function (see the ClickHouse docs).
Query:
SELECT * FROM test WHERE arrayFilter(x -> x LIKE '%Samsung%', items.productName) != []
If you omit != [], you will get the error "DB::Exception: Illegal type Array(String) of column for filter. Must be UInt8 or Nullable(UInt8) or Const variants of them.", because WHERE expects a UInt8 condition rather than an array.
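The condition arrayFilter(...) != [] is true exactly when at least one element matches the LIKE pattern. ClickHouse also provides arrayExists(func, arr), which returns 1 when any element satisfies func and can therefore be used directly in WHERE; whether it is faster on this data is untested. The Python sketch below only mirrors those semantics for a simplified LIKE (% and _ wildcards, case-sensitive, no escape characters); like_to_regex, array_filter_like and array_has_like are hypothetical helpers, not ClickHouse APIs.

```python
import re


def like_to_regex(pattern: str):
    """Translate a simplified SQL LIKE pattern (% and _ only) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any sequence of characters
        elif ch == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.DOTALL)


def array_filter_like(values, pattern):
    """Like arrayFilter(x -> x LIKE pattern, values)."""
    rx = like_to_regex(pattern)
    return [v for v in values if rx.fullmatch(v)]


def array_has_like(values, pattern):
    """Like arrayFilter(...) != [], or arrayExists(x -> x LIKE pattern, values)."""
    return len(array_filter_like(values, pattern)) > 0
```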

How to write a query for reordering elements

I'm working on code that will have a list of items in a specific order, and I'd like to reorder them at will. The setup isn't really that important, but to summarize, it's a Node server with an MSSQL database.
For the sake of the demonstration, let's say we're discussing forum categories that are shown in a specific order.
Id | OrderNumber | Name
---|-------------|-----------
1  | 1           | Rules
2  | 3           | Off-topic
5  | 2           | General
8  | 4           | Global
I've already handled the front end that lets me reorder them as I like; the problem is what should happen in the database when I press the save button.
Ideally I'd like to send a JavaScript object containing item IDs in the right order to the API endpoint on the server that will execute a stored procedure. Something like:
Data = {
IDs:"5,2,8,1"
}
Is there a way to write that stored procedure so that its only parameter is the list of IDs, but it can go through that list and do something I can only describe with the following pseudo-code:
var Order = 1;
foreach ID in Data.IDs
    UPDATE Categories SET OrderNumber = Order WHERE Id = ID
    Order = Order + 1
My biggest problem is that I'm not very experienced with advanced SQL commands; that's the only part I need help with, as I've handled everything else already. Thank you for your help.
Example:
Declare @IDs varchar(max) = '5,2,8,1'

Update A
   set OrderNumber = B.RetSeq
 From YourTable A
 Join (
        Select RetSeq = row_number() over (order by (select null))
              ,RetVal = B.n.value('(./text())[1]', 'int')
         From  ( values (cast('<x>' + replace(@IDs, ',', '</x><x>') + '</x>' as xml)) ) A(xmldata)
         Cross Apply xmldata.nodes('x') B(n)
      ) B on A.ID = B.RetVal
Updated table:
Id | OrderNumber | Name
---|-------------|-----------
1  | 4           | Rules
2  | 2           | Off-topic
5  | 1           | General
8  | 3           | Global
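The pseudo-code from the question can also be run from the application side as one parameterized UPDATE per ID inside a single transaction. A minimal sketch, using Python's built-in sqlite3 module as a stand-in for the Node/MSSQL setup; the Categories table and its rows are taken from the question, while parse_ids and reorder are hypothetical names:

```python
import sqlite3
from typing import List


def parse_ids(ids_csv: str) -> List[int]:
    """Turn '5,2,8,1' into [5, 2, 8, 1], ignoring empty entries."""
    return [int(part) for part in ids_csv.split(",") if part.strip()]


def reorder(conn: sqlite3.Connection, ids_csv: str) -> None:
    """Set OrderNumber to each ID's 1-based position in ids_csv."""
    rows = [(pos, cat_id) for pos, cat_id in enumerate(parse_ids(ids_csv), start=1)]
    with conn:  # commits on success, rolls back if any UPDATE fails
        conn.executemany("UPDATE Categories SET OrderNumber = ? WHERE Id = ?", rows)


# Sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Categories (Id INTEGER PRIMARY KEY, OrderNumber INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO Categories VALUES (?, ?, ?)",
    [(1, 1, "Rules"), (2, 3, "Off-topic"), (5, 2, "General"), (8, 4, "Global")],
)
reorder(conn, "5,2,8,1")
print(conn.execute("SELECT Id, OrderNumber, Name FROM Categories ORDER BY Id").fetchall())
```

The position in the list becomes the new OrderNumber, which matches the updated table from the T-SQL answer.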

MDX return fiscal PriorMTD date as value

I am trying to create an MDX calculated member that returns the prior MTD date.
This is the calculated member I've created:
CREATE MEMBER CURRENTCUBE.[Measures].PriorMTDDate
AS cousin(
[Date].[Fiscal].CurrentMember,
[Date].[Fiscal].CurrentMember.parent.parent.lag(1)
),
VISIBLE = 1 ;
And this is the query, but it returns only null:
select {[Measures].[PriorMTDDate]} on 0
from [WH_Cube]
WHERE ( [Date].[Fiscal].[Date].&[2014-09-12T00:00:00] )
Any idea what I am doing wrong?
EDIT: Another example that returns null:
WITH MEMBER Measures.x AS
[Date].[Fiscal].CurrentMember
SELECT Measures.x ON 0
FROM [WH_Cube]
WHERE ( [Date].[Fiscal].[Date].&[2014-09-30T00:00:00] )
Does a measure need to be a numeric value?
CREATE MEMBER CURRENTCUBE.[Measures].PriorMTDDate
AS cousin(
[Date].[Fiscal].CurrentMember,
[Date].[Fiscal].CurrentMember.parent.parent.lag(1)
).MemberValue ,
VISIBLE = 1 ;
.CurrentMember is evaluated at the row level; it doesn't look into the slicer. The slicer is a global restriction on the cube, providing a sub-cube domain for your query.
In your query, [Date].[Fiscal].CurrentMember is undefined, as there is nothing on the ROWS axis.
Try:
select {[Measures].[PriorMTDDate]} on 0,
[Date].[Fiscal].[Date].&[2014-09-12T00:00:00] on 1
from [WH_Cube]
