Join files from different paths in U-SQL

My data is saved daily under the path pattern "/Data/{year}/{month}/{day}/mydata.json".
For example: "/Data/2018/10/1/mydata.json", "/Data/2018/10/2/mydata.json", "/Data/2018/11/1/mydata.json", "/Data/2018/12/5/mydata.json", etc.
I would like to combine all the months and days into one file using U-SQL. Is there an easy way to do this without listing each path separately? Listing every day of the year by hand is not practical.
At the moment I use this:
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
@a =
    EXTRACT EventCategory string
          , EventAction string
          , EventLabel string
    FROM "/Data/2018/10/2/mydata.json"
    USING new JsonExtractor()
    UNION ALL
    EXTRACT EventCategory string
          , EventAction string
          , EventLabel string
    FROM "/Data/2018/11/2/mydata.json"
    USING new JsonExtractor();
OUTPUT @a
TO "/Output/mydata.Csv"
USING Outputters.Csv(outputHeader:true);

Yes! You can do this with a file set pattern. A basic example:
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
DECLARE @input string = "/Data/2018/{*}/2/mydata.json";
USING Microsoft.Analytics.Samples.Formats.Json;
@a =
    EXTRACT EventCategory string
          , EventAction string
          , EventLabel string
    FROM @input
    USING new JsonExtractor();
OUTPUT @a
TO "/Output/mydata.Csv"
USING Outputters.Csv(outputHeader:true);
This loads the data for the second day of every month of 2018.
Other variations:
DECLARE @input string = "/Data/2018/{*}/{*}/mydata.json"; processes all files of 2018.
DECLARE @input string = "/Data/{*}/12/{*}/mydata.json"; processes all files from the 12th month of every year.
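To check which files a given pattern would pick up before submitting a job, the matching can be approximated outside U-SQL. The following Python sketch is only an illustration: the helper names `fileset_to_regex` and `select` are hypothetical, the path list is sample data, and it assumes that each {*} matches exactly one path segment, which may not cover every detail of how U-SQL resolves file sets.

```python
import re


def fileset_to_regex(pattern):
    """Translate a path pattern containing {*} into a compiled regex.

    Assumption for this sketch: every {*} stands for one path segment,
    i.e. one or more characters that are not '/'.
    """
    literal_parts = pattern.split("{*}")
    return re.compile("[^/]+".join(re.escape(part) for part in literal_parts))


def select(pattern, paths):
    """Return the paths that match the whole pattern."""
    regex = fileset_to_regex(pattern)
    return [path for path in paths if regex.fullmatch(path)]


# Sample paths following the layout from the question (hypothetical data).
paths = [
    "/Data/2018/10/1/mydata.json",
    "/Data/2018/10/2/mydata.json",
    "/Data/2018/11/1/mydata.json",
    "/Data/2018/11/2/mydata.json",
    "/Data/2018/12/5/mydata.json",
    "/Data/2019/12/5/mydata.json",
]

second_days = select("/Data/2018/{*}/2/mydata.json", paths)
all_2018 = select("/Data/2018/{*}/{*}/mydata.json", paths)
december = select("/Data/{*}/12/{*}/mydata.json", paths)

print(second_days)
print(len(all_2018), "files for 2018")
print(december)
```

With this sample list, the first pattern selects the two "day 2" files of 2018, the second selects the five 2018 files, and the third selects the December files of both years.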
If you want to capture the date parts of the path as a column, you can do:
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
@a =
    EXTRACT EventCategory string
          , EventAction string
          , EventLabel string
          , date DateTime
    FROM "/Data/{date:yyyy}/{date:MM}/{date:dd}/mydata.json"
    USING new JsonExtractor();
OUTPUT @a
TO "/Output/mydata.Csv"
USING Outputters.Csv(outputHeader:true);
As you can see there is now an additional column date of type DateTime that can be used in the query and/or included in the output.

Related

How to stop Python Pandas from converting specific column from int to float

I am trying to output a DataFrame to a text file (for a feed). A few specific columns are automatically converted to float instead of int as intended.
How can I specify that those columns should use int as their dtype?
I tried outputting the whole DataFrame as strings, and that did not work.
The columns I would like to specify are named [CID1] and [CID2].
data = pd.read_sql(sql,conn)
data = data.astype(str)
data.to_csv('data_feed_feed.txt', sep ='\t',index=True)

Based on the code you provided, you convert all of your data to strings just before export.
So you either need to convert some columns back to the desired type, for example:
data["CID1"] = data["CID1"].astype(int)
or not convert them in the first place.
It is not clear from what you provided why ints would be converted to floats.
This post provides a lot of information:
stackoverflow.com/a/28648923/9249533
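One common cause of this behaviour, which the question does not confirm, is missing values: pandas stores an integer column that contains NULL/NaN as float64. If that is what happens here, the nullable "Int64" dtype is one way to keep whole numbers in the export. The sample frame below is hypothetical and stands in for the result of pd.read_sql.

```python
import pandas as pd

# Hypothetical sample data: CID1 has one missing value, as a SQL NULL would produce.
data = pd.DataFrame({"CID1": [101, None, 103], "CID2": [7, 8, 9]})

# The missing value forces CID1 to float64, so 101 would be exported as 101.0.
dtype_before = data["CID1"].dtype

# The nullable integer dtype keeps whole numbers and still allows missing values.
data["CID1"] = data["CID1"].astype("Int64")

# With no path argument, to_csv returns the text instead of writing a file.
text = data.to_csv(sep="\t", index=False)
print(dtype_before, "->", data["CID1"].dtype)
print(text)
```

Note that plain .astype(int) raises an error when the column contains missing values, which is why the nullable dtype is used in this sketch.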

Concatenating strings to a single string in Python

I have code that saves some strings to text files (it works correctly). I want to concatenate the strings into one string and save it to one file in a single write.
`np.savetxt(f'{PTH}/pointwise_layer_{number_layer}_channel_{ch}.txt',np.transpose(np.squeeze(layers[number_layer].weights[1][:,:,:,ch],0)))`
I wrote this code, but it does not work:
pointStr = np.squeeze(layers[number_layer].weights[1][:,:,:,ch],0) + pointStr
Do you know the correct code?
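One possible approach, sketched under assumptions: the `+` operator between a NumPy array and a str does not perform string concatenation, so the array has to be turned into text first. np.savetxt accepts a file-like object, so each channel can be written into an in-memory io.StringIO buffer and the combined text saved once. The `weights` array below is a hypothetical stand-in for layers[number_layer].weights[1], and the "# channel" header lines are an illustrative choice, not part of the original code.

```python
import io

import numpy as np

# Hypothetical stand-in for layers[number_layer].weights[1], shape (1, 1, 2, 3).
weights = np.arange(6, dtype=float).reshape(1, 1, 2, 3)

buffer = io.StringIO()
for ch in range(weights.shape[-1]):
    # Same slicing as in the question: drop the leading axis, then transpose.
    channel = np.transpose(np.squeeze(weights[:, :, :, ch], 0))
    buffer.write(f"# channel {ch}\n")  # comment line separating the channels
    np.savetxt(buffer, channel)        # appends the channel as text

# One string holding the text of all channels.
point_str = buffer.getvalue()
print(point_str)
```

The combined string can then be written in a single call, for example with open("combined.txt", "w") and f.write(point_str), where the file name is a placeholder.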

How to specify a Date Range in U-SQL Extract statement

My input files are in a month directory, with the naming pattern
_.csv
I can create an extract that grabs all files:
@InputFile_Daily + "{*}.json"
However, I now need to be able to create a file set for a specific range of dates, e.g. Today -> Today-3.
Is there a way to specify this kind of range, by regex or otherwise, within the U-SQL EXTRACT? Or, as I've seen suggested elsewhere, should I extract all the data and then filter the result down to the range I'm interested in? That is not ideal, as cost is a factor.

In U-SQL you extract all files as you described (@InputFile_Daily + "{*}.json") and then apply your date filter in the first SELECT. Internally, only the needed data is extracted.
Example:
DECLARE @input string = @"/temp/stackoverflow.json";
// Read input file
@inputData =
    EXTRACT Account string,
            Alias string,
            Company string,
            date DateTime,
            Json string
    FROM @input
    USING Extractors.Text(delimiter : '\n', quoting : false);
// @referenceDate, @dateFrom and @dateTo are assumed to be declared elsewhere in the script
@extractedFields =
    SELECT Account,
           Alias,
           Company,
           date,
           Json
    FROM @inputData
    WHERE @referenceDate == DateTime.MinValue OR (date >= @dateFrom AND date <= @dateTo);
If you have 1 million files and your filter selects only the most recent ones, for example 5 files, only those 5 files are extracted. You can confirm how many files were extracted in the U-SQL job graph.
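The "Today -> Today-3" window from the question corresponds to the two bounds compared in the WHERE clause. The Python sketch below only illustrates that date arithmetic and which per-file dates fall inside an inclusive window. The helper names `date_window` and `in_window`, the fixed `today` value, and the list of file dates are all hypothetical; it does not model how U-SQL itself decides which files to read.

```python
from datetime import date, timedelta


def date_window(today, days_back=3):
    """Return the inclusive (date_from, date_to) bounds for Today-3 .. Today."""
    return today - timedelta(days=days_back), today


def in_window(file_date, date_from, date_to):
    """Same comparison as: date >= dateFrom AND date <= dateTo."""
    return date_from <= file_date <= date_to


today = date(2019, 6, 12)  # hypothetical "today", fixed so the sketch is repeatable
date_from, date_to = date_window(today)

# Hypothetical dates of ten daily files, newest first.
file_dates = [today - timedelta(days=n) for n in range(10)]
selected = [d for d in file_dates if in_window(d, date_from, date_to)]

print(date_from, date_to)
print(selected)
```

Because both bounds are inclusive, a window of three days back covers four calendar days in total.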

Filter a String which holds a TimeStamp - Kotlin

I have written a function which generates a Timestamp and converts it to a String using toString(). I want to remove the whitespace and other special characters from that string. Is there an efficient way to do it?
This function generates an ID from the Timestamp, since the timestamp will be unique (note: as long as IDs are generated at different milliseconds):
fun autoGenerateID() : String = Timestamp(java.util.Date().getTime()).toString()
When I call the function, it should return:
20190612121912463
But the actual result was:
2019-06-12 12:19:12.463

I would suggest dropping the Timestamp class. It is outdated, and anything it provides can be achieved in easier ways.
For your use case you could just use SimpleDateFormat. It would look like this:
SimpleDateFormat("yyyyMMddHHmmssSSS").format(Date())

Arithmetic operation in jmeter Groovy

I created a script that receives a variable from another sampler.
I copy the variable into a new variable (I don't want to modify the source).
Then I tried to double the result. The problem is that it is multiplied as a string rather than as a number.
The variable is 6 and I wanted to display 12, but it displays 6.0 6.0.
Also, how can I save the result in a new variable?
System.out.println(" Impression Price *2 is: " + Impression_price*2);
System.out.println(" Impression Price*2 is: " + (Impression_price.multiply(2.0)));

You need to cast your args[3] which is a String to a corresponding numeric type, for example:
def Impression_price = args[3] as float
More information: Creating JMeter Variables in Java - The Ultimate Guide

You need to convert the String to a double using Double.parseDouble, for example:
def Impression_price= Double.parseDouble(args[3]);
When you log, you need to convert back to a String using String.valueOf, for example:
log.info(String.valueOf(Impression_price*2));
To store a non-String value you need to use vars.putObject:
vars.putObject("Impression_price_double", Impression_price *2);
