I have an input to my stream analytics job as a CSV string such as follows:
jon,41,111 treadmill lane,07831231123,aa,bb,123...etc.
I'd like to sort this data into columns of an SQL table with column headings:
name,age,address,phone,result1,result2,result3...etc.
I've tried using SQL split functions but none I've tried seem to be compatible with Azure stream analytics job query. Could anyone provide any assistance as to how I can split my string into the appropriate tables? Many thanks.
If your events are coming in with a CSV format, you don't have to do anything in your query to work with it. The trick is to set the correct serialisation for your input. When you create your IoT Hub input, set the serialisation to CSV:
This will work if your CSV message has the headers included in the message:
name,age,address,phone,result1,result2,result3
jon,41,111 treadmill lane,07831231123,aa,bb,123
It will show up in the input preview like so:
When the headers are present, you can use them in your queries.
SELECT
name,
age
INTO
target
FROM
[csv-input]
Related
I am trying to extract data from SAP using SAP CDC Connector in ADF. The source data looks something like this.
START_DT|PROD_NAME|END_DT
20201230165830.0|BBEESABX|20180710143703.0
When we perform a preview data on the source, we are getting data just like above. But while performing copy via copy activity, below failure is observed :-
Failure happened on 'Source' side. ErrorCode=SapParsingDataFailure,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed when parsing data, parsing value: 'ESABX 201807', expected data type 'Microsoft.DataTransfer.Common.Shared.ClrTypeCode'.Please check your origin data in SAP side,Source=Microsoft.DataTransfer.Runtime.SapRfcHelper,''Type=System.FormatException,Message=Input string was not in a correct format.,Source=mscorlib,'
I have tried several combination and changes on sink side such as changing parquet to csv, changing Copy behavior to all available options...but nothing seems to work.
Probably you have hiding fields in the SAP Extractor? (RSA6). Try this workaround, make a selection of all fields in the SAP CDC connector and run it again.
I created a table in my Azure Data Explorer with the following command:
.create table MyLogs ( Level:string, Timestamp:datetime, UserId:string, TraceId:string, Message:string, ProcessId:int32 )
I then created my Storage Account --> Container and i then uploaded a simple txt file with the following content
Level Timestamp UserId TraceId Message ProcessId
I then generated a SAS for the container holding that txt file and used in into the query section of my Azure Data Explorer like the following:
.ingest into table MyLogs (
h'...sas for my txt file ...')
Now, when i read the table i see something like this
Level TimeStamp UserId TraceID MEssage ProcessId
Level Timestamp UserId TraceId Message ProcessId
So it basically put all the content into the first column.
I was expecting some automatic splitting. I tried with tab, spaces, commas and many other separators.
I tried to configure an injection mapping with csv format but had no luck.
For what I understood, each new line in the txt is a new row in the table. But how to split the same line with some specific separator?
I read many pages of documentation but had no luck
You can specify any of the formats that you want to try using the format argument, see the list of formats and the ingestion command syntax example that specify the format here
In addition, you can use the "one click ingestion" from the web interface.
This should work (I have done it before with Python SDK)
.create table MyLogs ingestion csv mapping 'MyLogs_CSV_Mapping' ```
[
{"Name":"Level","datatype":"datetime","Ordinal":0},
{"Name":"Timestamp","datatype":"datetime","Ordinal":1},
{"Name":"UserId","datatype":"string","Ordinal":2},
{"Name":"TraceId","datatype":"string","Ordinal":3},
{"Name":"Message","datatype":"string","Ordinal":4},
{"Name":"ProcessId","datatype":"long","Ordinal":5}
]```
https://learn.microsoft.com/de-de/azure/data-explorer/kusto/management/data-ingestion/ingest-from-storage
.ingest into table MyLogs SourceDataLocator with (
format="csv",
ingestionMappingReference = "MyLogs_CSV_Mapping"
)
Hopefully this will help a bit :)
I have a copy activity in Azure Data Factory with a Google BigQuery source.
I need to import the whole table (which contains nested fields - Records in BigQuery).
Nested fields get imported as follows (a string containing only data values):
"{\"v\":{\"f\":[{\"v\":\"1\"},{\"v\":\"1\"},{\"v\":\"1\"},{\"v\":null},{\"v\":\"1\"},{\"v\":null},{\"v\":null},{\"v\":\"1\"},{\"v\":null},{\"v\":null},{\"v\":null},{\"v\":null},{\"v\":\"0\"}]}}"
Expected output would be something like:
{"nestedColName" : [{"subNestedColName": 1}, {"subNestedColName": 1}, {"subNestedColName": 1}, {"subNestedColName": null}, ...] }
I think this is a connector issue from Data Factory's side but am not sure how to proceed.
Have considered using Databricks to import data from GBQ directly and then saving the DataFrame to sink.
Have also considered querying for a subset of columns and using UNNEST where required but would rather not do this as Parquet handles both Array and Map types.
Anyone encountered this before / what did you do?
Solution used:
Databricks (Spark) connector for Google BigQuery:
https://docs.databricks.com/data/data-sources/google/bigquery.html
This preserves schemas and nested field names.
Preferring the simpler setup of ADF BigQuery connector to Databricks's BigQuery support, I opted for a solution where I extract the data in JSON and 'massage' it into Parquet using Databricks:
Use a Copy activity to get data from BigQuery with all the data packed into a single JSON string field. Output format can be Parquet or JSON (I'm using Parquet). Use a BigQuery query like this:
select TO_JSON_STRING(t) as value from `<your BigQuery table>` as t
NOTE: The name of the field must be value. The df.write.text() text file writer writes the contents of value column into each row of the text file, which is a JSON string in this case.
Run a Databrick notebook activity with code like this:
# Read data and write it out as text file to get the JSON. (Compression is optional).
dfInput=spark.read.parquet(inputpath)
dfInput.write.mode("overwrite").option("compression","gzip").text(tmppath)
# Read back as JSON to extract the correct schema.
dfTemp=spark.read.json(tmppath)
dfTemp.write.mode("overwrite").parquet(outputpath)
Use the output as is, or use a Copy activity to copy it to where you like.
It's seem ADF v2 does not support writing data to TEXT file (.TXT).
After select File System
But don't see TextFormat at the next screen
So do we any method to write data to TEXT file ?
Thanks,
Thai
Data Factory only support these 6 file formats:
Please see: Supported file formats and compression codecs in Azure Data Factory.
If we want to write data to a txt file, the only format we can using is Delimited text, when the pipeline finished, you will get a txt file.
Reference: Delimited text: Follow this article when you want to parse the delimited text files or write the data into delimited text format.
For example, I create a pipeline to copy data from Azure SQL to Blob, choose DelimitedText format as Sink dataset:
The txt file I get in Blob Storeage:
Hope this helps
I think what you are looking for is DelimitedText dataset. You can specify extension as part of the file name
I need help, can't see the issue of this problem:
I am provisioning an Azure Data Lake Store with a Stream Analytics jobs.
File are tab separated and the job is running without errors.
I deployed an Azure Data Lake Analytics service to aggregate data like this:
#input =
EXTRACT [applicationname] string,
[clientip] string,
[continent] string,
[country] string,
[province] string,
[city] string,
[latitude] string,
[longitude] string
FROM "adl://mydatalakesotre.azuredatalakestore.net/instrumentationoutput/mystore/2017-10-22/{*}"
USING Extractors.Text(delimiter: '\t', skipFirstNRows: 1, silent: true);
OUTPUT #input
TO "output/PowerBI_output.tsv"
USING Outputters.Tsv(outputHeader: true);
I can't find the way to make it working... I Have other 5 MB of input data, but the output got only the headers, as specify in the query... What am I missing.
Thanks for the help.
As Bob mentions in the comment, you are getting an empty result because you most likely have a misalignment in your schema definition and the actual files that you extract from.
I suggest that you open the file in the ADL Tools of Visual Studio and use the CREATE EXTRACT statement wizard to create you the EXTRACT statement. If you still get error messages (after removing silent:true), please update your question with the detected error message and we will give you an updated answer.