Read Shapefiles in Azure Data Factory - azure

I want to find out whether it's possible to read shapefiles in Azure Data Factory.
An example would be to store the file in a container and then manipulate the shapefile's data with a data flow.
Please let me know.
Thank you.
I've tried loading the file into a container and reading it as an Azure Blob, but that doesn't seem to be the correct way to read a shapefile on this platform.
I'm not sure if there is support for shapefiles in ADF...
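For what it's worth: as far as I can tell, shapefile isn't one of the native dataset formats that Data Factory's copy activity or data flows understand, so the usual workaround is to read the file in code (for example from an Azure Function, an Azure Batch custom activity, or a Databricks/Synapse notebook) and write out a format ADF does support. Below is a minimal sketch of that read step only, assuming geopandas is available and the .shp/.shx/.dbf/.prj sidecar files all sit in the container; the container name, blob names and connection string are placeholders.
# Sketch: download the shapefile parts from Blob Storage and read them with geopandas.
# Container name, blob names and connection string below are placeholders.
import os
import tempfile
import geopandas as gpd
from azure.storage.blob import ContainerClient
MY_CONNECTION_STRING = "<storage account connection string>"
container = ContainerClient.from_connection_string(MY_CONNECTION_STRING, "shapefiles")
# A shapefile is really a bundle of files that must sit side by side on disk.
local_dir = tempfile.mkdtemp()
for ext in (".shp", ".shx", ".dbf", ".prj"):
    name = "roads" + ext  # hypothetical blob name
    with open(os.path.join(local_dir, name), "wb") as f:
        f.write(container.download_blob(name).readall())
gdf = gpd.read_file(os.path.join(local_dir, "roads.shp"))
print(gdf.head())
From there the attribute table could be written back to the container as CSV or Parquet and handed to a data flow.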

Related

Using Pandas to Write to a File within a Samba Share

I am using a GCP Cloud Function to read from a BigQuery table and output the results to a CSV file located on a network drive (all the infrastructure parts necessary to communicate with on-prem are in place). I was wondering whether there is a way to write data out to this location using pandas and pysmb.
I have done a fair bit of reading on the topic and couldn't find a way, but thought someone with more experience may have an idea.
Thank you very much for your help.
Regards,
Scott
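(One possible approach, sketched below under the assumption that pysmb is what you end up using and that port 445 on the file server is reachable from the function: render the CSV into an in-memory buffer with pandas and hand it to SMBConnection.storeFile. The server, share, credentials and paths are all placeholders.)
from io import BytesIO, StringIO
import pandas as pd
from smb.SMBConnection import SMBConnection
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})  # stand-in for the BigQuery results
# Render the CSV into memory instead of onto local disk.
text_buffer = StringIO()
df.to_csv(text_buffer, index=False)
payload = BytesIO(text_buffer.getvalue().encode("utf-8"))
# Open a direct-TCP SMB connection (port 445) and store the file on the share.
conn = SMBConnection("svc_user", "svc_password", "cloud-function", "FILESERVER01",
                     use_ntlm_v2=True, is_direct_tcp=True)
conn.connect("fileserver01.example.com", 445)
conn.storeFile("shared_drive", "/reports/output.csv", payload)
conn.close()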

MS Power Platform: Use data inside an Azure data lake via a Flow

Edit: Still nowhere closer to an answer. Help appreciated...
My company has some simple lists with data I need and we already have these lists in the .parquet format inside a data lake. (Edit: Gen2)
I'm supposed to build a small PowerApp that uses some of the information inside these lists but I can't figure out the correct way to get the content of them via a Flow.
There's a connector "Azure Blob Storage: Get Blob Content" which sounds like the right one and indeed outputs a cryptic content string. But how do I get from this to an actually readable table where I can use the items? Or is this the wrong connector for this?
(Very new to all this Microsoft stuff. Don't really know anything about how this data lake is set up etc. Not sure whether this helps but basically the following Python script works and is exactly what I need to do via a Flow so it can be done automatically daily:)
from io import BytesIO
import pandas as pd
from azure.storage.blob import BlobClient
# MY_CONNECTION_STRING is the storage account connection string.
blob = BlobClient.from_connection_string(MY_CONNECTION_STRING, "myContainer", "myFile.parquet")
df = pd.read_parquet(BytesIO(blob.download_blob().readall()))
Thanks for any help :)
To clarify: by no means do I have to use this exact process. If you tell me "the standard way is to build a Python REST API on top of the data lake that answers this", that's perfectly fine. I just need to know the easiest, most standard way to access data inside a data lake.
If the Azure Data Lake is Gen1, then you don't need Power Automate to access it. You can go directly from a canvas app PowerApp using the Azure Data Lake connector.
https://learn.microsoft.com/en-us/connectors/azuredatalake/
You can call the ReadFile operation to read the contents of your file. This returns binary content, which you then convert to a string and work with from there.
Since your data is in ADLS Gen2, I don't think you've got a direct connector that can transform Parquet data in Power Automate for this.
I would not bother ingesting and transforming parquet files in Power Automate, as you're bordering on a requirement for an ETL tool at this stage.
I would look at transforming the file(s) using Azure Data Factory pipelines and maybe dumping them in a database of your choice. Power Automate can then pick the data up from there, as it has connectors to most databases. You could also convert them to CSV files if a database is overkill for your setup.
I would use Power Automate as the orchestration layer: it calls the Data Factory pipeline > waits for it to complete > picks the data up from there.
https://learn.microsoft.com/en-us/azure/data-factory/format-parquet
https://learn.microsoft.com/en-au/connectors/azuredatafactory/#actions
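(If a full pipeline turns out to be overkill, here is a minimal sketch of the CSV-conversion idea in Python, building directly on the snippet above: read the parquet blob and write a CSV blob back to the same container, which the standard Get Blob Content connector can then parse. The connection string, container and file names are the same placeholders as above.)
from io import BytesIO
import pandas as pd
from azure.storage.blob import BlobClient
# Read the parquet blob into a DataFrame (same as the snippet above).
source = BlobClient.from_connection_string(MY_CONNECTION_STRING, "myContainer", "myFile.parquet")
df = pd.read_parquet(BytesIO(source.download_blob().readall()))
# Write it back to the container as CSV, which Power Automate can work with.
target = BlobClient.from_connection_string(MY_CONNECTION_STRING, "myContainer", "myFile.csv")
target.upload_blob(df.to_csv(index=False), overwrite=True)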

Kafka to Azure to Power BI

Expected functionality: send data from Kafka to Azure ADLS and to Power BI for visualization.
I am currently sending hard-coded text from my Kafka program (Java) to Azure Event Hubs. When Event Hub Capture writes the content to ADLS Gen1, an .avro file is created in ADLS. Since .avro files aren't supported in Power BI, I'm unable to proceed.
I need a solution to send the hard-coded text as a .txt or .json file instead of .avro so that it will be easy for me to visualize in Power BI. Otherwise, I need a solution to convert the .avro file to .txt or .json and save it in ADLS.
So far I don't have comment privileges, so I will ask it here.
Could you directly create a Power BI streaming (push) dataset with parameters, and then use curl commands from the Kafka side to call its REST API and push the data?
I tried pushing data that way and was able to do it, but since I haven't worked with Kafka, I can't say for sure.
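(On the second option, converting the captured .avro file to .json is straightforward in Python; a minimal sketch below, assuming the fastavro package and a capture file that has already been downloaded locally. File names are placeholders; Event Hub Capture stores the original event bytes in each record's Body field.)
import json
from fastavro import reader
# Read the Event Hub Capture file and keep only the event payloads.
events = []
with open("capture.avro", "rb") as avro_file:  # placeholder file name
    for record in reader(avro_file):
        # Capture wraps each event; the original text lives in the Body field.
        events.append(record["Body"].decode("utf-8"))
# Write the payloads out as JSON, which Power BI can consume.
with open("capture.json", "w") as json_file:
    json.dump(events, json_file, indent=2)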

Azure IoT + Stream Analytics with blob data

We are currently evaluating whether or not we should port our business logic to Azure IoT Hub.
So far this looks promising, but I have a question about Stream Analytics.
Let's say we have IoT devices in the field that send their data as CSV files.
Currently our back end has huge problems going through this data, analysing it and injecting it into our database systems with decent performance.
I want to try to use Azure for that.
If I use IoT Hub, I want to send this CSV format to the hub. We assume that the CSV format is fixed, so I can't just port to the D2C communication format.
Can the Stream Analytics service work with this CSV format, and can it put the embedded data into specific tables in Table Storage?
This would be really important. Are there any examples of that out there that might clear things up for me?
I guess Azure has its own libraries for handling CSV files. What if we use not CSV but another industry-standard format that Azure might not know about?
Hope you can help me here.
Azure Stream Analytics (ASA) does support CSV as input:
Event serialization format: The serialization format (JSON, CSV, or Avro) of the incoming data stream.
And yes, it also supports Azure Table Storage as output. See the docs.
When you create an ASA job you can upload a CSV file to test the query, so you can easily try it out if you create a sample file.
They have some example CSV data on GitHub.
I suggest you create a small proof of concept based on your sample data.
If, for some reason (for example, the data is in an unsupported format), ASA does not fit, you can always retrieve the IoT Hub data using different techniques, for example an EventProcessorHost. That way you have complete control over the data and can output it wherever you want, and it will still be scalable (although of course this also depends on the data destination). See this post for a rough idea; it seems a bit outdated, but the concept is still valid today.
The official docs about other possible options to read data from the Event Hub can be found here.
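(For reference, a minimal Python sketch of that "read the stream yourself" route, using the azure-eventhub package's EventHubConsumerClient, roughly the Python counterpart of EventProcessorHost, pointed at the IoT Hub's built-in Event Hub-compatible endpoint. The connection string and event hub name are placeholders, and the Table Storage write is only hinted at.)
from azure.eventhub import EventHubConsumerClient
# Built-in Event Hub-compatible endpoint of the IoT Hub (placeholder values).
CONNECTION_STR = "Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=..."
EVENTHUB_NAME = "my-iot-hub"
def on_event(partition_context, event):
    # Each event body is one CSV payload sent by a device.
    csv_payload = event.body_as_str()
    for line in csv_payload.splitlines():
        fields = line.split(",")
        # ... map the fields to an entity and write it to Table Storage here ...
    partition_context.update_checkpoint(event)
client = EventHubConsumerClient.from_connection_string(
    CONNECTION_STR, consumer_group="$Default", eventhub_name=EVENTHUB_NAME
)
with client:
    client.receive(on_event=on_event, starting_position="-1")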

Azure CosmosDB - Download all documents in collection to local directory

I am trying to download all the documents in my Cosmos DB collection to a local directory. I want to modify a few things in all of the JSON documents using Python, then upload them to another Azure account. What is the simplest, fastest way to download all of the documents in my collection? Should I use the Cosmos DB emulator? I've been told to check out Azure Data Factory; would that help with downloading the files locally? I've also been referred to Cosmos DB's Data Migration Tool, and I saw that it facilitates importing data into Cosmos DB, but I can't find much on exporting. I have about 6 GB of JSON documents in my collection.
Thanks.
In the past I've used the DocumentDB (Cosmos DB) Data Migration Tool, which is available for download from Microsoft.
When running the app you need to specify the source and target.
Make sure that you choose to import from DocumentDB and specify the connection string and collection you want to export from. If you want to dump the entire contents of your collection, the query would just be
SELECT * FROM c
Then under the Target Information you can choose a JSON file which will be saved to your local hard drive. You're free to modify the contents of that file in any way and then use it as Source Information later when you're ready to import it back to another collection.
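(Since you're planning to do the modifications in Python anyway, another option is to do the export directly with the azure-cosmos SDK; a minimal sketch below, using the same SELECT * FROM c query. The account endpoint, key, database and container names are placeholders.)
import json
from azure.cosmos import CosmosClient
# Placeholder account details for the source account.
client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<primary key>")
container = client.get_database_client("mydatabase").get_container_client("mycollection")
# Pull every document in the collection and dump them to a local JSON file.
docs = list(container.query_items(query="SELECT * FROM c", enable_cross_partition_query=True))
with open("export.json", "w") as f:
    json.dump(docs, f, indent=2)
After editing the documents, the same SDK pointed at the target account can upsert_item them back in.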
I used the migration tool and found that it is great if you have a reasonably sized database, as it uses processing and bandwidth for a considerable period. I had to chunk a 10 GB database and that took too long, so I ended up using Data Lake Analytics to transfer the data via script to SQL Server and Blob Storage. It gives you a lot of flexibility to transform the data and store it either in Data Lake or other distributed systems. It also helps if you are using Cosmos for staging and need to run the data through any cleaning algorithms.
The other advantages are that you can set up batching and you get a lot of processing stats to determine how to optimize large data transformations. Hope this helps. Cheers.
