Streaming through .NET application in Azure

I have a .NET executable through which I want to stream data in Pig on my Azure HDInsight cluster. I've uploaded it to my container, but when I try to stream data through it, I get the following error:
<line 1, column 393> Failed to generate logical plan. Nested exception: java.io.IOException: Invalid ship specification: '/util/myStreamApp.exe' does not exist!
I define and use my action as follows:
DEFINE myApp `myStreamApp.exe` SHIP('/util/myStreamApp.exe');
outputData = STREAM inputData THROUGH myApp;
I tried with and without the leading /, tried qualifying it as wasb:///util/myStreamApp.exe, and tried fully qualifying it as wasb://myContainer@myAccount.blob.core.windows.net/util/myStreamApp.exe, but in every case I get the message that my file doesn't exist.
This page on uploading to HDInsight indicates that you can use the Azure Blob Storage path wasb:///example/data/davinci.txt in HDInsight as /example/data/davinci.txt, which suggests to me that the paths themselves shouldn't be the problem.

It turns out the problem was that I wasn't declaring a dependency on the caller's side. I've got a console app that creates the Pig job:
var job = new PigJobCreateParameters()
{
    Query = myPigQuery,
    StatusFolder = myStatusFolder
};
But I needed to add a dependency on my file to the job.Files collection:
job.Files.Add("wasbs://myContainer@myAccount.blob.core.windows.net/util/myStreamApp.exe");

Related

Azure Media Services -- Create Live Output and Streaming Locator with Python SDK

I am working on a project that uses the Azure Media Services Python SDK (v3). I have the following code which creates a live output and a streaming locator once the associated live event is running:
# Step 2: create a live output (used to reference the manifest file)
live_outputs = self.__media_services.live_outputs
config_data_live_output = LiveOutput(asset_name=live_output_name, archive_window_length=timedelta(minutes=30))
output = live_outputs.create(StreamHandlerAzureMS.RESOUCE_GROUP_NAME, StreamHandlerAzureMS.ACCOUNT_NAME, live_event_name, live_output_name, config_data_live_output)
# Step 3: get a streaming locator (the ID of the locator is used in the URL)
locators = self.__media_services.streaming_locators
config_data_streaming_locator = StreamingLocator(asset_name=locator_name)
locator = locators.create(StreamHandlerAzureMS.RESOUCE_GROUP_NAME, StreamHandlerAzureMS.ACCOUNT_NAME, locator_name, config_data_streaming_locator)
self.__media_services is an object of type AzureMediaServices. When I run the code above, I receive the following exception:
azure.mgmt.media.models._models_py3.ApiErrorException: (ResourceNotFound) Live Output asset was not found.
Question: Why is Azure Media Services throwing this error with an operation that creates a resource? How can I resolve this issue?
Note that I have managed to authenticate the SDK to Azure Media Services using a service principal and that I can successfully push video to the live event using ffmpeg.
I suggest that you take a quick look at the flow of a live event in this tutorial, which unfortunately is in .NET. We are still working on updating Python samples.
https://learn.microsoft.com/en-us/azure/media-services/latest/stream-live-tutorial-with-api
But it should help with the issue. The first problem I see is that you likely did not create the Asset for the Live Output to record into.
You can think of Live Outputs as "tape recorder" machines, and the Assets as the tapes. They are the locations in your storage account that the tape recorder writes to.
So after you have the Live Event running, you can have up to 3 of these "tape recorders" operating and writing to 3 different "tapes" (Assets) in storage.
1. Create an empty Asset.
2. Create a Live Output and point it at that Asset.
3. Get the Streaming Locator for that Asset, so you can watch the tape. Notice that you create the Streaming Locator on the Asset you created in step 1. Think of it as "I want to watch this tape", not "I want to watch this tape recorder".
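As a rough sketch of that order of operations with the Python SDK, continuing inside the same method as the code in the question (the client, class constants, live_event_name, live_output_name and locator_name come from there; the asset name is a hypothetical placeholder, and the exact operation signatures may vary by SDK version):
from datetime import timedelta
from azure.mgmt.media.models import Asset, LiveOutput, StreamingLocator

# Step 1: create an empty Asset (the "tape") - the name here is just a placeholder
asset_name = live_output_name + "-asset"
self.__media_services.assets.create_or_update(
    StreamHandlerAzureMS.RESOUCE_GROUP_NAME, StreamHandlerAzureMS.ACCOUNT_NAME,
    asset_name, Asset())

# Step 2: create the Live Output and point it at that Asset
output = self.__media_services.live_outputs.create(
    StreamHandlerAzureMS.RESOUCE_GROUP_NAME, StreamHandlerAzureMS.ACCOUNT_NAME,
    live_event_name, live_output_name,
    LiveOutput(asset_name=asset_name, archive_window_length=timedelta(minutes=30)))

# Step 3: create the Streaming Locator against the same Asset ("watch the tape")
locator = self.__media_services.streaming_locators.create(
    StreamHandlerAzureMS.RESOUCE_GROUP_NAME, StreamHandlerAzureMS.ACCOUNT_NAME,
    locator_name, StreamingLocator(asset_name=asset_name))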

Can't access files via Local file API on Databricks

I'm trying to access a small text file stored directly on DBFS using the local file API.
I'm getting the following error.
No such file or directory
My code:
import scala.io.Source

val filename = "/dbfs/test/test.txt"
for (line <- Source.fromFile(filename).getLines()) {
  println(line)
}
At the same time, I can access this file without any problems using dbutils, or load it into an RDD via the Spark context.
I've tried specifying the path starting with dbfs:/ or /dbfs/, or with just the test folder name, in both Scala and Python, and I get the same error each time. I'm running the code from a notebook. Is this a problem with the cluster configuration?
Check whether your cluster has credential passthrough enabled. If so, the local file API is not available.
https://docs.azuredatabricks.net/data/databricks-file-system.html#local-file-apis
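If that is the case, a workaround is to read the file through the DBFS APIs instead of the local filesystem. A minimal sketch for a Python notebook cell, using the path from the question (dbutils and spark are the objects the Databricks notebook runtime provides):
# Read the file through DBFS rather than the local file API
print(dbutils.fs.head("dbfs:/test/test.txt"))  # quick look at the raw contents

# Or load it as a DataFrame and iterate over the lines
df = spark.read.text("dbfs:/test/test.txt")
for row in df.collect():
    print(row.value)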

Reading data from Azure Blob Storage into Azure Databricks using /mnt/

I've successfully mounted my Blob Storage container to Databricks and can see the defined mount point when running dbutils.fs.ls("/mnt/"). It has size=0; it's not clear whether this is expected or not.
When I try and run dbutils.fs.ls("/mnt/<mount-name>"), I get this error:
java.io.FileNotFoundException: / is not found
When I try and write a simple file to my mounted blob with dbutils.fs.put("/mnt/<mount-name>/1.txt", "Hello, World!", True), I get the following error (shortened for readability):
ExecutionError: An error occurred while calling z:com.databricks.backend.daemon.dbutils.FSUtils.put. : shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
...
Caused by: com.microsoft.azure.storage.StorageException: The specified resource does not exist.
All the data is in the root of the Blob container, so I have not defined any folder structures in the dbutils.fs.mount code.
The solution here is to make sure you are using the 'correct' part of your Shared Access Signature (SAS). When the SAS is generated, you'll find there are several different parts of it that you can use - it's likely sent to you as one long connection string, e.g.:
BlobEndpoint=https://<storage-account>.blob.core.windows.net/;QueueEndpoint=https://<storage-account>.queue.core.windows.net/;FileEndpoint=https://<storage-account>.file.core.windows.net/;TableEndpoint=https://<storage-account>.table.core.windows.net/;SharedAccessSignature=sv=<date>&ss=nwrt&srt=sco&sp=rsdgrtp&se=<datetime>&st=<datetime>&spr=https&sig=<long-string>
When you define your mount point, use the value of the SharedAccessSignature key, e.g.:
sv=<date>&ss=nwrt&srt=sco&sp=rsdgrtp&se=<datetime>&st=<datetime>&spr=https&sig=<long-string>
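For reference, a minimal sketch of the mount call that uses only that SAS value (the container, storage account and mount name placeholders are hypothetical and need to be filled in):
# Mount the container using just the SharedAccessSignature value, not the full connection string
dbutils.fs.mount(
    source = "wasbs://<container>@<storage-account>.blob.core.windows.net",
    mount_point = "/mnt/<mount-name>",
    extra_configs = {
        "fs.azure.sas.<container>.<storage-account>.blob.core.windows.net":
            "sv=<date>&ss=nwrt&srt=sco&sp=rsdgrtp&se=<datetime>&st=<datetime>&spr=https&sig=<long-string>"
    })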

SqlDataProvider connection string in Suave on Azure

I can't get SqlDataProvider to work when executed in an .fsx script running in an Azure Web Site.
I have started from the samples that Tomas Petricek has here: https://github.com/tpetricek/Dojo-Suave-FsHome.
In short, it is an .fsx script that is executed using the IIS HttpPlatformHandler so that all HTTP requests to my Azure Web Site are forwarded to my F# script.
The F# script uses Suave to handle the requests.
When I tried adding some database access to my HTTP handlers I got into problems.
The problematic code looks like this:
[<Literal>]
let connStr = "Server=(localdb)\\v11.0;Initial Catalog=My_Database;Integrated Security=true;"
[<Literal>]
let resolutionFolder = __SOURCE_DIRECTORY__
FSharp.Data.Sql.Common.QueryEvents.SqlQueryEvent |> Event.add (printfn "Executing SQL: %s")
// the following line fails when executing in azure
type db = SqlDataProvider<connStr, Common.DatabaseProviderTypes.MSSQLSERVER, ResolutionPath = resolutionFolder>
let saveData someDataToSave =
    let ctx = db.GetDataContext(Environment.GetEnvironmentVariable("SQLAZURECONNSTR_QUERIES"))
    .....
    /// code using the context here
This works just fine when I run it locally, but when I deploy it to the Azure site it fails at the line where the type db is created.
The error message is (line 70 is the line that has the type db = ...):
D:\home\site\wwwroot\app.fsx(70,11): error FS3033: The type provider 'FSharp.Data.Sql.SqlTypeProvider' reported an error: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: SQL Network Interfaces, error: 52 - Unable to locate a Local Database Runtime installation. Verify that SQL Server Express is properly installed and that the Local Database Runtime feature is enabled.)
The design-time database in the connStr is not available in the Azure site, but I thought that is why we have the GetDataContext overload that takes a connection string to be used at run time?
Is it because it is running as a script and not as compiled code that it is trying to access the database when creating the type provider?
If yes, does it mean that my only option is to compile and provide the database code as a compiled assembly that I load and use in my Suave FSX script?
Reading the connection string from a config file does not work very well as this is an Azure site. I really need to get the connection string from an environment variable (which is set in the Azure management interface).
Hmm, this is a bit unfortunate - as @Fyodor mentioned in the comments, the problem is that the script-based deployment to Azure actually compiles the script on the Azure machine, and so you need a statically-resolved connection string that works on Azure.
There are three options:
1. Use a compiled project instead. If you compile your F# code locally and deploy the compiled code to Azure, it will work. Sadly, there are no good samples for that.
2. Do some clever trick to make the connection string accessible to the script at compile time.
3. Send a PR to the SQL provider so that you can give it the name of an environment variable and it reads the connection string from there.
I think (3) would actually be quite a nice and useful feature.
I'm not entirely sure what the best way to do (2) would be, but I think you might be able to modify app.azure.fsx so that it creates a file (say connection.fsx) that contains something like:
module Connection
let [<Literal>] ConnString = "<Contents of SQLAZURECONNSTR_QUERIES>"
Then app.fsx could load this script and use Connection.ConnString as the argument of the SQL type provider.

How to configure Quartz.net to use an Azure SQL database to store ADOJobStore details

I am using quartz.net as a scheduler in a Microsoft Azure Web Role. I can get Quartz.net to work just fine if I use the RamDataStore. However, I want to break this into two components: the first will allow scheduling of jobs through a web interface and the second will execute the jobs through a worker role. To have this distributed processing, I will need to use an ADOJobStore.
Everything works fine with the RamDataStore but it breaks when I try to switch over to the ADOJobStore. So this leads me to believe that there is something in my properties that I'm missing. I am using Azure SQL database and while this is similar to SQL Server, there are some gotchas that sometimes cause problems.
I am using Quartz.NET 2.0 (from NuGet) in VS2010; the database is Azure SQL.
When I call .GetScheduler(), I get the following exception:
{"JobStore type 'Quartz.Impl.AdoJobStore.JobStoreTX' props could not
be configured."}
with the details:
{"Could not parse property 'default.connectionString' into correct
data type: No writable property 'Default.connectionString' found"}
My connection code (including programmatically set properties):
NameValueCollection properties = new NameValueCollection();
properties["quartz.scheduler.instanceName"] = "SchedulingServer";
properties["quartz.threadPool.type"] = "Quartz.Simpl.ZeroSizeThreadPool, Quartz";
properties["quartz.jobStore.type"] = "Quartz.Impl.AdoJobStore.JobStoreTX, Quartz";
properties["quartz.jobStore.tablePrefix"] = "QRTZ_";
properties["quartz.jobStore.clustered"] = "false";
properties["quartz.jobStore.driverDelegateType"] = "Quartz.Impl.AdoJobStore.SqlServerDelegate, Quartz";
properties["quartz.jobStore.dataSource"] = "default";
properties["quartz.jobStore.default.connectionString"] = "Server=tcp:serverName.database.windows.net;Database=scheduler;User ID=scheduler#serverName;Password=***;Trusted_Connection=False;Encrypt=True;";
properties["quartz.jobStore.default.provider"] = "SqlServer-20";
properties["quartz.jobStore.useProperties"] = "true";
ISchedulerFactory sf = new StdSchedulerFactory(properties);
_scheduler = sf.GetScheduler();
Any help or suggestions would be appreciated.
You have a small but subtle error in your data source property naming; it should read:
properties["quartz.dataSource.default.connectionString"] = "Server=tcp:serverName.database.windows.net;Database=scheduler;User ID=scheduler@serverName;Password=***;Trusted_Connection=False;Encrypt=True;";
There is also a connectionStringName property if you want to use the connection strings section of the configuration file.
