AzCopy interprets source as local and adds current path when it is a Google Cloud Storage HTTPS URL - Azure

We want to copy files from Google Storage to Azure Storage.
We followed this guide: https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-google-cloud
We run this command:
azcopy copy 'https://storage.googleapis.com/telia-ddi-delivery-plaace/activity_daily_al1_20min/' 'https://plaacedatalakegen2.blob.core.windows.net/teliamovement?<SASKEY>' --recursive=true
And get this resulting error:
INFO: Scanning...
INFO: Any empty folders will not be processed, because source and/or destination doesn't have full folder support
failed to perform copy command due to error: cannot start job due to error: cannot scan the path /Users/peder/Downloads/https:/storage.googleapis.com/telia-ddi-delivery-plaace/activity_daily_al1_20min, please verify that it is a valid.
It seems to us that azcopy interprets the source as a local path and therefore prepends the directory we run it from, which is /Users/peder/Downloads/. But we are unable to find any argument to indicate that it is a web location, and our command is identical to the one in this guide:
azcopy copy 'https://storage.cloud.google.com/mybucket/mydirectory' 'https://mystorageaccount.blob.core.windows.net/mycontainer/mydirectory' --recursive=true
What we have tried:
We are doing this on a Mac in Terminal, but we also tested PowerShell for Mac.
We have tried single and double quotes.
We copied the Azure Storage URL with the SAS key from the console to ensure that it has the correct syntax.
We tried cp instead of copy as the help page for azcopy used that.
Is there anything wrong with our command? Or can it be that azcopy has been changed since the guide was written?
I also created an issue for this on the Azure Documentation git page: https://github.com/MicrosoftDocs/azure-docs/issues/78890

The reason you're running into this issue is that the host storage.cloud.google.com is hardcoded in the application source code as the Google Cloud Storage endpoint. From the azcopy source:
const gcpHostPattern = "^storage.cloud.google.com"
const invalidGCPURLErrorMessage = "Invalid GCP URL"
const gcpEssentialHostPart = "google.com"
Since you're using storage.googleapis.com instead of storage.cloud.google.com, azcopy does not recognize the source as a valid Google Cloud Storage endpoint and treats the value as a path in your local file system.
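If that is the case, the implied fix is to keep the same bucket path but switch the host to the one azcopy recognizes, for example:
azcopy copy 'https://storage.cloud.google.com/telia-ddi-delivery-plaace/activity_daily_al1_20min/' 'https://plaacedatalakegen2.blob.core.windows.net/teliamovement?<SASKEY>' --recursive=true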

Related

Azure Blob Using Python

I am accessing a website that allows me to download a CSV file. I would like to store the CSV file directly in the blob container. I know that one way is to download the file locally and then upload it, but I would like to skip the step of downloading the file locally. Is there a way I could achieve this?
I tried the following:
block_blob_service.create_blob_from_path('containername','blobname','https://*****.blob.core.windows.net/containername/FlightStats',content_settings=ContentSettings(content_type='application/CSV'))
but I keep getting errors stating path is not found.
Any help is appreciated. Thanks!
The file_path parameter of create_blob_from_path is the path of a local file, something like "C:\xxx\xxx". The value you passed ('https://*****.blob.core.windows.net/containername/FlightStats') is a blob URL, not a local path.
You could download your file to a byte array or a stream, then use the create_blob_from_bytes or create_blob_from_stream method.
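For example, a minimal sketch using requests and the legacy BlockBlobService (the source URL, container, blob name, and credentials below are placeholders):
import requests
from azure.storage.blob import BlockBlobService, ContentSettings

block_blob_service = BlockBlobService(account_name='<account-name>', account_key='<account-key>')
# fetch the CSV into memory instead of saving it to a local file first
response = requests.get('https://example.com/flightstats.csv')
response.raise_for_status()
# upload the in-memory bytes straight to the container
block_blob_service.create_blob_from_bytes(
    'containername',
    'FlightStats.csv',
    response.content,
    content_settings=ContentSettings(content_type='text/csv'))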
The other answer uses the so-called legacy Azure SDK for Python.
If it's a fresh implementation, I recommend using a Gen2 storage account (instead of Gen1 or plain Blob storage).
For Gen2 storage account, see example here:
from azure.storage.filedatalake import DataLakeFileClient
data = b"abc"
file = DataLakeFileClient.from_connection_string(
    "my_connection_string", file_system_name="myfilesystem", file_path="myfile")
file.create_file()  # the file must exist before appending, so create it first
file.append_data(data, offset=0, length=len(data))
file.flush_data(len(data))
It's painful: if you're appending multiple times, you'll have to keep track of the offset on the client side.
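For example, a rough sketch of appending several chunks while tracking the offset yourself (reusing the placeholder file client from above):
chunks = [b"abc", b"def", b"ghi"]
offset = 0
for chunk in chunks:
    # each append must state where in the file this chunk starts
    file.append_data(chunk, offset=offset, length=len(chunk))
    offset += len(chunk)
# one flush commits everything written so far
file.flush_data(offset)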

Possible to edit web.config of cloud app deployed on Windows Azure without redeploying app?

I would like to add URL rewrite code to an Azure web app's web.config without redeploying the whole app again. For this I am using the App Service Editor and the Kudu debug console to edit web.config. At first I can't save the file and get an error.
After some searching I found that under App Settings the key's value should be 0 instead of 1.
I changed the value from 1 to 0 and saved the app setting; after that I was able to edit the config file. To test the code again I changed the value back from 0 to 1 and saved the setting, but when I refresh the file that is open in the editor or Kudu, the pasted code has disappeared. The site is connected to an automatic Azure deployment pipeline.
How can I edit the web.config file without redeploying the app again?
Yes, it's possible to make changes without redeploying the app.
Some details:
Check the Run From Package document and we can find:
1. The zip package won't be extracted to D:\home\site\wwwroot; instead it is uploaded directly to D:\home\data\SitePackages.
2. A packagename.txt file, which contains the name of the ZIP package to load at runtime, is created in the same directory.
3. App Service mounts the uploaded package as the read-only wwwroot directory and runs the app directly from that mounted directory. (That's why we can't edit the read-only wwwroot directory directly.)
So my workaround is:
1. Navigate to D:\home\data\SitePackages via the Kudu debug console.
Download the zip (in my case it's 20200929072235.zip) that represents your deployed app, extract it, and make your changes to the web.config file.
2. Zip those files (select the files themselves and right-click...) into childtest.zip. Please follow my steps carefully here, the folder structure for Run From Package is a bit strange! (If you prefer to script the zipping, see the sketch after these steps.)
3. Then zip childtest.zip into parenttest.zip (when uploading a zip, Kudu always automatically extracts it, so we have to wrap childtest.zip in parenttest.zip first).
4. Drag and drop the local parenttest.zip into the online SitePackages folder in the Kudu debug console; after extraction we get childtest.zip there.
5. Modify packagename.txt: change its content from 20200929072235.zip to childtest.zip and save.
Done~
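If you would rather script steps 2 and 3 locally, here is a rough sketch using Python's zipfile module (the folder name extracted_app is a placeholder for wherever you extracted the package; the zip names match this answer):
import os
import zipfile

app_dir = 'extracted_app'  # placeholder: folder containing the edited web.config and the rest of the app files
# step 2: zip the app files themselves (no extra top-level folder) into childtest.zip
with zipfile.ZipFile('childtest.zip', 'w', zipfile.ZIP_DEFLATED) as child:
    for root, _, files in os.walk(app_dir):
        for name in files:
            path = os.path.join(root, name)
            child.write(path, arcname=os.path.relpath(path, app_dir))
# step 3: wrap childtest.zip inside parenttest.zip, because Kudu auto-extracts whatever you drop,
# which leaves childtest.zip behind in SitePackages
with zipfile.ZipFile('parenttest.zip', 'w', zipfile.ZIP_DEFLATED) as parent:
    parent.write('childtest.zip')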
Check and test:
Now let's open App Service Editor to check the changes:
In addition: though this answers the original question, I recommend using other deployment methods (Web Deploy...) instead. It could be much easier~

Location for SSH private key and temporary SFTP download data in Azure functions

I am writing an Azure Function that uses the WinSCP library to download files using SFTP and upload the files to blob storage. This library doesn't allow getting files as a stream; the only option is to download them locally. My code also uses a private key file. So I have two questions.
sessionOptions.SshPrivateKeyPath = Path.GetFullPath("privateKey2.ppk");
is working locally. I have added this file to the solution with the "copy to output" option and it works. But will it work when the Azure Function is deployed?
While getting the files I need to specify a local path where the files will be downloaded.
var transferResult = session.GetFiles(
    file.FullName, Path.GetTempPath() + @"SomeFolder\" + file.Name, false,
    transferOptions);
The second parameter is the local path.
What should I use in place of Path.GetTempPath() that will work when the Azure Function is deployed?
For the private key, just deploy it along with your function project. You can simply add it to your VS project.
See also Including a file when I publish my Azure function in Visual Studio.
For the download: The latest version of WinSCP already supports streaming the files. Use the Session.GetFile method.
To answer your question about the temporary location, see:
Azure Functions Temp storage.
Where to store files for Azure function?

How to download via URL from DBFS in Azure Databricks

As documented here, it's mentioned that I am supposed to download a file from the Databricks File System (DBFS) from a URL like:
https://<your-region>.azuredatabricks.net?o=######/files/my-stuff/my-file.txt
But when I try to download it from the URL with my own "o=" parameter similar to this:
https://westeurope.azuredatabricks.net/?o=1234567890123456/files/my-stuff/my-file.txt
it only gives the following error:
HTTP ERROR: 500
Problem accessing /. Reason:
java.lang.NumberFormatException: For input string:
"1234567890123456/files/my-stuff/my-file.txt"
Am I using the wrong URL or is the documentation wrong?
I already found a similar question that was answered, but that one does not seem to fit the Azure Databricks documentation and might be for AWS Databricks:
Databricks: Download a dbfs:/FileStore File to my Local Machine?
Thanks in advance for your help
The URL should be:
https://westeurope.azuredatabricks.net/files/my-stuff/my-file.txt?o=1234567890123456
Note that the file must be in the FileStore folder.
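If your file lives elsewhere in DBFS, a minimal sketch (run inside a Databricks notebook; the paths are placeholders) to copy it into FileStore first:
dbutils.fs.cp("dbfs:/my-stuff/my-file.txt", "dbfs:/FileStore/my-stuff/my-file.txt")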
As a side note, I've been working on something called DBFS Explorer to help with tasks like this, if you would like to give it a try:
https://datathirst.net/projects/dbfs-explorer/

Google Cloud Natural Language Example

I have followed the getting started page closely.
https://cloud.google.com/natural-language/docs/reference/libraries#client-libraries-install-php
The example code has the following: $projectId = 'YOUR_PROJECT_ID';
I fill in my project ID taken from the JSON file and the Google console, e.g. "$projectID = 'myproject-197218'", and I always get a fatal error with "Permission Denied."
I have set the environment variable, run Composer to install the library, and created the Google JSON credentials file. I am running the example in PHP code.
I am running the code on my local server (xampp).
I figured out my problem. The Google Cloud JSON file was stored on my D: drive, so in the environment variable I referenced it as 'GOOGLE_APPLICATION_CREDENTIALS=d:\xampp\htdocs\googapi\mproj.json', and it did not work. When I moved it to the root of the C: drive and referenced it there (GOOGLE_APPLICATION_CREDENTIALS=c:proj.json), it worked fine.
Are you sure that your project ID is that one? I'm working in Google Cloud and I cannot see this project ID in our database, but if I type "my-project-197218", with a "-" between "my" and "project", I am able to find one project. To make sure this is your correct project ID, run this command in your Google Cloud Shell to get the default project ID:
gcloud config list --format 'value(core.project)' 2>/dev/null
