Audit Logs Retrieval from Azure Data Lake Storage (Gen 2)

I am trying to retrieve audit logs from Azure Data Lake Storage (Gen 2).
So far I have tried AzCopy and the REST API (unsupported for now in Gen 2) to retrieve the audit logs, and I am looking for an alternative way to get them.
AzCopy itself uses API-based calls under the hood, and when I tried to retrieve the logs I got an error saying that API calls are not supported for hierarchical namespace accounts. Image added for reference.
Snapshot of the AzCopy error
Is there any workaround for this use case, or any other approach I can try to retrieve the logs?

Update:
I can get the file content from ADLS Gen2 with the Read API. Here is an example written in Python (you can port it to any other language). The code below fetches the file content directly, and it also prints the Authorization header value, which you can reuse in Postman.
Python 3.7 code:
import requests
import datetime
import hmac
import hashlib
import base64

storage_account_name = 'xxx'
storage_account_key = 'xxx'
api_version = '2018-11-09'
request_time = datetime.datetime.utcnow().strftime('%a, %d %b %Y %H:%M:%S GMT')

# the file path on ADLS Gen2
FILE_SYSTEM_NAME = 'dd1/myfile.txt'

string_params = {
    'verb': 'GET',
    'Content-Encoding': '',
    'Content-Language': '',
    'Content-Length': '',
    'Content-MD5': '',
    'Content-Type': '',
    'Date': '',
    'If-Modified-Since': '',
    'If-Match': '',
    'If-None-Match': '',
    'If-Unmodified-Since': '',
    'Range': '',
    'CanonicalizedHeaders': 'x-ms-date:' + request_time + '\nx-ms-version:' + api_version,
    'CanonicalizedResource': '/' + storage_account_name + '/' + FILE_SYSTEM_NAME
}

string_to_sign = (string_params['verb'] + '\n'
                  + string_params['Content-Encoding'] + '\n'
                  + string_params['Content-Language'] + '\n'
                  + string_params['Content-Length'] + '\n'
                  + string_params['Content-MD5'] + '\n'
                  + string_params['Content-Type'] + '\n'
                  + string_params['Date'] + '\n'
                  + string_params['If-Modified-Since'] + '\n'
                  + string_params['If-Match'] + '\n'
                  + string_params['If-None-Match'] + '\n'
                  + string_params['If-Unmodified-Since'] + '\n'
                  + string_params['Range'] + '\n'
                  + string_params['CanonicalizedHeaders'] + '\n'
                  + string_params['CanonicalizedResource'])

signed_string = base64.b64encode(hmac.new(base64.b64decode(storage_account_key), msg=string_to_sign.encode('utf-8'), digestmod=hashlib.sha256).digest()).decode()

# print out the datetime
print(request_time)
# print out the Authorization
print('SharedKey ' + storage_account_name + ':' + signed_string)

headers = {
    'x-ms-date': request_time,
    'x-ms-version': api_version,
    'Authorization': ('SharedKey ' + storage_account_name + ':' + signed_string)
}

url = ('https://' + storage_account_name + '.dfs.core.windows.net/' + FILE_SYSTEM_NAME)
# print out the url
print(url)

r = requests.get(url, headers=headers)
# print out the file content
print(r.text)
After running the code, the file content is fetched.
You can also reuse the generated values (the Authorization header and the date) printed by the code above in Postman.
As the SDK is not yet ready for Azure Data Lake Storage Gen2, the solution for now is to use the ADLS Gen2 Read API.
After retrieving the content of the file, you can save it, for instance as in the short sketch below.
You will need to handle authentication yourself. If you have any issues reading with the ADLS Gen2 API, please feel free to let me know.
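For example, a minimal sketch of saving the downloaded bytes to a local file (the local file name is just an illustration):
# Save the response body from the Read call above to a local file.
# 'r' is the requests response object from the code above.
with open('myfile_downloaded.txt', 'wb') as f:
    f.write(r.content)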

ADLS Gen2 $logs are now available when you sign up for Multi Protocol Access in ADLS Gen2. A blog describing Multi Protocol Access can be found at http://aka.ms/mpaadls. You can sign up for access here.
Enabling logs in the Azure portal is not currently supported. Here's an example of how to enable the logs by using PowerShell.
$storageAccount = Get-AzStorageAccount -ResourceGroupName <resourceGroup> -Name <storageAccountName>
Set-AzStorageServiceLoggingProperty -Context $storageAccount.Context -ServiceType Blob -LoggingOperations read,write,delete -RetentionDays <days>
To consume the logs, you can use AzCopy and the SDKs today (see the sketch below). You cannot view $logs in Azure Storage Explorer for the time being.
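As a hedged illustration of the SDK route, here is a minimal sketch that lists and downloads blobs from the hidden $logs container with the current azure-storage-blob (v12) Python package; the package choice, account URL, and key are assumptions, so adjust them to your environment:
# Minimal sketch (assumes the azure-storage-blob v12 package and an account key).
from azure.storage.blob import ContainerClient

container = ContainerClient(
    account_url='https://<storageAccountName>.blob.core.windows.net',
    container_name='$logs',          # hidden container that holds the analytics logs
    credential='<storageAccountKey>'
)
for blob in container.list_blobs():
    print(blob.name)                 # log blobs are organised by service/date
    data = container.download_blob(blob.name).readall()
    with open(blob.name.replace('/', '_'), 'wb') as f:
        f.write(data)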

With the November 2019 (version 1.11.1) release of Azure Storage Explorer, it is now possible to view hidden containers such as $logs.

Related

Finding the Azure account key with Blob Service Client fails (azure python sdk)

I am using
Name: azure-mgmt-storage
Version: 16.0.0
Summary: Microsoft Azure Storage Management Client Library for Python
Home-page: https://github.com/Azure/azure-sdk-for-python
for generating a report to find the storage container size.
The snippet of the code I am using is below:
from azure.mgmt.storage import StorageManagementClient
subscription_client = Subscription(tenant=tenant_id, client_id=client_id, secret=client_secret)
service_principals = subscription_client.credentials
subscription_id = subscription_client.find_subscription_id()
storage_client = StorageManagementClient(credential=service_principals, subscription_id=subscription_id)
storage_account_list = storage_client.storage_accounts.list()
for storage_account in storage_account_list:
    blob_service_client = BlobServiceClient(account_url=storage_account.primary_endpoints.blob, credential=service_principals)
    account_info = blob_service_client.get_service_properties()
    keys = blob_service_client.credential.keys()
When I evaluate the expression blob_service_client.credential, the value is
<azure.identity._credentials.client_secret.ClientSecretCredential object at 0x05747E98>
blob_service_client.api_version evaluates to 2020-02-10.
And blob_service_client.credential.account_key (or blob_service_client.credential.account_key()) raises {AttributeError}'ClientSecretCredential' object has no attribute 'account_key',
and even blob_service_client.credential.keys() raises {AttributeError}'ClientSecretCredential' object has no attribute 'keys'.
Can any Azure expert help me out here? Also, connection strings are another way to approach this problem, where I can use:
BlobServiceClient.from_connection_string(connection_string)
for which I would also need to generate the connection_string dynamically, which I am unable to do.
Since you are already using the client secret credential, you can do your storage operation (calculating the storage container size in this case) directly with it. Note that in my code below I already had the subscription id handy, so I did not use a subscription client, but you can certainly keep using one as in your original code.
from azure.identity import ClientSecretCredential
from azure.mgmt.storage import StorageManagementClient
from azure.storage.blob import BlobServiceClient, ContainerClient
tenant_id='<tenant id>'
client_id='<client id>'
client_secret='<secret>'
subscription_id='<subscription id>'
credentials = ClientSecretCredential(tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)
storage_client = StorageManagementClient(credential=credentials, subscription_id=subscription_id)
storage_account_list = storage_client.storage_accounts.list()
for storage_account in storage_account_list:
    blob_service_client = BlobServiceClient(account_url=storage_account.primary_endpoints.blob, credential=credentials)
    containers = blob_service_client.list_containers()
    for container in containers:
        container_client = ContainerClient(account_url=storage_account.primary_endpoints.blob, credential=credentials, container_name=container.name)
        blobs = container_client.list_blobs()
        container_size = 0
        for blob in blobs:
            container_size = container_size + blob.size
        print('Storage Account: ' + storage_account.name + ' ; Container: ' + container.name + ' ; Size: ' + str(container_size))
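If you specifically need the account key or a connection string (as mentioned in the question), the management client can return the keys. A minimal sketch, assuming the service principal is allowed to call the listKeys action and that the resource group name can be parsed out of storage_account.id:
# Hedged sketch: fetch the account key via the management plane and build a connection string.
resource_group = storage_account.id.split('/')[4]
keys = storage_client.storage_accounts.list_keys(resource_group, storage_account.name)
account_key = keys.keys[0].value
connection_string = (
    'DefaultEndpointsProtocol=https;'
    'AccountName=' + storage_account.name + ';'
    'AccountKey=' + account_key + ';'
    'EndpointSuffix=core.windows.net'
)
blob_service_client = BlobServiceClient.from_connection_string(connection_string)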

Download Zoom Recordings to Local Machine and use Resumable Upload Method to upload the video to Google Drive

I have built a function in Python3 that will retrieve the data from a Google Spreadsheet. The data will contain the Recording's Download_URL and other information.
The function will download the Recording and store it in the local machine. Once the video is saved, the function will upload it to Google Drive using the Resumable Upload method.
Even though the response from the Resumable Upload method is 200 and it also gives me the Id of the file, I can't seem to find the file anywhere in my Google Drive. Below is my code.
import os
import requests
import json
import gspread
from oauth2client.service_account import ServiceAccountCredentials

DOWNLOAD_DIRECTORY = 'Parent_Folder'

def upload_recording(file_location, file_name):
    filesize = os.path.getsize(file_location)
    # Retrieve session for resumable upload.
    headers = {"Authorization": "Bearer " + access_token, "Content-Type": "application/json"}
    params = {
        "name": file_name,
        "mimeType": "video/mp4"
    }
    r = requests.post(
        "https://www.googleapis.com/upload/drive/v3/files?uploadType=resumable",
        headers=headers,
        data=json.dumps(params)
    )
    print(r)
    location = r.headers['Location']
    # Upload the file.
    headers = {"Content-Range": "bytes 0-" + str(filesize - 1) + "/" + str(filesize)}
    r = requests.put(
        location,
        headers=headers,
        data=open(file_location, 'rb')
    )
    print(r.text)
    return True

def download_recording(download_url, foldername, filename):
    upload_success = False
    dl_dir = os.sep.join([DOWNLOAD_DIRECTORY, foldername])
    full_filename = os.sep.join([dl_dir, filename])
    os.makedirs(dl_dir, exist_ok=True)
    response = requests.get(download_url, stream=True)
    try:
        with open(full_filename, 'wb') as fd:
            for chunk in response.iter_content(chunk_size=512 * 1024):
                fd.write(chunk)
        upload_success = upload_recording(full_filename, filename)
        return upload_success
    except Exception as e:
        # if there was some exception, print the error and return False
        print(e)
        return upload_success

def main():
    scope = ["https://spreadsheets.google.com/feeds", "https://www.googleapis.com/auth/spreadsheets",
             "https://www.googleapis.com/auth/drive.file", "https://www.googleapis.com/auth/drive"]
    creds = ServiceAccountCredentials.from_json_keyfile_name('creds.json', scope)
    client = gspread.authorize(creds)
    sheet = client.open("Zoom Recordings Data").sheet1
    data = sheet.get_all_records()
    # Get the Recordings information that are needed to download
    for index in range(len(sheet.col_values(9)) + 1, len(data) + 2):
        success = False
        getRow = sheet.row_values(index)
        session_name = getRow[0]
        topic = getRow[1]
        topic = topic.replace('/', '')
        topic = topic.replace(':', '')
        account_name = getRow[2]
        start_date = getRow[3]
        file_size = getRow[4]
        file_type = getRow[5]
        url_token = getRow[6] + '?access_token=' + getRow[7]
        file_name = start_date + ' - ' + topic + '.' + file_type.lower()
        file_destination = session_name + '/' + account_name + '/' + topic
        success |= download_recording(url_token, file_destination, file_name)
        # Update status on Google Sheet
        if success:
            cell = 'I' + str(index)
            sheet.update_acell(cell, 'success')

if __name__ == "__main__":
    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        'creds.json',
        scopes='https://www.googleapis.com/auth/drive'
    )
    delegated_credentials = credentials.create_delegated('Service_Account_client_email')
    access_token = delegated_credentials.get_access_token().access_token
    main()
I'm also still trying to figure out how to upload the video into the folder where it needs to go. I'm very new to Python and the Drive API, so I would really appreciate any suggestions.
How about this answer?
Issue and solution:
Even though the response from the Resumable Upload method is 200 and it also gives me the Id of the file, I can't seem to find the file anywhere in my Google Drive. Below is my code.
I think that your script is correct for the resumable upload. From your situation and your script, I understand that the script worked and the file was uploaded to Google Drive by the resumable upload.
Looking at your issue of "I can't seem to find the file anywhere in my Google Drive" together with your script, I noticed that you are uploading the file using an access token retrieved for the service account. In this case, the uploaded file is put into the Drive of the service account, which is different from your own Google Drive, so you cannot see the uploaded file in the browser. For this situation, I would like to propose the following 2 patterns.
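As a quick way to confirm where the file went, you can list the service account's Drive with the same access token; a small sketch using the Drive API files.list endpoint (the fields parameter only narrows the output):
# List files in the service account's own Drive using the same bearer token.
check = requests.get(
    "https://www.googleapis.com/drive/v3/files",
    headers={"Authorization": "Bearer " + access_token},
    params={"fields": "files(id,name)"}
)
print(check.json())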
Pattern 1:
The owner of the uploaded file is the service account. In this pattern, share the uploaded file with your Google account. The function upload_recording is modified as follows; please set the email address of your Google account as emailAddress.
Modified script:
def upload_recording(file_location, file_name):
    filesize = os.path.getsize(file_location)
    # Retrieve session for resumable upload.
    headers1 = {"Authorization": "Bearer " + access_token, "Content-Type": "application/json"}  # Modified
    params = {
        "name": file_name,
        "mimeType": "video/mp4"
    }
    r = requests.post(
        "https://www.googleapis.com/upload/drive/v3/files?uploadType=resumable",
        headers=headers1,  # Modified
        data=json.dumps(params)
    )
    print(r)
    location = r.headers['Location']
    # Upload the file.
    headers2 = {"Content-Range": "bytes 0-" + str(filesize - 1) + "/" + str(filesize)}  # Modified
    r = requests.put(
        location,
        headers=headers2,  # Modified
        data=open(file_location, 'rb')
    )
    # I added below script.
    fileId = r.json()['id']
    permissions = {
        "role": "writer",
        "type": "user",
        "emailAddress": "###"  # <--- Please set your email address of your Google account.
    }
    r2 = requests.post(
        "https://www.googleapis.com/drive/v3/files/" + fileId + "/permissions",
        headers=headers1,
        data=json.dumps(permissions)
    )
    print(r2.text)
    return True
When you run the above modified script, you can see the uploaded file under "Shared with me" in your Google Drive.
Pattern 2:
In this pattern, the file is uploaded to a shared folder using the resumable upload with the service account. First, please prepare a folder in your Google Drive and share it with the email of the service account.
Modified script:
Please modify the function upload_recording as follows. And please set the folder ID you shared with the service account.
From:
params = {
    "name": file_name,
    "mimeType": "video/mp4"
}
To:
params = {
    "name": file_name,
    "mimeType": "video/mp4",
    "parents": ["###"]  # <--- Please set the folder ID you shared with the service account.
}
When you run the above modified script, you can see the uploaded file in the shared folder of your Google Drive.
Note:
In this case, the owner is the service account. Of course, the owner can also be changed.
Reference:
Permissions: create

Creating Azure storage authorization header using python

I am trying to create the Authorization header for using Azure storage REST APIs. What a nightmare. The reason I am trying to do this is because I am trying to use a workflow builder (Alteryx) to call the API so my only programmatic options are Alteryx, python, or command line.
I think I'm close, but I just don't understand these last three lines of code, following this article - https://learn.microsoft.com/en-us/azure/storage/common/storage-rest-api-auth?toc=%2fazure%2fstorage%2fblobs%2ftoc.json
// Now turn it into a byte array.
byte[] SignatureBytes = Encoding.UTF8.GetBytes(MessageSignature);
// Create the HMACSHA256 version of the storage key.
HMACSHA256 SHA256 = new HMACSHA256(Convert.FromBase64String(storageAccountKey));
// Compute the hash of the SignatureBytes and convert it to a base64 string.
string signature = Convert.ToBase64String(SHA256.ComputeHash(SignatureBytes));
So if I follow this correctly, I have to create a SHA256 version of the storage key but then I make a SHA256 hash of the SHA256 hash of the signaturebytes?
I'm currently googling and not getting far; basically I'm trying to do the same thing as the .NET code above, but in Python.
In python, you can just use this line of code:
signed_string = base64.b64encode(hmac.new(base64.b64decode(storage_account_key), msg=string_to_sign.encode('utf-8'), digestmod=hashlib.sha256).digest()).decode()
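If it helps to map that one line back to the three C# statements, here is the same computation broken into steps; note there is no double hashing: the key is base64-decoded, used as the HMAC-SHA256 key, and the resulting digest is base64-encoded.
# Step 1: the storage account key is base64 text, so decode it to raw bytes
#         (the C# Convert.FromBase64String(storageAccountKey)).
key_bytes = base64.b64decode(storage_account_key)
# Step 2: HMAC-SHA256 over the UTF-8 bytes of the string to sign
#         (the C# new HMACSHA256(key) followed by ComputeHash(SignatureBytes)).
digest = hmac.new(key_bytes, msg=string_to_sign.encode('utf-8'), digestmod=hashlib.sha256).digest()
# Step 3: base64-encode the digest to get the signature
#         (the C# Convert.ToBase64String(...)).
signed_string = base64.b64encode(digest).decode()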
Here is the complete code using the List Blobs API:
import requests
import datetime
import hmac
import hashlib
import base64
storage_account_name = 'xx'
storage_account_key = 'xxx'
container_name='aa1'
api_version = '2017-07-29'
request_time = datetime.datetime.utcnow().strftime('%a, %d %b %Y %H:%M:%S GMT')
string_params = {
    'verb': 'GET',
    'Content-Encoding': '',
    'Content-Language': '',
    'Content-Length': '',
    'Content-MD5': '',
    'Content-Type': '',
    'Date': '',
    'If-Modified-Since': '',
    'If-Match': '',
    'If-None-Match': '',
    'If-Unmodified-Since': '',
    'Range': '',
    'CanonicalizedHeaders': 'x-ms-date:' + request_time + '\nx-ms-version:' + api_version + '\n',
    'CanonicalizedResource': '/' + storage_account_name + '/' + container_name + '\ncomp:list\nrestype:container'
}

string_to_sign = (string_params['verb'] + '\n'
                  + string_params['Content-Encoding'] + '\n'
                  + string_params['Content-Language'] + '\n'
                  + string_params['Content-Length'] + '\n'
                  + string_params['Content-MD5'] + '\n'
                  + string_params['Content-Type'] + '\n'
                  + string_params['Date'] + '\n'
                  + string_params['If-Modified-Since'] + '\n'
                  + string_params['If-Match'] + '\n'
                  + string_params['If-None-Match'] + '\n'
                  + string_params['If-Unmodified-Since'] + '\n'
                  + string_params['Range'] + '\n'
                  + string_params['CanonicalizedHeaders']
                  + string_params['CanonicalizedResource'])

signed_string = base64.b64encode(hmac.new(base64.b64decode(storage_account_key), msg=string_to_sign.encode('utf-8'), digestmod=hashlib.sha256).digest()).decode()

headers = {
    'x-ms-date': request_time,
    'x-ms-version': api_version,
    'Authorization': ('SharedKey ' + storage_account_name + ':' + signed_string)
}
url = ('https://' + storage_account_name + '.blob.core.windows.net/'+container_name+'?restype=container&comp=list')
r = requests.get(url, headers = headers)
print(r.status_code)
print('\n\n'+r.text)
Test result: (screenshot of the response omitted)

Canonicalized Resource to list Azure storage tables

I had successfully retrieved Azure storage table details using the following code.
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("https://" + storageAccountName + ".table.core.windows.net/" + tableName);
request.Method = "GET";
request.Accept = "application/json";
var date = DateTime.UtcNow.ToString("R", System.Globalization.CultureInfo.InvariantCulture);
request.Headers["x-ms-date"] = date;
request.Headers["x-ms-version"] = "2015-04-05";
string stringToSign = date + "\n/" + storageAccount + "/" + tableName; //Canonicalized Resource
System.Security.Cryptography.HMACSHA256 hasher = new System.Security.Cryptography.HMACSHA256(Convert.FromBase64String("accessKey"));
string strAuthorization = "SharedKeyLite " + storageAccountName + ":" + System.Convert.ToBase64String(hasher.ComputeHash(System.Text.Encoding.UTF8.GetBytes(stringToSign)));
request.Headers["Authorization"] = strAuthorization;
Task<WebResponse> response = request.GetResponseAsync();
HttpWebResponse responseresult = (HttpWebResponse)response.Result;
But when trying to get the list of tables in a storage account using the following REST endpoint, an exception occurred: "The remote server returned an error: (403) Forbidden."
https://myaccount.table.core.windows.net/Tables
I assumed that the Canonicalized Resource should be different for this REST request and went through some Microsoft documentation, but I could not find any reference for constructing it for the List Tables REST API.
Please help me retrieve the list of tables in the Azure Storage account.
Please change the following line of code:
string stringToSign = date + "\n/" + storageAccount + "/" + tableName;
to
string stringToSign = date + "\n/" + storageAccount + "/Tables";
Also, please note that your request URL will also change to https://storageaccount.table.core.windows.net/Tables.
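For reference, the same Query Tables request can be made from Python in the style of the Shared Key examples above. A minimal sketch using SharedKeyLite for the Table service, where the string to sign is only the date plus the canonicalized resource (account name and key are placeholders):
import requests
import datetime
import hmac
import hashlib
import base64

storage_account_name = 'xxx'
storage_account_key = 'xxx'

request_time = datetime.datetime.utcnow().strftime('%a, %d %b %Y %H:%M:%S GMT')
# SharedKeyLite for the Table service signs only the date and the canonicalized resource.
string_to_sign = request_time + '\n/' + storage_account_name + '/Tables'
signature = base64.b64encode(
    hmac.new(base64.b64decode(storage_account_key),
             msg=string_to_sign.encode('utf-8'),
             digestmod=hashlib.sha256).digest()).decode()
headers = {
    'x-ms-date': request_time,
    'x-ms-version': '2015-04-05',
    'Accept': 'application/json;odata=nometadata',
    'Authorization': 'SharedKeyLite ' + storage_account_name + ':' + signature
}
r = requests.get('https://' + storage_account_name + '.table.core.windows.net/Tables', headers=headers)
print(r.status_code, r.text)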

No 'Access-Control-Allow-Origin' header is present on the requested resource while uploading to Azure Blob

I've been struggling with this for about a day now. I am testing direct to Azure Blob storage upload and getting the dreaded CORS issue. "XMLHttpRequest cannot load https://tempodevelop.blob.core.windows.net/tmp/a4d8e867-f13e-343f-c6d3-a603…Ym0PlrBn%2BU/UzUs7QUhQw%3D&sv=2014-02-14&se=2016-10-12T17%3A59%3A26.638531. Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://localhost:8000' is therefore not allowed access. The response had HTTP status code 403."
Things I have already tried:
set the CORS rules to allow all hosts
tried hosting my app locally and on heroku
made sure that I could upload a file using a different tool (Azure Storage Explorer)
configured my AccessPolicy to 'rwdl' and I am definitely getting an access signature (verified in unit tests).
The code as a whole is available here: https://github.com/mikebz/azureupload
But the relevant parts are here, front end upload:
<script>
/*
 * not a true GUID, see here: http://stackoverflow.com/questions/105034/create-guid-uuid-in-javascript
 */
function guid() {
    function s4() {
        return Math.floor((1 + Math.random()) * 0x10000)
            .toString(16)
            .substring(1);
    }
    return s4() + s4() + '-' + s4() + '-' + s4() + '-' +
        s4() + '-' + s4() + s4() + s4();
}

function startUpload() {
    var fileName = guid();
    jQuery.getJSON("/formfileupload/signature/" + fileName, function(data) {
        console.log("got a signature: " + data.bloburl);
        uploadFile(data.bloburl, data.signature);
    })
    .fail(function(jqxhr, textStatus, error) {
        console.log("error: " + textStatus + " - " + error);
    });
}

function uploadFile(bloburl, signature) {
    var xhr = new XMLHttpRequest();
    fileData = document.getElementById('fileToUpload').files[0];
    xhr.open("PUT", bloburl + "?" + signature);
    xhr.setRequestHeader('x-ms-blob-type', 'BlockBlob');
    xhr.setRequestHeader('x-ms-blob-content-type', fileData.type);
    result = xhr.send(fileData);
}
</script>
The signature generation code in python is here:
def generate_access_signature(self, filename):
    """
    calls the Azure Web service to generate a temporary access signature.
    """
    blob_service = BlobService(
        account_name=self.account_name,
        account_key=self.account_key
    )
    expire_at = datetime.utcnow()
    expire_at = expire_at + timedelta(seconds=30)
    access_policy = AccessPolicy(permission="rwdl", expiry=expire_at.isoformat())
    sas_token = blob_service.generate_shared_access_signature(
        container_name="tmp",
        blob_name=filename,
        shared_access_policy=SharedAccessPolicy(access_policy)
    )
    return sas_token
According to the error message [The response had HTTP status code 403], it may be that CORS is not enabled for the service, or that no CORS rule matches the preflight request. For details, please refer to Cross-Origin Resource Sharing (CORS) Support for the Azure Storage Services.
Or the SAS signature may be incorrect.
Please try the following to troubleshoot (a sketch for setting CORS from code follows this list):
Check the CORS settings in the Azure Portal under the Blob service; there are separate settings for the other services (table, queue, file).
You can also use Azure explorer tools to generate a SAS token.
Get that SAS and try to debug the upload with the generated SAS.
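If you prefer to set the Blob service CORS rules from code rather than the portal, here is a hedged sketch using the current azure-storage-blob v12 package (not the 0.33 library used elsewhere in this thread); the account URL, key, and origins are placeholders:
# Hedged sketch: add a CORS rule to the Blob service with azure-storage-blob v12.
from azure.storage.blob import BlobServiceClient, CorsRule

service = BlobServiceClient(
    account_url='https://<accountname>.blob.core.windows.net',
    credential='<accountkey>'
)
cors_rule = CorsRule(
    allowed_origins=['http://localhost:8000'],   # or ['*'] while testing
    allowed_methods=['GET', 'PUT', 'OPTIONS'],
    allowed_headers=['*'],
    exposed_headers=['*'],
    max_age_in_seconds=3600
)
service.set_service_properties(cors=[cors_rule])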
Thanks to Tom and Microsoft's support the issue has been resolved.
Solution part #1 - make sure you use the Azure Storage Library for Python version 0.33 or later.
Here is my requirements file:
azure-common==1.1.4
azure-nspkg==1.0.0
azure-storage==0.33.0
cffi==1.8.3
cryptography==1.5.2
dj-database-url==0.4.1
Django==1.10.2
enum34==1.1.6
futures==3.0.5
gunicorn==19.6.0
idna==2.1
ipaddress==1.0.17
pep8==1.7.0
psycopg2==2.6.2
pyasn1==0.1.9
pycparser==2.16
python-dateutil==2.5.3
requests==2.11.1
six==1.10.0
whitenoise==3.2.2
The second issue is generating the signature. The code that generates the right signature is here:
from azure.storage.blob import BlockBlobService, ContainerPermissions
from datetime import datetime, timedelta

class AzureUtils:

    def __init__(self, account_name, account_key):
        if account_name is None:
            raise ValueError("account_name should not be None")
        if account_key is None:
            raise ValueError("account_key should not be None")
        self.account_name = account_name
        self.account_key = account_key

    def generate_access_signature(self, filename):
        """
        calls the Azure Web service to generate a temporary access signature.
        """
        block_blob_service = BlockBlobService(
            account_name=self.account_name,
            account_key=self.account_key
        )
        expire_at = datetime.utcnow()
        expire_at = expire_at + timedelta(seconds=30)
        permissions = ContainerPermissions.READ | ContainerPermissions.WRITE | ContainerPermissions.DELETE | ContainerPermissions.LIST
        sas_token = block_blob_service.generate_container_shared_access_signature(
            "tmp",
            permission=permissions,
            expiry=expire_at
        )
        return sas_token
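To sanity-check the generated token outside the browser, here is a small usage sketch that PUTs a test blob into the tmp container with the container SAS; the account name is a placeholder, and note that the filename argument is not used by generate_access_signature since the SAS is scoped to the container:
# Quick check of the container SAS from Python: upload a small blob with it.
import requests
import uuid

utils = AzureUtils('<account_name>', '<account_key>')
sas_token = utils.generate_access_signature('ignored')   # token is scoped to the whole container
blob_url = ('https://<account_name>.blob.core.windows.net/tmp/'
            + str(uuid.uuid4()) + '?' + sas_token)
r = requests.put(blob_url,
                 headers={'x-ms-blob-type': 'BlockBlob'},
                 data=b'hello from the SAS test')
print(r.status_code)   # 201 Created means the signature was accepted
# Remember the SAS above expires 30 seconds after it is generated.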
The solution can also be retrieved here: https://github.com/mikebz/azureupload
