I followed the sample given by NotificationHub for the bulk import of registrations.
I have an existing NotificationHub and tried to export its registrations; the file was downloaded to the Azure storage container. I added some new tags to the exported file and tried to import the updated file back into the same NotificationHub. A jobId was created and the status was in the running state. It's been more than 24 hours, the status still shows running, and the job progress parameter is still showing 0.
My package is very simple. It is loading data from a CSV file that I have stored in an Azure storage container and inserting that data into an Azure SQL database. The issue seems to stem from the connection to my Azure storage container. Here is an image of the output:
Making this even more odd, while the data flow task is failing:
The individual components within the data flow task all indicate success:
When setting up the package, the connection to the container seemed fine (after all, it was able to extract all the column names from the desired file and map them to their destinations). Here is an image showing the connection is fine:
So the issue is only realized upon execution.
I will also note that I found this post describing the exact same issue that I am experiencing now. As the top response there instructed, I added the new registry keys, but no cigar.
Any thoughts would be helpful.
First, make sure your blob can be accessed publicly:
And if you don't have a requirement to restrict networking, please make sure:
Then set the container access level:
And make sure the container is the correct one.
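If it helps to verify this outside the portal, here is a minimal sketch with the azure-storage-blob v12 Python SDK for checking and setting the container access level (the connection string variable and container name are assumptions):

import os
from azure.storage.blob import ContainerClient

# connection string variable and container name are placeholders
container = ContainerClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], container_name="mycontainer")

# None here means the container is private
print("current public access:", container.get_container_properties().public_access)

# allow anonymous read access to blobs in this container
# (the storage account itself must also allow public blob access)
container.set_container_access_policy(signed_identifiers={}, public_access="blob")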
Google has a Cloud Storage Data Transfer option to copy from one bucket to another, but this will only work if both buckets are in the same project. Using gsutil -m rsync -r -d is an easy option to run as a cron job, but we are migrating all bash to Python 3. So I need a Python 3 script to use as a Google Cloud Function to do a weekly copy of the whole bucket from project1 to another bucket in project2.
Language: python 3
app : Cloud Function
Process : Copy one bucket to another
Source Project: project1
Source bucket : bucket1
Dest Project: project2
Dest Bucket: bucket2
pseudo cmd: rsync -r gs://project1/bucket1 gs://project2/bucket2
Any quick and readable Python 3 script to do that?
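Roughly what I'm after, as a minimal sketch assuming the google-cloud-storage client and that the function's identity has access to both buckets (note this does not delete extra objects the way rsync -d would):

from google.cloud import storage

def sync_buckets(request):
    # bucket names are global, so a single client can read bucket1 and
    # write bucket2 as long as its service account has access to both
    client = storage.Client()
    src = client.bucket("bucket1")
    dst = client.bucket("bucket2")
    for blob in client.list_blobs(src):
        # server-side copy, object data never passes through the function
        src.copy_blob(blob, dst, new_name=blob.name)
    return "done"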
A Python script to do this will be really slow.
I would use a Dataflow (Apache Beam) batch process to do this. You can code this in Python 3 easily.
Basically you need:
One Operation to list all files.
One shuffle() operation to distribute the load among several workers.
One Operation to actually copy from source to destination.
The good part is Google will scale the workers for you and won't take much time.
You'll be billed for the storage operations and the gigabytes + CPU it takes to move all the data.
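A rough sketch of that pipeline with the Beam Python SDK (the bucket names come from the question; the Dataflow options such as region and temp_location are placeholders):

import apache_beam as beam
from apache_beam.io.gcp.gcsio import GcsIO
from apache_beam.options.pipeline_options import PipelineOptions

SRC = "gs://bucket1/"   # source bucket from the question
DST = "gs://bucket2/"   # destination bucket

def copy_file(src_path):
    # server-side copy of a single object, keeping its relative path
    GcsIO().copy(src_path, src_path.replace(SRC, DST, 1))

def run():
    options = PipelineOptions(
        runner="DataflowRunner",
        project="project2",
        region="us-central1",            # placeholder region
        temp_location="gs://bucket2/tmp",
    )
    files = list(GcsIO().list_prefix(SRC).keys())   # 1. list all files
    with beam.Pipeline(options=options) as p:
        (p
         | "Files" >> beam.Create(files)
         | "Shuffle" >> beam.Reshuffle()            # 2. distribute the load among workers
         | "Copy" >> beam.Map(copy_file))           # 3. copy from source to destination

if __name__ == "__main__":
    run()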
Rsync is not an operation that can be performed via a single request in the Storage REST API, and gsutil is not available on Cloud Functions; for this reason, rsyncing both buckets via a Python script is not possible.
You can create a function to start a preemptible VM with a startup script that executes the rsync between buckets and shut down the instance after finalizing the rsync operation.
By using a VM instead of a serverless service you can avoid any timeout that could be generated by a long rsync process.
A preemptible VM can run for up to 24 hours before being stopped, and you will only be charged for the time the instance is turned on (the disk storage is charged independently of the instance's status).
If the VM is powered off within the first minute, you won't be charged for the usage.
For this approach, it is first necessary to create a bash script in a bucket; this will be executed by the preemptible VM at startup time, for example:
#! /bin/bash
gsutil rsync -r gs://mybucket1 gs://mybucket2
sudo init 0  # similar to poweroff, halt, or shutdown -h now
After that, you need to create a preemptible VM with a startup script; I recommend an f1-micro instance since the rsync command between buckets doesn't require many resources. (A programmatic sketch of the same setup follows the steps below.)
1.- Go to the VM Instances page.
2.- Click Create instance.
3.- On the Create a new instance page, fill in the properties for your instance.
4.- Click Management, security, disks, networking, sole tenancy.
5.- In the Identity and API access section, select a service account that has access to read your startup script file in Cloud Storage and the buckets to be synced.
6.- Select Allow full access to all Cloud APIs.
7.- Under Availability policy, set the Preemptibility option to On. This setting disables automatic restart for the instance and sets the host maintenance action to Terminate.
8.- In the Metadata section, provide startup-script-url as the metadata key.
9.- In the Value box, provide a URL to the startup script file, either in the gs://BUCKET/FILE or https://storage.googleapis.com/BUCKET/FILE format.
10.- Click Create to create the instance.
With this configuration, the script will be executed every time the instance is started.
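If you prefer to create the preemptible instance from code rather than through the console, a rough equivalent using the same googleapiclient library as the function below (the instance name, boot image, network, and startup script location are placeholders) could look like this:

from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()
service = discovery.build('compute', 'v1', credentials=credentials)

project = "yourprojectID"
zone = "us-central1-a"

config = {
    "name": "bucket-rsync-vm",  # placeholder instance name
    "machineType": f"zones/{zone}/machineTypes/f1-micro",
    # step 7: preemptible, which also disables automatic restart
    "scheduling": {"preemptible": True, "automaticRestart": False},
    "disks": [{
        "boot": True,
        "autoDelete": True,
        "initializeParams": {
            "sourceImage": "projects/debian-cloud/global/images/family/debian-11",
        },
    }],
    "networkInterfaces": [{
        "network": "global/networks/default",
        "accessConfigs": [{"type": "ONE_TO_ONE_NAT", "name": "External NAT"}],
    }],
    # steps 5 and 6: service account with full access to the Cloud APIs
    "serviceAccounts": [{
        "email": "default",
        "scopes": ["https://www.googleapis.com/auth/cloud-platform"],
    }],
    # steps 8 and 9: point the VM at the startup script stored in a bucket
    "metadata": {"items": [{
        "key": "startup-script-url",
        "value": "gs://mybucket1/startup.sh",  # placeholder script location
    }]},
}

service.instances().insert(project=project, zone=zone, body=config).execute()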
This is the Python function to start a VM (regardless of whether it is preemptible):
def power(request):
    import logging
    # these libraries are required to reach the Compute Engine API
    from googleapiclient import discovery
    from oauth2client.client import GoogleCredentials
    # the function will use the service account of your Cloud Function
    credentials = GoogleCredentials.get_application_default()
    # this line specifies the API we are going to use, in this case Compute Engine
    service = discovery.build('compute', 'v1', credentials=credentials, cache_discovery=False)
    # set the correct log level (to avoid noise in the logs)
    logging.getLogger('googleapiclient.discovery_cache').setLevel(logging.ERROR)
    # Project ID for this request.
    project = "yourprojectID"  # update placeholder value
    zone = "us-central1-a"     # update this to the zone of your VM
    instance = "myvm"          # update with the name of your VM
    response = service.instances().start(project=project, zone=zone, instance=instance).execute()
    print(response)
    return "OK"
requirements.txt file
google-api-python-client
oauth2client
flask
And you can schedule your function with Cloud Scheduler:
Create a service account with the functions.invoker permission on your function.
Create a new Cloud Scheduler job.
Specify the frequency in cron format.
Specify HTTP as the target type.
Add the URL of your Cloud Function and the HTTP method as usual.
Select the OIDC token option from the Auth header dropdown.
Add the service account email in the Service account text box.
In the Audience field you only need to provide the URL of the function, without any additional parameters.
In Cloud Scheduler, I hit my function by using this URL:
https://us-central1-yourprojectID.cloudfunctions.net/power
and I used this audience:
https://us-central1-yourprojectID.cloudfunctions.net/power
Please replace yourprojectID in the code and in the URLs, and change the zone us-central1 as needed.
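To sanity-check the OIDC setup outside of Cloud Scheduler, a small sketch using the google-auth and requests libraries (it needs service account credentials that can mint an identity token; the function URL doubles as the audience) might be:

import requests
import google.auth.transport.requests
import google.oauth2.id_token

url = "https://us-central1-yourprojectID.cloudfunctions.net/power"

# fetch an OIDC identity token whose audience is the function URL
auth_request = google.auth.transport.requests.Request()
token = google.oauth2.id_token.fetch_id_token(auth_request, url)

# call the function the same way Cloud Scheduler does
response = requests.get(url, headers={"Authorization": f"Bearer {token}"})
print(response.status_code, response.text)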
Every day, an Excel file is automatically uploaded to my Azure blob storage account. I have a Python script that reads the Excel file, extracts the necessary information, and saves the output as a new blob in the Azure storage account. I set up a Docker container that runs this Python script. It works correctly when run locally.
I pushed the Docker image to the Azure Container Registry and tried to set up an Azure logic app that starts a container with this Docker image every day at the same time. It runs; however, it does not seem to be working with the most up-to-date version of my Azure storage account.
For example, I pushed an updated version of the Docker image last night. A new Excel file was added to the Azure storage account this morning and the logic app ran one hour later. The container with the Docker image, however, only found the files that were present in Azure storage account yesterday (so it was missing the most recent file, which is the one I needed analyzed).
I confirmed that the issue is not with the logic app as I added a step in the logic app to list the files in the Azure storage account, and this list included the most recent file.
UPDATE: I have confirmed that I am accessing the correct version of the environment variables. The issue remains: the Docker container seems to access Azure blob storage as it was at the time I most recently pushed the Docker image to the container registry. My current work around is to push the same image to the registry everyday, but this is annoying.
ANOTHER UPDATE: Here is the code to get the most recent blob (an Excel file). The date is always contained in the name of the blob. In theory, it finds the blob with the most recent date:
import os
import re
from datetime import datetime
from io import StringIO

import pandas as pd

# blob_service and backup_csv are defined earlier in the script
blobs = blob_service.list_blobs(container_name=os.environ.get("CONTAINERNAME"))
blobstring = blob_service.get_blob_to_text(os.environ.get("CONTAINERNAME"),
                                           backup_csv).content
current_df = pd.read_csv(StringIO(blobstring))

add_n = 1
blob_string = re.compile("sales.xls")
date_list = []
for b in blobs:
    if blob_string.search(b.name):
        dt = b.name[14:24]                             # date portion of the blob name
        dt = datetime.strptime(dt, "%Y-%m-%d").date()
        date_list.append(dt)
today = max(date_list)
print(today)
However, the blobs don't seem to update. It returns the most recent blob as of the date that I last pushed the image to the registry.
I also checked print(date.today()) in the same script and this works as expected (it prints the current date).
Figured out that I just needed to take all of the variables in my .env file and add them as environment variables with appropriate values in the 'containers environment' section of the image above. This https://learn.microsoft.com/en-us/azure/container-instances/container-instances-environment-variables was a helpful resource.
ALSO the container group needs to be deleted as the last action in the logic app. I named the wrong container group, so when the logic app ran each day, it used the cached version of the container.
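For reference, the only change needed in the script itself is to read those values from the environment at run time instead of from a .env file baked into the image; something along these lines (the variable names other than CONTAINERNAME are just examples):

import os

# values come from the 'containers environment' settings of the container group
account_name = os.environ["STORAGEACCOUNTNAME"]   # example variable name
account_key = os.environ["STORAGEACCOUNTKEY"]     # example variable name
container_name = os.environ["CONTAINERNAME"]      # matches the code above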
In my release pipeline to my Integration Environment I want to restore the production database prior running the migration script.
Both my databases are hosted in Azure. So I thought I could use the Azure SQL Database Deployment task that is already integrated into Azure DevOps. I created two separate tasks: first, export to a .bacpac file, and then import that .bacpac file again. Currently I am running into the following issue:
...
2018-11-22T09:02:28.9173416Z Processing Table '[dbo].[EmergencyContacts]'.
2018-11-22T09:02:31.6364073Z Successfully exported database and saved it to file 'D:\a\r1\a\GeneratedOutputFiles\DatabaseName.bacpac'.
2018-11-22T09:02:31.6726957Z Generated file D:\a\r1\a\GeneratedOutputFiles\DatabaseName.bacpac. Uploading file to the logs.
2018-11-22T09:02:31.6798180Z ##[error]Unable to process command '##vso[task.uploadfile] D:\a\r1\a\GeneratedOutputFiles\DatabaseName.bacpac' successfully. Please reference documentation (http://go.microsoft.com/fwlink/?LinkId=817296)
2018-11-22T09:02:31.6812314Z ##[error]Cannot upload task attachment file, attachment file location is not specified or attachment file not exist on disk
2018-11-22T09:02:31.7016975Z Setting output variable 'SqlDeploymentOutputFile' to 'D:\a\r1\a\GeneratedOutputFiles\DatabaseName.bacpac'
2018-11-22T09:02:31.7479327Z ##[section]Finishing: Azure SQL Export
Any ideas how I could solve this?
I have created an F# project that contains two functions. I can run these locally when I do func start (or start debugging with F5 in VS Code). One of the two functions copies data from one Azure storage container to another, and the other function copies some data from a DB and puts it in an Azure storage container. All this works nicely when I run it locally. Now I would like to deploy these to Azure Functions. I have created a resource group, created the Function App, and ensured that the Function App settings indicate that it is an Azure Functions version 2 app. When I try to deploy the functions via:
func azure functionapp publish <FUNCTION APP NAME>
The code is uploaded to Azure. And the output is:
Getting site publishing info...
Creating archive for current directory...
Uploading archive...
Upload completed successfully.
Syncing triggers...
In the Azure portal, under deployment options, I see that a deployment has been triggered, and looking at the details for the latest one I get:
Mon 09/17 Updating submodules.
Mon 09/17 Preparing deployment for commit id '75833a2816'.
Mon 09/17 Generating deployment script. View Log
Mon 09/17 Running deployment command... View Log
Mon 09/17 Running post deployment command(s)...
Mon 09/17 Syncing 2 function triggers with payload size 317 bytes successful.
Mon 09/17 Deployment successful.
This seems to indicate that two functions have been found and successfully deployed. However, the functions are not listed under Functions in the Function App, and I have not been able to make successful calls to them.
Do I have to provide some additional configuration in order to run an F# application as an Azure Function v2?
Here is what I see in the logs for Function App BokioMLDataExtractorFunctionsTest:
CopyImagesToBokioAIStorage: Invalid script file name configuration. The 'scriptFile' property is set to a file that does not exist.
CopyOcrToBokioAIStorage: Invalid script file name configuration. The 'scriptFile' property is set to a file that does not exist.
That is why the two functions are not showing up. Hopefully that helps figure out the issue?