I have created Azure runbooks under an Azure Automation account to process a large Azure Analysis Services Tabular model.
When I attempt a full Tabular model process via PowerShell, it times out just after an hour of runtime.
Start: 8:47:50 AM
End: 9:48:25 AM
Command Start
Invoke-ProcessASDatabase -server "asazure://---" -DatabaseName "---" -RefreshType Full
Command End
Error Start
Invoke-ProcessASDatabase : Failed to save modifications to the server. Error returned: 'Timeout expired. The timeout
period elapsed prior to completion of the operation.. The exception was raised by the IDbCommand interface.
Technical Details:
RootActivityId: ---
Date (UTC): ---
The command has been canceled.. The exception was raised by the IDbCommand interface.
The command has been canceled.. The exception was raised by the IDbCommand interface.
The command has been canceled.. The exception was raised by the IDbCommand interface.
The command has been canceled.. The exception was raised by the IDbCommand interface.
The command has been canceled.. The exception was raised by the IDbCommand interface.
The command has been canceled.. The exception was raised by the IDbCommand interface.
'.
At line:3 char:1
Invoke-ProcessASDatabase -server "asazure://--- ...
+ CategoryInfo : InvalidArgument: (---:String) [Invoke-ProcessASDatabase],
OperationException
+ FullyQualifiedErrorId : Microsoft.AnalysisServices.PowerShell.Cmdlets.ProcessASDatabase
Error End
I then broke the process down to the partition level. The partition processing also runs successfully for about an hour, getting through over 100 partitions, but then starts getting authentication errors.
How can I get a full Tabular model process to complete when running under an Azure runbook?
Start: 8:59:50 PM
End: 10:06:28 PM
Command Start
Invoke-ProcessPartition -PartitionName "2018_Q4" -TableName "FACT_AR" -server "asazure://---" -Database "---" -RefreshType Full
Command End
Error Start
Invoke-ProcessPartition : Authentication failed.
Technical Details:
RootActivityId: ---
Date (UTC): ---
At line:104 char:1
Invoke-ProcessPartition -PartitionName "2018_Q4" -TableName "FACT_AR ...
+ CategoryInfo : NotSpecified: (:) [Invoke-ProcessPartition], ConnectionException
+ FullyQualifiedErrorId :
Microsoft.AnalysisServices.ConnectionException,Microsoft.AnalysisServices.PowerShell.Cmdlets.ProcessPartition
Error End
Welcome to Stack Overflow :)
Your issue looks similar to this question: Exceed 3 hours timeout Automation Runbook Azure. Please check whether the answers given there help to resolve your issue.
You may also want to read about fair share in the Microsoft documentation below.
https://learn.microsoft.com/en-us/azure/automation/automation-runbook-execution#fair-share
Hope this helps!!
It turned out the credential was somehow corrupted. I dropped the credential and re-created it, and the job worked.
I have a PowerShell script that is throwing an error. I am trying to create an Alert Action Group through PowerShell. The last line throws the error.
$TenantId = Get-AzTenant | select Id
Connect-AzAccount -TenantId $TenantId.Id -SubscriptionName $SubscriptionName
$Receiver1 = New-AzActionGroupReceiver -Name $ActionGroupReceiver -EmailReceiver -EmailAddress $EmailAddress
Set-AzActionGroup -Name $ActionGroup -ResourceGroup $Rg -ShortName $ActionGroupShortName -Receiver $Receiver1
I have Owner access to the subscription and can confirm that all of these variables have appropriate values.
The same code worked for a different subscription in the same tenant where I have Contributor access. This is probably an access issue; however, I am not able to figure out why I am getting Forbidden even with Owner access.
Edit - Error text
Set-AzActionGroup : Exception type: ErrorResponseException, Message:
Microsoft.Azure.Management.Monitor.Models.ErrorResponseException: Operation returned an invalid status code 'Forbidden'
at Microsoft.Azure.Management.Monitor.ActionGroupsOperations.d__5.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.Management.Monitor.ActionGroupsOperationsExtensions.d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.Management.Monitor.ActionGroupsOperationsExtensions.CreateOrUpdate(IActionGroupsOperations
operations, String resourceGroupName, String actionGroupName, ActionGroupResource actionGroup)
at Microsoft.Azure.Commands.Insights.ActionGroups.SetAzureRmActionGroupCommand.ProcessRecordInternal()
at Microsoft.Azure.Commands.Insights.MonitorCmdletBase.ExecuteCmdlet(), Code: Null, Status code:Null, Reason
phrase: Null
At C:\Users\YashTamakuwala\Desktop\live_traffic\Alerts\Alerts\CreateActionGroup.ps1:25 char:1
Set-AzActionGroup -Name $ActionGroup -ResourceGroup $Rg -ShortName $A ...
+ CategoryInfo : CloseError: (:) [Set-AzActionGroup], PSInvalidOperationException
+ FullyQualifiedErrorId : Microsoft.Azure.Commands.Insights.ActionGroups.SetAzureRmActionGroupCommand
I found the problem. I tried performing the same action through the portal, but that failed with an error as well.
There were policies in place to prevent such activities. Also, requests from PowerShell are tracked in the Activity Log, which in hindsight I should have looked at. So I temporarily disabled the policies and was able to run the script successfully.
I'm attempting to deploy to my Virtual machine scale set using the custom script extension as below.
az vmss extension set --debug --name 'CustomScriptExtension' `
--resource-group 'my-rg' `
--publisher 'Microsoft.Compute' `
--version '1.9.5' `
--vmss-name 'myvmss' `
--settings '{\"commandToExecute\": \"powershell.exe ./download-package.ps1\", \"fileUris\": [\"https://[REDACTED].blob.core.windows.net/upload/download-package.ps1\"]}' `
--protected-settings '{\"managedIdentity\": {\"objectId\": \"[REDACTED]\"}}'
When running I get the following error:
cli.azure.cli.core.azclierror : Deployment failed. Correlation ID: 73f4d16b-afe0-4373-8773-1d7dd7d26940. VM has reported a failure when processing extension 'CustomScriptExtension'. Error message: "Failed to download all specified files. Exiting. Error Message: Exception of type 'Microsoft.WindowsAzure.GuestAgent.Plugins.CustomScriptHandler.Downloader.MsiNotFoundException' was thrown."
More information on troubleshooting is available at https://aka.ms/VMExtensionCSEWindowsTroubleshoot
Deployment failed. Correlation ID: 73f4d16b-afe0-4373-8773-1d7dd7d26940. VM has reported a failure when processing extension 'CustomScriptExtension'. Error message: "Failed to download all specified files. Exiting. Error Message: Exception of type 'Microsoft.WindowsAzure.GuestAgent.Plugins.CustomScriptHandler.Downloader.MsiNotFoundException' was thrown."
The file to be downloaded requires authentication, so I have given the scale set a system-assigned identity and granted it the Storage Blob Data Reader role on the storage account hosting the PowerShell file.
The custom extension logs on the VM suggest that it was unable to get the identity of the VM:
[7108+00000001] [11/20/2020 09:12:28.79] [INFO] Handler successfully enabled
[7108+00000001] [11/20/2020 09:12:28.80] [INFO] Loading configuration for sequence number 1
[7108+00000001] [11/20/2020 09:12:28.84] [INFO] HandlerSettings = ProtectedSettingsCertThumbprint: [REDACTED], ProtectedSettings: {[REDACTED]}, PublicSettings: {FileUris: [https://[REDACTED].blob.core.windows.net/upload/download-package.ps1], CommandToExecute: powershell.exe ./download-package.ps1}
[7108+00000001] [11/20/2020 09:12:29.26] [INFO] Downloading files specified in configuration...
[7108+00000001] [11/20/2020 09:12:30.90] [INFO] Attempting to get MSI from IMDS
[7108+00000001] [11/20/2020 09:12:31.04] [WARN] WebClient: non retryable error occurred System.Net.WebException: The remote server returned an error: (400) Bad Request.
at System.Net.WebClient.DownloadDataInternal(Uri address, WebRequest& request)
at System.Net.WebClient.DownloadString(Uri address)
at Microsoft.WindowsAzure.GuestAgent.Plugins.MsiUtils.WebClient.<>c__DisplayClass3_0.<DownloadStringWithRetries>b__0()
at Microsoft.WindowsAzure.GuestAgent.Plugins.MsiUtils.WebClientWithRetryAbstract.ActionWithRetries(Action action)
[7108+00000001] [11/20/2020 09:12:31.14] [ERROR] Unknown exception occurred while attempting to get MSI token System.Net.WebException: The remote server returned an error: (400) Bad Request.
at System.Net.WebClient.DownloadDataInternal(Uri address, WebRequest& request)
at System.Net.WebClient.DownloadString(Uri address)
at Microsoft.WindowsAzure.GuestAgent.Plugins.MsiUtils.WebClient.<>c__DisplayClass3_0.<DownloadStringWithRetries>b__0()
at Microsoft.WindowsAzure.GuestAgent.Plugins.MsiUtils.WebClientWithRetryAbstract.ActionWithRetries(Action action)
at Microsoft.WindowsAzure.GuestAgent.Plugins.MsiUtils.WebClient.DownloadStringWithRetries(Uri address)
at Microsoft.WindowsAzure.GuestAgent.Plugins.MsiUtils.MsiProvider.GetMsiHelper(NameValueCollection queries)
[7108+00000001] [11/20/2020 09:12:31.14] [INFO] Msi was not obtained
I can retrieve the identity token from the metadata endpoint via Invoke-WebRequest -Method Get -Uri 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/' so that appears to be set up correctly.
Any advice on what the problem could be or how to further diagnose this issue would be greatly appreciated.
Here are a few fixes you can try:
The object ID of the managed identity might be incorrect.
Please also move commandToExecute and fileUris into the protected settings when using managed identities.
If you want to use the system-assigned managed identity, you don't need to pass a clientId or objectId. More info here: https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/custom-script-windows#property-managedidentity
Edit: please explicitly pass an empty JSON object as settings when you add commandToExecute and fileUris to the protected settings. The extension would otherwise fail due to duplicated settings.
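To make the shape of the two payloads concrete, here is a minimal sketch in Python that builds the JSON strings you would pass to --settings and --protected-settings. The helper name build_cse_settings and the example URL are made up for illustration; the empty managedIdentity object for the system-assigned identity follows the documentation linked above, so please verify it against your extension version.

```python
import json


def build_cse_settings(command, file_uris):
    """Return (settings_json, protected_settings_json) as JSON strings.

    Hypothetical helper for illustration only; it is not part of any Azure SDK.
    """
    # Public settings are an explicit empty object, so commandToExecute and
    # fileUris are defined only once (in the protected settings).
    settings = {}
    protected = {
        "commandToExecute": command,
        "fileUris": list(file_uris),
        # Assumption based on the linked docs: an empty object selects the
        # system-assigned identity, so no clientId or objectId is passed.
        "managedIdentity": {},
    }
    return json.dumps(settings), json.dumps(protected)


settings_json, protected_json = build_cse_settings(
    "powershell.exe ./download-package.ps1",
    # Placeholder URL; substitute your own blob URL.
    ["https://example.invalid/upload/download-package.ps1"],
)
print(settings_json)
print(protected_json)
```

The printed strings still need whatever quoting or escaping your shell requires before they go on the az command line.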
I am extracting a .zip archive in an Azure WebJob.
That worked OK for some time.
Now the WebJob has suddenly started to fail:
[12/11/2017 16:59:57 > bf607f: ERR ] Command 'cmd /c ""run.cmd""' was aborted due to no output nor CPU activity for 121 seconds. You can increase the SCM_COMMAND_IDLE_TIMEOUT app setting (or WEBJOBS_IDLE_TIMEOUT if this is a WebJob) if needed.
cmd /c ""run.cmd""
[12/11/2017 16:59:57 > bf607f: ERR ] replace D:\home\site\store\extracted/documents/6465465465466015.pdf? [y]es, [n]o, [A]ll, [N]one, [r]ename:
[12/11/2017 16:59:57 > bf607f: SYS INFO] Status changed to Failed
[12/11/2017 16:59:57 > bf607f: SYS ERR ] System.AggregateException: One or more errors occurred. ---> Kudu.Core.Infrastructure.CommandLineException: Command 'cmd /c ""run.cmd""' was aborted due to no output nor CPU activity for 121 seconds. You can increase the SCM_COMMAND_IDLE_TIMEOUT app setting (or WEBJOBS_IDLE_TIMEOUT if this is a WebJob) if needed.
cmd /c ""run.cmd""
at Kudu.Core.Infrastructure.IdleManager.WaitForExit(IProcess process)
at Kudu.Core.Infrastructure.ProcessExtensions.<Start>d__12.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Kudu.Core.Infrastructure.Executable.<ExecuteAsync>d__31.MoveNext()
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
at System.Threading.Tasks.Task`1.get_Result()
at Kudu.Core.Infrastructure.Executable.ExecuteInternal(ITracer tracer, Func`2 onWriteOutput, Func`2 onWriteError, Encoding encoding, String arguments, Object[] args)
at Kudu.Core.Infrastructure.Executable.ExecuteReturnExitCode(ITracer tracer, Action`1 onWriteOutput, Action`1 onWriteError, String arguments, Object[] args)
at Kudu.Core.Jobs.BaseJobRunner.RunJobInstance(JobBase job, IJobLogger logger, String runId, String trigger, ITracer tracer, Int32 port)
---> (Inner Exception #0) ExitCode: -1, Output: Command 'cmd /c ""run.cmd""' was aborted due to no output nor CPU activity for 121 seconds. You can increase the SCM_COMMAND_IDLE_TIMEOUT app setting (or WEBJOBS_IDLE_TIMEOUT if this is a WebJob) if needed., Error: Command 'cmd /c ""run.cmd""' was aborted due to no output nor CPU activity for 121 seconds. You can increase the SCM_COMMAND_IDLE_TIMEOUT app setting (or WEBJOBS_IDLE_TIMEOUT if this is a WebJob) if needed., Kudu.Core.Infrastructure.CommandLineException: Command 'cmd /c ""run.cmd""' was aborted due to no output nor CPU activity for 121 seconds. You can increase the SCM_COMMAND_IDLE_TIMEOUT app setting (or WEBJOBS_IDLE_TIMEOUT if this is a WebJob) if needed.
cmd /c ""run.cmd""
at Kudu.Core.Infrastructure.IdleManager.WaitForExit(IProcess process)
at Kudu.Core.Infrastructure.ProcessExtensions.<Start>d__12.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Kudu.Core.Infrastructure.Executable.<ExecuteAsync>d__31.MoveNext()<---
I mean it seems when the exception happened the webjob was pretty busy, so why do I get that idle timeout exception?
I would suggest you add the SCM_COMMAND_IDLE_TIMEOUT and WEBJOBS_IDLE_TIMEOUT settings to your Web App's ‘App settings’ configuration with a value of your choice.
For example:
SCM_COMMAND_IDLE_TIMEOUT = 3600
WEBJOBS_IDLE_TIMEOUT = 3600
You may also turn on the ‘Always On’ feature, if it is not already enabled, and see if that helps.
By default, Web Apps are unloaded if they are idle for some period of time. This lets the system conserve resources. In Basic or Standard mode, you can enable ‘Always On’ to keep the app loaded all the time. If your app runs continuous WebJobs, you should enable ‘Always On’, or the WebJobs may not run reliably. To enable it, go to the web app -> Settings -> Application Settings -> enable ‘Always On’.
Also, refer to the diagnostic log stream to get more details on this issue.
I mean it seems when the exception happened the webjob was pretty busy, so why do I get that idle timeout exception?
Root Cause:
There has been no output in the console for a long time.
Solution:
You could do as Ashok mentioned and increase the WEBJOBS_IDLE_TIMEOUT value. This should be set in the configuration settings of the Web App, rather than in the App.config of the WebJob. The value is in seconds.
You could also add output to the console every minute. For more details, refer to this blog.
Another solution is to add output to the Console, which is especially useful for jobs that are doing long-running asynchronous tasks or polling external services. For these cases, adding a heartbeat-style Console write every minute is better than increasing the idle timeout to huge numbers.
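The heartbeat idea can be sketched roughly like this in Python (the same pattern applies in any language a WebJob can run). The names run_with_heartbeat and emit are made up for this example, and the 60-second default is only an assumption chosen to stay under the 121-second idle limit shown in the log.

```python
import threading


def _print_flushed(message):
    # Flush so the line reaches the job log immediately instead of sitting
    # in an output buffer.
    print(message, flush=True)


def run_with_heartbeat(task, interval_seconds=60.0, emit=_print_flushed):
    """Run task() while writing a heartbeat line every interval_seconds.

    Hypothetical helper for illustration; returns whatever task() returns.
    """
    done = threading.Event()

    def beat():
        # Event.wait() returns False on timeout and True once done is set,
        # so the loop ends as soon as the task has finished.
        while not done.wait(interval_seconds):
            emit("heartbeat: still working")

    worker = threading.Thread(target=beat, daemon=True)
    worker.start()
    try:
        return task()
    finally:
        # Stop the heartbeat even if the task raised an exception.
        done.set()
        worker.join()
```

You would wrap the long-running step, for example run_with_heartbeat(extract_archive), where extract_archive stands for your own function. Note that this only keeps the log active; it does not help if the underlying command is blocked waiting for input.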
I am working in a Windows 7 environment, running my code from the Windows command prompt. I am running a very simple piece of code right now.
data = [('Alice', 1), ('Bob', 2)]
df = sqlContext.createDataFrame(data)
This gives me the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o23.applySchemaToPythonRDD.
: java.lang.RuntimeException: java.lang.RuntimeException: Error while running command to get file permissions : ExitCodeException exitCode=-1073741515:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.
There is much more error output following, but the actual error is the first line. I've looked up this error in other posts, but they don't concern actually creating a DataFrame.
I looked at the runtime exception as well and saw there was an error trying to get file permissions. I tried running my command prompt in administrator mode instead, but it didn't help.
Does anyone have any ideas what could be causing this?
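As a diagnostic aid, the negative exit code in the trace can be reinterpreted as an unsigned 32-bit Windows status code. The sketch below does only that conversion; the helper name to_ntstatus is made up for this example. The resulting value, 0xC0000135, is listed in the Windows NTSTATUS table as STATUS_DLL_NOT_FOUND. Whether a missing DLL for the helper executable that Hadoop runs to read file permissions is really the cause here is an assumption that would need to be verified on the machine.

```python
def to_ntstatus(exit_code):
    """Format a signed 32-bit process exit code as an unsigned hex string."""
    # Masking with 0xFFFFFFFF reinterprets the negative value as unsigned.
    return "0x{:08X}".format(exit_code & 0xFFFFFFFF)


# Exit code taken from the ExitCodeException in the trace above.
print(to_ntstatus(-1073741515))
```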
I am trying to connect to my Azure Service Fabric cluster from a new Azure virtual machine I just set up. But when I use the Connect-ServiceFabricCluster cmdlet, I get the following error message:
Connect-ServiceFabricCluster : An error occurred during this operation. Please check the trace logs for more details.
At line:1 char:1
+ Connect-ServiceFabricCluster -ConnectionEndpoint ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [Connect-ServiceFabricCluster], FabricException
+ FullyQualifiedErrorId : CreateClusterConnectionErrorId,Microsoft.ServiceFabric.Powershell.ConnectCluster
The command I am using in PowerShell is (values are obfuscated):
Connect-ServiceFabricCluster -ConnectionEndpoint {ENDPOINT ADDRESS} -FindType FindByThumbprint -FindValue {THUMBPRINT} -X509Credential -ServerCertThumbprint {THUMBPRINT} -StoreLocation CurrentUser -StoreName My
When I use the exact same command on my development PC, it works just fine. Any suggestions on what is going wrong and how I might debug this are welcome!
Ensure that you have your client certificate installed on the VM at the location indicated in the parameters to the Connect-ServiceFabricCluster cmdlet.