MSDTC: "Communication with the underlying transaction manager has failed" on Windows Azure VMs

My application is deployed on two Windows Azure virtual machines: one for SQL Server and the other for the application.
In the application, I am using TransactionScope, so I applied the same Distributed Transaction Coordinator configuration on both VMs.
In addition, I have allowed the Distributed Transaction Coordinator through the firewall on both machines.
I have a long-running process with a loop, and each iteration uses a separate TransactionScope. Sometimes, but not always, I get the following exception:
Communication with the underlying transaction manager has failed. ------- Inner Exception: The MSDTC transaction manager was unable to pull the transaction from the source transaction manager due to communication problems. Possible causes are: a firewall is present and it doesn't have an exception for the MSDTC process, the two machines cannot find each other by their NetBIOS names, or the support for network transactions is not enabled for one of the two transaction managers.
System Center Endpoint Protection is installed on both VMs. I also turned off real-time protection, with no result.
When I ran the process on the SQL VM itself, everything worked fine with no exception.

Actually, I found the root of the problem after several days of searching and investigating: the two machines could not ping each other by NetBIOS name, only by IP. After fixing the ping issue, everything worked fine.
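Since the fix came down to name resolution between the two VMs, a quick check can be run on each VM to confirm that the other machine's name resolves through the operating system's resolver (the same lookup that pinging by name relies on). This is a minimal Python sketch, not part of the original answer; the function names are hypothetical, and a successful lookup alone does not prove that MSDTC traffic gets through the firewall.

```python
import socket


def resolves(name: str) -> bool:
    """Return True if the OS resolver can map `name` to an IPv4 address."""
    try:
        socket.gethostbyname(name)
        return True
    except (OSError, UnicodeError):
        # socket.gaierror (a subclass of OSError) means the lookup failed;
        # UnicodeError means the name could not even be IDNA-encoded.
        return False


def unresolved(names):
    """Return the subset of `names` that do not resolve from this machine."""
    return [name for name in names if not resolves(name)]
```

For example, on the application VM, `unresolved(["SQLVM"])` should return an empty list once name resolution works, where `SQLVM` is a hypothetical placeholder for the SQL Server VM's computer name.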

Related

Azure Data Factory Integration Runtime Going Into Limited State

My team has created a self-hosted integration runtime (IR) on an on-premises VM, and we are trying to create a linked service to an on-premises database using that IR.
Whenever we click Test Connection in the linked service, the connection fails and the IR goes into a limited state.
We also whitelisted the IPs that Microsoft provides for the ADF IR and checked the network traces, and everything seems fine there.
We also stopped and restarted the IR, and uninstalled and reinstalled it, but the problem persists.
Has anyone faced a similar issue?
We have been facing this issue for a long time, and it has now become a blocker for us.
Thanks!
This is observed when nodes can't communicate with each other.
Log in to the virtual machine (VM) that hosts the node, open Event Viewer, go to Applications and Services Logs > Integration Runtime, and filter the error logs. Look for the error System.ServiceModel.EndpointNotFoundException or "Cannot connect to worker manager".
If you find either, follow the official documentation, which has detailed troubleshooting steps for the error message: Self-hosted integration runtime node/logical self-hosted IR is in Inactive/"Running (Limited)" state.
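If the Integration Runtime log has been exported from Event Viewer to a text file (an assumption; filtering directly in Event Viewer works just as well), a short Python sketch can scan it for the two error signatures named in this answer. The function name is hypothetical, and it does plain substring matching rather than parsing the event format.

```python
# The two error signatures named in the troubleshooting answer.
SIGNATURES = (
    "System.ServiceModel.EndpointNotFoundException",
    "Cannot connect to worker manager",
)


def find_ir_errors(lines):
    """Return (line_number, text) pairs for lines containing a known signature."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if any(signature in line for signature in SIGNATURES)
    ]
```

Any iterable of strings works, so an open file object from the export (hypothetical file name, with the encoding adjusted to match how it was saved) can be passed in directly.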
As the documentation states, try one or both of the following methods to fix it:
- Put all the nodes in the same domain.
- Add the IP-to-hostname mappings to the hosts file on every VM that hosts a node.
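The hosts-file fix amounts to one `<ip> <hostname>` line per node on every node. A minimal Python sketch that appends only the mappings that are missing, so it can safely be run repeatedly; the function name, node names, and IPs are hypothetical. On Windows the file is C:\Windows\System32\drivers\etc\hosts, and editing it requires administrator rights.

```python
def merge_hosts(existing: str, nodes: dict[str, str]) -> str:
    """Return hosts-file text with an "<ip><TAB><name>" line added for each
    node whose name is not already mapped (case-insensitive) in `existing`."""
    mapped = set()
    for line in existing.splitlines():
        # Drop comments; a mapping line looks like "<ip> <name> [aliases...]".
        fields = line.split("#", 1)[0].split()
        mapped.update(name.lower() for name in fields[1:])

    new_lines = [
        f"{ip}\t{name}"
        for name, ip in sorted(nodes.items())
        if name.lower() not in mapped
    ]
    if not new_lines:
        return existing
    # Make sure the appended entries start on their own line.
    if existing and not existing.endswith("\n"):
        existing += "\n"
    return existing + "\n".join(new_lines) + "\n"
```

For example, `merge_hosts(current_text, {"IRNODE1": "10.0.0.4", "IRNODE2": "10.0.0.5"})` returns the new file contents; reading and writing the hosts file is left to the caller.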
I ran into the same issue. Our organization has firewall rules that block specific ports and URLs from the outside network. We added the Data Factory service tags to the route table as Internet-facing routes, and the IR then connected successfully.

Azure - I keep losing RDP connectivity to my VMs every now and then

I've been evaluating Azure for a couple of months, using my MSDN subscription. The intention is to determine whether my development team should migrate from VMware to Azure machines.
I managed to set up multiple VMs and work on them successfully. I shut down all VMs as often as I can so as not to use up my monthly resource allowance.
Very often I lose RDP connectivity to all my VMs. Resizing the VM sometimes helps, but not always. I've tried, for instance, all the steps in the article linked below.
What am I missing?
https://azure.microsoft.com/pl-pl/documentation/articles/virtual-machines-troubleshoot-remote-desktop-connections/
Thanks, guys (and sorry). It was indeed a network issue: DNS on my home internet provider fails from time to time.

Getting an intermittent error while connecting to on-premise sql database from Azure service

I created an Azure MVC website, and from the service (controller) code we connect to an on-premises SQL Server using Azure Hybrid Connection. Intermittently, we get the following error:
"A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The specified network name is no longer available.)"
Please provide suggestions to resolve this issue.
You can try the following solutions:
- Try increasing the connection timeout.
- Check that remote connections are enabled.
- Try adding a firewall exception.
First of all, the error means that either the network has some extra latency, the database is down, or too many concurrent connections are open to the database.
(Make sure you are closing all open data readers.)
It may also be a transient fault. These are to be expected in the cloud, and defensive programming is usually a must there. Try using some retry logic; Microsoft's transient fault handling library is an excellent start. Though meant primarily for SQL Azure and Azure Service Bus, the library can also be used for SQL Server on IaaS VMs.
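The retry advice can be illustrated generically. This is not Microsoft's library, just a minimal Python sketch of retrying with exponential backoff and jitter; `TransientError` and `with_retry` are hypothetical names, where `TransientError` stands in for whatever exceptions your database driver raises for transient faults such as the transport-level error above.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for faults worth retrying (e.g. a dropped connection)."""


def with_retry(operation, attempts=4, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call operation(); on TransientError, back off exponentially and retry.

    Re-raises the last TransientError after `attempts` failed calls.
    Any other exception propagates immediately, without a retry.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise
            # Full jitter: wait a random time up to base_delay * 2**attempt, capped.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

The jitter spreads out retries from many clients so they do not all hit the server at the same moment, and the `sleep` parameter exists only so the delay can be swapped out, for example in tests.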
I'm 98% sure, because I recently had the same experience, that this is a network issue on the server provider's side.
For instance, if you rent the server from Ionos, all remote connections are blocked by default, even if you disable the firewall on the server itself, so you won't be able to connect remotely. You can, however, work on the server itself without any problem.
To connect remotely, you have to contact the server provider. They will explain how to enable firewall ports from your control panel.
I contacted my server provider when I was getting frustrated, and followed their instructions.
After this, every permitted client could connect remotely to the server.
I wish you success.

Services down after implementing High Availability on Windows Azure

I tried to implement high availability for one of our existing servers by following this article:
https://www.windowsazure.com/en-us/documentation/articles/virtual-machines-capture-image-windows-server/
When I was done, the newly created machine was running; however, I cannot RDP to or ping any of the services running on the existing server, even though it shows that the VM is running.
Has anyone faced such a problem before?
If this unexpected issue happens, the workaround is to create an XSmall VM, RDP into it, and from there connect to the unresponsive VM through its internal IP. This should get it to start.

HTTP Request Timeout Windows Azure Deploy

I have an MVC 4 website using a WCF service. When I deploy to Windows Azure using the VS 2012 publish wizard, I get this error:
10:13:19 AM - The HTTP request to 'https://management.core.windows.net/42d4257b-5f38-400d-aac5-2e7acee9597d/services/hostedservices/myapp?embed-detail=true' has exceeded the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout.
After cleaning the project and publishing a few times, the error goes away. What am I doing wrong?
Whenever you start the publish process from your Visual Studio machine, an SSL tunnel is established first; once the tunnel is created, the package is transferred from your machine to Windows Azure. After the upload completes, the result notifications are posted back to the Publish results window.
In your case, building the SSL tunnel for the secure package transfer takes longer than normal, which could be caused by network latency between your machine and the Windows Azure Management Portal. For security reasons, the window for creating the tunnel is small: if the connection is not created in time, a retry cycle starts the process again, and if that also fails, you are greeted with the failure message. Excessive traffic on either side, or both, could cause this. So this is mainly a networking issue rather than one specific to Windows Azure, since your package did upload after a few successive tries.
In such situations, you can run a network capture utility such as Netmon or Wireshark and compare the time taken in failed and successful attempts to see how the transfers differ. This will help you understand what is causing the delay.
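Before setting up a full packet capture, a rough first comparison is to time repeated TCP connections to the endpoint. This minimal Python sketch measures only connection setup (name lookup plus TCP handshake), not the SSL negotiation or the package upload; the function names are hypothetical.

```python
import socket
import time


def tcp_connect_seconds(host, port, timeout=5.0):
    """Seconds to resolve `host` and complete a TCP handshake with host:port.

    Raises OSError (including socket.timeout) if no connection is made in time.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connected; close immediately, only setup time is measured
    return time.perf_counter() - start


def sample_latencies(host, port, attempts=5, timeout=5.0):
    """Try `attempts` connections; record seconds for each, or None on failure."""
    results = []
    for _ in range(attempts):
        try:
            results.append(tcp_connect_seconds(host, port, timeout))
        except OSError:
            results.append(None)
    return results
```

For example, `sample_latencies("management.core.windows.net", 443)` run during a failing publish and again during a successful one gives a crude side-by-side view; large or missing values point at the network rather than the package.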
Alternatively, try updating your roles' diagnostics configuration,
then update your storage credentials, because they may have expired.
