Pre-login handshake woes with connecting directly to SQL Azure - azure

We are currently experiencing a rather troublesome problem in our development environment with the following message...
A connection was successfully established with the server,
but then an error occurred during the pre-login handshake.
(provider: SSL Provider, error: 0 - The certificate's CN
name does not match the passed value.)
...the commonly accepted wisdom to resolving this problem is to set the TrustServerCertificate portion of the connection to True. However, this does not work reliably or consistently.
This particular error occurs in a number of instances, for instance testing our WCF Service in our Azure Emulator talking to live / hosted SQL Azure Instance or even using SQL Management Studio. The only common denominator we've found is that this occurs only when we connect directly to SQL Azure as opposed to when its hosted and Azure is talking directly to SQL Azure (which does work).
I've tried a number of tactics to resolve the problem (such as the one detailed here), i.e. believing it was connection related and removing pooling and other modifications to the connection string. But alas, none are conclusive and more irritating is that the error is intermittent and will prevent access for a short period of time before magically resolving itself.
Other factors that I've eliminated.
We're using the Transcient Application Block to attempt to recover from these errors, but no.
Our office has no proxy server with our connection to the Azure hosted services.
Has anyone else experienced this problem or has any suggestions?

You need to scan for Non-IFS Winsock BSPs or LSPs which not compatible with the FILE_SKIP_COMPLETION_PORT_ON_SUCCESS flag ,problem results primarily from non-IFS LSPs Being installed.
Just run "netsh WinSock Show Catalog" from command prompt , and check any "service flag" which doesn't look in the format of 0x20xxx
In my case I found that "Speed Accelerator" with service flag 0x66,removing this software solve my Problem .
More information can be found here : http://support.microsoft.com/kb/2568167

What does your connection string look like? Not sure if you've tried this yet but I remember having a problem similar when using a remote SQL connection to SQL Azure and found that I had to set:
Trusted_Connection=False;Encrypt=True
and remove any Connect Timeout from the string entirely.

Related

Azure Service Fabric Activation Error 7148

I have a service fabric cluster which hosts numerous applications. One of the applications has a service type where the service is created, runs for a bit, and then is deleted. Everything works great, but the cluster virtually always has its state set to error because there will be a few of these in the "Unhealthy evaluations" section.
Error event: SourceId='System.Hosting', Property='CodePackageActivation:Code:EntryPoint'.
There was an error during CodePackage activation.The service host terminated with exit code:7148
I've wrapped both the program's main and RunAsync in exception handlers, but never see anything in analytics. Is there any way to look up what exit code 7148 means? Thanks.
7148 is a general error code that indicates that something failed in SF in the process of setting up or activating your service's host process. So that's the reason that you're not seeing any errors or exceptions - your code is never getting a chance to run.
Examples of things I've seen that led to 7148:
The exe was not actually a windows exe due to corruption
The service's manifest had a reference to a cert or some other pre-req like an endpoint that was incorrectly configured (like a port that was already in use or the wrong thumbprint for a cert)
Something blew up inside Windows that cause the process creation to fail, like a failure to correctly configure host networking for a container
Most of the times when I see this I have to look at the windows error logs to see what's really happening. The SF folks are also trying to capture more common causes of failures and reporting them as better health errors rather than relying on 7148.

Azure WebJob DB Connection Error Only on some instances

I have two Azure WebJobs. The first takes an incoming message that tells it to grab a PDF and break it into individual page images and then queue another message for the second WebJob to process the individual pages. It worked fine on our QC instance, but when we tried to move to production I started getting strange errors on the second job, but not consistently. The first job runs and breaks the file into page images. That is working fine. I have confirmed that every page image gets created and every page message gets queued. However, for the second job, only some of the messages are getting processed correctly. The remaining show this error in the WebJob diagnostics:
Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.ProcessBatchPage ---> System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: SQL Network Interfaces, error: 52 - Unable to locate a Local Database Runtime installation. Verify that SQL Server Express is properly installed and that the Local Database Runtime feature is enabled.) ---> System.ComponentModel.Win32Exception: The system cannot find the file specified
But what's weird is that this error mentions the Local Database Runtime and SQL Server Express and I am not references either anywhere in my code. The system points at an Azure SQL DB. The job is ADO.Net and I have hardcoded the connection string to try to eliminate any issues with configuration based connection strings. But what's weird is that it only happens to a certain portion of the messages. The others process perfectly.
Lastly, I ran the job in debug locally (still pointing at the real queue and DB on Azure) and got the same problem. But the job outputs a console line with the job ID as the first line of the code. For those jobs that process successfully, I see this writeline. For those that fail, I never see anything. It's almost like the job is not really starting up correctly. (the failed jobs also have a really short run time 50-100ms)
I had the same issue with some jobs and I've came accross these articles to find a solution:
Transient Fault Handling (Building Real-World Cloud Apps with Azure)
Connection Resiliency / Retry Logic (EF6 onwards)
From theses articles :
Causes of transient failures :
In the cloud environment you’ll find that failed and dropped database connections happen periodically. That’s partly because you’re going through more load balancers compared to the on-premises environment where your web server and database server have a direct physical connection. Also, sometimes when you’re dependent on a multi-tenant service you’ll see calls to the service get slower or time out because someone else who uses the service is hitting it heavily. In other cases you might be the user who is hitting the service too frequently, and the service deliberately throttles you – denies connections – in order to prevent you from adversely affecting other tenants of the service.
Use smart retry/back-off logic to mitigate the effect of transient failures:
The Microsoft Patterns & Practices group has a Transient Fault Handling Application Block that does everything for you if you’re using ADO.NET for SQL Database access (not through Entity Framework). You just set a policy for retries – how many times to retry a query or command and how long to wait between tries – and wrap your SQL code in a using block :
public void HandleTransients()
{
var connStr = "some database";
var _policy = RetryPolicy.Create < SqlAzureTransientErrorDetectionStrategy(
retryCount: 3,
retryInterval: TimeSpan.FromSeconds(5));
using (var conn = new ReliableSqlConnection(connStr, _policy))
{
// Do SQL stuff here.
}
}
When you use the Entity Framework you typically aren’t working directly with SQL connections, so you can’t use this Patterns and Practices package, but Entity Framework 6 builds this kind of retry logic right into the framework. In a similar way you specify the retry strategy, and then EF uses that strategy whenever it accesses the database.
To use this feature in the Fix It app, all we have to do is add a class that derives from DbConfiguration and turn on the retry logic.
// EF follows a Code based Configuration model and will look for a class that
// derives from DbConfiguration for executing any Connection Resiliency strategies
public class EFConfiguration : DbConfiguration
{
public EFConfiguration()
{
AddExecutionStrategy(() => new SqlAzureExecutionStrategy());
}
}

Connection to SQL Azure DB is unstable

I'm having a strange error with a recently deployed Azure website.
Everything seems to work most of the time, but on a regular basis (at least daily) there is a period during which I receive following error:
A network-related or instance-specific error occured while establishing a connection to SQL Server.
The server was not found or was not accessible.
Is this a stability issue with Azure or is it possible that something's wrong in my code (but why does it work then most of the time)?
Is the code using the Transient Fault Handling Application Block - http://msdn.microsoft.com/en-us/library/hh680934(v=PandP.50).aspx? This block understands how to handle the transient errors that can, and will, happen with SQL Database.

SymmetricDS and Azure SQL Server

I need help connecting Azure database using SymmetricDS 3.5.1. I can't seem to the configuration correct. I get an error saying "Cannot create PoolableConnectionFactory" with the message either "socket closed" (when I don't specify ssl parameter) or "login timeout" (when I specify the ssl parameter). I have specified a timeout amount in the connection string, however, it does not seem to work and defaults to 30 seconds. Is there any documentation on how to connect to an Azure database using SymmetricDS? Anyway, take a look and tell me what I need to change in my engine.properties file? I have the following:
db.url=jdbc:jtds:sqlserver://MyServer.database.windows.net:1433;database=MyDatabase;user=MyUser#MyServer;password=MyPassowrd;encrypt=true;hostNameInCertificate=*.database.windows.net;loginTimeout=300;useCursors=true;bufferMaxMemory=10240;lobBuffer=5242880;ssl=require
db.user=MyUser#MyServer
db.database=MyDatabase
db.password=MyPassword
db.driver=net.sourceforge.jtds.jdbc.Driver
It turns out you have to use the Microsoft JDBC driver. I didn't see any documentation on how to set it up so for the sake of others this is what I did after reading http://www.symmetricds.org/docs/how-to/connect-to-database
Download the Microsoft jdbc driver
Put the sqljdbc4.jar file in the lib folder of your symmetric folder
Change the *.properties file to be the following connection information...
db.driver=com.microsoft.sqlserver.jdbc.SQLServerDriver
db.url=jdbc:sqlserver://{your_server_name}.database.windows.net:1433;database={database_name};user={user}#{your_server_name};password={password};encrypt=true;hostNameInCertificate=*.database.windows.net;loginTimeout=300;useCursors=true;bufferMaxMemory=10240;lobBuffer=5242880;

WebSphere MQ Security Authentication Exception on Unix

We have our application running on a Sun Solaris system and have a local WebSphere MQ installation. The applcation uses bindings mode to connect to queue manager. When trying to send message to the local queue, the JNDI binding is successfull but we encounter javax.jms.JMSSecurityException: MQJMS2013: invalid security authentication supplied for MQQueueManager error. When investigated found that the credentials (userid) used for authentication is not case sensitive as the user on which the application is running. The userid matches but it is not a case sensitive match. By default the user on which the application is running will be passed for authentication, but here the case sensitive match is failing. The application server is WebLogic. Appreciate any inputs.
In order to open the local queue, the application must have first connected to the queue manager successfully. The error on the remote queue is a connection error so it is not even getting to the queue manager. This suggests that you are using different connection factories and that the second one has some differences in the connectivity parameters. First step is to reconcile those differences.
Also, a MQJMS2013 Security Error can be many things, most of which are not actually MQ issues. For example some people store their managed objects in LDAP and an authentication problem there will throw this error. For people who use a filesystem-based JNDI, OS file permissions can cause the same thing. However if it is an actual WMQ issue (which this appears to be) then the linked exception will contain the MQ Reason Code (for example, MQRC=2035). If you want to be able to better diagnose MQ (or for that matter any JMS transport) issues, it pays to get in the habit of printing linked exceptions.
If you are not able to resolve this issue based on this input, I would advise updating the question with details of the managed object definitions and the reason code obtained from printing the linked exceptions.
We were using createQueueConnection() in QueueConnectionFactory for creating the connection and the issue got resolved by using the method createQueueConnection("",""). The unix userid (webA) is case sensitive and the application was trying to authenticate on the MQ with the case insensitive userid (weba) and MQ queue manager was rejecting the connection attempt. Can you tell us why the application was sending the case insensitive userid (weba) earlier?
Thanks,
Arun

Resources