We have our application running on a Sun Solaris system and have a local WebSphere MQ installation. The applcation uses bindings mode to connect to queue manager. When trying to send message to the local queue, the JNDI binding is successfull but we encounter javax.jms.JMSSecurityException: MQJMS2013: invalid security authentication supplied for MQQueueManager error. When investigated found that the credentials (userid) used for authentication is not case sensitive as the user on which the application is running. The userid matches but it is not a case sensitive match. By default the user on which the application is running will be passed for authentication, but here the case sensitive match is failing. The application server is WebLogic. Appreciate any inputs.
In order to open the local queue, the application must have first connected to the queue manager successfully. The error on the remote queue is a connection error so it is not even getting to the queue manager. This suggests that you are using different connection factories and that the second one has some differences in the connectivity parameters. First step is to reconcile those differences.
Also, a MQJMS2013 Security Error can be many things, most of which are not actually MQ issues. For example some people store their managed objects in LDAP and an authentication problem there will throw this error. For people who use a filesystem-based JNDI, OS file permissions can cause the same thing. However if it is an actual WMQ issue (which this appears to be) then the linked exception will contain the MQ Reason Code (for example, MQRC=2035). If you want to be able to better diagnose MQ (or for that matter any JMS transport) issues, it pays to get in the habit of printing linked exceptions.
If you are not able to resolve this issue based on this input, I would advise updating the question with details of the managed object definitions and the reason code obtained from printing the linked exceptions.
We were using createQueueConnection() in QueueConnectionFactory for creating the connection and the issue got resolved by using the method createQueueConnection("",""). The unix userid (webA) is case sensitive and the application was trying to authenticate on the MQ with the case insensitive userid (weba) and MQ queue manager was rejecting the connection attempt. Can you tell us why the application was sending the case insensitive userid (weba) earlier?
Thanks,
Arun
Related
I have two Azure WebJobs. The first takes an incoming message that tells it to grab a PDF and break it into individual page images and then queue another message for the second WebJob to process the individual pages. It worked fine on our QC instance, but when we tried to move to production I started getting strange errors on the second job, but not consistently. The first job runs and breaks the file into page images. That is working fine. I have confirmed that every page image gets created and every page message gets queued. However, for the second job, only some of the messages are getting processed correctly. The remaining show this error in the WebJob diagnostics:
Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.ProcessBatchPage ---> System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: SQL Network Interfaces, error: 52 - Unable to locate a Local Database Runtime installation. Verify that SQL Server Express is properly installed and that the Local Database Runtime feature is enabled.) ---> System.ComponentModel.Win32Exception: The system cannot find the file specified
But what's weird is that this error mentions the Local Database Runtime and SQL Server Express and I am not references either anywhere in my code. The system points at an Azure SQL DB. The job is ADO.Net and I have hardcoded the connection string to try to eliminate any issues with configuration based connection strings. But what's weird is that it only happens to a certain portion of the messages. The others process perfectly.
Lastly, I ran the job in debug locally (still pointing at the real queue and DB on Azure) and got the same problem. But the job outputs a console line with the job ID as the first line of the code. For those jobs that process successfully, I see this writeline. For those that fail, I never see anything. It's almost like the job is not really starting up correctly. (the failed jobs also have a really short run time 50-100ms)
I had the same issue with some jobs and I've came accross these articles to find a solution:
Transient Fault Handling (Building Real-World Cloud Apps with Azure)
Connection Resiliency / Retry Logic (EF6 onwards)
From theses articles :
Causes of transient failures :
In the cloud environment you’ll find that failed and dropped database connections happen periodically. That’s partly because you’re going through more load balancers compared to the on-premises environment where your web server and database server have a direct physical connection. Also, sometimes when you’re dependent on a multi-tenant service you’ll see calls to the service get slower or time out because someone else who uses the service is hitting it heavily. In other cases you might be the user who is hitting the service too frequently, and the service deliberately throttles you – denies connections – in order to prevent you from adversely affecting other tenants of the service.
Use smart retry/back-off logic to mitigate the effect of transient failures:
The Microsoft Patterns & Practices group has a Transient Fault Handling Application Block that does everything for you if you’re using ADO.NET for SQL Database access (not through Entity Framework). You just set a policy for retries – how many times to retry a query or command and how long to wait between tries – and wrap your SQL code in a using block :
public void HandleTransients()
{
var connStr = "some database";
var _policy = RetryPolicy.Create < SqlAzureTransientErrorDetectionStrategy(
retryCount: 3,
retryInterval: TimeSpan.FromSeconds(5));
using (var conn = new ReliableSqlConnection(connStr, _policy))
{
// Do SQL stuff here.
}
}
When you use the Entity Framework you typically aren’t working directly with SQL connections, so you can’t use this Patterns and Practices package, but Entity Framework 6 builds this kind of retry logic right into the framework. In a similar way you specify the retry strategy, and then EF uses that strategy whenever it accesses the database.
To use this feature in the Fix It app, all we have to do is add a class that derives from DbConfiguration and turn on the retry logic.
// EF follows a Code based Configuration model and will look for a class that
// derives from DbConfiguration for executing any Connection Resiliency strategies
public class EFConfiguration : DbConfiguration
{
public EFConfiguration()
{
AddExecutionStrategy(() => new SqlAzureExecutionStrategy());
}
}
Ok, I want to check if I can run some OS or MQSC commands in MQ server remotely. As long as I know, this could be done with SYSTEM.ADMIN.SVRCONN. In order to do that, I add a remote Queue Manager to my WebSphere MQ client. I put Queue Manager name on the server with proper IP, but when I use SYSTEM.ADMIN.SVRCONN as channel name, I have got: Channel name not recognized (AMQ4871) error.
Also, if I have a channel name like MY.CHANNEL.NAME and it is a server-connection channel with mqm as its MCAUSER, can I run some commands (MQSC or OS) through this channel on the server?
Thanks.
Edit1
I am using WebSphere MQ v.7.0
By "I add a remote Queue Manager to my WebSphere MQ client" I meant I added a remote queue manager to MQ Explorer.
Edit2
I want to explain my question more precisely in this edit. I want to connect to a remote Qmanager via MQ Explorer. I know Qmanager name and its IP ofcourse. Also, the remote Qmanager has both SYSTEM.ADMIN.SVRCONN and SYSTEM.AUTO.SVRCONN channel available. When I check CHLAUTH for these channels, I've got:
AMQ8878: Display channel authentication record details.
CHLAUTH(SYSTEM.ADMIN.SVRCONN) TYPE(ADDRESSMAP)
ADDRESS(*) USERSRC(CHANNEL)
AMQ8878: Display channel authentication record details.
CHLAUTH(SYSTEM.*) TYPE(ADDRESSMAP)
ADDRESS(*) USERSRC(NOACCESS)
dis chl(SYSTEM.ADMIN.SVRCONN) MCAUSER
5 : dis chl(SYSTEM.ADMIN.SVRCONN) MCAUSER
AMQ8147: WebSphere MQ object SYSTEM.ADMIN.SVRCONN not found.
dis chl(SYSTEM.AUTO.SVRCONN) MCAUSER
6 : dis chl(SYSTEM.AUTO.SVRCONN) MCAUSER
AMQ8414: Display Channel details.
CHANNEL(SYSTEM.AUTO.SVRCONN) CHLTYPE(SVRCONN)
MCAUSER( )
As you can see here, I should be able to connect via these two channels and run some commands. But when I choose SYSTEM.ADMIN.SVRCONN as channel name in remote configuration I get : Channel name not recognized (AMQ4871) and when I choose SYSTEM.AUTO.SVRCONN as channel name, I get: You are not authorized to perform this operation (AMQ4036).
Any idea?
when I use SYSTEM.ADMIN.SVRCONN as channel name, I have got: Channel
name not recognized (AMQ4871) error.
Did you define the channel on the remote queue manager?
if I have a channel name like MY.CHANNEL.NAME and it is a
server-connection channel with mqm as its MCAUSER, can I run some
commands (MQSC or OS) through this channel on the server?
Sure. And so can anyone else. You should read up on MQ security and not expose your queue manager to hackers.
I add a remote Queue Manager to my WebSphere MQ client.
I'm not at all sure what this means exactly. MQ Explorer keeps a list of queue manager definitions. MQ Client is just a library for making connections.
If you meant you added a remote queue manager to MQ Explorer, then it makes sense. In addition to defining the connection in Explorer, you will also have to provision the connection at the queue manager. This means defining the SYSTEM.ADMIN.SVRCONN channel or one with a name of your choosing, defining and starting a listener. If you are on a 7.1 or higher queue manager (it's always good to list versions when asking about MQ), then you will also need to create a CHLAUTH rule to allow the connection, and another CHLAUTH rule to allow the connection with administrative privileges. either that or disable CHLAUTH rules altogether, but this is not recommended.
If I have a channel name like MY.CHANNEL.NAME and it is a server-connection channel with mqm as its MCAUSER, can I run some commands (MQSC or OS) through this channel on the server?
Maybe.
Out of the box, MQ denies all client connections. There are CHLAUTH rules to deny administrative connections, and other CHLAUTH rules to deny connections for any SYSTEM.* channel other than SYSTEM.ADMIN.SVRCONN. Since admin connections are denied, non-admin users must have access provisioned using SET AUTHREC or setmqaut commands before they can use SYSTEM.ADMIN.SVRCONN, hence MQ is said to be "secure by default."
When you create MY.CHANNEL.NAME and connect as an admin, and if CHLAUTH is enabled, the connection will be denied. You would have to add a new CHLAUTH rule such as...
SET CHLAUTH('MY.CHANNEL.NAME') TYPE(BLOCKUSER) USERLIST('*NOBODY') WARN(NO) ACTION(ADD)
...in order to allow the admin connection.
(Note: MQ CHLAUTH blocking rules use a blacklist methodology. The default rule blocks *MQADMIN from all channels. The rule I listed above overrides the default rule because the channel name is more specific, and it blocks *NOBODY - which is a list of user IDs that does not include mqm or any other administrative user ID. It's weird, but that's how it works.)
More on this topic at http://t-rob.net/links, and in particular Morag's conference presentation on CHLAUTH rules is a must-read.
20150506 Update
Response to edits #1 & #2 in the original question is as follows:
The first edit says that the QMgr is at v7.0 and the second shows that the QMgr had CHLAUTH records defined. Since CHLAUTH wasn't available until v7.1, these two statements are mutually exclusive - they cannot both be true. When providing the version of the MQ server or client, best to paste in the output of dspmqver. If the question pertains to GSKit, Java code or other components beyond the base code, then dspmqver -a would be even better.
The MQSC output provided in the question updates completely explains the errors.
dis chl(SYSTEM.ADMIN.SVRCONN) MCAUSER
5 : dis chl(SYSTEM.ADMIN.SVRCONN) MCAUSER
AMQ8147: WebSphere MQ object SYSTEM.ADMIN.SVRCONN not found.
As Morag notes, the SYSTEM.ADMIN.SVRCONN cannot be used because it has not been defined.
AMQ8878: Display channel authentication record details.
CHLAUTH(SYSTEM.*) TYPE(ADDRESSMAP)
ADDRESS(*) USERSRC(NOACCESS)
The auths error happens because any connection to any SYSTEM.* SVRCONN channel that is not expressly overridden is blocked by this rule. The CHLAUTH rule for SYSTEM.ADMIN.SVRCONN takes precedence because it is more explicit, and allows non-admin connections to that channel. The lack of a similar overriding rule for SYSTEM.AUTO.SVRCONN means it is denied by the existing rule for SYSTEM.* channels as listed above.
As noted previously, it is STRONGLY recommended to go to the linked web site and read Morag's conference presentation on MQ v7.1 security and CHLAUTH rules. It explains how the CHLAUTH rules are applied, how the precedence works, and perhaps most importantly, how to verify them with the MATCH(RUNCHECK) parameter.
To do the MQ Security right, you need at least the following:
Define a channel with a name that does not begin with SYSTEM.* and set MCAUSER('*NOBODY').
Define a CHLAUTH rule to allow connections to that channel, mapping the MCAUSER as required.
If you wish to connect to the channel with mqm or admin access, define CHLAUTH rules to authenticate the channel, preferably using TLS and certificates.
There are several things that you do not want to do, for reasons that are explained in Morag's and my presentations. These include...
Using SYSTEM.AUTO.SVRCONN for any legitimate connections.
For that matter, using SYSTEM.* anything (except for SYSTEM.ADMIN.SVRCONN or SYSTEM.BROKER.*) for legitimate connections.
Allowing unauthenticated admin connections.
Disabling CHLAUTH rules.
Disabling OAM.
I want people to learn MQ security, and to learn it really well. However, as a consultant who specializes in it I must tell you that even customers who have hired me to give on-site classes and help with their implementation have trouble getting it locked down. There are two insights to be gleaned from this fact.
First, if you do not enough time to get up to speed on MQ security then the implementation will not be secure. To study the topic to the point of understanding how all the pieces fit together well enough to devise a decent security model requires hands-on training with a QMgr that you can build, hammer at, tear down, build again, etc. takes weeks of dedicated hands-on study, or months or years of casual study. My advice here is to go get MQ Advanced for Developers. It is fully functional, free, and has a superset of the controls you have on the v7.1 or v7.5 QMgr you are working on now.
Second insight is that there is no shortcut to learning MQ (or any other IT) security. If it is approached as though it were simply a matter of configuration, then it is almost guaranteed to not be secure when implemented. If it is approached as learning all the different controls available for authentication, authorization, and policy enforcement, and then learning how they all interact, and if security is approached as a discipline of practice, then it is possible to achieve some meaningful security.
Addressing that second issue will require an investment in education. Read through the presentations and try out the various controls live on a test QMgr that you administer. Understand which errors go to which error logs, which generate event messages and which type of events are generated. Obtain some of the diagnostic tools in the SupportPacs, such as MS0P which is one of my favorites, and get familiar with them. Consider attending the MQ Tech Conference (where you can meet many of the folks responding here on SO) for more in-depth training.
If you (or your employer) are not ready to commit to in-depth skill building or are trying to learn this for an imminent project deadline, then consider hiring deep MQ Security skills on an as-needed basis because reliance on just-in-time on-the-job training for this topic is a recipe for an unsecure network.
I have several message handlers in a particular endpoint that do their work against a SQL Azure database (at the moment still using a local SQL 2012 instance). I have a command handler that publishes 2 events, call them X and Y. In the same endpoint I have a subscriber to X and a subscriber to Y. Both of these subscribers are internally using the same data access component, call that Z. Dependency injection is configured on a per-call basis, not shared.
Component Z is using Entity Framework 6 under the curtains. The issue I am having is that just opening the database is throwing a SqlException and complaining about MSDTC escalations.
I have temporarily wrapped the handlers in a TransactionScope.Suppress and that has stopped the error but I believe I'm missing something more fundamental.
Is it a simple matter of configuring the endpoint to be non-transactional? I would have thought this would just work seeing as I've configured to use Azure Service Bus as the transport mechanism. If I do this will NServiceBus still retry if an exception is thrown within the message handler? (Up to the SLR limits -- not part of the question, I also understand the idempotency issues).
#Phil,
First, you shouldn't be using MSDTC with SQL Azure - it's not supported. The feature is suggested, but only under review. DTC is not supported on Azure. Alternatively, you could look into the following suggestion to use SqlTransaction approach.
Second, transport you're using has nothing to do with your data access. Since you're using Azure Service Bus, it will not be part of your handler code. Making handler a transactional is to force an atomic change or roll-back. Regardless of your handler, will retry. Challenge is that when handler/endpoint is not transactional, and within handler first write to DB succeeded and second failed, first write won't be reverted. As for Azure Service Bus as a transport, it's not transactional in its nature (ie no DTC).
Which version of NServiceBus.Azure are you on? Do you have a stack trace of the exception? Where does it come from?
We push the sends and publishes out of the scope of the receive transaction scope explicitly to prevent promotion to the DTC, so that the transaction is local to the sql, so I doubt that is what is happening here.
From you description it looks like you are using a different data access instance for each handler (per call container config) and you have multiple handlers on the same message. If both of these open a new connection to the SQL you would see promotion as well (even if it is the same server)
Could that be it? That it throws on the second open?
I’m currently building an application using NServicebus and Azure.
The regular processes are working, but now I’d like to do more about the management and monitoring aspect of the application.
The customer wants to see a dashboard where he can see the health of the application and also be able to correct issues.
What I’d like to do is:
Detect when things are sent to an error queue (to be able to send an alert to an admin)
Allow admin to handle messages on error queue from management application, without
resorting to the provided command line tool.
Is there a way to programmatically do error handling in NServicebus? I know which errors are transient and which errors might need manual intervention.
Is it possible to plug in logic to the error handling logic of nservicebus?
Is it possible to handle messages on the error queue programmatically?
Thanks,
Erwin
Regarding "dashboard where he can see the health of the application and also be able to correct issues":
Please take a look at ServicePulse (http://particular.net/ServicePulse) for production and online monitoring.
This provides both endpoint health indicators and Failed message indicators (including "Retry" capabilities).
For advanced debugging and visualization of your process you should also consider ServiceInsight (http://particular.net/ServiceInsight).
Behind the scenes of ServicePulse there's the ServiceControl server which exposes REST HTTP API with programmatic access to audited and error messages.
HTH,
Danny.
We are currently experiencing a rather troublesome problem in our development environment with the following message...
A connection was successfully established with the server,
but then an error occurred during the pre-login handshake.
(provider: SSL Provider, error: 0 - The certificate's CN
name does not match the passed value.)
...the commonly accepted wisdom to resolving this problem is to set the TrustServerCertificate portion of the connection to True. However, this does not work reliably or consistently.
This particular error occurs in a number of instances, for instance testing our WCF Service in our Azure Emulator talking to live / hosted SQL Azure Instance or even using SQL Management Studio. The only common denominator we've found is that this occurs only when we connect directly to SQL Azure as opposed to when its hosted and Azure is talking directly to SQL Azure (which does work).
I've tried a number of tactics to resolve the problem (such as the one detailed here), i.e. believing it was connection related and removing pooling and other modifications to the connection string. But alas, none are conclusive and more irritating is that the error is intermittent and will prevent access for a short period of time before magically resolving itself.
Other factors that I've eliminated.
We're using the Transcient Application Block to attempt to recover from these errors, but no.
Our office has no proxy server with our connection to the Azure hosted services.
Has anyone else experienced this problem or has any suggestions?
You need to scan for Non-IFS Winsock BSPs or LSPs which not compatible with the FILE_SKIP_COMPLETION_PORT_ON_SUCCESS flag ,problem results primarily from non-IFS LSPs Being installed.
Just run "netsh WinSock Show Catalog" from command prompt , and check any "service flag" which doesn't look in the format of 0x20xxx
In my case I found that "Speed Accelerator" with service flag 0x66,removing this software solve my Problem .
More information can be found here : http://support.microsoft.com/kb/2568167
What does your connection string look like? Not sure if you've tried this yet but I remember having a problem similar when using a remote SQL connection to SQL Azure and found that I had to set:
Trusted_Connection=False;Encrypt=True
and remove any Connect Timeout from the string entirely.