Hybris business process in running state - cron

The order confirmation email does not get triggered after placing an order. Looking at the business processes for the order, none of them were started.
Things I have checked:
Restarted the admin node
Deleted all business processes that were stuck in the running state
Looked for any failed feeds in the hot folder
Searched for TaskExecutor-master in the Hybris thread dumps to identify which part was causing the issue; none found
Confirmed task.engine.loadonstartup = false
None of the above worked.
Please help me resolve this issue.

**Solved:
We had an issue with the cluster ID: all the nodes had the same cluster ID, caused by a wrong deployment. We remediated it, which fixed the issue.**
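For reference, each node in the cluster needs a distinct cluster ID. A minimal sketch of per-node settings in local.properties, assuming the standard hybris clustering properties (the values are illustrative):
# local.properties on node 1
clustermode=true
cluster.id=0
# local.properties on node 2 (cluster.id must be unique per node)
clustermode=true
cluster.id=1
With duplicate cluster.id values, cluster-aware components such as the task engine cannot tell the nodes apart, which would match the symptom of business processes never being started.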

Related

Zabbix trigger does not resolve for CentOS 7

I have Zabbix version 3.4 and 2 templates: one for monitoring the OS and the other for monitoring databases. I have a few servers with CentOS 6.9 added to these templates, and everything works just fine.
Then I added 4 servers with CentOS 7 to these templates. Items work correctly and return the expected results. The problem is that when a trigger is activated for these 4 servers, it doesn't resolve; it stays active and we see it on the dashboard.
For example, in the database template we have an item for service status: if it is 1, the service is running; if it is anything other than 1, the service is not running. I stopped the service on one of those CentOS 7 servers. The value the agent got was 0 and the trigger was activated. Then I started the service again. In Latest data I can see that the value is 1, which means the service is running, but the trigger didn't resolve and is still up.
Then I did the above steps on one of the CentOS 6.9 servers and everything worked just fine.
Why does this happen and how can I fix it?
Update:
the trigger expression is:
{log-b:db2stat.db2instance_service[].last()}<>1
Long story short: check the DB logs to see whether some inserts/updates are failing (especially in the event_recovery and problem tables).
Short story long:
We observed similar behaviour on ZBX 4.4, and only with certain triggers checking the last 10 minutes of data (e.g. item_key.str('problem',10m)=1). The problem gets detected, but later it will not get resolved even after several days, even though the trigger conditions are no longer matched.
In our particular case:
I looked into the DB and found the event with the appropriate eventid (e.g. 123) in the events table, and noted down its objectid (e.g. 100123)
Then I checked the events table for that specific objectid (100123) and found that there was indeed a "resolution" event (e.g. 125)
When I checked the event_recovery table, I couldn't find an entry matching those two eventids, while other triggers did have an entry in event_recovery after they got resolved (the lookup queries are sketched after this list)
I simply created the entry: insert into event_recovery (eventid, r_eventid) VALUES ('123', '125');
That is not enough, however, as a similar pairing needs to be adjusted in the problem table
In the problem table I found the row with my eventid (123) and simply mapped the recovery event to it: update problem set r_eventid='125' where eventid='123' and objectid='100123';
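The diagnostic lookups described above might look like this (table and column names per the standard Zabbix schema; the ids are the example values from the steps):
-- the problem event; note its objectid (the trigger id)
select eventid, objectid, clock, value from events where eventid = 123;
-- all events for that trigger, to spot the later OK/"resolution" event
select eventid, clock, value from events where objectid = 100123 order by clock;
-- the pairing that should exist once a problem resolves; no row here means the recovery link was lost
select * from event_recovery where eventid = 123 or r_eventid = 125;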
The problem with this is that it is not a solution, just a one-time workaround. The issue keeps popping up, and at this point we suspect the problem is on the database side (we have a primary+standby DB with selects directed to the standby, which can cause certain select operations that end up writing to fail, as the standby DB is in read-only mode).
We will try to redirect everything to the primary DB to see if it helps.

MongoDB direct connect works but replica set connection fails after server ran out of disk space

A development server I was using ran low on disk space, causing the system to crash. When I checked the replica set cluster, it reported one node as unreachable. I removed the bad node and forced the config. I went home for the day; the next day the status was still not good, saying unreachable for one of the nodes. I worked on something else, and later that day when I checked rs.status() it came back primary and secondary. I then added back the third node that had run out of space. Now I can connect to each node individually and the data looks OK, but I cannot connect to the replica set as a group from PHP/Node.js or Studio 3T. When I use the group connection it returns auth invalid, even though the same credentials work on each individual node.
Any ideas what could be going on and how to fix it?
What I needed to do was take down the three services making up the replica set in Docker Swarm. I redeployed them using my scripts with auth turned on. When I checked the replica status it returned host unreachable, but a few hours later that day it came back online. I was unable to get the replica set back online with rs.add()/rs.remove(), but I did get it back up and running by recreating the services.
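One thing worth ruling out with this symptom: when connecting to the set as a group, the set name and the authentication database both have to be in the URI. A standard MongoDB replica set connection string looks like this (hosts, set name, user and database below are hypothetical):
mongodb://appUser:secret@node1:27017,node2:27017,node3:27017/mydb?replicaSet=rs0&authSource=admin
If authSource is omitted, drivers authenticate against the database named in the path (mydb here), which can produce an auth failure on the set connection even though per-node logins work.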

Beanstalkd / Pheanstalk security issue

I have just started using beanstalkd and Pheanstalk, and I am curious whether the following situation is a security issue (and if not, why not?):
When designing a queue that will contain jobs for an eventual worker script to pick up and perform SQL database queries, I asked a friend what I could do to prevent an online user from connecting to port 11300 of my server and inserting a job into the queue himself, thereby causing malicious code to be executed. I was told that I could include a password inside the job being sent.
Though after some time passed, I realized that someone could perform a few simple commands in a terminal to obtain a job from the queue, find the password, and then create jobs with the password included:
telnet thewebsitesipaddress 11300 //creating a telnet connection
list-tubes //finding which tubes are currently being used
use a_tube_found //using one of the tubes found
peek-ready //see what's inside one of the jobs and find the password
What could be done to make sure this does not happen and my queue doesn't get hacked / controlled?
Thanks in advance!
You can avoid those situations by placing beanstalkd behind a firewall or on a private network.
DigitalOcean (for example) offers such a service, where you get a private network IP address that can be accessed only from servers in the same location.
We've been using beanstalkd at our company for more than a year, and we haven't had any of those issues yet.
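As a sketch of that advice, you can bind the daemon to a private interface and/or firewall the port (the addresses are illustrative):
beanstalkd -l 10.0.0.5 -p 11300   //listen only on the private interface instead of 0.0.0.0
iptables -A INPUT -p tcp --dport 11300 ! -s 10.0.0.0/24 -j DROP   //drop traffic to 11300 from outside the private subnet
beanstalkd has no authentication of its own, so anyone who can reach the port can run the telnet session shown in the question; network-level restriction is the intended protection.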
I see, but what if the producer was a page called index.php, where, when someone visits it, a job is sent to the queue? In this situation, wouldn't the server have to be on an open network?
The browser has no way to contact the job server; it only accesses the resources you allow it to, that is, the view page. Only the back end is allowed to access the job server. Also, if you build the web application in such a way that the front end is separated from the back end, you're going to have even fewer potential security issues.

Node.js (& MongoDB) server crashes, database operations left halfway?

I have a Node.js app with a MongoDB backend going to production in a week, and I have a few doubts about how to handle app crashes and restarts.
Say I have a simple route /followUser in which I have two database operations:
/followUser
----->Update User1 Document.followers = User2
----->Update User2 Document.followers = User1
----->Some other mongodb(via mongoose)operation
What happens if there is a server crash (due to a power failure, or maybe the remote MongoDB server being down), as in this scenario:
----->Update User1 Document.followers = User2
SERVER CRASHED , FOREVER RESTARTS NODE
What happens to the operations below? The system is now in an inconsistent state, and I may get an error every time I ask for User2's followers.
----->Update User2 Document.followers = User1
----->Some other mongodb(via mongoose)operation
Also, please recommend good logging and restart/monitor modules for apps running on Linux.
Right now I'm using domains to catch exceptions and calling server.close(), but before process.exit() I want to make sure all database operations are done. Can I check this by testing whether the event loop is empty (how?) and then call process.exit(1)?
You need transactions for this, and since MongoDB doesn't have them, here is a workaround: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
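A compressed sketch of that two-phase-commit pattern applied to the follow example (collection and field names like pendingTxns are illustrative, not the tutorial's exact code; db is a connected mongodb driver Db handle):
// 1. record the intent first, so a crash leaves evidence of the pending work
async function followUser(db, user1, user2) {
  const txns = db.collection('transactions');
  const users = db.collection('users');
  const { insertedId: txnId } = await txns.insertOne({ state: 'pending', from: user1, to: user2 });
  // 2. apply both sides; the $ne guard makes each update safe to re-run after a crash
  await users.updateOne({ _id: user1, pendingTxns: { $ne: txnId } },
    { $addToSet: { followers: user2 }, $push: { pendingTxns: txnId } });
  await users.updateOne({ _id: user2, pendingTxns: { $ne: txnId } },
    { $addToSet: { followers: user1 }, $push: { pendingTxns: txnId } });
  // 3. mark applied, untag the documents, then mark the transaction done
  await txns.updateOne({ _id: txnId }, { $set: { state: 'applied' } });
  await users.updateMany({ pendingTxns: txnId }, { $pull: { pendingTxns: txnId } });
  await txns.updateOne({ _id: txnId }, { $set: { state: 'done' } });
}
On restart, a recovery job re-drives any transaction whose state never reached 'done'; the guarded updates make re-running them safe.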
One way to address this problem is to add cleanup code to your application that runs whenever the application starts. You write the cleanup code to perform sanity checks on any portions of your data that can be updated in multiple steps, like your example, and then repair that data in whatever way makes sense for your application.
Depending on the complexity of your application/data, this may also require that you keep a log of actions the app was trying to perform, but that gets complicated fast. Ideally it's more a matter of refreshing denormalized data and deleting partial data.
You want to do this during startup rather than shutdown, as there's no guarantee your shutdown code will fully run, and if you're shutting down because of an exception you don't know what state your system is in at that point.
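A minimal sketch of such a startup check for the follow example, assuming a Mongoose User model shaped like { _id, followers: [ObjectId] } (model and field names are hypothetical):
// Runs once at startup: find one-sided follow relationships left behind by a
// crash between the two updates, and repair them.
async function repairFollowers(User) {
  const users = await User.find({}, { followers: 1 }).lean();
  const byId = new Map(users.map(u => [String(u._id), new Set((u.followers || []).map(String))]));
  for (const u of users) {
    for (const f of (u.followers || [])) {
      // u lists f as a follower, but f does not list u back: the second update never ran
      if (byId.has(String(f)) && !byId.get(String(f)).has(String(u._id))) {
        // completing the half-done follow; deleting it instead is equally valid,
        // whichever makes sense for the application
        await User.updateOne({ _id: f }, { $addToSet: { followers: u._id } });
      }
    }
  }
}
Loading every user works for small collections; beyond that you'd want an aggregation or the pending-actions log mentioned above.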
The solution given by vkurchatkin in this link is a workaround for when your app server crashes, because you will be able to know which transactions were pending at that moment. If you implement this in your code, you can create cleanup code for when your system restarts, as suggested by JohnnyHK. The code you mention (catching exceptions, testing when closing, etc.) will not work because... well... your server crashed! ;-)
That said, this is done using the database, so you will have to guarantee to a certain extent that your database does not crash. I would suggest you use replication for that. It is basically a cluster of servers that recovers itself if one node fails, and you can also add checks to make sure that the data reached the servers and is safe.
Hope this helps.

HTTP Error 503.2 - Service Unavailable. The serverRuntime#appConcurrentRequestLimit setting is being exceeded

I have an intranet Sitecore website set up on IIS 7 which randomly throws the following error message:
HTTP Error 503.2 - Service Unavailable
The serverRuntime#appConcurrentRequestLimit setting is being exceeded.
To fix this issue, I have made the following changes:
Increased the queue length of the application pool myrjetAppPool from 1000 to 65535.
Modified machine.config to increase the requestQueueLimit property of the processModel element to 100000.
Increased appConcurrentRequestLimit to 100000 by running:
C:\Windows\System32\inetsrv\appcmd.exe set config /section:serverRuntime /appConcurrentRequestLimit:100000
But I'm still getting the same error. Any help is greatly appreciated.
You might check to see where all your threads are going. We had occurrences where threads for Media Library assets were hanging and blocking up the queue.
In IIS Manager, select the server node in the tree, then the "Worker Processes" feature icon, then right-click the application pool of interest and select "View Current Requests". You might find something is getting stuck. I sometimes hit F5 on this screen a few dozen times in very quick succession to see the rate at which requests are going through (of course Performance Monitor is better for viewing metrics, but it won't tell you which URLs are being processed).
Investigate the references in the linked URL to 'MaxConcurrentRequestsPerCPU', which you may need to set by creating a new registry key, depending on your OS and framework.
https://learn.microsoft.com/en-us/archive/blogs/tmarq/asp-net-thread-usage-on-iis-7-5-iis-7-0-and-iis-6-0
As already commented, check the actual concurrent request count using performance counters to determine which limit you're hitting, i.e. it could be a limit of 5000, or maybe 12 per CPU.
Edit: I realise this may look like I'm talking about a different setting entirely, but I believe there is overlap here.
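If it does turn out to be the per-CPU ASP.NET limit, the override described in that article is a registry DWORD; a sketch of setting it (the exact key path depends on your .NET version, so verify against the article, and 10000 is just an illustrative value):
reg add "HKLM\SOFTWARE\Microsoft\ASP.NET\4.0.30319.0" /v MaxConcurrentRequestsPerCPU /t REG_DWORD /d 10000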
We got this problem after installing an IIS plugin. After a long investigation we saw that the config file C:\Windows\System32\inetsrv\config\applicationHost.config had an extra location tag for the site with the problem. After removing the extra entry and doing an iisreset, the site/server worked normally again. So something must have gone wrong during the installation...
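For anyone checking their own applicationHost.config for the same thing: a stray <location> block scoped to the site can silently override server-level settings, for example (site name and value are purely illustrative):
<location path="MySite">
  <system.webServer>
    <serverRuntime appConcurrentRequestLimit="10" />
  </system.webServer>
</location>
An override like this could pin the site to a far lower limit than the one set with appcmd, which would produce this 503.2 error regardless of the server-level setting.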