How to see errors from a Python script run by an rsyslog action

This is my action in rsyslog.conf:
module(load="omprog")
if( $msg contains "UPDOWN") then {
action(type="omprog" binary="/etc/rsyslog.d/netmiko.py" template="RSYSLOG_TraditionalFileFormat")
}
This is the python script I am working on:
pattern = re.compile('GigabitEthernet0\/\d{1,2}')

def process_line(line):
    state = ''
    if 'to up' in line:
        state = f'UP\n'
    elif 'to down' in line:
        state = f'DOWN\n'
    file = open("/home/blinky/python.log","a")
    result = re.findall(pattern, line)
    if len(result) > 0:
        file.write(f'{result} - {state}')
    file.close()

try:
    msg = sys.stdin.readline()
    file = open("/home/blinky/python.log","a")
    file.write(line)
    file.close()
    process_line(msg)
except Exception as e:
    file = open("/etc/rsyslog.d/python_error.log","a")
    file.write(e)
    file.close()
The issue is debugging the Python script: I cannot see any of the errors it produces. As you can see, I am trying to write the exception to a file, but nothing appears there either. This is what the log file shows when I do a shut / no shut on the switch port:
Nov 20 21:50:39 10.0.0.254 1281: Nov 20 21:50:38.013: %LINK-5-CHANGED: Interface GigabitEthernet0/14, changed state to administratively down
Nov 20 21:50:39 10.0.0.254 1282: Nov 20 21:50:39.013: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/14, changed state to down
Nov 20 21:50:39 repperio rsyslogd: omprog: program '/etc/rsyslog.d/netmiko.py' (pid 2006160) terminated; will be restarted [v8.2112.0 try https://www.rsyslog.com/e/2119 ]
Nov 20 21:50:39 repperio rsyslogd: action 'action-1-omprog' suspended (module 'omprog'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2112.0 try https://www.rsyslog.com/e/2007 ]
Nov 20 21:50:40 repperio rsyslogd: action 'action-1-omprog' resumed (module 'omprog') [v8.2112.0 try https://www.rsyslog.com/e/2359 ]
Nov 20 21:50:43 10.0.0.254 1283: Nov 20 21:50:42.756: %LINK-3-UPDOWN: Interface GigabitEthernet0/14, changed state to up
Nov 20 21:50:43 10.0.0.254 1284: Nov 20 21:50:43.756: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/14, changed state to up
Nov 20 21:50:43 repperio rsyslogd: child process (pid 2006317) exited with status 1 [v8.2112.0]
Nov 20 21:50:43 repperio rsyslogd: omprog: program '/etc/rsyslog.d/netmiko.py' (pid 2006317) terminated; will be restarted [v8.2112.0 try https://www.rsyslog.com/e/2119 ]
Nov 20 21:50:43 repperio rsyslogd: action 'action-1-omprog' suspended (module 'omprog'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2112.0 try https://www.rsyslog.com/e/2007 ]
Nov 20 21:50:44 repperio rsyslogd: action 'action-1-omprog' resumed (module 'omprog') [v8.2112.0 try https://www.rsyslog.com/e/2359 ]
The rsyslog rule watches the Cisco switch for interfaces going up and down and triggers the Python script, which in turn will alter the configuration of the switch port using Netmiko. Without the ability to debug the Python script I am scuppered. Any ideas?
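Reading only the code shown above, two things would each hide the error. Inside the try block, file.write(line) uses a name that exists only inside process_line, so it raises NameError. The handler then calls file.write(e) with an exception object, which raises TypeError because write() expects a string, so nothing reaches python_error.log and the process exits with status 1. In addition, omprog keeps the program running and feeds it one message per line on stdin, so a script that reads a single line and returns is reported as terminated. Below is a minimal sketch of a loop that logs full tracebacks instead. The names describe and run, and passing the paths as arguments, are illustrative assumptions rather than part of the original script; the error file must be writable by the user rsyslog runs as.

```python
import re
import sys
import traceback

PATTERN = re.compile(r"GigabitEthernet0/\d{1,2}")


def describe(line):
    """Return 'interface - STATE' for an up/down message, or None otherwise."""
    match = PATTERN.search(line)
    if match is None:
        return None
    if "to up" in line:
        state = "UP"
    elif "to down" in line:
        state = "DOWN"
    else:
        return None
    return f"{match.group(0)} - {state}"


def run(stream, log_path, error_path):
    """Process messages until EOF, writing any traceback to error_path."""
    for line in stream:
        try:
            result = describe(line)
            if result is not None:
                with open(log_path, "a") as log:
                    log.write(result + "\n")
        except Exception:
            # format_exc() returns a string, so it can be written directly.
            with open(error_path, "a") as err:
                err.write(traceback.format_exc())


# In the real script (paths are placeholders):
# run(sys.stdin, "/home/blinky/python.log", "/home/blinky/python_error.log")
```

omprog also documents an output parameter that saves the program's stdout and stderr to a file; it may be worth checking whether your rsyslog version supports it.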

Is it possible that any Userspace application can call our driver routines without opening the /dev interface node

Suppose I have implemented file_operations such as read, write, open, release, flush, etc., and I have written a userspace application that calls these routines. With a character driver, the userspace application communicates through the /dev interface node, for example /dev/diagnostics_1000_1-4:2.1.
I am a bit surprised that other applications call our driver routines, and we have no control over those applications.
Do they really make a flush system call that is directly or indirectly mapped to our ".flush" function pointer?
Snippet below -
[Wed Aug 16 23:07:02 2022] UserspaceOpen:448 PID = 291098, Pname = MyApp
[Wed Aug 16 23:07:02 2022] Diagnostics_1000: UserspaceOpen:461 - Interface is 3
....
[Tue Aug 16 23:07:38 2022] UserspaceRead:1182 PID = 25460, Pname = MyApp
[Tue Aug 16 23:07:38 2022] UserspaceFlush:622 PID = 25470, Pname = lsb_release
[Tue Aug 16 23:07:38 2022] UserspaceFlush:631 Wrong process:: PID = 25470, Pname = lsb_release, and pDev->pName = MyApp
[Tue Aug 16 23:07:38 2022] UserspaceFlush:622 PID = 25469, Pname = sh
[Tue Aug 16 23:07:38 2022] UserspaceFlush:631 Wrong process:: PID = 25469, Pname = sh, and pDev->pName = MyApp
[Tue Aug 16 23:07:38 2022] UserspaceWrite:1394 PID = 25463, Pname = MyApp
[Tue Aug 16 23:07:38 2022] UserspaceWrite_bulk_callback:1241 PID = 25428, Pname = VizCompositorTh
[Tue Aug 16 23:07:39 2022] UserspaceRead:1182 PID = 25460, Pname = MyApp
You can see that MyApp opens the interface "diagnostics_1000_1-4:2.1", but the UserspaceFlush driver routine is also called by the lsb_release and sh processes in the middle of the operation, which breaks the code flow. The lsb_release and sh processes have not opened the interface, yet somehow they trigger the flush operation.
We fixed the code by comparing the process name: the routine continues if it matches and otherwise returns an error code.
UserspaceFlush:631 Wrong process:: PID = 25469, Pname = sh, and pDev->pName = MyApp
Is there a design flaw? I want to understand what I am missing conceptually and how we can make this secure.
How can we make sure that the flush routine is always called by the same process, i.e. MyApp, and that the file descriptor is closed by the same application, since the files are actually opened only by MyApp?
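One likely explanation, offered as an assumption because the source does not show how sh and lsb_release are started: if MyApp spawns them (for example through system() or popen()), the children inherit copies of MyApp's open descriptors. The kernel invokes a driver's .flush each time a process closes its copy of a descriptor, whereas .release runs only once the last reference is gone, so a child can reach flush without ever calling open. The sketch below demonstrates only the inheritance mechanism, in Python on an ordinary temporary file rather than on the driver; child_inherits and CHILD are hypothetical names.

```python
import os
import subprocess
import sys
import tempfile

# Child program: exit 0 only if the given descriptor number is open and refers
# to the same file (device and inode) that the parent opened.
CHILD = """
import os, sys
fd, dev, ino = (int(a) for a in sys.argv[1:4])
try:
    st = os.fstat(fd)
except OSError:
    sys.exit(1)
sys.exit(0 if (st.st_dev, st.st_ino) == (dev, ino) else 1)
"""


def child_inherits(inheritable):
    """Open a file, spawn a child, and report whether the child holds the descriptor."""
    with tempfile.TemporaryFile() as tmp:
        fd = tmp.fileno()
        # A non-inheritable descriptor has the close-on-exec flag set.
        os.set_inheritable(fd, inheritable)
        st = os.fstat(fd)
        args = [sys.executable, "-c", CHILD,
                str(fd), str(st.st_dev), str(st.st_ino)]
        # close_fds=False lets the inheritable flag alone decide what the child sees.
        return subprocess.run(args, close_fds=False).returncode == 0


print(child_inherits(True), child_inherits(False))
```

If this is what happens here, the driver's flush has to tolerate being called more than once and from processes other than the opener; per-open cleanup usually belongs in .release.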

Running background tasks through dramatiq does not work

I'm trying to run background task processing. Redis and RabbitMQ run in separate Docker containers.
@dramatiq.actor(store_results=True)
def count_words(url):
    try:
        response = requests.get(url)
        count = len(response.text.split(" "))
        print(f"There are {count} words at {url!r}.")
    except requests.exceptions.MissingSchema:
        print(f"Message dropped due to invalid url: {url!r}")

result_backend = RedisBackend(host="172.17.0.2", port=6379)
result_broker = RabbitmqBroker(host="172.17.0.5", port=5672)
result_broker.add_middleware(Results(backend=result_backend))
dramatiq.set_broker(result_broker)
message = count_words.send('https://github.com/Bogdanp/dramatiq')
print(message.get_result(block=True))
RabbitMQ:
{"queue_name":"default","actor_name":"count_words","args":["https://github.com/Bogdanp/dramatiq"],"kwargs":{},"options":{},"message_id":"8e10b6ef-dfef-47dc-9f28-c6e07493efe4","message_timestamp":1608877514655}
Redis
1:C 22 Dec 2020 13:38:15.415 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 22 Dec 2020 13:38:15.417 * Running mode=standalone, port=6379.
1:M 22 Dec 2020 13:38:15.417 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 22 Dec 2020 13:38:15.417 # Server initialized
1:M 22 Dec 2020 13:38:15.417 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 22 Dec 2020 13:38:15.417 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
1:M 25 Dec 2020 10:08:12.274 * Background saving terminated with success
1:M 26 Dec 2020 19:23:59.445 * 1 changes in 3600 seconds. Saving...
1:M 26 Dec 2020 19:23:59.660 * Background saving started by pid 24
24:C 26 Dec 2020 19:23:59.890 * DB saved on disk
24:C 26 Dec 2020 19:23:59.905 * RDB: 4 MB of memory used by copy-on-write
1:M 26 Dec 2020 19:23:59.961 * Background saving terminated with success
Error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/dramatiq/message.py", line 147, in get_result
    return backend.get_result(self, block=block, timeout=timeout)
  File "/usr/local/lib/python3.6/dist-packages/dramatiq/results/backends/redis.py", line 81, in get_result
    raise ResultTimeout(message)
dramatiq.results.errors.ResultTimeout: count_words('https://github.com/Bogdanp/dramatiq')

Torque cannot communicate with host

I have been attempting to set up the Torque scheduler for a small cluster. I followed the steps to set up the scheduler from http://docs.adaptivecomputing.com/torque/archive/3-0-2/1.2configuring_torque_on_server.php
However, when I attempt
qterm -t quick
I get the following error
$ sudo qterm -t quick
Unable to communicate with Terra(192.168.1.25)
Cannot connect to specified server host 'Terra'.
qterm: could not connect to server '' (111) Connection refused
but the server starts just fine. However, when I submit a job that should run on multiple nodes, such as
qsub -l nodes=2:ppn=4 /home/user/scripts/someScript
it prints out something like
7.Terra
where Terra is the name of the head node, which is also a node in the cluster. This isn't the problem. The problem is that the job does not run, nor does it produce any output anywhere.
The torque server log: https://ptpb.pw/EaKo
The terra node log: https://ptpb.pw/9w5M
and the Marte log: https://ptpb.pw/o4PT
I can get it to run with a PBS script, but only with one node:
#!/bin/bash
#PBS -l pmem=1gb,nodes=1:ppn=4
#PBS -m abe
cd Documents/
wc -l largeTest.csv
Here is the output of qstat after submitting a job:
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
16.Terra                  testPerformance  justin                 0 R batch
The output of pbsnodes -a:
Terra
     state = free
     power_state = Running
     np = 4
     properties = Tower
     ntype = cluster
     status = opsys=linux,uname=Linux Terra 4.17.14-arch1-1-ARCH #1 SMP PREEMPT Thu Aug 9 11:56:50 UTC 2018 x86_64,sessions=11525 22029,nsessions=2,nusers=1,idletime=57964,totmem=8111556kb,availmem=7539284kb,physmem=8111556kb,ncpus=4,loadave=0.00,gres=,netload=30570521372,state=free,varattr= ,cpuclock=Fixed,macaddr=e0:3f:49:44:72:20,version=6.1.1.1,rectime=1534937388,jobs=
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 1

Marte
     state = free
     power_state = Running
     np = 4
     properties = NFSServer
     ntype = cluster
     status = opsys=linux,uname=Linux Marte 4.18.1-arch1-1-ARCH #1 SMP PREEMPT Wed Aug 15 21:11:55 UTC 2018 x86_64,sessions=366 556 563,nsessions=3,nusers=2,idletime=58140,totmem=7043404kb,availmem=6703808kb,physmem=7043404kb,ncpus=4,loadave=0.02,gres=,netload=36500663511,state=free,varattr= ,cpuclock=Fixed,macaddr=c8:5b:76:4a:65:91,version=6.1.1.1,rectime=1534937359,jobs=
     mom_service_port = 15002
     mom_manager_port = 15003
and the /var/spool/torque/server_priv/nodes file:
Terra np=4 gpus=1 Tower
Marte np=4 NFSServer
Edit: Here are the most recent logs as well
Mom Log for Node: https://ptpb.pw/DhKi
Mom Log for head node: https://ptpb.pw/MTlD
and the server log: https://ptpb.pw/HPkE

Cannot add members to MongoDB Replica Set

I'm trying to configure a MongoDB Replica Set but every time I try to add another member it fails.
I have 3 members I'm trying to configure. Their mongod.conf files all look like this:
# mongo.conf
#where to log
logpath=/log/mongod.log
logappend=true
# fork and run in background
fork = true
smallfiles=true
rest=true
port = 27017
replSet=KidzpaceReplSet
dbpath=/data
with the exception of the ports. They are 27017 (Primary), 27018 (Secondary) and 27019 (Arbiter) respectively.
I have verified that the members can see each other:
[ec2-user@domU-12-31-39-06-C4-74 ~]$ mongo --host 174.129.232.170 --port 27018
MongoDB shell version: 2.4.3
connecting to: 174.129.232.170:27018/test
>
[ec2-user@domU-12-31-39-0A-30-E8 ~]$ mongo --host 174.129.230.20 --port 27017
MongoDB shell version: 2.4.3
connecting to: 174.129.230.20:27017/test
>
When adding the second member to the set it returns OK:
KidzpaceReplSet:PRIMARY> rs.add("174.129.232.170:27018")
{ "ok" : 1 }
However, whatever command I run next (in this case, adding my Arbiter), the set fails with this error:
KidzpaceReplSet:PRIMARY> rs.add("174.129.232.177:27019", true)
Tue May 28 20:24:07.139 DBClientCursor::init call() failed
Tue May 28 20:24:07.140 trying reconnect to 127.0.0.1:27017
Tue May 28 20:24:07.141 reconnect 127.0.0.1:27017 ok
reconnected to server after rs command (which is normal)
This is the log file:
Tue May 28 20:44:06.173 [rsStart] replSet I am domU-12-31-39-06-C4-74:27017
Tue May 28 20:44:06.173 [rsStart] replSet STARTUP2
Tue May 28 20:44:07.175 [rsSync] replSet SECONDARY
Tue May 28 20:44:07.175 [rsMgr] replSet info electSelf 0
Tue May 28 20:44:08.174 [rsMgr] replSet PRIMARY
Tue May 28 20:44:29.813 [conn1] replSet replSetReconfig config object parses ok, 2 members specified
Tue May 28 20:44:29.817 [conn1] replSet replSetReconfig [2]
Tue May 28 20:44:29.817 [conn1] replSet info saving a newer config version to local.system.replset
Tue May 28 20:44:29.834 [conn1] replSet saveConfigLocally done
Tue May 28 20:44:29.834 [conn1] replSet info : additive change to configuration
Tue May 28 20:44:29.834 [conn1] replSet replSetReconfig new config saved locally
Tue May 28 20:44:39.835 [rsHealthPoll] DBClientCursor::init call() failed
Tue May 28 20:44:39.835 [rsHealthPoll] replset info 174.129.232.170:27018 heartbeat failed, retrying
Tue May 28 20:44:40.834 [rsHealthPoll] DBClientCursor::init call() failed
Tue May 28 20:44:40.834 [rsHealthPoll] replSet info 174.129.232.170:27018 is down (or slow to respond):
Tue May 28 20:44:40.835 [rsHealthPoll] replSet member 174.129.232.170:27018 is now in state DOWN
Tue May 28 20:44:40.835 [rsMgr] replSet total number of votes is even - add arbiter or give one member an extra vote
Tue May 28 20:44:40.835 [rsMgr] can't see a majority of the set, relinquishing primary
Tue May 28 20:44:40.835 [rsMgr] replSet relinquishing primary state
Tue May 28 20:44:40.835 [rsMgr] replSet SECONDARY
Tue May 28 20:44:40.835 [rsMgr] replSet closing client sockets after relinquishing primary
Tue May 28 20:44:42.044 [conn1] end connection 127.0.0.1:58727 (0 connections now open)
Tue May 28 20:44:46.150 [rsHealthPoll] replSet member 174.129.232.170:27018 is up
Tue May 28 20:44:46.151 [rsMgr] replSet not electing self, not all members up and we have been up less than 5 minutes
Tue May 28 20:44:52.156 [rsMgr] replSet not electing self, not all members up and we have been up less than 5 minutes
UPDATE
I'm wondering if maybe the problem is when I run rs.initiate(). It gives me this output:
{
    "set" : "KidzpaceReplSet",
    "date" : ISODate("2013-05-28T20:59:05Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "domU-12-31-39-06-C4-74:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 23,
            "optime" : {
                "t" : 1369774732,
                "i" : 1
            },
            "optimeDate" : ISODate("2013-05-28T20:58:52Z"),
            "self" : true
        }
    ],
    "ok" : 1
}
Notice the name of the member? "name" : "domU-12-31-39-06-C4-74:27017" Where does this name come from? It's not my IP Address. I'm not sure but maybe this could be the source of the problem.
So it turns out rs.initiate() might register the member that launches it under some kind of internal alias (its hostname) instead of its IP address. In my case it was: domU-12-31-39-06-C4-74.
The initial connection to the secondary is fine because the primary initiates it. However, since the secondary is then given this alias to use when it tries to talk back to the primary, it fails.
The solution was to copy the existing configuration:
cfg = rs.conf()
manually change the name (host) of the primary node:
cfg.members[0].host = "666.666.666.666:27017"
And reconfigure the replica set:
rs.reconfig(cfg)
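The same change can be pictured on the configuration document itself. This is only an illustrative sketch on a plain Python dict: with_member_host is a hypothetical helper, the "version" value is assumed, and the address 174.129.230.20:27017 is taken from the connectivity check earlier in the question as a guess at the primary's reachable address. It does not talk to MongoDB.

```python
import copy


def with_member_host(cfg, member_id, host):
    """Return a copy of a replica set config with one member's host replaced."""
    new_cfg = copy.deepcopy(cfg)
    for member in new_cfg["members"]:
        if member["_id"] == member_id:
            member["host"] = host
            return new_cfg
    raise KeyError(f"no member with _id {member_id}")


# Shape of the document returned by rs.conf(); values are assumed for illustration.
cfg = {
    "_id": "KidzpaceReplSet",
    "version": 2,
    "members": [
        {"_id": 0, "host": "domU-12-31-39-06-C4-74:27017"},
        {"_id": 1, "host": "174.129.232.170:27018"},
    ],
}

fixed = with_member_host(cfg, 0, "174.129.230.20:27017")
print(fixed["members"][0]["host"])
```

Only the host of member 0 changes; the other members and the original dict are left untouched.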

SoapUI testRunner.getStatus() returns the status "RUNNING" indefinitely

In SoapUI, I execute a SOAP request test step (which is under a test suite -> test case)
through testRunner.runTestStepByName("Soap Request Name")
and wait 10 seconds after the request has executed, yet testRunner.getStatus() still returns the RUNNING status. Below is the Groovy script (which is under the same test suite -> test case):
import groovy.sql.Sql;
import com.eviware.soapui.model.testsuite.TestRunner.Status
testRunner.runTestStepByName("GetCitiesByCountry - Request 1")
sleep(10000)
log.info( "...${testRunner.getStatus()}...")
while ( testRunner.getStatus() == Status.RUNNING ) {
    log.info(testRunner.getStatus())
}
The output is below:
Wed Apr 17 21:06:22 IST 2013:INFO:RUNNING
Wed Apr 17 21:06:22 IST 2013:INFO:RUNNING
Wed Apr 17 21:06:22 IST 2013:INFO:RUNNING
Wed Apr 17 21:06:22 IST 2013:INFO:RUNNING
Wed Apr 17 21:06:22 IST 2013:INFO:RUNNING
Wed Apr 17 21:06:22 IST 2013:INFO:RUNNING
.
.
continuing for infinite time...
Ideally it should return FINISHED, since the test step above has been executed.
Thanks in advance for any help with this.
That is expected: the Groovy script is itself a step of the test case, so as long as you are in the loop, the test case is still 'running'. You can get the status of the step with this:
import com.eviware.soapui.model.testsuite.TestStepResult.TestStepStatus
myTestStepResult = testRunner.runTestStepByName("GetCitiesByCountry - Request 1")
myStatus = myTestStepResult.getStatus()
if (myStatus == TestStepStatus.OK)
    log.info "The step status is: " + myStatus.toString()
else
    log.error "The step status is: " + myStatus.toString()
Also, as the call to runTestStepByName is synchronous, there is no 'running' status, only 'CANCELED', 'FAILED', 'OK' or 'UNKNOWN'.
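The reasoning can be illustrated with a toy model. Everything below (ToyRunner, RunnerStatus, request_step, script_step) is hypothetical and written in Python purely for illustration; it is not the SoapUI API. It shows that a script step which synchronously runs another step gets that step's result right away, while the enclosing runner stays RUNNING until the script step itself returns.

```python
from enum import Enum


class RunnerStatus(Enum):
    INITIALIZED = "INITIALIZED"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"


class ToyRunner:
    """Hypothetical stand-in for a test case runner; not the SoapUI API."""

    def __init__(self, steps):
        self.steps = steps  # mapping: step name -> callable(runner)
        self.status = RunnerStatus.INITIALIZED

    def run_step_by_name(self, name):
        # Synchronous: returns only once the step has produced its result.
        return self.steps[name](self)

    def run(self):
        self.status = RunnerStatus.RUNNING
        results = {name: step(self) for name, step in self.steps.items()}
        self.status = RunnerStatus.FINISHED
        return results


def request_step(runner):
    return "OK"


def script_step(runner):
    step_result = runner.run_step_by_name("request")
    # The script is itself a step of the running test case, so polling the
    # runner here can only ever observe RUNNING.
    return step_result, runner.status


runner = ToyRunner({"request": request_step, "script": script_step})
results = runner.run()
print(results["script"], runner.status)
```

A loop inside script_step that waits for the runner to leave RUNNING would never end, which mirrors the behaviour in the question; checking the step result is the check that can succeed.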
See the doc here
