I am evaluating ArangoDB and figured out there are at least 3 ways to start the DB.
arangosh
arangod
arangodb
arangocli - this one would fall into a different category
This is all very confusing. Can someone clarify the different options and what the best way to start the engine is?
Thank you
These are not different ways to start ArangoDB. These are different tools that can be used with ArangoDB:
arangosh: This is a shell tool that can be used to administer the DB as well as run ad-hoc queries. It does not start a server; it only starts a shell that connects to a server (see the example just below this list).
arangod: This is the daemon that runs when you start ArangoDB. In fact, to start the server you usually run (at least on a Mac) /usr/local/Cellar/arangodb/3.4.0/sbin/arangod
arangodb: This is the DB install package. For example, on a Mac you would run brew install arangodb to install ArangoDB on your machine.
arangocli: This is simply a CLI tool to run ad-hoc queries.
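For example, a typical arangosh invocation against a locally running server might look like this (the endpoint and user are illustrative assumptions; 8529 is the default port):
arangosh --server.endpoint tcp://127.0.0.1:8529 --server.username root
Inside the shell, a quick db._version() then confirms the connection.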
arangod starts a single ArangoDB server, which is sufficient to run a Single Instance deployment. To run more complex deployments (e.g. Cluster, Active Failover) multiple arangod processes are required.
For example, a Cluster with 3 agents, 2 dbservers and 2 coordinators requires 8 arangod processes in total. More details can be found in the Cluster - Manual Start documentation.
To simplify deployment, it is recommended to use the arangodb starter tool. Running arangodb launches an arangod process for each dbserver, coordinator and agent. E.g. using arangodb --starter.local to start a local 3-node cluster starts 9 arangod processes internally: 3 dbservers, 3 coordinators and 3 agents. The running arangod processes can be checked with ps auxw | grep arangod.
Example output can be seen below:
ps auxw | grep arangod
max 8099 1.9 0.1 114712 8736 pts/0 Sl+ 14:47 0:00 arangodb --starter.local
max 8118 3.9 2.5 635016 203736 pts/0 Sl+ 14:47 0:00 /usr/sbin/arangod -c /home/max/Documents/starter/test/local-slave-2/agent8551/arangod.conf --database.directory /home/max/Documents/starter/test/local-slave-2/agent8551/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /home/max/Documents/starter/test/local-slave-2/agent8551/apps --log.file /home/max/Documents/starter/test/local-slave-2/agent8551/arangod.log --log.force-direct false --javascript.copy-installation true --agency.activate true --agency.my-address tcp://localhost:8551 --agency.size 3 --agency.supervision true --foxx.queues false --server.statistics false --agency.endpoint tcp://localhost:8531 --agency.endpoint tcp://localhost:8541
max 8165 10.5 2.6 676548 215908 pts/0 Sl+ 14:47 0:01 /usr/sbin/arangod -c /home/max/Documents/starter/test/agent8531/arangod.conf --database.directory /home/max/Documents/starter/test/agent8531/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /home/max/Documents/starter/test/agent8531/apps --log.file /home/max/Documents/starter/test/agent8531/arangod.log --log.force-direct false --javascript.copy-installation true --agency.activate true --agency.my-address tcp://localhost:8531 --agency.size 3 --agency.supervision true --foxx.queues false --server.statistics false --agency.endpoint tcp://localhost:8541 --agency.endpoint tcp://localhost:8551
max 8168 5.4 2.6 705236 216828 pts/0 Sl+ 14:47 0:00 /usr/sbin/arangod -c /home/max/Documents/starter/test/local-slave-2/dbserver8550/arangod.conf --database.directory /home/max/Documents/starter/test/local-slave-2/dbserver8550/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /home/max/Documents/starter/test/local-slave-2/dbserver8550/apps --log.file /home/max/Documents/starter/test/local-slave-2/dbserver8550/arangod.log --log.force-direct false --javascript.copy-installation true --cluster.my-address tcp://localhost:8550 --cluster.my-role PRIMARY --foxx.queues false --server.statistics true --cluster.agency-endpoint tcp://localhost:8531 --cluster.agency-endpoint tcp://localhost:8541 --cluster.agency-endpoint tcp://localhost:8551
max 8171 4.8 2.5 641160 204968 pts/0 Sl+ 14:47 0:00 /usr/sbin/arangod -c /home/max/Documents/starter/test/local-slave-1/agent8541/arangod.conf --database.directory /home/max/Documents/starter/test/local-slave-1/agent8541/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /home/max/Documents/starter/test/local-slave-1/agent8541/apps --log.file /home/max/Documents/starter/test/local-slave-1/agent8541/arangod.log --log.force-direct false --javascript.copy-installation true --agency.activate true --agency.my-address tcp://localhost:8541 --agency.size 3 --agency.supervision true --foxx.queues false --server.statistics false --agency.endpoint tcp://localhost:8531 --agency.endpoint tcp://localhost:8551
max 8302 6.4 2.6 696532 216668 pts/0 Sl+ 14:47 0:00 /usr/sbin/arangod -c /home/max/Documents/starter/test/dbserver8530/arangod.conf --database.directory /home/max/Documents/starter/test/dbserver8530/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /home/max/Documents/starter/test/dbserver8530/apps --log.file /home/max/Documents/starter/test/dbserver8530/arangod.log --log.force-direct false --javascript.copy-installation true --cluster.my-address tcp://localhost:8530 --cluster.my-role PRIMARY --foxx.queues false --server.statistics true --cluster.agency-endpoint tcp://localhost:8531 --cluster.agency-endpoint tcp://localhost:8541 --cluster.agency-endpoint tcp://localhost:8551
max 8304 26.7 2.4 744684 199052 pts/0 Sl+ 14:47 0:02 /usr/sbin/arangod -c /home/max/Documents/starter/test/local-slave-2/coordinator8549/arangod.conf --database.directory /home/max/Documents/starter/test/local-slave-2/coordinator8549/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /home/max/Documents/starter/test/local-slave-2/coordinator8549/apps --log.file /home/max/Documents/starter/test/local-slave-2/coordinator8549/arangod.log --log.force-direct false --javascript.copy-installation true --cluster.my-address tcp://localhost:8549 --cluster.my-role COORDINATOR --foxx.queues true --server.statistics true --cluster.agency-endpoint tcp://localhost:8531 --cluster.agency-endpoint tcp://localhost:8541 --cluster.agency-endpoint tcp://localhost:8551
max 8306 5.8 2.6 711380 215200 pts/0 Sl+ 14:47 0:00 /usr/sbin/arangod -c /home/max/Documents/starter/test/local-slave-1/dbserver8540/arangod.conf --database.directory /home/max/Documents/starter/test/local-slave-1/dbserver8540/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /home/max/Documents/starter/test/local-slave-1/dbserver8540/apps --log.file /home/max/Documents/starter/test/local-slave-1/dbserver8540/arangod.log --log.force-direct false --javascript.copy-installation true --cluster.my-address tcp://localhost:8540 --cluster.my-role PRIMARY --foxx.queues false --server.statistics true --cluster.agency-endpoint tcp://localhost:8531 --cluster.agency-endpoint tcp://localhost:8541 --cluster.agency-endpoint tcp://localhost:8551
max 8318 27.5 2.5 760064 202456 pts/0 Sl+ 14:47 0:02 /usr/sbin/arangod -c /home/max/Documents/starter/test/coordinator8529/arangod.conf --database.directory /home/max/Documents/starter/test/coordinator8529/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /home/max/Documents/starter/test/coordinator8529/apps --log.file /home/max/Documents/starter/test/coordinator8529/arangod.log --log.force-direct false --javascript.copy-installation true --cluster.my-address tcp://localhost:8529 --cluster.my-role COORDINATOR --foxx.queues true --server.statistics true --cluster.agency-endpoint tcp://localhost:8531 --cluster.agency-endpoint tcp://localhost:8541 --cluster.agency-endpoint tcp://localhost:8551
max 8321 32.8 2.5 764160 202212 pts/0 Sl+ 14:47 0:02 /usr/sbin/arangod -c /home/max/Documents/starter/test/local-slave-1/coordinator8539/arangod.conf --database.directory /home/max/Documents/starter/test/local-slave-1/coordinator8539/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /home/max/Documents/starter/test/local-slave-1/coordinator8539/apps --log.file /home/max/Documents/starter/test/local-slave-1/coordinator8539/arangod.log --log.force-direct false --javascript.copy-installation true --cluster.my-address tcp://localhost:8539 --cluster.my-role COORDINATOR --foxx.queues true --server.statistics true --cluster.agency-endpoint tcp://localhost:8531 --cluster.agency-endpoint tcp://localhost:8541 --cluster.agency-endpoint tcp://localhost:8551
When an option such as --server.storage-engine rocksdb is specified with the arangodb starter, it is passed on to all of the starter's arangod processes.
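For example, the following would start the same kind of local test cluster as above, but with the RocksDB storage engine on every instance (a sketch built only from the options already shown in this answer):
arangodb --starter.local --server.storage-engine rocksdb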
A detailed overview for various deployment modes can be found in the documentation's Deployment chapter.
I am trying to understand and compare the output I see from htop (sorted by mem%) and "ps aux --sort=-%mem | grep query.jar" and determine why 24.2G out of 32.3G is in use on an idle server.
The ps command shows a single parent (not a child process, I assume):
ps aux --sort=-%mem | grep query.jar
1000 67970 0.4 4.4 6721304 1452512 ? Sl 2020 163:55 java -Djava.security.egd=file:/dev/./urandom -Xmx700m -Xss256k -jar ./query.jar
htop, however, shows PID 67970 as well as many other PIDs for query.jar below it. I am trying to grasp what this means for memory usage. I also wonder if this has anything to do with open file handles.
I ran this file handle command on the server: ls -la /proc/$$/fd
which produces this output (although I am not sure if this is showing me any issues):
total 0
lrwx------. 1 ziggy ziggy 64 Jan 2 09:14 0 -> /dev/pts/1
lrwx------. 1 ziggy ziggy 64 Jan 2 09:14 1 -> /dev/pts/1
lrwx------. 1 ziggy ziggy 64 Jan 2 09:14 2 -> /dev/pts/1
lrwx------. 1 ziggy ziggy 64 Jan 2 11:39 255 -> /dev/pts/1
lr-x------. 1 ziggy ziggy 64 Jan 2 09:14 3 -> /var/lib/sss/mc/passwd
Obviously the mem% output in htop (if totaled) exceeds 100%, so I am guessing that despite there being different PIDs, the repeated mem% values of 9.6 and 4.4 are not necessarily unique. Any clarification is appreciated here. I am trying to determine the best method to accurately report what is using 24.2 GB of memory on this server.
The complete output of the ps aux command is below, which shows me all the different PIDs using memory. Again, I am confused by how this output differs from htop.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1000 40268 0.2 9.5 3432116 3143516 ? Sl 2020 73:33 /usr/local/bin/node --max-http-header-size=65000 index.js
1000 67970 0.4 4.4 6721304 1452516 ? Sl 2020 164:05 java -Djava.security.egd=file:/dev/./urandom -Xmx700m -Xss256k -jar ./query.jar
root 86212 2.6 3.0 15208548 989928 ? Ssl 2020 194:18 dgraph alpha --my=dgraph-public:9080 --lru_mb 2048 --zero dgraph-public:5080
1000 68027 0.2 2.9 6295452 956516 ? Sl 2020 71:43 java -Djava.security.egd=file:/dev/./urandom -Xmx512m -Xss256k -jar ./build.jar
1000 88233 0.3 2.9 6415084 956096 ? Sl 2020 129:25 java -Djava.security.egd=file:/dev/./urandom -Xmx500m -Xss256k -jar ./management.jar
1000 66554 0.4 2.4 6369108 803632 ? SLl 2020 159:23 ./TranslationService thrift sisense-zookeeper.sisense:2181 S1
polkitd 27852 1.2 2.3 2111292 768376 ? Ssl 2020 417:24 mongod --config /data/configdb/mongod.conf --bind_ip_all
root 52493 3.3 2.3 8361444 768188 ? Ssl 2020 1107:53 /bin/prometheus --web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries --storage.tsdb.retention.size=7G
B --config.file=/etc/prometheus/config_out/prometheus.env.yaml --storage.tsdb.path=/prometheus --storage.tsdb.retention.time=30d --web.enable-lifecycle --storage.tsdb.no-lockfile --web.external-url=http://sisense-prom-oper
ator-prom-prometheus.monitoring:9090 --web.route-prefix=/
1000 54574 0.0 1.9 901996 628900 ? Sl 2020 13:47 /usr/local/bin/node dist/index.js
root 78245 0.9 1.9 11755696 622940 ? Ssl 2020 325:03 /fluent-bit/bin/fluent-bit -c /fluent-bit/etc/fluent-bit.conf
root 5838 4.4 1.4 781420 484736 ? Ssl 2020 1488:26 kube-apiserver --advertise-address=10.1.17.71 --allow-privileged=true --anonymous-auth=True --apiserver-count=1 --authorization-mode=Node,RBAC --bind-addre
ss=0.0.0.0 --client-ca-file=/etc/kubernetes/ssl/ca.crt --enable-admission-plugins=NodeRestriction --enable-aggregator-routing=False --enable-bootstrap-token-auth=true --endpoint-reconciler-type=lease --etcd-cafile=/etc/ssl
/etcd/ssl/ca.pem --etcd-certfile=/etc/ssl/etcd/ssl/node-dev-analytics-2.pem --etcd-keyfile=/etc/ssl/etcd/ssl/node-dev-analytics-2-key.pem --etcd-servers=https://10.1.17.71:2379 --insecure-port=0 --kubelet-client-certificat
e=/etc/kubernetes/ssl/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/ssl/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalDNS,InternalIP,Hostname,ExternalDNS,ExternalIP --profiling=
False --proxy-client-cert-file=/etc/kubernetes/ssl/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/ssl/front-proxy-client.key --request-timeout=1m0s --requestheader-allowed-names=front-proxy-client --request
header-client-ca-file=/etc/kubernetes/ssl/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --runtime-config
= --secure-port=6443 --service-account-key-file=/etc/kubernetes/ssl/sa.pub --service-cluster-ip-range=10.233.0.0/18 --service-node-port-range=30000-32767 --storage-backend=etcd3 --tls-cert-file=/etc/kubernetes/ssl/apiserve
r.crt --tls-private-key-file=/etc/kubernetes/ssl/apiserver.key
1000 91921 0.1 1.2 7474852 415516 ? Sl 2020 41:04 java -Xmx4G -server -Dfile.encoding=UTF-8 -Djvmp -DEC2EC -cp /opt/sisense/jvmConnectors/jvmcontainer_1_1_0.jar com.sisense.container.launcher.ContainerLaunc
herApp /opt/sisense/jvmConnectors/connectors/ec2ec/com.sisense.connectors.Ec2ec.jar sisense-zookeeper.sisense:2181 connectors.sisense
1000 21035 0.3 0.8 2291908 290568 ? Ssl 2020 111:23 /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /zookeeper-3.4.12/bin/../build/classes:/zookeeper-
3.4.12/bin/../build/lib/*.jar:/zookeeper-3.4.12/bin/../lib/slf4j-log4j12-1.7.25.jar:/zookeeper-3.4.12/bin/../lib/slf4j-api-1.7.25.jar:/zookeeper-3.4.12/bin/../lib/netty-3.10.6.Final.jar:/zookeeper-3.4.12/bin/../lib/log4j-1
.2.17.jar:/zookeeper-3.4.12/bin/../lib/jline-0.9.94.jar:/zookeeper-3.4.12/bin/../lib/audience-annotations-0.5.0.jar:/zookeeper-3.4.12/bin/../zookeeper-3.4.12.jar:/zookeeper-3.4.12/bin/../src/java/lib/*.jar:/conf: -XX:MaxRA
MFraction=2 -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XshowSettings:vm -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMa
in /conf/zoo.cfg
1000 91955 0.1 0.8 7323208 269844 ? Sl 2020 40:40 java -Xmx4G -server -Dfile.encoding=UTF-8 -Djvmp -DGenericJDBC -cp /opt/sisense/jvmConnectors/jvmcontainer_1_1_0.jar com.sisense.container.launcher.Containe
rLauncherApp /opt/sisense/jvmConnectors/connectors/genericjdbc/com.sisense.connectors.GenericJDBC.jar sisense-zookeeper.sisense:2181 connectors.sisense
1000 92076 0.1 0.8 8302704 262772 ? Sl 2020 52:11 java -Xmx4G -server -Dfile.encoding=UTF-8 -Djvmp -Dsql -cp /opt/sisense/jvmConnectors/jvmcontainer_1_1_0.jar com.sisense.container.launcher.ContainerLaunche
rApp /opt/sisense/jvmConnectors/connectors/mssql/com.sisense.connectors.MsSql.jar sisense-zookeeper.sisense:2181 connectors.sisense
1000 91800 0.1 0.7 9667560 259928 ? Sl 2020 39:38 java -Xms128M -jar connectorService.jar jvmcontainer_1_1_0.jar /opt/sisense/jvmConnectors/connectors sisense-zookeeper.sisense:2181 connectors.sisense
1000 91937 0.1 0.7 7326312 253708 ? Sl 2020 40:14 java -Xmx4G -server -Dfile.encoding=UTF-8 -Djvmp -DExcel -cp /opt/sisense/jvmConnectors/jvmcontainer_1_1_0.jar com.sisense.container.launcher.ContainerLaunc
herApp /opt/sisense/jvmConnectors/connectors/excel/com.sisense.connectors.ExcelConnector.jar sisense-zookeeper.sisense:2181 connectors.sisense
1000 92085 0.1 0.7 7323660 244160 ? Sl 2020 39:53 java -Xmx4G -server -Dfile.encoding=UTF-8 -Djvmp -DSalesforceJDBC -cp /opt/sisense/jvmConnectors/jvmcontainer_1_1_0.jar com.sisense.container.launcher.Conta
inerLauncherApp /opt/sisense/jvmConnectors/connectors/salesforce/com.sisense.connectors.Salesforce.jar sisense-zookeeper.sisense:2181 connectors.sisense
1000 16326 0.1 0.7 3327260 243804 ? Sl 2020 12:21 /opt/sisense/monetdb/bin/mserver5 --zk_system_name=S1 --zk_address=sisense-zookeeper.sisense:2181 --external_server=ec-devcube-qry-10669921-96e0-4.sisense -
-instance_id=qry-10669921-96e0-4 --dbname=aDevCube --farmstate=Querying --dbfarm=/tmp/aDevCube_2020.12.28.16.46.23.280/dbfarm --set mapi_port 50000 --set gdk_nr_threads 4
100 64158 20.4 0.7 1381624 232548 ? Sl 2020 6786:08 /usr/local/lib/erlang/erts-11.0.3/bin/beam.smp -W w -K true -A 128 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 512 -MMmcs 30 -P 1048576 -t 5000000
-stbt db -zdbbl 128000 -B i -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa -noshell -noinput -s rabbit boot -boot start_sasl -lager crash_log false -lager handlers []
root 1324 11.3 0.7 5105748 231100 ? Ssl 2020 3773:15 /usr/bin/dockerd --iptables=false --data-root=/var/lib/docker --log-opt max-size=50m --log-opt max-file=5 --dns 10.233.0.3 --dns 10.1.22.68 --dns 10.1.22.6
9 --dns-search default.svc.cluster.local --dns-search svc.cluster.local --dns-opt ndots:2 --dns-opt timeout:2 --dns-opt attempts:2
Adding more details:
$ free -m
              total        used        free      shared  buff/cache   available
Mem:          31993       23150        2602        1677        6240        6772
Swap:             0           0           0
$ top -b -n1 -o "%MEM"|head -n 20
top - 13:46:18 up 23 days, 3:26, 3 users, load average: 2.26, 1.95, 2.10
Tasks: 2201 total, 1 running, 2199 sleeping, 1 stopped, 0 zombie
%Cpu(s): 4.4 us, 10.3 sy, 0.0 ni, 85.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32761536 total, 2639584 free, 23730688 used, 6391264 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 6910444 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
40268 1000 20 0 3439284 3.0g 8228 S 0.0 9.6 73:39.94 node
67970 1000 20 0 6721304 1.4g 7216 S 0.0 4.4 164:24.16 java
86212 root 20 0 14.5g 996184 13576 S 0.0 3.0 197:36.83 dgraph
68027 1000 20 0 6295452 956516 7256 S 0.0 2.9 71:52.15 java
88233 1000 20 0 6415084 956096 9556 S 0.0 2.9 129:40.80 java
66554 1000 20 0 6385500 803636 8184 S 0.0 2.5 159:42.44 TranslationServ
27852 polkitd 20 0 2111292 766860 11368 S 0.0 2.3 418:26.86 mongod
52493 root 20 0 8399864 724576 15980 S 0.0 2.2 1110:34 prometheus
54574 1000 20 0 905324 631708 7656 S 0.0 1.9 13:48.66 node
78245 root 20 0 11.2g 623028 1800 S 0.0 1.9 325:43.74 fluent-bit
5838 root 20 0 781420 477016 22944 S 7.7 1.5 1492:08 kube-apiserver
91921 1000 20 0 7474852 415516 3652 S 0.0 1.3 41:10.25 java
21035 1000 20 0 2291908 290484 3012 S 0.0 0.9 111:38.03 java
The primary difference between htop and ps aux is that htop shows each individual thread belonging to a process rather than the process only - this is similar to ps auxm. Using the htop interactive command H, you can hide threads to get to a list that more closely corresponds to ps aux.
In terms of memory usage, those additional entries representing individual threads do not affect the actual memory usage total because threads share the address space of the associated process.
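For example, to get from ps the same thread-level view that htop shows by default (using the query.jar PID from the output above):
ps -T -p 67970            # one line per thread of PID 67970, like htop's default view
ps aux | grep query.jar   # one line for the whole process, roughly what htop shows after hiding threads with H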
RSS (resident set size) in general is problematic because it does not adequately represent shared pages (due to shared memory or copy-on-write) for your purpose - the sum can be higher than expected in those cases. You can use smem -t to get a better picture with the PSS (proportional set size) column. Based on the facts you provided, that is not your issue, though.
In your case, it might make sense to dig deeper via smem -tw to get a memory usage breakdown that includes (non-cache) kernel resources. /proc/meminfo provides further details.
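For example (assuming the smem utility is installed on the box):
smem -t -k        # per-process USS/PSS/RSS with a totals row
smem -t -w -k     # system-wide breakdown, including non-cache kernel memory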
I need help with this script, where I try to print a label followed by the output of its related command. For example, in the code below, "info related to process" should be followed by the output of ps -ef, and the script should then continue to the next command and print statement in the same way.
But instead I get the line "info related to process" repeated before every command, so all the commands are run under that one label.
#!/usr/bin/env python3.7
import os

state = ['process', 'http status', 'date info', 'system']

def comm(com):
    for i in state:
        for j in com:
            print(f"info related to {i}")
            os.system(j)

cmd = ['ps -ef | head -2', 'systemctl status httpd', 'date', 'uptime']
comm(cmd)
OUTPUT:
info related to process
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 10:13 ? 00:00:19 /usr/lib/systemd/systemd -
-switched-root --system --deserialize 22
info related to process
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor
preset: disabled)
Active: active (running) since Wed 2019-03-27 18:27:50 IST; 1 day 2h ago
Docs: man:httpd(8)
man:apachectl(8)
Process: 8585 ExecReload=/usr/sbin/httpd $OPTIONS -k graceful (code=exited,
status=0/SUCCESS)
Main PID: 1367 (httpd)
Status: "Total requests: 0; Current requests/sec: 0; Current traffic: 0
B/sec"
Tasks: 6
CGroup: /system.slice/httpd.service
├─1367 /usr/sbin/httpd -DFOREGROUND
├─8597 /usr/sbin/httpd -DFOREGROUND
├─8598 /usr/sbin/httpd -DFOREGROUND
├─8599 /usr/sbin/httpd -DFOREGROUND
├─8600 /usr/sbin/httpd -DFOREGROUND
└─8601 /usr/sbin/httpd -DFOREGROUND
info related to process
Thu Mar 28 21:03:57 IST 2019
info related to process
21:03:57 up 10:50, 4 users, load average: 0.35, 0.09, 0.14
You have two loops, one being nested in the other. That means anything the inner loop does will be executed in every iteration of the outer loop. That's just how loops work, but not what (I presume) you want to do here.
You have commands to be executed by the os module and several state names that are associated with them. From a data-first point of view we could structure them in a dictionary:
commands = {
    'process': 'ps -ef',
    'http status': 'systemctl status httpd',
    'date info': 'date',
    'system': 'uptime',
}
Now when we iterate over this dictionary, in each iteration we will have both the state name and the command to be run as loop variables. The loops become a single for loop and we end up with:
def comm(commands):
    for name, command in commands.items():
        print(f"info related to {name}")
        os.system(command)
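With that in place, the original cmd list and the trailing comm(cmd) call reduce to a single call (a sketch reusing the commands dictionary defined above):
comm(commands)
Each label is now printed exactly once, immediately followed by the output of its own command.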
I would like to run multiple jobs on a single node on my cluster. However, when I submit a job, it takes all available CPUs, so the remaining jobs are queued. As an example, I made a script that requests few resources and submits two jobs that are supposed to run at the same time.
#! /bin/bash
variable=$(seq 0 1 1)
for l in ${variable}
do
run_thread="./run_thread.sh"
cat << EOF > ${run_thread}
#! /bin/bash
#SBATCH -p normal
#SBATCH --nodes 1
#SBATCH --cpus-per-task 1
#SBATCH --ntasks 1
#SBATCH --threads-per-core 1
#SBATCH --mem=10G
sleep 120
EOF
sbatch ${run_thread}
done
However, one job is running and the other one is pending:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
57 normal run_thre user PD 0:00 1 (Resources)
56 normal run_thre user R 0:02 1 node00
The cluster only has one node, with 4 sockets of 12 cores and 2 threads each. The output of the command scontrol show jobid #job is the following:
JobId=56 JobName=run_thread.sh
UserId=user(1002) GroupId=user(1002) MCS_label=N/A
Priority=4294901755 Nice=0 Account=(null) QOS=(null)
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:51 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2018-03-24T15:34:46 EligibleTime=2018-03-24T15:34:46
StartTime=2018-03-24T15:34:46 EndTime=Unknown Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=normal AllocNode:Sid=node00:13047
ReqNodeList=(null) ExcNodeList=(null)
NodeList=node00
BatchHost=node00
NumNodes=1 NumCPUs=48 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:1
TRES=cpu=48,mem=10G,node=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=10G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
Gres=(null) Reservation=(null)
OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
Command=./run_thread.sh
WorkDir=/home/user
StdErr=/home/user/slurm-56.out
StdIn=/dev/null
StdOut=/home/user/slurm-56.out
Power=
And the output of scontrol show partition is:
PartitionName=normal
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=node00
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=YES:4
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=48 TotalNodes=1 SelectTypeParameters=NONE
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
There is something I don't get with the SLURM system. How can I use only 1 CPU per job and run 48 jobs on the node at the same time?
Slurm is probably configured with
SelectType=select/linear
which means that slurm allocates full nodes to jobs and does not allow node sharing among jobs.
You can check with
scontrol show config | grep SelectType
Set it to select/cons_res instead to allow node sharing.
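A minimal sketch of the relevant slurm.conf lines (SelectType and SelectTypeParameters are real Slurm options, but the exact value, e.g. CR_Core_Memory, is an assumption you should adapt to your site; restart the Slurm daemons after changing them):
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
With that in place, each sbatch job asking for --ntasks 1 --cpus-per-task 1 is only allocated one CPU, so up to 48 such jobs can run on the node concurrently (memory permitting).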
Does anybody know where the vCPU thread IDs are stored in Linux? From what I have researched, when we create a VM in KVM, some threads back the vCPUs; I need their IDs and where to find them.
I took a look at this position:
/proc/<qemu-kvm process ID>/task/*/
The qemu-kvm process ID comes from this location:
/var/run/libvirt/qemu/VM_NAME.xml
I thought that maybe the vCPUs' IDs could be found there, but unfortunately those are not the vCPUs' IDs, they are just some child processes.
Any help would be appreciated. Thanks a lot.
If you exec qemu with parameters -qmp unix:./qmp-sock,server,nowait, for example:
# /opt/qemu/bin/qemu-system-x86_64 \
-smp cpus=2 \
-drive file=/opt/test.qcow2,format=qcow2 \
-cdrom CentOS-7-x86_64-DVD-1511.iso \
-qmp unix:./qmp-sock,server,nowait
You can exec qmp-shell to get cpu info:
# /opt/git/qemu/scripts/qmp/qmp-shell /opt/qmp-sock
Welcome to the QMP low-level shell!
Connected to QEMU 2.5.50
(QEMU) query-cpus
{"return": [{"halted": false, "pc": -2124176787, "current": true, "qom_path": "/machine/unattached/device[0]", "thread_id": 2344, "arch": "x86", "CPU": 0}, {"halted": true, "pc": -2130342250, "current": false, "qom_path": "/machine/unattached/device[3]", "thread_id": 2341, "arch": "x86", "CPU": 1}]}
The thread IDs here are 2344 and 2341:
# ps -eLf|grep qemu-system
root 2341 2252 2341 9 4 08:52 pts/0 00:00:48 /opt/qemu/bin/qemu-system-x86_64 -smp cpus=2 -drive file=/opt/test.qcow2,format=qcow2 -cdrom CentOS-7-x86_64-DVD-1511.iso -qmp unix:./qmp-sock,server,nowait
root 2341 2252 2342 0 4 08:52 pts/0 00:00:00 /opt/qemu/bin/qemu-system-x86_64 -smp cpus=2 -drive file=/opt/test.qcow2,format=qcow2 -cdrom CentOS-7-x86_64-DVD-1511.iso -qmp unix:./qmp-sock,server,nowait
root 2341 2252 2344 85 4 08:52 pts/0 00:07:04 /opt/qemu/bin/qemu-system-x86_64 -smp cpus=2 -drive file=/opt/test.qcow2,format=qcow2 -cdrom CentOS-7-x86_64-DVD-1511.iso -qmp unix:./qmp-sock,server,nowait
root 2341 2252 2345 0 4 08:52 pts/0 00:00:00 /opt/qemu/bin/qemu-system-x86_64 -smp cpus=2 -drive file=/opt/test.qcow2,format=qcow2 -cdrom CentOS-7-x86_64-DVD-1511.iso -qmp unix:./qmp-sock,server,nowait
root 2378 2304 2378 0 1 09:01 pts/2 00:00:00 grep --color=auto qemu-system
For more information see http://wiki.qemu.org/QMP
I think the vCPU thread ID is internal to QEMU and it is exposed to Linux as a normal thread:
struct CPUState {
...
struct QemuThread *thread;
...
int thread_id;
...
bool thread_kicked;
...
bool throttle_thread_scheduled;
...
};
You can use the QEMU monitor command info cpus to show information about the CPUs. It gives me this:
(qemu) info cpus
* CPU #0: pc=0x00000000b483c8c4 thread_id=6660
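On the host side those vCPUs are just ordinary Linux threads of the qemu process, so the same IDs also show up under /proc/<qemu pid>/task/ and in a per-thread ps listing. As a sketch (the <qemu_pid> placeholder is yours to fill in, and the thread naming assumes QEMU was started with -name ...,debug-threads=on, which libvirt usually sets):
ps -L -p <qemu_pid> -o tid,comm    # vCPU threads then appear with names like "CPU 0/KVM"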
Alright, some of you might have noticed I've been working on this problem off and on for about 3 weeks. I cannot figure out for the life of me what's going on. Below is the Perl script that saves input from a USB card reader, which acts like a keyboard. The machine is an embedded system running off of a compact flash drive, using Voyage Linux.
use strict;
use Time::Local;

# Load the list of card patterns to match against.
open(MATCH, 'swipe_match.txt');
my @matches = <MATCH>;
close(MATCH);

my $error = qr/[+%;]E\?/;

while (1) {
    # Each swipe arrives as up to three lines on STDIN.
    my $text  = <STDIN>;
    my $text1 = <STDIN>;
    my $text2 = <STDIN>;
    if (($text && $text1 && $text2) or ($text && $text1) or $text) {
        my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday) = localtime();
        $year += 1900;
        $mon  += 1;
        my $timestamp = "$mon/$mday/$year $hour:$min:$sec";
        chomp $text;
        chomp $text1;
        chomp $text2;
        # my $matched = 0;
        # foreach my $test (@matches) {
        #     chomp $test;
        #     $matched = 1 if ($text =~ /$test/i);
        # }
        # if ($matched) {
        #     system("aplay /SWIPE/good.wav >/dev/null 2>/dev/null");
        # } else {
        #     system("aplay /SWIPE/bad.wav >/dev/null 2>/dev/null");
        # }
        # Write out the swipe even if it's bad...
        open(LOG, '>>/DATA/SWIPES.TXT');
        print LOG $text."\t".$text1."\t".$text2."\t".$timestamp."\n";
        close(LOG);
        if ($text =~ $error or $text1 =~ $error or $text2 =~ $error) {
            system("aplay /SWIPE/bad.wav >/dev/null 2>/dev/null");
        }
        else {
            system("aplay /SWIPE/good.wav >/dev/null 2>/dev/null");
        }
    }
}
exit;
I did not write this script, and the person who did write it is long gone. Currently I have 2 machines. One of them is working and the other is the one I'm trying to get to work. I'm trying to figure out how this script gets input (on the machine that is working). I can open the log file /DATA/SWIPES.TXT and view the actual swipes. Currently there are no running processes on the machine that would affect the script; here are the processes:
PID TTY STAT TIME COMMAND
1 ? Ss 0:29 init [2]
2 ? S< 0:00 [kthreadd]
3 ? S< 0:04 [ksoftirqd/0]
4 ? S< 3:21 [events/0]
5 ? S< 0:00 [khelper]
44 ? S< 0:00 [kblockd/0]
46 ? S< 0:00 [kacpid]
47 ? S< 0:00 [kacpi_notify]
94 ? S< 0:00 [kseriod]
134 ? S 0:00 [pdflush]
135 ? S 0:06 [pdflush]
136 ? S< 0:00 [kswapd0]
137 ? S< 0:00 [aio/0]
138 ? S< 0:00 [nfsiod]
795 ? S< 0:00 [kpsmoused]
800 ? S< 0:00 [rpciod/0]
1627 ? S< 0:00 [ksuspend_usbd]
1631 ? S< 0:00 [khubd]
1646 ? S< 0:00 [ata/0]
1648 ? S< 0:00 [ata_aux]
1794 ? S<s 0:00 udevd --daemon
2913 ? Ss 0:00 pump -i eth0
2979 ? Ss 0:00 /usr/sbin/rpc.idmapd
3060 ? S 0:01 /usr/sbin/syslogd --no-forward
3083 ? Ss 0:00 /usr/sbin/sshd
3099 ? S 0:00 /usr/sbin/inetutils-inetd
3122 ? Ss 0:00 /usr/sbin/pptpd
3138 ? Ss 0:00 /usr/sbin/cron
3149 ? SLs 0:33 /usr/sbin/watchdog
3167 tty2 Ss+ 0:00 /sbin/mingetty tty2
3169 tty3 Ss+ 0:00 /sbin/rungetty tty3
3170 tty4 Ss+ 0:00 /sbin/rungetty tty4
3173 tty5 Ss+ 0:00 /sbin/getty 38400 tty5
3175 tty6 Ss+ 0:00 /sbin/getty 38400 tty6
15677 ? Ss 0:00 sshd: root@pts/0
15679 pts/0 Ss 0:00 -bash
15710 ? Z 0:00 [watchdog] <defunct>
15711 pts/0 R+ 0:00 ps x
So, from there, I don't know where to go. Can anyone give me any suggestions or hints as to how this script is actually receiving the input from the USB reader? Also, it somehow receives the input without anyone being logged in. The machine is an embedded machine; I turn it on, and it accepts swipes and saves them, using the Perl script.
Take a look at udev, among other things it can: "Launch a script when a device node is created or deleted (typically when a device is attached or unplugged)"
http://www.reactivated.net/writing_udev_rules.html
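A minimal sketch of such a rule (the file name, vendor/product IDs, and the launched script path are placeholders, not values taken from this system):
# /etc/udev/rules.d/99-cardreader.rules
ACTION=="add", SUBSYSTEM=="usb", ATTRS{idVendor}=="0801", ATTRS{idProduct}=="0001", RUN+="/SWIPE/start_swipe.sh"
Note that RUN is meant for short-lived commands, so a long-running reader loop like this one would typically be started in the background or handed off to an init/inittab entry rather than run directly from the rule.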
The key bits are here:
while(1) {
my $text = <STDIN>;
The USB card reader is set up to direct its input to STDIN, since it's acting like a keyboard. When it finishes reading a card it sends a carriage return. The "input" then gets read by Perl and stuck into $text, then it waits for the next swipe. Once three swipes are done (the three <STDIN> lines) then it processes the information and writes it to the file. Then, since you're in a while(1) loop, it just loops back to the top of the loop and waits for more input.
You can simulate this on a different computer by running the program, then when it's waiting for input you type in some text and finish it with the Enter key. Do that three times to simulate the three swipes, and the program should process it.
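For example, something like this (the script name swipe.pl and the three dummy lines are placeholders) feeds three fake "swipes" into the program in one go and should produce a new line in /DATA/SWIPES.TXT plus one of the wav sounds:
printf 'track1\ntrack2\ntrack3\n' | perl swipe.pl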
The script is reading from stdin, so you need to find where/who is calling this script and see what is being piped in on stdin.
Have you checked the system's cron jobs? You might find a hint by looking at the timestamp and ownership of the /DATA/SWIPES.TXT file.
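A few quick checks along those lines (paths taken from the question itself):
crontab -l; ls /etc/cron.d        # cron entries that might launch the script
ls -l /DATA/SWIPES.TXT            # ownership and last-write time of the swipe log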