When installing kabanero-foundation, does "Waiting for KnativeServing knative-serving to be ready." have a time-out?

Installing on an OpenShift 4.2 3x3 cluster using the install script posted at https://github.com/kabanero-io/kabanero-foundation. The following block has been echoing for the last 1.5 hours:
echo 'Waiting for KnativeServing knative-serving to be ready.'
Waiting for KnativeServing knative-serving to be ready.
++ oc get knativeserving knative-serving -n knative-serving '--output=jsonpath={.status.conditions[-1:].type}'
+ TYPE=Ready
++ oc get knativeserving knative-serving -n knative-serving '--output=jsonpath={.status.conditions[-1:].status}'
+ STATUS=False
+ sleep 2
+ '[' False == True ']'

There is no timeout: the install script will continue to wait until the KnativeServing instance is ready. This can take a considerable amount of time on some systems (15-30 minutes), but 1.5 hours is excessive.
There is a known bug in Kabanero 0.3.0 (https://github.com/kabanero-io/kabanero-operator/issues/317) that makes it incompatible with recent versions of OpenShift Container Platform 4.2, specifically with the OpenShift Serverless operator v1.2.0. This was addressed in Kabanero 0.3.1 (https://github.com/kabanero-io/kabanero-operator/releases/tag/0.3.1). That might explain why the install script keeps looping for you: the KnativeServing instance would never become ready.
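If you want the wait to fail instead of spinning forever, a minimal sketch of the same loop with a deadline added might look like this (the oc query is the one the install script already runs; MAX_WAIT is an assumption, not part of the script):
MAX_WAIT=1800   # assumption: give up after 30 minutes
ELAPSED=0
while true; do
  STATUS=$(oc get knativeserving knative-serving -n knative-serving \
    --output='jsonpath={.status.conditions[-1:].status}')
  [ "$STATUS" = "True" ] && break
  if [ "$ELAPSED" -ge "$MAX_WAIT" ]; then
    echo "Timed out waiting for KnativeServing knative-serving to be ready." >&2
    exit 1
  fi
  sleep 2
  ELAPSED=$((ELAPSED + 2))
done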

Related

Live debugging a frozen nodejs program

I have a Node.js process that runs fine but, after a couple of hours, gets stuck in some infinite loop somewhere; there is no error or timeout message. So I am trying to live-debug the frozen program:
On the command line I run $ top to get the process id of my node program, then run these commands to enter node debug mode:
# my node PID=2599
$ kill -SIGUSR1 2599
$ node inspect -p 2599
debug> pause
break in internal/timers.js:483
481
482 function processTimers(now) {
>483 debug('process timer lists %d', now);
484 nextExpiry = Infinity;
485
debug> bt
#0 processTimers internal/timers.js:483:4
debug>
The two debug commands I've used so far are bt and pause, and I can see it points to a line in internal/timers.js, but where can I find this file? I've searched for the filename and code in my project but can't find it anywhere. I am running this version:
$ npm -v
7.12.0
$ node -v
v14.16.1
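For what it's worth, internal/* modules are compiled into the node binary rather than shipped as files on disk, which is why searching the project finds nothing; the matching source lives in the nodejs/node repository under lib/. A minimal way to view the file for this exact version (the URL pattern is assumed from the GitHub raw-file layout):
curl -s https://raw.githubusercontent.com/nodejs/node/v14.16.1/lib/internal/timers.js | less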

Riak node does not start after changing IP

I am in the process of setting up a Riak Cluster on Raspberry Pis.
Unfortunately I get the following error message after changing the IP address.
Versions I used:
Debian Jessie (Raspberry Pi)
riak (GitHub clone, Mar 2017)
riak-cs 2.1.1
stanchion 2.1.1
Using this guide, I tried to change the IP addresses in the various .conf files:
https://docs.riak.com/riak/kv/latest/using/cluster-operations/changing-cluster-info/index.html
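For illustration, the riak.conf entries that typically need updating look like this (the key names come from the stock riak.conf; the exact values are assumptions for this setup):
nodename = riak@192.168.178.61
listener.http.internal = 192.168.178.61:8098
listener.protobuf.internal = 192.168.178.61:8087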
Works on 127.0.0.1:
$ ~/riak/rel/riak/bin/riak-admin test
Successfully completed 1 read/write cycle to 'riak@127.0.0.1'
Error message (after changing the IP to 192.168.178.61):
sudo ./riak console
config is OK
-config /home/pi/neu/riak/rel/riak/data/generated.configs/app.2020.01.02.23.37.52.config -args_file /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -vm_args /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args
Exec: /home/pi/neu/riak/rel/riak/bin/../erts-5.10.3/bin/erlexec -boot /home/pi/neu/riak/rel/riak/bin/../releases/2.2.3/riak -config /home/pi/neu/riak/rel/riak/data/generated.configs/app.2020.01.02.23.37.52.config -args_file /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -vm_args /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -pa /home/pi/neu/riak/rel/riak/bin/../lib/basho-patches -- console
Root: /home/pi/neu/riak/rel/riak/bin/..
Erlang R16B02_basho10 (erts-5.10.3) [source] [smp:4:4] [async-threads:64] [hipe] [kernel-poll:true] [frame-pointer]
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
{"Kernel pid terminated",application_controller,"{application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak#192.168.178.61',[]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_capability,renegotiate_capabilities,1,[{file,\"src/riak_core_capability.erl\"},{line,441}]},{riak_core_capability,handle_call,3,[{file,\"src/riak_core_capability.erl\"},{line,213}]},{gen_server,handle_msg,5,[{file,\"gen_server.erl\"},{line,585}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]},{gen_server,call,[riak_core_capability,{register,{riak_core,vnode_routing},{capability,[proxy,legacy],legacy,{riak_core,legacy_vnode_routing,[{true,legacy},{false,proxy}]}}},infinity]}}}}}}"}
Crash dump was written to: ./log/erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak@192.168.178.61',[
https://github.com/basho/riak/issues/999
martinsumner commented 3 days ago:
I might expect to see this if you hadn't done the step of either renaming (or deleting the contents of) the ring directory. Did you do this?
Also can you confirm if you're in the single-node or multi-node renaming scenario?
Ei3rb0mb3r commented 1 minute ago:
Many thanks for the quick feedback!
The error has been solved after I deleted the ring directory files.
rm -rf ../riak/rel/riak/data/ring/*
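Putting the whole single-node fix together, a sketch of the flow from the linked guide (paths match this install; the riak.conf entries are the ones sketched above):
~/riak/rel/riak/bin/riak stop
rm -rf ~/riak/rel/riak/data/ring/*   # force the ring to be rebuilt under the new node name
# update nodename and the listener addresses in ~/riak/rel/riak/etc/riak.conf
~/riak/rel/riak/bin/riak start
~/riak/rel/riak/bin/riak-admin test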

Chef issue with local rpm package installation on Oracle Linux (OEL)

I'm struggling with the installation of packages available as locally downloaded rpm files, but only on Oracle Linux (OEL). Is there a bug? Has anyone observed this? It would be a huge bug, so I'm a bit surprised.
The Chef recipe is quite simple:
pkg_src_location = 'https://s3.amazonaws.com/solution-automation-folder/qualys'
pkg = 'qualys-cloud-agent.x86_64.rpm'
local_image = "#{Chef::Config['file_cache_path']}/#{pkg}"
remote_file 'qualys-cloud-agent-image' do
  path local_image
  source "#{pkg_src_location}/#{pkg}"
end

package 'qualys-cloud-agent' do
  source local_image
end
It's available from https://github.com/r2oro/oel_pkg_test.git.
I have observed that on Oracle Linux (OEL) it results in the following Python script being triggered:
/usr/bin/python /opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-12.16.42/lib/chef/provider/package/yum/yum-dump.py --options --installed-provides --yum-lock-timeout 30
It runs for quite a while (downloading several hundred megabytes of data, as far as I can see yum repo metadata) and eventually fails; kitchen in debug mode dumps all of this to stdout. The result is:
* yum_package[qualys-cloud-agent] action install[2016-12-01T12:35:32+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-12.16.42/lib/chef/provider/package/yum/yum-dump.py exceeded timeout 900
================================================================================
Error executing action `install` on resource 'yum_package[qualys-cloud-agent]'
================================================================================
Mixlib::ShellOut::CommandTimeout
--------------------------------
Command timed out after 900s:
Command exceeded allowed execution time, process terminated
---- Begin output of /usr/bin/python /opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-12.16.42/lib/chef/provider/package/yum/yum-dump.py --options --installed-provides --yum-lock-timeout 30 ----
STDOUT:
STDERR:
---- End output of /usr/bin/python /opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-12.16.42/lib/chef/provider/package/yum/yum-dump.py --options --installed-provides --yum-lock-timeout 30 ----
Ran /usr/bin/python /opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-12.16.42/lib/chef/provider/package/yum/yum-dump.py --options --installed-provides --yum-lock-timeout 30 returned
Resource Declaration:
---------------------
# In /tmp/kitchen/cache/cookbooks/oel_pkg_test/recipes/default.rb
16: package 'qualys-cloud-agent' do
17: source local_image
18: end
Compiled Resource:
------------------
# Declared in /tmp/kitchen/cache/cookbooks/oel_pkg_test/recipes/default.rb:16:in `from_file'
yum_package("qualys-cloud-agent") do
  package_name "qualys-cloud-agent"
  action [:install]
  retries 0
  retry_delay 2
  default_guard_interpreter :default
  declared_type :package
  cookbook_name "oel_pkg_test"
  recipe_name "default"
  source "/tmp/kitchen/cache/qualys-cloud-agent.x86_64.rpm"
  flush_cache {:before=>false, :after=>false}
end
Did you note that flush_cache says the cache should not be flushed, yet a cache update still happens? It's frustrating. This always fails in my local kitchen (with Vagrant/VirtualBox) and even in an AWS cloud kitchen. Real instances sometimes fail and sometimes converge, but it's a lottery. Anyway, why does this cache update happen at all for a single local rpm image?
I did try to use rpm_package, but that leads to problems with yum_package being used in other recipes.
Any thoughts?
You probably do want to use rpm_package in this case. As for why the cache is reloading: it might just be the first time the cache is getting hit, so it has to do the initial load, or something else modified the package set beforehand.
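As a rough illustration of why the two resources behave so differently (these are the standard rpm/yum CLI equivalents, not Chef's actual internals):
# rpm_package: installs straight from the file, no repository metadata needed
rpm -Uvh /tmp/kitchen/cache/qualys-cloud-agent.x86_64.rpm
# yum_package: resolves through yum, which loads repo metadata first
yum -y localinstall /tmp/kitchen/cache/qualys-cloud-agent.x86_64.rpm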

FreeBSD pkg suddenly stopped bootstrapping

I've set up a Packer template to generate a Vagrant base image of FreeBSD 10.3, and it was working well at least as of Mon Oct 3 00:34:41 2016 +0300.
Yesterday I went to continue my work on this project, and it turned out this is not working anymore. So here come the details.
Packer does what it has to do, then runs my script, which installs FreeBSD using bsdinstall(8) with the following script:
PARTITIONS="ada0 { 29G freebsd-ufs /, 5G freebsd-swap, 10G freebsd-ufs /var }"
DISTRIBUTIONS="base.txz kernel.txz"
#!/bin/sh
echo 'WITHOUT_X11="YES"' >> /etc/make.conf
echo 'OPTIONS_UNSET=X11' >> /etc/make.conf
echo 'nameserver 8.8.8.8' >> /etc/resolv.conf
cat >> /etc/rc.conf <<EOF
ifconfig_em0="DHCP"
sshd_enable="YES"
dumpdev="NO"
EOF
env ASSUME_ALWAYS_YES=1 pkg bootstrap # <<stops here
pkg update
pkg install -y sudo
[.....snip.....]
reboot
This stops at bootstrapping pkg with the message:
Bootstrapping pkg from pkg+http://pkg.FreeBSD.org/FreeBSD:10:amd64/quarterly, please wait...
Signature for pkg not available.
pkg: Error fetching http://pkg.FreeBSD.org/FreeBSD:10:amd64/quarterly/Latest/pkg.txz.sig: Connection reset by peer
A pre-built version of pkg could not be found for your system.
Consider changing PACKAGESITE or installing it from ports: 'ports-mgmt/pkg'.
If I stop the bsdinstall script and chroot /mnt /bin/sh, I can fetch pkg.txz.sig from the above URL without any problems.
Any ideas what could be the reason for the "connection reset by peer"? Was something changed on pkg.FreeBSD.org recently?
I couldn't find anything about the issue.
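For reference, the manual check that succeeds looks like this (run from the installer shell; /mnt is where bsdinstall has the target system mounted):
chroot /mnt /bin/sh
fetch http://pkg.FreeBSD.org/FreeBSD:10:amd64/quarterly/Latest/pkg.txz.sig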
UPD1
Looking at the captured traffic: the site really answers 200 OK and then drops the connection for the pkg.txz.sig file.
But this 200 OK packet contains the signature file, and it is identical for both the manual fetch (which succeeds) and pkg bootstrap (which fails).
Both sessions are identical, so this is likely not a networking problem.
UPD2
The truss output was not helpful either.
So as a workaround I've just modified my bsdinstall script to fetch files manually:
[.....snip.....]
#env ASSUME_ALWAYS_YES=1 pkg bootstrap
fetch http://pkg.FreeBSD.org/FreeBSD:10:amd64/quarterly/Latest/pkg.txz
fetch http://pkg.FreeBSD.org/FreeBSD:10:amd64/quarterly/Latest/pkg.txz.sig
pkg add pkg.txz
pkg update
[.....snip.....]
PS: The only thing that I can suspect now is the VirtualBox version update; anyway, downgrading is not an option. (The ISO checksum is hardcoded into the template, and the template and scripts are in a git repository, so accidental changes are impossible.)
UPD3
I've set up a debugging environment; for the moment I have only isolated the function where the error is raised.
It's the second buffer refill from the HTTP connection (the first one has already read all 727 bytes, so it should be EOF).
Here is a small gdb log with a backtrace and the breakpoints to get there.
Added a tcpdump capture made on the system (Wireshark-compatible).
As I found out, the problem is partially with pkg: it tries to read 10240 bytes from the connection, expecting EOF if the file is smaller, but somehow on my system EOF is not set when the whole remote file has already been read.
# /release/10.3.0/usr.sbin/pkg/pkg.c
185 char buf[10240];
242 while ((r = fread(buf, 1, sizeof(buf), remote)) > 0) {
and the following loop runs twice: the first time reading the file, the second time getting a connection reset error instead of EOF
# /release/10.3.0/lib/libc/stdio/fread.c
94 resid = count * size; # == 10240 here
100 while (resid > (r = fp->_r)) {
101 (void)memcpy((void *)p, (void *)fp->_p, (size_t)r);
102 fp->_p += r;
103 /* fp->_r = 0 ... done in __srefill */
104 p += r;
105 resid -= r;
106 if (__srefill(fp)) {
107 /* no more input: return partial result */
108 return ((total - resid) / size);
109 }
110 }
Manual fetch succeeds because the size is adjusted for small files, so it asks to read only 727 bytes:
# /release/10.3.0/usr.bin/fetch/fetch.c
720 if (us.size != -1 && us.size - count < B_size &&
721 us.size - count >= 0)
722 size = us.size - count;
723 else
724 size = B_size;
733 if ((readcnt = fread(buf, 1, size, f)) < size) {
...but why EOF is not set is still an open question.
Posted this to the freebsd-pkg mailing list.
UPD4
Downgraded VirtualBox from 5.0.28 to 5.0.26 and EOF is set: _sread() in libc/stdio/refill.c:135 returns 0, and EOF is set on line 138.
So something was changed in VirtualBox networking too. Added a pcap file for VirtualBox 5.0.26 to the gist. 5.0.28 really was the culprit of the connection reset; here is a comparison of the captures.
VirtualBox 5.1.8 has this bug too. Version 5.1.6 works OK.
Opened ticket #16141 in their bugtracker.
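If you hit the same symptom, it may be worth checking the hypervisor build before digging into pkg itself (VBoxManage is VirtualBox's standard CLI; the version split below is just what this investigation found):
VBoxManage --version   # affected here: 5.0.28 and 5.1.8; working: 5.0.26 and 5.1.6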

OpenStack TripleO undercloud installation: "could not find class ::ironic::drivers::deploy"

My host is:
cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
The host setup was done as described here: http://docs.openstack.org/developer/tripleo-docs/environments/environments.html#virtual-environment up to the "Continue with Undercloud ..." step
The result:
sudo virsh list --all
Id Name State
----------------------------------------------------
3 baremetalbrbm_0 running
4 instack running
- baremetalbrbm_1 shut off
The undercloud setup was done as described here: http://docs.openstack.org/developer/tripleo-docs/installation/installation.html
The installation was attempted on the instack VM. Did the SSL setup as well.
Running
openstack undercloud install
fails with
+ puppet apply --detailed-exitcodes /etc/puppet/manifests/puppet-stack-config.pp
Notice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.
Warning: Scope(Class[Swift]): swift_hash_suffix has been deprecated and should be replaced with swift_hash_path_suffix, this will be removed
Warning: Scope(Class[Nova::Keystone::Auth]): Note that service_name parameter default value will be changed to "Compute Service" (according future release. In case you use different value, please update your manifests accordingly.
Warning: Scope(Class[Nova::Keystone::Auth]): Note that service_name_v3 parameter default value will be changed to "Compute Service v3" (acco in a future release. In case you use different value, please update your manifests accordingly.
Warning: Scope(Class[Glance::Api]): The known_stores parameter is deprecated, use stores instead
Warning: Scope(Class[Glance::Api]): default_store not provided, it will be automatically set to glance.store.filesystem.Store
Warning: Scope(Class[Nova::Api]): In N cycle, enabled_apis will have to be an array of APIs to enable.
Warning: Scope(Class[Neutron::Server]): identity_uri, auth_tenant, auth_user, auth_password, auth_region configuration options are deprecateted options
Warning: Scope(Class[Neutron::Agents::Dhcp]): The dhcp_domain parameter is deprecated and will be removed in future releases
Warning: Scope(Class[Heat]): Default value for rabbit_heartbeat_timeout_threshold parameter is different from OpenStack project defaults
Warning: Scope(Class[Heat]): "admin_user", "admin_password", "admin_tenant_name" configuration options are deprecated in favor of auth_plugi
Warning: Scope(Class[Nova::Network::Neutron]): neutron_auth_plugin parameter is deprecated and will be removed in a future release, use neut
Error: Could not find class ::ironic::drivers::deploy for instack on node instack
Error: Could not find class ::ironic::drivers::deploy for instack on node instack
+ rc=1
+ set -e
+ echo 'puppet apply exited with exit code 1'
puppet apply exited with exit code 1
+ '[' 1 '!=' 2 -a 1 '!=' 0 ']'
+ exit 1
[2016-05-19 15:32:29,361] (os-refresh-config) [ERROR] during configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/cot status 1]
[2016-05-19 15:32:29,362] (os-refresh-config) [ERROR] Aborting...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 987, in install
    _run_orc(instack_env)
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 866, in _run_orc
    _run_live_command(args, instack_env, 'os-refresh-config')
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 444, in _run_live_command
    raise RuntimeError('%s failed. See log for details.' % name)
RuntimeError: os-refresh-config failed. See log for details.
Command 'instack-install-undercloud' returned non-zero exit status 1
I tried to install the Ironic API as described here: http://docs.openstack.org/developer/ironic/deploy/install-guide.html, although to my understanding this should not be necessary, since the undercloud was not installed on a bare-metal machine.
Same result.
Some hours of Puppet reading later, I went into the /etc/puppet/modules/ironic/manifests/drivers folder and found, to no surprise, that the deploy class was not there. Perhaps it should not have been needed? I copied it from https://github.com/openstack/puppet-ironic/blob/master/manifests/drivers/deploy.pp and it seems to have got past the error originally reported. Fingers crossed.
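A sketch of that manual fix as commands (the raw.githubusercontent.com URL is assumed from the linked GitHub path):
curl -o /etc/puppet/modules/ironic/manifests/drivers/deploy.pp \
  https://raw.githubusercontent.com/openstack/puppet-ironic/master/manifests/drivers/deploy.pp
openstack undercloud install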
