I've set up a Packer template to generate a Vagrant base image of FreeBSD 10.3, and it was working well at least as of Mon Oct 3 00:34:41 2016 +0300.
Yesterday I went to continue work on this project and it turned out it no longer works. So here are the details.
Packer does what it has to do, then runs my script to install FreeBSD using bsdinstall(8) with the following script (the variables before the #!/bin/sh line are the bsdinstall preamble; the shell part runs in the newly installed system):
PARTITIONS="ada0 { 29G freebsd-ufs /, 5G freebsd-swap, 10G freebsd-ufs /var }"
DISTRIBUTIONS="base.txz kernel.txz"
#!/bin/sh
echo 'WITHOUT_X11="YES"' >> /etc/make.conf
echo 'OPTIONS_UNSET=X11' >> /etc/make.conf
echo 'nameserver 8.8.8.8' >> /etc/resolv.conf
cat >> /etc/rc.conf <<EOF
ifconfig_em0="DHCP"
sshd_enable="YES"
dumpdev="NO"
EOF
env ASSUME_ALWAYS_YES=1 pkg bootstrap # <<stops here
pkg update
pkg install -y sudo
[.....snip.....]
reboot
This stops at bootstrapping pkg with the message:
Bootstrapping pkg from pkg+http://pkg.FreeBSD.org/FreeBSD:10:amd64/quarterly, please wait...
Signature for pkg not available.
pkg: Error fetching http://pkg.FreeBSD.org/FreeBSD:10:amd64/quarterly/Latest/pkg.txz.sig: Connection reset by peer
A pre-built version of pkg could not be found for your system.
Consider changing PACKAGESITE or installing it from ports: 'ports-mgmt/pkg'.
If I stop the bsdinstall script and chroot /mnt /bin/sh I can fetch pkg.txz.sig from the above URL without any problems.
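For clarity, this is essentially the manual check (a sketch; the target system is still mounted at /mnt at that point):
# run from the installer shell after stopping the script
chroot /mnt /bin/sh -c \
    'fetch -o /tmp/pkg.txz.sig http://pkg.FreeBSD.org/FreeBSD:10:amd64/quarterly/Latest/pkg.txz.sig'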
Any ideas what could be the reason for the "connection reset by peer"? Was something changed on pkg.FreeBSD.org recently?
I couldn't find anything about the issue.
UPD1
Looking at the captured traffic: the site really answers 200 OK and then drops the connection for the pkg.txz.sig file.
But this 200 OK packet contains the signature file, and the packets are identical for both the manual fetch (which succeeds) and pkg bootstrap (which fails).
Both sessions are identical, so this is likely not a networking problem.
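For reference, a capture of this kind can be made from inside the installer environment with something like the following (em0 is the interface name from rc.conf above):
# capture the HTTP sessions for later comparison in Wireshark
tcpdump -i em0 -s 0 -w /tmp/pkg-bootstrap.pcap tcp port 80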
UPD2
truss was not helpful either.
So as a workaround I've just modified my bsdinstall script to fetch files manually:
[.....snip.....]
#env ASSUME_ALWAYS_YES=1 pkg bootstrap
fetch http://pkg.FreeBSD.org/FreeBSD:10:amd64/quarterly/Latest/pkg.txz
fetch http://pkg.FreeBSD.org/FreeBSD:10:amd64/quarterly/Latest/pkg.txz.sig
pkg add pkg.txz
pkg update
[.....snip.....]
PS: The only thing that I can suspect now is the VirtualBox version update... anyway, downgrading is not an option. (The ISO checksum is hardcoded into the template, and the template and scripts are in a git repository, so accidental changes are impossible.)
UPD3
I've set up a debugging environment; for the moment I've only isolated the function where the error is raised.
It's the second buffer refill from the HTTP connection (while the first one has already read the 727 bytes, so it should be EOF)...
Here is a small gdb log with the backtrace and breakpoints to get there.
Added a tcpdump capture made on the system (Wireshark compatible).
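Roughly the kind of session used to get there (a sketch only; the exact breakpoints are in the gdb log):
env ASSUME_ALWAYS_YES=1 gdb --args /usr/sbin/pkg bootstrap
(gdb) break fread
(gdb) run
(gdb) backtrace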
As I found out, the problem is partially with pkg -- it tries to read 10240 bytes from the connection, expecting EOF if the file is smaller, but somehow on my system EOF is not set once the whole remote file has been read out.
# /release/10.3.0/usr.sbin/pkg/pkg.c
185 char buf[10240];
242 while ((r = fread(buf, 1, sizeof(buf), remote)) > 0) {
and the following loop runs twice -- the first time reading the file, the second time getting a connection reset error instead of EOF:
# /release/10.3.0/lib/libc/stdio/fread.c
94 resid = count * size; /* == 10240 here */
100 while (resid > (r = fp->_r)) {
101 (void)memcpy((void *)p, (void *)fp->_p, (size_t)r);
102 fp->_p += r;
103 /* fp->_r = 0 ... done in __srefill */
104 p += r;
105 resid -= r;
106 if (__srefill(fp)) {
107 /* no more input: return partial result */
108 return ((total - resid) / size);
109 }
110 }
While the manual fetch succeeds because the size is adjusted for small chunks and it only asks to read the remaining 727 bytes:
# /release/10.3.0/usr.bin/fetch/fetch.c
720 if (us.size != -1 && us.size - count < B_size &&
721 us.size - count >= 0)
722 size = us.size - count;
723 else
724 size = B_size;
733 if ((readcnt = fread(buf, 1, size, f)) < size) {
...but why EOF is not set is still an open question.
Posted this to the freebsd-pkg mailing list.
UPD4
Downgraded VirtualBox from 5.0.28 to 5.0.26 and EOF is set: _sread() in libc/stdio/refill.c:135 returns 0 and EOF is set on line 138.
So something was changed in VirtualBox networking too. Added a pcap file for VirtualBox 5.0.26 to the gist. 5.0.28 really was the culprit of the connection reset - here is a comparison of the captures.
VirtualBox 5.1.8 has this bug too. Version 5.1.6 works OK.
Opened ticket #16141 in their bug tracker.
Related
I have been using geo-replication (georep) for the last two months and posted this on their GitHub, but no answers so far.
Description of problem: after copying ~8TB without any issue, some nodes are flipping between Active and Faulty with the following error message in the gsyncd log:
ssh> failed with UnicodeDecodeError: 'ascii' codec can't decode byte 0xf2 in position 60: ordinal not in range(128).
The default encoding on all machines is UTF-8.
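A minimal illustration of the error class itself, not of the gsyncd code path (my guess is that a file name containing a non-ASCII byte comes back over the ssh channel and gets decoded with the ascii codec):
# the 'ascii' codec rejects byte 0xf2 regardless of the system locale,
# while an explicit utf-8/latin-1 decode handles it
python -c 'b"\xf2".decode("ascii")'          # raises the same UnicodeDecodeError
python -c 'print(b"\xf2".decode("latin-1"))' # succeeds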
Command to reproduce the issue:
gluster volume geo-replication master_vol user@slave_machine::slave_vol start
The full output of the command that failed:
The command itself is fine, but the failure only shows up after geo-replication has started, so the command is not the issue on its own.
Expected results:
No such failures; the copy should proceed as planned.
Mandatory info:
The output of the gluster volume info command:
Volume Name: volname
Type: Distributed-Replicate
Volume ID: d5a46398-9638-4b50-9db0-4cd7019fa526
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x 2 = 24
Transport-type: tcp
Bricks: 24 bricks (names omitted because they are not relevant and too long)
Options Reconfigured:
features.ctime: off
cluster.min-free-disk: 15%
performance.readdir-ahead: on
server.event-threads: 8
cluster.consistent-metadata: on
performance.cache-refresh-timeout: 1
diagnostics.client-log-level: WARNING
diagnostics.brick-log-level: WARNING
performance.flush-behind: off
performance.cache-size: 5GB
performance.cache-max-file-size: 1GB
performance.io-thread-count: 32
performance.write-behind-window-size: 8MB
client.event-threads: 8
network.inode-lru-limit: 1000000
performance.md-cache-timeout: 1
performance.cache-invalidation: false
performance.stat-prefetch: on
features.cache-invalidation-timeout: 30
features.cache-invalidation: off
cluster.lookup-optimize: on
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
storage.owner-uid: 33
storage.owner-gid: 33
features.bitrot: on
features.scrub: Active
features.scrub-freq: weekly
cluster.rebal-throttle: lazy
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
The output of the gluster volume status command:
I don't really think this is relevant as everything seems fine; if needed I'll post it.
The output of the gluster volume heal command:
Same as before
Provide logs present in the following locations on client and server nodes:
/var/log/glusterfs/
Not the most relevant ones since this is georep; posting the exact issue (this log is from a master volume node):
[2022-09-23 09:53:32.565196] I [master(worker /bricks/brick1/data):1439:process] _GMaster: Entry Time Taken [{MKD=0}, {MKN=0}, {LIN=0}, {SYM=0}, {REN=0}, {RMD=0}, {CRE=0}, {duration=0.0000}, {UNL=0}]
[2022-09-23 09:53:32.565651] I [master(worker /bricks/brick1/data):1449:process] _GMaster: Data/Metadata Time Taken [{SETA=0}, {SETX=0}, {meta_duration=0.0000}, {data_duration=1663926812.5656}, {DATA=0}, {XATT=0}]
[2022-09-23 09:53:32.566270] I [master(worker /bricks/brick1/data):1459:process] _GMaster: Batch Completed [{changelog_end=1663925895}, {entry_stime=None}, {changelog_start=1663925895}, {stime=(0, 0)}, {duration=673.9491}, {num_changelogs=1}, {mode=xsync}]
[2022-09-23 09:53:32.668133] I [master(worker /bricks/brick1/data):1703:crawl] _GMaster: processing xsync changelog [{path=/var/lib/misc/gluster/gsyncd/georepsession/bricks-brick1-data/xsync/XSYNC-CHANGELOG.1663926139}]
[2022-09-23 09:53:33.358545] E [syncdutils(worker /bricks/brick1/data):325:log_raise_exception] : connection to peer is broken
[2022-09-23 09:53:33.358802] E [syncdutils(worker /bricks/brick1/data):847:errlog] Popen: command returned error [{cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-GcBeU5/38c083bada86a45a28e6710377e456f6.sock geoaccount@slavenode6 /usr/libexec/glusterfs/gsyncd slave mastervol geoaccount@slavenode1::slavevol --master-node masternode21 --master-node-id 08c7423e-c2b6-4d40-adc8-d2ded4f66608 --master-brick /bricks/brick1/data --local-node slavenode6 --local-node-id bc1b3971-50a7-4b32-a863-aaaa02419de6 --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin --master-dist-count 12}, {error=1}]
[2022-09-23 09:53:33.358927] E [syncdutils(worker /bricks/brick1/data):851:logerr] Popen: ssh> failed with UnicodeDecodeError: 'ascii' codec can't decode byte 0xf2 in position 60: ordinal not in range(128).
[2022-09-23 09:53:33.672739] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2022-09-23 09:53:45.477905] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
Is there any crash? Provide the backtrace and coredump:
Provided in the log above.
Additional info:
Master volume: 12x2 distributed-replicated setup, has been working for a couple of years now with no big issues as of today. 160TB of data.
Slave volume: 2x(5+1) distributed-disperse setup, created exclusively to be a georep slave. Managed to copy 11TB of data from the master volume, but it's failing.
The operating system / glusterfs version:
On ALL nodes: GlusterFS version: 9.6
Master nodes OS: CentOS 7
Slave nodes OS: Debian 11
Extra questions
I don't really know if this is the place to ask, but while we're at it, any guidance on how to improve sync performance? I tried raising the sync_jobs parameter from 3 to 9, but as we saw (while it was working) it would only copy from 3 nodes at most, at a "low" speed (about 40% of our bandwidth). It could go as high as 1 Gbps but the max we got was 370 Mbps.
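For reference, the parameter was changed with the usual geo-replication config command (using the parameter name as I set it; volume and user names are the placeholders from above):
gluster volume geo-replication master_vol user@slave_machine::slave_vol config sync_jobs 9
# and checked afterwards with:
gluster volume geo-replication master_vol user@slave_machine::slave_vol config | grep -i sync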
Also, is there any in-depth documentation for georep? The basics we found were too basic, and we missed having more documentation to read and dig into.
I am trying to connect an M210 v2 RTK to a desktop computer running Ubuntu 18.04, ROS Melodic and a parallel installation of OpenCV 3.3.1 and 4.5.3, using a USB-TTL RS232 adapter for the UART connection and a USB-USB cable between drone and desktop to be able to run Advanced Sensing.
When I run ls -l /dev/ttyACM* && ls -l /dev/ttyUSB*, it shows that both the USB and ACM connections are identified:
crw-rw---- 1 root dialout 166, 0 out 4 13:18 /dev/ttyACM0
crw-rw---- 1 root dialout 188, 0 out 4 13:18 /dev/ttyUSB0
I also set the transfer rate of the TTL-USB adapter to 921600 using minicom, and gave the device read and write permission with sudo usermod -a -G dialout $USER && sudo chmod 666 /dev/ttyUSB0.
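For completeness, an equivalent way to check and set the line speed without minicom (note that the dialout group change only takes effect after logging out and back in):
stty -F /dev/ttyUSB0            # show the current line settings, including the baud rate
stty -F /dev/ttyUSB0 921600     # set the line to 921600 baud directly
groups                          # confirm 'dialout' is listed after re-login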
Unfortunately, when I launch roslaunch dji_osdk_ros dji_sdk_node.launch, the connection problem presented below appears and I have not been able to fix it. I have tried turning the drone and RC on and off several times as described here, but the problem still stands.
started roslaunch server http://V3D06:43613/
SUMMARY
========
PARAMETERS
* /dji_sdk/acm_name: /dev/ttyACM0
* /dji_sdk/align_time: False
* /dji_sdk/app_id: 1076017
* /dji_sdk/app_version: 1
* /dji_sdk/baud_rate: 921600
* /dji_sdk/dxc: False
* /dji_sdk/enc_key: 6bd1d26f8dd897e4b...
* /dji_sdk/serial_name: /dev/ttyUSB0
* /dji_sdk/use_broadcast: False
* /rosdistro: melodic
* /rosversion: 1.14.12
NODES
/
dji_sdk (dji_osdk_ros/dji_sdk_node)
auto-starting new master
process[master]: started with pid [2436]
ROS_MASTER_URI=http://localhost:11311
setting /run_id to bde7b4d2-252e-11ec-8a59-1831bfb3e154
process[rosout-1]: started with pid [2458]
started core service [/rosout]
process[dji_sdk-2]: started with pid [2464]
[ INFO] [1633364323.534426789]: Advanced Sensing is Enabled on M210.
Read App ID
User Configuration read successfully.
[1276751.089]STATUS/1 @ getDroneVersion, L1702: ret = 0
[1276751.089]STATUS/1 @ parseDroneVersionInfo, L1122: Device Serial No. = 1DADG3E00100U4
[1276751.089]STATUS/1 @ parseDroneVersionInfo, L1124: Firmware = 3.4.3.44
[1276751.089]STATUS/1 @ functionalSetUp, L279: Shake hand with drone successfully by getting drone version.
[1276751.089]STATUS/1 @ legacyX5SEnableTask, L56: Legacy X5S Enable task created.
[1276752.089]STATUS/1 @ sendHeartbeatToFCTask, L1576: OSDK send heart beat to fc task created.
[1276752.289]STATUS/1 @ Control, L40: The control class is going to be deprecated.It will be better to use the FlightController class instead!
[1276752.290]STATUS/1 @ FileMgrImpl, L253: register download file callback handler successfully.
[1276753.557]STATUS/1 @ PSDKModule, L98: MOP only support M300, so mop client will not be initialized here.
[1276753.557]STATUS/1 @ PSDKModule, L98: MOP only support M300, so mop client will not be initialized here.
[1276753.557]STATUS/1 @ PSDKModule, L98: MOP only support M300, so mop client will not be initialized here.
[1276753.557]STATUS/1 @ initDJIHms, L900: DJI HMS is not supported on this platform!
[1276753.567]STATUS/1 @ getDroneVersion, L1702: ret = 0
[1276753.567]STATUS/1 @ parseDroneVersionInfo, L1122: Device Serial No. = 1DADG3E00100U4
[1276753.567]STATUS/1 @ parseDroneVersionInfo, L1124: Firmware = 3.4.3.44
[1276753.567]STATUS/1 @ AdvancedSensing, L145: Advanced Sensing init for the M210 drone
[1276753.567]STATUS/1 @ init, L49: Looking for USB device...
[1276753.572]STATUS/1 @ init, L65: Found 8 USB devices, identifying DJI device...
[1276753.572]STATUS/1 @ init, L83: Found a DJI device...
[1276753.572]STATUS/1 @ init, L96: Attempting to open DJI USB device...
[1276753.572]ERRORLOG/1 @ init, L101: Failed to open DJI USB device...
[1276753.572]ERRORLOG/1 @ init, L102: Error code: -3
[1276753.572]ERRORLOG/1 @ init, L105: Please make sure you provide a udev file for your system and reboot the computer
[1276753.573]STATUS/1 @ LiveViewImpl, L89: Finding if liveview stream is available now.
[1276754.076]STATUS/1 @ init, L254: Start advanced sensing initalization
[1276754.076]STATUS/1 @ activate, L1329: version 0x304032C
[1276754.076]STATUS/1 @ adv_pthread, L46: adv pthread created !!!!!!!!!!!!!!!!!!!!!!!
[1276754.076]STATUS/1 @ adv_pthread, L48: adv pthread running !!!!!!!!!!!!!!!!!!!!!!!
[dji_sdk-2] process has died [pid 2464, exit code -11, cmd /home/vant3d/catkin_ws/devel/lib/dji_osdk_ros/dji_sdk_node __name:=dji_sdk __log:=/home/vant3d/.ros/log/bde7b4d2-252e-11ec-8a59-1831bfb3e154/dji_sdk-2.log].
log file: /home/vant3d/.ros/log/bde7b4d2-252e-11ec-8a59-1831bfb3e154/dji_sdk-2*.log
It appears there is some problem with providing a udev file, but I don't know how to fix it. Does anyone have an idea to help with this problem?
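For reference, this is the kind of udev rule I understand the message is asking for; the vendor ID below is my assumption and should be checked against lsusb output first:
lsusb   # find the DJI device and note its idVendor
# 2ca3 is assumed here -- replace it with the idVendor reported by lsusb
sudo tee /etc/udev/rules.d/99-dji-usb.rules <<'EOF'
SUBSYSTEM=="usb", ATTRS{idVendor}=="2ca3", MODE="0666", GROUP="plugdev"
EOF
sudo udevadm control --reload-rules && sudo udevadm trigger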
Thank you!
That's my post. First, turn off Advanced Sensing to check whether a basic FTDI connection works.
Second, which DJI OSDK version are you using? Does the OSDK version match the version in OSDK-ROS? I saw you have M300 code in there; that is usually OSDK 4+. For the M210, I only use 3.8 and 3.9.
If basic FTDI works and you can get all the feedback, there is a higher chance that you have the wrong ACM config. The DJI RNDIS thing is nasty and may not be configured properly. You need to manually set a static IP of 192.168.43.1 (or something like that, 42 or 43 -- you need to check this static IP).
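A rough sketch of that manual step, assuming the RNDIS interface shows up as usb0 (check with ip link) and using the address mentioned above, which still needs to be confirmed as .42 or .43:
ip link                                    # find the RNDIS/USB network interface, e.g. usb0
sudo ip addr add 192.168.43.1/24 dev usb0  # address is an assumption -- confirm 42 vs 43
sudo ip link set usb0 up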
I'm struggling with the installation of packages available as locally downloaded RPM files - but only on Oracle Linux (OEL). Is there a bug? Has anyone observed this? It would be a huge bug, so I'm a bit surprised.
The Chef recipe is quite simple:
pkg_src_location = 'https://s3.amazonaws.com/solution-automation-folder/qualys'
pkg = 'qualys-cloud-agent.x86_64.rpm'
local_image = "#{Chef::Config['file_cache_path']}/#{pkg}"

# download the rpm into Chef's file cache
remote_file 'qualys-cloud-agent-image' do
  path local_image
  source "#{pkg_src_location}/#{pkg}"
end

# install it from the local file
package 'qualys-cloud-agent' do
  source local_image
end
It's available from https://github.com/r2oro/oel_pkg_test.git.
I have observed that on Oracle Linux (OEL) it results in the following Python script being triggered:
/usr/bin/python /opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-12.16.42/lib/chef/provider/package/yum/yum-dump.py --options --installed-provides --yum-lock-timeout 30
It runs for quite a while (downloading several hundred megabytes of data, as far as I can see yum repo metadata) and eventually fails (kitchen in debug mode dumps all of this to stdout...). Anyway, the result is:
* yum_package[qualys-cloud-agent] action install
[2016-12-01T12:35:32+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-12.16.42/lib/chef/provider/package/yum/yum-dump.py exceeded timeout 900
================================================================================
Error executing action `install` on resource 'yum_package[qualys-cloud-agent]'
================================================================================
Mixlib::ShellOut::CommandTimeout
--------------------------------
Command timed out after 900s:
Command exceeded allowed execution time, process terminated
---- Begin output of /usr/bin/python /opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-12.16.42/lib/chef/provider/package/yum/yum-dump.py --options --installed-provides --yum-lock-timeout 30 ----
STDOUT:
STDERR:
---- End output of /usr/bin/python /opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-12.16.42/lib/chef/provider/package/yum/yum-dump.py --options --installed-provides --yum-lock-timeout 30 ----
Ran /usr/bin/python /opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-12.16.42/lib/chef/provider/package/yum/yum-dump.py --options --installed-provides --yum-lock-timeout 30 returned
Resource Declaration:
---------------------
# In /tmp/kitchen/cache/cookbooks/oel_pkg_test/recipes/default.rb
16: package 'qualys-cloud-agent' do
17: source local_image
18: end
Compiled Resource:
------------------
# Declared in /tmp/kitchen/cache/cookbooks/oel_pkg_test/recipes/default.rb:16:in `from_file'
yum_package("qualys-cloud-agent") do
package_name "qualys-cloud-agent"
action [:install]
retries 0
retry_delay 2
default_guard_interpreter :default
declared_type :package
cookbook_name "oel_pkg_test"
recipe_name "default"
source "/tmp/kitchen/cache/qualys-cloud-agent.x86_64.rpm"
flush_cache {:before=>false, :after=>false}
end
Did you notice that flush_cache says the yum cache should not be flushed, yet it still happens? It's frustrating. This always fails in my local kitchen (with Vagrant/VirtualBox) and even in an AWS cloud kitchen... Real instances sometimes fail and sometimes converge, but it's a lottery. Anyway, why does this cache update happen at all for a single local RPM image?!
I did try to use rpm_package, but that leads to problems with yum_package being used in other recipes...
Any thoughts?
You probably do want to use rpm_package in this case. As for why the cache is reloading, it might just be the first time it is getting hit, so it has to do the initial reload, or something else modified the package set beforehand.
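Roughly speaking (a simplification, not the exact Chef internals), the two resources end up doing the equivalent of the following, which is why yum_package has to load the repo metadata while rpm_package does not:
# yum resolves names/provides against the full repo metadata before installing
yum install -y /tmp/kitchen/cache/qualys-cloud-agent.x86_64.rpm
# rpm installs the local file directly, no repo metadata involved
rpm -Uvh /tmp/kitchen/cache/qualys-cloud-agent.x86_64.rpm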
I am starting to use VACaMobil, a module for OMNeT++ which allows you to keep a constant number of cars during a simulation period while evaluating ITS solutions.
After making some changes to the code, I tried to run VACaMobil with the configuration flows 2. The simulation aborted and I got the following messages from sumo-launchd.py:
jcmh@juanca-freya:~$ python Proyectos-OMNeT++/inet/etc/sumo-launchd.py -vv -c sumo-gui
Logging to /tmp/sumo-launchd.log
Listening on port 9999
Connection from 127.0.0.1 on port 49847
Handling connection from 127.0.0.1 on port 49847
Got TraCI message of length 2
Got TraCI command of length 1
Got TraCI command 0x0
Got CMD_GETVERSION
Got TraCI message of length 248
Got TraCI command of length 247
Got TraCI command 0x75
Got CMD_FILE_SEND for "sumo-launchd.launch.xml"
Got CMD_FILE_SEND with data "<launch>
<copy file="downtown.mapa.xml"/>
<copy file="downtown.routes.xml"/>
<copy file="downtown.sumo.cfg" type="config"/>
<basedir path="../examples/VACaMobil/flows/Milan/"/>
<seed value="2"/>
</launch>
"
Creating temporary directory...
Temporary dir is /tmp/sumo-launchd-tmp-dsJ7kR
Base dir is ../examples/VACaMobil/flows/Milan/
Seed is 2
Finding free port number...
Claiming lock on port
...found port 56172
Releasing lock on port
Cleaning up
Result: "None"
Aborting on error: file "../examples/VACaMobil/flows/Milan/downtown.mapa.xml" does not exist
Closing connection from 127.0.0.1 on port 49847
A user posted a message on the blog of Sergio Tornell, one of the VACaMobil developers, asking for help because he had a problem similar to mine. The answer was: "It seems like you are using Windows. You have to modify all the paths; in Windows you use '\' instead of '/'". He certainly was using Windows, but I am on GNU/Linux.
What could be the problem? I don't think it is because of backslashes, since I am on GNU/Linux.
I am using Elementary OS 3 (based on Ubuntu 14.04), OMNeT++ 4.6, VACaMobil for the INET framework and SUMO 0.18.0.
Thanks in advance.
The way to solve this problem was simply:
Go to the folder where sumo-launchd.py is located:
jcmh@juanca-freya:~$ cd Proyectos-OMNeT++/inet/etc
Run sumo-launchd.py directly:
jcmh@juanca-freya:~/Proyectos-OMNeT++/inet/etc$ python sumo-launchd.py -vv -c sumo-gui
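As far as I can tell, this works because the <basedir path="../examples/VACaMobil/flows/Milan/"/> in the generated launch configuration is relative, so it is resolved against the directory sumo-launchd.py is started from:
# started from $HOME, the relative base dir does not exist:
cd ~ && ls ../examples/VACaMobil/flows/Milan/downtown.mapa.xml        # fails
# started from inet/etc, it resolves to inet/examples/... and is found:
cd ~/Proyectos-OMNeT++/inet/etc && ls ../examples/VACaMobil/flows/Milan/downtown.mapa.xml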
I would like to patch my Arch Linux ARM kernel for Raspberry Pi with grsecurity.
This is what I've done so far:
I've downloaded the linux-raspberrypi directory (with the PKGBUILD) available here:
https://github.com/archlinuxarm/PKGBUILDs/tree/master/core/linux-raspberrypi
There, I downloaded the patch with wget: http://grsecurity.net/stable/grsecurity-3.0-3.2.58-201405112002.patch
To continue, I added the patch to the PKGBUILD, in the prepare() function:
patch -p1 < "${srcdir}/grsecurity-3.0-3.2.58-201405112002.patch"
Then:
makepkg
Unfortunately, at the patch line in prepare(), I got:
==> ERROR: A failure occurred in prepare().
I then applied the patch manually and got output like this:
Hunk #10 succeeded at 3232 (offset 440 lines).
Hunk #11 succeeded at 3242 (offset 440 lines).
Hunk #12 FAILED at 2816.
1 out of 12 hunks FAILED -- saving rejects to file virt/kvm/kvm_main.c.rej
This file contains:
--- virt/kvm/kvm_main.c
+++ virt/kvm/kvm_main.c
@@ -2816,9 +2832,6 @@
register_syscore_ops(&kvm_syscore_ops);
- kvm_preempt_ops.sched_in = kvm_sched_in;
- kvm_preempt_ops.sched_out = kvm_sched_out;
-
kvm_init_debug();
return 0;
That is probably because I used the wrong version of grsecurity for my kernel, which is:
3.12.20-1-ARCH
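For reference, the quick check that points at the mismatch (patch --dry-run tests the patch without modifying the sources):
grep '^pkgver=' PKGBUILD   # the kernel version this PKGBUILD builds
# the patch name encodes its target kernel: grsecurity-3.0-3.2.58-... is for 3.2.58, not 3.12.x
# from inside the extracted kernel tree, test the patch without touching anything:
patch -p1 --dry-run < /path/to/grsecurity-3.0-3.2.58-201405112002.patch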
If you have any idea whether it might be this, or something else, please let me know.