GlusterFS geo-replication issue

I have been using geo-replication for the last two months and posted this on the GlusterFS GitHub, but no answers so far.
Description of problem: after copying ~8TB without any issue, some nodes are flipping between Active and Faulty with the following error message in the gsyncd log:
ssh> failed with UnicodeDecodeError: 'ascii' codec can't decode byte 0xf2 in position 60: ordinal not in range(128).
The default encoding on all machines is UTF-8.
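For reference, a quick way to double-check that is to compare the locale of an interactive shell with the locale a non-interactive SSH session gets, since gsyncd talks to the slave over SSH and may not inherit the interactive environment. A minimal sketch, with the user and hostname as placeholders:
locale                              # interactive shell on a master node
ssh geoaccount@slavenode1 locale    # environment a non-interactive session on a slave actually sees
The 0xf2 byte in the traceback would be consistent with non-UTF-8 data (for example a Latin-1 encoded filename) being decoded with the ascii codec on one side.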
Command to reproduce the issue:
gluster volume geo-replication master_vol user@slave_machine::slave_vol start
The full output of the command that failed:
The command itself is fine, but you need to start the session for it to fail, so the command on its own is not the issue.
Expected results:
No such failures, copy should go as planned
Mandatory info:
The output of the gluster volume info command:
Volume Name: volname
Type: Distributed-Replicate
Volume ID: d5a46398-9638-4b50-9db0-4cd7019fa526
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x 2 = 24
Transport-type: tcp
Bricks: 24 bricks (names omitted; not relevant and too long to list)
Options Reconfigured:
features.ctime: off
cluster.min-free-disk: 15%
performance.readdir-ahead: on
server.event-threads: 8
cluster.consistent-metadata: on
performance.cache-refresh-timeout: 1
diagnostics.client-log-level: WARNING
diagnostics.brick-log-level: WARNING
performance.flush-behind: off
performance.cache-size: 5GB
performance.cache-max-file-size: 1GB
performance.io-thread-count: 32
performance.write-behind-window-size: 8MB
client.event-threads: 8
network.inode-lru-limit: 1000000
performance.md-cache-timeout: 1
performance.cache-invalidation: false
performance.stat-prefetch: on
features.cache-invalidation-timeout: 30
features.cache-invalidation: off
cluster.lookup-optimize: on
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
storage.owner-uid: 33
storage.owner-gid: 33
features.bitrot: on
features.scrub: Active
features.scrub-freq: weekly
cluster.rebal-throttle: lazy
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
The output of the gluster volume status command:
I don't really think this is relevant as everything seems fine; if needed I'll post it.
The output of the gluster volume heal command:
Same as before
Provide logs present in the following locations of client and server nodes:
/var/log/glusterfs/
The logs there are not the relevant ones since this is geo-replication; posting the exact issue below (this log is from a master volume node):
[2022-09-23 09:53:32.565196] I [master(worker /bricks/brick1/data):1439:process] _GMaster: Entry Time Taken [{MKD=0}, {MKN=0}, {LIN=0}, {SYM=0}, {REN=0}, {RMD=0}, {CRE=0}, {duration=0.0000}, {UNL=0}]
[2022-09-23 09:53:32.565651] I [master(worker /bricks/brick1/data):1449:process] _GMaster: Data/Metadata Time Taken [{SETA=0}, {SETX=0}, {meta_duration=0.0000}, {data_duration=1663926812.5656}, {DATA=0}, {XATT=0}]
[2022-09-23 09:53:32.566270] I [master(worker /bricks/brick1/data):1459:process] _GMaster: Batch Completed [{changelog_end=1663925895}, {entry_stime=None}, {changelog_start=1663925895}, {stime=(0, 0)}, {duration=673.9491}, {num_changelogs=1}, {mode=xsync}]
[2022-09-23 09:53:32.668133] I [master(worker /bricks/brick1/data):1703:crawl] _GMaster: processing xsync changelog [{path=/var/lib/misc/gluster/gsyncd/georepsession/bricks-brick1-data/xsync/XSYNC-CHANGELOG.1663926139}]
[2022-09-23 09:53:33.358545] E [syncdutils(worker /bricks/brick1/data):325:log_raise_exception] : connection to peer is broken
[2022-09-23 09:53:33.358802] E [syncdutils(worker /bricks/brick1/data):847:errlog] Popen: command returned error [{cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-GcBeU5/38c083bada86a45a28e6710377e456f6.sock geoaccount#slavenode6 /usr/libexec/glusterfs/gsyncd slave mastervol geoaccount#slavenode1::slavevol --master-node masternode21 --master-node-id 08c7423e-c2b6-4d40-adc8-d2ded4f66608 --master-brick /bricks/brick1/data --local-node slavenode6 --local-node-id bc1b3971-50a7-4b32-a863-aaaa02419de6 --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin --master-dist-count 12}, {error=1}]
[2022-09-23 09:53:33.358927] E [syncdutils(worker /bricks/brick1/data):851:logerr] Popen: ssh> failed with UnicodeDecodeError: 'ascii' codec can't decode byte 0xf2 in position 60: ordinal not in range(128).
[2022-09-23 09:53:33.672739] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2022-09-23 09:53:45.477905] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
Is there any crash? Provide the backtrace and coredump:
Provided in the log above.
Additional info:
Master volume: 12x2 distributed-replicated setup; it has been working for a couple of years now with no big issues as of today. 160TB of data.
Slave volume: 2x(5+1) distributed-dispersed setup, created exclusively to be the geo-replication slave. It managed to copy 11TB of data from the master volume, but it is now failing.
The operating system / glusterfs version:
On ALL nodes: GlusterFS version 9.6
Master nodes OS: CentOS 7
Slave nodes OS: Debian 11
Extra questions
I don't really know if this is the place to ask, but while we're at it: any guidance on how to improve sync performance? We tried raising the sync_jobs parameter from 3 up to 9, but, as we saw while it was still working, it would only copy from 3 nodes at most, at a fairly low speed (about 40% of our bandwidth). The link could go as high as 1Gbps, but the most we got was 370Mbps.
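For reference, we changed it through the geo-replication config interface, roughly like this (volume and account names as in the start command above; some gluster builds spell the option sync-jobs rather than sync_jobs):
gluster volume geo-replication master_vol user@slave_machine::slave_vol config               # list current session config
gluster volume geo-replication master_vol user@slave_machine::slave_vol config sync_jobs 9   # raise parallel sync workers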
Also, is there any in-depth documentation for geo-replication? What we found only covered the basics, and we missed having more detailed material to read and dig into.

Related

WineBottler returning error message every time i try an open .exe file on Mac

I'm trying to open a plugin for SNAP (to process Sentinel-3 imagery) on my Mac. The plugin downloads as an .exe file, which means I need to open it using WineBottler. Every time I try to open the file, however, I get this error message:
###BOTTLING### default.sh
/var/folders/rz/rr6ytzhx5gl60f1v1tbc67xm0000gn/T/AppTranslocation/6CDA1855-FA78-4A2A-A976-2C1A539F36ED/d/WineBottler.app/Contents/Frameworks/WBottler.framework/Resources/bottler.sh: line 39: /Applications/Wine.app/Contents/Resources/bin/wine: Bad CPU type in executable
###BOTTLING### Gathering debug Info...
Versions
OS...........................: darwin21
Wine.........................:
WineBottler..................: 1.8.6
Wineticks....................: 20220411-next - sha256sum: b6370f13c4dc410023f2a4e4e9a4385d2a0420031666c2f30befccc9b39c8f65
Environment
PWD..........................: '/Applications/Wine.app/Contents/Resources/bin'
PATH.........................: /Applications/Wine.app/Contents/Resources/bin:/usr/bin:/bin:/usr/sbin:/sbin
USER.........................: hannah
HOME.........................: /Users/hannah
COMPUTERNAME.................: hannah's MacBook Air
BUNDLERESOURCEPATH...........: /var/folders/rz/rr6ytzhx5gl60f1v1tbc67xm0000gn/T/AppTranslocation/6CDA1855-FA78-4A2A-A976-2C1A539F36ED/d/WineBottler.app/Contents/Frameworks/WBottler.framework/Resources
WINEPREFIX...................: /Applications/Wine.app/Contents/Resources
WINEPATH.....................: /Applications/Wine.app/Contents/Resources/bin
LD_LIBRARY_PATH..............: /Applications/Wine.app/Contents/Resources/lib:/opt/X11/lib:/usr/X11/lib
DYLD_FALLBACK_LIBRARY_PATH...: /Applications/Wine.app/Contents/Resources/lib:/usr/lib:/opt/X11/lib:/usr/X11/lib
SILENT.......................:
http_proxy...................:
https_proxy..................:
ftp_proxy....................:
socks5_proxy.................:
Bottle
TEMPLATE.....................:
BOTTLE.......................: /Users/hannah/Desktop/Untitled.app
INSTALLER_URL................: /Users/hannah/Desktop/iCOR_Setup_3.0.0.exe
INSTALLER_IS_ZIPPED..........: 0
INSTALLER_NAME...............: iCOR_Setup_3.0.0.exe
INSTALLER_ARGUMENTS..........:
REMOVE_MONO..................:
REMOVE_GECKO.................:
REMOVE_USERS.................:
REMOVE_INSTALLERS............:
WINETRICKS_ITEMS.............: winxp
DLL_OVERRIDES................:
EXECUTABLE_PATH..............: winefile
EXECUTABLE_ARGUMENTS.........:
EXECUTABLE_VERSION...........: 1.0.0
BUNDLE_COPYRIGHT.............: © Your Company
BUNDLE_IDENTIFIER............: com.yourcompany.yourapp
BUNDLE_CATEGORYTYPE..........: public.app-category.business
SILENT.......................:
Hardware:
Hardware Overview:
Model Name: MacBook Air
Model Identifier: MacBookAir7,2
Processor Name: Dual-Core Intel Core i5
Processor Speed: 1.6 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 3 MB
Hyper-Threading Technology: Enabled
Memory: 4 GB
System Firmware Version: 476.0.0.0.0
OS Loader Version: 540.120.3~22
SMC Version (system): 2.27f2
Serial Number (system): C02QM1XWG941
Hardware UUID: EE27242F-C2B2-59E6-AAED-D598D1D61044
Provisioning UDID: EE27242F-C2B2-59E6-AAED-D598D1D61044
###BOTTLING### Create .app...
###BOTTLING### Enabling CoreAudio, Colors, Antialiasing and flat menus...
/var/folders/rz/rr6ytzhx5gl60f1v1tbc67xm0000gn/T/AppTranslocation/6CDA1855-FA78-4A2A-A976-2C1A539F36ED/d/WineBottler.app/Contents/Frameworks/WBottler.framework/Resources/bottler.sh: line 134: /Applications/Wine.app/Contents/Resources/bin/wine: Bad CPU type in executable
### LOG ### Command '/Applications/Wine.app/Contents/Resources/bin/wine regedit /tmp/reg.reg' returned status 126.
###ERROR### Command '/Applications/Wine.app/Contents/Resources/bin/wine regedit /tmp/reg.reg' returned status 126.
Task returned with status 1.
I've tried downloading the 'stable' version of WineBottler, and downloading and redownloading it, to no avail; it always returns this message. I can't find any way of getting around this, or any recently posted questions (a lot are from 2010-15 and their solutions are outdated).
Does anyone know what I can do to get around this and open it? It's driving me insane!!!
Thanks!
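Not a full answer, but on a darwin21 (macOS 12) system "Bad CPU type in executable" usually means a 32-bit Intel binary, which 64-bit-only macOS releases refuse to run. A quick check, using the paths from the log above:
sw_vers -productVersion                                    # macOS release
uname -m                                                   # machine architecture
file /Applications/Wine.app/Contents/Resources/bin/wine    # "i386" here would mean the bundled wine cannot execute on this OS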

Unable to spin up 3 nodes via yb-master in yugabytedb

I am unable to start a 3 node universe with yb-master. I am following the docs here:
https://docs.yugabyte.com/latest/deploy/manual-deployment/start-masters/#verify-health
I created 3 master.conf files for 3 separate ips.
For 10.0.0.185:
--master_addresses=10.0.0.185:7100,10.0.0.141:7100,10.0.0.119:7100
--rpc_bind_addresses=10.0.0.185:7100
--fs_data_dirs=/home/mark/yuga/y1
For 10.0.0.141:
--master_addresses=10.0.0.141:7100,10.0.0.185:7100,10.0.0.119:7100
--rpc_bind_addresses=10.0.0.141:7100
--fs_data_dirs=/home/mark/yuga/y1
For 10.0.0.119:
--master_addresses=10.0.0.119:7100,10.0.0.141:7100,10.0.0.185:7100
--rpc_bind_addresses=10.0.0.119:7100
--fs_data_dirs=/home/mark/yuga/y1
I started each node up with the command ./bin/yb-master --flagfile master.conf >& ./y1/yb-master.out &
What seems to happen is that the first 2 nodes start up fine, but as soon as I try to spin up the third node, the first node crashes and I end up with an error.
At first I thought maybe this had to do with the servers I've got, so I changed the order in which I spin up the yb-masters, but it's always the one I spin up first that dies.
Looking at the yb-master.INFO log for each IP (y1/yb-data/master/logs/yb-master.INFO) with the command cat y1/yb-data/master/logs/yb-master.INFO | grep master, I see:
The one that crashes:
This master's current role is: FOLLOWER
And the other two show:
I0110 00:02:56.565732 3292 client-internal.cc:2384] New master addresses: [10.0.0.141:7100,10.0.0.185:7100,10.0.0.119:7100, 10.0.0.141:7100, 10.0.0.185:7100, 10.0.0.119:7100]
E0110 00:02:58.069311 3162 async_initializer.cc:99] Failed to initialize client: Timed out (yb/rpc/rpc.cc:224): Could not locate the leader master: GetLeaderMasterRpc(addrs: [10.0.0.141:7100, 10.0.0.185:7100, 10.0.0.119:7100, 10.0.0.141:7100, 10.0.0.185:7100, 10.0.0.119:7100], num_attempts: 46) passed its deadline 1101.945s (passed: 1.504s): Network error (yb/util/net/socket.cc:551): recvmsg error: Connection refused (system error 111)
I0110 00:02:59.071501 3293 client-internal.cc:2355] Reinitialize master addresses from file: master.conf
I0110 00:02:59.071782 3293 client-internal.cc:2384] New master addresses: [10.0.0.141:7100,10.0.0.185:7100,10.0.0.119:7100, 10.0.0.141:7100, 10.0.0.185:7100, 10.0.0.119:7100]
and
I0110 00:02:57.610631 2128 master_service.cc:531] Patching role from leader to follower because of: Leader not ready to serve requests (yb/master/scoped_leader_shared_lock.cc:123): Leader not yet ready to serve requests: leader_ready_term_ = -1; cstate.current_term = 1 [suppressed 77 similar messages]
I0110 00:02:58.072002 2144 client-internal.cc:2355] Reinitialize master addresses from file: master.conf
I0110 00:02:58.072276 2144 client-internal.cc:2384] New master addresses: [10.0.0.119:7100,10.0.0.141:7100,10.0.0.185:7100, 10.0.0.119:7100, 10.0.0.141:7100, 10.0.0.185:7100]
I'm not sure why I'm seeing those errors. Am I missing something while attempting to start up the 3 yb-masters?
I should also mention that I've ensured all 3 nodes have the correct system configurations, as mentioned here: https://docs.yugabyte.com/latest/deploy/manual-deployment/system-config/#setting-system-wide-ulimits
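One diagnostic that may help is asking the masters that are up what Raft membership they currently see. A minimal sketch with yb-admin, assuming the same binary directory and addresses as above:
./bin/yb-admin --master_addresses 10.0.0.185:7100,10.0.0.141:7100,10.0.0.119:7100 list_all_masters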

Neo4j refused to connect

Characteristics:
Linux
Neo4j version 3.2.1
Access on remote
Installation
I had installed Neo4j and given the folder chmod 777.
I'm running it remotely on my machine and I have already enabled non-local access.
Running neo4j start, I get this message:
Active database: graph.db
Directories in use:
home: /home/cloudera/Muna/apps/neo4j
config: /home/cloudera/Muna/apps/neo4j/conf
logs: /home/cloudera/Muna/apps/neo4j/logs
plugins: /home/cloudera/Muna/apps/neo4j/plugins
import: /home/cloudera/Muna/apps/neo4j/import
data: /home/cloudera/Muna/apps/neo4j/data
certificates: /home/cloudera/Muna/apps/neo4j/certificates
run: /home/cloudera/Muna/apps/neo4j/run
Starting Neo4j.
WARNING: Max 1024 open files allowed, minimum of 40000 recommended. See the Neo4j manual.
Started neo4j (pid 9469). It is available at http://0.0.0.0:7474/
There may be a short delay until the server is ready.
See /home/cloudera/Muna/apps/neo4j/logs/neo4j.log for current status.
and it is not connecting in the browser.
Running neo4j console gives:
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 409600000 bytes for AllocateHeap
# An error report file with more information is saved as:
# /home/cloudera/hs_err_pid18598.log
Where could the problem be coming from?
Firstly, you should set the maximum number of open files to 40000, which is the recommended value; then you will not get the WARNING. See: http://neo4j.com/docs/1.6.2/configuration-linux-notes.html
Secondly, 'failed to allocate memory' means that the Java virtual machine cannot allocate the amount of memory it is started with.
It can be a misconfiguration, or you physically do not have enough memory.
Please read the memory sizing guidelines here:
https://neo4j.com/docs/operations-manual/current/performance/
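As an illustration only (the account name, paths, and sizes below are placeholders to adapt, not settings taken from your machine), raising the file-handle limit and pinning the heap explicitly might look like this:
# /etc/security/limits.conf entries for the account that runs Neo4j
neo4j   soft    nofile  40000
neo4j   hard    nofile  40000
# conf/neo4j.conf: give the JVM a heap the machine can actually back
dbms.memory.heap.initial_size=512m
dbms.memory.heap.max_size=512m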

Failure running Overtone and SuperCollider

I can't get Overtone to work with the SuperCollider server. I'm following the getting started guide at https://github.com/overtone/overtone/wiki/Getting-Started. I've got the JACK audio server running through qjackctl, and I ran SuperCollider with scsynth -u 8888, which produced the following output:
Found 12 LADSPA plugins
JackDriver: client name is 'SuperCollider'
SC_AudioDriver: sample rate = 48000.000000, driver's block size = 1024
SuperCollider 3 server ready.
Zeroconf: registered service 'SuperCollider'
Then in the Clojure REPL I connect to the SC server:
(connect-external-server 8888)
then when I run (definst foo [] (saw 220))
I get the following error:
CompilerException java.util.concurrent.TimeoutException: deref! timeout
error. Dereference took longer than 5000 ms whilst blocking until the
following node has completed loading: #<synth-group[loading]: Inst foo
Container 41>, compiling:(form-init1483192646581877285.clj:131:7)
and scsynth outputs FAILURE IN SERVER /g_new Group 31 not found
Also, if I try (demo (sin-osc)) I get the error FAILURE IN SERVER /s_new Group 7 not found
However, if I run the following through sclang:
s.boot;
{ SinOsc.ar(440, 0, 0.2) }.play;
it does produce a sound.
I'm running Manjaro Linux with the Linux 4.9.27 real-time Manjaro kernel and an HDA Intel PCH sound card.

Heketi can't provision a volume for Heketi database

I'm trying to set up a GlusterFS cluster with Heketi for Kubernetes persistent volumes. I have 3 nodes in the gluster cluster:
heketi-cli node list
Id:242e801e6eeb7ec10acda60a409b5d98 Cluster:fd539c5d13b6229498c6c67ac491163d
Id:439fb090888a745633f9db6ac4d243b8 Cluster:fd539c5d13b6229498c6c67ac491163d
Id:5e9b7e5f3ec33c77c42437e89ca857a3 Cluster:fd539c5d13b6229498c6c67ac491163d
But when I try to provision a volume for Heketi database by using command:
heketi-cli setup-openshift-heketi-storage
I get an error:
Error: No space
But I have enough free space on my volumes:
Devices:
Id:931b4f87e3675368a4f737ed6862e0cf Name:/dev/sdb State:online Size (GiB):29 Used (GiB):0 Free (GiB):29
Devices:
Id:3a2a30b22ade4efca7949e9cc082b685 Name:/dev/sdb State:online Size (GiB):29 Used (GiB):0 Free (GiB):29
Devices:
Id:5d1b5c7b258c52569bff1e1c720015c5 Name:/dev/sdb State:online Size (GiB):29 Used (GiB):0 Free (GiB):29
What can be the reason for this strange behavior?
I'm sorry, I have found the reason. It's the gluster node count: it should be equal to the count of gluster instances in Kubernetes. Previously I had only 3 gluster nodes but 4 gluster instances in Kubernetes.
There can be a number of problems that lead to this error message. The 2 most common ones are:
You do not have the minimum of 3 nodes in your gluster cluster
The heketi-cli setup-openshift-heketi-storage command needs to create a volume for heketi's database. That volume is now 2GB by default, but it used to be 32GB(!) (see heketi issue #639). So depending on your heketi-cli version it may be trying to create a 32GB volume on your 29GB bricks. Nasty.
I suggest you look at the logs of heketi:
$ kubectl get pod -l name=heketi
NAME READY STATUS RESTARTS AGE
heketi-703226055-7g3hb 1/1 Running 0 18h
$ kubectl logs heketi-703226055-7g3hb -f
Heketi v3.0.0-111-gc5f0f58
[heketi] INFO 2017/02/14 22:17:53 Loaded kubernetes executor
...
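To see how much space heketi itself believes each device has (and whether a 32GB volume could ever fit on those 29GB bricks), the topology dump is useful. A minimal sketch, assuming heketi-cli is already pointed at your heketi service:
$ heketi-cli topology info    # per-cluster, per-node, per-device sizes and free space
$ heketi-cli volume list      # any volumes created (or half-created) so far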
