When I run this command on macOS, the file is created:
(01:53 PM)[v0zilbe#L-SB8T2SCG3Q-M]: ~$ cqlsh --cqlversion=3.2.1
Connected to MCSE_CLUSTER at 172.16.117.212:9042.
[cqlsh 5.0.1 | Cassandra 2.1.12.1046 | CQL spec 3.2.1 | Native protocol v3]
Use HELP for help.
cqlsh:> copy a to 'a.csv';
Using 7 child processes
Starting copy of a with columns ['key', 'column1', 'value'].
Processed: 52033 rows; Rate: 59 rows/s; Avg. rate: 138 rows/s
52033 rows exported to 1 files in 6 minutes and 16.173 seconds.
cqlsh>
However, when I do the same thing on Linux, I get this:
cqlsh:> copy a to 'a.csv';
list size out of the sanity limit (10000 items max)
cqlsh:>
The Linux machine runs CentOS release 6.9.
I have been using georep for the last two months. I posted this on their GitHub but have had no answers so far.
Description of problem: after copying ~8TB without any issue, some nodes are flipping between Active and Faulty with the following error message in gsync log:
ssh> failed with UnicodeDecodeError: 'ascii' codec can't decode byte 0xf2 in position 60: ordinal not in range(128).
The default encoding on all machines is UTF-8.
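For reference, the message has the shape of a Python decode failure (gsyncd is written in Python). The following is a minimal sketch, assuming a hypothetical 61-byte string whose last byte is 0xf2. It reproduces the same wording and only illustrates that something is decoding bytes as ASCII; it does not show where in gsyncd that happens.

```python
import locale
import sys

# Hypothetical input: 60 ASCII bytes followed by 0xf2, chosen only so the
# failing position matches the one reported in the log.
raw = b"a" * 60 + b"\xf2"


def decode_or_error(data: bytes, codec: str) -> str:
    """Return the decoded text, or the decode error message if decoding fails."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return str(exc)


# Strict ASCII decoding fails on any byte >= 0x80.
ascii_result = decode_or_error(raw, "ascii")
print(ascii_result)

# A lenient decode keeps going and marks the undecodable byte instead.
lenient = raw.decode("utf-8", errors="replace")
print(lenient.endswith("\ufffd"))

# Encodings reported for the current process. The locale value depends on
# LANG/LC_ALL, which may differ inside a non-interactive ssh session.
print(sys.getdefaultencoding(), locale.getpreferredencoding(False))
```

Running the last line through the same ssh invocation that gsyncd uses may show whether the remote side really sees a UTF-8 locale.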
Command to reproduce the issue:
gluster volume geo-replication master_vol user@slave_machine::slave_vol start
The full output of the command that failed:
The command itself is fine; it only has to be started for the failure to appear, so the command is not the issue on its own.
Expected results:
No such failures, copy should go as planned
Mandatory info:
The output of the gluster volume info command:
Volume Name: volname
Type: Distributed-Replicate
Volume ID: d5a46398-9638-4b50-9db0-4cd7019fa526
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x 2 = 24
Transport-type: tcp
Bricks: 24 bricks (names omitted because they are not relevant and the list is too long)
Options Reconfigured:
features.ctime: off
cluster.min-free-disk: 15%
performance.readdir-ahead: on
server.event-threads: 8
cluster.consistent-metadata: on
performance.cache-refresh-timeout: 1
diagnostics.client-log-level: WARNING
diagnostics.brick-log-level: WARNING
performance.flush-behind: off
performance.cache-size: 5GB
performance.cache-max-file-size: 1GB
performance.io-thread-count: 32
performance.write-behind-window-size: 8MB
client.event-threads: 8
network.inode-lru-limit: 1000000
performance.md-cache-timeout: 1
performance.cache-invalidation: false
performance.stat-prefetch: on
features.cache-invalidation-timeout: 30
features.cache-invalidation: off
cluster.lookup-optimize: on
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
storage.owner-uid: 33
storage.owner-gid: 33
features.bitrot: on
features.scrub: Active
features.scrub-freq: weekly
cluster.rebal-throttle: lazy
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
The output of the gluster volume status command:
I don't think this is relevant, as everything seems fine; I'll post it if needed.
The output of the gluster volume heal command:
Same as before
Provide logs present on following locations of client and server nodes:
/var/log/glusterfs/
Those are not the relevant logs, since this is georep; here is the exact issue (this log is from a master volume node):
[2022-09-23 09:53:32.565196] I [master(worker /bricks/brick1/data):1439:process] _GMaster: Entry Time Taken [{MKD=0}, {MKN=0}, {LIN=0}, {SYM=0}, {REN=0}, {RMD=0}, {CRE=0}, {duration=0.0000}, {UNL=0}]
[2022-09-23 09:53:32.565651] I [master(worker /bricks/brick1/data):1449:process] _GMaster: Data/Metadata Time Taken [{SETA=0}, {SETX=0}, {meta_duration=0.0000}, {data_duration=1663926812.5656}, {DATA=0}, {XATT=0}]
[2022-09-23 09:53:32.566270] I [master(worker /bricks/brick1/data):1459:process] _GMaster: Batch Completed [{changelog_end=1663925895}, {entry_stime=None}, {changelog_start=1663925895}, {stime=(0, 0)}, {duration=673.9491}, {num_changelogs=1}, {mode=xsync}]
[2022-09-23 09:53:32.668133] I [master(worker /bricks/brick1/data):1703:crawl] _GMaster: processing xsync changelog [{path=/var/lib/misc/gluster/gsyncd/georepsession/bricks-brick1-data/xsync/XSYNC-CHANGELOG.1663926139}]
[2022-09-23 09:53:33.358545] E [syncdutils(worker /bricks/brick1/data):325:log_raise_exception] : connection to peer is broken
[2022-09-23 09:53:33.358802] E [syncdutils(worker /bricks/brick1/data):847:errlog] Popen: command returned error [{cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-GcBeU5/38c083bada86a45a28e6710377e456f6.sock geoaccount#slavenode6 /usr/libexec/glusterfs/gsyncd slave mastervol geoaccount#slavenode1::slavevol --master-node masternode21 --master-node-id 08c7423e-c2b6-4d40-adc8-d2ded4f66608 --master-brick /bricks/brick1/data --local-node slavenode6 --local-node-id bc1b3971-50a7-4b32-a863-aaaa02419de6 --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin --master-dist-count 12}, {error=1}]
[2022-09-23 09:53:33.358927] E [syncdutils(worker /bricks/brick1/data):851:logerr] Popen: ssh> failed with UnicodeDecodeError: 'ascii' codec can't decode byte 0xf2 in position 60: ordinal not in range(128).
[2022-09-23 09:53:33.672739] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Faulty}]
[2022-09-23 09:53:45.477905] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change [{status=Initializing...}]
Is there any crash? Provide the backtrace and coredump:
See the log provided above.
Additional info:
Master volume: 12x2 distributed-replicated setup that has been working for a couple of years now, with no big issues as of today. 160TB of data.
Slave volume: 2x(5+1) distributed-disperse setup, created exclusively to be a slave georep node. We managed to copy 11TB of data from the master node, but it is now failing.
The operating system / glusterfs version:
On ALL nodes: Glusterfs version= 9.6
Master nodes OS: CentOS 7
Slave nodes OS: Debian 11
Extra questions
I don't know if this is the place to ask, but while we're at it: is there any guidance on how to improve sync performance? We tried raising the sync_jobs parameter from 3 to 9, but as far as we could see (while it was working) it would only copy from 3 nodes at most, at a "low" speed (about 40% of our bandwidth). The link can go as high as 1Gbps, but the most we got was 370Mbps.
Also, is there any in-depth documentation for georep? What we found covered only the basics, and we would like more detailed documentation to read and dig into.
Cassandra version: 3.9, CQLSH version: 5.0.1
Can I query Cassandra configuration (cassandra.yaml) using cqlsh?
No, that's not possible in your version. It only became possible with Cassandra 4.0, which introduced so-called virtual tables, including a dedicated table for configuration settings, system_views.settings:
cqlsh:test> select * from system_views.settings ;

 name                                           | value
------------------------------------------------+-------
 transparent_data_encryption_options_enabled    | false
 transparent_data_encryption_options_iv_length  | 16
 trickle_fsync                                  | false
 trickle_fsync_interval_in_kb                   | 10240
 truncate_request_timeout_in_ms                 | 60000
 ....
You can find more information on the virtual tables in the following blog post from TLP.
In the meantime, you can access configuration parameters via JMX.
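The same lookup can also be done from code once you are on 4.0 or later. Below is a minimal sketch, assuming the third-party cassandra-driver package and a node reachable at 127.0.0.1. The helper names (fetch_settings, connect_and_fetch, FakeSession) are hypothetical; the stand-in session exists only so the filtering logic can run without a cluster.

```python
from collections import namedtuple


def fetch_settings(session, prefix=""):
    """Return {name: value} for settings whose name starts with prefix.

    `session` is any object with an execute(query) method that yields rows
    exposing .name and .value, such as a cassandra-driver Session.
    """
    rows = session.execute("SELECT name, value FROM system_views.settings")
    return {row.name: row.value for row in rows if row.name.startswith(prefix)}


def connect_and_fetch(host="127.0.0.1", prefix="trickle_"):
    """Hypothetical entry point: needs cassandra-driver and a Cassandra 4.0+
    node at `host` (both are assumptions, not something from the question)."""
    from cassandra.cluster import Cluster  # imported lazily: not in the stdlib

    cluster = Cluster([host])
    try:
        return fetch_settings(cluster.connect(), prefix)
    finally:
        cluster.shutdown()


# Stand-in session so the filtering can be exercised without a cluster.
Row = namedtuple("Row", ["name", "value"])


class FakeSession:
    def execute(self, query):
        return [
            Row("trickle_fsync", "false"),
            Row("trickle_fsync_interval_in_kb", "10240"),
            Row("truncate_request_timeout_in_ms", "60000"),
        ]


print(fetch_settings(FakeSession(), "trickle_"))
```

Against a real cluster you would call connect_and_fetch() instead of using the fake session.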
We have 10 Kafka machines running Kafka version 1.X.
This Kafka cluster is part of HDP version 2.6.5.
We noticed the following message in /var/log/kafka/server.log:
ERROR Error while accepting connection {kafka.network.Acceptor}
java.io.IOException: Too many open files
We also saw:
Broker 21 stopped fetcher for partition ...................... because they are in the failed log dir /kafka/kafka-logs {kafka.server.ReplicaManager}
and
WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:0, offset:2227488}, Current: {epoch:2, offset:261} for Partition: cars-list-75 {kafka.server.epoch.LeaderEpochFileCache}
So, regarding this issue:
ERROR Error while accepting connection {kafka.network.Acceptor}
java.io.IOException: Too many open files
How can we increase the maximum number of open files in order to avoid this issue?
Update:
In Ambari we saw the following parameter under Kafka --> Config.
Is this the parameter that we should increase?
It can be done like this:
echo "* hard nofile 100000
* soft nofile 100000" | sudo tee --append /etc/security/limits.conf
Then you should reboot.
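To check what limit a process actually ends up with, here is a minimal sketch using only the Python standard library. It assumes Linux for the /proc part, and the helper names are hypothetical.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count open descriptors of a process via /proc (Linux only).

    Reading another user's process requires sufficient privileges.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


soft, hard = nofile_limits()
print(f"soft={soft} hard={hard}")
if os.path.isdir("/proc/self/fd"):
    print(f"open descriptors: {open_fd_count()}")
```

To inspect the broker rather than the current shell, pass the broker's PID to open_fd_count. /proc/<pid>/limits shows the limits the running process actually has, which is worth comparing with what limits.conf says.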
I am having trouble creating a build with Cassandra as a service at a version higher than the default, specifically 2.X with X greater than 1.
I have verified that putting
services:
- cassandra
in my .travis.yml produces Cassandra 2.0.9:
$ cqlsh --execute="show version" 127.0.0.1
[cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
However, my project requires 2.2.4 as a minimum.
When I tried what Travis suggests:
before_install:
- sudo rm -rf /var/lib/cassandra/*
- wget https://archive.apache.org/dist/cassandra/2.2.4/apache-cassandra-2.2.4-bin.tar.gz && tar -xvzf apache-cassandra-2.2.4-bin.tar.gz && sudo sh apache-cassandra-2.2.4/bin/cassandra
- sleep 30
Cassandra fails to come up, and the build waits at the following line:
Connection error: Could not connect to 127.0.0.1:9160
When I dump the raw log, it is stuck here:
INFO 16:01:31 Loading org.apache.cassandra.config.CFMetaData#2716f853[cfId=5f2fbdad-91f1-3946-bd25-d5da3a5c35ec,ksName=system_auth,cfName=resource_role_permissons_index,cfType=Standard,comparator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type),comment=index of db roles with permissions granted on a resource,readRepairChance=0.0,dcLocalReadRepairChance=0.0,gcGraceSeconds=7776000,defaultValidator=org.apache.cassandra.db.marshal.BytesType,keyValidator=org.apache.cassandra.db.marshal.UTF8Type,minCompactionThreshold=4,maxCompactionThreshold=32,columnMetadata=[ColumnDefinition{name=role, type=org.apache.cassandra.db.marshal.UTF8Type, kind=CLUSTERING_COLUMN, componentIndex=0, indexName=null, indexType=null}, ColumnDefinition{name=resource, type=org.apache.cassandra.db.marshal.UTF8Type, kind=PARTITION_KEY, componentIndex=null, indexName=null, indexType=null}],compactionStrategyClass=class org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,compactionStrategyOptions={},compressionParameters={sstable_compression=org.apache.cassandra.io.compress.LZ4Compressor},bloomFilterFpChance=0.01,memtableFlushPeriod=3600000,caching={"keys":"ALL", "rows_per_partition":"NONE"},defaultTimeToLive=0,minIndexInterval=128,maxIndexInterval=2048,speculativeRetry=99.0PERCENTILE,droppedColumns={},triggers=[],isDense=false]
INFO 16:01:31 Initializing system_auth.resource_role_permissons_index
Does anyone have any idea how to get Travis to successfully build with a Cassandra 2.X higher than its default?
NOTES:
My project is PHP-based.
I have tried this both in container mode and with sudo (as per the Travis instructions).
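As a side note on the fixed sleep 30: it can be replaced by polling until the port accepts connections. Below is a minimal sketch with the standard library. The function name and default values are hypothetical, and 9160 is simply the port from the error message above.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until host:port accepts a TCP connection.

    Returns True once a connection succeeds, False if `timeout` seconds
    pass without one.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Port taken from the "Could not connect to 127.0.0.1:9160" message.
    print(wait_for_port("127.0.0.1", 9160, timeout=5.0))
```

This only tells you whether anything is listening on that port; it does not explain why the server might never open it.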
I have a problem with a single-node Cassandra installation.
I can start it without any errors in the log.
I can create a keyspace, create tables, insert and delete data.
However, truncate is not working:
cqlsh> CREATE KEYSPACE mykeyspace WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 1};
cqlsh> use mykeyspace;
cqlsh:mykeyspace> create table test1 (num int, primary key (num));
cqlsh:mykeyspace> insert into test1 (num) values (12);
cqlsh:mykeyspace> select * from test1;
num
-----
12
(1 rows)
cqlsh:mykeyspace> truncate test1;
Unable to complete request: one or more nodes were unavailable.
Also, if I run nodetool describecluster, it doesn't return a complete response:
[XXXX#XXXX dsc-cassandra-2.0.6]$ ./bin/nodetool describecluster
Cluster Information:
Name: Test Cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
UNREACHABLE: [127.0.0.1]
I'm using
Cassandra DSC 2.0.6.
Red Hat 5.8.
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
I get responses for ping 127.0.0.1 and ping localhost.
I checked all the ports that I know Cassandra may need (7000, 9160, 7199, 9042) using telnet, for example:
telnet 127.0.0.1 7199
telnet localhost 7199
I can connect to these ports.
I'm using the default cassandra.yaml. These are the lines where either an IP or a hostname shows up:
listen_address: localhost
rpc_address: localhost
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1"
I also looked into the source code. I believe the problem can be close to the method org.apache.cassandra.service.StorageProxyMBean.describeSchemaVersions(). Most likely I get no response to the SCHEMA_CHECK message.
I tried to enable TRACE log in log4j for nodetool (conf/log4j-tools.properties) to get more information about the issue, but somehow log4j didn't start logging (it did create the file that I set in the appender, but the file was empty.)
There must be something specific to this environment because I can't repeat this problem in any other environments. So I can't figure out what's causing it.
The problem was that Cassandra couldn't load snappy.
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:239)
at org.xerial.snappy.Snappy.<clinit>(Snappy.java:48)
at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:66)
at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:359)
at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:150)
I turned off compression in cassandra.yaml:
internode_compression: none
Now both nodetool describecluster and truncate work.
I also found a similar post here: Cassandra Startup Error 1.2.6 on Linux x86_64
Since I can't install another glibc on this machine, for the sake of testing I downloaded snappy-java-1.0.4.1.jar and used its libsnappyjava.so to replace the one in my snappy-java-1.0.5.jar.
With this jar I was able to run cassandra with
internode_compression: all
(I have glibc 2.5 installed)
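If you want to check the glibc version on another machine before swapping the native library, here is a minimal sketch, assuming Python is available there. The wrapper name is hypothetical; platform.libc_ver is part of the standard library.

```python
import platform


def libc_version():
    """Return (library, version) of the C library the Python binary links to.

    Both strings are empty if the information cannot be determined.
    """
    return platform.libc_ver()


lib, version = libc_version()
print(lib or "unknown", version or "unknown")
```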