How to print change data logs to stdout in YugabyteDB? - yugabytedb

I am using YugabyteDB 1.3.0.0 and following https://docs.yugabyte.com/latest/deploy/cdc/use-cdc/ to learn YugabyteDB CDC.
Procedure Followed:
a) generated users dynamically and inserted them into the table yugastore.users continuously through a script
b) downloaded yb-cdc-connector.jar using the command: wget -O yb-cdc-connector.jar https://github.com/yugabyte/yb-kafka-connector/blob/master/yb-cdc/yb-cdc-connector.jar?raw=true
c) copied yb-cdc-connector.jar to jre/lib/ext using the command: cp -a /root/yb-cdc-connector.jar /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.232.b09-0.el7_7.x86_64/jre/lib/ext/
d) ran: java -jar /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.232.b09-0.el7_7.x86_64/jre/lib/ext/yb_cdc_connector.jar --table_name yugastore.users --master_addrs 127.0.0.1 --stream_id 1 --log_only
Error Logs:
[root@srvr0 ~]# java -jar /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.232.b09-0.el7_7.x86_64/jre/lib/ext/yb_cdc_connector.jar --table_name yugastore.users --master_addrs 127.0.0.1 --stream_id 1 --log_only
Error: Unable to access jarfile /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.232.b09-0.el7_7.x86_64/jre/lib/ext/yb_cdc_connector.jar
Records Insertion:
yugastore=> select count(*) from yugastore.users;
count
-------
59414
(1 row)
yugastore=> select count(*) from yugastore.users;
count
-------
60066
(1 row)
yugastore=> select count(*) from yugastore.users;
count
-------
79341
(1 row)
yugastore=>
java:
[root@srvr0 ~]# java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
Availability of yb-cdc-connector.jar:
[root@srvr0 ~]# ll /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.232.b09-0.el7_7.x86_64/jre/lib/ext/
total 54756
-rw-r--r--. 1 root root 4003855 Oct 22 12:18 cldrdata.jar
-rw-r--r--. 1 root root 9445 Oct 22 12:18 dnsns.jar
-rw-r--r--. 1 root root 48733 Oct 22 12:18 jaccess.jar
-rw-r--r--. 1 root root 1204895 Oct 22 12:18 localedata.jar
-rw-r--r--. 1 root root 617 Oct 22 12:18 meta-index
-rw-r--r--. 1 root root 2033680 Oct 22 12:18 nashorn.jar
-rw-r--r--. 1 root root 52079 Oct 22 12:18 sunec.jar
-rw-r--r--. 1 root root 304504 Oct 22 12:18 sunjce_provider.jar
-rw-r--r--. 1 root root 279788 Oct 22 12:18 sunpkcs11.jar
-rw-r--r--. 1 root root 48026360 Jan 16 08:31 yb-cdc-connector.jar
-rw-r--r--. 1 root root 78006 Oct 22 12:18 zipfs.jar
Please help me print the change logs to stdout!
Update1:
A small correction is required in the doc: change yb_cdc_connector.jar to yb-cdc-connector.jar in the document's java command, as wget downloads the file as yb-cdc-connector.jar.
Though changes (inserts and deletes) are happening to the table, the connector shows polling but no CDC output is printed.
Logs:
[root@srvr0 ~]# java -jar ./yb_cdc_connector.jar --table_name yugastore.users --master_addrs 127.0.0.1 --stream_id 1 --log_only
[2020-01-22 01:37:24,221] INFO Starting CDC Kafka Connector... (org.yb.cdc.Main:28)
2020-01-22 01:37:24,393 [INFO|org.yb.cdc.KafkaConnector|KafkaConnector] Creating new YB client...
[2020-01-22 01:37:28,344] INFO Discovered tablet YB Master for table YB Master with partition ["", "") (org.yb.client.AsyncYBClient:1593)
2020-01-22 01:37:28,839 [INFO|org.yb.cdc.KafkaConnector|KafkaConnector] Polling for new tablet c6f3d759202341ecad87e2617579371c
2020-01-22 01:37:28,842 [INFO|org.yb.cdc.KafkaConnector|KafkaConnector] Polling for new tablet e94b1748d1a742289500a30a38ff9eda
Insertion:
time: 01:37:49.980 cumulative records: 10
time: 01:37:52.169 cumulative records: 20
time: 01:37:52.410 cumulative records: 30
time: 01:37:52.425 cumulative records: 40
...
time: 01:39:28.139 cumulative records: 970
time: 01:39:28.171 cumulative records: 980
time: 01:39:28.208 cumulative records: 990
time: 01:39:28.246 cumulative records: 1000
Deletion:
yugastore=# delete from yugastore.users;
DELETE 21314
yugastore=# select count(*) from yugastore.users;
count
-------
0
(1 row)

The error shows that the jar file cannot be accessed in the system directory you moved it to. This is most likely a permission issue. Note that you don't need to move this jar into a system directory at all, so keeping the jar in a normal directory is the simplest option.
Also, you don't need the stream_id, since you want the default behavior most of the time. So try the following command from a user directory where you have the jar file:
java -jar ./yb_cdc_connector.jar \
--table_name yugastore.users \
--master_addrs 127.0.0.1:7100 \
--log_only
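For completeness, a combined download-and-run sequence that sidesteps the filename mismatch noted in Update1 could look like this (a sketch only; the URL is the one from the question, and the -O flag pins the saved filename so the java command always matches):

```shell
# Download the connector into the current directory under an explicit
# name, then run it from there (no need for jre/lib/ext).
wget -O yb-cdc-connector.jar \
  'https://github.com/yugabyte/yb-kafka-connector/blob/master/yb-cdc/yb-cdc-connector.jar?raw=true'
java -jar ./yb-cdc-connector.jar \
  --table_name yugastore.users \
  --master_addrs 127.0.0.1:7100 \
  --log_only
```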

Related

How to view logs of container before restart

I have a container which was restarted 14 hours ago. The container has been running for 7 weeks. I want to inspect the container logs during a certain interval. When I run the command below, I see there is no output:
docker container logs pg-connect --until 168h --since 288h
When I run the command below, I only see logs since the container was restarted:
docker logs pg-connect
Any idea how to retrieve older logs for the container?
More info, in case it helps:
> docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9f08fb6fb0fb kosta709/alpine-plus:0.0.2 "/connectors-restart…" 7 weeks ago Up 14 hours connectors-monitor
7e919a253a29 debezium/connect:1.2.3.Final "/docker-entrypoint.…" 7 weeks ago Up 14 hours pg-connect
>
>
> docker logs 7e919a253a29 -n 2
2022-08-26 06:37:10,878 INFO || WorkerSourceTask{id=relations-0} Committing offsets [org.apache.kafka.connect.runtime.WorkerSourceTask]
2022-08-26 06:37:10,878 INFO || WorkerSourceTask{id=relations-0} flushing 0 outstanding messages for offset commit [org.apache.kafka.connect.runtime.WorkerSourceTask]
> docker logs 7e919a253a29 |head
org.apache.kafka.common.KafkaException: Producer is closed forcefully.
at org.apache.kafka.clients.producer.internals.RecordAccumulator.abortBatches(RecordAccumulator.java:766)
at org.apache.kafka.clients.producer.internals.RecordAccumulator.abortIncompleteBatches(RecordAccumulator.java:753)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:279)
at java.base/java.lang.Thread.run(Thread.java:834)
2022-08-24 16:13:06,567 ERROR || WorkerSourceTask{id=session-0} failed to send record to barclays.public.session: [org.apache.kafka.connect.runtime.WorkerSourceTask]
org.apache.kafka.common.KafkaException: Producer is closed forcefully.
at org.apache.kafka.clients.producer.internals.RecordAccumulator.abortBatches(RecordAccumulator.java:766)
at org.apache.kafka.clients.producer.internals.RecordAccumulator.abortIncompleteBatches(RecordAccumulator.java:753)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:279)
>
> ls -lart /var/lib/docker/containers/7e919a253a296494b74361e258e49d8c3ff38f345455316a15e1cb28cf556fa1/
total 90720
drwx------ 2 root root 6 Jul 1 10:39 checkpoints
drwx--x--- 2 root root 6 Jul 1 10:39 mounts
drwx--x--- 4 root root 150 Jul 1 10:40 ..
-rw-r----- 1 root root 10000230 Aug 24 16:13 7e919a253a296494b74361e258e49d8c3ff38f345455316a15e1cb28cf556fa1-json.log.9
-rw-r----- 1 root root 10000163 Aug 24 16:13 7e919a253a296494b74361e258e49d8c3ff38f345455316a15e1cb28cf556fa1-json.log.8
-rw-r----- 1 root root 10000054 Aug 24 16:16 7e919a253a296494b74361e258e49d8c3ff38f345455316a15e1cb28cf556fa1-json.log.7
-rw-r----- 1 root root 10000147 Aug 24 16:42 7e919a253a296494b74361e258e49d8c3ff38f345455316a15e1cb28cf556fa1-json.log.6
-rw-r----- 1 root root 10000123 Aug 24 16:42 7e919a253a296494b74361e258e49d8c3ff38f345455316a15e1cb28cf556fa1-json.log.5
-rw-r----- 1 root root 10000019 Aug 24 16:42 7e919a253a296494b74361e258e49d8c3ff38f345455316a15e1cb28cf556fa1-json.log.4
-rw-r----- 1 root root 10000159 Aug 24 16:42 7e919a253a296494b74361e258e49d8c3ff38f345455316a15e1cb28cf556fa1-json.log.3
-rw-r----- 1 root root 10000045 Aug 24 16:42 7e919a253a296494b74361e258e49d8c3ff38f345455316a15e1cb28cf556fa1-json.log.2
-rw-r--r-- 1 root root 199 Aug 25 16:30 hosts
-rw-r--r-- 1 root root 68 Aug 25 16:30 resolv.conf
-rw-r--r-- 1 root root 25 Aug 25 16:30 hostname
-rw------- 1 root root 7205 Aug 25 16:30 config.v2.json
-rw-r--r-- 1 root root 1559 Aug 25 16:30 hostconfig.json
-rw-r----- 1 root root 10000085 Aug 25 16:31 7e919a253a296494b74361e258e49d8c3ff38f345455316a15e1cb28cf556fa1-json.log.1
drwx--x--- 4 root root 4096 Aug 25 16:31 .
-rw-r----- 1 root root 2843232 Aug 26 06:38 7e919a253a296494b74361e258e49d8c3ff38f345455316a15e1cb28cf556fa1-json.log
As stated by [the official guide][1]:
"The docker logs command batch-retrieves logs present at the time of execution."
To solve this issue, you should instrument the container software to log its output to a persistent (rotated, if you want) log file.
[1]: https://docs.docker.com/engine/reference/commandline/logs/
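If the rotated json-file logs are still on disk (as in the directory listing above), one workaround is to read them directly. Docker's json-file driver writes one JSON object per line with the raw output in the "log" field; the rotated files (.9 down to .1) are older than the live file. A self-contained sketch of the idea, using fabricated sample lines in place of the real /var/lib/docker files:

```shell
# Create two fake docker json-file log lines (stand-ins for the real files).
printf '%s\n' '{"log":"older line\n","stream":"stdout","time":"2022-08-24T16:13:06Z"}' > sample-json.log.1
printf '%s\n' '{"log":"newer line\n","stream":"stdout","time":"2022-08-26T06:37:10Z"}' > sample-json.log
# Concatenate oldest-first and extract the "log" field from each line.
cat sample-json.log.1 sample-json.log | python3 -c '
import json, sys
for line in sys.stdin:
    sys.stdout.write(json.loads(line)["log"])
'
```

Against the real files you would (as root) cat /var/lib/docker/containers/&lt;id&gt;/&lt;id&gt;-json.log.9 through .1 followed by the live .log file into the same extraction step.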

Postgresql Failed to start on linux

My PostgreSQL stopped working suddenly and I am unable to start it. (The last change on my Linux system was an attempt to install Docker.)
When I run the command:
sudo service postgresql restart
OR
sudo service postgresql start
I get the error:
* Starting PostgreSQL 11 database server
* Failed to issue method call: Unit postgresql@11-main.service failed to load: No such file or directory. See system logs and 'systemctl status postgresql@11-main.service' for details.
[fail]
and
~$ sudo service postgresql status
11/main (port 5434): down
The result of systemctl status:
~ $ sudo systemctl status postgresql@11-main.service
postgresql@11-main.service
Loaded: error (Reason: No such file or directory)
Active: inactive (dead)
I have tried all the available options on SO and on other forums, but none worked for me.
What can be the issue?
EDIT:
cat /etc/os-release
Output:
NAME="Ubuntu"
VERSION="14.04.6 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.6 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
and
ls -al /var/lib/pgsql
output:
ls: cannot access /var/lib/pgsql: No such file or directory
Then I tried
ls -al /var/lib/postgresql/
output:
total 40
drwxr-xr-x 7 postgres postgres 4096 Mar 24 13:34 .
drwxr-xr-x 90 root root 4096 Apr 6 18:36 ..
drwxr-xr-x 4 postgres postgres 4096 Apr 6 18:20 11
drwxr-xr-x 3 postgres postgres 4096 Feb 11 2016 9.3
drwxr-xr-x 3 postgres postgres 4096 Dec 3 2018 9.4
drwx------ 2 postgres postgres 4096 Jan 31 11:20 .aptitude
-rw------- 1 postgres postgres 5898 Apr 6 18:33 .bash_history
drwx------ 3 postgres postgres 4096 Feb 11 2016 .cache
-rw------- 1 postgres postgres 2180 Mar 24 13:34 .psql_history
and
yum info postgresql11-server
Error: No matching Packages to list
EDIT 2
sudo systemctl list-unit-files | grep postg et
output:
grep: et: No such file or directory
and
ls -l /var/lib/postgresql/11
output:
total 8
drwxr-xr-x 2 root root 4096 Apr 6 18:20 data
drwx------ 19 postgres postgres 4096 Apr 2 18:53 main

Does Spark support multiple users?

I have a 3-node Spark 2.3.1 cluster running at the moment, and I'm also running a Zeppelin server as a normal user, like ulab.
From Zeppelin, I ran the commands:
%spark
val file = sc.textFile("file:///mnt/glusterfs/test/testfile")
file.saveAsTextFile("/mnt/glusterfs/test/testfile2")
It reports a lot of error messages, something like:
WARN [2018-09-14 05:44:50,540] ({pool-2-thread-8} NotebookServer.java[afterStatusChange]:2302) - Job 20180907-130718_39068508 is finished, status: ERROR, exception: null, result: %text file: org.apache.spark.rdd.RDD[String] = file:///mnt/glusterfs/test/testfile MapPartitionsRDD[49] at textFile at <console>:51
org.apache.spark.SparkException: Job aborted.
...
... 64 elided
Caused by: java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/mnt/glusterfs/test/testfile2/_temporary/0/task_20180914054253_0050_m_000018/part-00018; isDirectory=false; length=33554979; replication=1; blocksize=33554432; modification_time=1536903780000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to file:/mnt/glusterfs/test/testfile2/part-00018
And I found that some temporary files are owned by user root, while some are owned by ulab, like the following:
bash-4.4# ls -l testfile2
total 32773
drwxr-xr-x 3 ulab ulab 4096 Sep 14 05:42 _temporary
-rw-r--r-- 1 ulab ulab 33554979 Sep 14 05:44 part-00018
bash-4.4# ls -l testfile2/_temporary/
total 4
drwxr-xr-x 210 ulab ulab 4096 Sep 14 05:44 0
bash-4.4# ls -l testfile2/_temporary/0
total 832
drwxr-xr-x 2 root root 4096 Sep 14 05:42 task_20180914054253_0050_m_000000
drwxr-xr-x 2 root root 4096 Sep 14 05:42 task_20180914054253_0050_m_000001
drwxr-xr-x 2 root root 4096 Sep 14 05:42 task_20180914054253_0050_m_000002
drwxr-xr-x 2 root root 4096 Sep 14 05:42 task_20180914054253_0050_m_000003
....
Is there any setting to have all these temporary files created by ulab, so that we can use multiple users in the Spark driver and isolate their privileges?
You can enable the 'User Impersonate' option for the spark interpreter, which will start the Spark job as the logged-in user.
Refer to this link for more info.
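A rough sketch of what enabling impersonation can involve (assumptions: a standard Zeppelin install, and that the zeppelin user has the needed sudoers entries; the exact sudo/ssh mechanism is site-specific):

```shell
# In Zeppelin's conf/zeppelin-env.sh: launch interpreter processes as the
# logged-in notebook user via sudo.
export ZEPPELIN_IMPERSONATE_CMD='sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c '
# Then, in the Zeppelin interpreter settings UI, set the spark interpreter
# to instantiate "Per User" in isolated mode and tick "User Impersonate".
```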

Installation Of Cassandra

I have downloaded Cassandra via the terminal, but the problem is that I cannot find the other folders like data, conf, lib, doc, etc.
I can see only some files, as shown in the figure, i.e. Click here.
Where are the other folders?
By "download cassandra via terminal" and your screenshot, I'll assume that you installed Cassandra via apt-get.
From the Apache Cassandra project Wiki, section on Installation from Debian packages:
The default location of configuration files is /etc/cassandra.
The default location of log and data directories is /var/log/cassandra/ and /var/lib/cassandra.
As for the lib directory, check how your $CASSANDRA_HOME is being set:
$ grep CASSANDRA_HOME /etc/init.d/cassandra
CASSANDRA_HOME=/usr/share/cassandra
$ ls -al /usr/share/cassandra/
total 8312
drwxr-xr-x 3 root root 4096 Dec 13 07:57 .
drwxr-xr-x 372 root root 12288 Nov 28 08:51 ..
-rw-r--r-- 1 root root 5962385 Jun 1 2016 apache-cassandra-3.6.jar
lrwxrwxrwx 1 root root 24 Jun 1 2016 apache-cassandra.jar -> apache-cassandra-3.6.jar
-rw-r--r-- 1 root root 1902216 Jun 1 2016 apache-cassandra-thrift-3.6.jar
-rw-r--r-- 1 root root 875 May 31 2016 cassandra.in.sh
drwxr-xr-x 3 root root 12288 Dec 13 07:57 lib
-rw-r----- 1 root root 82123 Oct 20 2015 metrics-core-2.2.0.jar
-rw-r----- 1 root root 9639 Oct 20 2015 metrics-graphite-2.2.0.jar
-rw-r--r-- 1 root root 509144 Jun 1 2016 stress.jar
Note that Cassandra's lib directory is shown in the middle of the directory listing above.

Linux: Finding Newly Added Files

I am trying to obtain a backup of 'newly' added files on a Fedora system. Files can be copied through a Windows Samba share and appear to retain their original creation timestamp. However, because they retain this timestamp, I am having trouble identifying which files were newly added to the system.
Currently, the only way I can think of doing this is to keep a master-list snapshot of all the files on the system at a specific time. Then, when I perform the backup, I compare the previous snapshot with a current snapshot. That would also detect files removed from the system, but it seems excessive, and I suspect there must be an easier way to back up newly added files.
Terry
Try using find. Something like this:
find . -ctime -10
That will give you a list of files and directories, starting from your current directory, whose status changed within the last 10 days.
Example:
My Downloads directory looks like this:
kobus@akira:~/Downloads$ ll
total 2025284
drwxr-xr-x 4 kobus kobus 4096 Nov 4 11:25 ./
drwxr-xr-x 41 kobus kobus 4096 Oct 30 09:26 ../
-rw-rw-r-- 1 kobus kobus 8042383 Oct 28 14:08 apache-maven-3.3.3-bin.tar.gz
drwxrwxr-x 2 kobus kobus 4096 Oct 14 09:55 ELKImages/
-rw-rw-r-- 1 kobus kobus 1469054976 Nov 4 11:25 Fedora-Live-Workstation-x86_64-23-10.iso
-rw------- 1 kobus kobus 351004 Sep 21 14:07 GrokConstructor-master.zip
drwxrwxr-x 11 kobus kobus 4096 Jul 11 2014 jboss-eap-6.3/
-rw-rw-r-- 1 kobus kobus 183399393 Oct 19 16:26 jboss-eap-6.3.0-installer.jar
-rw-rw-r-- 1 kobus kobus 158177216 Oct 19 16:26 jboss-eap-6.3.0.zip
-rw-rw-r-- 1 kobus kobus 71680110 Oct 13 13:51 jre-8u60-linux-x64.tar.gz
-rw-r--r-- 1 kobus kobus 4680 Oct 12 12:34 nginx-release-centos-7-0.el7.ngx.noarch.rpm
-rw-r--r-- 1 kobus kobus 3479765 Oct 12 14:22 ngx_openresty-1.9.3.1.tar.gz
-rw------- 1 kobus kobus 16874455 Sep 15 16:49 Oracle_VM_VirtualBox_Extension_Pack-5.0.4-102546.vbox-extpack
-rw-r--r-- 1 kobus kobus 7505310 Oct 6 10:29 sublime_text_3_build_3083_x64.tar.bz2
-rw------- 1 kobus kobus 41467245 Sep 7 10:37 tagspaces-1.12.0-linux64.tar.gz
-rw-rw-r-- 1 kobus kobus 42658300 Nov 4 10:14 tagspaces-2.0.1-linux64.tar.gz
-rw------- 1 kobus kobus 70046668 Sep 15 16:49 VirtualBox-5.0-5.0.4_102546_el7-1.x86_64.rpm
Here's what the find returns:
kobus@akira:~/Downloads$ find . -ctime -10
.
./tagspaces-2.0.1-linux64.tar.gz
./apache-maven-3.3.3-bin.tar.gz
./Fedora-Live-Workstation-x86_64-23-10.iso
kobus@akira:~/Downloads$
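A reproducible variant of the above, runnable in a scratch directory (note: ctime cannot be backdated directly, so this sketch backdates mtime with GNU touch -d and filters with -mtime just to show the mechanism):

```shell
# Create one "old" and one "new" file, then list only files
# modified within the last 10 days.
d=$(mktemp -d)
touch "$d/new.txt"
touch -d '20 days ago' "$d/old.txt"   # backdates mtime (GNU touch)
find "$d" -type f -mtime -10          # prints only new.txt
rm -rf "$d"
```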
Most unices do not have a concept of file creation time. You can't make ls print it because the information is not recorded. If you need creation time, use a version control system: define creation time as the check-in time.
If your unix variant has a creation time, look at its documentation. For example, on Mac OS X (the only example I know of), use ls -tU. Windows also stores a creation time, but it's not always exposed to ports of unix utilities, for example Cygwin ls doesn't have an option to show it. The stat utility can show the creation time, called “birth time” in GNU utilities, so under Cygwin you can show files sorted by birth time with stat -c '%W %n' * | sort -k1n.
Note that the ctime (ls -lc) is not the file creation time, it's the inode change time. The inode change time is updated whenever anything about the file changes (contents or metadata) except that the ctime isn't updated when the file is merely read (even if the atime is updated). In particular, the ctime is always more recent than the mtime (file content modification time) unless the mtime has been explicitly set to a date in the future.
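The ctime/mtime distinction is easy to demonstrate (GNU stat assumed: %Y prints mtime, %Z prints ctime as epoch seconds):

```shell
# A metadata-only change (chmod) bumps ctime but leaves mtime alone.
f=$(mktemp)
m0=$(stat -c %Y "$f"); c0=$(stat -c %Z "$f")
sleep 2
chmod 644 "$f"                        # permission change only
m1=$(stat -c %Y "$f"); c1=$(stat -c %Z "$f")
[ "$m0" -eq "$m1" ] && echo "mtime unchanged"
[ "$c0" -ne "$c1" ] && echo "ctime updated"
rm -f "$f"
```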
"Newly added files, Fedora": the examples below will show a list of installed packages with date and time.
Example, all installed packages : $ rpm -qa --last
Example, the latest 100 packages : $ rpm -qa --last | head -100
Example, create a text file : $ rpm -qa --last | head -100 >> last-100-packages.txt
