Reading recently executed SQL statements from WAL in YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
How can I read the most recently executed SQL statements? I think it would be useful for analyzing what was executed. Can I read them from the WAL?

The WAL logs changes to the memtable, not SQL statements. Moreover, not all statements make changes, so the WAL cannot be used for this, even if the changes could somehow be reconstructed from it.
You can use the ysql_log_statement flag https://docs.yugabyte.com/preview/reference/configuration/yb-tserver/#ysql-log-statement to log some or all queries.
Or use audit logging https://docs.yugabyte.com/preview/secure/audit-logging/.
There are additional features that might help: viewing pg_stat_activity to see currently running queries, and pg_stat_statements to find slow-running queries, as shown below. Neither gives you the exact last query, but they can help you narrow it down.
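As a minimal sketch (run from ysqlsh; pg_stat_statements must be enabled, and the column names below follow the PostgreSQL 11 catalog that YSQL is based on):

    -- Queries currently executing, one row per active backend:
    SELECT pid, state, query
    FROM pg_stat_activity
    WHERE state = 'active';

    -- Aggregated statistics for past statements, slowest first
    -- (on PostgreSQL 13+ the column is mean_exec_time instead):
    SELECT query, calls, mean_time
    FROM pg_stat_statements
    ORDER BY mean_time DESC
    LIMIT 10;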

Related

Shopware 6 partitioning

Has anyone had any experience with database partitioning? We already have a lot of data, and queries on it are starting to slow down. Maybe someone has some examples? These are tables related to orders.
Shopware, since version 6.4.12.0, allows the use of database clusters; see the relevant documentation. You will have to set up a number of read-only nodes first. The load of reading data will then be distributed among the read-only nodes, while write operations are restricted to the primary node; a configuration sketch follows below.
Note that in a cluster setup you should also use a lock storage that complements the setup.
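A rough sketch of the cluster configuration, assuming the DATABASE_REPLICA_x_URL environment variables from the Shopware database-cluster documentation (verify the exact variable names against your Shopware version):

    # .env: primary node, used for all write operations
    DATABASE_URL="mysql://user:pass@primary-host:3306/shopware"

    # Read-only replicas; Shopware distributes read queries among these
    DATABASE_REPLICA_0_URL="mysql://user:pass@replica-0:3306/shopware"
    DATABASE_REPLICA_1_URL="mysql://user:pass@replica-1:3306/shopware"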
Besides using a DB cluster you can also try to reduce the load of the db server.
The first thing to do is enable the HTTP cache; better still, additionally set up a reverse proxy cache like Varnish. This will greatly decrease the number of requests that hit your web server, and thus your DB server as well.
Beyond that, all the measures explained here should improve the overall performance of your shop as well as decrease the load on the DB.
Additionally, you could use Elasticsearch so that costly search requests won't hit the database, use a "real" message queue so that messages are not stored in the database, and use Redis instead of the database for storing performance-critical information, as documented in the articles in this category of the official docs.
The impact of each of these measures depends on your concrete project setup, so the DB locks you observe may hint at one of the points I mentioned and tell you where to start. E.g. if you see a lot of search-related queries, Elasticsearch would be a great start; but if a lot of DB load comes from writing/reading/deleting messages, then the message queue might be a better starting point.
All in all, when you use a DB cluster with a primary and multiple replicas and add the additional services I mentioned here, your shop should be able to scale quite well without the need for partitioning the actual DB.

YugabyteDB YCQL transaction capabilities

[Question posted by a user on YugabyteDB Community Slack]
I'm trying to figure out how powerful YCQL transactions are. Reading https://docs.yugabyte.com/latest/explore/transactions/distributed-transactions-ysql/#execute-a-transaction I can see that I can update multiple tables in a transaction, so I can keep them consistent. But I'm struggling to figure out how I can read them to see a consistent snapshot. Also, is there a way to run read-modify-write transactions in YCQL?
There are no client-controlled transactions in YCQL like there are in YSQL. YCQL transactions work like YSQL with autocommit on: multiple writes can be batched into a single atomic transaction, but you cannot interleave reads and writes inside it.
The reason is that YCQL was developed to match Cassandra's API while also being strongly consistent. For features like client-controlled transactions, triggers, and procedures, you should use the YSQL layer.
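A minimal sketch of what YCQL does support (keyspace, table, and values are made up for illustration; note the table must be created with transactions enabled):

    -- Assumes the keyspace banking already exists:
    CREATE TABLE banking.accounts (
      account_name TEXT PRIMARY KEY,
      balance DECIMAL
    ) WITH transactions = { 'enabled' : true };

    -- All writes in the block commit atomically, but there is no way to
    -- read inside the block and act on the result (no read-modify-write):
    BEGIN TRANSACTION
      UPDATE banking.accounts SET balance = 900  WHERE account_name = 'alice';
      UPDATE banking.accounts SET balance = 1100 WHERE account_name = 'bob';
    END TRANSACTION;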

Making ysql_dump in YugabyteDB compatible with PostgreSQL

[Question posted by a user on YugabyteDB Community Slack]
After this commit ([7813] [YSQL] YSQL dump should always include HASH/ASC/DESC modifier for indexes/primary-key), the output of ysql_dump is no longer restorable in PostgreSQL.
Is there a workaround? I really need to restore a YugabyteDB dump into a PostgreSQL instance.
A simple workaround is to make two dumps. The first one contains just the schema, in text format.
Try importing it into PostgreSQL, look at the errors, and fix the offending DDL statements manually by removing HASH and the other special keywords that YugabyteDB uses.
Then take a second dump, this time with just the data, which should import into PostgreSQL without changes. A sketch of the before/after DDL follows below.
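For illustration (the table is made up, and the exact emitted DDL depends on your schema and YugabyteDB version), the manual edit looks roughly like this; since ysql_dump accepts the usual pg_dump flags, --schema-only and --data-only produce the two dumps:

    -- As emitted by ysql_dump (YugabyteDB-specific HASH modifier):
    CREATE TABLE users (
        id bigint NOT NULL,
        email text,
        PRIMARY KEY (id HASH)
    );

    -- Edited by hand so PostgreSQL accepts it:
    CREATE TABLE users (
        id bigint NOT NULL,
        email text,
        PRIMARY KEY (id)
    );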

Are YSQL and YCQL compatible with each other?

As far as I know, YugabyteDB stores data in something called DocDB and provides the YSQL and YCQL APIs on top of it.
So, if I'm not wrong: are YSQL and YCQL compatible with each other?
For example, can I create a table with YSQL and query it with YCQL?
And if not, could there be any performance difference between YSQL and YCQL? I mean, should we care about choosing between YSQL and YCQL in a project?
Thanks for the question - what Jose mentioned is correct (as of now):
The YugabyteDB APIs are isolated and independent from one another today. This means that data inserted or managed by one API cannot be queried by the other API, as illustrated below. Additionally, there is no common way to access the data across the APIs (external frameworks such as Presto can help for simple cases).
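A minimal illustration of this isolation (the table name is made up; ysqlsh and ycqlsh are the shells for the two APIs):

    -- In ysqlsh (YSQL): create and populate a table
    CREATE TABLE t (k INT PRIMARY KEY, v TEXT);
    INSERT INTO t VALUES (1, 'hello');

    -- In ycqlsh (YCQL): the YSQL table is not visible here, so this
    -- fails, because each API stores and manages its data independently
    SELECT * FROM t;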
Over time, we should be able to allow the YCQL tables to be "accessed" from YSQL (which is fully PostgreSQL compatible) as a foreign table using foreign data wrappers (FDW). Note that the work for this has not yet started, but it is fully possible to do in the current architecture.
Performance Differences
Yes, there are differences in performance, as Jose pointed out, but this is the current state of things as well. YCQL is higher-performing at the moment than YSQL for various reasons, and we are rapidly closing the gap. This is something we intend to write about a lot more over time.
In the short term, I would recommend the following:
If you need relational features (say foreign keys) or query flexibility (say joins) in your app, YSQL is the way to go. Again, we expect to get its performance close to YCQL over the next 3-6 months for most use cases.
If you do not need the above but are dealing with a lot of data (say over 10 TB), care about very low latencies (sub-millisecond), and need features such as automatic data expiry using TTL (a quick sketch follows), you should use YCQL.
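As a quick sketch of the TTL feature mentioned above (the table is hypothetical; the row is automatically removed once the TTL elapses):

    -- YCQL: expire this row after 24 hours (86400 seconds)
    INSERT INTO sensor_data (device_id, reading) VALUES ('d1', 98.6)
    USING TTL 86400;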
If you have a use case in mind, we are happy to help you decide with a more detailed analysis. Please also consider joining the community slack channel for realtime support.
The net impact is that application developers have to select an API first before undertaking detailed database schema/query design and implementation.
See the docs
I'm not a Yugabyte expert, but according to the docs: yes, there are performance differences between YSQL and YCQL (see slide 18 of the linked presentation).

CouchDB replication ignoring sporadic documents

I've got a CouchDB setup (CouchDB 2.1.1) for my app, which relies heavily on replication integrity. We are using the "one DB per user" approach, with an additional layer of "role" DBs that group users.
Recently, while increasing the number of beta testers, we discovered that some documents had not been replicated as they should have been. We are unable to see any pattern in document size, creation/update time, user, or anything else. The errors seem to happen sporadically, with 2-3 successfully replicated docs followed by 4-6 non-replicated docs.
The server responds with {"error":"not_found","reason":"missing"} for those docs.
Most (but not all) of the user documents have been replicated to the corresponding role DB, but very few made it all the way to the master DB. This never happened when testing with < 100 documents (now we're at 1000-1200 docs in the DB).
I discovered a problem with the "max open files" setting mentioned in the Performance chapter of the docs and fixed it, but the non-replicated documents are still not replicating. If I open a document and save it, it will replicate.
This is my current theory:
1. The replication process tried to copy new documents when the user went online.
2. The writes failed because Linux's max open files limit was hit.
3. The master DB still thinks the replication was successful.
4. At a later replication, the master DB ignores those old documents and only tries to replicate new ones.
Could this be correct? And can I somehow make the CouchDB server "double check" all documents and the integrity of previous replications?
Thank you for your time and any helpful comments!
I have experienced something similar in the past: when attempting to replicate documents without sufficient permissions, the replication fails, as it should. But once the permissions issue is fixed, the documents you attempted to replicate still cannot be replicated, although an edit/save on the documents fixes the issue. I wonder if this is due to checkpoints? The CouchDB manual says about the use_checkpoints flag:
Disabling checkpoints is not recommended as CouchDB will scan the source database's changes feed from the beginning.
Scanning from the beginning sounds like it might fix exactly this problem, so perhaps disabling checkpoints could help. I never got back to that issue at the time, so I'm afraid this is not a proper answer, just a suggestion.
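A minimal sketch of such a one-off replication with checkpoints disabled, POSTed to the _replicate endpoint (database names and hosts are placeholders; use_checkpoints is a documented replication option):

    POST /_replicate
    Content-Type: application/json

    {
      "source": "http://localhost:5984/user_db",
      "target": "http://localhost:5984/role_db",
      "use_checkpoints": false
    }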
