Making ysql_dump in YugabyteDB compatible with PostgreSQL - yugabytedb

[Question posted by a user on YugabyteDB Community Slack]
After this commit: [7813] [YSQL] YSQL dump should always include HASH/ASC/DESC modifier for indexes/primary-key.
This makes the output of ysql_dump unrestorable in PostgreSQL.
Is there a workaround? I really need to restore a YugabyteDB dump to a PostgreSQL instance.

A simple workaround is to take two dumps. The first contains only the schema, in plain-text format.
Try importing it into PostgreSQL, note the errors, and fix the DDL statements manually by removing HASH and the other YugabyteDB-specific keywords.
Then take a second dump containing only the data, which should import into PostgreSQL cleanly.
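As a rough sketch of the workaround (host and database names are made up, and the sed substitutions are only a first pass; review the cleaned schema by hand, since blanket text substitution can mangle identifiers that happen to contain these words):

```shell
# Take two dumps (ysql_dump accepts the usual pg_dump-style flags):
#   ysql_dump -h yb-host -d mydb --schema-only -f schema.sql   # DDL only
#   ysql_dump -h yb-host -d mydb --data-only   -f data.sql     # rows only
#
# First pass at cleaning the schema dump: strip the YugabyteDB-specific
# HASH modifier from primary keys and indexes.
clean_ddl() {
  sed 's/ HASH)/)/g; s/ HASH,/,/g'
}

# What the cleanup does to a YugabyteDB-style primary key:
printf 'CREATE TABLE t (k int, v int, PRIMARY KEY(k HASH, v));\n' | clean_ddl
# → CREATE TABLE t (k int, v int, PRIMARY KEY(k, v));

# Then restore into PostgreSQL, fixing any remaining errors manually:
#   psql -h pg-host -d mydb -f schema.sql
#   psql -h pg-host -d mydb -f data.sql
```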

Related

Reading recently executed sql statements from WAL in YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
How can I read the most recently executed SQL statements? I think it would be useful for analyzing what a process executed. Can I read them from the WAL?
The WAL logs changes to the memtable, not statements. Not all statements make changes, so the WAL cannot be used for this, even if it were somehow possible to reconstruct statements from the changes.
You can use the ysql-log-statement flag (https://docs.yugabyte.com/preview/reference/configuration/yb-tserver/#ysql-log-statement) to log some or all queries, or use audit logging (https://docs.yugabyte.com/preview/secure/audit-logging/).
There are additional features that might help, such as pg_stat_activity to see live running queries and pg_stat_statements to view slow-running queries. Neither gives you the exact last query, but they can help you narrow it down.
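As a sketch of the options above (the flag value semantics mirror PostgreSQL's log_statement; the pg_stat_statements column names assume the PostgreSQL version YSQL is based on, so verify against your release):

```shell
# Cluster-wide: configure the tserver to log YSQL statements
# (values: none, ddl, mod, all)
yb-tserver --ysql_log_statement=all ...

# Live running queries, via ysqlsh:
ysqlsh -c "SELECT pid, query FROM pg_stat_activity WHERE state = 'active';"

# Aggregated statement statistics, slowest first:
ysqlsh -c "SELECT query, calls, mean_time FROM pg_stat_statements
           ORDER BY mean_time DESC LIMIT 10;"
```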

YugabyteDB working under unreliable internet connection?

[Question posted by a user on YugabyteDB Community Slack]
I am interested in Yugabyte’s ability to be geo-distributed. However, I am wondering if the protocol between the different nodes can tolerate DIL conditions (see: Disconnected, Intermittent and Limited (DIL) [DIDO Wiki] ) where the network is not reliable. Are there timeouts? Can they change? Are there protocols/defaults for collisions?
where the network is not reliable
YugabyteDB's geo-distribution and replication are designed to tolerate unexpected network issues.
However, it is not a good idea to deliberately run a cluster over an unstable connection; that is simply not what it is designed for.
Are there timeouts? Can they change?
See yb-master configuration reference | YugabyteDB Docs.
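For example, the Raft failure-detection timeouts are exposed as server flags. The names and defaults below are taken from the configuration reference as a sketch; verify them against your version before changing anything:

```shell
# Heartbeat interval between Raft peers, and how many missed heartbeats
# trigger a leader failover (defaults: 500 ms x 6 ≈ 3 s failure detection)
yb-tserver \
  --raft_heartbeat_interval_ms=500 \
  --leader_failure_max_missed_heartbeat_periods=6 \
  ...
```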
Are there protocols/defaults for collisions?
The default protocol for collisions is last write wins: xCluster replication | YugabyteDB Docs

YugabyteDB YCQL transaction capabilities

[Question posted by a user on YugabyteDB Community Slack]
I'm trying to figure out how powerful YCQL transactions are. Reading https://docs.yugabyte.com/latest/explore/transactions/distributed-transactions-ysql/#execute-a-transaction I can see that I can update multiple tables in a transaction, so I can keep them consistent. But I struggle to figure out how I can read them to see a snapshot. Also, is there a way to run read-modify-write type transactions with YCQL?
There are no client-controlled transactions in YCQL as there are in YSQL; YCQL transactions behave like YSQL with autocommit on.
The reason is that YCQL was developed to match Cassandra's API while also being consistent. For features like client-controlled transactions, triggers, and procedures, you should use the YSQL layer.
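To illustrate what YCQL does offer (keyspace and table names below are made up): multiple DML statements can be submitted atomically in a single server-side BEGIN TRANSACTION ... END TRANSACTION block on tables created with transactions enabled. The block is sent as one unit, so you cannot read, compute in the client, and then write within the same transaction:

```
CREATE KEYSPACE IF NOT EXISTS banking;
CREATE TABLE IF NOT EXISTS banking.accounts (
  name TEXT PRIMARY KEY,
  balance BIGINT
) WITH transactions = { 'enabled': true };

-- Both updates commit, or neither does
BEGIN TRANSACTION
  UPDATE banking.accounts SET balance = balance - 100 WHERE name = 'alice';
  UPDATE banking.accounts SET balance = balance + 100 WHERE name = 'bob';
END TRANSACTION;
```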

Are YSQL and YCQL compatible with each other?

As far as I know, YugabyteDB stores data in something called DocDB and provides the YSQL and YCQL APIs on top of it.
So, if I'm not wrong: are YSQL and YCQL compatible with each other?
For example, can I create a table with YSQL and query it with YCQL?
And if not, could there be any performance difference between YSQL and YCQL? I mean, should we care about choosing YSQL or YCQL in a project?
Thanks for the question - what Jose mentioned is correct (as of now):
The YugabyteDB APIs are isolated and independent from one another today. This means that the data inserted or managed by one API cannot be queried by the other API. Additionally, there is no common way to access the data across the APIs (external frameworks such as Presto can help for simple cases).
Over time, we should be able to allow the YCQL tables to be "accessed" from YSQL (which is fully PostgreSQL compatible) as a foreign table using foreign data wrappers (FDW). Note that the work for this has not yet started, but it is fully possible to do in the current architecture.
Performance Differences
Yes, there are differences in performance, as Jose pointed out, but this is the current state of things as well. YCQL is higher performing than YSQL at the moment for various reasons, and we are rapidly closing the gap. This is something we intend to write about a lot more over time.
In the short term, I would recommend the following:
If you need relational features (say foreign keys) or query flexibility (say joins) in your app, YSQL is the way to go. Again, we expect to get its performance close to YCQL's over the next 3-6 months for most use cases.
If you do not need the above but are dealing with a lot of data (say over 10 TB), care about very low latencies (sub-millisecond), and need features such as automatic data expiry using TTL, you should use YCQL.
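For instance, the automatic data expiry mentioned above uses YCQL's TTL clause (an illustrative snippet; keyspace and table names are assumptions):

```
CREATE KEYSPACE IF NOT EXISTS app;
CREATE TABLE IF NOT EXISTS app.events (id INT PRIMARY KEY, payload TEXT);

-- The row expires automatically 86400 seconds (one day) after the write
INSERT INTO app.events (id, payload) VALUES (1, 'hello') USING TTL 86400;
```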
If you have a use case in mind, we are happy to help you decide with a more detailed analysis. Please also consider joining the community slack channel for realtime support.
The net impact is that application developers have to select an API first before undertaking detailed database schema/query design and implementation.
See the docs
I'm not a Yugabyte expert, but according to the docs: yes, there are performance differences between YSQL and YCQL. See slide 18 of the linked presentation.

Issue with PrestoDB & MongoDB

I'm having some strange issues querying MongoDB from the Presto CLI. I have my mongodb.properties set up to connect to three different databases, as shown below.
connector.name=mongodb
mongodb.seeds=172.23.0.7:27017
mongodb.schema-collection=stage,configuration,hub
mongodb.credentials=<username>:<password>#stage,<username>:<password>#hub,<username>:<password>#configuration
None of the queries, including show columns from <collection> and select count(*) from <collection>, works on stage or hub, nor on the collections in configuration.
The question is: does Presto support these kinds of queries on MongoDB? If yes, what could be the problem with my configuration or queries? Our intention is to compare data between Oracle and MongoDB.
Appreciate your help.
This is an old post, but I hope it is still useful for future users. You shouldn't be setting mongodb.schema-collection this way. The property is meant to point to the single Mongo collection that describes the schema of the other collections, typically defaulting to _schema when it exists. This is covered in the docs of most Presto distributions, including prestodb.
It does not allow you to control which collections Presto has access to; that must be configured elsewhere (e.g., when setting up Presto's user in the MongoDB cluster). Once correctly set up, Presto can run queries such as the ones in your example against every collection it has access to.
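An illustrative corrected mongodb.properties, assuming the schema definitions live in the default _schema collection (the exact property set varies by Presto version, so check the MongoDB connector docs for yours):

```
# etc/catalog/mongodb.properties
connector.name=mongodb
mongodb.seeds=172.23.0.7:27017
# Optional: must name ONE schema-describing collection, not a list of
# databases; omit the line entirely to use the default (_schema)
mongodb.schema-collection=_schema
```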
