How to gather statistics for all tables in a YugabyteDB schema

How to gather statistics for all tables in a YugabyteDB schema without having to write individual ANALYZE commands?

This generates an ANALYZE statement for every table in the public schema and runs each one with psql's \gexec:
\set ECHO all
select format('analyse %I.%I /* current reltuples: %s */;', nspname, relname, reltuples) ddl
from pg_class natural join (select oid relnamespace, nspname from pg_namespace) s
where relkind = 'r' and nspname='public'
order by reltuples
\gexec
To analyze a different schema, change the schema name in the nspname filter.
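If you would rather build the statements from application code, here is a minimal Python sketch of the same idea. It assumes you already have the list of (schema, table) pairs, for example from a pg_class query like the one above. The helper names quote_ident and build_analyze_statements are hypothetical. Unlike format('%I'), which quotes only when necessary, this sketch always double-quotes identifiers, which is also valid.

```python
def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_analyze_statements(tables):
    """Turn (schema, table) pairs into one ANALYZE statement per table."""
    return [
        f"ANALYZE {quote_ident(schema)}.{quote_ident(table)};"
        for schema, table in tables
    ]


def demo():
    # Hypothetical table names, used only for illustration.
    return build_analyze_statements([("public", "orders"), ("public", "Order Items")])
```

Each returned string can then be sent to the database with whatever driver you use.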
YugabyteDB has no autovacuum, and ANALYZE is still a beta feature (as of version 2.15). The following warning is expected:
WARNING: 'analyze' is a beta feature!
LINE 1: analyze ehr.access,ehr.attestation,ehr.attestation_ref,ehr.a...
^
HINT: Set 'ysql_beta_features' yb-tserver gflag to true to suppress the warning for all beta features.
On YugabyteDB, if you encounter ERROR: Timed out: Perform RPC (request call id ...) to ... timed out after ..., you may want to increase yb_client_admin_operation_timeout_sec.

Related

Getting SyntaxException programmatically creating a table with the Cassandra Python driver

Error:
cassandra.protocol.SyntaxException: \
<Error from server: code=2000 [Syntax error in CQL query] \
message="line 1:36 no viable alternative at input '(' \
(CREATE TABLE master_table(dict_keys[(]...)">
Code:
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session=cluster.connect('firstkey')
ColName={"qty_dot_url": "int",
"qty_hyphen_url": "int",
"qty_underline_url": "int",
"qty_slash_url": "int"}
columns = ColName.keys()
values = ColName.values()
session.execute('CREATE TABLE master_table({ColName} {dataType}),PRIMARY KEY(qty_dot_url)'.format(ColName=columns, dataType=values))
How do I resolve the above error?
So I replaced the session.execute with a print, and it produced this:
CREATE TABLE master_table(dict_keys(['qty_dot_url', 'qty_hyphen_url', 'qty_underline_url', 'qty_slash_url']) dict_values(['int', 'int', 'int', 'int'])),PRIMARY KEY(qty_dot_url)
That is not valid CQL. It needs to look like this:
CREATE TABLE master_table(qty_dot_url int, qty_hyphen_url int,
qty_underline_url int, qty_slash_url int, PRIMARY KEY(qty_dot_url))
I was able to create that by making these adjustments to your code:
createTableCQL = "CREATE TABLE master_table("
for key, value in ColName.items():
createTableCQL += key + " " + value + ", "
createTableCQL += "PRIMARY KEY(qty_dot_url))"
You could then follow that with a session.execute(createTableCQL).
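As a variation on the loop above, here is a small sketch that builds the same statement with str.join, which avoids manual comma handling. The function name build_create_table_cql is hypothetical, and it assumes the column names and types come from trusted code, since they are interpolated directly into the CQL.

```python
def build_create_table_cql(table, columns, primary_key):
    """Build a CREATE TABLE statement from a {column_name: cql_type} dict."""
    if primary_key not in columns:
        raise ValueError(f"primary key {primary_key!r} is not a defined column")
    # dicts preserve insertion order, so columns appear as they were declared
    column_defs = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    # The PRIMARY KEY clause goes inside the parenthesized column list
    return f"CREATE TABLE {table}({column_defs}, PRIMARY KEY({primary_key}))"


ColName = {
    "qty_dot_url": "int",
    "qty_hyphen_url": "int",
    "qty_underline_url": "int",
    "qty_slash_url": "int",
}
createTableCQL = build_create_table_cql("master_table", ColName, "qty_dot_url")
```

Printing createTableCQL before executing it is a quick way to confirm the generated CQL looks the way you expect.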
Notes:
The PRIMARY KEY definition must be inside the parenthesized column list.
Creating schema from inside application code is often problematic and can cause a schema disagreement in the cluster. It's almost always better to create tables outside of code.
The syntax exception is a result of your Python code generating invalid CQL, which Aaron pointed out in his response.
To add to his answer, you need to take additional steps whenever you are making schema changes programmatically. In particular, you need to check for schema agreement (i.e. the schema change has been propagated to all nodes) before moving on to the next bit of your code.
You will need to modify your code to save the result from the schema change, for example:
resultset = session.execute(SimpleStatement("CREATE TABLE ..."))
then call this in your code:
resultset.response_future.is_schema_agreed
You'll need to repeat this check until it returns True. Depending on how long you want to wait (the default max_schema_agreement_wait is 10 seconds), you'll need to implement some logic to do [something] when schema agreement is not achieved (because a node is down, for example). That case requires manual intervention from an operator to investigate the cluster.
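A rough sketch of such a polling loop is below. The helper wait_for and its parameters are hypothetical and not part of the driver. It takes any zero-argument callable, so in your code you would pass something like lambda: resultset.response_future.is_schema_agreed. What to do on a False result is left to you.

```python
import time


def wait_for(check, timeout_s=10.0, interval_s=0.5):
    """Call check() repeatedly until it returns True or the timeout expires.

    Returns True if check() succeeded, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Assumed usage, with resultset coming from session.execute(...):
#   agreed = wait_for(lambda: resultset.response_future.is_schema_agreed)
#   if not agreed:
#       ...  # alert an operator instead of continuing
```

Using time.monotonic keeps the deadline unaffected by system clock adjustments.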
As Aaron already said, performing schema changes programmatically is very problematic, and we discourage doing this unless you fully understand the pitfalls and know how to handle failures. Cheers!

AnalysisException when dropping a hive database using spark

Environment
spark 3.0.0
hive metastore (standalone) 3.0.0
mysql 8 as the metastore db
Problem
Every time I try to drop a database in the metastore via Spark, I get an AnalysisException, and I don't know what is causing it or whether the drop operation is succeeding in its entirety.
Example
spark.sql(f"CREATE DATABASE IF NOT EXISTS myDb LOCATION 'shared-metastore-location/myDb.db'")
################
# DB creation succeeds and I can view the db in the metastore, add tables etc
################
spark.sql(f"DROP DATABASE myDb CASCADE")
################
AnalysisException: 'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Unable to clean up java.sql.SQLException: The table does not comply with the requirements by an external plugin.\n\tat com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129)\n\tat com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)\n\tat com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)\n\tat com.mysql.cj.jdbc.StatementImpl.executeUpdateInternal(StatementImpl.java:1335)\n\tat com.mysql.cj.jdbc.StatementImpl.executeLargeUpdate(StatementImpl.java:2108)\n\tat com.mysql.cj.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1245)\n\tat com.zaxxer.hikari.pool.ProxyStatement.executeUpdate(ProxyStatement.java:117)\n\tat com.zaxxer.hikari.pool.HikariProxyStatement.executeUpdate(HikariProxyStatement.java)\n\tat org.apache.hadoop.hive.metastore.txn.TxnHandler.cleanupRecords(TxnHandler.java:2741)\n\tat org.apache.hadoop.hive.metastore.AcidEventListener.onDropDatabase(AcidEventListener.java:52)\n\tat org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$21.notify(MetaStoreListenerNotifier.java:85)\n\tat org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:264)\n\tat org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:326)\n\tat org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:364)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_database_core(HiveMetaStore.java:1537)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_database(HiveMetaStore.java:1575)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat 
java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)\n\tat org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)\n\tat com.sun.proxy.$Proxy32.drop_database(Unknown Source)\n\tat org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_database.getResult(ThriftHiveMetastore.java:14352)\n\tat org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_database.getResult(ThriftHiveMetastore.java:14336)\n\tat org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)\n\tat org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)\n\tat org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)\n\tat org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)\n\tat org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n);'
Despite the exception, the database does disappear after I run this code. If I try to run the drop command a second time, I get a different exception saying that the database doesn't exist. But I have no idea whether the operation has succeeded in its entirety or whether it's leaving a mess behind. I'm not familiar enough with Hive to know what should be deleted in the metastore to completely delete a db.
I seem to get the same result whether I have tables in the db or not. I've also tried dropping the table without cascade, with the same result.

inserting datetime into database

Can anyone explain how to insert a datetime into an SQLite database using Python?
Which data type should the datetime column use? I have tried many different ways.
My question is based on problem set 7 in CS50, where we use Flask as the web framework.
history is the name of the database
transaction is the name of the field
db.execute("INSERT INTO history (transaction) VALUES(:d)",
d=datetime.datetime.today())
I receive this error message when I run the application:
builtins.RuntimeError
RuntimeError: near "transaction": syntax error [SQL: "INSERT INTO history (transaction) VALUES('2019-05-19 17:14:25')"] (Background on this error at: http://sqlalche.me/e/e3q8)
transaction is a reserved word in SQLite.
sqlite> INSERT INTO history (transaction) VALUES('2019-05-19 17:14:25');
Error: near "transaction": syntax error
To disambiguate a reserved word, you have to quote it. In this case, quote it as an identifier with double quotes.
sqlite> INSERT INTO history ("transaction") VALUES('2019-05-19 17:14:25');
Rather than constantly having to remember to quote transaction, I'd recommend renaming the column; created_at is very common for a row timestamp. I'd also recommend using an ORM rather than writing SQL by hand; it will handle all the quoting for you, and has a great many other benefits.
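For reference, here is a minimal, self-contained sketch of the quoted insert using Python's standard sqlite3 module instead of the CS50 db.execute wrapper. The in-memory database, the TEXT column type, and the function names are assumptions made for illustration. The timestamp is stored as ISO-8601 text because SQLite has no dedicated datetime type.

```python
import sqlite3
from datetime import datetime


def record_transaction(conn, when):
    """Insert one timestamp into history, quoting the reserved column name."""
    # Format as 'YYYY-MM-DD HH:MM:SS' text before binding it as a parameter
    text = when.isoformat(sep=" ", timespec="seconds")
    conn.execute('INSERT INTO history ("transaction") VALUES (?)', (text,))


def demo():
    """Create a throwaway table, insert one row, and return what was stored."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE history ("transaction" TEXT)')
    record_transaction(conn, datetime(2019, 5, 19, 17, 14, 25))
    rows = conn.execute('SELECT "transaction" FROM history').fetchall()
    conn.close()
    return rows
```

Binding the value with a ? placeholder, rather than formatting it into the SQL string, leaves the value quoting to the driver. The column name still has to be quoted by hand.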

Azure SQL Data Warehouse: No catalog entry found for partition ID <id> in database <id>. The metadata is inconsistent. Run DBCC CHECKDB

I am working on moving stored procedures from an on-prem SQL Server database to an Azure SQL Data Warehouse (ASDW). Throughout the process I have had to work around a few missing features, which is time consuming but not impossible. One thing I have had to do is replace CTEs followed by MERGE statements with temp tables followed by UPDATE/INSERT/DELETE statements (since CTEs cannot be followed by these statements). At the beginning of each SP I check for the temp tables and delete them if they exist.
Today, I created another stored procedure in the ASDW without any temp tables (there are no updates/inserts/deletes, so I left the CTEs in there). It "compiled", and I was able to run it without issue (it returned an empty result set, as there is no data yet). I created another SP after this, and when I went to execute it, I got the following error:
...No catalog entry found for partition ID (id) in database 26. The metadata is inconsistent. Run DBCC CHECKDB to check for a metadata corruption...
I then went back to the first SP that I mentioned, and it gave me the same error, even though it had previously run without flaw.
I tried running DBCC CHECKDB as instructed but alas, it is not supported/doesn't work.
I dug around a lot, and what I ended up doing was scaling my database from 100 DWUs to 500 DWUs. I am at 0.16% of my database storage size limit, and there is barely any data anywhere (total DB size is <300MB).
Is there an explanation for this? If not, I can't in good conscience use this platform in a production environment.
Full error:
Msg 110802, Level 16, State 1, Line 1
110802;An internal DMS error occurred that caused this operation to fail.
Details: Exception: Microsoft.SqlServer.DataWarehouse.DataMovement.Workers.DmsSqlNativeException, Message: SqlNativeBufferReader.Run, error in OdbcExecuteQuery: SqlState: 42000, NativeError: 608, 'Error calling: SQLExecDirect(this->GetHstmt(), (SQLWCHAR *)statementText, SQL_NTS), SQL return code: -1 | SQL Error Info: SrvrMsgState: 1, SrvrSeverity: 16, Error <1>: ErrorMsg: [Microsoft][ODBC Driver 11 for SQL Server][SQL Server]No catalog entry found for partition ID 72057594047758336 in database 36. The metadata is inconsistent. Run DBCC CHECKDB to check for a metadata corruption. | Error calling: pReadConn->ExecuteQuery(statementText, bufferFormat) | state: FFFF, number: 134148, active connections: 100', Connection String: Driver={pdwodbc};APP=TypeC01-DmsNativeReader:DB196\mpdwsvc (2504)- ODBC;Trusted_Connection=yes;AutoTranslate=no;Server=\\.\pipe\DB.196-bb5f9dd884cf\sql\query
I'm sorry to hear about your experience with Azure SQL Data Warehouse. I believe this is a defect related to BIT data type handling for NOT NULL columns. Can you confirm that you have a BIT NOT NULL column (e.g., CREATE TABLE t1 (IsTrue BIT NOT NULL);)?
If so, a fix has been coded and is in testing for release. To mitigate this now, you can either switch the column to TINYINT or remove the NOT NULL setting for the column.

Rebuild index failed on Hive on Azure HDInsight with Tez

I am trying to create indexes on Hive on Azure HDInsight with Tez enabled.
I can create the indexes successfully, but I can't rebuild them. The job fails with this output:
Map 1: -/- Reducer 2: 0/1
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1421234198072_0091_1_01, diagnostics=[Vertex Input: measures initializer failed.]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1421234198072_0091_1_00, diagnostics=[Vertex received Kill in INITED state.]
DAG failed due to vertex failure. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
I created my table and indexes with the following job:
DROP TABLE IF EXISTS Measures;
CREATE TABLE Measures(
topology string,
val double,
date timestamp,
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE LOCATION 'wasb://<mycontainer>#<mystorage>.blob.core.windows.net/';
CREATE INDEX measures_index_topology ON TABLE Measures (topology) AS 'COMPACT' WITH DEFERRED REBUILD;
CREATE INDEX measures_index_date ON TABLE Measures (date) AS 'COMPACT' WITH DEFERRED REBUILD;
ALTER INDEX measures_index_topology ON Measures REBUILD;
ALTER INDEX measures_index_date ON Measures REBUILD;
What am I doing wrong? Why does the index rebuild fail?
Best regards
It looks like Tez might have a problem with generating an index on an empty table. I was able to get the same error as you (without using the JSON SerDe), and if you look at the application logs for the DAG that fails, you might see something like:
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:254)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:299)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getSplits(TezGroupedSplitsInputFormat.java:68)
at org.apache.tez.mapreduce.hadoop.MRHelpers.generateOldSplits(MRHelpers.java:263)
at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:139)
at org.apache.tez.dag.app.dag.RootInputInitializerRunner$InputInitializerCallable$1.run(RootInputInitializerRunner.java:154)
at org.apache.tez.dag.app.dag.RootInputInitializerRunner$InputInitializerCallable$1.run(RootInputInitializerRunner.java:146)
...
If you populate the table with a single dummy record, it seems to work fine. I used:
INSERT INTO TABLE Measures SELECT market,0,0 FROM hivesampletable limit 1;
After that, the index rebuild was able to run without error.

Resources