Key with version number error in YugabyteDB cluster? - yugabytedb

[Question posted by a user on YugabyteDB Community Slack]
I'm running some negative tests against YugabyteDB and I'm facing an issue. I'm running a 3-node cluster (master and tserver running on the same node). When I stop one node and start it up again, the tserver does not boot up, with this log:
F20220506 06:50:49 ../../src/yb/tserver/tablet_server_main.cc:220] Invalid argument (yb/util/universe_key_manager.cc:73): Could not init Tablet Manager: Failed to open tablet metadata for tablet: eb1e5457022f42c084148ca8fa4ba5c6: Failed to load tablet metadata for tablet id eb1e5457022f42c084148ca8fa4ba5c6: Could not load Raft group metadata from /data/yugabyte/data/yb-data/tserver/tablet-meta/eb1e5457022f42c084148ca8fa4ba5c6: Key with version number c7b91fad-dd60-404f-8846-cab568e52468 does not exist.
# 0x7fcdace5ee4c yb::LogFatalHandlerSink::send()
# 0x7fcdaa5e28ee google::LogMessage::SendToLog()
# 0x7fcdaa5dfa7a google::LogMessage::Flush()
# 0x7fcdaa5e3169 google::LogMessageFatal::~LogMessageFatal()
# 0x4124ea yb::tserver::(anonymous namespace)::TabletServerMain()
# 0x7fcda6811825 __libc_start_main
# 0x410f99 _start
# (nil) (unknown)
The only way to start it up is to remove the old data.
My steps were:
1.- Cluster up with 3 servers
2.- Create a table with 3 partitions on different tablet ids (confirmed via UI)
3.- Insert 3 different rows into different partitions
4.- SELECT * works fine
5.- Shut down one tablet server
6.- SELECT * still works fine
7.- Start the tablet server back up (error)
We are running it with a flag file:
/usr/local/yugabyte/src/yugabyte-2.11.0.1/bin/./yb-tserver --flagfile /data/yugabyte/etc/tserver.conf
and this config:
--tserver_master_addrs=ip1:7100,ip2:7100,ip3:7100
--rpc_bind_addresses=fqdn
--server_broadcast_addresses=ip1
--enable_ysql
--pgsql_proxy_bind_address=ip1:5433
--cql_proxy_bind_address=ip1:9042
--fs_data_dirs=/data/yugabyte/data
--placement_cloud=cloud
--placement_region=reg
--placement_zone=zone
--use_client_to_server_encryption=true
--certs_for_client_dir=/data/yugabyte/ssl
--certs_dir=/data/yugabyte/ssl
--use_node_to_node_encryption=true
--ysql_enable_auth=true
--log_dir=/data/yugabyte/logs
--ssl_protocols=tls12,tls13
--ysql_pg_conf=pgaudit.log='DDL',pgaudit.log_level=notice,pgaudit.log_client=ON,log_min_messages=notice,log_line_prefix='\%m \%r \%u \%d [\%p]'
Looks like the key is not in memory:
/usr/local/yugabyte/src/yugabyte-2.11.0.1/bin/yb-admin -master_addresses $master get_universe_config
{"version":2,"replicationInfo":{"liveReplicas":{"numReplicas":3,"placementBlocks":[{"cloudInfo":{"placementCloud":"cloud","placementRegion":"region","placementZone":"zone"},"minNumReplicas":1}]}},"clusterUuid":"dccea8cb-9790-48ba-8a05-6218a8e875a4","encryptionInfo":{"encryptionEnabled":true,"universeKeyRegistryEncoded":"sZTzNciYu6b1KxZonpJx6v7CDDvexiv1jh/HIEAOkpV4YRrIZbIK9jtajdEMmVEUy706+dmz8bmnZvy6/n33u+qS7fzRSOTPOlpxYI6+k1lSM6bu2DRTTffhZtaiKN15gy8a3ifaZV7xJ9QJ3z9SvFYzb96+KDWw","keyPath":"/data/yugabyte/rest/universe_key","latestVersionId":"c7b91fad-dd60-404f-8846-cab568e52468","keyInMemory":false}}
KeyInMemory: False
We are using encryption at rest, but the file with the key should still be there.
Am I doing something wrong?
/usr/local/yugabyte/src/yugabyte-2.11.0.1/bin/yb-admin -master_addresses fqdn:7100 all_masters_have_universe_key_in_memory 7e13c99e-5278-4abd-ab78-79f70d6c2679
Error running all_masters_have_universe_key_in_memory: Operation failed. Try again. (yb/tools/yb-admin_client_ent.cc:1027): Unable to check whether master has universe key in memory.: Node fqdn:7100 does not have universe key in memory

The above error indicates that the tserver is looking for a key, “c7b91fad-dd60-404f-8846-cab568e52468”, in order to open the file, but can't find it.
It turns out there is an actual key file on the masters, but that on-disk mechanism has been deprecated in favor of more secure in-memory keys. I just did a little digging through the code, and sure enough, we don't actually support sending on-disk keys to tablet servers on restart.
The command you have to run is this one:
yb-admin -master_addresses <master-addresses> add_universe_keys_to_all_masters <key_id> <key_path>
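In this case the key version id and key path are the ones already shown in the get_universe_config output above (latestVersionId and keyPath), so the call would presumably look something like this, with the master addresses adjusted to your environment:
/usr/local/yugabyte/src/yugabyte-2.11.0.1/bin/yb-admin -master_addresses ip1:7100,ip2:7100,ip3:7100 add_universe_keys_to_all_masters c7b91fad-dd60-404f-8846-cab568e52468 /data/yugabyte/rest/universe_key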
and then right after that it should work:
/usr/local/yugabyte/src/yugabyte-2.11.0.1/bin/yb-admin -master_addresses $master all_masters_have_universe_key_in_memory 1
Node fqdn1:7100 has universe key in memory: 1
Node fqdn2:7100 has universe key in memory: 1
Node fqdn3:7100 has universe key in memory: 1

Related

Error Code PINOT_UNABLE_TO_FIND_BROKER: No valid brokers found

I am trying to query Pinot table data using Presto; below are my configuration details.
Pinot is started on one of the SIT servers, i.e. 10.184.160.52:
Controller: 10.184.160.52:9000
Server: 10.184.160.52:7000
Broker: 10.184.160.52:8000
I have Presto on a different server, 10.184.160.53, and the ports are open between these two servers.
I created a pinot.properties file at presto/etc/catalog/pinot.properties:
connector.name=pinot
pinot.controller-urls=Controller_Host:9000
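Controller_Host here presumably stands for the controller address listed above, i.e. the entry resolves to:
pinot.controller-urls=10.184.160.52:9000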
bin/launcher run ---> loaded the Pinot catalog.
Started Presto with the Pinot catalog:
./presto --server 10.184.160.53:8080 --catalog pinot
show catalogs; (able to see my catalog)
pinot
show schemas; (able to see the schema also)
presto> show schemas;
Schema
--------------------
default
presto> use default;
USE
presto:default> show tables; (able to see the Pinot tables:)
Table
------------------------------
test
test2
test3
(3 rows)
Query 20210519_124218_00061_vcz4u, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [3 rows, 98B] [10 rows/s, 340B/s]
But when I run select * from test; it reports that no valid broker was found:
presto:default> select * from test;
Query 20210519_124230_00062_vcz4u failed: No valid brokers found for test
Complete Presto Logs:
Error Code PINOT_UNABLE_TO_FIND_BROKER (84213767)
Stack Trace
io.prestosql.pinot.PinotException: No valid brokers found for test
at io.prestosql.pinot.client.PinotClient.getBrokerHost(PinotClient.java:285)
at io.prestosql.pinot.client.PinotClient.sendHttpGetToBrokerJson(PinotClient.java:185)
at io.prestosql.pinot.client.PinotClient.getRoutingTableForTable(PinotClient.java:302)
at io.prestosql.pinot.PinotSplitManager.generateSplitsForSegmentBasedScan(PinotSplitManager.java:72)
at io.prestosql.pinot.PinotSplitManager.getSplits(PinotSplitManager.java:167)
at io.prestosql.split.SplitManager.getSplits(SplitManager.java:87)
at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitScanAndFilter(DistributedExecutionPlanner.java:203)
at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitTableScan(DistributedExecutionPlanner.java:185)
at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitTableScan(DistributedExecutionPlanner.java:156)
at io.prestosql.sql.planner.plan.TableScanNode.accept(TableScanNode.java:143)
at io.prestosql.sql.planner.DistributedExecutionPlanner.doPlan(DistributedExecutionPlanner.java:124)
at io.prestosql.sql.planner.DistributedExecutionPlanner.doPlan(DistributedExecutionPlanner.java:131)
at io.prestosql.sql.planner.DistributedExecutionPlanner.plan(DistributedExecutionPlanner.java:101)
at io.prestosql.execution.SqlQueryExecution.planDistribution(SqlQueryExecution.java:470)
at io.prestosql.execution.SqlQueryExecution.start(SqlQueryExecution.java:386)
at io.prestosql.execution.SqlQueryManager.createQuery(SqlQueryManager.java:237)
at io.prestosql.dispatcher.LocalDispatchQuery.lambda$startExecution$7(LocalDispatchQuery.java:143)
at io.prestosql.$gen.Presto_350____20210519_105836_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
I am not able to understand what is happening here, or why this issue shows up only for the select statement. It looks like the Pinot broker is not accepting queries. Can someone kindly suggest what the issue is?
Update: This is because the connector does not support mixed case table names. Mixed case column names are supported. There is a pull request to add support for mixed case table names: https://github.com/trinodb/trino/pull/7630
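One quick way to confirm the table-name case is to ask the Pinot controller directly and compare its answer with what Presto shows (a sketch, assuming the controller at 10.184.160.52:9000 is reachable from wherever you run curl):
curl http://10.184.160.52:9000/tables
If Pinot reports the table as, say, Test or TEST while Presto lists it as test, the mixed-case limitation described above is the likely cause.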

Assignment files do not get deleted via course reset after cron execution in Moodle

When I try to reset the assignments of a course, all data gets deleted on the front end. I tested this with a single file upload of my own in a test assignment. But when checking disk usage with
du moodledata/filedir
the same usage remains. I ensured execution of the cron task which printed
...
Cron script completed correctly
Cron completed at 17:40:03. Memory used 32.8MB.
Execution took 0.810698 seconds
The files are also not in moodledata/trashdir, which is probably why the cron task does not clean them up.
Removing the file with
moosh file-hash-delete <hash>
seemed to work. I identified the hash by comparing disk usage before and after, and checking for the hash in the folder that had grown by the size of the file I uploaded.
The hash was not in the mdl_files table in MySQL, but its draft was. I found this out via
moosh file-check
and I also checked it with phpMyAdmin, which listed the file (draft) alongside other files.
Logs for resetting the course show the following:
Core System, course reset finished, The reset of the course with id '4' has ended.
Core System, deadline updated, The user with id '2' updated the event 'test ist zur Bewertung fällig.' with id '4'.
Core System, deadline updated, The user with id '2' updated the event 'test ist fällig.' with id '3'.
Core System, course reset begin, The user with id '2' started the reset of the course with id '4'.
(note that I translated some of the messages, because my setup is in German).
Unfortunately I have to run this Moodle instance on a hoster with extremely low disk storage (hence the backup/deletion requirement).
Some background info:
Moodle - version 3.8.2+ stable, dbtype set to mariadb
MariaDB - version 10.3.19
Machine: CentOS Linux 7
UPDATE: It seems that after some days (I checked today, ~4 days later) the files have been deleted. I don't know why this happened only after so many days, even though I manually triggered the cron job (which apparently does not delete the files). It would be nice to know where the timer is set and which script finally deletes the files.
On the course reset page, if you scroll down, there is a drop-down for Assignments.
Did you check the box for "Delete all submissions"?
In the code, $data->reset_assign_submissions will delete the files:
public function reset_userdata($data) {
    global $CFG, $DB;
    $componentstr = get_string('modulenameplural', 'assign');
    $status = array();
    $fs = get_file_storage();
    if (!empty($data->reset_assign_submissions)) {
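        // ... excerpt truncated here; the remainder of reset_userdata() removes the submission files when "Delete all submissions" is ticked.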

Cassandra Clustering for 2 node

I want to create a two-node cluster in Cassandra. I have made the following changes in my yaml files:
Example:
Node 1
cluster_name: 'MyCassandraCluster'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.168.66.41,10.176.170.59"
listen_address: 10.168.66.41
rpc_address: 10.168.66.41
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: false
Node 2
cluster_name: 'MyCassandraCluster'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.168.66.41"
listen_address: 10.176.170.59
rpc_address: 10.176.170.59
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: false
But I am still not able to create a two-node cluster. Why am I facing this issue?
Well, it's hard to know without seeing an actual error message from your system.log, but I'll take a guess. It looks like you might have a chicken-and-egg problem, based on your seed nodes.
10.176.170.59 won't be able to start without 10.168.66.41 already running. And while .41 has itself specified as a seed node, it also has .59 specified, which might throw things off.
My recommendation is to change your seed list to be the same on all (both) nodes. Just set it to this on both:
seeds: "10.168.66.41"
Then, start .41, which should come up. Then start .59.
If that doesn't do it, look for exceptions in your system.log.
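Once both nodes are up, one way to confirm that they have actually formed a single cluster is to check the ring from either node (assuming a standard install with nodetool on the PATH):
nodetool status
Both 10.168.66.41 and 10.176.170.59 should show up with state UN (Up/Normal).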
Auto bootstrap should be set to true when a new node is added to a cluster.
So set auto_bootstrap to true and use a single node as your seed node, e.g. in your case 10.168.66.41 (or 10.176.170.59).
Start your seed node first.
Telnet from your secondary node to your seed node's storage port (default 7000), as in the example after this list; if you are not able to connect, check your firewall settings.
Start your secondary node now.
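For the connectivity check above, run something like the following from the secondary node (10.176.170.59); a refused or timed-out connection points at the firewall:
telnet 10.168.66.41 7000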

Azure SQL Data Warehouse: No catalog entry found for partition ID <id> in database <id>. The metadata is inconsistent. Run DBCC CHECKDB

I am working on moving stored procedures from an on-prem SQL Server database to an Azure SQL Data Warehouse (ASDW). Throughout the process I have had to work around a few missing features - time-consuming but not impossible. One thing I have had to do is replace CTEs followed by MERGE statements with temp tables followed by UPDATE/INSERT/DELETE statements (since CTEs cannot be followed by these statements). At the beginning of each SP I check for the temp tables and delete them if they exist.
Today, I created another stored procedure in the ASDW without any temp tables (no updates/inserts/deletes so I left the CTEs in there), it "compiled", and I was able to run it without issue (returned an empty result set, as there is no data yet). I created another SP after this, and when I went to execute it, I got the following error:
...No catalog entry found for partition ID (id) in database 26. The metadata is inconsistent. Run DBCC CHECKDB to check for a metadata corruption...
I then went back to the first SP that I mentioned, and it gave me the same error, even though it had previously run without flaw.
I tried running DBCC CHECKDB as instructed but alas, it is not supported/doesn't work.
I dug around a lot, and what I ended up doing was scaling my database from 100 DWUs to 500 DWUs. I am at 0.16% of my database storage size limit, and there is barely any data anywhere (the total DB size is <300MB).
Is there an explanation for this? If not, I can't in good conscience use this platform in a production environment.
Full error:
Msg 110802, Level 16, State 1, Line 1
110802;An internal DMS error occurred that caused this operation to fail.
Details: Exception: Microsoft.SqlServer.DataWarehouse.DataMovement.Workers.DmsSqlNativeException,
Message: SqlNativeBufferReader.Run, error in OdbcExecuteQuery: SqlState: 42000, NativeError: 608, 'Error calling: SQLExecDirect(this->GetHstmt(), (SQLWCHAR *)statementText, SQL_NTS), SQL return code: -1 | SQL Error Info: SrvrMsgState: 1, SrvrSeverity: 16, Error <1>: ErrorMsg: [Microsoft][ODBC Driver 11 for SQL Server][SQL Server]No catalog entry found for partition ID 72057594047758336 in database 36. The metadata is inconsistent. Run DBCC CHECKDB to check for a metadata corruption. | Error calling: pReadConn->ExecuteQuery(statementText, bufferFormat) | state: FFFF, number: 134148, active connections: 100', Connection String: Driver={pdwodbc};APP=TypeC01-DmsNativeReader:DB196\mpdwsvc (2504)- ODBC;Trusted_Connection=yes;AutoTranslate=no;Server=\\.\pipe\DB.196-bb5f9dd884cf\sql\query
I'm sorry to hear about your experience with Azure SQL Data Warehouse. I believe this is a defect related to BIT data type handling for NOT NULL columns. Can you confirm that you have a BIT NOT NULL column (e.g., CREATE TABLE t1 (IsTrue BIT NOT NULL);)?
If so, a fix has been coded and is in testing for release. To mitigate this now, you can either switch to TINYINT or remove the NOT NULL setting for the column.
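Using the example table from the previous paragraph, the two mitigations would look roughly like this (a sketch only; the table and column names are just the ones from that example):
-- Mitigation 1: use TINYINT in place of BIT
CREATE TABLE t1 (IsTrue TINYINT NOT NULL);
-- Mitigation 2: keep BIT but allow NULLs
CREATE TABLE t1 (IsTrue BIT NULL);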

How to delete graph in Titan with Cassandra storage backend?

I use Titan 0.4.0 (the titan-all package), running Rexster in shared VM mode on Ubuntu 12.04.
How could I properly delete a graph in Titan which is using the Cassandra storage backend?
I have tried TitanCleanup.clear(graph), but it does not delete everything; the indices are still there. My real issue is that I have an index which I don't want (it crashes every query), but as I understand Titan's documentation it is impossible to remove an index once it has been created.
You can clear all the edges/vertices with:
g.V.remove()
but as you have found, that won't clear the types/indices previously created. The cleanest option would be to just delete the Cassandra data directory.
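If you go the data-directory route on a packaged Cassandra install, it would be something along these lines (a sketch only: the paths are the package defaults and vary by installation, and this wipes every keyspace, not just Titan's):
sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
sudo service cassandra start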
If you are executing the delete via a unit test you might try to do this as part of your test setup:
this.config = new BaseConfiguration() {{
    addProperty("storage.backend", "berkeleyje");
    addProperty("storage.directory", "/tmp/titan-schema-test");
}};
GraphDatabaseConfiguration graphconfig = new GraphDatabaseConfiguration(config);
graphconfig.getBackend().clearStorage();
g = (StandardTitanGraph) TitanFactory.open(config);
Be sure to call g.shutdown() in your test teardown method.
Just to update this answer.
With Titan 1.0.0 this can be done programmatically in Java with:
TitanGraph graph = TitanFactory.open(config);
graph.close();
TitanCleanup.clear(graph);
For the continuation of Titan called JanusGraph, the command is JanusGraphFactory.clear(graph) but is soon to be JanusGraphCleanup.clear(graph).
As was mentioned in one of the comments on the earlier answer, dropping the titan keyspace using cqlsh should do it:
cqlsh> DROP KEYSPACE titan;
The name of the keyspace Titan uses is set with the storage.cassandra.keyspace configuration option. You can change it to whatever name you want, as long as it is acceptable to Cassandra.
storage.cassandra.keyspace=hello_titan
When Cassandra starts up, it prints out the keyspace's name as follows:
INFO 19:50:32 Create new Keyspace: KSMetaData{name=hello_titan,
strategyClass=SimpleStrategy, strategyOptions={replication_factor=1},
cfMetaData={}, durableWrites=true,
userTypes=org.apache.cassandra.config.UTMetaData#767d6a9f}
In 0.9.0-M1, the name appears in Titan's log in DEBUG (set log4j.rootLogger=DEBUG, stdout in conf/log4j-server.properties):
[DEBUG] AstyanaxStoreManager - Found keyspace titan
or the following when it doesn't:
[DEBUG] AstyanaxStoreManager - Creating keyspace titan...
[DEBUG] AstyanaxStoreManager - Created keyspace titan
