Percona XtraDB Cluster crashing

We have a Percona XtraDB Cluster with 5 nodes and an arbitrator. One of our PHP developers ran a bad query on the cluster, crashing all the nodes. After the crash, we could not collect any error log to tell us what really went wrong, as the entire cluster went down without writing anything to the logs.
I have always thought that when a single query is executed on the cluster, it is processed by only one of the nodes. So if the query is bad enough to kill a database server, it should crash only the one node that is processing it, leaving the cluster running on the remaining 4 nodes.
This behavior has puzzled us and we would like to understand what is really going on, especially since this is the second time it has happened. Why would a query processed by one node cause the other nodes in the cluster to crash when something goes wrong during its execution?
Below is our my.cnf config:
#
# Default values.
[mysqld_safe]
flush_caches
numa_interleave
#
#
[mysqld]
back_log = 65535
binlog_format = ROW
character_set_server = utf8
collation_server = utf8_general_ci
datadir = /var/lib/mysql
default_storage_engine = InnoDB
expand_fast_index_creation = 1
expire_logs_days = 7
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_instances = 16
innodb_buffer_pool_populate = 1
innodb_buffer_pool_size = 32G # XXX 64GB RAM, 80%
innodb_data_file_path = ibdata1:64M;ibdata2:64M:autoextend
innodb_file_format = Barracuda
innodb_file_per_table
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_io_capacity = 1600
innodb_large_prefix
innodb_locks_unsafe_for_binlog = 1
innodb_log_file_size = 64M
innodb_print_all_deadlocks = 1
innodb_read_io_threads = 64
innodb_stats_on_metadata = FALSE
innodb_support_xa = FALSE
innodb_write_io_threads = 64
log-bin = mysqld-bin
log-queries-not-using-indexes
log-slave-updates
long_query_time = 1
max_allowed_packet = 64M
max_connect_errors = 4294967295
max_connections = 4096
min_examined_row_limit = 1000
port = 3306
relay-log-recovery = TRUE
skip-name-resolve
slow_query_log = 1
slow_query_log_timestamp_always = 1
table_open_cache = 4096
thread_cache = 1024
tmpdir = /db/tmp
transaction_isolation = REPEATABLE-READ
updatable_views_with_limit = 0
user = mysql
wait_timeout = 60
#
# Galera Variable config
wsrep_cluster_address = gcomm://ip_1,ip_2,ip_3,ip_4,ip_5
wsrep_cluster_name = cluster_db
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_provider_options = "gcache.size=4G"
wsrep_slave_threads = 32
wsrep_sst_auth = "user:password"
wsrep_sst_donor = "db1"
#wsrep_sst_method = xtrabackup_throttle
wsrep_sst_method = xtrabackup-v2
#
# XXX You *MUST* change!
server-id = 1

Can you post the query? SELECT queries execute only on a single node, but all write queries are replicated to and applied on every node. What's in your error log?
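To make that concrete: in Galera, every transaction that modifies data is replicated as a write set and applied on every node, so a statement that triggers a bug in the applier can take out more than the node it was submitted to. A minimal Python sketch of the difference (host names, credentials, and the table are placeholders, not taken from the config above; mysql-connector-python is just one client option):
import mysql.connector

# Placeholders only: substitute real node addresses, credentials, schema and table.
node1 = mysql.connector.connect(host="ip_1", user="app", password="secret", database="test")
node2 = mysql.connector.connect(host="ip_2", user="app", password="secret", database="test")

cur1 = node1.cursor()
cur1.execute("INSERT INTO t (val) VALUES ('hello')")   # write: certified and applied on all nodes
node1.commit()

cur2 = node2.cursor()
cur2.execute("SELECT val FROM t WHERE val = 'hello'")  # read: executes only on node 2
print(cur2.fetchall())                                 # shows the row committed via node 1

node1.close()
node2.close()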

Related

Spark stage taking too long - 2 executors doing "all" the work

I've been trying to figure this out for the past day, but have not been successful.
Problem I am facing
I'm reading a Parquet file that is about 2 GB. The initial read has 14 partitions, which eventually gets split into 200 partitions. I run a seemingly simple SQL query that takes 25+ minutes, about 22 of which are spent in a single stage. Looking at the Spark UI, I see that all the computation is eventually pushed to about 2 to 4 executors, with a lot of shuffling. I don't know what is going on and would appreciate any help.
Setup
Spark environment - Databricks
Cluster mode - Standard
Databricks Runtime Version - 6.4 ML (includes Apache Spark 2.4.5, Scala 2.11)
Cloud - Azure
Worker Type - 56 GB, 16 cores per machine. Minimum 2 machines
Driver Type - 112 GB, 16 cores
Notebook
Cell 1: Helper functions
load_data = function(path, type) {
input_df = read.df(path, type)
input_df = withColumn(input_df, "dummy_col", 1L)
createOrReplaceTempView(input_df, "__current_exp_data")
## Helper function to run query, then save as table
transformation_helper = function(sql_query, destination_table) {
createOrReplaceTempView(sql(sql_query), destination_table)
}
## Transformation 0: Calculate max date, used for calculations later on
transformation_helper(
"SELECT 1L AS dummy_col, MAX(Date) max_date FROM __current_exp_data",
destination_table = "__max_date"
)
## Transformation 1: Make initial column calculations
transformation_helper(
"
SELECT
cId AS cId
, date_format(Date, 'yyyy-MM-dd') AS Date
, date_format(DateEntered, 'yyyy-MM-dd') AS DateEntered
, eId
, (CASE WHEN isnan(tSec) OR isnull(tSec) THEN 0 ELSE tSec END) AS tSec
, (CASE WHEN isnan(eSec) OR isnull(eSec) THEN 0 ELSE eSec END) AS eSec
, approx_count_distinct(eId) OVER (PARTITION BY cId) AS dc_eId
, COUNT(*) OVER (PARTITION BY cId, Date) AS num_rec
, datediff(Date, DateEntered) AS analysis_day
, datediff(max_date, DateEntered) AS total_avail_days
FROM __current_exp_data
CROSS JOIN __max_date ON __current_exp_data.dummy_col = __max_date.dummy_col
",
destination_table = "current_exp_data_raw"
)
## Transformation 2: Drop row if Date is not valid
transformation_helper(
"
SELECT
cId
, Date
, DateEntered
, eId
, tSec
, eSec
, analysis_day
, total_avail_days
, CASE WHEN analysis_day == 0 THEN 0 ELSE floor((analysis_day - 1) / 7) END AS week
, CASE WHEN total_avail_days < 7 THEN NULL ELSE floor(total_avail_days / 7) - 1 END AS avail_week
FROM current_exp_data_raw
WHERE
isnotnull(Date) AND
NOT isnan(Date) AND
Date >= DateEntered AND
dc_eId == 1 AND
num_rec == 1
",
destination_table = "main_data"
)
cacheTable("main_data_raw")
cacheTable("main_data")
}
spark_sql_as_data_table = function(query) {
data.table(collect(sql(query)))
}
get_distinct_weeks = function() {
spark_sql_as_data_table("SELECT week FROM current_exp_data GROUP BY week")
}
Cell 2: Call helper function that triggers the long running task
library(data.table)
library(SparkR)
spark = sparkR.session(sparkConfig = list())
load_data("/mnt/public-dir/file_0000000.parquet", "parquet")
set.seed(1234)
get_distinct_weeks()
Long running stage DAG
Stats about long running stage
Logs
I trimmed the logs down and show below only the entries that appeared multiple times:
BlockManager: Found block rdd_22_113 locally
CoarseGrainedExecutorBackend: Got assigned task 812
ExternalAppendOnlyUnsafeRowArray: Reached spill threshold of 4096 rows, switching to org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter
InMemoryTableScanExec: Predicate (dc_eId#61L = 1) generates partition filter: ((dc_eId.lowerBound#622L <= 1) && (1 <= dc_eId.upperBound#621L))
InMemoryTableScanExec: Predicate (num_rec#62L = 1) generates partition filter: ((num_rec.lowerBound#627L <= 1) && (1 <= num_rec.upperBound#626L))
InMemoryTableScanExec: Predicate isnotnull(Date#57) generates partition filter: ((Date.count#599 - Date.nullCount#598) > 0)
InMemoryTableScanExec: Predicate isnotnull(DateEntered#58) generates partition filter: ((DateEntered.count#604 - DateEntered.nullCount#603) > 0)
MemoryStore: Block rdd_17_104 stored as values in memory (estimated size <VERY SMALL NUMBER < 10> MB, free 10.0 GB)
ShuffleBlockFetcherIterator: Getting 200 non-empty blocks including 176 local blocks and 24 remote blocks
ShuffleBlockFetcherIterator: Started 4 remote fetches in 1 ms
UnsafeExternalSorter: Thread 254 spilling sort data of <Between 1 and 3 GB> to disk (3 times so far)
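Given that the window functions in Transformation 1 are partitioned by cId and the long stage ends in a 200-partition shuffle, one thing worth checking is whether a handful of cId values dominate the data; skewed keys would explain why 2 to 4 executors end up doing almost all of the work. An illustrative sketch of that check, written in PySpark for brevity (only the path comes from the notebook above; the same query can be run from SparkR):
from pyspark.sql import SparkSession, functions as F

# Count rows per cId: if the top few keys hold most of the ~2 GB of data,
# the window/shuffle work for those keys lands on a handful of tasks.
spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("/mnt/public-dir/file_0000000.parquet")

(df.groupBy("cId")
   .count()
   .orderBy(F.desc("count"))
   .show(20))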

Amazon DynamoDB [DDB] parallel scan - boto 3 - python

I am trying to do a parallel scan, but it looks like it is processing sequentially and not fetching all the records.
My DDB table has 400k records and I am trying to pull all of them using DDB parallel scans. I want to use 40 threads.
import boto3

dynamodb = boto3.resource('dynamodb',
                          region_name='us-east-1',
                          endpoint_url="https://dynamodb.us-east-1.amazonaws.com",
                          aws_access_key_id="123",
                          aws_secret_access_key="456")
table = dynamodb.Table('solved_history')

last_evaluated_key = None
a = 0
while True:
    if last_evaluated_key:
        response = table.scan(ExclusiveStartKey=last_evaluated_key)
        allResults = response['Items']
        for each in allResults:
            storeRespectively(each["decodedText"], each["base64StringOfImage"])
            a += 1
    else:
        response = table.scan(TotalSegments=40, Segment=39)
        allResults = response['Items']
        for each in allResults:
            storeRespectively(each["decodedText"], each["base64StringOfImage"])
            a += 1
    last_evaluated_key = response.get('LastEvaluatedKey')
    if not last_evaluated_key:
        break
print(a)
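For comparison, a segmented parallel scan is usually structured so that each worker owns one Segment and keeps TotalSegments, Segment, and ExclusiveStartKey together on every page of that segment. A sketch under that assumption (the table name and the storeRespectively helper are taken from the question; the thread pool and credential handling are illustrative):
import boto3
from concurrent.futures import ThreadPoolExecutor

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('solved_history')

def scan_segment(segment, total_segments=40):
    """Scan one segment to completion, following LastEvaluatedKey page by page."""
    kwargs = {'TotalSegments': total_segments, 'Segment': segment}
    count = 0
    while True:
        response = table.scan(**kwargs)
        for item in response['Items']:
            # storeRespectively() is the same helper used in the question.
            storeRespectively(item["decodedText"], item["base64StringOfImage"])
            count += 1
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            return count
        kwargs['ExclusiveStartKey'] = last_key  # keep Segment/TotalSegments on the next page too

with ThreadPoolExecutor(max_workers=40) as pool:
    print(sum(pool.map(scan_segment, range(40))))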

Spark doesn't read the file properly

I run Flume to ingest Twitter data into HDFS (in JSON format) and run Spark to read that file.
But somehow, it doesn't return the correct result: it seems the content of the file is not updated.
Here's my Flume configuration:
TwitterAgent01.sources = Twitter
TwitterAgent01.channels = MemoryChannel01
TwitterAgent01.sinks = HDFS
TwitterAgent01.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent01.sources.Twitter.channels = MemoryChannel01
TwitterAgent01.sources.Twitter.consumerKey = xxx
TwitterAgent01.sources.Twitter.consumerSecret = xxx
TwitterAgent01.sources.Twitter.accessToken = xxx
TwitterAgent01.sources.Twitter.accessTokenSecret = xxx
TwitterAgent01.sources.Twitter.keywords = some_keywords
TwitterAgent01.sinks.HDFS.channel = MemoryChannel01
TwitterAgent01.sinks.HDFS.type = hdfs
TwitterAgent01.sinks.HDFS.hdfs.path = hdfs://hadoop01:8020/warehouse/raw/twitter/provider/m=%Y%m/
TwitterAgent01.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent01.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent01.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent01.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent01.sinks.HDFS.hdfs.rollCount = 0
TwitterAgent01.sinks.HDFS.hdfs.rollInterval = 86400
TwitterAgent01.channels.MemoryChannel01.type = memory
TwitterAgent01.channels.MemoryChannel01.capacity = 10000
TwitterAgent01.channels.MemoryChannel01.transactionCapacity = 10000
After that I check the output with hdfs dfs -cat and it returns more than 1000 rows, meaning that the data was successfully inserted.
But in Spark that's not the case
spark.read.json("/warehouse/raw/twitter/provider").filter("m=201802").show()
only has 6 rows.
Did I miss something here?
I'm not entirely sure why you specified the latter part of the path as the filter condition.
I believe that to read your file correctly you can just write:
spark.read.json("/warehouse/raw/twitter/provider/m=201802").show()

LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems

I'm developing Spark SQL code that reads tables in Hive (on HDFS).
The problem is that when I load my code in the Spark shell, it repeatedly prints the following message:
"WARN LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems."
The code I run is:
val query_fare_details = sql("""
SELECT *
FROM fare_details
WHERE fardet_cd_carrier = 'LA'
AND fardet_cd_origin_city = 'SCL'
AND fardet_cd_dest_city = 'MIA'
AND fardet_cd_fare_basis = 'NNE0F0O1'
""")
query_fare_details.registerTempTable("query_fare_details")
val matchFAR1 = sql("""
SELECT *
FROM query_fare_details f
JOIN fare_rules r ON f.fardet_cd_carrier = r.farrul_cd_carrier
AND f.fardet_num_rule_tariff = r.farrul_num_rule_tariff
AND f.fardet_cd_fare_rule_bigint = r.farrul_cd_fare_rule_bigint
AND f.fardet_cd_fare_basis = r.farrul_cd_fare_basis
LIMIT 10""")
matchFAR1.show(5)
Any idea what goes wrong?
You can safely ignore this warning; it is not an error.
See https://issues.apache.org/jira/browse/SPARK-3057.

How Do I Connect Oracle To PostgreSQL over ODBC using the Oracle ODBC Gateway?

I would like to select from a PostgreSQL instance on a server running openSUSE from an Oracle instance on a server running Oracle's custom Linux flavor. I would like to do this using the Oracle ODBC gateway. I have done this successfully in the past and continue to do it using the same Oracle box and other SUSE/Postgres boxes.
My ODBC driver manager on the SUSE (Postgres) side is unixODBC.
My odbc.ini is:
[postgresql]
Description = Test to Postgres
Driver = /usr/lib64/psqlodbcw.so
Trace = Yes
TraceFile = sql.log
Database = host
Servername = localhost
UserName = *****
Password = *****
Port = 5432
Protocol =
ReadOnly = Yes
RowVersioning = No
ShowSystemTables = No
ShowOidColumn = No
FakeOidIndex = No
ConnSettings =
My odbcinst.ini is:
[postgresql]
Description = Postgresql driver for Linux
Driver = /usr/lib64/psqlodbcw.so
UsageCount = 1
My tnsnames.ora (with other ODBC entries omitted) is:
# tnsnames.ora Network Configuration File: /opt/oracle/product/11gR1/db/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.
LISTENER_ORCL =
(ADDRESS = (PROTOCOL = TCP)(HOST = dbs1)(PORT = 1522))
ORCL =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = dbs1)(PORT = 1522))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = orcl)
)
)
ODBC_SERVER123=
(DESCRIPTION=
(ADDRESS=
(PROTOCOL=TCP)
(HOST=dbs1)
(PORT=1522)
)
(CONNECT_DATA=
(SID=server123)
)
(HS=OK)
)
My listener.ora (with other SID_DESC entries omitted) is:
# listener.ora Network Configuration File: /opt/oracle/product/11gR1/db/network/admin/listener.ora
# Generated by Oracle configuration tools.
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST=
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = dbs1)(PORT = 1522))
)
)
)
ADR_BASE_LISTENER = /opt/oracle
SID_LIST_LISTENER=
(SID_LIST=
(SID_DESC=
(SID_NAME=server123)
(ORACLE_HOME=/opt/oracle/product/11gR1/db)
(PROGRAM=dg4odbc)
(ENVS=LD_LIBRARY_PATH=/usr/lib64:/opt/oracle/product/11gR1/db/lib)
)
)
TRACE_LEVEL_LISTENER = 0
LOGGING_LISTENER = off
Also, here is the inithost123.ora file located in $ORACLE_HOME/hs/admin:
#
# HS init parameters
#
HS_FDS_CONNECT_INFO = server123
HS_FDS_TRACE_LEVEL = 0
#HS_FDS_TRACE_LEVEL=DEBUG
HS_FDS_SHAREABLE_NAME = /usr/lib64/libodbc.so
HS_FDS_SUPPORT_STATISTICS = FALSE
HS_LANGUAGE=AMERICAN_AMERICA.WE8ISO8859P1
#HS_LANGUAGE=AMERICAN_AMERICA.US7ASCII
HS_FDS_TIMESTAMP_MAPPING = "TIMESTAMP(6)"
HS_FDS_FETCH_ROWS=1
HS_FDS_SQLLEN_INTERPRETATION=32
#
# ODBC specific environment variables
#
set ODBCINI=/etc/unixODBC/odbc.ini
#
# Environment variables required for the non-Oracle system
#
#set <envvar>=<value>
And just for good measure, my sqlnet.ora is:
# sqlnet.ora Network Configuration File: /opt/oracle/product/11gR1/db/network/admin/sqlnet.ora
# Generated by Oracle configuration tools.
NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)
ADR_BASE = /opt/oracle
I added a gateway link on the Oracle side using the following:
CREATE DATABASE LINK ODBC_SERVER123 CONNECT TO "*****" IDENTIFIED BY "*****" USING 'ODBC_SERVER123';
When I try to perform my select I receive the following error:
[SQL] select * from "legit_view"#ODBC_SERVER123
[Err] ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[unixODBC][Driver Manager]Data source name not found, and no default driver specified {IM002}
ORA-02063: preceding 2 lines from ODBC_SERVER123
I can successfully test ODBC locally on the SUSE/Postgres box with isql:
>isql -v postgresql ***** *****
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+
SQL>
What step have I forgotten?
Your odbc.ini file must be on the remote box! That is, the one originating the query. In the example above that would be the Oracle box. Here is an entry that must go on the Oracle (remote) box:
[server123]
Driver = /usr/lib64/psqlodbcw.so
Description = ODBC to PostgreSQL on Server 123
Trace = No
Tracefile = /var/log/sql_host_123.log
Servername = ip.address.for.123
Username = *****
Password = *****
Port = 5432
Protocol = 6.4
Database = host
QuotedId = Yes
ReadOnly = No
RowVersioning = No
ShowSystemTables = No
ShowOidColumn = No
FakeOidIndex = No
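Assuming that entry lives in the odbc.ini that ODBCINI points to on the Oracle box (/etc/unixODBC/odbc.ini in your inithost123.ora), you can sanity-check it there the same way you did on the SUSE side before retrying the database link:
isql -v server123 ***** *****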
