GCP Dataproc - Slow read speed from GCS - apache-spark

I have a GCP Dataproc cluster on which I'm running a job. The input of the job is a folder containing 200 part files, each approximately 1.2 GB in size.
My job consists only of map operations:
import org.apache.spark.sql.functions.lit
val df = spark.read.parquet("gs://bucket/data/src/....")
df.withColumn("a", lit("b")).write.save("gs://bucket/data/dest/...")
The property parquet.block.size is set to 128 MB, which means that each part file will be read in roughly 10 splits (1.2 GB / 128 MB) during the job.
I enabled bucket access logging and looked at the stats, and I was surprised to see that each part file is accessed a whopping 85 times. Only 10 of these requests return the actual data; the others return either 0 bytes or a very small amount.
I do understand that reading a big Parquet file in splits is standard Spark behavior, and there must also be some metadata requests, but 8x as many calls as splits seems very strange. Also, judging by the amount of data transferred and the time taken, the data is being transferred at about 100 MB/min, which is very slow for Google-internal data transfer (from GCS to Dataproc). I am attaching a CSV with bytes, time taken, and URL for one part file.
Has anybody experienced such behavior with Dataproc? Is there an explanation for so many requests to the file and such slow transfer rates?
As a side note, both the bucket and the Dataproc cluster are in the same region. There are 50 workers, all n1-standard-16 machines.
Since I could not attach the file, I'm pasting the formatted contents here.
| sc_bytes | time_taken_micros | cs_uri |
|-----------|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
| 0 | 21000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 22000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 709922 | 164000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 709922 | 86000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 709922 | 173000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 8 | 47000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 8 | 51000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 12000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 8 | 103000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 709922 | 98000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 8 | 42000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 709922 | 88000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 8 | 42000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 20000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 20000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 8 | 40000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 20000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 15000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 19000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 143092175 | 63484000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 16000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 19000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 32000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 16000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 19000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 14000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 137585202 | 66010000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 136726977 | 66732000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 176684024 | 101921000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 32000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 709922 | 113000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 16000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 23000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 16000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 19000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 134187229 | 64401000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 135450987 | 73632000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 24000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 21000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 15000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 15000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 27000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 709922 | 106000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 137020002 | 66333000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 17000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 24000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 8 | 41000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 25000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 16000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 8 | 39000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 20000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 709922 | 135000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 16000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 0 | 19000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 709922 | 126000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 8 | 41000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 0 | 18000 | /storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet |
| 135686216 | 71676000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
| 179573683 | 90877000 | /download/storage/v1/b/spark-ml-tkrt/o/Dfs%2F/massiveDf%2Fpart-00184-35441be5-85ca-4b21-85bd-cb99f9aa3093-c000.snappy.parquet?alt=media |
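The ~100 MB/min figure can be checked directly from the nine large ?alt=media rows in the table above. The short sketch below divides sc_bytes by time_taken_micros for each of those rows. It assumes decimal megabytes (10^6 bytes) and treats each row as one independent request, so it measures per-request (per-split) throughput, not the aggregate throughput of the whole cluster.

```python
# Large ?alt=media data reads copied from the log table above:
# (sc_bytes, time_taken_micros)
DATA_READS = [
    (143092175, 63484000),
    (137585202, 66010000),
    (136726977, 66732000),
    (176684024, 101921000),
    (134187229, 64401000),
    (135450987, 73632000),
    (137020002, 66333000),
    (135686216, 71676000),
    (179573683, 90877000),
]


def mb_per_minute(sc_bytes: int, time_taken_micros: int) -> float:
    """Throughput of one request in decimal megabytes (10**6 bytes) per minute."""
    seconds = time_taken_micros / 1_000_000
    return sc_bytes / 1_000_000 / seconds * 60


rates = [mb_per_minute(b, t) for b, t in DATA_READS]
for (b, t), r in zip(DATA_READS, rates):
    print(f"{b:>11,d} bytes in {t / 1e6:6.1f} s -> {r:6.1f} MB/min")
print(f"min {min(rates):.1f}, max {max(rates):.1f} MB/min")
```

Every row lands between roughly 100 and 140 MB/min per request, consistent with the figure in the question.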

A relatively large number of GCS metadata requests (the URLs without the ?alt=media parameter in your table) is expected in this case. The job driver performs metadata requests to list files and get their sizes to generate splits; after that, for each split, workers perform multiple metadata requests to check whether files exist, get their sizes, etc. I think this seeming inefficiency stems from the fact that Spark uses the HDFS interface to access GCS, and because HDFS requests have much lower latency than GCS requests, I don't think the whole Hadoop/Spark stack was heavily optimized to reduce the number of HDFS requests.
To address this issue at the Spark level, you may want to enable metadata caching with the spark.sql.parquet.cacheMetadata=true property.
At the GCS connector level, you can reduce the number of GCS metadata requests by enabling the metadata cache with the fs.gs.performance.cache.enable=true property (with the spark.hadoop. prefix for Spark), but it can introduce some metadata staleness.
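Both caches are enabled through configuration. A minimal sketch, assuming the job is launched with spark-submit: the helpers below (hypothetical names, not part of any library) add the spark.hadoop. prefix to Hadoop/GCS connector properties and render everything as --conf arguments. Whether these exact properties are honored depends on the Spark and GCS connector versions installed on the cluster, so treat the property names as an assumption to check against the documentation for your versions.

```python
from typing import Dict, List


def to_spark_conf(spark_props: Dict[str, str], hadoop_props: Dict[str, str]) -> Dict[str, str]:
    """Merge Spark properties with Hadoop properties, prefixing the latter
    with 'spark.hadoop.' so Spark forwards them to the Hadoop configuration."""
    conf = dict(spark_props)
    for key, value in hadoop_props.items():
        conf["spark.hadoop." + key] = value
    return conf


def spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render properties as repeated '--conf key=value' spark-submit arguments."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


conf = to_spark_conf(
    {"spark.sql.parquet.cacheMetadata": "true"},    # Spark-level metadata cache
    {"fs.gs.performance.cache.enable": "true"},     # GCS connector metadata cache
)
print(spark_submit_args(conf))
```

Keeping the Hadoop properties in a separate dictionary makes it explicit which settings need the prefix and which are native Spark properties.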
Also, to take advantage of the latest improvements in the GCS connector (including a reduced number of GCS metadata requests and support for random reads), you may want to update it on your cluster to the latest version, or use Dataproc 1.3, which has it pre-installed.
Regarding read speed, you may want to allocate more worker tasks per VM, which will increase read throughput by increasing the number of simultaneous reads.
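To reason about "more tasks per VM", it helps to count task slots. In Spark, each executor runs at most spark.executor.cores / spark.task.cpus tasks at once, so the number of simultaneous GCS reads is bounded by the total number of slots. The sketch below uses hypothetical executor layouts for the 50 n1-standard-16 workers (16 vCPUs each) and an assumed per-stream rate of about 2.25 MB/s (roughly 143 MB in 63.5 s, from the table). The throughput estimate is idealized: it assumes the per-stream rate does not drop as concurrency rises, which in practice it may.

```python
def concurrent_tasks(workers: int, executors_per_worker: int,
                     executor_cores: int, task_cpus: int = 1) -> int:
    """Upper bound on tasks (and hence simultaneous reads) running at once.

    Each executor runs at most executor_cores // task_cpus tasks in parallel,
    mirroring spark.executor.cores and spark.task.cpus."""
    return workers * executors_per_worker * (executor_cores // task_cpus)


def estimated_read_mb_per_s(tasks: int, per_stream_mb_per_s: float) -> float:
    """Idealized aggregate throughput: assumes every stream keeps its rate."""
    return tasks * per_stream_mb_per_s


# Hypothetical layouts: (executors per worker, cores per executor).
for executors, cores in [(1, 4), (2, 8)]:
    tasks = concurrent_tasks(50, executors, cores)
    print(executors, cores, tasks, estimated_read_mb_per_s(tasks, 2.25))
```

If the measured aggregate throughput stops growing as slots are added, the bottleneck is likely elsewhere (for example the write side, as suggested next).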
Also, you may want to check whether read speed is limited by write speed for your workload, by removing the write to GCS at the end entirely or replacing it with a write to HDFS or some computation instead.

Related

Illegal state: Transaction for catalog table write operation 'pg_database' not found in YugabyteDB YSQL

[Question posted by a user on YugabyteDB Community Slack]
Before dropping a database, first I want to prevent new connections and then drop the existing connections. However, I'm stuck at the first step:
ysqlsh (11.2-YB-2.1.1.0-b0)
yugabyte=# SELECT * FROM pg_database;
datname | datdba | encoding | datcollate | datctype | datistemplate | datallowconn | datconnlimit | datlastsysoid | datfrozenxid | datminmxid | dattablespace | datacl
----------------------------+--------+----------+------------+-------------+---------------+--------------+--------------+---------------+--------------+------------+---------------+-------------------------------------
template1 | 10 | 6 | C | en_US.UTF-8 | t | t | -1 | 0 | 0 | 1 | 1663 | {=c/postgres,postgres=CTc/postgres}
template0 | 10 | 6 | C | en_US.UTF-8 | t | f | -1 | 0 | 0 | 1 | 1663 | {=c/postgres,postgres=CTc/postgres}
postgres | 10 | 6 | C | en_US.UTF-8 | f | t | -1 | 0 | 0 | 1 | 1663 |
yugabyte | 10 | 6 | C | en_US.UTF-8 | f | t | -1 | 0 | 0 | 1 | 1663 |
system_platform | 10 | 6 | C | en_US.UTF-8 | f | t | -1 | 0 | 0 | 1 | 1663 |
test_1650530283_52506 | 12462 | 6 | C | en_US.UTF-8 | f | t | -1 | 0 | 0 | 1 | 1663 |
(6 rows)
yugabyte=# UPDATE pg_database SET datallowconn=false WHERE datname = 'test_1650530283_52506';
ERROR: Illegal state: Transaction for catalog table write operation 'pg_database' not found
The described behavior is not a bug; everything works as expected. Changing system tables manually is not recommended. If it is absolutely necessary for some reason, set the yb_non_ddl_txn_for_sys_tables_allowed GUC variable to true, but do so at your own risk.
By the way, to disallow connections to a database, it is better to use the following statement instead of modifying the system table manually:
ALTER DATABASE db WITH ALLOW_CONNECTIONS false;

nyc returns empty results

I use the nyc library to check test coverage.
When I run the tests and they pass successfully, I get an empty table.
----------|----------|----------|----------|----------|-------------------|
File | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s |
----------|----------|----------|----------|----------|-------------------|
All files | 0 | 0 | 0 | 0 | |
----------|----------|----------|----------|----------|-------------------|
But when one of the tests fails, the table is filled.
File | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s |
---------------------------|----------|----------|----------|----------|-------------------|
All files | 31.64 | 9.25 | 11.76 | 31.86 | |
src | 96.55 | 100 | 66.67 | 96.55 | |
index.js | 96.43 | 100 | 66.67 | 96.43 | 44 |
src/config | 93.75 | 80 | 100 | 93.75 | |
constants.js | 100 | 100 | 100 | 100 | |
multer.config.js | 88.89 | 80 | 100 | 88.89 | 13 |
s3.config.js | 100 | 100 | 100 | 100 | |
src/controllers | 30.5 | 6.49 | 10.68 | 30.77 | |
activateProducts.js | 30.34 | 5.56 | 22.22 | 29.89 |... 26,130,131,138 |
admins.js | 18.92 | 0 | 0 | 18.92 |... 59,265,271,277 |
bookmarks.js | 61.29 | 33.33 | 66.67 | 60 |... 42,43,48,49,56 |
categories.js | 15.38 | 0 | 0 | 16 |... 98,205,213,219 |
comments.js | 20.83 | 0 | 0 | 21.74 |... 31,132,139,147 |
configurations.js | 16.26 | 0 | 0 | 17.39 |... 94,199,200,207 |
filters.js | 19.15 | 0 | 0 | 20 |... 77,81,82,86,87 |
followers.js | 22.67 | 0 | 0 | 23.61 |... 21,127,128,135 |
likes_merchants.js | 31.71 | 0 | 0 | 32.5 |... 58,59,66,67,74 |
likes_products.js | 31.71 | 0 | 0 | 32.5 |... 58,59,66,67,74 |
locations.js | 15.69 | 0 | 0 | 16.33 |... 26,233,241,247 |
locationsType.js | 18.6 | 0 | 0 | 19.51 |... 60,167,175,181 |
merchants.js | 34.62 | 0 | 0 | 36 |... 26,27,34,35,42 |
merchantsSignup.js | 26.56 | 0 | 0 | 26.56 |... 16,218,219,221 |
merchantsVerifyPhone.js | 39.02 | 0 | 0 | 39.02 |... 75,79,84,85,92 |
products.js | 47.24 | 20.62 | 52.17 | 46.39 |... 16,422,428,434 |
productsSold.js | 51.35 | 20 | 50 | 50 |... 49,54,55,62,70 |
saveTokenOfUsers.js | 40.74 | 0 | 0 | 42.31 |... 35,36,41,42,49 |
updateLocationOUsers.js | 28.07 | 0 | 0 | 28.57 |... 3,94,98,99,106 |
userBookmarks.js | 73.17 | 50 | 50 | 71.79 |... 44,53,54,61,69 |
userProducts.js | 81.4 | 50 | 50 | 80.49 |... 42,58,59,66,74 |
users.js | 15.91 | 0 | 0 | 16.41 |
---------------------------|----------|----------|----------|----------|-------------------|
What could be causing this?
My main test file, where I import the rest of the tests:
import './products/allProductsTest';
The products test folder:
products:
- addProductsTest.js
- allProductsTest.js
- deleteProductsTest.js
- getProductTest.js
- listProductsTest.js
- updateProductsTest.js

Memsql is taking lot of time for a query to execute

I have a problem with my MemSQL cluster: when I run a query that fetches 51M records, it returns results in 5 minutes,
but it takes more than 15 minutes when data is being inserted in parallel with the read.
I measured disk I/O and it looks fine; the disk is an HDD.
There are no other connections to MemSQL, and CPU utilization is only about 15% on a 64-core machine.
Below are my variables:
| Variable_name | Value |
+----------------------------------------------+------------------------------------------------------------------------------+
| aggregator_failure_detection | ON |
| auto_replicate | OFF |
| autocommit | ON |
| basedir | /data/master-3306 |
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_sets_dir | /data/master-3306/share/charsets/ |
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
| columnar_segment_rows | 102400 |
| columnstore_window_size | 2147483648 |
| compile_only | OFF |
| connect_timeout | 10 |
| core_file | ON |
| core_file_mode | PARTIAL |
| critical_diagnostics | ON |
| datadir | /data/master-3306/data |
| default_partitions_per_leaf | 16 |
| enable_experimental_metrics | OFF |
| error_count | 0 |
| explain_expression_limit | 500 |
| external_user | |
| flush_before_replicate | OFF |
| general_log | OFF |
| geo_sphere_radius | 6367444.657120 |
| hostname | **** |
| identity | 0 |
| kerberos_server_keytab | |
| lc_messages | en_US |
| lc_messages_dir | /data/master-3306/share |
| leaf_failure_detection | ON |
| load_data_max_buffer_size | 1073741823 |
| load_data_read_size | 8192 |
| load_data_write_size | 8192 |
| lock_wait_timeout | 60 |
| master_aggregator | self |
| max_allowed_packet | 104857600 |
| max_connection_threads | 192 |
| max_connections | 100000 |
| max_pooled_connections | 4096 |
| max_prefetch_threads | 1 |
| max_prepared_stmt_count | 16382 |
| max_user_connections | 0 |
| maximum_memory | 506602 |
| maximum_table_memory | 455941 |
| memsql_id | ** |
| memsql_version | 5.7.2 |
| memsql_version_date | Thu Jan 26 12:34:22 2017 -0800 |
| memsql_version_hash | 03e5e3581e96d65caa30756f191323437a3840f0 |
| minimal_disk_space | 100 |
| multi_insert_tuple_count | 20000 |
| net_buffer_length | 102400 |
| net_read_timeout | 3600 |
| net_write_timeout | 3600 |
| pid_file | /data/master-3306/data/memsqld.pid |
| pipelines_batches_metadata_to_keep | 1000 |
| pipelines_extractor_debug_logging | OFF |
| pipelines_kafka_version | 0.8.2.2 |
| pipelines_max_errors_per_partition | 1000 |
| pipelines_max_offsets_per_batch_partition | 1000000 |
| pipelines_max_retries_per_batch_partition | 4 |
| pipelines_stderr_bufsize | 65535 |
| pipelines_stop_on_error | ON |
| plan_expiration_minutes | 720 |
| port | 3306 |
| protocol_version | 10 |
| proxy_user | |
| query_parallelism | 0 |
| redundancy_level | 1 |
| reported_hostname | |
| secure_file_priv | |
| show_query_parameters | ON |
| skip_name_resolve | AUTO |
| snapshot_trigger_size | 268435456 |
| snapshots_to_keep | 2 |
| socket | /data/master-3306/data/memsql.sock |
| sql_quote_show_create | ON |
| ssl_ca | |
| ssl_capath | |
| ssl_cert | |
| ssl_cipher | |
| ssl_key | |
| sync_slave_timeout | 20000 |
| system_time_zone | UTC |
| thread_cache_size | 0 |
| thread_handling | one-thread-per-connection |
| thread_stack | 1048576 |
| time_zone | SYSTEM |
| timestamp | 1504799067.127069 |
| tls_version | TLSv1,TLSv1.1,TLSv1.2 |
| tmpdir | . |
| transaction_buffer | 67108864 |
| tx_isolation | READ-COMMITTED |
| use_join_bucket_bit_vector | ON |
| use_vectorized_join | ON |
| version | 5.5.8 |
| version_comment | MemSQL source distribution (compatible; MySQL Enterprise & MySQL Commercial) |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
| warn_level | WARNINGS |
| warning_count | 0 |
| workload_management | ON |
| workload_management_expected_aggregators | 1 |
| workload_management_max_connections_per_leaf | 1024 |
| workload_management_max_queue_depth | 100 |
| workload_management_max_threads_per_leaf | 8192 |
| workload_management_queue_time_warning_ratio | 0.500000 |
| workload_management_queue_timeout | 3600 |
Some thoughts:
Workload profiling (https://docs.memsql.com/concepts/v5.8/workload-profiling-overview/) can help you understand which resources are limiting the speed of this query: if it's not CPU or disk I/O, it may be network I/O or something else.
Query profiling (https://docs.memsql.com/sql-reference/v5.8/profile/) will help indicate which operators within the query are expensive.
You mentioned your machine has 64 cores. Is your cluster properly NUMA-optimized? Run memsql-ops memsql-optimize to check.
Jack, I did the workload profiling and found that LOCK_TIME_MS is quite high when NETWORK_LOGICAL_SEND_B is high:
LAST_FINISHED_TIMESTAMP, LOCK_TIME_MS, LOCK_ROW_TIME_MS, ACTIVITY_NAME, NETWORK_LOGICAL_RECV_B, NETWORK_LOGICAL_SEND_B, activity_name, database_name, partition_id, left(q.query_text, 50)
'2017-09-11 10:41:31', '39988', '0', 'InsertSelect_AggregatedHourly_temp_7aug_KING__et_al_c44dc7ab56d56280', '0', '538154753', 'InsertSelect_AggregatedHourly_temp_7aug_KING__et_al_c44dc7ab56d56280', 'datawarehouse', '7', 'SELECT \n combined.day AS day,\n `lineitem'

Blending Model: Oil Production

Oil Blending
An oil company produces three brands of oil: Regular, Multigrade, and
Supreme. Each brand of oil is composed of one or more of four crude stocks, each having a different lubrication index. The relevant data concerning the crude stocks are as follows.
+-------------+-------------------+------------------+--------------------------+
| Crude Stock | Lubrication Index | Cost (€/barrel) | Supply per day (barrels) |
+-------------+-------------------+------------------+--------------------------+
| 1 | 20 | 7,10 | 1000 |
+-------------+-------------------+------------------+--------------------------+
| 2 | 40 | 8,50 | 1100 |
+-------------+-------------------+------------------+--------------------------+
| 3 | 30 | 7,70 | 1200 |
+-------------+-------------------+------------------+--------------------------+
| 4 | 55 | 9,00 | 1100 |
+-------------+-------------------+------------------+--------------------------+
Each brand of oil must meet a minimum standard for a lubrication index, and each brand
thus sells at a different price. The relevant data concerning the three brands of oil are as
follows.
+------------+---------------------------+---------------+--------------+
| Brand | Minimum Lubrication index | Selling price | Daily demand |
+------------+---------------------------+---------------+--------------+
| Regular | 25 | 8,50 | 2000 |
+------------+---------------------------+---------------+--------------+
| Multigrade | 35 | 9,00 | 1500 |
+------------+---------------------------+---------------+--------------+
| Supreme | 50 | 10,00 | 750 |
+------------+---------------------------+---------------+--------------+
Determine an optimal output plan for a single day, assuming that production can be either
sold or else stored at negligible cost.
The daily demand figures are subject to alternative interpretations. Investigate the
following:
(a) The daily demands represent potential sales. In other words, the model should contain demand ceilings (upper limits). What is the optimal profit?
(b) The daily demands are strict obligations. In other words, the model should contain demand constraints that are met precisely. What is the optimal profit?
(c) The daily demands represent minimum sales commitments, but all output can be sold. In other words, the model should permit production to exceed the daily commitments. What is the optimal profit?
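For reference, all three interpretations fit one linear program that differs only in the demand rows. The notation is introduced here, not taken from the book: x_{ij} is barrels of crude stock i blended into brand j, L_i, c_i and S_i are the lubrication index, cost and daily supply of stock i, and m_j, p_j and D_j are the minimum index, selling price and daily demand of brand j. The index requirement, a ratio, becomes linear after multiplying both sides by the (non-negative) brand volume.

```latex
\begin{aligned}
\max_{x \ge 0}\quad & \sum_{j} p_j \sum_{i} x_{ij} \;-\; \sum_{i} c_i \sum_{j} x_{ij}
  && \text{(profit)} \\
\text{s.t.}\quad & \sum_{j} x_{ij} \le S_i && \forall i \in \{1,2,3,4\} \quad \text{(crude supply)} \\
& \sum_{i} (L_i - m_j)\, x_{ij} \ge 0 && \forall j \quad \text{(minimum lubrication index)} \\
& \sum_{i} x_{ij} \le D_j \ \text{(a)}, \quad \sum_{i} x_{ij} = D_j \ \text{(b)}, \quad \sum_{i} x_{ij} \ge D_j \ \text{(c)} && \forall j
\end{aligned}
```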
QUESTION
I've been able to construct the following model in Excel and solve it via OpenSolver, but I'm only able to model the blend for Regular oil.
I'm trying to work my way through the book Optimization Modeling with Spreadsheets by Kenneth R. Baker, but I'm stuck on this exercise. While I could transfer the logic from another blending problem, I'm not sure how to construct the model for multiple blends at once.
I modeled the problem as a cost minimization over the different crude stocks. Using the lubrication index data, I built the constraint for the Regular lubrication index (R- Lub Index) as a linear constraint. So far the answer seems to be right for Regular oil. However, with this approach I have no idea how to include even the second brand, Multigrade.
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Decision Variables | | | | | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| | C1 | C2 | C3 | C4 | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Inputs | 1000 | 0 | 1000 | 0 | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| | | | | | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Objective Function | | | | | | Total | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Cost | 7,10 € | 8,50 € | 7,70 € | 9,00 € | | 14.800,00 € | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| | | | | | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Constraints | | | | | | LHS | | RHS |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| C1 supply | 1 | | | | | 1000 | <= | 1000 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| C2 supply | | 1 | | | | 0 | <= | 1100 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| C3 supply | | | 1 | | | 1000 | <= | 1200 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| C4 supply | | | | 1 | | 0 | <= | 1100 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| R- Lub Index | -5 | 15 | 5 | 30 | | 0 | >= | 0 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| R- Output | 1 | 1 | 1 | 1 | | 2000 | = | 2000 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| | | | | | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Blending Data | | | | | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| R- Lub | 20 | 40 | 30 | 55 | | 25 | >= | 25 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
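The R- Lub Index row above is the usual linearization of the blending requirement: the ratio constraint sum_i L_i x_i / sum_i x_i >= m is multiplied through by the total volume, giving sum_i (L_i - m) x_i >= 0. The same recipe gives one such row per brand, which is what the multi-blend model needs. A small sketch that reproduces the coefficients and checks the posted Regular solution (function names are illustrative):

```python
# Lubrication indices of crude stocks 1..4 and the minimum index per brand.
LUB_INDEX = [20, 40, 30, 55]
MIN_INDEX = {"Regular": 25, "Multigrade": 35, "Supreme": 50}


def linearized_coefficients(min_index):
    """Coefficients a_i = L_i - m for the linear row sum_i a_i * x_i >= 0."""
    return [l - min_index for l in LUB_INDEX]


def blend_index(volumes):
    """Volume-weighted lubrication index of a blend (the ratio form)."""
    return sum(l * v for l, v in zip(LUB_INDEX, volumes)) / sum(volumes)


for brand, m in MIN_INDEX.items():
    print(brand, linearized_coefficients(m))

# The Regular solution posted above: 1000 barrels of C1 and 1000 of C3.
regular = [1000, 0, 1000, 0]
lhs = sum(a * x for a, x in zip(linearized_coefficients(25), regular))
print(lhs, blend_index(regular))
```

Each brand therefore gets its own index row over its own set of decision variables; only the supply rows are shared across brands.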
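Here is the model with Excel formulas (German Excel function names: SUMMENPRODUKT is SUMPRODUCT and SUMME is SUM; arguments are separated by semicolons):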
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Decision Variables | | | | | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| | C1 | C2 | C3 | C4 | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Inputs | 1000 | 0 | 1000 | 0 | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| | | | | | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Objective Function | | | | | | Total | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Cost | 7,1 | 8,5 | 7,7 | 9 | | =SUMMENPRODUKT(B5:E5;B8:E8) | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| | | | | | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Constraints | | | | | | LHS | | RHS |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| C1 supply | 1 | | | | | =SUMMENPRODUKT($B$5:$E$5;B11:E11) | <= | 1000 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| C2 supply | | 1 | | | | =SUMMENPRODUKT($B$5:$E$5;B12:E12) | <= | 1100 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| C3 supply | | | 1 | | | =SUMMENPRODUKT($B$5:$E$5;B13:E13) | <= | 1200 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| C4 supply | | | | 1 | | =SUMMENPRODUKT($B$5:$E$5;B14:E14) | <= | 1100 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| R- Lub Index | -5 | 15 | 5 | 30 | | =SUMMENPRODUKT($B$5:$E$5;B15:E15) | >= | 0 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| R- Output | 1 | 1 | 1 | 1 | | =SUMMENPRODUKT($B$5:$E$5;B16:E16) | = | 2000 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| | | | | | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Blending Data | | | | | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| R- Lub | 20 | 40 | 30 | 55 | | =SUMMENPRODUKT($B$5:$E$5;B19:E19)/SUMME($B$5:$E$5) | >= | 25 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
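The formula table can be cross-checked outside Excel. Below is a minimal Python sketch that assumes only the numbers shown in the table: the trial inputs 1000/0/1000/0, the costs, the R-Lub values, and the limits. `sumproduct` is a hypothetical helper that mirrors SUMMENPRODUKT. The sketch shows that the linear "R- Lub Index" row, whose coefficients are each R-Lub value minus 25, and the ratio formula in the last row express the same requirement.

```python
# Minimal sketch: re-evaluate the single-blend model from the table above.
# All numbers are copied from the table; INPUTS is the trial solution shown.

INPUTS = [1000, 0, 1000, 0]        # barrels of C1..C4 (row "Inputs")
COST = [7.1, 8.5, 7.7, 9.0]        # cost per barrel (row "Cost")
LUB = [20, 40, 30, 55]             # R-Lub value of each stock ("Blending Data")
MIN_LUB = 25                       # required R-Lub of the blend
SUPPLY = [1000, 1100, 1200, 1100]  # RHS of the C1..C4 supply rows
OUTPUT = 2000                      # RHS of the "R- Output" row


def sumproduct(a, b):
    """Hypothetical helper mirroring Excel's SUMPRODUCT / SUMMENPRODUKT."""
    return sum(x * y for x, y in zip(a, b))


total_cost = sumproduct(INPUTS, COST)   # objective cell
output = sum(INPUTS)                    # "R- Output" LHS

# Ratio form (last row): weighted-average R-Lub of the blend.
blend_lub = sumproduct(INPUTS, LUB) / output

# Linear form ("R- Lub Index" row): sum_i x_i * (LUB_i - MIN_LUB) >= 0.
# For output > 0 this is the same condition as blend_lub >= MIN_LUB.
index_coeffs = [lub - MIN_LUB for lub in LUB]
lub_index_lhs = sumproduct(INPUTS, index_coeffs)

feasible = (
    all(x <= s for x, s in zip(INPUTS, SUPPLY))
    and lub_index_lhs >= 0
    and output == OUTPUT
)

if __name__ == "__main__":
    print(f"cost={total_cost:.2f} output={output} "
          f"blend_lub={blend_lub} lub_index_lhs={lub_index_lhs} feasible={feasible}")
```

Dividing the linear row by the total output (when it is positive) turns it back into the ratio row. This is why the linear version can replace the SUMME-based formula without changing the feasible set.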
A nudge in the right direction would be a tremendous help.
I think you want your objective to be Profit, which I would define as the sum of sales value minus the sum of cost.
To include all blends, develop calculations for Volume produced, Lube Index, Cost, and Value for each blend. Apply constraints for volume of stock used, volume produced, and lube index, and optimize for Profit.
I put together the model as follows ...
Columns A through D contain the information you provided.
The 10s in G3:J5 are seed values for the stock volumes used in each blend. Solver will manipulate these.
Column K contains the total product volume produced. These values will be constrained in different ways, as per your cases (a), (b), and (c). It is =SUM(G3:J3) filled down.
Column L is the Lube Index for the product. As you noted, it is a linear blend - this is typically not true for blending problems. These values will be constrained in Solver. It is {=SUMPRODUCT(G3:J3,TRANSPOSE($B$2:$B$5))/$K3} filled down. Note that it is a Control-Shift-Enter (CSE) formula, required because of the TRANSPOSE.
Column M is the cost of the stock used to create the product. This is used in the Profit calculation. It is {=SUMPRODUCT(G3:J3,TRANSPOSE($C$2:$C$5))}, filled down. This is also a CSE formula.
Column N is the value of the product produced. This is used in the Profit calculation. It is =K3*C8 filled down.
Row 7 is the total stock volume used to generate all blends. These values will be constrained in Solver. It is =SUM(G3:G5), filled to the right.
The profit calculation is =SUM(N3:N5)-SUM(M3:M5).
Below is a snap of the Solver dialog box ...
It does the following ...
The objective is to maximize profit.
It will do this by manipulating the amount of stock that goes into each blend.
The first four constraints ($G$7 through $J$7) ensure the amount of stock available is not violated.
The next three constraints ($K$3 through $K$5) are for case (a): make no more product than there is demand.
The last three constraints ($L$3 through $L$5) make sure the lube index meets the minimum specification.
Not shown - I selected options for GRG Nonlinear and selected "Use Multistart" and deselected "Require Bounds on Variables".
Below is the result for case (a) ...
For case (b), change the constraints on Column K to be "=" instead of "<=". Below is the result ...
For case (c), change the constraints on Column K to be ">=". Below is the result ...
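For reference, the model in this answer can be written as a linear program. The symbols below are introduced here for illustration and do not come from the spreadsheet. $s$ ranges over the stocks C1 to C4 and $b$ over the blends. $x_{bs}$ is the barrels of stock $s$ used in blend $b$ (cells G3:J5). $p_b$ is the selling price, $c_s$ the stock cost, $S_s$ the stock supply, $L_s$ the stock lube index, $m_b$ the minimum lube index of blend $b$, and $D_b$ its demand.

$$
\begin{aligned}
\max_{x \ge 0}\quad & \sum_{b}\sum_{s} (p_b - c_s)\, x_{bs} && \text{(profit = value N minus cost M)}\\
\text{s.t.}\quad & \sum_{b} x_{bs} \le S_s \quad \forall s && \text{(stock available, row 7)}\\
& \sum_{s} (L_s - m_b)\, x_{bs} \ge 0 \quad \forall b && \text{(minimum lube index)}\\
& \sum_{s} x_{bs} \;\{\le,\ =,\ \ge\}\; D_b \quad \forall b && \text{(cases (a), (b), (c), column K)}
\end{aligned}
$$

The lube-index constraint is the linearized form of "column L $\ge m_b$", obtained by multiplying both sides by the blend volume, assuming that volume is positive. It is the same trick as the "R- Lub Index" row in the asker's tables. Written this way, every constraint is linear, so Excel's Simplex LP method should also apply. In the answer's layout, the ratio in column L is what makes the model nonlinear.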
I think I came up with a solution, but I'm unsure if this is correct.
| Decision Variables | | | | | | | | | | | | | | | | |
|--------------------|---------|--------|--------|--------|-------------|--------|--------|--------|--------|--------|--------|--------|---|--------------------------------|----|------|
| | C1R | C1M | C1S | C2R | C2M | C2S | C3R | C3M | C3S | C4R | C4M | C4S | | | | |
| Inputs | 1000 | 0 | 0 | 800 | 0 | 300 | 0 | 1200 | 0 | 200 | 300 | 600 | | | | |
| | | | | | | | | | | | | | | | | |
| Objective Function | | | | | | | | | | | | | | Total Profit (Selling - Cost) | | |
| Cost | 7,10 € | 7,10 € | 7,10 € | 8,50 € | 8,50 € | 8,50 € | 7,70 € | 7,70 € | 7,70 € | 9,00 € | 9,00 € | 9,00 € | | 3.910,00 € | | |
| | | | | | | | | | | | | | | | | |
| Constraints | | | | | | | | | | | | | | LHS | | RHS |
| Regular | -5 | | | 15 | | | 5 | | | 30 | | | | 13000 | >= | 0 |
| Multi | | -15 | | | 5 | | | -5 | | | 20 | | | 0 | >= | 0 |
| Supreme | | | -30 | | | -10 | | | -20 | | | 5 | | 0 | >= | 0 |
| C1 Supply | 1 | 1 | 1 | | | | | | | | | | | 1000 | <= | 1000 |
| C2 Supply | | | | 1 | 1 | 1 | | | | | | | | 1100 | <= | 1100 |
| C3 Supply | | | | | | | 1 | 1 | 1 | | | | | 1200 | <= | 1200 |
| C4 Supply | | | | | | | | | | 1 | 1 | 1 | | 1100 | <= | 1100 |
| Regular Demand | 1 | | | 1 | | | 1 | | | 1 | | | | 2000 | >= | 2000 |
| Multi Demand | | 1 | | | 1 | | | 1 | | | 1 | | | 1500 | >= | 1500 |
| Supreme Demand | | | 1 | | | 1 | | | 1 | | | 1 | | 900 | >= | 750 |
| | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | |
| Selling | | | | | | | | | | | | | | | | |
| Regular | 8,50 € | x | 2000 | = | 17.000,00 € | | | | | | | | | | | |
| Multi | 9,00 € | x | 1500 | = | 13.500,00 € | | | | | | | | | | | |
| Supreme | 10,00 € | x | 900 | = | 9.000,00 € | | | | | | | | | | | |
| | | | | | 39.500,00 € | | | | | | | | | | | |
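One way to check whether this solution is consistent is to recompute it independently. Below is a minimal Python sketch that uses the per-blend quantities from the answer above: volume (column K), lube index (column L), cost (column M) and value (column N). The costs, prices, supplies and demands are copied from this table. The minimum lube indices 25/35/50 are an assumption, inferred from the index-row coefficients (for example, 20 - 35 = -15 in the Multi row). Amounts are kept in euro cents so that the totals are exact integers.

```python
# Sketch: re-check the 12-variable solution from the table above, computing the
# per-blend quantities the answer uses (volume K, lube index L, cost M, value N).
# Money is kept in euro cents (integers) so the totals are exact.

LUB = [20, 40, 30, 55]                # lube index of stocks C1..C4
COST_CENTS = [710, 850, 770, 900]     # cost per barrel of C1..C4
SUPPLY = [1000, 1100, 1200, 1100]     # barrels available of C1..C4
PRICE_CENTS = [850, 900, 1000]        # selling price: Regular, Multi, Supreme
MIN_LUB = [25, 35, 50]                # assumption: inferred from the index-row coefficients
DEMAND = [2000, 1500, 750]            # RHS of the demand rows (">=" as in the table)

# x[b][s]: barrels of stock s used in blend b, taken from the "Inputs" row
x = [
    [1000, 800, 0, 200],   # Regular (C1R, C2R, C3R, C4R)
    [0, 0, 1200, 300],     # Multi   (C1M, C2M, C3M, C4M)
    [0, 300, 0, 600],      # Supreme (C1S, C2S, C3S, C4S)
]

volume = [sum(row) for row in x]                                               # column K
lube_index = [sum(v * q for v, q in zip(row, LUB)) / sum(row) for row in x]    # column L
cost = [sum(v * c for v, c in zip(row, COST_CENTS)) for row in x]              # column M
value = [vol * p for vol, p in zip(volume, PRICE_CENTS)]                       # column N
stock_used = [sum(row[s] for row in x) for s in range(len(LUB))]               # row 7

# Linearised lube-index rows, matching the "LHS" column of the table
lin_lhs = [sum(v * (q - m) for v, q in zip(row, LUB)) for row, m in zip(x, MIN_LUB)]

profit_cents = sum(value) - sum(cost)

checks = {
    "supply": all(u <= cap for u, cap in zip(stock_used, SUPPLY)),
    "lube index": all(lhs >= 0 for lhs in lin_lhs),
    "demand": all(v >= d for v, d in zip(volume, DEMAND)),
}

if __name__ == "__main__":
    print("volume:", volume, "lube index:", lube_index)
    print("stock used:", stock_used, "linear LHS:", lin_lhs)
    print(f"profit: {profit_cents / 100:.2f} EUR", checks)
```

With these inputs the script reproduces the table's figures: linear LHS 13000/0/0, all four stocks fully used, and a profit of 3910.00 EUR. If the demand check were switched to "<=" or "=", as in the answer's cases (a) and (b), it would flag the Supreme volume of 900 against a demand of 750.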

Severe mysqldump performance degradation using CentOS Linux, 8GB PAE and MySQL 5.0.77

We use MySQL 5.0.77 on CentOS 5.5 on VMware:
Linux dev.ic.soschildrensvillages.org.uk 2.6.18-194.11.4.el5PAE #1 SMP Tue Sep 21 05:48:23 EDT 2010 i686 i686 i386 GNU/Linux
We recently upgraded from 4GB to 8GB RAM. Since then, our overnight mysqldump backup has gone from under 10 minutes to over 2 hours. It also makes our Plone-based website unresponsive because of the database load. The dump uses the optimized mysqldump format and is spooled directly through a socket to another server.
Any ideas on how to fix this would be gratefully appreciated. Would a MySQL upgrade help? Is there anything we can change in the MySQL config, or in the Linux config? Or do we have to add another server or move to 64-bit?
We ran a previous (non-virtual) server with 6GB and PAE and didn't notice a similar issue. It ran the same MySQL version, but on CentOS 4.4.
Server config file:
[mysqld]
port=3307
socket=/tmp/mysql_live.sock
wait_timeout=31536000
interactive_timeout=31536000
datadir=/var/mysql/live/data
user=mysql
max_connections = 200
max_allowed_packet = 64M
table_cache = 2048
binlog_cache_size = 128K
max_heap_table_size = 32M
sort_buffer_size = 2M
join_buffer_size = 2M
lower_case_table_names = 1
innodb_data_file_path = ibdata1:10M:autoextend
innodb_buffer_pool_size=1G
innodb_log_file_size=300M
innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=1
innodb_file_per_table
[mysqldump]
# Do not buffer the whole result set in memory before writing it to
# file. Required for dumping very large tables
quick
max_allowed_packet = 64M
[mysqld_safe]
# Increase the amount of open files allowed per process. Warning: Make
# sure you have set the global system limit high enough! The high value
# is required for a large number of opened tables
open-files-limit = 8192
Server variables:
mysql> show variables;
+---------------------------------+------------------------------------------------------------------+
| Variable_name | Value |
+---------------------------------+------------------------------------------------------------------+
| auto_increment_increment | 1 |
| auto_increment_offset | 1 |
| automatic_sp_privileges | ON |
| back_log | 50 |
| basedir | /usr/local/mysql-5.0.77-linux-i686-glibc23/ |
| binlog_cache_size | 131072 |
| bulk_insert_buffer_size | 8388608 |
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.0.77-linux-i686-glibc23/share/mysql/charsets/ |
| collation_connection | latin1_swedish_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
| completion_type | 0 |
| concurrent_insert | 1 |
| connect_timeout | 10 |
| datadir | /var/mysql/live/data/ |
| date_format | %Y-%m-%d |
| datetime_format | %Y-%m-%d %H:%i:%s |
| default_week_format | 0 |
| delay_key_write | ON |
| delayed_insert_limit | 100 |
| delayed_insert_timeout | 300 |
| delayed_queue_size | 1000 |
| div_precision_increment | 4 |
| keep_files_on_create | OFF |
| engine_condition_pushdown | OFF |
| expire_logs_days | 0 |
| flush | OFF |
| flush_time | 0 |
| ft_boolean_syntax | + -><()~*:""&| |
| ft_max_word_len | 84 |
| ft_min_word_len | 4 |
| ft_query_expansion_limit | 20 |
| ft_stopword_file | (built-in) |
| group_concat_max_len | 1024 |
| have_archive | YES |
| have_bdb | NO |
| have_blackhole_engine | YES |
| have_compress | YES |
| have_crypt | YES |
| have_csv | YES |
| have_dynamic_loading | YES |
| have_example_engine | NO |
| have_federated_engine | YES |
| have_geometry | YES |
| have_innodb | YES |
| have_isam | NO |
| have_merge_engine | YES |
| have_ndbcluster | DISABLED |
| have_openssl | DISABLED |
| have_ssl | DISABLED |
| have_query_cache | YES |
| have_raid | NO |
| have_rtree_keys | YES |
| have_symlink | YES |
| hostname | app.ic.soschildrensvillages.org.uk |
| init_connect | |
| init_file | |
| init_slave | |
| innodb_additional_mem_pool_size | 1048576 |
| innodb_autoextend_increment | 8 |
| innodb_buffer_pool_awe_mem_mb | 0 |
| innodb_buffer_pool_size | 1073741824 |
| innodb_checksums | ON |
| innodb_commit_concurrency | 0 |
| innodb_concurrency_tickets | 500 |
| innodb_data_file_path | ibdata1:10M:autoextend |
| innodb_data_home_dir | |
| innodb_adaptive_hash_index | ON |
| innodb_doublewrite | ON |
| innodb_fast_shutdown | 1 |
| innodb_file_io_threads | 4 |
| innodb_file_per_table | ON |
| innodb_flush_log_at_trx_commit | 1 |
| innodb_flush_method | |
| innodb_force_recovery | 0 |
| innodb_lock_wait_timeout | 50 |
| innodb_locks_unsafe_for_binlog | OFF |
| innodb_log_arch_dir | |
| innodb_log_archive | OFF |
| innodb_log_buffer_size | 8388608 |
| innodb_log_file_size | 314572800 |
| innodb_log_files_in_group | 2 |
| innodb_log_group_home_dir | ./ |
| innodb_max_dirty_pages_pct | 90 |
| innodb_max_purge_lag | 0 |
| innodb_mirrored_log_groups | 1 |
| innodb_open_files | 300 |
| innodb_rollback_on_timeout | OFF |
| innodb_support_xa | ON |
| innodb_sync_spin_loops | 20 |
| innodb_table_locks | ON |
| innodb_thread_concurrency | 8 |
| innodb_thread_sleep_delay | 10000 |
| interactive_timeout | 31536000 |
| join_buffer_size | 2097152 |
| key_buffer_size | 8384512 |
| key_cache_age_threshold | 300 |
| key_cache_block_size | 1024 |
| key_cache_division_limit | 100 |
| language | /usr/local/mysql-5.0.77-linux-i686-glibc23/share/mysql/english/ |
| large_files_support | ON |
| large_page_size | 0 |
| large_pages | OFF |
| lc_time_names | en_US |
| license | GPL |
| local_infile | ON |
| locked_in_memory | OFF |
| log | OFF |
| log_bin | OFF |
| log_bin_trust_function_creators | OFF |
| log_error | |
| log_queries_not_using_indexes | OFF |
| log_slave_updates | OFF |
| log_slow_queries | OFF |
| log_warnings | 1 |
| long_query_time | 10 |
| low_priority_updates | OFF |
| lower_case_file_system | OFF |
| lower_case_table_names | 1 |
| max_allowed_packet | 67108864 |
| max_binlog_cache_size | 4294963200 |
| max_binlog_size | 1073741824 |
| max_connect_errors | 10 |
| max_connections | 200 |
| max_delayed_threads | 20 |
| max_error_count | 64 |
| max_heap_table_size | 33554432 |
| max_insert_delayed_threads | 20 |
| max_join_size | 18446744073709551615 |
| max_length_for_sort_data | 1024 |
| max_prepared_stmt_count | 16382 |
| max_relay_log_size | 0 |
| max_seeks_for_key | 4294967295 |
| max_sort_length | 1024 |
| max_sp_recursion_depth | 0 |
| max_tmp_tables | 32 |
| max_user_connections | 0 |
| max_write_lock_count | 4294967295 |
| multi_range_count | 256 |
| myisam_data_pointer_size | 6 |
| myisam_max_sort_file_size | 2146435072 |
| myisam_recover_options | OFF |
| myisam_repair_threads | 1 |
| myisam_sort_buffer_size | 8388608 |
| myisam_stats_method | nulls_unequal |
| ndb_autoincrement_prefetch_sz | 1 |
| ndb_force_send | ON |
| ndb_use_exact_count | ON |
| ndb_use_transactions | ON |
| ndb_cache_check_time | 0 |
| ndb_connectstring | |
| net_buffer_length | 16384 |
| net_read_timeout | 30 |
| net_retry_count | 10 |
| net_write_timeout | 60 |
| new | OFF |
| old_passwords | OFF |
| open_files_limit | 8192 |
| optimizer_prune_level | 1 |
| optimizer_search_depth | 62 |
| pid_file | /var/mysql/live/mysqld.pid |
| plugin_dir | |
| port | 3307 |
| preload_buffer_size | 32768 |
| profiling | OFF |
| profiling_history_size | 15 |
| protocol_version | 10 |
| query_alloc_block_size | 8192 |
| query_cache_limit | 1048576 |
| query_cache_min_res_unit | 4096 |
| query_cache_size | 0 |
| query_cache_type | ON |
| query_cache_wlock_invalidate | OFF |
| query_prealloc_size | 8192 |
| range_alloc_block_size | 4096 |
| read_buffer_size | 131072 |
| read_only | OFF |
| read_rnd_buffer_size | 262144 |
| relay_log | |
| relay_log_index | |
| relay_log_info_file | relay-log.info |
| relay_log_purge | ON |
| relay_log_space_limit | 0 |
| rpl_recovery_rank | 0 |
| secure_auth | OFF |
| secure_file_priv | |
| server_id | 0 |
| skip_external_locking | ON |
| skip_networking | OFF |
| skip_show_database | OFF |
| slave_compressed_protocol | OFF |
| slave_load_tmpdir | /tmp/ |
| slave_net_timeout | 3600 |
| slave_skip_errors | OFF |
| slave_transaction_retries | 10 |
| slow_launch_time | 2 |
| socket | /tmp/mysql_live.sock |
| sort_buffer_size | 2097152 |
| sql_big_selects | ON |
| sql_mode | |
| sql_notes | ON |
| sql_warnings | OFF |
| ssl_ca | |
| ssl_capath | |
| ssl_cert | |
| ssl_cipher | |
| ssl_key | |
| storage_engine | MyISAM |
| sync_binlog | 0 |
| sync_frm | ON |
| system_time_zone | GMT |
| table_cache | 2048 |
| table_lock_wait_timeout | 50 |
| table_type | MyISAM |
| thread_cache_size | 0 |
| thread_stack | 196608 |
| time_format | %H:%i:%s |
| time_zone | SYSTEM |
| timed_mutexes | OFF |
| tmp_table_size | 33554432 |
| tmpdir | /tmp/ |
| transaction_alloc_block_size | 8192 |
| transaction_prealloc_size | 4096 |
| tx_isolation | REPEATABLE-READ |
| updatable_views_with_limit | YES |
| version | 5.0.77 |
| version_comment | MySQL Community Server (GPL) |
| version_compile_machine | i686 |
| version_compile_os | pc-linux-gnu |
| wait_timeout | 31536000 |
+---------------------------------+------------------------------------------------------------------+
237 rows in set (0.00 sec)
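As a rough sanity check on the settings above, here is a commonly used worst-case heuristic for mysqld memory: global buffers plus max_connections times the per-connection buffers. It is an approximation, not MySQL's exact allocation behaviour. For example, it ignores in-memory temporary tables up to tmp_table_size and allocator overhead. It only bounds mysqld's own buffers and is not a diagnosis of the slowdown. The numbers are the SHOW VARIABLES values above. This matters because the server is an i686 build: mysqld is a single 32-bit process, so its virtual address space is capped at 4 GiB (less in practice on 32-bit Linux), however much RAM the PAE kernel can use.

```python
# Rough worst-case memory estimate for the mysqld settings shown above.
# Heuristic (an assumption, not MySQL's exact allocator behaviour):
#   global buffers + max_connections * per-connection buffers

GLOBAL_BUFFERS = {
    "innodb_buffer_pool_size": 1073741824,
    "innodb_additional_mem_pool_size": 1048576,
    "innodb_log_buffer_size": 8388608,
    "key_buffer_size": 8384512,
    "query_cache_size": 0,
}

PER_CONNECTION_BUFFERS = {
    "sort_buffer_size": 2097152,
    "join_buffer_size": 2097152,
    "read_buffer_size": 131072,
    "read_rnd_buffer_size": 262144,
    "thread_stack": 196608,
    "net_buffer_length": 16384,
}

MAX_CONNECTIONS = 200
MIB = 1024 * 1024

global_bytes = sum(GLOBAL_BUFFERS.values())
per_conn_bytes = sum(PER_CONNECTION_BUFFERS.values())
worst_case_bytes = global_bytes + MAX_CONNECTIONS * per_conn_bytes

if __name__ == "__main__":
    print(f"global: {global_bytes / MIB:.1f} MiB")
    print(f"per connection: {per_conn_bytes / MIB:.2f} MiB x {MAX_CONNECTIONS}")
    print(f"worst case: {worst_case_bytes / MIB:.1f} MiB")
```

With these values the estimate comes to just under 2 GiB. Most of that is the 1 GiB InnoDB buffer pool plus about 4.6 MiB for each of the 200 allowed connections.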
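If you're sending data over sockets anyway, you might as well use MySQL replication with the binlog. Use binlog_format=ROW so that your slave(s) don't waste time with triggers and so on. Note that binlog_format and row-based replication require MySQL 5.1 or later; 5.0 only supports statement-based replication.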
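Failing that, you could try a "hotcopy" script such as mysqlhotcopy (http://dev.mysql.com/doc/refman/5.0/en/mysqlhotcopy.html). Note that mysqlhotcopy only works for MyISAM and ARCHIVE tables, so it will not back up InnoDB tables.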
Above all, I think you are running outdated software. Upgrade to 64-bit CentOS 5.5 and Percona Server (a 100% MySQL-compatible database on steroids): http://www.percona.com/software/percona-server/
