Illegal state: Transaction for catalog table write operation 'pg_database' not found in YugabyteDB YSQL - yugabytedb

[Question posted by a user on YugabyteDB Community Slack]
Before dropping a database, first I want to prevent new connections and then drop the existing connections. However, I'm stuck at the first step:
ysqlsh (11.2-YB-2.1.1.0-b0)
yugabyte=# SELECT * FROM pg_database;
datname | datdba | encoding | datcollate | datctype | datistemplate | datallowconn | datconnlimit | datlastsysoid | datfrozenxid | datminmxid | dattablespace | datacl
----------------------------+--------+----------+------------+-------------+---------------+--------------+--------------+---------------+--------------+------------+---------------+-------------------------------------
template1 | 10 | 6 | C | en_US.UTF-8 | t | t | -1 | 0 | 0 | 1 | 1663 | {=c/postgres,postgres=CTc/postgres}
template0 | 10 | 6 | C | en_US.UTF-8 | t | f | -1 | 0 | 0 | 1 | 1663 | {=c/postgres,postgres=CTc/postgres}
postgres | 10 | 6 | C | en_US.UTF-8 | f | t | -1 | 0 | 0 | 1 | 1663 |
yugabyte | 10 | 6 | C | en_US.UTF-8 | f | t | -1 | 0 | 0 | 1 | 1663 |
system_platform | 10 | 6 | C | en_US.UTF-8 | f | t | -1 | 0 | 0 | 1 | 1663 |
test_1650530283_52506 | 12462 | 6 | C | en_US.UTF-8 | f | t | -1 | 0 | 0 | 1 | 1663 |
(6 rows)
yugabyte=# UPDATE pg_database SET datallowconn=false WHERE datname = 'test_1650530283_52506';
ERROR: Illegal state: Transaction for catalog table write operation 'pg_database' not found

The described behavior is not a bug. Everything works as expected. It is not recommended to change system tables manually. If it absolutely necessary for some reason user should set the yb_non_ddl_txn_for_sys_tables_allowed GUC variable to true. But user will do this on its own risk.
BTW to disallow connection to DB it is better to use this query instead of changing system table manually:
ALTER DATABASE db WITH ALLOW_CONNECTIONS false;

Related

How to create a calculated column in access 2013 to detect duplicates

I'm recreating a tool I made in Excel as it's getting bigger and performance is getting out of hand.
The issue is that I only have MS Access 2013 on my work laptop and I'm fairly new to the Expression Builder in Access 2013, which has a very limited function base to be honest.
My data has duplicates on the [Location] column, meaning that, I have multiple SKUs on that warehouse location. However, some of my calculations need to be done only once per [Location]. My solution to that, in Excel, was to make a formula (see below) putting 1 only on the first appearance of that location, putting 0 on next appearances. Doing that works like a charm because summing over that [Duplicate] column while imposing multiple criteria returns the number of occurrences of the multiple criteria counting locations only once.
Now, MS Access 2013 Expression Builder has no SUM nor COUNT functions to create a calculated column emulating my [Duplicate] column from Excel. Preferably, I would just input the raw data and let Access populate the calculated fields vs also inputting the calculated fields as well, since that would defeat my original purpose of reducing the computational cost of creating my dashboard.
The question is, how would you create a calculated column, in MS Access 2013 Expression Builder to recreate the below Excel function:
= IF($D$2:$D3=$D4,0,1)
In the sake of reducing the file size (over 100K rows) I even replace the 0 by a blank character "".
Thanks in advance for your help
Y
First and foremost, understand MS Access' Expression Builder is a convenience tool to build an SQL expression. Everything in Query Design ultimately is to build an SQL query. For this reason, you have to use a set-based mentality to see data in whole sets of related tables and not cell-by-cell mindset.
Specifically, to achieve:
putting 1 only on the first appearance of that location, putting 0 on next appearances
Consider a whole set-based approach by joining on a separate, aggregate query to identify the first value of your needed grouping, then calculate needed IIF expression. Below assumes you have an autonumber or primary key field in table (a standard in relational databases):
Aggregate Query (save as a separate query, adjust columns as needed)
SELECT ColumnD, MIN(AutoNumberID) As MinID
FROM myTable
GROUP BY ColumnD
Final Query (join to original table and build final IIF expression)
SELECT m.*, IIF(agg.MinID = AutoNumberID, 1, 0) As Dup_Indicator
FROM myTable m
INNER JOIN myAggregateQuery agg
ON m.[ColumnD] = agg.ColumnD
To demonstrate with random data:
Original
| ID | GROUP | INT | NUM | CHAR | BOOL | DATE |
|----|--------|-----|--------------|------|-------|------------|
| 1 | r | 9 | 1.424490258 | B6z | TRUE | 7/4/1994 |
| 2 | stata | 10 | 2.591235683 | h7J | FALSE | 10/5/1971 |
| 3 | spss | 6 | 0.560461966 | Hrn | TRUE | 11/27/1990 |
| 4 | stata | 10 | -1.499272175 | eXL | FALSE | 4/17/2010 |
| 5 | stata | 15 | 1.470269177 | Vas | TRUE | 6/13/2010 |
| 6 | r | 14 | -0.072238898 | puP | TRUE | 4/1/1994 |
| 7 | julia | 2 | -1.370405263 | S2l | FALSE | 12/11/1999 |
| 8 | spss | 6 | -0.153684675 | mAw | FALSE | 7/28/1977 |
| 9 | spss | 10 | -0.861482674 | cxC | FALSE | 7/17/1994 |
| 10 | spss | 2 | -0.817222582 | GRn | FALSE | 10/19/2012 |
| 11 | stata | 2 | 0.949287754 | xgc | TRUE | 1/18/2003 |
| 12 | stata | 5 | -1.580841322 | Y1D | TRUE | 6/3/2011 |
| 13 | r | 14 | -1.671303816 | JCP | FALSE | 5/15/1981 |
| 14 | r | 7 | 0.904181025 | Rct | TRUE | 7/24/1977 |
| 15 | stata | 10 | -1.198211174 | qJY | FALSE | 5/6/1982 |
| 16 | julia | 10 | -0.265808162 | 10s | FALSE | 3/18/1975 |
| 17 | r | 13 | -0.264955027 | 8Md | TRUE | 6/11/1974 |
| 18 | r | 4 | 0.518302149 | 4KW | FALSE | 9/12/1980 |
| 19 | r | 5 | -0.053620183 | 8An | FALSE | 4/17/2004 |
| 20 | r | 14 | -0.359197116 | F8Q | TRUE | 6/14/2005 |
| 21 | spss | 11 | -2.211875193 | AgS | TRUE | 4/11/1973 |
| 22 | stata | 4 | -1.718749471 | Zqr | FALSE | 2/20/1999 |
| 23 | python | 10 | 1.207878576 | tcC | FALSE | 4/18/2008 |
| 24 | stata | 11 | 0.548902226 | PFJ | TRUE | 9/20/1994 |
| 25 | stata | 6 | 1.479125922 | 7a7 | FALSE | 3/2/1989 |
| 26 | python | 10 | -0.437245299 | r32 | TRUE | 6/7/1997 |
| 27 | sas | 14 | 0.404746106 | 6NJ | TRUE | 9/23/2013 |
| 28 | stata | 8 | 2.206741458 | Ive | TRUE | 5/26/2008 |
| 29 | spss | 12 | -0.470694096 | dPS | TRUE | 5/4/1983 |
| 30 | sas | 15 | -0.57169507 | yle | TRUE | 6/20/1979 |
SQL (uses aggregate in subquery but can be a stored query)
SELECT r.*, IIF(sub.MinID = r.ID,1, 0) AS Dup
FROM Random_Data r
LEFT JOIN
(
SELECT r.GROUP, MIN(r.ID) As MinID
FROM Random_Data r
GROUP BY r.Group
) sub
ON r.[Group] = sub.[GROUP]
Output (notice the first GROUP value is tagged 1, all else 0)
| ID | GROUP | INT | NUM | CHAR | BOOL | DATE | Dup |
|----|--------|-----|--------------|------|-------|------------|-----|
| 1 | r | 9 | 1.424490258 | B6z | TRUE | 7/4/1994 | 1 |
| 2 | stata | 10 | 2.591235683 | h7J | FALSE | 10/5/1971 | 1 |
| 3 | spss | 6 | 0.560461966 | Hrn | TRUE | 11/27/1990 | 1 |
| 4 | stata | 10 | -1.499272175 | eXL | FALSE | 4/17/2010 | 0 |
| 5 | stata | 15 | 1.470269177 | Vas | TRUE | 6/13/2010 | 0 |
| 6 | r | 14 | -0.072238898 | puP | TRUE | 4/1/1994 | 0 |
| 7 | julia | 2 | -1.370405263 | S2l | FALSE | 12/11/1999 | 1 |
| 8 | spss | 6 | -0.153684675 | mAw | FALSE | 7/28/1977 | 0 |
| 9 | spss | 10 | -0.861482674 | cxC | FALSE | 7/17/1994 | 0 |
| 10 | spss | 2 | -0.817222582 | GRn | FALSE | 10/19/2012 | 0 |
| 11 | stata | 2 | 0.949287754 | xgc | TRUE | 1/18/2003 | 0 |
| 12 | stata | 5 | -1.580841322 | Y1D | TRUE | 6/3/2011 | 0 |
| 13 | r | 14 | -1.671303816 | JCP | FALSE | 5/15/1981 | 0 |
| 14 | r | 7 | 0.904181025 | Rct | TRUE | 7/24/1977 | 0 |
| 15 | stata | 10 | -1.198211174 | qJY | FALSE | 5/6/1982 | 0 |
| 16 | julia | 10 | -0.265808162 | 10s | FALSE | 3/18/1975 | 0 |
| 17 | r | 13 | -0.264955027 | 8Md | TRUE | 6/11/1974 | 0 |
| 18 | r | 4 | 0.518302149 | 4KW | FALSE | 9/12/1980 | 0 |
| 19 | r | 5 | -0.053620183 | 8An | FALSE | 4/17/2004 | 0 |
| 20 | r | 14 | -0.359197116 | F8Q | TRUE | 6/14/2005 | 0 |
| 21 | spss | 11 | -2.211875193 | AgS | TRUE | 4/11/1973 | 0 |
| 22 | stata | 4 | -1.718749471 | Zqr | FALSE | 2/20/1999 | 0 |
| 23 | python | 10 | 1.207878576 | tcC | FALSE | 4/18/2008 | 1 |
| 24 | stata | 11 | 0.548902226 | PFJ | TRUE | 9/20/1994 | 0 |
| 25 | stata | 6 | 1.479125922 | 7a7 | FALSE | 3/2/1989 | 0 |
| 26 | python | 10 | -0.437245299 | r32 | TRUE | 6/7/1997 | 0 |
| 27 | sas | 14 | 0.404746106 | 6NJ | TRUE | 9/23/2013 | 1 |
| 28 | stata | 8 | 2.206741458 | Ive | TRUE | 5/26/2008 | 0 |
| 29 | spss | 12 | -0.470694096 | dPS | TRUE | 5/4/1983 | 0 |
| 30 | sas | 15 | -0.57169507 | yle | TRUE | 6/20/1979 | 0 |

SIGN() formula returns unexpected results

In continuation of my previous question: Sumproduct with multiple criteria on one range
Jeeped provided me with an very helpful formula to achieve a sumproduct() which takes multiple criteria. My current case is however a bit broader:
Take these example tables:
First column is the ID number, second column a respondent group(A,B). Column headers are question types (X,Y,Z).
Table Q1
| | | X | Y | Y | Z | Y |
|----|---|---|---|---|---|---|
| 1 | A | 2 | 2 | 1 | | 1 |
| 2 | A | 1 | 1 | | | 2 |
| 3 | A | 1 | 1 | | | 1 |
| 4 | A | 2 | 1 | | | 1 |
| 5 | A | 1 | 2 | 1 | | 1 |
| 6 | A | 1 | 1 | | | 1 |
| 7 | A | | | | | |
| 8 | A | | | | | |
| 9 | A | 1 | 1 | | | 1 |
| 10 | A | 2 | 2 | 2 | | 2 |
| 11 | A | | | | | |
| 12 | A | 1 | 2 | 1 | | 2 |
| 13 | B | | | | | |
| 14 | B | 1 | 1 | | | 1 |
| 15 | B | 2 | 2 | 1 | | 1 |
Table Q2
| | | X | Y | Y | Z | Y |
|----|---|---|---|---|---|---|
| 1 | A | 1 | 2 | 1 | | 1 |
| 2 | A | 1 | 1 | | | 1 |
| 3 | A | 1 | 1 | | | 1 |
| 4 | A | 1 | 1 | | | 1 |
| 5 | A | 1 | 1 | | | 1 |
| 6 | A | 1 | 1 | | | 1 |
| 7 | A | | | | | |
| 8 | A | | | | | |
| 9 | A | 1 | 1 | | | 1 |
| 10 | A | 1 | 1 | | | 1 |
| 11 | A | | | | | |
| 12 | A | 1 | 2 | 1 | | 1 |
| 13 | B | | | | | |
| 14 | B | 1 | 1 | | | 1 |
| 15 | B | 1 | 2 | 1 | | 1 |
Now I want to know the amount of times a respondent answered 1 (yes) on Q2 for each question type (X,Y,Z). The catch is that if someone answered 1 (yes) on Q1 it should "override" the answer on Q2, as we assume that when someone answers yes on Q1 (implementation of a measure), their answer on Q2 (knowledge of said measure) has to be yes as well.
The second catch is that for the first two occurrences of Y there can only be yes in one of both columns, so in fact there can only be two yes answers for question type Y for each respondent.
I used the following formula (on sheet 3): =SUMPRODUCT(SIGN(('Q1'!$C$2:$G$16=1)+('Q2'!$C$2:$G$16=1))*('Q2'!$B$2:$B$16=Blad3!$D5)*('Q2'!$C$1:$G$1=Blad3!E$4)) to obtain the following results.
| | X | Y | Z |
|---|---|----|---|
| A | 9 | 19 | 0 |
| B | 2 | 4 | 0 |
For X these results are correct, as there are 9 1's in table Q2.
For Y the results for B are correct, for A however they are not, as there are only 9 respondents, answering max 2 questions would result in a max of 18, we have 19 however.
It turns out there is nothing wrong with the formula, just that it isn't suited for the way this data is organised. If you look at row 5:
Q1
| | | X | Y | Y | Z | Y |
|----|---|---|---|---|---|---|
| 5 | A | 1 | 2 | 1 | | 1 |
Q2
| | | X | Y | Y | Z | Y |
|----|---|---|---|---|---|---|
| 5 | A | 1 | 1 | | | 1 |
If we condense that to everywhere there is a 1 in any of the Y column we get this table:
| | | X | Y | Y | Z | Y |
|----|---|---|---|---|---|---|
| 5 | A | | 1 | 1 | | 1 |
When I ask for the sumproduct() for this combined table the result will be 3.
To prevent this I added a helper column (between the two Y and the Z column) to my tables, with the following formula: IF(OR(D1=1,E1=1),1,""). Removed the headers from the double Y columns, and re-running the query produced the correct results.
New table Q1 looks like this then:
| | | X | | | Y | Z | Y |
|----|---|---|---|---|---|---|---|
| 1 | A | 2 | 2 | 1 | 1 | | 1 |
| 2 | A | 1 | 1 | | 1 | | 2 |
| 3 | A | 1 | 1 | | 1 | | 1 |
| 4 | A | 2 | 1 | | 1 | | 1 |
| 5 | A | 1 | 2 | 1 | 1 | | 1 |
| 6 | A | 1 | 1 | | 1 | | 1 |
| 7 | A | | | | | | |
| 8 | A | | | | | | |
| 9 | A | 1 | 1 | | 1 | | 1 |
| 10 | A | 2 | 2 | 2 | | | 2 |
| 11 | A | | | | | | |
| 12 | A | 1 | 2 | 1 | 1 | | 2 |
| 13 | B | | | | | | |
| 14 | B | 1 | 1 | | 1 | | 1 |
| 15 | B | 2 | 2 | 1 | 1 | | 1 |

Memsql is taking lot of time for a query to execute

I am having one problem with my memsql cluster when I run the query to fetch the 51M records it returns result in 5 minutes
but it used to take more than 15 min when data insertion is parallel to read.
I measured disk io and it is ok and the disk is hdd disk.
There are no other connections to the memsql and cpu is also 15% utilized with 64 core machine
Below are my varaiables
Variable_name | Value |
+----------------------------------------------+------------------------------------------------------------------------------+
| aggregator_failure_detection | ON |
| auto_replicate | OFF |
| autocommit | ON |
| basedir | /data/master-3306 |
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_sets_dir | /data/master-3306/share/charsets/ |
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
| columnar_segment_rows | 102400 |
| columnstore_window_size | 2147483648 |
| compile_only | OFF |
| connect_timeout | 10 |
| core_file | ON |
| core_file_mode | PARTIAL |
| critical_diagnostics | ON |
| datadir | /data/master-3306/data |
| default_partitions_per_leaf | 16 |
| enable_experimental_metrics | OFF |
| error_count | 0 |
| explain_expression_limit | 500 |
| external_user | |
| flush_before_replicate | OFF |
| general_log | OFF |
| geo_sphere_radius | 6367444.657120 |
| hostname | **** |
| identity | 0 |
| kerberos_server_keytab | |
| lc_messages | en_US |
| lc_messages_dir | /data/master-3306/share |
| leaf_failure_detection | ON |
| load_data_max_buffer_size | 1073741823 |
| load_data_read_size | 8192 |
| load_data_write_size | 8192 |
| lock_wait_timeout | 60 |
| master_aggregator | self |
| max_allowed_packet | 104857600 |
| max_connection_threads | 192 |
| max_connections | 100000 |
| max_pooled_connections | 4096 |
| max_prefetch_threads | 1 |
| max_prepared_stmt_count | 16382 |
| max_user_connections | 0 |
| maximum_memory | 506602 |
| maximum_table_memory | 455941 |
| memsql_id | ** |
| memsql_version | 5.7.2 |
| memsql_version_date | Thu Jan 26 12:34:22 2017 -0800 |
| memsql_version_hash | 03e5e3581e96d65caa30756f191323437a3840f0 |
| minimal_disk_space | 100 |
| multi_insert_tuple_count | 20000 |
| net_buffer_length | 102400 |
| net_read_timeout | 3600 |
| net_write_timeout | 3600 |
| pid_file | /data/master-3306/data/memsqld.pid |
| pipelines_batches_metadata_to_keep | 1000 |
| pipelines_extractor_debug_logging | OFF |
| pipelines_kafka_version | 0.8.2.2 |
| pipelines_max_errors_per_partition | 1000 |
| pipelines_max_offsets_per_batch_partition | 1000000 |
| pipelines_max_retries_per_batch_partition | 4 |
| pipelines_stderr_bufsize | 65535 |
| pipelines_stop_on_error | ON |
| plan_expiration_minutes | 720 |
| port | 3306 |
| protocol_version | 10 |
| proxy_user | |
| query_parallelism | 0 |
| redundancy_level | 1 |
| reported_hostname | |
| secure_file_priv | |
| show_query_parameters | ON |
| skip_name_resolve | AUTO |
| snapshot_trigger_size | 268435456 |
| snapshots_to_keep | 2 |
| socket | /data/master-3306/data/memsql.sock |
| sql_quote_show_create | ON |
| ssl_ca | |
| ssl_capath | |
| ssl_cert | |
| ssl_cipher | |
| ssl_key | |
| sync_slave_timeout | 20000 |
| system_time_zone | UTC |
| thread_cache_size | 0 |
| thread_handling | one-thread-per-connection |
| thread_stack | 1048576 |
| time_zone | SYSTEM |
| timestamp | 1504799067.127069 |
| tls_version | TLSv1,TLSv1.1,TLSv1.2 |
| tmpdir | . |
| transaction_buffer | 67108864 |
| tx_isolation | READ-COMMITTED |
| use_join_bucket_bit_vector | ON |
| use_vectorized_join | ON |
| version | 5.5.8 |
| version_comment | MemSQL source distribution (compatible; MySQL Enterprise & MySQL Commercial) |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
| warn_level | WARNINGS |
| warning_count | 0 |
| workload_management | ON |
| workload_management_expected_aggregators | 1 |
| workload_management_max_connections_per_leaf | 1024 |
| workload_management_max_queue_depth | 100 |
| workload_management_max_threads_per_leaf | 8192 |
| workload_management_queue_time_warning_ratio | 0.500000 |
| workload_management_queue_timeout | 3600
Some thoughts:
Workload profiling (https://docs.memsql.com/concepts/v5.8/workload-profiling-overview/) can help you understand what resources are limiting the speed of this query - if it's not cpu and not disk io, maybe it's network io or something
Query profiling (https://docs.memsql.com/sql-reference/v5.8/profile/) will help indicate which operators within the query are expensive
You mentioned your machine has 64 cores - is your cluster properly numa-optimized? Run memsql-ops memsql-optimize to check.
Jack, I did the workload profiling and find out that lock_time_ms is quite high when NETWORK_LOGICAL_SEND_B is high
LAST_FINISHED_TIMESTAMP, LOCK_TIME_MS, LOCK_ROW_TIME_MS, ACTIVITY_NAME, NETWORK_LOGICAL_RECV_B, NETWORK_LOGICAL_SEND_B, activity_name, database_name, partition_id, left(q.query_text, 50)
'2017-09-11 10:41:31', '39988', '0', 'InsertSelect_AggregatedHourly_temp_7aug_KING__et_al_c44dc7ab56d56280', '0', '538154753', 'InsertSelect_AggregatedHourly_temp_7aug_KING__et_al_c44dc7ab56d56280', 'datawarehouse', '7', 'SELECT \n combined.day AS day,\n `lineitem'

Reducing Rows by Grouping Data

I have a set of spreadsheets which define a set of business rules. These business rules are then processed by our system.
The users that create the spreadsheets do so naively and I have found that by factoring the data across rows - and thus reducing the number of rules - greatly improves performance of the system.
One of the "naively" structured spreadsheets might look like this:
+-----------+------------+------------+------------+------------+--------+
| Rule Name | Criteron 1 | Criteron 2 | Criteron 3 | Criteron 4 | Accept |
+-----------+------------+------------+------------+------------+--------+
| Rule 1 | A | B | C | | Yes |
| Rule 2 | A | C | C | | Yes |
| Rule 3 | A | D | C | | Yes |
| Rule 4 | A | E | C | | Yes |
| Rule 5 | A | F | C | | Yes |
| Rule 6 | A | B | D | | Yes |
| Rule 7 | A | C | D | | Yes |
| Rule 8 | A | D | D | | Yes |
| Rule 9 | A | E | D | | Yes |
| Rule 10 | A | F | D | | Yes |
| Rule 11 | A | B | E | | Yes |
| Rule 12 | A | C | E | | Yes |
| Rule 13 | A | D | E | | Yes |
| Rule 14 | A | E | E | | Yes |
| Rule 15 | A | F | E | | Yes |
| Rule 16 | | | | G | Yes |
| Rule 17 | | | | H | Yes |
| Rule 18 | | | | I | Yes |
| Rule 19 | | | | J | Yes |
| Rule 20 | | | | K | Yes |
| Rule 21 | | | | L | Yes |
| Rule 22 | | | | M | Yes |
| Rule 23 | | | | N | No |
| Rule 24 | | | | O | No |
| Rule 25 | | | | P | No |
| Rule 26 | | | | Q | No |
| Rule 27 | | | | R | No |
| Rule 28 | | | | S | No |
| Rule 29 | A | J | F | | No |
| Rule 30 | A | K | F | | No |
+-----------+------------+------------+------------+------------+--------+
As an example, Rule 1 would be evaluated as:
IF (Criterion 1 == A) AND (Criterion 2 == B) AND (Criterion 3 == C) THEN Accept
Using a bit of thought and assuming we can use OR conditionals in our columns, the above can be reduced to:
+-----------+------------+------------+------------+-------------+--------+
| Rule Name | Criteron 1 | Criteron 2 | Criteron 3 | Criteron 4 | Accept |
+-----------+------------+------------+------------+-------------+--------+
| Rule 1 | A | B,C,D,E,F | C,D,E | | Yes |
| Rule 2 | | | |G,H,I,J,K,L,M| Yes |
| Rule 3 | | | |N,O,P,Q,R,S | No |
| Rule 4 | A | J,K | F | | No |
+-----------+------------+------------+------------+-------------+--------+
Rule 1 is now evaluated as follows:
IF (Criterion 1 == A) AND
(Criterion 2 == B OR Criterion 2 == C OR...) AND
(Criterion 3 == C OR Criterion 3 == D OR...) THEN Accept
Now, I've done this manually. What I want to know is: does Excel have in-built functionality to do this kind of grouping for me. If not, can anyone point me in the direction of an algorithm which will help me implement this efficiently?
this looks like a situation where you could query the table using ADO and OLE DB into an ADO Recordset using GROUP BY HAVING in the SQL Query, then dump the (grouped) results into your new sheet using CopyFromRecordset
Alternatively, perhaps a Pivot Table?

Severe mysqldump performance degradation using Centos Linux, 8GB PAE and MySQL 5.0.77

We use MySQL 5.0.77 on CentOS 5.5 on VMWare:
Linux dev.ic.soschildrensvillages.org.uk 2.6.18-194.11.4.el5PAE #1 SMP Tue Sep 21 05:48:23 EDT 2010 i686 i686 i386 GNU/Linux
We have recently upgraded from 4GB RAM to 8GB. When we did this the time of our mysqldump overnight backup jumped from under 10 minutes to over 2 hours. It also caused unresponsiveness on our plone based web site due to database load. The dump is using the optimized mysqldump format and is spooled directly through a socket to another server.
Any ideas on what we could do to fix gratefully appreciated. Would a MySQL upgrade help? Anything we can do to MySQL config? Anything we can do to Linux config? Or do we have to add another server or go to 64-bit?
We ran a previous (non-virtual) server on 6GB PAE and didn't notice a similar issue. This was on same MySQL version, but Centos 4.4.
Server config file:
[mysqld]
port=3307
socket=/tmp/mysql_live.sock
wait_timeout=31536000
interactive_timeout=31536000
datadir=/var/mysql/live/data
user=mysql
max_connections = 200
max_allowed_packet = 64M
table_cache = 2048
binlog_cache_size = 128K
max_heap_table_size = 32M
sort_buffer_size = 2M
join_buffer_size = 2M
lower_case_table_names = 1
innodb_data_file_path = ibdata1:10M:autoextend
innodb_buffer_pool_size=1G
innodb_log_file_size=300M
innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=1
innodb_file_per_table
[mysqldump]
# Do not buffer the whole result set in memory before writing it to
# file. Required for dumping very large tables
quick
max_allowed_packet = 64M
[mysqld_safe]
# Increase the amount of open files allowed per process. Warning: Make
# sure you have set the global system limit high enough! The high value
# is required for a large number of opened tables
open-files-limit = 8192
Server variables:
mysql> show variables;
+---------------------------------+------------------------------------------------------------------+
| Variable_name | Value |
+---------------------------------+------------------------------------------------------------------+
| auto_increment_increment | 1 |
| auto_increment_offset | 1 |
| automatic_sp_privileges | ON |
| back_log | 50 |
| basedir | /usr/local/mysql-5.0.77-linux-i686-glibc23/ |
| binlog_cache_size | 131072 |
| bulk_insert_buffer_size | 8388608 |
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.0.77-linux-i686-glibc23/share/mysql/charsets/ |
| collation_connection | latin1_swedish_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
| completion_type | 0 |
| concurrent_insert | 1 |
| connect_timeout | 10 |
| datadir | /var/mysql/live/data/ |
| date_format | %Y-%m-%d |
| datetime_format | %Y-%m-%d %H:%i:%s |
| default_week_format | 0 |
| delay_key_write | ON |
| delayed_insert_limit | 100 |
| delayed_insert_timeout | 300 |
| delayed_queue_size | 1000 |
| div_precision_increment | 4 |
| keep_files_on_create | OFF |
| engine_condition_pushdown | OFF |
| expire_logs_days | 0 |
| flush | OFF |
| flush_time | 0 |
| ft_boolean_syntax | + -><()~*:""&| |
| ft_max_word_len | 84 |
| ft_min_word_len | 4 |
| ft_query_expansion_limit | 20 |
| ft_stopword_file | (built-in) |
| group_concat_max_len | 1024 |
| have_archive | YES |
| have_bdb | NO |
| have_blackhole_engine | YES |
| have_compress | YES |
| have_crypt | YES |
| have_csv | YES |
| have_dynamic_loading | YES |
| have_example_engine | NO |
| have_federated_engine | YES |
| have_geometry | YES |
| have_innodb | YES |
| have_isam | NO |
| have_merge_engine | YES |
| have_ndbcluster | DISABLED |
| have_openssl | DISABLED |
| have_ssl | DISABLED |
| have_query_cache | YES |
| have_raid | NO |
| have_rtree_keys | YES |
| have_symlink | YES |
| hostname | app.ic.soschildrensvillages.org.uk |
| init_connect | |
| init_file | |
| init_slave | |
| innodb_additional_mem_pool_size | 1048576 |
| innodb_autoextend_increment | 8 |
| innodb_buffer_pool_awe_mem_mb | 0 |
| innodb_buffer_pool_size | 1073741824 |
| innodb_checksums | ON |
| innodb_commit_concurrency | 0 |
| innodb_concurrency_tickets | 500 |
| innodb_data_file_path | ibdata1:10M:autoextend |
| innodb_data_home_dir | |
| innodb_adaptive_hash_index | ON |
| innodb_doublewrite | ON |
| innodb_fast_shutdown | 1 |
| innodb_file_io_threads | 4 |
| innodb_file_per_table | ON |
| innodb_flush_log_at_trx_commit | 1 |
| innodb_flush_method | |
| innodb_force_recovery | 0 |
| innodb_lock_wait_timeout | 50 |
| innodb_locks_unsafe_for_binlog | OFF |
| innodb_log_arch_dir | |
| innodb_log_archive | OFF |
| innodb_log_buffer_size | 8388608 |
| innodb_log_file_size | 314572800 |
| innodb_log_files_in_group | 2 |
| innodb_log_group_home_dir | ./ |
| innodb_max_dirty_pages_pct | 90 |
| innodb_max_purge_lag | 0 |
| innodb_mirrored_log_groups | 1 |
| innodb_open_files | 300 |
| innodb_rollback_on_timeout | OFF |
| innodb_support_xa | ON |
| innodb_sync_spin_loops | 20 |
| innodb_table_locks | ON |
| innodb_thread_concurrency | 8 |
| innodb_thread_sleep_delay | 10000 |
| interactive_timeout | 31536000 |
| join_buffer_size | 2097152 |
| key_buffer_size | 8384512 |
| key_cache_age_threshold | 300 |
| key_cache_block_size | 1024 |
| key_cache_division_limit | 100 |
| language | /usr/local/mysql-5.0.77-linux-i686-glibc23/share/mysql/english/ |
| large_files_support | ON |
| large_page_size | 0 |
| large_pages | OFF |
| lc_time_names | en_US |
| license | GPL |
| local_infile | ON |
| locked_in_memory | OFF |
| log | OFF |
| log_bin | OFF |
| log_bin_trust_function_creators | OFF |
| log_error | |
| log_queries_not_using_indexes | OFF |
| log_slave_updates | OFF |
| log_slow_queries | OFF |
| log_warnings | 1 |
| long_query_time | 10 |
| low_priority_updates | OFF |
| lower_case_file_system | OFF |
| lower_case_table_names | 1 |
| max_allowed_packet | 67108864 |
| max_binlog_cache_size | 4294963200 |
| max_binlog_size | 1073741824 |
| max_connect_errors | 10 |
| max_connections | 200 |
| max_delayed_threads | 20 |
| max_error_count | 64 |
| max_heap_table_size | 33554432 |
| max_insert_delayed_threads | 20 |
| max_join_size | 18446744073709551615 |
| max_length_for_sort_data | 1024 |
| max_prepared_stmt_count | 16382 |
| max_relay_log_size | 0 |
| max_seeks_for_key | 4294967295 |
| max_sort_length | 1024 |
| max_sp_recursion_depth | 0 |
| max_tmp_tables | 32 |
| max_user_connections | 0 |
| max_write_lock_count | 4294967295 |
| multi_range_count | 256 |
| myisam_data_pointer_size | 6 |
| myisam_max_sort_file_size | 2146435072 |
| myisam_recover_options | OFF |
| myisam_repair_threads | 1 |
| myisam_sort_buffer_size | 8388608 |
| myisam_stats_method | nulls_unequal |
| ndb_autoincrement_prefetch_sz | 1 |
| ndb_force_send | ON |
| ndb_use_exact_count | ON |
| ndb_use_transactions | ON |
| ndb_cache_check_time | 0 |
| ndb_connectstring | |
| net_buffer_length | 16384 |
| net_read_timeout | 30 |
| net_retry_count | 10 |
| net_write_timeout | 60 |
| new | OFF |
| old_passwords | OFF |
| open_files_limit | 8192 |
| optimizer_prune_level | 1 |
| optimizer_search_depth | 62 |
| pid_file | /var/mysql/live/mysqld.pid |
| plugin_dir | |
| port | 3307 |
| preload_buffer_size | 32768 |
| profiling | OFF |
| profiling_history_size | 15 |
| protocol_version | 10 |
| query_alloc_block_size | 8192 |
| query_cache_limit | 1048576 |
| query_cache_min_res_unit | 4096 |
| query_cache_size | 0 |
| query_cache_type | ON |
| query_cache_wlock_invalidate | OFF |
| query_prealloc_size | 8192 |
| range_alloc_block_size | 4096 |
| read_buffer_size | 131072 |
| read_only | OFF |
| read_rnd_buffer_size | 262144 |
| relay_log | |
| relay_log_index | |
| relay_log_info_file | relay-log.info |
| relay_log_purge | ON |
| relay_log_space_limit | 0 |
| rpl_recovery_rank | 0 |
| secure_auth | OFF |
| secure_file_priv | |
| server_id | 0 |
| skip_external_locking | ON |
| skip_networking | OFF |
| skip_show_database | OFF |
| slave_compressed_protocol | OFF |
| slave_load_tmpdir | /tmp/ |
| slave_net_timeout | 3600 |
| slave_skip_errors | OFF |
| slave_transaction_retries | 10 |
| slow_launch_time | 2 |
| socket | /tmp/mysql_live.sock |
| sort_buffer_size | 2097152 |
| sql_big_selects | ON |
| sql_mode | |
| sql_notes | ON |
| sql_warnings | OFF |
| ssl_ca | |
| ssl_capath | |
| ssl_cert | |
| ssl_cipher | |
| ssl_key | |
| storage_engine | MyISAM |
| sync_binlog | 0 |
| sync_frm | ON |
| system_time_zone | GMT |
| table_cache | 2048 |
| table_lock_wait_timeout | 50 |
| table_type | MyISAM |
| thread_cache_size | 0 |
| thread_stack | 196608 |
| time_format | %H:%i:%s |
| time_zone | SYSTEM |
| timed_mutexes | OFF |
| tmp_table_size | 33554432 |
| tmpdir | /tmp/ |
| transaction_alloc_block_size | 8192 |
| transaction_prealloc_size | 4096 |
| tx_isolation | REPEATABLE-READ |
| updatable_views_with_limit | YES |
| version | 5.0.77 |
| version_comment | MySQL Community Server (GPL) |
| version_compile_machine | i686 |
| version_compile_os | pc-linux-gnu |
| wait_timeout | 31536000 |
+---------------------------------+------------------------------------------------------------------+
237 rows in set (0.00 sec)
If you're sending data over sockets, you might as well use MySQL Replication with the binlog. Use binlog_format=ROWS so that your slave(s) doesn't waste time with triggers and so on.
Failing that, you could try a "hotcopy" script such as http://dev.mysql.com/doc/refman/5.0/en/mysqlhotcopy.html
Above all, I think you are running outdated software. Upgrade to CentOS 5.5 64bit, and Percona Server (100% MySQL-compatible database on steroids) http://www.percona.com/software/percona-server/

Resources