As I understand from this question my Sybase ASE database connection has its own SPID. My question is: are complex queries with nested subselects executed by that single SPID? Or does Sybase spawn other SPID's to execute complex queries?
If parallel processing is enabled, it is possible for spids to spawn other processes. This can occur in large, complex queries when the optimizer chooses a parallel plan, as well as during reorgs and other similar database actions.
If this occurs, the newly spawned spid will show the parent spid in the fid (Family ID) column of master..sysprocesses, or in the output of sp_who.
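As a minimal sketch (the spid value 42 is a placeholder for the parent process you are investigating; spid, fid, cmd, and status are actual columns of master..sysprocesses), you could list the worker processes belonging to one family like this:

```sql
-- List all processes spawned on behalf of parent spid 42
-- (substitute the spid of the connection you are investigating)
SELECT spid, fid, cmd, status
FROM master..sysprocesses
WHERE fid = 42
```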
More information on Parallel Queries can be found in the documentation.
Is there any way to execute VoltDB stored procedures at regular interval or schedule store procedure to run at a specific time?
I am exploring VoltDB to shift our product from an RDBMS to VoltDB. Our product is written in Java.
Most of the queries can be migrated into VoltDB stored procedures, but our product also has a cron job in Oracle which executes at a regular interval, and I cannot find such a feature in VoltDB.
I know VoltDB stored procedures can be called from the application at a regular interval, but our product deploys in an Active-Active mode, so every application instance would call the stored procedure at that interval. That is not a good solution; otherwise we would have to develop some mechanism to run the procedure from one instance only.
So it would be good to have a cron job feature in VoltDB.
I work at VoltDB. There isn't currently a feature like this in VoltDB, such as DBMS_JOB in Oracle.
You could certainly use a cron job on one of the servers in your cluster, or on some other server within your network, that invokes sqlcmd to run a script, or echoes individual SQL statements or "execute procedure" commands through sqlcmd to the database. Making cron jobs highly available is a general problem; you might find these other discussions helpful:
How to convert Linux cron jobs to "the Amazon way"?
https://www.reddit.com/r/linuxadmin/comments/3j3bz4/run_cronjob_only_on_one_node_in_cluster/
You could also look into something like rcron.
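For example, a crontab entry on one designated server could pipe a statement into sqlcmd every five minutes (the procedure name and host below are placeholders for your own):

```
*/5 * * * * echo "exec MyCleanupProc;" | sqlcmd --servers=voltdb-host
```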
One thing to be careful of when converting from an RDBMS to VoltDB is that VoltDB is optimized for processing many small transactions in parallel across many partitions. While the architecture of serialized execution per partition excels for many operational and streaming workloads, it is not designed to perform bulk operations on many rows at a time, especially transactions that need to perform writes on many rows that may be in different partitions within one transaction.
If you have a periodic job that does something like "process all the new rows that meet some criteria" you may find this transaction is slow and every time it runs it could delay other parts of the workload, especially if many rows have accumulated. It would be more the "VoltDB Way" to replace a simple INSERT statement that you may be using to ingest data (to be processed later by a scheduled job) with a procedure that inserts and immediately processes the row of data. You might even need a procedure that checks for other records and processes small sets of rows as a group, for example stitching together segments of data that go together but may have arrived out of order. By operating on fewer records at a time within one partition at a time, this type of procedure would be more scalable and would keep the data closer to your desired finished state in real time, rather than always having some data waiting to be processed.
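To illustrate the single-partition idea (the table and procedure names here are hypothetical, and multi-step logic would normally be written as a Java stored procedure), a simple procedure can be declared in DDL and partitioned so that each call runs serially within just one partition:

```sql
-- Hypothetical schema: events are partitioned by device id
CREATE TABLE events (
  device_id BIGINT NOT NULL,
  ts        TIMESTAMP NOT NULL,
  value     FLOAT,
  processed TINYINT DEFAULT 0
);
PARTITION TABLE events ON COLUMN device_id;

-- Single-partition procedure: each call touches only one partition,
-- so calls for different devices can run in parallel
CREATE PROCEDURE ProcessDevice
  PARTITION ON TABLE events COLUMN device_id
  AS UPDATE events SET processed = 1 WHERE device_id = ? AND processed = 0;
```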
Just want to check whether Apache Calcite can be used for the "Data Federation" use case (a query spanning multiple databases).
The idea is that I have a master query over 5 tables: 2 tables from one database (say Hive) and 3 tables from another database (say MySQL).
Can I execute the master query across multiple databases from one JDBC client interface?
If this is possible, where does the query execution (particularly the inter-database join) happen?
Also, can I get a physical plan from Calcite that I can execute explicitly in another execution engine?
I read in the Calcite documentation that it can push down Join and GroupBy, but I could not understand it. Can anyone help me understand this?
I will try to answer; you can also send questions to the mailing list (dev#calcite.apache.org), where you are more likely to get an answer.
Can I execute the master query across multiple databases from one JDBC client interface? If this is possible, where does the query execution (particularly the inter-database join) happen?
Yes, you can. The inter-database join happens in memory, in the process where Calcite runs.
Can I get a physical plan from Calcite that I can execute explicitly in another execution engine?
Yes, you can; a lot of Calcite consumers work this way. But you will have to build around the Calcite rule system, i.e. execute the optimized plan yourself.
I read in the Calcite documentation that it can push down Join and GroupBy, but I could not understand it. Can anyone help me understand this?
These are SQL optimisations that the engine performs. Imagine a GROUP BY that could have run on a tiny table but is instead specified after joining with a huge table; pushing the aggregation below the join avoids materializing the large intermediate result.
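For example (the tables here are hypothetical), an optimizer can rewrite the first form into the second, pre-aggregating the large table before the join instead of aggregating the joined result:

```sql
-- Naive form: join first, then aggregate the large intermediate result
SELECT c.region, SUM(o.amount)
FROM orders o JOIN customers c ON o.cust_id = c.id
GROUP BY c.region;

-- Pushed-down form: pre-aggregate orders per customer, then join
SELECT c.region, SUM(t.total)
FROM (SELECT cust_id, SUM(amount) AS total
      FROM orders
      GROUP BY cust_id) t
JOIN customers c ON t.cust_id = c.id
GROUP BY c.region;
```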
We have encountered a new problem with our ArangoDB installation. If we send a complex AQL query, such as iterating over multiple collections to find specific information and then following edges, the whole database blocks. We see that one of our three CPU cores is at 100% while the other two are at 0-1%. While the AQL query runs, the database does not react to any other request, and the web interface is unreachable too. This means that all processing is halted until the one query finishes.
There are two problems in this:
First: the query takes much too long (graph queries).
Second: the database does not react while the one query is running.
Any ideas/solutions for this problem? What are the biggest databases/graphs you have successfully worked with?
Thx, secana
ArangoDB 2.8 contains deadlock detection, so ArangoDB will now raise an exception if your query blocks on a lock.
ArangoDB 2.8 also offers fast graph traversals which improve graph performance a lot.
Another good option is to offload reads to a second instance via a replication slave.
With RocksDB as storage engine (available since 3.2) there are no collection-level locks anymore, which means most queries can be executed in parallel without blocking: https://docs.arangodb.com/3.4/Manual/Architecture/StorageEngines.html
I was wondering what DBMSs actually use multithreading in their query plans/executions?
Oracle supports this, as do SQL Server and DB2. I do not believe that MySQL or PostgreSQL support parallel queries, though.
I believe most databases that support table partitioning will support querying each partition at the same time if the need arises rather than just pruning unneeded partitions. Oracle can do this. Teradata definitely does this.
MySQL only uses one thread per query (in the standard engines); this is true even if the tables are partitioned.
Multi-threading is used in a DB in many areas, for example in query evaluation:
*) Parallel query execution uses multi-threading to optimize the performance of query evaluation.
*) Parallelizing the DB backup, e.g. creating a separate backup thread for each available tape drive to carry out the DB server backup; Oracle uses this.
*) Table reorganization: as time goes on, the DB becomes bulky, and the DBA reorganizes the tables with the intention of improving DB performance.
In Oracle, POSIX threads and C++ are used to achieve the multi-threading.
I have a process (C++ code) that reads from and writes to an Oracle database.
But it takes a long time for the process to finish.
I was thinking of creating partitions in the tables that this process queries.
And then making the process multi-threaded so that each thread(one for each partition) can read/write the data in parallel.
I will be creating a DB connection per thread.
Will the writes slow it down?
Will this work?
Is there any other way of improving performance (all queries are tuned and optimized already)?
Thanks,
Nikhil
If the current bottleneck is writing the data to the database then creating more threads to write more data may or may not help, depending on how the data is partitioned, and whether or not the writes can occur concurrently, or whether they interfere with each other (either at the database lock level, or at the database disk IO level).
Creating more threads will instead allow the application to process more data, and queue it up for writing to the database, assuming that there is sufficient hardware concurrency (e.g. on a multicore machine) to handle the additional threads.
Partitioning may improve the database performance, as may changing the indexes on the relevant tables. If you can put separate partitions on separate physical disks then that can improve IO when only one partition needs to be accessed by a given SQL statement.
Dropping indexes that aren't needed, changing the order of index columns to match the queries, and even changing the index type can also improve performance.
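As a sketch of the partitioning idea (the table, columns, and partition bounds below are hypothetical), a range-partitioned table with a local index lets each thread target its own partition:

```sql
-- Hypothetical range partitioning by region id; each worker thread
-- can then restrict its reads and writes to a single partition
CREATE TABLE orders (
  order_id  NUMBER,
  region_id NUMBER,
  amount    NUMBER
)
PARTITION BY RANGE (region_id) (
  PARTITION p1 VALUES LESS THAN (100),
  PARTITION p2 VALUES LESS THAN (200),
  PARTITION p3 VALUES LESS THAN (MAXVALUE)
);

-- A local index is partitioned the same way as the table
CREATE INDEX orders_region_ix ON orders (region_id, order_id) LOCAL;

-- A worker thread can address one partition explicitly
SELECT order_id, amount FROM orders PARTITION (p1);
```

If the partitions also live on separate physical disks, this keeps each thread's IO mostly independent of the others.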
As with everything: profile it before and after every proposed change.