I am running a series of archiving queries from a SPROC against a SOURCE database and a DESTINATION database on Sybase ASE. I run these queries in batches, i.e. in a series of transactions, so that only n records are archived in each transaction.
However, there are times when Sybase ASE runs out of log space and ends the SPROC.
My question is: when Sybase ASE runs out of log space and ends my SPROC, will the transaction that was in flight at the time of the "out of log space" error be rolled back?
I know all transactions committed before the "out of log space" error are permanent. But I am not sure whether the in-flight transaction will roll back on this error, and I also find it difficult to test this.
Many Thanks
This will usually depend on the individual database's settings.
If the database is set to "Abort Tran on Log Full", then when the transaction log fills up, the transaction will be aborted/rolled back. If that option is not set, then the database will go into "LOG SUSPEND" mode, and will pause all activity within the database until space is freed up, or added to the log. Once log space is available, the transaction will be allowed to complete.
The flags currently set in a database can be found by issuing the sp_helpdb {DBNAME} command and looking at the status column.
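For example, a quick way to check the current flags and, if desired, enable the abort behaviour might look like this (mydb is a placeholder database name, and setting the option requires the appropriate permissions):

-- check which options/flags are currently set (look at the status column)
sp_helpdb mydb
go

-- roll back the active transaction instead of suspending it when the log fills up
use master
go
sp_dboption mydb, 'abort tran on log full', true
go
use mydb
go
checkpoint
go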
I'm using the Cassandra Java driver with a fetch size set to 1k. I need to query all records in a table and perform some time-consuming action for every row.
What will happen if I keep the ResultSet open (not fully iterated) for a day?
What I don't care about:
consistency. If a new record is written in the meantime, I'm OK with fetching it; however, I'm also fine if I don't get it
fault tolerance. If some node fails during that process, I'm fine with the query failing too. However, I would like to detect that from the client side.
What I care about:
Cassandra resource utilization - I don't want to cause a cluster outage due to blocked resources
latency - I don't want to block (or significantly slow down) the cluster for other consumers of that table
I would like to get all records that existed when I started the query (assuming no deletions); however, they don't have to be up to date
The paging state is just information about the last data read (literally the serialized partition key, clustering columns, and remaining count). When it is sent to the coordinator, the coordinator simply looks for everything greater than that position. So no server-side resources are held for this, and there is no performance impact compared with a normal read.
Cassandra does not have any feature that provides isolation, even within a single query. If the data has changed between the first page request and a later one, you will get the up-to-date information.
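For reference, a minimal sketch with the DataStax Java driver 3.x (the contact point, keyspace name my_keyspace, and the process() helper are placeholders): only one page of rows is held on the client at a time, and the driver requests the next page, identified by the paging state described above, once the current one is drained.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class FullTableScan {
    public static void main(String[] args) throws Exception {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // fetch size of 1k, as in the question
            Statement stmt = new SimpleStatement("SELECT * FROM my_keyspace.users")
                    .setFetchSize(1000);
            ResultSet rs = session.execute(stmt);
            for (Row row : rs) {
                // slow per-row work; the next page is fetched only when this one is exhausted
                process(row);
            }
        }
    }

    private static void process(Row row) {
        // placeholder for the time-consuming per-row action
        System.out.println(row);
    }
}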
We have a MySQL server running that serves application writes. To do some batch processing, we have written a sync job to migrate data into a Cassandra cluster.
1. A daily sync job that transfers rows by their updated timestamp for that day.
2. A full sync job that transfers the complete data set, overwriting existing rows.
Now there is a possibility that a row was deleted from MySQL; in that case, with the above approach, it will stay in Cassandra forever.
To solve that problem we have given every row a TTL of 15 days. So eventually it will get deleted; if the row was not deleted in MySQL, the TTL is overwritten again in the next full sync.
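For reference, writing a row with a 15-day TTL (1,296,000 seconds) in CQL looks roughly like this (keyspace, table, and column names are placeholders):

INSERT INTO my_keyspace.my_table (id, payload, updated_at)
VALUES (42, 'row synced from mysql', '2015-06-01 00:00:00')
USING TTL 1296000;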
It's working fine as far as the use case is concerned, but the issue is that during a full sync the complete data set is overwritten: SSTables are generated continuously, compactions happen all the time, load averages shoot up and things slow down, and the backup size increases (which could have been avoided).
Essentially we want to replace the existing table data with new data, but we don't want to truncate before starting the job, only after the job completes.
Is there any way this can be solved other than creating a new table altogether and dropping the old table once the data is generated?
You can look at the double-run migration strategy I presented here: http://www.slideshare.net/doanduyhai/from-rdbms-to-cassandra-without-a-hitch
It has the advantage of allowing 100% uptime and a possible rollback if things go wrong. The downside is the amount of work required in terms of releases and code.
Using Oracle 11gR2:
We already have a process that cleans up particular tables by deleting records from them that are past a specified retention date (based on the comparison between the timestamp from when the record finished processing and the retention date). I am currently writing code that will alert my team if this process fails. The only way I can see this process possibly failing is if DELETEs are disabled on the particular table it is trying to clean up.
I want to test the alerts to make sure they work and look correct by having the process fail. If I temporarily exclusively lock the table, will that disable DELETEs and cause the procedure that deletes records to fail? Or does it only disable DDL operations? Is there a better way to do this?
Assuming that "fail" means "throw an error" rather than, say, exceeding some performance bound, locking the table won't accomplish what you want. If you locked every row via a SELECT FOR UPDATE in one session, your delete job would block forever waiting for the first session to release its lock. That wouldn't throw an error and wouldn't cause the process to fail for most definitions. If your monitoring includes alerts for jobs that are running longer than expected, however, that would work well.
If your monitoring process only looks to see if the process ran and encountered an error, the easiest option would be to put a trigger on the table that throws an error when there is a delete. You could also create a child table with a foreign key constraint that would generate an error if the delete tried to delete the parent row while a child row exists. Depending on how the delete process is implemented, you probably could engineer a second process that would produce an ORA-00060 deadlock for the process you are monitoring but that is probably harder to implement than the trigger or the child table.
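For instance, a minimal sketch of the trigger approach (the table and trigger names are placeholders; drop the trigger once the alert has been verified):

-- statement-level trigger that makes every DELETE on the table fail,
-- so the cleanup procedure errors out and the alert can be exercised
CREATE OR REPLACE TRIGGER cleanup_tab_block_delete
  BEFORE DELETE ON cleanup_tab
BEGIN
  RAISE_APPLICATION_ERROR(-20001, 'DELETEs temporarily disabled for alert testing');
END;
/

-- remove it after the test
DROP TRIGGER cleanup_tab_block_delete;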
I was wondering if anyone has ever encountered this:
When inserting documents via AQL, I can easily kill my ArangoDB server. For example:
FOR i IN 1 .. 10
  FOR u IN users
    INSERT {
      _from: u._id,
      _to: CONCAT("posts/", CEIL(RAND() * 2000)),
      displayDate: CEIL(RAND() * 100000000)
    } INTO canSee
(where users contains 500,000 entries), the following happens:
canSee becomes completely locked (also no more reads)
memory consumption goes up
arangosh or the web console becomes unresponsive and fails with [ArangoError 2001: Could not connect]
server is still running, accessing collection gives timeouts
it takes around 5-10 minutes until the server recovers and I can access the collection again
access to any other collection works fine
So OK, I'm creating a lot of entries, and AQL might be implemented in a way that does this in bulk. When doing the writes via the db.save method it works, but it is much slower.
Also, I suspect this might have to do with the write-ahead log filling up.
But still, is there a way I can fix this? Writing a lot of entries to a database should not necessarily kill it.
Logs say
DEBUG [./lib/GeneralServer/GeneralServerDispatcher.h:411] shutdownHandler called, but no handler is known for task
DEBUG [arangod/VocBase/datafile.cpp:949] created datafile '/usr/local/var/lib/arangodb/journals/logfile-6623368699310.db' of size 33554432 and page-size 4096
DEBUG [arangod/Wal/CollectorThread.cpp:1305] closing full journal '/usr/local/var/lib/arangodb/databases/database-120933/collection-4262707447412/journal-6558669721243.db'
Best
The above query will insert 5M documents into ArangoDB in a single transaction. This will take a while to complete, and while the transaction is still ongoing, it will hold lots of (potentially needed) rollback data in memory.
Additionally, the above query will first build up all the documents to insert in memory, and once that's done, will start inserting them. Building all the documents will also consume a lot of memory. When executing this query, you will see the memory usage steadily increasing until at some point the disk writes will kick in when the actual inserts start.
There are at least two ways to improve this:
it might be beneficial to split the query into multiple, smaller transactions (see the sketch after this list). Each transaction then won't be as big as the original one and will not block as many system resources while it is ongoing.
for the query above, it technically isn't necessary to build up all documents to insert in memory first, and only after that insert them all. Instead, documents read from users could be inserted into canSee as they arrive. This won't speed up the query, but it will significantly lower memory consumption during query execution for result sets as big as above. It will also lead to the writes starting immediately, and thus write-ahead log collection starting earlier. Not all queries are eligible for this optimization, but some (including the above) are. I worked on a mechanism today that detects eligible queries and executes them this way. The change was pushed into the devel branch today, and will be available with ArangoDB 2.5.
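As an illustration of the first point, here is a sketch of splitting the insert into smaller transactions from arangosh (collection and attribute names are taken from the question; the batch size of 50,000 is an arbitrary assumption). Each db._query() call runs as its own, much smaller transaction:

var batch = 50000;
var total = db.users.count();
for (var i = 0; i < 10; i++) {
  for (var offset = 0; offset < total; offset += batch) {
    db._query(
      "FOR u IN users LIMIT @offset, @batch " +
      "INSERT { _from: u._id, " +
      "         _to: CONCAT('posts/', CEIL(RAND() * 2000)), " +
      "         displayDate: CEIL(RAND() * 100000000) } INTO canSee",
      { offset: offset, batch: batch });
  }
}

Note that successive LIMIT windows over an unsorted collection are not strictly guaranteed to be disjoint, so treat this as a sketch of the batching idea rather than a drop-in replacement.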
Over the last few months, we've been charged for a lot of HTTP requests we were not expecting on Cloudant. By looking at the CouchDB console locally, I found out that each continuous replication issues a GET request every 5 seconds or so.
I have stopped all the continuous replications I could find in Futon, and I did the same for every Cloudant account we have. Looking at Cloudant's dashboard, I have seen a reduction in GET requests (many thousands), but it did not go down to a reasonable level. So there must be some continuous replications left, but I cannot find them.
How can I find and stop the remaining replications?
To identify continuous replications that may be hidden from the user, the best way is to call the _active_tasks endpoint with curl and apply a jq filter to display only the tasks of type "replication".
That is to say, on the command line, run a command of the following form (note that _active_tasks is a server-level endpoint, not a per-database one):
curl 'https://username:password@username.cloudant.com/_active_tasks' | jq 'map(select(.type == "replication"))'
The same methodology can be applied to retrieve other active tasks (view_compaction, database_compaction, etc.)
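For example, the same command can be adjusted to list ongoing compactions instead (same placeholder credentials as above):

curl 'https://username:password@username.cloudant.com/_active_tasks' | jq 'map(select(.type == "view_compaction" or .type == "database_compaction"))'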
That said, in general, Cloudant-based replication is much smoother when using the _replicator database. To do so:
1) As an initial one-time task, create the database:
https://username.cloudant.com/_replicator
2) Then, create a document for every replication. If you have "continuous": true in the doc, the replication will be treated as continuous (see the sketch at the end of this answer).
3) Then, to cancel the replication you simply delete the document.
All of the above commands (e.g. creating and deleting documents) are well documented on Cloudant's website as well as throughout Stack Overflow, so please refer there for further details.
Finally, it is imperative to add the user_ctx field so that the replication is triggered and run within your user context. This is critical so that it shows up when you query _active_tasks; otherwise it will run anonymously and only show up in _active_tasks when queried by an admin. This is precisely what happened in the original poster's case.
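Putting steps 1) and 2) together with the user_ctx note, a sketch of creating a continuous replication via the _replicator database might look like this (hostnames, database names, credentials, and the document id are placeholders):

# one-time: create the _replicator database
curl -X PUT 'https://username:password@username.cloudant.com/_replicator'

# create one document per replication
curl -X PUT 'https://username:password@username.cloudant.com/_replicator/my-continuous-rep' \
     -H 'Content-Type: application/json' \
     -d '{
           "source": "https://username:password@username.cloudant.com/source-db",
           "target": "https://username:password@username.cloudant.com/target-db",
           "continuous": true,
           "user_ctx": { "name": "username", "roles": [] }
         }'

To cancel the replication later (step 3), DELETE that document, passing its current _rev.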