Records not showing until Azure Databricks cluster restarted

We have been using Azure Databricks / Delta Lake for the last couple of months and have recently started to spot some strange behaviour with loaded records; in particular, the latest records are not returned unless the cluster is restarted or a specific version number is specified.
For example, this returns no records:
df_nw = spark.read.format('delta').load('/mnt/xxxx')
display(df_nw.filter("testcolumn = ???"))
But this does:
%sql
SELECT * FROM delta.`/mnt/xxxx` VERSION AS OF 472 where testcolumn = ???
As mentioned above, this only seems to be affecting newly inserted records. Has anyone else come across this before?
Any help would be appreciated.
Thanks
Col

Check to see if you've set a staleness limit. If you have, this is expected behaviour; if not, please create a support ticket.
https://docs.databricks.com/delta/optimizations/file-mgmt.html#manage-data-recency
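For reference, the data recency behaviour described in that link is controlled by the spark.databricks.delta.stalenessLimit session configuration. A minimal, hypothetical notebook sketch for inspecting and setting it (the "1h" value is purely illustrative, not a recommendation):
# Hypothetical cell: inspect and set the Delta staleness limit for the current session
print(spark.conf.get("spark.databricks.delta.stalenessLimit", "not set"))
spark.conf.set("spark.databricks.delta.stalenessLimit", "1h")  # allow query results up to 1 hour stale
If the limit is unset (the default), queries should always resolve the latest table version, so a staleness limit would not explain the behaviour in the question.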

Just in case anyone else is having a similar problem, I thought it would be worth sharing the solution I accidentally stumbled across.
Over the last week I was encountering issues with our Databricks cluster whereby the Spark drivers kept crashing under resource-intensive workloads. After a lot of investigation, it turned out that our cluster was in Standard (Single User) mode, so I spun up a new High Concurrency cluster.
The issue still occasionally appeared on the High Concurrency cluster, so I decided to flip the notebook back to the old cluster, which was still in an active state, and the newly loaded data was there to be queried. This led me to believe that Databricks / the Spark engine was not refreshing the underlying data set and was instead using a previously cached version of it, even though I hadn't explicitly cached the underlying data set.
After running %sql CLEAR CACHE, the data appeared as expected.
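For anyone who prefers to apply the same workaround from Python rather than a %sql cell, here is a minimal sketch reusing the mount path and filter from the question above:
# Hypothetical cell: clear all cached tables/plans, then re-read the Delta table
spark.catalog.clearCache()  # same effect as %sql CLEAR CACHE
df_nw = spark.read.format('delta').load('/mnt/xxxx')
display(df_nw.filter("testcolumn = ???"))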

Related

Apache Ignite 2.9.0 Recover from lost partitions

We have set up an Apache Ignite 2.9.0 cluster with native persistence using Kubernetes on Azure, with 4 nodes. To update some cache configuration, we restarted all the Ignite nodes. After the restart, running any SQL query on one particular table results in a restart of 2 Ignite nodes, and after that we see a lost partitions exception.
If we restart all nodes to recover from the lost partitions, everything is fine until we run an SQL query on that table again, after which 2 nodes restart and we get the lost partitions exception once more.
Is there any way we can recover from the lost partitions and overcome this problem? We would also like to understand why it is occurring; we could not find any logs related to it.
When all owners of a partition have left the grid, the partition is considered lost; you might think of this as a special internal marker. Depending on the PartitionLossPolicy, Ignite might ignore this fact and allow cache operations, or disallow them to protect data consistency.
If you use native persistence, then most likely there was no physical data loss, and all you need to do is tell Ignite that you are aware of the situation, that all data is in place, and that it is safe to remove the "lost" mark from the partitions.
I think the simplest way to handle this is to run the control script from within a pod:
control.sh --cache reset_lost_partitions cacheName1,cacheName2,...
More details:
https://ignite.apache.org/docs/latest/configuring-caches/partition-loss-policy#handling-partition-loss

Cassandra Cluster Replication- Existing Node & Existing Data

We have a requirement to replicate a Cassandra cluster with existing nodes and existing data in it. Approximately 2.5 TB of data is on Azure and 3.5 TB on AWS. We need to pull the remaining data from AWS to Azure. Your kind help is appreciated.
There are many options here.
You can connect the two using GPFS (GossipingPropertyFileSnitch): stand up a DC in Azure, replicate across, then remove the old DC.
You could unload the data via the cassandra-loader utility: https://github.com/brianmhess/cassandra-loader
You could take a snapshot and then stream the data to the new cluster via sstableloader.
It's hard to give a complete answer - it would depend on so many factors. The above should get you started at least.

Is it possible to recover a Cassandra node without a snapshot?

Offsite backups for Cassandra seem like a challenging thing. You basically have to make yet another copy of ALL your data, including the copies that exist due to the replication factor. Snapshots make backups easy when you don't mind storing them on the same disk your node already uses. I'm curious: in the event of a catastrophic failure of this disk, is it possible to recover the node using the nodes that the data was replicated to?
Yes, you can restore data on a crashed node using the procedure in the documentation: Replacing a dead node or dead seed node. That page is for Cassandra 3.x; please pick your Cassandra version from the drop-down menu at the top of the page.
But please note that you still need to take backups if your data is valuable. If you are using AWS, you can use this project to back up Cassandra to S3 storage.
If you are looking for offsite or off-host backups, you can also look at OpsCenter from DataStax or Talena software (my company). Both give you the ability to back up your database locally or to S3. As you may expect, you also have the ability to restore data in case of hardware failures, user errors, or logical corruption, which the replicas will not protect you against.
Yes, it is possible. Just run "nodetool repair" in a terminal on the node with the missing data. It can take a lot of time. I would also recommend running a repair operation on each node every month to keep your data fully replicated, because Cassandra does not repair data automatically (for example, after a node goes down).

OpsCenter graphs are slow to refresh, can I configure the refresh rate?

I am new to OpsCenter and trying to get a feel for the metric graphs. The graphs seem slow to refresh and I'm trying to determine if this is a configuration issue on my part or simply what to expect.
For example, I have a three node Cassandra test cluster created via CCM. OpsCenter and the node Agents were configured manually.
I have graphs on the dashboard for Read and Write Requests and Latency. I'm running a JMeter test that inserts 100k rows into a Cassandra table (via REST calls to my webapp) over the course of about 5 minutes.
I have both OpsCenter and VisualVM open. When the test kicks off, the VisualVM graphs immediately start showing the change in load (via the Heap and CPU/GC graphs), but the OpsCenter graphs lag behind and are slow to update. I realize I'm comparing different metrics (i.e. Heap vs. Write Requests), but I would expect to see some immediate indication in OpsCenter that a load is being applied.
My environment is as follows:
Cassandra: dsc-cassandra-2.1.2
OpsCenter: opscenter-5.1.0
Agents: datastax-agent-5.1.0
OS: OSX 10.10.1
Currently, metrics are collected every 60 seconds, plus there is an (albeit very small) overhead from inserting them into C*, reading them back on the OpsCenter server side, and pushing them to the UI.
The OpsCenter team is working both on improving metrics collection in general and on delivering real-time metrics, so stay tuned.
By the way, comparing VisualVM and OpsCenter in terms of latencies is not quite correct since OpsCenter has to do a lot more work to both collect and aggregate those metrics due to its distributed nature (and also because VisualVM is so close to the meta^WJVM ;)

Cassandra compaction tasks stuck

I'm running DataStax Enterprise in a cluster consisting of 3 nodes. They are all running on the same hardware: 2-core Intel Xeon 2.2 GHz, 7 GB RAM, 4 TB RAID-0.
This should be enough for running a cluster with a light load, storing less than 1 GB of data.
Most of the time everything is just fine, but it appears that the running tasks related to the Repair Service in OpsCenter sometimes get stuck; this causes instability in that node and an increase in load.
However, if the node is restarted, the stuck tasks don't show up and the load is at normal levels again.
Because we don't have much data in our cluster, we're using the min_repair_time parameter defined in opscenterd.conf to delay the repair service so that it doesn't complete too often.
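For context, a hypothetical opscenterd.conf fragment showing where such a setting might live (the [repair_service] section name and the value are assumptions on my part; check the OpsCenter documentation for your version):
[repair_service]
# Assumed section and placement; the value is in seconds and purely illustrative
min_repair_time = 300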
It really seems a little bit weird that tasks which are marked as "Complete" and show a progress of 100% don't go away. And yes, we've waited hours for them to disappear, but they won't; the only way we've found to solve this is to restart the nodes.
Edit:
Here's the output from nodetool compactionstats
Edit 2:
I'm running DataStax Enterprise v4.6.0 with Cassandra v2.0.11.83
Edit 3:
This is output from dstat on a node that is behaving normally
This is output from dstat on a node with stuck compaction
Edit 4:
Output from iostat on the node with stuck compaction; note the high "iowait"
Azure storage
Azure divides disk resources among storage accounts under an individual user account. There can be many storage accounts in an individual user account.
For the purposes of running DSE [or Cassandra], it is important to note that a single storage account should not be shared between more than two nodes if DSE [or Cassandra] is configured like the examples in the scripts in this document. This document configures each node to have 16 disks. Each disk has a limit of 500 IOPS, which yields 8,000 IOPS per node when configured in RAID-0. So two nodes will hit 16,000 IOPS, and three would exceed the limit.
See details here
So, this issue has been under investigation for quite a while and we've found a solution; however, we aren't sure what the underlying problem causing the issues was, although we did get a clue, even though nothing could be confirmed.
Basically, what we had done was set up a RAID-0 (also known as striping) consisting of four disks, each 1 TB in size. We should have seen roughly 4x a single disk's IOPS when using the stripe, but we didn't, so something was clearly wrong with the RAID setup.
We used multiple utilities to confirm that the CPU was waiting for the I/O to respond most of the time whenever we considered the node "stuck". Clearly something with the I/O, and most probably our RAID setup, was causing this. We tried a few different mdadm settings etc., but didn't manage to solve the problems with the RAID setup.
We started investigating Azure Premium Storage (which is still in preview). This enables attaching disks to VMs whose underlying physical storage is actually SSDs. So we said, well, SSDs => more IOPS, so let's give this a try. We did not set up any RAID using the SSDs; we are only using a single SSD disk per VM.
We've been running the cluster for almost 3 days now and have stress-tested it a lot, but we haven't been able to reproduce the issues.
I guess we didn't get down to the real cause, but the conclusion is that one of the following must have been the underlying cause of our problems:
Disks that were too slow (writes > IOPS)
RAID that was set up incorrectly, causing the disks to function abnormally
These two problems go hand in hand, and most likely we had simply set up the disks in the wrong way. However, SSDs = more power to the people, so we will definitely continue using SSDs.
If anyone experiences the same problems that we had on Azure with RAID-0 on large disks, don't hesitate to add to this thread.
Part of the problem is that you do not have a lot of memory on those systems, and it is likely that even with only 1 GB of data per node, your nodes are experiencing GC pressure. Check the system.log for errors and warnings, as this will provide clues as to what is happening in your cluster.
The rollups_60 table in the OpsCenter schema contains the lowest (minute level) granularity time series data for all your Cassandra, OS, and DSE metrics. These metrics are collected regardless of whether you have built charts for them in your dashboard so that you can pick up historical views when needed. It may be that this table is outgrowing your small hardware.
You can try tuning OpsCenter to avoid this kind of issue. Here are some configuration options for your opscenterd.conf file:
Adding keyspaces (for example the opsc keyspace) to your ignored_keyspaces setting
Decreasing the TTL on this table by tuning the 1min_ttl setting (see the sketch after this list)
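A hypothetical configuration fragment illustrating both options. The [cassandra_metrics] section name, the keyspace list, and the TTL value are assumptions on my part; confirm them against the DataStax docs listed below, and note that depending on your OpsCenter version these settings may belong in the cluster-specific .conf file rather than opscenterd.conf.
[cassandra_metrics]
# Assumed section: skip metric collection for these keyspaces (list is illustrative)
ignored_keyspaces = system, system_traces, OpsCenter
# Assumed key placement: keep minute-level rollups for 1 day (86400 s) instead of the default
1min_ttl = 86400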
Sources:
OpsCenter Config (DataStax docs)
Metrics Config (DataStax docs)
