Cassandra slowed down with more nodes

I set up a Cassandra cluster on AWS. What I want is increased I/O throughput (reads/writes per second) as more nodes are added, as advertised. However, I got exactly the opposite: performance drops as new nodes are added.
Do you know of any typical issues that prevent it from scaling?
Here are some details:
I am loading a text file (15 MB) into the column family. Each line is a record; there are 150,000 records. With 1 node it takes about 90 seconds to write, but with 2 nodes it takes 120 seconds. I can see the data is spread across the 2 nodes, yet there is no increase in throughput.
The source code is below:
public class WordGenCAS {
static final String KEYSPACE = "text_ks";
static final String COLUMN_FAMILY = "text_table";
static final String COLUMN_NAME = "text_col";
public static void main(String[] args) throws Exception {
if (args.length < 2) {
System.out.println("Usage: WordGenCAS <input file> <host1,host2,...>");
System.exit(-1);
}
String[] contactPts = args[1].split(",");
Cluster cluster = Cluster.builder()
.addContactPoints(contactPts)
.build();
Session session = cluster.connect(KEYSPACE);
InputStream fis = new FileInputStream(args[0]);
InputStreamReader in = new InputStreamReader(fis, "UTF-8");
BufferedReader br = new BufferedReader(in);
String line;
int lineCount = 0;
while ( (line = br.readLine()) != null) {
line = line.replaceAll("'", " ");
line = line.trim();
if (line.isEmpty())
continue;
System.out.println("[" + line + "]");
String cqlStatement2 = String.format("insert into %s (id, %s) values (%d, '%s');",
COLUMN_FAMILY,
COLUMN_NAME,
lineCount,
line);
session.execute(cqlStatement2);
lineCount++;
}
System.out.println("Total lines written: " + lineCount);
}
}
The DB schema is the following:
CREATE KEYSPACE text_ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };
USE text_ks;
CREATE TABLE text_table (
id int,
text_col text,
primary key (id)
) WITH COMPACT STORAGE;
Thanks!

Even though this is an old post, I think it's worth posting a solution for this (common) kind of problem.
As you've already discovered, loading data with a serial procedure is slow. What you've been advised is the right thing to do.
However, issuing a lot of queries without applying some sort of back pressure is asking for trouble: you're going to lose data due to excessive overload on the server (and, to some extent, on the driver).
The solution below loads data with async calls and applies some back pressure on the client to avoid data loss.
public class WordGenCAS {
static final String KEYSPACE = "text_ks";
static final String COLUMN_FAMILY = "text_table";
static final String COLUMN_NAME = "text_col";
public static void main(String[] args) throws Exception {
if (args.length < 2) {
System.out.println("Usage: WordGenCAS <input file> <host1,host2,...>");
System.exit(-1);
}
String[] contactPts = args[1].split(",");
Cluster cluster = Cluster.builder()
.addContactPoints(contactPts)
.build();
Session session = cluster.connect(KEYSPACE);
InputStream fis = new FileInputStream(args[0]);
InputStreamReader in = new InputStreamReader(fis, "UTF-8");
BufferedReader br = new BufferedReader(in);
String line;
int lineCount = 0;
// This is the futures list of our queries
List<Future<ResultSet>> futures = new ArrayList<>();
// Loop
while ( (line = br.readLine()) != null) {
line = line.replaceAll("'", " ");
line = line.trim();
if (line.isEmpty())
continue;
System.out.println("[" + line + "]");
String cqlStatement2 = String.format("insert into %s (id, %s) values (%d, '%s');",
COLUMN_FAMILY,
COLUMN_NAME,
lineCount,
line);
lineCount++;
// Add the "future" returned by async method the to the list
futures.add(session.executeAsync(cqlStatement2));
// Apply some backpressure if we issued more than X query.
// Change X to another value suitable for your cluster
while (futures.size() > 1000) {
Future<ResultSet> future = futures.remove(0);
try {
future.get();
} catch (Exception e) {
e.printStackTrace();
}
}
}
System.out.println("Total lines written: " + lineCount);
System.out.println("Waiting for writes to complete...");
// Wait until all writes are done.
while (futures.size() > 0) {
Future<ResultSet> future = futures.remove(0);
try {
future.get();
} catch (Exception e) {
e.printStackTrace();
}
}
System.out.println("Done!");
}
}
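As a further refinement (not shown in the answer above, so treat it as a sketch reusing the variables from that code), the same loop can use a prepared statement so the CQL is parsed by the server only once and the line content is bound instead of string-formatted:
// Sketch: prepare the INSERT once, then bind values per row and execute asynchronously.
PreparedStatement insert = session.prepare(
        "INSERT INTO " + COLUMN_FAMILY + " (id, " + COLUMN_NAME + ") VALUES (?, ?)");
while ((line = br.readLine()) != null) {
    line = line.trim();
    if (line.isEmpty())
        continue;
    // Binding avoids the manual quote handling (replaceAll("'", " ")) used above.
    futures.add(session.executeAsync(insert.bind(lineCount, line)));
    lineCount++;
    // ...same back-pressure block as above (drain futures once more than X are pending)...
}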

azure search work around $skip limit

I'm running a job to check whether all the records from my database (around 610k) exist in Azure Search. However, there's a 100,000 limit on the $skip parameter. Is there a way to work around this limit?
You cannot facet over more than 100K documents; however, you can add facets to work around this. For example, let's say you have a facet called Country and no single facet value has more than 100K documents. You can facet over all documents where Country == 'Canada', then facet over all documents where Country == 'USA', etc.
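As an illustration of that approach (a sketch only; "Country" and its values are the hypothetical facet from this answer), the idea is to enumerate the facet values first and then page within each filtered slice, so no single request ever needs $skip above 100K:
// Sketch: slice the index by facet value, then page inside each slice with $top/$skip.
public class FacetSliceQueries {
    public static void main(String[] args) {
        // In practice, enumerate these values with a "facet=Country" request first.
        String[] countries = { "Canada", "USA" };
        for (String country : countries) {
            // Each slice is assumed to hold fewer than 100K documents, so $skip stays legal.
            for (int skip = 0; skip < 100_000; skip += 1000) {
                String query = "search=*&$filter=Country eq '" + country + "'"
                        + "&$top=1000&$skip=" + skip;
                System.out.println(query); // hand this to whatever Azure Search client you use
                // stop early once a page comes back with fewer than 1000 results
            }
        }
    }
}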
Just to clarify the other answers: you can't bypass the limit directly but you can use a workaround.
Here's what you can do:
1) Add a unique field to the index. Its contents can be a modification timestamp (if it's granular enough to make it unique) or, for example, a running number. Alternatively, you can use an existing unique field for this.
2) Take the first 100000 results from the index, ordered by your unique field.
3) Check the maximum value (if ordering ascending) of your unique field in the results, i.e. the value of the last entry.
4) Take the next 100000 results by ordering on the same unique field and adding a filter which takes only results where the unique field's value is bigger than the previous maximum. This way the same first 100000 values are not returned, but we get the next 100000 values.
5) Continue until you have all the results.
The downside is that you can't use any other custom ordering with the results unless you do the ordering after getting them. A rough sketch of this loop follows below.
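Here is a rough Java sketch of steps 2-5 against the Azure Search REST API. The field name seq, the service and index names, the key, and the api-version are placeholders for illustration, not a tested implementation:
// Sketch: page through the whole index by filtering on a sortable, unique field ("seq"),
// so $skip is never used and the 100K cap never applies.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SkipWorkaround {
    static final String DOCS_URL =
            "https://myservice.search.windows.net/indexes/myindex/docs?api-version=2020-06-30";

    // Fetch one page of up to 1000 documents whose unique field "seq" is greater than lastSeq.
    static String fetchPage(HttpClient http, long lastSeq) throws Exception {
        String url = DOCS_URL
                + "&search=*&$top=1000"
                + "&$orderby=seq%20asc"              // step 2: always order by the unique field
                + "&$filter=seq%20gt%20" + lastSeq;  // step 4: only values above the previous maximum
        HttpRequest req = HttpRequest.newBuilder(URI.create(url))
                .header("api-key", "my-query-key")
                .GET()
                .build();
        return http.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Caller (step 5): start with lastSeq = Long.MIN_VALUE, parse each page, remember the
    // highest "seq" value returned (step 3), and repeat until a page comes back empty.
}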
I use metadata_storage_last_modified as the filter field, and the following is my example.
offset      --%-->   skip       action
0           --%-->   0
100,000     --%-->   100,000    getLastTime
101,000     --%-->   0          useLastTime
200,000     --%-->   99,000     useLastTime
201,000     --%-->   100,000    useLastTime & getLastTime
202,000     --%-->   0          useLastTime
Because the skip limit is 100k, we can calculate skip by
AzureSearchSkipLimit = 100k
AzureSearchTopLimit = 1k
skip = offset % (AzureSearchSkipLimit + AzureSearchTopLimit)
If the total search count is larger than AzureSearchSkipLimit, then apply
orderby = "metadata_storage_last_modified desc"
When skip reaches AzureSearchSkipLimit, get the metadata_storage_last_modified time from the end of the data and use it as the filter for the next 100k of the search.
filter = metadata_storage_last_modified lt ${metadata_storage_last_modified}
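In code, the bookkeeping described above might look roughly like this (a sketch using the same names; not a complete client):
// Sketch of the offset -> ($skip, $filter) mapping described above.
public class SkipFilterHelper {
    static final int AzureSearchSkipLimit = 100_000;
    static final int AzureSearchTopLimit = 1_000;

    static int skipFor(long offset) {
        // skip wraps every 101,000 documents (100K skip window + 1K page size)
        return (int) (offset % (AzureSearchSkipLimit + AzureSearchTopLimit));
    }

    static String filterFor(String lastModified) {
        // lastModified is the metadata_storage_last_modified value captured at the end
        // of the previous 100K window (null for the first window => no filter)
        return lastModified == null ? null
                : "metadata_storage_last_modified lt " + lastModified;
    }
}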
I understand the API's 100K limitation, and MS's site says as a workaround "you can work around this limitation by adding code to iterate over, and filter on, a facet with less than 100K documents per facet value."
I'm using the "Back up and restore an Azure Cognitive Search index" sample solution provided by MS. (https://github.com/Azure-Samples/azure-search-dotnet-samples)
But can someone tell me where or how to implement this "iterate loop" on a facet? The facetable field I'm trying to use is "tribtekey", but I don't know where to place the code below. Any help would be greatly appreciated.
// This is a prototype tool that allows for extraction of data from a search index
// Since this tool is still under development, it should not be used for production usage
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Models;
using Microsoft.Extensions.Configuration;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
namespace AzureSearchBackupRestore
{
class Program
{
private static string SourceSearchServiceName;
private static string SourceAdminKey;
private static string SourceIndexName;
private static string TargetSearchServiceName;
private static string TargetAdminKey;
private static string TargetIndexName;
private static string BackupDirectory;
private static SearchIndexClient SourceIndexClient;
private static SearchClient SourceSearchClient;
private static SearchIndexClient TargetIndexClient;
private static SearchClient TargetSearchClient;
private static int MaxBatchSize = 500; // JSON files will contain this many documents / file and can be up to 1000
private static int ParallelizedJobs = 10; // Output content in parallel jobs
static void Main(string[] args)
{
//Get source and target search service info and index names from appsettings.json file
//Set up source and target search service clients
ConfigurationSetup();
//Backup the source index
Console.WriteLine("\nSTART INDEX BACKUP");
BackupIndexAndDocuments();
//Recreate and import content to target index
//Console.WriteLine("\nSTART INDEX RESTORE");
//DeleteIndex();
//CreateTargetIndex();
//ImportFromJSON();
//Console.WriteLine("\r\n Waiting 10 seconds for target to index content...");
//Console.WriteLine(" NOTE: For really large indexes it may take longer to index all content.\r\n");
//Thread.Sleep(10000);
//
//// Validate all content is in target index
//int sourceCount = GetCurrentDocCount(SourceSearchClient);
//int targetCount = GetCurrentDocCount(TargetSearchClient);
//Console.WriteLine("\nSAFEGUARD CHECK: Source and target index counts should match");
//Console.WriteLine(" Source index contains {0} docs", sourceCount);
//Console.WriteLine(" Target index contains {0} docs\r\n", targetCount);
//
//Console.WriteLine("Press any key to continue...");
//Console.ReadLine();
}
static void ConfigurationSetup()
{
IConfigurationBuilder builder = new ConfigurationBuilder().AddJsonFile("appsettings.json");
IConfigurationRoot configuration = builder.Build();
SourceSearchServiceName = configuration["SourceSearchServiceName"];
SourceAdminKey = configuration["SourceAdminKey"];
SourceIndexName = configuration["SourceIndexName"];
TargetSearchServiceName = configuration["TargetSearchServiceName"];
TargetAdminKey = configuration["TargetAdminKey"];
TargetIndexName = configuration["TargetIndexName"];
BackupDirectory = configuration["BackupDirectory"];
Console.WriteLine("CONFIGURATION:");
Console.WriteLine("\n Source service and index {0}, {1}", SourceSearchServiceName, SourceIndexName);
Console.WriteLine("\n Target service and index: {0}, {1}", TargetSearchServiceName, TargetIndexName);
Console.WriteLine("\n Backup directory: " + BackupDirectory);
SourceIndexClient = new SearchIndexClient(new Uri("https://" + SourceSearchServiceName + ".search.windows.net"), new AzureKeyCredential(SourceAdminKey));
SourceSearchClient = SourceIndexClient.GetSearchClient(SourceIndexName);
// TargetIndexClient = new SearchIndexClient(new Uri($"https://" + TargetSearchServiceName + ".search.windows.net"), new AzureKeyCredential(TargetAdminKey));
// TargetSearchClient = TargetIndexClient.GetSearchClient(TargetIndexName);
}
static void BackupIndexAndDocuments()
{
// Backup the index schema to the specified backup directory
Console.WriteLine("\n Backing up source index schema to {0}\r\n", BackupDirectory + "\\" + SourceIndexName + ".schema");
File.WriteAllText(BackupDirectory + "\\" + SourceIndexName + ".schema", GetIndexSchema());
// Extract the content to JSON files
int SourceDocCount = GetCurrentDocCount(SourceSearchClient);
WriteIndexDocuments(SourceDocCount); // Output content from index to json files
}
static void WriteIndexDocuments(int CurrentDocCount)
{
// Write document files in batches (per MaxBatchSize) in parallel
string IDFieldName = GetIDFieldName();
int FileCounter = 0;
for (int batch = 0; batch <= (CurrentDocCount / MaxBatchSize); batch += ParallelizedJobs)
{
List<Task> tasks = new List<Task>();
for (int job = 0; job < ParallelizedJobs; job++)
{
FileCounter++;
int fileCounter = FileCounter;
if ((fileCounter - 1) * MaxBatchSize < CurrentDocCount)
{
Console.WriteLine(" Backing up source documents to {0} - (batch size = {1})", BackupDirectory + "\\" + SourceIndexName + fileCounter + ".json", MaxBatchSize);
tasks.Add(Task.Factory.StartNew(() =>
ExportToJSON((fileCounter - 1) * MaxBatchSize, IDFieldName, BackupDirectory + "\\" + SourceIndexName + fileCounter + ".json")
));
}
}
Task.WaitAll(tasks.ToArray()); // Wait for all the stored procs in the group to complete
}
return;
}
static void ExportToJSON(int Skip, string IDFieldName, string FileName)
{
// Extract all the documents from the selected index to JSON files in batches of 500 docs / file
string json = string.Empty;
try
{
SearchOptions options = new SearchOptions()
{
SearchMode = SearchMode.All,
Size = MaxBatchSize,
Skip = Skip,
// ,IncludeTotalCount = true
// ,Filter = Azure.Search.Documents.SearchFilter.Create('%24top=2&%24skip=0&%24orderby=tributeId%20asc')
//,Filter = String.Format("&search=*&%24top=2&%24skip=0&%24orderby=tributeId%20asc")
//,Filter = "%24top=2&%24skip=0&%24orderby=tributeId%20asc"
//,Filter = "tributeKey eq '5'"
};
SearchResults<SearchDocument> response = SourceSearchClient.Search<SearchDocument>("*", options);
foreach (var doc in response.GetResults())
{
json += JsonSerializer.Serialize(doc.Document) + ",";
json = json.Replace("\"Latitude\":", "\"type\": \"Point\", \"coordinates\": [");
json = json.Replace("\"Longitude\":", "");
json = json.Replace(",\"IsEmpty\":false,\"Z\":null,\"M\":null,\"CoordinateSystem\":{\"EpsgId\":4326,\"Id\":\"4326\",\"Name\":\"WGS84\"}", "]");
json += "\r\n";
}
// Output the formatted content to a file
json = json.Substring(0, json.Length - 3); // remove trailing comma
File.WriteAllText(FileName, "{\"value\": [");
File.AppendAllText(FileName, json);
File.AppendAllText(FileName, "]}");
Console.WriteLine(" Total documents: {0}", response.GetResults().Count().ToString());
json = string.Empty;
}
catch (Exception ex)
{
Console.WriteLine("Error: {0}", ex.Message.ToString());
}
}
static string GetIDFieldName()
{
// Find the id field of this index
string IDFieldName = string.Empty;
try
{
var schema = SourceIndexClient.GetIndex(SourceIndexName);
foreach (var field in schema.Value.Fields)
{
if (field.IsKey == true)
{
IDFieldName = Convert.ToString(field.Name);
break;
}
}
}
catch (Exception ex)
{
Console.WriteLine("Error: {0}", ex.Message.ToString());
}
return IDFieldName;
}
static string GetIndexSchema()
{
// Extract the schema for this index
// We use REST here because we can take the response as-is
Uri ServiceUri = new Uri("https://" + SourceSearchServiceName + ".search.windows.net");
HttpClient HttpClient = new HttpClient();
HttpClient.DefaultRequestHeaders.Add("api-key", SourceAdminKey);
string Schema = string.Empty;
try
{
Uri uri = new Uri(ServiceUri, "/indexes/" + SourceIndexName);
HttpResponseMessage response = AzureSearchHelper.SendSearchRequest(HttpClient, HttpMethod.Get, uri);
AzureSearchHelper.EnsureSuccessfulSearchResponse(response);
Schema = response.Content.ReadAsStringAsync().Result.ToString();
}
catch (Exception ex)
{
Console.WriteLine("Error: {0}", ex.Message.ToString());
}
return Schema;
}
private static bool DeleteIndex()
{
Console.WriteLine("\n Delete target index {0} in {1} search service, if it exists", TargetIndexName, TargetSearchServiceName);
// Delete the index if it exists
try
{
TargetIndexClient.DeleteIndex(TargetIndexName);
}
catch (Exception ex)
{
Console.WriteLine(" Error deleting index: {0}\r\n", ex.Message);
Console.WriteLine(" Did you remember to set your SearchServiceName and SearchServiceApiKey?\r\n");
return false;
}
return true;
}
static void CreateTargetIndex()
{
Console.WriteLine("\n Create target index {0} in {1} search service", TargetIndexName, TargetSearchServiceName);
// Use the schema file to create a copy of this index
// I like using REST here since I can just take the response as-is
string json = File.ReadAllText(BackupDirectory + "\\" + SourceIndexName + ".schema");
// Do some cleaning of this file to change index name, etc
json = "{" + json.Substring(json.IndexOf("\"name\""));
int indexOfIndexName = json.IndexOf("\"", json.IndexOf("name\"") + 5) + 1;
int indexOfEndOfIndexName = json.IndexOf("\"", indexOfIndexName);
json = json.Substring(0, indexOfIndexName) + TargetIndexName + json.Substring(indexOfEndOfIndexName);
Uri ServiceUri = new Uri("https://" + TargetSearchServiceName + ".search.windows.net");
HttpClient HttpClient = new HttpClient();
HttpClient.DefaultRequestHeaders.Add("api-key", TargetAdminKey);
try
{
Uri uri = new Uri(ServiceUri, "/indexes");
HttpResponseMessage response = AzureSearchHelper.SendSearchRequest(HttpClient, HttpMethod.Post, uri, json);
response.EnsureSuccessStatusCode();
}
catch (Exception ex)
{
Console.WriteLine(" Error: {0}", ex.Message.ToString());
}
}
static int GetCurrentDocCount(SearchClient searchClient)
{
// Get the current doc count of the specified index
try
{
SearchOptions options = new SearchOptions()
{
SearchMode = SearchMode.All,
IncludeTotalCount = true
};
SearchResults<Dictionary<string, object>> response = searchClient.Search<Dictionary<string, object>>("*", options);
return Convert.ToInt32(response.TotalCount);
}
catch (Exception ex)
{
Console.WriteLine(" Error: {0}", ex.Message.ToString());
}
return -1;
}
static void ImportFromJSON()
{
Console.WriteLine("\n Upload index documents from saved JSON files");
// Take JSON file and import this as-is to target index
Uri ServiceUri = new Uri("https://" + TargetSearchServiceName + ".search.windows.net");
HttpClient HttpClient = new HttpClient();
HttpClient.DefaultRequestHeaders.Add("api-key", TargetAdminKey);
try
{
foreach (string fileName in Directory.GetFiles(BackupDirectory, SourceIndexName + "*.json"))
{
Console.WriteLine(" -Uploading documents from file {0}", fileName);
string json = File.ReadAllText(fileName);
Uri uri = new Uri(ServiceUri, "/indexes/" + TargetIndexName + "/docs/index");
HttpResponseMessage response = AzureSearchHelper.SendSearchRequest(HttpClient, HttpMethod.Post, uri, json);
response.EnsureSuccessStatusCode();
}
}
catch (Exception ex)
{
Console.WriteLine(" Error: {0}", ex.Message.ToString());
}
}
}
}

Spark streaming how to get context inside foreachRDD

How can I use "ssc.sparkContext()" in foreachRDD of spark streaming?
If I use "ssc.sparkContext()" as it is in foreachRDD (JAVA) (basically, something like ssc.sparkContext().broadcast(map)), then I get "Task not serializable" error.
If I use "(new JavaSparkContext(rdd.context())).broadcast(map)" then there is no problem.
So, basically is "ssc.sparkContext()" equivalent to "(new JavaSparkContext(rdd.context()))"?
And if I use "(new JavaSparkContext(rdd.context())).broadcast(map)" will the broadcast variable i.e. associated "map" get distributed to all executors in SparkContext.
Code is given below:
Here, "bcv.broadcastVar = (new JavaSparkContext(rdd.context())).broadcast(map);" works but "bcv.broadcastVar = ssc.sparkContext.broadcast(map);" does not work
words.foreachRDD(new Function<JavaRDD<String>, Void>() {
@Override
public Void call(JavaRDD<String> rdd) throws Exception {
if (rdd != null) {
System.out.println("Hello World - words - SSC !!!"); // Gets printed on Driver
if (stat.data_changed == 1) {
stat.data_changed = 0;
bcv.broadcastVar.unpersist(); // Unpersist BC variable
bcv.broadcastVar = (new JavaSparkContext(rdd.context())).broadcast(map); // Re-broadcast same BC variable with NEW data
}
}
rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
@Override
public void call(Iterator<String> items) throws Exception {
System.out.println("words.foreachRDD.foreachPartition: CALLED ..."); // Gets called on Worker/Executor
Integer index = 1;
String lastKey = "";
Integer lastValue = 0;
while (true) {
String key = "A" + Long.toString(index);
Integer value = bcv.broadcastVar.value().get(key); // Executor Consumes map
if (value == null) break;
lastKey = key;
lastValue = value;
index++;
}
System.out.println("Executor BC: key/value: " + lastKey + " = " + lastValue);
return;
}
});
return null;
}
});

Problems in using SwingWorker class for reading a file and implementing a JProgressBar

Note: This question may look like a repetition of several questions posted on the forum, but I have been stuck on this problem for quite some time and have not been able to solve it using the solutions posted for similar questions. I have posted my code here and need help to proceed further.
So, here is my issue:
I am writing a Java GUI application which loads a file before performing any processing. There is an average waiting time of about 10-15 seconds during which the file is parsed. After this waiting time, what I see on the GUI is:
1) The parsed file in the form of individual leaves of the JTree in a JPanel
2) Some header information (for example, the data range) in two individual JTextFields
3) A heat map, generated after parsing the data, in a different JPanel on the GUI
The program connects to R to parse the file and read the header information.
Now, I want to use SwingWorker to put the file-reading process on a different thread so that it does not block the EDT. I am not sure how to build my SwingWorker class so that the work is done in the background and the results for the 3 components are displayed when the process is complete. And during this file-reading process I want to display a JProgressBar.
Here is the code which does the whole process, starting from the selection of the file menu item. This is in the main GUI method.
JScrollPane spectralFilesScrollPane;
if ((e.getSource() == OpenImagingFileButton) || (e.getSource() == loadRawSpectraMenuItem)) {
int returnVal = fcImg.showOpenDialog(GUIMain.this);
// File chooser
if (returnVal == JFileChooser.APPROVE_OPTION) {
file = fcImg.getSelectedFile();
//JTree and treenode creation
DefaultMutableTreeNode root = new DefaultMutableTreeNode(file);
rawSpectraTree = new JTree(root);
DefaultTreeModel model = (DefaultTreeModel) rawSpectraTree.getModel();
try {
// R connection
rc = new RConnection();
final String inputFileDirectory = file.getParent();
System.out.println("Current path: " + currentPath);
rc.assign("importImagingFile", currentPath.concat("/importImagingFile.R"));
rc.eval("source(importImagingFile)");
rc.assign("currentWorkingDirectory", currentPath);
rc.assign("inputFileDirectory", inputFileDirectory);
rawSpectrumObjects = rc.eval("importImagingFile(inputFileDirectory,currentWorkingDirectory)");
rc.assign("plotAverageSpectra", currentPath.concat("/plotAverageSpectra.R"));
rc.eval("source(plotAverageSpectra)");
rc.assign("rawSpectrumObjects", rawSpectrumObjects);
REXP averageSpectraObject = rc.eval("plotAverageSpectra(rawSpectrumObjects)");
rc.assign("AverageMassSpecObjectToSpectra", currentPath.concat("/AverageMassSpecObjectToSpectra.R"));
rc.eval("source(AverageMassSpecObjectToSpectra)");
rc.assign("averageSpectraObject", averageSpectraObject);
REXP averageSpectra = rc.eval("AverageMassSpecObjectToSpectra(averageSpectraObject)");
averageSpectraMatrix = averageSpectra.asDoubleMatrix();
String[] spectrumName = new String[rawSpectrumObjects.asList().size()];
for (int i = 0; i < rawSpectrumObjects.asList().size(); i++) {
DefaultMutableTreeNode node = new DefaultMutableTreeNode("Spectrum_" + (i + 1));
model.insertNodeInto(node, root, i);
}
// Expand all the nodes of the JTree
for(int i=0;i< model.getChildCount(root);++i){
rawSpectraTree.expandRow(i);
}
DefaultMutableTreeNode firstLeaf = ((DefaultMutableTreeNode)rawSpectraTree.getModel().getRoot()).getFirstLeaf();
rawSpectraTree.setSelectionPath(new TreePath(firstLeaf.getPath()));
updateSpectralTableandChartRAW(firstLeaf);
// List the min and the max m/z of in the respective data fields
rc.assign("dataMassRange", currentPath.concat("/dataMassRange.R"));
rc.eval("source(dataMassRange)");
rc.assign("rawSpectrumObjects", rawSpectrumObjects);
REXP massRange = rc.eval("dataMassRange(rawSpectrumObjects)");
double[] massRangeValues = massRange.asDoubles();
minMzValue = (float)massRangeValues[0];
maxMzValue = (float)massRangeValues[1];
GlobalMinMz = minMzValue;
GlobalMaxMz = maxMzValue;
// Adds the range values to the jTextField
minMz.setText(Float.toString(minMzValue));
minMz.validate();
minMz.repaint();
maxMz.setText(Float.toString(maxMzValue));
maxMz.validate();
maxMz.repaint();
// Update status bar with the uploaded data details
statusLabel.setText("File name: " + file.getName() + " | " + "Total spectra: " + rawSpectrumObjects.asList().size() + " | " + "Mass range: " + GlobalMinMz + "-" + GlobalMaxMz);
// Generates a heatmap
rawIntensityMap = gim.generateIntensityMap(rawSpectrumObjects, currentPath, minMzValue, maxMzValue, Gradient.GRADIENT_Rainbow, "RAW");
rawIntensityMap.addMouseListener(this);
rawIntensityMap.addMouseMotionListener(this);
imagePanel.add(rawIntensityMap, BorderLayout.CENTER);
coordinates = new JLabel();
coordinates.setBounds(31, 31, rawIntensityMap.getWidth() - 31, rawIntensityMap.getHeight() - 31);
panelRefresh(imagePanel);
tabbedSpectralFiles.setEnabledAt(1, false);
rawSpectraTree.addTreeSelectionListener(new TreeSelectionListener() {
@Override
public void valueChanged(TreeSelectionEvent e) {
try {
DefaultMutableTreeNode selectedNode =
(DefaultMutableTreeNode) rawSpectraTree.getLastSelectedPathComponent();
int rowCount = listTableModel.getRowCount();
for (int l = 0; l < rowCount; l++) {
listTableModel.removeRow(0);
}
updateSpectralTableandChartRAW(selectedNode);
} catch (RserveException e2) {
e2.printStackTrace();
} catch (REXPMismatchException e1) {
e1.printStackTrace();
}
}
});
spectralFilesScrollPane = new JScrollPane();
spectralFilesScrollPane.setViewportView(rawSpectraTree);
spectralFilesScrollPane.setPreferredSize(rawFilesPanel.getSize());
rawFilesPanel.add(spectralFilesScrollPane);
tabbedSpectralFiles.validate();
tabbedSpectralFiles.repaint();
rawImage.setEnabled(true);
peakPickedImage.setEnabled(false);
loadPeakListMenuItem.setEnabled(true); //active now
loadPeaklistsButton.setEnabled(true); //active now
propertiesMenuItem.setEnabled(true); // active now
propertiesButton.setEnabled(true); //active now
} catch (RserveException e1) {
JOptionPane.showMessageDialog(this,
"There was an error in the R connection. Please try again!", "Error",
JOptionPane.ERROR_MESSAGE);
} catch (REXPMismatchException e1) {
JOptionPane.showMessageDialog(this,
"Operation requested is not supported by the given R object type. Please try again!", "Error",
JOptionPane.ERROR_MESSAGE);
}
// hideProgress();
}
}
I tried creating a SwingWorker class, but I am totally confused about how to get all three outputs onto the GUI, plus have a progress bar. It is not complete, and I don't know how to proceed further.
public class FileReadWorker extends SwingWorker<REXP, String>{
private static void failIfInterrupted() throws InterruptedException {
if (Thread.currentThread().isInterrupted()) {
throw new InterruptedException("Interrupted while loading imaging file!");
}
}
// The file that is being read
private final File fileName;
private JTree rawSpectraTree;
private RConnection rc;
private REXP rawSpectrumObjects;
private double[][] averageSpectraMatrix;
private Path currentRelativePath = Paths.get("");
private final String currentPath = currentRelativePath.toAbsolutePath().toString();
final JProgressBar progressBar = new JProgressBar();
// public FileReadWorker(File fileName)
// {
// this.fileName = fileName;
// System.out.println("I am here");
// }
public FileReadWorker(final JProgressBar progressBar, File fileName) {
this.fileName = fileName;
addPropertyChangeListener(new PropertyChangeListener() {
public void propertyChange(PropertyChangeEvent evt) {
if ("progress".equals(evt.getPropertyName())) {
progressBar.setValue((Integer) evt.getNewValue());
}
}
});
progressBar.setVisible(true);
progressBar.setStringPainted(true);
progressBar.setValue(0);
setProgress(0);
}
@Override
protected REXP doInBackground() throws Exception {
System.out.println("I am here... in background");
DefaultMutableTreeNode root = new DefaultMutableTreeNode(fileName);
rawSpectraTree = new JTree(root);
DefaultTreeModel model = (DefaultTreeModel) rawSpectraTree.getModel();
rc = new RConnection();
final String inputFileDirectory = fileName.getParent();
rc.assign("importImagingFile", currentPath.concat("/importImagingFile.R"));
rc.eval("source(importImagingFile)");
rc.assign("currentWorkingDirectory", currentPath);
rc.assign("inputFileDirectory", inputFileDirectory);
rawSpectrumObjects = rc.eval("importImagingFile(inputFileDirectory,currentWorkingDirectory)");
rc.assign("plotAverageSpectra", currentPath.concat("/plotAverageSpectra.R"));
rc.eval("source(plotAverageSpectra)");
rc.assign("rawSpectrumObjects", rawSpectrumObjects);
REXP averageSpectraObject = rc.eval("plotAverageSpectra(rawSpectrumObjects)");
rc.assign("AverageMassSpecObjectToSpectra", currentPath.concat("/AverageMassSpecObjectToSpectra.R"));
rc.eval("source(AverageMassSpecObjectToSpectra)");
rc.assign("averageSpectraObject", averageSpectraObject);
REXP averageSpectra = rc.eval("AverageMassSpecObjectToSpectra(averageSpectraObject)");
averageSpectraMatrix = averageSpectra.asDoubleMatrix();
for (int i = 0; i < rawSpectrumObjects.asList().size(); i++) {
DefaultMutableTreeNode node = new DefaultMutableTreeNode("Spectrum_" + (i + 1));
model.insertNodeInto(node, root, i);
}
// Expand all the nodes of the JTree
for(int i=0;i< model.getChildCount(root);++i){
rawSpectraTree.expandRow(i);
}
return averageSpectra;
}
@Override
public void done() {
setProgress(100);
progressBar.setValue(100);
progressBar.setStringPainted(false);
progressBar.setVisible(false);
}
}
Any help would be very much appreciated.
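For reference, the general shape of the pattern being asked about (a generic sketch, not a complete solution for the R-based parsing above) is to return one small result object from doInBackground() and update all three components in done(), which runs on the EDT:
import javax.swing.*;

// Generic sketch: do the slow parsing off the EDT, then update the three components in done().
class LoadResult {
    JTree tree;              // the parsed file as a tree
    float minMz, maxMz;      // the header information
    JComponent heatMap;      // the generated heat map
}

class FileLoadWorker extends SwingWorker<LoadResult, Void> {
    @Override
    protected LoadResult doInBackground() throws Exception {
        LoadResult r = new LoadResult();
        // ...parse the file / talk to R here, filling r and calling setProgress(...)...
        return r;
    }

    @Override
    protected void done() {
        try {
            LoadResult r = get();  // done() runs on the EDT, so it is safe to touch Swing here
            // ...add r.tree to its scroll pane, fill the two JTextFields, add r.heatMap...
        } catch (Exception e) {
            e.printStackTrace();
        }
        // hide or reset the JProgressBar here
    }
}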

Cassandra exception during live schema change

For our project we need live schema changes. For this purpose we created a test that adds and deletes tables in Cassandra. I'm using 10 threads with the DataStax driver, but I am getting this exception in some threads:
Exception in thread "Thread-7"
com.datastax.driver.core.exceptions.NoHostAvailableException: All
host(s) tried for query failed (tried: localhost/127.0.0.1
(com.datastax.driver.core.exceptions.DriverException: Timeout during
read))
Cassandra is alive. I can still execute queries through cqlsh.
The threads don't fail. A thread adds tables, then suddenly logs this exception, and then adds tables again. The exception can appear several times in one thread.
Each thread has its own session.
Looking at cassandra.yaml:
concurrent_writes and concurrent_reads = 32
Increasing MAX_HEAP_SIZE and HEAP_NEWSIZE in cassandra-env.sh doesn't help.
Cassandra version 2.0.7
Could someone suggest how to deal with it?
public class AddTableThread extends Thread {
private Cluster cluster;
private Session session;
private final String KEYSPACE = "cas";
private final String HOST = "localhost";
private final int PORT = 9042;
private boolean isRunning = true;
// counter for the added tables
private int i = 0;
// Number of the tables that thread has to add
private int fin = -2;
public AddTableThread() {
connect();
}
public AddTableThread setTablesCount(int f) {
this.fin = f;
System.out.println(this.getName() + " : must add " + fin + " tables");
return this;
}
@Override
public void run() {
System.out.println(this.getName() + " : starting thread \n");
while (isRunning) {
String name = createName();
try {
session.execute(createTableQuery(name));
System.out.println(this.getName() + " : table is created "
+ name);
i++;
} catch (com.datastax.driver.core.exceptions.NoHostAvailableException e) {
System.out
.println("\n\n" + this.getName() + "!!!!FAILED!!!!!!");
System.out.println(this.getName() + " : added " + i + " tables");
System.out.println(this.getName() + " : table " + name
+ " wasn't added");
}
try {
Thread.sleep((new Random().nextInt(5000)));
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
if (i == fin) {
isRunning = false;
}
}
close();
log();
}
// creates query that creates table in keyspace with name tableName
private String createTableQuery(String tableName) {
return "CREATE TABLE cas." + tableName
+ "(id timeuuid PRIMARY KEY, text varchar, value int);";
}
public void finish() {
isRunning = false;
}
private void log() {
System.out.println("\n" + this.getName() + " : finish");
System.out.println(this.getName() + " : added " + i + " tables");
System.out.println("_____________________\n");
}
// generates unique table name
private String createName() {
return "cf_" + UUIDGen.getTimeUUID().toString().replace("-", "_");
}
private void connect() {
SocketOptions so = new SocketOptions();
so.setConnectTimeoutMillis(20000);
so.setReadTimeoutMillis(20000);
cluster = Cluster.builder().withSocketOptions(so).addContactPoint(HOST)
.withPort(PORT).build();
session = cluster.connect(KEYSPACE);
}
private void close() {
session.close();
cluster.close();
}
}
Creating tables takes time, so try increasing:
read_request_timeout_in_ms: 5000
and
request_timeout_in_ms: 10000
Also try uncommenting
rpc_max_threads: 2048
Experiment with these parameters.

Multithreaded Server using TCP in Java

I'm trying to implement a simple TCP connection between a client and a server. I made the server multithreaded so that it can either take multiple requests (such as finding the sum, max, or min of a string of numbers provided by the user) from a single client or accept multiple connections from different clients. I'm running both on my machine, but the server doesn't seem to send back an answer. Not sure what I'm doing wrong here:
public final class CalClient {
static final int PORT_NUMBER = 6789;
public static void main (String arg[]) throws Exception
{
String serverName;
@SuppressWarnings("unused")
String strListOfNumbers = null;
int menuIndex;
boolean exit = false;
BufferedReader inFromUser = new BufferedReader(new InputStreamReader(System.in));
System.out.println("Please enter host name...");
System.out.print("> ");
serverName = inFromUser.readLine();
Socket clientSocket = new Socket(serverName, PORT_NUMBER);
DataOutputStream outToServer = new DataOutputStream(clientSocket.getOutputStream());
BufferedReader inFromServer = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
//outToServer.writeBytes(serverName + '\n');
System.out.println("");
System.out.println("Enter 1 to enter the list of numbers");
System.out.println("Enter 2 to perform Summation");
System.out.println("Enter 3 to calculate Maximum");
System.out.println("Enter 4 to calculate Minimum");
System.out.println("Enter 5 to Exit");
while (!exit) {
System.out.print(">");
menuIndex = Integer.parseInt(inFromUser.readLine());
if (menuIndex == 1) {
System.out.println("Please enter the numbers separated by commas.");
System.out.print(">");
strListOfNumbers = inFromUser.readLine();
outToServer.writeBytes("List" + strListOfNumbers);
//continue;
}
else if (menuIndex == 2) {
outToServer.writeBytes("SUM");
System.out.println(inFromServer.readLine());
}
else if (menuIndex == 3) {
outToServer.writeBytes("MAX");
System.out.println(inFromServer.readLine());
}
else if (menuIndex == 4) {
outToServer.writeBytes("MIN");
System.out.println(inFromServer.readLine());
}
else if (menuIndex == 5) {
outToServer.writeBytes("EXIT");
exit = true;
}
}
}
}
public final class CalServer
{
static final int PORT_NUMBER = 6789;
public static void main(String[] args) throws IOException
{
try {
ServerSocket welcomeSocket = new ServerSocket(PORT_NUMBER);
System.out.println("Listening");
while (true) {
Socket connectionSocket = welcomeSocket.accept();
if (connectionSocket != null) {
CalRequest request = new CalRequest(connectionSocket);
Thread thread = new Thread(request);
thread.start();
}
}
} catch (IOException ioe) {
System.out.println("IOException on socket listen: " + ioe);
ioe.printStackTrace();
}
}
}
final class CalRequest implements Runnable
{
Socket socket;
BufferedReader inFromClient;
DataOutputStream outToClient;
TreeSet<Integer> numbers = new TreeSet<Integer>();
int sum = 0;
public CalRequest(Socket socket)
{
this.socket = socket;
}
@Override
public void run()
{
try {
inFromClient = new BufferedReader(new InputStreamReader(socket.getInputStream()));
outToClient = new DataOutputStream(socket.getOutputStream());
while(inFromClient.readLine()!= null) {
processRequest(inFromClient.readLine());
}
} catch (IOException e) {
e.printStackTrace();
}
}
public void processRequest(String string) throws IOException
{
String strAction = string.substring(0,3);
if (strAction.equals("LIS")) {
String strNumbers = string.substring(5);
String[] strNumberArr;
strNumberArr = strNumbers.split(",");
// convert each element of the string array to type Integer and add it to a treeSet container.
for (int i=0; i<strNumberArr.length; i++)
numbers.add(new Integer(Integer.parseInt(strNumberArr[i])));
}
else if (strAction.equals("SUM")) {
@SuppressWarnings("rawtypes")
Iterator it = numbers.iterator();
int total = 0;
while (it.hasNext()) {
total += (Integer)(it.next());
}
}
else if (strAction.equals("MAX")) {
outToClient.writeBytes("The max is: " + Integer.toString(numbers.last()));
}
else if (strAction.equals("MIN")) {
outToClient.writeBytes("The max is: " + Integer.toString(numbers.first()));
}
}
}
Since you are using readLine(), I would guess that you actually need to send line terminators.
My experience with TCP socket communications is with ASCII data exclusively, and I believe my code reflects that. If that's the case for you, you may want to try this:
First, try instantiating your data streams like this:
socket = new Socket (Dest, Port);
toServer = new PrintWriter (socket.getOutputStream(), true);
fromServer = new BufferedReader (new InputStreamReader
(socket.getInputStream()), 8000);
The true at the end of the PrintWriter constructor tells it to auto-flush (lovely term) the buffer when you issue a println.
When you actually use the socket, use the following:
toServer.println (msg.trim());
resp = fromServer.readLine().trim();
I don't have to append the \n to the outgoing text myself, but this may be related to my specific situation (more on that below). The incoming data needs to have a \n at its end or readLine doesn't work. I assume there are ways you could read from the socket byte by byte, but also that the code would not be nearly so simple.
Unfortunately, the TCP server I'm communicating with is a C++ program so the way we ensure the \n is present in the incoming data isn't going to work for you (And may not be needed in the outgoing data).
Finally, if it helps, I built my code based on this web example:
http://content.gpwiki.org/index.php/Java:Tutorials:Simple_TCP_Networking
Edit: I found another code example that uses DataOutputStream... You may find it helpful, assuming you haven't already seen it.
http://systembash.com/content/a-simple-java-tcp-server-and-tcp-client/
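Applied to the code in the question, that simply means terminating every message written with writeBytes with a newline on both ends, roughly like this (fragments reusing the variable names from the question, not a full rewrite):
// Client side: end each request with '\n' so the server's readLine() returns.
outToServer.writeBytes("SUM" + "\n");
System.out.println(inFromServer.readLine());

// Server side: end each reply with '\n' so the client's readLine() returns.
outToClient.writeBytes("The max is: " + Integer.toString(numbers.last()) + "\n");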
