I am writing very large documents (both in size and count) to a Solr index (hundreds of fields: many numeric, some text). I am using Tomcat 7 on Windows 7 x64.
Based on @Maurico's suggestion for indexing millions of documents, I parallelize the write operation (see the code sample below).
The write-to-Solr method is "Task"ed out from a main loop (note: I task it out because the write operation takes too long and holds up the main app).
The problem is that the memory consumption grows uncontrollably; the culprit is the Solr write operations (when I comment them out, the run works fine). How do I handle this issue? Via Tomcat? Or SolrNet?
Thanks for your suggestions.
//main loop:
{
    // ...
    // indexDocsList is the list I create in the main loop and "chunk" out to send to the task.
    List<IndexDocument> indexDocsList = new List<IndexDocument>();
    for (int n = 0; n < N; n++)
    {
        indexDocsList.Add(new IndexDocument { X = 1, Y = 2, ... });
        if (n % 5 == 0) // every 5th time we write to Solr
        {
            var chunk = new List<IndexDocument>(indexDocsList);
            indexDocsList.Clear();
            Task.Factory.StartNew(() => WriteToSolr(chunk)).ContinueWith(task => chunk.Clear());
            GC.Collect();
        }
    }
}
private void WriteToSolr(List<IndexDocument> indexDocsList)
{
    try
    {
        if (indexDocsList == null) return;
        if (indexDocsList.Count <= 0) return;

        int fromInclusive = 0;
        int toExclusive = indexDocsList.Count;
        int subRangeSize = 25;

        // TODO: This is still leaking some serious memory, need to fix this
        ParallelLoopResult results = Parallel.ForEach(
            Partitioner.Create(fromInclusive, toExclusive, subRangeSize), range =>
        {
            _solr.AddRange(indexDocsList.GetRange(range.Item1, range.Item2 - range.Item1));
            _solr.Commit();
        });

        indexDocsList.Clear();
        GC.Collect();
    }
    catch (Exception ex)
    {
        logger.ErrorException("WriteToSolr()", ex);
    }
    finally
    {
        GC.Collect();
    }
}
You are manually committing after each batch; that is the most expensive operation for Solr. In your case, I would recommend an autoCommit every x seconds together with the softAutoCommit feature (Solr 4.0). That should take care of Solr's side of things. You'll also have to tune your JVM garbage-collection options so that you don't get stop-the-world GC pauses.
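For reference, a sketch of what that configuration typically looks like in solrconfig.xml on the server side (the interval values below are illustrative, not tuned for any particular index):

<autoCommit>
    <maxTime>15000</maxTime>            <!-- hard commit every 15 s: flushes to disk -->
    <openSearcher>false</openSearcher>  <!-- don't open a new searcher on hard commits -->
</autoCommit>
<autoSoftCommit>
    <maxTime>1000</maxTime>             <!-- soft commit every 1 s: makes new docs searchable -->
</autoSoftCommit>

With something like this in place, the client can drop the per-batch _solr.Commit() call entirely and let the server decide when to commit.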
Related
I am trying to implement a thread-safe way to count successfully completed tasks; the count will ultimately be bound to a label displayed on the UI. However, when I use the AtomicInteger below, it locks up my UI when the tasks start running; if I remove all AtomicInteger references, everything works fine. Is there a non-blocking, thread-safe way this can be accomplished?
public void handleSomeButtonClick() {
    if (!dataModel.getSomeList().isEmpty()) {
        boolean unlimited = false;
        int count = 0;
        AtomicInteger successCount = new AtomicInteger(0);
        if (countSelector.getValue().equalsIgnoreCase("Unlimited"))
            unlimited = true;
        else
            count = Integer.parseInt(countSelector.getValue());
        while (unlimited || successCount.get() < count) {
            Task task = getSomeTask();
            taskExecutor.submit(task);
            task.setOnSucceeded(event -> {
                if (task.getValue())
                    log.info("Successfully Completed Task | Total Count: " + successCount.incrementAndGet());
                else
                    log.error("Failed task");
            });
        }
    }
}
Your loop waits for a certain number of tasks to be completed. It may even be an infinite loop.
This is not a good idea:
You block the calling thread which seems to be the JavaFX application thread.
You don't have any control over how many tasks are submitted. count could be 3, but since you only schedule the tasks in the loop, 1000 or more tasks could be created and scheduled before the first one completes.
Furthermore, if you use onSucceeded/onFailed, you don't need AtomicInteger or any similar kind of synchronisation, since those handlers all run on the JavaFX application thread.
Your code could be rewritten like this:
private int successCount;

private void scheduleTask(final boolean unlimited) {
    Task task = getSomeTask();
    task.setOnSucceeded(event -> {
        // cannot get a Boolean from a raw task, so I assume the task is successful iff no exception happens
        successCount++;
        log.info("Successfully Completed Task | Total Count: " + successCount);
        if (unlimited) {
            // submit a new task, if the number of tasks is unlimited
            scheduleTask(true);
        }
    });
    // submit a new task on failure
    task.setOnFailed(evt -> scheduleTask(unlimited));
    taskExecutor.submit(task);
}

public void handleSomeButtonClick() {
    if (!dataModel.getSomeList().isEmpty()) {
        successCount = 0;
        final boolean unlimited;
        final int count;
        if (countSelector.getValue().equalsIgnoreCase("Unlimited")) {
            unlimited = true;
            count = 4; // limit on the number of tasks submitted to the executor at the same time
        } else {
            count = Integer.parseInt(countSelector.getValue());
            unlimited = false;
        }
        for (int i = 0; i < count; i++) {
            scheduleTask(unlimited);
        }
    }
}
Note: This code runs the risk of handleSomeButtonClick being invoked again before the previously scheduled tasks have completed. You should either prevent scheduling new tasks while old ones are still running, or use some reference type containing an int for the count, create that object in handleSomeButtonClick, and pass it to scheduleTask.
Your UI locks up because you do the counting (successCount.get() < count) on the FX application thread. It is also not clear why you keep submitting tasks in the while loop.
Which one do you want to do? (1) Start X (e.g. 10) tasks and count how many succeed, or (2) just keep starting new tasks and watch the count go up.
If (2), run the whole while loop in a background thread and update the UI in Platform.runLater().
If (1), use Future / CompletableFuture, or a more powerful Future from a third-party package like Vavr.
Your problem is that future.get() blocks and waits for the result.
This becomes simple if you use the Vavr library,
because you can attach code to its Future that runs automatically on success or failure,
so you don't have to wait.
Here is an example using Vavr's Future.
CheckedFunction0<String> thisIsATask = () -> {
    if ( /* do something */ ) {
        throw new Exception("Hey");
    }
    return "ABC";
};

List<Future<String>> futureList = new ArrayList<>();
for (int x = 0; x < 10; x++) {
    futureList.add(Future.of(getExecutorService(), thisIsATask));
}

futureList.forEach((task) -> {
    // This will run if it succeeds
    task.onSuccess(s -> {
        if (s.equals("ABC"))
            Platform.runLater(() -> UpdateCounter());
        else
            wtf();
    });
    // You get the exception if it fails
    task.onFailure(e -> e.printStackTrace());
    // task.onComplete() will run in any case when the task completes
});
This is not blocking: the code in onSuccess, onFailure, or onComplete runs when the task finishes or an exception is caught.
Note: Future.of uses the ExecutorService you pass in to run each task on a new thread, and the code you provide in onSuccess continues to run on that thread once the task is done, so if you are calling into JavaFX, remember Platform.runLater().
Also, if you want to run something when all tasks are finished:
// the code in onComplete will run when all tasks are done
Future<Seq<String>> all = Future.sequence(futureList);
all.onComplete((i) -> this.btnXYZ.setDisable(false));
I have a two-box cluster running Hazelcast 3.6.
I am trying to run some very basic tests to get a feel for performance. Gets, puts and an aggregation across a map of Integers are all seriously slow.
I am seeing about 2.6 get or put operations per second. Any idea why this might be?
This is my test code:
public static void main(String[] args) {
    ClientConfig clientConfig = new ClientConfig();
    clientConfig.getGroupConfig().setName("dev").setPassword("dev-pass");
    clientConfig.getNetworkConfig().addAddress("10.90.288.14", "10.90.288.15");
    HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
    IMap<Integer, Integer> map = client.getMap("int-map");

    for (int i = 0; i < 10000; i++) {
        map.put(i, i);
        System.out.println(i);
    }

    for (int i = 0; i < 10000; i++) {
        map.get(i);
        System.out.println(i);
    }

    System.out.println("Summed:" + map.values().stream().mapToInt(Integer::intValue).sum());
    client.shutdown();
    System.out.println("Finished");
}
The first thing I would fix is to remove the System.out.println on every operation, since it adds significant overhead. If you really want to log something, log only occasionally, e.g.:
if (i % 1000 == 0) System.out.println(i);
You also need a much longer run to say anything sensible: with 10k operations you have not even warmed up the JIT, so run at least 1M iterations.
I would also calculate the throughput directly by storing the time before the test starts and after it ends. Then you get something like:
long duration = endMs - startMs;
double throughput = (1000d * iterations) / duration;
I'm working on a large project that relies heavily on Web Audio and ScriptProcessorNodes. After some recent intermittent crashing, I've tracked the problem down to memory leaked by the ScriptProcessorNodes. I've read many tutorials, guides, bug reports, etc., and none of it seems to help. Here's a small toy example:
http://jsfiddle.net/6YBWf/
var context = new webkitAudioContext();

function killNode(node)
{
    return function()
    {
        node.disconnect();
        node.onaudioprocess = null;
        node = null;
    };
}

function noise()
{
    var node = context.createScriptProcessor(1024, 0, 1);
    node.onaudioprocess = function(e)
    {
        var output = e.outputBuffer.getChannelData(0);
        for(var i = 0; i < 1024; ++i)
        {
            output[i] = (Math.random() * 2 - 1) * 0.001;
        }
    };
    node.connect(context.destination);
    setTimeout(killNode(node), 100);
}

function generateNoise()
{
    for(var i = 0; i < 99999; ++i)
    {
        noise();
    }
}

generateNoise();
This spins up many nodes, then disconnects them and sets their onaudioprocess to null. From what I've read, given that I'm not retaining any references to them, shouldn't they get garbage collected?
My computer's memory usage jumps up to about 16% and settles down to 14% a bit later, but never goes below that. Can anyone show me an example similar to this where the nodes do get collected properly? Is there something obvious I'm missing?
This has been confirmed as a regression in Chrome:
https://code.google.com/p/chromium/issues/detail?id=379753
Let me set up this question with some background information: we have a long-running process that generates data in a Windows Form. Obviously, some form of multithreading is going to be needed to keep the form responsive, but we also have the requirement that the form updates as many times per second as possible while still remaining responsive.
Here is a simple test example using background worker thread:
void bw_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
    int reportValue = (int)e.UserState;
    label1.Text = reportValue.ToString();
    // We can put this.Refresh() here to force a repaint, which gives us frequent repaints,
    // but we lose all other responsiveness with the control.
}

void bw_DoWork(object sender, DoWorkEventArgs e)
{
    for (int x = 0; x < 100000; x++)
    {
        // We could put Thread.Sleep here, but we won't get the highest-frequency updates.
        bw.ReportProgress(0, x);
    }
}
Please see the comments in the code. Also, please don't question why I want this. The question is simple: how do we achieve the highest fidelity (the most repaints) when updating the form while maintaining responsiveness? Forcing the repaint does give us updates, but we don't process Windows messages.
I have also tried placing DoEvents, but that produces a stack overflow. What I need is some way to say, "process any Windows messages if you haven't lately". I can also see that maybe a slightly different pattern is needed to achieve this.
It seems we need to handle a few issues:
Updating the form from the non-UI thread. There are quite a few solutions to this problem, such as Invoke, SynchronizationContext, or the BackgroundWorker pattern.
The second problem is flooding the form with too many updates, which blocks message processing; this is the issue my question really concerns. In most examples this is handled trivially by slowing down the requests with an arbitrary wait, or by only updating every X%. Neither of these solutions is appropriate for real-world applications, nor do they meet the maximum-updates-while-responsive criterion.
Some of my initial ideas on how to handle this:
Queue the items in the background worker and then dispatch them on the UI thread. This ensures every item is painted, but results in lag, which we don't want.
Perhaps use TPL
Perhaps use a timer in the UI thread to set a refresh rate. That way we can grab the data at the fastest rate we can process. It will require accessing/sharing data across threads.
Update: I've switched to using a Timer that reads a shared variable updated by the background worker thread. For some reason this method produces good form responsiveness and also lets the background worker update about 1,000x as fast; interestingly, though, it is only accurate to 1 millisecond.
So we should be able to change the pattern to read the current time and issue the updates from the bw thread, without the need for the timer.
Here is the new pattern:
//Timer setup
{
    RefreshTimer.SynchronizingObject = this;
    RefreshTimer.Elapsed += RefreshTimer_Elapsed;
    RefreshTimer.AutoReset = true;
    RefreshTimer.Start();
}

void bw_DoWork(object sender, DoWorkEventArgs e)
{
    for (int x = 0; x < 1000000000; x++)
    {
        //bw.ReportProgress(0, x);
        //mUiContext.Post(UpdateLabel, x);
        SharedX = x;
    }
}

void RefreshTimer_Elapsed(object sender, System.Timers.ElapsedEventArgs e)
{
    label1.Text = SharedX.ToString();
}
Update: And here we have the new solution that doesn't require the timer and doesn't block the thread! We achieve high performance in the calculations and high fidelity on the updates with this pattern. Unfortunately, Environment.TickCount is only accurate to 1 ms; however, we can run a batch of X updates per millisecond to get timing finer than 1 ms.
void bw_DoWork(object sender, DoWorkEventArgs e)
{
    long lastTickCount = Environment.TickCount;
    for (int x = 0; x < 1000000000; x++)
    {
        if (Environment.TickCount - lastTickCount > 1)
        {
            bw.ReportProgress(0, x);
            lastTickCount = Environment.TickCount;
        }
    }
}
There is little point in trying to report progress any faster than the user can keep track of it.
If your background thread is posting messages faster than the GUI can process them (and you have all the symptoms of this: poor GUI response to user input, DoEvents runaway recursion), you have to throttle the progress updates somehow.
A common approach is to update the GUI from a main-thread forms timer, at a rate small enough that the user sees an acceptable progress readout. You may need a mutex or critical section to protect shared data, though that may not be necessary if the progress value being monitored is an int/uint.
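A sketch of that timer approach, assuming the progress value fits in a single int (reads and writes of an int are atomic in .NET, so no lock is needed here; the names _progress and refreshTimer_Tick are illustrative):

public partial class MainForm : Form
{
    volatile int _progress;                  // written by the worker, read by the UI

    void bw_DoWork(object sender, DoWorkEventArgs e)
    {
        for (int x = 0; x < 1000000000; x++)
            _progress = x;                   // no marshaling, no message flood
    }

    // A System.Windows.Forms.Timer ticking on the UI thread, e.g. every 50 ms.
    void refreshTimer_Tick(object sender, EventArgs e)
    {
        label1.Text = _progress.ToString();
    }
}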
An alternative is to strangle the thread by forcing it to block on an event or semaphore until the GUI is idle.
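A minimal sketch of that alternative, assuming a BackgroundWorker as in the question (the _uiReady name is illustrative): the worker cannot publish a new value until the GUI has painted the previous one.

// starts signaled so the worker can publish the first value
AutoResetEvent _uiReady = new AutoResetEvent(true);

void bw_DoWork(object sender, DoWorkEventArgs e)
{
    for (int x = 0; x < 1000000000; x++)
    {
        _uiReady.WaitOne();          // block until the GUI has consumed the last update
        bw.ReportProgress(0, x);
    }
}

void bw_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
    label1.Text = e.UserState.ToString();
    _uiReady.Set();                  // let the worker publish the next value
}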
The UI thread should not be held for more than 50ms by a CPU-bound operation taking place on it ("The 50ms Rule"). Usually, the UI work items are executed upon events, triggered by user input, completion of an IO-bound operation or a CPU-bound operation offloaded to a background thread.
However, there are some rare cases when the work needs to be done on the UI thread. For example, you may need to poll a UI control for changes because the control doesn't expose a proper onchange-style event. In particular, this applies to the WebBrowser control (DOM Mutation Observers are only being introduced, and IHTMLChangeSink doesn't always work reliably, in my experience).
Here is how it can be done efficiently, without blocking the UI thread's message queue. A few key things were used to make this happen:
The UI work task yields (via Application.Idle) to process any pending messages.
GetQueueStatus is used to decide whether or not to yield.
Task.Delay is used to throttle the loop, similar to a timer event. This step is optional if the polling needs to be as precise as possible.
async/await provides pseudo-synchronous, linear code flow.
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace WinForms_21643584
{
    public partial class MainForm : Form
    {
        EventHandler ContentChanged = delegate { };

        public MainForm()
        {
            InitializeComponent();
            this.Load += MainForm_Load;
        }

        // Update UI task
        async Task DoUiWorkAsync(CancellationToken token)
        {
            try
            {
                var startTick = Environment.TickCount;
                var editorText = this.webBrowser.Document.Body.InnerText;
                while (true)
                {
                    // observe cancellation
                    token.ThrowIfCancellationRequested();
                    // throttle (optional)
                    await Task.Delay(50);
                    // yield to keep the UI responsive
                    await ApplicationExt.IdleYield();
                    // poll the content for changes
                    var newEditorText = this.webBrowser.Document.Body.InnerText;
                    if (newEditorText != editorText)
                    {
                        editorText = newEditorText;
                        this.status.Text = "Changed on " + (Environment.TickCount - startTick) + "ms";
                        this.ContentChanged(this, EventArgs.Empty);
                    }
                }
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message);
            }
        }

        async void MainForm_Load(object sender, EventArgs e)
        {
            // navigate the WebBrowser
            var documentTcs = new TaskCompletionSource<bool>();
            this.webBrowser.DocumentCompleted += (sIgnore, eIgnore) => documentTcs.TrySetResult(true);
            this.webBrowser.DocumentText = "<div style='width: 100%; height: 100%' contentEditable='true'></div>";
            await documentTcs.Task;

            // cancel updates in 20 s
            var cts = new CancellationTokenSource(20000);
            // start the UI update
            var task = DoUiWorkAsync(cts.Token);
        }
    }

    // Yield via Application.Idle
    public static class ApplicationExt
    {
        public static Task<bool> IdleYield()
        {
            var idleTcs = new TaskCompletionSource<bool>();
            if (IsMessagePending())
            {
                // register for Application.Idle
                EventHandler handler = null;
                handler = (s, e) =>
                {
                    Application.Idle -= handler;
                    idleTcs.SetResult(true);
                };
                Application.Idle += handler;
            }
            else
                idleTcs.SetResult(false);
            return idleTcs.Task;
        }

        public static bool IsMessagePending()
        {
            // The high-order word of the return value indicates the types of messages currently in the queue.
            return 0 != (GetQueueStatus(QS_MASK) >> 16 & QS_MASK);
        }

        const uint QS_MASK = 0x1FF;

        [System.Runtime.InteropServices.DllImport("user32.dll")]
        static extern uint GetQueueStatus(uint flags);
    }
}
This code is specific to WinForms. Here is a similar approach for WPF.
I'm experiencing an issue managing threads on .NET 4.0 C#, and my knowledge of threads is not sufficient to solve it, so I'm posting it here hoping that somebody can give me some advice.
The scenario is the following:
We have a Windows service in C# (.NET 4.0) that (1) connects via socket to a server to get a .PCM file, (2) converts it to a .WAV file, (3) sends it via email (SMTP), and finally (4) notifies the initial server that it was sent successfully.
The server where the service is installed has 8 processors and 8 GB of RAM.
To allow multiprocessing I built the service with 4 threads, each of which performs one of the tasks I mentioned previously.
In the code I have classes and methods for each task, so I create threads and invoke methods as follows:
Thread eachThread = new Thread(object.PerformTask);
Inside each method there is a while loop that checks whether the socket connection is alive, and continues fetching or processing data depending on its purpose.
while (_socket.Connected)
{
    //perform task
}
The problem is that as more services are installed (the same Windows service is replicated and pointed at two endpoints on the server to get the files via socket), CPU consumption increases dramatically. Each service keeps running and processing files, but at some point the CPU consumption gets so high that the server just collapses.
The question is: what would you suggest to handle this scenario? In general terms, what would be a good approach to handling these highly demanding processing tasks, so that the server doesn't collapse under CPU consumption?
Thanks.
PS.: If anybody needs more details on the scenario, please let me know.
Edit 1
By CPU collapse I mean that the server becomes so slow that we have to restart it.
Edit 2
Here I post some part of the code so you can get an idea of how it's programmed:
while (true)
{
    //starting the service
    try
    {
        IPEndPoint endPoint = conn.SettingConnection();
        string id = _objProp.Parametros.IdApp;
        using (socket = conn.Connect(endPoint))
        {
            while (!socket.Connected)
            {
                _log.SetLog("INFO", "Conectando socket...");
                socket = conn.Connect(endPoint);
                //if the connection failed, wait 5 seconds for a new try
                if (!socket.Connected)
                {
                    Thread.Sleep(5000);
                }
            }
            proInThread = new Thread(proIn.ThreadRun);
            conInThread = new Thread(conIn.ThreadRun);
            conOutThread = new Thread(conOut.ThreadRun);
            proInThread.Start();
            conInThread.Start();
            conOutThread.Start();
            proInThread.Join();
            conInThread.Join();
            conOutThread.Join();
        }
    }
}
Edit 3
Thread 1
while (_socket.Connected)
{
    try
    {
        var conn = new AppConection(ref _objPropiedades);
        try
        {
            string message = conn.ReceiveMessage(_socket);
            lock (((ICollection)_queue).SyncRoot)
            {
                _queue.Enqueue(message);
                _syncEvents.NewItemEvent.Set();
                _syncEvents.NewResetEvent.Set();
            }
            lock (((ICollection)_total_rec).SyncRoot)
            {
                _total_rec.Add("1");
            }
        }
        catch (SocketException ex)
        {
            //log exception
        }
        catch (IndexOutOfRangeException ex)
        {
            //log exception
        }
        catch (Exception ex)
        {
            //log exception
        }
        //message received
    }
    catch (Exception ex)
    {
        //logging error
    }
}
//release ANY instance that could be using memory
_socket.Dispose();
log = null;
Thread 2
while (_socket.Connected)
{
    try
    {
        _syncEvents.NewItemEventOut.WaitOne();
        if (_socket.Connected)
        {
            lock (((ICollection)_queue).SyncRoot)
            {
                total_queue = _queue.Count();
            }
            int i = 0;
            while (i < total_queue)
            {
                //EMail Emails;
                string mail = "";
                lock (((ICollection)_queue).SyncRoot)
                {
                    mail = _queue.Dequeue();
                    i = i + 1;
                }
                try
                {
                    conn.SendMessage(_socket, mail);
                    _syncEvents.NewResetEvent.Set();
                }
                catch (SocketException ex)
                {
                    //log exception
                }
            }
        }
        else
        {
            //log exception
            _syncEvents.NewAbortEvent.Set();
            Thread.CurrentThread.Abort();
        }
    }
    catch (InvalidOperationException e)
    {
        //log exception
    }
    catch (Exception e)
    {
        //log exception
    }
}
//release ANY instance that could be using memory
_socket.Dispose();
conn = null;
log = null;
Thread 3
while (_socket.Connected)
{
    int total_queue = 0;
    try
    {
        _syncEvents.NewItemEvent.WaitOne();
        lock (((ICollection)_queue).SyncRoot)
        {
            total_queue = _queue.Count();
        }
        int i = 0;
        while (i < total_queue)
        {
            if (mgthreads.GetThreatdAct() < mgthreads.GetMaxThread())
            {
                string message = "";
                lock (((ICollection)_queue).SyncRoot)
                {
                    message = _queue.Dequeue();
                    i = i + 1;
                }
                count++;
                lock (((ICollection)_queueO).SyncRoot)
                {
                    app.SetParameters(_socket, _id, message, _queueO, _syncEvents, _total_Env, _total_err);
                }
                Thread producerThread = new Thread(app.ThreadJob)
                {
                    Name = "ProducerThread_" + DateTime.Now.ToString("ddMMyyyyhhmmss"),
                    Priority = ThreadPriority.AboveNormal
                };
                producerThread.Start();
                producerThread.Join();
                mgthreads.IncThreatdAct(producerThread);
            }
            mgthreads.DecThreatdAct();
        }
        mgthreads.DecThreatdAct();
    }
    catch (InvalidOperationException e)
    {
    }
    catch (Exception e)
    {
    }
    Thread.Sleep(500);
}
//release ANY instance that could be using memory
_socket.Dispose();
app = null;
log = null;
mgthreads = null;
Thread 4
MessageVO mesVo = fac.ParseMessageXml(_message);
I would lower the thread priority and have all threads pass through a semaphore that limits concurrency to Environment.ProcessorCount. This is not a perfect solution, but it sounds like it is enough in this case, and it is an easy fix.
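A minimal sketch of that suggestion, assuming .NET 4.0's SemaphoreSlim (the names WorkGate and PerformTask are illustrative, not taken from the code above):

// One gate shared by all worker threads in the process.
static readonly SemaphoreSlim WorkGate = new SemaphoreSlim(Environment.ProcessorCount);

void PerformTask()
{
    // lower the priority so workers cannot starve the rest of the system
    Thread.CurrentThread.Priority = ThreadPriority.BelowNormal;
    while (_socket.Connected)
    {
        WorkGate.Wait();             // at most ProcessorCount threads get past this point
        try
        {
            // fetch / convert / send one unit of work here
        }
        finally
        {
            WorkGate.Release();
        }
    }
}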
Edit: Thinking about it, you would have to fold the 10 services into one single process, because otherwise you won't have centralized control over the threads that are running. If you have 10 independent processes, they cannot coordinate.
There should normally be no collapse because of high CPU usage. While any of the threads is waiting for something remote to happen (for instance, for the remote server to respond to a request), that thread uses no CPU at all. But while it is actually doing something, it uses CPU accordingly. In the tasks you mentioned there is no inherently high CPU usage (saving a PCM file as WAV requires no complex algorithm), so the high CPU usage seems to be a sign of an error in the program.
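To illustrate the difference (an illustrative snippet, not taken from the code above): a loop that polls without blocking burns a full core, while a blocking wait costs essentially nothing.

// Busy polling: the loop spins and consumes ~100% of one core while it waits.
while (!done) { }

// Blocking wait: the thread is parked by the OS and uses no CPU until signaled.
doneEvent.WaitOne();

Here done is a hypothetical volatile bool flag and doneEvent a hypothetical ManualResetEvent; the second form is what keeps a waiting thread cheap.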