In my application I'm trying to process data in an IMap; the scenario is as follows:
the application receives a request (REST, for example) with a set of keys to be processed
the application processes the entries with the given keys and returns the result - a map where the key is the original key of the entry and the value is the calculated result
For this scenario IMap.executeOnKeys is almost perfect, with one problem - the entry is locked while being processed - and that really hurts throughput. The IMap is populated on startup and never modified.
Is it possible to process entries without locking them? Ideally without sending the entries to another node and without causing network overhead (such as sending 1000 tasks to a single node in a for-loop).
Here is a reference implementation to demonstrate what I'm trying to achieve:
public class Main {
    public static void main(String[] args) throws Exception {
        HazelcastInstance instance = Hazelcast.newHazelcastInstance();
        IMap<String, String> map = instance.getMap("the-map");
        // populated once on startup, never modified
        for (int i = 1; i <= 10; i++) {
            map.put("key-" + i, "value-" + i);
        }
        Set<String> keys = new HashSet<>();
        keys.add("key-1"); // every request may have a different key set, they may overlap
        System.out.println(" ---- processing ----");
        ForkJoinPool pool = new ForkJoinPool();
        // to simulate parallel requests on the same entry
        pool.execute(() -> map.executeOnKeys(keys, new MyEntryProcessor("first")));
        pool.execute(() -> map.executeOnKeys(keys, new MyEntryProcessor("second")));
        System.out.println(" ---- pool is waiting ----");
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
        System.out.println(" ------ DONE -------");
    }

    static class MyEntryProcessor implements EntryProcessor<String, String> {
        private String name;

        MyEntryProcessor(String name) {
            this.name = name;
        }

        @Override
        public Object process(Map.Entry<String, String> entry) {
            System.out.println(name + " is processing " + entry);
            return calculate(entry); // may take some time, doesn't modify entry
        }

        @Override
        public EntryBackupProcessor<String, String> getBackupProcessor() {
            return null;
        }
    }
}
Thanks in advance
In executeOnKeys the entries are not locked. Maybe you mean that the processing happens on partition threads, so that there may be no other processing for that particular key? Anyhow, here's the solution:
Your EntryProcessor should implement both of the following (see the sketch after this list):
Offloadable interface -> this means that the partition thread will be used only for reading the value. The calculation will be done in the offloading thread pool.
ReadOnly interface -> in this case the EP won't hop back onto the partition thread to save the modification you might have made to the entry. Since your EP does not modify entries, this will increase performance.
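A minimal sketch of what that could look like, assuming Hazelcast 3.8+ (where both interfaces live in com.hazelcast.core) and keeping the hypothetical calculate method from the question:

import java.util.Map;

import com.hazelcast.core.Offloadable;
import com.hazelcast.core.ReadOnly;
import com.hazelcast.map.EntryBackupProcessor;
import com.hazelcast.map.EntryProcessor;

public class MyEntryProcessor
        implements EntryProcessor<String, String>, Offloadable, ReadOnly {

    private final String name;

    MyEntryProcessor(String name) {
        this.name = name;
    }

    @Override
    public Object process(Map.Entry<String, String> entry) {
        // runs on the offloading pool; the partition thread is only used to read the value
        return calculate(entry);
    }

    @Override
    public String getExecutorName() {
        // use Hazelcast's built-in offloading executor
        return Offloadable.OFFLOADABLE_EXECUTOR;
    }

    @Override
    public EntryBackupProcessor<String, String> getBackupProcessor() {
        return null; // read-only, so there is nothing to apply on backups
    }

    private Object calculate(Map.Entry<String, String> entry) {
        return entry.getValue(); // placeholder for the real computation
    }
}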
I am using Spring Boot and an H2 database. I have a Product entity, and I want my application to be able to remove a product from the database. My requirement is this: first set the active flag to false (rows with active = false are then ignored during fetching), and after a specific period of time completely remove the row from the db.
@Entity
@Table(name = "products")
public class Product {
    @Id
    @GeneratedValue(generator = "inc")
    @GenericGenerator(name = "inc", strategy = "increment")
    private int id;

    private boolean active = true;

    // getters & setters
}
And here is my method from the service layer responsible for setting the active flag to false and, later, complete deletion (I have nothing that does the second part of my requirement - complete deletion after a specific period of time):
@Transactional
public void deleteProduct(int id) {
    var target = repository.findProductById(id)
            .orElseThrow(() -> new IllegalArgumentException("No product with given id"));
    target.setActive(false);
    // what should I add here to remove the target after a specific time?
}
EDIT
OK, I solved my problem:
@Transactional
public void deleteProduct(int id) {
    var target = repository.findProductByIdAndActiveTrue(id)
            .orElseThrow(() -> new IllegalArgumentException("No product with given id"));
    target.setActive(false);
    // complete removal after 150 seconds
    new Thread(() -> {
        try {
            Thread.sleep(150000);
            repository.deleteById(id);
        } catch (Exception e) {
            logger.error("Error removing the product");
        }
    }).start();
}
But now my question is whether this is a safe solution, as the user may now start too many threads. I think there is a better solution to my problem (safer in terms of multithreading).
I am not an expert, but I think what you are trying to achieve is bad practice.
I believe you should use scheduling instead, for example once per day.
You should update the active value in the db. Set up a schedule that checks the entries on each run and deletes those whose active value is false. Something like this:
public void deleteProduct(int id) {
    // update the active value to false
    repository.updateProductValue(id, false);
}
and your scheduling method:
@Scheduled(fixedRate = 150000)
public void deleteNonActiveProducts() {
    List<Product> products = repository.findAllByActiveFalse();
    products.forEach(product -> repository.deleteById(product.getId()));
}
With this, every 150000 milliseconds you repeat the task, and each execution of the task is independent and not parallel.
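For reference, here is a minimal sketch of the pieces that answer leaves implicit, assuming Spring Data JPA; the derived-query name findAllByActiveFalse and the config class are illustrative:

import java.util.List;

import org.springframework.context.annotation.Configuration;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.scheduling.annotation.EnableScheduling;

public interface ProductRepository extends JpaRepository<Product, Integer> {
    // derived query: all products whose active flag is false
    List<Product> findAllByActiveFalse();
}

// @Scheduled methods are only picked up when scheduling is enabled somewhere in the context
@Configuration
@EnableScheduling
class SchedulingConfig {
}

Note that by default Spring runs @Scheduled methods on a single scheduler thread, which is why executions cannot overlap; fixedRate measures the interval from the start of each run, while fixedDelay measures it from the end of the previous run.
Hope this is useful to you.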
I understand that Near Caches are not guaranteed to be synchronized real-time when the value is updated elsewhere on some other node.
However I do expect it to be in sync with the EntryUpdatedListener that is on the same node and therefore the same process - or am I missing something?
Sequence of events:
A cluster of 1 node modifies the same key/value, flipping the value from X to Y and back to X on a fixed interval of a few seconds.
A client connects to this cluster node and adds an EntryUpdatedListener to observe the flipping value.
Client receives the EntryUpdatedEvent and prints the value given - as expected, it gives the value recently set.
Client immediately does a map.get for the same key (which should hit the near cache), and it prints a STALE value.
I find this strange - it means that two "channels" within the same client process are showing inconsistent versions of data. I would only expect this between different processes.
Below is my reproducer code:
public class ClusterTest {
    private static final int OLD_VALUE = 10000;
    private static final int NEW_VALUE = 88888;
    private static final int KEY = 5;
    private static final int NUMBER_OF_ENTRIES = 10;

    public static void main(String[] args) throws Exception {
        HazelcastInstance instance = Hazelcast.newHazelcastInstance();
        IMap map = instance.getMap("test");
        for (int i = 0; i < NUMBER_OF_ENTRIES; i++) {
            map.put(i, 0);
        }
        System.out.println("Size of map = " + map.size());
        boolean flag = false;
        while (true) {
            int value = flag ? OLD_VALUE : NEW_VALUE;
            flag = !flag;
            map.put(KEY, value);
            System.out.println("Set a value of [" + value + "]: ");
            Thread.sleep(1000);
        }
    }
}
public class ClientTest {
    public static void main(String[] args) throws InterruptedException {
        HazelcastInstance instance = HazelcastClient.newHazelcastClient(
                new ClientConfig().addNearCacheConfig(new NearCacheConfig("test")));
        IMap map = instance.getMap("test");
        System.out.println("Size of map = " + map.size());
        map.addEntryListener(new MyEntryListener(instance), true);
        new CountDownLatch(1).await();
    }

    static class MyEntryListener
            implements EntryAddedListener,
                       EntryUpdatedListener,
                       EntryRemovedListener {

        private HazelcastInstance instance;

        public MyEntryListener(HazelcastInstance instance) {
            this.instance = instance;
        }

        @Override
        public void entryAdded(EntryEvent event) {
            System.out.println("Entry Added:" + event);
        }

        @Override
        public void entryRemoved(EntryEvent event) {
            System.out.println("Entry Removed:" + event);
        }

        @Override
        public void entryUpdated(EntryEvent event) {
            Object o = instance.getMap("test").get(event.getKey());
            boolean equals = o.equals(event.getValue());
            String s = "Event matches what has been fetched = " + equals;
            if (!equals) {
                s += ", EntryEvent value has delivered: " + (event.getValue()) + ", and an explicit GET has delivered:" + o;
            }
            System.out.println(s);
        }
    }
}
The output from the client:
INFO: hz.client_0 [dev] [3.11.1] HazelcastClient 3.11.1 (20181218 - d294f31) is CLIENT_CONNECTED
Jun 20, 2019 4:58:15 PM com.hazelcast.internal.diagnostics.Diagnostics
INFO: hz.client_0 [dev] [3.11.1] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
Size of map = 10
Event matches what has been fetched = true
Event matches what has been fetched = false, EntryEvent value has delivered: 88888, and an explicit GET has delivered:10000
Event matches what has been fetched = true
Event matches what has been fetched = true
Event matches what has been fetched = false, EntryEvent value has delivered: 10000, and an explicit GET has delivered:88888
Near Cache has an Eventual Consistency guarantee, while listeners work in a fire-and-forget fashion; that's why there are two different mechanisms for them. Also, batching of Near Cache invalidation events reduces network traffic and keeps the eventing system less busy (this helps when there are too many invalidations or clients); as a tradeoff, it may increase the delay of individual invalidations. If you are confident that your system can handle each invalidation event, you can disable batching.
You need to configure the batching property on the member side, as the events are generated on cluster members and sent to clients.
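For example, a minimal member-side sketch; hazelcast.map.invalidation.batch.enabled is the Hazelcast 3.x property that controls Near Cache invalidation batching:

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class MemberWithUnbatchedInvalidations {
    public static void main(String[] args) {
        Config config = new Config();
        // send each Near Cache invalidation event immediately instead of batching them
        config.setProperty("hazelcast.map.invalidation.batch.enabled", "false");
        HazelcastInstance member = Hazelcast.newHazelcastInstance(config);
    }
}

The same property can also be passed as a JVM system property (-Dhazelcast.map.invalidation.batch.enabled=false) on the members.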
I am using Akka.net and am looking to implement a reactive equivalent of a 'DDD repository', based on what I have seen here: http://qnalist.com/questions/5585484/ddd-eventsourcing-with-akka-persistence and https://gitter.im/petabridge/akka-bootcamp/archives/2015/06/25
I understand the idea of having a coordinator that keeps a number of actors in memory according to some live in-memory count or some amount of elapsed time.
As a summary (based on the links above) I am trying to:
1. Create an aggregate coordinator (for each actor type) that returns aggregates on request.
2. Have each aggregate use the Context.SetReceiveTimeout method to detect when it has not been used for some period of time. If so, it receives a ReceiveTimeout message.
3. On receipt of the timeout message, have the child send a Passivate message back to the coordinator (which in turn causes the coordinator to shut the child down).
4. While the child is being shut down, have all messages to the child intercepted by the coordinator and buffered.
5. Once shutdown of the child has been confirmed (in the coordinator), if there are buffered messages for that child, recreate it and flush all messages through to the recreated child.
How would one intercept the messages that are being sent to the child (step 4) and instead route them to the parent? In other words, at the point of sending the Passivate message I want the child to also say "hey, don't send me any more messages, send them to my parent instead".
This would save me routing everything through the coordinator (or am I going about it the wrong way - is intercepting messages impossible, and should I instead proxy everything through the parent)?
I have my message contracts:
public class GetActor
{
    public readonly string Identity;

    public GetActor(string identity)
    {
        Identity = identity;
    }
}

public class GetActorReply
{
    public readonly IActorRef ActorRef;

    public GetActorReply(IActorRef actorRef)
    {
        ActorRef = actorRef;
    }
}

public class Passivate // sent from child aggregate to parent coordinator
{
}
The coordinator class, of which there is a unique instance for every aggregate type:
public class ActorLifetimeCoordinator<T> : ReceiveActor where T : ActorBase
{
    protected Dictionary<Identity, IActorRef> Actors = new Dictionary<Identity, IActorRef>();
    protected Dictionary<Identity, List<object>> BufferedMsgs = new Dictionary<Identity, List<object>>();

    public ActorLifetimeCoordinator()
    {
        Receive<GetActor>(message =>
        {
            var actor = GetActor(message.Identity);
            Sender.Tell(new GetActorReply(actor), Self); // reply with the retrieved actor
        });
        Receive<Passivate>(message =>
        {
            var actorToUnload = Context.Sender;
            var task = actorToUnload.GracefulStop(TimeSpan.FromSeconds(10));
            // between the above and below lines, we need to intercept messages to the child that is being
            // removed from memory - how to do this?
            task.Wait(); // don't block the thread - use PipeTo instead?
        });
    }

    protected IActorRef GetActor(string identity)
    {
        IActorRef value;
        return Actors.TryGetValue(identity, out value)
            ? value : Context.System.ActorOf(Props.Create<T>(identity));
    }
}
Aggregate base class from which all aggregates derive:
public abstract class AggregateRoot : ReceivePersistentActor
{
    private readonly DispatchByReflectionStrategy _dispatchStrategy
        = new DispatchByReflectionStrategy("When");

    protected AggregateRoot(Identity identity)
    {
        PersistenceId = Context.Parent.Path.Name + "/" + Self.Path.Name + "/" + identity;
        Recover((Action<IDomainEvent>)Dispatch);
        Command<ReceiveTimeout>(message =>
        {
            Context.Parent.Tell(new Passivate());
        });
        Context.SetReceiveTimeout(TimeSpan.FromMinutes(5));
    }

    public override string PersistenceId { get; }

    private void Dispatch(IDomainEvent domainEvent)
    {
        _dispatchStrategy.Dispatch(this, domainEvent);
    }

    protected void Emit(IDomainEvent domainEvent)
    {
        Persist(domainEvent, success =>
        {
            Dispatch(domainEvent);
        });
    }
}
The easiest (but not simplest) option here is to use the Akka.Cluster.Sharding module, which covers the coordinator pattern with support for actor distribution and balancing across the cluster.
If you decide that you don't need it, unfortunately you'll need to pass messages through the coordinator - the messages themselves need to carry an identifier used to determine the recipient. Otherwise you may end up sending messages to a dead actor.
I have a partitioned in-memory data grid (IMDB). I would like to start a compute task on each node that does some calculation against ALL records held on THE node it executes on, so that each task does a part of the job.
It seems that this kind of colocation is not quite possible, since I cannot restrict access to the data on the node.
Please confirm or suggest a solution.
Sounds like you are asking how to collocate computations with the nodes where the data is cached. You can take a look at the CacheAffinityExample shipped with GridGain. Specifically, the following code snippet:
for (int i = 0; i < KEY_CNT; i++) {
    final int key = i;

    // This callable will execute on the remote node where
    // data with the given key is located.
    grid.compute().affinityCall(CACHE_NAME, key, new GridCallable<String>() {
        @Override public String call() throws Exception {
            String val = cache.get(key);

            // Work on cached value.
            ...

            return val;
        }
    }).get();
}
This code will send a closure to every node and do the calculation against all the data on that node:
grid.forCache("mycache").compute().broadcast(new GridRunnable() {
#Override public void run() {
for (GridCacheEntry<K, V> e : cache.entrySet()) {
// Do something
...
}
}
}).get();
Sorry for the big chunk of code, I couldn't explain it with less. Basically I'm trying to write into a file from many tasks.
Can you guys please tell me what I'm doing wrong? _streamWriter.WriteLine() throws an ArgumentOutOfRangeException.
class Program
{
    private static LogBuilder _log = new LogBuilder();

    static void Main(string[] args)
    {
        var acts = new List<Func<string>>();
        var rnd = new Random();
        for (int i = 0; i < 10000; i++)
        {
            acts.Add(() =>
            {
                var delay = rnd.Next(300);
                Thread.Sleep(delay);
                return "act that that lasted " + delay;
            });
        }
        Parallel.ForEach(acts, act =>
        {
            _log.Log.AppendLine(act.Invoke());
            _log.Write();
        });
    }
}
public class LogBuilder : IDisposable
{
    public StringBuilder Log = new StringBuilder();
    private FileStream _fileStream;
    private StreamWriter _streamWriter;

    public LogBuilder()
    {
        _fileStream = new FileStream("log.txt", FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite);
        _streamWriter = new StreamWriter(_fileStream) { AutoFlush = true };
    }

    public void Write()
    {
        lock (Log)
        {
            if (Log.Length <= 0) return;
            _streamWriter.WriteLine(Log.ToString()); // throws here, although Log.Length is greater than zero
            Log.Clear();
        }
    }

    public void Dispose()
    {
        _streamWriter.Close(); _streamWriter.Dispose(); _fileStream.Close(); _fileStream.Dispose();
    }
}
This is not a bug in StringBuilder, it's a bug in your code. And the modification you showed in your follow-up answer (where you replace Log.ToString() with a loop that extracts characters one at a time) doesn't fix it. It won't throw an exception any more, but it won't work properly either.
The problem is that you're using the StringBuilder in two places in your multithreaded code, and one of them does not attempt to lock it, meaning that reading can occur on one thread simultaneously with writing occurring on another. In particular, the problem is this line:
_log.Log.AppendLine(act.Invoke());
You're doing that inside your Parallel.ForEach. You are not making any attempt at synchronization here, even though this will run on multiple threads at once. So you've got two problems:
Multiple calls to AppendLine may be in progress simultaneously on multiple threads
One thread may attempt to be calling Log.ToString at the same time as one or more other threads are calling AppendLine
You'll only get one read at a time because you are using the lock keyword to synchronize those. The problem is that you're not also acquiring the same lock when calling AppendLine.
Your 'fix' isn't really a fix. You've succeeded only in making the problem harder to see. It will now merely go wrong in different and more subtle ways. For example, I'm assuming that your Write method still goes on to call Log.Clear after your for loop completes its final iteration. Well in between completing that final iteration, and making the call to Log.Clear, it's possible that some other thread will have got in another call to AppendLine because there's no synchronization on those calls to AppendLine.
The upshot is that you will sometimes miss stuff. Code will write things into the string builder that then get cleared out without ever being written to the stream writer.
Also, there's a pretty good chance of concurrent AppendLine calls causing problems. If you're lucky they will crash from time to time. (That's good because it makes it clear you have a problem to fix.) If you're unlucky, you'll just get data corruption from time to time - two threads may end up writing into the same place in the StringBuilder resulting either in a mess, or completely lost data.
Again, this is not a bug in StringBuilder. It is not designed to support being used simultaneously from multiple threads. It's your job to make sure that only one thread at a time does anything to any particular instance of StringBuilder. As the documentation for that class says, "Any instance members are not guaranteed to be thread safe."
Obviously you don't want to hold the lock while you call act.Invoke() because that's presumably the very work you want to parallelize. So I'd guess something like this might work better:
string result = act();
lock (_log.Log)
{
    _log.Log.AppendLine(result);
}
However, if I left it there, I wouldn't really be helping you, because this looks very wrong to me.
If you ever find yourself locking a field in someone else's object, it's a sign of a design problem in your code. It would probably make more sense to modify the design, so that the LogBuilder.Write method accepts a string. To be honest, I'm not even sure why you're using a StringBuilder here at all, as you seem to use it just as a holding area for a string that you immediately write to a stream writer. What were you hoping the StringBuilder would add here? The following would be simpler and doesn't seem to lose anything (other than the original concurrency bugs):
public class LogBuilder : IDisposable
{
    private readonly object _lock = new object();
    private FileStream _fileStream;
    private StreamWriter _streamWriter;

    public LogBuilder()
    {
        _fileStream = new FileStream("log.txt", FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite);
        _streamWriter = new StreamWriter(_fileStream) { AutoFlush = true };
    }

    public void Write(string logLine)
    {
        lock (_lock)
        {
            _streamWriter.WriteLine(logLine);
        }
    }

    public void Dispose()
    {
        _streamWriter.Dispose(); _fileStream.Dispose();
    }
}
I think the cause is that you are accessing the StringBuilder inside the Parallel.ForEach body:
_log.Log.AppendLine(act.Invoke());
_log.Write();
while inside LogBuilder only the Write() call takes the lock(), so the concurrent AppendLine calls still race on the StringBuilder; changing the StreamWriter to handle the log one character at a time only changes the timing of those races.
Segregating the parallel work into a single distinct action would likely reduce the problem:
Parallel.ForEach(acts, act =>
{
    _log.Write(act.Invoke());
});
in the LogBuilder class
private readonly object _lock = new object();

public void Write(string logLines)
{
    lock (_lock)
    {
        //_wr.WriteLine(logLines);
        Console.WriteLine(logLines);
    }
}
An alternate approach is to use TextWriter.Synchronized to wrap StreamWriter.
static void Main(string[] args)
{
    var rnd = new Random();
    var writer = new StreamWriter(@"C:\temp\foo.txt");
    var syncedWriter = TextWriter.Synchronized(writer);

    var tasks = new List<Func<string>>();
    for (int i = 0; i < 1000; i++)
    {
        int local_i = i; // capture a local copy, not a closure reference to i
        tasks.Add(() =>
        {
            var delay = rnd.Next(5);
            Thread.Sleep(delay);
            return local_i.ToString() + " act that that lasted " + delay.ToString();
        });
    }

    Parallel.ForEach(tasks, task =>
    {
        var value = task();
        syncedWriter.WriteLine(value);
    });

    writer.Dispose();
}
Here are some of the synchronization helper classes
http://referencesource.microsoft.com/#q=Synchronized
System.Collections
static ArrayList Synchronized(ArrayList list)
static IList Synchronized(IList list)
static Hashtable Synchronized(Hashtable table)
static Queue Synchronized(Queue queue)
static SortedList Synchronized(SortedList list)
static Stack Synchronized(Stack stack)
System.Collections.Generic
static IList<T> Synchronized(List<T> list)
System.IO
static Stream Synchronized(Stream stream)
static TextReader Synchronized(TextReader reader)
static TextWriter Synchronized(TextWriter writer)
System.Text.RegularExpressions
static Match Synchronized(Match inner)
static Group Synchronized(Group inner)
It seems that it isn't a problem with parallelism; it's a StringBuilder problem.
I have replaced:
_streamWriter.WriteLine(Log.ToString());
with:
for (int i = 0; i < Log.Length; i++)
{
    _streamWriter.Write(Log[i]);
}
And it worked.
For future reference: http://msdn.microsoft.com/en-us/library/system.text.stringbuilder(v=VS.100).aspx
Memory allocation section.