I have run some tests using setStoreByValue(true/false) and I cannot see any difference.
I expected to be able to store far more than 30 objects in the cache when using store by reference.
CacheManager manager = Caching.getCachingProvider().getCacheManager();
MutableConfiguration<String, CaLpgDataCollectionDto<CaBigNumber>> configuration = new MutableConfiguration<String, CaLpgDataCollectionDto<CaBigNumber>>();
configuration.setStoreByValue(false);
Cache<String, CaLpgDataCollectionDto<CaBigNumber>> testCache = manager.createCache("testCache", configuration);
//ICache is a Hazelcast interface that extends JCache, provides more functionality
ICache<String, CaLpgDataCollectionDto<CaBigNumber>> icache = testCache.unwrap(ICache.class);
List<CaLpgDataRowDto<CaBigNumber>> bigList = lpgDatasource.getDataRows();
while (bigList.size() <= 5000000)
{
bigList.addAll(bigList);
}
lpgDatasource.setDataRows(bigList);
System.out.println("Free memory before (bytes): " + Runtime.getRuntime().freeMemory());
for (int i = 0; i < 30 ; i++)
{
icache.put("objectTest"+i, lpgDatasource);
}
Am I using this property correctly?
Kind regards.
The JSR107 standard specifies that store-by-reference is an optional feature (see page 9 of the JSR107 1.1.1 specification). You can query a CachingProvider to test whether an optional feature is supported via CachingProvider#isSupported(OptionalFeature).
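For example, a minimal check using only the standard JSR107 API might look like this (a sketch; the class name is just for illustration):
import javax.cache.Caching;
import javax.cache.configuration.OptionalFeature;
import javax.cache.spi.CachingProvider;

public class StoreByReferenceCheck {
    public static void main(String[] args) {
        // Ask whichever provider is on the classpath whether it implements the optional feature
        CachingProvider provider = Caching.getCachingProvider();
        boolean supported = provider.isSupported(OptionalFeature.STORE_BY_REFERENCE);
        System.out.println("store-by-reference supported: " + supported);
    }
}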
Hazelcast, being primarily used as a distributed cache, does not support store by reference. Keys and values have to be serializable so they can be transferred across the network among Hazelcast members and clients. Also, depending on the chosen in-memory storage format, values are stored either as serialized blobs (the default, with the BINARY in-memory format) or as deserialized objects; however, even in the latter case the value is serialized and deserialized first, so it is a clone of the original.
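If you want values kept as deserialized objects on the members, the in-memory format is configured on the Hazelcast side rather than through the MutableConfiguration. A rough sketch using programmatic member configuration (the cache name mirrors the question; how this config reaches your CachingProvider depends on your setup):
import com.hazelcast.config.CacheSimpleConfig;
import com.hazelcast.config.Config;
import com.hazelcast.config.InMemoryFormat;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class ObjectFormatCacheSetup {
    public static void main(String[] args) {
        // Pre-declare the cache with OBJECT format: values are kept deserialized on the member,
        // but each put/get still goes through serialization, so you always get a clone.
        CacheSimpleConfig cacheConfig = new CacheSimpleConfig();
        cacheConfig.setName("testCache");
        cacheConfig.setInMemoryFormat(InMemoryFormat.OBJECT);

        Config config = new Config();
        config.addCacheConfig(cacheConfig);
        HazelcastInstance member = Hazelcast.newHazelcastInstance(config);
    }
}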
Some of the Cytoscape layouts are randomized, so node positions are not fixed every time we launch the app. I understand from multiple Stack Overflow questions that we can save the layout data (i.e. including the x and y positions) into browser local storage or session storage so that we can display the same layout using the same data.
However, local storage or session storage only works well for a single user. Imagine thousands of users using the same app: the server would undergo mass computation for each user just to store the respective data in individual browsers. Can we save the data in a file format directly on the app/web server, so that all 1000 users see the same layout? This would also reduce the computation of different data sets.
Thank you. I would like to know whether it is possible to convert the data into a file and store it on the web/app server.
Yes, you can store position data. Actually, there are two options that come to mind.
Use cy.json(). You can serialize the elements with JSON.stringify(cy.json().elements) and then save that JSON string.
cy.json().elements contains the full JSON description of every element (its data, position, selection state, classes, and so on).
You can restore this data easily with cy.json({elements: JSON.parse(jsonStr)});
As you can see, cy.json().elements is fairly large; in addition to positions it contains a lot of other data, while position data itself is just a small object like {x: 0, y: 0}. So if you only need to restore the positions, you can easily store them yourself with code like the one below, using the node.id() and node.position() functions.
function storePositions() {
const nodes = cy.nodes();
const nodePositions = {};
for (let i = 0; i < nodes.length; i++) {
nodePositions[nodes[i].id()] = nodes[i].position();
}
return nodePositions;
}
You can also restore node positions easily, using the cy.getElementById() and node.position() functions.
function restorePositions(nodePositions) {
  for (let k in nodePositions) {
    const node = cy.getElementById(k);
    if (node && node.length > 0) {
      node.position(nodePositions[k]);
    }
  }
}
I've been trying to figure this out for the past day or two with minimal results. Essentially, I want to send my selected comps in After Effects to Adobe Media Encoder via script and, using information about them (substrings of their comp names, width, etc., all of which I already have figured out), specify the appropriate AME preset based on the conditions met. The two methods I've found so far won't work for what I'm trying to do:
https://www.youtube.com/watch?v=K8_KWS3Gs80
https://blogs.adobe.com/creativecloud/new-changed-after-effects-cc-2014/?segment=dva
Both of these options more or less rely on the output module/render queue (the first option allows sending to AME without specifying a preset), which, at least to my knowledge, no longer allows H.264 file types (unless you can somehow trick the render queue with a pre-created set of settings before pushing the queue to AME?).
Another option I've found uses BridgeTalk to bypass the output module/render queue and go directly to AME... BUT that primarily involves specifying a file (rather than the currently selected comps) and requires having ONLY a single comp (to be rendered) at the root level of the project: https://community.adobe.com/t5/after-effects/app-project-renderqueue-queueiname-true/td-p/10551189?page=1
As far as code goes, here's the relevant, non-working portion:
function render_comps(){
var mySelectedItems = [];
for (var i = 1; i <= app.project.numItems; i++){
if (app.project.item(i).selected)
mySelectedItems[mySelectedItems.length] = app.project.item(i);
}
for (var i = 0; i < mySelectedItems.length; i++){
var mySelection = mySelectedItems[i];
//~ front = app.getFrontend();
//~ front.addItemToBatch(mySelection);
//~ enc = eHost.createEncoderForFormat("H.264");
//~ flag = enc.loadPreset("HD 1080i 25");
//app.getFrontend().addItemToBatch(mySelection);
var bt = new BridgeTalk();
bt.appName = "ame";
bt.target = "ame";
//var message = "alert('Hello')";
//bt.body = message;
bt.body="app.getFrontend().addCompToBatch(mySelection)";
bt.send();
}
}
That code encapsulates a number of different attempts and approaches I've tried.
I've spent about 4-5 hours trying to scour the internet and various resources but so far have come up short. Thanks in advance for the help!
I have a list of accounts and perform a hash join with ticks, returning the accounts enriched with tick data. After the hash join I drainTo an IListJet, then read it with DistributedStream and return it.
public List<Account> populateTicksInAccounts(List<Account> accounts) {
...
...
Pipeline p = Pipeline.create();
BatchSource<Tick> ticksSource = Sources.list(TICKS_LIST_NAME);
BatchSource<Account> accountSource = Sources.fromProcessor(AccountProcessor.of(accounts));
p.drawFrom(ticksSource)
.hashJoin(p.drawFrom(accountSource), JoinClause.joinMapEntries(Tick::getTicker), accountMapper())
.drainTo(Sinks.list(TEMP_LIST));
jet.newJob(p).join();
IListJet<Account> list = jet.getList(TEMP_LIST);
return DistributedStream.fromList(list).collect(DistributedCollectors.toIList());
}
Is it possible to drainTo a plain java List instead of an IListJet after performing the hash join?
Is something like the below possible?
List<Account> accountWithTicks = new ArrayList<>();
p.drawFrom(ticksSource)
.hashJoin(p.drawFrom(accountSource), JoinClause.joinMapEntries(Tick::getTicker), accountMapper())
.drainTo(<CustomSinkProcessor(accountWithTicks)>);
return accountWithTicks;
where CustomSinkProcessor would take the empty Java list and fill it with the accounts?
Keep in mind that the code you submit to Jet for execution runs outside the process where you submit it from. While it would be theoretically possible to provide the API you're asking for, under the hood it would just have to perform some tricks to run the code on each member of the cluster, let all members send their results to one place, and fill up a list to return to you. It would go against the nature of distributed computing.
If you think it will help the readability of your code, you can write a helper method such as this:
public <T, R> List<R> drainToList(GeneralStage<T> stage) {
String tmpListName = randomListName();
SinkStage sinkStage = stage.drainTo(Sinks.list(tmpListName));
IListJet<R> tmpList = jet.getList(tmpListName);
try {
jet.newJob(sinkStage.getPipeline()).join();
return new ArrayList<>(tmpList);
} finally {
tmpList.destroy();
}
}
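For completeness, calling it from something like your populateTicksInAccounts would then look roughly like this (a sketch that reuses the names from your snippet and assumes the helper lives in a class with access to the jet instance):
Pipeline p = Pipeline.create();
BatchSource<Tick> ticksSource = Sources.list(TICKS_LIST_NAME);
BatchSource<Account> accountSource = Sources.fromProcessor(AccountProcessor.of(accounts));

// drainToList runs the job, copies the temporary IList into a plain ArrayList and destroys it
List<Account> accountsWithTicks = drainToList(
        p.drawFrom(ticksSource)
         .hashJoin(p.drawFrom(accountSource),
                   JoinClause.joinMapEntries(Tick::getTicker),
                   accountMapper()));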
Especially note the line
return new ArrayList<>(tmpList);
as opposed to your
IListJet<Account> list = jet.getList(TEMP_LIST);
return DistributedStream.fromList(list).collect(DistributedCollectors.toIList());
This just copies one Hazelcast list to another one and returns a handle to it. Now you have leaked two lists in the Jet cluster. They don't automatically disappear when you stop using them.
Even the code I provided can still be leaky. The JVM process that runs it can die during Job.join() without reaching finally. Then the temporary list lingers on.
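If that ever happens and you know (or have logged) the temporary list's name, you can still clean it up manually later, e.g.:
// tmpListName is whatever randomListName() produced for that run
jet.getList(tmpListName).destroy();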
No, it's not, due to the distributed nature of Jet: the sink executes in multiple parallel processors (workers), so it can't add to a plain Collection. The sink has to be able to insert items on multiple cluster members.
So, the problem I'm trying to tackle is the following:
I need a data source that emits messages at a certain frequency
There are N neural nets that need to process each message individually
The outputs from all neural nets are aggregated and only when all N outputs for each message are collected, should a message be declared fully processed
At the end I should measure the time it took for a message to be fully processed (the time between when it was emitted and when all N neural-net outputs for that message have been collected)
I'm curious how one would approach such a task using Spark Streaming.
My current implementation uses 3 types of components: a custom receiver and two classes that implement Function, one for the neural nets, one for the end aggregator.
In broad strokes, my application is built as follows:
JavaReceiverInputDStream<...> rndLists = jssc.receiverStream(new JavaRandomReceiver(...));
Function<JavaRDD<...>, Void> aggregator = new JavaSyncBarrier(numberOfNets);
for(int i = 0; i < numberOfNets; i++){
rndLists.map(new NeuralNetMapper(neuralNetConfig)).foreachRDD(aggregator);
}
The main problem I'm having with this, though, is that it runs faster in local mode than when submitted to a 4-node cluster.
Is my implementation wrong to begin with, or is something else happening here?
There's also a full post here http://apache-spark-user-list.1001560.n3.nabble.com/Developing-a-spark-streaming-application-td12893.html with more details regarding the implementation of each of the three components mentioned previously.
It seems there might be a lot of repetitive instantiation and serialization of objects; the latter could be hurting your performance on the cluster.
You should try instantiating your neural networks only once, and you will have to ensure that they are serializable. You should also use flatMap instead of multiple maps plus a union. Something along these lines:
// Initialize neural net first
List<NeuralNetMapper> neuralNetMappers = new ArrayList<>(numberOfNets);
for(int i = 0; i < numberOfNets; i++){
neuralNetMappers.add(new NeuralNetMapper(neuralNetConfig));
}
// Then create a DStream applying all of them
JavaDStream<Result> neuralNetResults = rndLists.flatMap(new FlatMapFunction<Item, Result>() {
@Override
public Iterable<Result> call(Item item) {
List<Result> results = new ArrayList<>(numberOfNets);
for (int i = 0; i < numberOfNets; i++) {
results.add(neuralNetMappers.get(i).doYourNeuralNetStuff(item));
}
return results;
}
});
// The aggregation stuff
neuralNetResults.foreachRDD(aggregator);
If you can afford to initialize the networks this way, you can save quite a lot of time. Also, the union step you included in your linked post seems unnecessary and is penalizing your performance: a flatMap will do.
Finally, to further tune your performance on the cluster, you can use the Kryo serializer.
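For instance, a minimal sketch of enabling Kryo in the driver configuration (the registered classes are just the ones used in the snippet above, and the 1-second batch interval is only a placeholder; registration is optional but avoids shipping full class names with every serialized object):
SparkConf conf = new SparkConf()
        .setAppName("neural-net-streaming")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        // register the classes that travel between driver and executors
        .registerKryoClasses(new Class<?>[]{NeuralNetMapper.class, Result.class});
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));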
I have an application that requires mappings between string values, so essentially a container that can hold key-value pairs. Instead of using a dictionary or a name-value collection, I used a resource file that I access programmatically in my code. I understand resource files are meant for localization scenarios, multi-language implementations and the like. However, I like their strongly typed nature, which ensures that if an entry is renamed or removed, code that references it no longer compiles.
However, I would like to know whether there are any important cons to using a *.resx file for simple key-value pair storage instead of a more traditional programmatic type.
There are two cons I can think of off the top of my head:
it requires an I/O operation to read a key/value pair, which may result in a significant performance decrease,
if you let the standard .NET logic resolve resource loading, it will always try to find the file corresponding to the CultureInfo.CurrentUICulture property; this could be problematic if you decide that you actually want multiple resx files (e.g. one per language), and could result in even further performance degradation.
BTW, couldn't you just create a helper class or structure containing properties, like this:
public static class GlobalConstants
{
private const int _SomeInt = 42;
private const string _SomeString = "Ultimate answer";
public static int SomeInt
{
get
{
return _SomeInt;
}
}
public static string SomeString
{
get
{
return _SomeString;
}
}
}
You can then access these properties exactly the same way as resource files (I am assuming you're used to this style):
textBox1.Text = GlobalConstants.SomeString;
textBox1.Top = GlobalConstants.SomeInt;
Maybe it is not the best thing to do, but I firmly believe this is still better than using a resource file for that...