Spark IntArrayParam in Java - apache-spark

I am trying to test a number of multilayer perceptron network architectures, so I am training a model via cross-validation using different params. However, I fail to set up the layers param using Java. I am not sure how this is done, but none of the following work:
int[] layers1 = new int[]{10,1,3,2};
IntArrayParam p = new IntArrayParam(null, "name", "doc");
p.w(layers1);
int[] layers2 = new int[]{10,1,3,2};
IntArrayParam p2 = new IntArrayParam(null, "name", "doc");
p2.w(layers2);
builder.addGrid(mlpc.layers(), JavaConverters.asScalaIterableConverter(Arrays.asList(p, p2)).asScala());
I also tried sending a list of arrays (or a multidimensional array):
builder.addGrid(mlpc.layers(), JavaConverters.asScalaIterableConverter(Arrays.asList(Arrays.asList(1,2,2), Arrays.asList(1,2,2))).asScala());
I am not sure how this is supposed to be done in Java, and I was not able to find any examples. Any ideas are appreciated.
Best,
Ilija

After some research I got it. Just in case anyone gets stuck using IntArrayParam in Java, here is an example:
//build the network parameters grid
int[] layers1 = new int[]{17, 8, 4, 26};
int[] layers2 = new int[]{17, 12, 8, 26};
//use the Scala collection converters to get a Scala Iterable of int[]
scala.collection.Iterable<int[]> iter = JavaConverters.iterableAsScalaIterableConverter(Arrays.asList(layers1, layers2)).asScala();
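The converted iterable can then be passed straight to the param grid. A minimal sketch, assuming builder and mlpc are the ParamGridBuilder and MultilayerPerceptronClassifier from the question:
//feed the Scala Iterable of int[] layer configurations into the grid
ParamMap[] grid = builder
    .addGrid(mlpc.layers(), iter)
    .build();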
Hope that helps!

Related

How to set a compound structure for two layers that use two different categories of material for a wall structure, using the Revit API

I am trying to create a wall with two layers, each using a different material. When I try to set the CompoundStructure for the wall, I get an exception saying the CompoundStructure is not valid.
CompoundStructure cStructure = CompoundStructure.CreateSimpleCompoundStructure(clayer);
wallType.SetCompoundStructure(cStructure);
Can anyone tell me how I can create compound structure for layers with different materials?
First of all, solve your task manually through the end user interface and verify that it works at all.
Then, use RevitLookup and other database exploration tools to examine the results in the BIM elements, their properties and relationships.
Once you have done that, you will have a good idea how to address the task programmatically – and have confidence that it will work as expected:
How to research to find a Revit API solution
Intimate Revit database exploration with the Python Shell
// Duplicate the wall material so each layer can use its own copy
Material newWallMaterial = wallMaterial.Duplicate("newCreatedMaterial");
Material newWallMaterial2 = wallMaterial.Duplicate("NewCreatedMAterial2");
//roofMaterial3 = roofMaterial2.Duplicate("NewCreatedMAterial3");
bool usr = newWallMaterial.UseRenderAppearanceForShading;
//newWallMaterial.Color = BuiltInTypeParam.materialCol;
IList<CompoundStructureLayer> clayer = new List<CompoundStructureLayer>();
foreach (Layers layer in layers)
{
    if (layer.layerId == 0)
    {
        CompoundStructureLayer c = new CompoundStructureLayer(layer.width, layer.materialAssignement, newWallMaterial.Id);
        newWallMaterial.Color = color;
        clayer.Add(c);
    }
    if (layer.layerId == 1)
    {
        CompoundStructureLayer c1 = new CompoundStructureLayer(layer.width, layer.materialAssignement, newWallMaterial2.Id);
        newWallMaterial2.Color = color;
        clayer.Add(c1);
    }
}

General recommendations and tricks for instantiating fields and calling methods

I want to instantiate a large number of StringProperty fields to hold text values (>100000). All in all, my code performs well so far. I'm still trying to optimize it as much as possible to get the most out of my weak CPU (Intel Atom N2600, 1.6 GHz, 2 GB RAM).
I'm calling the following method 100000 times, and it takes some seconds until all values are stored in my array of StringProperty.
public void setData(int row, int numberOfCols, String[][] data) {
    this.dataValue = new StringProperty[numberOfCols];
    for (int i = 0; i < numberOfCols; i++) {
        dataValue[i] = new SimpleStringProperty(data[row][i]);
    }
}
Is the method above good enough for instantiating the fields and setting their values?
Any alternative ideas on how to tweak it?

Developing a Spark Streaming application

So the problem I'm trying to tackle is the following:
I need a data source that emits messages at a certain frequency.
There are N neural nets that need to process each message individually.
The outputs from all neural nets are aggregated, and only when all N outputs for a message have been collected should that message be declared fully processed.
At the end, I should measure the time it took for a message to be fully processed (the time between when it was emitted and when all N neural net outputs for that message have been collected).
I'm curious as to how one would approach such a task using Spark Streaming.
My current implementation uses 3 types of components: a custom receiver and two classes that implement Function, one for the neural nets, one for the end aggregator.
In broad strokes, my application is built as follows:
JavaReceiverInputDStream<...> rndLists = jssc.receiverStream(new JavaRandomReceiver(...));
Function<JavaRDD<...>, Void> aggregator = new JavaSyncBarrier(numberOfNets);
for(int i = 0; i < numberOfNets; i++){
rndLists.map(new NeuralNetMapper(neuralNetConfig)).foreachRDD(aggregator);
}
The main problem I'm having with this, though, is that it runs faster in local mode than when submitted to a 4-node cluster.
Is my implementation wrong to begin with, or is something else happening here?
There's also a full post here http://apache-spark-user-list.1001560.n3.nabble.com/Developing-a-spark-streaming-application-td12893.html with more details regarding the implementation of each of the three components mentioned previously.
It seems there might be a lot of repetitive instantiation and serialization of objects. The latter may be hurting your performance on a cluster.
You should try instantiating your neural networks only once. You will have to ensure that they are serializable. You should use flatMap instead of multiple maps + union. Something along these lines:
// Initialize neural net first
List<NeuralNetMapper> neuralNetMappers = new ArrayList<>(numberOfNets);
for(int i = 0; i < numberOfNets; i++){
neuralNetMappers.add(new NeuralNetMapper(neuralNetConfig));
}
// Then create a DStream applying all of them
JavaDStream<Result> neuralNetResults = rndLists.flatMap(new FlatMapFunction<Item, Result>() {
    @Override
    public Iterable<Result> call(Item item) {
        List<Result> results = new ArrayList<>(numberOfNets);
        for (int i = 0; i < numberOfNets; i++) {
            results.add(neuralNetMappers.get(i).doYourNeuralNetStuff(item));
        }
        return results;
    }
});
// The aggregation stuff
neuralNetResults.foreachRDD(aggregator);
If you can afford to initialize the networks this way, you can save quite a lot of time. Also, the union stuff you included in your linked posts seems unnecessary and is penalizing your performance: a flatMap will do.
Finally, in order to further tune your performance in the cluster, you can use the Kryo serializer.
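For example, a minimal sketch of enabling Kryo when creating the streaming context, assuming the NeuralNetMapper and Result classes from the snippet above (the app name and batch interval are illustrative):
// Enable Kryo serialization and register the classes that get shipped between nodes
SparkConf conf = new SparkConf()
    .setAppName("neural-net-streaming")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .registerKryoClasses(new Class<?>[]{ NeuralNetMapper.class, Result.class });
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));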

Construct Completely Ad-hoc Slick Query

Pardon my newbieness, but I'm trying to build a completely ad-hoc query builder using Slick. From our API, I will get a list of strings that represents the tables, as well as another list that represents the filters for those tables, and munge them together to create a query. The hope is that I can take these and create the inner join. A similar example of what I'm trying to do would be JIRA's advanced query builder.
I've been trying to build it using reflection, but I've come across so many blocking issues that I'm wondering if this is even possible at all.
In code this is what I want to do:
def getTableQueryFor(tbl: String): TableQuery[_] = {
  // ... a matcher that returns the right TableQuery?
  // ... I think the return type is incorrect because of erasure?
}
def getJoinConditionFor(tbl1: String, tbl2: String): (Coffees, Suppies) => scala.slick.lifted.Column[Boolean] = (l: Coffees, r: Suppies) => {
  // ... a matcher
}
Is the following even possible?
val q1 = getTableQueryFor("coffee")
val q2 = getTableQueryFor("supply")
val q3 = q1.innerJoin(q2).on(getJoinConditionFor("coffee", "supply"))

Dealing with integer-valued features for CRF in mallet

I am just starting to use the SimpleTagger class in mallet. My impression is that it expects binary features. The model that I want to implement has positive integer-valued features and I wonder how to implement this in mallet. Also, I heard that non-binary features need to be normalized if the model is to make sense. I would appreciate any suggestions on how to do this.
PS: Yes, I know there is a dedicated mallet mailing list, but I have already been waiting nearly a day for my subscription to be approved so I can post there. I'm simply in a hurry.
Well it's 6 years later now. If you're not in a hurry anymore, you could check out the Java API to create your instances. A minimal example:
private static Instance createInstance(LabelAlphabet labelAlphabet){
    // observations and labels should be equal size for linear chain CRFs
    TokenSequence observations = new TokenSequence();
    LabelSequence labels = new LabelSequence(labelAlphabet, 1); // capacity: one label per token added below
    observations.add(createToken());
    labels.add("idk, some target or something");
    return new Instance(
        observations,
        labels,
        "myInstance",
        null
    );
}
private static Token createToken() {
    Token token = new Token("exampleToken");
    // Note: properties are not used for computing (I think)
    token.setProperty("SOME_PROPERTY", "hello");
    // Any old double value; the feature name is just an example
    token.setFeatureValue("SOME_FEATURE", 666.0);
    // etc. for more features ...
    return token;
}
public static void main(String[] args){
    // Note the first arg is false to denote we *do not* deal with binary features
    InstanceList instanceList = new InstanceList(new TokenSequence2FeatureVectorSequence(false, false));
    LabelAlphabet labelAlphabet = new LabelAlphabet();
    // Converts our tokens to feature vectors as they pass through the pipe
    instanceList.addThruPipe(createInstance(labelAlphabet));
}
Or, if you want to keep using SimpleTagger, just define binary features like HAS_1_LETTER, HAS_2_LETTER, etc, though this seems tedious.
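For reference, SimpleTagger training data is plain text: one token per line, with whitespace-separated feature names followed by the label as the last field, and a blank line separating sequences. A tiny sketch using made-up feature and label names:
HAS_2_LETTER CAPITALIZED NOUN
HAS_1_LETTER OTHER

HAS_2_LETTER OTHER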
