We are aggregating some large matrices and have a custom Matrix class.
These are aggregated with a custom aggregation function.
A formatter takes care of showing part of the aggregated matrix measure for debugging, but it seems that even with a formatter in place, the entire Matrix is still serialized and sent to Live when that measure is shown. Is there a way to avoid that?
You should be able to do your formatting in a basic post-processor that takes your matrix as its underlying value and returns the formatted value you wish to display.
Then you can use this new measure instead of the previous one.
David's solution is a good one. Another is to implement the Externalizable interface in your custom Matrix class and override void writeExternal(ObjectOutput out) and void readExternal(ObjectInput in) with dummy (empty) implementations. That way, only the formatted value of the matrix will be serialized and sent to Live.
However, if you need the whole value of the matrix somewhere else, you won't be able to get it anymore. In that case, you had better use David's solution.
Paul
I am building an app in which I have a Room entity, one of whose columns is supposed to hold a List.
What is the best approach for doing this in an app that uses Flow, Coroutines and Room?
I tried serializing with Jackson (turning the List into a long JSON String and then turning it back into a List when fetched), but I am not sure if this is the correct approach.
Thank you,
What is the best approach for doing this in an app that uses Flow, Coroutines and Room?
This is very much open to opinion.
From a database perspective, the approach would be to store any list as a table, thus:
reducing the JSON bloat and the inefficiency it causes,
reducing duplication and so being more likely to conform to normalisation,
not potentially introducing complexities and even greater inefficiencies (e.g. not mentioned in the answer below, but a wildcard as the first character of a LIKE pattern forces a full table scan).
Perhaps consider the question and answer "matching multiple title in single query using like keyword": if the table-per-list approach were taken, a simple SELECT * FROM task WHERE task_tags IN(:taglist) could do the same.
From a coding point of view, embedding JSON is simpler at first, because the complex code lives inside the JSON libraries.
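To make the table-per-list idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Room (Room sits on top of SQLite, so the SQL carries over, but the entity and DAO code would look different). The task_tag table, its columns, and the helper names are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one row per (task, tag) pair."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE task (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        -- Hypothetical child table: the list lives here instead of in a JSON column.
        CREATE TABLE task_tag (
            task_id INTEGER NOT NULL REFERENCES task(id),
            tag     TEXT    NOT NULL,
            PRIMARY KEY (task_id, tag)
        );
        CREATE INDEX idx_task_tag_tag ON task_tag(tag);
        """
    )
    conn.executemany(
        "INSERT INTO task VALUES (?, ?)",
        [(1, "Buy milk"), (2, "Write report"), (3, "Call bank")],
    )
    conn.executemany(
        "INSERT INTO task_tag VALUES (?, ?)",
        [(1, "home"), (1, "shopping"), (2, "work"), (3, "home"), (3, "finance")],
    )
    conn.commit()
    return conn


def find_tasks_with_tags(conn, tags):
    """Return (id, title) for every task carrying at least one of the tags."""
    tags = list(tags)
    if not tags:
        return []
    # One "?" placeholder per tag, so values are bound and never interpolated.
    placeholders = ",".join("?" for _ in tags)
    sql = (
        "SELECT DISTINCT t.id, t.title FROM task AS t "
        "JOIN task_tag AS tt ON tt.task_id = t.id "
        f"WHERE tt.tag IN ({placeholders}) ORDER BY t.id"
    )
    return conn.execute(sql, tags).fetchall()
```

Because the tag column is indexed and matched exactly, this lookup can avoid the leading-wildcard LIKE that a JSON string column would need.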
I would like to calculate time_since_previous, not from one transaction to the next, but only between transactions that exceed a maximum value.
Can I do that automatically, or do I need to slice the dataframe?
More specifically, I have a function that detects local maxima using scipy.signal.find_peaks and produces a boolean vector marking the local maxima. I could add that vector as a feature to the data set, and then I would like the time since previous for those local maxima.
Is that possible in a semi-automated way with featuretools?
If there is a resource on this that you could link to in an answer, that would be great!
Thanks a lot
Yes, you can write a custom transform primitive and have DFS use it to calculate this feature automatically. The built-in time_since_previous only calculates between consecutive transactions, so the custom primitive would need to compute the time since the previous local maximum, given the boolean vector from find_peaks. Here are guides for defining simple and advanced custom primitives. Let me know if this helps.
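As a rough illustration of the logic such a primitive would wrap, here is a plain-Python sketch with no featuretools dependency. The function name and the choice to return None for non-peak rows and for the first peak are assumptions, and the timestamps are assumed to be sorted in ascending order; wiring this into a custom primitive is left to the guides mentioned above.

```python
from datetime import datetime


def time_since_previous_peak(times, is_peak):
    """Seconds elapsed since the previous flagged row, for flagged rows only.

    times   -- datetimes in ascending order (assumption)
    is_peak -- booleans of the same length, True where a local maximum occurs
    Rows that are not peaks, and the first peak, get None.
    """
    if len(times) != len(is_peak):
        raise ValueError("times and is_peak must have the same length")

    result = []
    previous_peak_time = None
    for current_time, peak in zip(times, is_peak):
        if not peak:
            result.append(None)
            continue
        if previous_peak_time is None:
            # Nothing to measure against yet.
            result.append(None)
        else:
            result.append((current_time - previous_peak_time).total_seconds())
        previous_peak_time = current_time
    return result


if __name__ == "__main__":
    stamps = [datetime(2020, 1, 1, 0, 0, s) for s in (0, 10, 25, 40, 55)]
    flags = [False, True, False, True, True]
    print(time_since_previous_peak(stamps, flags))
```

Rows between peaks are skipped entirely, which is the difference from a transaction-to-transaction calculation.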
I want to build a type-ahead function, but I need an alternative to the getAllEntriesByKey method because the initial data collection seems to be too large for acceptable performance.
I would rather use the getEntryByKey method and then read the next X documents in a View.
Is something like that possible? Just jump to a position in a view (matching a specified query) and collect the next X documents?
For now I have written most of it in SSJS.
You can use a combination of NotesView.GetEntryByKey and NotesView.CreateViewNavFrom. This means, however, that you will access the view twice, so I do not know whether you gain any performance improvement here.
An example (LotusScript) can be found here:
http://lpar.ath0.com/2011/09/19/notesviewentrycollection-vs-notesviewnavigator/
The LotusScript can easily be transformed into SSJS. I have used something similar before. I can write a blog post about it.
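The "jump to the first match, then read the next X entries" pattern can be sketched independently of the Notes API. The snippet below is plain Python over a sorted list of keys, not LotusScript or SSJS; type_ahead and its parameters are hypothetical names, and it assumes the keys are already sorted the same way the view's first column would be (case-sensitive here).

```python
from bisect import bisect_left


def type_ahead(sorted_keys, prefix, limit):
    """Return up to `limit` keys starting with `prefix`.

    sorted_keys must be in ascending order (assumption); the binary search
    plays the role of jumping to the first matching entry in the view.
    """
    start = bisect_left(sorted_keys, prefix)
    matches = []
    # Walk forward from the first candidate, as a navigator would.
    for key in sorted_keys[start:start + limit]:
        if not key.startswith(prefix):
            break  # left the block of matching entries
        matches.append(key)
    return matches


if __name__ == "__main__":
    names = ["adam", "alan", "albert", "alice", "bob"]
    print(type_ahead(names, "al", 2))
```

Only the entries actually returned are touched, which is the point of avoiding a full collection of all matching entries.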
I am working on a project that involves a lot of data, and at first I was doing it all in plist, and I realized it was getting out of hand and I would have to learn Core Data. I'm still not entirely sure whether I can do what I want in Core Data, but I think it should work out. I've set up a data model, but I'm not sure if it's the right way to do it. Please read on if you think you can help out and let me know if I'm on the right track. Please bear with me, because I am trying to explain it as thoroughly as I can.
I've got the basic object with attributes set up at the root level; say a person with attributes like a name, date of birth, etc. Pretty simple. You set up one entity like this "Person" in your model, and you can save as many of them as you want in your data and retrieve them as an array, right? It could be sorted based on an attribute in the Person, such as the date they were added to the database.
Now where I get a bit more confused is when I want to store several different collections of data with each person. For example a list of courses and associated test marks. In a plist I would have stored an array of dictionaries that stored this, sorted by the date assessed. The way I set this up in my data model was that I added an entity called "Tests" and a "to-many" relationship from Person to Tests, and then when I pull that I get an NSSet that I can order by a timestamp again? Is there a better way to do this?
Similarly, the Person may have a set of arrays of numerical data (the kind that you could graph over time, e.g. Nike+ stores your running data like distance vs time, and a person would have multiple runs associated with them, hence a set of arrays, each with its own associated date of collection). The way I set this up is a little different, with a "Runs" entity with just a timestamp attribute, connected from Person via a to-many relationship, with inverse "forPerson". Then the Runs entity is connected via a to-many relationship to another entity that has attributes to store the numerical data and the time. Once again, I would use a time/order attribute to sort them.
So the main question I have is whether using an internal attribute like a timestamp to sort a set would be the right way to load an "array" from Core Data. The answers I found when searching forums/Stack Overflow for how to store NSArrays in Core Data seem overly complicated compared to this, giving me the sense that I'm misunderstanding something.
Thanks for your help. Sorry for all the text, but I'm new to Core Data and I figure setting up the data model properly is essential before starting to code methods for getting/saving data. If necessary, I can set up a sample model to demonstrate this and post a picture of it.
Core Data will give you NSSets by default. These are convertible to arrays by calling allObjects, or sortedArrayUsingDescriptors if you want a sorted array. The "ordered" property on the relationship description gives you an NSOrderedSet in the managed object. Hashed sets provide quicker adds, access and membership checks, with a penalty (relative to ordered sets) for the sort.
I'm using the following code to execute a query in Lucene.Net
var collector = new GroupingHitCollector(searcher.GetIndexReader());
searcher.Search(myQuery, collector);
resultsCount = collector.Hits.Count;
How do I sort these search results based on a field?
Update
Thanks for your answer. I had tried using TopFieldDocCollector, but I got an error saying "value is too small or too large" when I passed 5000 as the numHits argument value. Please suggest a valid value to pass.
The search.Searcher.search method will accept a search.Sort parameter, which can be constructed as simply as:
new Sort("my_sort_field")
However, there are some limitations on which fields can be sorted on - they need to be indexed but not tokenized, and the values convertible to Strings, Floats or Integers.
Lucene in Action covers all of the details, as well as sorting by multiple fields and so on.
What you're looking for is probably TopFieldDocCollector. Use it instead of the GroupingHitCollector (what is that?), or inside it.
Comment on this if you need more info. I'll be happy to help.
In the original (Java) version of Lucene, there is no hard restriction on the size of the TopFieldDocCollector results. Any number greater than zero is accepted. Although memory constraints and performance degradation create a practical limit that depends on your environment, 5000 hits is trivial and shouldn't pose a problem outside of a mobile device.
Perhaps in porting Lucene, TopFieldDocCollector was modified to use something other than Lucene's "heap" implementation (called PriorityQueue, extended by FieldSortedHitQueue)—something that imposes an unreasonably small limit on the results size. If so, you might want to look at the source code for TopFieldDocCollector, and implement your own similar hit collector using a better heap implementation.
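For orientation, a hit collector built on a bounded heap can be sketched in a few lines. This is a generic Python illustration using the standard heapq module, not Lucene or Lucene.Net code; TopNCollector, collect and top_docs are hypothetical names, and each hit is assumed to arrive with a single comparable sort value.

```python
import heapq


class TopNCollector:
    """Keep only the num_hits hits with the largest sort values."""

    def __init__(self, num_hits):
        if num_hits <= 0:
            raise ValueError("num_hits must be greater than zero")
        self.num_hits = num_hits
        self.total_hits = 0
        self._heap = []  # min-heap of (sort_value, doc_id); smallest kept hit on top

    def collect(self, doc_id, sort_value):
        """Offer one hit; it is kept only if it beats the current weakest."""
        self.total_hits += 1
        entry = (sort_value, doc_id)
        if len(self._heap) < self.num_hits:
            heapq.heappush(self._heap, entry)
        elif entry > self._heap[0]:
            # Replace the weakest kept hit in O(log n).
            heapq.heapreplace(self._heap, entry)

    def top_docs(self):
        """Doc ids ordered from the largest sort value to the smallest."""
        return [doc_id for _, doc_id in sorted(self._heap, reverse=True)]


if __name__ == "__main__":
    collector = TopNCollector(3)
    for doc_id, size in enumerate([5, 1, 9, 7, 3]):
        collector.collect(doc_id, size)
    print(collector.top_docs(), collector.total_hits)
```

Memory stays proportional to num_hits rather than to the number of matching documents, so a value such as 5000 costs little with this kind of structure.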
I have to ask, however, why are you trying to collect 5000 results? No user in an interactive application is going to want to see that many. I figure that users willing to look at 200 results are rare, but double it to 400 as a factor of safety. Depending on the application, limiting the result size can hamper malicious screen scrapers and mitigate denial-of-service attacks too.
The constructor for Sort accepting only the string field name has been deprecated. Now you have to create a Sort object and pass it in as the last parameter of searcher.Search():
/* Sorting by a field of type long called "size", from greatest to smallest
(signified by passing in true for the last, isReversed, parameter). */
Sort sorter = new Sort(new SortField("size", SortField.Type.LONG, true));
searcher.Search(myQuery, collector, sorter);