JPQL Query with calculated value? - jpql

I want to use a calculated value in the WHERE clause and in an ORDER BY expression. In plain SQL it would look like
SELECT some, columns, (some arbitrary math) AS calc_value FROM table WHERE calc_value <= ? ORDER BY calc_value
If I try in JPQL
entitymanager.createQuery("SELECT e, (some arbitrary math) AS calc_value FROM Entity e WHERE calc_value <= :param ORDER BY calc_value", Entity.class);
it fails, obviously because the query returns a tuple of Entity and calc_value (i.e. a Double).
Is there a way to get this into one query with a strongly typed return value (i.e. Entity.class, since the calculated value itself doesn't matter)?

I had a similar problem and could not find a way to fetch the result directly into the correct object:
I tried every constructor combination for the object, with no luck.
I tried Tuple.class, with no luck.
Finally I used this approach and then cast o[0] to my real Java object:
TypedQuery<Object[]> q = em.createQuery("select v, (2 * 4) as temp from MyObject v order by temp", Object[].class);
List<Object[]> results = q.getResultList();
for (Object[] o : results) {
    // o[0] is the entity, o[1] is the calculated value
    MyObject mo = (MyObject) o[0];
}

Related

finding index values of multiple Objects that share data field values in an ArrayList

I have an ArrayList storing cars (for instance). Each instance of a car has three data fields (make, model, and year). Make and model are both Strings, and year is an int value. I want to be able to search the ArrayList and return the index location of every car that was produced in 2014 (say). I can use a simple search to return the first index location using something like this:
public static int searchYear(ArrayList<Cars> cars, int key)
{
    int size = cars.size();
    for (int i = 0; i < size; i++)
    {
        if (cars.get(i).getYear() == key)
            return i;
    }
    return -1;
}
where key == 2014 (the year I am searching for). How can I get this to return the index value of all cars with that key rather than only the first instance of it?
Short answer: return a collection of values instead of a single int, as in this signature:
public static ArrayList<Integer> searchYear (ArrayList<Cars> cars, int key)
More comments:
Naming the parameter key is misleading when more than one instance can match it. It is not a unique key, just a value, so I'd call the parameter year instead.
The signature of your method should be something like this:
public static int[] searchYear (Cars[] cars, int year)
In the method body, collect the index of each car that matches the year parameter and return those indexes as an array of ints.
You may ask why I changed ArrayList<Cars> to Cars[]. It is a matter of flexibility for future uses of the method: a plain array is a more common construct than ArrayList, and I would not put ArrayList in a method signature unless I needed a method specific to ArrayList.
That said, since you access elements by index with .get(i), which is declared on the java.util.List interface, a List argument makes sense:
public static int[] searchYear (List<Cars> cars, int year)
We can talk about the int[] return type: another option would be a Collection<Integer> or even Iterable<Integer>. The reasoning to choose one or another would be the same as for the cars argument: it all depends on what you want to do with the list of indexes returned by your method.
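For illustration, here is a minimal sketch of the List-based variant that returns every matching index. The nested Cars class is a hypothetical stand-in for the asker's class (only getYear() is taken from the question), and CarSearch is just a name chosen to hold the method.

```java
import java.util.ArrayList;
import java.util.List;

final class CarSearch {

    /** Hypothetical stand-in for the question's Cars class. */
    static final class Cars {
        private final String make;
        private final String model;
        private final int year;

        Cars(String make, String model, int year) {
            this.make = make;
            this.model = model;
            this.year = year;
        }

        int getYear() {
            return year;
        }
    }

    /** Returns the index of every car built in the given year, in list order. */
    static List<Integer> searchYear(List<Cars> cars, int year) {
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < cars.size(); i++) {
            if (cars.get(i).getYear() == year) {
                indexes.add(i); // keep going instead of returning the first hit
            }
        }
        return indexes; // empty when nothing matches, so no -1 sentinel is needed
    }
}
```

Returning an empty list when nothing matches lets the caller loop over the result without a special case for "not found".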

How to get X% percentile in Cassandra

Consider a table with structure:
CREATE TABLE statistics (name text, when timestamp, value int,
PRIMARY KEY ((name, when)));
What is the best way to calculate, for example, 50% value percentile by name?
I thought about:
a) writing custom aggregate function + query like:
SELECT PERCENTILE(value, 0.5) FROM statistics WHERE name = '...'
b) count elements by name first
SELECT COUNT(value) FROM statistics WHERE name = '...'
then find the (0.5 * count)th row by paging through the rows sorted by value ascending. Say, if count is 100, it will be the 50th row.
c) your ideas
I'm not sure if case A can handle the task. Case B might be tricky when there is an odd number of rows.
This works as long as you always provide name; without restricting the query to a single partition that contains all of the rows, it can be very expensive. I am assuming you mean ((name), when) rather than ((name, when)) in your table, otherwise what you're asking is impossible without full table scans (using Hadoop or Spark).
A UDA would work, but it can be expensive unless you're willing to accept an approximation. To be perfectly accurate you need two passes (a count, then a second pass to go X rows into the set), and since there is no isolation between them that isn't going to be perfect either. So if you need it perfectly accurate, your best bet is probably to pull the entire statistics[name] partition locally, or to have the UDA build up the entire set (or most of it) in a map before calculating (not recommended if partitions get at all large), e.g.:
CREATE OR REPLACE FUNCTION all(state tuple<double, map<int, int>>, val int, percentile double)
  CALLED ON NULL INPUT RETURNS tuple<double, map<int, int>> LANGUAGE java AS '
    java.util.Map<Integer, Integer> m = state.getMap(1, Integer.class, Integer.class);
    m.put(m.size(), val);
    state.setMap(1, m);
    state.setDouble(0, percentile);
    return state;';
CREATE OR REPLACE FUNCTION calcAllPercentile (state tuple<double, map<int, int>>)
  CALLED ON NULL INPUT RETURNS int LANGUAGE java AS '
    java.util.Map<Integer, Integer> m = state.getMap(1, Integer.class, Integer.class);
    if (m.isEmpty()) return null;
    // the map holds values in arrival order, so sort before indexing
    java.util.List<Integer> sorted = new java.util.ArrayList<>(m.values());
    java.util.Collections.sort(sorted);
    int offset = (int) (sorted.size() * state.getDouble(0));
    return sorted.get(java.lang.Math.min(offset, sorted.size() - 1));';
CREATE AGGREGATE IF NOT EXISTS percentile (int , double)
  SFUNC all STYPE tuple<double, map<int, int>>
  FINALFUNC calcAllPercentile
  INITCOND (0.0, {});
If you're willing to accept an approximation, you can use a sampling reservoir: store, say, 1024 elements, and as the UDA receives more elements, replace stored ones with decreasing probability (Vitter's Algorithm R). This is pretty easy to implement, and if your data set is expected to have a normal distribution it will give you a decent approximation. If it is not normally distributed, the result can be pretty far off. With a normal distribution there are actually a lot of other options as well, but R is, I think, the easiest to implement in a UDA, like:
CREATE OR REPLACE FUNCTION reservoir (state tuple<int, double, map<int, int>>, val int, percentile double)
  CALLED ON NULL INPUT RETURNS tuple<int, double, map<int, int>> LANGUAGE java AS '
    java.util.Map<Integer, Integer> m = state.getMap(2, Integer.class, Integer.class);
    int seen = state.getInt(0); // number of values seen before this one
    if (seen < 1024) {
      // fill the reservoir (slots 0..1023)
      m.put(seen, val);
    } else {
      // replace elements with gradually decreasing probability
      int replace = (int) (java.lang.Math.random() * (seen + 1));
      if (replace < 1024) {
        m.put(replace, val);
      }
    }
    state.setMap(2, m);
    state.setDouble(1, percentile);
    state.setInt(0, seen + 1);
    return state;';
CREATE OR REPLACE FUNCTION calcApproxPercentile (state tuple<int, double, map<int, int>>)
  CALLED ON NULL INPUT RETURNS int LANGUAGE java AS '
    java.util.Map<Integer, Integer> m = state.getMap(2, Integer.class, Integer.class);
    if (m.isEmpty()) return 0;
    // sort the sample, since the reservoir is not kept in value order
    java.util.List<Integer> sorted = new java.util.ArrayList<>(m.values());
    java.util.Collections.sort(sorted);
    int offset = (int) (sorted.size() * state.getDouble(1));
    return sorted.get(java.lang.Math.min(offset, sorted.size() - 1));';
CREATE AGGREGATE IF NOT EXISTS percentile_approx (int , double)
  SFUNC reservoir STYPE tuple<int, double, map<int, int>>
  FINALFUNC calcApproxPercentile
  INITCOND (0, 0.0, {});
In the above, the exact percentile function will get slow sooner. Playing with the size of the sampler can give you more or less accuracy, but make it too large and you start to impact performance. Generally a UDA over more than 10k values (even a simple function like count) starts to fail. It is also important to recognize that while the single query returns a single value, there is a ton of work behind it, so many of these queries or much concurrency will put a lot of pressure on your coordinators. This requires a version above 3.8 (I would recommend 3.11.latest+) because of CASSANDRA-10783.
Note: I make no promises that I haven't missed an off-by-one error in the example UDAs. I did not test them fully, but they should be close enough that you can make them work from there.
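To make the reservoir idea easier to check outside Cassandra, here is a rough plain-Java sketch of Algorithm R followed by a percentile lookup on the sorted sample. The class name ReservoirPercentile, its constructor arguments, and the seeded Random are assumptions made for this illustration; it is not the UDA itself and has not been run against a cluster.

```java
import java.util.Arrays;
import java.util.Random;

final class ReservoirPercentile {
    private final int[] sample;   // the reservoir
    private final Random random;  // seeded so runs are repeatable
    private long seen = 0;        // how many values have been offered so far

    ReservoirPercentile(int capacity, long seed) {
        this.sample = new int[capacity];
        this.random = new Random(seed);
    }

    /** Algorithm R: keep the first values, then replace slots with decreasing probability. */
    void add(int value) {
        if (seen < sample.length) {
            sample[(int) seen] = value; // still filling the reservoir
        } else {
            // pick a slot uniformly in [0, seen]; only slots inside the reservoir count
            long slot = (long) (random.nextDouble() * (seen + 1));
            if (slot < sample.length) {
                sample[(int) slot] = value;
            }
        }
        seen++;
    }

    /** Estimates the given percentile (0.0 to 1.0) from a sorted copy of the sample. */
    int percentile(double p) {
        int size = (int) Math.min(seen, sample.length);
        if (size == 0) {
            throw new IllegalStateException("no values added");
        }
        int[] sorted = Arrays.copyOf(sample, size);
        Arrays.sort(sorted);
        int offset = Math.min((int) (size * p), size - 1); // clamp so p = 1.0 stays in range
        return sorted[offset];
    }
}
```

While fewer values than the capacity have been added, the result is exact for this indexing rule; beyond that it is only an estimate whose quality depends on the reservoir size and the data.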

Select from subquery and join on subquery in Esqueleto

How can I do select ... from (select ...) join (select ...) in Esqueleto?
I'm aware that I can use rawSql from Persistent, but I'd like to avoid that.
For the record, here is the full query:
select q.uuid, q.upvotes, q.downvotes, count(a.parent_uuid), max(a.isAccepted) as hasAccepted
from
  (select post.uuid, post.title,
          sum(case when (vote.type = 2) then 1 else 0 end) as upvotes,
          sum(case when (vote.type = 3) then 1 else 0 end) as downvotes
   from post left outer join vote on post.uuid = vote.post_id
   where post.parent_uuid is null
   group by post.uuid
   order by post.created_on desc
  ) q
left outer join
  (select post.parent_uuid, max(case when (vote.type = 1) then 1 else 0 end) as isAccepted
   from post left outer join vote on post.uuid = vote.post_id
   where post.parent_uuid is not null
   group by post.id
  ) a
on a.parent_uuid = q.uuid
group by q.uuid
limit 10
I got here because I had the same question. I imagine the thing we want would be something like:
fromSelect
  :: ( Database.Esqueleto.Internal.Language.From query expr backend a
     , Database.Esqueleto.Internal.Language.From query expr backend b
     )
  => (a -> query b)
  -> (b -> query c)
  -> query c
Unfortunately, from looking at Database.Esqueleto.Internal.Sql.FromClause:
-- | A part of a #FROM# clause.
data FromClause =
    FromStart Ident EntityDef
  | FromJoin FromClause JoinKind FromClause (Maybe (SqlExpr (Value Bool)))
  | OnClause (SqlExpr (Value Bool))
I don't think there's any support for this in Esqueleto. It only seems to support simple table names and joins with on-clauses that have a boolean expression. I imagine the hardest part of adding support for this is handling table and column name aliases (the SQL AS clause), since ^. expects an expr (Entity val) and an EntityField val typ. The simplest way is to change that to use String or Text for both operands, but that's not very type-safe. I'm not sure what the best option would be implementation-wise to make it type-safe.
EDIT: Probably best to forget ^. and have fromSelect generate the aliases when providing the returned values of its first parameter as the arguments of its second parameter. Types would probably have to be altered to make room for these aliases. This is only contemplating from subqueries, not joins. That's another problem.

Ordered iteration in map string string

In the Go blog, this is how to print the map in order.
http://blog.golang.org/go-maps-in-action
import "sort"

var m map[int]string
var keys []int
for k := range m {
	keys = append(keys, k)
}
sort.Ints(keys)
for _, k := range keys {
	fmt.Println("Key:", k, "Value:", m[k])
}
But what if I have string keys, as in var m map[string]string?
I can't figure out how to print the strings in order (not sorted, but in the order they were inserted into the map).
The example is at my playground: http://play.golang.org/p/Tt_CyATTA3
As you can see, it keeps printing the strings in jumbled order, so I tried mapping integer values to the map[string]string, but I still could not figure out how to map each element of the map[string]string.
http://play.golang.org/p/WsluZ3o4qd
Well, the blog mentions that iteration order is randomized:
"...When iterating over a map with a range loop, the iteration order is not specified and is not guaranteed to be the same from one iteration to the next"
The solution is fairly simple: keep a separate slice with the keys in the order you need:
"...If you require a stable iteration order you must maintain a separate data structure that specifies that order."
So, to make it work as you expect, create an extra slice with the correct order, then iterate over it and print the results in that order.
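import "fmt"

var order = []string{"i", "we", "he" /* , .... */}

func String(result map[string]string) string {
	s := ""
	for _, k := range order {
		if v, ok := result[k]; ok { // skip keys that are not in the map
			s += fmt.Sprintf("%s: %s\n", k, v)
		}
	}
	// ... append all the keys not listed in order at the end
	return s
}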
See it running here: http://play.golang.org/p/GsDLXjJ0-E

How to use the term position parameter in Xapian query constructors

Xapian docs talk about a query constructor that takes a term position parameter, to be used in phrase searches:
Quote:
This constructor actually takes a couple of extra parameters, which
may be used to specify positional and frequency information for terms
in the query:
Xapian::Query(const string & tname_,
Xapian::termcount wqf_ = 1,
Xapian::termpos term_pos_ = 0)
The term_pos represents the position of the term in the query. Again,
this isn't useful for a single term query by itself, but is used for
phrase searching, passage retrieval, and other operations which
require knowledge of the order of terms in the query (such as
returning the set of matching terms in a given document in the same
order as they occur in the query). If such operations are not
required, the default value of 0 may be used.
And in the reference, we have:
Xapian::Query::Query ( const std::string & tname_,
Xapian::termcount wqf_ = 1,
Xapian::termpos pos_ = 0
)
A query consisting of a single term.
And:
typedef unsigned termpos
A term position within a document or query.
So, say I want to build a query for the phrase: "foo bar baz", how do I go about it?!
Does term_pos_ provide relative position values, ie define the order of terms within the document:
(I'm using here the python bindings API, as I'm more familiar with it)
q = xapian.Query(xapian.Query.OP_AND, [xapian.Query("foo", wqf, 1),xapian.Query("bar", wqf,2),xapian.Query("baz", wqf,3)] )
And just for the sake of testing, suppose we did:
q = xapian.Query(xapian.Query.OP_AND, [xapian.Query("foo", wqf, 3),xapian.Query("bar", wqf, 4),xapian.Query("baz", wqf, 5)] )
So this would give the same results as the previous example?!
And suppose we have:
q = xapian.Query(xapian.Query.OP_AND, [xapian.Query("foo", wqf, 2),xapian.Query("bar", wqf, 4),xapian.Query("baz", wqf, 5)] )
So now this would match where documents have "foo" "bar" separated with one term, followed by "baz" ??
Is it as such, or is it that this parameter is referring to absolute positions of the indexed terms?!
Edit:
And how is OP_PHRASE related to this? I find some online samples using OP_PHRASE as such:
q = xapian.Query(xapian.Query.OP_PHRASE, term_list)
This makes obvious sense, but then what is the role of the said term_pos_ constructor in phrase searches - is it a more surgical way of doing things!?
int pos = 1;
std::list<Xapian::Query> subs;
subs.push_back(Xapian::Query("foo", 1, pos++));
subs.push_back(Xapian::Query("bar", 1, pos++));
querylist.push_back(Xapian::Query(Xapian::Query::OP_PHRASE, subs.begin(), subs.end()));
