Compare and extract a number from a string in JPQL

I have a column named record_number of type varchar that holds data in the format [currentYear]-[Number], e.g. 2015-11.
I need to find the maximum number in this column. For example, if the row holding the maximum is 2015-15, the result should be 15; however, if the column also contains 2016-2, the maximum should be 2.
How can I do this in JPQL?
I'm using Postgres and EJB 3.1.

You can use the JPQL SUBSTRING function. Positions are 1-based, so in a value such as 2015-11 the number starts at position 6. Note that the extracted substring is still compared as a string:
select table From Table table order by SUBSTRING(table.record_number, 6) desc;
To get only the first result, use setMaxResults, like this:
em.createQuery("select table From Table table order by SUBSTRING(table.record_number, 6) desc")
.setMaxResults(1) // only the first result
.getResultList()

I managed to fix the problem based on Dherik's comment.
I used the following query to get the object that holds the correct value, which seems more efficient than the one proposed by Dherik:
final TypedQuery<Table> query = createTypedQuery("select t from Table t where t.recordNumber = (select max(t2.recordNumber) from Table t2)", Table.class);
Table t= null;
try {
t = query.getSingleResult();
}catch(Exception e){
//handle Exception Here
}
return t;
The trick is that, since my app creates the record number, I changed the method that creates it to format the number with 2 digits, which avoids wrong string comparisons (the case where '9' is considered greater than '10'):
// format numbers < 10 so that they have 2 digits
final String formattedNumber = String.format("%02d", number);
final int year = SomeUtilClass.getYearFromDate(new Date());
return new StringBuilder().append(year).append("-").append(formattedNumber).toString();
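The string-versus-number pitfall described above can also be checked outside the database. The following is a minimal sketch, assuming every record number follows the [year]-[number] format; the class and method names (RecordNumbers, parse, maxNumberOfLatestYear, format) are hypothetical and not part of the original application.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the original application.
final class RecordNumbers {

    // Splits "2015-11" into {2015, 11}; assumes the "[year]-[number]" format.
    static int[] parse(String recordNumber) {
        int dash = recordNumber.indexOf('-');
        int year = Integer.parseInt(recordNumber.substring(0, dash));
        int number = Integer.parseInt(recordNumber.substring(dash + 1));
        return new int[] {year, number};
    }

    // Returns the highest number within the most recent year, comparing numerically.
    static int maxNumberOfLatestYear(List<String> recordNumbers) {
        return recordNumbers.stream()
                .map(RecordNumbers::parse)
                .max(Comparator.<int[]>comparingInt(p -> p[0]).thenComparingInt(p -> p[1]))
                .orElseThrow(() -> new IllegalArgumentException("no record numbers"))[1];
    }

    // Zero-pads the number to two digits, as in the answer above.
    static String format(int year, int number) {
        return year + "-" + String.format("%02d", number);
    }

    public static void main(String[] args) {
        // Plain string comparison ranks "2015-9" above "2015-10".
        System.out.println("2015-9".compareTo("2015-10") > 0);
        // With padding, string order and numeric order agree.
        System.out.println(format(2015, 9).compareTo(format(2015, 10)) < 0);
        System.out.println(maxNumberOfLatestYear(Arrays.asList("2015-9", "2015-15", "2016-2")));
    }
}
```

Two-digit padding only keeps string order and numeric order aligned up to 99 records per year; beyond that, a wider pad or a numeric comparison would be needed.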

Related

How to get value of a Spark dataset column value to use it dynamically in SQL Query?

I have a Dataset DS1 with a column "LEVEL". I want to check this column's value and, based on some business logic, update another column "COMPANIES", which is an array.
For this update operation, I am using the withColumn() method.
DS1.withColumn("COMPANIES", functions.when(functions.col("LEVEL").gt(1), someMethod(sparkSession, functions.col("COMPANIES"), functions.col("LEVEL"))).otherwise(functions.col("value")));
Inside someMethod(), I am trying to use the Column objects as parameters.
private int[] someMethod(SparkSession sparkSession, Column companies, Column Level) {
String query = "Select cs.level from DS1 cs inner join DS2 cp on cs.level=" + (Level.minus(1)) + " and cs.company_private_id=ANY(" + companies + ")";
sparkSession.sql(query);
List<Integer> list = sparkSession.sql(query).collectAsList().get(0).getList(0);
return list.stream().mapToInt(i -> i).toArray();
}
I could not get the values of the variables Level and companies, as they are of Column type. How can I implement this logic?
This assumes the data type of level is Integer. If the type is something else, change row.getInt(0) accordingly, e.g. to row.getDecimal(0) for BigDecimal.
List<Row> dataSet = sparkSession.sql(query).collectAsList();
List<Integer> levels = dataSet.stream().map(row -> row.getInt(0)).collect(Collectors.toList());

unable to get the Row from ResultSet

The following function saves data in Cassandra. It calls the abstract rowToModel method defined in the same class to convert the data returned from Cassandra into the respective data model.
def saveDataToDatabase(data:M):Option[M] = { //TODOM - should this also be part of the Repository trait?
println("inserting in table "+tablename+" with partition key "+partitionKeyColumns +" and values "+data)
val insertQuery = insertValues(tablename,data)
println("insert query is "+insertQuery)
try {
val resultSet:ResultSet = session.execute(insertQuery) //execute can take a Statement. Insert is derived from Statement so I can use Insert.
println("resultset after insert: " + resultSet)
println("resultset applied: " + resultSet.wasApplied())
println(s"columns definition ${resultSet.getColumnDefinitions}")
if(resultSet.wasApplied()){
println(s"saved row ${resultSet.one()}")
val savedData = rowToModel(resultSet.one())
Some(savedData)
} else {
None
}
}catch {
case e:Exception => {
println("cassandra exception "+e)
None
}
}
}
The abstract rowToModel is defined as follows
override def rowToModel(row: Row): PracticeQuestionTag = {
PracticeQuestionTag(row.getLong("year"),row.getLong("month"),row.getLong("creation_time_hour"),
row.getLong("creation_time_minute"),row.getUUID("question_id"),row.getString("question_description"))
}
But the print statements in saveDataToDatabase are not printing the data. I expected to see something like PracticeQuestionTag(2018,6,1,1,11111111-1111-1111-1111-111111111111,some description1) when I print one() from the ResultSet, but instead I see the following:
resultset after insert: ResultSet[ exhausted: false, Columns[[applied](boolean)]]
resultset applied: true
columns definition Columns[[applied](boolean)]
saved row Row[true]
row to Model called for row null
cassandra exception java.lang.NullPointerException
Why are ResultSet, one() and getColumnDefinitions not showing me the values from the data model?
This is by design. The result set of an insert only contains a single row, which tells whether the insert was applied or not.
When executing a conditional statement, the ResultSet will contain a
single Row with a column named “applied” of type boolean. This tells
whether the conditional statement was successful or not.
This also makes sense: a ResultSet is supposed to return the result of the query, and there is little reason to make the result set object heavy by returning all the inputs in it. More details can be found here.
Of course, get queries will return a detailed result set.
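The log output in the question also shows a second effect: the println consumes the only row through one(), so the following call to one() receives null, which leads to the NullPointerException. Below is a minimal sketch of that behaviour using a plain Java stand-in. OneShotResult is a hypothetical class written for illustration and only mimics "one() returns the next row or null"; it is not the Cassandra driver's ResultSet.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical stand-in for a result set; it is not the Cassandra driver API.
final class OneShotResult {
    private final Deque<String> rows;

    OneShotResult(String... rows) {
        this.rows = new ArrayDeque<>(Arrays.asList(rows));
    }

    // Removes and returns the next row, or null when no rows are left.
    String one() {
        return rows.poll();
    }

    public static void main(String[] args) {
        OneShotResult result = new OneShotResult("Row[true]");
        System.out.println("saved row " + result.one());   // consumes the only row
        System.out.println("second call " + result.one()); // nothing left, so null
    }
}
```

Reading the row once into a local value and reusing it would avoid the null. For an insert, however, that row still only carries the [applied] column, so the saved model would have to come from the input data rather than from the row.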

HazelcastJet rolling-aggregation with removing previous data and add new

We have a use case where we receive messages from Kafka that need to be aggregated. They have to be aggregated in such a way that, if an update arrives for the same id, the existing value (if any) is subtracted and the new value is added.
From various forums I learned that Jet doesn't store raw values, only the aggregated result and some internal data.
In that case, how can I achieve this?
Example
Balance 1 {id:1, amount:100} // aggregated result 100
Balance 2 {id:2, amount:200} // 300
Balance 3 {id:1, amount:400} // 600 after removing 100 and adding 400
I could achieve the simple case where every value is added, but not the aggregation where the existing value needs to be subtracted and the new value added.
rollingAggregate(AggregateOperations.summingDouble(<logic to add/remove>))
.drainTo(Sinks.logger()).
Balance 1, 2, 3 is the sequence of messages.
The comments show the aggregated value after each message is processed by Jet.
My aim is to add the new amount if the id comes for the first time, and to subtract the old amount if an updated balance comes, i.e. the id is the same as an earlier one.
You can try a custom aggregate operation which will emit the previous and currently seen values like this:
public static <T> AggregateOperation1<T, ?, Tuple2<T, T>> previousAndCurrent() {
return AggregateOperation
.withCreate(() -> new Object[2])
.<T>andAccumulate((acc, current) -> {
acc[0] = acc[1];
acc[1] = current;
})
.andExportFinish((acc) -> tuple2((T) acc[0], (T) acc[1]));
}
The output is a tuple of the form (previous, current). You can then apply a rolling aggregate again to that output. To simplify the problem, the input here is a stream of (id, amount) pairs.
Pipeline p = Pipeline.create();
p.drawFrom(Sources.<Integer, Long>mapJournal("map", START_FROM_OLDEST)) // (id, amount)
.groupingKey(Entry::getKey)
.rollingAggregate(previousAndCurrent(), (key, val) -> val)
.rollingAggregate(AggregateOperations.summingLong(e -> {
long prevValue = e.f0() == null ? 0 : e.f0().getValue();
long newValue = e.f1().getValue();
return newValue - prevValue;
}))
.drainTo(Sinks.logger());
JetConfig config = new JetConfig();
config.getHazelcastConfig().addEventJournalConfig(new EventJournalConfig().setMapName("map"));
JetInstance jet = Jet.newJetInstance(config);
IMapJet<Object, Object> map = jet.getMap("map");
map.put(0, 1L);
map.put(0, 2L);
map.put(1, 10L);
map.put(1, 40L);
jet.newJob(p).join();
This should produce the output: 1, 2, 12, 42.
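The arithmetic behind the two rolling aggregates can be checked without a cluster. The following is a minimal sketch in plain Java that keeps the last seen amount per id and adds only the difference to a running total. DeltaRollingSum is a hypothetical class used for illustration; it does not use Jet and is not how Jet stores its state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of the pipeline's arithmetic; it does not use Jet.
final class DeltaRollingSum {
    private final Map<Integer, Long> lastSeen = new HashMap<>();
    private long total;

    // Applies one (id, amount) update and returns the running total afterwards.
    long update(int id, long amount) {
        Long previous = lastSeen.put(id, amount); // null the first time an id appears
        total += amount - (previous == null ? 0L : previous);
        return total;
    }

    public static void main(String[] args) {
        DeltaRollingSum sum = new DeltaRollingSum();
        List<Long> totals = new ArrayList<>();
        totals.add(sum.update(1, 100L));
        totals.add(sum.update(2, 200L));
        totals.add(sum.update(1, 400L));
        System.out.println(totals); // [100, 300, 600], matching the question's example
    }
}
```

Feeding it the map entries from the answer, (0, 1), (0, 2), (1, 10), (1, 40), gives the running totals 1, 2, 12, 42.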

Spark DataFrame created from JavaRDD<Row> copies all columns data into first column

I have a DataFrame which I need to convert into a JavaRDD<Row> and back to a DataFrame. I have the following code:
DataFrame sourceFrame = hiveContext.read().format("orc").load("/path/to/orc/file");
//I do order by in above sourceFrame and then I convert it into JavaRDD
JavaRDD<Row> modifiedRDD = sourceFrame.toJavaRDD().map(new Function<Row,Row>() {
public Row call(Row row) throws Exception {
if(row != null) {
//updated row by creating new Row
return RowFactory.create(updateRow);
}
return null;
}
});
//now I convert above JavaRDD<Row> into DataFrame using the following
DataFrame modifiedFrame = sqlContext.createDataFrame(modifiedRDD,schema);
The schemas of sourceFrame and modifiedFrame are the same. When I call sourceFrame.show(), the output is as expected: every column has its corresponding values and no column is empty. But when I call modifiedFrame.show(), all the column values are merged into the first column. For example, assume the source DataFrame has 3 columns as shown below:
_col1 _col2 _col3
ABC 10 DEF
GHI 20 JKL
When I print modifiedFrame, which I converted from the JavaRDD, it shows the following:
_col1 _col2 _col3
ABC,10,DEF
GHI,20,JKL
As shown above, _col1 holds all the values and _col2 and _col3 are empty. I don't know what is wrong.
As I mentioned in a comment on the question, this might occur because the list is passed as a single parameter:
return RowFactory.create(updateRow);
Looking at the Apache Spark docs and source code, the example that specifies a schema assigns the values one by one for each column. From a rough look at the source of RowFactory.java and GenericRow, a single list parameter is not spread across the columns. So try passing the values for the row's columns individually:
return RowFactory.create(updateRow.get(0),updateRow.get(1),updateRow.get(2)); // List Example
Alternatively, you may convert your list to an array and pass that as the parameter:
Object[] updatedRowArray = new Object[updateRow.size()];
updateRow.toArray(updatedRowArray);
return RowFactory.create(updatedRowArray);
By the way, the RowFactory.create() method creates Row objects. From the Apache Spark documentation on Row and RowFactory.create():
Represents one row of output from a relational operator. Allows both generic access by ordinal, which will incur boxing overhead for
primitives, as well as native primitive access. It is invalid to use
the native primitive interface to retrieve a value that is null,
instead a user must check isNullAt before attempting to retrieve a
value that might be null.
To create a new Row, use RowFactory.create() in Java or Row.apply() in
Scala.
A Row object can be constructed by providing field values. Example:
import org.apache.spark.sql._
// Create a Row from values.
Row(value1, value2, value3, ...)
// Create a Row from a Seq of values.
Row.fromSeq(Seq(value1, value2, ...))
According to the documentation, you can also apply your own logic to separate the row's columns while creating Row objects. But I think converting the list to an array and passing the array as the parameter will work for you (I couldn't test it; please post your feedback, thanks).
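The difference between passing a list and passing an array comes from how Java handles varargs parameters. The following is a minimal sketch with a hypothetical method, countValues, that has the same Object... shape as a create(Object... values) method; VarargsDemo is a stand-in for illustration and does not use Spark.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in with the same varargs shape as a create(Object... values) method.
final class VarargsDemo {

    // Returns how many separate values the method received.
    static int countValues(Object... values) {
        return values.length;
    }

    public static void main(String[] args) {
        List<Object> updateRow = Arrays.<Object>asList("ABC", 10, "DEF");
        // The whole list is wrapped as a single value.
        System.out.println(countValues(updateRow));
        // An Object[] is passed through as the varargs array: one value per element.
        System.out.println(countValues(updateRow.toArray()));
    }
}
```

With the list, the method sees one value (the list itself), which matches the symptom of everything landing in the first column. With the array, it sees one value per element.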

Error in Linq: The text data type cannot be selected as DISTINCT because it is not comparable

I have a problem with LINQ. Basically, a third-party database that I need to connect to uses the now deprecated text data type (I can't change this), and I need to execute a distinct clause in my LINQ query on results that contain this field.
I don't want to call ToList() before executing Distinct(), as that would bring back thousands of records from the database that I don't need and would annoy the client, who is charged for bandwidth usage. I only need the first 15 distinct records.
Anyway, the query is below:
var query = (from s in db.tSearches
join sc in db.tSearchIndexes on s.GUID equals sc.CPSGUID
join a in db.tAttributes on sc.AttributeGUID equals a.GUID
where s.Notes != null && a.Attribute == "Featured"
select new FeaturedVacancy
{
Id = s.GUID,
DateOpened = s.DateOpened,
Notes = s.Notes
});
return query.Distinct().OrderByDescending(x => x.DateOpened);
I know I can use a subquery to do the same thing as above (tSearches contains unique records), but I'd rather have a more straightforward solution if available, as I need to change a number of similar queries throughout the code to get this working.
There were no answers on how to do this, so I went with my first suggestion: I retrieved the unique records from tSearches first, then constructed a subquery over the non-unique records and filtered the search results by this subquery. Answer below:
var query = (from s in db.tSearches
where s.DateClosed == null && s.ConfidentialNotes != null
orderby s.DateOpened descending
select new FeaturedVacancy
{
Id = s.GUID,
Notes = s.ConfidentialNotes
});
/* Now filter by our 'Featured' attribute */
var subQuery = from sc in db.tSearchIndexes
join a in db.tAttributes on sc.AttributeGUID equals a.GUID
where a.Attribute == "Featured"
select sc.CPSGUID;
query = query.Where(x => subQuery.Contains(x.Id));
return query;
