The following function saves data in Cassandra. It calls the abstract rowToModel method, defined in the same class, to convert the data returned from Cassandra into the corresponding data model.
def saveDataToDatabase(data: M): Option[M] = { //TODOM - should this also be part of the Repository trait?
  println("inserting in table " + tablename + " with partition key " + partitionKeyColumns + " and values " + data)
  val insertQuery = insertValues(tablename, data)
  println("insert query is " + insertQuery)
  try {
    val resultSet: ResultSet = session.execute(insertQuery) //execute can take a Statement. Insert is derived from Statement so I can use Insert.
    println("resultset after insert: " + resultSet)
    println("resultset applied: " + resultSet.wasApplied())
    println(s"columns definition ${resultSet.getColumnDefinitions}")
    if (resultSet.wasApplied()) {
      println(s"saved row ${resultSet.one()}")
      val savedData = rowToModel(resultSet.one())
      Some(savedData)
    } else {
      None
    }
  } catch {
    case e: Exception => {
      println("cassandra exception " + e)
      None
    }
  }
}
The abstract rowToModel is implemented as follows:
override def rowToModel(row: Row): PracticeQuestionTag = {
  PracticeQuestionTag(row.getLong("year"), row.getLong("month"), row.getLong("creation_time_hour"),
    row.getLong("creation_time_minute"), row.getUUID("question_id"), row.getString("question_description"))
}
But the print statements I have defined in saveDataToDatabase are not printing the data. I expected them to print a PracticeQuestionTag, something like PracticeQuestionTag(2018,6,1,1,11111111-1111-1111-1111-111111111111,some description1), when I print one() from the ResultSet. Instead I see the following:
resultset after insert: ResultSet[ exhausted: false, Columns[[applied](boolean)]]
resultset applied: true
columns definition Columns[[applied](boolean)]
saved row Row[true]
row to Model called for row null
cassandra exception java.lang.NullPointerException
Why are ResultSet, one() and getColumnDefinitions not showing me the values from the data model?
This is by design. The ResultSet of an INSERT will only contain a single row, which tells you whether the statement was applied or not.
When executing a conditional statement, the ResultSet will contain a
single Row with a column named “applied” of type boolean. This tells
whether the conditional statement was successful or not.
This also makes sense: the ResultSet is supposed to return the result of the query, and there is no reason to make the result set object heavy by returning all of the inputs in it. More details can be found here.
Of course, SELECT queries will return the detailed result set.
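If you need the inserted data back, one option is to issue a follow-up SELECT once the INSERT has been applied. Below is a minimal sketch using the DataStax Java driver directly (the Scala code above wraps the same driver); the table name practice_question_tag and the key columns/values are assumptions for illustration only:

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;

ResultSet insertResult = session.execute(insertQuery);
if (insertResult.wasApplied()) {
    // the INSERT ResultSet only carries the [applied] column, so re-read the row to get the data
    ResultSet selected = session.execute(
            "SELECT * FROM practice_question_tag WHERE year = ? AND month = ?", // assumed table and keys
            2018L, 6L);
    Row saved = selected.one(); // now holds the real columns (year, month, question_id, ...)
    // saved can then be passed to rowToModel
}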
My Code:
finalJoined.show();
Encoder<Row> rowEncoder = Encoders.bean(Row.class);
Dataset<Row> validatedDS = finalJoined.map(row -> validationRowMap(row), rowEncoder);
validatedDS.show();
Map function:
public static Row validationRowMap(Row row) {
    //PART-A validateTxn()
    System.out.println("Inside map");
    //System.out.println("Value of CIS_DIVISION is " + row.getString(7));
    //1. CIS_DIVISION
    if ((row.getString(7)) == null || (row.getString(7)).trim().isEmpty()) {
        System.out.println("CIS_DIVISION cannot be blank.");
    }
    return row;
}
Output:
The finalJoined Dataset<Row> is shown properly, with all columns and rows holding the right values; however, the validatedDS Dataset<Row> is shown with only one column, with empty values.
Expected output:
validatedDS should show the same values as the finalJoined dataset, because I am only performing validation inside the map function and not changing the dataset itself.
Please let me know if you need more information.
Encoders.bean is intended for use with bean classes. Row is not one of these (it doesn't define setters and getters for specific fields, only generic getters).
To return a Row object you have to use RowEncoder and provide the expected output schema.
Check, for example, Encoder for Row Type Spark Datasets.
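For example, here is a minimal sketch of that approach, assuming Spark 2.x and that the output rows keep finalJoined's schema (validationRowMap is the function from the question):

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.catalyst.encoders.RowEncoder;
import org.apache.spark.sql.types.StructType;

StructType schema = finalJoined.schema(); // the output rows keep the input schema
Dataset<Row> validatedDS = finalJoined.map(
        (MapFunction<Row, Row>) row -> validationRowMap(row),
        RowEncoder.apply(schema)); // Row encoder built from the explicit schema
validatedDS.show();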
I have a column named record_number of type varchar that has data in the following format: [currentYear]-[number], e.g. 2015-11.
I need to find the maximum number in this column; i.e. if the row holding the maximum has the value 2015-15, then the result should be 15, whereas if that value is 2016-2, then the max should be 2.
How can I do it in JPQL?
I'm using Postgres and EJB 3.1
You can use the SUBSTRING function of JPQL:
select table From Table table order by SUBSTRING(table.record_number, 5) desc;
To get only the first result, you need to use setMaxResults, like this:
em.createQuery("select table From Table table order by SUBSTRING(table.record_number, 5) desc")
    .setMaxResults(1) // only the first result
    .getResultList();
I managed to fix the problem based on the comment of Dherik:
I used the following query to get the object that holds the correct value, which seems more optimized than the one proposed by Dherik:
final TypedQuery<Table> query = createTypedQuery("from Table t where t.recordNumber = (select max(t2.recordNumber) from Table t2)", Table.class);
Table t = null;
try {
    t = query.getSingleResult();
} catch (Exception e) {
    //handle exception here
}
return t;
The trick: since it's my app that creates the record number, I changed the method that creates it to format the number on 2 digits, to avoid a wrong string comparison (the case where '9' is considered greater than '10').
// format the number <10 so that is on 2 digits
final String formattedNumber = String.format("%02d", number);
final int year = SomeUtilClass.getYearFromDate(new Date());
return new StringBuilder().append(year).append("-").append(formattedNumber).toString();
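To illustrate why the padding matters, here is a small standalone check (the values are hypothetical):

// Lexicographic comparison on the raw varchar: "9" sorts after "1", so 2016-9 > 2016-10.
System.out.println("2016-9".compareTo("2016-10") > 0);  // true
// With zero padding, the string order matches the numeric order again.
System.out.println("2016-09".compareTo("2016-10") > 0); // false

Note that this only holds while the number stays below 100; beyond that, wider padding would be needed.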
I have a DataFrame which I need to convert into a JavaRDD<Row> and back to a DataFrame. I have the following code:
DataFrame sourceFrame = hiveContext.read().format("orc").load("/path/to/orc/file");
//I do order by in above sourceFrame and then I convert it into JavaRDD
JavaRDD<Row> modifiedRDD = sourceFrame.toJavaRDD().map(new Function<Row, Row>() {
    public Row call(Row row) throws Exception {
        if (row != null) {
            //updated row by creating new Row
            return RowFactory.create(updateRow);
        }
        return null;
    }
});
//now I convert above JavaRDD<Row> into DataFrame using the following
DataFrame modifiedFrame = sqlContext.createDataFrame(modifiedRDD, schema);
sourceFrame and modifiedFrame have the same schema. When I call sourceFrame.show() the output is as expected: every column has its corresponding values and no column is empty. But when I call modifiedFrame.show() I see all the column values merged into the first column. For example, assume the source DataFrame has 3 columns as shown below:
_col1 _col2 _col3
ABC 10 DEF
GHI 20 JKL
When I print modifiedFrame, which I converted from the JavaRDD, it shows the following:
_col1 _col2 _col3
ABC,10,DEF
GHI,20,JKL
As shown above, _col1 has all the values while _col2 and _col3 are empty. I don't know what is wrong.
As I mentioned in the question's comments:
It probably occurs because you are passing the whole list as a single parameter:
return RowFactory.create(updateRow);
Looking at the Apache Spark docs and source code: in the programmatically-specifying-the-schema example, the parameters are passed one by one, one per column. If you skim the source of RowFactory.java and the GenericRow class, a single list parameter is not unpacked into separate columns. So try to pass one parameter per column of the row:
return RowFactory.create(updateRow.get(0),updateRow.get(1),updateRow.get(2)); // List Example
You may also convert your list to an array and then pass it as the parameter:
YourObject[] updatedRowArray= new YourObject[updateRow.size()];
updateRow.toArray(updatedRowArray);
return RowFactory.create(updatedRowArray);
By the way, the RowFactory.create() method creates Row objects. From the Apache Spark documentation about the Row object and RowFactory.create():
Represents one row of output from a relational operator. Allows both generic access by ordinal, which will incur boxing overhead for
primitives, as well as native primitive access. It is invalid to use
the native primitive interface to retrieve a value that is null,
instead a user must check isNullAt before attempting to retrieve a
value that might be null.
To create a new Row, use RowFactory.create() in Java or Row.apply() in
Scala.
A Row object can be constructed by providing field values. Example:
import org.apache.spark.sql._
// Create a Row from values.
Row(value1, value2, value3, ...)
// Create a Row from a Seq of values.
Row.fromSeq(Seq(value1, value2, ...))
According to the documentation, you can also apply your own logic to split out the row's columns while creating the Row objects. But I think converting the list to an array and passing it as the parameter will work for you (I couldn't try it myself; please post your feedback, thanks).
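Putting it together, here is a sketch of the corrected map call (Spark 1.6-style API as in the question, reusing sourceFrame, sqlContext and schema from there); buildUpdatedValues is a hypothetical helper standing in for your row-update logic, returning one value per column of the original schema:

import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

JavaRDD<Row> modifiedRDD = sourceFrame.toJavaRDD().map(new Function<Row, Row>() {
    public Row call(Row row) throws Exception {
        List<Object> updateRow = buildUpdatedValues(row); // hypothetical helper: one value per column
        // spread the values so each one lands in its own column instead of one merged column
        return RowFactory.create(updateRow.toArray(new Object[updateRow.size()]));
    }
});
DataFrame modifiedFrame = sqlContext.createDataFrame(modifiedRDD, schema);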
For the following piece of code I get an InvalidTypeException whenever I use row.getToken("fieldname").
Record RowToRecord(Row rw) {
    ColumnDefinitions cd = rw.getColumnDefinitions();
    Record rec = new Record();
    int i;
    for (i = 0; i < cd.size(); i++) {
        rec.fields.add(cd.getName(i));
        System.out.println(cd.getName(i));
        //System.out.println((rw.getToken(cd.getName(i))).getValue());
        Token tk = rw.getToken(cd.getName(i)); //// InvalidTypeException on this line.
        //System.out.println(tk.getValue()+" "+tk.getType().toString());
        rec.values.add(tk.getValue());
        rec.types.add(tk.getType().toString());
        //Token tk = new Token();
    }
    return rec;
}
getToken is meant to be called on a column that contains a Cassandra token. In 99% of cases, that will be the result of a call to the token() CQL function, for example the first column in this query:
select token(id), col1 from my_table where id = ...
Your code is calling it for all columns, which will fail as soon as you have a column that doesn't match the CQL type for tokens.
That CQL type depends on the partitioner used in your cluster:
murmur3 partitioner (the default): token(...) will return a BIGINT
random partitioner: VARINT
ordered partitioner: BLOB
In theory you can call getToken on any column with this type (although in practice it probably only makes sense for columns that are the result of a token() call, as explained above).
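If the goal is simply to read every column generically, here is a sketch of the alternative, assuming DataStax Java driver 2.1+ (where Row.getObject exists) and the Record class from the question: use getObject and the column metadata instead of getToken.

import com.datastax.driver.core.ColumnDefinitions;
import com.datastax.driver.core.Row;

Record rowToRecord(Row rw) {
    ColumnDefinitions cd = rw.getColumnDefinitions();
    Record rec = new Record();
    for (int i = 0; i < cd.size(); i++) {
        rec.fields.add(cd.getName(i));
        rec.values.add(rw.getObject(i));         // generic accessor, works for any CQL type
        rec.types.add(cd.getType(i).toString()); // the column's CQL type from the metadata
    }
    return rec;
}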
I have a problem with LINQ. A third-party database that I need to connect to uses the now-deprecated text data type (I can't change this), and I need to execute a Distinct() in my LINQ query on results that contain this field.
I don't want to do a ToList() before executing the Distinct() as that will result in thousands of records coming back from the database that I don't require and will annoy the client as they get charged for bandwidth usage. I only need the first 15 distinct records.
Anyway, the query is below:
var query = (from s in db.tSearches
join sc in db.tSearchIndexes on s.GUID equals sc.CPSGUID
join a in db.tAttributes on sc.AttributeGUID equals a.GUID
where s.Notes != null && a.Attribute == "Featured"
select new FeaturedVacancy
{
Id = s.GUID,
DateOpened = s.DateOpened,
Notes = s.Notes
});
return query.Distinct().OrderByDescending(x => x.DateOpened);
I know I could do a subquery to do the same thing as above (tSearches contains unique records), but I'd prefer a more straightforward solution if one is available, as I need to change a number of similar queries throughout the code to get this working.
There were no answers on how to do this, so I went with my first suggestion: I retrieved the unique records first from tSearches, then constructed a subquery with the non-unique records and filtered the search results by it. Answer below:
var query = (from s in db.tSearches
where s.DateClosed == null && s.ConfidentialNotes != null
orderby s.DateOpened descending
select new FeaturedVacancy
{
Id = s.GUID,
Notes = s.ConfidentialNotes
});
/* Now filter by our 'Featured' attribute */
var subQuery = from sc in db.tSearchIndexes
join a in db.tAttributes on sc.AttributeGUID equals a.GUID
where a.Attribute == "Featured"
select sc.CPSGUID;
query = query.Where(x => subQuery.Contains(x.Id));
return query;