Saving Dataset<Row> to Ignite - apache-spark

Here is my code:
public static void save(IgniteContext igniteContext, String cacheName, Dataset<Row> dataSet) {
    CacheConfiguration<BinaryObject, BinaryObject> cacheConfiguration = new CacheConfiguration<BinaryObject, BinaryObject>(cacheName)
        .setAtomicityMode(CacheAtomicityMode.ATOMIC)
        .setBackups(0)
        .setAffinity(new RendezvousAffinityFunction(false, 2))
        .setIndexedTypes(BinaryObject.class, BinaryObject.class);
    IgniteCache<BinaryObject, BinaryObject> rddCache = igniteContext.ignite()
        .getOrCreateCache(cacheConfiguration)
        .withKeepBinary();
    rddCache.clear();
    IgniteRDD<BinaryObject, BinaryObject> igniteRDD = igniteContext.fromCache(cacheName);
    StructField[] fields = dataSet.schema().fields();
    RDD<BinaryObject> binaryObjectJavaRDD = dataSet.toJavaRDD().map(row -> {
        BinaryObjectBuilder valueBuilder = igniteContext.ignite().binary().builder(BinaryObject.class.getCanonicalName());
        for (int i = 0; i < fields.length; i++) {
            // convertValue converts the value to the specific data type
            valueBuilder.setField(fields[i].name(), convertValue(String.valueOf(row.get(i)), fields[i].dataType()));
        }
        return valueBuilder.build();
    }).rdd();
    igniteRDD.saveValues(binaryObjectJavaRDD);
}
The problem is that even after this method completes successfully, the cache remains empty. The Dataset has 20 rows, so that is not the issue.
The other problem is that if I use the savePairs method from IgniteRDD, I have to generate the key myself (here the key is a BinaryObject). How do I do that?
Update:
public static void saveDFInPairs(IgniteContext igniteContext, Dataset<Row> dataSet, IgniteRDD<BinaryObject, BinaryObject> igniteRDD) {
    StructField[] fields = dataSet.schema().fields();
    JavaRDD<Tuple2<BinaryObject, BinaryObject>> rdd = dataSet.toJavaRDD().map(row -> {
        BinaryObjectBuilder keyBuilder = igniteContext.ignite()
            .binary().builder("TypeName");
        keyBuilder.setField("id", row.mkString().hashCode());
        BinaryObject key = keyBuilder.build();
        BinaryObjectBuilder valueBuilder = igniteContext.ignite()
            .binary().builder("TypeName");
        for (int i = 0; i < fields.length; i++) {
            valueBuilder.setField(fields[i].name(), convert(row, i, fields[i].dataType()));
        }
        BinaryObject value = valueBuilder.build();
        return new Tuple2<>(key, value);
    });
    igniteRDD.savePairs(rdd.rdd(), true);
}

A couple of considerations:
The type name (the one passed to the builder() method) should be a meaningful name representing the data type. Do not use the BinaryObject class name for this.
setIndexedTypes(BinaryObject.class, BinaryObject.class) is incorrect. It should specify the classes to be processed for query annotations. If you don't have such classes, you can use QueryEntity to configure queries. See this page for further details: https://apacheignite.readme.io/docs/sql-queries
Other than that, the code looks correct. I would recommend trying it with the default settings and checking whether it works that way. Also, it's not very clear how you check whether the data is in the cache.
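On the key question: the update derives each key from row.mkString().hashCode(), which has two weaknesses. mkString() without a separator makes rows such as ("a", "bc") and ("ab", "c") produce the same string, and a 32-bit String.hashCode() collides for different strings (for example "Aa" and "BB" both hash to 2112). Colliding rows end up under the same cache key, so only one of them survives. A minimal sketch, assuming a row is identified by its full content: encode the fields unambiguously and derive a name-based UUID from them. RowKeys, encode and keyFor are hypothetical helper names, not Ignite or Spark API.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.UUID;

public final class RowKeys {
    private RowKeys() {}

    /**
     * Encodes each field as "<length>:<text>" (or "~" for null), so field
     * boundaries cannot be confused the way they can with a plain
     * concatenation such as mkString().
     */
    public static String encode(List<?> fields) {
        StringBuilder sb = new StringBuilder();
        for (Object field : fields) {
            if (field == null) {
                sb.append('~');
            } else {
                String text = String.valueOf(field);
                sb.append(text.length()).append(':').append(text);
            }
        }
        return sb.toString();
    }

    /**
     * Deterministic 128-bit key (a name-based, version 3 UUID): identical rows
     * always produce the same key, and accidental collisions are far less
     * likely than with a 32-bit String.hashCode().
     */
    public static UUID keyFor(List<?> fields) {
        return UUID.nameUUIDFromBytes(encode(fields).getBytes(StandardCharsets.UTF_8));
    }
}
```

Inside the map lambda, the row's values could be gathered into a List with row.get(i) for each field, and keyFor(...).toString() stored in the key's id field in place of the hash. Identical rows still share a key, which is usually the intended de-duplication.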

Related

Variable outside local scope is not defined for test case

I want to access variables defined outside a function from inside a test function I am writing in Groovy.
However, it doesn't seem that I can.
My code looks like this:
Map<String, String> originalTableRowState = new HashMap<String, String>(),
    newTableRowState
// if there is table data to get, and do actions on
def WebDriver driver = DriverFactory.getWebDriver()
def List<WebElement> dataRows = driver.findElements(
    By.cssSelector('div.tab-pane.active .dataTables_scrollBody tbody tr:not(.dataTables_empty)'))
'if there\'s table data, this test should run'
if (dataRows.size() > 0) {
    WebUI.comment('populate the tableRowState with the data from the first table row')
    fetchFirstRowDataInto(originalTableRowState)
}
void fetchFirstRowDataInto(Map<String, String> tableRowState) {
    List<WebElement> tableHeadings = driver.findElements(
        By.cssSelector('div.tab-pane.active .dataTables_scrollHead th'))
    WebElement firstRow = dataRows.get(0)
    // './/' keeps the search relative to firstRow; a bare '//' searches the whole page
    List<WebElement> dataCells = firstRow.findElements(
        By.xpath('.//td[not(@class="dataTables_empty") and not(*)]'))
    for (int i = 0; i < dataCells.size(); i++) {
        // save data to originalTableRowState with the table header text as the key
        tableRowState.put(tableHeadings.get(i).getText(), dataCells.get(i).getText())
    }
}
When I run it, I get an error saying that Variable 'driver' is not defined outside test case. I had just added the def keywords to the driver and dataRows definitions.
How can I make driver and dataRows accessible inside functions without passing them in as parameters?
I fixed the method's variable-access issue by declaring it as a JS-like closure, changing
void fetchFirstRowDataInto(Map<String, String> tableRowState) {
to
def fetchFirstRowDataInto = { Map<String, String> tableRowState ->
and putting the definition above the invocation.
I welcome any better solutions...
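Why the closure works: in a Groovy script, variables declared with def (or with a type) are local variables of the script's generated run() method, while a method declared in the script is a separate method that cannot see them. A closure, by contrast, captures the variables in scope where it is defined. Java lambdas behave the same way, so a rough Java analogue of the fix looks like this, with plain string lists standing in for the Selenium lookups. ClosureCapture and demo are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public final class ClosureCapture {
    private ClosureCapture() {}

    /**
     * headings and firstRowCells are hypothetical stand-ins for the text the
     * Groovy script reads through driver and dataRows.
     */
    public static Map<String, String> demo(List<String> headings, List<String> firstRowCells) {
        Map<String, String> originalTableRowState = new LinkedHashMap<>();

        // The lambda captures headings and firstRowCells from the enclosing scope,
        // so they are not passed as parameters. Java requires captured locals to be
        // effectively final; Groovy closures have no such restriction.
        Consumer<Map<String, String>> fetchFirstRowDataInto = tableRowState -> {
            for (int i = 0; i < firstRowCells.size(); i++) {
                tableRowState.put(headings.get(i), firstRowCells.get(i));
            }
        };

        // Like the Groovy closure, it is a local variable, so it must be
        // defined above the line that invokes it.
        fetchFirstRowDataInto.accept(originalTableRowState);
        return originalTableRowState;
    }
}
```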

Return a set of objects from a class

I have a method that adds a new item to an EF table and then queries the table back to return a subset of it. It needs to return to the caller a set of "rows", each of which is a set of columns. I'm not sure how to do this. I have some code, but I think it's wrong. I don't want to return ONE row; I want to return zero or more rows. I'm not sure what data type to use... [qryCurrentTSApproval is an EF object referring to a small view in SS; tblTimesheetEventLog is also an EF object, referring to the underlying table.]
Ideas?
private qryCurrentTSApproval LogApprovalEvents(int TSID, int EventType)
{
    using (CPASEntities ctx = new CPASEntities())
    {
        tblTimesheetEventLog el = new tblTimesheetEventLog();
        el.TSID = TSID;
        el.TSEventType = EventType;
        el.TSEUserName = (string)Session["strShortUserName"];
        el.TSEventDateTime = DateTime.Now;
        ctx.tblTimesheetEventLogs.AddObject(el);
        ctx.AcceptAllChanges();
        var e = (from x in ctx.qryCurrentTSApprovals
                 where x.TSID == TSID
                 select x);
        return (qryCurrentTSApproval)e;
    }
}
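Change your method's return type to a collection of qryCurrentTSApproval, and call ToList() so that the query runs before the context is disposed at the end of the using block:
private List<qryCurrentTSApproval> LogApprovalEvents(int TSID, int EventType)
{
    using (CPASEntities ctx = new CPASEntities())
    {
        // some other existing code here
        // (ctx.SaveChanges(), not ctx.AcceptAllChanges(), is what writes the new log row to the database)
        var itemList = (from x in ctx.qryCurrentTSApprovals
                        where x.TSID == TSID
                        select x).ToList();
        return itemList;
    }
}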
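For comparison, the same shape in Java: the method returns a List that holds zero or more rows instead of casting the query to a single row type, and the collect call materializes the results the way ToList() does. The in-memory list stands in for the EF view, and ApprovalQuery, Approval and forTimesheet are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

public final class ApprovalQuery {
    private ApprovalQuery() {}

    /** Hypothetical stand-in for one row of the qryCurrentTSApproval view. */
    public static final class Approval {
        public final int tsid;
        public final String approver;

        public Approval(int tsid, String approver) {
            this.tsid = tsid;
            this.approver = approver;
        }
    }

    /** Returns every matching row; the list is empty when nothing matches. */
    public static List<Approval> forTimesheet(List<Approval> view, int tsid) {
        return view.stream()
                .filter(a -> a.tsid == tsid)
                .collect(Collectors.toList()); // materialize now, like ToList()
    }
}
```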

Limitation in Cassandra-0.8.1 when using batch mutation

I got some exceptions from Cassandra when doing a batch mutation. They said "already has modifications in this mutation", but the info given shows two different operations.
I use super columns with counters in this case, laid out like this:
Key: MD5 of the URL, UTF-8
SuperColumnName: date, UTF-8
ColumnName: counter name, a random number from 1 to 200
ColumnValue: 1L
public void SuperCounterMutation(ArrayList<String> urlList) {
    LinkedList<HCounterSuperColumn<String, String>> counterSuperColumns;
    for (String line : urlList) {
        String[] ele = StringUtils.split(StringUtils.strip(line), ':');
        String key = ele[0];
        String SuperColumnName = ele[1];
        LinkedList<HCounterColumn<String>> ColumnList = new LinkedList<HCounterColumn<String>>();
        for (int i = 2; i < ele.length; ++i) {
            ColumnList.add(HFactory.createCounterColumn(ele[i], 1L, ser));
        }
        mutator.addCounter(key, ColumnFamilyName, HFactory.createCounterSuperColumn(SuperColumnName, ColumnList, ser, ser));
        ++count;
        if (count >= BUF_MAX_NUM) {
            try {
                mutator.execute();
            } catch (Exception e) {
                e.printStackTrace();
            }
            mutator = HFactory.createMutator(keyspace, ser);
            count = 0;
        }
    }
    return;
}
The error info from the Cassandra log showed that the duplicated operations only share the same key: the SuperColumnNames are not the same, and among the counter name sets, some of the conflicting ones intersect and some do not.
I'm using Cassandra 0.8.1 with Hector 0.8.0-rc2.
Can anyone tell me the reason for this problem? Thanks in advance!
Error info from cassandra log showed that the duplicated operations have the same key
Bingo. You'll need to combine operations from the same key into a single mutation.
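Combining everything for one key before it reaches the mutator can be sketched as a grouping pass over the same "key:superColumn:counter:..." lines that SuperCounterMutation parses. This is plain Java with no Hector calls; CounterBatch and group are hypothetical names, and summing repeated counter names within one super column is an assumption about the desired behaviour. Each top-level entry of the result would then be turned into a single mutation for that key.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class CounterBatch {
    private CounterBatch() {}

    /**
     * Groups "key:superColumn:counter1:counter2:..." lines into
     * key -> super column -> counter name -> total increment,
     * so each row key appears only once per batch.
     */
    public static Map<String, Map<String, Map<String, Long>>> group(List<String> lines) {
        Map<String, Map<String, Map<String, Long>>> byKey = new LinkedHashMap<>();
        for (String line : lines) {
            String[] ele = line.trim().split(":");
            if (ele.length < 2) {
                continue; // assumption: lines without a key and super column are skipped
            }
            Map<String, Long> counters = byKey
                    .computeIfAbsent(ele[0], k -> new LinkedHashMap<>())
                    .computeIfAbsent(ele[1], sc -> new LinkedHashMap<>());
            for (int i = 2; i < ele.length; i++) {
                counters.merge(ele[i], 1L, Long::sum); // repeated counter names add up
            }
        }
        return byKey;
    }
}
```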

Anonymous type and getting values outside of method scope

I am building an ASP.NET site on .NET Framework 4.0, and I am stuck on a method that is supposed to call a .cs class and get the query result back. Here are my method call and the method:
1: Method call from the aspx.cs page:
helper cls = new helper();
var query = cls.GetQuery(GroupID,emailCap);
2: Method in the helper class:
public IQueryable<VariablesForIQueryble> GetQuery(int incomingGroupID, int incomingEmailCap)
{
    var ctx = new some_Connection();
    ObjectSet<Members1> members = ctx.Members11;
    ObjectSet<groupMember> groupMembers = ctx.groupMembers;
    var query = from m in members
                join gm in groupMembers on m.MemberID equals gm.MemID
                where (gm.groupID == incomingGroupID) && (m.EmailCap == incomingEmailCap)
                select new VariablesForIQueryble(m.MemberID, m.MemberFirst, m.MemberLast, m.MemberEmail, m.ValidEmail, m.EmailCap);
    //select new {m.MemberID, m.MemberFirst, m.MemberLast, m.MemberEmail, m.ValidEmail, m.EmailCap};
    return query;
}
I tried the above code with IEnumerable too, without any luck. This is the code for the VariablesForIQueryble class:
3: The class itself, which takes the anonymous type's values and converts them to proper types:
public class VariablesForIQueryble
{
    private int _emailCap;
    public int EmailCap
    {
        get { return _emailCap; }
        set { _emailCap = value; }
    }
    ...
4: and a constructor:
public VariablesForIQueryble(int memberID, string memberFirst, string memberLast, string memberEmail, int? validEmail, int? emailCap)
{
    this.EmailCap = (int) emailCap;
    ...
}
I can't seem to get the query result back. First it gave me an anonymous type problem, so I made a class after reading this: link text; now it tells me that constructors with parameters are not supported. I am an intermediate developer. Is there an easy solution to this, or do I have to move my query back to the .aspx.cs page?
If you want to project to a specific .NET type like this, you need to force the query to actually execute, using either .AsEnumerable() or .ToList(), and then use .Select() against LINQ to Objects.
You could keep your original anonymous type to specify what you want back from the database, then call .ToList() on it and then .Select(...) to reproject.
You can also clean up your code somewhat by using an Entity Association between Groups and Members, backed by an FK association in the database. The query then becomes much simpler:
var result = ctx.Members11.Include("Group").Where(m => m.Group.groupID == incomingGroupID && m.EmailCap == incomingEmailCap);
You still have to do a select to specify which columns to return, and then call .ToList() to force execution before reprojecting to your new type.
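The two-step pattern described here, fetching plain column values first and only then calling a constructor in memory, looks roughly like this in Java. The Object[] rows play the role of the materialized anonymous-type results, MemberRow stands in for VariablesForIQueryble, and all names are hypothetical. Mapping a null EmailCap to 0 is also an assumption; the original (int) emailCap cast would throw on null.

```java
import java.util.List;
import java.util.stream.Collectors;

public final class Reproject {
    private Reproject() {}

    /** Hypothetical named result type, playing the role of VariablesForIQueryble. */
    public static final class MemberRow {
        public final int memberId;
        public final String memberEmail;
        public final int emailCap;

        public MemberRow(int memberId, String memberEmail, Integer emailCap) {
            this.memberId = memberId;
            this.memberEmail = memberEmail;
            this.emailCap = emailCap == null ? 0 : emailCap; // assumption: null cap becomes 0
        }
    }

    /**
     * Step 2 only: the rows are already materialized (the .ToList() step),
     * so the parameterized constructor runs in memory, not in the database query.
     */
    public static List<MemberRow> toMemberRows(List<Object[]> materialized) {
        return materialized.stream()
                .map(r -> new MemberRow((Integer) r[0], (String) r[1], (Integer) r[2]))
                .collect(Collectors.toList());
    }
}
```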
Another alternative is to create a view in your database and import that as an Entity into the Entity Designer.
I used reflection to solve the problem:
A: The query, no longer using the custom-made VariablesForIQueryble class:
// Method in helper class
public IEnumerable GetQuery(int incomingGroupID, int incomingEmailCap)
{
    var ctx = new some_Connection();
    ObjectSet<Members1> members = ctx.Members11;
    ObjectSet<groupMember> groupMembers = ctx.groupMembers;
    // project to an anonymous type, which LINQ to Entities can translate
    var query = from m in members
                join gm in groupMembers on m.MemberID equals gm.MemID
                where (gm.groupID == incomingGroupID) && (m.EmailCap == incomingEmailCap)
                select new { m.MemberID, m.MemberFirst, m.MemberLast, m.MemberEmail, m.ValidEmail, m.EmailCap };
    return query;
}
B: Code to read the returned objects and convert their values using reflection:
helper cls = new helper();
var query = cls.GetQuery(GroupID, emailCap);
if (query != null)
{
    foreach (var objRow in query)
    {
        // the anonymous type has no usable name here, so read its properties by name via reflection
        System.Type type = objRow.GetType();
        int memberId = (int)type.GetProperty("MemberID").GetValue(objRow, null);
        string memberEmail = (string)type.GetProperty("MemberEmail").GetValue(objRow, null);
    }
}
else
{
    // something else...
}
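Reading a property by name at runtime, as part B does with GetProperty(...).GetValue(...), has a direct Java counterpart in java.lang.reflect. Java has no anonymous types in the C# sense, so a small named class stands in for the query row; ReadByName, MemberRow and property are hypothetical names. As with the C# version, a typo in the property name only shows up at runtime, which is the main cost of this approach compared with returning a named type.

```java
import java.lang.reflect.Method;

public final class ReadByName {
    private ReadByName() {}

    /** Hypothetical row type standing in for the anonymous type in the C# answer. */
    public static final class MemberRow {
        private final int memberID;
        private final String memberEmail;

        public MemberRow(int memberID, String memberEmail) {
            this.memberID = memberID;
            this.memberEmail = memberEmail;
        }

        public int getMemberID() { return memberID; }
        public String getMemberEmail() { return memberEmail; }
    }

    /** Looks up the public no-argument getter "get" + name at runtime and invokes it. */
    public static Object property(Object target, String name) throws ReflectiveOperationException {
        Method getter = target.getClass().getMethod("get" + name);
        return getter.invoke(target); // primitive results come back boxed
    }
}
```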

How to run a stored procedure from Groovy that returns multiple result sets

I couldn't find any good example of doing this online.
Can someone please show how to run a stored procedure (that returns multiple result sets) from Groovy?
Basically, I am just trying to determine how many result sets the stored procedure returns.
I have written a helper that lets me work with stored procedures that return a single ResultSet in a way similar to working with queries in groovy.sql.Sql. It could easily be adapted to process multiple ResultSets (I assume each would need its own closure).
Usage:
Sql sql = new Sql(dataSource)
SqlHelper helper = new SqlHelper(sql);
helper.eachSprocRow('EXEC sp_my_sproc ?, ?, ?', ['a', 'b', 'c']) { row ->
    println "foo=${row.foo}, bar=${row.bar}, baz=${row.baz}"
}
Code:
import groovy.sql.Sql
import java.sql.CallableStatement
import java.sql.Connection
import java.sql.ResultSet
import java.sql.ResultSetMetaData

class SqlHelper {
    private Sql sql;

    SqlHelper(Sql sql) {
        this.sql = sql;
    }

    public void eachSprocRow(String query, List parameters, Closure closure) {
        sql.cacheConnection { Connection con ->
            CallableStatement proc = con.prepareCall(query)
            try {
                parameters.eachWithIndex { param, i ->
                    proc.setObject(i + 1, param)
                }
                boolean result = proc.execute()
                boolean found = false
                while (!found) {
                    if (result) {
                        ResultSet rs = proc.getResultSet()
                        ResultSetMetaData md = rs.getMetaData()
                        int columnCount = md.getColumnCount()
                        while (rs.next()) {
                            // use a case-insensitive map
                            Map row = new TreeMap(String.CASE_INSENSITIVE_ORDER)
                            for (int i = 0; i < columnCount; ++i) {
                                row[md.getColumnName(i + 1)] = rs.getObject(i + 1)
                            }
                            closure.call(row)
                        }
                        found = true;
                    } else if (proc.getUpdateCount() < 0) {
                        // no result set and no update count left: nothing was returned
                        throw new RuntimeException("Sproc ${query} did not return a result set")
                    }
                    result = proc.getMoreResults()
                }
            } finally {
                proc.close()
            }
        }
    }
}
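The helper above stops after the first result set (found = true). To answer the original question, how many result sets a procedure returns, the loop can instead keep going until the JDBC end-of-results condition: getMoreResults() returns false and getUpdateCount() returns -1. A sketch in Java, callable from Groovy as-is; MultiResults, ResultSetHandler and forEachResultSet are hypothetical names.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public final class MultiResults {
    private MultiResults() {}

    /** Hypothetical callback invoked once per result set, in order (0-based index). */
    @FunctionalInterface
    public interface ResultSetHandler {
        void handle(int index, ResultSet rs) throws SQLException;
    }

    /**
     * Walks every result of an already-executed statement and returns how many
     * result sets it produced. firstIsResultSet is the boolean returned by execute().
     * Update counts interleaved between result sets are skipped, not counted.
     */
    public static int forEachResultSet(Statement stmt, boolean firstIsResultSet, ResultSetHandler handler)
            throws SQLException {
        int count = 0;
        boolean isResultSet = firstIsResultSet;
        while (true) {
            if (isResultSet) {
                try (ResultSet rs = stmt.getResultSet()) {
                    handler.handle(count, rs);
                }
                count++;
            } else if (stmt.getUpdateCount() == -1) {
                break; // no more results of any kind
            }
            isResultSet = stmt.getMoreResults(); // advances to the next result
        }
        return count;
    }
}
```

With a prepared CallableStatement cs, MultiResults.forEachResultSet(cs, cs.execute(), handler) returns the number of result sets, and the handler can read each one with rs.next() as the Groovy helper does.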
All Java classes are usable from Groovy. If Groovy does not give you a way to do it, you can do it the Java way, using JDBC callable statements.
I just stumbled across what could be a solution to your problem. If an example is what you were after, have a look at the reply to this thread.
