I have a database manager class that manages access do the database. It contains the connections pool and 2 DAOs. Each for a different table. Looks something like this:
public class ActivitiesDatabase {
private final ConnectionSource connectionSource;
private final Dao<JsonActivity, String> jsonActivityDao;
private final Dao<AtomActivity, String> atomActivityDao;
private ActivitiesDatabase() {
try {
connectionSource = new JdbcPooledConnectionSource(Consts.JDBC);
TableUtils.createTableIfNotExists(connectionSource, JsonActivity.class);
jsonActivityDao = DaoManager.createDao(connectionSource, JsonActivity.class);
TableUtils.createTableIfNotExists(connectionSource, AtomActivity.class);
atomActivityDao = DaoManager.createDao(connectionSource, AtomActivity.class);
} catch (SQLException e) {
throw new RuntimeException(e);
}
}
public long insertAtom(String id, String content) throws SQLException {
long additionTime = System.currentTimeMillis();
atomActivityDao.createIfNotExists(new Activity(id, content, additionTime));
return additionTime;
}
public long insertJson(String id, String content) throws SQLException {
long additionTime = System.currentTimeMillis();
jsonActivityDao.createIfNotExists(new Activity(id, content, additionTime));
return additionTime;
}
public AtomResult getAtomEntriesBetween(long from, long to) throws SQLException {
long updated = System.currentTimeMillis();
PreparedQuery<Activity> query = atomActivityDao.queryBuilder().limit(500L).orderBy(Activity.UPDATED_FIELD, true).where().between(Activity.UPDATED_FIELD, from, to).prepare();
return new Result(atomActivityDao.query(query), updated);
}
public JsonResult getJsonEntriesBetween(long from, long to) throws SQLException {
long updated = System.currentTimeMillis();
PreparedQuery<Activity> query = jsonActivityDao.queryBuilder().limit(500L).orderBy(Activity.UPDATED_FIELD, true).where().between(Activity.UPDATED_FIELD, from, to).prepare();
return new Result(jsonActivityDao.query(query), updated);
}
}
In addition, I have two thread running using the same database manager. Each thread writes to a different table. There are also threads who read from the database. A reading thread can read from any table.
I noticed in the ConnectionsSource documentation that it is not thread safe.
my question is. Should I synchronize the function that write to the database.
Would the answer to my question be different if both write thread were to write to the the same table?
I noticed in the ConnectionsSource documentation that it is not thread safe.
Right but you are using the JdbcPooledConnectionSource which is thread-safe.
Would I synchronize the function that write to the database.
You shouldn't have a problem with ORMLite doing this. However, you need to make sure that your database supports multiple concurrent database updates. For example, you won't have a problem if you are using MySQL, Postgres, or Oracle. You'll need to read up on H2 multithreading to see what options you will need to use to get that to work.
Would the answer to my question be different if both write thread were to write to the the same table?
That would increase the concurrency so (uh) maybe? Again it depends on the database type.
You may use Connection pool for multithreading working with ORMLite here is javaDoc
Related
I am evaluating spark with marklogic database. I have read a csv file, now i have a JavaRDD object which i have to dump into marklogic database.
SparkConf conf = new SparkConf().setAppName("org.sparkexample.Dataload").setMaster("local");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> data = sc.textFile("/root/ml/workArea/data.csv");
SQLContext sqlContext = new SQLContext(sc);
JavaRDD<Record> rdd_records = data.map(
new Function<String, Record>() {
public Record call(String line) throws Exception {
String[] fields = line.split(",");
Record sd = new Record(fields[0], fields[1], fields[2], fields[3],fields[4]);
return sd;
}
});
This JavaRDD object i want to write to marklogic database.
Is there any spark api available for faster writing to the marklogic database ?
Lets say, If we could not write JavaRDD directly to marklogic then what is the currect approach to achieve this ?
Here is the code which i am using to write the JavaRDD data to marklogic database, let me know if it is wrong way to do that.
final DatabaseClient client = DatabaseClientFactory.newClient("localhost",8070, "MLTest");
final XMLDocumentManager docMgr = client.newXMLDocumentManager();
rdd_records.foreachPartition(new VoidFunction<Iterator<Record>>() {
public void call(Iterator<Record> partitionOfRecords) {
while (partitionOfRecords.hasNext()) {
Record record = partitionOfRecords.next();
System.out.println("partitionOfRecords - "+record.toString());
String docId = "/example/"+record.getID()+".xml";
JAXBContext context = JAXBContext.newInstance(Record.class);
JAXBHandle<Record> handle = new JAXBHandle<Record>(context);
handle.set(record);
docMgr.writeAs(docId, handle);
}
}
});
client.release();
I have used java client api to write the data, but i am getting below exception even though POJO class Record is implementing Serializable interface. Please let me know what could be the reason & how to solve that.
org.apache.spark.sparkexception task not Serializable .
The easiest way to get data into MarkLogic is via HTTP and the client REST API - specifically the /v1/documents endpoints - http://docs.marklogic.com/REST/client/management .
There are a variety of ways to optimize this, such as via a write set, but based on your question, I think the first thing to decide is - what kind of document do you want to write for each Record? Your example shows 5 columns in the CSV - typically, you'll write either a JSON or XML document with 5 fields/elements, each named based on the column index. So you'd need to write a little code to generate that JSON/XML, and then use whatever HTTP client you prefer (and one option is the MarkLogic Java Client API) to write that document to MarkLogic.
That addresses your question of how to write a JavaRDD to MarkLogic - but if your goal is to get data from a CSV into MarkLogic as fast as possible, then skip Spark and use mlcp - https://docs.marklogic.com/guide/mlcp/import#id_70366 - which involves zero coding.
Modified example from spark streaming guide, Here you will have to implement connection and writing logic specific to database.
public void send(JavaRDD<String> rdd) {
rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
#Override
public void call(Iterator<String> partitionOfRecords) {
// ConnectionPool is a static, lazily initialized pool of
Connection connection = ConnectionPool.getConnection();
while (partitionOfRecords.hasNext()) {
connection.send(partitionOfRecords.next());
}
ConnectionPool.returnConnection(connection); // return to the pool
// for future reuse
}
});
}
I'm wondering if you just need to make sure everything you access inside your VoidFunction that was instantiated outside it is serializable (see this page). DatabaseClient and XMLDocumentManager are of course not serializable, as they're connected resources. You're right, however, to not instantiate DatabaseClient inside your VoidFunction as that would be less efficient (though it would work). I don't know if the following idea would work with spark. But I'm guessing you could create a class that keeps hold of a singleton DatabaseClient instance:
public static class MLClient {
private static DatabaseClient singleton;
private MLClient() {}
public static DatabaseClient get(DatabaseClientFactory.Bean connectionInfo) {
if ( connectionInfo == null ) {
throw new IllegalArgumentException("connectionInfo cannot be null");
}
if ( singleton == null ) {
singleton = connectionInfo.newClient();
}
return singleton;
}
}
then you just create a serializable DatabaseClientFactory.Bean outside your VoidFunction so your auth info is still centralized
DatabaseClientFactory.Bean connectionInfo =
new DatabaseClientFactory.Bean();
connectionInfo.setHost("localhost");
connectionInfo.setPort(8000);
connectionInfo.setUser("admin");
connectionInfo.setPassword("admin");
connectionInfo.setAuthenticationValue("digest");
Then inside your VoidFunction you could get that singleton DatabaseClient and new XMLDocumentManager like so:
DatabaseClient client = MLClient.get(connectionInfo);
XMLDocumentManager docMgr = client.newXMLDocumentManager();
Having an huge customers profile page if two or more users start using same page and start editing big change will happen in my database so planing to implement Threads concept where only one user can use that customer page
i'm aware about threads concept but confused how to implement it
hope i need to use Singleton class as well
Any suggestion or Logic's will be helpful
I'm using Struts,Hibernate frame work
You may use application context to store a flag variable. Action will use its value to allow only one simultaneous execution.
public class TestAction extends ActionSupport implements ApplicationAware {
private static final String APP_BUSY_KEY = "APP_BUSY";
Map<String, Object> map;
#Override
public void setApplication(Map<String, Object> map) {
this.map = map;
}
#Override
public String execute() throws Exception {
if (map.containsKey(APP_BUSY_KEY)) {
return ERROR;
} else {
map.put(APP_BUSY_KEY, "1");
try {
// action logic here
} finally {
map.remove(APP_BUSY_KEY);
}
return SUCCESS;
}
}
}
If you plan to implement similar logic for two requests (lock after displaying values and release lock after submitting new values) then logic will be more complex and you will also need to handle lock release after timeout.
Sorry for big chunk of code, I couldn't explain that with less.Basically I'm trying to write into a file from many tasks.
Can you guys please tell me what I'm doing wrong? _streamWriter.WriteLine() throws the ArgumentOutOfRangeException.
class Program
{
private static LogBuilder _log = new LogBuilder();
static void Main(string[] args)
{
var acts = new List<Func<string>>();
var rnd = new Random();
for (int i = 0; i < 10000; i++)
{
acts.Add(() =>
{
var delay = rnd.Next(300);
Thread.Sleep(delay);
return "act that that lasted "+delay;
});
}
Parallel.ForEach(acts, act =>
{
_log.Log.AppendLine(act.Invoke());
_log.Write();
});
}
}
public class LogBuilder : IDisposable
{
public StringBuilder Log = new StringBuilder();
private FileStream _fileStream;
private StreamWriter _streamWriter;
public LogBuilder()
{
_fileStream = new FileStream("log.txt", FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite);
_streamWriter = new StreamWriter(_fileStream) { AutoFlush = true };
}
public void Write()
{
lock (Log)
{
if (Log.Length <= 0) return;
_streamWriter.WriteLine(Log.ToString()); //throws here. Although Log.Length is greater than zero
Log.Clear();
}
}
public void Dispose()
{
_streamWriter.Close(); _streamWriter.Dispose(); _fileStream.Close(); fileStream.Dispose();
}
}
This is not a bug in StringBuilder, it's a bug in your code. And the modification you shown in your followup answer (where you replace Log.String with a loop that extracts characters one at a time) doesn't fix it. It won't throw an exception any more, but it won't work properly either.
The problem is that you're using the StringBuilder in two places in your multithreaded code, and one of them does not attempt to lock it, meaning that reading can occur on one thread simultaneously with writing occurring on another. In particular, the problem is this line:
_log.Log.AppendLine(act.Invoke());
You're doing that inside your Parallel.ForEach. You are not making any attempt at synchronization here, even though this will run on multiple threads at once. So you've got two problems:
Multiple calls to AppendLine may be in progress simultaneously on multiple threads
One thread may attempt to be calling Log.ToString at the same time as one or more other threads are calling AppendLine
You'll only get one read at a time because you are using the lock keyword to synchronize those. The problem is that you're not also acquiring the same lock when calling AppendLine.
Your 'fix' isn't really a fix. You've succeeded only in making the problem harder to see. It will now merely go wrong in different and more subtle ways. For example, I'm assuming that your Write method still goes on to call Log.Clear after your for loop completes its final iteration. Well in between completing that final iteration, and making the call to Log.Clear, it's possible that some other thread will have got in another call to AppendLine because there's no synchronization on those calls to AppendLine.
The upshot is that you will sometimes miss stuff. Code will write things into the string builder that then get cleared out without ever being written to the stream writer.
Also, there's a pretty good chance of concurrent AppendLine calls causing problems. If you're lucky they will crash from time to time. (That's good because it makes it clear you have a problem to fix.) If you're unlucky, you'll just get data corruption from time to time - two threads may end up writing into the same place in the StringBuilder resulting either in a mess, or completely lost data.
Again, this is not a bug in StringBuilder. It is not designed to support being used simultaneously from multiple threads. It's your job to make sure that only one thread at a time does anything to any particular instance of StringBuilder. As the documentation for that class says, "Any instance members are not guaranteed to be thread safe."
Obviously you don't want to hold the lock while you call act.Invoke() because that's presumably the very work you want to parallelize. So I'd guess something like this might work better:
string result = act();
lock(_log.Log)
{
_log.Log.AppendLine(result);
}
However, if I left it there, I wouldn't really be helping you, because this looks very wrong to me.
If you ever find yourself locking a field in someone else's object, it's a sign of a design problem in your code. It would probably make more sense to modify the design, so that the LogBuilder.Write method accepts a string. To be honest, I'm not even sure why you're using a StringBuilder here at all, as you seem to use it just as a holding area for a string that you immediately write to a stream writer. What were you hoping the StringBuilder would add here? The following would be simpler and doesn't seem to lose anything (other than the original concurrency bugs):
public class LogBuilder : IDisposable
{
private readonly object _lock = new object();
private FileStream _fileStream;
private StreamWriter _streamWriter;
public LogBuilder()
{
_fileStream = new FileStream("log.txt", FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite);
_streamWriter = new StreamWriter(_fileStream) { AutoFlush = true };
}
public void Write(string logLine)
{
lock (_lock)
{
_streamWriter.WriteLine(logLine);
}
}
public void Dispose()
{
_streamWriter.Dispose(); fileStream.Dispose();
}
}
I think the cause is because you are accessing the stringBuilder in the Parellel bracket
_log.Log.AppendLine(act.Invoke());
_log.Write();
and inside the LogBuilder you perform lock() to disallow memory allocation on stringBuidler. You are changing the streamwriter to handle the log in every character so would give the parellel process to unlock the memory allocation to stringBuilder.
Segregate the parallel process into distinct action would likely reduce the problem
Parallel.ForEach(acts, act =>
{
_log.Write(act.Invoke());
});
in the LogBuilder class
private readonly object _lock = new object();
public void Write(string logLines)
{
lock (_lock)
{
//_wr.WriteLine(logLines);
Console.WriteLine(logLines);
}
}
An alternate approach is to use TextWriter.Synchronized to wrap StreamWriter.
void Main(string[] args)
{
var rnd = new Random();
var writer = new StreamWriter(#"C:\temp\foo.txt");
var syncedWriter = TextWriter.Synchronized(writer);
var tasks = new List<Func<string>>();
for (int i = 0; i < 1000; i++)
{
int local_i = i; // get a local value, not closure-reference to i
tasks.Add(() =>
{
var delay = rnd.Next(5);
Thread.Sleep(delay);
return local_i.ToString() + " act that that lasted " + delay.ToString();
});
}
Parallel.ForEach(tasks, task =>
{
var value = task();
syncedWriter.WriteLine(value);
});
writer.Dispose();
}
Here are some of the synchronization helper classes
http://referencesource.microsoft.com/#q=Synchronized
System.Collections
static ArrayList Synchronized(ArrayList list)
static IList Synchronized(IList list)
static Hashtable Synchronized(Hashtable table)
static Queue Synchronized(Queue queue)
static SortedList Synchronized(SortedList list)
static Stack Synchronized(Stack stack)
System.Collections.Generic
static IList Synchronized(List list)
System.IO
static Stream Synchronized(Stream stream)
static TextReader Synchronized(TextReader reader)
static TextWriter Synchronized(TextWriter writer)
System.Text.RegularExpressions
static Match Synchronized(Match inner)
static Group Synchronized(Group inner)
It is seems that it isn't problem of Parallelism. It's StringBuilder's problem.
I have replaced:
_streamWriter.WriteLine(Log.ToString());
with:
for (int i = 0; i < Log.Length; i++)
{
_streamWriter.Write(Log[i]);
}
And it worked.
For future reference: http://msdn.microsoft.com/en-us/library/system.text.stringbuilder(v=VS.100).aspx
Memory allocation section.
In a multi form .NetCF 3.5 application I'm trying create the forms in the background while the user is occupied with the previous form.
We're using Orientation Aware Control in the project
We use a wrapper class (FormController) (please let me know if I'm using the wrong terminology) to keep static references to the different forms in our application. Since we only want to create them once.
At the moment the Forms are created the first time they are used. But since this is a time consuming operation we'd like to do this in the background while the user
Application.Run(new FormController.StartUI());
class FormController{
private static object lockObj = new object();
private static bool secIsLoaded = false;
private static StartForm startForm = new StartForm();
private static SecForm m_SecForm;
static SecForm FormWorkOrderList
{
get
{
CreateSecForm();
return m_SecForm;
}
}
private static void StartUI(){
startForm.Show();
ThreadStart tsSecForm = CreateSecForm;
Thread trSecForm = new Thread(tsSecForm);
trSecForm.Priority = ThreadPriority.BelowNormal;
trSecForm.IsBackground = true;
trSecForm.Start();
return startForm;
}
private static void CreateSecForm()
{
Monitor.Enter(lockObj);
if(!secIsLoaded){
m_SecForm = new SecForm();
secIsLoaded = true;
}
Monitor.Exit(lockObj);
}
private static void GotoSecForm()
{
SecForm.Show();
StartForm.Hide();
}
When I call GotoSecForm() the program throws an excepton on SecForm.Show() with an exection with hResult: 2146233067 and no other valuable information.
The stacktrace of the exception is:
on Microsoft.AGL.Common.MISC.HandleAr(PAL_ERROR ar)
on System.Windows.Forms.Control.SuspendLayout()
on b..ctor(OrientationAwareControl control)
on Clarius.UI.OrientationAwareControl.ApplyResources(CultureInfo cultureInfo, Boolean skipThis)
on Clarius.UI.OrientationAwareControl.ApplyResources()
on Clarius.UI.OrientationAwareControl.OnLoad(EventArgs e)
on Clarius.UI.OrientationAwareControl.c(Object , EventArgs )
on System.Windows.Forms.Form.OnLoad(EventArgs e)
on System.Windows.Forms.Form._SetVisibleNotify(Boolean fVis)
on System.Windows.Forms.Control.set_Visible(Boolean value)
on System.Windows.Forms.Control.Show()
I'm quite qlueless about what's going wrong here. Can anyone help me out?
Or are there some better ways to load the forms in the background?
Let me know if any more information is needed.
You can't create forms (or safely do any manipulation of controls or forms) in background threads. They need to be created on the same thread that the message pump is running on - its just the way that Windows Forms work.
Creating the form itself shouldn't be in itself an expensive task. My advice would be to perform any expensive computations needed to display the form in a background thread, and then pass the result of those computations back to the main message pump in order to create and display the form itself.
(Half way through writing this I realised that this question is about windows mobile, however I'm 99% sure that the above still applies in this situation)
Have a table 'temp' ..
Code:
CREATE TABLE `temp` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`student_id` bigint(20) unsigned NOT NULL,
`current` tinyint(1) NOT NULL DEFAULT '1',
`closed_at` datetime NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_index` (`student_id`,`current`,`closed_at`),
KEY `studentIndex` (`student_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The corresponding Java pojo is http://pastebin.com/JHZwubWd . This table has a unique constraint such that only one record for each student can be active.
2) I have a test code which does try to continually add records for a student ( each time making the older active one as inactive and adding a new active record) and also in a different thread accessing some random ( non-related ) table.
Code:
public static void main(String[] args) throws Exception {
final SessionFactory sessionFactory = new AnnotationConfiguration().configure().buildSessionFactory();
ExecutorService executorService = Executors.newFixedThreadPool(1);
int runs = 0;
while(true) {
Temp testPojo = new Temp();
testPojo.setStudentId(1L);
testPojo.setCurrent(true);
testPojo.setClosedAt(new Date(0));
add(testPojo, sessionFactory);
Thread.sleep(1500);
executorService.submit(new Callable<Object>() {
#Override
public Object call() throws Exception {
Session session = sessionFactory.openSession();
// Some dummy code to print number of users in the system.
// Idea is to "touch" the DB/session in this background
// thread.
System.out.println("No of users: " + session.createCriteria(User.class).list().size());
session.close();
return null;
}
});
if(runs++ > 100) {
break;
}
}
executorService.shutdown();
executorService.awaitTermination(1, TimeUnit.MINUTES);
}
private static void add(final Temp testPojo, final SessionFactory sessionFactory) throws Exception {
Session dbSession = null;
Transaction transaction = null;
try {
dbSession = sessionFactory.openSession();
transaction = dbSession.beginTransaction();
// Set all previous state of the student as not current.
List<Temp> oldActivePojos = (List<Temp>) dbSession.createCriteria(Temp.class)
.add(Restrictions.eq("studentId", testPojo.getStudentId())).add(Restrictions.eq("current", true))
.list();
for(final Temp oldActivePojo : oldActivePojos) {
oldActivePojo.setCurrent(false);
oldActivePojo.setClosedAt(new Date());
dbSession.update(oldActivePojo);
LOG.debug(String.format(" Updated old state as inactive:%s", oldActivePojo));
}
if(!oldActivePojos.isEmpty()) {
dbSession.flush();
}
LOG.debug(String.format(" saving state:%s", testPojo));
dbSession.save(testPojo);
LOG.debug(String.format(" new state saved:%s", testPojo));
transaction.commit();
}catch(Exception exception) {
LOG.fatal(String.format("Exception in adding state: %s", testPojo), exception);
transaction.rollback();
}finally {
dbSession.close();
}
}
Upon running the code, after a few runs, I am getting an index constraint exception. It happens because for some strange reason, it does not find the latest active record but instead some older stale active record and tries marking it as inactive before saving ( though the DB actually has a new active record already present).
Notice that both the code share the same sessionfactory and the both code works on a totally different tables. My guess is that some internal cache state gets dirty. If I use 2 different sessionfactory for the foreground and background thread, it works fine.
Another weird thing is that in the background thread ( where I print the no of users), if I wrap it in a transaction ( even though it is only a read operation), the code works fine! Sp looks like I need to wrap all DB operations ( irrespective of read / write ) in a transaction for it to work in a multithreaded environment.
Can someone point out the issue?
Yes, basically, transaction demarcation is always needed:
Hibernate documentation says:
Database, or system, transaction boundaries are always necessary. No communication with the database can occur outside of a database transaction (this seems to confuse many developers who are used to the auto-commit mode). Always use clear transaction boundaries, even for read-only operations. Depending on your isolation level and database capabilities this might not be required, but there is no downside if you always demarcate transactions explicitly.
When trying to reproduce your setup I experienced some problems caused by the lack of transaction demarcation (though not the same as yours). Further investigation showed that sometimes, depending on connection pool configuration, add() is executed in the same database transaction as the previous call(). Adding beginTransaction()/commit() to call() fixed that problem. This behaviour can be responsible for your problem, since, depending on transaction isolation level, add() can work with the stale snapshot of the database taken at the begining of transaction, i.e. during the previous call().