We are moving our multi-database web application from LotusScript (LS) to a Java beans architecture, but we are struggling to decide how best to handle database connections and which scope to use for them.
If we use sessionScope, then connections to 5-6 databases per call will be created for each user. If we use an applicationScope bean for the database connection, then it will remain open until the server is restarted, causing memory leaks. I understand that certain values, such as system configuration values that rarely change, can be cached at the applicationScope level, but I am concerned about the rest of the connections.
My question really is: what is the best way to handle Domino database connections (Domino objects are not serializable) without hurting performance or running into memory leaks or automatic GC issues?
This is a tough one because it deals with architecting a specific solution rather than giving generic "this works better than that" advice. We have had great success architecting a consumer XPages application so that data is retrieved from additional databases. Sort of a front end with database backends, but with Domino.
We use no applicationScope anything because there is nothing global to the application, but even if there were, there is enough chatter out there to indicate that applicationScope is perhaps not as dependable as it sounds, so you have to monitor your objects closely.
You already figured out the Domino object issue, so that has to be handled no matter which approach you choose.
Depending on your application, you may be staring down some major rearchitecting, but my recommendation is to try it with sessionScope first and see how it performs. Do some benchmarking. If it works fast enough, go with that, but as you develop your beans, pay VERY close attention to performance optimization. The multiple database calls could be an issue, but you really won't know until you play with it a little bit.
One thing that will help: if you build your bean classes using a more detailed architecture than you think you need at first (don't try to pile everything into a single class or bean), not only will it be easier to adapt your architecture if needed, but you will also start to see design patterns that you may not have even known were possibilities.
As Russell mentions, there is no one way to do this, and each approach has its pros and cons.
There is a wrapped document class, DominoDocument, that you can use to store Document information:
public static DominoDocument wrap(java.lang.String database,
                                  lotus.domino.Database db,
                                  java.lang.String parentId,
                                  java.lang.String form,
                                  java.lang.String computeWithForm,
                                  java.lang.String concurrencyMode,
                                  boolean allowDeletedDocs,
                                  java.lang.String saveLinksAs)
Javadoc is here:
http://public.dhe.ibm.com/software/dw/lotus/Domino-Designer/JavaDocs/XPagesExtAPI/8.5.2/com/ibm/xsp/model/domino/wrapped/DominoDocument.html
However, this just does some of the handling of recycle() in the background, so you are still going to have the same overhead generated by creating and recycle()-ing the underlying database objects.
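For illustration, here is a minimal sketch of wrapping a back-end document. The database path, form name, and field name are hypothetical, and the null/default arguments are only one possible combination; see the Javadoc above for each parameter's exact semantics.

import lotus.domino.Database;
import lotus.domino.Session;
import com.ibm.xsp.model.domino.wrapped.DominoDocument;

public class WrapExample {
    public static Object readCustomer(Session session) throws Exception {
        // database path is hypothetical
        Database db = session.getDatabase(session.getServerName(), "apps/orders.nsf");
        DominoDocument wrapped = DominoDocument.wrap(
                db.getFilePath(), // database
                db,               // open back-end Database
                null,             // parentId
                "Order",          // form (hypothetical form name)
                null,             // computeWithForm
                null,             // concurrencyMode
                false,            // allowDeletedDocs
                null);            // saveLinksAs
        return wrapped.getValue("CustomerName"); // field name is hypothetical
    }
}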
The main overhead you will find is creating the connection to the database in your Java code. Once that connection is made, everything else is relatively fast.
I would recommend, when testing this for performance, that you use the XPages Toolkit. Videos on how to use it are part of the XPages Masterclass on OpenNTF.
http://www.openntf.org/internal/home.nsf/project.xsp?action=openDocument&name=XPages%20Masterclass
I'm using Hibernate in an embedded Jetty server, and I want to be able to parallelize my data processing with some multithreading and still have it all be in the same transaction. As Sessions are not thread-safe, I need a way to get multiple sessions attached to the same transaction, which means I need to switch away from the "thread" session context I've been using.
By my understanding of the documentation, this means I need to switch to JTA session context, but I'm having trouble getting that to work. My research so far seems to indicate that it requires something external to Hibernate in the server to provide transaction management, and that Jetty does not have such a thing built in, so I would have to pull in some additional library to do it. The top candidates I keep running across for that generally seem to be large packages that do all sorts of other stuff too, which seems wasteful, confusing, and distracting when I'm just looking for the one specific feature.
So, what is the minimal least disruptive setup and configuration change that will allow getCurrentSession() to return Sessions attached to the same transaction in different threads?
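For reference, here is a minimal sketch of the configuration switch in question. The property names assume Hibernate 4.x, and the JtaPlatform class assumes a standalone JTA provider such as Bitronix; both are assumptions, since the post doesn't name versions or a provider.

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class JtaSessionFactoryBuilder {
    public static SessionFactory build() {
        Configuration cfg = new Configuration().configure();
        // switch getCurrentSession() from thread-bound to JTA-transaction-bound sessions
        cfg.setProperty("hibernate.current_session_context_class", "jta");
        // Hibernate must be told how to find the external transaction manager;
        // with Bitronix, for example, the matching Hibernate 4.x platform class is:
        cfg.setProperty("hibernate.transaction.jta.platform",
                "org.hibernate.service.jta.platform.internal.BitronixJtaPlatform");
        return cfg.buildSessionFactory();
    }
}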
While I'm at it: I know that fetching objects in one thread and altering them in another is not safe, but what about reading their properties in another thread, for example calling toString() or a side-effect-free getter?
I have a couple of questions regarding EJB transactions. I have a situation where a process has become longer running than originally intended and is sometimes failing because server timeouts are exceeded. While I have increased the timeouts for now (both total transaction and max transaction), I know that for a long-running process it makes more sense to segment the work into smaller units that don't fail based on timeout. As a result, I'm looking for some thoughts or references regarding the next course of action, based on the background below and the questions that follow.
Environment:
EJB 3.1, JPA 2.0, WebSphere 8.5
Background:
I built a set of POJOs to do some batch-oriented work for an enterprise application. They are non-EJB POJOs intended to implement several business processes (5 related, sequential processes, each depending on its predecessor). The POJOs are in a plain Java project, not an EJB project.
However, these POJOs access an EJB facade for database access via JPA. The abstract core of the 5 business processes does the JNDI lookup for the EJB facade in order to return the domain objects for processing. Originally, the design was to run entirely from the server; however, a need arose to initiate these processes externally. As a result, I created an EJB wrapper so that the processes could be called remotely (individually or as a single process based on a common strategy interface). Unfortunately, the size of the data, both row width and row count, has grown well beyond the original intent.
The processing time required to complete these batch processes has increased significantly (from around a couple of hours to around half a day, and it could grow beyond that). Only one of the 5 processes made sense to multi-thread (I did implement it multi-threaded). Since I have the wrapper EJB to initiate one or all of them, I decided to create a new container transaction for each process, as opposed to the single default "required" transaction when I run them all as a single process. Since the one process is multi-threaded, it would make sense to attempt to create a new transaction per thread; however, being a group of POJOs, they have no transaction capability.
Question:
So my question is: what makes more sense, and why? Re-engineer the POJOs to be EJBs themselves and have the wrapper EJB instantiate each process as a child process, where each can have its own transaction and, more importantly, the multi-threaded process can create a transaction per thread? Or does it make more sense to attempt to create a UserTransaction in the POJOs from a JNDI lookup in the container and try to manage it as if it were a bean-managed transaction (if that's even a viable solution)? I know this may be application dependent, but what is reasonable with regard to timeouts for a Java EE container? Obviously, I don't want runaway processes, but I want to make sure that I can complete these batch processes.
Unfortunately, this application has already been deployed as a production system. Re-engineering, though it may be little more than assembling the strategy logic in EJBs, is a large change to the functionality.
I did look around for some other threads here and via general internet searches, but thought I would see if anyone had compelling arguments for one over the other or another solution entirely. Additional links that talk about a topic such as this are appreciated. I wrestled with whether to post this since some may construe this as subjective, however, I felt the narrowed topic was worth the post and potentially relevant to others attempting processes like this.
This is not a direct answer to your question, but something you could consider.
WebSphere 8.5, especially for this kind of application (batch), provides a batch container. The batch function accommodates applications that must perform batch work alongside transactional applications. Batch work might take hours or even days to finish and can use large amounts of memory or processing power while it runs. You can reuse your Java classes in batch applications, batch steps can be run in parallel across a cluster, and the container provides transaction checkpoint management.
Take a look at the following resources:
IBM Education Assistant - Batch applications
Getting started with the batch environment
Since I really didn't get much response to this question over the past couple of weeks, I figured I would answer it myself in the hope of helping others make a decision if they run across this or a similar situation.
Ultimately, I re-engineered one of the POJOs into an EJB that acts as a wrapper to call the other POJOs. The wrapper EJB performs the same activity as when it was just a POJO, except that I added transaction semantics (REQUIRES_NEW) on the primary method. The primary method calls the other POJOs based on a strategy pattern, so each call (or POJO) gets its own transaction. Other methods in the EJB that call the primary method were defined with NOT_SUPPORTED so that I could separate the transactions for each call to the primary method and not join an existing transaction. A sketch of this arrangement is below.
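A minimal sketch of that wrapper; the class, method, and interface names here are hypothetical, not the actual code:

import java.util.List;
import javax.annotation.Resource;
import javax.ejb.SessionContext;
import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import static javax.ejb.TransactionAttributeType.NOT_SUPPORTED;
import static javax.ejb.TransactionAttributeType.REQUIRES_NEW;

// Hypothetical strategy interface implemented by the existing POJO processes.
interface BatchProcess {
    void execute();
}

@Stateless
public class BatchProcessWrapper {

    @Resource
    private SessionContext ctx;

    // Primary method: each invocation through the container gets its own transaction.
    @TransactionAttribute(REQUIRES_NEW)
    public void runProcess(BatchProcess process) {
        process.execute();
    }

    // Runs outside any transaction so the delegated calls stay separate
    // instead of joining one long-running transaction.
    @TransactionAttribute(NOT_SUPPORTED)
    public void runAll(List<BatchProcess> processes) {
        // call through the container proxy so REQUIRES_NEW is honored;
        // a direct this.runProcess(...) call would bypass the interceptors
        BatchProcessWrapper self = ctx.getBusinessObject(BatchProcessWrapper.class);
        for (BatchProcess p : processes) {
            self.runProcess(p);
        }
    }
}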
Full disclosure: the original addition of transaction semantics significantly increased the processing time (to the order of days), but the process no longer failed by exceeding transaction timeouts. The slowdown was the result of some unexpected problems with JPA many-to-one relationships that were bringing back too much data through the relationship. As I mentioned originally, some of my data row widths increased unexpectedly. That data increase was in the related table object, but the query did not need that data at the time. I corrected those issues by changing my queries (creating objects for SELECT NEW queries, changing relationships to FetchType.LAZY, etc.).
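For illustration, this is the kind of change described; the entity, column, and class names are all hypothetical:

import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.TypedQuery;

@Entity
class ParentEntity {
    @Id Long id;
    // ... many wide columns that the batch queries don't actually need
}

// lightweight projection used by the SELECT NEW query below
class RowSummary {
    final Long id;
    final String status;
    RowSummary(Long id, String status) { this.id = id; this.status = status; }
}

@Entity
public class RowEntity {
    @Id Long id;
    String status;

    // the fix: make the wide many-to-one lazy so the related row is only
    // loaded when it is actually navigated
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "PARENT_ID")
    ParentEntity parent;

    // a constructor-expression ("SELECT NEW") query that fetches only the
    // columns this batch step needs; the fully qualified name in the query
    // must match RowSummary's real package
    public static TypedQuery<RowSummary> summaries(EntityManager em) {
        return em.createQuery(
                "SELECT NEW com.example.RowSummary(r.id, r.status) FROM RowEntity r",
                RowSummary.class);
    }
}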
Going forward, if I am able to dedicate enough time, I will transform the rest of those POJOs into EJBs. The POJO doing the most significant amount of threaded work has been implemented as a Callable that is run via an ExecutorService. If I can transform that one, the plan will be to give each thread its own transaction. However, while I'm not sure yet, it appears that my container may already be creating transactions for each thread group (of 10 threads), based on status updates I'm seeing. I will have to investigate further.
In my XPages web app (XPages is a Lotus Notes technology based on JSF), I need a dynamic map to store session IDs and last accessed times (in milliseconds). This is implemented as a TreeMap inside an application-scoped bean. Each initial access to the app registers the current session in the TreeMap in this bean. Only a limited number of session entries are permitted in the map, and excess sessions are not registered. The map is also cleared once in a while of old session entries so that new sessions can be registered. I need to know if this is an acceptable approach/use of an application bean. I know I could store the session entries temporarily in an external (non-Lotus Notes) database, but the company I'm working for doesn't allow me to do so. Will this approach lead me to potential problems? If yes, is there another way for me to do this?
This sounds like a perfectly valid use of an application bean, but I'd offer two suggestions.

The first is to use a ConcurrentSkipListMap instead of a TreeMap. The former is thread-safe, while the latter is not. When interacting with the lower scopes, thread safety is typically not crucial, as each user can only write to their own session, view, and request scopes. But all users can write to the application scope, so it's conceivable that concurrent writes could occur, especially in applications with heavy user load.

The second suggestion is to urge caution about how much information about each session is stored in an application bean. Since the bean will be accessible to all users, it is theoretically possible to inadvertently expose too much information about a user to other users. If you're only storing the session name or ID in addition to the last access time, you'll be fine. But if you're actually storing a pointer to each user's session scope, you may accidentally provide a window into data a user has cached that other users shouldn't have access to. I've never actually seen someone get bitten by this, but it's always important to keep it in mind when storing any user-specific information in the application scope.
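A minimal sketch of that first suggestion, with the map holding session ID against last access time in milliseconds as described in the question (the class and method names are hypothetical):

import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListMap;

public class SessionTracker {
    // thread-safe sorted map: session ID -> last access time in milliseconds
    private final ConcurrentSkipListMap<String, Long> sessions =
            new ConcurrentSkipListMap<String, Long>();

    public void touch(String sessionId) {
        sessions.put(sessionId, System.currentTimeMillis());
    }

    // drop entries older than the given age, freeing room for new sessions
    public void evictOlderThan(long maxAgeMillis) {
        long cutoff = System.currentTimeMillis() - maxAgeMillis;
        for (Iterator<Long> it = sessions.values().iterator(); it.hasNext(); ) {
            if (it.next() < cutoff) {
                it.remove();
            }
        }
    }
}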
Indeed, this is a good use of the application scope. Still, a TreeMap isn't the best collection for your situation; there are some problems with it:
Concurrency problems when 2 requests want to modify the data in your container.
If your application must scale horizontally, each node will hold its own TreeMap in its own managed bean, so the data won't be shared between nodes.
A good approach would be to use a cache system. There are good cache libraries that meet these requirements. I've tested Ehcache, and it provides both concurrent data handling and replication support in case you have 2 or more nodes to deploy your application on; you can also configure an eviction algorithm such as LRU (least recently used) or FIFO (first in, first out) to clear the cache.
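A minimal Ehcache 2.x sketch of the idea, assuming a cache named "sessions" is defined in ehcache.xml with the desired eviction policy and time-to-live (the cache name and value layout are illustrative):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class SessionCache {
    private final Cache sessions;

    public SessionCache() {
        // picks up ehcache.xml from the classpath; the "sessions" cache and its
        // eviction policy (e.g. LRU) and TTL are configured there
        CacheManager manager = CacheManager.create();
        this.sessions = manager.getCache("sessions");
    }

    public void touch(String sessionId) {
        sessions.put(new Element(sessionId, System.currentTimeMillis()));
    }

    public Long lastAccess(String sessionId) {
        Element e = sessions.get(sessionId);
        return e == null ? null : (Long) e.getObjectValue();
    }
}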
Using an external database to handle the session IDs could add some time to get/set the data (it could be very little, but it is still a disk I/O operation). To address that, you can use BigMemory as an external store that lives in RAM, or a NoSQL database such as BigTable.
Note: I do not work for Ehcache, nor am I associated with it commercially; I've tested it and it fulfills my needs. There are other cache system libraries, like JBoss Cache, that you can evaluate and use.
This is more of a non-tech question.
We intend to use OrganizationServiceContext with Linq as opposed to calling OrganizationServiceProxy.
My question is: what should the lifetime of the context be? Should it be instantiated once per method, or can you keep it around for the life of the web application using a singleton approach?
What would the pros/cons be? Any advice?
Thanks in advance
You should never keep a datacontext around for the life of a web application. The application lifecycle is managed outside of your code.
There is also a world of pain around saving changes when other users are saving at the same time. Data contexts should be managed only for the life of the request, and running SaveChanges should never save bits and pieces from other users' requests while they are processing.
If you want to reduce reads, then use caching.
If you want to manage concurrency use transactions with a unit of work.
Just to expand a little on Gats' answer, which is entirely correct: we create new context objects for pretty much each separate method we have. Even for Silverlight, where we know we're running for one user at a time, managing what is in the context at any given time is just too painful to justify avoiding the creation of a new context object.
Without getting into all of the gory details, I am trying to design a service-based solution that will be consumed by several client applications. The solution allows admins to create and modify document templates which are used by regular users to perform data entry. It is my intent to make the application a learning tool for best practices, techniques, etc.
And, at the same time, I have to accommodate a schizophrenic environment, because the 'powers that be' can never stick to their decisions regarding technologies and tools. For example, I am using Linq-to-SQL today because they aren't ready to go to EF4, but there is also discussion about switching over to NHibernate. So, I have to make the code as persistence-ignorant as possible to minimize the work required should we change OR/M tools.
At this point, I am also limited to using the partial-class approach to extend the Linq-to-SQL classes so they implement interfaces defined in my business layer. I cannot go with POCOs because management insists that we leverage all built-in tooling, so I must support the Linq-to-SQL designer.
That said, my service interface has a StartSession method that accepts a template identifier in its signature. The operation flows like this:
If a session already exists in the database for the current user and specified template, update the record to show the current activity. If not, create a new session object.
The session is associated with an instance of the template, call it the "form". So if the session is new, I need to retrieve the template information to create the new "form", associate it with the session then save the session to the database. On the other hand, if the session already existed, then I need to also load the "form" with the data entered by the user and stored in the session previously.
Finally, the session (with form definition and data) is returned to the caller.
My first objective is to create clean separation between the logical layers of my application. The second is to maintain persistence ignorance (as mentioned above). Third, I have to be able to test everything so all dependencies must be externalized for easy mocking. I am using Unity as an IoC tool to help in this area.
To accomplish this, I have defined my service class and data contracts as needed to support the service interface. The service class will have a dependency injected from the business layer that actually performs the work. And here's where it has gotten messy for me.
I've been trying to go the Unit of Work and Repository route to help with persistence ignorance. I have an ITemplateRepository and an ISessionRepository, which I can access from my IUnitOfWork implementation. The service class gets an instance of my SessionManager class (in my BLL) injected. The SessionManager receives the IUnitOfWork implementation through constructor injection and delegates all persistence to the UoW, but I find myself playing a shell game with the various logic.
Should all of the logic described above be in the SessionManager class, or perhaps in the UoW implementation? I want as little logic as possible in the repository implementations, because changing the data access platform could result in unwanted changes to the application logic. Since my repository is working against an interface, how do I best go about creating the new session (keeping in mind that a valid session has a reference to the template, er, form being used)? Would it be better to still use POCOs, even though I have to support the designer, and use a tool like AutoMapper inside the repository implementation to handle translating the objects?
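To make the moving pieces concrete, here is one way the wiring being described could look, sketched in Java only because that's the notation used elsewhere in this thread; the shapes translate directly to C# interfaces. All names are hypothetical, the find-or-create logic sits in SessionManager, and the repositories stay thin:

class Template { /* template definition fields elided */ }

class FormSession {
    private final String userId;
    private final Template template;

    FormSession(String userId, Template template) {
        this.userId = userId;
        this.template = template; // the new "form" is built from the template
    }

    void touch() { /* update last-activity; previously entered data loads here */ }
}

interface TemplateRepository {
    Template findById(String templateId);
}

interface SessionRepository {
    FormSession findByUserAndTemplate(String userId, String templateId);
    void add(FormSession session);
}

interface UnitOfWork {
    TemplateRepository templates();
    SessionRepository sessions();
    void commit();
}

class SessionManager {
    private final UnitOfWork uow;

    SessionManager(UnitOfWork uow) { // constructor injection, as described
        this.uow = uow;
    }

    FormSession startSession(String userId, String templateId) {
        FormSession session = uow.sessions().findByUserAndTemplate(userId, templateId);
        if (session == null) {
            // new session: fetch the template and build the "form" from it
            Template template = uow.templates().findById(templateId);
            session = new FormSession(userId, template);
            uow.sessions().add(session);
        } else {
            // existing session: record current activity and reload its data
            session.touch();
        }
        uow.commit();
        return session;
    }
}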
Ugh!
I know I am just stuck in analysis paralysis, so a little nudge is probably all I need. What would be ideal is if someone could provide an example of how you would solve the problem given the business rules and architectural constraints I've defined.
If you don't use POCOs, then you're not really going to be data-store agnostic. And using POCOs will allow you to get your system up and running with memory-based repositories, which is what you'll likely want to use for your unit tests anyhow.
AutoMapper sounds nice, but I wouldn't consider it a deal breaker. Mapping POCOs to EF4, Linq-to-SQL, or NHibernate isn't that time-consuming unless you have hundreds of tables. When/if your POCOs begin to diverge from your persistence layer, you might find that AutoMapper won't really fit the bill.