How to keep a connection to an external database in Cloud Dataflow - Cassandra

I have an unbounded Dataflow pipeline that reads from Pub/Sub, applies a ParDo, and writes to Cassandra. Since it applies only ParDo transformations, I am using the default global window with the default triggering even though the source is unbounded.
In a pipeline like that, how should I keep the connection to Cassandra?
Currently I am keeping it in startBundle like this:
private class CassandraWriter<T> extends DoFn<T, Void> {
    private transient Cluster cluster;
    private transient Session session;
    private transient MappingManager mappingManager;

    @Override
    public void startBundle(Context c) {
        this.cluster = Cluster.builder()
                .addContactPoints(hosts)
                .withPort(port)
                .withoutMetrics()
                .withoutJMXReporting()
                .build();
        this.session = cluster.connect(keyspace);
        this.mappingManager = new MappingManager(session);
    }

    @Override
    public void processElement(ProcessContext c) throws IOException {
        T element = c.element();
        Mapper<T> mapper = (Mapper<T>) mappingManager.mapper(element.getClass());
        mapper.save(element);
    }

    @Override
    public void finishBundle(Context c) throws IOException {
        session.close();
        cluster.close();
    }
}
However, this way a new connection is created for every bundle, and in streaming, bundles can be as small as a single element.
Another option is to pass it as a side input like in https://github.com/benjumanji/cassandra-dataflow:
public PDone apply(PCollection<T> input) {
    Pipeline p = input.getPipeline();
    CassandraWriteOperation<T> op = new CassandraWriteOperation<T>(this);
    Coder<CassandraWriteOperation<T>> coder =
            (Coder<CassandraWriteOperation<T>>) SerializableCoder.of(op.getClass());
    PCollection<CassandraWriteOperation<T>> opSingleton =
            p.apply(Create.<CassandraWriteOperation<T>>of(op)).setCoder(coder);
    final PCollectionView<CassandraWriteOperation<T>> opSingletonView =
            opSingleton.apply(View.<CassandraWriteOperation<T>>asSingleton());
    PCollection<Void> results = input.apply(ParDo.of(new DoFn<T, Void>() {
        @Override
        public void processElement(ProcessContext c) throws Exception {
            // use the side input here
        }
    }).withSideInputs(opSingletonView));
    PCollectionView<Iterable<Void>> voidView = results.apply(View.<Void>asIterable());
    opSingleton.apply(ParDo.of(new DoFn<CassandraWriteOperation<T>, Void>() {
        private static final long serialVersionUID = 0;
        @Override
        public void processElement(ProcessContext c) {
            CassandraWriteOperation<T> op = c.element();
            op.finalize();
        }
    }).withSideInputs(voidView));
    return new PDone();
}
However, this way I have to use windowing, since View.<Void>asIterable() applies a grouping operation.
In general, how should a PTransform that writes from an unbounded PCollection to an external database keep its connection to the database?

You are correctly observing that a typical bundle size in the streaming/unbounded case is smaller than in the batch/bounded case. The actual bundle size depends on many parameters, and sometimes bundles may contain a single element.
One way of solving this problem is to use a pool of connections per worker, stored in static state of your DoFn. You should be able to initialize it during the first call to startBundle and use it across bundles. Alternatively, you can create a connection on demand and release it to the pool for reuse when it is no longer needed.
You should make sure the static state is thread-safe, and that you aren't making any assumptions about how Dataflow manages bundles.
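The on-demand variant can be sketched without any Dataflow or Cassandra dependencies; this is a minimal, hypothetical per-worker pool (the Supplier stands in for building a real Cassandra Session):

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Hypothetical per-worker connection pool: connections are created lazily
// on demand and handed back to the queue so later bundles can reuse them.
public class ConnectionPool<C> {
    private final ConcurrentLinkedQueue<C> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<C> factory;

    public ConnectionPool(Supplier<C> factory) {
        this.factory = factory;
    }

    public C acquire() {
        C conn = idle.poll();            // reuse an idle connection if one exists
        return conn != null ? conn : factory.get();
    }

    public void release(C conn) {
        idle.offer(conn);                // return the connection for reuse
    }
}
```

A DoFn could acquire a connection in startBundle and release it in finishBundle, so connections survive across bundles without being tied to any single one.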

As Davor Bonaci suggested, using a static variable solved the problem.
public class CassandraWriter<T> extends DoFn<T, Void> {
    private static final Logger log = LoggerFactory.getLogger(CassandraWriter.class);

    // Prevent multiple threads from creating multiple cluster connections in parallel.
    private static transient final Object lock = new Object();
    private static transient Cluster cluster;
    private static transient Session session;
    private static transient MappingManager mappingManager;

    private final String[] hosts;
    private final int port;
    private final String keyspace;

    public CassandraWriter(String[] hosts, int port, String keyspace) {
        this.hosts = hosts;
        this.port = port;
        this.keyspace = keyspace;
    }

    @Override
    public void startBundle(Context c) {
        synchronized (lock) {
            if (cluster == null) {
                cluster = Cluster.builder()
                        .addContactPoints(hosts)
                        .withPort(port)
                        .withoutMetrics()
                        .withoutJMXReporting()
                        .build();
                session = cluster.connect(keyspace);
                mappingManager = new MappingManager(session);
            }
        }
    }

    @Override
    public void processElement(ProcessContext c) throws IOException {
        T element = c.element();
        Mapper<T> mapper = (Mapper<T>) mappingManager.mapper(element.getClass());
        mapper.save(element);
    }
}

Related

Is this code thread safe with spring PostConstruct

I have done some tests on these two classes. Could someone please help determine whether they are thread-safe? In particular, if I use a plain HashMap instead of ConcurrentHashMap, could that cause any concurrency issues? How can I make the code more thread-safe, and what is the best approach to testing it under concurrency?
I tested it with HashMap only and it works fine; however, my test scale is only around 20 req/s for 2 minutes.
Can anyone suggest whether I should increase the request rate and try again, or point out anything that must be fixed?
@Component
public class TestLonggersImpl implements TestSLongger {
    @Autowired
    YamlReader yamlReader;
    @Autowired
    TestSCatalog gSCatalog;
    @Autowired
    ApplicationContext applicationContext;

    private static HashMap<String, TestLonggerImpl> gImplHashMap = new HashMap<>();
    private static final Longger LONGER = LonggerFactory.getLongger(AbstractSLongger.class);

    @PostConstruct
    public void init() {
        final String[] sts = yamlReader.getTestStreamNames();
        for (String st : sts) {
            System.out.println(st);
            LONGER.info(st);
        }
        HashMap<String, BSCatalog> statsCatalogHashMap = gSCatalog.getCatalogHashMap();
        for (Map.Entry<String, BSCatalog> entry : statsCatalogHashMap.entrySet()) {
            BSCatalog bCatalog = statsCatalogHashMap.get(entry.getKey());
            // Issue on creating the basicCategory
            SProperties sProperties = yamlReader.getTestMap().get(entry.getKey());
            Category category = new BasicCategory(sProperties.getSDefinitions(),
                    bCatalog.getVersion(),
                    bCatalog.getDescription(), new HashSet<>());
            final int version = statsCatalogHashMap.get(entry.getKey()).getVersion();
            getTestImplHashMap().put(entry.getKey(),
                    applicationContext.getBean(TestLonggerImpl.class, category,
                            entry.getKey(),
                            version));
        }
    }

    @Override
    public void logMessage(String st, String message) {
        if (getTestImplHashMap() != null && getTestImplHashMap().get(st) != null) {
            getTestImplHashMap().get(st).log(message);
        }
    }

    @VisibleForTesting
    static HashMap<String, TestLonggerImpl> getTestImplHashMap() {
        return gImplHashMap;
    }
}
*** 2nd class
@Component
public class GStatsCatalog {
    @Autowired
    YamlReader yamlReader;

    private static HashMap<String, BStatsCatalog> stCatalogHashMap = new HashMap<>();

    @PostConstruct
    public void init() {
        String[] streams = yamlReader.getGSNames();
        for (String stream : streams) {
            BStatsCatalog bCatalog = new BStatsCatalog();
            SProperties streamProperties = yamlReader.getGMap().get(stream);
            bCatalog.setSName(stream);
            int version = VERSION;
            try {
                version = Integer.parseInt(streamProperties.getVersion());
            } catch (Exception e) {
                System.out.println(e.getMessage());
            }
            bCatalog.setVersion(version);
            bCatalog.setDescription(streamProperties.getDescription());
            stCatalogHashMap.put(stream, bCatalog);
        }
    }

    public static HashMap<String, BStatsCatalog> getCatalogHashMap() {
        return stCatalogHashMap;
    }

    public void setYamlReader(YamlReader yamlReader) {
        this.yamlReader = yamlReader;
    }
}
I think the methods under @PostConstruct are thread-safe, since they run only once, right after the bean is created, in the whole lifecycle of the bean.
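If the maps could ever be read while @PostConstruct is still populating them, or modified after startup, a ConcurrentHashMap is a safe drop-in replacement. A minimal sketch, using a simplified hypothetical registry in place of gImplHashMap:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical simplified registry mirroring gImplHashMap: swapping the
// static HashMap for a ConcurrentHashMap makes the read path (as used by
// logMessage) safe even if another thread is still populating the map.
public class StreamRegistry {
    private static final Map<String, String> IMPLS = new ConcurrentHashMap<>();

    static void register(String stream, String impl) {
        IMPLS.put(stream, impl);       // safe concurrent write
    }

    static String lookup(String stream) {
        return IMPLS.get(stream);      // safe concurrent read, no external locking
    }
}
```

ConcurrentHashMap guarantees that a get never sees a corrupted table during a concurrent put, which a plain HashMap does not; a low-rate test passing does not prove the HashMap version is safe.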

in BeforeClass method how to insert init data to db

I have a UserServiceTest, and I want to load some initial user data into the database for tests (e.g. queryByName, groupByAge, and so on). I want to load it only once per class, so I want to use @BeforeClass:
@BeforeClass
public static void init() {
    // ...
}
but in this case I find I cannot use jdbcTemplate to insert data (the method is static, so autowired instance fields are not available) as I can in @Before, e.g.
@Before
public void setUp() {
    // prepare test data first
    jdbcTemplate.execute("insert into user(firstname,lastname,birthday) values(...);");
}
So how can I insert init data in a @BeforeClass method?
Alternatively, you can use TestExecutionListeners to initialise your database before the test class runs:
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = TestConfig.class)
@TestExecutionListeners(mergeMode = TestExecutionListeners.MergeMode.MERGE_WITH_DEFAULTS,
        listeners = { DbInitializerTestListener.class })
public class DbTest {
}

public class DbInitializerTestListener extends AbstractTestExecutionListener {
    @Autowired
    private JdbcTemplate jdbcTemplate;

    @Override
    public void beforeTestClass(TestContext testContext) throws Exception {
        testContext.getApplicationContext()
                .getAutowireCapableBeanFactory()
                .autowireBean(this);
        jdbcTemplate.execute("insert into user(firstname,lastname,birthday) values(...);");
    }
}
Or you may consider the DbUnit framework.

How to expire Hazelcast session

I'm using the spring-session libraries to persist sessions in Hazelcast, like this:
@WebListener
public class HazelcastInitializer implements ServletContextListener {
    private HazelcastInstance instance;

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        String sessionMapName = "spring:session:sessions";
        ServletContext sc = sce.getServletContext();
        ClientConfig clientConfig = new ClientConfig();
        clientConfig.getGroupConfig().setName("nameValue").setPassword("passValue");
        clientConfig.getNetworkConfig().addAddress("ipValue");
        clientConfig.getNetworkConfig().setSmartRouting(true);
        Collection<SerializerConfig> scfg = new ArrayList<SerializerConfig>();
        SerializerConfig serializer = new SerializerConfig()
                .setTypeClass(Object.class)
                .setImplementation(new ObjectStreamSerializer());
        scfg.add(serializer);
        clientConfig.getSerializationConfig().setSerializerConfigs(scfg);
        instance = HazelcastClient.newHazelcastClient(clientConfig);
        Map<String, ExpiringSession> sessions = instance.getMap(sessionMapName);
        SessionRepository<ExpiringSession> sessionRepository
                = new MapSessionRepository(sessions);
        SessionRepositoryFilter<ExpiringSession> filter
                = new SessionRepositoryFilter<ExpiringSession>(sessionRepository);
        Dynamic fr = sc.addFilter("springSessionFilter", filter);
        fr.addMappingForUrlPatterns(EnumSet.of(DispatcherType.REQUEST), true, "/*");
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        if (instance != null) {
            instance.shutdown();
        }
    }
}
How can I expire the session on Hazelcast? (On the Hazelcast Management Center, the number of session entries is always incrementing.)
You can add a TTL to the map config, so inactive sessions are evicted after a timeout. You can see an example here:
https://github.com/spring-projects/spring-session/blob/1.0.0.RELEASE/samples/hazelcast/src/main/java/sample/Initializer.java#L59
Also, I guess this sample application is what you want.
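On the Hazelcast member side, this can also be expressed declaratively in hazelcast.xml; a sketch, assuming the map name matches the one spring-session uses and picking an arbitrary 30-minute timeout:

```xml
<map name="spring:session:sessions">
    <!-- evict a session entry after 30 minutes without reads or writes -->
    <max-idle-seconds>1800</max-idle-seconds>
</map>
```

Note the eviction must be configured on the members holding the map, not on the client that merely connects to it.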

JavaFX access ui-elements from Controller(Singleton)

I have a JavaFX design in the file javafx.fxml whose root element has the following attribute:
fx:controller="de.roth.jsona.javafx.ViewManagerFX"
This controller class has a singleton mechanism and is bound to some UI elements.
public class ViewManagerFX {
    private static ViewManagerFX instance = new ViewManagerFX();

    @FXML
    private Slider volumeSlider;
    @FXML
    private Label volumeLabel;
    public IntegerProperty volumeValue = new SimpleIntegerProperty();
    @FXML
    private TabPane musicTabs;
    public List<StringProperty> tabNames = new ArrayList<StringProperty>();

    public static ViewManagerFX getInstance() {
        return (instance);
    }

    public void initialize() {
        // Volume
        volumeSlider.valueProperty().bindBidirectional(volumeValue);
        volumeLabel.textProperty().bindBidirectional(volumeValue, new Format() {
            @Override
            public StringBuffer format(Object obj, StringBuffer toAppendTo,
                    FieldPosition pos) {
                toAppendTo.append(obj);
                toAppendTo.append("%");
                return toAppendTo;
            }

            @Override
            public Object parseObject(String source, ParsePosition pos) {
                return null; // no need to be implemented
            }
        });
        volumeValue.set(Config.getInstance().VOLUME);
    }

    public void addMusicFolderTab(final String t, final ArrayList<MusicListItem> items) {
        Platform.runLater(new Runnable() {
            @Override
            public void run() {
                Tab m = new Tab("Test Tab");
                musicTabs.getTabs().add(0, m);
            }
        });
    }
}
The method addMusicFolderTab is called from a thread that scans files and directories.
In the initialize method I can access the UI elements, but in addMusicFolderTab, which is called from the file-scanner thread, the variable musicTabs is null. Here is the exception:
java.lang.NullPointerException
    at de.roth.jsona.javafx.ViewManagerFX$3.run(ViewManagerFX.java:110)
I have no clue why I can't access the TabPane from outside the initialize method.
Aside from the many questionable patterns used here, the problem is that your ViewManagerFX singleton (besides not really being a singleton) never has its instance field set to the controller that is actually in use.
When using FXML, the controller is created and loaded dynamically by reflection from the FXMLLoader.
What happens is that by calling ViewManagerFX.getInstance(), you access a different instance than the one created by the FXMLLoader. The instance you access is the one created here:
private static ViewManagerFX instance = new ViewManagerFX();
The quickest way to solve the issue is to set the instance in initialize(), since it's called by the FXMLLoader on the instance it created:
public void initialize() {
    instance = this;
    // Volume
    ...
}

How to intercept methods of EntityManager with Seam 3?

I'm trying to intercept the persist and merge methods of javax.persistence.EntityManager in a Seam 3 project.
In a previous (Seam 2) version of the micro-framework I'm building, I did this using an implementation of org.hibernate.Interceptor declared in persistence.xml.
But I want something more "CDI-like" now that we are in a Java EE 6 environment.
I want an event @BeforeTrackablePersist to be fired just before an EntityManager.persist call, and likewise an event @BeforeTrackableUpdate to be fired before an EntityManager.merge call. Trackable is an interface that some of my entities can implement in order to be intercepted before persist or merge.
I'm using the Seam 3 (3.1.0.Beta3) extended persistence manager:
public class EntityManagerHandler {
    @SuppressWarnings("unused")
    @ExtensionManaged
    @Produces
    @PersistenceUnit
    private EntityManagerFactory entityManagerFactory;
}
So I've made a javax.enterprise.inject.spi.Extension and tried many ways to do this:
public class TrackableExtension implements Extension {
    @Inject @BeforeTrackablePersisted
    private Event<Trackable> beforeTrackablePersistedEvent;
    @Inject @BeforeTrackableMerged
    private Event<Trackable> beforeTrackableMergedEvent;

    @SuppressWarnings("unchecked")
    public void processEntityManagerTarget(@Observes final ProcessInjectionTarget<EntityManager> event) {
        final InjectionTarget<EntityManager> injectionTarget = event.getInjectionTarget();
        final InjectionTarget<EntityManager> injectionTargetProxy =
                (InjectionTarget<EntityManager>) Proxy.newProxyInstance(
                        event.getClass().getClassLoader(),
                        new Class[] {InjectionTarget.class},
                        new InvocationHandler() {
                    @Override
                    public Object invoke(final Object proxy, final Method method, final Object[] args) throws Throwable {
                        if ("produce".equals(method.getName())) {
                            final CreationalContext<EntityManager> ctx = (CreationalContext<EntityManager>) args[0];
                            final EntityManager entityManager = decorateEntityManager(injectionTarget, ctx);
                            return entityManager;
                        } else {
                            return method.invoke(injectionTarget, args);
                        }
                    }
                });
        event.setInjectionTarget(injectionTargetProxy);
    }

    public void processEntityManagerType(@Observes final ProcessAnnotatedType<EntityManager> event) {
        final AnnotatedType<EntityManager> type = event.getAnnotatedType();
        final AnnotatedTypeBuilder<EntityManager> builder =
                new AnnotatedTypeBuilder<EntityManager>().readFromType(type);
        for (final AnnotatedMethod<? super EntityManager> method : type.getMethods()) {
            final String name = method.getJavaMember().getName();
            if (StringUtils.equals(name, "persist") || StringUtils.equals(name, "merge")) {
                builder.addToMethod(method, TrackableInterceptorBindingLiteral.INSTANCE);
            }
        }
        event.setAnnotatedType(builder.create());
    }

    public void processEntityManagerBean(@Observes final ProcessBean<EntityManager> event) {
        final AnnotatedType<EntityManager> annotatedType = (AnnotatedType<EntityManager>) event.getAnnotated();
        // not even called
    }

    public void processEntityManager(@Observes final ProcessProducer<?, EntityManager> processProducer) {
        processProducer.setProducer(decorate(processProducer.getProducer()));
    }

    private Producer<EntityManager> decorate(final Producer<EntityManager> producer) {
        return new Producer<EntityManager>() {
            @Override
            public EntityManager produce(final CreationalContext<EntityManager> ctx) {
                return decorateEntityManager(producer, ctx);
            }

            @Override
            public Set<InjectionPoint> getInjectionPoints() {
                return producer.getInjectionPoints();
            }

            @Override
            public void dispose(final EntityManager instance) {
                producer.dispose(instance);
            }
        };
    }

    private EntityManager decorateEntityManager(final Producer<EntityManager> producer, final CreationalContext<EntityManager> ctx) {
        final EntityManager entityManager = producer.produce(ctx);
        return (EntityManager) Proxy.newProxyInstance(
                entityManager.getClass().getClassLoader(),
                new Class[] {EntityManager.class},
                new InvocationHandler() {
                    @Override
                    public Object invoke(final Object proxy, final Method method, final Object[] args) throws Throwable {
                        final String methodName = method.getName();
                        if (StringUtils.equals(methodName, "persist")) {
                            fireEventIfTrackable(beforeTrackablePersistedEvent, args[0]);
                        } else if (StringUtils.equals(methodName, "merge")) {
                            fireEventIfTrackable(beforeTrackableMergedEvent, args[0]);
                        }
                        return method.invoke(entityManager, args);
                    }

                    private void fireEventIfTrackable(final Event<Trackable> event, final Object entity) {
                        if (entity instanceof Trackable) {
                            event.fire(Reflections.<Trackable>cast(entity));
                        }
                    }
                });
    }
}
Of all those observer methods, only the second one (processEntityManagerType(@Observes ProcessAnnotatedType<EntityManager>)) is called! And even with the binding added to the persist and merge methods, my interceptor is never called (I have, of course, enabled it with the correct lines in beans.xml, and enabled my extension with the services/javax.enterprise.inject.spi.Extension file).
Something I thought would be simple with CDI turns out to be really hard... or perhaps Seam 3 does something that prevents this code from executing correctly.
Does anyone know how to handle this?
I think you're making this a little harder than it needs to be. First, though: JPA and CDI integration isn't very good in Java EE 6; we're very much hoping that changes in Java EE 7 and JPA 2.1.
What you'll want to do is create your own producer for the EntityManager that delegates to an actual EntityManager instance, but also fires your own events when the methods you're interested in are called. Take a look at the Seam Persistence source to see one way this can be done.
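The "delegate and fire" pattern can be sketched with a plain java.lang.reflect.Proxy, independent of the CDI wiring. This is a simplified illustration only: a hypothetical Store interface stands in for EntityManager, and a Consumer stands in for the CDI Event<Trackable>; the @Produces/@Inject plumbing is framework-specific and omitted:

```java
import java.lang.reflect.Proxy;
import java.util.function.Consumer;

// A stand-in for EntityManager with just the method we want to intercept.
interface Store {
    void persist(Object entity);
}

public class FiringProxy {
    // Wrap a delegate so that beforePersist fires before each persist call,
    // then the call is forwarded to the real delegate.
    public static Store decorate(final Store delegate, final Consumer<Object> beforePersist) {
        return (Store) Proxy.newProxyInstance(
                Store.class.getClassLoader(),
                new Class<?>[] {Store.class},
                (proxy, method, args) -> {
                    if ("persist".equals(method.getName())) {
                        beforePersist.accept(args[0]); // fire event first
                    }
                    return method.invoke(delegate, args); // then delegate
                });
    }
}
```

In the real producer, decorate would wrap the EntityManager produced from the EntityManagerFactory, and beforePersist would be a CDI Event fired only for entities implementing Trackable.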
As my little patch for Seam Persistence was finally applied in SEAMPERSIST-75, it will in theory be possible to do this by extending org.jboss.seam.persistence.HibernatePersistenceProvider and overriding the method proxyEntityManager(EntityManager).