Is it possible to inject dependencies into Hazelcast Jet pipeline stages?

For example, given a simple pipeline such as:
Pipeline p = Pipeline.create();
p.readFrom(TestSources.items("the", "quick", "brown", "fox"))
 .map(mapFn)
 .writeTo(Sinks.logger());
I'd like mapFn to be something requiring a non-serialisable dependency to do its work.
I know I can do this:
Pipeline p = Pipeline.create();
p.readFrom(TestSources.items("the", "quick", "brown", "fox"))
 .mapUsingService(JetSpringServiceFactories.bean("myDependencies"),
         MyDependencies::addDependencies)
 .map(mapFn)
 .writeTo(Sinks.logger());
This wraps the strings being read from the source in another object that includes the dependencies, giving mapFn access to those dependencies without them needing to be injected into that object itself. That will work, but I want to use my mapping function outside Jet as well as part of a pipeline, and in that case it's a bit weird to have to pass dependencies along with the items being mapped rather than just initialising the mapper with the dependencies it needs. It also forces me to pointlessly create a wrapper object per item in my stream/batch.
The docs say another alternative is to use the @SpringAware annotation on a Processor, but I think that means using the Core API, which the docs say "mostly offers you a lot of ways to make mistakes", so I'd prefer to avoid that.
In vanilla Hazelcast IMDG, anything that is deserialised can use a ManagedContext to initialise itself, and this is obviously the case with Jet too, but the functions, filters etc. of the pipeline are wrapped in lots of layers of Jet pipeline stuff and so there seems to be no way to get to them.
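For reference, the IMDG mechanism looks roughly like this (a minimal sketch; InjectingContext, DependencyAware and the setter are illustrative names, not real API):
import com.hazelcast.config.Config;
import com.hazelcast.core.ManagedContext;

public class InjectingContext implements ManagedContext {
    private final MyDependencies dependencies = new MyDependencies();

    // Hazelcast calls this for every object it deserialises, giving a hook
    // to hand the (non-serialisable) dependencies to objects that want them.
    @Override
    public Object initialize(Object obj) {
        if (obj instanceof DependencyAware) { // hypothetical marker interface
            ((DependencyAware) obj).setDependencies(dependencies);
        }
        return obj;
    }
}

// Registered on the member configuration:
// Config config = new Config();
// config.setManagedContext(new InjectingContext());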
Am I missing something or have I listed all the options I have (other than resorting to some "global" static dependencies object)?

The way you described is actually quite close to how it should be done. You can simplify it by just doing:
Pipeline p = Pipeline.create();
p.readFrom(TestSources.items("the", "quick", "brown", "fox"))
 .mapUsingService(bean("myDependencies"), (dep, item) -> mapFn.apply(dep, item))
 .writeTo(Sinks.logger());
This avoids creating an intermediate item. As you said already, this requires that the mapping function takes the dependency as a parameter as well.
If you want to avoid that, another option is to write a custom ServiceFactory that performs the mapping and takes the dependency. That way you can express your mapping function as a service and have the dependency injected at the constructor level.
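For illustration, a minimal sketch of that approach, assuming Jet 4.x's ServiceFactories API (MyMapper is a hypothetical class wrapping your mapping logic):
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.ServiceFactories;
import com.hazelcast.jet.pipeline.ServiceFactory;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.test.TestSources;

// Hypothetical service: it owns the non-serialisable dependency, and is
// created on the member by the factory, so it never has to be serialised.
public class MyMapper {
    private final MyDependencies dependencies;

    public MyMapper(MyDependencies dependencies) {
        this.dependencies = dependencies;
    }

    public String map(String item) {
        return dependencies.transform(item); // transform() is illustrative
    }
}

// When building the pipeline:
ServiceFactory<?, MyMapper> mapperFactory =
        ServiceFactories.sharedService(ctx -> new MyMapper(new MyDependencies()));

Pipeline p = Pipeline.create();
p.readFrom(TestSources.items("the", "quick", "brown", "fox"))
 .mapUsingService(mapperFactory, (mapper, item) -> mapper.map(item))
 .writeTo(Sinks.logger());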
Regarding having a static container, I think that would be possible to implement but would require some changes in Jet's core. A similar thing was done for the Metrics class, which also works in a static context. There's also a related issue:
https://github.com/hazelcast/hazelcast-jet/issues/954
If you are interested in contributing I can give you some pointers.

If your mapFn is inside a bean then you can just use it as your service:
Pipeline p = Pipeline.create();
p.readFrom(TestSources.items("the", "quick", "brown", "fox"))
 .mapUsingService(JetSpringServiceFactories.bean(MyService.class), MyService::mapFn)
 .writeTo(Sinks.logger());
Full, self-contained example, where the pipeline calls mapFn on MyService, which uses its dependency for the mapping:
@SpringBootApplication
public class TutorialApplication implements InitializingBean {

    @Autowired
    JetInstance jetInstance;

    public static void main(String[] args) {
        SpringApplication.run(TutorialApplication.class, args);
    }

    @Override
    public void afterPropertiesSet() {
        Pipeline p = Pipeline.create();
        p.readFrom(TestSources.items("the", "quick", "brown", "fox"))
         .mapUsingService(JetSpringServiceFactories.bean(MyService.class), MyService::mapFn)
         .writeTo(Sinks.logger());
        jetInstance.newJob(p);
    }

    @Component
    public static class MyService {

        @Autowired
        MyDependency foo;

        public String mapFn(String s) {
            return foo.bar(s);
        }
    }

    @Component
    public static class MyDependency {

        public String bar(String s) {
            return "mod: " + s;
        }
    }
}

Related

What are the differences between `cucumber-glue` scope and step member variables?

AFAICT, there's not much difference between using cucumber-glue scope and instantiating member variables in step classes other than where the instantiation code resides.
For example, using cucumber-glue scope:
@Configuration
public class MyConfiguration {

    @Bean
    @Scope("cucumber-glue")
    public MyContext myContext() {
        return new MyContext();
    }
}
@SpringBootTest
public class MySteps {

    @Autowired
    private MyContext myContext;
}
versus member variables:
@SpringBootTest
public class MySteps {

    private final MyContext myContext = new MyContext();
}
Are there other differences I'm missing?
When you have more than one step definition file, you'll want to share some information between them. You can do this by writing to the same component.
However, all regular components are singleton scoped and survive between scenarios. That makes them unsuitable for sharing test state: leftovers might interfere with the next scenario. Fortunately, cucumber-glue scoped components (including step definitions) are recycled between scenarios, which ensures your glue code always starts fresh.
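As an illustration, a minimal sketch (class, step and field names are made up) of two step definition files sharing one glue-scoped component:
import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;

// Recreated by Cucumber for every scenario, so no state leaks across scenarios.
@Component
@Scope("cucumber-glue")
public class ScenarioContext {
    public String result;
}

// In one step definition file:
public class WhenSteps {
    @Autowired
    private ScenarioContext context;

    @When("the action runs")
    public void theActionRuns() {
        context.result = "done";
    }
}

// In another step definition file, seeing the same per-scenario instance:
public class ThenSteps {
    @Autowired
    private ScenarioContext context;

    @Then("the result is recorded")
    public void theResultIsRecorded() {
        assert "done".equals(context.result);
    }
}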

How to make a local extension method available in a function with receiver?

I ran into an interesting problem that I couldn't solve: is there any way to make a local extension function available inside a function literal with receiver?
val list = ArrayList<Any>()

fun <T> Array<T>.bind(context: MutableList<in T>, block: Array<T>.() -> Unit) {
    fun Array<T>.save() {
        context.addAll(this)
    }
    block()
}
arrayOf(1, 2, 3).bind(list) {
    save() // todo: how to bind the extension in the execution scope
}
I know there is an alternative, introducing another type for the receiver, but I want to avoid it. For example:
interface Savable {
    fun save()
}

fun <T> Array<T>.bind(context: MutableList<in T>, block: Savable.() -> Unit) {
    val proxy = object : Savable {
        override fun save() {
            context += this@bind
        }
    }
    proxy.block()
}
There is no such feature yet, and I don't think it will be added in the near future either. You should just use your second version. Don't worry about adding a wrapper class: as long as you are using the JVM backend, avoiding the wrapper class gains you nothing, because a local function is itself compiled to a local class.
This is the equivalent Java code of your Kotlin function, after the fix you suggested, assuming your bind function lives in the file bind.kt:
public final class BindKt {
    public static <T> void bind(T[] receiver, List<? super T> context, Function1<T> block) {
        // The local class's name is unimportant; the compiler generates it as
        // something like "package.name.BindKt$bind$X", where X is a number.
        class Local {
            public void save(T[] receiver) {
                context.addAll(receiver);
            }
        }
        block.invoke(this); // this won't compile -- and neither will yours
    }
}
As you can see, save is NOT compiled to a static method, which means that if your block somehow ever called save, an instance of Local would first have to be created. So no matter what you do, as long as you use a local function there is basically no point in avoiding a wrapper class. Your second solution is good; just use it. It's both elegant and efficient enough.
If you really don't want to add a class/object creation, move these extension functions to package scope and let clients import them.

GXT Grid ValueProvider / PropertyAccess for a Map<K,V> Datastore?

Rather than using Bean model objects, my data model is built on Key-Value pairs in a HashMap container.
Does anyone have an example of GXT's Grid ValueProvider and PropertyAccess that will work with an underlying Map?
It doesn't have one built in, but it is easy to build your own. Check out this blog post for a similar way of thinking, especially the ValueProvider section: http://www.sencha.com/blog/building-gxt-charts
The purpose of a ValueProvider is to be a simple reflection-like mechanism to read and write values in some object. The purpose of PropertyAccess<T> then is to autogenerate some of these value/modelkey/label provider instances based on getters and setters as found on Java beans, a very common use case. There isn't much more complexity to it than that; it is just a way to ask the compiler to write some easy boilerplate code for you.
As that blog post shows, you can very easily build a ValueProvider just by implementing the interface. Here's a quick example of how you could make one that reads a Map<String, Object>. When you create each instance, you tell it which key you are working off of, and the type of data it should find when it reads out that value:
public class MapValueProvider<T> implements
        ValueProvider<Map<String, Object>, T> {

    private final String key;

    public MapValueProvider(String key) {
        this.key = key;
    }

    @Override
    public T getValue(Map<String, Object> object) {
        return (T) object.get(key);
    }

    @Override
    public void setValue(Map<String, Object> object, T value) {
        object.put(key, value);
    }

    @Override
    public String getPath() {
        return key;
    }
}
You then build one of these for each key you want to read out, and can pass it along to ColumnConfig instances or whatever else might be expecting them.
The main point though is that ValueProvider is just an interface, and can be implemented any way you like.
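For instance, a usage sketch assuming GXT 3's grid API (the keys, widths and headers here are made up):
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import com.sencha.gxt.widget.core.client.grid.ColumnConfig;
import com.sencha.gxt.widget.core.client.grid.ColumnModel;

// Each column reads a different key out of the same Map-based model.
ColumnConfig<Map<String, Object>, String> nameCol =
        new ColumnConfig<>(new MapValueProvider<String>("name"), 150, "Name");
ColumnConfig<Map<String, Object>, Integer> ageCol =
        new ColumnConfig<>(new MapValueProvider<Integer>("age"), 80, "Age");

List<ColumnConfig<Map<String, Object>, ?>> columns = new ArrayList<>();
columns.add(nameCol);
columns.add(ageCol);
ColumnModel<Map<String, Object>> columnModel = new ColumnModel<>(columns);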

Handling External Dependencies in a Factory

I have a factory class which I feel needs to be refactored; take the following example:
public class FileFactory
{
    public static FileType Create(string fileName)
    {
        if (IsImageFile(fileName))
        {
            return new ImageFileType();
        }
        else if (IsDocumentFile(fileName))
        {
            return new DocumentFileType();
        }
        ...
    }

    private static bool IsImageFile(string fileName)
    {
        string[] imageFileTypes = { ".jpg", ".gif", ".png" }; // How to avoid this line of code?
        return imageFileTypes.Contains(Path.GetExtension(fileName));
    }
}
I'm loosely following Domain Driven Design principles, so this FileFactory class is a domain object. Should the factory class access the repository / DB to get the file types?
How should I handle the dependency in this scenario?
A new, unknown image file type is really unlikely. It's fine to hard code this.
If you're really, really intent on keeping that file list as an external dependency, pass the list of file types into the FileFactory constructor and make Create() an instance method instead of static. That'll keep you testable and SOLID.
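A rough sketch of that refactoring (in Java rather than C#, with illustrative names), where the extension list becomes a constructor dependency:
import java.util.Set;

public class FileFactory {
    private final Set<String> imageExtensions;

    // The extension list is supplied from outside: a repository, a config
    // file, or a test can each provide their own.
    public FileFactory(Set<String> imageExtensions) {
        this.imageExtensions = imageExtensions;
    }

    public FileType create(String fileName) {
        return isImageFile(fileName) ? new ImageFileType() : new DocumentFileType();
    }

    private boolean isImageFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot >= 0 && imageExtensions.contains(fileName.substring(dot));
    }
}

// In production: new FileFactory(fileTypeRepository.loadImageExtensions());
// In a test:     new FileFactory(Set.of(".jpg", ".gif", ".png"));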

Are we all looking for the same IRepository?

I've been trying to come up with a way to write generic repositories that work against various data stores:
public interface IRepository
{
    IQueryable<T> GetAll<T>();
    void Save<T>(T item);
    void Delete<T>(T item);
}
public class MemoryRepository : IRepository {...}
public class SqlRepository : IRepository {...}
I'd like to work against the same POCO domain classes in each. I'm also considering a similar approach, where each domain class has its own repository:
public interface IRepository<T>
{
    IQueryable<T> GetAll();
    void Save(T item);
    void Delete(T item);
}

public class MemoryCustomerRepository : IRepository<Customer> {...}
public class SqlCustomerRepository : IRepository<Customer> {...}
My questions: 1) Is the first approach even feasible? 2) Is there any advantage to the second approach?
The first approach is feasible, I have done something similar in the past when I wrote my own mapping framework that targeted RDBMS and XmlWriter/XmlReader. You can use this sort of approach to ease unit testing, though I think now we have superior OSS tools for doing just that.
The second approach is what I currently use with IBATIS.NET mappers. Every mapper has an interface, and every mapper [could] provide your basic CRUD operations. The advantage is that each mapper for a domain class also has specific functions (such as SelectByLastName or DeleteFromParent) that are expressed by an interface and defined in the concrete mapper. Because of this there's no need for me to implement separate repositories as you're suggesting: our concrete mappers target the database. To perform unit tests I use StructureMap and Moq to create in-memory repositories that operate as your Memory*Repository does. It's fewer classes to implement and manage, and less work overall, for a very testable approach. For data shared across unit tests I use a builder pattern for each domain class, which has WithXXX methods and AsSomeProfile methods (the AsSomeProfile just returns a builder instance with preconfigured test data).
Here's an example of what I usually end up with in my unit tests:
// Moq mocking the concrete PersonMapper through the IPersonMapper interface
var personMock = new Mock<IPersonMapper>(MockBehavior.Strict);
personMock.Expect(pm => pm.Select(It.IsAny<int>())).Returns(
    new PersonBuilder().AsMike().Build()
);

// StructureMap's ObjectFactory
ObjectFactory.Inject(personMock.Object);

// Now anywhere in my actual code where an IPersonMapper instance is requested
// from ObjectFactory, Moq will satisfy the requirement and return a Person
// instance set with the PersonBuilder's Mike profile unit test data.
Actually, there is now a general consensus that domain repositories should not be generic. Your repository should express what you can do when persisting or retrieving your entities.
Some repositories are read-only, some are insert-only (no update, no delete), some have only specific lookups...
If GetAll returns an IQueryable, your query logic will leak into your code, possibly up to the application layer.
But it's still interesting to use the kind of interface you propose to encapsulate Linq Table<T> objects, so that you can replace them with an in-memory implementation for test purposes.
So I suggest calling it ITable<T>, giving it the same interface as the Linq Table<T> object, and using it inside your specific domain repositories (not instead of them).
You can then run your specific repositories in memory by using an in-memory ITable<T> implementation.
The simplest way to implement ITable<T> in memory is to use a List<T> and obtain an IQueryable<T> interface through the .AsQueryable() extension method.
public class InMemoryTable<T> : ITable<T>
{
    private List<T> list;
    private IQueryable<T> queryable;

    public InMemoryTable(List<T> list)
    {
        this.list = list;
        this.queryable = list.AsQueryable();
    }

    public void Add(T entity) { list.Add(entity); }
    public void Remove(T entity) { list.Remove(entity); }

    public IEnumerator<T> GetEnumerator() { return list.GetEnumerator(); }

    public Type ElementType { get { return queryable.ElementType; } }
    public IQueryProvider Provider { get { return queryable.Provider; } }
    ...
}
You can work in isolation of the database for testing, but with true specific repositories that give more domain insight.
This is a bit late... but take a look at the IRepository implementation at CommonLibrary.NET on codeplex. It's got a pretty good feature set.
Regarding your problem: I see a lot of people using methods like GetAllProducts() or GetAllEmployees() in their repository implementations. This is redundant and prevents your repository from being generic. All you need is GetAll() or All(). The solution provided above does solve the naming problem, though.
This is taken from CommonLibrary.NET documentation online:
0.9.4 Beta 2 has a powerful Repository implementation.
* Supports all CRUD methods (Create, Retrieve, Update, Delete)
* Supports aggregate methods Min, Max, Sum, Avg, Count
* Supports Find methods using ICriteria<T>
* Supports Distinct and GroupBy
* Supports interface IRepository<T> so you can use an in-memory table for unit testing
* Supports versioning of your entities
* Supports paging, e.g. Get(page, pageSize)
* Supports audit fields (CreateUser, CreatedDate, UpdateDate, etc.)
* Supports the use of Mapper<T> so you can map any table record to some entity
* Supports creating entities only if they aren't already there, by checking for field values.
