How do you relate GearmanJob to GearmanWorker and GearmanClient? - gearman

public GearmanTask GearmanClient::addTask ( string $function_name , string $workload [, mixed &$context [, string $unique ]] )
public bool GearmanWorker::addFunction ( string $function_name , callback $function [, mixed &$context [, int $timeout ]] )
These are class methods that can be used to integrate the two.
With these, how do you relate the workload with the called function?

GearmanClient is used to submit a task. Normally this is done from a web page, or a script meant to read a list of tasks to be submitted.
GearmanWorker is meant to be set up in such a way that many parallel 'workers' can be run at the same time. Logically, whatever a worker does should represent a single atomic unit of work. That can mean doing a single transformation of an object into another and saving it back to the database, or assembling and sending an html email for a single user. $function_name is a function that takes a single argument, which is a GearmanJob object.
So, your controller script might look something like this.
$gearman_params = json_encode( array(
'id' => 77,
'options' => array('push' => true),
) );
$client = new GearmanClient();
$client->doBackground( "widgetize", $gearman_params );
Then, your worker will do something like this.
$gmworker = GearmanWorker;
$gmworker->addFunction( "widgetize", "widgetize" );
while( $gmworker->work() ) {
if ( $gmworker->returnCode() !== GEARMAN_SUCCESS ) {
echo "Worker Terminated." . PHP_EOL ;
break;
}
} // while *working*
function widgetize( $job ) {
$workload = json_decode( $job->workload() );
/* do stuff */
}
A few things to keep in mind:
The worker script is designed to be long running script, hence the while loop. Make sure to set your timeout and memory limits appropriately.
It's also good practice to give the worker a run limit. As in, only allow the worker to run a few times, and then exit; PHP still sucks with memory usage, and it can be hard to tell when a worker will be killed because it ran out of memory.
In the same vein it's also best to design workers to be idempotent, so jobs can be resubmitted to workers if they fail.
The Client can submit jobs to the worker with different priorities, and they don't neccessarily have to be run in the background, like the above example. They can be run so that the client blocks.

Related

Coordinating emission and subscription in Kotlin coroutines with hot flows

I am trying to design an observable task-like entity which would have the following properties:
Reports its current state changes reactively
Shares state and result events: new subscribers will also be notified if the change happens after they've subscribed
Has a lifecycle (backed by CoroutineScope)
Doesn't have suspend functions in the interface (because it has a lifecycle)
The very basic code is something like this:
class Worker {
enum class State { Running, Idle }
private val state = MutableStateFlow(State.Idle)
private val results = MutableSharedFlow<String>()
private val scope = CoroutineScope(Dispatchers.Default)
private suspend fun doWork(): String {
println("doing work")
return "Result of the work"
}
fun start() {
scope.launch {
state.value = State.Running
results.emit(doWork())
state.value = State.Idle
}
}
fun state(): Flow<State> = state
fun results(): Flow<String> = results
}
The problems with this arise when I want to "start the work after I'm subscribed". There's no clear way to do that. The simplest thing doesn't work (understandably):
fun main() {
runBlocking {
val worker = Worker()
// subscriber 1
launch {
worker.results().collect { println("received result $it") }
}
worker.start()
// subscriber 2 can also be created "later" and watch
// for state()/result() changes
}
}
This prints only "doing work" and never prints a result. I understand why this happens (because collect and start are in separate coroutines, not synchronized in any way).
Adding a delay(300) to coroutine inside doWork "fixes" things, results are printed, but I'd like this to work without artificial delays.
Another "solution" is to create a SharedFlow from results() and use its onSubscription to call start(), but that didn't work either last time I've tried.
My questions are:
Can this be turned into something that works or is this design initially flawed?
If it is flawed, can I take some other approach which would still hit all the goals I have specified in the beginning of the post?
Your problem is that your SharedFlow has no buffer set up, so it is emitting results to its (initially zero) current collectors and immediately forgetting them. The MutableSharedFlow() function has a replay parameter you can use to determine how many previous results it should store and replay to new collectors. You will need to decide what replay amount to use based on your use case for this class. For simply displaying latest results in a UI, a common choice is a replay of 1.
Depending on your use case, you may want to give your CoroutineScope a SupervisorJob() in its context so it isn't destroyed by any child job failing.
Side note, your state() and results() functions should be properties by Kotlin convention, since they do nothing but return references. Personally, I would also have them return read-only StateFlow/SharedFlow instead of just Flow to clarify that they are not cold.

sync.Map or channels when using a goroutines

I'm writing program that parses a lot of files looking for "interesting" lines. Then it's checking if these lines were seen before. Each file is parsed using separate goroutine.
I'm wondering which approach is better:
Use sync.Map or something similar
Use channels and separate goroutine which should be responsible only for uniqueness check (probably using standard map). It would receive request and respond with something simple like "Not unique" or "Unique (and added)"
Is any of these solutions more popular or maybe both are wrong?
If you would like to have workers which can access a global map for unique checking, you can use a sync.RWMutex to be sure that the map is protected, like:
var (
mutex sync.RWMutex = sync.RWMutex{}
alreadySeen map[string]struct{} = make(map[string]struct{})
)
func Work() {
for {
Processing lines here...
//Checking
mutex.RLock() //Lock for reading only
if _, found := alreadySeen[line]; !found {
mutex.RUnLock()
mutex.Lock()
alreadySeen[line] = struct{}{}
mutex.UnLock()
} else {
mutex.RUnLock()
}
}
}
Another approach is to use a concurrent safe map to skip the whole mutexing, for example this package: https://github.com/cornelk/hashmap

In Kotlin Native, how to keep an object around in a separate thread, and mutate its state from any other thead without using C pointers?

I'm exploring Kotlin Native and have a program with a bunch of Workers doing concurrent stuff
(running on Windows, but this is a general question).
Now, I wanted to add simple logging. A component that simply logs strings by appending them as new lines to a file that is kept open in 'append' mode.
(Ideally, I'd just have a "global" function...
fun log(text:String) {...} ]
...that I would be able to call from anywhere, including from "inside" other workers and that would just work. The implication here is that it's not trivial to do this because of Kotlin Native's rules regarding passing objects between threads (TLDR: you shouldn't pass mutable objects around. See: https://github.com/JetBrains/kotlin-native/blob/master/CONCURRENCY.md#object-transfer-and-freezing ).
Also, my log function would ideally accept any frozen object. )
What I've come up with are solutions using DetachedObjectGraph:
First, I create a detached logger object
val loggerGraph = DetachedObjectGraph { FileLogger("/foo/mylogfile.txt")}
and then use loggerGraph.asCPointer() ( asCPointer() ) to get a COpaquePointer to the detached graph:
val myPointer = loggerGraph.asCPointer()
Now I can pass this pointer into the workers ( via the producer lambda of the Worker's execute function ), and use it there. Or I can store the pointer in a #ThreadLocal global var.
For the code that writes to the file, whenever I want to log a line, I have to create a DetachedObjectGraph object from the pointer again,
and attach() it in order to get a reference to my fileLogger object:
val fileLogger = DetachedObjectGraph(myPointer).attach()
Now I can call a log function on the logger:
fileLogger.log("My log message")
This is what I've come up with looking at the APIs that are available (as of Kotlin 1.3.61) for concurrency in Kotlin Native,
but I'm left wondering what a better approach would be ( using Kotlin, not resorting to C ). Clearly it's bad to create a DetachedObjectGraph object for every line written.
One could pose this question in a more general way: How to keep a mutable resource open in a separate thread ( or worker ), and send messages to it.
Side comment: Having Coroutines that truly use threads would solve this problem, but the question is about how to solve this task with the APIs currently ( Kotlin 1.3.61 ) available.
You definitely shouldn't use DetachedObjectGraph in the way presented in the question. There's nothing to prevent you from trying to attach on multiple threads, or if you pass the same pointer, trying to attach to an invalid one after another thread as attached to it.
As Dominic mentioned, you can keep the DetachedObjectGraph in an AtomicReference. However, if you're going to keep DetachedObjectGraph in an AtomicReference, make sure the type is AtomicRef<DetachedObjectGraph?> and busy-loop while the DetachedObjectGraph is null. That will prevent the same DetachedObjectGraph from being used by multiple threads. Make sure to set it to null, and repopulate it, in an atomic way.
However, does FileLogger need to be mutable at all? If you're writing to a file, it doesn't seem so. Even if so, I'd isolate the mutable object to a separate worker and send log messages to it rather than doing a DetachedObjectGraph inside an AtomicRef.
In my experience, DetachedObjectGraph is super uncommon in production code. We don't use it anywhere at the moment.
To isolate mutable state to a Worker, something like this:
class MutableThing<T:Any>(private val worker:Worker = Worker.start(), producer:()->T){
private val arStable = AtomicReference<StableRef<T>?>(null)
init {
worker.execute(TransferMode.SAFE, {Pair(arStable, producer).freeze()}){
it.first.value = StableRef.create(it.second()).freeze()
}
}
fun <R> access(block:(T)->R):R{
return worker.execute(TransferMode.SAFE, {Pair(arStable, block).freeze()}){
it.second(it.first.value!!.get())
}.result
}
}
object Log{
private val fileLogger = MutableThing { FileLogger() }
fun log(s:String){
fileLogger.access { fl -> fl.log(s) }
}
}
class FileLogger{
fun log(s:String){}
}
The MutableThing uses StableRef internally. producer makes the mutable state you want to isolate. To log something, call Log.log, which will wind up calling the mutable FileLogger.
To see a basic example of MutableThing, run the following test:
#Test
fun goIso(){
val mt = MutableThing { mutableListOf("a", "b")}
val workers = Array(4){Worker.start()}
val futures = mutableListOf<Future<*>>()
repeat(1000) { rcount ->
val future = workers[rcount % workers.size].execute(
TransferMode.SAFE,
{ Pair(mt, rcount).freeze() }
) { pair ->
pair.first.access {
val element = "ttt ${pair.second}"
println(element)
it.add(element)
}
}
futures.add(future)
}
futures.forEach { it.result }
workers.forEach { it.requestTermination() }
mt.access {
println("size: ${it.size}")
}
}
The approach you've taken is pretty much correct and the way it's supposed to be done.
The thing I would add is, instead of passing around a pointer around. You should pass around a frozen FileLogger, which will internally hold a reference to a AtomicRef<DetachedObjectGraph>, the the attaching and detaching should be done internally. Especially since DetachedObjectGraphs are invalid once attached.

Using Squeryl with Akka Actors

So I'm trying to work with both Squeryl and Akka Actors. I've done a lot of searching and all I've been able to find is the following Google Group post:
https://groups.google.com/forum/#!topic/squeryl/M0iftMlYfpQ
I think I might have shot myself in the foot as I originally created this factory pattern so I could toss around Database objects.
object DatabaseType extends Enumeration {
type DatabaseType = Value
val Postgres = Value(1,"Postgres")
val H2 = Value(2,"H2")
}
object Database {
def getInstance(dbType : DatabaseType, jdbcUrl : String, username : String, password : String) : Database = {
Class.forName(jdbcDriver(dbType))
new Database(Session.create(
_root_.java.sql.DriverManager.getConnection(jdbcUrl,username,password),
squerylAdapter(dbType)))
}
private def jdbcDriver(db : DatabaseType) = {
db match {
case DatabaseType.Postgres => "org.postgresql.Driver"
case DatabaseType.H2 => "org.h2.Driver"
}
}
private def squerylAdapter(db : DatabaseType) = {
db match {
case DatabaseType.Postgres => new PostgreSqlAdapter
case DatabaseType.H2 => new H2Adapter
}
}
}
Originally in my implementation, I tried surrounding all my statements in using(session), but I'd keep getting the dreaded "No session is bound to the current thread" error, so I added the session.bindToCuirrentThread to the constructor.
class Database(session: Session) {
session.bindToCurrent
def failedBatch(filename : String, message : String, start : Timestamp = now, end : Timestamp = now) =
batch.insert(new Batch(0,filename,Some(start),Some(end),ProvisioningStatus.Fail,Some(message)))
def startBatch(batch_id : Long, start : Timestamp = now) =
batch update (b => where (b.id === batch_id) set (b.start := Some(start)))
...more functions
This worked reasonably well, until I got to Scala Actors.
class TransferActor() extends Actor {
def databaseInstance() = {
val dbConfig = config.getConfig("provisioning.database")
Database.getInstance(DatabaseType.Postgres,
dbConfig.getString("jdbcUrl"),
dbConfig.getString("username"),
dbConfig.getString("password"))
}
lazy val config = ConfigManager.current
override def receive: Actor.Receive = { /* .. do some work */
I constantly get the following:
[ERROR] [03/11/2014 17:02:57.720] [provisioning-system-akka.actor.default-dispatcher-4] [akka://provisioning-system/user/$c] No session is bound to current thread, a session must be created via Session.create
and bound to the thread via 'work' or 'bindToCurrentThread'
Usually this error occurs when a statement is executed outside of a transaction/inTrasaction block
java.lang.RuntimeException: No session is bound to current thread, a session must be created via Session.create
and bound to the thread via 'work' or 'bindToCurrentThread'
I'm getting a fresh Database object each time, not caching it with a lazy val, so shouldn't that constructor always get called and attach to my current thread? Does Akka attach different threads to different actors and swap them around? Should I just add a function to call session.bindToCurrentThread each time I'm in an actor? Seems kinda hacky.
Does Akka attach different threads to different actors and swap them around?
That's exactly how the actor model works. The idea is that you can have a small thread pool servicing a very large number of threads because actors only need to use a thread when they have a message waiting to be processed.
Some general tips for Squeryl.... A session is a one to one association with a JDBC connection. The main advantage of keeping Sessions open is that you can have a transaction open that gives you a consistent view of the database as you perform multiple operations. If you don't need that, make your session/transaction code granular to avoid these types of issues. If you do need it, don't rely on Sessions being available in a thread local context. Use the transaction(session){} or transaction(sessionFactory){} methods to explicitly tell Squeryl where you want your Session to come from.

what is the difference between std::call_once and function-level static initialization

1) std::call_once
A a;
std::once_flag once;
void f ( ) {
call_once ( once, [ ] { a = A {....}; } );
}
2) function-level static
A a;
void f ( ) {
static bool b = ( [ ] { a = A {....}; } ( ), true );
}
For your example usage, hmjd's answer fully explains that there is no difference (except for the additional global once_flag object needed in the call_once case.) However, the call_once case is more flexible, since the once_flag object isn't tied to a single scope. As an example, it could be a class member and be used by more than one function:
class X {
std::once_flag once;
void doSomething() {
std::call_once(once, []{ /* init ...*/ });
// ...
}
void doSomethingElse() {
std::call_once(once, []{ /*alternative init ...*/ });
// ...
}
};
Now depending on which member function is called first the initialization code can be different (but the object will still only be initialized once.)
So for simple cases a local static works nicely (if supported by your compiler) but there are some less common uses that might be easier to implement with call_once.
Both code snippets have the same behaviour, even in the presence of exceptions thrown during initialization.
This conclusion is based on (my interpretation of) the following quotes from the c++11 standard (draft n3337):
1 Section 6.7 Declaration statement clause 4 states:
The zero-initialization (8.5) of all block-scope variables with static storage duration (3.7.1) or thread storage duration (3.7.2) is performed before any other initialization takes place. Constant initialization (3.6.2) of a block-scope entity with static storage duration, if applicable, is performed before its block is first entered. An implementation is permitted to perform early initialization of other block-scope variables with static or thread storage duration under the same conditions that an implementation is permitted to statically initialize a variable with static or thread storage duration in namespace scope (3.6.2). Otherwise such a variable is initialized the first time control passes through its declaration; such a variable is considered initialized upon the completion of its initialization. If the initialization exits by throwing an exception, the initialization is not complete, so it will be tried again the next time control enters the declaration. If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.88 If control re-enters the declaration recursively while the variable is being initialized, the behavior is undefined.
This means that in:
void f ( ) {
static bool b = ( [ ] { a = A {....}; } ( ), true );
}
b is guaranteed to be initialized once only, meaning the lambda is executed (successfully) once only, meaning a = A {...}; is executed (successfully) once only.
2 Section 30.4.4.2 Function call-once states:
An execution of call_once that does not call its func is a passive execution. An execution of call_once that calls its func is an active execution. An active execution shall call INVOKE (DECAY_COPY ( std::forward(func)), DECAY_COPY (std::forward(args))...). If such a call to func throws an exception the execution is exceptional, otherwise it is returning. An exceptional execution shall propagate the exception to the caller of call_once. Among all executions of call_once for any given once_flag: at most one shall be a returning execution; if there is a returning execution, it shall be the last active execution; and there are passive executions only if there is a returning execution.
This means that in:
void f ( ) {
call_once ( once, [ ] { a = A {....}; } );
the lambda argument to std::call_once is executed (successfully) once only, meaning a = A {...}; is executed (successfully) once only.
In both cases a = A{...}; is executed (successfully) once only.

Resources