Perforce streams: Task streams usage

I have a requirement to create a child stream that picks up only specific folders from the mainline (parent) stream. While creating the child stream, I restrict the view using share/isolate/import, and with that I am successfully able to create child streams containing only the code I am interested in.
However, I have gone through some tutorials on streams and found something about lightweight streams (task streams), which are used to create streams partially from a parent. In my scenario, do I really need to use these lightweight streams? What are the main advantages and limitations of lightweight task streams compared to the normal approach I described above?

The purpose of task streams is not to create streams "partially" -- you have already done this with your share/import paths. Don't fix what isn't broken!
Task streams are built to be short-lived and easily archivable once the associated task is complete (via the "unload" command). The limitations of task streams are described in the documentation here:
https://www.perforce.com/perforce/doc.current/manuals/p4v/Content/P4V/streams.task.html
namely that they can't be reparented and they may not have children. If you use task streams as short-lived single-task streams (as the name "task stream" implies, a task stream is for a single task), these limitations won't generally be a problem. If you try to use a task stream as a development mainline, you're going to have problems.
If your development process involves creating a new branch for a short-term task (e.g. an individual hotfix parented to a particular branch), and you have a lot of these tasks, task streams may be useful due to their easy cleanup and low overhead (when a task stream is unloaded it's removed from the db, which means you don't accumulate db cruft over time as you create and abandon them).
If this does not sound like your development process, forget you ever heard about task streams. Do not try to imagine ways that you can use task streams for things that aren't short-term tasks. Hammers are suitable for nails. Do not use them to try to drive screws, especially when you have a perfectly good screwdriver right there and are already using it successfully.
(Can you tell I've seen more than a few instances of people trying to use task streams for absolutely everything because they "sound cool"? Resist the urge!)

Related

Best way to implement background “timer” functionality in Python/Django

I am trying to implement a Django web application (on Python 3.8.5) which allows a user to create “activities” where they define an activity duration and then set the activity status to “In progress”.
The POST action to the View writes the new status, the duration and the start time (end time, based on start time and duration is also possible to add here of course).
The back-end should then keep track of the duration and automatically change the status to “Finished”.
User actions can also change the status to “Finished” before the calculated end time (i.e. the timer no longer needs to be tracked).
I am fairly new to Python, so I need some advice on the smartest way to implement such a concept.
It needs to be efficient and scalable – I’m currently using a Heroku Free account so have limited system resources, but efficiency would also be important for future production implementations of course.
I have looked at the Python threading Timer, and this seems to work on a basic level, but I’ve not been able to determine what kind of constraints this places on the system – e.g. whether the spawned Timer thread might prevent the main thread from finishing and releasing resources (i.e. Heroku Dyno threads), etc.
I have read that persistence might be a problem (if the server goes down), and I haven’t found a way to cancel the timer from another process (the .cancel() method seems to rely on having the original object to cancel, and I’m not sure if this is achievable from another process).
I was also wondering about a more “background” approach, i.e. a single process which is constantly checking the database looking for activity records which have reached their end time and swapping the status.
But what would be the best way of implementing such a server?
Is it practical to read the database every second to find records with an end time of “now”? I need the status to change in real-time when the end time is reached.
Is something like Celery a good option, or is it overkill for a single process like this?
As I said I’m fairly new to these technologies, so I may be missing other obvious solutions – please feel free to enlighten me!
Thanks in advance.
To achieve this you need some kind of task-scheduling functionality. For a quick, simple implementation, a good solution is to use the Timer object from the threading module.
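For illustration, a minimal sketch of the Timer approach; the Activity model, its fields, and the app name are hypothetical stand-ins for your own:

```python
import threading

def finish_activity(activity_id):
    # Flip the status once the duration elapses. Filtering on the current
    # status makes this a no-op if the user already finished manually.
    from myapp.models import Activity  # hypothetical app and model
    Activity.objects.filter(id=activity_id, status="In progress").update(
        status="Finished"
    )

def start_activity_timer(activity_id, duration_seconds):
    # Call this from the POST view after saving the new status/start time.
    timer = threading.Timer(duration_seconds, finish_activity, args=(activity_id,))
    timer.start()
    return timer  # keep this reference if you want to .cancel() it early
```

As you noted, though, the Timer lives only in the process that created it: it does not survive a restart, and it cannot be cancelled from another process.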
A more complete solution is to use Celery. Even if you are new to it, digging into Celery will give you good value from the start: it is a queue manager that distributes your work easily across several threads or processes.
You mentioned that you want it to be efficient and scalable, so I expect you will eventually need more functionality that requires multiprocessing and scheduling; for that reason my recommendation is to use Celery.
You can integrate it into your Django application easily by following the documentation, Integrate Django with Celery.
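As a rough sketch of how that could look for this use case (the task, model, and field names are hypothetical, and a configured broker such as Redis is assumed): schedule the status flip with a countdown when the activity starts, remember the task id, and revoke the task if the user finishes early.

```python
# tasks.py -- minimal Celery sketch
from celery import shared_task

@shared_task
def finish_activity(activity_id):
    from myapp.models import Activity  # hypothetical app and model
    # Filtering on status makes this a no-op if the user already finished.
    Activity.objects.filter(id=activity_id, status="In progress").update(
        status="Finished"
    )
```

```python
# In the view, when the activity is set to "In progress":
result = finish_activity.apply_async((activity.id,), countdown=duration_seconds)
activity.finish_task_id = result.id  # hypothetical field storing the task id
activity.save()

# If the user finishes early, revoke the scheduled task by id:
from myapp.celery import app  # your Celery application instance
app.control.revoke(activity.finish_task_id)
```

Alternatively, a celery beat periodic task that sweeps the table for overdue activities avoids per-activity timers entirely and also recovers from worker restarts.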

Spark Streaming - Poison Pill?

I'm trying to decide how best to design a data pipeline that will involve Spark Streaming.
The essential process I imagine is:
1. Set up a streaming job that watches a fileStream (this is the consumer)
2. Do a bunch of computation elsewhere, which populates that file (this is the producer)
3. The streaming job consumes the data as it comes in, performing various actions
4. When the producer is done, wait for all the streaming computations to finish, and tear down the streaming job.
It's step (4) that has me confused. I'm not sure how to shut it down gracefully. The recommendations I've found generally suggest pressing "Ctrl-C" on the driver, along with setting the spark.streaming.stopGracefullyOnShutdown config option.
I don't like that approach since it requires the producing code to somehow access the consumer's driver and send it a signal. These two systems could be completely unrelated; this is not necessarily easy to do.
Plus, there is already a communication channel — the fileStream — can't I use that?
In a traditional threaded producer/consumer situation, one common technique is to use a "poison pill". The producer sends a special piece of data indicating "no more data", then you wait for your consumers to exit.
Is there a reason this can't be done in Spark?
Surely there is a way for the stream processing code, upon seeing some special data, to send a message back to its driver?
The Spark docs have an example of listening to a socket, with socketTextStream, and it somehow is able to terminate when the producer is done. I haven't dived into that code yet, but this seems like it should be possible.
Any advice?
Is this fundamentally wrong-headed?
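For what it's worth, here is a minimal sketch of what the poison-pill idea could look like in PySpark. The foreachRDD callback runs on the driver once per batch, so it can watch for a sentinel record and set a flag that the main loop checks between awaitTerminationOrTimeout polls. The sentinel value, watched directory, and per-record handler below are all made up for illustration.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

POISON_PILL = "__END_OF_STREAM__"   # hypothetical sentinel record
stop_requested = {"flag": False}    # driver-side state

def handle_record(rec):
    print(rec)  # stand-in for the real per-record work

def process(rdd):
    # Runs on the driver for each batch, so it may mutate stop_requested.
    if rdd.filter(lambda rec: rec == POISON_PILL).count() > 0:
        stop_requested["flag"] = True
    rdd.filter(lambda rec: rec != POISON_PILL).foreach(handle_record)

sc = SparkContext(appName="poison-pill-sketch")
ssc = StreamingContext(sc, batchDuration=5)
ssc.textFileStream("/tmp/incoming").foreachRDD(process)  # hypothetical directory
ssc.start()

# Poll instead of blocking forever, so the driver notices the flag.
while not ssc.awaitTerminationOrTimeout(timeout=5):
    if stop_requested["flag"]:
        ssc.stop(stopSparkContext=True, stopGraceFully=True)
        break
```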

Esper UpdateListener's concurrency

My boss wants me to learn Esper, the open source library for CEP, so I need some help.
I want many UpdateListeners subscribing to one event stream, and I want them to run concurrently. That is, if one listener has a long, heavy task to process, the other listeners should keep running in parallel; we receive very many events in a short time, so I need faster processing.
The UpdateListener code can simply use a Java thread pool to do its work. For an example, see http://www.javacodegeeks.com/2013/01/java-thread-pool-example-using-executors-and-threadpoolexecutor.html
In Esper you can also configure threading.
http://esper.codehaus.org/esper-5.1.0/doc/reference/en-US/html_single/index.html#api-threading-advanced

Multiple Task Lists Polling Using Flow

I'm trying to figure out how, using the SWF Flow framework, I can have my activity worker poll multiple task lists. The use case is having two different priorities for activity tasks that need to be completed.
Bonus points if someone who uses glisten can point out a way to achieve this.
Thanks!
It is not possible for a single ActivityWorker to poll multiple task lists. The reason for this design is that each poll request can take up to a minute due to long polling. If several such polls fed into a single-threaded activity implementation, it would not be clear how to deal with the conflicts that arise when tasks are received on multiple task lists.
Until SWF natively supports priority task lists, the solution is to instantiate one ActivityWorker per task list (one per priority) and deal with conflicts yourself.

How to use node.js for a queue processing app

What are the best practices when using node.js for a queue processing application?
My main concern there would be that Node processes can handle thousands of items at once, but that a rogue unhandled error in any of them could bring down the whole process.
I'd be looking for a queue/driver combination that allows a two-phase commit (wrong terminology, I think?), i.e.:
1. Get the next appropriate item from the queue (which then blocks that item from being consumed elsewhere)
2. Once the item is handed over to the downstream service/database/filesystem, tell the queue that the item has been processed
I'd also want repeatably unique identifiers so that you can reliably detect if an item comes down the pipe twice. In a theoretical system it might not happen, but in a practical environment the capability to deal with it will make your life easier.
Check out Kue (http://learnboost.github.com/kue/). I have used it for a couple of pet projects and it works quite well. You can look at their source and see which practices they have taken care of.
