Problems configuring LivyOperator in Airflow


For LivyOperator we set the following parameters:
polling_interval=60
retries_num_timeout=100
We set it up according to this documentation: https://airflow.apache.org/docs/apache-airflow-providers-apache-livy/stable/_api/airflow/providers/apache/livy/operators/livy/index.html
However, with this configuration the Livy session is interrupted after 100 * 60 seconds = 6000 seconds = 1 hour 40 minutes: the operator is marked as failed and the load is cut off. Is there any way to resolve this inconsistency on the Airflow/Livy side?
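The arithmetic in the question can be sketched as below. This assumes, as the question's own numbers imply, that the operator gives up after retries_num_timeout polls spaced polling_interval seconds apart. The function names are hypothetical helpers for illustration, not part of the Airflow or Livy API.

```javascript
// Hypothetical helpers: plain arithmetic, not an Airflow or Livy API.

// Total time the operator keeps polling before giving up, in seconds.
function pollingBudgetSeconds(pollingInterval, retries) {
  return pollingInterval * retries;
}

// Number of polls needed to cover a job with the given expected runtime.
function retriesNeeded(expectedRuntimeSeconds, pollingInterval) {
  return Math.ceil(expectedRuntimeSeconds / pollingInterval);
}

console.log(pollingBudgetSeconds(60, 100)); // 6000 seconds = 1 hour 40 minutes
console.log(retriesNeeded(4 * 3600, 60));   // 240 polls for a job expected to run 4 hours
```

Under that assumption, a load that runs longer than the budget needs either a larger retry count or a longer polling interval.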

Related

Laravel-Excel keeps browser busy for 140 seconds after completion of import: how do I correct it?

Using the import to models option, I am importing an XLS file with about 15,000 rows.
The script times the import with the microtime_float function and echoes how long it took. That output appears at 29.6 secs, so the import itself takes less than 30 seconds. At that time, I can see the database has all 15k+ records as expected, no issues there.
The problem is that the browser is kept busy, and at 1 min 22 secs, 1 min 55 secs and 2 min 26 secs it prompts me to either wait or kill the process. I keep clicking wait and it finally ends at 2 mins 49 secs.
This is a terrible user experience. How can I cut off this extra wait time?
It's a very basic setup: the route calls importcontroller#import with HTTP GET and the code is as follows:
public function import()
{
    ini_set('memory_limit', '1024M');
    $start = $this->microtime_float();
    Excel::import(new myImport, 'myfile.xls', null, \Maatwebsite\Excel\Excel::XLS);
    $end = $this->microtime_float();
    $t = $end - $start;
    return "Time: $t";
}
The class uses certain concerns as follows:
class myImport implements ToModel, WithBatchInserts, WithChunkReading, WithStartRow

Ramp and Hold Users for Some time and Ramp again

I have the following scenario to be load tested for a service and it does not seem to work as expected. My scenario is as follows.
Test with rampUsers(100) over 15 minutes duration
Hold the users for about 10 minutes holdFor(10 minutes)
Then again rampUsers(200) over 15 minutes duration
Hold the users for about 10 minutes holdFor(10 minutes)
Then again rampUsers(200) over 15 minutes duration
I am trying to use the throttle option for this, but it does not seem to work as expected.
Here are the code snippet combinations that I have tried so far:
// NUM_USERS = 300
// DURATION = 15 minutes
// CONSTANT_DURATION = 5 minutes
// Tried with different combinations of NUM_USERS and DURATION but not helpful
scn.inject(
  rampUsers(NUM_USERS*1) during DURATION,
  constantUsersPerSec(1) during CONSTANT_DURATION,
  rampUsers(NUM_USERS*2) during DURATION,
  constantUsersPerSec(2) during CONSTANT_DURATION,
  rampUsers(NUM_USERS*3) during DURATION,
  constantUsersPerSec(3) during CONSTANT_DURATION
)

scn.inject(
  rampUsers(NUM_USERS) during DURATION
).throttle(
  reachRps(NUM_USERS/4) in (CONSTANT_DURATION),
  holdFor(CONSTANT_DURATION),
  jumpToRps(NUM_USERS/3),
  holdFor(CONSTANT_DURATION),
  jumpToRps(NUM_USERS/2),
  holdFor(CONSTANT_DURATION)
)

scn.inject(
  rampUsers(NUM_USERS) during DURATION
).throttle(
  holdFor(CONSTANT_DURATION),
  reachRps(NUM_USERS+NUM_USERS) in (DURATION+DURATION),
  holdFor(CONSTANT_DURATION)
)
Can anyone tell me which of these approaches works in this case? I would like to have a graph like this

To target an injection rate, which is what you said you want in the comments, you need something like this:
scn.inject(
  rampUsersPerSec(0) to (300) during DURATION,
  constantUsersPerSec(300) during CONSTANT_DURATION,
  rampUsersPerSec(300) to (600) during DURATION,
  constantUsersPerSec(600) during CONSTANT_DURATION,
  ...
)
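Note that rampUsers(n) injects n users in total, while the PerSec steps set an arrival rate, so the profile above starts far more users. A rough sketch of the difference, in JavaScript rather than Gatling's Scala DSL: it approximates the total number of users started as the area under the rate curve. The usersInjected helper is hypothetical, the durations are taken from the question's comments, and Gatling's actual scheduling may round differently.

```javascript
// Each step is an arrival-rate segment: the rate moves linearly
// from `from` to `to` users/sec over `seconds`.
function usersInjected(steps) {
  return steps.reduce(
    (total, s) => total + ((s.from + s.to) / 2) * s.seconds,
    0
  );
}

const DURATION = 15 * 60;         // 15 minutes, per the question's comments
const CONSTANT_DURATION = 5 * 60; // 5 minutes, per the question's comments

const profile = [
  { from: 0, to: 300, seconds: DURATION },            // rampUsersPerSec(0) to (300)
  { from: 300, to: 300, seconds: CONSTANT_DURATION }, // constantUsersPerSec(300)
  { from: 300, to: 600, seconds: DURATION },          // rampUsersPerSec(300) to (600)
  { from: 600, to: 600, seconds: CONSTANT_DURATION }, // constantUsersPerSec(600)
];

console.log(usersInjected(profile)); // 810000
```

For comparison, rampUsers(300) during DURATION starts only 300 users in total over the same 15 minutes.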

rampUser method is getting stuck in gatling 3.3

I am having issues using the rampUser() method in my Gatling script. The run gets stuck after the following entry, halfway through.
Version : 3.3
================================================================================
2019-12-18 09:51:44 45s elapsed
---- Requests ------------------------------------------------------------------
> Global (OK=2 KO=0 )
> graphql / request_0 (OK=1 KO=0 )
> rest / request_0 (OK=1 KO=0 )
---- xxxSimulation ---------------------------------------------------
[##################################### ] 50%
waiting: 1 / active: 0 / done: 1
================================================================================
I am seeing the following in the log, which is repeated forever while the log size keeps growing:
09:35:46.495 [GatlingSystem-akka.actor.default-dispatcher-2] DEBUG io.gatling.core.controller.inject.open.OpenWorkload - Injecting 0 users in scenario xxSimulation, continue=true
09:35:47.494 [GatlingSystem-akka.actor.default-dispatcher-6] DEBUG io.gatling.core.controller.inject.open.OpenWorkload - Injecting 0 users in scenario xxSimulation, continue=true
The above issue is happening only with rampUser and not happening with
atOnceUsers()
rampUsersPerSec()
rampConcurrentUsers()
constantConcurrentUsers()
constantUsersPerSec()
incrementUsersPerSec()
Is there a way to mimic rampUser() in some other way, or is there a solution for this?
My code is very minimal
setUp(
  scenarioBuilder.inject(
    rampUsers(2).during(1 minutes)
  )
).protocols(protocolBuilder)
I am stuck with this for some time and my earlier post with more information can be found here
Can any of the gatling experts help me on this?
Thanks for looking into it.

It seems your syntax for rampUsers is slightly off. Try removing the . before during.
I have this code in my own script and it works fine:
setUp(
  userScenario.inject(
    // atOnceUsers(4),
    rampUsers(24) during (1 seconds)
  )
).protocols(httpProtocol)
Also, the open model example in the Gatling documentation is written without a dot:
setUp(
  scn.inject(
    nothingFor(4 seconds), // 1
    atOnceUsers(10), // 2
    rampUsers(10) during (5 seconds), // HERE
    constantUsersPerSec(20) during (15 seconds), // 4
    constantUsersPerSec(20) during (15 seconds) randomized, // 5
    rampUsersPerSec(10) to 20 during (10 minutes), // 6
    rampUsersPerSec(10) to 20 during (10 minutes) randomized, // 7
    heavisideUsers(1000) during (20 seconds) // 8
  ).protocols(httpProtocol)
)
My guess is that the syntax can't be parsed, so 0 is substituted instead. (Here is an example of rounding. It is not applicable here, but for reference: gatling-user-injection-constantuserspersec)
Also, you mentioned that the other methods work. Could you paste the working code as well?

Azure LogicApp calculating price

I have these two LogicApps
LogicApp 1
Actions: 6
Standard Connections: 2
Runs: every 5 minutes, or 8640 executions per month (12 * 24 * 30)
LogicApp 2
Actions: 3
Standard Connections: 2
Runs: every 2 minutes, or 21600 executions per month (30 * 24 * 30)
The pricing, according to https://azure.microsoft.com/en-us/pricing/details/logic-apps/ is:
Actions: 0.000025 $
Standard connections: 0.000125 $
As I understand it, the pricing is per execution.
Is it correct to say that the monthly cost of the two Logic Apps is:
LogicApp 1: (8640 * 6 * 0.000025) + (8640 * 2 * 0.000125) = 3.46 $
LogicApp 2: (21600 * 3 * 0.000025) + (21600 * 2 * 0.000125) = 7.02 $
All actions and connections are executed every time.
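The calculation above can be reproduced with a short sketch. The prices are the ones quoted in the question, and the sketch assumes every action and connection is billed on every run, as stated. monthlyCost is a hypothetical helper name.

```javascript
// Prices as quoted in the question (USD per execution).
const ACTION_PRICE = 0.000025;
const STANDARD_CONNECTOR_PRICE = 0.000125;

// Monthly cost, assuming every action and connection runs on every execution.
function monthlyCost({ runsPerMonth, actions, standardConnections }) {
  return runsPerMonth *
    (actions * ACTION_PRICE + standardConnections * STANDARD_CONNECTOR_PRICE);
}

const app1 = monthlyCost({ runsPerMonth: 12 * 24 * 30, actions: 6, standardConnections: 2 });
const app2 = monthlyCost({ runsPerMonth: 30 * 24 * 30, actions: 3, standardConnections: 2 });

console.log(app1.toFixed(2), app2.toFixed(2)); // 3.46 7.02
```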

Your calculations seem okay. Don't forget that only successful and failed actions are billable.
You can set up an Azure Function to do the actual pulling against any data source and then have the Function make an HTTP call to a "When an HTTP request is received" trigger in Logic Apps. This reduces the number of times the trigger fires and should be cheaper, although the Azure Function costs something as well.
If this is a recurring job that only runs during business hours, you can set it up with the Recurrence trigger and a matching interval and frequency.

NodeJs scheduling jobs on multiple nodes

I have two Node.js servers running behind a load balancer. I have some scheduled jobs that I want to run only once, on either of the two instances, in a distributed manner.
Which module should I use? Will node-quartz (https://www.npmjs.com/package/node-quartz) be useful for this?

Adding redis and using node-redlock seemed like overkill for the little caching job I needed to schedule for once a day on a single server with three Node.js processes behind a load balancer.
I discovered http://kvz.io/blog/2012/12/31/lock-your-cronjobs/ - and that led me to the concept behind Tim Kay's solo.
The concept goes like this - instead of locking on an object (only works in a single process) or using a distributed lock (needed for multiple servers), "lock" by listening on a port. All the processes on the server share the same ports. And if the process fails, it will (of course) release the port.
Note that hard-failing (no catch anywhere surrounding) or releasing the lock in catch are both OK, but neglecting to release the lock when catching exceptions around the critical section will mean that the scheduled job never executes until the locking process gets recycled for some other reason.
I'll update when I've tried to implement this.
Edit
Here's my working example of locking on a port:
multiProc.js
var net = require('net');
var server = net.createServer();
server.on('error', function () { console.log('I am process number two!'); });
server.listen({ port: 3000 }, function () {
  console.log('I am process number one!');
  setTimeout(function () { server.close(); }, 3000);
});
If I run this twice within 3 seconds, here's the output from the first and second instances
first
I am process number one!
second
I am process number two!
If, on the other hand, more than 3 seconds pass between executing the two instances, both claim to be process number one.
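The point made earlier about always releasing the lock can be sketched by wrapping the port lock around a job. acquirePortLock and runExclusively are hypothetical names, and binding to 127.0.0.1 is an assumption made here to keep the lock local to the machine. Like the example above, this only coordinates processes on a single server.

```javascript
const net = require('net');

// Try to take the "lock" by binding the port.
// Resolves to the listening server, or to null if the port is already in use.
function acquirePortLock(port) {
  return new Promise((resolve, reject) => {
    const server = net.createServer();
    server.once('error', (err) => {
      if (err.code === 'EADDRINUSE') resolve(null);
      else reject(err);
    });
    server.listen({ port, host: '127.0.0.1' }, () => resolve(server));
  });
}

// Run `job` only if this process wins the lock; resolves to true if it ran.
async function runExclusively(port, job) {
  const lock = await acquirePortLock(port);
  if (lock === null) return false; // another process holds the lock
  try {
    await job();
  } finally {
    // Release the port even if the job throws.
    await new Promise((resolve) => lock.close(resolve));
  }
  return true;
}
```

A scheduled task would then call something like runExclusively(3000, doDailyCaching) in every process, and only the process that wins the port runs the job.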

I haven't done this before, but I can see myself doing it this way.
Use any scheduler library for Node.js.
To achieve your goal, I would use Redis for a distributed lock. Before running any scheduled job, a worker / node has to get the lock, do the job, and release / ack() when the job finishes (or on error).

A single server can be selected as leader by holding an election among the available instances using the Zoologist package
https://www.npmjs.com/package/zoologist
It requires a ZooKeeper server to conduct the election.

I don't know if this might help you, but still posting it here.
Usually node-schedule is used for time-based schedules where you have to execute arbitrary code only once. For example: a database read/write next month at 6:00 PM.

This post explains how to write scheduled jobs that perform an action at a particular time or on a particular day.
For this we are going to use the cron package for Node.
To add a job we need to:
1) Install cron
npm install cron
2) Require cron's CronJob in our project.
var CronJob = require('cron').CronJob
3) Create an instance of CronJob
var jobs = new CronJob({
  cronTime: ' * * * * * *',
  onTick: function () {
    // perform your action
  },
  start: false,
  timeZone: 'Asia/Kolkata'
});
Arguments
cronTime: it takes 6 fields, namely:
1) Second -> 0 - 59
2) Minute -> 0 - 59
3) Hour -> 0 - 23
4) Day of Month -> 1 - 31
5) Months -> 0 - 11
6) Day of Week -> 0 - 6
Note: We can also define cronTime with ranges and wildcards, such as * for always.
0-59/5 in the minute field means every 5 minutes.
onTick: the operation to perform.
start: takes a boolean; if true, the job starts right away.
timeZone: the job's time zone
4) Start the job
jobs.start()
For example:
var jobs = new CronJob({
  cronTime: ' 00 00 0-23 * * *',
  onTick: function () {
    printMyName();
  },
  start: false,
  timeZone: 'Asia/Kolkata'
});
jobs.start();
var printMyName = function () {
  var date = new Date();
  console.log("Hi Vipul it is ", date);
};
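To make the six cronTime fields easier to read, here is a small sketch that labels each field of a pattern. It does not use the cron package. labelCronFields is a hypothetical helper that only splits the string and does not validate the ranges.

```javascript
// Field order as described in the Arguments list above.
const FIELD_NAMES = ['second', 'minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek'];

// Split a six-field cronTime string and label each field by position.
function labelCronFields(cronTime) {
  const parts = cronTime.trim().split(/\s+/);
  if (parts.length !== FIELD_NAMES.length) {
    throw new Error('expected 6 fields, got ' + parts.length);
  }
  return Object.fromEntries(FIELD_NAMES.map((name, i) => [name, parts[i]]));
}

console.log(labelCronFields(' 00 00 0-23 * * *'));
// { second: '00', minute: '00', hour: '0-23', dayOfMonth: '*', month: '*', dayOfWeek: '*' }
```

Read this way, the example pattern fires at second 00 of minute 00 in every hour from 0 to 23, that is, once an hour on the hour.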
Hope it helps.
