Monitor progress of Virtuoso bulk data load - DBpedia

I am loading DBpedia (file by file) into Virtuoso using the standard ld_dir function in isql.
Question: Is there a way to monitor the progress of loading a single file?
I understand that select ll_file, ll_state from db.dba.load_list; shows 1 for files that are currently being loaded, and that status() displays the currently active statements and their running times.

The best thing I could find is to run status('c'); and watch the File Size value in the output: if the file size keeps increasing, something is being done. But that gives me no percentage of work completed and no ETA.
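For reference, the check I keep re-running from isql looks roughly like this (ll_state is 1 while a file is being loaded and 2 once it is done, and ll_error is filled in if a file fails):
select ll_file, ll_state, ll_started, ll_error from DB.DBA.load_list where ll_state <> 2;
It tells me which files are pending or in progress, but still nothing about how far along the current file is.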

Related

Rotate logfiles on an hourly basis by appending date and hour

I want to implement log rotation on Linux. I have a *.trc file that all the logs are written to, and I want a new log file to be created every hour.
From the analysis I have done so far, I know about the logrotate option, where the rotation details for a specific file are configured in the logrotate.conf file.
I would like to know whether there is an option that does not use logrotate: rotating the log files on an hourly basis by appending the date and hour to the log file name and creating new files based on the current hour.
I'm looking for suggestions on how to implement the log rotation using that second option. Any details would be really helpful.
If you have control over the process that creates the logs, you could just timestamp the file at the moment of creation, which removes the need to rename the log.
Before writing each line, check the time; if an hour has passed since the file was created, close the current file and open a new one with a new timestamp.
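A minimal sketch of that idea, assuming the writing process is something you control (a Node.js/TypeScript process here; the file name pattern is illustrative):
import * as fs from "fs";

let currentHour = "";
let stream: fs.WriteStream | null = null;

// Reopen the log whenever the wall-clock hour (UTC here) changes,
// so every file name carries the date and hour it was created in.
function log(line: string): void {
  const hour = new Date().toISOString().slice(0, 13); // e.g. "2024-05-01T13"
  if (hour !== currentHour) {
    if (stream) stream.end();
    stream = fs.createWriteStream(`app-${hour}.trc`, { flags: "a" });
    currentHour = hour;
  }
  stream!.write(line + "\n");
}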
If you do not have control over the process, you can pipe the output of your process (stdout and stderr) to multilog, a binary that is part of the daemontools package in most Linux distros.
https://cr.yp.to/daemontools/multilog.html
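For example, something along these lines (the log directory name is illustrative; t prefixes each line with a timestamp, and s and n bound the size and number of rotated files):
./my_process 2>&1 | multilog t s16777215 n24 ./log/my_process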

HTML5 Audio long buffering before playing

I'm currently making an Electron app that needs to play a 40 MB audio file from the file system. Maybe it's wrong to do this, but the only way I found to play a file from anywhere on the file system is to convert it to a data URL in the background script and then transfer it over IPC. After that I simply do
this.sound = new Audio(dataurl);
this.sound.preload = "metadata";
this.sound.play();
(part of a VueJS component, hence the this)
I did some profiling inside Electron and this is what came out:
Note that actually transferring the 40 MB audio file doesn't take that long (around 80 ms). What is extremely annoying is the "Second Task", which is probably buffering (I have no idea) and which lasts around 950 ms. This is way too long; ideally I would need it under 220 ms.
I've already tried every available value of the preload option, and while I'm using the native HTML5 Audio right now, I've also tried howler.js with similar results (it seemed a bit faster, though).
I would guess that loading the file directly might be faster, but even after disabling the security measures Electron puts in place to block file:/// URLs, XHR does not recognize it as a valid URI.
Is there a faster way to load the data URL? All the data is already there; it just needs to be converted to a buffer or something like that.
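Something along these lines is roughly what I mean by "converted to a buffer" (untested sketch; dataUrlToObjectUrl is just a name I made up): decode the base64 payload once, wrap it in a Blob, and hand the Audio element an object URL instead of the huge string.
function dataUrlToObjectUrl(dataUrl: string): string {
  // "data:audio/mpeg;base64,AAAA..." -> header + raw base64 payload
  const [header, base64] = dataUrl.split(",");
  const mime = header.match(/data:(.*?);base64/)?.[1] ?? "audio/mpeg";
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
  return URL.createObjectURL(new Blob([bytes], { type: mime }));
}
// this.sound = new Audio(dataUrlToObjectUrl(dataurl));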
Note: I cannot "pre-buffer" every file in advance since there are about 200 of them; it just wouldn't make sense in my opinion.
Update:
I found this post Electron - throws Not allowed to load local resource when using showOpenDialog
I don't know how I missed it. I followed step 1 and I can now load files inside Electron with the custom protocol. However, neither Audio nor howler.js is any faster; it's actually slower, at around 6 seconds from click to first sound. Does it need to buffer the whole file before playing?
Update 2:
It appears that the 6-second loading time only affects the first Audio instance that is created; I do not know why, though. After that, using two instances (one playing and one pre-buffering) works just fine, and even loading a file that isn't loaded yet is instantaneous. It seems weird that it's only the first one.

Why does a request in the View Results Tree listener show "No data to display" in non-GUI mode

The expected requests and responses are shown when the JMeter script is executed in GUI mode, but when the same script is executed in non-GUI mode, a few requests show "No data to display" and the response is empty.
This was encountered in JMeter 4. Can anybody explain why the requests vary like this when the script is executed in non-GUI mode?
This is done intentionally: JMeter removes request and especially response data from sample results (mainly because it cannot be stored in CSV format) in order to reduce the memory consumption and disk I/O required to aggregate and store the data.
If you really need to see request and response data, you can "tell" JMeter to store it by adding the following lines to the user.properties file (it lives in the "bin" folder of your JMeter installation):
jmeter.save.saveservice.output_format=xml
jmeter.save.saveservice.response_data=true
jmeter.save.saveservice.samplerData=true
jmeter.save.saveservice.requestHeaders=true
jmeter.save.saveservice.url=true
jmeter.save.saveservice.responseHeaders=true
A JMeter restart is required to pick the properties up. Once done, re-run the test and make sure to use a different output .jtl file (or delete the existing one); this time it will contain much richer results, suitable for displaying in the View Results Tree listener.
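For example, a non-GUI run that writes a fresh results file could look like this (the test plan and result file names are illustrative):
jmeter -n -t test_plan.jmx -l results_full.jtl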
References:
Configuring JMeter
Apache JMeter Properties Customization Guide
Results File Configuration

How can I prune executors' logs in spark streaming

I'm working on a Spark Streaming job that runs in standalone mode. By default the executors append their logs to the $SPARK_HOME/work/app_idxxxx/stderr and stdout files. The problem comes when the app runs for a long time, say a month or more, and generates a lot of logs inside the stderr file. I would like to roll the stderr over daily, keep a week's worth, and archive (delete) anything older. I changed log4j.properties to use org.apache.log4j.RollingFileAppender and directed the logs to a file instead of stderr, but the file doesn't respect the rolling and keeps growing.
Creating a cron job to do this doesn't work either, since Spark keeps a handle to that specific file open, so renaming it probably won't work.
I couldn't find any documentation for these specific logs. I'd really appreciate any help.
After digging more, I finally found how to resolve the issue, and I am posting it here so that the next person doesn't have to go through all this suffering and trial and error.
The settings for those logs are in two different places. The first is $SPARK_HOME/conf/spark-defaults.conf; add the three lines below on each executor node:
spark.executor.logs.rolling.time.interval daily
spark.executor.logs.rolling.strategy time
spark.executor.logs.rolling.maxRetainedFiles 7
The other file that you need to change on each executor node is $SPARK_HOME/conf/spark-env.sh; add the following lines:
SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS -Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1800
-Dspark.worker.cleanup.appDataTtl=864000
-Dspark.executor.logs.rolling.strategy=time
-Dspark.executor.logs.rolling.time.interval=daily
-Dspark.executor.logs.rolling.maxRetainedFiles=7 "
export SPARK_WORKER_OPTS
After these changes it started working properly. Hope this helps some people :)
If you are in standalone mode, just exporting an environment variable is enough:
export SPARK_WORKER_OPTS="-Dspark.executor.logs.rolling.strategy=time -Dspark.executor.logs.rolling.time.interval=daily -Dspark.executor.logs.rolling.maxRetainedFiles=7"
You can also refer to: http://apache-spark-user-list.1001560.n3.nabble.com/Executor-Log-Rotation-Is-Not-Working-td18024.html

Generate auto increment sequence in logstash

I am pushing logs to Elasticsearch from Logstash, and then I need to get the logs back in the order they were written. Sorting by timestamp does not help because there can be multiple log statements at the same time. I followed the solution in Include monotonically increasing value in logstash field? and it worked perfectly on my Windows system.
But when the code was moved to the Linux production environment, Logstash does not start up, failing with the error below:
reason=>"Couldn't find any filter plugin named 'seq'. Are you sure
this is correct? Trying to load the seq filter plugin resulted in this
error: no such file to load -- logstash/filters/seq", :level=>:error}
Check if the seq.rb file is in the filter folder.
Also check whether the line endings of your seq.rb are Unix-style. If you transferred the file from a Windows machine to a Linux one, the problem might come from there.
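For example, something along these lines can confirm both points (the path is the filters folder used by the linked solution; adjust it to your installation):
file logstash/filters/seq.rb      # "with CRLF line terminators" indicates Windows line endings
dos2unix logstash/filters/seq.rb  # convert the file to Unix line endings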
