I have a curl command whose output I want to load using BSON.
For performance reasons, I want to read the curl output directly into memory, without saving it to a file.
I also want to close curl as soon as possible: read the data from curl first, then pass it to BSON. We had problems while curl stayed open during parsing, because the download was faster than the parsing that followed.
I know the following works, but it keeps curl open for too long, which causes problems when we do this in parallel many times at once and the server we download from is a bit busy:
using BSON
cmd = `curl <some data>`
BSON.load(open(cmd))
To close cmd ASAP, I have this:
import BSON.load

# wrap the raw bytes in an in-memory IOBuffer and hand that to BSON.load
function BSON.load(bytes::Vector{UInt8})
    io = IOBuffer()
    write(io, bytes)
    seekstart(io)
    BSON.load(io)
end
cmd = `curl <some data>`
BSON.load(read(cmd))
This works, but I consider it very ugly, and I'm not sure whether it carries some performance penalty.
Is there a more elegant way to do this? Can I read(cmd) into some IO structure that could then be passed to BSON.load?
I realized the exact same problem holds for Serialization.deserialize. My solution for deserialization is the same, but I welcome any improvements.
It's a little unclear what your question means when you say that it "keeps curl open for too long", but here are two different ways to do this:
julia> using BSON
julia> url = "https://raw.githubusercontent.com/JuliaIO/BSON.jl/master/test/test.bson"
"https://raw.githubusercontent.com/JuliaIO/BSON.jl/master/test/test.bson"
julia> open(BSON.load, `curl -s $url`)
Dict{Symbol,Any} with 2 entries:
:a => Complex{Int64}[1+2im, 3+4im]
:b => "Hello, World!"
julia> BSON.load(IOBuffer(read(`curl -s $url`)))
Dict{Symbol,Any} with 2 entries:
:a => Complex{Int64}[1+2im, 3+4im]
:b => "Hello, World!"
The first version is similar to your first version but closes the curl process immediately when done downloading. The second version reads the result of the curl call into a byte vector, wraps it in an IOBuffer and then calls BSON.load on that.
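For comparison only (not part of the original answer): the same read-fully-then-parse pattern in Python, with pickle standing in for BSON and Serialization.deserialize, since it parses from a file-like object the same way. The URL is hypothetical:

import io
import pickle
import subprocess

# hypothetical URL serving a pickled object; stands in for the BSON file
url = "https://example.com/data.pkl"

# let curl run to completion so the process is reaped as soon as the
# download finishes, then parse entirely from an in-memory buffer
raw = subprocess.run(["curl", "-s", url], check=True, capture_output=True).stdout
obj = pickle.load(io.BytesIO(raw))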
I'm using the fake-words module (npm install fake-words) with the following simple code:
#!/usr/bin/env node
const fake = require("fake-words");
while (true) {
  console.log(fake.sentence());
}
When I run ./genwords.js, everything works as expected.
However, when I pipe the output into an external program (in an Ubuntu shell), word generation stops after a second.
$ ./genwords.js | cat
...
(output generation stops after a second)
$ ./genwords.js | tee
...
(stuck as well)
$ ./genwords.js | pv -l
...
4.64k 0:00:13 [0.00 /s]
The same happens when assigning the value to a variable first, to avoid any caching (a precaution after reading this post; probably not relevant to Node.js):
while (true) {
  words = fake.sentence();
  console.log(words);
}
What am I doing wrong?
I'm using Node v16 on Ubuntu:
$ node --version
v16.13.1
In Node.js, the behavior of console.log() in code that never relinquishes control to the event loop (such as the while loop in your example), especially when piped to another process, is a longstanding...uh...quirk. It's a lot harder to fix than you might think. Here's the relevant issue in the tracker: https://github.com/nodejs/node/issues/6379
Or more specifically on the whole handling-when-piped-to-another-process issue: https://github.com/nodejs/node/issues/1741
You can work around the issue by restructuring the code to relinquish control to the event loop. Here is one possibility.
#!/usr/bin/env node
const fake = require("fake-words");

function getWords() {
  console.log(fake.sentence());
  setImmediate(getWords);
}

getWords();
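As an aside of mine, not from the original answer: the reason this reads as a quirk is that, as I understand the linked issues, Node's stdout writes to a pipe are asynchronous on Linux, so a tight synchronous loop never gives the queued writes a chance to flush. In a language with blocking stdout writes, the same endless loop is throttled naturally by pipe backpressure. A minimal Python illustration of that contrast:

#!/usr/bin/env python3
# print() blocks once the pipe buffer fills, so a slow consumer simply
# slows this loop down instead of starving it
import itertools

for i in itertools.count():
    print(f"word {i}")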
I've looked at several other solutions, but none appear to work the way I need.
I have an embedded controller running Linux (Dreadnaught) and a router also running Linux. I want to read the routing table (just the WAN IP of the default route) of the router, from the controller. My controller has telnet and wget, but does not have ssh or curl. I'd like to do this in a single command with a single result, so I can send the one command from an internal program and parse/save one result.
If I telnet to the router from my PC, either of these two commands gives me the exact result I need:
route |grep default|cut -c 17-32
or
dbctl get route/default/current_gateway
route takes about 30 seconds (not sure why), even without grep and cut; dbctl is instant for all intents and purposes.
I've tried the eval method from "Telnet to login with username and password to mail Server", but that shows the entire telnet interaction; I want just the final string result.
I had a poke around at wget, but it looks to be for downloading files, not executing commands.
I'm hoping for:
somecommand server=1.2.3.4 user=myuser passwd=MyP#s$ command='dbctl get route/default/current_gateway'
which just returns:
8.7.6.5
Then my internal program (ISaGRAF, but shouldn't be relevant) can send one string to cmd and be returned 1 string, which I can use for my own nefarious purposes (well, I'm just going to log it actually).
If there's absolutely no other way, I can drop a sh script on to the requesting controller, but I'd rather not (extra steps to install, not as portable).
Solved as I was reviewing the question, but I'm looking for suggestions: is this the cleanest method? Could it be done better?
OK, I poked around at the eval method again. Yes, it shows me the full interaction, but it's easy to just get the bits I need, using head and tail:
eval "{ sleep 2; echo myuser; sleep 1; echo MyP#s$; sleep 1; echo 'dbctl get route/default/current_gateway'; sleep 2; }" |telnet 1.2.3.4 |head -n 5|tail -n 1
eval returns the full interaction:
Entering character mode
Escape character is '^]'.
login: myuser
Password:
admin#myrouter:~# dbctl get route/default/current_gateway
8.7.6.5
admin#myrouter:~#
So I just need head and tail to grab the one line I want using |head -n 5|tail -n 1
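If a Python interpreter happens to be available on the controller (the question only guarantees telnet and wget, so this is an assumption), the same exchange can be scripted without the sleep/head/tail juggling. A rough sketch using the standard-library telnetlib (deprecated in Python 3.11, removed in 3.13); the prompt strings are guesses and would need to match the router's actual output:

import telnetlib

tn = telnetlib.Telnet("1.2.3.4")
tn.read_until(b"login: ")
tn.write(b"myuser\n")
tn.read_until(b"Password: ")
tn.write(b"MyP#s$\n")
tn.read_until(b"# ")  # wait for the shell prompt
tn.write(b"dbctl get route/default/current_gateway\n")
reply = tn.read_until(b"# ").decode()  # echoed command, result, next prompt
print(reply.splitlines()[1])  # second line is the gateway, e.g. 8.7.6.5
tn.write(b"exit\n")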
I wanted to download files for around 300 items. An example is below:
curl 'http://genome.jgi.doe.gov/ext-api/downloads/get-directory?organism=Absrep1' -b cookies > Absrep1.xml
This opens the page, downloads the content, and stores it as an XML file on my end.
I tried to write a batch script in Perl with the system command, like:
system('curl 'http://genome.jgi.doe.gov/ext-api/downloads/get-directory?organism=Absrep1'
-b cookies > Absrep1.xml');
But it did not work. There was a syntax error, which I guess is due to the single quotes.
I tried with Python:
import subprocess
bash_com = 'curl "http://genome.jgi.doe.gov/ext-api/downloads/get-directory?organism=Absrep1" '
subprocess.Popen(bash_com)
output = subprocess.check_output(['bash','-c', bash_com])
It did not work; I get the error "File does not exist". Even if it works, how can I include the
-b cookies > Absrep1.xml
part in it?
Please help. Thanks in advance,
AP
In Perl, you should be able to use this:
system(q{curl 'http://genome.jgi.doe.gov/ext-api/downloads/get-directory?organism=Absrep1' -b cookies > Absrep1.xml});
However, you might be better off using LWP or possibly even HTTP::Tiny (unless you need the cookies) instead of shelling out. For more advanced uses, there is also WWW::Mechanize.
The syntax error is almost certainly down to the quotes in the system call:
system('curl 'http://genome.jgi.doe.gov/ext-api/downloads/get-directory?organism=Absrep1' -b cookies > Absrep1.xml');
The single quotes either need to be escaped, or alternative delimiters can be used, such as double quotes or the q/qq quoting operators, e.g.:
system(q{curl 'http://genome.jgi.doe.gov/ext-api/downloads/get-directory?organism=Absrep1' -b cookies > Absrep1.xml});
It's hard to tell from the context given, but wrapping the curl call in Perl or Python would likely be a less than optimal approach. Perl has LWP, Python has requests, and the bash shell is already well equipped to run simple batch jobs. It might be best to stick to a single interpreter unless there's a good reason not to.
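If the Python route is preferred anyway, here is a sketch using requests. It assumes the cookies file is in Netscape format, which is what curl's -b flag reads when given a file name; loop over the ~300 organism names as needed:

import http.cookiejar

import requests

# curl's "-b cookies" reads a cookie file named "cookies";
# MozillaCookieJar loads the Netscape format curl writes and reads
jar = http.cookiejar.MozillaCookieJar("cookies")
jar.load()

organism = "Absrep1"  # loop over your ~300 names here
url = f"http://genome.jgi.doe.gov/ext-api/downloads/get-directory?organism={organism}"

resp = requests.get(url, cookies=jar)
resp.raise_for_status()

# equivalent of "> Absrep1.xml"
with open(f"{organism}.xml", "wb") as f:
    f.write(resp.content)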
I have been experimenting with Haskell. I am trying to write a web crawler, and I need to use an external curl binary (due to some proxy settings, curl needs some special arguments which seem impossible or hard to set inside the Haskell code, so I'd rather just pass them as command-line options; but that is another story...).
In the code at the bottom, if I change the marked line to use curl instead of curl --help, the output renders properly and gives:
"curl: try 'curl --help' or 'curl --manual' for more information
"
Otherwise the string is empty, as the curl --help response is multiline.
I suspect that in Haskell the buffer is cleared with every newline. (The same goes for other simple shell commands, like ls versus ls -l, etc.)
How do I fix it?
The code:
import System.Process
import System.IO

main = do
    let sp = (proc "curl --help" []){std_out=CreatePipe} -- *** THIS LINE ***
    (_, Just out_h, _, _) <- createProcess sp
    out <- hGetContents out_h
    print out
proc takes as its first argument the name of the executable, not a shell command. That is, when you use proc "foo bar", you are not referring to a foo executable, but to an executable named exactly foo bar, with the space in its file name.
This is a useful feature in practice, because sometimes you do have spaces in there (e.g. on Windows you might have c:\Program Files\Foo\Foo.exe). Using a shell command you would have to escape spaces in your command string. Worse, a few other characters need to be escaped as well, and it's cumbersome to check what exactly those are. proc sidesteps the issue by not using the shell at all but passing the string as it is to the OS.
For the executable arguments, proc takes a separate argument list. E.g.
proc "c:\\Program Files\\Foo\\Foo.exe" ["hello world!%$"]
Note that the arguments need no escaping either.
If you want to pass arguments to curl, you have to pass them in the list:
sp = (proc "/usr/bin/curl" ["--help"]) {std_out=CreatePipe}
Then you will get the complete output in the string.
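As an aside of mine, not from the original answer: Python's subprocess module draws exactly the same distinction between an executable name and an argument list, so the pitfall and its fix carry over:

import subprocess

try:
    # a plain string is taken as the program name on POSIX, so this looks
    # for an executable literally named "curl --help"
    subprocess.Popen("curl --help")
except FileNotFoundError as e:
    print(e)

# correct: the executable and its arguments as a list
subprocess.Popen(["curl", "--help"]).wait()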
Have a relatively simple question here. I need to run a function in the background in bash. Normally I would do it just like so:
FUNCTION &
but things are a bit more complicated than that. I have the following line that runs the main function for each record in a text database. I can't really edit this code all that much without vastly changing the rest of the project, but I'm still open to new ideas.
cat databases/$WAN | grep -v \# | while read LINE; do MAIN; done
I want to spawn a new terminal in the background for each record, to get a sort of parallel processing and make things go much faster, since MAIN takes about a minute per record. This, however, does not work:
cat databases/$WAN | grep -v \# | while read LINE; do MAIN &; done
Any suggestions?
* UPDATE *
Thanks for all the responses. Let me see if I can answer some of those questions.
gniourf_gniourf - Yes, I know using cat like this is wrong. This was early, critical code, so I have not updated it yet; I now read into the while loop for most things I do, and I will fix it eventually. You may be right about the syntax. When I break it up like so, things seem to work now:
cat databases/$WAN | grep -v \# | while read LINE
do
MAIN > /dev/null 2>&1 &
done
So that fixes the background problem. I wonder what was messed up in my single-line syntax. Thanks
chepner - I don't believe LINE is a variable. I could be wrong, though; some things about Bash still confuse me. Maybe it is a variable that the entire record from the database gets stored in prior to processing.
Bruce K - Waiting is exactly what I was trying to avoid. If I let it run in the same terminal one at a time, it will slowly process each record in order. If I push each record to a separate terminal for processing, all records will be processed simultaneously (at least in our eyes). The additional overhead is intentional, in order to speed up the loop through the database.
Radix - Yes you're right. I'll read up on that. Thanks for the link.
This worked for me. (On your single-line syntax: '&' already terminates a command, so the ';' right after it in 'do MAIN &; done' is a syntax error; 'do MAIN & done' without the semicolon also works.)
$ function testt(){ echo "lineee is <$lineee>";}
$ grep 5432 /etc/services|while read lineee;do testt&done
lineee is <postgres 5432/udp # POSTGRES>
lineee is <postgres 5432/tcp # POSTGRES>
If, for some reason, your MAIN function is not seeing a LINE variable, you can try:
"export" the LINE variable beforehand:
$ export LINE
$ # do your thing
Or, pass the line read as an argument to the function:
$ function testt(){ LINE="$1"; echo "LINE is <$LINE>";}
$ grep 5432 /etc/services|while read LINE;do testt "$LINE"&done
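Purely as a comparison outside the scope of the question: if switching interpreters were ever an option, the same fan-out over records, with a cap on simultaneous jobs, is straightforward in Python (the file name and worker body below are hypothetical):

import subprocess
from concurrent.futures import ThreadPoolExecutor

def main_task(record):
    # stand-in for the MAIN function; replace with the real per-record work
    subprocess.run(["echo", record], check=True)

with open("databases/wan.txt") as db:  # hypothetical database file
    records = [line.strip() for line in db if not line.lstrip().startswith("#")]

with ThreadPoolExecutor(max_workers=8) as pool:  # cap simultaneous jobs
    pool.map(main_task, records)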