Filtering a Live JSON Stream on Linux

I have live raw JSON stream data from the Virtual Radar Server I'm using.
I use netcat to fetch the data and jq to save it on my Kali Linux box, using the following command:
nc 127.0.0.1 30006 | jq > A7.json
But I want to filter specific content from the data stream.
I use the following command to extract the data:
cat A7.json | jq '.acList[] | select(.Call | contains("QTR"))' - To fetch a selected airline
But I realized later that the above command only works once; in other words, it does not refresh. As the data updates every second, I have to execute the command over and over again to extract the filtered data, which generates duplicate data.
Can someone help me filter the live data without executing the command over and over?

As you don't use the --stream option, I suppose your document is a regular JSON document.
To execute your command every second, you can implement a loop that sleeps for 1 second:
while true; do sleep 1; nc 127.0.0.1 30006 | jq '.acList[] | select(…)'; done
To have the output on the screen and also save to a file (like you did with A7.json), you can add a call to tee:
# saves the document as returned by `nc` but outputs result of `jq`
while true; do sleep 1; nc 127.0.0.1 30006 | tee A7.json | jq '.acList[] | …'; done
# saves the result of `jq` and outputs it
while true; do sleep 1; nc 127.0.0.1 30006 | jq '.acList[] | …' | tee A7.json; done

Can you try this?
nc localhost 30006 | tee -a A7.json |
while true; do
    stdbuf -o 0 jq 'try (.acList[] | select(.Call | contains("QTR")))' 2>/dev/null
done

Assuming that no other process is competing for the port, I'd suggest trying:
nc -k -l localhost 30006 | jq --unbuffered ....
Or if you want to keep a copy of the output of the netcat command:
nc -k -l localhost 30006 | tee A7.json | jq --unbuffered ....
You might want to use tee -a A7.json instead.
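Putting those pieces together with the filter from the question, the full pipeline might look like this (a sketch; adjust the filter to taste):
# keep the listener open, append raw JSON to A7.json, and filter as it streams
nc -k -l localhost 30006 | tee -a A7.json | jq --unbuffered '.acList[] | select(.Call | contains("QTR"))'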

Break Down
Why I did what I did
I have live raw JSON stream data from the Virtual Radar Server, which is running on my laptop alongside Kali Linux on WSL in the background.
For those who don't know, Virtual Radar Server is a Mode-S transmission decoder that is used to decode different ADS-B formats. It also rebroadcasts the data in a variety of formats, one of them being a JSON stream. I want to save selected aircraft data in JSON format on Kali Linux.
I used the following commands to save the data before:
$ nc 127.0.0.1 30001 | jq > A7.json - To save the stream.
$ cat A7.json | jq '.acList[] | select(.Call | contains("QTR"))' - To fetch the selected airline
But I realized two things after using the above. One, I'm storing unwanted data, which is consuming my storage. Two, the second command just goes through the JSON file once and produces the data that was saved at that moment and that moment alone. This caused me problems, as I had to execute the command over and over again to extract the filtered data, which generated duplicates.
The command that worked for me
The following command worked flawlessly for my problem.
$ nc localhost 30001 | sudo jq --unbuffered '.acList[] | select (.Icao | contains("800CB8"))' > A7.json
The following also caused me some trouble, which I explain clearly down below.
Errors & Explanations
This error resulted from a missing field name & key in the JSON object.
$ nc localhost 30001 | sudo jq --unbuffered '.acList[] | select (.Call | contains("IAD"))' > A7.json
#OUTPUT
jq: error (at <stdin>:0): null (null) and string ("IAD") cannot have their containment checked
If you look at the JSON data below, you'll see the missing field name & key that caused the error message above.
{
  "Icao": "800CB8",
  "Alt": 3950,
  "GAlt": 3794,
  "InHg": 29.7637787,
  "AltT": 0,
  "Call": "IAD766",
  "Lat": 17.608658,
  "Long": 83.239166,
  "Mlat": false,
  "Tisb": false,
  "Spd": 209,
  "Trak": 88.9,
  "TrkH": false,
  "Sqk": "",
  "Vsi": -1280,
  "VsiT": 0,
  "SpdTyp": 0,
  "CallSus": false,
  "Trt": 2
}
{
  "Icao": "800CB8",
  "Alt": 3950,
  "GAlt": 3794,
  "AltT": 0,
  "Lat": 17.608658,
  "Long": 83.239166,
  "Mlat": false,
  "Spd": 209,
  "Trak": 88.9,
  "Vsi": -1280
}
{
  "Icao": "800CB8",
  "Alt": 3800,
  "GAlt": 3644,
  "AltT": 0,
  "Lat": 17.608795,
  "Long": 83.246155,
  "Mlat": false,
  "Spd": 209,
  "Trak": 89.2,
  "Vsi": -1216
}
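A way to guard against objects that lack the Call field is to coalesce null to an empty string with jq's // operator before the containment check. A minimal sketch of that idea applied to the command above:
# treat a missing/null Call as "" so contains() never sees a null
$ nc localhost 30001 | jq --unbuffered '.acList[] | select((.Call // "") | contains("IAD"))' > A7.json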
Commands that didn't work for me
Command #1
When I used jq with --stream together with the filter, it produced the output below. (--stream with a plain output filter worked without any errors.)
$ nc localhost 30001 | sudo jq --stream '.acList[] | select (.Icao | contains("800"))' > A7.json
#OUTPUT
jq: error (at <stdin>:0): Cannot index array with string "acList"
jq: error (at <stdin>:0): Cannot index array with string "acList"
jq: error (at <stdin>:0): Cannot index array with string "acList"
jq: error (at <stdin>:0): Cannot index array with string "acList"
jq: error (at <stdin>:0): Cannot index array with string "acList"
jq: error (at <stdin>:0): Cannot index array with string "acList"
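The errors make sense in hindsight: with --stream, jq emits [path, value] event arrays instead of whole objects, so there is no object for .acList to index. If the streaming parser is really needed, the events would first have to be reassembled with fromstream, along these lines (a sketch I did not end up needing):
$ nc localhost 30001 | jq -n --stream 'fromstream(inputs) | .acList[] | select(.Icao | contains("800"))'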
Command #2
For some reason -k -l didn't work to listen for the data, but the other command worked perfectly. I think that's because -l makes nc listen as a server, while the port is already held by Virtual Radar Server outside of WSL, so nc has to connect to it as a plain client instead.
$ nc -k -l localhost 30001
$ nc localhost 30001
Thank you to everyone who helped me solve my issue. I'm very grateful to you guys.

Related

Tee pipe into 3 different processes and grepping the second match

I am trying to create a bash script which shows me the latest stats about corona infection numbers in Germany and Switzerland, and also in the whole world.
corona () {
    curl -s https://corona-stats.online\?minimal\=true | tee >(head -n 1) > >(grep "(CH)\|(DE)")
    curl -s https://corona-stats.online\?minimal\=true | tail -n 20 | grep World
}
As you can see, to do this I had to create this very ugly script where curl is called twice. I had to do this because the website looks like this:
Rank World Total Cases New Cases ▲ Total Deaths New Deaths ▲ Recovered Active Critical Cases / 1M pop
1 USA (US) 7,497,256 2,585 ▲ 212,694 34 ▲ 4,737,369 2,547,193 14,190 22,617
2 India (IN) 6,397,896 5,936 ▲ 99,833 29 ▲ 5,352,078 945,985 8,944 4,625
3 Brazil (BR) 4,849,229 144,767 4,212,772 491,690 8,318 22,773
4 Russia (RU) 1,194,643 9,412 ▲ 21,077 186 ▲ 970,296 203,270 2,300 8,185
...
22 Germany (DE) 295,943 413 ▲ 9,586 259,500 26,857 362 3,529
...
58 Switzerland (CH) 54,384 552 ▲ 2,075 1 ▲ 45,300 7,009 32 6,272
...
World 34,534,040 63,822 ▲ 1,028,540 1,395 ▲ 25,482,492 8,023,008 66,092 4,430.85
Code: https://github.com/sagarkarira/coronavirus-tracker-cli
Twitter: https://twitter.com/ekrysis
Last Updated on: 02-Oct-2020 12:10 UTC
US STATES API: https://corona-stats.online/states/us
HELP: https://corona-stats.online/help
SPONSORED BY: ZEIT NOW
Checkout fun new side project I am working on: https://messagink.com/story/5eefb79b77193090dd29d3ce/global-response-to-coronavirus
I only want to display the first line, the last line of the table (World), and the two lines about Germany and Switzerland. I managed to display the first line as well as the two countries by piping the output of curl into head -n 1 and grepping the country codes. I was able to do both things thanks to this answer.
Now I want to get the last line in the table, the one where the current cases of the whole world are displayed. I tried to use tee again to pipe it into a third process: tee >(head -n 1) > >(grep "(CH)\|(DE)") > >(tail -n 20 | grep World). But that didn't work. My first question is: how can I pipe output into 3 different processes using tee?
The second question revolves around the way I try to grep the World line. I tail the last 20 lines and then grep for "World". I do this because if I simply grep "World", it only returns the title line, where "World" can also be found. So my second question is: how can I grep only the last or second occurrence?
You can chain several tee commands and throw away only the last output of tee:
curl -s ... | tee >( cmd1 ) | tee >( cmd2 ) | tee > >( cmd3 )
Actually, we can shorten it to:
curl -s ... | tee >( cmd1 ) | tee >( cmd2 ) | cmd3
because we do not use the output of the last tee anyway.
Having multiple commands write to the terminal at the same time might get the output mixed up. A much more elegant solution is to use only one grep, e.g.
curl -s ... | grep '(DE)\|(CH)\|World.*,'
The expression World.*, will just look for a comma in the same line after World, in order to exclude the head line.
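If you specifically want only the last matching line instead (the second occurrence), awk can remember the most recent match and print it once the input ends; a small sketch:
# overwrite `last` on every World line; print whatever matched last
curl -s https://corona-stats.online\?minimal\=true | awk '/World/ { last = $0 } END { print last }'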
I think a variable should suit better what you need (at least in this case), something like:
corona() {
    data="$(curl -s https://corona-stats.online\?minimal\=true)"
    echo "$data" | head -n 1
    echo "$data" | grep "(CH)\|(DE)"
    echo "$data" | tail -n 20 | grep World
}
It would convey easier what you're trying to do and would also be easier to expand if you'd need to change anything.
You can try this:
curl -s https://corona-stats.online\?minimal\=true | grep -E "(Rank|^1[^0-9]|\(CH\)|\(DE\))"
This uses grep to display only the lines that contain "Rank", start with "1" followed by a non-digit, or contain "(CH)" or "(DE)".

Bash: Loop Read N lines at time from CSV

I have a CSV file of 100,000 IDs:
wef7efwe1fwe8
wef7efwe1fwe3
ewefwefwfwgrwergrgr
that are being transformed into a json object using jq
output=$(jq -Rsn '
{"id":
[inputs
| . / "\n"
| (.[] | select(length > 0) | . / ";") as $input
| $input[0]]}
' <$FILE)
output
{
"id": [
"wef7efwe1fwe8",
"wef7efwe1fwe3",
....
]
}
Currently, I have to manually split the file into smaller 10,000-line files, because the API call has a limit.
I would like a way to automatically loop through the large file, using only 10,000 lines at a time as $FILE, up until the end of the list.
I would use the split command and write a little shell script around it:
#!/bin/bash
input_file=ids.txt
temp_dir=splits
api_limit=10000
# Make sure that there are no leftovers from previous runs
rm -rf "${temp_dir}"
# Create temporary folder for splitting the file
mkdir "${temp_dir}"
# Split the input file based on the api limit
split --lines "${api_limit}" "${input_file}" "${temp_dir}/"
# Iterate through splits and make an api call per split
for split in "${temp_dir}"/* ; do
    jq -Rsn '
    {"id":
      [inputs
       | . / "\n"
       | (.[] | select(length > 0) | . / ";") as $input
       | $input[0]]
    }' "${split}" > api_payload.json
    # now do something ...
    # curl -dapi_payload.json http://...
    rm -f api_payload.json
done
# Clean up
rm -rf "${temp_dir}"
Here's a simple and efficient solution that at its core just uses jq. It takes advantage of the -c command-line option. I've used xargs printf ... for illustration - mainly to show how easy it is to set up a shell pipeline.
< data.txt jq -Rnc '
  def batch($n; stream):
    def b: [limit($n; stream)]
      | select(length > 0)
      | (., b);
    b;
  {id: batch(10000; inputs | select(length>0) | (. / ";")[0])}
' | xargs printf "%s\n"
Parameterizing batch size
It might make sense to set things up so that the batch size is specified outside the jq program. This could be done in numerous ways, e.g. by invoking jq along the lines of:
jq --argjson n 10000 ....
and of course using $n instead of 10000 in the jq program.
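For instance, the invocation might then look like this (the same program as above, with $n replacing the literal batch size):
< data.txt jq --argjson n 10000 -Rnc '
  def batch($n; stream):
    def b: [limit($n; stream)] | select(length > 0) | (., b);
    b;
  {id: batch($n; inputs | select(length>0) | (. / ";")[0])}
' | xargs printf "%s\n"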
Why “def b:”?
For efficiency. jq’s TCO (tail recursion optimization) only works for arity-0 filters.
Note on -s
In the Q as originally posted, the command-line options -sn are used in conjunction with inputs. Using -s with inputs defeats the whole purpose of inputs, which is to make it possible to process input in a stream-oriented way (i.e. one line of input or one JSON entity at a time).
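A toy comparison makes the difference visible:
# -n + inputs: entities are read and emitted one at a time, as they arrive
printf 'a\nb\n' | jq -Rn 'inputs'
# -s: everything is slurped into a single string before the filter runs
printf 'a\nb\n' | jq -Rs '.'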

How to make version-sort command work in a sh file?

I'm trying to use the "sort -V" command (aka version sort) in an sh file.
Specifically, I have the following line of code in an sh file:
SOME_PATH="$(ls dir_1/dir_2/v*/filename.txt | sort -V | tail -n1)"
What I'm trying to accomplish through the above command is that given a list of file paths with different version numbers, I want to get the file path with the greatest version number.
For example, let's assume that I have the following list of file paths:
dir_1/dir_2/v1/filename.txt,
dir_1/dir_2/v2/filename.txt,
dir_1/dir_2/v11/filename.txt
Then, I want the command to return dir_1/dir_2/v11/filename.txt instead of dir_1/dir_2/v2/filename.txt since the former has the greatest version value, "11".
From my understanding, the above Linux command accomplishes precisely this.
I confirmed it works in a Linux bash terminal.
However, when I run an sh file with the above command in it, I get an
"ERROR: Unknown command line flag 'V'" error message.
Is there a way to make version-sort work in a sh file?
If not, is there a way to implement it without using the -V flag?
Thank you.
Using the shell's printf and GNU awk (the three-argument match() is a gawk extension; m[1]+0 forces a numeric version comparison, so v11 outranks v2):
SOME_PATH=$(printf %s\\0 dir_1/dir_2/v*/filename.txt |
    awk 'BEGIN{FS="/";RS="\0";v=0}{match($3,/v([[:digit:]]+)/,m);if(m[1]+0>v){v=m[1]+0;l=$0}}END{print l}')
Using awk only (again GNU awk, with the same numeric coercion):
SOME_PATH=$(awk 'BEGIN{delete ARGV[0];v=0;for(i in ARGV){split(ARGV[i],s,"/");match(s[3],/v([[:digit:]]+)/,m);if(m[1]+0>v){v=m[1]+0;l=ARGV[i]}}}END{print l}' dir_1/dir_2/v*/filename.txt)
Formatted awk script:
#!/usr/bin/env -S awk -f
BEGIN {
    delete ARGV[0]
    v = 0
    for (i in ARGV) {
        split(ARGV[i], s, "/")
        match(s[3], /v([[:digit:]]+)/, m)
        # m[1] + 0 forces a numeric comparison, so v11 outranks v2
        if (m[1] + 0 > v) {
            v = m[1] + 0
            l = ARGV[i]
        }
    }
}
END {
    print l
}
Using a null delimited list stream, and not parsing the output of ls [1]:
SOME_PATH=$(
printf '%s\0' dir_1/dir_2/v*/filename.txt |
sort -z -t'/' -k3V |
tail -zn1 |
tr -d '\0'
)
How it works:
printf '%s\0' dir_1/dir_2/v*/filename.txt: Expands the paths into a null delimited stream output.
sort -z -t'/' -k3V: Sorts the null delimited input stream on -k3V version number from the 3rd column, -t'/' using / as a delimiter.
tail -zn1: Outputs the last null delimited entry from the input stream.
tr -d '\0': Trims out the remaining NUL to prevent the shell from complaining with warning: command substitution: ignored null byte in input.
[1] StackExchange: Why not parse ls (and what to do instead)?
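If sort -V itself is what's unavailable (for instance, a non-GNU sort appearing first in PATH), a plain numeric sort key can stand in for version sort whenever the version is a single number. A sketch assuming the dir_1/dir_2/vNN layout from the question:
SOME_PATH=$(
    printf '%s\n' dir_1/dir_2/v*/filename.txt |
    sort -t'/' -k3.2n |
    tail -n1
)
Here -k3.2n starts the key at the 2nd character of the 3rd /-separated field, skipping the leading "v" and comparing the remainder numerically.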

Hex dump to binary data conversion

I need to convert a hex dump file to binary data using the xxd command (or any other suitable method that works). The raw hex dump was produced not with xxd.
Tried two variants with different options:
xxd -r input.log binout.bin
xxd -r -p input.log binout.bin
Both methods produce wrong results: the first command creates a binary file of 2.2 GB, the second a binary file of 82382 bytes. Both sizes are wrong; the expected binary size is 65536 bytes.
Part of the hex file:
807e0000: 4562 537f e0b1 6477 84bb 6bae 1cfe 81a0 | EbS...dw..k.....
807e0010: 94f9 082b 5870 4868 198f 45fd 8794 de6c | ...+XpHh..E....l
807e0020: b752 7bf8 23ab 73d3 e272 4b02 57e3 1f8f | .R{.#.s..rK.W...
807e0030: 2a66 55ab 07b2 eb28 032f b5c2 9a86 c57b | *fU....(./.....{
807e0040: a5d3 3708 f230 2887 b223 bfa5 ba02 036a | ..7..0(..#.....j
807e0050: 5ced 1682 2b8a cf1c 92a7 79b4 f0f3 07f2 | \...+.....y.....
807e0060: a14e 69e2 cd65 daf4 d506 05be 1fd1 3462 | .Ni..e........4b
What can be the issue here, and how can I convert the data correctly?
Before reverting the dump, you need to remove the first part (the address column) and the last part (the ASCII column) of each line:
$ sed -i 's/^\(.\)\{9\}//g' binary.txt
$ sed -i 's/ | \(.\)\{16\}$//g' binary.txt
(The second expression also strips the " | " separator; removing only the last 16 characters would leave it behind.)
binary.txt is the name of your hex dump file.
After that you can convert it to binary again.
$ for i in $(cat binary.txt) ; do printf "\x$i" ; done > mybinary
After this, if you have the original .bin file, you can compare the md5sums of the two files. If they have the same value, the transformation completed successfully.
$ md5sum originbinary
$ md5sum mybinary
You can find more details in the first part of this link: https://acassis.wordpress.com/2012/10/21/how-to-transfer-files-to-a-linux-embedded-system-over-serial/
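As for why plain xxd -r produced a 2.2 GB file: xxd -r honors the address column, and this dump starts at 0x807e0000 (roughly 2.15 GB), so xxd seeks that far into the output before writing anything. If your xxd supports a negative seek in reverse mode, rebasing the addresses to zero may be all that's needed (a sketch; check xxd(1) on your system):
# subtract the dump's base address from every offset so output starts at byte 0
xxd -r -s -0x807e0000 input.log binout.bin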

Trim table of text and store values as variables

I am trying to write a script that can be run on my FreeNas (FreeBSD) box, that connects to an ESXi host via SSH and gracefully shuts down VMs. What I need to run for a list of VM IDs is:
vim-cmd vmsvc/power.shutdown VMID
I'm after some assistance in filtering the output of the commands used to retrieve the IDs, and then passing them to the shutdown command.
The command to retrieve all VMs is:
vim-cmd vmsvc/getallvms
It outputs data like this:
Vmid Name File Guest OS Version Annotation
12 Eds-LINUX [Eds-FS-Datastore-1] Eds-LINUX/Eds-LINUX.vmx ubuntu64Guest vmx-13
13 Eds-RT [Eds-FS-Datastore-1] Eds-RT/Eds-RT.vmx freebsd64Guest vmx-13
14 Eds-DC [Eds-FS-Datastore-1] Eds-DC/Eds-DC.vmx windows9Server64Guest vmx-13
15 Eds-STEAM [Eds-FS-Datastore-1] Eds-STEAM/Eds-STEAM.vmx windows9_64Guest vmx-13
16 Eds-DL [Eds-FS-Datastore-1] Eds-DL/Eds-DL.vmx windows9Server64Guest vmx-13
17 Eds-RD [Eds-FS-Datastore-1] Eds-RD/Eds-RD.vmx windows9Server64Guest vmx-13
18 Eds-PLEX [Eds-FS-Datastore-1] Eds-PLEX/Eds-PLEX.vmx windows9Server64Guest vmx-13
19 Eds-MC [Eds-FS-Datastore-1] Eds-MC/Eds-MC.vmx windows9Server64Guest vmx-13
2 Eds-FS [Eds-ESXi-Datastore-1] Eds-FS/Eds-FS.vmx freebsd64Guest vmx-13
I have determined I can use a pipe into sed, to delete the first line, using:
vim-cmd vmsvc/getallvms | sed '1d'
I am then able to retrieve the ID of the VM I want to filter out, by using:
vim-cmd vmsvc/getallvms | awk '/Eds-FS.vmx/{print$1}'
This gives me the ID of 2. I am unclear, however, on how to store this in a variable for later use.
I need to know of a way to select just the first column from this data, and for each ID in the list, put it in an array. I then need to loop through the array and for each ID, run the below to get the power state of the VM:
vim-cmd vmsvc/power.getstate VMID
This outputs data like the following, with a status of either powered on or powered off:
Retrieved runtime info
Powered on
For each one that is found to be powered on, I need to store the VM ID in a second array to later pass to the shutdown command, except where the ID is equal to that of the FreeNAS VM itself.
Thanks to anubhava, who gave me enough assistance to get something working (although probably not following standards or best practices).
I have this script saved on my ESXi host, which I connect to with SSH and trigger a run of:
freenasid=`vim-cmd vmsvc/getallvms | sed '1d' | awk '/Eds-FS.vmx/{print$1}'`
vmids=`vim-cmd vmsvc/getallvms | sed '1d' | awk '{print$1}'`

for vmid in $vmids
do
    if [ $vmid != $freenasid ]
    then
        powerstate=`vim-cmd vmsvc/power.getstate $vmid | sed '1d'`
        if [ "$powerstate" = "Powered on" ]
        then
            onvmids="$onvmids $vmid"
        fi
    fi
done

for vmid in $onvmids
do
    vim-cmd vmsvc/power.shutdown $vmid
done

exit 0
This correctly shuts down all running VMs.
To list IDs from first column use awk like this:
vim-cmd vmsvc/getallvms | awk 'NR>1{print $1}'
To store IDs in a shell array use:
readarray -t arr < <(vim-cmd vmsvc/getallvms | awk 'NR>1{print $1}')
To loop through array and run another command:
for id in "${arr[@]}"; do
    vim-cmd vmsvc/power.getstate "$id"
done
To store one particular id use command substitution:
vmid1=$(vim-cmd vmsvc/getallvms | awk '/Eds-FS\.vmx/{print$1}')
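Putting those pieces together, a sketch of the whole flow (assuming bash on the host, since readarray is a bashism):
#!/bin/bash
# collect every VM ID except the FreeNAS VM, then shut down the powered-on ones
freenasid=$(vim-cmd vmsvc/getallvms | awk '/Eds-FS\.vmx/{print $1}')
readarray -t ids < <(vim-cmd vmsvc/getallvms | awk 'NR>1{print $1}')
for id in "${ids[@]}"; do
    [ "$id" = "$freenasid" ] && continue
    # power.getstate prints "Powered on" for running VMs
    if vim-cmd vmsvc/power.getstate "$id" | grep -q "Powered on"; then
        vim-cmd vmsvc/power.shutdown "$id"
    fi
done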
