I'm trying to reproduce a simple spam filter using naive Bayes and Node.js on Windows. I have a resource with these Unix commands and don't know how to run them in PowerShell:
sed -nr 's#^0 (.*)#training/\1#p' CSDMC2010_SPAM/CSDMC2010_SPAM/SPAMTrain.label | xargs node spamfilter.js -s
sed -nr 's#^1 (.*)#training/\1#p' CSDMC2010_SPAM/CSDMC2010_SPAM/SPAMTrain.label | xargs node spamfilter.js -h
SPAMTrain.label is a file that contains the training files' names and labels (0 for spam and 1 for non-spam). After some searching, I know that the sed command is used to replace text and that xargs is used to pass the resulting names as arguments to a command.
So I think these commands will figure out which TRAINING files are spam and which are non-spam.
-s and -h are arguments I pass when running spamfilter.js.
Is there any way I can run these two commands in PowerShell, or rewrite them to fit the PowerShell command line?
You can use Select-String in PowerShell in place of sed on Linux, and Invoke-Expression in place of xargs.
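As a rough, untested sketch (assuming the directory layout from your commands, and that spamfilter.js accepts all training file paths in a single invocation, as the xargs version does), the two pipelines might translate like this:
$label = 'CSDMC2010_SPAM/CSDMC2010_SPAM/SPAMTrain.label'
# Spam: lines starting with "0 " -> prefix with "training/" and pass every path to -s
$spam = Select-String -Path $label -Pattern '^0 (.*)' |
    ForEach-Object { 'training/' + $_.Matches[0].Groups[1].Value }
node spamfilter.js -s $spam
# Ham: lines starting with "1 " -> prefix with "training/" and pass every path to -h
$ham = Select-String -Path $label -Pattern '^1 (.*)' |
    ForEach-Object { 'training/' + $_.Matches[0].Groups[1].Value }
node spamfilter.js -h $ham
This passes the matched paths directly as arguments (PowerShell expands an array into separate arguments for a native command), so Invoke-Expression isn't strictly required; with a very large training set you could hit the Windows command-line length limit, which xargs would normally work around by splitting the list over several invocations.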
I'm running this command in a Linux shell, but it does not give me the output that I expected.
find /targetpath/ -type d | grep -i "script 1\|script 2\|script 3" | grep -v "del"
How can I avoid the output lines with the containing folders that appear before the lines with the scripts matched by the grep command?
What I'm looking for:
Remove the base folders' names from the output of the command.
Regards
I'm developing a script to search for patterns within scripts executed from CRON on a bunch of remote servers through SSH.
Script on client machine -- SSH --> Remote Servers CRON/Scripts
For now I can't get the correct output.
Script on client machine
#!/bin/bash
server_list=( '172.x.x.x' '172.x.x.y' '172.x.x.z' )
for s in ${server_list[@]}; do
ssh -i /home/user/.ssh/my_key.rsa user@${s} crontab -l | grep -v '^#\|^[[:space:]]*$' | cut -d ' ' -f 6- | awk '{print $1}' | grep -v '^$\|^echo\|^find\|^PATH\|^/usr/bin\|^/bin/' | xargs -0 grep -in 'server.tld\|10.x.x.x'
done
This only gives me the paths of the scripts from crontab, not the matched lines and line numbers. In addition, the first line is prefixed with the "grep:" keyword (example below):
grep: /opt/directory/script1.sh
/opt/directory/script2.sh
/opt/directory/script3.sh
/opt/directory/script4.sh
How do I get the proper output, meaning the script path plus the line number plus the line matching the pattern?
Remote CRON examples
00 6 * * * /opt/directory/script1.sh foo
30 6 * * * /opt/directory/script2.sh bar
Remote script content examples
1) This will match the grep pattern
#!/bin/bash
ping -c 4 server.tld && echo "server.tld ($1)"
2) This won't match the grep pattern
#!/bin/bash
ping -c 4 8.x.x.x && echo "8.x.x.x ($1)"
Without example input, it's really hard to see what your script is attempting to do. But the cron parsing could almost certainly be simplified tremendously by refactoring all of it into a single Awk script. Here is a quick stab, with obviously no way to test.
#!/bin/sh
# No longer using an array for no good reason, so /bin/sh will work
for s in 172.x.x.x 172.x.x.y 172.x.x.z; do
ssh -i /home/user/.ssh/my_key.rsa "user@${s}" crontab -l |
awk '! /^#|^[[:space:]]*$/ && $6 !~ /^$|^(echo|find|PATH|\/usr\/bin|\/bin\/)/ { print $6 }' |
# no -0; use grep -E and properly quote literal dot
xargs grep -Ein 'server\.tld|10.x.x.x'
done
Your command would not output null-delimited data to xargs, so the immediate problem was probably that xargs -0 received all the file names as a single file name, which obviously does not exist; you also left out the ": file not found" part from the end of the error message.
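You can see that effect in isolation with a toy example (made-up names, not your data): with -0, xargs splits its input only at NUL bytes, so three newline-separated names arrive as one single argument:
printf '%s\n' a b c | xargs -0 printf '[%s]\n'
[a
b
c
]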
The use of grep -E is a minor hack to enable a more modern regex syntax which is more similar to that in Awk, where you don't have to backslash the "or" pipe etc.
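For example, with GNU grep the same alternation looks like this in the two syntaxes (the file name is just a placeholder):
grep 'server\.tld\|10\.x\.x\.x' file      # basic regex: the "or" bar must be escaped
grep -E 'server\.tld|10\.x\.x\.x' file    # extended regex, as in awk: a plain | works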
This script, like your original, runs grep on the local system where you run the SSH script. If you want to run the commands on the remote server, you will need to refactor to put the entire pipeline in single quotes or a here document:
for s in 172.x.x.x 172.x.x.y 172.x.x.z; do
ssh -i /home/user/.ssh/my_key.rsa "user@${s}" <<\________HERE
crontab -l |
awk '! /^#|^[[:space:]]*$/ && $6 !~ /^$|^(echo|find|PATH|\/usr\/bin|\/bin\/)/ { print $6 }' |
xargs grep -Ein 'server\.tld|10.x.x.x'
________HERE
done
The refactored script contains enough complexity in the quoting that you probably don't want to pass it as an argument to ssh, which would require you to figure out how to quote strings both locally and remotely. It's easier, then, to pass it as standard input, which simply gets transmitted verbatim.
If you get "Pseudo-terminal will not be allocated because stdin is not a terminal.", try using ssh -t. Sometimes you need to add multiple -t options to completely get rid of this message.
I'm experiencing an issue calling xargs inside a bash script to parallelize the launch of a function.
I have this line:
grep -Ev '^#|^$' "$listOfTables" | xargs -d '\n' -l1 -I args -P"$parallels" bash -c "doSqoop 'args'"
that launches the function doSqoop that I previously exported.
I am passing to xargs and then to bash -c a single, very long line, containing fields that I split and handle inside the function.
It is something like schema|tab|dest|desttab|query|splits|.... that I read from a file via the grep command above. I am fine with this solution; I know xargs can split the line on |, but I'm OK this way.
It used to work well, until I had to add another field at the end, which contains this kind of value:
field1='varchar(12)',field2='varchar(4)',field3='timestamp',....
Now I have this error:
bash: -c: line 0: syntax error near unexpected token '('
I tried to escape the parentheses and the single quotes, without success.
It appears to me that bash -c is interpreting the arguments.
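For example, this minimal test (made-up data, with echo standing in for doSqoop) reproduces the same error, which suggests the substituted line really is re-parsed by bash -c: the single quotes embedded in the new field close the quoting around args and leave the parentheses exposed.
printf '%s\n' "a|b|field1='varchar(12)'" | xargs -d '\n' -I args bash -c "echo 'args'"
# fails with: syntax error near unexpected token `('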
Use GNU parallel, which can call exported functions, has an easier syntax, and offers many more capabilities.
Your sample command could be replaced with
grep -Ev '^#|^$' file | parallel doSqoop
Test with the script below:
#!/bin/bash
doSqoop() {
printf "%s\n" "$@"
}
export -f doSqoop
grep -Ev '^#|^$' file | parallel doSqoop
You can also set the number of processes with the -P option, otherwise it matches the number of cores in your system:
grep -Ev '^#|^$' file | parallel -P "$num" doSqoop
The following command does not succeed.
for i in {1..5} ; do cat /etc/fstab | egrep "(ext3|ext4|xfs)" | awk '{print $2}' | cut -d"/" -f1-$i ; done
It seems that $i is ignored completely. It always returns the result of
cut -d"/" -f1-
Any idea why it fails?
Thanks in advance!
The command itself is part of a script that should help me automatically re-arrange fstab lines to match the right mount order (e.g. /test/subfolder must come after /test is mounted, not before).
I tried it and it didn't work in zsh. BUT I tried it in bash and it does work, so if you are using zsh, just run the command with bash and it should work ;)
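One way to do that from your zsh session (a sketch; only the awk quoting is switched to double quotes so the whole loop can sit inside the single quotes passed to bash -c):
bash -c 'for i in {1..5} ; do cat /etc/fstab | egrep "(ext3|ext4|xfs)" | awk "{print \$2}" | cut -d"/" -f1-$i ; done'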
I have some sample code like this:
CMD="svn up blablabla | grep -v .tgz"
echo $CMD | xargs -n -P ${PARALLEL:=20} -- bash -c
The purpose is to run svn update in parallel. However, when it encounters conflicts, which should prompt the user with several options to choose from, it just continues without waiting for user input, and an error is shown:
Conflict discovered in 'blablabla'.
Select: (p) postpone, (df) diff-full, (e) edit,
(mc) mine-conflict, (tc) theirs-conflict,
(s) show all options: svn: Can't read stdin: End of file found
Is there any way to fix this?
Thanks
Yes, there is a way to fix this! See the answer to how to prompt a user from a script run with xargs. Long story short, use
xargs -a FILENAME your_script
or
xargs -a <(cat FILENAME) your_script
The first version actually reads lines from a file, and the second one fakes reading lines from a file, which is convenient for using xargs in pipe chains with awk or perl. Remember to use the -0 flag if you don't want to break on whitespace!
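Applied to your case, that might look roughly like this (a sketch only: targets.txt is a hypothetical file listing the working-copy paths to update, and with -P greater than 1 the prompts from parallel svn processes can interleave on the terminal):
xargs -a targets.txt -n 1 -P "${PARALLEL:=20}" svn up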
Another solution, which doesn't rely on Bash but on GNU's flavor of xargs, is to use the -o or --open-tty option:
echo $CMD | xargs -n -P ${PARALLEL:=20} --open-tty -- bash -c
From the manpage:
-o, --open-tty
    Reopen stdin as /dev/tty in the child process before executing the command.
    This is useful if you want xargs to run an interactive application.