What I am trying to do is run a bash script that looks somewhat like this:
#!/usr/bin/bash
only1=$(comm -23 $1 $2 | wc -l)
only2=$(comm -13 $1 $2 | wc -l)
common=$(comm -12 $1 $2 | wc -l)
echo -e "${only1} only in $1"
echo -e "${only2} only in $2"
echo -e "${common} in both"
If I execute the script as script.sh file1 file2 it works fine. However, if I use it as script.sh <(grep 'foo' file1) <(grep 'foo' file2) it fails, because the virtual files of the kind /dev/fd/62 are only available to the first command (only1 in the script). The output is:
262 only in /dev/fd/63
0 only in /dev/fd/62
0 in both
Is there a way to make these virtual files available to all of the commands in the script?
The issue here is that the first invocation of comm reads both input files to the end.
Since you'd like to be able to provide pipes as the input (instead of a "real" file), you'll need to read each input exactly once, and then feed the saved data to the subsequent commands. With pipes, as soon as data is read, it's gone and isn't coming back.
For example:
#!/bin/bash -eu
# cleanup temporary files on exit
trap 'rm -f ${TMP_FILE1:-} ${TMP_FILE2:-}' EXIT
TMP_FILE1=$(mktemp)
cat < "$1" > "$TMP_FILE1"
TMP_FILE2=$(mktemp)
cat < "$2" > "$TMP_FILE2"
only1=$(comm -23 $TMP_FILE1 $TMP_FILE2 | wc -l)
only2=$(comm -13 $TMP_FILE1 $TMP_FILE2 | wc -l)
common=$(comm -12 $TMP_FILE1 $TMP_FILE2 | wc -l)
echo -e "${only1} only in $1"
echo -e "${only2} only in $2"
echo -e "${common} in both"
If your files are small enough, then you can get away with reading them into variables:
#!/bin/bash -eu
FILE1=$(< "$1")
FILE2=$(< "$2")
only1=$(comm -23 <( echo "$FILE1" ) <( echo "$FILE2" ) | wc -l)
only2=$(comm -13 <( echo "$FILE1" ) <( echo "$FILE2" ) | wc -l)
common=$(comm -12 <( echo "$FILE1" ) <( echo "$FILE2" ) | wc -l)
echo -e "${only1} only in $1"
echo -e "${only2} only in $2"
echo -e "${common} in both"
Please also note that comm only works on sorted data... which means you probably want to use sort on the inputs, unless you are fully aware of the consequences of using unsorted inputs.
sort < $1 > $TMP_FILE1
FILE1=$( sort < $1 )
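Putting the two notes together, a sorted version of the variable-based script could look like this (a sketch: the function name is mine, and printf replaces echo -e for predictable output):

```shell
#!/usr/bin/env bash
set -eu

# compare_sorted FILE1 FILE2 -- same counts as the script above,
# but sorts both inputs first so comm sees ordered data
compare_sorted() {
    local s1 s2
    s1=$(sort -- "$1")
    s2=$(sort -- "$2")
    printf '%s only in %s\n' "$(comm -23 <(printf '%s\n' "$s1") <(printf '%s\n' "$s2") | wc -l)" "$1"
    printf '%s only in %s\n' "$(comm -13 <(printf '%s\n' "$s1") <(printf '%s\n' "$s2") | wc -l)" "$2"
    printf '%s in both\n'    "$(comm -12 <(printf '%s\n' "$s1") <(printf '%s\n' "$s2") | wc -l)"
}
```

Because the inputs are captured into variables first, this also works when $1 and $2 are process substitutions.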
Something like this has probably been done before, and better than mine. I have a directory where files are created and exist for only a few milliseconds before being removed. I searched and couldn't find what I was looking for, so I made something to do it and added a few more features.
How it works:
You run the script, enter the directory, and enter the time you want it to run, from 10 seconds to 6000 seconds. It validates what you enter to make sure the directory is real and the time stays within that range. Using sdiff -s, it compares the state of the directory when the script began against a fresh listing every 0.001 seconds. If there are changes, it tells you.
I wanted to share it since others may find it useful, and more importantly to ask if you have improvements. I have been doing a lot of self-taught (mostly via Stack Exchange) bash scripting for almost a year and I really love it. I am always looking to improve my code. I am new to interactive scripts, so if you have recommendations for input validation I'd love to hear them. I couldn't figure out how to combine the "if" statements that check the time in seconds for anything less than 10 or greater than 6000, despite trying a lot of things, so I just kept them separate. The "sed" portions are kind of wonky and I didn't do a great job optimizing; I just worked on them until the output was what I wanted.
EDIT: I don't have inotify and I don't think I could get it on this locked down system.
#!/bin/bash
# Directory Check Script
# Created 13 Aug 2022
CLISESSID=$$
export CLISESSID
### DEFINE A LOCATION WHERE FILES CAN BE TEMPORARILY MADE ###
tmp=/tmp
temp1=$tmp/temp1.txt
temp2=$tmp/temp2.txt
echo "This script will check a directory to see if any files were added for the length of time you specify"
read -ep 'What is the full directory you would like to verify? ' dir
if [ ! -d "$dir" ] ; then
echo "Directory does not exist. Exiting."
exit
fi
read -ep '(This must be between 10-6000, i.e. 5 minutes = 300, 10 minutes = 600, 6000 = 1 hour 40 minutes)
How many seconds would you like to check for? ' seconds
if [[ "$seconds" -lt 10 ]] ; then
echo "Seconds must be between 10 and 6000"
exit
fi
if [[ "$seconds" -gt 6000 ]] ; then
echo "Seconds must be between 10 and 6000"
exit
fi
echo "checking $dir for $seconds seconds."
ls --full-time "$dir" | tail -n +2 > "$temp1"
SECONDS=0
echo "Checking for changes to $dir every 0.001 seconds for $seconds seconds."
until [[ "$(ls --full-time "$dir" | tail -n +2)" != "$(cat "$temp1")" ]]
do
if (( SECONDS > $seconds ))
then
echo "Exceeded defined time of $seconds seconds. Exiting."
exit 1
fi
sleep 0.001
done
ls --full-time "$dir" | tail -n +2 > "$temp2"
if [[ $(sdiff -w 400 -s $temp1 $temp2 | grep " |" | wc -l) -gt 0 ]] ; then
echo "
File has been modified in $dir:"
sdiff -w 400 -s $temp1 $temp2 | sed 's/|/\n/' | sed 's/^ *//g' | sed '1~ i Before:' | sed '3~ i After:' | sed 's/^ *//g' | sed -e 's/^[ \t]*//'
fi
if [[ $(sdiff -w 400 -s $temp1 $temp2 | grep " >" | wc -l) -gt 0 ]] ; then
echo "
File has been added to $dir:"
sdiff -w 400 -s $temp1 $temp2 | sed 's/>/\n/' | grep -v " |" | sed 's/^ *//g' | sed '1~ i Added file:' | sed 's/^ *//g' | sed -e 's/^[ \t]*//' | sed '/./!d'
fi
if [[ $(sdiff -w 400 -s $temp1 $temp2 | grep " <" | wc -l) -gt 0 ]] ; then
echo "
File has been removed from $dir:"
sdiff -w 400 -s $temp1 $temp2 | sed 's/</\n/' | grep -v " |" | sed 's/^ *//g' | sed '1~ i Removed file:' | sed 's/^ *//g' | sed -e 's/^[ \t]*//' | sed '/./!d' | sed 's/ *$//'
fi
rm -f $temp1 $temp2
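For what it's worth, the two separate range checks can be combined into one condition with ||. A minimal sketch (the function name is mine), which also rejects non-numeric input as basic validation:

```shell
#!/usr/bin/env bash
# Sketch: one combined range check, bounds taken from the script above.
validate_seconds() {
    # reject non-digits first, then anything outside 10..6000
    if ! [[ $1 =~ ^[0-9]+$ ]] || (( $1 < 10 || $1 > 6000 )); then
        echo "Seconds must be a number between 10 and 6000"
        return 1
    fi
    echo "ok"
}

validate_seconds 300       # prints ok
validate_seconds 5 || true
validate_seconds abc || true
```

The regex test runs first, so the arithmetic comparison never sees a non-numeric value.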
I have 2 files
File 1 - IN.txt
08:43:22 IN 0xabc
08:43:31 IN 0xdef
08:54:45 IN 0xghi
08:54:45 IN 0xjkl
File 2 - OUT.txt
08:43:32 OUT 0xdef
08:54:45 OUT 0xghi
08:54:45 OUT 0xjkl
Basically I am troubleshooting a network issue: IN.txt is packets coming in, OUT.txt is packets going out, and column 3 is the packet code, which should match for the two packets in the same transaction.
I want to know all IN packets that do not have a matching OUT packet.
Desired output:
08:43:22 IN 0xabc
#!/bin/bash
IN=$(awk -F " " '{print $3}' in.txt)
OUT=$(awk -F " " '{print $3}' out.txt)
for i in $IN
do
flag=false
for o in $OUT
do
if [[ "$i" == "$o" ]]; then
flag=true
break
fi
done
if [[ $flag == false ]]; then
echo "Cannot find packet: $i in out"
fi
done
Result:
dingrui#gdcni:~/onie$ ./filter.sh
Cannot find packet: 0xabc in out
You can use a for loop:
for i in $(awk '{print $3}' IN.txt); do grep -i "$i" OUT.txt | wc -l; done
Or, more readable:
for i in $(awk '{print $3}' IN.txt); do result=$(grep -i "$i" OUT.txt | wc -l); echo "$i | $result"; done
OUTPUT:
0xabc | 0
0xdef | 1
0xghi | 1
0xjkl | 1
NOTE: this only matches the packet codes; I didn't look at the time, which doesn't seem important since you want to check packets.
You can use grep -F (a.k.a. fgrep) for this:
$ cut -d' ' -f3 < OUT.txt > OUT.txt2
$ grep -Fv -f OUT.txt2 IN.txt
08:43:22 IN 0xabc
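For comparison, the same anti-join can be done in a single awk pass: read OUT.txt first to build a lookup of seen codes, then print the IN lines whose code never appeared. A sketch using the sample data from the question:

```shell
#!/usr/bin/env bash
# Sketch: single-pass anti-join with awk, on the question's sample data.
tmp=$(mktemp -d)
cat > "$tmp/IN.txt" <<'EOF'
08:43:22 IN 0xabc
08:43:31 IN 0xdef
08:54:45 IN 0xghi
08:54:45 IN 0xjkl
EOF
cat > "$tmp/OUT.txt" <<'EOF'
08:43:32 OUT 0xdef
08:54:45 OUT 0xghi
08:54:45 OUT 0xjkl
EOF

# NR==FNR is true only while reading the first file (OUT.txt):
# collect its codes, then print IN lines whose code was never seen
missing=$(awk 'NR==FNR { seen[$3]; next } !($3 in seen)' "$tmp/OUT.txt" "$tmp/IN.txt")
echo "$missing"
rm -r "$tmp"
```

This avoids the quadratic nested loops and keeps full lines (time included) in the output.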
My first parameter is the file that contains the given words, and the rest are the directories in which I'm searching for files that contain at least 3 of the words from the first parameter.
I can successfully print out the number of matching words, but when testing whether it's greater than 3 it gives me the error: test: too many arguments
Here's my code:
#!/bin/bash
file=$1
shift 1
for i in $*
do
for j in `find $i`
do
if test -f "$j"
then
if test grep -o -w "`cat $file`" $j | wc -w -ge 3
then
echo $j
fi
fi
done
done
You first need to execute the grep | wc pipeline and then compare its output with 3, so you need to change your if statement. Since you are already using backquotes, you cannot nest them; use the other syntax $(command), which is equivalent to `command`:
if [ $(grep -o -w "`cat $file`" $j | wc -w) -ge 3 ]
then
echo $j
fi
I believe your problem is that you are trying to get the result of grep -o -w "`cat $file`" $j | wc -w to see if it's greater than or equal to three, but your syntax is incorrect. Try this instead:
if test $(grep -o -w "`cat $file`" $j | wc -w) -ge 3
By putting the grep & wc commands inside the $(), the shell executes those commands and uses the output rather than the text of the commands themselves. Consider this:
> cat words
western
found
better
remember
> echo "cat words | wc -w"
cat words | wc -w
> echo $(cat words | wc -w)
4
> echo "cat words | wc -w gives you $(cat words | wc -w)"
cat words | wc -w gives you 4
>
Note that the $() syntax is equivalent to the backtick notation you're already using for the cat $file command.
Hope this helps!
Your code can be refactored and corrected in a few places.
Have it this way:
#!/bin/bash
input="$1"
shift
for dir; do
while IFS= read -r -d '' file; do
if [[ $(grep -woFf "$input" "$file" | sort -u | wc -l) -ge 3 ]]; then
echo "$file"
fi
done < <(find "$dir" -type f -print0)
done
for dir loops through all the arguments
Use of sort -u is to remove duplicate words from output of grep.
Use wc -l instead of wc -w, since grep -o prints each matching word on a separate line.
find ... -print0 takes care of file names that contain whitespace.
find ... -type f retrieves only regular files, so there's no need to test with -f later.
I have directory containing files:
$> ls blender/output/celebAnim/
0100.png 0107.png 0114.png 0121.png 0128.png 0135.png 0142.png 0149.png 0156.png 0163.png 0170.png 0177.png 0184.png 0191.png 0198.png 0205.png 0212.png 0219.png 0226.png 0233.png 0240.png 0247.png 0254.png 0261.png 0268.png 0275.png 0282.png
0101.png 0108.png 0115.png 0122.png 0129.png 0136.png 0143.png 0150.png 0157.png 0164.png 0171.png 0178.png 0185.png 0192.png 0199.png 0206.png 0213.png 0220.png 0227.png 0234.png 0241.png 0248.png 0255.png 0262.png 0269.png 0276.png 0283.png
0102.png 0109.png 0116.png 0123.png 0130.png 0137.png 0144.png 0151.png 0158.png 0165.png 0172.png 0179.png 0186.png 0193.png 0200.png 0207.png 0214.png 0221.png 0228.png 0235.png 0242.png 0249.png 0256.png 0263.png 0270.png 0277.png 0284.png
0103.png 0110.png 0117.png 0124.png 0131.png 0138.png 0145.png 0152.png 0159.png 0166.png 0173.png 0180.png 0187.png 0194.png 0201.png 0208.png 0215.png 0222.png 0229.png 0236.png 0243.png 0250.png 0257.png 0264.png 0271.png 0278.png
0104.png 0111.png 0118.png 0125.png 0132.png 0139.png 0146.png 0153.png 0160.png 0167.png 0174.png 0181.png 0188.png 0195.png 0202.png 0209.png 0216.png 0223.png 0230.png 0237.png 0244.png 0251.png 0258.png 0265.png 0272.png 0279.png
0105.png 0112.png 0119.png 0126.png 0133.png 0140.png 0147.png 0154.png 0161.png 0168.png 0175.png 0182.png 0189.png 0196.png 0203.png 0210.png 0217.png 0224.png 0231.png 0238.png 0245.png 0252.png 0259.png 0266.png 0273.png 0280.png
0106.png 0113.png 0120.png 0127.png 0134.png 0141.png 0148.png 0155.png 0162.png 0169.png 0176.png 0183.png 0190.png 0197.png 0204.png 0211.png 0218.png 0225.png 0232.png 0239.png 0246.png 0253.png 0260.png 0267.png 0274.png 0281.png
For some script, I will need to find out what the number of the first missing file is. In the above output, it would be 0285.png. However, it is also possible that files in between are missing. In the end, I am only interested in the number 285, which is part of the file name.
This is part of recovery logic: The files should be created by the script, but this step can fail. Therefore I want to have a means to check which files are missing and try to create them in a second step.
This is what I got so far (from how to extract part of a filename before '.' or before extension):
ls blender/output/celebAnim/ | awk -F'[.]' '{print $1}'
What I cannot figure out is how do I find the smallest number missing from that result, above a certain offset? The offset in this case is 100.
You could loop over all number from 100 to 500 and check if the corresponding file exists; if it doesn't, you'd print the number you're looking at:
for i in {100..500}; do
[[ ! -f 0$i.png ]] && { echo "$i missing!"; break; }
done
This prints, for your example, 285 missing!.
This solution could be made a bit more flexible by, for example, looping over zero padded numbers and then extracting the unpadded number:
for i in {0100..0500}; do
[[ ! -f $i.png ]] && { echo "${i##*(0)} missing!"; break; }
done
This requires extended globs (shopt -s extglob) for the *(0) pattern ("zero or more repetitions of 0").
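If files in between may be missing too, one option is to compare the expected zero-padded sequence against the actual names with comm. A sketch (the directory and range here are illustrative stand-ins):

```shell
#!/usr/bin/env bash
# Sketch: list every missing number in a run of zero-padded .png names.
dir=$(mktemp -d)
touch "$dir"/0100.png "$dir"/0101.png "$dir"/0103.png   # 0102 is missing

# seq -f '%04g' emits 0100, 0101, ...; comm -23 keeps lines that
# appear in the expected sequence but not in the directory listing
missing=$(comm -23 <(seq -f '%04g' 100 103) <(ls "$dir" | sed 's/\.png$//' | sort))
echo "$missing"
rm -r "$dir"
```

Both comm inputs must be sorted; seq output already is, and the listing is sorted explicitly.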
begin=100
end=500
for i in `seq $begin 1 $end`; do
fname="0"$i".png"
if [ ! -f "$fname" ]; then
echo "$fname is missing"
fi
done
#!/bin/bash
search_dir=blender/output/celebAnim/
ls $search_dir > file_list
count=`wc -l file_list | awk '{ print $1 }'`
if [[ $count -eq 0 ]]
then
echo "No files in given directory!"
exit 1
fi
file_extension=`head -1 file_list | tail -1 | awk -F "." '{ print $2 }'`
init_file_value=`head -1 file_list | tail -1 | awk -F "." '{ print $1 }'`
i=2
while [ $i -le $count ]
do
next_file_value=`head -$i file_list | tail -1 | awk -F "." '{ print $1 }'`
next_value=$((init_file_value+1));
if [ $next_file_value -ne $next_value ]
then
echo $next_value"."$file_extension
break
fi
init_file_value=$next_value;
i=$((i+1));
done
Try this:
ls blender/output/celebAnim/ | sort -r | head -n1 | awk -F'.' '{print $1+1}'
This returns 285.
If you need 0285 returned, then try:
ls blender/output/celebAnim/ | sort -r | head -n1 | awk -F'.' '{printf "%04d\n", $1+1}'
For example, let's say I want to count the number of lines of 10 BIG files and print the total.
for f in files
do
#this creates a background process for each file
wc -l $f | awk '{print $1}' &
done
I was trying something like:
for f in files
do
#this does not work :/
n=$( expr $(wc -l $f | awk '{print $1}') + $n ) &
done
echo $n
I finally found a working solution using anonymous pipes and bash:
#!/bin/bash
# this starts ./a.sh in a separate process and opens a pipe, where the
# reading endpoint is fd 3 in our shell and the writing endpoint is the
# stdout of the other process. Note that you don't need the background
# operator (&): the process substitution already runs ./a.sh asynchronously.
exec 3< <(./a.sh 2>&1)
# ... do other stuff
# write the contents of the pipe to a variable. If the other process
# hasn't already terminated, cat will block.
output=$(cat <&3)
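Here is a runnable end-to-end sketch of that pattern; the worker script and its output are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: capture a background worker's output through fd 3.
tmpdir=$(mktemp -d)

# create a toy worker in place of ./a.sh
cat > "$tmpdir/a.sh" <<'EOF'
#!/bin/sh
sleep 0.1
echo "42"
EOF
chmod +x "$tmpdir/a.sh"

# open fd 3 on the worker's stdout via process substitution
exec 3< <("$tmpdir/a.sh" 2>&1)

# ... other work could happen here while the worker runs ...

# collect the worker's output; cat blocks until the worker exits
output=$(cat <&3)
exec 3<&-
echo "$output"
rm -r "$tmpdir"
```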
You should probably use gnu parallel:
find . -maxdepth 1 -type f | parallel --gnu 'wc -l' | awk 'BEGIN {n=0} {n += $1} END {print n}'
or else xargs in parallel mode:
find . -maxdepth 1 -type f | xargs -n1 -P4 wc -l | awk 'BEGIN {n=0} {n += $1} END {print n}'
Another option, if this doesn't fit your needs, is to write to temp files. If you don't want to write to disk, just write to /dev/shm. This is a ramdisk on most Linux systems.
#!/bin/bash
declare -a temp_files
count=0
for f in *
do
if [[ -f "$f" ]]; then
temp_files[$count]="$(mktemp /dev/shm/${f}-XXXXXX)"
((count++))
fi
done
count=0
for f in *
do
if [[ -f "$f" ]]; then
cat "$f" | wc -l > "${temp_files[$count]}" &
((count++))
fi
done
wait
cat "${temp_files[@]}" | awk 'BEGIN {n=0} {n += $1} END {print n}'
for tf in "${temp_files[@]}"
do
rm "$tf"
done
By the way, this can be thought of as a map-reduce, with wc doing the mapping and awk doing the reduction.
You could write that to a file or, better, listen on a fifo and act as soon as data arrives.
Here is a small example on how they work:
# create the fifo
mkfifo test
# listen to it
while true; do if read line < test; then echo "$line"; fi; done
# in another shell:
echo 'hi there' > test
# notice 'hi there' being printed in the first shell
So you could
for f in files
do
#this creates a background process for each file
wc -l "$f" | awk '{print $1}' > fifo &
done
and listen on the fifo for sizes.
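To actually collect the sizes, the reader should keep the fifo open across all the writers; opening it read-write with exec avoids both blocking and premature EOF between writers. A minimal sketch with two toy files:

```shell
#!/usr/bin/env bash
# Sketch: sum per-file line counts arriving over one fifo.
tmpdir=$(mktemp -d)
mkfifo "$tmpdir/fifo"
printf 'a\nb\n' > "$tmpdir/f1"   # 2 lines
printf 'c\n'    > "$tmpdir/f2"   # 1 line

# open the fifo read-write so neither side blocks and the
# reader never sees EOF while writers come and go
exec 3<> "$tmpdir/fifo"

for f in "$tmpdir/f1" "$tmpdir/f2"; do
    wc -l < "$f" >&3 &    # each background job writes one count
done
wait

total=0
for _ in 1 2; do          # read exactly one count per file
    read -r n <&3
    total=$((total + n))
done
exec 3>&-
echo "$total"
rm -r "$tmpdir"
```

The reader must know how many counts to expect (one per file), since the read-write fd never delivers EOF.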