Error while comparing in shell - linux

I am trying to search for a pattern (the trailer), and if it occurs more than once in a file, I need that file's name displayed:
for f in *.txt
do
if((tail -n 1 $f | grep '[9][9][9]*' | wc -l) -ge 2);
then
echo " The file $f has more than one trailer"
fi
done

Your most glaring syntax error is that -ge is an operator for the [ … ] or [[ … ]] conditional constructs. It cannot work the way you wrote the program. -ge needs a number on both sides, and what you have on the left is a command. You probably meant the output of the command, which requires the command substitution syntax: $(…). That gives:
if [ $(tail -n 1 $f | grep '[9][9][9]*' | wc -l) -ge 2 ]; then
This is syntactically correct but will never match. tail -n 1 $f outputs exactly one line (unless the file is empty), so grep sees at most one line, so wc -l prints either 0 or 1.
If you want to search the pattern on more than one line, change your tail invocation. While you're at it, you can change grep … | wc -l to grep -c; both do exactly the same thing, which is to count matching lines. For example, to search in the last 42 lines:
if [ $(tail -n 42 -- "$f" | grep -c '[9][9][9]*') -ge 2 ]; then
If you want to search for two matches on the last line, that's different. grep alone won't count them, because it decides whether each line matches or not; it doesn't look for multiple matches per line. To look for multiple non-overlapping matches on the last line, repeat the pattern, allowing arbitrary text in between. You're only testing whether the pattern is present, so you need grep's return status, not its output (hence the -q option).
if tail -n 1 -- "$f" | grep -q '[9][9][9]*.*[9][9][9]*'; then
I changed the tail invocations to add -- in case a file name begins with - (otherwise, tail would interpret it as an option) and to have double quotes around the file name (in case it contains whitespace or \[*?). These are good habits to get into. Always put double quotes around variable substitutions "$foo" and command substitutions "$(foo)" unless you know that the substitution will result in a whitespace-separated list of glob patterns.
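If the goal is to flag files where the trailer pattern appears on two or more lines anywhere in the file, rather than only in the last line, a minimal sketch could look like the following. The function name report_repeated_trailers is made up for illustration, and the pattern is copied from the question unchanged.

```shell
# Hypothetical helper (the name is illustrative, not from the question):
# print a message for every given file in which at least two lines
# match the trailer pattern from the question.
report_repeated_trailers() {
    for f in "$@"; do
        [ -f "$f" ] || continue            # skip directories and unmatched globs
        # grep -c counts matching lines; -- protects against names starting with -
        n=$(grep -c '[9][9][9]*' -- "$f") || n=0
        if [ "$n" -ge 2 ]; then
            printf 'The file %s has more than one trailer\n' "$f"
        fi
    done
}
```

It would be called as report_repeated_trailers *.txt. Because it counts matching lines over the whole file, it answers a slightly different question than the tail -n 1 variants above.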

tail -n 1 $f will produce (at most) one line of output, which is fed to grep, which by definition can then produce at most one line of output. That means the output of wc -l will never be more than 1, so the -ge 2 test can never succeed. Aside from the syntax issues mentioned in other comments/answers, I think this logic is probably one of the core problems.

Related

Bash script that counts and prints out the files that start with a specific letter

How do I print out all the files in the current directory that start with the letter "k"? I also need to count these files.
I tried some methods but only got errors or wrong output. I'm really stuck on this as a newbie in bash.
Try this Shellcheck-clean pure POSIX shell code:
count=0
for file in k*; do
if [ -f "$file" ]; then
printf '%s\n' "$file"
count=$((count+1))
fi
done
printf 'count=%d\n' "$count"
It works correctly (just prints count=0) when run in a directory that contains nothing starting with 'k'.
It doesn't count directories or other non-files (e.g. fifos).
It counts symlinks to files, but not broken symlinks or symlinks to non-files.
It works with 'bash' and 'dash', and should work with any POSIX-compliant shell.
Here is a pure Bash solution.
files=(k*)
printf "%s\n" "${files[@]}"
echo "${#files[@]} files total"
The shell expands the wildcard k* into the array, thus populating it with a list of matching files. We then print out the array's elements, and their count.
The use of an array avoids the various problems with metacharacters in file names (see e.g. https://mywiki.wooledge.org/BashFAQ/020), though the syntax is slightly hard on the eyes.
As remarked by pjh, this will include any matching directories in the count, and fail in odd ways if there are no matches (unless you set nullglob to true). If avoiding directories is important, you basically have to get the directories into a separate array and exclude those.
To repeat what Dominique also said, avoid parsing ls output.
Demo of this and various other candidate solutions:
https://ideone.com/XxwTxB
To start with: never parse the output of the ls command, but use find instead.
As find basically goes through all subdirectories, you might need to limit that, using the -maxdepth switch, use value 1.
In order to count a number of results, you just count the number of lines in your output (in case your output is shown as one piece of output per line, which is the case of the find command). Counting a number of lines is done using the wc -l command.
So, this comes down to the following command:
find ./ -maxdepth 1 -type f -name "k*" | wc -l
Have fun!
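The find command above only counts. A sketch that both lists the files and prints the count in one pass might look like this. The function name list_k_files is hypothetical; it assumes a find that supports -maxdepth (GNU and BSD find do, POSIX does not require it) and file names without embedded newlines.

```shell
# Hypothetical helper: list regular files starting with "k" directly inside
# a directory (default: current directory), then print how many were found.
# Assumes find supports -maxdepth and that no file name contains a newline.
list_k_files() {
    find "${1:-.}" -maxdepth 1 -type f -name 'k*' |
        sort |
        awk '{ print } END { printf "count=%d\n", NR }'
}
```

awk prints every line it receives and uses its record counter NR for the total, so an empty directory simply yields count=0.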
This should work as well:
VAR="k"
COUNT=$(ls -p ${VAR}* | grep -v ":" | wc -w)
echo -e "Total number of files: ${COUNT}\n" 1>&2
echo -e "Files,that begin with ${VAR} are:\n$(ls -p ${VAR}* | grep -v ":" )" 1>&2

How to escape square brackets in a ls output

I'm having trouble escaping square brackets in file names.
I need to compare two lists. The ls output is the first list and the second is the file ARQ02.
#!/bin/bash
exec 3< <(ls /home/lint)
while read arq <&3; do
var=`grep -e "$arq" ARQ02`
if [ "$?" -ne 0 ] ; then
echo "$arq" >> result
fi
done
exec 3<&-
Sorry for my bad english.
Your immediate problem is that you must instruct grep to interpret the search term as a literal rather than a regular expression, using the -F option:
var=$(grep -Fe "$arq" ARQ02)
That way, any regex metacharacters that happen to be in the output from ls /home/lint - such as [ and ] - will still be treated as literals and won't break the grep invocation.
That said, it looks like your command could be streamlined, such as by using the output from ls /home/lint directly as the set of search strings to pass to grep at once, using the -f option:
grep -Ff <(ls /home/lint) ARQ02 > result
<(...) is a so-called process substitution, which, simply put, presents the output from a command as if it were a (temporary) file, which is what -f expects: a file containing the search terms for grep.
Alternatively, if:
the lines of ARQ02 contain only filenames that fully match (some of) the filenames in the output from ls /home/lint, and
you don't mind sorting or want to sort the matches stored in result,
consider HuStmpHrrr's helpful answer.
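For reference, the original loop can also be kept as a loop while avoiding both the regex problem and the parsing of ls output. The following is only a sketch: the function name names_missing_from_list is invented here, and it keeps the question's behaviour of treating each file name as a literal substring to look for in the list file.

```shell
# Hypothetical helper: print every name in directory $1 that does not occur
# (as a literal substring, like the original loop) on any line of list file $2.
names_missing_from_list() {
    for path in "$1"/*; do
        [ -e "$path" ] || continue         # empty directory: the glob stayed literal
        name=${path##*/}                   # strip the directory part
        # -F: literal match, so [ and ] are not special; -q: exit status only
        if ! grep -qF -e "$name" "$2"; then
            printf '%s\n' "$name"
        fi
    done
}
```

A call such as names_missing_from_list /home/lint ARQ02 > result would then correspond to what the script in the question was trying to produce.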
I have to assume my interpretation of the question is correct. Based on that, a one-liner solves your problem. I'm making two assumptions: your file names don't contain newlines, and you are using a modern bash:
comm -23 <(printf "%s\n" * | sort) <(sort ARQ02)
In bash, <(...) runs the command in a subshell and presents its stdout as a file. comm compares two sorted inputs line by line and reports the differences.
To explain in detail:
comm
-23 # suppress files unique in ARQ02 and files in common
<(printf "%s\n" * | # print all the files in local folder with new line breaker
sort) # sort them
<(sort ARQ02)
Sorting is necessary because comm walks both inputs in order and only works correctly on sorted input.

Operating on multiple results from find command in bash

Hi I'm a novice linux user. I'm trying to use the find command in bash to search through a given directory, each containing multiple files of the same name but with varying content, to find a maximum value within the files.
Initially I wasn't taking the directory as input and knew the file wouldn't be less than 2 directories deep so I was using nested loops as follows:
prev_value=0
for i in <directory_name> ; do
if [ -d "$i" ]; then
cd $i
for j in "$i"/* ; do
if [ -d "$j" ]; then
cd $j
curr_value=`grep "<keyword>" <filename>.txt | cut -c32-33` #gets value I'm comparing
if [ $curr_value -lt $prev_value ]; then
curr_value=$prev_value
else
prev_value=$curr_value
fi
fi
done
fi
done
echo $prev_value
Obviously that's not going to cut it now. I've looked into the -exec option of find but since find is producing a vast amount of results I'm just not sure how to handle the variable assignment and comparisons. Any help would be appreciated, thanks.
find "${DIRECTORY}" -name "${FILENAME}.txt" -print0 | xargs -0 -L 1 grep "${KEYWORD}" | cut -c32-33 | sort -nr | head -n1
We find the filenames that are named FILENAME.txt (FILENAME is a bash variable) that exist under DIRECTORY.
We print them all out, separated by nulls (this avoids any problems with certain characters in directory or file names).
Then we read them all in again using xargs, and pass the null-separated (-0) values as arguments to grep, launching one grep for each filename (-L 1 - let's be POSIX-compliant here). (I do that to avoid grep printing the filenames, which would screw up cut).
Then we sort all the results, numerically (-n), in descending order (-r).
Finally, we take the first line (head -n1) of the sorted numbers - which will be the maximum.
P.S. If you have 4 CPU cores you can try adding the -P 4 option to xargs to try to make the grep part of it run faster.
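As a variation on the pipeline above, find's own -exec … {} + can replace xargs, with grep -h suppressing the file names that would otherwise confuse cut. This is only a sketch: max_value_under and its three arguments are placeholders standing in for the question's <directory_name>, <filename> and <keyword>, and -h is assumed to be available (GNU and BSD grep support it).

```shell
# Hypothetical wrapper: print the largest value found in columns 32-33 of the
# lines containing KEYWORD, across every FILENAME.txt below DIR.
# Assumes grep supports -h (suppress file names), as GNU and BSD grep do.
max_value_under() {
    dir=$1 filename=$2 keyword=$3
    find "$dir" -type f -name "${filename}.txt" -exec grep -h -- "$keyword" {} + |
        cut -c32-33 |      # the value sits in columns 32-33, as in the question
        sort -nr |         # numeric sort, largest first
        head -n 1
}
```

Here grep is started once per batch of files rather than once per file, which may matter when find produces a very large number of results.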

Error with a script in bash

I have a small error in a script I wrote in bash and I can't figure out what I'm doing wrong
note that I'm using this script for thousands of calculations and this error happened only a few times (like 20 or so), but it still happened
What the script does is this: basically it takes in input a web page that I got from a site with the utility w3m and it counts all the occurrences of the words in it... After it orders them from the most common to the ones that occur only once
this is the code:
#!/bin/bash
# counts the numbers of words from specific sites #
# writes in a file the occurrences ordered from the most common #
touch check # file used to analyze the occurrences
touch distribution # final file ordered
page=$1 # the web page that needs to be analyzed
occurrences=$2 # temporary file for the occurrences
dictionary=$3 # dictionary used for another purpose (ignore this)
# write the words one by column
cat $page | tr -c [:alnum:] "\n" | sed '/^$/d' > check
# loop to analyze the words
cat check | while read words
do
word=${words}
strlen=${#word}
# ignores blacklisted words or small ones
if ! grep -Fxq $word .blacklist && [ $strlen -gt 2 ]
then
# if the word isn't in the file
if [ `egrep -c -i "^$word: " $occurrences` -eq 0 ]
then
echo "$word: 1" | cat >> $occurrences
# else if it is already in the file, it calculates the occurrences
else
old=`awk -v words=$word -F": " '$1==words { print $2 }' $occurrences`
### HERE IS THE ERROR, EITHER THE LET OR THE SED ###
let "new=old+1"
sed -i "s/^$word: $old$/$word: $new/g" $occurrences
fi
fi
done
# orders the words
awk -F": " '{print $2" "$1}' $occurrences | sort -rn | awk -F" " '{print $2": "$1}' > distribution
# ignore this, not important
grep -w "1" distribution | awk -F ":" '{print $1}' > temp_dictionary
for line in `cat temp_dictionary`
do
if ! grep -Fxq $line $dictionary
then
echo $line >> $dictionary
fi
done
rm check
rm temp_dictionary
this is the error: (I'm translating it, so it could be different in english)
./wordOccurrences line:30 let:x // where x is a number, usually 9 or 10 (but also 11, 13, etc)
1: syntax error in the expression (the error token is 1)
sed: expression -e #1, character y: command 's' not terminated // where y is another number (this one is also usually 9 or 10) with y being different from x
EDIT:
Talking with kev it looks like it's a newline problem
I added an echo between let and sed to print the sed and it worked perfectly for like 5 to 10 minutes until that error. Usually the sed without error looked like this:
s/^CONSULENTI: 6$/CONSULENTI: 7/g
but when I got the error it was like this:
s/^00145: 1
1$/00145: 4/g
how to fix this?
If you get a newline in $old, it means awk printed two lines, so there is a duplicate entry in $occurrences.
The script is a complicated way to count words, and it is inefficient because it launches many processes and rereads the file inside a loop;
maybe you can do something similar with
sort | uniq -c
You should also consider that your case-insensitivity is not consistent throughout the program. I created a page with just "foooo" in it and ran the program, then created one with "Foooo" in it and ran the program again. The 'old=`awk...' line sets 'old' to the empty string because awk is matching case sensitively. This results in the occurrences file not being updated. The subsequent sed and possibly some of the greps are also case sensitive.
This may not be the only error since it doesn't explain the error message you saw, but it is an indication that the same word with different capitalization will be handled erroneously by your script.
The following would separate the words, lowercase them, and then remove the ones smaller than three characters:
tr -cs '[:alnum:]' '\n' <foo | tr '[:upper:]' '[:lower:]' | egrep -v '^.{0,2}$'
Using this at the front of your script would mean that the rest of the script would not have to be case insensitive to be correct.
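Combining that normalisation step with the sort | uniq -c suggestion from the other answer, the whole counting loop could plausibly be reduced to a single pipeline. The sketch below uses an invented function name, word_counts, and leaves out the question's .blacklist and dictionary handling for brevity.

```shell
# Hypothetical helper: print "word: count" lines for a text file, most
# frequent first. Words are lowercased and words shorter than three
# characters are dropped. The blacklist from the question is left out.
word_counts() {
    tr -cs '[:alnum:]' '\n' < "$1" |       # one word per line
        tr '[:upper:]' '[:lower:]' |       # make counting case-insensitive
        grep -Ev '^.{0,2}$' |              # drop empty and very short lines
        sort |
        uniq -c |                          # prefix each distinct word with its count
        sort -rn |
        awk '{ print $2 ": " $1 }'         # reorder to the question's "word: n" format
}
```

Since every word is counted in one pass, there is no per-word sed -i rewrite of the occurrences file, so the duplicate-entry problem described above should not arise.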

Quick unix command to display specific lines in the middle of a file?

Trying to debug an issue with a server and my only log file is a 20GB log file (with no timestamps even! Why do people use System.out.println() as logging? In production?!)
Using grep, I've found an area of the file that I'd like to take a look at, line 347340107.
Other than doing something like
head -<$LINENUM + 10> filename | tail -20
... which would require head to read through the first 347 million lines of the log file, is there a quick and easy command that would dump lines 347340100 - 347340200 (for example) to the console?
update I totally forgot that grep can print the context around a match ... this works well. Thanks!
I found two other solutions if you know the line number but nothing else (no grep possible):
Assuming you need lines 20 to 40,
sed -n '20,40p;41q' file_name
or
awk 'FNR>=20 && FNR<=40' file_name
When using sed it is more efficient to quit processing after having printed the last line than continue processing until the end of the file. This is especially important in the case of large files and printing lines at the beginning. In order to do so, the sed command above introduces the instruction 41q in order to stop processing after line 41 because in the example we are interested in lines 20-40 only. You will need to change the 41 to whatever the last line you are interested in is, plus one.
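To avoid editing both numbers by hand, the same sed command can be wrapped in a small function. This is a sketch only; the name print_lines and its argument order are made up here.

```shell
# Hypothetical helper: print lines START..END of FILE, then stop reading.
print_lines() {
    start=$1 end=$2 file=$3
    # "$((end + 1))q" makes sed quit on the line right after the range
    sed -n "${start},${end}p;$((end + 1))q" "$file"
}
```

For the example above, print_lines 20 40 file_name expands to the same 20,40p;41q program.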
# print line number 52
sed -n '52p' # method 1
sed '52!d' # method 2
sed '52q;d' # method 3, efficient on large files
Method 3 is efficient on large files and is the fastest way to display a specific line.
with GNU-grep you could just say
grep --context=10 ...
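A slightly fuller sketch of the context approach, with line numbers added so the match can be located again later, might look like this. The function name show_context is hypothetical; -C (short for --context) and -n are assumed to be available, as they are in GNU and BSD grep.

```shell
# Hypothetical helper: show LINES lines of context around each match of
# PATTERN in FILE, with line numbers. Assumes grep supports -C and -n.
show_context() {
    pattern=$1 lines=$2 file=$3
    grep -n -C "$lines" -e "$pattern" "$file"
}
```

In the output, matching lines are marked with a colon after the line number and context lines with a hyphen.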
No there isn't, files are not line-addressable.
There is no constant-time way to find the start of line n in a text file. You must stream through the file and count newlines.
Use the simplest/fastest tool you have to do the job. To me, using head makes much more sense than grep, since the latter is way more complicated. I'm not saying "grep is slow", it really isn't, but I would be surprised if it's faster than head for this case. That'd be a bug in head, basically.
What about:
tail -n +347340107 filename | head -n 100
I didn't test it, but I think that would work.
I prefer just going into less and
typing 50% to go halfway into the file,
43210G to go to line 43210
:43210 to do the same
and stuff like that.
Even better: hit v to start editing (in vim, of course!), at that location. Now, note that vim has the same key bindings!
You can use the ex command, a standard Unix editor (part of Vim now), e.g.
display a single line (e.g. 2nd one):
ex +2p -scq file.txt
corresponding sed syntax: sed -n '2p' file.txt
range of lines (e.g. 2-5 lines):
ex +2,5p -scq file.txt
sed syntax: sed -n '2,5p' file.txt
from the given line till the end (e.g. 5th to the end of the file):
ex +5,p -scq file.txt
sed syntax: sed -n '5,$p' file.txt
multiple line ranges (e.g. 2-4 and 6-8 lines):
ex +2,4p +6,8p -scq file.txt
sed syntax: sed -n '2,4p;6,8p' file.txt
Above commands can be tested with the following test file:
seq 1 20 > file.txt
Explanation:
+ or -c followed by the command - execute the (vi/vim) command after file has been read,
-s - silent mode, also uses current terminal as a default output,
-c followed by q is the command to quit the editor (add ! to force quit, e.g. -scq!).
I'd first split the file into few smaller ones like this
$ split --lines=50000 /path/to/large/file /path/to/output/file/prefix
and then grep on the resulting files.
If the line you want to read is number 100:
head -100 filename | tail -1
Get ack
Ubuntu/Debian install:
$ sudo apt-get install ack-grep
Then run:
$ ack --lines=$START-$END filename
Example:
$ ack --lines=10-20 filename
From $ man ack:
--lines=NUM
Only print line NUM of each file. Multiple lines can be given with multiple --lines options or as a comma separated list (--lines=3,5,7). --lines=4-7 also works.
The lines are always output in ascending order, no matter the order given on the command line.
sed will need to read the data too to count the lines.
A shortcut would only be possible if there were some context or order in the file to operate on, for example if the log lines were prefixed with a fixed-width time/date.
you could use the look unix utility to binary search through the files for particular dates/times
Use
x=`cat -n <file> | grep <match> | awk '{print $1}'`
Here you will get the line number where the match occurred.
Now you can use the following command to print 100 lines
awk -v var="$x" 'NR>=var && NR<=var+100{print}' <file>
or you can use "sed" as well
sed -n "${x},$((x + 100))p" <file>
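The two steps (find the line number of a match, then print a window starting there) can be combined into one function. The sketch below makes a few assumptions of its own: the name print_from_first_match is invented, only the first match is used, and sed is told to quit once the window has been printed.

```shell
# Hypothetical helper: print the first line of FILE matching PATTERN and the
# COUNT lines after it, then stop reading the file.
print_from_first_match() {
    pattern=$1 count=$2 file=$3
    # grep -n prefixes each match with "LINENO:"; keep the first match's number
    x=$(grep -n -e "$pattern" "$file" | head -n 1 | cut -d: -f1)
    [ -n "$x" ] || return 1                # no match: nothing to print
    sed -n "${x},$((x + count))p;$((x + count + 1))q" "$file"
}
```

Arithmetic expansion, $((x + count)), is what computes the end of the range; a plain parameter expansion would not add the two numbers.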
With sed -e '1,N d; M q' you'll print lines N+1 through M. This is probably a bit better than grep -C as it doesn't try to match lines to a pattern.
Building on Sklivvz' answer, here's a nice function one can put in a .bash_aliases file. It is efficient on huge files when printing stuff from the front of the file.
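function middle()
{
# usage: middle START LENGTH FILE
local startidx=$1
local len=$2
local endidx=$((startidx + len))
local filename=$3
# print numbered lines START..START+LENGTH, then stop reading the file
awk "FNR>=${startidx} && FNR<=${endidx} { print NR\" \"\$0 }; FNR>${endidx} { print \"END HERE\"; exit }" "$filename"
}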
To display a line from a <textfile> by its <line#>, just do this:
perl -wne 'print if $. == <line#>' <textfile>
If you want a more powerful way to show a range of lines with regular expressions -- I won't say why grep is a bad idea for doing this, it should be fairly obvious -- this simple expression will show you your range in a single pass which is what you want when dealing with ~20GB text files:
perl -wne 'print if m/<regex1>/ .. m/<regex2>/' <filename>
(tip: if your regex has / in it, use something like m!<regex>! instead)
This would print out <filename> starting with the line that matches <regex1> up until (and including) the line that matches <regex2>.
It doesn't take a wizard to see how a few tweaks can make it even more powerful.
Last thing: perl, since it is a mature language, has many hidden enhancements to favor speed and performance. With this in mind, it makes it the obvious choice for such an operation since it was originally developed for handling large log files, text, databases, etc.
print line 5
sed -n '5p' file.txt
sed '5q;d' file.txt
print everything except line 5
sed '5d' file.txt
and my creation using google
#!/bin/bash
#removeline.sh
#remove deleting it comes move line xD
usage() { # Function: Print a help message.
echo "Usage: $0 -l LINENUMBER -i INPUTFILE [ -o OUTPUTFILE ]"
echo "line is removed from INPUTFILE"
echo "line is appended to OUTPUTFILE"
}
exit_abnormal() { # Function: Exit with error.
usage
exit 1
}
while getopts l:i:o:b flag
do
case "${flag}" in
l) line=${OPTARG};;
i) input=${OPTARG};;
o) output=${OPTARG};;
esac
done
if [ -f tmp ]; then
echo "Temp file:tmp exist. delete it yourself :)"
exit
fi
if [ -f "$input" ]; then
re_isanum='^[0-9]+$'
if ! [[ $line =~ $re_isanum ]] ; then
echo "Error: LINENUMBER must be a positive, whole number."
exit 1
elif [ $line -eq "0" ]; then
echo "Error: LINENUMBER must be greater than zero."
exit_abnormal
fi
if [ ! -z $output ]; then
sed -n "${line}p" $input >> $output
fi
if [ ! -z $input ]; then
# remove this sed command and this comes move line to other file
sed "${line}d" $input > tmp && cp tmp $input
fi
fi
if [ -f tmp ]; then
rm tmp
fi
You could try this command:
grep -n "" <filename> | grep "^<line number>:"
Easy with perl! If you want to get line 1, 3 and 5 from a file, say /etc/passwd:
perl -e 'while(<>){if(++$l~~[1,3,5]){print}}' < /etc/passwd
I am surprised only one other answer (by Ramana Reddy) suggested to add line numbers to the output. The following searches for the required line number and colours the output.
file=FILE
lineno=LINENO
wb="107"; bf="30;1"; rb="101"; yb="103"
cat -n ${file} | { GREP_COLORS="se=${wb};${bf}:cx=${wb};${bf}:ms=${rb};${bf}:sl=${yb};${bf}" grep --color -C 10 "^[[:space:]]\\+${lineno}[[:space:]]"; }
