Space characters in arguments are not handled right in Bash script - linux

I have the following Bash script,
#!/bin/bash
if [ $# -eq 0 ]
then
echo "Error: No arguments supplied. Please provide two files."
exit 100
elif [ $# -lt 2 ]
then
echo "Error: You provided only one file. Please provide exactly two files."
exit 101
elif [ $# -gt 2 ]
then
echo "Error: You provided more than two files. Please provide exactly two files."
exit 102
else
file1="$1"
file2="$2"
fi
if [ $(wc -l "$file1" | awk -F' ' '{print $1}') -ne $(wc -l "$file2" | awk -F' ' '{print $1}') ]
then
echo "Error: Files $file1 and $file2 should have had the same number of entries."
exit 200
else
entriesNum=$(wc -l "$file1" | awk -F' ' '{print $1}')
fi
for entry in $(seq $entriesNum)
do
path1=$(head -n$entry "$file1" | tail -n1)
path2=$(head -n$entry "$file2" | tail -n1)
diff "$path1" "$path2"
if [ $? -ne 0 ]
then
echo "Error: $path1 and $path2 do not much."
exit 300
fi
done
echo "All files in both file lists match 100%."
which I execute giving two file paths as arguments:
./compare2files.sh /path/to/my\ first\ file\ list.txt /path/to/my\ second\ file\ list.txt
As you can see, the names of the above two files contain spaces, and each file itself contains a list of other file paths, which I want to compare line by line, e.g. the first line of one file with the first line of the other, the second with the second, and so on.
The paths listed in the above two files contain spaces too, but I have escaped them using backslashes. For example, the file /Volumes/WD/backup photos temp/myPhoto.jpg is turned to /Volumes/WD/backup\ photos\ temp/myPhoto.jpg.
The problem is that the script fails at the diff command:
diff: /Volumes/WD/backup\ photos\ temp/myPhoto.jpg: No such file or directory
diff: /Volumes/WD/backup\ photos\ 2022/IMG_20220326_1000.jpg: No such file or directory
Error: /Volumes/WD/backup\ photos\ temp/myPhoto.jpg and /Volumes/WD/backup\ photos\ 2022/IMG_20220326_1000.jpg do not match.
When I modify the diff code like diff $path1 $path2 (without double quotes), I get another kind of error:
diff: extra operand `temp'
diff: Try `diff --help' for more information
Error: /Volumes/WD/backup\ photos\ temp/myPhoto.jpg and /Volumes/WD/backup\ photos\ 2022/IMG_20220326_1000.jpg do not match.
Apparently the files exist and the paths are valid, but the spaces in the paths' names are not handled right. What is wrong with my code and how can it be fixed (apart from renaming directories and files)?

The title is incorrect: Spaces in your script's arguments are handled correctly. The backslash/space sequences in your input files (as returned in the stdout from head -n 1), by contrast, are not processed as you expect.
Your input files should not contain literal backslashes. Backslashes are only meaningful when they're part of your script itself, parsed as syntax; not when they're part of your data. (That is to say, the string hello\ world in your script or in the command-line arguments given to the shell that calls your script becomes a single string hello world in memory; the backslash guides its parsing, but is not part of the value itself).
Command substitution results do not go through this parsing phase, so the output from head is stored in path1 and path2 exactly as it is (other than removal of the final trailing newline), without backslashes being removed.
If you must process input that contains quote and escape characters, xargs or the Python shlex module can be used to split that input into an array, as demonstrated in Reading quoted/escaped arguments correctly from a string.
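Once the input files contain plain, unescaped paths, the loop itself can also be simplified. Here is a minimal sketch (just one possible rewrite, assuming one plain path per line in each list) that reads both lists in lockstep instead of calling head and tail once per line:
while IFS= read -r path1 <&3 && IFS= read -r path2 <&4
do
    if ! diff "$path1" "$path2"
    then
        echo "Error: $path1 and $path2 do not match."
        exit 300
    fi
done 3< "$file1" 4< "$file2"
Quoting "$path1" and "$path2" is what preserves the spaces; no backslashes are needed in the data, and read -r keeps any remaining backslashes literal rather than interpreting them.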

Related

bash/awk/unix detect changes in lines of csv files

I have a timestamp in this format:
(normal_file.csv)
timestamp
19/02/2002
19/02/2002
19/02/2002
19/02/2002
19/02/2002
19/02/2002
The dates are usually uniform, however, there are files with irregular dates pattern such as this example:
(abnormal_file.csv)
timestamp
19/02/2002
19/02/2003
19/02/2005
19/02/2006
In my directory, there are hundreds of files of both kinds, like normal_file.csv and abnormal_file.csv.
I want to write a bash or awk script that detects the date pattern in all files of a directory. Abnormal files should be moved automatically to a new, separate directory (let's say dir_different/).
Currently, I have tried the following:
#!/bin/bash
mkdir dir_different
for FILE in *.csv;
do
# pipe 1: detect the changes in the line
# pipe 2: print the timestamp column (first column, columns are comma-separated)
awk '$1 != prev {print ; prev = $1}' < $FILE | awk -F , '{print $1}'
done
If the timestamp in a given file is normal, then only one single timestamp will be printed; but for abnormal files, multiple dates will be printed.
I am not sure how to separate the abnormal files from the normal files, and I have tried the following:
do
output=$(awk 'FNR==3{print $0}' $FILE)
echo ${output}
if [[ ${output} =~ ([[:space:]]) ]]
then
mv $FILE dir_different/
fi
done
Or is there an easier method to detect changes in lines and separate files that have different lines? Thank you for any suggestions :)
Assuming that none of your "normal" CSV files have trailing blank lines, this should do the separation just fine:
#!/bin/bash
mkdir -p dir_different
for FILE in *.csv;
do
if awk '{a[$1]++}END{if(length(a)<=2){exit 1}}' "$FILE" ; then
echo mv "$FILE" dir_different
fi
done
After a dry-run just get rid of the echo :)
Edit:
{a[$1]++} This bit creates an array a that gets the first field of each line as an index, and that gets incremented every time the same value is seen.
END{if(length(a)<=2){exit 1}} This checks how many elements are in the array. If there are fewer than 3 (which should be the case when the date is always the same: 1 header value plus 1 date value), exit the processing with status 1, so the mv branch is skipped.
"$FILE" is part of the bash script, not awk, and I quoted your variable out of habit, should you ever have files w/ spaces in their names you'll see why :)
So, a "normal" file contains only two different lines:
timestamp
dd/mm/yyyy
Testing if a file is normal is thus as simple as:
[ $(sort -u file.csv | wc -l) -eq 2 ]
This leads to the following possible solution:
#!/usr/bin/env bash
mkdir -p dir_different
for FILE in *.csv;
do
if [ $(sort -u "$FILE" | wc -l) -ne 2 ] ; then
echo mv "$FILE" dir_different
fi
done

Can I get the name of the file currently being read in a for loop?

I want to write a script that takes a word as an argument and searches the files of the current directory and its subdirectories for the word. If it is found in any of the files, it should echo out a message containing the file name and the line the word is found on.
This is what I have so far, but I can't find a way to actually store the file name of the file being read or the line number.
word=$1
for var in $(grep -R "$word *")
do
filename=$(find . -type f -name "*") ------- //this doesnt work
linenmbr=$(grep -n "$ord" file) ----------- //this doesnt work
echo found $word in $filename on line number $linenmbr
done
In bash, any time you are looping, you want to avoid calling utilities (e.g. grep and find) within the loop. That is horribly inefficient because it will spawn a separate subshell for every utility on every iteration (with 10 iterations, that is 20 additional subshells; it adds up quickly). So in your case, you call grep to feed the loop, and then spawn a separate subshell calling grep again within the loop, as well as spawning a separate subshell for find.
You should think of a way to only call grep (or a utility that will provide the needed information) only once, and then parse the output.
If you did want to use grep, then calling grep -rn within a process substitution which is used to feed a while loop is probably as good as you are going to get. You can then use the bash builtin parameter expansions to isolate the filename and line-numbers which will be about as efficient as bash could get, e.g.
#!/bin/bash
[ -z "$1" ] && { ## validate at least 1 input given
printf "error: insufficient input.\nusage: %s srch_term\n" "${0##*/}"
exit 1
}
while read -r line; do ## read each line of grep output
fn="${line%%:*}" ## isolate filename
no="${line#*:}" ## remove filename
no="${no%%:*}" ## isolate number
printf "found %s in %s on line number %d\n" "$1" "$fn" "$no"
done < <(grep -rn "$1") ## grep in process substitution
Choosing A More Efficient Method
If you can accomplish what you are attempting with one of the stream editing tools, e.g. awk or sed, you are likely to be able to isolate the wanted information an order of magnitude faster. For example, using awk and setting globstar you could do something similar to the following:
#!/bin/bash
shopt -s globstar ## set globstar
[ -z "$1" ] && { ## validate at least 1 input given
printf "error: insufficient input.\nusage: %s srch_term\n" "${0##*/}"
exit 1
}
## find all matching files and line numbers
awk -v word="$1" '$0 ~ word {
print "found",word,"in",FILENAME,"on line number",FNR; next
}' **/* 2>/dev/null
Give both a try and let me know if you have further questions.
If you want to compare and ensure both are producing the same output, you can use diff to confirm, e.g.
$ diff <(grepscript.sh | sort) <(awkscript.sh | sort)
(if no difference is reported, the output is the same)

Linux : check if something is a file [ -f not working ]

I am currently trying to list the size of all files in a directory which is passed as the first argument to the script, but the -f option in Linux is not working, or am I missing something.
Here is the code :
for tmp in "$1/*"
do
echo $tmp
if [ -f "$tmp" ]
then num=`ls -l $tmp | cut -d " " -f5`
echo $num
fi
done
How would I fix this problem?
I think the error is with your glob syntax: a glob doesn't expand inside either single or double quotes,
for tmp in "$1"/*; do
..
Do the above to expand the glob outside the quotes.
There are a couple more improvements possible in your script:
Double-quote your variables to protect them from word-splitting, e.g. echo "$tmp"
Backtick command substitution `` is legacy syntax with several issues; use the $(..) syntax instead.
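Putting those together, one possible rewrite (a sketch, not the only way; wc -c < file prints the byte count, which sidesteps parsing ls output entirely):
for tmp in "$1"/*
do
    echo "$tmp"
    if [ -f "$tmp" ]
    then
        num=$(wc -c < "$tmp")  # byte count; some wc implementations pad it with leading spaces
        echo "$num"
    fi
done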
The [-f "filename"] condition check in linux is for checking the existence of a file and it is a regular file. For reference, use this text as reference,
-b FILE
FILE exists and is block special
-c FILE
FILE exists and is character special
-d FILE
FILE exists and is a directory
-e FILE
FILE exists
-f FILE
FILE exists and is a regular file
-g FILE
FILE exists and is set-group-ID
-G FILE
FILE exists and is owned by the effective group ID
I suggest you try with [ -e "filename" ] and see if it works.
Cheers!
At least on the command line, this piece of script does it:
for tmp in *; do echo "$tmp"; if [ -f "$tmp" ]; then num=$(ls -l "$tmp" | sed -e 's/  */ /g' | cut -d ' ' -f5); echo "$num"; fi; done
If cut uses space as its delimiter, it cuts at every single space. Sometimes you have more than one space between columns and the count can easily go wrong. I'm guessing that in your case you just happened to echo a space, which looks like nothing. The sed command squeezes runs of spaces down to a single space.
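As a side note, awk splits on runs of whitespace by default, so the sed stage could be dropped entirely: ls -l "$tmp" | awk '{print $5}' gives the same field without the extra squeezing step.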

Attempting to pass two arguments to a called script for a pattern search

I'm having trouble getting a script to do what I want.
I have a script that will search a file for a pattern and print the line numbers and instances of that pattern.
I want to know how to make it print the file name first, before it prints the lines found.
I also want to know how to write a new script that will call this one and pass two arguments to it.
The first argument being the pattern for grep and the second the location.
If the location is a directory, it will loop and search the pattern on all files in the directory using the script.
#!/bin/bash
if [[ $# -ne 2 ]]
then
echo "error: must provide 2 arguments."
exit -1
fi
if [[ ! -e $2 ]];
then
echo "error: second argument must be a file."
exit -2
fi
echo "------ File =" $2 "------"
grep -ne "$1" "$2"
This is the script I'm using, which I need the new one to call. I just got a lot of help from asking a similar question, but I'm still kind of lost. I know that I can use the -d test to check for the directory and then use a for loop to repeat the command, but exactly how isn't panning out for me.
I think you just want to add the -H option to grep:
-H, --with-filename
Print the file name for each match. This is the default when there is more than one file to search.
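As for the calling script, a minimal sketch (assuming your script above is saved as searchfile.sh in the current directory and is executable; adjust the names to taste):
#!/bin/bash
pattern=$1
location=$2
if [ -d "$location" ]
then
    # second argument is a directory: search every regular file in it
    for f in "$location"/*
    do
        [ -f "$f" ] && ./searchfile.sh "$pattern" "$f"
    done
else
    # otherwise treat it as a single file
    ./searchfile.sh "$pattern" "$location"
fi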
grep has an option -r which can help you avoid testing for the second argument being a directory and using a for loop to iterate over all files of that directory.
From the man page:
-R, -r, --recursive
Recursively search subdirectories listed.
It will also print the filename.
Test:
On one file:
[JS웃:~/Temp]$ grep -r '5' t
t:5 10 15
t:10 15 20
On a directory:
[JS웃:~/Temp]$ grep -r '5' perl/
perl//hello.pl:my $age=65;
perl//practice.pl:use v5.10;
perl//practice.pl:#array = (1,2,3,4,5);
perl//temp/person5.pm:#person5.pm
perl//temp/person9.pm: my #date = (localtime)[3,4,5];
perl//text.file:This is line 5

Linux shell script to add leading zeros to file names

I have a folder with about 1,700 files. They are all named like 1.txt or 1497.txt, etc. I would like to rename all the files so that all the filenames are four digits long.
I.e., 23.txt becomes 0023.txt.
What is a shell script that will do this? Or a related question: How do I use grep to only match lines that contain \d.txt (i.e., one digit, then a period, then the letters txt)?
Here's what I have so far:
for a in [command I need help with]
do
mv $a 000$a
done
Basically, run that three times, with commands there to find one digit, two digits, and three digit filenames (with the number of initial zeros changed).
Try:
for a in [0-9]*.txt; do
mv $a `printf %04d.%s ${a%.*} ${a##*.}`
done
Change the filename pattern ([0-9]*.txt) as necessary.
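(For reference: ${a%.*} strips everything from the last dot onward, leaving the number, and ${a##*.} keeps only the extension, so for a=23.txt the printf produces 0023.txt.)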
A general-purpose enumerated rename that makes no assumptions about the initial set of filenames:
X=1;
for i in *.txt; do
mv $i $(printf %04d.%s ${X%.*} ${i##*.})
let X="$X+1"
done
On the same topic:
Bash script to pad file names
Extract filename and extension in bash
Using the rename (prename in some cases) script that is sometimes installed with Perl, you can use Perl expressions to do the renaming. The script skips renaming if there's a name collision.
The command below renames only files that have four or fewer digits followed by a ".txt" extension. It does not rename files that do not strictly conform to that pattern. It does not truncate names that consist of more than four digits.
rename 'unless (/0+[0-9]{4}.txt/) {s/^([0-9]{1,3}\.txt)$/000$1/g;s/0*([0-9]{4}\..*)/$1/}' *
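(Roughly: the first substitution prepends three zeros to any one-to-three-digit name, the second then trims leading zeros so exactly four digits remain, and the unless guard leaves names that already carry extra leading zeros untouched.)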
A few examples:
Original Becomes
1.txt 0001.txt
02.txt 0002.txt
123.txt 0123.txt
00000.txt 00000.txt
1.23.txt 1.23.txt
Other answers given so far will attempt to rename files that don't conform to the pattern, produce errors for filenames that contain non-digit characters, perform renames that produce name collisions, try and fail to rename files that have spaces in their names and possibly other problems.
for a in *.txt; do
b=$(printf %04d.txt ${a%.txt})
if [ $a != $b ]; then
mv $a $b
fi
done
One-liner:
ls | awk '/^([0-9]+)\.txt$/ { printf("%s %04d.txt\n", $0, $1) }' | xargs -n2 mv
How do I use grep to only match lines that contain \d.txt (IE 1 digit, then a period, then the letters txt)?
grep -E '^[0-9]\.txt$'
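(The anchors ^ and $ matter here: without them, a name like 12.txt would also match via its 2.txt suffix.)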
Let's assume you have files with the .dat extension in your folder. Just copy this code to a file named run.sh, make it executable by running chmod +x run.sh and then execute it using ./run.sh:
#!/bin/bash
num=0
for i in *.dat
do
a=`printf "%05d" $num`
mv "$i" "filename_$a.dat"
let "num = $(($num + 1))"
done
This will convert all files in your folder to filename_00000.dat, filename_00001.dat, etc.
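(Note that %05d pads to five digits; for the four-digit names the question asks about, change it to %04d.)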
This version also supports handling strings before (and after) the number. Basically you can do any regex matching plus printf, as long as your awk supports it. It supports whitespace characters (except newlines) in filenames too.
for f in *.txt ;do
mv "$f" "$(
awk -v f="$f" '{
if ( match(f, /^([a-zA-Z_-]*)([0-9]+)(\..+)/, a)) {
printf("%s%04d%s", a[1], a[2], a[3])
} else {
print(f)
}
}' <<<''
)"
done
To only match single digit text files, you can do...
$ ls | grep '^[0-9]\.txt$'
One-liner hint:
while [ -f ./result/result`printf "%03d" $a`.txt ]; do a=$((a+1));done
RESULT=result/result`printf "%03d" $a`.txt
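(This keeps incrementing $a until result/resultNNN.txt does not exist yet, then stores that first free name in RESULT — handy when generating padded names rather than renaming existing files.)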
To provide a solution that's cautiously written to be correct even in the presence of filenames with spaces:
#!/usr/bin/env bash
pattern='%04d' # pad with four digits: change this to taste
# enable extglob syntax: +([[:digit:]]) means "one or more digits"
# enable the nullglob flag: If no matches exist, a glob returns nothing (not itself).
shopt -s extglob nullglob
for f in [[:digit:]]*; do # iterate over filenames that start with digits
suffix=${f##+([[:digit:]])} # find the suffix (everything after the last digit)
number=${f%"$suffix"} # find the number (everything before the suffix)
printf -v new "${pattern}%s" "$number" "$suffix" # pad the number, then append the suffix
if [[ $f != "$new" ]]; then # if the result differs from the old name
mv -- "$f" "$new" # ...then rename the file.
fi
done
There is a rename.ul command from the util-linux package, installed by default (at least in Ubuntu).
Its usage is (see man rename.ul):
rename [options] expression replacement file...
The command will replace the first occurrence of expression with the given replacement for the provided files.
While forming the command you can use:
rename.ul -nv replace-me with-this in-all?-these-files*
to make no changes, only read what changes the command would make. When sure, just re-execute the command without the -n (no-act) option; -v (verbose) can stay.
For your case the commands are:
rename.ul "" 000 ?.txt
rename.ul "" 00 ??.txt
rename.ul "" 0 ???.txt
