Shell file size in Linux

How can I get the size of a file into a variable?
ls -l | grep testing.txt | cut -f6 -d' '
gave the size, but how can I store it in a shell variable?

filesize=$(stat -c '%s' testing.txt)

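You can do it this way with ls (check the man page for the meaning of -s: it reports the space allocated on disk in blocks, not the size in bytes):
var=$(ls -s1 testing.txt | awk '{print $1}')
For the size in bytes, take field 5 of ls -l instead:
var=$(ls -l testing.txt | awk '{print $5}')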
Or you can use stat with -c '%s'.
Or you can use find (GNU):
var=$(find testing.txt -printf "%s")
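A quick way to check that the byte-oriented methods above agree is to run them on a file of known length. This is a sketch that assumes GNU coreutils and GNU find (stat -c and find -printf are GNU-specific; BSD stat uses -f '%z' instead); the scratch file exists only for the demonstration.

```shell
#!/usr/bin/env bash
# Scratch file of known length: "hello" plus a newline = 6 bytes.
f=$(mktemp)
printf 'hello\n' > "$f"

size_stat=$(stat -c '%s' "$f")            # GNU stat
size_find=$(find "$f" -printf '%s')       # GNU find
size_wc=$(wc -c < "$f")                   # stdin input, so no file name in the output
size_ls=$(ls -l "$f" | awk '{print $5}')  # field 5 of ls -l is the size in bytes

echo "$size_stat $size_find $size_wc $size_ls"
rm -f -- "$f"
```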

size() {
  local file="$1"
  if [ -b "$file" ]; then
    # Block devices: ask the kernel for the device size in bytes
    /sbin/blockdev --getsize64 "$file"
  else
    wc -c < "$file" # Handles pseudo files like /proc/cpuinfo
    # stat --format %s "$file"
    # find "$file" -printf '%s\n'
    # du -b "$file" | cut -f1
  fi
}
fs=$(size testing.txt)

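size=`ls -l | grep testing.txt | cut -f6 -d' '`
Note that cut -d' ' treats every single space as a separator, so the field number shifts whenever ls pads a column with extra spaces; awk '{print $5}' is more robust.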

You can get the file size in bytes with the command wc, which is fairly common on Linux systems since it's part of GNU coreutils:
wc -c < file
In a Bash script you can read it into a variable like this:
FILESIZE=$(wc -c < file)
From man wc:
-c, --bytes
print the byte counts
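Some wc implementations (BSD and macOS, for example) pad the count with leading blanks even when reading from standard input. A hedged, portable sketch is to pass the value through shell arithmetic, which discards the whitespace; the scratch file is only for the demonstration.

```shell
#!/usr/bin/env bash
f=$(mktemp)
printf 'abc' > "$f"          # 3 bytes, no trailing newline

FILESIZE=$(wc -c < "$f")     # may be "3" or "       3" depending on the wc
FILESIZE=$((FILESIZE))       # arithmetic expansion normalizes it to "3"
echo "size=$FILESIZE"
rm -f -- "$f"
```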

a=`stat -c '%s' testing.txt`
echo "$a"

Related

BASH: How to add text in the same line after command

I have to print the number of folders in my directory, so I use
ls -l $1| grep "^d" | wc -l
After that, I would like to add some text on the same line.
Any ideas?
If you don’t want to use a variable to hold the output you can use echo and put your command in $( ) on that echo line.
echo $(ls -l "$1" | grep "^d" | wc -l) more text to follow here
Assign the result to a variable, then print the variable on the same line as the directory name.
folders=$(ls -l "$1" | grep "^d" | wc -l)
printf "%s %d\n" "$1" "$folders"
Also, remember to quote your variables, otherwise your script won't work when filenames contain whitespace.
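Both answers can be checked in a scratch directory. This sketch assumes a directory holding two subdirectories and one regular file (hypothetical names, one containing a space) and uses the printf form; grep -c '^d' is a shorthand for grep "^d" | wc -l.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
mkdir "$dir/sub one" "$dir/sub2"     # note the space in "sub one"
touch "$dir/file.txt"

# ls -l marks directories with a leading "d"; quoting "$dir" keeps the
# path intact even if it contains spaces.
folders=$(ls -l "$dir" | grep -c '^d')
line=$(printf '%s contains %d folders\n' "$dir" "$folders")
echo "$line"
rm -rf -- "$dir"
```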

How to echo the filename?

I'm searching the content of .docx files with this command:
unzip -p *.docx word/document.xml | sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g' | grep $1
But I need the name of the file that contains the word I searched for. How can I do it?
You can walk through the files with a for loop:
for file in *.docx; do
unzip -p "$file" word/document.xml | sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g' | grep PATTERN && echo "$file"
done
The && echo $file part prints the filename when grep finds the pattern.
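Try with:
find . -name "*your_file_name*" | xargs grep your_word | cut -d':' -f1
Note that this greps the raw files, so it works for plain-text files but not for text inside .docx archives, where document.xml is normally stored compressed.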
If you're using GNU grep (likely, as you're on Linux), you might want to use this option:
--label=LABEL
Display input actually coming from standard input as input coming from file LABEL. This is especially useful when implementing tools like zgrep, e.g., gzip -cd foo.gz | grep --label=foo -H something. See also the -H option.
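So you'd have something like this (sed_command holds the same sed script as in the question):
sed_command='s/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g'
for f in *.docx
do
  unzip -p "$f" word/document.xml \
    | sed -e "$sed_command" \
    | grep -H --label="$f" "$1"
done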
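The effect of --label can be seen without any .docx files. When grep reads standard input, -H would normally print (standard input) as the file name; --label substitutes the given name. The name report.docx below is hypothetical, and the sketch assumes GNU grep.

```shell
#!/usr/bin/env bash
# -H forces the file-name prefix; --label names the stdin stream.
match=$(printf 'first line\nsecret word here\n' | grep -H --label=report.docx 'secret')
echo "$match"

# With -l, only the label is printed for a matching input.
name_only=$(printf 'secret\n' | grep -l --label=report.docx 'secret')
echo "$name_only"
```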

Concatenating xargs with the use of if-else in bash

I've got two test files, ttt.txt and ttt2.txt, whose contents are shown below:
#ttt.txt
(132) 123-2131
543-732-3123
238-3102-312
#ttt2.txt
1
2
3
I've already tried the following commands in bash, and they work fine:
if grep -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" ttt.txt ; then echo "found"; fi
# with output 'found'
if grep -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" ttt2.txt ; then echo "found"; fi
But when I combine the above command with xargs, it fails with the error '-bash: syntax error near unexpected token `then''. Could anyone explain why? Thanks in advance!
ll | awk '{print $9}' | grep ttt | xargs -I $ if grep --quiet -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" $; then echo "found"; fi
$ is a special character in bash (it marks variables), so don't use it as your xargs marker; you'll only get confused.
The real problem here, though, is that the shell passes if grep --quiet -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" $ to xargs as plain arguments (if is just an ordinary word in that position), and the ; ends the xargs command. The shell then parses the remainder of the line as a new command, and a then with no preceding if is a syntax error.
You can wrap the whole thing in a sub-invocation of bash, so that xargs sees the whole command:
$ ll | awk '{print $9}' | grep ttt | xargs -I xx bash -c 'if grep --quiet -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" xx; then echo "found"; fi'
found
Finally, ll | awk '{print $9}' | grep ttt is a needlessly complicated way of listing the files that you're looking for. You actually don't need any of the code above; just do this:
$ if grep --quiet -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" ttt*; then echo "found"; fi
found
Alternatively, if you want to process each file in turn (which you don't need here, but you might want when this gets more complicated):
for file in ttt*
do
if grep --quiet -oE "(\(\d{3}\)[ ]?\d{3}-\d{4})|(\d{3}-\d{3}-\d{4})" "$file"
then
echo "found"
fi
done
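If you do need xargs (for example with a very long list of files), a safer variant of the bash -c approach passes each file name to the script as a positional argument instead of pasting it into the script text, so names containing spaces or quotes cannot break (or inject into) the command. This sketch recreates the question's sample files in a scratch directory and uses [0-9] in place of \d, an assumption on my part since POSIX extended regular expressions do not define \d.

```shell
#!/usr/bin/env bash
# Recreate the two sample files from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '(132) 123-2131' '543-732-3123' '238-3102-312' > ttt.txt
printf '%s\n' 1 2 3 > ttt2.txt

pattern='(\([0-9]{3}\) ?[0-9]{3}-[0-9]{4})|([0-9]{3}-[0-9]{3}-[0-9]{4})'

# Each NUL-terminated name is appended as $2 of the bash -c script
# ($0 is the placeholder "_", $1 the pattern), never spliced into its text.
result=$(printf '%s\0' ttt* |
  xargs -0 -n 1 bash -c '
    if grep -qE "$1" "$2"; then
      echo "found in $2"
    fi
  ' _ "$pattern")
echo "$result"

cd / && rm -rf -- "$workdir"
```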

Dynamically run Linux shell commands

I have a command that should be executed by a shell script.
The command itself does not matter; what matters is how it is executed later and how its critical parts are escaped.
The command I normally run in PuTTY is something like this (maybe with some additional flags for ls):
rm -r `ls /test/parse_first/ | awk '{print $2}' | grep trash`
But now I have a batch of such commands, so I would like to execute them in a loop,
like:
for i in {0..100}
do
str=str$i
${!str}
done
where the strN variables are:
str0="rm -r `ls /test/parse_first/ | awk '{print $2}' | grep trash`"
str1="rm -r `ls /test/parse_second/ | awk '{print $2}' | grep trash`"
and that gives me a lot of headaches, because executing them via ${!str} breaks the quoting and the inline command substitution between the `...` marks.
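Instead of storing commands in strings, put the command in a function and pass the part that varies as an argument:
my_rm() { rm -r `ls /test/"$1" | awk ... | grep ... `; }
for i in `whatevr`; do
my_rm "$i"
done;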
Getting this right is surprisingly tricky, but it can be done:
for i in $(seq 0 100)
do
str=str$i
eval "eval \"\$$str\""
done
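Why eval is needed can be shown with a pipeline stored in a string. This sketch uses hypothetical, harmless commands (printf and grep rather than rm). Note the single quotes: inside double quotes, as in the question's str0/str1, the backticks are expanded immediately at assignment time, not when the loop runs.

```shell
#!/usr/bin/env bash
# Hypothetical command strings; single quotes keep the pipeline (and any
# backticks) unexpanded until the command is actually run.
str0='printf "%s\n" one two three | grep -c o'
str1='printf "%s\n" alpha beta gamma | grep -c a'

out=()
for i in 0 1; do
  name=str$i
  # ${!name} yields the string; eval makes the shell re-parse it, so the
  # quotes and the | are honored. Running ${!name} unquoted would only
  # word-split it and pass "|" to printf as a literal argument.
  out+=("$(eval "${!name}")")
done
printf '%s\n' "${out[@]}"
```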
You can also do:
for i in {0..10}
do
<whatevercommand>
done
It's actually simpler to put the directories in an array and use glob patterns:
#!/bin/bash
shopt -s nullglob
DIRS=("/test/parse_first/" "/test/parse_second/")
for D in "${DIRS[@]}"; do
for T in "$D"/*trash*; do
rm -r -- "$T"
done
done
And since rm accepts multiple arguments, you don't need the inner loop:
for D in "${DIRS[@]}"; do
rm -r -- "$D"/*trash*
done
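The glob approach can be tried safely in a scratch tree. In this sketch the directories stand in for /test/parse_first/ and /test/parse_second/ (hypothetical contents). The matches are collected into an array first, because with nullglob an unmatched pattern expands to nothing and a bare rm -r -- would then complain about a missing operand.

```shell
#!/usr/bin/env bash
shopt -s nullglob
# Scratch stand-ins for /test/parse_first/ and /test/parse_second/.
base=$(mktemp -d)
mkdir -p "$base/parse_first/old_trash_dir" "$base/parse_second"
touch "$base/parse_first/keep.txt" "$base/parse_first/trash 1.txt" \
      "$base/parse_second/trash.log" "$base/parse_second/notes.txt"

DIRS=("$base/parse_first" "$base/parse_second")
for D in "${DIRS[@]}"; do
  targets=("$D"/*trash*)           # names with spaces stay single elements
  if (( ${#targets[@]} > 0 )); then
    rm -r -- "${targets[@]}"
  fi
done

remaining=$(cd "$base" && find . -mindepth 2 | sort)
echo "$remaining"
rm -rf -- "$base"
```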
UPDATE:
#!/bin/bash
readarray -t COMMANDS <<'EOF'
rm -r `ls /test/parse_first/ | awk '{print $2}' | grep trash`
rm -r `ls /test/parse_second/ | awk '{print $2}' | grep trash`
EOF
for C in "${COMMANDS[@]}"; do
eval "$C"
done
Or you could just read commands from another file:
readarray -t COMMANDS < somefile.txt

How to clean csv by another csv while in a 'for' loop?

I'm not a Linux expert, and in this situation PHP would usually be much more suitable... but due to the circumstances, I ended up writing it in Bash :)
I have the following .sh script, which runs over all .csv files in the current folder and executes a bunch of commands.
The goal: cleaning email lists in .csv files (not really CSV; in practice they are just .txt files).
for file in $(find . -name "*.csv" ); do
echo "====================================================" >> db_purge_log.txt
echo "$file" >> db_purge_log.txt
echo "----------------------------------------------------" >> db_purge_log.txt
echo "Contacts BEFORE purge:" >> db_purge_log.txt
wc -l $file | cut -d " " -f1 >> db_purge_log.txt
echo " " >> db_purge_log.txt
cat $file | egrep -v "xxx|yyy|zzz" | grep -v -E -i '([0-z])\1{2,}' | uniq | sort -u > tmp_file
mv tmp_file $file ;
echo "Contacts AFTER purge:" >> db_purge_log.txt
wc -l $file | cut -d " " -f1 >> db_purge_log.txt
done
Now the trouble is:
I want to add a command, somewhere in the middle of this loop, that uses another .csv file as a suppression list, meaning that every line that is a perfect match for a line in the suppression list should be deleted from $file.
At this point my brain is stuck and I can't think of a solution. To be honest, I didn't manage to use sort or grep on two different files and export to a third file without completely eliminating the lines duplicated across both files, so I end up with much less data.
Any help would be much appreciated!
Clean up
Before adding functionality to the script, the existing script needs to be cleaned up — a lot.
I/O Redirection — Don't Repeat Yourself
When I see wall-to-wall I/O redirections like that, I want to cry — that isn't how you do it! You have three options to avoid all that:
for file in $(find . -name "*.csv" )
do
echo "===================================================="
echo "$file"
echo "----------------------------------------------------"
echo "Contacts BEFORE purge:"
wc -l $file | cut -d " " -f1
echo " "
cat $file | egrep -v "xxx|yyy|zzz" | grep -v -E -i '([0-z])\1{2,}' | uniq | sort -u > tmp_file
mv tmp_file $file ;
echo "Contacts AFTER purge:"
wc -l $file | cut -d " " -f1
done >> db_purge_log.txt
Or:
{
for file in $(find . -name "*.csv" )
do
echo "===================================================="
echo "$file"
echo "----------------------------------------------------"
echo "Contacts BEFORE purge:"
wc -l $file | cut -d " " -f1
echo " "
cat $file | egrep -v "xxx|yyy|zzz" | grep -v -E -i '([0-z])\1{2,}' | uniq | sort -u > tmp_file
mv tmp_file $file ;
echo "Contacts AFTER purge:"
wc -l $file | cut -d " " -f1
done
} >> db_purge_log.txt
Or even:
exec >>db_purge_log.txt # By default, standard output will go to db_purge_log.txt
for file in $(find . -name "*.csv" )
do
echo "===================================================="
echo "$file"
echo "----------------------------------------------------"
echo "Contacts BEFORE purge:"
wc -l $file | cut -d " " -f1
echo " "
cat $file | egrep -v "xxx|yyy|zzz" | grep -v -E -i '([0-z])\1{2,}' | uniq | sort -u > tmp_file
mv tmp_file $file ;
echo "Contacts AFTER purge:"
wc -l $file | cut -d " " -f1
done
The first form is adequate for this script, which has a single loop to redirect. The second form, using { and }, would handle more general sequences of commands. The third form, using exec, is 'permanent': unless you first save a copy of the original standard output on another file descriptor (for example with exec 3>&1), you can't get it back, whereas with the { ... } form you can have different sections of the script writing to different places.
One other advantage of all these variations is that you can trivially send errors to the same place that you're sending standard output if that's what you desire. For example:
exec >>db_purge_log.txt 2>&1
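If a later part of the script needs the original standard output back after an exec redirection, a common approach is to duplicate it onto a spare descriptor first. This is a sketch; descriptor 3 and the temporary log file are arbitrary choices.

```shell
#!/usr/bin/env bash
log=$(mktemp)

exec 3>&1            # save the current stdout on descriptor 3
exec >>"$log"        # from here on, stdout is appended to the log
echo "goes to the log"
exec 1>&3 3>&-       # restore stdout and close the spare descriptor

echo "back on the original stdout"
logged=$(cat "$log")
rm -f -- "$log"
```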
Other issues
Suppressing file name from wc — instead of:
wc -l $file | cut -d " " -f1
use:
wc -l < $file
UUOC — Useless use of cat — instead of:
cat $file | egrep -v "xxx|yyy|zzz" | grep -v -E -i '([0-z])\1{2,}' | uniq | sort -u > tmp_file
use:
egrep -v "xxx|yyy|zzz" $file | grep -v -E -i '([0-z])\1{2,}' | uniq | sort -u > tmp_file
UUOU — Useless use of uniq
It is not at all clear why you need uniq and sort -u; in context, sort -u is sufficient, so:
egrep -v "xxx|yyy|zzz" $file | grep -v -E -i '([0-z])\1{2,}' | sort -u > tmp_file
UUOG — Useless use of grep
egrep is equivalent to grep -E, and both can handle alternation between several regular expressions. The second expression matches whatever the parenthesized expression matches 3 or more times (we really only need to match three times), so in fact the second expression will do the job of the first. And the [0-z] match is dubious: in ASCII it matches sundry punctuation characters (such as :, @, [ and _) as well as the digits and the upper and lower case letters, but you're already doing a case-insensitive search because of the -i, so we can regularize all that to:
grep -Eiv '([0-9a-z]){3}' $file | sort -u > tmp_file
File names with spaces
The code is not going to handle file names with spaces, tabs or newlines because of the for file in $(find ...) notation. It probably isn't necessary to deal with that now, but be aware of the issue.
Final clean up
for file in $(find . -name "*.csv" )
do
echo "===================================================="
echo "$file"
echo "----------------------------------------------------"
echo "Contacts BEFORE purge:"
wc -l < "$file"
echo " "
grep -Evi '([0-9a-z]){3}' "$file" | sort -u > tmp_file
mv tmp_file "$file"
echo "Contacts AFTER purge:"
wc -l < "$file"
done >> db_purge_log.txt
Add the extra functionality
I want to add a command, somewhere in the middle of this loop, to use another .csv file as suppression list — meaning that every line found as perfect match in that suppression list should be deleted from $file.
Since we're already sorting the input files ($file), we can sort the suppression file (call it suppfile='suppressions.txt') too, if it is not already sorted. Given that, we then use comm to eliminate the lines that appear in both $file and $suppfile. We're interested in the lines that appear only in $file (or, as will be the case here, in the edited and sorted version of the file), so we want to suppress the common entries (column 3 of comm's output) and the entries that appear only in $suppfile (column 2). The comm -23 - "$suppfile" command reads the edited, sorted file from standard input (the -) and leaves out the entries from "$suppfile".
suppfile='suppressions.txt' # Must be in sorted order
for file in $(find . -name "*.csv" )
do
echo "===================================================="
echo "$file"
echo "----------------------------------------------------"
echo "Contacts BEFORE purge:"
wc -l < "$file"
echo " "
grep -Evi '([0-9a-z]){3}' "$file" | sort -u | comm -23 - "$suppfile" > tmp_file
mv tmp_file "$file"
echo "Contacts AFTER purge:"
wc -l < "$file"
done >> db_purge_log.txt
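Here is a self-contained run of the comm -23 step with made-up addresses. Both inputs go through sort under the same locale (LC_ALL=C here, an assumption), because comm expects its inputs to be sorted in the same collating order.

```shell
#!/usr/bin/env bash
export LC_ALL=C
work=$(mktemp -d)
printf '%s\n' carol@example.com alice@example.com bob@example.com alice@example.com > "$work/list.csv"
printf '%s\n' bob@example.com dave@example.com > "$work/suppressions.unsorted"

# Sort the suppression list once, as suggested for an unsorted file.
sort -u "$work/suppressions.unsorted" > "$work/suppressions.txt"

# -2 drops lines found only in the suppression list, -3 drops lines in
# both, leaving the lines unique to the sorted, de-duplicated list.
kept=$(sort -u "$work/list.csv" | comm -23 - "$work/suppressions.txt")
echo "$kept"
rm -rf -- "$work"
```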
If the suppression file is not in sorted order, simply sort it into a temporary file. Beware of giving the suppression file a .csv suffix in the current directory: the find loop will pick it up and empty it, because every line in the suppression file matches a line in the suppression file, which is not helpful for any files processed after it.
Oops — I over-simplified the grep regex. It should (probably) be:
grep -Evi '([0-9a-z])\1{2}' $file
The difference is considerable. My original rewrite will look for any three adjacent digits or letters (e.g. 123 or abz); the revision (actually very similar to one of the original commands) looks for a character from [0-9A-Za-z] followed by two occurrences of the same character (e.g. 111 or aaa, but not 123 or abz).
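The difference between the two patterns is easy to see on a few made-up strings. Back-references such as \1 in -E mode are an extension, so this sketch assumes GNU grep.

```shell
#!/usr/bin/env bash
samples=$(printf '%s\n' 111 aaa 123 abz ab)

# Any three adjacent letters/digits: matches 111, aaa, 123 and abz.
three_any=$(grep -Ei '([0-9a-z]){3}' <<< "$samples")

# A letter/digit followed by two copies of itself: only 111 and aaa.
three_same=$(grep -Ei '([0-9a-z])\1{2}' <<< "$samples")

echo "$three_any"
echo "---"
echo "$three_same"
```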
If perchance the alternatives xxx|yyy|zzz were really not 3 repeated characters, you might need two invocations of grep in sequence.
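If I understand you correctly, assuming a recent 'nix, grep should do most of the trick for you. The command grep -vf filterfile input.csv will output the lines in input.csv that do NOT match any regular expression found in filterfile. For perfect matches, add -x (match whole lines only) and -F (treat the lines of filterfile as fixed strings rather than regular expressions): grep -vxFf filterfile input.csv.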
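A sketch of that filter with made-up data. Without -x, the entry bob@example.com would also remove jim.bob@example.com, since it occurs inside that line; without -F, each . in an address would match any character. Unlike comm, this does not require sorted input.

```shell
#!/usr/bin/env bash
work=$(mktemp -d)
printf '%s\n' alice@example.com bob@example.com jim.bob@example.com > "$work/input.csv"
printf '%s\n' bob@example.com > "$work/filterfile"

# -v invert, -x whole-line match, -F fixed strings, -f read patterns from a file.
kept=$(grep -vxFf "$work/filterfile" "$work/input.csv")
echo "$kept"
rm -rf -- "$work"
```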
A couple of other comments: uniq only removes adjacent duplicates, so it needs sorted input; you might want the sort before it in the pipe (unless your input data is already sorted).
Or just use sort -u, which sorts and omits duplicates in one step.
Small suggestion: you might add #!/bin/bash as the first line to ensure that the script is run by bash rather than by whichever shell happens to invoke it (it might not be bash).
HTH.