appending to a tar file in a loop - linux

I have a directory that has maybe 6 files:
team1_t444444_jill.csv
team1_t444444_jill.csv
team1_t444444_jill.csv
team1_t999999_jill.csv
team1_t999999_jill.csv
team1_t111111_jill.csv
team1_t111111_jill.csv
I want to be able to tar the files based on their t-number, so t444444 should have its own tar file containing all the corresponding CSVs, t999999 should then have its own, and so on. A total of three tar files should be created dynamically.
for file in $bad_dir/*.csv; do
    fbname=`basename "$file" | cut -d. -f1`   # strips the path and extension, leaving xxx_tyyyyy_zzz
    t_name=$(echo "$fbname" | cut -d_ -f2)    # strips the remaining parts, leaving tyyyyy
    # now I am stuck on how to create a tar file and send the email
    taredFile=???   # no idea how to implement
    (cat home/files/hello.txt; uuencode $taredFile $taredFile) | mail -s "Failed Files" $t_name@hotmail.com
done

The simplest edit of your script that should do what you want is likely something like this.
for file in $bad_dir/*.csv; do
    fbname=`basename "$file" | cut -d. -f1`   # strips the path and extension, leaving xxx_tyyyyy_zzz
    t_name=$(echo "$fbname" | cut -d_ -f2)    # strips the remaining parts, leaving tyyyyy
    tarFile=$t_name-combined.tar
    if [ ! -f "$tarFile" ]; then
        tar -cf "$tarFile" "$bad_dir"/*_"${t_name}"_*.csv
        { cat home/files/hello.txt; uuencode "$tarFile" "$tarFile"; } | mail -s "Failed Files" $t_name@hotmail.com
    fi
done
Use a tar file name based on the unique bit of the input file names. Then check for that file existing before creating it and sending email (protects against creating the file more than once and sending email more than once).
Use the fact that the files are globbable to get tar to archive them all from the first one we see.
You'll also notice that I replaced (commands) with { commands; } in the pipeline. The () force a sub-shell but so does the pipe itself so there's no reason (in this case) to force an extra sub-shell manually just for the grouping effect.
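As a minimal illustration of that grouping point (the echoed strings here are arbitrary), both forms below feed the combined output of the two commands into the pipe; the braces just skip forking one extra subshell:
( echo "header"; echo "body" ) | wc -l    # prints 2
{ echo "header"; echo "body"; } | wc -l   # prints 2 as well, without the extra subshell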

This is what you want:
for i in `find . -name '*.csv' | cut -d/ -f2 | cut -d_ -f1,2 | sort | uniq`;
do
    tar -zvcf $i.tgz $i*
    # mail the $i.tgz file
done
Take a look at my run:
$ for i in `find . -name '*.csv' | cut -d/ -f2 | cut -d_ -f1,2 | sort | uniq`; do tar -zvcf $i.tgz $i*; done
team1_t111111_jill.csv
team1_t111111_jxx.csv
team1_t111111.tgz
team1_t444444_j123.csv
team1_t444444_j444.csv
team1_t444444_jill.csv
team1_t444444.tgz
team1_t999999_jill.csv
team1_t999999_jilx.csv
team1_t999999.tgz
ubuntu@ubuntu1504:/tmp/foo$ ls
team1_t111111_jill.csv team1_t111111.tgz team1_t444444_j444.csv team1_t444444.tgz team1_t999999_jilx.csv
team1_t111111_jxx.csv team1_t444444_j123.csv team1_t444444_jill.csv team1_t999999_jill.csv team1_t999999.tgz
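If you want to double-check how the CSVs were grouped, a quick follow-up (just a usage suggestion, not part of the original answer) is to list each archive's contents with tar -t:
for t in *.tgz; do
    echo "== $t"
    tar -tzf "$t"    # list the members without extracting them
done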

Related

Executing the ls command and storing output in variables

The command is:
ls -ltr | grep "$(date +'%b %e')" | cut -d' ' -f14
The output gives the file names created today.
I need to know how I can store the individual files in individual variables.
For example, if there are 2 files in the output, I want to store the two files in 2 different variables. Please help me with how to do it.
Don't parse ls.
If you are sure your filenames do not contain whitespace characters, you can do
todayfiles=( $(stat -c "%y#%n" * | grep "^$(date "+%F")" | cut -d# -f2-) )
If your filenames might contain spaces or tabs, but you are sure they do not contain newline characters, you can do
mapfile -t todayfiles < <(stat -c "%y#%n" * | grep "^$(date "+%F")" | cut -d# -f2-)
If you want to be able to handle any arbitrary filename, you can do
today=$(date "+%F")
todayfiles=()
for f in *; do
stat -c "%y" "$f" | grep -q "^$today" && todayfiles+=("$f")
done
Then iterate over today's files with:
for f in "${todayfiles[#]}"; do ...; done

Output of wc -l without file-extension

I've got the following line:
wc -l ./*.txt | sort -rn
I want to cut off the file extension. With this code I get the output:
number filename.txt
for all my .txt files in the current directory. But I want the output without the file extension, like this:
number filename
I tried piping into cut with different parameters, but all I managed was to cut off the whole filename, with this command:
wc -l ./*.txt | sort -rn | cut -f 1 -d '.'
Assuming you don't have newlines in your filenames, you can use sed to strip the trailing .txt:
wc -l ./*.txt | sort -rn | sed 's/\.txt$//'
Unfortunately, cut doesn't have a syntax for extracting columns counted from the end of the line. One (somewhat clunky) trick is to use rev to reverse each line, apply cut to it, and then rev it back:
wc -l ./*.txt | sort -rn | rev | cut -d'.' -f2- | rev
Using sed in a more generic way to cut off whatever extension the files have:
$ wc -l *.txt | sort -rn | sed 's/\.[^\.]*$//'
14 total
8 woc
3 456_base
3 123_base
0 empty_base
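If you'd rather not post-process the wc output at all, bash parameter expansion can strip the extension per file before printing; a small sketch (note it loses the "total" line that wc prints when given several files):
for f in ./*.txt; do
    lines=$(wc -l < "$f")                  # reading via stdin keeps the filename out of wc's output
    printf '%s %s\n' "$lines" "${f%.*}"    # ${f%.*} drops the last ".extension"
done | sort -rn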
A better approach is to look at the actual file type instead of the extension (what would the extension of tar.gz or other multi-part extensions even be?):
#!/bin/bash
for file; do
    case $(file -b "$file") in
        *ASCII*) echo "this is ascii" ;;
        *PDF*)   echo "this is pdf" ;;
        *)       echo "other cases" ;;
    esac
done
This is a proof of concept, not tested; feel free to adapt, improve, or modify it.
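Since the loop is written as "for file; do", it iterates over the script's positional arguments, so a hypothetical invocation (the script name is made up here) would look like:
bash typecheck.sh notes.txt report.pdf archive.tar.gz
# prints one line per argument: "this is ascii", "this is pdf", "other cases", ...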

Trying to delete lines beginning with a specific string from files where the file meets a target condition, in bash/linux

I am writing a bash script that will run a couple of times a minute. What I would like it to do is find all files in a specified directory that contain a specified string, then search that list of files and delete any line beginning with a different specific string (in this case it's "<meta").
Here's what I've tried so far, but neither attempt works:
ls -1t /the/directory | head -10 | grep -l "qualifying string" * | sed -i '/^<meta/d' *'
ls -1t /the/directory | head -10 | grep -l "qualifying string" * | sed -i '/^<meta/d' /the/directory'
The only reason I added in the head -10 is so that every time the script runs, it will start by only looking at the 10 most recent files. I don't want it to spend a lot of time searching needlessly through the entire directory since it will be going through and removing the line many times a minute.
The script has to be run out of a different directory than the files are in. Also, would the modified date on the files change if the "<meta" string doesn't exist in the file?
There are a variety of problems with this part of the command...
ls -1t /the/directory | head -10 | grep -l "qualifying string" * ...
First, you appear to be trying to pipe the output of ls ... | head -10 into grep, which would cause grep to search for "qualifying string" in the output of ls. Except then you turn around and provide * as a command line argument to grep, causing it to search in all the files, and completely ignoring the ls and head commands.
You probably want to read about the xargs command, which reads a list of files on stdin and then runs a given command against that list. For example, you ought to be able to generate your file list like this:
ls -1t /the/directory | head -10 |
    xargs grep -l "qualifying string"
And to apply sed to those files:
ls -1t /the/directory | head -10 |
    xargs grep -l "qualifying string" |
    xargs sed -i 's/something/else/g'
Modifying the files with sed will update the modification time on the files.
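Putting those pieces together for the actual goal in the question (deleting lines that start with <meta), the combined pipeline might look like the sketch below; /the/directory and "qualifying string" are the question's placeholders, and this assumes the file names contain no whitespace since xargs splits on it:
ls -1t /the/directory/* | head -10 |
    xargs grep -l "qualifying string" |
    xargs sed -i '/^<meta/d'
Using /the/directory/* makes ls print full paths, so the pipeline can run from a different directory, as the question requires.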
You can use globbing with the * character to expand file names and loop through the directory.
n=0
for file in /the/directory/*; do
    if [ -f "$file" ]; then
        grep -q "qualifying string" "$file" && sed -i '/^<meta/d' "$file"
        n=$((n+1))
    fi
    [ $n -eq 10 ] && break
done

Ordering a loop in bash

I've a bash script like this:
for d in /home/test/*
do
echo $d
done
Which outputs this:
/home/test/newer dir
/home/test/oldest dir
I'd like to order the folders by creation time so that the 'oldest dir' directory appears first in the list. I've tried ls and tree variations to no avail.
For example,
for d in `ls -d -c -1 $PWD/*`
Returns:
/home/test/oldest
dir
/home/test/newer
dir
Very close, but it does not respect the space in the directory name. My question: how would I get oldest dir on top while supporting the whitespace?
ls -d -c "$PWD"/* | while IFS= read -r line
do echo "$line"
done
Another technique, kind of a Schwartzian transform:
stat -c $'%Z\t%n' /home/test/* | sort -n | cut -f2- |
    while IFS= read -r filename; do
        # ...
    done
This solution is fragile with filenames containing newlines.
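If you do need to cope with newlines in the names, one option is to make every stage NUL-delimited; this sketch swaps stat for find -printf and assumes GNU findutils plus coreutils new enough for sort -z and cut -z:
find /home/test -mindepth 1 -maxdepth 1 -printf '%C@\t%p\0' |
    sort -z -n | cut -z -f2- |
    while IFS= read -r -d '' d; do
        echo "$d"    # oldest (by change time) first
    done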

List the files I own in subversion

So I have a bit of an issue. I work for a small startup (about 8 developers), and my boss recently decided that we need to put the owner of each file in the documentation. So I have been trying to write something using svn blame in a loop over every PHP file to see which files have my username on more than 15 lines, but I haven't been able to get it quite right.
What I would really like is a one-liner (or simple bash script) that will list every file in a subversion repository and the username that last edited the majority of the lines. Any ideas?
Alright, this is what I came up with:
#!/bin/bash
set -e
for file in `svn ls -R`; do
    if [ -f "$file" ]; then
        owner=`svn blame "$file" | tr -s " " " " | cut -d" " -f3 | sort | uniq -c | sort -nr | head -1 | tr -s " " " " | cut -d" " -f3`
        if [ -n "$owner" ]; then
            echo "$file" "$owner"
        fi
    fi
done
It uses svn ls to determine each file in the repository, then for each file, svn blame output is examined:
tr -s " " " " squeezes multiple spaces into one space
cut -d" " -f3 gets the third space-delimited field, which is the username
sort sorts the output so all lines last edited by one user are together
uniq -c gets all unique lines and outputs the count of how many times each line appeared
sort -nr sorts numerically, in reverse order (so that the username that appeared most is sorted first)
head -1 returns the first line
tr -s " " " " | cut -d" " -f3 same as before, squeezes spaces and returns the third fieldname which is user.
It'll take a while to run but at the end you'll have a list of <filename> <most prevalent author>
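The squeeze-and-cut chain can also be collapsed with awk, which already splits on runs of whitespace; this is just an equivalent rewrite of the per-file pipeline described above, not a change in behaviour:
owner=$(svn blame "$file" | awk '{ print $2 }' |
        sort | uniq -c | sort -nr | awk 'NR==1 { print $2 }')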
Caveats:
Error checking is not done to make sure the script is called from within an SVN working copy
If called from deeper than the root of a WC, only files at that level and deeper will be considered
As mentioned in the comments, you might want to take revision date into account (if the majority of check-ins happened 10 years ago, you might want to discount them when determining the owner)
Any working copy changes that aren't checked in won't be taken into account
for f in $(find . -name .svn -prune -o -type f -print); do
    echo "$f" $(svn blame "$f" | awk '{ print $2 }' | sort | uniq -c | sort -nr | head -n 1 | cut -f 1)
done
