Would like to tar files ending with the same timestamp into a single tar - linux

I have a list of log files, and all of them end with a timestamp.
For each day I have a bunch of log files all ending with the same timestamp,
so for a week I have a long list of files, all with timestamps.
The challenge is that I would like to use the tar command to archive each set of files ending with the same date stamp as one tar file,
and hence end up with one tar file for every day.
How can I achieve this? Some sort of string-matching wildcard? I'm new to Linux, please help.
File examples: (screenshot not available)

First, get a list of unique timestamps. Then, for each timestamp archive all files with that timestamp:
printf '%s\n' *.log | grep -Eo '\.[0-9]{8}_' | tr -d ._ | sort -u | while IFS= read -r timestamp; do
  tar cf "$timestamp.tar" ./*"$timestamp"*.log
done
Here I assumed that the timestamps always have 8 digits, always start with . and always end with _ (as shown in your screenshot).
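To see the pipeline at work, here is a self-contained sketch that creates a few empty sample logs in a scratch directory and runs the same loop. The file names (`app.20230101_0800.log` and so on) are hypothetical; they only follow the `.DDDDDDDD_` pattern assumed above.

```shell
#!/bin/sh
# Sketch: one tar per 8-digit timestamp found between '.' and '_'.
# The sample file names below are hypothetical.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Two files for each of two days.
touch app.20230101_0800.log db.20230101_0800.log \
      app.20230102_0800.log db.20230102_0800.log

# List the names, pull out ".YYYYMMDD_", strip '.' and '_',
# de-duplicate, then build one archive per timestamp.
printf '%s\n' *.log | grep -Eo '\.[0-9]{8}_' | tr -d ._ | sort -u |
while IFS= read -r timestamp; do
  tar cf "$timestamp.tar" ./*"$timestamp"*.log
done

# Show the members of each archive.
for t in *.tar; do
  echo "$t:"
  tar tf "$t"
done
```

The archives are written next to the logs; because the glob ends in `.log`, a later iteration never picks up a `.tar` created by an earlier one.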

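# get all unique dates (assumes the date is the 2nd '_'-separated field of each file name)
all_date=$(find . -maxdepth 1 -type f | awk -F '_' '{print $2}' | sort -u)
# make a dir to save tar files (-p: no error if it already exists)
mkdir -p tarfiles
# archive
for d in $all_date; do
  tar zcvf "tarfiles/$d.tar.gz" ./*"$d"*
done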

Related

How to take count some data of a file which is in archived folder?

I have an archive that contains some files. From one of those files I want to count the values in the 31st |-delimited field. How can I get the count without extracting the archive?
Archive name: mug.tar, file name: APR_17
Below is how I take the count:
| awk -F "|" '{print $31}' | grep "40411" | sort -n | uniq -c | wc -l
Extract the wanted file from the archive to stdout and pipe it to your awk:
$ tar -xOf mug.tar APR_17 | awk ...
man tar:
-x, --extract, --get
extract files from an archive
-O, --to-stdout
extract files to standard output
-f, --file ARCHIVE
use archive file or device ARCHIVE
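A runnable sketch of the same idea, assuming made-up contents for APR_17 (only the names mug.tar and APR_17 come from the question). It also shows the gzip variant with `-z`, and deletes APR_17 before counting to make clear that nothing is extracted to disk. Note that `sort -n | uniq -c | wc -l` counts distinct matching values, not matching rows.

```shell
#!/bin/sh
# Sketch: count field-31 values of an archive member without extracting it to disk.
# The contents of APR_17 are made up for illustration.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Build a '|'-delimited file with 31 fields; field 31 holds these values.
awk 'BEGIN {
  n = split("40411 404110 40411 123", v, " ")
  for (r = 1; r <= n; r++) {
    line = "f1"
    for (i = 2; i <= 30; i++) line = line "|f" i
    print line "|" v[r]
  }
}' > APR_17

tar cf mug.tar APR_17
tar czf mug.tar.gz APR_17
rm APR_17    # from here on, only the archives exist

# -x extract, -O to stdout, -f archive: the member goes straight into the pipe.
count=$(tar -xOf mug.tar APR_17 | awk -F '|' '{print $31}' | grep 40411 |
        sort -n | uniq -c | wc -l | tr -d ' ')
# Same for a gzip-compressed archive: add -z.
count_gz=$(tar -xzOf mug.tar.gz APR_17 | awk -F '|' '{print $31}' | grep 40411 |
        sort -n | uniq -c | wc -l | tr -d ' ')
echo "$count $count_gz"
```

Here the matching values are 40411 (twice) and 404110, so both counts come out as 2; dropping `sort -n | uniq -c` would count the 3 matching rows instead.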

grep - limit number of files read

I have a directory with over 100,000 files. I want to know if the string "str1" exists as part of the content of any of these files.
The command:
grep -l 'str1' * takes too long as it reads all of the files.
How can I ask grep to stop reading any further files if it finds a match? Any one-liner?
Note: I have tried grep -l 'str1' * | head but the command takes just as much time as the previous one.
Naming 100,000 file names in your command arguments is going to cause a problem: the expanded list can exceed the system's limit on argument length (ARG_MAX), and the command then fails with "Argument list too long".
But you don't have to name all the files if you use the recursive option with just the name of the directory the files are in (which is . if you want to search files in the current directory):
grep -l -r 'str1' . | head -1
Use grep -m 1 so that grep stops reading each file after its first matching line. This saves a lot of reading on large text files.
grep -m 1 str1 * /dev/null | head -1
If there is only a single file, the /dev/null above ensures that grep still prints the file name in the output.
If you want to stop after finding the first match in any file:
for file in *; do
  if grep -q -m 1 str1 "$file"; then
    echo "$file"
    break
  fi
done
The for loop also saves you from the too many arguments issue when you have a directory with a large number of files.
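The loop can be wrapped in a small function that also reports success or failure through its exit status. This is a sketch: `first_match` is a hypothetical helper name, the sample files are made up, and directories are skipped so grep is only handed regular files. With GNU grep, `-q` already exits at the first matching line, so `-m 1` is not needed alongside it.

```shell
#!/bin/sh
# Sketch: print the first file (in glob order) containing a pattern, then stop.
# first_match is a hypothetical helper; the sample files are made up.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
printf 'nothing here\n' > a.txt
printf 'has str1 in it\n' > b.txt
printf 'str1 again\n' > c.txt
mkdir subdir    # skipped below instead of being passed to grep

first_match() {
  for file in *; do
    [ -f "$file" ] || continue
    # -q: print nothing and exit at the first matching line.
    if grep -q -- "$1" "$file"; then
      printf '%s\n' "$file"
      return 0
    fi
  done
  return 1      # no file matched
}

first_match str1
```

Because the glob is expanded by the shell inside the loop rather than passed to a single command, this also avoids the "Argument list too long" limit.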

How to find files with same name part in directory using the diff command?

I have two directories with files in them. Directory A contains photos with numbered endings (e.g. janet1.jpg, laura2.jpg), and directory B has the same files except with different numbered endings (e.g. janet41.jpg, laura33.jpg). How do I find the files in either directory that have no corresponding file in the other, ignoring the numbered endings? For example, there is a rachael3 in directory A but no rachael\d in directory B. I think there's a way to do it with the diff command in bash, but I do not see an obvious way.
I can't see a way to use diff for this directly. It will probably be easier to run a checksum tool (md5sum, sha1sum, etc.) over both directories, sort both outputs on the first (checksum) column, and diff/compare those output files.
Alternatively, something like findimagedupes (which isn't as simple a comparison as diff or a sums check) might be a simpler (and possibly more useful) solution.
It seems you know that matching files are the same if they exist, and you are sure there is only one of a kind per directory.
So to diff the contents of the directories according to this, you need only the relevant part of each file name ("laura", "janet").
This can be done by simply grepping the appropriate part from the output of ls, like this:
ls dir1/ | grep -Eo '^[[:alpha:]]+'
Then to compare, let's say, dir1 and dir2, you can use (process substitution needs bash, ksh or zsh):
diff <(ls dir1/ | grep -Eo '^[[:alpha:]]+' | sort) <(ls dir2/ | grep -Eo '^[[:alpha:]]+' | sort)
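A self-contained version of that comparison, written for plain sh (temporary files instead of `<(...)`) and run on hypothetical sample photos modelled on the question's examples. The `sort` puts both name lists in the same order before diff sees them.

```shell
#!/bin/sh
# Sketch: compare two directories by the alphabetic part of each file name.
# Directory names and photos are hypothetical, modelled on the question.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
mkdir A B
touch A/janet1.jpg A/laura2.jpg A/rachael3.jpg
touch B/janet41.jpg B/laura33.jpg

# Keep only the leading letters of each name; sort so both lists share one order.
ls A | grep -Eo '^[[:alpha:]]+' | sort > names_A
ls B | grep -Eo '^[[:alpha:]]+' | sort > names_B

# '<' lines exist only in A, '>' lines only in B.
# diff exits with status 1 when the lists differ; '|| true' keeps a set -e script going.
diff names_A names_B || true
```

For this sample, the only difference reported is `< rachael`.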
Assuming the files are simply renamed and otherwise identical, a simple solution to find the missing ones is to use md5sum (or sha or somesuch) and uniq:
#!/bin/bash
md5sum A/*.jpg B/*.jpg >index
awk '{print $1}' <index | sort >sums # keep only the checksum, dropping dir/file
# list unique files (missing from one directory)
uniq -u sums | while read -r s; do
  grep "$s" index | sed 's/^[0-9a-f]\{32\} [ *]//'
done
This fails in the case where a folder contains several copies of the same file renamed (such that the hash matches multiple files in one folder), but that is easily fixed:
#!/bin/bash
md5sum A/*.jpg B/*.jpg > index
sed 's/\/.*//' <index | sort >sums # just delete /file
# list unique files (missing from one directory)
uniq sums | awk '{print $1}' |\
uniq -u | while read -r s junk; do
  grep "$s" index | sed 's/^[0-9a-f]\{32\} [ *]//'
done
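To check the duplicate-tolerant script, here is a sketch that runs the same steps on hypothetical sample files: janet and laura exist in both directories under different numbers, while rachael exists only in A, as two renamed copies of the same file (the case the first script misses, because its checksum then appears twice).

```shell
#!/bin/sh
# Sketch: the duplicate-tolerant checksum comparison on hypothetical sample files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
mkdir A B
printf 'janet\n'   > A/janet1.jpg
printf 'janet\n'   > B/janet41.jpg
printf 'laura\n'   > A/laura2.jpg
printf 'laura\n'   > B/laura33.jpg
printf 'rachael\n' > A/rachael3.jpg
printf 'rachael\n' > A/rachael5.jpg   # renamed copy, only in A

md5sum A/*.jpg B/*.jpg > index
# "<hash>  A/file.jpg" -> "<hash>  A", so copies inside one directory collapse.
sed 's/\/.*//' < index | sort > sums
# A checksum seen in only one directory is unique after dropping the dir column.
uniq sums | awk '{print $1}' | uniq -u > unique_sums
while read -r s; do
  # Strip "<32 hex digits><space><space or *>" to leave the path.
  grep "$s" index | sed 's/^[0-9a-f]\{32\} [ *]//'
done < unique_sums > missing
cat missing
```

Both A/rachael3.jpg and A/rachael5.jpg are listed, since they share the one checksum that has no counterpart in B.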

linux-shell: renaming files to creation time

Good morning everybody,
for a website I'd like to rename files (pictures) in a folder from "1.jpg, 2.jpg, 3.jpg ..." to "yyyymmdd_hhmmss.jpg", so I'd like to read out the creation times and set these times as names for the pics. Does anybody have an idea how to do that, for example with a Linux shell or with ImageMagick?
Thank you!
Naming based on file system date
In the linux shell:
for f in *.jpg
do
  mv -n "$f" "$(date -r "$f" +"%Y%m%d_%H%M%S").jpg"
done
Explanation:
for f in *.jpg
do
This starts the loop over all jpeg files. A feature of this is that it will work with all file names, even ones with spaces, tabs or other difficult characters in the names.
mv -n "$f" "$(date -r "$f" +"%Y%m%d_%H%M%S").jpg"
This renames the file. It uses the -r option, which tells date to display the file's last modification time rather than the current date. The specification +"%Y%m%d_%H%M%S" tells date to format it as you specified.
The file name, $f, is placed in double quotes wherever it is used. This ensures that odd file names will not cause errors.
The -n option tells mv never to overwrite an existing file.
done
This completes the loop.
For interactive use, you may prefer that the command is all on one line. In that case, use:
for f in *.jpg; do mv -n "$f" "$(date -r "$f" +"%Y%m%d_%H%M%S").jpg"; done
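A quick way to try the loop safely is a scratch directory with two empty "photos" whose modification times are set with `touch -t` (POSIX format `[[CC]YY]MMDDhhmm[.SS]`). This sketch assumes GNU date, where `-r FILE` prints FILE's last modification time; the file names and times are made up.

```shell
#!/bin/sh
# Sketch: rename *.jpg to their modification time, on hypothetical sample files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Give two empty files known modification times (local time).
touch -t 201210271813.38 1.jpg
touch -t 201301020304.05 2.jpg

for f in *.jpg
do
  # GNU date: -r FILE formats FILE's last modification time.
  mv -n "$f" "$(date -r "$f" +"%Y%m%d_%H%M%S").jpg"
done
ls
```

The glob is expanded once before the loop starts, so renamed files are not processed a second time, and mv keeps the modification time unchanged.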
Naming based on EXIF Create Date
To name the file based on the EXIF Create Date (instead of the file system date), we need exiftool or equivalent:
for f in *.jpg
do
  mv -n "$f" "$(exiftool -d "%Y%m%d_%H%M%S" -CreateDate "$f" | awk '{print $4".jpg"}')"
done
Explanation:
The above is quite similar to the commands for the file date but with the use of exiftool and awk to extract the EXIF image Create Date.
The exiftool command provides the date in a format like:
$ exiftool -d "%Y%m%d_%H%M%S" -CreateDate sample.jpg
Create Date : 20121027_181338
The actual date that we want is the fourth field in the output.
We pass the exiftool output to awk so that it can extract the field that we want:
awk '{print $4".jpg"}'
This selects the date field and also adds on the .jpg extension.
Thanks to @John1024!
I needed to rename files with different extensions at the same time, according to their last modification date:
for f in *; do
  fn=$(basename "$f")
  mv "$fn" "$(date -r "$f" +"%Y-%m-%d_%H-%M-%S")_$fn"
done
"DSC_0189.JPG" ➜ "2016-02-21_18-22-15_DSC_0189.JPG"
"MOV_0131.avi" ➜ "2016-01-01_20-30-31_MOV_0131.avi"
If you don't want to keep the original file name (note that this also drops the extension):
mv "$fn" "$(date -r "$f" +"%Y-%m-%d_%H-%M-%S")"
Hope it helps newbies like me!
Try this
for file in *.jpg; do name=$(stat -c %y "$file" | awk -F"." '{ print $1 }' | sed -e "s/-//g" -e "s/://g" -e "s/ /_/g").jpg; mv "$file" "$name"; done
Though there might be an easier way.
I created a shell script; it is macOS/BSD only (stat -f "%B" and date -r SECONDS are BSD syntax), so Linux needs other arguments.
#!/bin/bash
BASEDIR=$1
for path in "$BASEDIR"/*; do
  file=$(basename "$path")
  TIMESTAMP=$(stat -f "%B" "$path")   # birth (creation) time in seconds, BSD stat
  DATENAME=$(date -r "$TIMESTAMP" +'%Y%m%d-%H%M%S')-$file
  mv -v "$path" "$BASEDIR/$DATENAME"
done
When called with a directory path, it renames every file in that directory by prepending the file's creation date, like:
../camera/P1210232.JPG -> ../camera/20220121-103456-P1210232.JPG
Change filename based on file creation time:
exiftool "-filename<FileCreateDate" -d %Y%m%d_%H%M%S%z%%-c.%%le input.jpg

Merge sort gzipped files

I have 40 files of 2GB each, stored on an NFS architecture. Each file contains two columns: a numeric id and a text field. Each file is already sorted and gzipped.
How can I merge all of these files so that the resulting output is also sorted?
I know sort -m -k 1 should do the trick for uncompressed files, but I don't know how to do it directly with the compressed ones.
PS: I don't want the simple solution of uncompressing the files into disk, merging them, and compressing again, as I don't have sufficient disk space for that.
This is a use case for process substitution. Say you have two files to sort, sorta.gz and sortb.gz. You can give the output of gunzip -c FILE.gz to sort for both of these files using the <(...) shell operator:
sort -m -k1 <(gunzip -c sorta.gz) <(gunzip -c sortb.gz) >sorted
Process substitution substitutes a command with a file name that represents the output of that command, and is typically implemented with either a named pipe or a /dev/fd/... special file.
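For 40 files, you will want to build the command with that many process substitutions dynamically and use eval to execute it. GNU sort merges at most 16 inputs at once by default and writes intermediate merges to temporary files beyond that, so raise the limit with --batch-size to merge everything in one pass:
cmd="sort -m -k1 --batch-size=40"
for input in file1.gz file2.gz file3.gz ...; do
  cmd="$cmd <(gunzip -c '$input')"
done
eval "$cmd" >sorted # or eval "$cmd" | gzip -c > sorted.gz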
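An alternative to building an eval string (which breaks if a file name contains a single quote) is to create one named pipe per input yourself, which is essentially what `<(...)` does behind the scenes and which also works in plain POSIX sh. This is a sketch: `merge_gz` is a hypothetical helper name, the sample data is made up, and `--batch-size` is a GNU/BSD sort option, not POSIX; it is set to the number of inputs so sort does not spill intermediate merges to temporary files.

```shell
#!/bin/sh
# Sketch: k-way merge of pre-sorted .gz files through named pipes, without eval.
# merge_gz is a hypothetical helper; --batch-size is a GNU/BSD sort option.
merge_gz() {
  out=$1
  shift
  n=$#                                  # number of .gz inputs
  fifodir=$(mktemp -d) || return 1
  i=0
  for f in "$@"; do                     # list expanded once, so appending below is safe
    i=$((i + 1))
    mkfifo "$fifodir/$i" || return 1
    gunzip -c "$f" > "$fifodir/$i" &    # writer blocks until sort opens this pipe
    set -- "$@" "$fifodir/$i"
  done
  shift "$n"                            # drop the .gz names, keep only the pipe paths
  batch=$#
  [ "$batch" -ge 2 ] || batch=2         # sort rejects batch sizes below 2
  sort -m -k1 --batch-size="$batch" "$@" > "$out"
  status=$?
  wait                                  # reap the background gunzip jobs
  rm -rf "$fifodir"
  return "$status"
}

# Usage on three tiny, already-sorted inputs (made-up data).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
printf '1 a\n4 d\n' | gzip -c > file1.gz
printf '2 b\n5 e\n' | gzip -c > file2.gz
printf '3 c\n6 f\n' | gzip -c > file3.gz
merge_gz sorted file1.gz file2.gz file3.gz
cat sorted
```

The pipes hold no data on disk, so only the final output takes space; to keep it compressed, point `out` at a named pipe read by `gzip -c`, or adapt the helper to pipe sort into gzip.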
#!/bin/bash
FILES=file*.gz # list of your 40 gzip files
# (e.g. file1.gz ... file40.gz)
WORK1="merged.gz" # first temp file and the final file
WORK2="tempfile.gz" # second temp file
> "$WORK1" # create empty final file
> "$WORK2" # create empty temp file
gzip -qc "$WORK2" > "$WORK1" # compress content of empty second
# file to first temp file
for I in $FILES; do
echo current file: "$I"
sort -k 1 -m <(gunzip -c "$I") <(gunzip -c "$WORK1") | gzip -c > "$WORK2"
mv "$WORK2" "$WORK1"
done
Fill $FILES the easiest way, with bash globbing (file*.gz) or with a list of 40 file names separated by spaces. The files in $FILES stay unchanged.
Finally, the 80 GB of data end up compressed in $WORK1. While this script runs, no uncompressed data are written to disk, at the cost of re-reading and re-compressing the growing merged file on every pass.
Adding a differently flavoured multi-file merge within a single pipeline: it takes all (pre-sorted) files in $OUT/uniques, sort-merges them, and compresses the output; lz4 is used because of its speed:
find $OUT/uniques -name '*.lz4' |
awk '{print "<( <" $0 " lz4cat )"}' |
tr "\n" " " |
(echo -n sort -m -k3b -k2 " "; cat -; echo) |
bash |
lz4 \
> $OUT/uniques-merged.tsv.lz4
It is true that zgrep and other common utilities work with compressed files, but in this case you need to sort/merge uncompressed data and compress the result.
