Bash to get timestamp from file list and compare it to filename - linux

While setting up a Git repository for a project, we are including the DB structure by generating a dump in the post-commit hook on deployment.
What I would like to have is a simple versioning system for that file, based on the timestamp of the last change to the table structure.
After finding a post that suggests checking the dates of the *.frm files in the MySQL data dir, I thought the solution would be to use that last date as part of the generated file name. That is:
Find the latest modification time among the files of the DB (i.e. /var/lib/mysql/databaseX/) via an ls command (something like ls -la *.frm).
Compare that value (the last changed file) with the one in the name of a certain file (i.e. /project/dump_2012102620001.sql), where the numbers correspond to the last generated dump.
If the newest file's timestamp is after that date, run the mysqldump command; otherwise skip it, so that no dump is generated and committed as a change to Git.
Unfortunately my Linux console/bash skills are not up to this, and I have not found any similar script to adapt.

You can use [[ file1 -ot file2 ]] to test whether file1 is older than file2.
last=$(ls -tr /path/to/db/files/*.frm | tail -n1)
if [[ dump -ot $last ]] ; then
    create_new_dump
fi
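A variant that avoids parsing ls output might look like the sketch below. It assumes bash; the function names newest_file and dump_is_stale are made up for illustration and are not part of any tool.

```shell
#!/usr/bin/env bash
# newest_file FILE...: print the most recently modified of the given files.
newest_file() {
    local newest='' f
    for f in "$@"; do
        [[ -e $f ]] || continue                  # skip an unmatched glob pattern
        if [[ -z $newest || $f -nt $newest ]]; then
            newest=$f
        fi
    done
    [[ -n $newest ]] && printf '%s\n' "$newest"
}

# dump_is_stale DUMP FILE...: succeed if DUMP is missing or older than the newest FILE.
dump_is_stale() {
    local dump=$1 newest
    shift
    newest=$(newest_file "$@") || return 1       # no table files found: nothing to compare
    [[ ! -e $dump || $dump -ot $newest ]]
}
```

It could then be called as dump_is_stale /project/dump_2012102620001.sql /var/lib/mysql/databaseX/*.frm && create_new_dump, where create_new_dump stands for whatever runs mysqldump.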

You can save yourself a lot of grief by just dumping the table structure every time with the appropriate mysqldump command; this is relatively lightweight, since it won't include table contents. Strip out the variable timestamp information at the top and compare the result with the previous file. Store it only if it differs.
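A rough sketch of that idea follows. The helper names strip_dump_date and update_if_changed and the file names are hypothetical. mysqldump's --no-data option leaves out the table contents, and its --skip-dump-date option omits the "-- Dump completed on ..." comment, so the sed filter is only a fallback for dumps produced without that option.

```shell
#!/usr/bin/env bash
# strip_dump_date: filter stdin, dropping the "-- Dump completed ..." comment line.
strip_dump_date() {
    sed '/^-- Dump completed/d'
}

# update_if_changed NEW OLD: replace OLD with NEW only when the contents differ.
# Succeeds when OLD was created or updated, fails when nothing changed.
update_if_changed() {
    local new=$1 old=$2
    if [[ -f $old ]] && cmp -s "$new" "$old"; then
        rm -f "$new"                 # identical: discard the fresh dump
        return 1
    fi
    mv "$new" "$old"
}

# Usage sketch (not run here; databaseX and structure.sql are placeholders):
#   mysqldump --no-data databaseX | strip_dump_date > structure.sql.new
#   update_if_changed structure.sql.new structure.sql && git add structure.sql
```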

Related

SHELL : Sort Directory name as descending order

I have 3 folders in my server,
Assuming folder names are
workbook_20220217
workbook_20220407
workbook_20220105
Each folder contains its respective files.
I only want to print the latest folder based on the date in its name. There are two methods I have tried so far.
Variable declared:
TABLEAU_REPORTING_FOLDER=/farid/reporting/workbook
#First Method
ls $TABLEAU_REPORTING_FOLDER *_* | sort -t_ -n -k2 | sed ':0 N;s/\n/, /;t0'
#The first method returns all the contents of the folders as well
#The second method I have tried
$(ls -td ${TABLEAU_REPORTING_FOLDER}/workbook/* | head -1)
# This returns the folders in ascending order
The target output should be workbook_20220407.
What is the best approach I should look into? I cannot think of any logic other than treating the date in the name as a number and taking the largest value as the latest.
PS: I cannot rely on the folders' modification dates, because once the folders have been transferred to my server, all 3 have the same date.
UPDATE
I found a way to get the latest folder based on filename based on this reference : https://www.unix.com/shell-programming-and-scripting/174140-how-sort-files-based-file-name-having-numbers.html
ls | sort -t'-' -nk2.3 | tail -1
This returns the latest folder based on the folder name. Is this safe to use?
Also, what does -nk2.3 do and mean?
You can list the files in a directory in reverse order with the -r option of ls (it reverses whichever sort order you have selected). See the ls(1) man page for details.
The options -n and -k2.3 of the sort(1) command mean, respectively (see the sort(1) man page for details):
-n: sort numerically, i.e. the keys are treated as numbers and ordered accordingly.
-k2.3: start the sort key at the third character of field 2 and extend it to the end of the line. (To select fields 2 through 3 you would use a comma instead: -k2,3.)
Read the man pages of both commands, they are your friends.
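For the folder names in the question, a sketch along these lines should pick the latest one. It assumes the names contain exactly one underscore followed by a YYYYMMDD date and no newlines; latest_by_suffix is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# latest_by_suffix DIR: print the name of the sub-directory of DIR whose
# second underscore-separated field (e.g. 20220407) is numerically largest.
latest_by_suffix() {
    local dir=$1 path
    for path in "$dir"/*_*; do
        # print only directories, without the leading path
        [[ -d $path ]] && printf '%s\n' "${path##*/}"
    done | sort -t_ -k2,2n | tail -n 1
}
```

Here -t_ makes the underscore the field separator and -k2,2n restricts the key to field 2 and compares it numerically. Something like latest_by_suffix "$TABLEAU_REPORTING_FOLDER" would be the call for the layout in the question.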

Check if same file exists in another directory using Bash

I'm new to bash and would like your help; couldn't find an answer for this case.
I'm trying to check if the files in one directory exist in another directory
Let's say I have the path /home/public/folder/ (here I have several files)
and I want to check if the files exist in /home/private/folder2
I tried that
for file in $firstPath/*
do
if [ -f $file ]; then
(ask if to over write etc.. rest of the code)
And also
for file in $firstPath/*
do
if [ -f $file/$secondPath ]; then
(ask if to over write etc.. rest of the code)
Neither works. In the first case it seems to test the files in the first path against themselves, so it always asks me whether I want to overwrite, even though the file doesn't exist in the second path.
And in the second case, it doesn't go inside the if statement.
How could I fix that?
When you have a construct like for file in $firstPath/*, the value of $file is going to include the value of $firstPath, which does not exist within $secondPath. You need to strip the path in order to get the bare filename.
In traditional POSIX shell, the canonical way to do this was with an external tool called basename. You can, however, achieve what is generally thought to be equivalent functionality using Parameter Expansion, thus:
for file in "$firstPath"/*; do
    if [[ -f "$secondPath/${file##*/}" ]]; then
        # file exists, do something
    fi
done
The ${file##*/} bit is the important part here. Per the documentation linked above, this means "the $file variable, with everything up to the last / stripped out." The result should be the same as what basename produces.
As a general rule, you should quote your variables in bash. In addition, consider using [[ instead of [ unless you're actually writing POSIX shell scripts which need to be portable. You'll have a more extensive set of tests available to you, and more predictable handling of variables. There are other differences too.
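Wrapped up as a function, the same check could look like the sketch below. The name files_in_both is invented for this example; it only lists the clashing names and leaves the overwrite prompt to the caller.

```shell
#!/usr/bin/env bash
# files_in_both FIRST SECOND: print the names of regular files in FIRST
# that also exist as regular files in SECOND.
files_in_both() {
    local first=$1 second=$2 file name
    for file in "$first"/*; do
        [[ -f $file ]] || continue        # skip directories and an unmatched glob
        name=${file##*/}                  # bare filename, same result as basename
        [[ -f $second/$name ]] && printf '%s\n' "$name"
    done
    return 0
}
```

With the paths from the question, files_in_both /home/public/folder /home/private/folder2 would print one name per line for every file present in both places.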

better way to avoid use of ls inside of variable

I'm not sure how to title this correctly, so please change it if you prefer.
Given that my code actually works, I'd like a peer review to improve its quality.
I have a folder full of .zip files. These files are streams of data (identifiable by their stream name) that are offloaded daily. There can be more than one daily file per stream, so I need to grab the most recent one. I can't rely on POSIX timestamps for this, so the files carry timestamps in their names.
Filename example:
XX_XXYYZZ_XYZ_05_AB00C901_T001_20170808210052_20170808210631.zip
Last two fields are timestamps, and I'm interested in the second-last.
other fields are useless (now)
I've previously stored the stream name (in this case XYZ_05_AB00C901_T001) in the variable $stream.
I have this line of code:
match=$(ls "$streamPath"/*.zip|grep "$stream"|rev|cut -d'_' -f2|rev|sort|tail -1)
What it does is search the given path for files matching the stream, cut out the timestamp and sort the results. Now that I know the last timestamp for this stream, I can ls again, this time grepping for $stream and $match together, and I'm done:
streamFile=$(ls "$streamPath"/*.zip|grep "$stream.*$match\|$match.*$stream")
Question time:
Is there a better way to achieve my goal? Probably more than one; I'd prefer a one-liner solution, though.
ShellCheck advises me that it would be better to use a for loop or a while loop instead of ls, to be able to handle unusual filenames (which I'm not facing at the moment, but who knows), but I'm not so sure about it (it seems more complicated to me).
Thanks.
O.
Thanks to the page suggested by Cyrus, I chose to go with this solution:
while IFS= read -r -d '' file; do
    echo "$file"|grep "$stream"|rev|cut -d'_' -f2|rev|sort|tail -1
done < <(find "$streamPath" -maxdepth 1 -type f -name '*.zip' -print0)
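Another option, using only a glob and parameter expansion, is sketched below. It assumes the naming scheme from the question (the start timestamp is the second-to-last underscore-separated field and always has the same number of digits); latest_stream_file is a hypothetical name.

```shell
#!/usr/bin/env bash
# latest_stream_file DIR STREAM: print the .zip in DIR that matches STREAM and
# has the greatest second-to-last field (the start timestamp).
latest_stream_file() {
    local dir=$1 stream=$2 file name rest ts best_ts='' best_file=''
    for file in "$dir"/*"$stream"*.zip; do
        [[ -f $file ]] || continue        # an unmatched glob stays literal; skip it
        name=${file##*/}                  # strip the directory part
        rest=${name%_*}                   # drop the last field (end timestamp and .zip)
        ts=${rest##*_}                    # keep what is now the last field
        if [[ $ts > $best_ts ]]; then     # fixed-width digits compare correctly as strings
            best_ts=$ts
            best_file=$file
        fi
    done
    [[ -n $best_file ]] && printf '%s\n' "$best_file"
}
```

This would replace both ls calls with something like streamFile=$(latest_stream_file "$streamPath" "$stream"), and it returns a non-zero status when no file matches.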

What command to search for ID in .bz2 file?

I am new to Linux and I'm trying to look for an ID number within a .bz2 file. It seems like a fairly straightforward requirement, but I cannot find the correct command anywhere online. I believe I need to use bzgrep.
I want to look for '123456' in the file Bulk9876.bz2
How would I construct this command?
You probably just need to tell grep that it's okay to parse that data as text:
bzgrep -a 123456 Bulk9876.bz2
If you're trying to view the compressed data (rather than decompressing it and searching the decompressed data), just use grep -a ….
Otherwise, it might make sense to verify that the desired string is even present in the file; bunzip2 it and grep -a the decompressed file. If that works, the problem is in your bzgrep instance (which is odd because it should be using the same decompression library as bunzip2).
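The decompress-then-grep check can also be done in one pipeline, without writing the decompressed file to disk. This is only a sketch: count_in_bz2 is a made-up helper name, and it assumes the bzip2 command is installed.

```shell
#!/usr/bin/env bash
# count_in_bz2 PATTERN FILE: print how many decompressed lines of FILE
# contain the fixed string PATTERN.
count_in_bz2() {
    local pattern=$1 file=$2
    # -d decompress, -c write to stdout; grep -a treats the data as text,
    # -F matches a literal string, -c prints the number of matching lines
    bzip2 -dc -- "$file" | grep -a -c -F -- "$pattern"
}
```

For the file in the question that would be count_in_bz2 123456 Bulk9876.bz2; a count of 0 means the ID is not in the data at all.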

Matching text files from a list of system numbers

I have ~60K bibliographic records, which can be identified by system number. These records also have full text (individual text files named by system number).
I have lists of system numbers in batches of 5K, and I need to find a way to copy only the text files named in each 5K list.
All text files are stored in a directory (/fulltext) and are named something along these lines:
014776324.txt.
The 5k lists are plain text files stored in separate directories (e.g. /5k_list_1, /5k_list_2, ...), where each system number corresponds to a .txt file.
For example: bibliographic record 014776324 corresponds to 014776324.txt.
I am struggling to find a way to copy only the corresponding text files into each 5k_list_* folder.
Any idea?
Thanks indeed,
Let's assume we invoke the following script this way:
./the-script.sh fulltext 5k_list_1 5k_list_2 [...]
Or more succinctly:
./the-script.sh fulltext 5k_list_*
Then try using this (totally untested) script:
#!/usr/bin/env bash
set -eu # abort on errors and on unset variables
src_dir=$1 # first argument is where to copy files from
shift 1
for list_dir; do # implicitly loops over the remaining args
    # assumes each line of list.txt starts with a system number
    while read -r sys_num rest; do
        cp "$src_dir/$sys_num.txt" "$list_dir/"
    done < "$list_dir/list.txt"
done
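Because of set -e, the script above stops at the first system number that has no matching text file. If you would rather copy what exists and get a report of the rest, a variant could look like the sketch below. The function name copy_listed is hypothetical, and it assumes one system number per line in the list file.

```shell
#!/usr/bin/env bash
# copy_listed SRC_DIR LIST_FILE DEST_DIR: copy SRC_DIR/<number>.txt into DEST_DIR
# for every system number in LIST_FILE; report missing files on stderr.
# Returns non-zero if at least one file was missing.
copy_listed() {
    local src_dir=$1 list_file=$2 dest_dir=$3 sys_num missing=0
    while IFS= read -r sys_num || [[ -n $sys_num ]]; do
        sys_num=${sys_num//[[:space:]]/}     # drop stray spaces and carriage returns
        [[ -n $sys_num ]] || continue        # ignore blank lines
        if [[ -f $src_dir/$sys_num.txt ]]; then
            cp -- "$src_dir/$sys_num.txt" "$dest_dir/"
        else
            printf 'missing: %s\n' "$sys_num" >&2
            missing=$((missing + 1))
        fi
    done < "$list_file"
    (( missing == 0 ))
}
```

Called as copy_listed fulltext 5k_list_1/list.txt 5k_list_1, it copies every file it can find and prints one "missing:" line per system number without a text file.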
