Set file modification time from the date string present in the filename - linux

I'm restoring a number of archives with dates within their names, something along the lines of:
user-2018.12.20.tar.xz
user-2019.01.10.tar.xz
user-2019.02.25.tar.xz
user-2019.04.19.tar.xz
...
I want to set each file's modification date to match the date in its filename by piping the filenames to touch via xargs and using replace-str to substitute the dates.
touch -m -t takes a datetime in the form [[CC]YY]MMDDhhmm, but I'm having trouble substituting inline:
find . -name "*.xz" | xargs -I {} touch -m -t $(sed -e 's/\.tar\.xz//g; s/user-//g; s/\.//g; s/\///g; s/$/0000/g' {}) {}
Returns touch: invalid date format ‘./user-2018.03.22.tar.xz’, even though this:
find . -name "*.xz" | sed -e 's/\.tar\.xz//g; s/user-//g; s/\.//g; s/\///g; s/$/0000/g'
Returns properly-formatted dates, for example 201812200000. Am I misusing command substitution in my replace string somehow?
EDIT: Yes, a simple script could do this, no problem. But the question remains...

You don't need find, sed, xargs, or any third-party tools; just use the shell's built-in regex support to extract the timestamp from the filename:
for file in *.tar.xz; do
    [ -f "$file" ] || continue
    if [[ $file =~ ^user-([[:digit:]]+)\.([[:digit:]]+)\.([[:digit:]]+)\.tar\.xz$ ]]; then
        dateStr="${BASH_REMATCH[1]}${BASH_REMATCH[2]}${BASH_REMATCH[3]}0000"
        touch -m -t "$dateStr" "$file"
    fi
done
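To verify the result afterwards, something like this should do (assuming GNU stat):
stat -c '%y  %n' user-*.tar.xz   # print each archive's mtime and name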

The problem is that the command substitution is evaluated once, when you call xargs, not once per argument. You would need to spawn a shell for that:
find . -name "*.xz" \
| xargs -I {} bash -c 'touch -m --date "$(sed -e "s/\.tar\.xz//;s/user-//g; s/\.//g; s/\///g;" <<< "$1")" "$1"' -- {}
Note: xargs is not needed because you can use the -exec option of find:
find . -name "*.xz" -exec bash -c 'touch -m --date "$(sed -e "s/\.tar\.xz//;s/user-//g; s/\.//g; s/\///g;" <<< "$1")" "$1"' -- {} \;
PS: A small for loop would be more readable:
for file in user-*.tar.xz ; do
    # remove prefix and suffix
    date=${file#user-}
    date=${date%.tar.xz}
    # replace dots by /
    date=${date//./\/}
    touch -m --date "${date}" "${file}"
done
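As a one-off sanity check that GNU date accepts the slash-separated form the loop produces (assuming GNU coreutils):
date -d "2018/12/20" +%Y%m%d   # should print 20181220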

This might work for you (GNU parallel):
parallel --dryrun touch -m --date '{= s/[^0-9]//g =}' {} ::: *.xz
When you're happy that the commands are correct, remove the --dryrun option.
Alternative:
parallel touch -m --date '{= s/user-//;s/\.tar\.xz//;s/\.//g =}' {} ::: *.xz
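For the filenames above, either variant's dry run should print commands along these lines:
touch -m --date 20181220 user-2018.12.20.tar.xz
touch -m --date 20190110 user-2019.01.10.tar.xz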

Related

Perl Script to Grep Directory For String and Print

I would like to create a Perl or bash script that will read keyboard input into a variable, perform a fixed-string grep recursively within the current directory (which is full of Snort logs), then automatically run tcpdump on the matched files, grep its output, and print the specified lines to the terminal. Does anyone have a good idea of how this should work?
Here is an example of the methodology I want from the script:
step 1: Read keyboard input and assign it to a variable named string.
step 2 command: grep -Fr "$string"
step 2 output: snort.log.1470609906 matches
step 3 command: tcpdump -r snort.log.1470609906 | grep -F "$string" -C10
step 3 output:
Snort log
Here's some bash code that does that:
s="google.com"
grep -Frl "$s" | \
while IFS= read -r x; do
tcpdump -r "$x" | grep -F "$s" -C10
done
I don't know about Perl, but you can do it easily enough in plain shell:
str="google.com"
find . -type f -name 'snort.log.*' -exec grep -FlZ "$str" {} + |
xargs -0 -I {} sh -c 'tcpdump -r "$1" | grep -F "$2" -C10' sh {} "$str"
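Neither snippet covers step 1, reading the keyboard input. A minimal interactive sketch tying all three steps together (the prompt text is illustrative; GNU grep and tcpdump are assumed to be on the PATH):
#!/bin/bash
# step 1: read the search string from the keyboard
read -r -p "Search string: " string
# steps 2 and 3: find matching Snort logs, then dump and filter each one
grep -Frl "$string" . | while IFS= read -r log; do
    echo "== $log =="
    tcpdump -r "$log" | grep -F -C10 "$string"
done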

In a bash script, how to check whether a file is a Perl script?

I have a bunch of files under a directory. How can I check all of them and determine whether each one is a Perl script or not? (They don't have .pl in the filename.)
If you cannot rely on there being a valid shebang either, you might pass them to perl -c.
for f in *; do
    perl -c "$f" 2>/dev/null && echo "$f is Perl"
done
If you want properly machine-readable output, maybe switch the echo to printf '%s\0' "$f" so you can pass it to xargs -0 and friends.
The obvious flaw with this is that a Perl script with an error in it will be reported as not being (valid) Perl.
Check the shebang
head -n 1 script | grep perl
Normally, most command-line scripts contain a shebang, i.e. something like
#!/usr/bin/perl
A shebang isn't required if you are calling the script explicitly, like this:
perl script
but if you want to run the script as a command in its own right, the shebang is what makes that work.
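To apply that check across a whole directory, a small loop will do (only the first line of each file is inspected):
for f in *; do
    [ -f "$f" ] || continue
    head -n 1 "$f" | grep -q '^#!.*perl' && echo "$f looks like Perl"
done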
find ./ -type f -exec egrep -I -l '^use strict;|^use warnings;|^sub |my \$|my \%|my \@|\->{' {} + 2>&1 \
| egrep -v 'README|\.git|\.zsh$|\.sh$' \
| xargs file | grep 'ASCII' \
| awk '{print $1}' \
| sed 's/:$//'
Not perfect, but this will find most files containing relatively modern Perl 5 code.
Since they do not have the extension, try this:
find /path/to/directory/ -type f | while IFS= read -r line; do
    if file -b "$line" | grep -qi perl; then
        echo "$line is a perl file"
    fi
done

How do I search for a file based on what is output by a command running on that file

I am working on a project for one of my professors, and he asked me to sort a couple hundred .fits images based on their headers (specifically, which star they are images of). I think grep would be the best way to do this; however, I can't seem to figure out how to run grep against the headers.
I am entering:
ls | imhead *.fits | grep -E -r "PG\ 1104+243" *
to just list them out for now, once they are listed I know how to copy them into a directory.
I am new to using grep, so I am unsure where my error lies. Any help would be greatly appreciated! Thanks!
Assuming that imhead extracts the headers of the .fits files as text, you can use a simple shell script to do it:
script.sh
#!/bin/bash
grep "$1" "$2" > /dev/null 2>&1 && echo "$2"
Note that + is a special character if you use extended regular expressions, i.e. if you pass -E as in the question. A plain grep without any options does the trick here, because in a basic regular expression + matches literally.
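To illustrate the difference (header.txt stands in for an extracted header file):
grep -E 'PG 1104\+243' header.txt   # ERE: the + must be escaped
grep -F 'PG 1104+243' header.txt    # fixed string: no escaping needed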
Use find to exec the script on every *.fits file in the current folder:
find -maxdepth 1 -name '*.fits' -exec ./script.sh 'PG 1104+243' {} \;
If you are going to copy/move/alter or do something with the files you find, you might be better off, in terms of complexity and ease of quoting, using a loop like this:
#!/bin/bash
find . -name '*.fits' -print0 | while IFS= read -d '' -r file; do
    echo "Checking file: $file"
    if imhead "$file" | grep -q 'PG 1104+243'; then
        echo "Object matches: $file"
    fi
done

Linux Shell - Replacing string with other string inside files

I have a problem with this linux shell script.
#! /bin/bash
find /sdcard/ -type f -iname "*.srt" -print >> /sdcard/files
count=`wc -l /sdcard/files |cut -d'/' -f1`
for (( c=1; c<=$count; c++ ))
do
line=`sed -n ''$c'p' /sdcard/files`
cat "$line" | sed -e 's/č/c/g' > "$line".srt""
rm "$line"
done
rm /sdcard/files
I know this isn't the best way to do it, but that's all I can do with my knowledge.
As you can see, it finds all srt files and then replaces all "č" characters with "c". But it doesn't work with the files I downloaded.
However, when I make a new file and write "č" inside (with my keyboard), it replaces it as it should. I don't understand why.
I think we discovered the cause, now the fix (this vim one-liner rewrites the file as UTF-8, with a BOM):
vim somefile.srt -c ":set bomb" -c ":set fileencoding=utf-8" -c ":wq"
There's also a dirty way (appending a UTF-8 no-break space, which nudges tools into detecting the file as UTF-8):
echo -e "\xC2\xA0" >> somefile.srt
I tried the iconv tool, which is supposed to do the conversion, but it didn't help.
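The underlying issue is almost certainly that the downloaded files use a legacy single-byte encoding: in UTF-8, č is the two bytes 0xC4 0x8D, while in ISO-8859-2 or Windows-1250 it is a single byte, so a sed pattern typed in a UTF-8 terminal never matches. A quick way to check (assuming the file utility is available; the filenames are placeholders):
file -bi downloaded.srt   # e.g. text/plain; charset=iso-8859-2
file -bi typed.srt        # e.g. text/plain; charset=utf-8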

iconv any encoding to UTF-8

I am trying to point iconv at a directory and have every file converted to UTF-8, regardless of its current encoding.
I am using this script, but you have to specify what encoding you are converting FROM. How can I make it autodetect the current encoding?
dir_iconv.sh
#!/bin/bash
ICONVBIN='/usr/bin/iconv' # path to iconv binary
if [ $# -lt 3 ]; then
    echo "$0 dir from_charset to_charset"
    exit 1
fi
for f in "$1"/*; do
    if test -f "$f"; then
        echo -e "\nConverting $f"
        /bin/mv "$f" "$f.old"
        "$ICONVBIN" -f "$2" -t "$3" "$f.old" > "$f"
    else
        echo -e "\nSkipping $f - not a regular file"
    fi
done
Run it from the terminal:
sudo convert/dir_iconv.sh convert/books CURRENT_ENCODING utf8
Maybe you are looking for enca:
Enca is an Extremely Naive Charset Analyser. It detects character set and encoding of text files and can also convert them to other encodings using either a built-in converter or external libraries and tools like libiconv, librecode, or cstocs.
Currently it supports Belarusian, Bulgarian, Croatian, Czech, Estonian, Hungarian, Latvian, Lithuanian, Polish, Russian, Slovak, Slovene, Ukrainian, Chinese, and some multibyte encodings independently of language.
Note that in general, autodetection of current encoding is a difficult process (the same byte sequence can be correct text in multiple encodings). enca uses heuristics based on the language you tell it to detect (to limit the number of encodings). You can use enconv to convert text files to a single encoding.
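For example, to detect and then convert in place (assuming Czech-language files; -x is enca's convert-to option):
enca -L czech file.txt            # detect only
enca -L czech -x UTF-8 file.txt   # convert the file to UTF-8 in place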
You can get what you need using the standard GNU utilities file and awk. Example:
file -bi .xsession-errors
gives me:
"text/plain; charset=us-ascii"
so file -bi .xsession-errors | awk -F "=" '{print $2}'
gives me
"us-ascii"
I use it in scripts like so:
CHARSET="$(file -bi "$i"|awk -F "=" '{print $2}')"
if [ "$CHARSET" != utf-8 ]; then
iconv -f "$CHARSET" -t utf8 "$i" -o outfile
fi
Combining all of the above: go to the directory and create dir2utf8.sh:
#!/bin/bash
# converting all files in a dir to utf8
for f in *; do
    if test -f "$f"; then
        echo -e "\nConverting $f"
        CHARSET="$(file -bi "$f" | awk -F "=" '{print $2}')"
        if [ "$CHARSET" != utf-8 ]; then
            # write to a temp file first: iconv -o with the same path for
            # input and output would truncate the file before reading it
            iconv -f "$CHARSET" -t utf8 "$f" -o "$f.tmp" && mv "$f.tmp" "$f"
        fi
    else
        echo -e "\nSkipping $f - not a regular file"
    fi
done
Here is my solution for converting all files in place, using recode and uchardet:
#!/bin/bash
apt-get -y install recode uchardet > /dev/null
find "$1" -type f | while IFS= read -r FFN
do
    encoding=$(uchardet "$FFN")
    echo "$FFN: $encoding"
    # map x-mac-* names from uchardet to the names recode understands
    enc=$(echo "$encoding" | sed 's#^x-mac-#mac#')
    recode "$enc..UTF-8" "$FFN"
done
Put it into convert-dir-to-utf8.sh and run:
bash convert-dir-to-utf8.sh /path/to/my/trash/dir
Note that sed is a workaround for mac encodings here.
Many uncommon encodings need workarounds like this.
First answer
#!/bin/bash
find "<YOUR_FOLDER_PATH>" -name '*' -type f -exec grep -Iq . {} \; -print0 |
while IFS= read -r -d $'\0' LINE_FILE; do
    CHARSET=$(uchardet "$LINE_FILE")
    echo "Converting ($CHARSET) $LINE_FILE"
    # NOTE: Convert/reconvert to utf8. By Questor
    iconv -f "$CHARSET" -t utf8 "$LINE_FILE" -o "$LINE_FILE"
    # NOTE: Remove the BOM if present, as it is unnecessary. By Questor
    # [Refs.: https://stackoverflow.com/a/2223926/3223785 ,
    # https://stackoverflow.com/a/45240995/3223785 ]
    sed -i '1s/^\xEF\xBB\xBF//' "$LINE_FILE"
done
# [Refs.: https://justrocketscience.com/post/handle-encodings ,
# https://stackoverflow.com/a/9612232/3223785 ,
# https://stackoverflow.com/a/13659891/3223785 ]
FURTHER QUESTION: I do not know if my approach is the safest. I say this because I noticed that some files were not converted correctly (characters were lost) or were "truncated". I suspect this has to do with the iconv tool or with the charset information obtained with the uchardet tool. I was curious about the solution presented by @demofly because it could be safer.
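One likely culprit for the truncation is that iconv -o with the same path for input and output truncates the file before it has been fully read. A minimal, safer variation (my own sketch, not from the original answer; the filename is a placeholder) writes to a temporary file and replaces the original only on success:
f="some_file.txt"                  # placeholder path
charset=$(uchardet "$f")
tmp=$(mktemp) || exit 1
if iconv -f "$charset" -t UTF-8 "$f" > "$tmp"; then
    mv -- "$tmp" "$f"
else
    echo "conversion failed for $f, original kept" >&2
    rm -f -- "$tmp"
fi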
Another answer
Based on @demofly's answer:
#!/bin/bash
find "<YOUR_FOLDER_PATH>" -name '*' -type f -exec grep -Iq . {} \; -print0 |
while IFS= read -r -d $'\0' LINE_FILE; do
    CHARSET=$(uchardet "$LINE_FILE")
    REENCSED=$(echo "$CHARSET" | sed 's#^x-mac-#mac#')
    echo "\"$CHARSET\" \"$LINE_FILE\""
    # NOTE: Convert/reconvert to utf8. By Questor
    recode "$REENCSED..UTF-8" "$LINE_FILE" 2> STDERR_OP 1> STDOUT_OP
    STDERR_OP=$(cat STDERR_OP)
    rm -f STDERR_OP
    if [ -n "$STDERR_OP" ] ; then
        # NOTE: If recode failed, retry the conversion with iconv. By Questor
        iconv -f "$CHARSET" -t utf8 "$LINE_FILE" -o "$LINE_FILE" 2> STDERR_OP 1> STDOUT_OP
        STDERR_OP=$(cat STDERR_OP)
        rm -f STDERR_OP
    fi
    # NOTE: Remove the BOM if present, as it is unnecessary. By Questor
    # [Refs.: https://stackoverflow.com/a/2223926/3223785 ,
    # https://stackoverflow.com/a/45240995/3223785 ]
    sed -i '1s/^\xEF\xBB\xBF//' "$LINE_FILE"
    if [ -n "$STDERR_OP" ] ; then
        echo "ERROR: \"$STDERR_OP\""
    fi
    STDOUT_OP=$(cat STDOUT_OP)
    rm -f STDOUT_OP
    if [ -n "$STDOUT_OP" ] ; then
        echo "RESULT: \"$STDOUT_OP\""
    fi
done
# [Refs.: https://justrocketscience.com/post/handle-encodings ,
# https://stackoverflow.com/a/9612232/3223785 ,
# https://stackoverflow.com/a/13659891/3223785 ]
Third answer
Hybrid solution with recode and vim:
#!/bin/bash
find "<YOUR_FOLDER_PATH>" -name '*' -type f -exec grep -Iq . {} \; -print0 |
while IFS= read -r -d $'\0' LINE_FILE; do
    CHARSET=$(uchardet "$LINE_FILE")
    REENCSED=$(echo "$CHARSET" | sed 's#^x-mac-#mac#')
    echo "\"$CHARSET\" \"$LINE_FILE\""
    # NOTE: Convert/reconvert to utf8. By Questor
    recode "$REENCSED..UTF-8" "$LINE_FILE" 2> STDERR_OP 1> STDOUT_OP
    STDERR_OP=$(cat STDERR_OP)
    rm -f STDERR_OP
    if [ -n "$STDERR_OP" ] ; then
        # NOTE: If recode failed, re-encode the file with vim instead. By Questor
        bash -c "</dev/tty vim -u NONE +\"set binary | set noeol | set nobomb | set encoding=utf-8 | set fileencoding=utf-8 | wq\" \"$LINE_FILE\""
    else
        # NOTE: Remove the BOM if present, as it is unnecessary. By Questor
        # [Refs.: https://stackoverflow.com/a/2223926/3223785 ,
        # https://stackoverflow.com/a/45240995/3223785 ]
        sed -i '1s/^\xEF\xBB\xBF//' "$LINE_FILE"
    fi
done
This was the solution with the highest number of perfect conversions. Additionally, we did not have any truncated files.
WARNING: Make a backup of your files and use a merge tool to check/compare the changes. Problems will probably appear!
TIP: The command sed -i '1s/^\xEF\xBB\xBF//' "$LINE_FILE" can be executed after a preliminary comparison with the merge tool after a conversion without it since it can cause "differences".
NOTE: The search using find brings in all non-binary files from the given path ("<YOUR_FOLDER_PATH>") and its subfolders.
Check out the tools available for data conversion on a Linux CLI: https://www.debian.org/doc/manuals/debian-reference/ch11.en.html
Also, you can list all the encodings available in iconv by running iconv --list; note that the encoding names differ from the names returned by the uchardet tool (for example: x-mac-cyrillic in uchardet vs. mac-cyrillic in iconv).
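A small mapping step bridges that naming gap, in the same spirit as the sed 's#^x-mac-#mac#' workaround used in the scripts above (the case table is illustrative; extend it as you hit other mismatches):
enc=$(uchardet "$f")
case $enc in
    x-mac-*) enc="mac-${enc#x-mac-}" ;;   # x-mac-cyrillic -> mac-cyrillic
    # add further mappings here as needed
esac
iconv -f "$enc" -t UTF-8 "$f" > "$f.utf8"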
The enca command doesn't work for my Simplified Chinese text file in GB2312 encoding.
Instead, I use the following function to convert the text file for me.
You can of course redirect the output into a file.
It requires the chardet and iconv commands.
detection_cat ()
{
    DET_OUT=$(chardet "$1")
    ENC=$(echo "$DET_OUT" | sed "s|^.*: \(.*\) (confid.*$|\1|")
    iconv -f "$ENC" "$1"
}
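Usage is then simply (the filename is a placeholder):
detection_cat my_gb2312_file.txt > my_gb2312_file.utf8.txt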
Use iconv and uchardet together (thanks @farseerfc).
fish shell
cat your_file | iconv -f (uchardet your_file) -t UTF-8
bash shell
cat your_file | iconv -f $(uchardet your_file) -t UTF-8
And as a bash script:
#!/usr/bin/bash
# convert each file given as an argument, writing the result to stdout
for fn in "$@"; do
    iconv < "$fn" -f "$(uchardet "$fn")" -t utf8
done
By @flowinglight at the Ubuntu group.
