Rename multiple files with a random numeric extension after one specific alphanumeric word in Linux

I have a folder/subfolders that contain some files with filenames that end with a random numeric extension:
DWH..AUFTRAG.20211123115143.A901.3801176
DWH..AUFTRAGSPOSITION.20211122002147.A901.3798013
I would like to remove everything after A901 from the above filenames.
For example:
DWH..AUFTRAG.20211123115143.A901 (the trailing .3801176 removed)
DWH..AUFTRAGSPOSITION.20211122002147.A901 (the trailing .3798013 removed)
How can I use rename or any other Linux command to remove everything after A901 while keeping the rest of the filename as it is?

I can see there are five '.' (dots) before the trailing number, so I put together a quick improvised hack.
I created some files in a folder, plus a subfolder with more files inside it, following the name pattern that you gave.
I came up with a command that looks like this:
find "$PWD"|grep A901|while read F; do mv "${F}" `echo ${F}|cut -d . -f 1-5`;done
When executed, it worked for me.
Terminal output below:
rexter@rexter:~/Desktop/test$ find $PWD
/home/rexter/Desktop/test
/home/rexter/Desktop/test/test1
/home/rexter/Desktop/test/test1/DWH..AUFTRAG.20211123115143.A901.43214
/home/rexter/Desktop/test/test1/DWH..AUFTRAGSPOSITION.2021112200fsd2147.A901.31244324
/home/rexter/Desktop/test/DWH..AUFTRAG.20211123115143.A901.321423
/home/rexter/Desktop/test/DWH..AUFTRAGSPOSITION.20211122002147.A901.3124325
rexter#rexter:~/Desktop/test$ find "$PWD"|grep A901|while read F; do mv "${F}" `echo ${F}|cut -d . -f 1-5`;done
rexter@rexter:~/Desktop/test$ find $PWD
/home/rexter/Desktop/test
/home/rexter/Desktop/test/test1
/home/rexter/Desktop/test/test1/DWH..AUFTRAG.20211123115143.A901
/home/rexter/Desktop/test/test1/DWH..AUFTRAGSPOSITION.2021112200fsd2147.A901
/home/rexter/Desktop/test/DWH..AUFTRAG.20211123115143.A901
/home/rexter/Desktop/test/DWH..AUFTRAGSPOSITION.20211122002147.A901
rexter@rexter:~/Desktop/test$
I don't know if this is the proper way to do it, but it gets the job done.
Let me know if it is useful to you.
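If the quick pipeline above feels fragile (it breaks on paths containing spaces and assumes exactly five dots before the number), here is a more defensive sketch. The .A901. marker comes from the question; the rest (the find options, the echo dry run, mv -n) is just my assumption about how you might want to run it:
# Dry run: prints the planned renames; delete the "echo" to actually rename.
find . -depth -type f -name '*.A901.*' -exec sh -c '
  for f do
    dir=${f%/*}                      # directory part of the path
    base=${f##*/}                    # file name part
    echo mv -n -- "$f" "$dir/${base%%.A901.*}.A901"   # drop everything after .A901
  done
' sh {} +
If the Perl rename utility happens to be installed (it is packaged as rename or prename depending on the distribution), something like rename -n 's/\.A901\..*$/.A901/' applied to the found files should achieve the same; the -n flag keeps that a dry run as well.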

Related

How to search multiple DOCX files for a string within a Word field?

Is there any Windows app that will search for a string of text within fields in a Word (DOCX) document? Apps like Agent Ransack and its big brother FileLocator Pro can find strings in the Word docs but seem incapable of searching within fields.
For example, I would like to be able to find all occurrences of the string "getProposalTranslations" within a collection of Word documents that have fields with syntax like this:
{ AUTOTEXTLIST \t "<wr:out select='$.shared_quote_info' datasource='getProposalTranslations'/>" }
Note that string doesn't appear within the text of the document itself but rather only within a field. Essentially the DOCX file is just a zip file, I believe, so if there's a tool that can grep within archives, that might work. Note also that I need to be able to search across hundreds or perhaps thousands of files in many directories, so unzipping the files one by one isn't feasible. I haven't found anything on my own and thought I'd ask here. Thanks in advance.
This script should accomplish what you are trying to do. Let me know if that isn't the case. I don't usually write entire scripts because it can hurt the learning process, so I have commented each command so that you might learn from it.
#!/bin/sh
# Create ~/tmp/WORDXML folder if it doesn't exist already
mkdir -p ~/tmp/WORDXML
# Change directory to ~/tmp/WORDXML
cd ~/tmp/WORDXML
# Iterate through each file passed to this script
# Iterate through each file passed to this script
for FILE in "$@"; do
{
    # Resolve a relative path against the directory the script was started from,
    # since we have already changed into ~/tmp/WORDXML
    case "$FILE" in /*) ;; *) FILE="$OLDPWD/$FILE" ;; esac
    # unzip it into ~/tmp/WORDXML
    # > /dev/null 2>&1 discards all output to the terminal
    unzip "$FILE" > /dev/null 2>&1
    # find all of the xml files
    find . -type f -name '*.xml' | \
    # open them in xmllint to make them pretty. Discard errors.
    xargs xmllint --recover --format 2> /dev/null | \
    # search for and report if found
    grep 'getProposalTranslations' && echo " ^-- found in file '$FILE'"
    # remove the temporary contents
    rm -rf ~/tmp/WORDXML/*
}; done
# remove the temporary folder
rm -rf ~/tmp/WORDXML
Save the script wherever you like. Name it whatever you like. I'll name it docxfind. Make it executable by running chmod +x docxfind. Then you can run the script like this (assuming your terminal is running in the same directory): ./docxfind filenames...
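If you just need to know which files contain the string, without the pretty-printed XML context, a simpler sketch can stream each archive's XML parts straight into grep without unpacking anything to disk. This assumes standard unzip and GNU find/grep, and the /path/to/docs search root is only a placeholder:
# Print the name of every .docx whose internal XML mentions the string.
find /path/to/docs -type f -name '*.docx' -print0 |
while IFS= read -r -d '' f; do
    if unzip -p "$f" '*.xml' 2>/dev/null | grep -q 'getProposalTranslations'; then
        printf 'found in: %s\n' "$f"
    fi
done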

Deleting all files in a directory except the ones mentioned in a list [duplicate]

This question already has answers here:
Shell script: How to delete all files in a directory except ones listed in a file?
(2 answers)
Closed 2 years ago.
I have a directory called a00 containing 3000 files with extension .SAC. I have a text file called gd.list containing names of 88 of those 3000 files. I am trying to write a code that will delete all .SAC files except those mentioned in gd.list
How to do that using shell/bash?
The rm command is commented out so that you can check and verify that it's working as needed. Then just un-comment that line.
The check directory section will ensure you don't accidentally run the script from the wrong directory and clobber the wrong files.
You can remove the echo deleting line to run silently.
#!/bin/bash
cd /home/me/myfolder2tocleanup/
# Exit if the directory isn't found.
if (($?>0)); then
echo "Can't find work dir... exiting"
exit 1
fi
for i in *; do
if ! grep -qxFe "$i" filelist.txt; then
echo "Deleting: $i"
# the next line is commented out. Test it. Then uncomment it to remove the files
# rm "$i"
fi
done
You can find the answer here https://askubuntu.com/questions/830776/remove-file-but-exclude-all-files-in-a-list by L. D. James
There are a few alternatives.
I'd prefer a null-delimited pipeline (find -print0, grep -z, xargs -0), as it more clearly demarcates the file names:
find . -maxdepth 1 -name '*.sac' -printf '%f\0' | grep -x -z -v -F -f gd.list | xargs -0 echo rm
Note the -v: only the names not listed in gd.list are passed on to rm, and -printf '%f\0' prints bare file names so they can match the gd.list entries exactly. Again, test this first (the echo keeps rm from actually running); perhaps sort the output and check it against gd.list before removing the echo.
For a smaller list of filenames I would recommend just using find with -and -not -name and -delete, but with a larger list that can be tricky.
You could tag the files you want to keep as read-only, then delete the wildcard with the appropriate setting in rm or find to skip read-only files. That assumes you own the read-only flag. You could tag the files as executable, and use find, if the read-only flag is not for you.
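A rough sketch of that permission trick, assuming GNU find/xargs, plain space-free file names in gd.list, and that it is run from inside a00 (all of that is my assumption, not part of the question):
xargs -a gd.list chmod a-w                            # write-protect the 88 files to keep
find . -maxdepth 1 -name '*.SAC' -writable -delete    # delete only the still-writable ones
xargs -a gd.list chmod u+w                            # give yourself write permission back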
Another option would be to move the matching files to a temp folder, delete the wildcard, then move the files you want to keep back. That is assuming you can afford for the files to disappear temporarily.
To make them disappear for a shorter time, move the kept files out to a temp directory, move the original directory out of the way, move the temp directory into its place, then delete the moved-out directory.
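And a sketch of that move-aside variant, with the a00 and gd.list names taken from the question and everything assumed to run from a00's parent directory:
mkdir a00.keep
while IFS= read -r name; do          # assumes one plain file name per line in gd.list
    mv "a00/$name" a00.keep/
done < gd.list
rm a00/*.SAC                         # everything left in a00 is unwanted
mv a00.keep/* a00/                   # bring the kept files back
rmdir a00.keep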
If you are feeling brave, try something like
ls *.sac | fgrep -v -f gd.list | xargs echo rm
Note that I've put an echo in that xargs, just to make sure no one has a cut and paste accident.
Note also the limitations of this approach mentioned in the comments. As I said, if you are feeling brave...

find returning inverted results

In a few words: I wrote this little script to clean up some directories where I had consolidated directories/files from multiple sources. I had used the cp command with the --backup=numbered feature so that files with identical names would get a suffix like .~1~ appended instead of being overwritten. I then ran fdupes to remove duplicate files; in some cases fdupes removed the file which did not have the suffix appended by cp (the original file). So I wanted to scan the directories looking for files with the suffix appended by cp and, if the same name does not already exist with the suffix removed, rename (mv) the file; otherwise leave it alone, to avoid deleting anything that fdupes did not consider a duplicate.
The issue is that the test condition if [ -f ... ] in the code below returns results inverted from what it should, and I cannot understand why. For example, when the file exists it returns false, and when the file does not exist it returns true. I fixed it by reversing the actions based on the inverted return code, verified it was working as intended, and ran it that way, but I would like to know if anyone understands why it behaves this way. I am not a bash script expert by any means, so it's possible that I missed something simple.
#!/bin/bash
logfile=$$.log
exec > $logfile 2>&1
IFS='
'
#set -f
for FILE in $(find . -type f -regextype posix-extended -regex '^.*(\.~[0-9]+~)+$')
do
FILE2=${FILE%%.~[0-9]*} # remove the suffix
if [ -f "${FILE2}" ]
then
echo ERROR: "${FILE2}" already exists!
else
echo "${FILE}" renamed "${FILE2}"
mv "${FILE}" "${FILE2}"
fi
done
You might be able to see the problem by modifying your script to show both FILE and FILE2 in the error message. There are a few minor problems with the script which could cause some confusion (but not the "inverted" logic):
find output is not sorted. If you had more than one backup file, a randomly chosen one would replace the original file;
you could sort the output using an expression like |sort -t~ -n -k2 on the end of the find-command.
the regular expression allows multiple matches of the ~[0-9]~ pattern. Conceivably you could have some odd file which ends with ~1~~2~.
the part where the suffix is removed assumes a single ~[0-9]~ is on the end of the filename. An embedded ~0, e.g., foo~0bar~1~ would reduce FILE to foo. The workaround for that would be more cumbersome (since the suffix-stripping uses globbing), but could be done with a case statement which matched an explicit number of digits (likely three digits would be enough).
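For what it's worth, a sketch of that case-statement idea (how many digits to allow is a guess on my part):
case $FILE in
    *.~[0-9]~ | *.~[0-9][0-9]~ | *.~[0-9][0-9][0-9]~)
        FILE2=${FILE%.~*~}    # "%" (shortest match) strips only the final .~N~
        ;;
    *)
        FILE2=$FILE           # no recognised backup suffix; leave the name alone
        ;;
esac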

How to make this (l)unix script dynamically accept directory name in for-loop?

I am teaching myself more (l)unix skills and wanted to see if I could begin to write a program that will eventually read all .gz files and expand them. However, I want it to be super dynamic.
#!/bin/bash
dir=~/derp/herp/path/goes/here
for file in $(find "$dir" -name '*gz')
do
echo $file
done
So when I execute this file, I simply run
bash derp.sh
I don't like this. I feel the script is too brittle.
How can I rework my for loop so that I can say
bash derp.sh ~/derp/herp/path/goes/here (1)
I tried re-coding it as follows:
for file in $*
However, I don't want to have to type
bash derp.sh ~/derp/herp/path/goes/here/*.gz
How could I rewrite this so I could simply type what is in (1)? I feel I must be missing something simple?
Note
I tried
for file in $*/*.gz and that obviously did not work. I appreciate your assistance; my sources have been a Wrox Unix text, carpentry v5, and man pages. Unfortunately, I haven't found anything that does what I want.
Thanks,
GeekyOmega
for dir in "$#"
do
for file in "$dir"/*.gz
do
echo $file
done
done
Notes:
In the outer loop, dir is assigned successively to each argument given on the command line. The special form "$@" is used so that directory names that contain spaces will be processed correctly.
The inner loop runs over each .gz file in the given directory. By placing $dir in double-quotes, the loop will work correctly even if the directory name contains spaces. This form will also work correctly if the gz file names have spaces.
#!/bin/bash
for file in $(find "$@" -name '*.gz')
do
    echo "$file"
done
You'll probably prefer "$#" instead of $*; if you were to have spaces in filenames, like with a directory named My Documents and a directory named Music, $* would effectively expand into:
find My Documents Music -name '*.gz'
where "$#" would expand into:
find "My Documents" "Music" -name '*.gz'
Requisite note: Using for file in $(find ...) is generally regarded as a bad practice, because it does tend to break if you have spaces or newlines in your directory structure. Using nested for loops (as in John's answer) is often a better idea, or using find -print0 and read as in this answer.
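For completeness, a small sketch of that find -print0 / read combination (bash-specific; the loop body is only a placeholder):
#!/bin/bash
find "$@" -name '*.gz' -print0 |
while IFS= read -r -d '' file; do
    echo "$file"    # replace with gunzip "$file" or whatever processing you need
done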

copy multiple files from directory tree to new different tree; bash script

I want to write a script that does a specific thing:
I have a txt file e.g.
from1/from2/from3/apple.file;/to1/to2/to3;some not important stuff
from1/from2/banana.file;/to1/to5;some not important stuff
from1/from10/plum.file;/to1//to5/to100;some not important stuff
Now I want to copy the file named on each line (e.g. apple.file) from its original directory tree to the new, not-yet-existing directory given after the first semicolon (;).
I tried a few code examples from similar questions, but nothing worked, and I'm too weak in bash scripting to find the errors.
Please help :)
I need to add some conditions:
the file not only needs to be copied, but also renamed. Example lines in file.txt:
from1/from2/from3/apple.file;to1/to2/to3/juice.file;some1
from1/from2/banana.file;to1/to5/fresh.file;something different from above
so apple.file needs to be copied and renamed to juice.file, ending up at to1/to2/to3/juice.file.
I think that cp will also rename the file, but
mkdir -p "$to"
from the answer below will create the full folder path with juice.file as a folder.
In addition, after the second semicolon in each line there will be something different, so how do I cut that off?
Thanks for all the help.
EDIT: There will be no spaces in input txt file.
Try this code:
while IFS=';' read -r from to some_not_important_stuff
do
    to=${to# }            # strip a leading space, if there is one
    mkdir -p "$to"        # create the 'to' directory if it doesn't exist yet
    cp -i "$from" "$to"   # option -i to get a warning when it would overwrite something
done < file
Using awk
(run the awk command first and confirm the output is fine, then add |sh to do the copy)
awk -F";" '{printf "cp %s %s\n",$1,$2}' file |sh
Using the shell (updated to create the destination folder manually, based on alfe's answer; this assumes the second field is the full target path including the new file name, as in the edited example):
while IFS=';' read -r from to x
do
    mkdir -p "$(dirname "$to")"   # create the parent folders of the target path
    cp "$from" "$to"              # copies and renames in one step
done < file
I had this same problem and used tar to solve it!
tmpfile=/tmp/myfile.tar
files="/some/folder/file1.txt /some/other/folder/file2.txt"
targetfolder=/home/you/somefolder
tar --create --file="$tmpfile" $files   # $files deliberately unquoted so the list splits into separate arguments (no spaces in the paths)
tar --extract --file="$tmpfile" --directory="$targetfolder"
In this case, tar will automatically create all (sub)folders for you! Best,
Nabi
