How to delete the different extensions for many files? - rename

Now I have many files, but with messy names. For example:
pre_day_19861106.txt0911_Vw7%2FzJSe5KwHIE9EOxbaJaO1e6c%3D
pre_day_20101222.txt1501_zOPHyWs3aIC7Z54yHL0ts%2BiX9bo%3D
pre_day_19861107.txt0911_EdGtVc1aaXGAs747hzPCVCt7wU0%3D
pre_day_20101223.txt1501_bbCw6m7LvaUbZ5bGzAFNev0%2BGhw%3D
pre_day_19861108.txt0911_kM9nUGCfMTUoRXhir2AwOQ7QQtw%3D
pre_day_20101224.txt1501_Pu9u4pxwz8vT6py6G9ts6Lh%2B1yc%3D
pre_day_19861109.txt0911_PNhwc8hmNWCjQ9HpQIkIAIrTy5c%3D
pre_day_20101225.txt1501_ckB9uZy2BeMbF8St6ZGC3cURaIc%3D
pre_day_19861110.txt0911_TnnM2XtOI6cs370EAl1RRM4XGx0%3D
pre_day_20101226.txt1501_DqJNIrTjs6HUVcYpPCUAWdKVf0o%3D
pre_day_19861111.txt0911_qIN7qS2%2F4wgs3TQh4kJZsdYVQTs%3D
pre_day_20101227.txt1502_0PDIuHiRxRyyfBSDYJAAP8hYdrE%3D
pre_day_19861112.txt0912_DXdVQU4ejGj8%2BTvKtvIUU6uzucw%3D
pre_day_20101228.txt1502_rONlbztFSPCNNxwLeNQ0tDTwQEA%3D
I don't know why these strange suffixes are attached after the "txt". How can I delete them and keep just the ".txt"?

You can use find with awk like this:
find . -iname "*.txt*" | awk -F'.txt' '{ system("mv \"" $0 "\" \"" $1 ".txt\"") }'
(The quotes around the file names protect against spaces and other shell-special characters.)
This will rename pre_day_19861106.txt0911_Vw7%2FzJSe5KwHIE9EOxbaJaO1e6c%3D to pre_day_19861106.txt
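If you'd rather not build shell commands from file names at all, a plain bash loop with parameter expansion does the same job (a sketch; assumes the files sit in the current directory):

```shell
# Strip everything after ".txt"; the ?* in the glob skips files
# that already end in plain .txt, so nothing is renamed onto itself.
for f in pre_day_*.txt?*; do
    mv -- "$f" "${f%%.txt*}.txt"
done
```

Here ${f%%.txt*} removes the longest suffix matching .txt* from the end of the name, and the loop appends a clean .txt back on.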

In the Windows command prompt you can just execute
rename pre_day_????????.txt* *.txt
You can also use the free Bulk Rename Utility which provides a neat GUI for renaming.

Related

How do I replace ".net" with space using sed in Linux?

I'm using a for loop with argument i. Each argument ends with ".net", and in the directory listing they sit on one line, separated by spaces. Now I need to get rid of these ".net" endings using sed substitution, but it's not working. I went through different options; the most recent one is
sed 's/\.(net)//g' $i;
which is obviously not correct, but I just can't find anything online about this.
To make it clear, let's say I have a directory with 5 files with names
file1.net
file2.net
file3.net
file4.net
file5.net
I would like my output to be
file1
file2
file3
file4
file5
...Could somebody give me some advice?
You can use
for f in *.net; do mv "$f" "${f%.*}"; done
Details:
for f in *.net; - iterates over files with the .net extension
mv "$f" "${f%.*}" - renames each file to its name without the extension (${f%.*} removes the shortest suffix matching .*, i.e. everything from the last . to the end of f; see Parameter expansion).
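A quick illustration of that expansion (hypothetical names):

```shell
f="file1.net"
echo "${f%.*}"    # shortest match of .* stripped from the end
# prints: file1

f="archive.tar.gz"
echo "${f%.*}"    # only the last extension is removed
# prints: archive.tar
```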
This is a job for perl's rename:
rename -n 's/\.net$//' *.net
The -n flag is for testing purposes; remove it if the output looks good to you.
This way:
sed -i.backup 's/\.net$//' "$1"
It will create a backup of the file for safety.

How can I remove these parts of my filenames?

I have some files I need to rename in bulk. For example:
Jul-0961_S7_R2_001.fastq.gz
Jul-0967_S22_rep1_R1.fastq.gz
Jul-0974_S32_R2_001.fastq.gz
I need to remove the S* part of the filename, but I don't know the right regex to use.
Specifically:
Jul-0961_S7_R2_001.fastq.gz --> Jul-0961_R2_001.fastq.gz
Something like rename 's/S*//' *.gz is what I'm looking for.
Is there a regex wizard out there who can show me the way? Thanks in advance.
You should be able to use something like this: s/_S[0-9]+_/_/
If the files are in the same format (i.e. have the same number of underscores), you could use:
"ls" | awk -F_ '{ system("mv "$0" "$1"_"$3"_"$4) }'
Here we are using the underscore as the delimiter and building a mv command to execute with the system function.

Removing specific strings from strings in a file

I want to remove specific fields in all strings in a semi-colon delimited file.
The file looks something like this :-
texta1;texta2;texta3;texta4;texta5;texta6;texta7
textb1;textb2;textb3;textb4;textb5;textb6;textb7
textc1;textc2;textc3;textc4;textc5;textc6;textc7
I would like to remove positions 2, 5 and 7 from all strings in the file.
Desired output :-
texta1;texta3;texta4;texta6
textb1;textb3;textb4;textb6
textc1;textc3;textc4;textc6
I am trying to write a small shell script using awk, but the code is not working as expected: I still see the semicolons in between, and the one at the end is not removed.
(Note: I was able to do it with sed, but my file has several hundred thousand records and the sed code takes a lot of time.)
Could you please provide some help on this ? Thanks in advance.
Most simply with cut:
cut -d \; -f 1,3-4,6,8- filename
or
cut -d \; -f 2,5,7 --complement filename
I think --complement is GNU-specific, though. The 8- in the first example is not actually necessary for a file with only seven columns; it would include all columns from the eighth forward if they existed. I included it because it doesn't hurt and provides a more general solution to the problem.
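Run against the first sample line, the cut invocation produces exactly the desired row:

```shell
# Keep fields 1, 3, 4 and 6 of a semicolon-delimited record.
printf 'texta1;texta2;texta3;texta4;texta5;texta6;texta7\n' |
cut -d ';' -f 1,3-4,6
# prints: texta1;texta3;texta4;texta6
```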
I voted the answer by @Wintermute up, but if cut --complement is not available to you or you insist on using awk, then you can do:
awk -v scols=2,5,7 'BEGIN{FS=OFS=";"; split(scols,acols,",")} {
for(i in acols) $acols[i]=""; gsub(";;",";"); sub(/;$/,""); sub(/^;/,""); print}' tmp.txt
The split only needs to happen once, in BEGIN, and the two sub calls trim any leading or trailing separator left behind when the first or last field is removed.

Help needed to nab the malware viral activity using awk

I am facing issues with my server, as malware sometimes adds its code at the end or start of files. I have fixed the security loopholes to the extent of my knowledge. My hosting provider has informed me that the security is adequate now, but I have become paranoid about the malware activity on my site. I have a plan, but I am not well versed with Linux editors like sed or awk or gawk, so I need help from your side. I could do this using my PHP knowledge, but that would be very resource intensive.
Since malware/viruses add code at the start or end of the file (so that the website does not show any error), can you please let me know how to write a command which would recursively look into all .php files (I will use the help to make changes in other types of files) in the parent and all sub-directories and add a particular tag at the start and end of each file, say XXXXXX_START and YYYYYY_END.
Then I need a script which would read all the .php files, check that the first line of the code is XXXXXX_START and the last line is YYYYYY_END, and create a report if any file is found to be different.
I will set up a cron job to check all the files and email the report to me if any discrepancy is found.
I know this is not 100% foolproof as virus may add the data after the commented lines, but this is the best option I could think of.
I have tried the following commands to add data at the start -
sed -i -r '1i add here' *.txt
but this isn't recursive; it adds the line only to the files in the parent directory.
Then I found this -
BEGIN and END are special patterns. They are not used to match input records. Rather, they are used for supplying start-up or clean-up information to your awk script. A BEGIN rule is executed, once, before the first input record has been read. An END rule is executed, once, after all the input has been read. For example:
awk 'BEGIN { print "Analysis of \"foo\"" }
/foo/ { ++foobar }
END { print "\"foo\" appears " foobar " times." }' BBS-list
But unfortunately, I could not decipher anything.
Any help on above mentioned details is highly appreciated. Any other suggestions are welcomed.
Regards,
Nitin
You can use the following to modify the files (it also creates backup files with a .bak suffix):
find . -name "*.php" -print0 | xargs -0 sed -i.bak '1iSTART_XXXX
$aEND_YYYY'
(The -print0/-0 pair keeps file names with spaces intact.)
You could use the following shell script for checking the files:
find . -name "*.php" -print | while read -r f
do
START_LINE=$(head -1 "$f")
END_LINE=$(tail -1 "$f")
if [[ $START_LINE != "START_XXXX" ]]
then
echo "$f: Mismatched header!"
fi
if [[ $END_LINE != "END_YYYY" ]]
then
echo "$f: Mismatched footer!"
fi
done
Use version control and/or backups; in the event of suspicious activity, zap the live site and reinstall from backups or your version control source.
find . -type f | grep "txt$" | xargs sed -i -r '1i add here'
This will apply that command to all files in or under the current directory. You could probably fold the grep logic into find, but I like simple incantations.
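For reference, the folded-into-find form could look like this (a sketch; -exec ... + passes the file names directly, avoiding word-splitting surprises with xargs):

```shell
# Insert "add here" as the first line of every .txt file, recursively.
find . -type f -name '*.txt' -exec sed -i '1i add here' {} +
```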

How to move files based on file (file name and location in file)

I tried but I failed. I have a file like:
06faefb38081b44e35b4ee846bfb84b61694a5c4.zip D:/code/3635/
0a386c77a3ae35033006efec07bfc37f5236212c.zip D:/code/3622/
0b425b29c3e51f29b9af03db46df0aa8f726705b.zip D:/code/3624/
0ecb477c82ec540c8eb0642da144a400bbf3f7c0.zip D:/code/3624/
...
My goal is to move the file in the first column to the location in the second column. I tried with while+awk but it did not work. I would appreciate any help!
awk '{ system("mv "$1" "$2) }' filename
With awk, you can use the system function to build a move command and execute it. Obviously, ensure that you are running the command in the directory with the files.
Let's assume your file is named "data.txt"; then your code might look like this:
while read -r line; do mv $line; done < data.txt
(Leaving $line unquoted is deliberate here: word splitting turns it into the two arguments mv needs.)
All you need is to add mv plus a space to the beginning of each line, so there are 100+ ways to do that. If you love awk:
awk '$1="mv "$1' file
will create the mv command, to execute them, you can:
awk '$1="mv "$1' file |sh
I prefer this to the system() function call.
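A read-based variant of the same idea, which avoids building command strings from data entirely (a sketch; assumes the listing is in data.txt and the paths contain no embedded spaces):

```shell
# Read the source file and destination directory from each line, then move.
# -r keeps any backslashes in the names literal; -- guards against names
# that start with a dash.
while read -r src dest; do
    mv -- "$src" "$dest"
done < data.txt
```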
