Rename final part of a directory following final forward slash - linux

I have a list of directories in a text file, which I am looping through. I simply want to rename the final part of each directory to a standardized name (the part of the path after the final forward slash). Is there a simple way to identify the final part of a path and rename it in a loop?
For example, the directories are ...
/data/images/projects/MM/1/scans/16/7__ep2d_fid_basic_bold_293_Resting_State
/data/images/projects/MM/1/scans/20/7__ep2d_fid_basic_bold_293_Resting_State
/data/images/projects/MM/1/scans/03/8
I want to change them to...
/data/images/projects/MM/1/scans/16/rs
/data/images/projects/MM/1/scans/20/rs
/data/images/projects/MM/1/scans/03/rs
I can't figure out a way to do this, although it should be simple. Sorry, a newbie here.

I wrote a small script that will do the job for you.
It reads the content of an input file, strips everything from the last slash onward (including the slash itself), appends /rs, and writes the result to a new file (output.txt).
Create a script.sh, chmod +x it, and add the following:
#!/bin/bash
# Usage: ./script.sh input.txt, where input.txt is your input file with the
# raw data you want to change.
while IFS='' read -r line
do
    name=$line
    echo "Before: $name"
    # Keep the first 8 slash-separated fields (the path up to, but not including,
    # the final component), re-join them with slashes, and drop the trailing slash.
    name=$(echo "$name" | awk -F '/' '{ for(i=1; i<=8; i++) {print $i} }' | tr '\n' '/' | sed 's/.$//')
    echo "$name/rs" >> output.txt
done < "$1"
# EOF
Run it by typing ./script.sh input.txt in the console.
Note: This is one way of doing it and it probably fits your current specific problem. It does not update the input file. Rather, it creates a new file (output.txt) with the new content.
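For what it's worth, the same trimming can be done with bash parameter expansion instead of awk, which avoids hardcoding the number of path components. A minimal sketch, assuming the same input file as above:
while IFS='' read -r line
do
    # ${line%/*} strips the final /component from the path
    echo "${line%/*}/rs" >> output.txt
done < "$1"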

while IFS='' read -r line
do
    name=$line
    echo "Before: $name"
    # Keep the first 10 slash-separated fields, re-join with slashes,
    # and drop the trailing slash.
    name=$(echo "$name" | awk -F '/' '{ for(i=1; i<=10; i++) {print $i} }' | tr '\n' '/' | sed 's/.$//')
    echo "$name/rsfMRI" > /sdata/images/projects/ASD_MM/1/datafiles/output.txt
    for y in $(cat /sdata/images/projects/ASD_MM/1/datafiles/output.txt); do
        mv "$line" "$y"
    done
done < "$1"
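The intermediate output.txt is not strictly needed for the rename itself. A minimal sketch that renames each directory directly, assuming every line in "$1" is an existing directory path:
while IFS='' read -r line
do
    # Replace the final path component with rsfMRI
    mv "$line" "${line%/*}/rsfMRI"
done < "$1"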

Related

Replace filename with a string from the first line in multiple files in bash

I have multiple fasta files, where the first line always contains a > with multiple words, for example:
File_1.fasta:
>KY620313.1 Hepatitis C virus isolate sP171215 polyprotein gene, complete cds
File_2.fasta:
>KY620314.1 Hepatitis C virus isolate sP131957 polyprotein gene, complete cds
File_3.fasta:
>KY620315.1 Hepatitis C virus isolate sP127952 polyprotein gene, complete cds
I would like to take the word starting with sP* from each file and rename each file to this string (for example: File_1.fasta to sP171215.fasta).
So far I have this:
$ for match in "$(grep -ro '>')";do
fname=$("echo $match|awk '{print $6}'")
echo mv "$match" "$fname"
done
But it doesn't work, I always get the error:
grep: warning: recursive search of stdin
I hope you can help me!
You can use something like this:
grep '>' *.fasta | while read -r line ; do
    new_name="$(echo "$line" | cut -d' ' -f 6)"
    old_name="$(echo "$line" | cut -d':' -f 1)"
    mv "$old_name" "$new_name.fasta"
done
It greps through the *.fasta files and handles every matched line:
it splits each grep result on spaces and takes the 6th field as the new name
it splits each grep result on ':' and takes the first field as the old name
it moves/renames the old filename to the new filename
There are several things going on with this code.
For a start, I actually don't get this particular error, which might be down to version differences. It may stem from grep interpreting '>' the same as an unquoted > because the quoting got mangled somewhere along the way. I would suggest trying "\>" instead.
Secondly:
fname=$("echo $match|awk '{print $6}'")
The quotes inside serve an unintended purpose: they make the shell treat the whole string as a command name. Your code should look like this, if anything:
fname="$(echo $match|awk '{print $6}')"
Lastly, to properly retrieve your data, this should be your final code:
grep -Hr "\>" | while read -r match; do
    fname="$(echo "$match" | cut -d: -f1)"
    new_fname="$(echo "$match" | grep -o "sP[^ ]*")".fasta
    echo mv "$fname" "$new_fname"
done
Explanations:
grep -H -> you want your grep to explicitly use "Include Filename", just in case other shell environments decide to alias grep to grep -h (no filenames)
you don't want to be doing grep -o on your file search, as you want to have both the filename and the "new filename" in one data entry.
Although, I don't see why you would search for '>' and not directly for 'sP', as such:
grep -Hro "sP[0-9]*" | while read -r match; do
This is not the exact same behaviour, and has different edge cases, but it just might work for you.
Quite straightforward in (g)awk:
create a file "script.awk":
FNR == 1 {
    for (i=1; i<=NF; i++) {
        if (index($i, "sP")==1) {
            print "mv", FILENAME, $i ".fasta"
            nextfile
        }
    }
}
use it:
awk -f script.awk *.fasta > cmmd.txt
check the content of the output.
mv File_1.fasta sP171215.fasta
mv File_2.fasta sP131957.fasta
if everything looks OK, launch the renames with . cmmd.txt (this sources the generated file and runs each mv).
For all fasta files in directory, search their first line for the first word starting with sP and rename them using that word as the basename.
Using a bash array:
for f in *.fasta; do
    arr=( $(head -1 "$f") )
    for word in "${arr[@]}"; do
        [[ "$word" =~ ^sP ]] && echo mv "$f" "${word}.fasta" && break
    done
done
or using grep:
for f in *.fasta; do
    word=$(head -1 "$f" | grep -o "\bsP\w*")
    [ -z "$word" ] || echo mv "$f" "${word}.fasta"
done
Note: remove the echo once you are satisfied with the test output.

Concatenate the result of echo and a command output

I have the following code:
names=$(ls *$1*.txt)
head -q -n 1 $names | cut -d "_" -f 2
where the first line finds all file names matching the command-line input and stores them in a variable called names, and the second grabs the first line of each file (each element of names) and outputs the second field of that line using "_" as the delimiter.
This is all good, however I would like to prepend the filename (stored as lines in the variable names) to the output of cut. I have tried:
names=$(ls *$1*.txt)
head -q -n 1 $names | echo -n "$names" cut -d "_" -f 2
however this only prints out the filenames
I have tried
names=$(ls *$1*.txt)
head -q -n 1 $names | echo -n "$names"; cut -d "_" -f 2
and again I only print out the filenames.
The desired output is:
filename1.txt <second character>
where there is a single space between the filename and the result of cut.
Thank you.
Best approach, using awk
You can do this all in one invocation of awk:
awk -F_ 'NR==1{print FILENAME, $2; exit}' *"$1"*.txt
On the first line of the first file, this prints the filename and the value of the second column, then exits.
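If you want one line of output per matching file rather than just the first file, a variant (my assumption about the desired behaviour, not part of the answer above) tests FNR instead and drops the exit:
awk -F_ 'FNR==1{print FILENAME, $2}' *"$1"*.txt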
Pure bash solution
I would always recommend against parsing ls; instead, use a loop.
You can avoid awk entirely and read the first line of the file using bash built-in functionality:
for i in *"$1"*.txt; do
    IFS=_ read -ra arr <"$i"
    echo "$i ${arr[1]}"
    break
done
Here we read the first line of the file into an array, splitting it into pieces on the _.
Maybe something like this will satisfy your need, BUT THIS IS BAD CODING (see comments):
#!/bin/bash
names=$(ls *$1*.txt)
for f in $names
do
    pattern=$(head -q -n 1 "$f" | cut -d "_" -f 2)
    echo "$f $pattern"
done
If I didn't misunderstand your goal, this also works.
I've always done it this way; I just found out that this is a deprecated way to do it.
#!/bin/bash
names=$(ls *"$1"*.txt)
for e in $names;
do echo $e `echo "$e" | cut -c2-2`;
done
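For reference, a sketch of the same loop without ls or backticks, using a glob and bash substring expansion (assuming the goal is the second character of the filename, as cut -c2-2 suggests):
#!/bin/bash
for e in *"$1"*.txt; do
    # ${e:1:1} is the character at index 1, i.e. the second character
    echo "$e ${e:1:1}"
done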

For each line in file execute command synchronously and save to newline of another file

I have a wget script named Chktitle.sh -- this script takes a command like below
$ Chktitle.sh "my url"
I then have a file named url.txt with over 100 lines of urls and ips to check for web-page titles.
Then I have results.txt as a blank file.
Is there any way I can perform a repetitive action like below for each line in the file:
Grab line1 from url.txt
-----
then execute Chktitle.sh "line1"
-----
Now save the result for line1 in results.txt
-----
Now goto Line2 ........
etc etc etc
I need to make sure that it will only execute the next line after the previous one has finished.
Can anyone show me an easy way to perform this? I am happy to use Perl or sh, and would consider other languages.
The contents of chktitle.sh:
#!/bin/bash
string=$1"/search/"
wget --quiet -O - "$string" \
    | sed -n -e 's!.*<title>\(.*\)</title>.*!\1!p'
Maybe something like this could help (provided that I understood correctly):
while read -r line; do
    /path/to/Chktitle.sh "$line" >> results.txt
done < /path/to/input.txt
For each line in /path/to/input.txt, execute your script and append the output (>>) to results.txt.
Of course, you could always add additional statements in your while loop:
while read -r line; do
    # Initialise var to output of chktitle
    var=$(/path/to/Chktitle.sh "$line")
    # Add conditions
    if [ "$var" = "google" ]; then
        echo "google" >> result.txt
    else
        echo "not google" >> result.txt
    fi
done < /path/to/input.txt
Here is how you could do this in Perl:
use warnings;
use strict;
use LWP::Simple;
my $inputFile = 'url.txt';
open (my $fh, '<', $inputFile) or die "Could not open file '$inputFile': $!\n";
while (my $url = <$fh>) {
    chomp $url;
    my $str = get($url);
    if (! defined $str) {
        warn "Could not find page '$url'\n";
        next;
    }
    my ($title) = $str =~ m{<title>(.*?)</title>}s;
    if (! defined $title) {
        warn "No title in document '$url'\n";
        next;
    }
    print "$title\n";
}
close ($fh);
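Since this Perl script prints the titles to standard output, you would redirect it to capture the results (a sketch; the file name titles.pl is hypothetical):
perl titles.pl > results.txt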
cat url.txt | xargs -I{} ./Chktitle.sh {} >> results.txt
See xargs, especially the -I switch.
This xargs call will read the input (url.txt) line by line and call ./Chktitle.sh with each such read line as a parameter.
The {} is the placeholder for the line read. You can also write
cat url.txt | xargs -Ifoo ./Chktitle.sh foo >> results.txt
(with foo as placeholder) but {} is the placeholder that is usually used for xargs.
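Note that xargs performs quote processing on its input, so URLs containing quotes or backslashes can trip it up; with GNU xargs you can disable that and split strictly on newlines instead (a sketch, assuming GNU xargs):
xargs -d '\n' -I{} ./Chktitle.sh {} < url.txt >> results.txt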
You can create your script with 2 parameters as follows
HOW SCRIPT WORKS ON COMMAND LINE
< script > < path to url file > < path to executing script >
The code is broken down as follows, with explanations.
STEP 1
#!/bin/bash
rm -f "/root/Desktop/result.txt 2> /dev/null
remove any file that has the name result.txt so that i can create a new blank file
STEP 2
while read -r my_url; do
    "$2" "$my_url" >> "/root/Desktop/result.txt"
done < "$1"
Set up a while loop to read all lines in the url file (passed as "$1").
Each line read is saved as "my_url".
The loop takes your script (Chktitle.sh, passed as $2), executes it on the command line with "my_url" as its argument, and redirects the output to result.txt. This is done for each line.
NOW LET US SUMMARIZE ALL THE CODE INTO ONE SCRIPT AS FOLLOWS
#!/bin/bash
rm -f "/root/Desktop/result.txt" 2> /dev/null
while read -r my_url; do
    "$2" "$my_url" >> "/root/Desktop/result.txt"
done < "$1"
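An example invocation under this scheme (the paths here are hypothetical):
./script.sh /root/Desktop/url.txt /root/Desktop/Chktitle.sh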

Finding contents of one file in another file

I'm using the following shell script to find the contents of one file in another:
#!/bin/ksh
file="/home/nimish/contents.txt"
while read -r line; do
    grep $line /home/nimish/another_file.csv
done < "$file"
I'm executing the script, but it is not displaying the contents from the CSV file. My contents.txt file contains numbers such as "08915673" or "123223", which are present in the CSV file as well. Is there anything wrong with what I'm doing?
grep itself is able to do so. Simply use the flag -f:
grep -f <patterns> <file>
<patterns> is a file containing one pattern per line, and <file> is the file in which you want to search.
Note that, to force grep to consider each line a pattern, even if the contents of each line look like a regular expression, you should use the flag -F, --fixed-strings.
grep -F -f <patterns> <file>
If your file is a CSV, as you said, you may do:
grep -f <(tr ',' '\n' < data.csv) <file>
As an example, consider the file "a.txt", with the following lines:
alpha
0891234
beta
Now, the file "b.txt", with the lines:
Alpha
0808080
0891234
bEtA
The output of the following command is:
grep -f "a.txt" "b.txt"
0891234
You don't need a loop here at all; grep itself offers this feature.
Now, using your file names:
#!/bin/bash
patterns="/home/nimish/contents.txt"
search="/home/nimish/another_file.csv"
grep -f <(tr ',' '\n' < "${patterns}") "${search}"
You may change ',' to the separator you have in your file.
Another solution:
use awk and build your own hash (e.g. ahash), fully under your control.
Replace $0 with $i and you can match on any fields you want.
awk -F"," '
{
if (nowfile==""){ nowfile = FILENAME; }
if(FILENAME == nowfile)
{
hash[$0]=$0;
}
else
{
if($0 ~ hash[$0])
{
print $0
}
}
} ' xx yy
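For comparison, the more common awk idiom for this kind of two-file membership test uses FNR==NR to detect the first file and the in operator for the lookup. A minimal sketch with the same placeholder file names xx and yy:
awk -F"," 'FNR==NR { seen[$0]; next } $0 in seen' xx yy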
I don't think you really need a script to perform what you're trying to do.
One command is enough. In my case, I needed an identification number in column 11 of a CSV file (with ";" as separator):
grep -f <(awk -F";" '{print $11}' FILE_TO_EXTRACT_PATTERNS_FROM.csv) TARGET_FILE.csv

Parsing a CSV string in Shell Script and writing it to a File

I am not a Linux scripting expert and I have exhausted my knowledge on this matter. Here is my situation.
I have a list of states passed as a command-line argument to a shell script (e.g. "AL,AK,AS,AZ,AR,CA..."). The shell script needs to extract each state code and write it to a file (states.txt), with each state on its own line. See below:
AL
AK
AS
AZ
AR
CA
..
..
How can this be achieved using a Linux shell script?
Thanks in advance.
Use tr:
echo "AL,AK,AS,AZ,AR,CA" | tr ',' '\n' > states.txt
echo "AL,AK,AS,AZ,AR,CA" | awk -F, '{for (i = 1; i <= NF; i++) print $i}';
Naive solution:
echo "AL,AK,AS,AZ,AR,CA" | sed 's/,/\n/g'
I think awk is the simplest solution, but you could try using cut in a loop.
Sample script (outputs to stdout, but you can just redirect it):
#!/bin/bash
# Check for input
if (( ${#1} == 0 )); then
    echo "No input data supplied"
    exit
fi
# Initialise first input
i=$1
# While $i still contains commas
while { echo "$i" | grep , > /dev/null; }; do
    # Get first item of $i
    j=$(echo "$i" | cut -d ',' -f 1)
    # Shift off the first item of $i
    i=$(echo "$i" | cut --complement -d ',' -f 1)
    echo "$j"
done
# Display the last item
echo "$i"
Then you can just run it as ./script.sh "AL,AK,AS,AZ,AR,CA" > states.txt (assuming you save it as script.sh in the local directory and give it execute permission)
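For reference, the same split can also be done in pure bash without a loop; a minimal sketch, assuming the list is passed as the first argument:
IFS=',' read -ra states <<< "$1"
printf '%s\n' "${states[@]}" > states.txt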
