Using regex and cp: cannot stat

Using regex and cp: cannot stat - linux

I am trying to copy files over from an old file structure where data are stored in folders with improper names to a new (better) structure, but as there are 671 participants I need to copy, I want to use regex in order to streamline it (each participant has files saved in the same format). However, I keep getting a cp: cannot stat error message saying that no file/directory exists. I had assumed this meant that I had missed a / or put "" in the wrong location but I cannot see anything in the code that would suggest it.
My code is as follows (which I add a lot of comments so other collaborators can understand):
#!/bin/bash
# This code below copies the initial .nii file.
# These data are copied into my Trial Participant folders.
# Create a variable called parent_folder1 that describes initial mask directory e.g. folders for each participant which contains the files.
parent_folder1="/this/path/here/contains/Trial_Participants"
# The original folders are named according to ClinicalID_scandate_randomdigits e.g. folder 1234567890_20000101_987654.
# The destination folders are named according to TrialIDNumber e.g. LBC100001.
# The .nii files are saved under TrialIDNumber_1_ICV.nii.gz e.g. LBC1000001_1_ICV.nii.gz.
# These files need copied over from their directories into the Trial Participant folders, using the for loop function.
# The * symbol is used as a wildcard.
for i in $(ls -1d "${parent_folder1}"/*_20*); do
lbc=$(ls ${i}/finalMasks/*ICV* | sed 's/^.*\///'); lbc=${lbc:0:9}
cp "${parent_folder1}/${i}"/finalMasks/*_1_ICV.nii.gz /this/path/is/the/destination/path/${lbc}/
done
# This code uses regular expression to find the initial ICV file.
# ls asks for a list, -1 makes each new folder on a new line, d is for directory.
# *_20* refers to the name of the folders. The * covers the ClinicalID, _20* refers to the scan date and random digits.
# I have no idea what the | sed 's/^.*\///' does, but I think it strips the path.
# lbc=${lbc:0:9} is used to keep the ID numbers.
# cp copies the files that are named under TrialIDNumber(replaced by *)_1_ICV.nii.gz to the destination under the respective folder.

So after a bit of fooling around, I changed the code a lot (took out sed as it confuses me), and came up with this that worked. Thanks to those who commented!
# Create a variable called parent_folder1 that describes initial mask directory.
parent_folder1="/original/path/here"
# Iterate over directories in parent_folder1
for i in $(ls -1d "${parent_folder1}"/*_20*); do
# Extract the base name of the file in the finalMasks directory
lbc=$(basename $(ls "${i}"/finalMasks/*ICV*))
# Extract the LBC number from the file name
lbc=${lbc:0:9}
# Copy the file to the specific folder
cp "${i}"/finalMasks/${lbc}_1_ICV.nii.gz /destination/path/here/${lbc}/
done

Related

Recursively appending names of all files in a directory with exif specific png meta data field (aesthetic_score) with linux / EXIFtool

I am trying to rename all files located in a directory (recursively) with a specific meta data field appended to the end of the png file name.
the meta data field name is "aesthetic_score" with a value range from 1.0-9.0
when I type:
exiftool -Aesthetic_score -G1 -s testn.png
the result is:
[PNG] Aesthetic_score : 7.0
This is how I would like to append the png files recursively within a directory.
Note i would like to swap out the word aesthetic with the word chad in the append, and not all files will have this data field:
input file:
filename001.png (metadata aesthetic_score:7.0)
output:
filename001-chad-score-70.png
I tried to use Digikam and JExifToolGui-2.01, without success.
I am trying to perform this task in the cmd line, although other solutions are welcome. Thank you for your help.

So, this might work for you, I can't really test it; note that you would need to get rid of the echo before the mv for it to actually do something (rename rather than just show what it would do).
while read name
do
newname=$(exiftool -G1 -s "$name"|awk '$2~/FileName/{name=$4}; $2~/Aesthetic_score/{basename=gensub(/^(.+)\....$/,"\\1","1",name);ext=gensub(/^.*\.(...)$/,"\\1","1",name);gsub(/\./,"",$4);print basename"."$4"."ext}')
echo mv "$name" "$newname"
done <<<$( find -iname \*.png )
Basically the find at the very end finds all the pngs.
The while loop takes every name find throws it, and passes each file through exiftool (using your specs) and parses the output using awk, which then outputs the new name, which gets captured in the shell variable by the same name.
And finally the mv (without the echo) renames the files.

Iterate through files in a directory, create output files, linux

I am trying to iterate through every file in a specific directory (called sequences), and perform two functions on each file. I know that the functions (the 'blastp' and 'cat' lines) work, since I can run them on individual files. Ordinarily I would have a specific file name as the query, output, etc., but I'm trying to use a variable so the loop can work through many files.
(Disclaimer: I am new to coding.) I believe that I am running into serious problems with trying to use my file names within my functions. As it is, my code will execute, but it creates a bunch of extra unintended files. This is what I intend for my script to do:
Line 1: Iterate through every file in my "sequences" directory. (All of which end with ".fa", if that is helpful.)
Line 3: Recognize the filename as a variable. (I know, I know, I think I've done this horribly wrong.)
Line 4: Run the blastp function using the file name as the argument for the "query" flag, always use "database.faa" as the argument for the "db" flag, and output the result in a new file that is has the same name as the initial file, but with ".txt" at the end.
Line 5: Output parts of the output file from line 4 into a new file that has the same name as the initial file, but with "_top_hits.txt" at the end.
for sequence in ./sequences/{.,}*;
do
echo "$sequence";
blastp -query $sequence -db database.faa -out ${sequence}.txt -evalue 1e-10 -outfmt 7
cat ${sequence}.txt | awk '/hits found/{getline;print}' | grep -v "#">${sequence}_top_hits.txt
done
When I ran this code, it gave me six new files derived from each file in the directory (and they were all in the same directory - I'd prefer to have them all in their own folders. How can I do that?). They were all empty. Their suffixes were, ".txt", ".txt.txt", ".txt_top_hits.txt", "_top_hits.txt", "_top_hits.txt.txt", and "_top_hits.txt_top_hits.txt".
If I can provide any further information to clarify anything, please let me know.

If you're only interested in *.fa files I would limit your input to only those matching files like this:
for sequence in sequences/*.fa;
do

I can propose you the following improvements:
for fasta_file in ./sequences/*.fa # ";" is not necessary if you already have a new line for your "do"
do
# ${variable%something} is the part of $variable
# before the string "something"
# basename path/to/file is the name of the file
# without the full path
# $(some command) allows you to use the result of the command as a string
# Combining the above, we can form a string based on our fasta file
# This string can be useful to name stuff in a clean manner later
sequence_name=$(basename ${fasta_file%.fa})
echo ${sequence_name}
# Create a directory for the results for this sequence
# -p option avoids a failure in case the directory already exists
mkdir -p ${sequence_name}
# Define the name of the file for the results
# (including our previously created directory in its path)
blast_results=${sequence_name}/${sequence_name}_blast.txt
blastp -query ${fasta_file} -db database.faa \
-out ${blast_results} \
-evalue 1e-10 -outfmt 7
# Define a file name for the top hits
top_hits=${sequence_name}/${sequence_name}_top_hits.txt
# alternatively, using "%"
#top_hits=${blast_results%_blast.txt}_top_hits.txt
# No need to cat: awk can take a file as argument
awk '/hits found/{getline;print}' ${blast_results} \
| grep -v "#" > ${sequence_name}_top_hits.txt
done
I made more intermediate variables, with (hopefully) meaningful names.
I used \ to escape line ends and allow putting commands in several lines.
I hope this improves code readability.
I haven't tested. There may be typos.

You should be using *.fa if you only want files with a .fa ending. Additionally, if you want to redirect your output to new folders you need to create those directories somewhere using
mkdir 'folder_name'
then you need to redirect your -o outputs to those files, something like this
'command' -o /path/to/output/folder
To help you test this script out, you can run each line one by one to test them. You need to make sure each line works by itself before combining.
One last thing, be careful with your use of colons, it should look something like this:
for filename in *.fa; do 'command'; done

Is there a way to undo a batch-rename of file extensions?

Ok so I kinda dropped the ball. I was trying to understand how things work. I had a few html files on my computer that I was trying to rename as txt files. This was strictly a learning exercise. Following the instructions I found here using this code:
for file in *.html
do
mv "$file" "${file%.html}.txt"
done
produced this error:
mv: rename *.html to *.txt: No such file or directory
Long story short I ended up going rogue and renaming the html files, as well as a lot of other non html files as txt files. So now I have files labeled like
my_movie.mp4.txt
my_song.mp3.txt
my_file.txt.txt
This may be a really dumb question but.. Is there a way to check if a file has two extensions and if yes remove the last one? Or any other way to undo this mess?
EDIT
Doing this find . -name "*.*.txt" -exec echo {} \; | cat -b seems to tell me what was changed and where it is located. The cat -b part is not necessary but I like it. This still doesn't fix what I broke though.

I'm not sure if terminal can check for extensions "twice", but you can check for . in every name an if there's more than one occurence of ., then your file has more extensions. Then you can cut the extension off with finding first occurence of . in a string when going backwards... or last one if checking characters in string in a normal way.
I have a faster option for you if you can use python. You can strip the extension with:
for file in list_of_files:
os.rename(file,os.path.splitext(file)[0])
which can give you from your file.txt.txt your file.txt
Example:
You wrote that your command tells you what has changed, so just take those changed files and dump them into a file(path to file per line). Then you can easily run this:
with open('<path to list>') as f:
list_of_files = f.readlines()
for file in list_of_files:
os.rename(file.strip('\n'), os.path.splitext(file.strip('\n'))[0])
If not, then you'd need to get the list from python:
import os
results = []
for root, folder, filenames in os.walk(<your path to folder>):
for filename in filenames:
if filename.endswith('.txt.txt'):
results.append(os.path.join(root, filename))
With this you got a list of files ending with .txt.txt like this <your folder>\\<path_to_file>.
Get a path to your directory used in os.walk() without folder's name(it's already in list) so it'll be like this:
e.g. os.walk('/home/me/directory') -> path='/home/me/' and res is item already in a list, which looks like directory/...
for res in results:
path = '' # set the path here
file = os.path.join(path,r)
os.rename(file, os.path.splitext(file)[0])
Depending on what files you want to find change .txt.txt in if filename.endswith('...') to whatever you like and os.rename() will take file's name without extension which in your case means it strips the additional extension you don't want to have.

How do you format output string in bash script for input by another script?

I need to unzip a bunch of student assignment (jar) files so that I can use a script to submit the contents to the Moss (Stanford) plagiarism detection server. I did the same thing in Java which was trivial but I'm trying to re-implement to as a bash script.
I am trying to do the following:
Get a list of student names (each student has a directory).
In each student directory, sub-directories exist numbered from 1 to the
latest submission. I need to get the directory with the highest
number.
Inside of each of those submission directories contains a
jar file that I need. I copy each jar into a temp directory with the
same name as the student and unzip it.
I need that temp directory listing formatted as a string in the form
/tempDir/studentName1/.languageExt /tempDir/studentName2/.languageExt
The student directory has the basic structure:
Student_Root_Directory:
Student1
Student2
Student1
Sub-Directories: 1 2 3 4 5
1: student1.jar
2: student1.jar
...
Student2
Sub-Directories: 1 2 3
1. student2.jar
...
To do the first 3 steps above I did:
#!/bin/bash
# Extract all jar files into a temp directory called /home/moss/tempJarFiles/studentName
# $1 is the command line argument that contains the path to the institution submission dir.
# $2 is the language extension: .c, .cpp, .java, .py
students=`ls $1`
student_dir=$1
languageExt=$2
mossDir="/home/moss"
tempDir="/home/moss/tempJarStorage"
for student in $students
do
latestSubmissionDir=`ls -t $student_dir/$student | head -1`
for jarDir in $latestSubmissionDir
do
mkdir $tempDir/$student
cp $student_dir/$student/$jarDir/*.jar $tempDir/$student
unzip -d $tempDir/$student/ -o -j $tempDir/$student/$student.jar *.$languageExt
rm $tempDir/$student/$student.jar
done
done
...which results in a number of student directories being created in a temp directory that contains only the unzipped contents for the student submissions.
I need the ls output of the new temp directories formatted as a string that contains:
/tempDir/studentName1/\*.languageExt /tempDir/studentName2/\*.languageExt
I have tried variations on
find "$tempDir" -iname "*.$languageExt" -printf "%p/*.$languageExt"
using iname and not - but I either have output that contains extra directory information such as $tempDir/*.languageExt (when I just need the subdirectories $tempDir/$studentName/*.languageExt) or I have output where the path for every source file is also listed such as:
$tempDir/$studentName/studentNameA.java
$tempDir/$studentName/studentNameB.java
when I only need
$tempDir/$studentName/*.java
I think this should be really easy and I'm just over thinking it. Any hints for improving the script also appreciated.

Here's a revised version of the script hat may work:
#/bin/bash
# Extract all jar files into a temp directory called /home/moss/tempJarFiles/studentName
# $1 is the command line argument that contains the path to the institution submission dir.
# $2 is the language extension: c, cpp, java, py
students_dir=$1
languageExt=$2
studentPathsT=( "$students_dir"/*/ )
mossDir='/home/moss'
tempDir='/home/moss/tempJarStorage'
for studentPathT in "${studentPathsT[#]}"; do
student=$(basename "$studentPathT")
mkdir "$tempDir/$student"
submissionDirsT=( "$studentPathT"*/ )
latestSubmissionDirT=${submissionDirsT[${#submissionDirsT[#]-1]}
cp "$latestSubmissionDirT"*.jar "$tempDir/$student/"
unzip -d "$tempDir/$student/" -o -j "$tempDir/$student/*.jar" "*.$languageExt"
rm "$tempDir/$student"/*.jar
done
# Note that at this point `"$tempDir"/*/*.$languageExt` would expand
# to all extracted submission files, across all students.
# Finally, output each student's extracted files as an unexpanded glob à la
# /{tempDir}/{studentName1}/*.{languageExt}
for pT in "$tempDir"/*/; do
echo "$pT*.$languageExt"
# Note: If there is a chance that your filenames contain
# embedded newlines (rare in practice) using `echo` won't work properly
# as #Charles Duffy points out.
# If that is a concern, use
# printf '%s\0' "$pT*.$languageExt"
# and process the output with a utility that can process NUL characters
# as separators, such as `xargs -0`.
done
It avoids using ls and only uses pathname expansion and array variables so as to properly deal with paths that contain embedded spaces and other shell metacharacters.
suffix ...T in variable names indicates that a particular path or array of paths is *T*erminated, i.e, that it ends in a /.
The assumption is that the numbered subdirectories do not go beyond 9, as the implicit lexical sorting of pathname expansion is relied upon; if the numbers go higher, explicit numerical sorting must be applied.
Note that the globs (pathname patterns) passed to unzip are intentionally double-quoted, as they should be interpreted by unzip, not the shell.
Note that, based on your original code, I've assumed that $languageExt does NOT start with . (e.g., cpp rather than .cpp), despite what your comment says.

How to move and number files?

I working with linux, bash.
I have one directory with 100 folders in it, each one named different.
In each of these 100 folders, there is a file called first.bars (so I have 100 files named first.bars). Although all named first.bars, the files are actually slightly different.
I want to get all these files moved to one new folder and rename/number these files so that I know which file comes from which folder. So the first first.bars file must be renamed to 001.bars, the second to 002.bars.. etc.
I have tried the following:
ls -d * >> /home/directorywiththe100folders/list.txt
cat list.txt | while read line;
do cd $line;
mv first.bars /home/newfolder
This does not work because I can't have 100 files, named the same, in one folder. So I only need to know how to rename them. The renaming must be connected to the cat list.txt, because the first line is the folder containing the first file wich is moved and renamed. That file will be called 001.bars.

Try doing this :
$ rename 's/^.*?\./sprintf("%03d.", $c++)/e' *.bar
If you want more information about this command, see this recent response I gave earlier : How do I rename multiple files beginning with a Unix timestamp - imapsync issue

If the rename command is not available,
for d in /home/directorywiththe100folders/*/; do
newfile=$(printf "/home/newfolder/%d.bars" $(( c++ )) )
mv "$d/first.bars" "$newfile"
done

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string