How can I match two sections of a filename to list the files that match it? - linux

I have a bunch of files that need merging.
The name of the files are something like this-
JOHN_80_xyz_yeti.txt
JOHN_80_xyz_puma.txt
JOHN_80_def_yeti.txt
JOHN_80_def_puma.txt
JOHN_81_xyz_yeti.txt
JOHN_81_xyz_puma.txt
JOHN_81_def_yeti.txt
JOHN_81_def_puma.txt
JOHN_82_xyz_yeti.txt
JOHN_82_xyz_puma.txt
JOHN_82_def_yeti.txt
JOHN_82_def_puma.txt
JOHN_83_xyz_yeti.txt
JOHN_83_xyz_puma.txt
JOHN_83_def_yeti.txt
JOHN_83_def_puma.txt
I want to merge JOHN_80_xyz_yeti.txt and JOHN_80_def_yeti.txt; JOHN_80_xyz_puma.txt and JOHN_80_def_puma.txt; JOHN_81_xyz_yeti.txt and JOHN_81_def_yeti.txt; JOHN_81_xyz_puma.txt and JOHN_81_def_puma.txt; and so forth, recursively through my files in a bash for loop. What command can I use so that it finds the files that have "80" and "yeti" together and list/echo it as a variable to be used in a for loop?
The command that I want to use these files for is given below-
merge -1 JOHN_80_xyz_yeti.txt -2 JOHN_80_def_yeti.txt > merged.JOHN_80_yeti.txt
merge -1 JOHN_80_xyz_puma.txt -2 JOHN_80_def_puma.txt > merged.JOHN_80_puma.txt
I tried "find file name" but failed to get the desired results.

Loop through all the xyz files and use string substitution to derive the matching def file and the merged output name:
for xyzfile in *_xyz_*.txt; do
    deffile=${xyzfile/_xyz_/_def_}      # JOHN_80_xyz_yeti.txt -> JOHN_80_def_yeti.txt
    result=merged.${xyzfile/_xyz/}      # JOHN_80_xyz_yeti.txt -> merged.JOHN_80_yeti.txt
    merge -1 "$xyzfile" -2 "$deffile" > "$result"
done

Assuming you want to dynamically collect the substrings -
mkdir -p merged # backup
for f in JOHN*txt                  # loop through the glob
do IFS=_ read a b x c <<< "$f"     # parse the filename: a=JOHN b=80 x=xyz c=yeti.txt
   lst=( $a*$b*$c )                # glob that matches both files of the pair
   [[ -e $lst ]] || continue       # skip if already done (this is the 2nd file of a pair)
   merge -1 "${lst[0]}" -2 "${lst[1]}" > "merged.${a}_${b}_${c}"   # do the thing
   mv "${lst[@]}" merged/          # move the processed inputs into the backup dir
done
You can verify and then delete the merged folder, or undo/edit/redo till it's right.
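To see what the parsing and globbing steps produce for one of the sample names, a quick check (the last echo assumes the sample files are sitting in the current directory):
IFS=_ read a b x c <<< "JOHN_80_xyz_yeti.txt"
echo "$a | $b | $x | $c"    # JOHN | 80 | xyz | yeti.txt
echo $a*$b*$c               # JOHN_80_def_yeti.txt JOHN_80_xyz_yeti.txt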

Related

linux command rename dates (YYYY.MMDD) to numbers(001,002,...,066,067) sequentially

I have renamed many files by using 'rename'.
However, I ran into a problem converting dates to numbers.
The file names are 2021.0801, 2021.0802, etc. (Year.MonthDay).
I need to change the MonthDay part to sequential numbers 001, 002, etc.
So I need to rename
2021.0801
2021.0802
...
2021.0929
2021.0930
to
2021.001
2021.002
...
2021.0**
2021.0**
I saw that I can do it with rename or with # and ?, but I could not find the specific way to solve this.
Could you please let me know the way to rename these?
p.s. I tried num=001; for i in {0801..0930}; do rename $i $num *; (($num++)); done but it showed
2021.001
2021.001001
2021.001001001001
...
Additionally, ls 2021.* shows only the files that I want to change.
Your script contains a few errors. I suggest using https://www.shellcheck.net/ to check your scripts.
After fixing the errors, the (unfinished) result is
#!/bin/bash
num=001; for i in {0801..0930}; do rename "$i" "$num" ./*; ((num++)); done
This has 3 remaining problems.
The range {0801..0930} includes the numbers 0832, 0833, ... 0899, 0900, which are not valid dates. This can be fixed by using two ranges: {0801..0831} {0901..0930}
The increment operation ((num++)) does not preserve the leading zeros. This can be fixed by conditionally adding 1 or 2 zeros (or by letting printf pad the number, as sketched after the list of mv commands below).
You call rename for every number, which scans all the files but renames at most one of them. As you know the exact file names, you can replace this with an mv command.
The final version is
num=1; for i in {0801..0831} {0901..0930}; do
if [[ num -lt 10 ]] ; then
new="00$num";
elif [[ num -lt 100 ]] ; then
new="0$num";
else
new="$num"; # this case is not necessary here
fi;
mv "2021.$i" "2021.$new";
((num++));
done
The script handles leading zeros for values of 1, 2 or 3 digits, which is not needed here as all numbers are less than 100. For this case, it can be simplified as
num=1; for i in {0801..0831} {0901..0930}; do
if [[ num -lt 10 ]] ; then
new="00$num";
else
new="0$num";
fi;
mv "2021.$i" "2021.$new";
((num++));
done
The script will execute these commands:
mv 2021.0801 2021.001
mv 2021.0802 2021.002
...
mv 2021.0830 2021.030
mv 2021.0831 2021.031
mv 2021.0901 2021.032
mv 2021.0902 2021.033
...
mv 2021.0929 2021.060
mv 2021.0930 2021.061
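As mentioned above, the conditional zero-padding can also be delegated to printf's %03d format; a sketch equivalent to the simplified loop:
num=1
for i in {0801..0831} {0901..0930}; do
    printf -v new '%03d' "$num"    # 1 -> 001, 32 -> 032, ...
    mv "2021.$i" "2021.$new"
    ((num++))
done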
You don't need the for loop. This single command will do it all:
rename -n 'BEGIN{our $num=1}{our $num;s/\d+$/sprintf("%03d", $num)/e; $num += 1}' 2021.*
Remove -n once the resulting renaming looks good.

Setting value of command prompt (PS1) based on present directory string length

I know I can do this to reflect just the last 2 directories in the PS1 value.
PS1=${PWD#"${PWD%/*/*}/"}#
but let's say we have a directory name that's really messy and will reduce my working space, like
T-Mob/2021-07-23--07-48-49_xperia-build-20191119010027#
OR
2021-07-23--07-48-49_nokia-build-20191119010027/T-Mob#
those are the last 2 directories before the prompt
I want to set a condition: if the length of either of the last 2 directories is more than a threshold, e.g. 10 chars, shorten the name to the first 3 and last 3 chars of the directory(s) whose length exceeds 10
e.g.
2021-07-23--07-48-49_xperia-build-20191119010027 &
2021-07-23--07-48-49_nokia-build-20191119010027
both longer than 10, will be shortened to 202*027 and PS1 will be, respectively,
T-Mob/202*027/# for T-Mob/2021-07-23--07-48-49_xperia-build-20191119010027# and
202*027/T-Mob# for 2021-07-23--07-48-49_nokia-build-20191119010027/T-Mob#
A quick one-liner to get this done?
I can't post this in comments, so updating here. Ref. to Joaquin's answer (thx J):
PS1=''`echo ${PWD#"${PWD%/*/*}/"} | awk -v RS='/' 'length() <=10{printf $0"/"}; length()>10{printf "%s*%s/", substr($0,1,3), substr($0,length()-2,3)};'| tr -d "\n"; echo "#"`''
See the outputs below:
/root/my-applications/bin # it shortened as expected
my-*ons/bin/#cd - # going back to prev.
/root
my-*ons/bin/# #value of prompt is the same but I am in /root
A one-liner is basically always the wrong choice. Write code to be robust, readable and maintainable (and, for something that's called frequently or in a tight loop, to be efficient) -- not to be terse.
Assuming availability of bash 4.3 or newer:
# Given a string, a separator, and a max length, shorten any segments that are
# longer than the max length.
shortenLongSegments() {
local -n destVar=$1; shift # arg1: where do we write our result?
local maxLength=$1; shift # arg2: what's the maximum length?
local IFS=$1; shift # arg3: what character do we split into segments on?
read -r -a allSegments <<<"$1"; shift # arg4: break into an array
for segmentIdx in "${!allSegments[@]}"; do # iterate over array indices
segment=${allSegments[$segmentIdx]} # look up value for index
if (( ${#segment} > maxLength )); then # value over maxLength chars?
segment="${segment:0:3}*${segment:${#segment}-3:3}" # build a short version
allSegments[$segmentIdx]=$segment # store shortened version in array
fi
done
printf -v destVar '%s\n' "${allSegments[*]}" # build result string from array
}
# function to call from PROMPT_COMMAND to actually build a new PS1
buildNewPs1() {
# declare our locals to avoid polluting global namespace
local shorterPath
# but to cache where we last ran, we need a global; be explicit.
declare -g buildNewPs1_lastDir
# do nothing if the directory hasn't changed
[[ $PWD = "$buildNewPs1_lastDir" ]] && return 0
shortenLongSegments shorterPath 10 / "$PWD"
PS1="${shorterPath}\$"
# update the global tracking where we last ran this code
buildNewPs1_lastDir=$PWD
}
PROMPT_COMMAND=buildNewPs1 # call buildNewPs1 before rendering the prompt
Note that printf -v destVar %s "valueToStore" is used to write to variables in-place, to avoid the performance overhead of var=$(someFunction). Similarly, we're using the bash 4.3 feature namevars -- accessible with local -n or declare -n -- to allow destination variable names to be parameterized without the security risk of eval.
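For example, calling the helper directly on the path from the question's follow-up (shortPath is just a scratch variable name for this demo):
shortenLongSegments shortPath 10 / "/root/my-applications/bin"
printf '%s' "$shortPath"    # -> /root/my-*ons/bin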
If you really want to make this logic only apply to the last two directory names (though I don't see why that would be better than applying it to all of them), you can do that easily enough:
buildNewPs1() {
local pathPrefix pathSuffix pathSuffixShortened
pathPrefix=${PWD%/*/*} # everything but the last 2 segments
pathSuffix=${PWD#"$pathPrefix"} # only the last 2 segments
# shorten the last 2 segments, store in a separate variable
shortenLongSegments pathSuffixShortened 10 / "$pathSuffix"
# combine the unshortened prefix with the shortened suffix
PS1="${pathPrefix}${pathSuffixShortened}\$"
}
...adding the performance optimization that only rebuilds PS1 when the directory changed to this version is left as an exercise to the reader.
Probably not the best solution, but a quick solution using awk:
PS1=`echo ${PWD#"${PWD%/*/*}/"} | awk -v RS='/' 'length()<=10{printf $0"/"}; length()>10{printf "%s*%s/", substr($0,1,3), substr($0,length()-2,3)};'| tr -d "\n"; echo "#"`
I got these results with your examples:
T-Mob/202*027/#
202*027/T-Mob/#

Concatenate a string with an array to recursively copy files in bash

I have a concatenation problem between a string and an array
I want to copy all the files contained in the directories stored in the array; my command is in a loop (to copy my files recursively)
yes | cp -rf "./$WORK_DIR/${array[$i]}/"* $DEST_DIR
My array :
array=("My folder" "...")
I have in my array several folder names (they have spaces in their names) that I would like to append to my $WORK_DIR so that cp can copy the files.
But I always get the following error
cp: impossible to evaluate './WORKDIR/my': No such files or folders
cp: impossible to evaluate 'folder/*': No such files or folders
This worked for me
#!/bin/bash
arr=("My folder" "This is a test")
i=0
while [[ ${i} -lt ${#arr[@]} ]]; do
echo ${arr[${i}]}
cp -rfv ./source/"${arr[${i}]}"/* ./dest/.
(( i++ ))
done
exit 0
I ran the script. It gave me the following output:
My folder
'./source/My folder/blah-folder' -> './dest/./blah-folder'
'./source/My folder/foo-folder' -> './dest/./foo-folder'
This is a test
'./source/This is a test/blah-this' -> './dest/./blah-this'
'./source/This is a test/foo-this' -> './dest/./foo-this'
Not sure of the exact difference, but hopefully this will help.
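Applied to the variables from the question, the same pattern would look something like this (just a sketch; WORK_DIR, DEST_DIR and array are assumed to be set as in the question):
for dir in "${array[@]}"; do
    # quoting keeps folder names with spaces together as a single argument for cp
    cp -rfv "./$WORK_DIR/$dir/"* "$DEST_DIR"/
done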

sed command issue with string replacement

I'm having a weird problem with the sed command.
I have a script that takes a C file, copies it X times and then renames the functions inside by adding a number to the name.
For example:
originalFile.c contains these functions: check0, check1, check2
The script will generate these files:
originalFile1.c: check0 check1 check2
originalFile2.c: check3 check4 check5
originalFile3.c: check6 check7 check8
... and so on.
Now the problem: if I generate enough files that the numbers go up to 10, 20 or more, I notice something odd in the function names. The first function of the file is renamed incorrectly but the others are correct. For example:
originalFileX.c: __check165__ check16 check17
...
originalFileZ.c: __check297__ __check298__ check29 -> in this file 2 names are incorrect.
Also, if I print the names with echo, everything is correct. Do you have any idea what could be wrong?
Here is my script (I run it under OSX):
#!/bin/bash
NUMCHECK=3
# $1: filename
# $2: number of function in the file
# $3: number of function I want to generate
# $4: function basename
function replace_name() {
FILE_NUM=$((($3+($2-1))/$2))
TMP=0
for (( i=1; i<$FILE_NUM+1; i++ ))
do
cp $1.mm test/$1$i.mm
for (( j=0; j<$2; j++ ))
do
OLDNAME="$4$j"
NEWNAME="$4$TMP"
echo $OLDNAME:$NEWNAME
sed -i "" "s/$OLDNAME/$NEWNAME/g" test/$1$i.mm
TMP=$(($TMP+1))
done
done
}
replace_name check $NUMCHECK 60 check
You're doing 3 runs of sed on each file. Just imagine the following:
sed -i s/check0/check150/g test/check51.mm
sed -i s/check1/check151/g test/check51.mm
sed -i s/check2/check152/g test/check51.mm
The s/check0/check150/g changes check0 to check150 - OK.
The s/check1/check151/g then changes the previous check150 to check15150 (because it finds the string check1 inside check150 too, from the previous step).
etc...
You need to define your regex more precisely. Because there isn't any example input here, I can't help more.
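One common way to tighten the match, as a sketch (assuming the function names in the source are always followed by a word boundary, e.g. an opening parenthesis or whitespace): anchor the pattern so that check1 can no longer match inside check150. BSD sed on macOS understands [[:>:]]:
sed -i "" "s/${OLDNAME}[[:>:]]/${NEWNAME}/g" "test/$1$i.mm"
# with GNU sed, \b works instead:
# sed -i "s/${OLDNAME}\b/${NEWNAME}/g" "test/$1$i.mm"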

Bash merge columned files into one file with rows

I have many data files in this format:
-1597.5421
-1909.6982
-1991.8743
-2033.5744
But I would like to merge them all into one data file, with each original data file taking up one row with spaces in between, so I can import it into Excel.
-1597.5421 -1909.6982 -1991.8743 -2033.5744
-1789.3324 -1234.5678 -9876.5433 -9999.4321
And so on. Each file is named ALL.ene and every directory in my working directory contains it. Can someone give me a quick fix? Thanks!
Edit: Each file has 11 entries. Those were just examples.
for i in */ALL.ene
do
    echo $(<"$i")    # the unquoted expansion collapses the file's newlines into single spaces
done > result.txt
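The same idea spelled out more explicitly, in case the word-splitting trick looks too magical (a sketch; tr turns each newline into a space and echo terminates the row):
for i in */ALL.ene; do
    tr '\n' ' ' < "$i"    # newlines -> spaces, keeps everything on one line
    echo                  # end the row
done > result.txt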
Assumptions:
I assume all your data files are of this format:
<something1><newline>
<something2><newline>
<something3><newline>
So for example, if the last newline is missing, the following script will miss the field corresponding to <something3>.
Usage: ./merge.bash -o <output file> <input file list or glob>
The script appends to any existing output file from previous runs. It also does not make any assumptions about how many fields of data each input file has. It blindly puts every line into a line in the output file, separated by spaces.
#!/bin/bash
# set -o xtrace # uncomment to debug
declare output
[[ $1 =~ -o$ ]] && output="$2" && shift 2 || { \
echo "The first argument should always be -o <output>";
exit -1; }
declare -a files=("${@}") row
for file in "${files[@]}";
do
while read data; do
row+=("$data")
done < "$file"
echo "${row[#]}" >> "$output"
row=()
done
Example:
$ cat data1
-1597.5421
-1909.6982
-1991.8743
-2033.5744
$ cat data2
-1789.3324
-1234.5678
-9876.5433
-9999.4321
$ ./merge.bash -o test data{1,2}
$ cat test
-1597.5421 -1909.6982 -1991.8743 -2033.5744
-1789.3324 -1234.5678 -9876.5433 -9999.4321
This is what coreutils paste is good at, try:
paste -s data_files*
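For the layout in the question (one ALL.ene per directory), that would be something like the line below; paste joins with tabs by default, so -d ' ' switches the separator to a space:
paste -s -d ' ' */ALL.ene > result.txt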
