Filename manipulation - linux

Kindly help me with a unix script to modify the filename in required format as shown below:
AN_555a_orange_20190513.txt
AN_555b_apple_20190513.txt
Required format: Fruits names first character should be in Caps and also its position should be is changed to second:
AN_Orange_555a_20190513.txt
AN_Apple_555a_20190513.txt
And it should apply for all files present in directory,
below is the command i'm trying which is not working
for in in aaal*
do
out=${in#*_}
out=${out%_*_*_*}
out=${out%[0-9]}
out1=${out#*_}
out2=${out%_*}
AAAI_$out1$out2.txt
done

This script is simple, but worked with your sample:
#!/bin/bash
for i in AN*; do
NAME=$(echo $i | awk -F_ '{printf "%s_%s%s_%s_%s", $1,toupper( substr( $3,1,1)),(substr($3,2,100)),$2,$4,$5}')
echo "--> $NAME"
done

An interesting solution for this case is to use sed, just like this:
$ ls -1 | sed 's/\(AN_\)\([^_]*_\)\([a-z]*_\)\([0-9]*.txt\)/mv "&" "\1\u\3\2\4"/e'
Note the final e at the end of the sed command. It tells sed to execute the result of the substitution as a bash command.
So if you remove the e (which you could do at first, to check the substitution works as expected), you would get in the console:
$ ls -1 | sed 's/\(AN_\)\([^_]*_\)\([a-z]*_\)\([0-9]*.txt\)/mv "&" "\1\u\3\2\4"/'
mv "AN_555a_orange_20190513.txt" "AN_Orange_555a_20190513.txt"
mv "AN_555b_apple_20190513.txt" "AN_Apple_555b_20190513.txt"
(The sed substitution matches the several groups of characters, reorders them and creates the mv ... ... line. Note that & in the replacement pattern denotes the whole pattern matched, and \u tells sed to put the next character as upper case.)
Then add back that final e, and instead of printing these lines sed will execute them, effectively renaming the files.

This onliner could give you more idas:
awk -F_ '{printf "mv %s %s_%s%s_%s_%s\n", $0, $1,toupper(substr($3,1,1)), substr($3, 2),$2,$4}' <(ls *.txt)
This will print something like:
mv AN_555a_orange_20190513.txt AN_Orange_555a_20190513.txt
mv AN_555b_apple_20190513.txt AN_Apple_555b_20190513.txt
Then if are happy with the results, pipe it to sh for example:
awk -F_ '{printf "mv %s %s_%s%s_%s_%s\n", $0, $1,toupper(substr($3,1,1)), substr($3, 2),$2,$4}' <(ls *.txt) | sh

Related

Extract field after colon for lines where field before colon matches pattern

I have a file file1 which looks as below:
tool1v1:1.4.4
tool1v2:1.5.3
tool2v1:1.5.2.c8.5.2.r1981122221118
tool2v2:32.5.0.abc.r20123433554
I want to extract value of tool2v1 and tool2v2
My output should be 1.5.2.c8.5.2.r1981122221118 and 32.5.0.abc.r20123433554.
I have written the following awk but it is not giving correct result:
awk -F: '/^tool2v1/ {print $2}' file1
awk -F: '/^tool2v2/ {print $2}' file1
grep -E can also do the job:
grep -E "tool2v[12]" file1 |sed 's/^.*://'
If you have a grep that supports Perl compatible regular expressions such as GNU grep, you can use a variable-sized look-behind:
$ grep -Po '^tool2v[12]:\K.*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
The -o option is to retain just the match instead of the whole matching line; \K is the same as "the line must match the things to the left, but don't include them in the match".
You could also use a normal look-behind:
$ grep -Po '(?<=^tool2v[12]:).*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
And finally, to fix your awk which was almost correct (and as pointed out in a comment):
$ awk -F: '/^tool2v[12]/ { print $2 }' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
You can filter with grep:
grep '\(tool2v1\|tool2v2\)'
And then remove the part before the : with sed:
sed 's/^.*://'
This sed operation means:
^ - match from beginning of string
.* - all characters
up to and including the :
... and replace this matched content with nothing.
The format is sed 's/<MATCH>/<REPLACE>/'
Whole command:
grep '\(tool2v1\|tool2v2\)' file1|sed 's/^.*://'
Result:
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
the question has already been answered though, but you can also use pure bash to achieve the desired result
#!/usr/bin/env bash
while read line;do
if [[ "$line" =~ ^tool2v* ]];then
echo "${line#*:}"
fi
done < ./file1.txt
the while loop reads every line of the file.txt, =~ does a regexp match to check if the value of $line variable if it starts with toolv2, then it trims : backward

Extract substring after a character

I'm trying to extract substring after the last period (dot).
examples below.
echo "filename..txt" should return "txt"
echo "filename.txt." should return ""
echo "filename" should return ""
echo "filename.xml" should return "xml"
I tried below. but works only if the character(dot) exists once. But my filename may have (dot) for 0 or more times.
echo "filename.txt" | cut -d "." -f2
Let's use awk!
awk -F"." '{print (NF>1)? $NF : ""}' file
This sets field separator to . and prints the last one. But if there is none, it prints an empty string.
Test
$ cat file
filename..txt
filename.txt.
filename
filename.xml
$ awk -F"." '{print (NF>1)? $NF : ""}' file
txt
xml
One can make this portable (so it's not Linux-only), avoiding an ERE dependency, with the following:
$ sed -ne 's/.*\.//p' <<< "file..txt"
txt
$ sed -ne 's/.*\.//p' <<< "file.txt."
$ sed -ne 's/.*\.//p' <<< "file"
$ sed -ne 's/.*\.//p' <<< "file.xml"
xml
Note that for testing purposes, I'm using a "here-string" in bash. If your shell is not bash, use whatever your shell uses to feed data to sed.
The important bit here is the use of sed's -n option, which tells it not to print anything by default, combined with the substitute command's explicit p flag, which tells sed to print only upon a successful substitution, which obviously requires a dot to be included in the pattern.
With this solution, the difference between "file.txt." and "file" is that the former returns the input line replaced with null (so you may still get a newline depending on your usage), whereas the latter returns nothing, as sed is not instructed to print, as no . is included in the input. The end result may well be the same, of course:
$ printf "#%s#\n" $(sed -ne 's/.*\.//p' <<< "file.txt.")
##
$ printf "#%s#\n" $(sed -ne 's/.*\.//p' <<< "file")
##
Simple to do with awk:
awk -F"." '{ print $NF }'
What this does: With dot as a delimiter, extract the last field from the input.
Use sed in 2 steps: first remove string without a dot and than remove up to the last dot:
sed -e 's/^[^.]*$//' -e 's/.*\.//'
Test:
for s in file.txt.. file.txt. file.txt filename file.xml; do
echo "$s -> $(echo "$s" | sed -e 's/^[^.]*$//' -e 's/.*\.//')"
done
Testresult:
file.txt.. ->
file.txt. ->
file.txt -> txt
filename ->
file.xml -> xml
Actually the answer of #ghoti is roughly the same, just a bit shorter (better).
This solution can be used by other readers who wants to do something like this in another language.

concatenate the result of echo and a command output

I have the following code:
names=$(ls *$1*.txt)
head -q -n 1 $names | cut -d "_" -f 2
where the first line finds and stores all names matching the command line input into a variable called names, and the second grabs the first line in each file (element of the variable names) and outputs the second part of the line based on the "_" delim.
This is all good, however I would like to prepend the filename (stored as lines in the variable names) to the output of cut. I have tried:
names=$(ls *$1*.txt)
head -q -n 1 $names | echo -n "$names" cut -d "_" -f 2
however this only prints out the filenames
I have tried
names=$(ls *$1*.txt
head -q -n 1 $names | echo -n "$names"; cut -d "_" -f 2
and again I only print out the filenames.
The desired output is:
$
filename1.txt <second character>
where there is a single whitespace between the filename and the result of cut.
Thank you.
Best approach, using awk
You can do this all in one invocation of awk:
awk -F_ 'NR==1{print FILENAME, $2; exit}' *"$1"*.txt
On the first line of the first file, this prints the filename and the value of the second column, then exits.
Pure bash solution
I would always recommend against parsing ls - instead I would use a loop:
You can avoid the use of awk to read the first line of the file by using bash built-in functionality:
for i in *"$1"*.txt; do
IFS=_ read -ra arr <"$i"
echo "$i ${arr[1]}"
break
done
Here we read the first line of the file into an array, splitting it into pieces on the _.
Maybe something like that will satisfy your need BUT THIS IS BAD CODING (see comments):
#!/bin/bash
names=$(ls *$1*.txt)
for f in $names
do
pattern=`head -q -n 1 $f | cut -d "_" -f 2`
echo "$f $pattern"
done
If I didn't misunderstand your goal, this also works.
I've always done it this way, I just found out that this is a deprecated way to do it.
#!/bin/bash
names=$(ls *"$1"*.txt)
for e in $names;
do echo $e `echo "$e" | cut -c2-2`;
done

bash script to strip out some characters

Bash scripting. How can i get a simple while loop to go through a file with below content and strip out all character from T (including T) using sed
"2012-05-04T10:16:04Z"
"2012-04-05T15:27:40Z"
"2012-03-05T14:58:27Z"
"2011-11-29T15:04:09Z"
"2011-11-16T12:12:00Z"
Thanks
A simple awk command to do this:
awk -F '["T]' '{print $2}' file
2012-05-04
2012-04-05
2012-03-05
2011-11-29
2011-11-16
Through sed,
sed 's/"\|T.*//g' file
"matches double quotes \| or T.* starts from the first T match all the characters upto the last. Replacing the matched characters with an empty string will give you the desired output.
Example:
$ echo '"2012-05-04T10:16:04Z"' | sed 's/"\|T.*//g'
2012-05-04
With bash builtins:
while IFS='T"' read -r a a b; do echo "$a"; done < filename
Output:
2012-05-04
2012-04-05
2012-03-05
2011-11-29
2011-11-16

linux shell title case

I am wrinting a shell script and have a variable like this: something-that-is-hyphenated.
I need to use it in various points in the script as:
something-that-is-hyphenated, somethingthatishyphenated, SomethingThatIsHyphenated
I have managed to change it to somethingthatishyphenated by stripping out - using sed "s/-//g".
I am sure there is a simpler way, and also, need to know how to get the camel cased version.
Edit: Working function derived from #Michał's answer
function hyphenToCamel {
tr '-' '\n' | awk '{printf "%s%s", toupper(substr($0,1,1)), substr($0,2)}'
}
CAMEL=$(echo something-that-is-hyphenated | hyphenToCamel)
echo $CAMEL
Edit: Finally, a sed one liner thanks to #glenn
echo a-hyphenated-string | sed -E "s/(^|-)([a-z])/\u\2/g"
a GNU sed one-liner
echo something-that-is-hyphenated |
sed -e 's/-\([a-z]\)/\u\1/g' -e 's/^[a-z]/\u&/'
\u in the replacement string is documented in the sed manual.
Pure bashism:
var0=something-that-is-hyphenated
var1=(${var0//-/ })
var2=${var1[*]^}
var3=${var2// /}
echo $var3
SomethingThatIsHyphenated
Line 1 is trivial.
Line 2 is the bashism for replaceAll or 's/-/ /g', wrapped in parens, to build an array.
Line 3 uses ${foo^}, which means uppercase (while ${foo,} would mean 'lowercase' [note, how ^ points up while , points down]) but to operate on every first letter of a word, we address the whole array with ${foo[*]} (or ${foo[#]}, if you would prefer that).
Line 4 is again a replace-all: blank with nothing.
Line 5 is trivial again.
You can define a function:
hypenToCamel() {
tr '-' '\n' | awk '{printf "%s%s", toupper(substr($0,0,1)), substr($0,2)}'
}
CAMEL=$(echo something-that-is-hyphenated | hypenToCamel)
echo $CAMEL
In the shell you are stuck with being messy:
aa="aaa-aaa-bbb-bbb"
echo " $aa" | sed -e 's/--*/ /g' -e 's/ a/A/g' -e 's/ b/B/g' ... -e 's/ *//g'
Note the carefully placed space in the echo and the double space in the last -e.
I leave it as an exercise to complete the code.
In perl it is a bit easier as a one-line shell command:
perl -e 'print map{ $a = ucfirst; $a =~ s/ +//g; $a} split( /-+/, $ARGV[0] ), "\n"' $aa
For the records, here's a pure Bash safe method (that is not subject to pathname expansion)—using Bash≥4:
var0=something-that-is-hyphenated
IFS=- read -r -d '' -a var1 < <(printf '%s\0' "${var0,,}")
printf '%s' "${var1[#]^}"
This (safely) splits the lowercase expansion of var0 at the hyphens, with each split part in array var1. Then we use the ^ parameter expansion to uppercase the first character of the fields of this array, and concatenate them.
If your variable may also contain spaces and you want to act on them too, change IFS=- into IFS='- '.

Resources