Replace part of a file name in Linux

I have a file name in Linux that is something like:
out-06307963554982-8091-20220726-121922-1658834362.208826.wav
I need to replace the first number with X's in bash and I'm struggling to find the right solution. The number can vary in size, but the 'out' will always remain the same.
Final file name should look like
out-XXXXXXXXXXXXXX-8091-20220726-121922-1658834362.208826.wav
OR alternatively we can cut everything up to and including the third -
20220726-121922-1658834362.208826.wav

Try this:
#!/bin/bash
filename="out-06307963554982-8091-20220726-121922-1658834362.208826.wav"
echo "ORIGINAL=$filename"
IFS='-' read -r -a fields <<< "$filename"
new_second_field=$( echo "${fields[1]}" | tr '[:digit:]' 'X' )
fields[1]="$new_second_field"
new_filename=$(echo "${fields[@]}" | tr ' ' '-')
echo "NEW======$new_filename"
####################################################
# Cut everything up to and including the third '-' (keep fields 4 onward)
echo "$filename" | cut -d'-' -f4-
IFS='-' read -r -a fields <<< "$filename": splits the filename into the array fields, using - as the delimiter.
Then tr is used to replace each digit in the second field with an X. This ensures that the number of X's equals the number of digits.
The second field is replaced by the new value consisting only of X's.
The new filename is "rebuilt" by echoing the fields array and replacing the separating spaces with -.
For your alternative format, simply split on - with cut and display the fields from the 4th one on (-f4-).
The output of this script is:
ORIGINAL=out-06307963554982-8091-20220726-121922-1658834362.208826.wav
NEW======out-XXXXXXXXXXXXXX-8091-20220726-121922-1658834362.208826.wav
20220726-121922-1658834362.208826.wav
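For comparison, a sketch using only Bash parameter expansion (no external commands), assuming the same out-NUMBER-rest layout:
#!/bin/bash
filename="out-06307963554982-8091-20220726-121922-1658834362.208826.wav"
rest=${filename#out-}        # drop the fixed "out-" prefix
number=${rest%%-*}           # first field: everything up to the next "-"
masked=${number//[0-9]/X}    # replace every digit with an X
echo "out-$masked-${rest#*-}"
This prints out-XXXXXXXXXXXXXX-8091-20220726-121922-1658834362.208826.wav, with one X per digit, just like the tr version.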

Related

String split and extract the last field in bash

I have a text file FILENAME. I want to split the string at the - of the first column field and extract the last element from each line. Here "$(echo $line | cut -d, -f1 | cut -d- -f4)" alone is not giving me the right result.
FILENAME:
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram
code I tried:
while read line; do \
DNA="$(echo $line | cut -d, -f1 | cut -d- -f4)";
echo $DNA
done < ${FILENAME}
Result I want
1195060301
1195060302
1195060311
Would you please try the following:
while IFS=, read -r f1 _; do # split on ",": assign the 1st field to f1 and the rest to _
dna=${f1##*-} # removes everything before the rightmost "-" from "$f1"
echo "$dna"
done < "$FILENAME"
Well, I had to do it with two lines of code. Maybe someone has a better approach.
while read -r line; do
DNA="$(echo "$line" | cut -d, -f1 | rev)"
DNA="$(echo "$DNA" | cut -d- -f1 | rev)"
echo "$DNA"
done < "${FILENAME}"
I do not know the constraints on your input file, but if what you are looking for is a number of 10 or more digits, and there is only ever one such number per line, this should do nicely:
grep -Eo '[0-9]{10,}' input.txt
1195060301
1195060302
1195060311
This essentially says: show me every run of 10 or more digits in this file.
input.txt
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram
A sed approach:
sed -nE 's/.*-([[:digit:]]+),.*/\1/p' input_file
sed options:
-n: do not print the whole file back; print only lines where an explicit p applies.
-E: use extended regex, which avoids having to escape its grammar.
The extended regex:
's/.*-([[:digit:]]+),.*/\1/p': capture one or more digits into group 1, preceded by anything ending in a dash and followed by a comma and anything, then print only the captured group.
Using awk:
awk -F[,] '{ split($1,arr,"-");print arr[length(arr)] }' FILENAME
Using , as the separator, take the first delimited "piece" of data and further split it into the array arr using - as the delimiter and awk's split function. We then print the last element of arr.
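For example, on the sample line that contains an extra - inside the name, split still picks up the last element:
$ echo 'TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram' | awk -F[,] '{ split($1,arr,"-"); print arr[length(arr)] }'
1195060311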

Bash - replace in file strings from first array with strings from second array

I have two arrays. The first one is filled with values greped from the file that I want to replace with the new ones downloaded.
Please note that I don't know exactly what the first array will look like: some values will have _, others -, and some won't have either, with a : (colon) immediately after the name.
Example arrays:
array1:
[account:123 shoppingcart-1:123 notification-core_1:123 notification-dispatcher_core_1:123 notification-dispatcher-smschannel_core_1:123]
array2:
[account_custom_2:124 shoppingcart_custom_2:124 notification_custom_2:124 notification-dispatcher_custom_2:124 notification-dispatcher-smschannel_custom_2:124]
Those arrays are the only example, there are more than 50 values to replace.
I am doing a comparison of every item in the second array with every item in the first array as shown below:
file_name="<path_to_file>/file.txt"
for i in "${!array1[#]}"
do
for j in "${!array2[#]}"
do
array_2_base="`echo ${array2[$j]} | awk -F_ '{print $1}'`"
if [[ "${array1[$i]}" == *"$array_2_base"* ]]
then
sed -i "s=${array1[$i]}=${array2[$j]}=g" $file_name
fi
done
done
Here I substring only the first part for every item in the second array so I can compare it with the item in the first array.
e.g. account_custom_2:124 -> account or notification-dispatcher_custom_2:124 -> notification-dispatcher.
This works nice but I encounter problem when notification is in notification-core_1:123 and notification-dispatcher_core_1:123 and notification-dispatcher-smschannel_core_1:123.
Can you please give advice on how to fix this or if you can suggest another approach to this?
The point is that the base of an array2 element may be contained in another element as a substring, which causes an improper replacement depending on the order of matching.
To avoid this, you can sort the array in descending order so that the longer pattern comes first.
Assuming the strings in the arrays do not contain tab characters, would you please try:
file_name="<path_to_file>/file.txt"
array1=(account:123 shoppingcart-1:123 notification-core_1:123 notification-dispatcher_core_1:123 notification-dispatcher-smschannel_core_1:123)
array2=(account_custom_2:124 shoppingcart_custom_2:124 notification_custom_2:124 notification-dispatcher_custom_2:124 notification-dispatcher-smschannel_custom_2:124)
# insert the following block to sort array2 in descending order
array2=( $(for j in "${array2[@]}"; do
array_2_base=${j%%_*}
printf "%s\t%s\n" "$array_2_base" "$j"
done | sort -r | cut -f2-) )
# the following code will work "as is"
for i in "${!array1[#]}"
do
for j in "${!array2[#]}"
do
array_2_base="`echo ${array2[$j]} | awk -F_ '{print $1}'`"
if [[ "${array1[$i]}" == *"$array_2_base"* ]]
then
sed -i "s=${array1[$i]}=${array2[$j]}=g" "$file_name"
delete="${array1[$i]}"
array1=( "${array1[#]/$delete}" )
fi
done
done
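With the example arrays, the sort block should leave array2 in this order, so each notification-* item is matched by its most specific pattern first:
shoppingcart_custom_2:124
notification-dispatcher-smschannel_custom_2:124
notification-dispatcher_custom_2:124
notification_custom_2:124
account_custom_2:124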
The script above will be inefficient in execution time due to the repetitive
invocation of the sed -i command.
The script below will run faster
by pre-generating the sed script and executing it just once.
file_name="<path_to_file>/file.txt"
array1=(
account:123
shoppingcart-1:123
notification-core_1:123
notification-dispatcher_core_1:123
notification-dispatcher-smschannel_core_1:123
)
array2=(
account_custom_2:124
shoppingcart_custom_2:124
notification_custom_2:124
notification-dispatcher_custom_2:124
notification-dispatcher-smschannel_custom_2:124
)
while IFS=$'\t' read -r base a2; do # read the sorted list line by line
for a1 in "${array1[#]}"; do
if [[ $a1 == *$base* ]]; then
scr+="s=$a1=$a2=g;" # generate sed script by appending the "s" command
continue 2
fi
done
done < <(for j in "${array2[@]}"; do
array_2_base=${j%%_*} # substring before the 1st "_"
printf "%s\t%s\n" "$array_2_base" "$j"
# print base and original element side by side
done | sort -r)
sed -i "$scr" "$file_name" # execute the replacement at once
If the number of items in your arrays is equal, you can process them in one loop:
for i in "${!array1[@]}"; {
value=${array1[$i]}
new_value=${array2[$i]}
sed -i "s/$value/$new_value/" file
}
I found a way to fix this: I delete the string from the first array once it has been replaced.
file_name="<path_to_file>/file.txt"
for i in "${!array1[#]}"
do
for j in "${!array2[#]}"
do
array_2_base="`echo ${array2[$j]} | awk -F_ '{print $1}'`"
if [[ "${array1[$i]}" == *"$array_2_base"* ]]
then
sed -i "s=${array1[$i]}=${array2[$j]}=g" $file_name
delete="${array1[$i]}"
array1=( "${array1[#]/$delete}" )
fi
done
done

Bash: transform key-value lines to CSV format [closed]

Editor's note: I've clarified the problem definition, because I think the problem is an interesting one, and this question deserves to be reopened.
I've got a text file containing key-value lines in the following format - note that the # lines below are only there to show repeating blocks and are NOT part of the input:
Country:United Kingdom
Language:English
Capital city:London
#
Country:France
Language:French
Capital city:Paris
#
Country:Germany
Language:German
Capital city:Berlin
#
Country:Italy
Language:Italian
Capital city:Rome
#
Country:Russia
Language:Russian
Capital city:Moscow
Using shell commands and utilities, how can I transform such a file to CSV format, so it will look like this?
Country,Language,Capital city
United Kingdom,English,London
France,French,Paris
Germany,German,Berlin
Italy,Italian,Rome
Russia,Russian,Moscow
In other words:
Make the key names the column names of the CSV header row.
Make the values from each block a data row each.
[OP's original] Edit: My idea would be to separate the entries, e.g. Country:France would become Country France, and then grep/sed the heading. However, I have no idea how to move the headings from a single column to several separate ones.
A simple solution with cut, paste, and head (assumes input file file, outputs to file out.csv):
#!/usr/bin/env bash
{ cut -d':' -f1 file | head -n 3 | paste -d, - - -;
cut -d':' -f2- file | paste -d, - - -; } >out.csv
cut -d':' -f1 file | head -n 3 creates the header line:
cut -d':' -f1 file extracts the first :-based field from each input line, and head -n 3 stops after 3 lines, given that the headers repeat every 3 lines.
paste -d, - - - takes 3 input lines from stdin (one for each -) and combines them to a single, comma-separated output line (-d,)
cut -d':' -f2- file | paste -d, - - - creates the data lines:
cut -d':' -f2- file extracts everything after the : from each input line.
As above, paste then combines 3 values to a single, comma-separated output line.
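The paste behavior is easy to verify in isolation: given six lines on stdin, three - operands produce two rows:
$ printf '%s\n' a b c d e f | paste -d, - - -
a,b,c
d,e,f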
agc points out in a comment that the column count (3) and the paste operands (- - -) are hard-coded above.
The following solution parameterizes the column count (set it via n=...):
{ n=3; pasteOperands=$(printf '%.s- ' $(seq $n))
cut -d':' -f1 file | head -n $n | paste -d, $pasteOperands;
cut -d':' -f2- file | paste -d, $pasteOperands; } >out.csv
printf '%.s- ' $(seq $n) is a trick that produces a list of as many space-separated - characters as there are columns ($n).
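For example, with n=3:
$ n=3; printf '%.s- ' $(seq $n)
- - -
(The %.s format consumes each argument from seq while printing nothing, leaving one "- " per column.)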
While the previous solution is now parameterized, it still assumes that the column count is known in advance; the following solution dynamically determines the column count (requires Bash 4+ due to use of readarray, but could be made to work with Bash 3.x):
# Determine the unique list of column headers and
# read them into a Bash array.
readarray -t columnHeaders < <(awk -F: 'seen[$1]++ { exit } { print $1 }' file)
# Output the header line.
(IFS=','; echo "${columnHeaders[*]}") >out.csv
# Append the data lines.
cut -d':' -f2- file | paste -d, $(printf '%.s- ' $(seq ${#columnHeaders[@]})) >>out.csv
awk -F: 'seen[$1]++ { exit } { print $1 }' outputs each input line's column name (the 1st :-separated field), remembers the column names in associative array seen, and stops at the first column name that is seen for the second time.
readarray -t columnHeaders reads awk's output line by line into array columnHeaders
(IFS=','; echo "${columnHeaders[*]}") >out.csv prints the array elements joined with a comma (the first character of $IFS); note the use of a subshell ((...)) so as to localize the effect of modifying $IFS, which would otherwise have global effects.
The cut ... pipeline uses the same approach as before, with the operands for paste being created based on the count of the elements of array columnHeaders (${#columnHeaders[@]}).
To wrap the above up in a function that outputs to stdout and also works with Bash 3.x:
toCsv() {
local file=$1 columnHeaders
# Determine the unique list of column headers and
# read them into a Bash array.
IFS=$'\n' read -d '' -ra columnHeaders < <(awk -F: 'seen[$1]++ { exit } { print $1 }' "$file")
# Output the header line.
(IFS=','; echo "${columnHeaders[*]}")
# Append the data lines.
cut -d':' -f2- "$file" | paste -d, $(printf '%.s- ' $(seq ${#columnHeaders[#]}))
}
# Sample invocation
toCsv file > out.csv
My bash script for this would be :
#!/bin/bash
count=0
echo "Country,Language,Capital city"
while read -r line
do
(( count++ ))
(( count < 3 )) && printf "%s," "${line##*:}"
(( count == 3 )) && printf "%s\n" "${line##*:}" && (( count = 0 ))
done<file
Output
Country,Language,Capital city
United Kingdom,English,London
France,French,Paris
Germany,German,Berlin
Italy,Italian,Rome
Russia,Russian,Moscow
Edit
Replaced [ stuff ] with (( stuff )), i.e. the double-parenthesis arithmetic evaluation; note that inside (( )) the comparison operators are < and ==, not -lt and -eq.
You can also write a slightly more generalized bash script that takes the number of repeating rows as an argument, avoiding hardcoded header values and handling additional fields. (You could also scan the field names for the first repeat and set the row count that way.)
#!/bin/bash
declare -i rc=0 ## record count
declare -i hc=0 ## header count
record=""
header=""
fn="${1:-/dev/stdin}" ## filename as 1st arg (default: stdin)
repeat="${2:-3}" ## number of repeating rows (default: 3)
while read -r line; do
record="$record,${line##*:}"
((hc == 0)) && header="$header,${line%%:*}"
if ((rc < (repeat - 1))); then
((rc++))
else
((hc == 0)) && { printf "%s\n" "${header:1}"; hc=1; }
printf "%s\n" "${record:1}"
record=""
rc=0
fi
done <"$fn"
There are any number of ways to approach the problem; you will have to experiment to find the most efficient for your data file size, etc. Whether you use a script or a combination of shell tools (cut, paste, and so on) is to a large extent left to you.
Output
$ bash readcountry.sh country.txt
Country,Language,Capital city
United Kingdom,English,London
France,French,Paris
Germany,German,Berlin
Italy,Italian,Rome
Russia,Russian,Moscow
Output with 4 Fields
Example input file adding a Population field:
$ cat country2.txt
Country:United Kingdom
Language:English
Capital city:London
Population:20000000
<snip>
Output
$ bash readcountry.sh country2.txt 4
Country,Language,Capital city,Population
United Kingdom,English,London,20000000
France,French,Paris,10000000
Germany,German,Berlin,150000000
Italy,Italian,Rome,9830000
Russia,Russian,Moscow,622000000
Using datamash, tr, and join:
datamash -t ':' -s -g 1 collapse 2 < country.txt | tr ',' ':' |
datamash -t ':' transpose |
join -t ':' -a1 -o 1.2,1.3,1.1 - /dev/null | tr ':' ','
Output:
Country,Language,Capital city
United Kingdom,English,London
France,French,Paris
Germany,German,Berlin
Italy,Italian,Rome
Russia,Russian,Moscow

Split string at special character in bash

I'm reading filenames from a text file line by line in a bash script. However, the lines look like this:
/path/to/myfile1.txt 1
/path/to/myfile2.txt 2
/path/to/myfile3.txt 3
...
/path/to/myfile20.txt 20
So there is a second column containing an integer, separated by a space. I only need the part of the string before the space.
I have only found solutions using a for-loop, but I need a function that explicitly looks for the " "-character (space) in my string and splits it at that point.
In principle I need the equivalent of Matlab's strsplit(str,delimiter).
If you are already reading the file with something like
while read -r line; do
(and you should be), then pass two arguments to read instead:
while read -r filename somenumber; do
read will split the line on whitespace and assign the first field to filename and any remaining field(s) to somenumber.
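Putting it together, a minimal sketch (list.txt is a placeholder name for your text file):
while read -r filename somenumber; do
    echo "$filename"    # just the path, without the trailing number
done < list.txt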
Three (of many) solutions:
# Using awk
echo "$string" | awk '{ print $1 }'
# Using cut
echo "$string" | cut -d' ' -f1
# Using sed
echo "$string" | sed 's/\s.*$//g'
If you need to iterate through each line of the file anyway, you can cut off everything from the first space onward with bash:
while read -r line ; do
# bash string manipulation removes the first space
# and everything which follows it
echo "${line// *}"
done < file
This should work too:
line="${line% *}"
This cuts the string at its last occurrence of a space. So it will work even if the path contains spaces (as long as the line ends with the space-separated number).
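A quick check with a path that itself contains a space:
$ line='/path/to/my file.txt 3'
$ echo "${line% *}"
/path/to/my file.txt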
while read -r line
do
{ rev | cut -d' ' -f2- | rev >> result.txt; } <<< "$line"
done < input.txt
This solution will work even if you have spaces in your filenames.

Unix - how to use cut -d on one word

I have a string with two words, but sometimes it may contain only one word. I need to get both words, and if there is no second word I want an empty string.
I am using the following:
STRING1=`echo $STRING|cut -d' ' -f1`
STRING2=`echo $STRING|cut -d' ' -f2`
When STRING is only one word, both strings are equal, but I need the second string to be empty.
Your problem is (from cut(1))
`-f FIELD-LIST'
`--fields=FIELD-LIST'
Select for printing only the fields listed in FIELD-LIST. Fields
are separated by a TAB character by default. Also print any line
that contains no delimiter character, unless the
`--only-delimited' (`-s') option is specified.
You could specify -s when extracting the second word, or use
echo " $STRING" | cut -d' ' -f3
to extract the second word (note the fake separator in front of $STRING).
The shell has built-in functionality for this.
echo "First word: ${STRING%% *}"
echo "Last word: ${STRING##* }"
The double ## or %% is not compatible with older shells; they only had the single-character variant, which trims the shortest possible match instead of the longest. (You can simulate the longest suffix by extracting the shortest prefix, then trimming everything else, but this takes two trims.)
Mnemonic: # is to the left of $ on the keyboard, % is to the right.
For your actual problem, I would add a simple check to see if the first extraction extracted the whole string; if so, the second should be left empty.
STRING1="${STRING%% *}"
case $STRING1 in
"$STRING" ) STRING2="" ;;
* ) STRING2="${STRING#$STRING1 }" ;;
esac
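A quick sanity check of the one-word case:
$ STRING='word1'
$ STRING1="${STRING%% *}"   # no space to match, so the whole string comes back
$ [ "$STRING1" = "$STRING" ] && STRING2=""
$ echo "STRING2=[$STRING2]"
STRING2=[]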
As an aside, there's also this (the -- guards against values of $STRING that start with a dash):
set -- $STRING
STRING1=$1
STRING2=$2
Why not just use read:
STR='word1 word2'
read string1 string2 <<< "$STR"
echo "$string1"
word1
echo "$string2"
word2
Now the missing 2nd word:
STR='word1'
read string1 string2 <<< "$STR"
echo "$string1"
word1
echo "$string2" | cat -vte
$
