Linux script with 'and' and 'or' operators

I have a file where I need to replace a special keyword,
thorn (þ), with a tab and save the file back. It works just fine with the code below.
#Skip the header line, and translate the thorn column separators (octal 376) to tabs;
#if there were any actual tabs in the raw file, translate them to something harmless - say a divide sign, octal 367
cat $input | tr '\11' '\367' | tr '\376' '\11' | tail -n +2 > $outputfile
Input file:
header1þheader2þheader3
Thisþisþaþsample,input,thornþfile
forþtestingþscript
Output file:
This is a sample,input,thorn file
for testing script
Notice the comma is not replaced, which is what we need.
However, I need to tune the code so that when no thorn is present, it treats the comma as the delimiter instead, replacing commas with tabs before saving. The problem is that when I use an 'or' condition, a file that does contain thorns is not saved, because it fails the comma criterion.
cat $input | tr '\11' '\362' | tr '\376' '\11' || tr ',' '\11' | tail -n 2 > $outputfile
I am using a double pipe because when a thorn is present I cannot replace ','.
Basically I am trying to figure out how to combine 'and' and 'or' in a Linux script, but attempts like the one below do not work:
cat $input | ((tr '\11' '\367' | tr '\376' '\11') || (tr ',' '\11')) & tail -n +2 > newfile.csv

cat Input_file
header1þheader2þheader3
Thisþisþaþsample,input,thornþfile
forþthornþdelimited
for,comma,delimited
forþboth,thorn,andþcomma,delimted
For me, the thorn character þ in the above Input_file is represented by the two bytes \303\276 (its UTF-8 encoding), and passing the entries above through this perl one-liner produces the result the OP wanted:
cat Input_file | perl -ne 'if (s/\303\276/\t/g) {print} elsif (s/,/\t/g) {print} else {print}'
header1 header2 header3
This is a sample,input,thorn file
for thorn delimited
for comma delimited
for both,thorn,and comma,delimted

I tried to find a resolution myself, and it is working just fine.
The code is attached below; however, if anybody can point out an issue or a better resolution, that would be highly appreciated.
input='inputFile.csv'
if grep -q $'\376' "$input"; then
cat "$input" | tr '\11' '\367' | tr '\376' '\11'
elif grep -q ',' "$input"; then
cat "$input" | tr ',' '\11'
fi
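A complete version of the same idea might look like this (a sketch: the output-file name, the LC_ALL=C byte-wise locale, and the tail -n +2 header skip are assumptions carried over from the question, and $'\376' is bash ANSI-C quoting for the thorn byte):

```shell
#!/bin/bash
export LC_ALL=C          # byte-wise matching; \376 is not valid UTF-8
input='inputFile.csv'    # hypothetical file names for illustration
outputfile='outputFile.csv'

if grep -q $'\376' "$input"; then
    # thorn-delimited: hide any real tabs, then turn thorns into tabs
    tr '\11' '\367' < "$input" | tr '\376' '\11' | tail -n +2 > "$outputfile"
elif grep -q ',' "$input"; then
    # no thorns: fall back to commas as the delimiter
    tr ',' '\11' < "$input" | tail -n +2 > "$outputfile"
else
    # neither delimiter present: just drop the header
    tail -n +2 "$input" > "$outputfile"
fi
```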

Related

String split and extract the last field in bash

I have a text file FILENAME. I want to split the first comma-separated field at the - characters and extract the last element from each line. Here, "$(echo $line | cut -d, -f1 | cut -d- -f4)" alone is not giving me the right result.
FILENAME:
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram
code I tried:
while read line; do
DNA="$(echo $line | cut -d, -f1 | cut -d- -f4)"
echo $DNA
done < ${FILENAME}
Result I want
1195060301
1195060302
1195060311
Would you please try the following:
while IFS=, read -r f1 _; do # split on ",", assigning the 1st field to f1 and the rest to _
dna=${f1##*-} # removes everything before the rightmost "-" from "$f1"
echo "$dna"
done < "$FILENAME"
Well, I made do with two lines of code. Maybe someone has a better approach.
while read line; do
DNA="$(echo $line | cut -d, -f1 | rev)"
DNA="$(echo $DNA | cut -d- -f1 | rev)"
echo $DNA
done < ${FILENAME}
I do not know the constraints on your input file, but if what you are looking for is a run of at least 10 digits, and there is only ever one such number per line, this should do nicely:
grep -Eo '[0-9]{10,}' input.txt
1195060301
1195060302
1195060311
This essentially says: show me every run of 10 or more digits in this file.
input.txt
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram
A sed approach:
sed -nE 's/.*-([[:digit:]]+)\,.*/\1/p' input_file
sed options:
-n: do not print the whole file back; print only lines explicitly flagged with /p.
-E: use Extended Regex, without needing to escape its grammar.
sed Extended Regex:
's/.*-([[:digit:]]+)\,.*/\1/p': capture one or more digits into group 1, preceded by anything ending in a dash and followed by a comma and anything, then print only the captured group.
Using awk:
awk -F[,] '{ split($1,arr,"-");print arr[length(arr)] }' FILENAME
Using , as a separator, take the first delimited "piece" of data and further split it into an arr using - as the delimiter and awk's split function. We then print the last index of arr.

how to replace a specific char occurrences in string after a given substring

I have a string containing key=value pairs separated by #.
I am trying to replace the '=' occurrences within the value of TITLE with ':' using a BASH script.
"ID=21566#OS=Linux#TARGET_END=Synchronica#DEPENDENCY=Independent#AUTOMATION_OS=Linux#AUTOMATION_TOOL=JSystem#TITLE=Session tracking. "DL Started" Status Reported.Level=none"
Later on I am parsing this string to execute the eval operation:
eval $(echo $test_line | sed 's/"//g' | tr '#' '\n' | tr ' ' '_' | sed 's/=/="/g' | sed 's/$/"/g')
When the sed 's/=/="/g' section will also change ..Level=none to
Level="none
This leads to
eval: line 52: unexpected EOF while looking for matching `"'
What will be right replace bash command to replace my string ?
As an alternative, consider a pure-bash solution that brings the variables into bash, avoiding the (risky) eval:
IFS=# read -a kv <<<"ID=21566#OS=Linux#TARGET_END=Synchronica#..."
for kvp in "${kv[@]}" ; do
declare "$kvp"
done
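A runnable sketch of that approach (the keys and values here are hypothetical, shortened from the question's string):

```shell
# Split on '#' into an array, then let declare turn each KEY=VALUE
# pair into a shell variable
IFS=# read -r -a kv <<< "ID=21566#OS=Linux#TARGET_END=Synchronica"
for kvp in "${kv[@]}"; do
    declare "$kvp"
done
echo "$ID / $OS / $TARGET_END"   # 21566 / Linux / Synchronica
```

This only works cleanly while the values themselves contain no '#' characters; values with embedded spaces (like the question's TITLE) are still handled because "$kvp" is quoted.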
I found a way to solve it.
I added sed 's/=/:/8g' to my eval command.
It replaces the 8th through nth occurrences of '='.
This only affects the value of TITLE, as expected.
eval $(echo $test_line | sed 's/=/:/8g' | sed 's/"//g' | tr '#' '\n' | tr ' ' '_' | sed 's/=/="/g' | sed 's/$/"/g')
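The numbered-plus-g flag is worth calling out: 's/old/new/Ng' is a GNU sed extension that replaces the Nth match and every match after it. A quick sketch on a made-up line:

```shell
# Replace the 8th and all subsequent '=' on each line (GNU sed extension)
echo 'a=b=c=d=e=f=g=h=i=j' | sed 's/=/:/8g'   # a=b=c=d=e=f=g=h:i:j
```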
I did it like this:
echo '"ID=21566#OS:Linux#TARGET_END:Synchronica#DEPENDENCY:Independent#AUTOMATION_OS:Linux#AUTOMATION_TOOL:JSystem#TITLE:Session tracking. "DL Started" Status Reported.Level=none"' \
|
sed -E 's/(#)?([A-Z_]+)(=)/\1\2:/g'
Let me know if it works for you.

Unique emails in string

I have a string with emails, some duplicated. For example only:
"aaa@company.com,bbb@company.com,aaa@company.com,bbb@company.com,ccc@company.com"
I would like the string to contain only unique emails, comma separated. The result should be:
"aaa@company.com,bbb@company.com,ccc@company.com"
Is there an easy way to do this?
P.S. The emails vary, and I don't know what they will contain.
How about this:
echo "aaa@company.com,bbb@company.com,aaa@company.com,bbb@company.com,ccc@company.com" |
tr ',' '\n' |
sort |
uniq |
tr '\n' ',' |
sed -e 's/,$//'
I convert the separating commas into newlines so that I can then use tools (like sort, uniq, and grep) that work with lines.
Using awk and process substitution, rather than sort and the other tools:
awk -vORS="," '!seen[$1]++' < <(echo "aaa@company.com,bbb@company.com,aaa@company.com,bbb@company.com,ccc@company.com" | tr ',' '\n')
aaa@company.com,bbb@company.com,ccc@company.com
Or, another way using pure bash, avoiding tr completely:
# Read into a bash array with ',' as the field separator; '-a' reads into an array
IFS=',' read -ra myArray <<< "aaa@company.com,bbb@company.com,aaa@company.com,bbb@company.com,ccc@company.com"
# Print the array elements one per line and feed them to awk
awk -vORS="," '!seen[$1]++' < <(printf '%s\n' "${myArray[@]}")
aaa@company.com,bbb@company.com,ccc@company.com
With perl:
$ s="aaa@company.com,bbb@company.com,aaa@company.com,bbb@company.com,ccc@company.com"
$ echo $s | perl -MList::MoreUtils=uniq -F, -le 'print join ",",uniq(@F)'
aaa@company.com,bbb@company.com,ccc@company.com
Getting the strings in an array:
IFS=','; read -r -a lst <<< "aaa@company.com,bbb@company.com,aaa@company.com,bbb@company.com,ccc@company.com"
Sorting and filtering:
IFS=$'\n' sort <<< "${lst[*]}" | uniq

How to clean up a csv where the delimiter also appears inside double quotes, in bash?

So I have a csv that uses comma as the delimiter, but some columns contain commas inside double quotes that should not act as delimiters. I need to replace those with another character.
a,b,c,"d,blah,blah,blah",e,f
a,b,"c,blah,blah,blah",d
I want it to be
a,b,c,"d|blah|blah|blah",e,f
a,b,"c|blah|blah|blah",d
This is not easy in BASH, but you can do it with grep -Eo:
while read -r; do
s=$(grep -Eo '"[^"]*"|[^,]*' <<< "$REPLY" | tr ',' '|')
echo $s | tr ' ' ','
done < file
Output:
a,b,c,"d|blah|blah|blah",e,f
a,b,"c|blah|blah|blah",d
Here is a super ugly hack, but it works.
It does one replacement at a time and checks to see if the new string is different from the old string.
#!/bin/bash
newstr='a,b,c,"d,blah,blah,blah",e,f'
while [ "$oldstr" != "$newstr" ]
do
oldstr=$newstr
newstr=$(echo $oldstr | sed 's/\([^"]*"[^,]*\)\(,\)\([^"]*".*\)/\1|\3/g')
done
echo $newstr

Shell script tokenizer

I'm writing a script that queries my JBoss server for some database related data. The thing that is returned after the query looks like this:
ConnectionCount=7
ConnectionCreatedCount=98
MaxConnectionsInUseCount=10
ConnectionDestroyedCount=91
AvailableConnectionCount=10
InUseConnectionCount=0
MaxSize=10
I would like to tokenize this data so that the numbers on the right-hand side are stored in a variable in the format 7,98,10,91,10,0,10. I tried using IFS with the equals sign, but that still keeps the parameter names (only the equals signs are eliminated).
I put your input data into file d.txt. The one-liner below extracts the numbers, comma-delimits them, and assigns the result to the variable TAB (tested with Korn shell):
$ TAB=$(awk -F= '{print $2}' d.txt | xargs echo | sed 's/ /,/g')
$ echo $TAB
7,98,10,91,10,0,10
Or just use cut and tr:
F=($(cut -d'=' -f2 input | tr '\n' ' '))
You can do it with one sed command too:
sed -n 's/^.*=\(.*\)/\1,/;H;${g;s/\n//g;s/,$//;p;}' file
7,98,10,91,10,0,10
A simple cut without any pipes:
arr=( $(cut -d'=' -f2 file) )
Output:
printf '%s\n' "${arr[@]}"
7
98
10
91
10
0
10
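Yet another variant: paste -s joins all lines into one with a chosen delimiter, so the xargs/sed step becomes unnecessary (the sample lines here are a shortened version of the question's data, written to d.txt as in the first answer):

```shell
# cut keeps the value column; paste -s serializes the lines, -d',' joins with commas
printf 'ConnectionCount=7\nConnectionCreatedCount=98\nMaxSize=10\n' > d.txt
TAB=$(cut -d'=' -f2 d.txt | paste -sd',' -)
echo "$TAB"   # 7,98,10
```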
