Shell Extract Text Before Digits in a String

Shell Extract Text Before Digits in a String - string

I've found several examples of extractions before a single character and examples of extracting numbers, but I haven't found anything about extracting characters before numbers.
My question:
Some of the strings I have look like this:
NUC320 Syllabus Template - 8wk
SLA School Template - UL
CJ101 Syllabus Template - 8wk
TECH201 Syllabus Template - 8wk
Test Clone ID17
In cases where the string doesn't contain the data I want, I need it to be skipped. The desired output would be:
NUC-320
CJ-101
TECH-201
SLA School Template - UL & Test Clone ID17 would be skipped.
I imagine the process being something to the effect of:
Extract text before " "
Condition - Check for digits in the string
Extract text before digits and assign it to a variable x
Extract digits and assign to a variable y
Concatenate $x"-"$y and assign to another variable z
More information:
The strings are extracted from a line in a couple thousand text docs using a loop. They will be used to append to a hyperlink and rename a file during the loop.
Edit:
#!/bin/sh
# my files are named 1.txt through 9999.txt i both
# increments the loop and sets the filename to be searched
i=1
while [ $i -lt 10000 ]
do
x=$(head -n 31 $i.txt | tail -1 | cut -c 7-)
if [ ! -z "$x" -a "$x" != " " ]; then
# I'd like to insert the hyperlink with the output on the
# same line (1.txt;cj101 Syllabus Template - 8wk;www.link.com/cj101)
echo "$i.txt;$x" >> syllabus.txt
# else
# rm $i.txt
fi
i=`expr $i + 1`
sleep .1
done

sed for printing lines starting with capital letters followed by digits. It also adds a - between them:
sed -n 's/^\([A-Z]\+\)\([0-9]\+\) .*/\1-\2/p' input
Gives:
NUC-320
CJ-101
TECH-201

A POSIX-compliant awk solution:
awk '{ if (match($1, /[0-9]+$/)) print substr($1, 1, RSTART-1) "-" substr($1, RSTART) }' \
file |
while IFS= read -r token; do
# Process token here (append to hyperlink, ...)
echo "[$token]"
done
awk is used to extract the reformatted tokens of interest, which are then processed in a shell while loop.
match($1, /[0-9]+$/) matches the 1st whitespace-separated field ($1) against extended regex [0-9]+$, i.e., matches only if the fields ends in one or more digits.
substr($1, 1, RSTART-1) "-" substr($1, RSTART) joins the part before the first digit with the run of digits using -, via the special RSTART variable, which indicates the 1-based character position where the most recent match() invocation matched.

awk '$1 ~/[0-9]/{sub(/...$/,"-&",$1);print $1}' file
NUC-320
CJ-101
TECH-201

Related

How to extract a substring from a string stored in a variable, based on a start / stop character

In the first line I'm after the value 64 and F2DD65
I want to catch the first variable by reading data from from a string in a variable, first from the beginning of the line untill the : character, and read the other variable from after the # character and 6 characters forward.
Is this possible?
This is the string:
var="64: (242,221,101) #F2DD65 srgb(242,221,101)"
my end result would be stored in variables:
var1="64"
var2="F2DD65"

var1=${var%%:*}
var2=${var##*#}
var2=${var2%% *}
Reference: Shell Parameter Expansion.

sed -rn 's/(^.*)(\:.*#)(.*)([[:space:]].*$)/\1 - \3/p' <<< "64: (242,221,101) #F2DD65 srgb(242,221,101)"
With sed, split the line into sections using regular expressions (-r). Substitute the line for the relevant section (the first and then third separated with a -.
awk -F [:#\ ] '{ print $1" - "$5 }' <<< "64: (242,221,101) #F2DD65 srgb(242,221,101)"
With awk, split the line based on a :, a # and a space as delimiters. Print the 1st and 5th delimited fields with a - in between.

With bash regular expressions:
var="64: (242,221,101) #F2DD65 srgb(242,221,101)"
re="^([^:]+): .* #([[:xdigit:]]+)"
if [[ $var =~ $re ]]; then
var1="${BASH_REMATCH[1]}"
var2="${BASH_REMATCH[2]}"
else
# String isn't the right format
echo Fail
fi

How can I truncate a line of text longer than a given length?

How would you go about removing everything after x number of characters? For example, cut everything after 15 characters and add ... to it.
This is an example sentence should turn into This is an exam...

GnuTools head can use chars rather than lines:
head -c 15 <<<'This is an example sentence'
Although consider that head -c only deals with bytes, so this is incompatible with multi-bytes characters like UTF-8 umlaut ü.
Bash built-in string indexing works:
str='This is an example sentence'
echo "${str:0:15}"
Output:
This is an exam
And finally something that works with ksh, dash, zsh…:
printf '%.15s\n' 'This is an example sentence'
Even programmatically:
n=15
printf '%.*s\n' $n 'This is an example sentence'
If you are using Bash, you can directly assign the output of printf to a variable and save a sub-shell call with:
trim_length=15
full_string='This is an example sentence'
printf -v trimmed_string '%.*s' $trim_length "$full_string"

Use sed:
echo 'some long string value' | sed 's/\(.\{15\}\).*/\1.../'
Output:
some long strin...
This solution has the advantage that short strings do not get the ... tail added:
echo 'short string' | sed 's/\(.\{15\}\).*/\1.../'
Output:
short string
So it's one solution for all sized outputs.

Use cut:
echo "This is an example sentence" | cut -c1-15
This is an exam
This includes characters (to handle multi-byte chars) 1-15, c.f. cut(1)
-b, --bytes=LIST
select only these bytes
-c, --characters=LIST
select only these characters

Awk can also accomplish this:
$ echo 'some long string value' | awk '{print substr($0, 1, 15) "..."}'
some long strin...
In awk, $0 is the current line. substr($0, 1, 15) extracts characters 1 through 15 from $0. The trailing "..." appends three dots.

Todd actually has a good answer however I chose to change it up a little to make the function better and remove unnecessary parts :p
trim() {
if (( "${#1}" > "$2" )); then
echo "${1:0:$2}$3"
else
echo "$1"
fi
}
In this version the appended text on longer string are chosen by the third argument, the max length is chosen by the second argument and the text itself is chosen by the first argument.
No need for variables :)

Using Bash Shell Expansions (No External Commands)
If you don't care about shell portability, you can do this entirely within Bash using a number of different shell expansions in the printf builtin. This avoids shelling out to external commands. For example:
trim () {
local str ellipsis_utf8
local -i maxlen
# use explaining variables; avoid magic numbers
str="$*"
maxlen="15"
ellipsis_utf8=$'\u2026'
# only truncate $str when longer than $maxlen
if (( "${#str}" > "$maxlen" )); then
printf "%s%s\n" "${str:0:$maxlen}" "${ellipsis_utf8}"
else
printf "%s\n" "$str"
fi
}
trim "This is an example sentence." # This is an exam…
trim "Short sentence." # Short sentence.
trim "-n Flag-like strings." # Flag-like strin…
trim "With interstitial -E flag." # With interstiti…
You can also loop through an entire file this way. Given a file containing the same sentences above (one per line), you can use the read builtin's default REPLY variable as follows:
while read; do
trim "$REPLY"
done < example.txt
Whether or not this approach is faster or easier to read is debatable, but it's 100% Bash and executes without forks or subshells.

Extract orders and match to trades from two files

I have two attached files (orders1.txt and trades1.txt) I need to write a Bash script (possibly awk?) to extract orders and match them to trades.
The output should produce a report that prints comma separated values containing “ClientID, OrderID, Price, Volume”.
In addition to this for each client, I need to print the total volume and turnover (turnover is the subtotal of price * volume on each trade).
Can someone please help me with a bash script that will do the above using the attached files?
Any help would be greatly appreciated
orders1.txt
Entry Time, Client ID, Security ID, Order ID
25455410,DOLR,XGXUa,DOLR1435804437
25455410,XFKD,BUP3d,XFKD4746464646
25455413,QOXA,AIDl,QOXA7176202067
25455415,QOXA,IRUXb,QOXA6580494597
25455417,YXKH,OBWQs,YXKH4575139017
25455420,JBDX,BKNs,JBDX6760353333
25455428,DOLR,AOAb,DOLR9093170513
25455429,JBDX,QMP1Sh,JBDX2756804453
25455431,QOXA,QIP1Sh,QOXA6563975285
25455434,QOXA,XMUp,QOXA5569701531
25455437,XFKD,QLOJc,XFKD8793976660
25455438,YXKH,MRPp,YXKH2329856527
25455442,JBDX,YBPu,JBDX0100506066
25455450,QOXA,BUPYd,QOXA5832015401
25455451,QOXA,SIOQz,QOXA3909507967
25455451,DOLR,KID1Sh,DOLR2262067037
25455454,DOLR,JJHi,DOLR9923665017
25455461,YXKH,KBAPBa,YXKH2637373848
25455466,DOLR,EPYp,DOLR8639062962
25455468,DOLR,UQXKz,DOLR4349482234
25455474,JBDX,EFNs,JBDX7268036859
25455481,QOXA,XCB1Sh,QOXA4105943392
25455486,YXKH,XBAFp,YXKH0242733672
25455493,JBDX,BIF1Sh,JBDX2840241688
25455500,DOLR,QSOYp,DOLR6265839896
25455503,YXKH,IIYz,YXKH8505951163
25455504,YXKH,ZOIXp,YXKH2185348861
25455513,YXKH,MBOOp,YXKH4095442568
25455515,JBDX,P35p,JBDX9945514579
25455524,QOXA,YXOKz,QOXA1900595629
25455528,JBDX,XEQl,JBDX0126452783
25455528,XFKD,FJJMp,XFKD4392227425
25455535,QOXA,EZIp,QOXA4277118682
25455543,QOXA,YBPFa,QOXA6510879584
25455551,JBDX,EAMp,JBDX8924251479
25455552,QOXA,JXIQp,QOXA4360008399
25455554,DOLR,LISXPh,DOLR1853653280
25455557,XFKD,LOX14p,XFKD1759342196
25455558,JBDX,YXYb,JBDX8177118129
25455567,YXKH,MZQKl,YXKH6485420018
25455569,JBDX,ZPIMz,JBDX2010952336
25455573,JBDX,COPe,JBDX1612537068
25455582,JBDX,HFKAp,JBDX2409813753
25455589,QOXA,XFKm,QOXA9692126523
25455593,XFKD,OFYp,XFKD8556940415
25455601,XFKD,FKQLb,XFKD4861992028
25455606,JBDX,RIASp,JBDX0262502677
25455608,DOLR,HRKKz,DOLR1739013513
25455615,DOLR,ZZXp,DOLR6727725911
25455623,JBDX,CKQPp,JBDX2587184235
25455630,YXKH,ZLQQp,YXKH6492126889
25455632,QOXA,ORPz,QOXA3594333316
25455640,XFKD,HPIXSh,XFKD6780729432
25455648,QOXA,ABOJe,QOXA6661411952
25455654,XFKD,YLIp,XFKD6374702721
25455654,DOLR,BCFp,DOLR8012564477
25455658,JBDX,ZMDKz,JBDX6885176695
25455665,JBDX,CBOe,JBDX8942732453
25455670,JBDX,FRHMl,JBDX5424320405
25455679,DOLR,YFJm,DOLR8212353717
25455680,XFKD,XAFp,XFKD4132890550
25455681,YXKH,PBIBOp,YXKH6106504736
25455684,DOLR,IFDu,DOLR8034515043
25455687,JBDX,JACe,JBDX8243949318
25455688,JBDX,ZFZKz,JBDX0866225752
25455693,QOXA,XOBm,QOXA5011416607
25455694,QOXA,IDQe,QOXA7608439570
25455698,JBDX,YBIDb,JBDX8727773702
25455705,YXKH,MXOp,YXKH7747780955
25455710,YXKH,PBZRYs,YXKH7353828884
25455719,QOXA,QFDb,QOXA2477859437
25455720,XFKD,PZARp,XFKD4995735686
25455722,JBDX,ZLKKb,JBDX3564523161
25455730,XFKD,QFH1Sh,XFKD6181225566
25455733,JBDX,KWVJYc,JBDX7013108210
25455733,YXKH,ZQI1Sh,YXKH7095815077
25455739,YXKH,XIJp,YXKH0497248757
25455739,YXKH,ZXJp,YXKH5848658513
25455747,JBDX,XASd,JBDX4986246117
25455751,XFKD,XQIKz,XFKD5919379575
25455760,JBDX,IBXPb,JBDX8168710376
25455763,XFKD,EVAOi,XFKD8175209012
25455765,XFKD,JXKp,XFKD2750952933
25455773,XFKD,PTBAXs,XFKD8139382011
25455778,QOXA,XJp,QOXA8227838196
25455783,QOXA,CYBIp,QOXA2072297264
25455792,JBDX,PZI1Sh,JBDX7022115629
25455792,XFKD,XIKQl,XFKD6434550362
25455792,DOLR,YKPm,DOLR6394606248
25455796,QOXA,JXOXPh,QOXA9672544909
25455797,YXKH,YIWm,YXKH5946342983
25455803,YXKH,JZEm,YXKH5317189370
25455810,QOXA,OBMFz,QOXA0985316706
25455810,QOXA,DAJPp,QOXA6105975858
25455810,JBDX,FBBJl,JBDX1316207043
25455819,XFKD,YXKm,XFKD6946276671
25455821,YXKH,UIAUs,YXKH6010226371
25455828,DOLR,PTJXs,DOLR1387517499
25455836,DOLR,DCEi,DOLR3854078054
25455845,YXKH,NYQe,YXKH3727923537
25455853,XFKD,TAEc,XFKD5377097556
25455858,XFKD,LMBOXo,XFKD4452678489
25455858,XFKD,AIQXp,XFKD5727938304
trades1.txt
# The first 8 characters is execution time in microseconds since midnight
# The next 14 characters is the order ID
# The next 8 characters is the zero padded price
# The next 8 characters is the zero padded volume
25455416QOXA6580494597 0000013800001856
25455428JBDX6760353333 0000007000002458
25455434DOLR9093170513 0000000400003832
25455435QOXA6563975285 0000034700009428
25455449QOXA5569701531 0000007500009023
25455447YXKH2329856527 0000038300009947
25455451QOXA5832015401 0000039900006432
25455454QOXA3909507967 0000026900001847
25455456DOLR2262067037 0000034700002732
25455471YXKH2637373848 0000010900006105
25455480DOLR8639062962 0000027500001975
25455488JBDX7268036859 0000005200004986
25455505JBDX2840241688 0000037900002029
25455521YXKH4095442568 0000046400002150
25455515JBDX9945514579 0000040800005904
25455535QOXA1900595629 0000015200006866
25455533JBDX0126452783 0000001700006615
25455542XFKD4392227425 0000035500009948
25455570XFKD1759342196 0000025700007816
25455574JBDX8177118129 0000022400000427
25455567YXKH6485420018 0000039000008327
25455573JBDX1612537068 0000013700001422
25455584JBDX2409813753 0000016600003588
25455603XFKD4861992028 0000017600004552
25455611JBDX0262502677 0000007900003235
25455625JBDX2587184235 0000024300006723
25455658XFKD6374702721 0000046400009451
25455673JBDX6885176695 0000010900009258
25455671JBDX5424320405 0000005400003618
25455679DOLR8212353717 0000041100003633
25455697QOXA5011416607 0000018800007376
25455696QOXA7608439570 0000013000007463
25455716YXKH7747780955 0000037000006357
25455719QOXA2477859437 0000039300009840
25455723XFKD4995735686 0000045500009858
25455727JBDX3564523161 0000021300000639
25455742YXKH7095815077 0000023000003945
25455739YXKH5848658513 0000042700002084
25455766XFKD5919379575 0000022200003603
25455777XFKD8175209012 0000033300006350
25455788XFKD8139382011 0000034500007461
25455793QOXA8227838196 0000011600007081
25455784QOXA2072297264 0000017000004429
25455800XFKD6434550362 0000030000002409
25455801QOXA9672544909 0000039600001033
25455815QOXA6105975858 0000034800008373
25455814JBDX1316207043 0000026500005237
25455831YXKH6010226371 0000011400004945
25455838DOLR1387517499 0000046200006129
25455847YXKH3727923537 0000037400008061
25455873XFKD5727938304 0000048700007298
I have the following script:
'''
#!/bin/bash
declare -A volumes
declare -A turnovers
declare -A orders
# Read the first file, remembering for each order the client id
while read -r line
do
# Jump over comments
if [[ ${line:0:1} == "#" ]] ; then continue; fi;
details=($(echo $line | tr ',' " "))
order_id=${details[3]}
client_id=${details[1]}
orders[$order_id]=$client_id
done < $1
echo "ClientID,OrderID,Price,Volume"
while read -r line
do
# Jump over comments
if [[ ${line:0:1} == "#" ]] ; then continue; fi;
order_id=$(echo ${line:8:20} | tr -d '[:space:]')
client_id=${orders[$order_id]}
price=${line:28:8}
volume=${line: -8}
echo "$client_id,$order_id,$price,$volume"
price=$(echo $price | awk '{printf "%d", $0}')
volume=$(echo $volume | awk '{printf "%d", $0}')
order_turnover=$(($price*$volume))
old_turnover=${turnovers[$client_id]}
[[ -z "$old_turnover" ]] && old_turnover=0
total_turnover=$(($old_turnover+$order_turnover))
turnovers[$client_id]=$total_turnover
old_volumes=${volumes[$client_id]}
[[ -z "$old_volumes" ]] && old_volumes=0
total_volume=$((old_volumes+volume))
volumes[$client_id]=$total_volume
done < $2
echo "ClientID,Volume,Turnover"
for client_id in ${!volumes[#]}
do
volume=${volumes[$client_id]}
turnover=${turnovers[$client_id]}
echo "$client_id,$volume,$turnover"
done
Can anyone think of anything more elegant?
Thanks in advance
C

Assumption 1: the two files are ordered, so line x represents an action that is older than x+1. If not, then further work is needed.
The assumption makes our work easier. Let's first change the delimiter of traders into a comma:
sed -i 's/ /,/g' traders.txt
This will be done in place for sake of simplicity. So, you now have traders which is comma separated, as is orders. This is the Assumption 2.
Keep working on traders: split all columns and add titles1. More on the reasons why in a moment.
gawk -i inplace -v INPLACE_SUFFIX=.bak 'BEGINFILE{FS=",";OFS=",";print "execution time,order ID,price,volume";}{print substr($1,1,8),substr($1,9),substr($2,1,9),substr($2,9)}' traders.txt
Ugly but works. Now let's process your data using the following awk script:
BEGIN {
FS=","
OFS=","
}
{
if (1 == NR) {
getline line < TRADERS # consume title line
print "Client ID,Order ID,Price,Volume,Turnover"; # consume title line. Remove print to forget it
getline line < TRADERS # reads first data line
split(line, transaction, ",")
next
}
if (transaction[2] == $4) {
print $2, $4, transaction[3], transaction[4], transaction[3]*transaction[4]
getline line < TRADERS # reads new data line
split(line, transaction, ",")
}
}
called by:
gawk -f script -v TRADERS=traders.txt orders.txt
And there you have it. Some caveats:
check the numbers, as implicit gawk number conversion might not be correct with zero-padded numbers. There is a fix for that in case;
getline might explode if we run out of lines from traders. I haven't put any check, that's up to you
no control over timestamps. Match is based on Order ID.
Output file:
Client ID,Order ID,Price,Volume,Turnover
QOXA,QOXA6580494597,000001380,00001856,2561280
JBDX,JBDX6760353333,000000700,00002458,1720600
DOLR,DOLR9093170513,000000040,00003832,153280
QOXA,QOXA6563975285,000003470,00009428,32715160
QOXA,QOXA5569701531,000000750,00009023,6767250
YXKH,YXKH2329856527,000003830,00009947,38097010
QOXA,QOXA5832015401,000003990,00006432,25663680
QOXA,QOXA3909507967,000002690,00001847,4968430
DOLR,DOLR2262067037,000003470,00002732,9480040
YXKH,YXKH2637373848,000001090,00006105,6654450
DOLR,DOLR8639062962,000002750,00001975,5431250
JBDX,JBDX7268036859,000000520,00004986,2592720
JBDX,JBDX2840241688,000003790,00002029,7689910
YXKH,YXKH4095442568,000004640,00002150,9976000
JBDX,JBDX9945514579,000004080,00005904,24088320
QOXA,QOXA1900595629,000001520,00006866,10436320
JBDX,JBDX0126452783,000000170,00006615,1124550
XFKD,XFKD4392227425,000003550,00009948,35315400
XFKD,XFKD1759342196,000002570,00007816,20087120
JBDX,JBDX8177118129,000002240,00000427,956480
YXKH,YXKH6485420018,000003900,00008327,32475300
JBDX,JBDX1612537068,000001370,00001422,1948140
JBDX,JBDX2409813753,000001660,00003588,5956080
XFKD,XFKD4861992028,000001760,00004552,8011520
JBDX,JBDX0262502677,000000790,00003235,2555650
JBDX,JBDX2587184235,000002430,00006723,16336890
XFKD,XFKD6374702721,000004640,00009451,43852640
JBDX,JBDX6885176695,000001090,00009258,10091220
JBDX,JBDX5424320405,000000540,00003618,1953720
DOLR,DOLR8212353717,000004110,00003633,14931630
QOXA,QOXA5011416607,000001880,00007376,13866880
QOXA,QOXA7608439570,000001300,00007463,9701900
YXKH,YXKH7747780955,000003700,00006357,23520900
QOXA,QOXA2477859437,000003930,00009840,38671200
XFKD,XFKD4995735686,000004550,00009858,44853900
JBDX,JBDX3564523161,000002130,00000639,1361070
YXKH,YXKH7095815077,000002300,00003945,9073500
YXKH,YXKH5848658513,000004270,00002084,8898680
XFKD,XFKD5919379575,000002220,00003603,7998660
XFKD,XFKD8175209012,000003330,00006350,21145500
XFKD,XFKD8139382011,000003450,00007461,25740450
QOXA,QOXA8227838196,000001160,00007081,8213960
QOXA,QOXA2072297264,000001700,00004429,7529300
XFKD,XFKD6434550362,000003000,00002409,7227000
QOXA,QOXA9672544909,000003960,00001033,4090680
QOXA,QOXA6105975858,000003480,00008373,29138040
JBDX,JBDX1316207043,000002650,00005237,13878050
YXKH,YXKH6010226371,000001140,00004945,5637300
DOLR,DOLR1387517499,000004620,00006129,28315980
YXKH,YXKH3727923537,000003740,00008061,30148140
XFKD,XFKD5727938304,000004870,00007298,35541260
1: requires gawk 4.1.0 or higher

rearranging column based on condition

I have a *.csv file. with value as below
"ASDP02","8801942183589"
"ASDP06","8801939151023"
"CSDP04","8801963981740"
"ASDP09","8801946305047"
"ASDP12","8801941195677"
"ASDP05","8801922826186"
"CSDP08","8801983008938"
"ASDP04","8801944346555"
"CSDP11","8801910831518"
or sometimes the value is as below
"8801989353984","KSDP05"
"8801957608165","ASDP11"
"8801991455848","CSDP10"
"8801981363116","CSDP07"
"8801921247870","KSDP07"
"8801965386240","CSDP06"
"8801956293036","KSDP10"
"8801984383904","KSDP11"
"8801944211742","ASDP09"
I just want to put the numeric value (e.g. 8801989353984) always in 1st column. Is it possible using BASH script?

Sed is also your friend here
Input
cat 41189347
"ASDP02","8801942183589"
"ASDP06","8801939151023"
"CSDP04","8801963981740"
"ASDP09","8801946305047"
"ASDP12","8801941195677"
"ASDP05","8801922826186"
"CSDP08","8801983008938"
"ASDP04","8801944346555"
"CSDP11","8801910831518"
Script
sed -E 's/^("[[:alpha:]]+.*"),("[[:digit:]]+")$/\2,\1/' 41189347
Output
"8801942183589","ASDP02"
"8801939151023","ASDP06"
"8801963981740","CSDP04"
"8801946305047","ASDP09"
"8801941195677","ASDP12"
"8801922826186","ASDP05"
"8801983008938","CSDP08"
"8801944346555","ASDP04"
"8801910831518","CSDP11"

awk to the rescue!
$ awk -F, -v OFS=, '$1~/[A-Z]/{t=$2;$2=$1;$1=t}1' file
if first field has alpha chars, swap first and second columns and print.

Bash can do the work but awk might be a better choice for rearrange your file:
sample.csv:
"ASDP02","8801942183589"
"8801944211742","ASDP09"
command:
awk -F, 'BEGIN{OFS=","}{$1=$1;if(substr($1, 2, length($1) - 2) + 0 == substr($1, 2, length($1) - 2)){print $1,$2}else{print $2,$1}}' sample.csv
substr($1, 2, length($1) - 2) + 0 == substr($1, 2, length($1) - 2) checks the column is numeric or not. If it is, print the original line otherwise switch column1 and column2
Output:
"8801942183589","ASDP02"
"8801944211742","ASDP09"

You can create a pure bash script to generate other file which has the structure you need:
#!/bin/bash
csv_file="/path/to/your/csvfile"
output_file="/path/to/output_file"
#Optional
rm -rf "${output_file}"
readarray -t LINES < <(cat < "${csv_file}" 2> /dev/null)
for item in "${LINES[#]}"; do
if [[ $item =~ ^\"([0-9A-Z]+)\"\,\"([0-9]+)\" ]]; then
echo "\"${BASH_REMATCH[2]}\",\"${BASH_REMATCH[1]}\"" >> "${output_file}"
else
echo "$item" >> "${output_file}"
fi
done
This works even if your file is "mixed" I mean with some lines in the right format and other lines in the bad format.

The following commands assume that the cells in the CSV files do not contain newlines and commas. Otherwise, you should write a more complicated script in Perl, PHP, or other programming language capable of parsing CSV files properly. But Bash, definitely, is not appropriate for this task.
Perl
perl -F, -nle '#F = reverse #F if $F[0] =~ /^"\d+"$/;
print join(",", #F)' file
Beware, If the cells contain newlines, or commas, use Perl's Text::CSV module, for instance. Although it is a simple task in Perl, it goes beyond the scope of the current question.
The command splits the input lines by commas (-F,) and stores the result into #F array, for each line. The items in the array are reversed, if the first field $F[0] matches the regular expression. You can also swap the items this way: ($F[0], $F[1]) = ($F[1], $F[0]).
Finally, the joins the array items with commas, and prints to the standard output.
If you want to edit the file in-place, use -i option: perl -i.backup -F, ....
AWK
awk -F, -vOFS=, '/^"[0-9]+",/ {print; next}
{ t = $1; $1 = $2; $2 = t; print }' file
The input and output field separators are set to , with -F, and -vOFS=,.
If the line matches the pattern /^"[0-9]+",/ (the line begins with a "numeric" CSV column), the script prints the record and advances to the next record. Otherwise the next block is executed.
In the next block, it swaps the first two columns and prints the result to the standard output.
If you want to edit the file in-place, see answers to this question.

How to pass quoted arguments but with blank spaces in linux

I have a file with these arguments and their values this way
# parameters.txt
VAR1 001
VAR2 aaa
VAR3 'Hello World'
and another file to configure like this
# example.conf
VAR1 = 020
VAR2 = kab
VAR3 = ''
when I want to get the values in a function I use this command
while read p; do
VALUE=$(echo $p | awk '{print $2}')
done < parameters.txt
the firsts arguments throw the right values, but the last one just gets the 'Hello for the blank space, my question is how do I get the entire 'Hello World' value?

If you can use bash, there is no need to use awk: read and shell parameter expansion can be combined to solve your problem:
while read -r name rest; do
# Drop the '= ' part, if present.
[[ $rest == '= '* ]] && value=${rest:2} || value=$rest
# $value now contains the line's value,
# but *including* any enclosing ' chars, if any.
# Assuming that there are no *embedded* ' chars., you can remove them
# as follows:
value=${value//\'/}
done < parameters.txt
read by default also breaks a line into fields by whitespace, like awk, but unlike awk it has the ability to assign the remainder of the line to a varaible, namely the last one, if fewer variables than fields found are specified;
read's -r option is generally worth specifying to avoid unexpected interpretation of \ chars. in the input.
As for your solution attempt:
awk doesn't know about quoting in input - by default it breaks input into fields by whitespace, irrespective of quotation marks.
Thus, a string such as 'Hello World' is simply broken into fields 'Hello and World'.
However, in your case you can split each input line into its key and value using a carefully crafted FS value (FS is the input field separator, which can be also be set via option -F; the command again assumes bash, this time for use of <(...), a so-called process substitution, and $'...', an ANSI C-quoted string):
while IFS= read -r value; do
# Work with $value...
done < <(awk -F$'^[[:alnum:]]+ (= )?\'?|\'' '{ print $2 }' parameters.txt)
Again the assumption is that values contain no embedded ' instances.
Field separator regex $'^[[:alnum:]]+ (= )?\'?|\'' splits each line so that $2, the 2nd field, contains the value, stripped of enclosing ' chars., if any.
xargs is the rare exception among the standard utilities in that it does understand single- and double-quoted strings (yet also without support for embedded quotes).
Thus, you could take advantage of xargs' ability to implicitly strip enclosing quotes when it passes arguments to the specified command, which defaults to echo (again assumes bash):
while read -r name rest; do
# Drop the '= ' part, if present.
[[ $rest == '= '* ]] && value=${rest:2} || value=$rest
# $value now contains the line's value, strippe of any enclosing
# single quotes by `xargs`.
done < <(xargs -L1 < parameters.txt)
xargs -L1 process one (1) line (-L) at a time and implicitly invokes echo with all tokens found on each line, with any enclosing quotes removed from the individual tokens.

The default field separator in awk is the space. So you are only printing the first word in the string passed to awk.
You can specify the field separator on the command line with -F[field separator]
Example, setting the field separator to a comma:
$ echo "Hello World" | awk -F, '{print $1}'
Hello World

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Shell Extract Text Before Digits in a String - string

sed for printing lines starting with capital letters followed by digits. It also adds a - between them: sed -n 's/^\([A-Z]\+\)\([0-9]\+\) .*/\1-\2/p' input Gives: NUC-320 CJ-101 TECH-201

awk '$1 ~/[0-9]/{sub(/...$/,"-&",$1);print $1}' file NUC-320 CJ-101 TECH-201

Related

How to extract a substring from a string stored in a variable, based on a start / stop character

How can I truncate a line of text longer than a given length?

Extract orders and match to trades from two files

rearranging column based on condition

How to pass quoted arguments but with blank spaces in linux

Categories

Resources