I am looking to strip the below input lines from the filename and I am using this file:
cat <<EOF >./tz.txt
2019/12/_MG_0263.CR2.xmp: bopt:keywordlist="pinhole,car,2019"
2019/12/_MG_0262.CR2.xmp: bopt:keywordlist="pinhole,car,2019"
2020/06/ok/_MG_0003.CR2.xmp: bopt:keywordlist="lowkey,car,Chiaroscuro,2020"
2020/06/ok/_MG_0002.CR2.xmp: bopt:keywordlist="lowkey,car,Chiaroscuro,2020"
2020/04/_MG_0137.CR2.xmp: bopt:keywordlist="red,car,2020"
2020/04/_MG_0136.CR2.xmp: bopt:keywordlist="red,car,2020"
2020/04/_MG_0136.CR2.xmp: bopt:keywordlist="red,car,2020"
EOF
and now I am using the below script (stored in file ab.sh) to just exclude the [filename.xmp: bopt:] (e.g. _MG_0263.CR2.xmp: bopt:) from each line so that the output looks like this:
2019/12/ keywordlist="pinhole,car,2019"
2019/12/ keywordlist="pinhole,car,2019"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
Above is the complete expected output. Some folders may have different structures, like the one 2020/06/ok/
The script code is below:
#!/bin/bash
file="./tz.txt"
while read line ; do
# variable a generates the folter structure with a variable range of considered columns
# using awk to figure out how many columns (aka folders) there are in the structure
a=$( cut -d"/" -f 1-$( awk -F'/' '{ print NF-1 }' $line ) $line )
# | |
# -this bit should create a number for-
# -the cut command -
# then b variable stores the last bit in the string
b=$( cut -d":" -f 3 $line )
# and below combine results from above variables
echo ${a} ${b}
done < ${file}
In the attached image is an illustration of the logic used to split the string in columns and get only the relevant data.
The problem is that I get the below error and I am not sure where I’ve gone wrong.
Thank you for any suggestions or help
$ sh ~/ab.sh
awk: fatal: cannot open file `2019/12/_MG_0263.CR2.xmp:' for
reading (No such file or directory)
cut: '2019/12/_MG_0263.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="pinhole,car,2019"': No such file or directory
cut: '2019/12/_MG_0263.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="pinhole,car,2019"': No such file or directory
awk: fatal: cannot open file `2019/12/_MG_0262.CR2.xmp:' for reading (No such file or directory)
cut: '2019/12/_MG_0262.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="pinhole,car,2019"': No such file or directory
cut: '2019/12/_MG_0262.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="pinhole,car,2019"': No such file or directory
awk: fatal: cannot open file `2020/06/ok/_MG_0003.CR2.xmp:' for reading (No such file or directory)
cut: '2020/06/ok/_MG_0003.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="lowkey,car,Chiaroscuro,2020"': No such file or directory
cut: '2020/06/ok/_MG_0003.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="lowkey,car,Chiaroscuro,2020"': No such file or directory
....
One awk idea to replace the while loop:
awk -F':' '
{ gsub(/[^/]+$/,"",$1) # strip everything after last "/" from 1st field
print $1, $3
}' "${file}"
# or as a one-liner sans comments:
awk -F':' '{gsub(/[^/]+$/,"",$1); print $1, $3}' "${file}"
This generates:
2019/12/ keywordlist="pinhole,car,2019"
2019/12/ keywordlist="pinhole,car,2019"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
One sed alternative:
$ sed -En 's|^(.*)/[^/]+:.*:([^:]+)$|\1/ \2|p' "${file}"
Where:
-En - enable support for extended regexs, suppress automatic printing of input lines
since data includes the / character we'll use | as the sed script delimiter
^(.*)/ - [1st capture group] match everything up to the last / before ...
[^/]+: - matching everything that's not a / up to the 1st :, then ...
.*: - match everything up to next :
([^:]+)$ - [2nd capture group] lastly match everything at end of line that is not :
\1/ \2 - print 1st capture group + / + 2nd capture group
This generates:
2019/12/ keywordlist="pinhole,car,2019"
2019/12/ keywordlist="pinhole,car,2019"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
First of all, the final parameter to the awk command should be a filename. You are passing it a variable containing the contents of one line of the input file. This is why you are getting the awk: fatal: cannot open file errors.
Secondly, you are making the same mistake with the cut command, resulting in the : No such file or directory errors.
Both awk and cut are designed to work on complete files. You can chain them together so that the output of one becomes the input of another by using the pipe character: |. For example:
cat ${file} | awk ... | cut ...
But this can quickly become complex and unwieldy. A better solution is to use the Stream Editor sed. sed will read it's input line by line and can perform quite complex operations on each line before outputting the result, line by line.
This should do what you want:
#!/bin/bash
file="/tz.txt"
sed -En 's/^([0-9]{4}\/[0-9]{2}\/).*bopt:(.*)$/\1 \2/p' ${file}
Here is an explanation of the quoted expression:
s/pat/rep/p Search for pat and if found, replace with rep and print the result.
In our case, pat is:
^ The beginning of a line
( Start remembering what follows
[0-9]{4} Any digit repeated exactly 4 times
\/ The / character (escaped)
[0-9]{2}\/ Any digit repeated exactly 2 times, followed by /
) Stop remembering
.*bopt: any 0 or more characters followed by bopt:
(.*) remember 0 or more characters...
$ ...up to the end of the line.
And rep is:
\1 \2 The first thing remembered, followed by a space, followed by the second thing we remembered.
I have written the following command
#!/bin/bash
awk -v value=$newvalue -v row=$rownum -v col=1 'BEGIN{FS=OFS=","} NR==row {$col=value}1' "${file}".csv >> temp.csv && mv temp.csv "${file}".csv
Sample Input of file.csv
Header,1
Field1,Field2,Field3
1,ABC,4567
2,XYZ,7890
Assuiming $newvalue=3 ,$rownum=4 and col=1, then the above code will replace:
Required Output
Header,1
Field1,Field2,Field3
1,ABC,4567
3,XYZ,7890
So if I know the row and column, is it possible to replace the said value using grep, sed?
Edit1: Field3 will always have a unique value for their respective rows. ( in case that info helps anyway)
Assuming your CSV file is as simple as what you show (no commas in quoted fields), and your newvalue does not contain characters that sed would interpret in a special way (e.g. ampersands, slashes or backslashes), the following should work with just sed (tested with GNU sed):
sed -Ei "$rownum s/[^,]*/$newvalue/$col" file.csv
Demo:
$ cat file.csv
Header,1
Field1,Field2,Field3
1,ABC,4567
3,XYZ,7890
$ rownum=3
$ col=2
$ newvalue="NEW"
$ sed -Ei "$rownum s/[^,]*/$newvalue/$col" file.csv
$ cat file.csv
Header,1
Field1,Field2,Field3
1,NEW,4567
3,XYZ,7890
Explanations: $rownum is used as the address (here the line number) where to apply the following command. s is the sed substitute command. [^,]* is the regular expression to search for and replace: the longest possible string not containing a comma. $newvalue is the replacement string. $col is the occurrence to replace.
If newvalue may contain ampersands, slashes or backslashes we must sanitize it first:
sanitizednewvalue=$(sed -E 's/([/\&])/\\\1/g' <<< "$newvalue")
sed -Ei "$rownum s/[^,]*/$sanitizednewvalue/$col" file.csv
Demo:
$ newvalue='NEW&\/&NEW'
$ sanitizednewvalue=$(sed -E 's/([/\&])/\\\1/g' <<< "$newvalue")
$ echo "$sanitizednewvalue"
NEW\&\\\/\&NEW
$ sed -Ei "$rownum s/[^,]*/$sanitizednewvalue/$col" file.csv
$ cat file.csv
Header,1
Field1,Field2,Field3
1,NEW&\/&NEW,4567
3,XYZ,7890
With sed, how about:
#!/bin/bash
newvalue=3
rownum=4
col=1
sed -i -E "${rownum} s/(([^,]+,){$((col-1))})[^,]+/\\1${newvalue}/" file.csv
Result of file.csv
Header,1
Field1,Field2,Field3
1,ABC,4567
3,XYZ,7890
${rownum} matches the line number.
(([^,]+,){n}) matches the n-time repetition of the group of
non-comma characters followed by a comma. Then it should be the substring
before the target (to be substituted) column by assigning n to
col - 1.
Let's Try to Implement sed command
Let us consider a sample CSV file with the following content:
$ cat file
Solaris,25,11
Ubuntu,31,2
Fedora,21,3
LinuxMint,45,4
RedHat,12,5
To remove the 1st field or column :
$ sed 's/[^,]*,//' file
25,11
31,2
21,3
45,4
12,5
This regular expression searches for a sequence of non-comma([^,]*) characters and deletes them which results in the 1st field getting removed.
To print only the last field, OR remove all fields except the last field:
$ sed 's/.*,//' file
11
2
3
4
5
This regex removes everything till the last comma(.*,) which results in deleting all the fields except the last field.
To print only the 1st field:
$ sed 's/,.*//' file
Solaris
Ubuntu
Fedora
LinuxMint
RedHat
This regex(,.*) removes the characters starting from the 1st comma till the end resulting in deleting all the fields except the last field.
To delete the 2nd field:
$ sed 's/,[^,]*,/,/' file
Solaris,11
Ubuntu,2
Fedora,3
LinuxMint,4
RedHat,5
The regex (,[^,]*,) searches for a comma and sequence of characters followed by a comma which results in matching the 2nd column, and replaces this pattern matched with just a comma, ultimately ending in deleting the 2nd column.
Note: To delete the fields in the middle gets more tougher in sed since every field has to be matched literally.
To print only the 2nd field:
$ sed 's/[^,]*,\([^,]*\).*/\1/' file
25
31
21
45
12
The regex matches the first field, second field and the rest, however groups the 2nd field alone. The whole line is now replaced with the 2nd field(\1), hence only the 2nd field gets displayed.
Print only lines in which the last column is a single digit number:
$ sed -n '/.*,[0-9]$/p' file
Ubuntu,31,2
Fedora,21,3
LinuxMint,45,4
RedHat,12,5
The regex (,[0-9]$) checks for a single digit in the last field and the p command prints the line which matches this condition.
To number all lines in the file:
$ sed = file | sed 'N;s/\n/ /'
1 Solaris,25,11
2 Ubuntu,31,2
3 Fedora,21,3
4 LinuxMint,45,4
5 RedHat,12,5
This is simulation of cat -n command. awk does it easily using the special variable NR. The '=' command of sed gives the line number of every line followed by the line itself. The sed output is piped to another sed command to join every 2 lines.
Replace the last field by 99 if the 1st field is 'Ubuntu':
$ sed 's/\(Ubuntu\)\(,.*,\).*/\1\299/' file
Solaris,25,11
Ubuntu,31,99
Fedora,21,3
LinuxMint,45,4
RedHat,12,5
This regex matches 'Ubuntu' and till the end except the last column and groups each of them as well. In the replacement part, the 1st and 2nd group along with the new number 99 is substituted.
Delete the 2nd field if the 1st field is 'RedHat':
$ sed 's/\(RedHat,\)[^,]*\(.*\)/\1\2/' file
Solaris,25,11
Ubuntu,31,2
Fedora,21,3
LinuxMint,45,4
RedHat,,5
The 1st field 'RedHat', the 2nd field and the remaining fields are grouped, and the replacement is done with only 1st and the last group , resuting in getting the 2nd field deleted.
To insert a new column at the end(last column) :
$ sed 's/.*/&,A/' file
Solaris,25,11,A
Ubuntu,31,2,A
Fedora,21,3,A
LinuxMint,45,4,A
RedHat,12,5,A
The regex (.*) matches the entire line and replacing it with the line itself (&) and the new field.
To insert a new column in the beginning(1st column):
$ sed 's/.*/A,&/' file
A,Solaris,25,11
A,Ubuntu,31,2
A,Fedora,21,3
A,LinuxMint,45,4
A,RedHat,12,5
Same as last example, just the line matched is followed by the new column
I hope this will help. Let me know if you need to use Awk or any other command.
Thank you
Here is part of the complete file that I am trying to filter:
Hashmode: 13761 - VeraCrypt PBKDF2-HMAC-SHA256 + XTS 512 bit + boot-mode (Iterations: 200000)
Speed.#2.........: 2038 H/s (56.41ms) # Accel:128 Loops:32 Thr:256 Vec:1
Speed.#3.........: 2149 H/s (53.51ms) # Accel:128 Loops:32 Thr:256 Vec:1
Speed.#*.........: 4187 H/s
The aim is to print the following:
13761 VeraCrypt PBKDF2-HMAC-SHA256 4187 H/s
Here is what I tried.
The complete file is called complete.txt
cat complete.txt | grep Hashmode | awk '{print $2,$4,$5}' > mode.txt
Output:
13761 VeraCrypt PBKDF2-HMAC-SHA256
Then:
cat complete.txt | grep Speed.# | awk '{print $2,$3}' > speed.txt
Output:
4187 H/s
Then:
paste mode.txt speed.txt
The issue is that the lines do not match. There are approx 200 types of modes to filter within the file 'complete.txt'
I also have a feeling that this can be done using a much simpler command with sed or awk.
I am guessing you are looking for something like the following.
awk '/Hashmode:/ { if(label) print label, speed; label = $2 " " $4 " " $5 }
/Speed\.#/ { speed = $2 " " $ 3 }
END { if (label) print label, speed }' complete.txt
We match up the Hashmode line with the last Speed.# line which follows, then print when we see a new Hashmode, or reach end of file. (Failing to print the last one is a common beginner bug.)
This might work for you (GNU sed):
sed -E '/Hashmode:/{:a;x;s/^[^:]*: (\S+) -( \S+ \S+ ).*\nSpeed.*:\s*(\S+ \S+).*/\1\2\3/p;x;h;d};H;$!d;ba' file
If a line contains Hashmode, swap to the hold space and using pattern matching, manipulate its contents to the desired format and print, swap back to the pattern space, copy the current line to the hold space and delete the current line.
Otherwise, append the current line to the hold space and delete the current line, unless the current line is the last line in the file, in which case process the line as if it contained Hashmode.
N.B. The first time Hashmode is encountered, nothing is output. Subsequent matches and the end-of-file condition will be the only times printing occurs.
I have a file like this (tens of variables) :
PLAY="play"
APPS="/opt/play/apps"
LD_FILER="/data/mysql"
DATA_LOG="/data/log"
I need a script that will output the variables into another file like this (with space between them):
PLAY=${PLAY} APPS=${APPS} LD_FILER=${LD_FILER}
Is it possible ?
I would say:
$ awk -F= '{printf "%s=${%s} ", $1,$1} END {print ""}' file
PLAY=${PLAY} APPS=${APPS} LD_FILER=${LD_FILER} DATA_LOG=${DATA_LOG}
This loops through the file and prints the content before = in a format var=${var} together with a space. At the end, it prints a new line.
Note this leaves a trailing space at the end of the line. If this matters, we can check how to improve it.
< input sed -e 's/\(.*\)=.*/\1=${\1}/' | tr \\n \ ; echo
sed 's/"\([^"]*"\)"/={\1}/;H;$!d
x;y/\n/ /;s/.//' YourFile
your sample exclude last line so if this is important
sed '/DATA_LOG=/ d
s/"\([^"]*"\)"/={\1}/;H;$!d
x;y/\n/ /;s/.//' YourFile
I have a script that generates two lines as output each time. I'm really just interested in the second line. Moreover I'm only interested in the text that appears between a pair of #'s on the second line. Additionally, between the hashes, another delimiter is used: ^A. It would be great if I can also break apart each part of text that is ^A-delimited (Note that ^A is SOH special character and can be typed by using Ctrl-A)
output | sed -n '1p' #prints the 1st line of output
output | sed -n '1,3p' #prints the 1st, 2nd and 3rd line of output
your.program | tail +2 | cut -d# -f2
should get you 2/3 of the way.
Improving Grumdrig's answer:
your.program | head -n 2| tail -1 | cut -d# -f2
I'd probably use awk for that.
your_script | awk -F# 'NR == 2 && NF == 3 {
num_tokens=split($2, tokens, "^A")
for (i = 1; i <= num_tokens; ++i) {
print tokens[i]
}
}'
This says
1. Set the field separator to #
2. On lines that are the 2nd line, and also have 3 fields (text#text#text)
3. Split the middle (2nd) field using "^A" as the delimiter into the array named tokens
4. Print each token
Obviously this makes a lot of assumptions. You might need to tweak it if, for example, # or ^A can appear legitimately in the data, without being separators. But something like that should get you started. You might need to use nawk or gawk or something, I'm not entirely sure if plain awk can handle splitting on a control character.
bash:
read
read line
result="${line#*#}"
result="${result%#*}"
IFS=$'\001' read result -a <<< "$result"
$result is now an array that contains the elements you're interested in. Just pipe the output of the script to this one.
here's a possible awk solution
awk -F"#" 'NR==2{
for(i=2;i<=NF;i+=2){
split($i,a,"\001") # split on SOH
for(o in a ) print o # print the splitted hash
}
}' file