Sed, Awk for combining the output of two cut statements - linux

I'm trying to combine the two commands below into one. The issue is that the fields I'm trying to grab come out in the wrong order. I was told that cut doesn't support a "reverse" option and to use awk for this purpose, but it didn't end up working for me. I'm taking the output of ls -l against /dev/block to return the partitions and automatically building a dd if= / of= line for each entry based on the output of the command.
I tried piping the output to awk:
cut -d' ' -f23,25 ... | awk '{print $2,$1}'
however, when I then used sed to add the prefix and suffix, the fields weren't in the appropriate order.
I built the two statements below, which individually return the expected output; I'm just looking for the "right" way to combine them in the most efficient manner using sed/awk.
ls -l /dev/block/platform/msm_sdcc.1/by-name/ | cut -d' ' -f 25 | sed "s/^/dd if=/"
ls -l /dev/block/platform/msm_sdcc.1/by-name/ | cut -d' ' -f 23 | sed "s/.*/of=\/external_sd\/&.dsk/"
Any assistance will be appreciated.
Thank you.

If you're already using awk, I don't think you'll need cut or sed. You can probably do something like the following, though I'll have to trust you on the field numbers:
ls -l /dev/block/platform/msm_sdcc.1/by-name | awk '{print "dd if=/"$25 " of=/" $23 ".dsk"}'
awk splits on all whitespace, not just the space character, so the field numbers may shift some, though that may also make it more reliable.
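For what it's worth, ls -l prints each symlink as name -> target, so with awk's whitespace splitting the target is $NF and the name is $(NF-2). A sketch along those lines (assuming the by-name entries are all symlinks):
ls -l /dev/block/platform/msm_sdcc.1/by-name | awk '/->/ {print "dd if=" $NF " of=/external_sd/" $(NF-2) ".dsk"}'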


How to clean the output and print the desired information with less CPU usage

I have a 20 GB log file that contains lots of fields; the field or column number 2 contains numbers. I use the commands below to print only column 2:
zcat /path to file location/$date*/logfile_*.dat.zip | awk '/Read:ROP/' | nawk -F "=" '{print $2}'
the result of this command is:
"93711994166", Key
Since I want only the number, I append the command below to my original command to clean the output:
| awk -F, '{print $1}' | sed 's/"//g'
the result is:
93711994166
My final purpose is to print only numbers having a length other than 11 digits, so I append the following to my final command:
| grep -vE '^.{11}$'
So my final command is:
zcat /path to file location/$date*/logfile_*.dat.zip | awk '/Read:ROP/' | nawk -F "=" '{print $2}' | awk -F, '{print $1}' | sed 's/"//g' | grep -vE '^.{11}$' >/tmp/$file
This command takes a long time to execute and causes high CPU usage. I want to achieve the following:
print all numbers with length not equal to 11 digits.
print all numbers that do not start with 93 (regardless of their length)
a clean, effective command that is not CPU or memory costly
I also have another requirement, which is to print the numbers that do not start with 93.
Note:
The log file contains lots of different lines, but I use awk '/Read:ROP/' to select lines like the one below and extract the numbers:
Read:ROP (CustomerId="93700001865", Key=1, ActiveEndDate=2025-01-19 20:12:22, FirstCallDate=2018-01-08 12:30:30, IsFirstCallPassed=true, IsLocked=false, LTH={Data=["1|
MOC|07.07.2020 09:18:58|48000.0|119||OnPeakAccountID|480|19250||", "1|RECHARGE|04.07.2020 10:18:32|-4500.0|0|0", "1|RECHARGE|04.07.2020 10:18:59|-4500.0|0|0"], Index=0
}, LanguageID=2, LastKnownPeriod="Active", LastRechargeAmount=4500, LastRechargeDate=2020-07-04 10:18:59, VoucherRchFraudCounter=0, c_BlockPAYG=true, s_PackageKeyCount
er=13, s_OfferId="xyz", OnPeakAccountID_FU={Balance=18850});
20GB log file [...] zcat
Using zcat on 20GB log files is quite expensive. Check top when running your command line above.
It might be worth keeping the data from the first filtering step:
zcat /path to file location/$date*/logfile_*.dat.zip | awk '/Read:ROP/' > filter_data.out
and work with the filtered data. I assume here that this awk step can remove the majority of the data.
Bonus points: this step can be parallelized by running the zcat [...] | awk [...] pipe file-by-file, and you only need to do it once for each file.
The other steps don't look particularly expensive unless there are a lot of data lines left even after filtering.
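If a single pass is acceptable, the rest of the pipeline can collapse into that same awk step. A hedged sketch, assuming the CustomerId value is always the first double-quoted string on a Read:ROP line (as in the sample above), printing the numbers that are not 11 digits long or do not start with 93:
zcat /path to file location/$date*/logfile_*.dat.zip | awk -F'"' '/Read:ROP/ { n = $2; if (length(n) != 11 || n !~ /^93/) print n }'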
sed '/.*Read:ROP.*([^=]*="\([^"]*\)".*/!d; s//\1/'
/.../ - match regex
.*Read:ROP.* - match Read:ROP with anything before and after it, i.e. awk '/Read:ROP/'
([^=]*=" - match a (, followed by any number of characters other than =, then an =, then a ", i.e. nawk -F "=" '{print $2}'
\([^"]*\) - match everything inside the quotes (I guess [0-9]* would be fine too)
".* - match the closing quote and the rest of the line
! - if the line doesn't match the regex
d - remove the line
s - substitute
// - reuse the regex in /.../
\1 - substitute the first backreference, i.e. \([^"]*\)

Bash: Flip strings to the other side of the delimiter

Basically, I have a file formatted like
ABC:123
And I would like to flip the strings around the delimiter, so it would look like this
123:ABC
I would prefer to do this with bash/linux tools.
Thanks for any help!
That's reasonably easy with internal bash commands, assuming two fields, as per the following transcript:
pax:~$ x='abc:123'
pax:~$ echo "${x#*:}:${x%:*}"
123:abc
The first substitution ${x#*:} removes everything from the start up to the colon. The second, ${x%:*}, removes everything from the colon to the end.
Then you just re-join them with the colon in-between.
It doesn't matter for your particular data but % and # use the shortest possible pattern. The %% and ## variants will give you the longest possible pattern (greedy).
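For example, with more than one colon:
pax:~$ y='a:b:c'
pax:~$ echo "${y#*:}" "${y##*:}" "${y%:*}" "${y%%:*}"
b:c c a:b a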
As an aside, this is ideal if you're doing it for one string at a time, since you don't need to kick up an external process to do the work for you. But, if you're processing an entire file, there are better ways to do it, such as with awk:
pax:~$ printf "abc:123\ndef:456\nghi:789\n" | awk -F: '{print $2 FS $1}'
123:abc
456:def
789:ghi
#!/bin/sh -x
var1=$(echo -e 'ABC:123' | cut -d':' -f1)
var2=$(echo -e 'ABC:123' | cut -d':' -f2)
echo -e "${var2}":"${var1}"
I use cut to split the string into two parts, and store both of those parts as variables.
From there, it's possible to use echo to re-arrange the variables as you see fit.
Using sed.
sed -E 's/(.*):(.*)/\2:\1/' file.txt
Using paste and cut with process substitution.
paste -d: <(cut -d : -f2 file.txt) <(cut -d : -f1 file.txt)
A slower pure-shell solution, slowest on a large set of data/files:
while IFS=: read -r left right; do printf '%s:%s\n' "$right" "$left"; done < file.txt

Empty string as an output field separator for cut

How can I use cut with --output-delimiter=""? I want to join two columns using cut.
I tried the following command; however, cat -v shows that there are non-printable characters, specifically "^#". Any suggestions on how I can overcome this?
cut -d, -f 3,6 --output-delimiter="" file1.csv | cat -v
This is the content of my file
011,IBM,Palmisano,t,t,t
012,INTC,Otellini,t,t,t
013,SAP,Snabe,t,t,t
014,VMW,Maritz,t,t,t
015,ORCL,Ellison,t,t,t
017,RHT,Whitehurst,t,t,t
When I run my command I see:
Palmisano^#t
Otellini^#t
Snabe^#t
Maritz^#t
Ellison^#t
Whitehurst^#t
Expected output (I want to exclude the ^# from the output):
Palmisanot
Otellinit
Snabet
Maritzt
Ellisont
Whitehurstt
Thank you.
The output delimiter is not an empty string, but probably the NULL character. You might want to try
cut -d, -f 3,6 --output-delimiter=$'\00' file1.csv
(Assuming your shell supports $'...'-quoting; bash and zsh are fine here, not sure about others).
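For instance, in those shells $'...' turns backslash escapes into literal bytes:
printf '%s\n' $'\x41\x42'
AB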
Edit:
cut apparently puts the NUL character in if the output separator is set to the empty string. I do not see a way around it.
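One workaround might be to keep the empty output delimiter and strip the NUL bytes afterwards with tr (a sketch; GNU tr understands the \0 escape):
cut -d, -f 3,6 --output-delimiter="" file1.csv | tr -d '\0'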
If awk is an acceptable solution, this will do the trick:
awk -F, '{print $3 $6}' file*
If you want to be more verbose and explicit:
awk 'BEGIN{FS=","; OFS=""}; {print $3,$6}' file*
FS="," sets the field separator to ,.
OFS="" sets the Output Field Separator to the empty string.
You probably don't want to cut by fields but instead by characters or perhaps bytes. See the description of -c and/or -b in the man page, instead of using -f.
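For example, -c selects by character position, so this prints bcd:
echo 'abcdef' | cut -c2-4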

How to find the last field using 'cut'

Without using sed or awk, only cut, how do I get the last field when the number of fields is unknown or changes with every line?
You could try something like this:
echo 'maps.google.com' | rev | cut -d'.' -f 1 | rev
Explanation
rev reverses "maps.google.com" to be moc.elgoog.spam
cut uses the dot (i.e. '.') as the delimiter and chooses the first field, which is moc
lastly, we reverse it again to get com
Use a parameter expansion. This is much more efficient than any kind of external command, cut (or grep) included.
data=foo,bar,baz,qux
last=${data##*,}
See BashFAQ #100 for an introduction to native string manipulation in bash.
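The mirror image, %%, trims the longest suffix and gives the first field:
first=${data%%,*}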
It is not possible using just cut. Here is a way using grep:
grep -o '[^,]*$'
Replace the comma with whatever delimiter you need.
Explanation:
-o (--only-matching) only outputs the part of the input that matches the pattern (the default is to print the entire line if it contains a match).
[^,] is a character class that matches any character other than a comma.
* matches the preceding pattern zero or more times, so [^,]* matches zero or more non-comma characters.
$ matches the end of the string.
Putting this together, the pattern matches zero or more non-comma characters at the end of the string.
When there are multiple possible matches, grep prefers the one that starts earliest. So the entire last field will be matched.
Full example:
If we have a file called data.csv containing
one,two,three
foo,bar
then grep -o '[^,]*$' < data.csv will output
three
bar
Without awk?...
But it's so simple with awk:
echo 'maps.google.com' | awk -F. '{print $NF}'
AWK is a way more powerful tool to have in your pocket.
-F is for the field separator
NF is the number of fields (and thus also the index of the last one)
There are multiple ways. You may use this too.
echo "Your string here"| tr ' ' '\n' | tail -n1
> here
Obviously, the blank space given to tr should be replaced with the delimiter you need.
This is the only possible solution using nothing but cut:
echo "s.t.r.i.n.g." | cut -d'.' -f2-
[repeat_following_part_forever_or_until_out_of_memory:] | cut -d'.' -f2-
Using this solution, the number of fields can indeed be unknown and vary from line to line. However, since a line must not exceed LINE_MAX characters, including the newline character, the number of fields can never actually be arbitrary, so repeating the second cut enough times will always isolate the last field.
Yes, a very silly solution, but the only one that meets the criteria, I think.
If your input string doesn't contain forward slashes then you can use basename and a subshell:
$ basename "$(echo 'maps.google.com' | tr '.' '/')"
This doesn't use sed or awk, but it also doesn't use cut either, so I'm not quite sure it qualifies as an answer to the question as it's worded.
This doesn't work well if the input strings can contain forward slashes. A workaround for that situation is to replace the forward slash with some other character that you know isn't part of a valid input string. For example, if the pipe (|) character never appears in the input, this would work:
$ basename "$(echo 'maps.google.com/some/url/things' | tr '/' '|' | tr '.' '/')" | tr '|' '/'
The following implements a friend's suggestion:
#!/bin/bash
rcut(){
    nu="$( echo "$1" | cut -d"$DELIM" -f 2- )"
    if [ "$nu" != "$1" ]
    then
        rcut "$nu"
    else
        echo "$nu"
    fi
}
$ export DELIM=.
$ rcut a.b.c.d
d
An alternative using perl would be:
perl -pe 's/(.*) (.*)$/$2/' file
where you may change the space in the pattern to whichever delimiter your file uses
It is better to use awk when working with tabular data, and you don't have to master the whole language. If the job can be done with awk, why not use it instead of chaining a handful of commands? Don't waste your precious time.
Example:
# $NF refers to the last column in awk
ll | awk '{print $NF}'
If you have a file named filelist.txt that is a list of paths, such as the following:
c:/dir1/dir2/file1.h
c:/dir1/dir2/dir3/file2.h
then you can do this:
rev filelist.txt | cut -d"/" -f1 | rev
Adding an approach to this old question just for the fun of it:
$ cat input.file # file containing input that needs to be processed
a;b;c;d;e
1;2;3;4;5
no delimiter here
124;adsf;15454
foo;bar;is;null;info
$ cat tmp.sh # showing off the script to do the job
#!/bin/bash
delim=';'
while read -r line; do
    while [[ "$line" =~ "$delim" ]]; do
        line=$(cut -d"$delim" -f 2- <<<"$line")
    done
    echo "$line"
done < input.file
$ ./tmp.sh # output of above script/processed input file
e
5
no delimiter here
15454
info
Besides bash, only cut is used.
Well, and echo, I guess.
choose -1
choose supports negative indexing (the syntax is similar to Python's slices).
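For the example above this should print com (assuming -f is choose's field-separator flag and that it takes a regex):
echo 'maps.google.com' | choose -f '\.' -1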
I realized that if we just ensure a trailing delimiter exists, it works. In my case I have both comma and whitespace delimiters, so I add a space at the end:
$ ans="a, b"
$ ans+=" "; echo ${ans} | tr ',' ' ' | tr -s ' ' | cut -d' ' -f2
b
