Empty string as an output field separator for cut - linux

How can I use cut with --output-delimiter=""? I want to join two columns using cut.
I tried the following command, but cat -v shows non-printable characters in the output, specifically "^#". Any suggestions on how to avoid this?
cut -d, -f 3,6 --output-delimiter="" file1.csv | cat -v
This is the content of my file
011,IBM,Palmisano,t,t,t
012,INTC,Otellini,t,t,t
013,SAP,Snabe,t,t,t
014,VMW,Maritz,t,t,t
015,ORCL,Ellison,t,t,t
017,RHT,Whitehurst,t,t,t
When I run my command, I see:
Palmisano^#t
Otellini^#t
Snabe^#t
Maritz^#t
Ellison^#t
Whitehurst^#t
Expected output (basically, I want to exclude the ^# from the output):
Palmisanot
Otellinit
Snabet
Maritzt
Ellisont
Whitehurstt
Thank you.

The output delimiter is not an empty string, but probably the NUL character. You might want to try
cut -d, -f 3,6 --output-delimiter=$'\00' file1.csv
(Assuming your shell supports $'...'-quoting; bash and zsh are fine here, not sure about others).
edit:
cut apparently emits the NUL character when the output delimiter is set to the empty string. I do not see a way around it with cut alone.
If awk is an acceptable solution, this will do the trick:
awk -F, '{print $3 $6}' file*
If you want to be more verbose and explicit:
awk 'BEGIN{FS=","; OFS=""}; {print $3,$6}' file*
FS="," sets the field separator to ,.
OFS="" sets the Output Field Separator to the empty string.
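If staying with cut is preferred, one possible workaround is to let cut emit the NUL byte and delete it afterwards with tr. This is only a sketch: it assumes GNU cut (--output-delimiter is a GNU extension) and that the data itself contains no NUL bytes. The two sample rows are taken from the question and piped in instead of reading file1.csv.

```shell
# Sample rows from the question, fed through a pipe instead of file1.csv.
# GNU cut writes a NUL byte between the two fields; tr -d '\000' deletes it.
joined=$(printf '%s\n' '011,IBM,Palmisano,t,t,t' '012,INTC,Otellini,t,t,t' |
  cut -d, -f 3,6 --output-delimiter="" |
  tr -d '\000')
printf '%s\n' "$joined"
```

Piping the result through cat -v should then show no control characters between the two fields.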

You probably don't want to cut by fields but instead by characters or perhaps bytes. See the description of -c and/or -b in the man page, instead of using -f.

Related

How can I isolate a single value from a list within an awk field?

Let's say I have a file called test that contains some data:
jon:TX:34:red,green,yellow,black,orange
I'm trying to make it so it will only print the 4th field up until the comma and nothing else. But I need to leave the current FS in place because the fields are separated by the ":". Hope this makes sense.
I have been running this command:
awk '{ FS=":"; print $4 }' /test
I want my output to look like this.
jon:TX:34:red
Or, if you could just show me how to print only that part of the 4th field, that would help too:
red
It's overkill for your needs, but the general way to print the yth ,-separated subfield of the xth :-separated field of any input would be:
$ awk -F':' -v s=',' -v x=4 -v y=1 '{split($x,a,s); print a[y]}' file
red
Or
awk -F '[:,]' '{print $4}' test
output
red
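To get the first requested form (the whole record with the fourth field cut off at its first comma) while keeping : as the separator, one option is to edit $4 in place. This is a minimal sketch, not the only way to do it; OFS is set because modifying a field makes awk rebuild the record using OFS, which defaults to a space.

```shell
line='jon:TX:34:red,green,yellow,black,orange'
# Keep ":" as both input and output separator, then delete everything
# from the first comma onward in field 4 before printing the record.
trimmed=$(printf '%s\n' "$line" | awk -F: -v OFS=: '{ sub(/,.*/, "", $4); print }')
printf '%s\n' "$trimmed"
```

Replacing print with print $4 would give just the trimmed fourth field.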
It sounds like you are trying to extract the first field of the fourth field. Top level fields are delimited by ":" and the nested field is delimited by ",".
Combining two cut processes achieves this easily:
<input.txt cut -d: -f4 | cut -d, -f1
If you want all fields until the first comma, extract the first comma-delimited field without first cutting on colon:
cut -d, -f1 input.txt
If you want a purely regex approach:
echo 'jon:TX:34:red,green,yellow,black,orange' |
mawk NF=NF FS='.+:|,.+' OFS=
red
If you only want "red" without the trailing newline ("\n"), use RS/ORS instead of FS/OFS (the % is the command prompt, i.e. no trailing \n):
mawk2 8 RS='.+:|,.+' ORS=
red%
If you want to hard-code the $4:
gawk '$_= $4' FS=,\|: # gawk or nawk
mawk '$!NF=$4' FS=,\|: # any mawk
red
And if you only want the non-numeric text:
nawk NF=NF FS='[!-<]+' OFS='\f\b'
jon
TX
red
green
yellow
black
orange
If you have
jon:TX:34:red,green,yellow,black,orange
and desired output is
jon:TX:34:red
then just treat the input as comma-separated and take the 1st field, which might be expressed in GNU AWK as
echo "jon:TX:34:red,green,yellow,black,orange" | awk 'BEGIN{FS=","}{print $1}'
gives output
jon:TX:34:red
Explanation: I tell GNU AWK that the , character is the field separator (FS), and for each line I print the 1st field ($1).
(tested in GNU Awk 5.0.1)

How can I get the second column of a very large csv file using linux command?

I was given this question during an interview. I said I could do it with Java or Python, for example with a function like xreadlines() to traverse the whole file and fetch the column, but the interviewer wanted me to use only Linux commands. How can I achieve that?
You can use the command awk. Below is an example of printing out the second column of a file:
awk -F, '{print $2}' file.txt
And to store it, you redirect it into a file:
awk -F, '{print $2}' file.txt > output.txt
You can use cut:
cut -d, -f2 /path/to/csv/file
I'd add to Andreas's answer, but can't comment yet.
With csv, you have to give awk a field separator argument, or it will split fields on whitespace instead of commas. (Obviously, csv that uses a different field separator will need a different character to be declared.)
awk -F, '{print $2}' file.txt
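The difference is easy to see on a row whose second column contains a space. The row below is a made-up sample, not data from the question:

```shell
row='1,foo bar,x'
# Default splitting is on whitespace, so field 2 is the text after the space.
by_space=$(printf '%s\n' "$row" | awk '{print $2}')
# With -F, the fields are the comma-separated columns.
by_comma=$(printf '%s\n' "$row" | awk -F, '{print $2}')
printf '%s\n' "$by_space" "$by_comma"
```

Note that plain -F, splits on every comma, so it does not handle quoted CSV fields that themselves contain commas.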

Sed, Awk for combining the output of two cut statements

I'm trying to combine the outputs below into one command. The issue is that the fields I'm trying to grab are in reverse order. I was told that cut doesn't support a "reverse" option and that I should use awk for this, but that didn't end up working for me. I'm trying to take the output of ls -l against /dev/block to list the partitions and automatically build a dd if= / of= command for each line of that output.
I tried piping the output to awk:
cut -d' ' -f23,25 ... | awk '{print $2,$1}'
However, when I then used sed to add the prefix and suffix, the result wasn't in the right order.
I built the two statements below which individually return the expected output, just looking for the "right" way to combine both of these statements in the most efficient manner using sed / awk.
ls -l /dev/block/platform/msm_sdcc.1/by-name/ | cut -d' ' -f 25 | sed "s/^/dd if=/"
ls -l /dev/block/platform/msm_sdcc.1/by-name/ | cut -d' ' -f 23 | sed "s/.*/of=\/external_sd\/&.dsk/"
Any assistance will be appreciated.
Thank you.
If you're already using awk, I don't think you'll need cut or sed. You can probably do something like the following, though I'll have to trust you on the field numbers:
ls -l /dev/block/platform/msm_sdcc.1/by-name | awk '{print "dd if=/"$25 " of=/" $23 ".dsk"}'
awk will split on all whitespace, not just the space character, so it's possible the fields will shift some, though it may be more reliable too.
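Because awk collapses runs of whitespace, counting fields from the end of the line can be more robust than fixed positions such as 23 and 25. The sketch below assumes each entry is a symlink shown by ls -l as "name -> target" and that names contain no whitespace. The listing is hypothetical sample text, not real output from the device, and the of=/external_sd/ prefix and .dsk suffix are taken from the sed commands in the question.

```shell
# Hypothetical ls -l output; the real listing on the device may differ.
listing='total 0
lrwxrwxrwx 1 root root 20 Jan  1 00:00 boot -> /dev/block/mmcblk0p7
lrwxrwxrwx 1 root root 21 Jan  1 00:00 cache -> /dev/block/mmcblk0p18'
# Only symlink lines contain "->"; $NF is the target, $(NF-2) the link name.
cmds=$(printf '%s\n' "$listing" |
  awk '/->/ { print "dd if=" $NF " of=/external_sd/" $(NF-2) ".dsk" }')
printf '%s\n' "$cmds"
```

On the real device, the printf would be replaced by the ls -l call from the question. It is worth inspecting the generated dd lines before running any of them.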

Removing characters from grep output

I've been whittling down my grep output (which comes down to a listing of numbers that I intend to associate with other fields). My problem is that numbers above 999 have commas in them, and I'm wondering how to print the output without the commas.
so instead of the output being:
1,200,300
it would just be:
1200300
Any suggestions for an additional pipe command that I could add?
Thanks
Try this
< your command > | tr -d ','
tr will remove all commas
< your command > | sed -e 's/,//g'
This will replace all commas with "nothing" without changing anything else.
Instead of grep, use a single awk command like the one below:
awk '/your pattern/{gsub(",","");print}' your_file
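As a quick check, the three suggestions can be run side by side on the sample value from the question. The variable names here are only for illustration:

```shell
value='1,200,300'
# tr deletes every comma character.
with_tr=$(printf '%s\n' "$value" | tr -d ',')
# sed replaces every comma with nothing (g = all occurrences on the line).
with_sed=$(printf '%s\n' "$value" | sed -e 's/,//g')
# awk's gsub does the same on the whole record before printing it.
with_awk=$(printf '%s\n' "$value" | awk '{ gsub(",", ""); print }')
printf '%s\n' "$with_tr" "$with_sed" "$with_awk"
```

All three remove every comma on the line, so they are only appropriate when commas appear nowhere else in the output.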

How to reverse order of fields using AWK?

I have a file with the following layout:
123,01-08-2006
124,01-09-2007
125,01-10-2009
126,01-12-2010
How can I convert it into the following by using AWK?
123,2006-08-01
124,2007-09-01
125,2009-10-01
126,2010-12-01
Didn't read the question properly the first time. You need a field separator that can be either a dash or a comma. Once you have that you can use the dash as an output field separator (as it's the most common) and fake the comma using concatenation:
awk -F',|-' 'OFS="-" {print $1 "," $4,$3,$2}' file
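The same idea can also be written with a bracket expression as the separator and the output assembled explicitly, which avoids touching OFS. A small sketch run against the sample data from the question:

```shell
input='123,01-08-2006
124,01-09-2007
125,01-10-2009
126,01-12-2010'
# [,-] matches either separator; the output is assembled explicitly,
# so OFS does not need to be changed.
reversed=$(printf '%s\n' "$input" | awk -F'[,-]' '{ print $1 "," $4 "-" $3 "-" $2 }')
printf '%s\n' "$reversed"
```

Each line is split into four fields (id, day, month, year), and the last three are printed in reverse order.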
Pure awk
awk -F"," '{ n=split($2,b,"-");$2=b[3]"-"b[2]"-"b[1];$i=$1","$2 } 1' file
sed
sed -r 's/(^.[^,]*,)([0-9]{2})-([0-9]{2})-([0-9]{4})/\1\4-\3-\2/' file
sed 's/\(^.[^,]*,\)\([0-9][0-9]\)-\([0-9][0-9]\)-\([0-9]\+\)/\1\4-\3-\2/' file
Bash
#!/bin/bash
while IFS="," read -r a b
do
    IFS="-"
    set -- $b
    echo "$a,$3-$2-$1"
done <"file"
Unfortunately, some older awk implementations only allow a single field separator character, in which case you'll have to pre-process the data. You can do this with tr but if you really want an awk-only solution, use:
pax> echo '123,01-08-2006
124,01-09-2007
125,01-10-2009
126,01-12-2010' | awk -F, '{print $1"-"$2}' | awk -F- '{print $1","$4"-"$3"-"$2}'
This outputs:
123,2006-08-01
124,2007-09-01
125,2009-10-01
126,2010-12-01
as desired.
The first awk changes the , characters to - so that you have four fields separated with the same character (this is the bit I'd usually use tr ',' '-' for).
The second awk prints them out in the order you specified, correcting the field separators at the same time.
If you're using an awk implementation that allows multiple FS characters, you can use something like:
gawk -F ',|-' '{print $1","$4"-"$3"-"$2}'
If it doesn't need to be awk, you could use Perl too:
$ perl -nle 'print "$1,$4-$3-$2" while (/(\d{3}),(\d{2})-(\d{2})-(\d{4})\s*/g)' < file.txt

Resources