UNIX Shell Script remove one column from the file

I have a file like the following:
Header1:value1|value2|value3|
Header2:value4|value5|value6|
The column number is unknown and I have a function which can return the column number.
And I want to write a script which can remove one column from the file. For example, after removing column 1, I will get:
Header1:value2|value3|
Header2:value5|value6|
I use cut for this, and so far I can get the values after removing one column, but without the headers. For example:
value2|value3|
value5|value6|
Could anyone tell me how I can add the headers back? Or is there a command that can do this directly? Thanks.

Replace the colon with a pipe, do your cut command, then replace the first pipe with a colon again:
sed 's/:/|/' input.txt | cut ... | sed 's/|/:/'
You may need to adjust the column number for the cut command, to ensure you don't count the header.
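A minimal sketch of this approach, assuming the column number is already known. The function name remove_column and its argument are made up for illustration. Once the colon has become a pipe, value column N is field N+1, so the cut field list keeps fields 1 to N and N+2 onward:

```shell
# remove_column N: read lines like "Header1:value1|value2|" on stdin and
# drop the N-th value column (1-based, header not counted).
# The name and interface are hypothetical, not from the original answer.
remove_column() {
    n=$1
    sed 's/:/|/' |                           # header becomes field 1
        cut -d'|' -f"1-$n,$((n + 2))-" |     # keep everything except field n+1
        sed 's/|/:/'                         # first pipe back to a colon
}

printf 'Header1:value1|value2|value3|\nHeader2:value4|value5|value6|\n' | remove_column 1
```

This avoids the GNU-only --complement option of cut by spelling out the fields to keep.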

Turn the ':' into '|', so that the header is another field, rather than part of the first field. You can do that either in whatever generates the data to begin with, or by passing the data through tr ':' '|' before cut. The rest of your fields will be offset by +1 then, but that should be easy enough to compensate for.

Your problem is that HeaderX are followed by ':' which is not the '|' delimiter you use in cut.
You could separate first your lines in two parts with :, with something like
"cut -f 1 --delimiter=: YOURFILE", then remove the first column and then put back the headers.

awk can handle multiple delimiters. So another alternative is...
jkern@ubuntu:~/scratch$ cat ./data188
Header1:value1|value2|value3|
Header2:value4|value5|value6|
jkern@ubuntu:~/scratch$ awk -F"[:|]" '{ print $1 $3 $4 }' ./data188
Header1value2value3
Header2value5value6
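The output above loses the : and | delimiters because print concatenates the fields directly. A hedged sketch that keeps them, assuming every line has one header colon and ends with a trailing | as in the sample (the function name is hypothetical):

```shell
# remove_column_awk N: drop the N-th value column, keeping the delimiters.
remove_column_awk() {
    awk -v n="$1" '{
        colon = index($0, ":")                        # position of the header colon
        out   = substr($0, 1, colon)                  # "Header1:"
        count = split(substr($0, colon + 1), v, "|")  # last piece is empty (trailing |)
        for (i = 1; i < count; i++)
            if (i != n) out = out v[i] "|"            # copy every value except the n-th
        print out
    }'
}

printf 'Header1:value1|value2|value3|\n' | remove_column_awk 2
```

Because the column number is passed in with -v, the value returned by the column-lookup function can be used directly.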

you can do it just with sed without cut:
sed 's/:[^|]*|/:/' input.txt

My solution:
$ sed 's,:,|,' data | awk -F'|' 'BEGIN{OFS="|"}{$2=""; print}' | sed 's,||,:,'
Header1:value2|value3|
Header2:value5|value6|
replace : with |
-F'|' tells awk to use the | symbol as the field separator
in each line we replace the 2nd field (the header is now the first) with an empty string and print the resulting line with the new output field separator (|)
restore the header by replacing the first || (left behind by the emptied field) with :
Not perfect, but it should work.

$ cat file.txt | grep 'Header1' | awk -F"1" '{ print $1 $2 $3 $4}'
This will print all values in separate columns. You can print any number of columns.

Just chiming in with a Perl solution:
(rearrange/remove fields as needed)
-l effectively adds a newline to every print statement
-a autosplit mode splits each line using the -F expression into array @F
-n adds a loop around the -e code
-e your 'one liner' follows this option
$ perl -F'[:|]' -lane 'print "$F[0]:$F[1]|$F[2]|$F[3]"' input.txt

Related

Simplest way to replace (\n) with (" ") in bash?

In a plain text file, I am trying to get from
Item 1
Item 2
to
"Item 1" "Item 2"
I tried using the tr command (cat FILE.txt | tr "\n" "\" \""),
but that did not work.
I also tried using cat FILE.txt | tr '\n' '\" \"', but again, to no avail.
Can anyone help me do it?
Also, as a bonus question, what is the easiest way to get the first double quote?
With my method, if I get it to work, I will end up with:
Item 1" "Item 2"
P.S. Thanks Jorengarenar for helping me with the edit.

One possibility would be to use awk.
awk '{ printf " " "\""$0"\"" }' FILE
For the bonus question, just remove the second quote after the $0 variable.
awk '{ printf " " "\""$0"" }' FILE
If you want another delimiter, you can change the first argument to whatever you like.

Try this:
awk '{printf spacer "\"" $0 "\""; spacer=" "} END {print ""}' FILE.txt
Explanation: for each line, this prints a spacer (which is initially empty), a literal double-quote, the original line (not including its terminating newline), and another literal double-quote. Then, it sets spacer to a single space, so that for all but the first line there'll be a space printed before it. printf doesn't add a newline, so all of this gets printed as a single long line. But at the end, we need to add a final newline, which a normal print takes care of.
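A small variant of the same idea, as a sketch: here the line is passed to printf as an argument instead of being part of the format string, so a % inside a line is not interpreted as a format specifier. The function name quote_lines is made up for illustration.

```shell
# quote_lines [FILE...]: wrap each input line in double quotes and join
# the results with single spaces on one output line.
quote_lines() {
    awk '{ printf "%s\"%s\"", sep, $0; sep = " " } END { print "" }' "$@"
}

printf 'Item 1\nItem 2\n' | quote_lines
```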

If you want to change the file itself, one way using ed:
ed -s file.txt <<'EOF'
1,$ s/^/"/
1,$ s/$/" /
1,$ j
w
EOF
First add a double quote to the beginning of every line, then a double quote and space to the end, and finally join all the lines into one and write the changed file back to disk.

In two steps:
1.cat FILE.txt | tr '\n' ' '
2.sed -E 's:([a-zA-Z0-9]+):"\1":g' FILE.txt

This might work for you (GNU utils):
sed 's/.*/"&"/' file | paste -sd' '
or:
sed 's/.*/\\"&\\"/' file | xargs
N.B. The -d option of paste takes the delimiter, here a single space (written quoted; it can equally be written as a backslash followed by a space).

or leverage awk
[gmn]awk -v RS='^$' 'gsub(/^|\n$/,"\"")+gsub(/\n/,"\" \"")'
or
[gmn]awk -v RS='' 'gsub(/^|\n?$/,"\"")+gsub(/\n/,"\" \"")'
not as elegant of a one-liner as i hoped for
third version feels more elegant, but one extra space created at the tail end :
[gmn]awk -v ORS=' ' 'gsub(/^|$/,"\"")'
which can be rectified by one extra sed, which defeats the elegance :
[gmn]awk -v ORS=' ' 'gsub(/^|$/,"\"")' | gsed -z 's/.$//'

String split and extract the last field in bash

I have a text file FILENAME. I want to split the string at - of the first column field and extract the last element from each line. Here "$(echo $line | cut -d, -f1 | cut -d- -f4)"; alone is not giving me the right result.
FILENAME:
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram
code I tried:
while read line; do \
DNA="$(echo $line | cut -d, -f1 | cut -d- -f4)";
echo $DNA
done < ${FILENAME}
Result I want
1195060301
1195060302
1195060311

Would you please try the following:
while IFS=, read -r f1 _; do # set field separator to ",", assigns f1 to the 1st field and _ to the rest
dna=${f1##*-} # removes everything before the rightmost "-" from "$f1"
echo "$dna"
done < "$FILENAME"
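The two expansions involved can be tried on a single line, as a quick sketch (the variable names are only for illustration):

```shell
line='TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram'
first=${line%%,*}   # drop from the first "," to the end of the string
dna=${first##*-}    # drop through the last "-" (longest match of *-)
echo "$dna"
```

Using ## rather than # matters here: the third sample line has four dashes before the wanted number, and only the longest match strips all of them.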

Well, I had to do it with two lines of code. Maybe someone has a better approach.
while read line; do \
DNA="$(echo $line| cut -d, -f1| rev)"
DNA="$(echo $DNA| cut -d- -f1 | rev)"
echo $DNA
done < ${FILENAME}

I do not know the constraints on your input file, but if what you are looking for is a 10-digit number, and there is only ever one 10-digit number per line, this should do nicely:
grep -Eo '[0-9]{10,}' input.txt
1195060301
1195060302
1195060311
This essentially says: show me every run of 10 or more digits in this file
input.txt
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram

A sed approach:
sed -nE 's/.*-([[:digit:]]+)\,.*/\1/p' input_file
sed options:
-n: Do not print the whole file back, but only explicit /p.
-E: Use extended regex, so grouping and + need no escaping.
sed extended regex:
's/.*-([[:digit:]]+)\,.*/\1/p': Search, capture one or more digit in group 1, preceded by anything and a dash, followed by a comma and anything, and print only the captured group.

Using awk:
awk -F, '{ n = split($1, arr, "-"); print arr[n] }' FILENAME
Using , as a separator, take the first delimited "piece" of data and further split it into arr using - as the delimiter and awk's split function. split returns the number of pieces, so arr[n] is the last one, which we print.

grep with two or more words, one line by file with many files

everyone. I have
file 1.log:
text1 value11 text
text text
text2 value12 text
file 2.log:
text1 value21 text
text text
text2 value22 text
I want:
value11;value12
value21;value22
For now I grep the values into separate files and paste them together later in another file, but I think this is not a very elegant solution because I need to read all the files more than once. So I tried to use grep to extract all the data in a single cat | grep line, but the result is not what I expected.
I use:
cat *.log | grep -oP "(?<=text1 ).*?(?= )|(?<=text2 ).*?(?= )" | tr '\n' '; '
or
cat *.log | grep -oP "(?<=text1 ).*?(?= )|(?<=text2 ).*?(?= )" | xargs
but I get in each case:
value11;value12;value21;value22
value11 value12 value21 value22
Thank you so much.

Try:
$ awk -v RS='[[:space:]]+' '$0=="text1" || $0=="text2"{getline; printf "%s%s",sep,$0; sep=";"} ENDFILE{if(sep)print""; sep=""}' *.log
value11;value12
value21;value22
For those who prefer their commands spread over multiple lines:
awk -v RS='[[:space:]]+' '
$0=="text1" || $0=="text2" {
getline
printf "%s%s",sep,$0
sep=";"
}
ENDFILE {
if(sep)print""
sep=""
}' *.log
How it works
-v RS='[[:space:]]+'
This tells awk to treat any sequence of whitespace (newlines, blanks, tabs, etc) as a record separator.
$0=="text1" || $0=="text2"{getline; printf "%s%s",sep,$0; sep=";"}
This tells awk to look for records that match either text1 or text2. For those records, and those records only, the commands in curly braces are executed. Those commands are:
getline tells awk to read in the next record.
printf "%s%s",sep,$0 tells awk to print the variable sep followed by the word in the record.
After we print the first match, the command sep=";" is executed which tells awk to set the value of sep to a semicolon.
As we start each file, sep is empty. This means that the first match from any file is printed with no separator preceding it. All subsequent matches from the same file will have a ; to separate them.
ENDFILE{if(sep)print""; sep=""}
After the end of each file is reached, we print a newline if sep is not empty and then we set sep back to an empty string.
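ENDFILE is a GNU awk extension. For awks without it, a rough equivalent can be sketched by detecting the start of the next file with FNR == 1. This assumes the wanted value is the word immediately after text1 or text2 on the same line; the function name join_matches is hypothetical.

```shell
# join_matches FILE...: for each file, print the words that follow
# "text1" or "text2", joined by ";" on one line per file.
join_matches() {
    awk '
    FNR == 1 && NR > 1 {          # a new file begins: finish the previous line
        if (sep) print ""
        sep = ""
    }
    {
        for (i = 1; i < NF; i++)  # look at every word except the last
            if ($i == "text1" || $i == "text2") {
                printf "%s%s", sep, $(i + 1)
                sep = ";"
            }
    }
    END { if (sep) print "" }     # finish the line of the last file
    ' "$@"
}
```

It would be called as join_matches *.log, in the same way as the commands above.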
Alternative: Printing the second word if the first word ends with a number
In an alternative interpretation of the question (hat tip: David C. Rankin), we want to print the second word on any line for which the first word ends with a number. In that case, try:
$ awk '$1~/[0-9]$/{printf "%s%s",sep,$2; sep=";"} ENDFILE{if(sep)print""; sep=""}' *.log
value11;value12
value21;value22
In the above, $1~/[0-9]$/ selects the lines for which the first word ends with a number and printf "%s%s",sep,$2 prints the second field on that line.
Discussion
The original command was:
$ cat *.log | grep -oP "(?<=text1 ).*?(?= )|(?<=text2 ).*?(?= )" | tr '\n' '; '
value11;value12;value21;value22;
Note that, when using most unix commands, cat is rarely ever needed. In this case, for example, grep accepts a list of files. So, we could easily do without the extra cat process and get the same output:
$ grep -hoP "(?<=text1 ).*?(?= )|(?<=text2 ).*?(?= )" *.log | tr '\n' '; '
value11;value12;value21;value22;

I agree with @John1024: how you approach this problem will really depend on what the actual text is you are looking for. If, for instance, your lines of concern start with text{1,2,...} and what you want in the second field can be anything, then his approach is optimal. However, if the values in the first field can vary and what you are really interested in is records where you have valueXX in the second field, then an approach keying off the second field may be what you are looking for.
Taking for example your second field, if the text you are interested in is in the form valueXX (where XX are two or more digits at the end of the field), you can process only those records where your second field matches and then use a simple conditional testing whether FNR == 1 to control the ';' delimiter output and ENDFILE to control the new line similar to:
awk '$2 ~ /^value[0-9][0-9][0-9]*$/ {
printf "%s%s", (FNR == 1) ? "" : ";", $2
}
ENDFILE {
print ""
}' file1.log file2.log
Example Use/Output
$ awk '$2 ~ /^value[0-9][0-9][0-9]*$/ {
printf "%s%s", (FNR == 1) ? "" : ";", $2
}
ENDFILE {
print ""
}' file1.log file2.log
value11;value12
value21;value22
Look things over and consider your actual input files and then either one of these two approaches should get you there.

If I understood you correctly, you want the values but search for the text[12] ie. to get the word after matching search word, not the matching search word:
$ awk -v s="^text[12]$" ' # set the search regex *
FNR==1 { # in the beginning of each file
b=b (b==""?"":"\n") # terminate current buffer with a newline
}
{
for(i=1;i<NF;i++) # iterate all but last word
if($i~s) # if current word matches search pattern
b=b (b~/^$|\n$/?"":";") $(i+1) # add following word to buffer
}
END { # after searching all files
print b # output buffer
}' *.log
Output:
value11;value12
value21;value22
* regex could be for example ^(text1|text2)$, too.

search for a string and after getting result cut that word and store result in variable

I have a file named abc.lst, whose path I have stored in a variable. It contains a 3-word string; I want to grep the second word, cut out the part from expdp to .dmp, and store that in a variable.
example:-
REFLIST_OP=/tmp/abc.lst
cat $REFLIST_OP
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
Desired Output:-
expdp_TEST_P119_*_18112017.dmp
I Have tried below command :-
FULL_DMP_NAME=`cat $REFLIST_OP|grep /orabackup|awk '{print $2}'`
echo $FULL_DMP_NAME
/data/abc/GOon/expdp_TEST_P119_*_18112017.dmp

REFLIST_OP=/tmp/abc.lst
awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
Test Results:
$ REFLIST_OP=/tmp/abc.lst
$ cat "$REFLIST_OP"
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
$ awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
expdp_TEST_P119_*_18112017.dmp
To save in variable
myvar=$( awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP" )
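Since the file name contains a literal *, quoting matters when the variable is used later. A short sketch with the sample line inlined (the variable names line and dump_name are only for illustration):

```shell
line='34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM'

# Take the second word, split it on "/", and keep the last piece.
dump_name=$(printf '%s\n' "$line" | awk '{ n = split($2, parts, "/"); print parts[n] }')

# Quoted, so the shell does not try to expand the * as a glob pattern.
printf '%s\n' "$dump_name"
```

With an unquoted echo $dump_name, the shell would attempt pathname expansion and could print matching file names from the current directory instead.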

Following awk may help you on same.
awk -F'/| ' '{print $6}' Input_file
OR
awk -F'/| ' '{print $6}' "$REFLIST_OP"
Explanation: Simply making space and / as a field separator(as per your shown Input_file) and then printing 6th field of the line which is required by OP.
To see the field number and field's value you could use following command too:
awk -F'/| ' '{for(i=1;i<=NF;i++){print i,$i}}' "$REFLIST_OP"

Using sed with one of these regexes:
sed -e 's/.*\/\([^[:space:]]*\).*/\1/' abc.lst
Captures the non-space characters after the last / and prints only the captured part.
sed -re 's|.*/([^[:space:]]*).*|\1|' abc.lst
Same as above, but with a different separator, which avoids escaping the /. -r allows unescaped parentheses.
sed -e 's|.*/||' -e 's|[[:space:]].*||' abc.lst
In two steps: remove everything up to the last /, then remove from the first space to the end. (May be the easiest to read and understand.)
If you want to avoid an external command (sed):
myvar=$(<abc.lst); myvar=${myvar##*/}; myvar=${myvar%% *}; echo "$myvar"

How to concatenate multiple lines of output to one line?

If I run the command cat file | grep pattern, I get many lines of output. How do you concatenate all lines into one line, effectively replacing each "\n" with "\" " (end with " followed by space)?
cat file | grep pattern | xargs sed s/\n/ /g
isn't working for me.

Use tr '\n' ' ' to translate all newline characters to spaces:
$ grep pattern file | tr '\n' ' '
Note: grep reads files, cat concatenates files. Don't cat file | grep!
Edit:
tr can only handle single character translations. You could use awk to change the output record separator like:
$ grep pattern file | awk '{print}' ORS='" '
This would transform:
one
two
three
to:
one" two" three"

Piping output to xargs will concatenate each line of output to a single line with spaces:
grep pattern file | xargs
Or any command, eg. ls | xargs. The default limit of xargs output is ~4096 characters, but can be increased with eg. xargs -s 8192.

In bash, echo with an unquoted command substitution replaces newlines, tabs and runs of spaces with single spaces:
echo $(cat file)

This could be what you want
cat file | grep pattern | paste -sd' '
As to your edit, I'm not sure what it means, perhaps this?
cat file | grep pattern | paste -sd'~' | sed -e 's/~/" "/g'
(this assumes that ~ does not occur in file)

This is an example which produces output separated by commas. You can replace the comma by whatever separator you need.
cat <<EOD | xargs | sed 's/ /,/g'
> 1
> 2
> 3
> 4
> 5
> EOD
produces:
1,2,3,4,5

The fastest and easiest ways I know to solve this problem:
When we want to replace the new line character \n with the space:
xargs < file
xargs has own limits on the number of characters per line and the number of all characters combined, but we can increase them. Details can be found by running this command: xargs --show-limits and of course in the manual: man xargs
When we want to replace one character with another exactly one character:
tr '\n' ' ' < file
When we want to replace one character with many characters:
tr '\n' '~' < file | sed s/~/many_characters/g
First, we replace the newline characters \n for tildes ~ (or choose another unique character not present in the text), and then we replace the tilde characters with any other characters (many_characters) and we do it for each tilde (flag g).
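If no safe placeholder character is available, the joining string can be handed to awk instead. This is a sketch under the assumption that the separator contains no backslashes, since awk interprets escape sequences in -v values; the function name join_lines is made up for illustration.

```shell
# join_lines SEP: join all lines from stdin with SEP, ending with one newline.
join_lines() {
    awk -v sep="$1" '
        { printf "%s%s", (NR > 1 ? sep : ""), $0 }   # separator before every line but the first
        END { if (NR) print "" }                     # final newline, only if there was input
    '
}

printf 'one\ntwo\nthree\n' | join_lines ', '
```

Unlike the tr approach, nothing is appended after the last line except the final newline.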

Here is another simple method using awk:
# cat > file.txt
a
b
c
# cat file.txt | awk '{ printf("%s ", $0) }'
a b c
Also, if your file has columns, this gives an easy way to concatenate only certain columns:
# cat > cols.txt
a b c
d e f
# cat cols.txt | awk '{ printf("%s ", $2) }'
b e

I like the xargs solution, but if it's important to not collapse spaces, then one might instead do:
sed ':b;N;$!bb;s/\n/ /g'
That will replace newlines for spaces, without substituting the last line terminator like tr '\n' ' ' would.
This also allows you to use other joining strings besides a space, like a comma, etc, something that xargs cannot do:
$ seq 1 5 | sed ':b;N;$!bb;s/\n/,/g'
1,2,3,4,5
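The difference between the two approaches shows up on input that contains consecutive spaces. A small sketch (the sed script is split into -e parts here, which also works with non-GNU sed; the variable names are illustrative):

```shell
input='a  b
c'

# xargs re-splits the text into words, so the double space is lost.
collapsed=$(printf '%s\n' "$input" | xargs)

# sed only replaces the newlines and leaves the rest of each line alone.
preserved=$(printf '%s\n' "$input" | sed -e ':b' -e 'N' -e '$!bb' -e 's/\n/ /g')

printf '[%s]\n[%s]\n' "$collapsed" "$preserved"
```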

Here is the method using ex editor (part of Vim):
Join all lines and print to the standard output:
$ ex +%j +%p -scq! file
Join all lines in-place (in the file):
$ ex +%j -scwq file
Note: This will concatenate all lines inside the file it-self!

Probably the best way to do it is using 'awk' tool which will generate output into one line
$ awk ' /pattern/ {print}' ORS=' ' /path/to/file
It will merge all lines into one with space delimiter

paste -sd'~' was giving me an error.
Here's what worked for me on a Mac using bash:
cat file | grep pattern | paste -d' ' -s -
From man paste:
-d list Use one or more of the provided characters to replace the newline characters instead of the default tab. The characters
in list are used circularly, i.e., when list is exhausted the first character from list is reused. This continues until
a line from the last input file (in default operation) or the last line in each file (using the -s option) is displayed,
at which time paste begins selecting characters from the beginning of list again.
The following special characters can also be used in list:
\n newline character
\t tab character
\\ backslash character
\0 Empty string (not a null character).
Any other character preceded by a backslash is equivalent to the character itself.
-s Concatenate all of the lines of each separate input file in command line order. The newline character of every line
except the last line in each input file is replaced with the tab character, unless otherwise specified by the -d option.
If ‘-’ is specified for one or more of the input files, the standard input is used; standard input is read one line at a time,
circularly, for each instance of ‘-’.

On red hat linux I just use echo :
echo $(cat /some/file/name)
This gives me all records of a file on just one line.
