How to split a line on Thorn character 'þ' in linux? - linux

How to split a line on Thorn character 'þ' in linux ?
I have tried the following
awk -F 'þ' '{print $2}'
awk -F '\xC3\xBE\x02' '{print $2}'
nothing worked.
EDIT:
The file is located in HDFS (Hadoop File System). The following command works on the command line but not in a shell script: when the script is executed it gives empty output, i.e. the thorn character is not recognized.
Command line:
~/etltestsar/DoubleClick$ hadoop fs -cat /raw/doubleclick/data/dt=2015-03-30/NetworkMatchtablesActivity_7657_03-30-2015_advertiser.log.gz|gunzip|tail -n +2|awk -F 'þ' '
Warning: $HADOOP_HOME is deprecated.
3848762
3963771
4112862
4140939
4199580
4199584
.....
Same command in shell script produces no output
hadoop#node28-19-88:~/etltestsar/DoubleClick$ sh testthorn.sh
Warning: $HADOOP_HOME is deprecated.

Get a different awk? GNU awk 4.1.1 in bash 4.1.17(9) on cygwin:
$ cat file
fooþbar
$ awk -F 'þ' '{print $2}' file
bar
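A minimal sketch, on the assumption (not confirmed in the thread) that the script file and the data disagree on how þ is encoded. It builds the delimiter from octal escapes so the encoding of the script itself no longer matters; the variable names are made up for illustration:

```shell
# þ is the two bytes C3 BE in UTF-8 and the single byte FE in ISO-8859-1.
# Building it with printf keeps the script file pure ASCII.
thorn_utf8=$(printf '\303\276')
thorn_latin1=$(printf '\376')

# LC_ALL=C makes awk compare raw bytes instead of decoding characters.
utf8_field=$(printf 'foo\303\276bar\n' | LC_ALL=C awk -F "$thorn_utf8" '{print $2}')
latin1_field=$(printf 'foo\376bar\n' | LC_ALL=C awk -F "$thorn_latin1" '{print $2}')

echo "$utf8_field" "$latin1_field"
```

Which of the two delimiters applies depends on the data; something like od -c on a few lines of the file shows the actual bytes.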

Related

How to replace one or more consecutive symbols with one symbol in shell

I have a file containing consecutive symbols (as pipe "|") like
ANKRD54,LIAR,allergy,|||
ANKRD54,LIAR,asthma,||20447076||
ANKRD54,LIAR,autism,||||
ANKRD54,LIAR,cancer,|||
ANKRD54,LIAR,chronic_obstructive_pulmonary_disease,|||
ANKRD54,LIAR,dental_caries,||||
Now using shell or a sed command in shell is it possible to replace multiple pipe with one pipe like
ANKRD54,LIAR,allergy,|
ANKRD54,LIAR,asthma,|20447076|
ANKRD54,LIAR,autism,|
ANKRD54,LIAR,cancer,|
ANKRD54,LIAR,chronic_obstructive_pulmonary_disease,|
ANKRD54,LIAR,dental_caries,|
I guess the easiest way is to use built-in commands: cat your_file | tr -s '|'
Pass your text to sed (e.g. via a pipe):
cat your_file | sed "s/|\+/|/g"
You can do that with a simple awk gsub as:-
awk -F"," -v OFS="," '{gsub(/[|]+/,"|",$4)}1' file
See it in action:-
$ cat file
ANKRD54,LIAR,allergy,|||
ANKRD54,LIAR,asthma,||20447076||
ANKRD54,LIAR,autism,||||
ANKRD54,LIAR,cancer,|||
ANKRD54,LIAR,chronic_obstructive_pulmonary_disease,|||
ANKRD54,LIAR,dental_caries,||||
$ awk -F"," -v OFS="," '{gsub(/[|]+/,"|",$4)}1' file
ANKRD54,LIAR,allergy,|
ANKRD54,LIAR,asthma,|20447076|
ANKRD54,LIAR,autism,|
ANKRD54,LIAR,cancer,|
ANKRD54,LIAR,chronic_obstructive_pulmonary_disease,|
ANKRD54,LIAR,dental_caries,|
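The \+ in the sed answer is a GNU extension. As a small sketch of a variant that should also work with a plain POSIX sed, here are tr -s and a basic-regex sed run on one sample line (the variable names are only for illustration):

```shell
sample='ANKRD54,LIAR,asthma,||20447076||'

# tr -s squeezes every run of the listed character into a single one.
squeezed_tr=$(printf '%s\n' "$sample" | tr -s '|')

# '||*' means one pipe followed by zero or more pipes (POSIX basic regex).
squeezed_sed=$(printf '%s\n' "$sample" | sed 's/||*/|/g')

printf '%s\n' "$squeezed_tr" "$squeezed_sed"
```

Both commands act on the whole line, while the awk version above only touches the fourth comma-separated field.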

Linux cut string

In Linux (CentOS) I have a file that contains additional information that I want to remove. I want to generate a new file with all characters up to the first |.
The file has the following information:
ALFA12345|7890
Beta0-XPTO-2|30452|90 385|29
ZETA2334423 435; 2|2|90dd5|dddd29|dqe3
The output expected will be:
ALFA12345
Beta0-XPTO-2
ZETA2334423 435; 2
That is, everything from the first | onward (inclusive) is removed.
Any suggestion for a script that reads File1 and generates File2 with this specific requirement?
Try
cut -d'|' -f1 oldfile > newfile
And, to round out the "big 3", here's the awk version:
awk -F\| '{print $1}' in.dat
You can use a simple sed script.
sed 's/^\([^|]*\).*/\1/g' in.dat
ALFA12345
Beta0-XPTO-2
ZETA2334423 435; 2
Redirect to a file to capture the output.
sed 's/^\([^|]*\).*/\1/g' in.dat > out.dat
And with grep:
$ grep -o '^[^|]*' file1
ALFA12345
Beta0-XPTO-2
ZETA2334423 435; 2
$ grep -o '^[^|]*' file1 > file2
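For completeness, the same thing can be sketched in pure shell with parameter expansion and no external tool. The file names file1 and file2 follow the question; the temporary directory is only there to keep the example self-contained:

```shell
workdir=$(mktemp -d)

# Sample input taken from the question.
printf '%s\n' \
  'ALFA12345|7890' \
  'Beta0-XPTO-2|30452|90 385|29' \
  'ZETA2334423 435; 2|2|90dd5|dddd29|dqe3' > "$workdir/file1"

# ${line%%|*} strips the longest suffix that starts with a pipe,
# i.e. everything from the first | to the end of the line.
while IFS= read -r line; do
  printf '%s\n' "${line%%|*}"
done < "$workdir/file1" > "$workdir/file2"

cat "$workdir/file2"
```

For large files cut or awk will be much faster than a shell loop; this is mainly handy when the line is already in a variable.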

Grep - returning both the line number and the name of the file

I have a number of log files in a directory. I am trying to write a script that searches all the log files for a string and echoes the name of each file and the line number where the string is found.
I figure I will probably have to use 2 grep's - piping the output of one into the other since the -l option only returns the name of the file and nothing about the line numbers. Any insight in how I can successfully achieve this would be much appreciated.
Many thanks,
Alex
$ grep -Hn root /etc/passwd
/etc/passwd:1:root:x:0:0:root:/root:/bin/bash
combining -H and -n does what you expect.
If you want to echo the required information without the matching text:
$ grep -Hn root /etc/passwd | cut -d: -f1,2
/etc/passwd:1
or with awk :
$ awk -F: '/root/{print "file=" ARGV[1] "\nline=" NR}' /etc/passwd
file=/etc/passwd
line=1
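if you want to create shell variables (evaluate the output in the current shell; piping it to bash would set them only in a child process):
$ eval "$(awk -F: '/root/{print "file=" ARGV[1] "\nline=" NR}' /etc/passwd)"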
$ echo $line
1
$ echo $file
/etc/passwd
Use -H. If you are using a grep that does not have -H, specify two filenames. For example:
grep -n pattern file /dev/null
My version of grep kept returning the text of the matching line, which I wasn't sure you were after. You can also pipe the output to awk to print ONLY the file name and line number (-r is needed to search a directory such as .):
grep -rHn "text" . | awk -F: '{print $1 ":" $2}'
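Putting this together for the original question (several log files in one directory), a rough sketch could look like this. The log contents and the search string are invented for the example:

```shell
logdir=$(mktemp -d)
trap 'rm -rf "$logdir"' EXIT

# Two made-up log files to search.
printf 'start\nerror: disk full\n' > "$logdir/a.log"
printf 'ok\nok\nerror: timeout\n' > "$logdir/b.log"

# -H forces the file name, -n adds the line number;
# cut keeps only "file:line" and drops the matched text.
hits=$(cd "$logdir" && grep -Hn 'error' ./*.log | cut -d: -f1,2)

printf '%s\n' "$hits"
```

Note that splitting on : gives wrong results if a file name itself contains a colon.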

Is there any equivalent command grep -nP "\t" some_file , using sed or awk

I am trying to find the occurrences of tabs in a file some_file and print those lines with a leading line number.
grep -nP "\t" some_file works well for me, but I want an equivalent sed or awk command.
To emulate: grep -nP "\t" file.txt
Here's one way using GNU awk:
awk '/\t/ { print NR ":" $0 }' file.txt
Here's one way using GNU sed:
< file.txt sed -n '/\t/{ =;p }' | sed '{ N;s/\n/:/ }'
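Well, you can always do it in sed. Since cat -n itself puts a tab between the number and the line, the pattern has to ask for a second tab:
cat -n test.txt | sed -n "/\t.*\t/p"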
Unfortunately, sed can only print a line number to stdout followed by a newline, so in any case more than one command is necessary. A lengthier (unnecessarily so) version of the above, using only sed, would be:
sed = test.txt | sed -n "N;s/\n/ /;/\t/p"
but I like the one with cat more. CATS ARE NICE.
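If the \t escape inside a regex is a concern with a particular awk, one alternative sketch is to test for the tab with index() on a string instead. The sample input is invented:

```shell
# index($0, "\t") returns the position of the first tab, or 0 if there is none,
# so it works as a pattern that selects only lines containing a tab.
tab_lines=$(printf 'no tab here\nhas\ta tab\nplain\n\tleading tab\n' |
  awk 'index($0, "\t") { print NR ":" $0 }')

printf '%s\n' "$tab_lines"
```

This prints lines 2 and 4 of the sample, each prefixed with its line number and a colon, like grep -n.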

awk script header: #!/bin/bash or #!/bin/awk -f?

In an awk file, e.g example.awk, should the header be #!/bin/bash or #!/bin/awk -f?
The reason for my question is that if I try this command in the console I receive the correct file.txt with "line of text":
awk 'BEGIN {print "line of text"}' >> file.txt
but if I try to execute the following file with ./example.awk:
#! /bin/awk -f
awk 'BEGIN {print "line of text"}' >> file.txt
it returns an error:
$ ./awk-usage.awk
awk: ./awk-usage.awk:3: awk 'BEGIN {print "line of text"}' >> file.txt
awk: ./awk-usage.awk:3: ^ invalid char ''' in expression
If I change the header to #!/bin/bash or #!/bin/sh it works.
What is my error? What is the reason of that?
Since you explicitly run the awk command, you should use #!/bin/bash. You can use #!/bin/awk if you remove the awk command and include only the awk program (e.g. BEGIN {print "line of text"}), but then you need to append to file using awk syntax (print ... >> file).
awk -f takes a file containing the awk script, so that is completely wrong here.
Your script is a shell script that happens to contain an awk command.
#! /bin/sh tells the system to execute the file with /bin/sh, and the file is a shell script. If you replace that with #! /bin/awk -f, the file is run as /bin/awk -f ./example.awk, so awk tries to parse the following line as awk code and fails at the first single quote:
awk 'BEGIN {print "line of text"}' >> file.txt
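To illustrate the other option from the first answer (awk shebang, file contains only the awk program), here is a small sketch. It looks up the awk path with command -v instead of assuming /bin/awk, and the temporary directory is just for the example:

```shell
workdir=$(mktemp -d)

# Write a pure awk script: a shebang pointing at awk, then awk code only.
{
  printf '#!%s -f\n' "$(command -v awk)"
  printf '%s\n' 'BEGIN { print "line of text" >> "file.txt" }'
} > "$workdir/example.awk"
chmod +x "$workdir/example.awk"

# Run it twice; the append happens inside awk, not via shell redirection.
(cd "$workdir" && ./example.awk && ./example.awk)

cat "$workdir/file.txt"
```

Here the append to file.txt is done by awk's own print ... >> "file.txt", since there is no shell around to interpret a >> redirection.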
