How to remove words from a file in UNIX? - linux

first file of information page
name/joe/salary1 50 10 2
name/don/miles2
20 4 3
name/sam/lb3 0 200 50
Can someone please tell me how I can remove all the words in the above file, so that my output looks as follows:
50 10 2
20 4 3
0 200 50

Use awk instead. The following command goes through each field and checks whether it is a number; if it is, the field is printed. There is no need for a complicated regex.
$ awk '{for(i=1;i<=NF;i++) if($i+0==$i) {printf $i" "} print ""}' file
50 10 2
20 4 3
0 200 50
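Note that the one-liner above runs print "" for every input line, so a line with no numeric fields still produces an empty output line. A minimal sketch that skips such lines, assuming a "number" is a field made up only of digits (numeric_fields is a hypothetical helper name, not part of the original answer):

```shell
# Hypothetical helper: print only the all-digit fields of each line,
# and skip lines that contain no such field.
numeric_fields() {
  awk '{
    out = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^[0-9]+$/) {            # field consists of digits only
        sep = (out == "" ? "" : " ")    # no separator before the first field
        out = out sep $i
      }
    }
    if (out != "") print out            # suppress lines without numbers
  }' "$@"
}

# Sample input taken from the question.
printf '%s\n' 'first file of information page' \
  'name/joe/salary1 50 10 2' 'name/don/miles2' '20 4 3' \
  'name/sam/lb3 0 200 50' | numeric_fields
```

Fields such as salary1 are dropped as a whole because they contain letters, which matches the requested output.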

sed -e "s/[a-zA-Z/]/ /g" file
will do it, though I like codaddict's way more if you want to preserve the numbers and whitespace. This command replaces every letter and every '/' with a space, so a digit attached to a word (such as the 1 in salary1) is left behind.
The command only prints what the file would look like. If you want to modify the file in place, pass the -i switch.

Looks like you want to preserve only the digits and the space. If yes, you can do:
sed 's/[^0-9 ]//g' inputFile
EDIT: The requirements changed: if a digit is attached to a letter, it should be treated as part of the word.
This Perl script does it:
perl -ne 's/(?:\d*[a-z\/]+\d*)*//g;print' input

If your file has this structure, I suggest first filtering out the first line, then removing all characters from the beginning of each line up to the first space:
sed -ni '2,$s/^[^ ]*//p' file

Remove everything on each line until first space character (also removes leading spaces):
sed 's/\S*\s*//' file

Related

Bash script for replacing texts

I have a file.txt file as in the following example:
Part 1
Some texts #abc d#ae}gd1 l2#4.
Part 2
Some texts again #efd de#gm}dg 12#a.
I want "#" to be replaced by "hi" in the whole file. In part 2, however, I also want to wrap the part from # up to the first character that is not in 0-9A-Za-z inside "check{ }".
So this is the output:
Part 1
Some texts hiabc dhiae}gd1 l2hi4.
Part 2
Some texts again check{hiefd} decheck{higm}}dg 12check{hia}.
I only know how to replace # with "hi" in the whole file:
awk '{gsub(/#/,"hi")} 1' file.txt > output.txt
It's really difficult for me to find a way to handle the requirement in part 2.
Thanks for any help.
I wanted to find a solution for this.
This sed command should do the trick:
sed '/Part 1/,/Part 2/s/#/hi/g
/Part 2/,$s/#\([0-9A-Za-z]*\)/check{hi\1}/g
' file
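The same two-range approach can be written with one -e expression per range, which makes it easier to see what each address does. This is only a sketch: replace_hash is a hypothetical wrapper name, and the sample input is the one from the question.

```shell
# Hypothetical wrapper around the two address ranges used above.
replace_hash() {
  # Range 1: from a line matching "Part 1" through the line matching
  #          "Part 2", replace every "#" with "hi".
  # Range 2: from the line matching "Part 2" to the last line ($), wrap
  #          "#" plus the following run of alphanumerics in check{hi...}.
  sed -e '/Part 1/,/Part 2/s/#/hi/g' \
      -e '/Part 2/,$s/#\([0-9A-Za-z]*\)/check{hi\1}/g' "$@"
}

printf '%s\n' 'Part 1' 'Some texts #abc d#ae}gd1 l2#4.' \
  'Part 2' 'Some texts again #efd de#gm}dg 12#a.' | replace_hash
```

The first range also covers the "Part 2" heading line itself, which is harmless here because that line contains no "#".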

Sed/AWK search/replace string after every matched pattern

I have data in a variable, in a JSON-like format:
v={k1:v1,k2:v2,k3:v3,k4:v4,k5:v5,k6:v6,k7:v7,k8:v8};
where the keys and values could be anything.
I need to split this into multiple lines after every 10 characters, which I did with:
echo "${v}" | sed -r 's/.{10}/&\n/g'
This splits the string as sed was told to. But now I need to make sure the split happens only after the first comma found after every 10 characters, so that the output has meaningful lines.
The output should be:
k1:v1,k2:v2,
.....
The whole idea is to not break lines in the middle of an entry.
Thanks
You may use
sed -r 's/.{10}[^,]*,/&\n/g'
The .{10}[^,]*, pattern matches
.{10} - any 10 chars
[^,]* - 0 or more chars other than ,
, - a comma.
The &\n replacement pattern replaces with the whole match (&) and appends a newline to it.
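A runnable sketch of this pattern, assuming a sed that accepts -E for extended regular expressions. The replacement uses a backslash followed by a literal newline rather than \n so that it does not depend on GNU sed; split_after_comma is a hypothetical helper name.

```shell
# Hypothetical helper: break the line after the first comma that follows
# each run of at least 10 characters.
split_after_comma() {
  sed -E 's/.{10}[^,]*,/&\
/g' "$@"
}

v='{k1:v1,k2:v2,k3:v3,k4:v4,k5:v5,k6:v6,k7:v7,k8:v8}'
printf '%s\n' "$v" | split_after_comma
```

The last chunk (k7:v7,k8:v8}) is left alone because no comma follows its first 10 characters.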
If you actually just want to add a newline after every second comma, this works in GNU sed and some other seds:
$ echo "$v" | sed 's/,[^,]*,/&\n/g'
k1:v1,k2:v2,
k3:v3,k4:v4,
k5:v5,k6:v6,
k7:v7,k8:v8
or this for portability across all seds in all shells:
sed 's/,[^,]*,/&\
/g'
or this using any awk:
awk '{gsub(/,[^,]*,/,"&\n")}1'
Sorry, and thanks for the feedback. Here is the expected output:
{k1:v1,k2:v2,
k3:v3,k4:v4,
k5:v5,k6:v6,
k7:v7,k8:v8}
The rule should be: skip the first 10 characters, then find the next comma and add a newline after it.
Wiktor's code snippet actually does this. However, I am happy to see other ways to achieve it. Thanks all.

Replace a line with a number if part of it matches using sed

I know this is a very simple question that has been discussed many times, but I can't understand what I am doing wrong in my command.
I would like to replace the lines which start with "It" with 99999. Each row starts with several blank spaces.
infile.txt
3
2
3
4
It is not a number = /home/kayan/data
3
5
It is not a number = /home/kayan/data
4
5
I used
sed -i 's/^I/99999/g' infile.txt
But it is not working.
Because of the leading spaces, add them to the search pattern:
sed -i 's/^[[:blank:]]*I.*/99999/' infile.txt
Using the change command:
sed -i '/^[[:blank:]]*I/ c\
99999' infile.txt
Keeping the leading spaces:
sed -i 's/^\([[:blank:]]*\)I.*/\199999/' infile.txt
There is no need for the g flag; only one change per line is possible.
Give this a try:
sed -i 's/^\s*It.*/99999/' infile.txt
What you are replacing there is just the ^I part, i.e. the first letter. Use ^I.* instead so that the rest of the line is matched and replaced as well.
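To see the difference without touching a file, the commands can be run on a small sample written to standard output instead of using -i. This is a sketch only: the three leading spaces and the sample function are assumptions made for illustration.

```shell
# Hypothetical sample input; three leading spaces per row are assumed.
sample() {
  printf '   3\n   It is not a number = /home/kayan/data\n   5\n'
}

# Original attempt: ^I cannot match because every line begins with blanks,
# so the input comes out unchanged.
sample | sed 's/^I/99999/g'

# Allow optional leading blanks and consume the rest of the line.
sample | sed 's/^[[:blank:]]*It.*/99999/'
```

Once the output looks right, the same expression can be used with -i on the real file.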

Print first N words of a file

Is there any way to print the first N words of a file? I've tried cut but it reads a document line-by-line. The only solution I came up with is:
sed ':a;N;$!ba;s/\n/δ/g' file | cut -d " " -f -20 | sed 's/δ/\n/g'
Essentially, this replaces newlines with a character that does not exist in the file, applies cut with a space as the delimiter, and then restores the newlines.
Is there any better solution?
You could use awk to print the first n words:
$ awk 'NR<=8{print;next}{exit}' RS='[[:blank:]]+|\n' file
This prints the first 8 words, each on a separate line. Do you need to keep the original format of the file?
Edit:
The following will preserve the original format of the file:
awk -v n=8 'n==c{exit}n-c>=NF{print;c+=NF;next}{for(i=1;i<=n-c;i++)printf "%s ",$i;print x;exit}' file
Demo:
$ cat file
one two
thre four five six
seven 8 9
10
$ awk -v n=8 'n==c{exit}n-c>=NF{print;c+=NF;next}{for(i=1;i<=n-c;i++)printf "%s ",$i;print x;exit}' file
one two
thre four five six
seven 8
A small caveat: if the last line printed doesn't use a single space as a separator, that line will lose its formatting.
$ cat file
one two
thre four five six
seven 8 9
10
# the 8th word fell on 3rd line: this line will be formatted with single spaces
$ awk -v n=8 'n==c{exit}n-c>=NF{print;c+=NF;next}{for(i=1;i<=n-c;i++)printf "%s ",$i;print x;exit}' file
one two
thre four five six
seven 8
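The format-preserving one-liner is dense, so here is an unpacked sketch of the same logic with comments. The function name first_n_words and its argument order are assumptions for illustration. Unlike the original, this version does not print a trailing space after the last word.

```shell
# Hypothetical helper: first_n_words COUNT [FILE...]
first_n_words() {
  count=$1
  shift
  awk -v n="$count" '
    c >= n { exit }                        # n words already printed
    n - c >= NF { print; c += NF; next }   # whole line fits: print it as is
    {                                      # only part of the line fits
      for (i = 1; i <= n - c; i++)
        printf "%s%s", $i, (i < n - c ? " " : "\n")
      exit
    }
  ' "$@"
}

printf 'one two\nthre four five six\nseven 8 9\n10\n' | first_n_words 8
```

Lines that fit entirely are printed untouched, which is why only the final, partial line can lose its original spacing.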
Assuming words are runs of non-white-space characters separated by white space, you can use tr to convert the document to one-word-per-line format and then take the first N lines:
tr -s ' \011' '\012' < file | head -n "$N"
where N=20 or whatever value you want for the number of words. Note that tr is a pure filter; it only reads from standard input and only writes to standard output. The -s option 'squeezes' out duplicate replacements, so you get one newline per sequence of blanks or tabs in the input. (If there is leading white space in the file, you get an initial blank line. There are various ways to deal with that, such as grabbing the first N+1 lines of output, or filtering out all blank lines.)
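A minimal sketch of the "filter out all blank lines" variant, assuming the input arrives on standard input; first_words is a hypothetical helper name and its single argument is the word count.

```shell
# Hypothetical helper: first_words COUNT  (reads standard input)
first_words() {
  # 1. turn blanks and tabs into newlines, squeezing repeats
  # 2. delete any empty line caused by leading white space
  # 3. keep the first COUNT words
  tr -s ' \011' '\012' | sed '/^$/d' | head -n "$1"
}

printf '  one two\nthree  four five\n' | first_words 3
```

Filtering before head means the count refers to actual words, so leading white space no longer costs one line of output.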
Using GNU awk so we can set the RS to a regexp and access the matching string using RT:
$ cat file
the quick
brown fox jumped over
the
lazy
dog's back
$ gawk -v c=3 -v RS='[[:space:]]+' 'NR<=c{ORS=(NR<c?RT:"\n");print}' file
the quick
brown
$ gawk -v c=6 -v RS='[[:space:]]+' 'NR<=c{ORS=(NR<c?RT:"\n");print}' file
the quick
brown fox jumped over
$ gawk -v c=9 -v RS='[[:space:]]+' 'NR<=c{ORS=(NR<c?RT:"\n");print}' file
the quick
brown fox jumped over
the
lazy
dog's
Why not try turning your words into lines, and then just using head -n 20 instead?
For example:
for i in `cat somefile`; do echo $i; done | head -n 20
It's not elegant, but it does have considerably less line-noise regex.
One way with perl:
perl -lane 'push @a,@F;END{print "@a[0..9]"}' file
Note: indexing starts at zero so the example will print the first ten words. The words will be printed on a single line separated by a single space.

Awk or shell script for executing following program

mansa, amit, janani ,[rakesh]
aruna,mahesh,,prathiksha
This is my input.
I need a shell script or an awk command that gives me output in the following manner:
mansa
amit
janani
rakesh
aruna
mahesh
prathiksha
The script should remove all commas and brackets.
I tried this
awk -F "\[\][,]+" '{for(i=1;i<=NF;i++){print $i}}'
but it's printing one extra line after each record.
Easier with grep:
$ grep -o '[a-z]\+' file
mansa
amit
janani
rakesh
aruna
mahesh
prathiksha
Another option might be tr:
tr -cs '[:alpha:]' '[\n*]' < file
Although it would create empty lines if there is leading whitespace, which could then be filtered out:
tr -cs '[:alpha:]' '[\n*]' < file | awk NF
Assuming you only want to remove the square brackets and split the items on the comma delimiters, you could use the following:
perl -pe 's/,+/,/g ; s/[\[\]]//g ; s/\s*,\s*/\n/g' foo.txt
The reason I recommend this approach is that your named values may contain numbers or other non-alpha characters that you want to preserve.
The perl expression above contains 3 regular expressions. The first reduces multiple commas to one (to avoid empty values between commas). The second removes the square brackets. The third splits the values by replacing each comma (with any whitespace on either side) with a newline.
Output would be as follows:
mansa
amit
janani
rakesh
aruna
mahesh
prathiksha
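The same three steps can also be sketched as a plain pipeline of standard tools, one stage per step. This is an illustrative alternative, not the answer's own method, and split_names is a hypothetical helper name.

```shell
# Hypothetical helper: reads comma-separated names on standard input.
# Stage 1: delete the square brackets.
# Stage 2: put every comma-separated item on its own line.
# Stage 3: trim blanks around each item.
# Stage 4: drop empty items (from consecutive commas).
split_names() {
  tr -d '[]' \
    | tr ',' '\n' \
    | sed -e 's/^[[:blank:]]*//' -e 's/[[:blank:]]*$//' \
    | sed '/^$/d'
}

printf '%s\n' 'mansa, amit, janani ,[rakesh]' 'aruna,mahesh,,prathiksha' | split_names
```

As with the perl version, digits and other non-alpha characters inside a name are preserved.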
