How do I get a string after the first occurrence of a number?
For example, I have a file with multiple lines:
34 abcdefg
10 abcd 123
999 abc defg
I want to get the following output:
abcdefg
abcd 123
abc defg
Thank you.
You could use Awk for this: loop through the fields of each line up to NF (the last field), and once a field matches a number, print the field after it. The break statement exits the for loop after the first match. Note that this prints only the single word following the number, not the rest of the line.
awk '{ for(i=1;i<=NF;i++) if ($i ~ /[[:digit:]]+/) { print $(i+1); break } }' file
It is not clear what you exactly want, but you can try to express it in sed.
Remove everything until the first digit, the next digits and any spaces.
sed 's/[^0-9]*[0-9]\+ *//'
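As a quick way to check that sed command against the sample lines from the question, here is a small sketch. The function name strip_leading_number is made up for illustration, and it uses [0-9][0-9]* instead of [0-9]\+ on the assumption that the command may also need to run on a non-GNU sed, where \+ is not guaranteed to work.

```shell
# strip_leading_number is a hypothetical helper name, not part of any tool.
# It reads lines on stdin and removes everything up to and including the
# first number, plus any spaces that directly follow it.
strip_leading_number() {
    sed 's/[^0-9]*[0-9][0-9]* *//'
}

# Sample lines taken from the question.
printf '%s\n' '34 abcdefg' '10 abcd 123' '999 abc defg' | strip_leading_number
```

Since there is no g flag, only the first number on each line is removed, so "10 abcd 123" keeps its trailing 123.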
Imagine the following input file:
001 ham
03spam
3 spam with 5 eggs
A quick solution with awk would be :
awk '{sub(/[^0-9]*[0-9]+/,"",$0); print $1}' <file>
This replaces the first match of "any run of non-digit characters followed by a number" with an empty string (""). Because $0 is modified, the fields are re-split, so you can print the first field or the remainder of the line. The command gives exactly the following output:
ham
spam
spam
If you are interested in the remainder of the line
awk '{sub(/[^0-9]*[0-9]+ */,"",$0); print $0}' <file>
This will have as an output :
ham
spam
spam with 5 eggs
Be aware that the extra " *" in the regular expression is needed to remove the spaces that follow the number. Without it, the leading space is kept:
awk '{sub(/[^0-9]*[0-9]+/,"",$0); print $0}' <file>
 ham
spam
 spam with 5 eggs
You can remove the first run of digits and spaces using sed:
sed -E 's/[0-9 ]+//' file
grep can do the job:
$ grep -o -P '(?<=[0-9] ).*' inputFIle
abcdefg
abcd 123
abc defg
For completeness, here is a solution with perl:
$ perl -lne 'print $1 if /[0-9]+\s*(.*)/' inputFIle
abcdefg
abcd 123
abc defg
Just like this.
Before:
1
19:22
abcde

2
19:23

3
19:24
abbff

4
19:25
abbc
After:
1
19:22
abcde

3
19:24
abbff

4
19:25
abbc
I want to remove the sections that contain no letters, like section 2.
I think I should use perl or sed, but I don't know how.
I tried this, but it didn't work:
sed 's/[0-9]\n[0-9]\n%s\n//'
sed is for doing s/old/new/ on individual lines, that is all. For anything else you should be using awk:
$ awk -v RS= -v ORS='\n\n' '/[[:alpha:]]/' file
1
19:22
abcde

3
19:24
abbff

4
19:25
abbc
The above is simply this:
RS= tells awk the input records are separated by blank lines.
ORS='\n\n' tells awk the output records must also be separated by blank lines.
/[[:alpha:]]/ searches for and prints records that contain alphabetic characters.
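To try the paragraph-mode command without creating a file, something like the following sketch should do. The wrapper name keep_alpha_paragraphs and the shortened sample input are assumptions made for illustration; the awk program itself is the one from this answer.

```shell
# keep_alpha_paragraphs is a hypothetical wrapper around the awk command above.
# RS= switches awk to paragraph mode (records separated by blank lines),
# ORS='\n\n' puts a blank line after every record that is printed.
keep_alpha_paragraphs() {
    awk -v RS= -v ORS='\n\n' '/[[:alpha:]]/'
}

# Shortened sample input: the second block has no letters and is dropped.
printf '1\n19:22\nabcde\n\n2\n19:23\n\n3\n19:24\nabbff\n' | keep_alpha_paragraphs
```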
Simple enough in Perl. The secret is to put Perl in "paragraph mode" by setting the input record separator ($/) to an empty string. Then we only print records if they contain a letter.
#!/usr/bin/perl
use strict;
use warnings;
# Paragraph mode
local $/ = '';
# Read from STDIN a record (i.e. paragraph) at a time
while (<>) {
    # Only print records that include a letter
    print if /[a-z]/i;
}
This is written as a Unix filter, i.e. it reads from STDIN and writes to STDOUT. So if it's in a file called filter, you can call it like this:
$ filter < your_input_file > your_output_file
Alternatively this is a simple command line script in Perl (-00 is the command line option to put Perl into paragraph mode):
$ perl -00 -ne 'print if /[a-z]/i' < your_input_file > your_output_file
If there's exactly one blank line after each paragraph you can use a long awk oneliner (three patterns, so probably not a oneliner actually):
$ echo '1
19:22
abcde

2
19:23

3
19:24
abbff

4
19:25
abbc
' | awk '/[^[:space:]]/ { accum = accum $0 "\n" } /^[[:space:]]*$/ { if(on) print accum $0; on = 0; accum = "" } /[[:alpha:]]/ { on = 1 }'
1
19:22
abcde

3
19:24
abbff

4
19:25
abbc

The idea is to accumulate non-blank lines, setting a flag once an alphabetic character is found. On a blank input line, the whole accumulated paragraph is flushed if the flag is set, then accum is reset to the empty string and the flag to zero.
(Note that if the last line of input is not necessarily empty you might need to add an END block that checks if currently there's a paragraph unflushed and flush it as needed.)
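A minimal sketch of what that END block could look like, assuming the same accum and on variables as in the command above. The function name flush_alpha_paragraphs is made up for illustration.

```shell
# flush_alpha_paragraphs is a hypothetical wrapper name.
# Same three patterns as the one-liner, plus an END block that flushes a
# final paragraph when the input does not end with a blank line.
flush_alpha_paragraphs() {
    awk '
        /[^[:space:]]/   { accum = accum $0 "\n" }
        /^[[:space:]]*$/ { if (on) print accum $0; on = 0; accum = "" }
        /[[:alpha:]]/    { on = 1 }
        END              { if (on) printf "%s", accum }
    '
}

# Input without a trailing blank line: the last paragraph is still printed.
printf '1\n19:22\nabcde\n\n2\n19:23\n\n4\n19:25\nabbc\n' | flush_alpha_paragraphs
```

The END block uses printf rather than print so that no extra blank line is appended after the last paragraph; whether you want that trailing blank line is a matter of taste.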
This might work for you (GNU sed):
sed ':a;$!{N;/^$/M!ba};/[[:alpha:]]/!d' file
Gather up lines delimited by an empty line or end-of-file and delete the latest collection if it does not contain an alpha character.
This presupposes that the file format is fixed as in the example. To be more accurate use:
sed -r ':a;$!{N;/^$/M!ba};/^[1-9][0-9]*\n[0-9]{2}:[0-9]{2}\n[[:alpha:]]+\n?$/!d' file
Similar to the solution of Ed Morton but with the following assumptions:
The text blocks consist of 2 or 3 lines.
If there is a third line, it contains characters from any alphabet.
In essence, under these conditions we only need to check for a third field:
awk 'BEGIN{RS="";ORS="\n\n";FS="\n"}(NF>=3)' file
or similar without BEGIN:
awk -v RS= -v ORS='\n\n' -F '\n' '(NF>=3)' file
I have a requirement to print two different words on the alternating blank lines in a file.
For example,
ABCD
EFGH
IGKL
MNOP
In the above scenario, I want to print ab and /ab alternately, like below:
ab
ABCD
/ab
ab
EFGH
/ab
ab
IGKL
/ab
ab
MNOP
/ab
I want these one per line (not in a horizontal format). I know the command sed 's|^[[:blank:]]*$|</ab>|' is close to what I need, but I don't know how to apply it. Please, someone, help me.
With GNU sed:
sed -e 'i\ab' -e 'a\/ab' infile
How does this work?
On each line:
first insert ab before it with 'i\ab'
then append /ab after it with 'a\/ab'
You must use two separate commands with '-e' to do that.
You can't use sed 'i\ab;a\/ab' because the first command, i (insert), doesn't know where the text to insert ends and takes the rest of the line.
So the inserted text would be ab;a/ab before each line.
Another way, which works with any sed, is:
sed -e 'i\
ab
a\
/ab' infile
If you are ok with awk then following may help you here.
awk -v start="ab" -v end="/ab" '{print start ORS $0 ORS end}' Input_file
In case you need to save output into Input_file itself then append > temp_file && mv temp_file Input_file in above code too.
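The temp-file step could be wrapped up roughly like this. It is only a sketch: the function name wrap_file_in_place is hypothetical, and it assumes mktemp is available, which is common but not guaranteed on every system.

```shell
# wrap_file_in_place is a hypothetical helper name.
# It writes the awk output to a temporary file and replaces the original
# only if awk succeeded. "ab" and "/ab" are the markers from the question.
wrap_file_in_place() {
    target=$1
    tmp=$(mktemp) || return 1
    awk -v start="ab" -v end="/ab" '{ print start ORS $0 ORS end }' "$target" > "$tmp" &&
        mv "$tmp" "$target"
}

# Usage (Input_file is the file name used in the answer above):
# wrap_file_in_place Input_file
```

Note that after the mv the file carries the permissions of the temporary file, so you may want to restore them if that matters.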
Is there a way to delete all the characters up to and including the first occurrence of a certain character?
123:abc
12:cba
1234:cccc
and the output would be:
abc
cba
cccc
Using sed:
sed 's/^[^:]*://' file
abc
cba
cccc
Or using awk:
awk -F: '{print $2}' file
abc
cba
cccc
You could use cut:
$ cut -d":" -f2- myfile.txt
use awk
echo "123:abc" | awk -F ":" '{print $2}'
-F means to use : as the separator to split the string.
{print $2} means to print the second substring.
If the data is in a variable, you can use parameter expansion:
$ var=123:abc
$ echo ${var#*:}
abc
$
The # removes the shortest match of the pattern *: (anything followed by a colon) from the front of the string. That is what you asked for in your requirement, "delete all the characters up to the first occurrence of certain character + that character", which is not the same as getting the second field with the colon as delimiter.
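The difference only shows up when a value contains more than one colon. Here is a small sketch with a made-up value (1234:cc:cc is not from the question) comparing the expansion with the field-based commands from the other answers.

```shell
# Sample value with more than one colon (made up for illustration).
var='1234:cc:cc'

echo "${var#*:}"     # "#" strips the shortest leading match of "*:"
echo "${var##*:}"    # "##" strips the longest leading match of "*:"

# The same value through the field-based tools from the other answers.
printf '%s\n' "$var" | awk -F: '{ print $2 }'   # second field only
printf '%s\n' "$var" | cut -d: -f2-             # second field onward
```

This prints cc:cc, cc, cc and cc:cc in that order, so only ${var#*:} and cut -f2- keep everything after the first colon.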
I have a file "fruit.xml" that looks like the below:
FRUIT="Apples"
FRUIT="Bananas"
FRUIT="Peaches"
I want to use a single sed command to find all occurrences of FRUIT=" and extract the value between the quotes from every match.
So the result should look like:
Apples
Bananas
Peaches
This is the command I am using:
sed 's/.*FRUIT="//' fruit.xml
The problem is that it leaves the last " at the end of the value I need. eg: Apples".
Just catch the group and print it back: catch everything from " until another " is found with the () (or \(...\) if you don't use the -r option). Then, print it back with \1:
$ sed -r 's/.*FRUIT="([^"]*)"/\1/' file
Apples
Bananas
Peaches
You can also use field separators with awk: tell awk that your field separators are either FRUIT=" or ". This way, the desired content becomes the 2nd field.
$ awk -F'FRUIT="|"' '{print $2}' file
Apples
Bananas
Peaches
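To see how the field-separator approach behaves when the attribute is preceded by other text, a sketch like this may help. The function name extract_fruit and the indented second sample line are assumptions for illustration only.

```shell
# extract_fruit is a hypothetical wrapper name.
# The field separator is a regex: either the literal FRUIT=" or a lone ",
# so whatever sits between the two separators ends up in field 2.
extract_fruit() {
    awk -F'FRUIT="|"' '{ print $2 }'
}

# The second line has leading spaces, which land in field 1 and are ignored.
printf '%s\n' 'FRUIT="Apples"' '  FRUIT="Bananas"' | extract_fruit
```

This relies on FRUIT=" being the first quoted attribute on the line; if another quoted value comes before it, field 2 would be something else.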
To make your command work, just strip the " at the end of the line:
$ sed -e 's/.*FRUIT="//' -e 's/"$//' file
                         ^^ ^^^^^^^^
                         |  replace " at the end of the line with nothing
                         -e allows you to use multiple commands
This would be enough if you want to keep the leading spaces,
sed 's/\bFRUIT="\([^"]*\)"/\1/' fruit.xml
OR
sed 's/\bFRUIT="\|"//g' fruit.xml
Try this; it replaces each line with the fruit found between the quotes:
sed 's/.*FRUIT="\(.*\)"/\1/' test.xml
Use a simple cut command
cut -d '"' -f2 fruit.xml
Output:
Apples
Bananas
Peaches
Assuming one occurrence per line and this exact format:
sed 's/.*="//;s/".*$//' fruit.xml
Right now I have a bash shell script that takes the input of a text file with the syntax for example, "Smith, Bob". The end goal is to take the first letter of the first name and append the first 7 characters of the last name. I am currently in a pickle.
echo "Extracting first letter"
cut -d "," -f2 $1 > first.txt
cut -b2 first.txt > second.txt
echo "First letter extracted"
echo "Extracting 7 characters"
cut -d "," -f1 $1 > letters.txt
cat second.txt | tr '[:upper:]' '[:lower:]' > lowernames.txt
I have two files, one with the first letter, the other with the first 7 characters, but can't combine the two. Any suggestions?
Here are three solutions, one using sed, one using awk, and one using python:
Using sed
Here is a sed solution. Using the same test file as sehe:
$ cat file
Smith, Bob
Doe, John
Snow, John
Pattitucci, John
$ sed -E 's/([^,]{1,7})[^,]*,\s*(\S).*/\2\1/' file
BSmith
JDoe
JSnow
JPattitu
How it works
The idea is to capture the first 7 letters of the last name to group 1 and the first letter of the first name to group 2. The regex to do that consists of the following parts:
([^,]{1,7})
This captures up to seven characters of the last name.
[^,]*,
This matches any characters after the first seven of the last name and the comma which follows.
\s*
This matches any spaces which follow the comma
(\S)
This matches the first character of the first name
.*
This matches any remaining characters of the first name.
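Putting those parts back together, here is a sketch of the same substitution as a reusable filter. The name make_username is made up, and \s and \S are swapped for the POSIX classes [[:space:]] and [^[:space:]] on the assumption that the command might have to run on a sed that does not understand the GNU shorthands.

```shell
# make_username is a hypothetical helper name.
# Group 1: up to seven characters of the last name.
# Group 2: the first non-space character after the comma (first name initial).
make_username() {
    sed -E 's/([^,]{1,7})[^,]*,[[:space:]]*([^[:space:]]).*/\2\1/'
}

printf '%s\n' 'Smith, Bob' 'Pattitucci, John' | make_username
```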
Using awk
$ awk -F', *' '{print substr($2,1,1) substr($1,1,7)}' file
BSmith
JDoe
JSnow
JPattitu
How it works
-F', *'
This declares the field separator to be a comma followed by zero or more spaces
substr($1,1,7)
This selects the first seven characters of the last name
substr($2,1,1)
This selects the first character of the first name
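The script in the question also pipes through tr to lowercase part of the result. If a fully lowercase name is what is wanted (that is an assumption, the question does not say so explicitly), awk's built-in tolower can be folded into the same command. The function name make_username_lower is made up for illustration.

```shell
# make_username_lower is a hypothetical helper name.
# Same fields as above, with the combined result lowercased by tolower().
make_username_lower() {
    awk -F', *' '{ print tolower(substr($2, 1, 1) substr($1, 1, 7)) }'
}

printf '%s\n' 'Smith, Bob' 'Pattitucci, John' | make_username_lower
```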
Using python
$ python3 -c 'for line in open("file"): last, first=line.strip().split(", "); print(first[:1] + last[:7])'
BSmith
JDoe
JSnow
JPattitu
You can do this without any external process:
while read surname firstname
do
    surname="${surname%,}"
    echo "${firstname:0:1}${surname:0:7}"
done
Input
Smith, Bob
Doe, John
Snow, John
Pattitucci, John
Output
BSmith
JDoe
JSnow
JPattitu
using awk :
awk -F ', ' '{printf("%s%s\n",substr($2,1,1),substr($1,1,7))}' file
input:
Smith, Bob
Doe, John
Snow, John
Pattitucci, John
output:
BSmith
JDoe
JSnow
JPattitu
The input text is split on ', '; substr extracts the first character of the 2nd field and the first seven characters of the 1st field.