Cut number from string

I want to cut several numbers from a .txt file so that I can add them up later. Here is an excerpt from the .txt file:
anonuser pts/25 127.0.0.1 Mon Nov 16 17:24 - crash (10+23:07)
I want to get the "10" before the "+" and I only want the number, nothing else. This number should be written to another .txt file. I used this code, but it only works if the number has one digit:
awk ' /^'anonuser' / {split($NF,k,"[(+0:)][0-9][0-9]");print k[1]} ' log2.txt > log3.txt

With GNU grep:
grep -Po '\(\K[^+]*' file > new_file
Output to new_file:
10
See: PCRE Regex Spotlight: \K

What if you use the match() function in awk? (The three-argument form used here requires GNU awk.)
$ awk '/^anonuser/ && match($NF,/^\(([0-9]*)/,a) {print a[1]}' file
10
How does this work?
/^anonuser/ && match() {print a[1]} if the line starts with anonuser and the pattern is found, print it.
match($NF,/^\(([0-9]*)/,a) in the last field ((10+23:07)), look for the string ( + digits and capture these in the array a[].
Note also that this approach allows you to store the values you capture, so that you can then sum them as you indicate in the question.
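As a minimal sketch of the summing step mentioned above, the following uses only POSIX awk features (two-argument match() with RSTART and RLENGTH) instead of the GNU-only array argument. The sample log lines other than the first are made up for illustration.

```shell
# Hypothetical sample data modelled on the line in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
anonuser pts/25 127.0.0.1 Mon Nov 16 17:24 - crash (10+23:07)
otheruser pts/3 127.0.0.1 Mon Nov 16 18:00 - crash (2+01:15)
anonuser pts/26 127.0.0.1 Tue Nov 17 09:10 - crash (123+00:42)
EOF

# For each anonuser line, take the digits between "(" and "+" in the
# last field and add them to a running total.
total=$(awk '/^anonuser / && match($NF, /^\([0-9]+\+/) {
    sum += substr($NF, RSTART + 1, RLENGTH - 2)
} END { print sum + 0 }' "$log")

echo "$total"
rm -f "$log"
```

Because the match is anchored on "(" followed by one or more digits and "+", numbers of any length are handled, which was the limitation of the original split() attempt.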

The following uses the same approach as the OP, and has a couple of advantages, e.g. it does not require anything special, and it is quite robust (with respect to assumptions about the input) and maintainable:
awk '/^anonuser/ {split($NF,k,/\+/); gsub(/[^0-9]/,"",k[1]); print k[1]}' log2.txt

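For anything more complex use awk, but for a simple task sed is easy enough:
sed -rn '/^anonuser/ s/.*\(([0-9]+)\+.*/\1/p' log2.txt
This finds the number between the ( and the + sign. The -n option together with the p flag prints only the lines where the substitution succeeded.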
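A portable variant of the same idea, as a sketch: it uses basic regular expressions so that no -r or -E option is needed, and -n with the p flag so that lines of other users are not printed. The second input line is invented for the demonstration.

```shell
# In a basic regular expression "(" and "+" are literal, \( \) is the group.
days=$(printf '%s\n' \
    'anonuser pts/25 127.0.0.1 Mon Nov 16 17:24 - crash (10+23:07)' \
    'otheruser pts/3 127.0.0.1 Mon Nov 16 18:00 - crash (2+01:15)' |
    sed -n '/^anonuser /s/.*(\([0-9][0-9]*\)+.*/\1/p')

echo "$days"
```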

I am not sure about the format in the file.
Can you use simple cut commands?
cut -d"(" -f2 log2.txt| cut -d"+" -f1 > log3.txt

Related

How To Delete All Words Before X Characters

I'm using code from the question "How To Delete All Words After X Characters" and I'm having trouble keeping (not deleting) all the words after 30 characters.
Original code:
awk 'BEGIN{FS=OFS="" } length>30{i=30; while($i~/\w/) i++; NF=i-1; }1'
My attempt:
awk 'BEGIN{FS=OFS="" } length>30{i=30; while($i~/\w/) i++; NF=i+1; }1'
Basically, I understand I need to change the NF which was NF=i-1 so I tried changing it to NF=i+1 but obviously I'm only getting one field. How can I specify NF to print the rest of the line?
Sample data:
StackOverflow Users Are Brilliant And Hard Working
#character 30 ---------------^
Desired output:
And Hard Working
If you could please help me keep the rest of the line by using NF, I would really appreciate your positive input and support.
It is much easier using GNU grep:
grep -oP '^.{30}\w*\W*\K.*' file
And Hard Working
Here \K resets the matched information.
Regex breakdown:
^: Start
.{30}: Match first 30 characters
\w*: followed by 0 or more word characters
\W*: followed by 0 or more non-word characters
\K: reset matched information so far
.*: Match anything after this position
Using awk you can use this solution:
awk '{sub(/^.{30}[_[:alnum:]]*[[:blank:]]*/, "")} 1' file
And Hard Working
Finally a sed solution:
sed -E 's/^.{30}[_[:alnum:]]*[[:blank:]]*//' file
And Hard Working
Another awk approach:
awk '{print substr($0, index(substr($0,30),FS)+30)}' file
This finds the first delimiter at or after the 30th character and takes the substring that starts just past it.
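The substr()/index() idea can be wrapped in a small function with the cutoff as a parameter. This is only a sketch: the function name words_after is made up, and printing an empty line when nothing follows the cutoff is an assumption about the desired behaviour.

```shell
# words_after N: print what follows the word that straddles column N.
words_after() {
    awk -v n="$1" '{
        rest = substr($0, n)          # text from column n onward
        p = index(rest, " ")          # first blank at or after column n
        if (length($0) < n || p == 0)
            print ""                  # nothing left to keep
        else
            print substr($0, n + p)   # start just past that blank
    }'
}

out=$(printf '%s\n' "StackOverflow Users Are Brilliant And Hard Working" | words_after 30)
echo "$out"
```

The explicit check matters because index() returns 0 when no blank is found, in which case the plain one-liner would print from the cutoff column itself.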
I can't imagine why you're considering anything related to NF for this, since you're not doing anything with fields; you're just splitting each line at a blank character. It sounds like this is all you need for both questions, using GNU awk for gensub():
$ awk '{print gensub(/(.{30}\S*)\s+(.*)/,"\\1",1)}' file
StackOverflow Users Are Brilliant
$ awk '{print gensub(/(.{30}\S*)\s+(.*)/,"\\2",1)}' file
And Hard Working
Or, more briefly, using GNU sed:
$ sed -E 's/(.{30}\S*)\s+(.*)/\1/' file
StackOverflow Users Are Brilliant
$ sed -E 's/(.{30}\S*)\s+(.*)/\2/' file
And Hard Working
With the use of NF, you can try:
awk '{a=0; b=""; for(i=1;i<=NF;i++){a+=length($i)+1; if(a>30){for(j=i+1;j<=NF;j++) b=b $j" "; print b; next}}}' file
Or with cut:
cut -c30- file | cut -d' ' -f2-
This keeps only the words that start after the 30th character (index >= 31).

Using Sed or Awk to divide a file into two based on whether a line contains a numeric value

I have used sed and awk for a little while now, but I am having a challenge with the problem below. I am asking for an experienced sed/awk guru to help.
I have a file where some lines have numbers and some lines do not, like:
afjjdjfj.uihuihi
trfg.rtyhd
0rtgfd.tjbghhh
hbvfd4.rtgbvdgf
00fhfg.fdrgf
rtygfd.ijhniuh
etc.
I would like to have exactly two files out of this one, where every line is represented in one of the two files (none are deleted).
One file should contain all lines with any digits 0-9 in them, so given the above file the result would be:
0rtgfd.tjbghhh
hbvfd4.rtgbvdgf
00fhfg.fdrgf
and another file should contain the rest of the lines, which do not have any digits 0-9 in them, so given the above file, it would be:
afjjdjfj.uihuihi
trfg.rtyhd
rtygfd.ijhniuh
I've tried different strategies in both sed and awk and nothing is giving me exactly what I need.
What would be the best sed or awk one liner to solve this problem?
Thank you for your time,
Tom
Easily with Awk:
awk '/[0-9]/{print > "file1"; next} {print > "file2"}' inputfile
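A self-contained sketch of the one-pass awk split follows. The output file names are passed in as awk variables; these names and the temporary directory are assumptions for the demonstration, not part of the original answer.

```shell
dir=$(mktemp -d)
cat > "$dir/input" <<'EOF'
afjjdjfj.uihuihi
trfg.rtyhd
0rtgfd.tjbghhh
hbvfd4.rtgbvdgf
00fhfg.fdrgf
rtygfd.ijhniuh
EOF

# Lines containing a digit go to one file, all other lines to the second,
# so every input line ends up in exactly one of the two files.
awk -v digits_file="$dir/with_digits" -v plain_file="$dir/no_digits" '
    /[0-9]/ { print > digits_file; next }
            { print > plain_file }
' "$dir/input"

with_count=$(wc -l < "$dir/with_digits" | tr -d ' ')
without_count=$(wc -l < "$dir/no_digits" | tr -d ' ')
first_digit_line=$(head -n 1 "$dir/with_digits")
echo "$with_count with digits, $without_count without"
rm -rf "$dir"
```

Note that the redirection target must be a string or a variable holding one; an unquoted bare word such as file1 is an uninitialised awk variable.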
With single GNU sed command:
sed -ne '/[0-9]/w with_digits.txt' -e '//!w no_digits.txt' input
Results:
> cat no_digits.txt
afjjdjfj.uihuihi
trfg.rtyhd
rtygfd.ijhniuh
> cat with_digits.txt
0rtgfd.tjbghhh
hbvfd4.rtgbvdgf
00fhfg.fdrgf
w filename: write the pattern space to filename.
If you don't mind running twice over the input, you can use just grep:
grep '[0-9]' input > with_digits
grep -v '[0-9]' input > without_digits
With Perl (this needs the File::Slurp module):
perl -MFile::Slurp -lpe '/\d/ ? append_file("digits.txt",$_) : append_file("no_digits.txt",$_)' input.txt

how to cut CSV file

I have the following CSV file
more file.csv
Number,machine_type,OS,Version,Mem,CPU,HW,Volatge
1,HG652,linux,23.12,256,III,LOP90,220
2,HG652,linux,23.12,256,III,LOP90,220
3,HG652,SCO,MK906G,526,1G,LW1005,220
4,HG652,solaris,1172,1024,2Core,netra,220
5,HG652,solaris,1172,1024,2Core,netra,220
Please advise how to cut the CSV file (with the cut, sed, or awk command)
in order to get a partial CSV file.
The command should take a value that gives the number of fields to keep.
In example 1 the value should be 6.
Example 1
In this example we keep the first 6 fields, counting from left to right (in this case the CSV will look like this):
Number,machine_type,OS,Version,Mem,CPU
1,HG652,linux,23.12,256,III
2,HG652,linux,23.12,256,III
3,HG652,SCO,MK906G,526,1G
4,HG652,solaris,1172,1024,2Core
5,HG652,solaris,1172,1024,2Core
cut is your friend:
$ cut -d',' -f-6 file
Number,machine_type,OS,Version,Mem,CPU
1,HG652,linux,23.12,256,III
2,HG652,linux,23.12,256,III
3,HG652,SCO,MK906G,526,1G
4,HG652,solaris,1172,1024,2Core
5,HG652,solaris,1172,1024,2Core
Explanation
-d',' set comma as field separator
-f-6 print up to the field number 6 based on that delimiter. It is equivalent to -f1-6, as 1 is default.
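Since the question asks for the field count to be a value handed to the command, here is a small sketch with the count in a shell variable. The variable name n is an assumption; the input is two lines taken from the sample file.

```shell
n=6   # number of leading fields to keep

out=$(printf '%s\n' \
    'Number,machine_type,OS,Version,Mem,CPU,HW,Volatge' \
    '3,HG652,SCO,MK906G,526,1G,LW1005,220' |
    cut -d, -f"1-$n")

echo "$out"
```

Changing n to another value keeps that many fields instead, without touching the rest of the command.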
awk can also do it, if necessary:
$ awk -v FS="," 'NF{for (i=1;i<=6;i++) printf "%s%s", $i, (i==6?RS:FS)}' file
Number,machine_type,OS,Version,Mem,CPU
1,HG652,linux,23.12,256,III
2,HG652,linux,23.12,256,III
3,HG652,SCO,MK906G,526,1G
4,HG652,solaris,1172,1024,2Core
5,HG652,solaris,1172,1024,2Core
The cut command line is rather simple and well suited to your case:
cut -d, -f1-6 yourfile
So everybody agrees that cut is the best way to go in this case. As for the awk solution in fedorqui's answer, it uses a clever trick to skip empty lines (NF as a selection pattern), but that has the disadvantage of removing blank lines from the original file. I propose another solution below (en passant, using the -F option instead of passing FS as a variable) that preserves empty lines and also respects lines with fewer than 6 fields, i.e. prints those lines without adding extra commas:
awk -F, '{min=(NF>6?6:NF); for (i=1;i<=min-1;i++) printf "%s,", $i; printf "%s\n", $min}' yourfile
This works because the last field printed is $min rather than $6: a short line ends with its own last field, and for an empty line min is 0, so $min is $0, the empty line itself.

How to display the first word of each line in my file using the linux commands?

I have a file containing many lines, and I want to display only the first word of each line with the Linux commands.
How can I do that?
You can use awk:
awk '{print $1}' your_file
This will "print" the first column ($1) in your_file.
Try doing this using grep:
grep -Eo '^[^ ]+' file
Try doing this with coreutils cut:
cut -d' ' -f1 file
I see there are already answers. But you can also do this with sed:
sed 's/ .*//' fileName
The above solutions seem to fit your specific case. For a more general application of your question, consider that words are generally defined as being separated by whitespace, but not necessarily space characters specifically. Columns in your file may be tab-separated, for example, or even separated by a mixture of tabs and spaces.
The previous examples are all useful for finding space-separated words, while only the awk example also finds words separated by other whitespace characters (and in fact this turns out to be rather difficult to do uniformly across various sed/grep versions). You may also want to explicitly skip empty lines, by amending the awk statement thus:
awk '{if ($1 !="") print $1}' your_file
If you are also concerned about lines that begin with whitespace, note that awk's default field splitting already skips leading blanks and tabs, so the awk commands above still print the first word. If you prefer a script, a short Python equivalent might look like:
>>> for line in open('your_file'):
...     words = line.split()
...     if words:
...         print(words[0])
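For comparison, awk's default field splitting treats any run of blanks and tabs as one separator and ignores leading whitespace. A minimal check on made-up input with tabs, an empty line, and leading spaces:

```shell
# NF is 0 on empty or blank-only lines, so they are skipped.
out=$(printf 'alpha beta\n\tgamma\tdelta\n\n  epsilon zeta\n' | awk 'NF { print $1 }')

echo "$out"
```

This prints alpha, gamma, and epsilon on separate lines.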
...or on Windows (if you have GnuWin32 grep) :
grep -Eo "^[^ ]+" file

Count the number of occurrences in a string. Linux

Okay so what I am trying to figure out is how do I count the number of periods in a string and then cut everything up to that point but minus 2. Meaning like this:
string="aaa.bbb.ccc.ddd.google.com"
number_of_periods="5"
number_of_periods=`expr $number_of_periods-2`
string=`echo $string | cut -d"." -f$number_of_periods`
echo $string
result: "aaa.bbb.ccc.ddd"
The way that I was thinking of doing it was sending the string to a text file and then just greping for the number of times like this:
grep -c "." infile
The reason I don't want to do that is because I want to avoid creating another text file for I do not have permission to do so. It would also be simpler for the code I am trying to build right now.
EDIT
I don't think I made it clear but I want to make finding the number of periods more dynamic because the address I will be looking at will change as the script moves forward.
If you don't need to count the dots, but just want to remove the penultimate dot and everything after it, you can use Bash's built-in string manipulation.
${string%substring}
Deletes shortest match of $substring from back of $string.
Example:
$ string="aaa.bbb.ccc.ddd.google.com"
$ echo ${string%.*.*}
aaa.bbb.ccc.ddd
Nice and simple and no need for sed, awk or cut!
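If the number of dots is still needed, it can be obtained without a temporary file. The sketch below combines that count with the suffix removal shown above; it uses only POSIX shell features, and the variable names dots and prefix are made up for the example.

```shell
string="aaa.bbb.ccc.ddd.google.com"

# Count the dots: delete every character that is not a dot and
# count what is left.
dots=$(printf '%s' "$string" | tr -cd '.' | wc -c | tr -d ' ')

# Drop the last two dot-separated labels (shortest match of ".*.*"
# anchored at the end of the string).
prefix=${string%.*.*}

echo "$dots $prefix"
```

Because both steps work on whatever value the variable holds, they keep working as the address changes during the script.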
What about this:
echo "aaa.bbb.ccc.ddd.google.com"|awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
(further shortened thanks to a helpful comment from @steve)
gives:
aaa.bbb.ccc.ddd
The awk command:
awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
works by separating the input line into fields (FS) by ., then joining them as output (OFS) with ., but the number of fields (NF) has been reduced by 2. The final 1 in the command is responsible for the print.
This will reduce a given input line by eliminating the last two period separated items.
This approach is "shell-agnostic" :)
Perhaps this will help:
#!/bin/sh
input="aaa.bbb.ccc.ddd.google.com"
number_of_fields=$(echo $input | tr "." "\n" | wc -l)
interesting_fields=$(($number_of_fields-2))
echo $input | cut -d. -f-${interesting_fields}
To count the dots directly with grep (the <<< here-string needs Bash):
grep -o "\." <<<"aaa.bbb.ccc.ddd.google.com" | wc -l
5
