I have a line
MOD record for (Batch Upload Base Interest Slab Version: Online Services) Appl: GU
I need to print Batch Upload Base Interest Slab Version.
Starting string: (
End string: :
How do I achieve this? Please help.
awk can do it:
awk -F"[(:]" '{print $2}'
-F"[(:]" sets the delimiters to be either : or ( (happy face!).
Test
$ echo "MOD record for (Batch Upload Base Interest Slab Version: Online Services) Appl: GU" | awk -F"[:(]" '{print $2}'
Batch Upload Base Interest Slab Version
Try this....
$ input="(Batch Upload Base Interest Slab Version: Online Services) Appl: GU"
$ startString=${input#*(}
$ outputstring=${startString%%:*}
$ echo "$outputstring"
Batch Upload Base Interest Slab Version
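For reference, the four prefix/suffix trimming operators behave like this (a minimal sketch on a made-up path, not from the question):

```shell
p="/usr/local/share/doc"
echo "${p#*/}"    # shortest prefix match removed  -> usr/local/share/doc
echo "${p##*/}"   # longest prefix match removed   -> doc
echo "${p%/*}"    # shortest suffix match removed  -> /usr/local/share
echo "${p%%/*}"   # longest suffix match removed   -> (empty line)
```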
Related
I'm using code from this question How To Delete All Words After X Characters and I'm having trouble keeping (not deleting) all the words after 30 characters.
Original code:
awk 'BEGIN{FS=OFS="" } length>30{i=30; while($i~/\w/) i++; NF=i-1; }1'
My attempt:
awk 'BEGIN{FS=OFS="" } length>30{i=30; while($i~/\w/) i++; NF=i+1; }1'
Basically, I understand I need to change the NF which was NF=i-1 so I tried changing it to NF=i+1 but obviously I'm only getting one field. How can I specify NF to print the rest of the line?
Sample data:
StackOverflow Users Are Brilliant And Hard Working
#character 30 ---------------^
Desired output:
And Hard Working
If you could please help me keep the rest of the line by using NF, I would really appreciate your positive input and support.
It is much easier using GNU grep:
grep -oP '^.{30}\w*\W*\K.*' file
And Hard Working
Where \K is used for resetting the matched information.
RegEx Breakup:
^: Start
.{30}: Match first 30 characters
\w*: followed by 0 or more word characters
\W*: followed by 0 or more non-word characters
\K: reset matched information so far
.*: Match anything after this position
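A quick check of the pattern above against the sample line (GNU grep with PCRE support, -P, is assumed to be available):

```shell
# \K discards everything matched so far, so only the tail is printed
echo "StackOverflow Users Are Brilliant And Hard Working" |
  grep -oP '^.{30}\w*\W*\K.*'
# -> And Hard Working
```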
Using awk you can use this solution:
awk '{sub(/^.{30}[_[:alnum:]]*[[:blank:]]*/, "")} 1' file
And Hard Working
Finally a sed solution:
sed -E 's/^.{30}[_[:alnum:]]*[[:blank:]]*//' file
And Hard Working
Another awk:
awk '{print substr($0, index(substr($0,30),FS)+30)}'
find the delimiter index after the 30th char, take a substring from that index on.
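A quick sanity check with the question's sample line (default FS, a single space, assumed):

```shell
# Find the first space at or after position 30, then print from just past it
echo "StackOverflow Users Are Brilliant And Hard Working" |
  awk '{print substr($0, index(substr($0,30),FS)+30)}'
# -> And Hard Working
```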
I can't imagine why you're considering anything related to NF for this, since you're not doing anything with fields; you're just splitting each line at a blank char. It sounds like this is all you need for both questions, using GNU awk for gensub():
$ awk '{print gensub(/(.{30}\S*)\s+(.*)/,"\\1",1)}' file
StackOverflow Users Are Brilliant
$ awk '{print gensub(/(.{30}\S*)\s+(.*)/,"\\2",1)}' file
And Hard Working
or it's briefer using GNU sed:
$ sed -E 's/(.{30}\S*)\s+(.*)/\1/' file
StackOverflow Users Are Brilliant
$ sed -E 's/(.{30}\S*)\s+(.*)/\2/' file
And Hard Working
With the use of NF, you can try:
awk '{for(i=1;i<=NF;i++){a+=length($i)+1;if(a>30){for(j=i+1;j<=NF;j++)b=b (b?" ":"") $j;print b;exit}}}'
cut -c30- file | cut -d' ' -f2-
this will keep only the words that start after the 30th character (index >= 31)
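Tested against the sample line, assuming single-space separators:

```shell
# Drop characters 1-29, then drop the leading partial word
echo "StackOverflow Users Are Brilliant And Hard Working" |
  cut -c30- | cut -d' ' -f2-
# -> And Hard Working
```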
I have a file of URLs, in the format as shown below:
com.blendtuts/S
°=
com.blengineering.www/:http
±=
com.blenheimgang.www/le-porsche-museum-en-details/porsche-museum-3
²=
com.blenheimsi
³=
com.blenkov.www/page/media/18/34/376
´=
com.blentwell.www/bookmarks.php/jackroldan/sp
¸=
com.blentwell.www/tags.php/I
The file is large: around 250 GB.
I was trying to reverse the words in the file and extract only the domains from the text, using terminal commands on Ubuntu.
Let me tell you what I have tried:
First I removed the data after “/” using the following command:
~$ ex -sc '%s/\(\/\).*/\1/ | x' newfile.txt > ddm.txt
And the result as:
com.blendtuts/
°=
com.blengineering.www/
±=
com.blenheimgang.www/
²=
com.blenheimsi
³=
com.blenkov.www/
´=
com.blentwell.www/
¸=
com.blentwell.www/
Now I reversed the complete text in the file using the solution from: How to reverse all the words in a file with bash in Ubuntu?
And got the following result:
/blendtuts.com
°= /www.blengineering.com
±= /www.blenheimgang.com
²= blenheimsi.com
³= /www.blenkov.com
µ= /www.blentwell.com
¶= /www.blentwell.com
•= /www.blentwell.com
/www.blentwell.com
But the problem is still not solved. I would like to know how to extract the domains and put them into another file using Ubuntu. As you can see above, the output I have is still not just the domain; it still has a slash attached to it.
If there is another solution to such a problem using any other operating system, do let me know. I prefer to go with Ubuntu.
I would like to extract domains out of the file and separate them to another file and that too in a proper format.
Getting only the unique domains would be an excellent solution to my query. For that, I am currently using:
$ sort filename.txt | uniq > save_to_file.txt
Hope to hear a solution.
Please check here is the sample file: Sample File
Please consider the following for domain extraction and reversal (GNU awk, for the four-argument split()):
awk -F '/' '/com\./ {split($1, arr, /\W+/, seps); for (i=length(arr); i>=1; i--){s = s seps[i] arr[i];} print s ; s="";}'
Remove invalid entries; we are not interested in lines that start with a non-ASCII character and end with '='
We are interested in URL before first /
Reverse the URL
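As a minimal sketch of the reversal step alone (step 3), tested on a single entry from the sample:

```shell
# Reverse dot-separated components: com.blenkov.www -> www.blenkov.com
echo "com.blenkov.www" |
  awk -F '.' '{s=$NF; for(i=NF-1;i>=1;i--) s=s "." $i; print s}'
# -> www.blenkov.com
```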
I have tried the command below on your content, which gives the list of URLs:
cat -v filename.txt | grep -v '^M-.=' | awk -F '/' '{print $1}' | awk -F '.' 'BEGIN{ORS="";}{ for (i=NF; i>0; i--) if ( i == 1 ) { print $i } else { print $i".";} print "\n"; }'
Output
www.blendschutzrollo.com
blendtuts.com
www.blengineering.com
www.blenheimgang.com
.
.
.
I have got this answer:
$ perl -F/ -anle 'print reverse(split("([^.]*)", $F[0])) if /\./' file_name.txt
One can refer to: https://askubuntu.com/questions/847307/how-to-do-this-in-a-single-command-on-ubuntu-16-04
I want to cut several numbers from a .txt file to add them later up. Here is an abstract from the .txt file:
anonuser pts/25 127.0.0.1 Mon Nov 16 17:24 - crash (10+23:07)
I want to get the "10" before the "+" and I only want the number, nothing else. This number should be written to another .txt file. I used this code, but it only works if the number has one digit:
awk ' /^'anonuser' / {split($NF,k,"[(+0:)][0-9][0-9]");print k[1]} ' log2.txt > log3.txt
With GNU grep:
grep -Po '\(\K[^+]*' file > new_file
Output to new_file:
10
See: PCRE Regex Spotlight: \K
What if you use the match() function in awk?
$ awk '/^anonuser/ && match($NF,/^\(([0-9]*)/,a) {print a[1]}' file
10
How does this work?
/^anonuser/ && match() {print a[1]} if the line starts with anonuser and the pattern is found, print it.
match($NF,/^\(([0-9]*)/,a) in the last field ((10+23:07)), look for the string ( + digits and capture these in the array a[].
Note also that this approach allows you to store the values you capture, so that you can then sum them as you indicate in the question.
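For example, summing the captured day counts could look like this. This is a sketch with a second made-up log line; it uses POSIX match() with RSTART/RLENGTH rather than GNU awk's three-argument form, so it should be portable:

```shell
printf '%s\n' \
  'anonuser pts/25 127.0.0.1 Mon Nov 16 17:24 - crash (10+23:07)' \
  'anonuser pts/26 127.0.0.1 Tue Nov 17 09:02 - crash (3+01:12)' |
  awk '/^anonuser/ && match($NF, /^\([0-9]+/) {
         sum += substr($NF, RSTART+1, RLENGTH-1)   # the digits after the "("
       }
       END {print sum}'
# -> 13
```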
The following uses the same approach as the OP, and has a couple of advantages, e.g. it does not require anything special, and it is quite robust (with respect to assumptions about the input) and maintainable:
awk '/^anonuser/ {split($NF,k,/\+/); gsub(/[^0-9]/,"",k[1]); print k[1]}'
For anything more complex use awk, but for a simple task sed is easy enough:
sed -r '/^anonuser/{s/.*\(([0-9]+)\+.*/\1/}'
This finds the number between a ( and a + sign.
I am not sure about the format in the file.
Can you use simple cut commands?
cut -d"(" -f2 log2.txt| cut -d"+" -f1 > log3.txt
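A quick check with the sample line:

```shell
# Take what follows "(", then what precedes "+"
echo 'anonuser pts/25 127.0.0.1 Mon Nov 16 17:24 - crash (10+23:07)' |
  cut -d'(' -f2 | cut -d'+' -f1
# -> 10
```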
Looking to perform an inner join on two different text files. Basically I'm looking for the inner join equivalent of the GNU join program. Does such a thing exist? If not, an awk or sed solution would be most helpful, but my first choice would be a Linux command.
Here's an example of what I'm looking to do
file 1:
0|Alien Registration Card LUA|Checklist Update
1|Alien Registration Card LUA|Document App Plan
2|Alien Registration Card LUA|SA Application Nbr
3|Alien Registration Card LUA|tmp_preapp-DOB
0|App - CSCE Certificate LUA|Admit Type
1|App - CSCE Certificate LUA|Alias 1
2|App - CSCE Certificate LUA|Alias 2
3|App - CSCE Certificate LUA|Alias 3
4|App - CSCE Certificate LUA|Alias 4
file 2:
Alien Registration Card LUA
Results:
0|Alien Registration Card LUA|Checklist Update
1|Alien Registration Card LUA|Document App Plan
2|Alien Registration Card LUA|SA Application Nbr
3|Alien Registration Card LUA|tmp_preapp-DOB
Here's an awk option, so you can avoid the bash dependency (for portability):
$ awk -F'|' 'NR==FNR{check[$0];next} $2 in check' file2 file1
How does this work?
-F'|' -- sets the field separator
NR==FNR{check[$0];next} -- if the total record number matches the file record number (i.e. we're reading the first file provided), then we populate an array and continue.
$2 in check -- If the second field was mentioned in the array we created, print the line (which is the default action if no actions are provided).
file2 file1 -- the files. Order is important due to the NR==FNR construct.
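A self-contained run with a subset of the sample data (temporary files are used so the sketch can be pasted as-is):

```shell
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
0|Alien Registration Card LUA|Checklist Update
1|Alien Registration Card LUA|Document App Plan
0|App - CSCE Certificate LUA|Admit Type
EOF
printf 'Alien Registration Card LUA\n' > "$tmp/file2"

# Prints only the file1 rows whose second field appears in file2
awk -F'|' 'NR==FNR{check[$0];next} $2 in check' "$tmp/file2" "$tmp/file1"
rm -rf "$tmp"
```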
Shouldn't file2 contain LUA at the end?
If yes, you can still use join:
join -t'|' -12 <(sort -t'|' -k2 file1) file2
Looks like you just need
grep -F -f file2 file1
You may modify this script:
cat file2 | while read -r line; do
    grep -F -- "$line" file1 # or whatever you want to do with the $line variable
done
The while loop reads file2 line by line and gives each line to the grep command, which searches for it in file1. There may be some extra output, which can be removed with grep options.
You can use the paste command to combine files:
paste [option] source-files [> destination-file]
for your example it would be
paste file1.txt file2.txt >result.txt
I'm writing a perl script and I would really like to get the amount of cached memory currently being used on my linux box. When you run "free -m", you get this output:
total used free shared buffers cached
Mem: 496 322 173 0 33 106
-/+ buffers/cache: 183 312
Swap: 1023 25 998
The number under "cached" is the value I want. I've been using Linux::SysInfo, which helps me get a lot of useful information about my box, but it seems to lack cached memory. Does anyone know of another module or an elegant way in Perl to get the amount of cached memory on my machine? I know that I could get it by running this:
my $val = `free -m`;
And then running a regex on $val, but I'd prefer another solution if one exists. Thanks!
Running strace free -m shows that it is reading /proc/meminfo:
open("/proc/meminfo", O_RDONLY) = 3
cat /proc/meminfo confirms that this contains the information you're looking for.
I am not sure if you only want a Perl solution, or if any command line solution will be acceptable. Just in case, here is a simple AWK solution:
free -m | awk '/^Mem:/{print $NF}'
that will print the number you are interested in.
You could assign it to some shell variable if that was necessary:
$ c_val=`free -m | awk '/^Mem:/{print $NF}'`
$ echo $c_val
will display the value to verify.
Explanation of awk command:
/^Mem:/ searches for a line that contains the string Mem: at the start. If it is found it prints the last item on that line which is the number we are interested in. In awk the line is split into fields based on white space. $0 is the whole line, $1 the first field, $2 the second etc. The number of fields per line is given by the pre-defined awk NF variable, so we can access the last field on the line with $NF.
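For instance, on the Mem: line from the question, $NF picks out the final "cached" column:

```shell
# Simulate the second line of `free -m` output from the question
echo "Mem: 496 322 173 0 33 106" | awk '/^Mem:/{print $NF}'
# -> 106
```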
We could have also used this awk command:
awk 'NR==2{print $NF}'
which makes use of the pre-defined awk NR variable that contains the current line number. In this case we print the last item (field) on the 2nd line.
You can read it from /proc/meminfo:
perl -ne's/^Cached: *//&&print' /proc/meminfo
or, to get just the value in kB:
perl -anE'/^Cached/&&say$F[1]' /proc/meminfo