Use Bash Perl to fetch substring off of command output - linux

Suppose the text I'm working with is the following, printed by pecl install xdebug:
| - A list of all settings: https://xdebug.org/docs-settings.php |
| - A list of all functions: https://xdebug.org/docs-functions.php |
| - Profiling instructions: https://xdebug.org/docs-profiling2.php |
| - Remote debugging: https://xdebug.org/docs-debugger.php |
| |
| |
| NOTE: Please disregard the message |
| You should add "extension=xdebug.so" to php.ini |
| that is emitted by the PECL installer. This does not work for |
| Xdebug. |
| |
+----------------------------------------------------------------------+
running: find "/tmp/pear/temp/pear-build-defaultuserNxuIJy/install-xdebug-2.9.2" | xargs ls -dils
1078151 4 drwxr-xr-x 3 root root 4096 Feb 3 17:40 /tmp/pear/temp/pear-build-defaultuserNxuIJy/install-xdebug-2.9.2
1078337 4 drwxr-xr-x 3 root root 4096 Feb 3 17:40 /tmp/pear/temp/pear-build-defaultuserNxuIJy/install-xdebug-2.9.2/usr
1078338 4 drwxr-xr-x 3 root root 4096 Feb 3 17:40 /tmp/pear/temp/pear-build-defaultuserNxuIJy/install-xdebug-2.9.2/usr/local
1078339 4 drwxr-xr-x 3 root root 4096 Feb 3 17:40 /tmp/pear/temp/pear-build-defaultuserNxuIJy/install-xdebug-2.9.2/usr/local/lib
1078340 4 drwxr-xr-x 3 root root 4096 Feb 3 17:40 /tmp/pear/temp/pear-build-defaultuserNxuIJy/install-xdebug-2.9.2/usr/local/lib/php
1078341 4 drwxr-xr-x 3 root root 4096 Feb 3 17:40 /tmp/pear/temp/pear-build-defaultuserNxuIJy/install-xdebug-2.9.2/usr/local/lib/php/extensions
1078342 4 drwxr-xr-x 2 root root 4096 Feb 3 17:40 /tmp/pear/temp/pear-build-defaultuserNxuIJy/install-xdebug-2.9.2/usr/local/lib/php/extensions/no-debug-non-zts-20180731
1078336 2036 -rwxr-xr-x 1 root root 2084800 Feb 3 17:40 /tmp/pear/temp/pear-build-defaultuserNxuIJy/install-xdebug-2.9.2/usr/local/lib/php/extensions/no-debug-non-zts-20180731/xdebug.so
Build process completed successfully
Installing '/usr/local/lib/php/extensions/no-debug-non-zts-20180731/xdebug.so'
install ok: channel://pecl.php.net/xdebug-2.9.2
configuration option "php_ini" is not set to php.ini location
You should add "zend_extension=/usr/local/lib/php/extensions/no-debug-non-zts-20180731/xdebug.so" to php.ini
I want to extract this part of the output and save it in a variable for later use:
zend_extension=/usr/local/lib/php/extensions/no-debug-non-zts-20180731/xdebug.so
I have attempted doing it like this with Perl without success:
echo $OUTPUT | perl -lne 'm/You should add "(.*)"/; print $1'
How do I get the substring dynamically with perl? What's the pattern that I need to use?

With the $OUTPUT text placed in a file output.txt
cat output.txt | perl -wnE'say $1 if /You should add "(zend_extension=.*)"/'
This relies on specifics of the text shown, in particular the seemingly unique zend_extension=... prefix of the path, to distinguish the needed line from the earlier "You should add" line. Adjust the pattern to whatever suits your problem best.
If your code passes the text to the one-liner as a single string, add the -0777 flag so that Perl reads the whole input at once.
Otherwise please clarify how that $OUTPUT comes about.
Tested with a bash script
#!/bin/bash
# Last modified: 2020 Feb 03 (12:58)
OUTPUT=$(cat "output.txt")
echo $OUTPUT | perl -wnE'say $1 if /You should add "(zend_extension=.*)"/'
where output.txt is a file with the text from the question, and the right line is printed.

You can use this perl:
perl -lne 'print $1 if /You should add "(?!extension=xdebug\.so)([^"]+)"/' <<< "$OUTPUT"
zend_extension=/usr/local/lib/php/extensions/no-debug-non-zts-20180731/xdebug.so
The negative lookahead (?!extension=xdebug\.so) makes the match skip the extension=xdebug.so line in the output.
Alternatively you may match You should add at the line start:
perl -lne 'print $1 if /^You should add "([^"]+)"/' <<< "$OUTPUT"

Probably OP meant to use
echo $OUTPUT | perl -ne 'm/You should add "(.*)"/ && print $1'
or
echo $OUTPUT | perl -ne 'print $1 if m/You should add "(.*)"/'

Related

Unix/Linux display average file size with restriction

How do I display the average file size (rounded down), using only cat, echo, ls and wc? Here is what I have so far: echo "$(cat * | wc -w; ls -l | wc -l)" I have both numbers, but I can't work out how to divide them. Any help would be appreciated, and thanks in advance.
Is it allowed to use the shell for the division?
$ ls -l
total 20
-rw-r--r-- 1 james james 968 Dec 29 2016 bar
-rw-r--r-- 1 james james 900 Dec 29 2016 bar.asc
-rw-r--r-- 1 james james 39 Dec 29 2016 compr.txt
-rw-r--r-- 1 james james 1056 Dec 28 2016 foo
-rw-r--r-- 1 james james 896 Dec 29 2016 foo.asc
$ cat * | wc -c
3859
$ ls | wc -l
5
$ echo $(( $(cat * | wc -c) / $(ls | wc -l) )) # solution part
771
$ echo 5*771 | bc
3855
You can do
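n=( * ); s=( $(ls -sk) ); echo $(( ${s[1]} / ${#n[@]} ))
This uses an array to count the files in the directory and ls -sk to get the total size in kilobytes (the number on the "total" line), then prints the quotient. Note that the result is therefore in kilobytes of allocated blocks, not in bytes.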
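The byte-based calculation from the first answer can be wrapped in a small function. This is only a sketch: avg_file_size is a hypothetical name, cd is a shell builtin used in addition to the four allowed commands, and the directory is assumed to contain only regular files whose names have no newlines.

```shell
#!/bin/bash
# Hypothetical helper: average size in bytes, rounded down, of the files in a directory.
avg_file_size() {
    local dir=$1 total count
    total=$(cd "$dir" && cat * | wc -c)   # total bytes across all files
    count=$(cd "$dir" && ls | wc -l)      # number of directory entries
    echo $(( total / count ))             # shell integer division rounds down
}
```

Calling avg_file_size . in the directory from the listing above would repeat the 3859 / 5 division shown in the answer.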

Find regular expression matching condition

I have a set of files including a date in their name:
MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
I want to select the files matching a condition on this date. In this example: date > 20151230
I tried things like:
find . -regex ".*.SUB.nc" | cut -d "." -f 4 | while read a; do if [ $a -ge 20151201 ]; then echo $a; fi; done
BUT:
1) This is returning only a part of the filename, whereas I would like to return the entire filename.
2) There may be a more elegant way than using while read/do
thanks in advance!
Your code can be rearranged as follows:
#!/usr/bin/env bash
find . -regex ".*.SUB.nc" \
| rev | cut -d '.' -f 3 | rev \
| while read a; do
if [ $a -ge 20151201 ]; then
echo $a
fi
done
rev | cut -d '.' -f 3 | rev is used because, if you give an absolute path or the subdirectories have a . in their names, the date will no longer be the 4th field, but it will always be the 3rd field from the end.
This will give the output:
20151231
20151229
20151230
To show the complete file names, replace echo $a with ls *$a*. Output:
MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
I tested this script with file names whose dates are less than 20151201. For example MERRA2_400.tavg1_2d_slv_Nx.20151200.SUB.nc. The results are consistent.
Perhaps a more efficient way to accomplish your task is using a grep regex like:
find . -regex ".*.SUB.nc" | grep -E "201512(0[1-9]|[1-9][0-9])|201[6-9][0-9][0-9][0-9]"
This will work just fine.
find . -regex ".*.SUB.nc" | rev | cut -d '.' -f 3 | rev | while read a; do if [ $a -ge 20151201 ]; then echo `ls -R | grep $a` ;fi ;done
rev | cut -d '.' -f 3 | rev is used because, if you give an absolute path or the subdirectories have a . in their names, the date will no longer be the 4th field, but it will always be the 3rd field from the end.
ls -R | grep $a is used so that the file name is found recursively.
Assume the files and directory structure are:
[root@localhost temp]# ls -lrt -R
.:
total 8
-rw-r--r--. 1 root root 0 Apr 25 16:15 MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
-rw-r--r--. 1 root root 0 Apr 25 16:15 MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
-rw-r--r--. 1 root root 0 Apr 25 16:15 MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
drwxr-xr-x. 2 root root 4096 Apr 25 16:32 temp.3
drwxr-xr-x. 3 root root 4096 Apr 25 17:13 temp2
./temp.3:
total 0
./temp2:
total 4
-rw-r--r--. 1 root root 0 Apr 25 16:27 MERRA2_400.tavg1_2d_slv_Nx.20151111.SUB.nc
-rw-r--r--. 1 root root 0 Apr 25 16:27 MERRA2_400.tavg1_2d_slv_Nx.20151222.SUB.nc
drwxr-xr-x. 2 root root 4096 Apr 25 17:13 temp21
./temp2/temp21:
total 0
-rw-r--r--. 1 root root 0 Apr 25 17:13 MERRA2_400.tavg1_2d_slv_Nx.20151333.SUB.nc
Running the above command gives:
MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151333.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151222.SUB.nc
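The answers above print either the date or a file name found by a second lookup. A possible variation, sketched below, keeps the full path from find and extracts the date with bash parameter expansion, so nothing has to be searched for again. The function name files_after and its two arguments are assumptions made for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print files under DIR whose embedded date is greater than MIN.
files_after() {
    local dir=$1 min=$2 path name date
    find "$dir" -type f -name '*.SUB.nc' | while IFS= read -r path; do
        name=${path##*/}        # strip the directory part
        date=${name%.SUB.nc}    # drop the fixed suffix
        date=${date##*.}        # keep the last dot-separated field
        # Only compare when the field really is an 8-digit date.
        if [[ $date =~ ^[0-9]{8}$ ]] && (( 10#$date > min )); then
            printf '%s\n' "$path"
        fi
    done
}
```

For the three files in the question, files_after . 20151230 would print only the path of the 20151231 file.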

Split texts into smaller texts of n number of words

I have a large number of texts (several thousand) in a txt format and would like to split them into 500-word long chunks and to save these chunks into separate folders.
< *.txt tr -c A-Za-z0-9 \\n | grep -v '^$' | split -l 500
can do the job but it splits texts to one word per line, whereas I would like to retain the original format.
I was wondering if there is a bash command or Python script to do this.
You should also be able to do that with csplit, but I had better luck with the Perl solution found here: https://unix.stackexchange.com/questions/66513/how-can-i-split-a-large-text-file-into-chunks-of-500-words-or-so
Thanks to Joseph R.
$ cat generatewordchunks.pl
perl -e '
undef $/;
$file=<>;
while($file=~ /\G((\S+\s+){500})/gc)
{
$i++;
open A,">","chunk-$i.txt";
print A $1;
close A;
}
$i++;
if($file=~ /\G(.+)\Z/sg)
{
open A,">","chunk-$i.txt";
print A $1;
}
' $1
$ ./generatewordchunks.pl woord.list
$ ls -ltr
total 13
-rwxrwx--- 1 root vboxsf 5934 Jul 31 16:03 woord.list
-rwxrwx--- 1 root vboxsf 362 Jul 31 16:08 generatewordchunks.pl
-rwxrwx--- 1 root vboxsf 4203 Jul 31 16:11 chunk-1.txt
-rwxrwx--- 1 root vboxsf 1731 Jul 31 16:11 chunk-2.txt

Print permissions from file arguments in Bash script

I'm having trouble reading the permissions of file arguments. It looks like it has something to do with hidden files, but I'm not sure why.
Current Code:
#!/bin/bash
if [ $# = 0 ]
then
echo "Usage ./checkPerm filename [filename2 ... filenameN]"
exit 0
fi
for file in $@
do
ls -l | grep $file | cut -f1 -d' '
# Do Something
done
I can get the permissions for each input, but when a hidden file is run through the loop it re-prints the permissions of all files.
-bash-4.1$ ll test*
-rw-r--r-- 1 user joe 0 Nov 11 19:07 test1
-r-xr-xr-x 1 user joe 0 Nov 11 19:07 test2*
-r--r----- 1 user joe 0 Nov 11 19:07 test3
-rwxr-x--- 1 user joe 0 Nov 11 19:07 test4*
-bash-4.1$ ./checkPerm test*
-rw-r--r--
-rw-r--r--
-r-xr-xr-x
-r--r-----
-rwxr-x---
-r--r-----
-rw-r--r--
-r-xr-xr-x
-r--r-----
-rwxr-x---
-bash-4.1$
What is going on in the loop?
It's your grep:
ls -l | grep 'test2*'
This will grep out anything starting with test since you're basically asking for anything starting with test that might end with 0 or more 2s in it, as specified by the 2*.
To get your intended result, simply remove your loop and replace it with this:
ls -l "$@" | cut -d' ' -f1
Or keep your loop, but remove the grep:
ls -l $file | cut -d' ' -f1
Also, technically, none of those files are hidden. Hidden files in bash start with ., like .bashrc.
When you do the ls -l inside the loop and then grep the results, if there are files that contain test1 in the name, but not at the start, they are selected by the grep, giving you extra results. You could see that by doing:
ls -l | grep test
and seeing that there are many more entries than the 4 you get with ls -l test*.
Inside your loop, you should probably use just:
ls -ld "$file" | cut -d' ' -f1
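Putting the two suggestions together, the loop could look like the sketch below. The function name print_perms is made up for this illustration. On systems with SELinux contexts or ACLs, ls may append a . or + to the permission string, which this version passes through unchanged.

```shell
#!/bin/bash
# Hypothetical helper: print the permission string of each file argument.
print_perms() {
    local file
    # "$@" keeps every argument intact, even names containing spaces.
    for file in "$@"; do
        # -d lists a directory itself rather than its contents;
        # -- guards against names that start with a dash.
        ls -ld -- "$file" | cut -d' ' -f1
    done
}
```

Called as print_perms test*, this prints exactly one line per matched file, because each name is passed to ls directly instead of being used as a grep pattern.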

Using variables with sed

I'm trying to delete a part of a file using sed in Linux (Ubuntu). Specifically, I want to delete the first lines of a log file until the first occurrence of the current system date (using the pattern '10 Jan 13').
So, I store the date in a variable
root@server:/# VAR_DATE=`date -R | cut -c6-11`
And after that, I use sed
root@server:/# cat log_file.txt | sed -n -e '/$VAR_DATE/,$p'
But it doesn't work. I've tried a lot of combinations with the same result:
root@server:/# cat log_file.txt | sed -n -e '/"$VAR_DATE"/,$p'
root@server:/# cat log_file.txt | sed -n -e '/"${VAR_DATE}"/,$p'
root@server:/# cat log_file.txt | sed -n -e "/$VAR_DATE/,$p"
What am I doing wrong?
Use double quotes so that the shell expands the variable $vardate, and escape the last $ so that the shell leaves it alone and sed receives it as the last-line address: sed -n "/$vardate/,\$p" file
$ cat file
6 Jan 13
7 Jan 13
8 Jan 13
9 Jan 13
10 Jan 13
11 Jan 13
12 Jan 13
13 Jan 13
$ vardate="10 Jan 13"
$ sed -n "/$vardate/,\$p" file
10 Jan 13
11 Jan 13
12 Jan 13
13 Jan 13
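The same quoting rule can be packaged as a function. This is a sketch with a made-up name, print_from, and it assumes the pattern contains no / characters, since / is used as the sed delimiter here.

```shell
#!/bin/bash
# Hypothetical helper: print FILE from the first line matching PATTERN to the end.
print_from() {
    local pattern=$1 file=$2
    # Double quotes let the shell expand $pattern;
    # \$ stays literal so sed sees "$", its address for the last line.
    sed -n "/$pattern/,\$p" "$file"
}
```

With the file shown above, print_from "10 Jan 13" file prints the last four lines. In the question's fourth attempt, the unescaped $p was expanded by the shell (to an empty string unless a variable p happens to be set), so sed received an incomplete address range.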
