Perl - Print substituted string on command line only if substitution occurred - linux

I want to print the names of all .sh files in a directory, so I tried the following command:
ll | perl -pe 's/(.*)([0-9]*:[0-9]* *)(.*\.sh)/$3/'
The last capture group, (.*\.sh), is what I am interested in. The command works correctly for all the files that have the .sh extension. However, it also prints the entire, unmodified line for every other file, where the substitution did not happen.
Is there a way I can specify that perl should print the output only if substitution actually occurred?

If you use -n instead of -p, Perl will loop over the lines without automatically printing them out. Then you can add your own print statement that's conditional on the success of the substitution:
ll | perl -ne 'print if s/(.*)([0-9]*:[0-9]* *)(.*\.sh)/$3/'
This is only a bit longer than the equivalent sed -ne 's/.../.../p', which is what I would use for simple cases that don't require more complicated logic beyond the substitution+print.
Also, those extra groups aren't doing anything in this example, so you could simplify to this:
ll | perl -ne 'print if s/^.*[0-9]:[0-9]*\s+(.*\.sh)$/$1/'
Or, since you're doing the printing yourself instead of letting -p do it for you, you can skip the substitution, leave $_ alone, and just print out the part you want. That means you don't get the newline for free, though. Here I used -l to compensate for that:
ll | perl -lne 'print $1 if /^.*[0-9]:[0-9]*\s+(.*\.sh)$/'
But you could also use the modern convenience function say, which -E enables:
ll | perl -nE 'say $1 if /^.*[0-9]:[0-9]*\s+(.*\.sh)$/'

Why not use find? This is exactly what it was created for:
find . -type f -regex ".*\.sh"
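Note that ll only lists the current directory, while find . recurses into subdirectories. A sketch that stays in one directory and uses a plain -name glob instead of -regex; -maxdepth is supported by GNU and BSD find but is not in POSIX, and list_sh_files is just an illustrative name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: regular files ending in .sh, directly inside the
# given directory (default: current directory), without recursing.
list_sh_files() {
  # -maxdepth goes first; GNU find warns if it follows a test like -type
  find "${1:-.}" -maxdepth 1 -type f -name '*.sh'
}

# Usage: list_sh_files            # current directory
#        list_sh_files ~/scripts
```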

Related

Copying files from text list

I have a list of sample names in text format e.g.
Sample1
Sample2
etc.
I'm trying to find and copy files with these names and a specific extension using the one-liner below:
find ./ | egrep fq.gz | fgrep -f list.txt | perl -ne 'chomp; system "cp $_ /data/copy_of_files/"'
No errors are thrown up but nothing is copied.
This works until I pass the output to perl (the list of correct files prints in the terminal from the fgrep), so I think my issue is with the perl section.
Any suggestions?
Both your original command and this command worked for me:
find . -name '*.gz' | fgrep -f list.txt | \
perl -ne 'chomp; system("cp $_ <DIR>");'
Have you verified that your user or group have write permissions to /data/copy_of_files?
Suggestions:
Don't use Perl here, but instead, create a pipeline that just spits out the cp commands. As soon as you're satisfied with what you see, append | sh -x.
Be absolutely sure that none of your file names contain whitespace or other characters special to the shell. If some do, but only a little (e.g. only spaces), you may get by with appropriate quoting in the cp commands, but if anything is possible in filenames, a different approach will be required, and I would probably write the whole thing in Perl using File::Find::Rule.
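A sketch of the "just print the cp commands first" suggestion that also survives spaces and other special characters in file names. Assumptions: list.txt holds one sample name per line, a file is wanted when its base name ends in fq.gz and contains one of those names, and print_cp_commands is a made-up name. Because printf %q produces bash quoting, the output should be piped to bash -x rather than sh -x.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) one cp command per matching file.
# $1 = list of sample names, $2 = directory to search, $3 = destination.
print_cp_commands() {
  local list=$1 src=$2 dest=$3 f
  # -print0 / read -d '' keep names with spaces or newlines in one piece
  find "$src" -type f -name '*fq.gz' -print0 |
    while IFS= read -r -d '' f; do
      # match the base name only (the original fgrep matched the whole path)
      if printf '%s\n' "${f##*/}" | grep -qF -f "$list"; then
        # %q quotes each path so bash can read it back safely
        printf 'cp -- %q %q\n' "$f" "$dest/"
      fi
    done
}

# Inspect first:   print_cp_commands list.txt . /data/copy_of_files
# Then execute:    print_cp_commands list.txt . /data/copy_of_files | bash -x
```

One thing to watch for: an empty line in list.txt matches every file, so strip blank lines from the list first.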

Complex shell wildcard

I want to use echo to display the names (not the contents) of directories whose names are at least 2 characters long but don't begin with "an".
For example if had the following in the directory:
a as an23 an23 blue
I would only get back:
as blue
I tried echo ^an* but that returns the directory with a 1-character name too.
Is there any way I can do this in the form echo <glob pattern>?
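You can use the shell's extended globbing feature. In bash:
bash$ shopt -s extglob
bash$ echo !(@(?|an*))
The @(?|an*) part matches any one-character name or any name starting with an, and the !() construct inverts it; see the Pattern Matching section of the bash manual for more.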
In zsh:
zsh$ setopt extendedglob
zsh$ print *~(?|an*)
In this case the ~ excludes, from the names matched by the pattern before it (*), every name that matches the pattern after it ((?|an*)). See the zsh manual for more.
Since you want at least two characters in the names, you can use printf '%s\n' ??* to echo each such name on a separate line. You can then eliminate those names that start with an with grep -v '^an', leading to:
printf '%s\n' ??* | grep -v '^an'
The quotes aren't strictly necessary in the grep command with modern shells. Once upon a time, a quarter of a century or so ago, the Bourne shell treated ^ as a synonym for |, so I still put quotes around carets.
If you absolutely must use echo instead of printf, then you'll have to map white space to newlines (assuming you don't have any names that contain white space).
I'm trying to do it with just the echo command, no grep either. Is that possible?
What about:
echo [!a]?* a[!n]*
The first term lists all the two-plus character names not beginning with a; the second lists all the two-plus character names where the first is a and the second is not n.
This should do it, but you'd likely be better off with ls or even find:
echo * | tr ' ' '\012' | egrep '..' | egrep -v '^an'
Shell globbing is a form of pattern matching similar to regexes, but it's not as powerful as egrep's regular expressions.
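A small bash sketch for trying the glob-only answers in a scratch directory (the function names and the directory argument are hypothetical). Two details matter here: extglob has to be enabled before bash parses a !(...) pattern, so it is set at the top rather than inside the function, and nullglob stops a pattern that matches nothing from being echoed literally. Also note that echo [!a]?* a[!n]* prints two separately sorted groups rather than one sorted list.

```shell
#!/usr/bin/env bash
# extglob must be on before bash parses the !(...) pattern below
shopt -s extglob

# Names of 2+ characters not starting with "an", using plain globs only.
names_plain_glob() {
  (                       # subshell: cd and shopt changes stay local
    cd -- "$1" || exit
    shopt -s nullglob     # an unmatched pattern expands to nothing
    echo [!a]?* a[!n]*
  )
}

# Same result with one extended glob: everything except "one character"
# or "starts with an".
names_extglob() {
  (
    cd -- "$1" || exit
    shopt -s nullglob
    echo !(@(?|an*))
  )
}

# Usage: names_plain_glob /some/dir
```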

Linux Prompt Change Content Within File based on File Name

I know how to do a search and replace amongst group of files:
perl -pi -w -e 's/search/replace/g;' *.php
So I can use that to search for a keyword or phrase and change it. But I have a more complicated task that I don't know how to do.
I want to do a search and replace across all my php files, searching for a specific keyword and replacing it with the file name minus the extension.
Example: Search the file Mountains.php for the keyword Trees and everywhere you see Trees, replace it with Mountains
Of course I want to be able to do that in batch, for a few hundred php files all with different names, however, all containing the search term Trees.
If someone is looking for an extra challenge (haha), it would be even better if I could handle a more complex scenario, such as:
Example: Search the file MountainTowns.php for the keyword Trees and everywhere you see Trees, replace it with "Mountain Towns" (note the extra space: capital letters would indicate where spaces go).
Thanks for your time and considering my question.
Well, the filename is in $ARGV, so there is not much more work needed.
perl -i -pe '($x=$ARGV)=~s{\.php$}{};s{Trees}{$x}g' BlueMountains.php RedMountains.php
Add in
$x=~s{(.)([A-Z])}{$1 $2}g;
to insert a space before each uppercase letter that follows another character, for a complete line of:
perl -i -pe '($x=$ARGV)=~s{\.php$}{};$x=~s{(.)([A-Z])}{$1 $2}g;s{Trees}{$x}g' BlueRedMountains.php
This might work for you:
printf "%s\n" *.php |perl -pwe 's|(.*).php|perl -pi -we "s/Trees/$1/g;" $&|' | bash
This uses perl to write a script to do your bidding.
Other little languages could be employed, like awk or:
printf "%s\n" *.php |sed 'h;s/\.php//;s/\B[A-Z]/ &/g;G;s|\(.*\)\n\(.*\)|sed -i "s/Trees/\1/g" \2|' | bash
This uses sed to provide a solution for the second request.
You want a separate replacement for each file, so run a separate search and replace for each:
for file in *.php; do sed -i "s/foo/${file%.*}/g" "$file"; done
Your second request is a bit harder; it requires at least a command substitution (a subshell).
for file in *.php; do sed -i "s/bar/$(echo "${file%.*}" | sed 's/\(.\)\([A-Z]\)/\1 \2/g')/g" "$file"; done
It's a bit more readable if you put it in a script:
#!/bin/bash
for file in "$@"; do
  replacement=$(echo "${file%.*}" | sed 's/\(.\)\([A-Z]\)/\1 \2/g')
  sed -i "s/bar/$replacement/g" "$file"
done
This will work over all the arguments passed it, so call with ./script.sh *.php.

How to stop sed from buffering?

I have a program that writes to fd3 and I want to process that data with grep and sed. Here is how the code looks so far:
exec 3> >(grep "good:"|sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3
Nothing is output until I do a
exec 3>&-
Then, everything that I wanted finally arrives as I expected:
I got: data2
It seems to reply immediately if I use only a grep or only a sed, but mixing them seems to cause some sort of buffering. How can I get immediate output from fd3?
I think I found it. For some reason, grep doesn't automatically do line buffering. I added a --line-buffered option to grep and now it responds immediately.
You only need to tell grep and sed not to buffer their output:
grep --line-buffered
and
sed -u
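A rough way to check that the grep --line-buffered and sed -u combination really passes lines through immediately. This sketch swaps the question's exec 3> >(...) for a bash coprocess, only so the result can be read back with a timeout while the input side is still open; the function name and the 5-second timeout are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hypothetical check: returns the first filtered line, read while the
# filter's input is still open (GNU grep and GNU sed assumed).
first_filtered_line() {
  local line in_fd out_fd pid
  coproc FILTER { grep --line-buffered 'good:' | sed -u 's/.*:\(.*\)/I got: \1/'; }
  in_fd=${FILTER[1]} out_fd=${FILTER[0]} pid=$FILTER_PID

  printf '%s\n' 'bad:data1' 'good:data2' >&"$in_fd"

  # Still no EOF on the input, so this read only succeeds in time if
  # neither grep nor sed is holding the line in a buffer.
  if IFS= read -r -t 5 line <&"$out_fd"; then
    printf '%s\n' "$line"
  else
    echo 'timed out: output is still buffered' >&2
  fi

  exec {in_fd}>&-          # close the write end: grep sees EOF and exits
  wait "$pid" 2>/dev/null
}

# Usage: first_filtered_line    # prints "I got: data2"
```

Dropping --line-buffered from grep should make the read time out, because grep then block-buffers its output to the pipe.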
An alternative way to stop sed from buffering is to run the script through the s2p sed-to-Perl translator and insert a directive to make its output command-buffered, perhaps like:
BEGIN { $| = 1 }
The other reason to do this is that it gives you the more convenient notation from EREs instead of the backslash-annoying legacy BREs. You also get the full complement of Unicode properties, which is often critical.
But you don’t need the translator for such a simple sed command. And you do not need both grep and sed, either. These all work:
perl -nle 'BEGIN{$|=1} if (/good:/) { s/.*:(.*)/I got: $1/; print }'
perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:(.*)/I got: $1/; print'
perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:/I got: /; print'
Now you also have access to the minimal (non-greedy) quantifiers *?, +?, ??, {N,}?, and {N,M}?. These allow things like .*?, \S+?, or [\p{Pd}.]??, which may well be preferable.
You can merge the grep into the sed like so:
exec 3> >(sed -une '/^good:/s//I got: /p')
echo "bad:data1">&3
echo "good:data2">&3
Unpacking that a bit: You can put a regexp (between slashes as usual) before any sed command, which makes it only be applied to lines that match that regexp. If the first regexp argument to the s command is the empty string (s//whatever/) then it will reuse the last regexp that matched, which in this case is the prefix, so that saves having to repeat yourself. And finally, the -n option tells sed to print only what it is specifically told to print, and the /p suffix on the s command tells it to print the result of the substitution.
The -e option is not strictly necessary but is good style; it just means "the next argument is the sed script, not a filename".
Always put sed scripts in single quotes unless you need to substitute a shell variable in there, and even then I would put everything but the shell variable in single quotes (the shell variable is, of course, double-quoted). You avoid a bunch of backslash-related grief that way.
On a Mac, brew install coreutils and use gstdbuf to control buffering of grep and sed.
Turning off buffering in the pipe seems to be the easiest and most generic answer. Using stdbuf (from coreutils):
exec 3> >(stdbuf -oL grep "good:" | sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3
I got: data2
Buffering also depends on the other programs in the pipe. For example, it matters whether mawk or gawk is reading this pipe:
exec 3> >(stdbuf -oL grep "good:" | awk '{ sub(".*:", "I got: "); print }')
In that case, mawk would hold on to (buffer) the input; gawk wouldn't.
See also How to fix stdio buffering

Linux command line: split a string

I have long file with the following list:
/drivers/isdn/hardware/eicon/message.c//add_b1()
/drivers/media/video/saa7134/saa7134-dvb.c//dvb_init()
/sound/pci/ac97/ac97_codec.c//snd_ac97_mixer_build()
/drivers/s390/char/tape_34xx.c//tape_34xx_unit_check()
(PROBLEM)/drivers/video/sis/init301.c//SiS_GetCRT2Data301()
/drivers/scsi/sg.c//sg_ioctl()
/fs/ntfs/file.c//ntfs_prepare_pages_for_non_resident_write()
/drivers/net/tg3.c//tg3_reset_hw()
/arch/cris/arch-v32/drivers/cryptocop.c//cryptocop_setup_dma_list()
/drivers/media/video/pvrusb2/pvrusb2-v4l2.c//pvr2_v4l2_do_ioctl()
/drivers/video/aty/atyfb_base.c//aty_init()
/block/compat_ioctl.c//compat_blkdev_driver_ioctl()
....
It contains all the functions in the kernel code. The notation is file//function.
I want to copy some 100 files from the kernel directory to another directory, so I want to strip the function name from every line, leaving just the file name.
It's super easy in Python; any idea how to write a one-liner at the bash prompt that does the trick?
Thanks,
Udi
cat "func_list" | sed "s#//.*##" > "file_list"
Didn't run it :)
You can use pure Bash:
while read -r line; do echo "${line%//*}"; done < funclist.txt
Edit:
The syntax of the echo command is doing the same thing as the sed command in Eugene's answer: deleting the "//" and everything that comes after.
Broken down:
"echo ${line}" is the same as "echo $line"
the "%" deletes the pattern that follows it if it matches the trailing portion of the parameter
"%" makes the shortest possible match, "%%" makes the longest possible
"//*" is the pattern to match, "*" is similar to sed's ".*"
See the Parameter Expansion section of the Bash man page for more information, including:
using ${parameter#word} for matching the beginning of a parameter
${parameter/pattern/string} to do sed-style replacements
${parameter:offset:length} to retrieve substrings
etc.
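A sketch of the expansions listed above, applied to one entry from the list. The second string, twice, is made up and contains two "//" separators only to show where % and %% differ; the function name is likewise hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical demo of bash parameter expansion on a file//function entry.
demo_expansions() {
  local line='/drivers/scsi/sg.c//sg_ioctl()'
  local twice='a//b//c'

  echo "${line%//*}"      # remove shortest "//..." suffix: the file name
  echo "${line##*//}"     # remove longest ".../" prefix up to "//": the function
  echo "${twice%//*}"     # shortest suffix match removes only "//c"
  echo "${twice%%//*}"    # longest suffix match removes "//b//c"
  echo "${line/\/\//:}"   # replace the first "//" with ":"
  echo "${line:1:7}"      # 7 characters starting at offset 1
}

# Usage: demo_expansions
```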
Here's a one-liner in (g)awk:
awk -F"//" '{print $1}' file
Here's one using cut and rev (the field number is 3 because, once reversed, the "//" separator leaves an empty second field):
rev file | cut -d'/' -f3- | rev
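A sketch that wraps the answers above as filters on stdin so they can be compared side by side (the function names are hypothetical). They agree only when each line contains exactly one "//", as in the kernel list: with more, the sed and awk versions cut at the first "//", while the bash and rev/cut versions cut at the last one. rev is not installed on every minimal system, so check that it exists before relying on the last filter.

```shell
#!/usr/bin/env bash
# Hypothetical filters: each turns "file//function" lines into "file".

strip_sed()  { sed 's#//.*##'; }                  # delete from the first "//"
strip_awk()  { awk -F'//' '{print $1}'; }         # first "//"-separated field

strip_bash() {
  local line
  while IFS= read -r line; do
    printf '%s\n' "${line%//*}"                   # delete from the last "//"
  done
}

# Reversed, "x.c//f()" becomes ")(f//c.x": field 1 is the function,
# field 2 is empty, so keep fields 3 onward.
strip_rev()  { rev | cut -d/ -f3- | rev; }

# Usage: strip_sed < func_list > file_list
```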
