How is it possible to extract substring based on regex in linux shell?

The git submodule status | grep yourSubmodule command gives one of these results:
+1b2377f523dca6fa0c49bd7fa56eeb32011774e1 yourSubmodule (remotes/origin/HEAD)
1b2377f523dca6fa0c49bd7fa56eeb32011774e1 yourSubmodule (remotes/origin/HEAD)
I would like to extract the hash from the result. I created a regex, which captures the hash in a group:
^(?:\+|\s)([0-9a-z]+)\syourSubmodule\s
How is it possible to use it in the shell? Maybe the grep is not even needed. I've found the documentation of sed command, but it is very confusing.

I suggest:
git submodule status | awk '/yourSubmodule/{print $1}'
Output:
+1b2377f523dca6fa0c49bd7fa56eeb32011774e1
1b2377f523dca6fa0c49bd7fa56eeb32011774e1
or with + and space as field separator:
git submodule status | awk -F '[+ ]' '/yourSubmodule/{print $2}'
Output:
1b2377f523dca6fa0c49bd7fa56eeb32011774e1
1b2377f523dca6fa0c49bd7fa56eeb32011774e1
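Since the question mentions sed, here is a minimal sketch of the same extraction with sed -E and a capture group. The function name submodule_hash is made up for illustration, and it assumes the submodule name contains no regex metacharacters. With real data the input would come from git submodule status; the sample lines below are the ones from the question.

```shell
# Hypothetical helper: print the hash of the submodule named in $1.
# Reads `git submodule status`-style lines on stdin.
submodule_hash() {
    # -n suppresses the default output; the trailing p prints only the lines
    # on which the substitution succeeded. [+ ]? allows an optional leading
    # "+" or space, and \1 is the captured hash.
    sed -nE "s/^[+ ]?([0-9a-f]+) $1( .*)?\$/\1/p"
}

# With a real repository: git submodule status | submodule_hash yourSubmodule
printf '%s\n' \
    '+1b2377f523dca6fa0c49bd7fa56eeb32011774e1 yourSubmodule (remotes/origin/HEAD)' \
    ' 1b2377f523dca6fa0c49bd7fa56eeb32011774e1 yourSubmodule (remotes/origin/HEAD)' |
    submodule_hash yourSubmodule
```

Lines for other submodules produce no output, so the separate grep is not needed.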

How is it possible to use it in the shell? Maybe the grep is not even needed. Using Bash's =~ operator (emulating git with cat below, not forcing useless use of it):
$ cat file |
for stdin in /dev/stdin
do
while IFS= read -r line
do [[ "$line" =~ yourSubmodule ]] &&
[[ "$line" =~ [a-f0-9]{40} ]] &&
echo "${BASH_REMATCH[0]}"
done < "$stdin"
done
Output:
1b2377f523dca6fa0c49bd7fa56eeb32011774e1
1b2377f523dca6fa0c49bd7fa56eeb32011774e1
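If you want the capture group from the question rather than the whole match, something along these lines might work. This is only a sketch: hash_from_line is a hypothetical name, it needs Bash (not plain sh), and it assumes the hash is always 40 hex characters as in the sample output.

```shell
# Hypothetical helper: print the 40-character hash from one status line ($1)
# if that line belongs to the submodule named in $2.
hash_from_line() {
    local line=$1 name=$2
    # Keep the regex in a variable: quoting it on the right-hand side of =~
    # would turn it into a literal string comparison.
    local re="^[+ ]?([0-9a-f]{40}) ${name}( |\$)"
    if [[ $line =~ $re ]]; then
        # BASH_REMATCH[0] is the whole match, [1] is the first capture group.
        printf '%s\n' "${BASH_REMATCH[1]}"
    fi
}

hash_from_line '+1b2377f523dca6fa0c49bd7fa56eeb32011774e1 yourSubmodule (remotes/origin/HEAD)' yourSubmodule
```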

Instead of constructing the whole list, just ask for what you want directly:
git rev-parse :path/to/submodule # what's recorded for it
git -C path/to/submodule rev-parse @ # what's there now

Related

Bash regex for just numbers and dots

There's a folder with two files in it like: filename-3.0.1-extra.jar and filename-3.0.1.jar. The number and dots in the middle are the version, which can change. I'm trying to copy filename-3.0.1.jar to another folder.
Something like:
cp folder1/filename-*.jar otherfolder/
But the wildcard * matches both files. I'm trying to copy just the file without the -extra at the end. So I'm trying to match filename on just numbers and dots when I copy, something like this:
cp folder1/filename-[0-9.].jar otherfolder/.
But that's not the right syntax for the regex. Would appreciate any help here!
UPDATE:
I got it somewhat working with this:
ls | grep -e "filename-[0-9]\.[0-9]\.[0-9]\.jar"
But the regex seems a bit rigid. Is there a way to shorten it to something like "filename-([0-9]+[\.])+jar"?
So that even cases like filename-32.430.3.jar would also get captured?

Using extglob you can do this:
shopt -s extglob
cp folder1/filename-+([0-9.]).jar otherfolder/
Here +([0-9.]) will match 1 or more of any digits or dots.
Based on your edited question it appears you're trying to use a grep with a regular expression. You can use this grep solution:
printf '%s\n' *.* | grep -E '^filename-([0-9]+\.)+jar$'
filename-3.0.1.jar
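To turn the grep filter into an actual copy without enabling extglob, a loop like the following could be used. It is a sketch under assumptions: copy_versioned_jars is a made-up name, and the demo runs in a throwaway directory created with mktemp rather than in your real folders.

```shell
# Hypothetical helper: copy only filename-<digits and dots>.jar from $1 to $2.
copy_versioned_jars() {
    src=$1
    dest=$2
    for path in "$src"/filename-*.jar; do
        [ -e "$path" ] || continue      # the glob matched nothing
        name=${path##*/}                # strip the directory part
        # Accept one or more dot-separated numbers, then .jar
        if printf '%s\n' "$name" | grep -Eq '^filename-[0-9]+(\.[0-9]+)*\.jar$'; then
            cp -- "$path" "$dest"/
        fi
    done
}

# Demo in a temporary directory
work=$(mktemp -d)
mkdir "$work/folder1" "$work/otherfolder"
touch "$work/folder1/filename-3.0.1.jar" \
      "$work/folder1/filename-3.0.1-extra.jar" \
      "$work/folder1/filename-32.430.3.jar"
copy_versioned_jars "$work/folder1" "$work/otherfolder"
ls "$work/otherfolder"
rm -rf "$work"
```

The coarse glob narrows the candidates and the regular expression makes the final decision, so filename-3.0.1-extra.jar is skipped.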

you can do something like
cp "folder1/${##*.}" otherfolder
or
cd folder1 && cp -r -v $(echo -e $(ls | grep -e "[0-9]*\.*")) otherfolder/. && cd ..

Given:
$ ls -1 *.jar
filename-3.0.1-extra.jar
filename-3.0.1.jar
You can use a loop and filter out those that match *-extra*:
for fn in *.jar; do # with this glob, what DO you want
[[ $fn != @(*-extra*) ]] && echo "$fn" # and what you DONT want
done
Prints:
filename-3.0.1.jar
So your loop could be:
for fn in *.jar; do
[[ $fn != @(*-extra*) ]] && cp "$fn" otherfolder/
done

Using grep in an if statement

My goal is to write a shell script that takes the users I have already filtered out of a file and checks whether those users have a certain string; if they do, it labels them as major, and if not, as nonmajor. My trouble comes from my first if statement, and I'm not sure whether grep is the right way to go in an if statement. Here is what I have:
(
while read i
do
username=`echo $i | grep -v 'CMPSC 1513' | grep -P -v '(?!.*CPSMA 2923)CPSMA' | cut -d'|' -f2`
fullname=`echo $i | grep -v 'CMPSC 1513' | grep -P -v '(?!.*CPSMA 2923)CPSMA' | cut -d'|' -f3`
id=`echo $i | grep -v 'CMPSC 1513' | grep -P -v '(?!.*CPSMA 2923)CPSMA' | cut -d'|' -f4`
if [ $username ]
then
if grep -q "|0510"
then
echo $username":(password):(UID):(GID):"$fullname"+"$id":/home/STUDENTS/majors:/bin/bash"
else
echo $username":(password):(UID):(GID):"$fullname"+"$id":/home/STUDENTS/nonmajors:/bin/bash"
fi
fi
done
)<./cs_roster.txt
Just some info, this is contained in a while loop. In the while loop, i determine whether the person listed should even be major or nonmajor, and my if [ $username ] has been tested and does return all the correct users. At this point the while loop is only running once and then stopping.

Just remove the square brackets and pass $i to grep:
if echo "$i" | grep -q "|0510"
In your code sample, grep does not have anything to work on.

The "binary operator expected" occurs because you are invoking the command [ with the arguments "grep" and "-q" (you are not invoking grep at all), and [ expects a binary operator where you have specified -q. [ is a command, treated no differently than grep or ls or cat. It is better (IMO) to spell it test, and when invoked by the name test it does not require that its last argument be ]. If you want to use grep in an if statement, just do something like:
if echo "$username" | grep -q "|0510"; then ...
(Although I suspect, depending on the context, there are better ways to accomplish your goal.)
The basic syntax of an if statement is if pipeline; then.... In the common case, the pipeline is the simple command test, and at some point in pre-history, the decision was made to provide the name [ for the test command with the added caveat that its final argument must be ]. I believe this was done in an effort to make if statements look more natural, as if the [ is an operator in the language. Just ignore [ and always use test and much confusion will be avoided.
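To illustrate that if only looks at the exit status of the pipeline, here is a small sketch. The function name classify and the sample roster line are invented for the example; -F makes grep treat |0510 as a fixed string.

```shell
# Hypothetical helper: print "majors" if the roster line in $1
# contains the literal text |0510, otherwise "nonmajors".
classify() {
    # No [ or test here: if runs the pipeline and branches on its exit status.
    if printf '%s\n' "$1" | grep -qF '|0510'; then
        echo majors
    else
        echo nonmajors
    fi
}

classify 'x|jdoe|John Doe|0510123'
classify 'x|asmith|Ann Smith|0420456'
```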

You can use this code as an exercise. Write an awk script for it, or start with something like
while IFS='|' read -r f1 username fullname id otherfields; do
# I don't know which field you want to test. I will test with id
if [[ $id =~ ^0510 ]]; then
subdir=majors
else
subdir=nonmajors
fi
echo "${username}:(password):(UID):(GID):${fullname}+${id}:/home/STUDENTS/${subdir}:/bin/bash"
done < <( grep -v 'CMPSC 1513' ./cs_roster.txt | grep -P -v '(?!.*CPSMA 2923)CPSMA' )
This is nice for learning some bash syntax, but consider an awk script for avoiding a while-loop.
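A rough idea of what such an awk script might look like is below. It is only a sketch: roster_to_passwd is a made-up name, the field layout (username, full name and id in fields 2 to 4) is taken from the read command above, and the PCRE filter on CPSMA is deliberately left out because awk has no lookahead.

```shell
# Hypothetical helper: read roster lines on stdin, print passwd-style lines.
roster_to_passwd() {
    awk -F'|' '
        /CMPSC 1513/ { next }    # same effect as grep -v "CMPSC 1513"
        {
            subdir = ($4 ~ /^0510/) ? "majors" : "nonmajors"
            printf "%s:(password):(UID):(GID):%s+%s:/home/STUDENTS/%s:/bin/bash\n", $2, $3, $4, subdir
        }
    '
}

# Invented sample data in the assumed format
printf '%s\n' \
    'x|jdoe|John Doe|0510123|CMPSC 1113' \
    'x|asmith|Ann Smith|0420456|CMPSC 1113' \
    'x|bobk|Bob King|0510789|CMPSC 1513' |
    roster_to_passwd
```

With the real file you would call it as roster_to_passwd < ./cs_roster.txt.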

Searching a string in shell script

I am trying to learn shell scripting, so sorry if my question is too simple.
I have a file called one.txt, and if either of the strings 1.2 or 1.3 is present in the file, I have to display the success message, otherwise the failure message.
The code I tried is as follows:
#!/bin/bash
echo "checking"
if grep -q 1.2 /root/one | grep -q 1.3 /root/one; then
echo " vetri Your NAC version"
fi
What am I doing wrong here?

You can also include the OR in your grep pattern like so:
grep '1.2\|1.3' /root/one
details here
Update:
as twalberg pointed out in the comment, my answer was not precise enough. The better pattern is:
grep '1\.2\|1\.3' /root/one
Or even better, because more compact:
grep '1\.[23]' /root/one
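To see why the escaped dot matters, here is a small sketch. The function name has_nac_version and the sample file content are invented; the temporary file stands in for /root/one.

```shell
# Hypothetical helper: succeed if file $1 contains 1.2 or 1.3.
has_nac_version() {
    grep -q '1\.[23]' "$1"
}

sample=$(mktemp)
printf 'NAC build 152\n' > "$sample"    # contains neither 1.2 nor 1.3

if has_nac_version "$sample"; then
    echo "found"
else
    echo "not found"
fi

# An unescaped dot matches any character, so 1.2 also matches 152.
grep -q '1.2' "$sample" && echo "unescaped pattern matched 152"

rm -f "$sample"
```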

You have to use ||
#!/bin/bash
echo "checking"
if grep -q 1.2 /root/one || grep -q 1.3 /root/one; then
echo " vetri Your NAC version"
fi
Single | operator is called pipe. It will pass the output of the command before | to the command after |.

It is better to join these greps with | (OR operator), escaping the dot so that it only matches a literal dot:
grep '1\.2\|1\.3'
or
grep -E '1\.2|1\.3'

I guess the easier way to do this is to create a variable to check the count of occurrences:
#!/bin/bash
echo "checking"
CHECK=$(grep -Ec '1\.(2|3)' /root/one)
if [ "$CHECK" -gt 0 ]; then
echo "vetri Your NAC version"
fi

Removing lines matching a pattern

I want to search for patterns in a file and remove the lines containing the pattern. To do this, I am using:
originalLogFile='sample.log'
outputFile='3.txt'
temp=$originalLogFile
while read line
do
echo "Removing"
echo $line
grep -v "$line" $temp > $outputFile
temp=$outputFile
done <$whiteListOfErrors
This works fine for the first iteration. For the second run, it throws :
grep: input file ‘3.txt’ is also the output
Any solutions or alternate methods?

The following should be equivalent
grep -v -f "$whiteListOfErrors" "$originalLogFile" > "$outputFile"
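A small self-contained sketch of that one-liner is below. The function name filter_log and the sample files are invented, and -F is an addition that assumes the whitelist holds fixed strings rather than regular expressions. Watch out for empty lines in the pattern file: an empty pattern matches every line, so -v would then remove everything.

```shell
# Hypothetical helper: print the lines of log file $1 that match none of
# the fixed strings listed (one per line) in file $2.
filter_log() {
    grep -v -F -f "$2" "$1"
}

work=$(mktemp -d)
printf '%s\n' 'ERROR disk full' 'INFO started' 'ERROR timeout' > "$work/sample.log"
printf '%s\n' 'disk full' 'timeout' > "$work/whitelist.txt"

filter_log "$work/sample.log" "$work/whitelist.txt" > "$work/3.txt"
cat "$work/3.txt"
rm -rf "$work"
```

Because the input is read in a single pass and written to a different file, the "input file is also the output" error cannot occur.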

originalLogFile='sample.log'
outputFile='3.txt'
tmpfile='tmp.txt'
temp=$originalLogFile
while read line
do
echo "Removing"
echo $line
grep -v "$line" $temp > $outputFile
cp "$outputFile" "$tmpfile"
temp=$tmpfile
done <$whiteListOfErrors

Use sed for this:
sed '/.*pattern.*/d' file
If you have multiple patterns you may use the -e option
sed -e '/.*pattern1.*/d' -e '/.*pattern2.*/d' file
If you have GNU sed (typical on Linux) the -i option is comfortable as it can modify the original file instead of writing to a new file. (But handle with care, in order to not overwrite your original)
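One way to handle -i with care is to give it a backup suffix. The following is a sketch, not a drop-in tool: delete_matching_lines is a made-up name, and it assumes the patterns contain no / characters.

```shell
# Hypothetical helper: delete from file $1 every line matching any of the
# remaining arguments, keeping the original as $1.bak.
delete_matching_lines() {
    file=$1
    shift
    # Turn each pattern argument into a pair: -e '/pattern/d'
    for pat in "$@"; do
        set -- "$@" -e "/$pat/d"
        shift
    done
    sed -i.bak "$@" "$file"
}

work=$(mktemp -d)
printf '%s\n' 'keep this' 'pattern1 here' 'also pattern2' > "$work/file"
delete_matching_lines "$work/file" pattern1 pattern2
cat "$work/file"
ls "$work"
rm -rf "$work"
```

If the result is not what you expected, the untouched original is still available as file.bak.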

Used this to fix the problem:
while read line
do
echo "Removing"
echo $line
grep -v "$line" $temp | tee $outputFile
temp=$outputFile
done <$falseFailures

A trivial solution might be to work with alternating files; e.g.
idx=0
while ...
let next='(idx+1) % 2'
grep ... $file.$idx > $file.$next
idx=$next
A more elegant approach might be to build one large grep command:
args=( -v )
while read -r line; do args+=( -e "$line" ); done < "$whiteList"
grep "${args[@]}" "$origFile"

how to loop files in linux from svn status

Being quite a newbie in Linux, I have the following question.
I have a list of files (this time resulting from svn status) and I want to create a script to loop over them all and replace tabs with 4 spaces.
So I want from
....
D HTML/templates/t_bla.tpl
M HTML/templates/t_list_markt.tpl
M HTML/templates/t_vip.tpl
M HTML/templates/upsell.tpl
M HTML/templates/t_warranty.tpl
M HTML/templates/top.tpl
A + HTML/templates/t_r1.tpl
....
to something like
for i in <files>; expand -t4;do cp $i /tmp/x;expand -t4 /tmp/x > $i;done;
but I don't know how to do that...

You can use this command:
svn st | cut -c8- | xargs ls
This keeps everything from the eighth character onward, which drops the leading Subversion status flags and leaves only the file names. You can also add grep before cut to keep only certain kinds of changes, for example grep '^M' for modified files. xargs passes the list of files as arguments to the given command (ls in this case).
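If you would rather not depend on the exact column width, a variant with awk might look like this. It is a sketch under assumptions: svn_changed_files is an invented name, paths must not contain whitespace, and deleted entries (status D) are skipped because those files no longer exist on disk. The sample lines imitate svn status output.

```shell
# Hypothetical helper: read `svn status` lines on stdin and print the path
# (the last field) of every entry that is not marked as deleted.
svn_changed_files() {
    awk '$1 != "D" { print $NF }'
}

# With a real working copy: svn st | svn_changed_files | xargs ls
printf '%s\n' \
    'D       HTML/templates/t_bla.tpl' \
    'M       HTML/templates/t_vip.tpl' \
    'A  +    HTML/templates/t_r1.tpl' |
    svn_changed_files
```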

I would use sed, like so:
for i in $files
do
sed -i 's/\t/    /g' "$i"
done
That big block in there is four spaces. ;-)
I haven't tested that, but it should work. And I'd back up your files just in case. The -i flag means that it will do the replacements on the files in-place, but if it messes up, you'll want to be able to restore them.
This assumes that $files contains the filenames. However, you can also use Adam's approach at grabbing the filenames, just use the sed command above without the "$i".

Not asking for any votes, but for the record I'll post the combined answer from @Adam Byrtek and @Dan Fego:
svn st | cut -c8- | xargs sed -i 's/\t/    /g'

I could not test it with real subversion output, but this should do the job:
svn st | cut -c8- | while read -r file; do expand -t4 "$file" > "$file-temp"; mv "$file-temp" "$file"; done
svn st | cut -c8- will generate a list of files without subversion flags. read will then save each entry in the variable $file and expand is used to replace the tabs with four spaces in each file.
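The expand step can also be wrapped in a small function that uses a proper temporary file. This is a sketch only: detab is a made-up name, and writing the result back with cat (instead of mv) is a choice made here so that the original file keeps its permissions.

```shell
# Hypothetical helper: replace tabs with spaces (tab stops every 4 columns)
# in each file given as an argument.
detab() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # Expand into the temporary file first, then overwrite the original.
        expand -t 4 "$f" > "$tmp" && cat "$tmp" > "$f" || { rm -f "$tmp"; return 1; }
        rm -f "$tmp"
    done
}

sample=$(mktemp)
printf '\tindented\n' > "$sample"
detab "$sample"
cat "$sample"
rm -f "$sample"
```

Note that expand aligns to tab stops, so a tab in the middle of a line may become fewer than four spaces, unlike the sed replacement.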

Not quite what you're asking, but perhaps you should be looking into commit hooks in subversion?
You could create a hook to block check-ins of any code that contains tabs at the start of a line, or contains tabs at all.
In the repo directory on your subversion server there'll be a directory called hooks. Put something in there which is executable called 'pre-commit' and it'll be run before anything is allowed to be committed. It can return a status to block the commit if you wish.
Here's what I have to stop php files with syntax errors being checked in:
#!/bin/bash
REPOS="$1"
TXN="$2"
PHP="/usr/bin/php"
SVNLOOK=/usr/bin/svnlook
$SVNLOOK log -t "$TXN" "$REPOS" | grep "[a-zA-Z0-9]" > /dev/null
if [ $? -ne 0 ]
then
echo 1>&2
echo "You must enter a comment" 1>&2
exit 1
fi
CHANGED=`$SVNLOOK changed -t "$TXN" "$REPOS" | awk '{print $2}'`
for LINE in $CHANGED
do
FILE=`echo $LINE | egrep \\.php$`
if [ $? == 0 ]
then
MESSAGE=`$SVNLOOK cat -t "$TXN" "$REPOS" "${FILE}" | $PHP -l`
if [ $? -ne 0 ]
then
echo 1>&2
echo "***********************************" 1>&2
echo "PHP error in: ${FILE}:" 1>&2
echo "$MESSAGE" | sed "s| -| $FILE|g" 1>&2
echo "***********************************" 1>&2
exit 1
fi
fi
done
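For the tab check itself, the test inside such a hook could be as small as the following sketch. The function name has_tabs is invented; in the hook it would presumably read the output of $SVNLOOK cat -t "$TXN" "$REPOS" "$FILE" instead of the sample text used here.

```shell
# Hypothetical helper: succeed if the text on stdin contains a tab character.
has_tabs() {
    tab=$(printf '\t')    # a literal tab, portable across grep versions
    grep -q "$tab"
}

if printf 'a\tb\n' | has_tabs; then
    echo "tabs found, commit would be blocked" 1>&2
fi
```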
