Shell scripting: print selected text in a string - Linux

Log file name: "/home/msubra/WORK/tmo/LOG/BCH1043.9987.log"
From the above string I need to extract the content BCH1043.
The directory structure may differ, so the solution should match the string starting with BCH up to the dot.

There's no need to call basename; you can use the parameter substitution built into the shell for the whole thing:
$ cat x.sh
filepath="/home/msubra/WORK/tmo/LOG/BCH1043.9987.log"
# Strip off the path: everything from the first slash through the last one.
filename=${filepath##/*/}
# Then strip off everything after and including the first dot.
part1=${filename%%.*}
echo "$part1"
$ ./x.sh
BCH1043
$
A dot in the filepath will not cause trouble either.
See section 4.5.4 here for more info: http://docstore.mik.ua/orelly/unix3/korn/ch04_05.htm
Oh, and resist the temptation to get tricky and do it all in one line. Breaking it into separate components is much easier to debug and maintain down the road, and who knows, you may need those components too (the path and the rest of the file name).
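A quick way to check the claim that a dot in the directory part causes no trouble is to wrap the two expansions in a function and feed it a path whose directory names contain dots. The function name log_prefix and the second path are hypothetical. This sketch uses ${1##*/} instead of ${filepath##/*/}, which also copes with a relative path that has no leading slash.

```bash
# Hypothetical helper: print the part of a log file name before its first dot.
log_prefix() {
    filename=${1##*/}                 # strip everything up to and including the last slash
    printf '%s\n' "${filename%%.*}"   # strip everything from the first dot onward
}

log_prefix "/home/msubra/WORK/tmo/LOG/BCH1043.9987.log"   # BCH1043
log_prefix "/var/log.d/app.v2/BCH2001.1234.log"           # BCH2001
```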

basename will reduce /home/msubra/WORK/tmo/LOG/BCH1043.9987.log to BCH1043.9987.log (it takes the path as an argument; it does not read standard input):
basename /home/msubra/WORK/tmo/LOG/BCH1043.9987.log
You can use regular expressions, awk, perl, sed etc to extract "BCH1043" from "BCH1043.9987.log". First I need to know what the range of possible filenames is before I can suggest a regular expression for you.
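If the names are known to contain BCH followed by digits and then a dot (an assumption; the question only says the wanted part starts with BCH and ends before a dot), bash's [[ =~ ]] operator can validate and extract the part in one step. The function name bch_id is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the BCH<digits> part of a path, or fail if absent.
# Assumes the wanted part is "BCH", one or more digits, then a dot, all inside
# the last path component ([^/]*$ forbids any further slash after the match).
bch_id() {
    local re='(BCH[0-9]+)\.[^/]*$'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

bch_id "/home/msubra/WORK/tmo/LOG/BCH1043.9987.log"
```

Keeping the regex in a variable and leaving it unquoted on the right of =~ is the usual way to stop bash from treating it as a literal string.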

Use basename to extract only the filename and then use parameter expansion to strip off the data you don't want.
log=/home/msubra/WORK/tmo/LOG/BCH1043.9987.log
log=$(basename "$log")
echo "${log%%.*}"
The following is almost equivalent but doesn't use the external basename process. However, there are cases where it gives different results (whether those cases matter here depends on your usage and input). See this answer for examples/details.
log=/home/msubra/WORK/tmo/LOG/BCH1043.9987.log
log=${log##*/}
echo "${log%%.*}"
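One case where the two approaches disagree is a path ending in a slash: basename ignores trailing slashes, while ${var##*/} removes everything up to the last slash and so leaves an empty string. The sample path below is hypothetical.

```bash
p="/home/msubra/WORK/tmo/LOG/"
printf 'basename:  [%s]\n' "$(basename "$p")"   # prints [LOG]
printf 'expansion: [%s]\n' "${p##*/}"           # prints [] (empty)
```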

Try this:
a="/home/msubra/WORK/tmo/LOG/BCH1043.9987.log"
echo "${a##*/}" | cut -d "." -f 1
or
basename "$a" | cut -d "." -f 1
or
var=${a##*/}; echo "${var%%.*}"
output:
BCH1043
The output doesn't include the dot. Your question is not entirely clear, but you can extract the name like that.
${a##*/} extracts everything after the last /, much like basename.

Related

How to extract string in shell script

I have file names like Tarun_Verma_25_02_2016_10_00_10.csv. How can I extract a string like 25_02_2016_10_00_10 from them in a shell script?
It is not known how many numeric parts there will be after "firstName"_"lastName".
A one-line solution would be preferred.
With (GNU) sed:
$ echo Tarun_Verma_25_02_2016_10_00_10.csv | sed -r 's/[^0-9]*([0-9][^.]*)\..*/\1/'
25_02_2016_10_00_10
This extracts everything from the first digit up to, but not including, the dot that follows it.
If you want some control over which parts you pick out (assuming the format is always <firstname>_<lastname>_<day>_<month>_<year>_<hour>_<minute>_<second>.csv), awk is pretty handy:
echo "Tarun_Verma_25_02_2016_10_00_10.csv" | awk -F"[_.]" 'BEGIN{OFS="_"}{print $3,$4,$5,$6,$7,$8}'
Here awk splits on both underscore and period, sets the Output Field Separator to an underscore, and then prints the parts of the file name you are interested in.
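The question says the number of numeric parts is not fixed, while the awk program above hard-codes fields 3 to 8. A sketch that instead prints every field from the third up to the one before the extension, assuming the first and last names contain no underscores or dots and the only dot is the one before the extension (numeric_part is a hypothetical name):

```bash
# Hypothetical helper: print fields 3..NF-1 of a name split on "_" and ".",
# joined with "_". Assumes the first and last names contain no "_" or "."
# and that the only "." is the one before the extension.
numeric_part() {
    printf '%s\n' "$1" | awk -F'[_.]' '{
        out = $3
        for (i = 4; i < NF; i++) out = out "_" $i
        print out
    }'
}

numeric_part 'Tarun_Verma_25_02_2016_10_00_10.csv'   # 25_02_2016_10_00_10
numeric_part 'Tarun_Verma_25_02_2016.csv'            # 25_02_2016
```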
ksh93 supports, out of the box, the syntax that bash calls extglobs. Thus, in ksh93, you can do the following:
f='Tarun_Verma_25_02_2016_10_00_10.csv'
f=${f##+([![:digit:]])} # trim everything before the first digit
f=${f%%+([![:digit:]])} # trim everything after the last digit
echo "$f"
To do the same in bash, you'll want to run the following command first
shopt -s extglob
Since this uses shell-native string manipulation, it runs much more quickly than invoking an external command (sed, awk, etc.) when processing only a single line of input. (When using ksh93 rather than bash, it's quite speedy even for large inputs.)
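For bash, the same two expansions can be wrapped in a function, with shopt -s extglob run first so the +(...) patterns are available. The function name digits_span is hypothetical.

```bash
#!/usr/bin/env bash
# extglob must be on before the +(...) patterns below are used.
shopt -s extglob

# Hypothetical helper: keep the span from the first digit to the last digit.
digits_span() {
    local f=$1
    f=${f##+([![:digit:]])}   # trim the longest run of leading non-digits
    f=${f%%+([![:digit:]])}   # trim the longest run of trailing non-digits
    printf '%s\n' "$f"
}

digits_span 'Tarun_Verma_25_02_2016_10_00_10.csv'   # 25_02_2016_10_00_10
```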

Convert this Linux statement into one supported by the Windows command prompt

This is my command, which works in a Unix environment:
"cat document.xml | grep \'<w:t\' | sed \'s/<[^<]*>//g\' | grep -v \'^[[:space:]]*$\'"
But I want to execute it in the Windows command prompt.
How do I do that, and which commands are similar to cat, grep, and sed?
Please tell me the exact Windows code equivalent to the above command.
The double quotes around the pipeline in your question are a syntax error, and the backslashed single quotes should apparently really not have backslashes, but I assume it's just an artefact of a slightly imprecise presentation.
Here's what the code does.
cat document.xml |
This is a useless use of cat, but its purpose is to feed the contents of this file into the pipeline.
grep '<w:t' |
This looks for lines containing the literal string <w:t (probably the start of a tag in the XML format in the file). The single quotes quote the string so that it is not interpreted by the shell (otherwise the < would be interpreted as a redirection operator); they are consumed by the shell, and not passed through to grep.
sed 's/<[^<]*>//g' |
This replaces every tag, that is, an opening angle bracket (a "broket") followed by characters up to the matching closing one, with an empty string. The regular expression [^<]* matches zero or more occurrences of any character except <. If the XML is well-formed, the angle brackets always occur in pairs, so this effectively removes all XML tags.
grep -v '^[[:space:]]*$'
This removes any line which is empty or consists entirely of whitespace.
Because sed is a superset of grep, the program could easily be rephrased as a single sed script. Perhaps the easiest solution for your immediate problem would be to obtain a copy of sed for your platform.
sed -e '/<w:t/!d' -e 's/<[^<]*>//g' -e '/[^[:space:]]/!d' document.xml
I understand quoting rules on Windows may be different; try with double quotes instead of single, or put the script in a file and use sed -f file document.xml where file contains the script itself, like this:
/<w:t/!d
s/<[^<]*>//g
/[^[:space:]]/!d
This is a rather crude way to extract the CDATA from an XML document, anyway; perhaps some XML processor would be the proper way forward. E.g. xmlstarlet appears to be available for Windows. It works even if the XML input doesn't have the beginning and ending <w:t> tags on the same line, with nothing else on it. (In fact, parsing XML with line-oriented tools is a massive antipattern.)
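To see which lines the single sed script keeps, here it is wrapped in a shell function and run on a few hypothetical input lines. It also shows the line-oriented limitation: it only works when each <w:t> element sits on a single line. The function name extract_wt_text is hypothetical.

```bash
# The single sed script from the answer, with the POSIX class written [:space:].
extract_wt_text() {
    sed -e '/<w:t/!d' -e 's/<[^<]*>//g' -e '/[^[:space:]]/!d' "$@"
}

# Hypothetical sample input, one element per line.
printf '%s\n' \
    '<w:p>' \
    '<w:r><w:t>Hello</w:t></w:r>' \
    '<w:t> </w:t>' \
    '<w:t>world</w:t>' \
    '</w:p>' | extract_wt_text
# prints:
# Hello
# world
```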
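You might try PowerShell.
It has been included with Windows since Windows 7,
and it is certainly there on Windows 10.
I've just tested a "cat" command and it works (in PowerShell, cat is an alias for Get-Content).
"grep" doesn't, but you may be able to adapt the command using Select-String, like this: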
PowerShell equivalent to grep -f
and
https://communary.wordpress.com/2014/11/10/grep-the-powershell-way/
The equivalent of grep on windows would be findstr and the equivalent of cat would be type.

Capture part of a file name so that I can loop in Linux

How do I capture a pattern in a file name and use it in a loop in Linux?
For example, a folder contains these files:
BBB137O19_rc.fa
BBB921N08_cleaned.fa
BBB002O19_cc.fa
I would like to capture the front part of each file name and use it for things like renaming or running a program. Apparently basename only removes a literal suffix such as the extension, so it doesn't do this.
Thanks in advance.
I tried this command, but it failed:
for i in *.fa; base=$(basename $i _*.fa); comb="${base}_ec.txt"; mv ec.txt $comb; done
You can use bash string manipulation:
s='BBB921N08_cleaned.fa'
echo "${s%%_*}"
BBB921N08
Also, sed can be used:
echo "$i"|sed 's/_.*//'
This removes the first _ and everything after it: . matches any character and * repeats it any number of times.
Sed with its regular expressions is especially useful, if you have more complicated patterns to process.
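The loop in the question is missing its do keyword. Here is a corrected sketch that computes the prefix for each file with ${f%%_*}. Since the real per-file action isn't clear from the question, it only prints the prefix and shows a possible mv as a comment. The function name fa_prefixes is hypothetical.

```bash
# Hypothetical helper: print the part of each *.fa name before the first underscore.
fa_prefixes() {
    for f in *.fa; do
        [ -e "$f" ] || continue      # no match: the pattern stays literal, so skip it
        base=${f%%_*}                # BBB921N08_cleaned.fa -> BBB921N08
        printf '%s\n' "$base"
        # e.g. mv -- "$f" "${base}_ec.txt"   (the real action is up to you)
    done
}
```

Run it in the folder that holds the .fa files; with the three example files it prints BBB002O19, BBB137O19 and BBB921N08.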

Replacing strings with special characters with linux sed

I've read lots of posts to understand how to correctly escape white space and special characters inside strings using sed, but I still can't get it to work. Here's what I'm trying to achieve.
I have a file containing strings like this one:
JAVA_OPTS="$JAVA_OPTS -Dorg.apache.catalina.jsessionid=some_value"
and I'm trying to replace 'some_value' using the following:
sed -i "s/^\(JAVA_OPTS=\"\$JAVA_OPTS[ \t]*-Dorg\.apache\.catalina\.jsessionid*=\s*\).*\$/\1$DORG_APACHE_CATALINA_JSESSIONID/" $JBOSS_CONFIGURATION/jboss.configuration
$JBOSS_CONFIGURATION is a variable containing an absolute Linux path.
jboss.configuration is the target file for the replace operations.
$DORG_APACHE_CATALINA_JSESSIONID contains the value I want instead of 'some_value'.
Please note that the pattern:
JAVA_OPTS="$JAVA_OPTS -D
Is always present, and org.apache.catalina.jsessionid is an example of a variable value I'm trying to replace with this script.
What's missing or wrong? I also tried escaping white space using \s without success,
and echoing the whole expression gives me the following:
echo "s/^\(JAVA_OPTS=\"\$JAVA_OPTS[ \t]*-Dorg\.apache\.catalina\.jsessionid*=\s*\).*\$/\1$DORG_APACHE_CATALINA_JSESSIONID/"
s/^\(JAVA_OPTS="$JAVA_OPTS[ \t]*-Dorg\.apache\.catalina\.jsessionid*=\s*\).*$/\1/
Is echo interpreting the search pattern the same way sed does?
Any info, help, or alternative approaches are highly welcome.
Thank you all.
echo 'JAVA_OPTS="$JAVA_OPTS -Dorg.apache.catalina.jsessionid=some_value"' | (export DORG_APACHE_CATALINA_JSESSIONID=FOO/BAR/FOOBAR; sed "s/^\(JAVA_OPTS=\"\$JAVA_OPTS[ \t]*-Dorg\.apache\.catalina\.jsessionid*=\s*\).*\$/\1${DORG_APACHE_CATALINA_JSESSIONID////\/}\"/")
Note the bash expansion (to escape any / that could trip up sed) and the extra \" after $DORG_APACHE_CATALINA_JSESSIONID, which puts back the closing double quote that .*$ consumes. Also make sure the variable is actually set: the nothing-after-\1 in your echo output shows it was empty when you ran it. Other than that, your sed expression works for me, and the above command outputs the following result:
JAVA_OPTS="$JAVA_OPTS -Dorg.apache.catalina.jsessionid=FOO/BAR/FOOBAR"
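Escaping only / still leaves two other characters that are special in the replacement part of s///: & (which stands for the whole match) and the backslash itself. Below is a sketch of a helper that escapes all three before the value is spliced into a sed script, assuming a single-line value and / as the delimiter of the final command; sed_escape_replacement and the sample value are hypothetical. The helper itself uses | as its delimiter so the bracket expression can list / without escaping it.

```bash
# Hypothetical helper: escape \, / and & so a value can be used literally
# in the replacement of a sed s/// command that uses / as its delimiter.
# Assumes the value contains no newline.
sed_escape_replacement() {
    printf '%s\n' "$1" | sed -e 's|[\/&]|\\&|g'
}

line='JAVA_OPTS="$JAVA_OPTS -Dorg.apache.catalina.jsessionid=some_value"'
new_value='FOO/BAR&BAZ'
escaped=$(sed_escape_replacement "$new_value")

# Replace only the value; [^"]* stops before the closing double quote, so it survives.
printf '%s\n' "$line" | sed "s/\(jsessionid=\)[^\"]*/\1$escaped/"
# prints: JAVA_OPTS="$JAVA_OPTS -Dorg.apache.catalina.jsessionid=FOO/BAR&BAZ"
```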
You can use sed like this:
sed -r '/\$JAVA_OPTS -D/{s/^(.+=).*$/\1'"$DORG_APACHE_CATALINA_JSESSIONID"'"/;}' "$JBOSS_CONFIGURATION/jboss.configuration"
You can specify a pattern that'll match the desired string rather than trying to specify it exactly.
The following should work for you:
sed -i 's#^\(JAVA_OPTS.*Dorg.apache.catalina.jsessionid\)=\([^"]*\)"#\1='"$DORG_APACHE_CATALINA_JSESSIONID"'"#' "$JBOSS_CONFIGURATION/jboss.configuration"
Or, more loosely (\w is a GNU sed extension), replace whatever follows the first = that is followed by a word character, restoring the closing quote:
sed 's/=\w.*$/='"$DORG_APACHE_CATALINA_JSESSIONID"'"/' "$JBOSS_CONFIGURATION/jboss.configuration"

Extracting sub-strings in Unix

I'm using Cygwin on Windows 7. I want to loop through a folder containing about 10,000 files and perform a signal processing tool's operation on each file. The problem is that the file names have some excess characters that are not compatible with the operation, so I need to extract just a certain part of each file name.
For example if the file name is abc123456_justlike.txt.rna I need to use abc123456_justlike.txt. How should I write a loop to go through each file and perform the operation on the shortened file names?
I tried the cut -b1-10 command, but that doesn't let my tool perform the necessary operation. I'd appreciate help with this problem.
Try some shell scripting, using the ${NAME%TAIL} parameter substitution: the contents of variable NAME are expanded, but any suffix material which matches the TAIL glob pattern is chopped off.
$ NAME=abc12345.txt.rna
$ echo "${NAME%.rna}"
abc12345.txt
# process all files in the directory, taking off their .rna suffix
$ for x in *; do signal_processing_tool "${x%.rna}"; done
If there are variations among the file names, you can classify them with a case:
for x in * ; do
case $x in
*.rna )
# do something with .rna files
;;
*.txt )
# do something else with .txt files
;;
* )
# default catch-all-else case
;;
esac
done
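Here is a runnable sketch of the case approach, assuming the only unwanted part is a trailing .rna. tool_name is a hypothetical helper, and the printf in the loop stands in for signal_processing_tool.

```bash
# Hypothetical helper: map a file name to the name the tool should receive.
# Assumption: only a trailing .rna is unwanted; other names pass through unchanged.
tool_name() {
    case $1 in
        *.rna) printf '%s\n' "${1%.rna}" ;;
        *)     printf '%s\n' "$1" ;;
    esac
}

for x in *; do
    [ -e "$x" ] || continue                          # empty directory: skip literal *
    printf '%s -> %s\n' "$x" "$(tool_name "$x")"     # replace with signal_processing_tool
done
```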
Try sed:
echo a.b.c | sed 's/\.[^.]*$//'
The s command in sed performs a search-and-replace operation, in this case it replaces the regular expression \.[^.]*$ (meaning: a dot, followed by any number of non-dots, at the end of the string) with the empty string.
If you are not yet familiar with regular expressions, this is a good point to learn them. I find manipulating string using regular expressions much more straightforward than using tools like cut (or their equivalents).
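For a single name already held in a variable, the shell's ${name%.*} expansion removes the same thing as the sed command above (the last dot and everything after it) without starting an extra process. The sample name comes from the question.

```bash
name='abc123456_justlike.txt.rna'
printf '%s\n' "${name%.*}"                    # abc123456_justlike.txt
printf '%s\n' "$name" | sed 's/\.[^.]*$//'    # same result via sed
```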
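If you are trying to extract the list of file names from a directory (truncated to their first 10 characters), use the command below. Be aware that splitting ls output on spaces breaks for file names that contain spaces.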
ls -ltr | awk -F " " '{print $9}' | cut -c1-10
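A sketch of the same idea that loops over a glob instead of parsing ls output, so names with spaces survive intact. It uses bash's ${f:0:10} substring expansion, and unlike ls -ltr it lists names in alphabetical rather than modification-time order. The function name first10 is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first 10 characters of every name in the
# current directory, using a glob rather than parsing ls output.
first10() {
    local f
    for f in *; do
        [ -e "$f" ] || continue        # empty directory: skip the literal *
        printf '%s\n' "${f:0:10}"      # bash substring: offset 0, length 10
    done
}

first10
```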
