How to extract string in shell script

I have file names like Tarun_Verma_25_02_2016_10_00_10.csv. How can I extract the string like 25_02_2016_10_00_10 from it in shell script?
The number of numeric parts after "firstName"_"lastName" is not fixed.
A one-line solution would be preferred.

with sed
$ echo Tarun_Verma_25_02_2016_10_00_10.csv | sed -r 's/[^0-9]*([0-9][^.]*)\..*/\1/'
25_02_2016_10_00_10
This extracts everything from the first digit up to the next dot.

If you want some control over which parts you pick out (assuming the format is always like <firstname>_<lastname>_<day>_<month>_<year>_<hour>_<minute>_<second>.csv) awk would be pretty handy
echo "Tarun_Verma_25_02_2016_10_00_10.csv" | awk -F"[_.]" 'BEGIN{OFS="_"}{print $3,$4,$5,$6,$7,$8}'
Here awk splits by both underscore and period, sets the Output Field Separator to an underscore, and then prints the parts of the file name that you are interested in.
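If the number of numeric parts varies, as the question warns, the fixed field list can be swapped for a loop. This is only a minimal sketch, assuming the name always starts with exactly two non-numeric parts and ends in a single extension; the function name stamp_fields is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print fields 3 .. NF-1, re-joined with underscores.
# Assumes <first>_<last>_<parts...>.<ext>, so the last field is the extension.
stamp_fields() {
    printf '%s\n' "$1" |
        awk -F'[_.]' '{
            out = $3
            for (i = 4; i < NF; i++) out = out "_" $i
            print out
        }'
}

stamp_fields 'Tarun_Verma_25_02_2016_10_00_10.csv'
```

The loop stops before NF because, with both _ and . as separators, the final field is the extension.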

ksh93 supports the syntax bash calls extglobs out-of-the-box. Thus, in ksh93, you can do the following:
f='Tarun_Verma_25_02_2016_10_00_10.csv'
f=${f##+([![:digit:]])} # trim everything before the first digit
f=${f%%+([![:digit:]])} # trim everything after the last digit
echo "$f"
To do the same in bash, you'll want to run the following command first
shopt -s extglob
Since this uses shell-native string manipulation, it runs much more quickly than invoking an external command (sed, awk, etc) when processing only a single line of input. (When using ksh93 rather than bash, it's quite speedy even for large inputs).
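For shells without extglob support, plain POSIX parameter expansion can get the same result, at the price of a stricter assumption about the name. A sketch, assuming exactly two underscore-terminated name parts in front and one extension at the end; extract_stamp is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: assumes <first>_<last>_<numeric parts>.<ext>
extract_stamp() {
    name=${1%.*}      # drop the extension
    name=${name#*_}   # drop the first name and its underscore
    name=${name#*_}   # drop the last name and its underscore
    printf '%s\n' "$name"
}

stamp=$(extract_stamp 'Tarun_Verma_25_02_2016_10_00_10.csv')
printf '%s\n' "$stamp"
```

Unlike the extglob version, this does not look for digits at all, so a middle name would end up in the output.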

Related

how to escape file path in bash script variable

I would like to escape a file path that is stored in a variable in a bash script.
I read several threads about escaping backticks and the like, but nothing seems to work as it should.
I have this variable, whose value is entered as a user parameter when the script runs:
CONFIG="/home/teams/blabla/blabla.yaml"
I would need to change this to: \/home\/teams\/blabla\/blabla.yaml
How can I do that within the script, via sed or similar (not manually)?

With GNU bash and its Parameter Expansion:
echo "${CONFIG//\//\\/}"
Output:
\/home\/teams\/blabla\/blabla.yaml

Using the solution from this question, in your case it will look like this:
CONFIG=$(echo "/home/teams/blabla/blabla.yaml" | sed -e 's/[]\/$*.^[]/\\&/g')

echo "/home/teams/blabla/blabla.yaml" | sed 's/\//\\\//g'
\/home\/teams\/blabla\/blabla.yaml
explanation:
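A backslash changes how the following character is read: it either gives an ordinary character a special meaning in the regular expression or, as with \/ here, makes a special character literal. A double backslash is used when you need a literal backslash.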

Why does that need escaping? Is this an XY Problem?
If the issue is that you are trying to use that variable in a substitution regex, then the examples given should work, but you might benefit from removing some of the "leaning toothpick syndrome", which many tools let you do just by using a different match delimiter. sed, for example:
$: sed "s,SOME_PLACEHOLDER_VALUE,$CONFIG," <<< SOME_PLACEHOLDER_VALUE
/home/teams/blabla/blabla.yaml
Be very careful about this, though. Commas are perfectly valid characters in a filename, as is almost anything except NUL. Know your data.
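When the data cannot be trusted to avoid the delimiter, one option is to escape the value before it reaches the replacement side of the s command. The following is a sketch rather than a complete solution: escape_repl is a hypothetical helper, the path is made up, and it assumes the value contains no newlines.

```shell
#!/bin/sh
# Hypothetical helper: escape the characters that are special on the
# replacement side of s,...,..., (backslash, ampersand, the comma delimiter).
escape_repl() {
    printf '%s\n' "$1" | sed 's|[\\&,]|\\&|g'
}

CONFIG='/home/teams/bla,bla/R&D.yaml'   # made-up path with awkward characters
result=$(printf '%s\n' 'SOME_PLACEHOLDER_VALUE' |
    sed "s,SOME_PLACEHOLDER_VALUE,$(escape_repl "$CONFIG"),")
printf '%s\n' "$result"
```

If a different delimiter is used in the outer command, the character inside the bracket expression has to change with it.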

Find line starts with and replace in linux using sed [duplicate]

How do I find a line that starts with a given string and replace the complete line?
File output:
xyz
abc
/dev/linux-test1/
Code:
output=/dev/sda/windows
sed 's/^/dev/linux*/$output/g' file.txt
I am getting the error below:
sed: -e expression #1, char 9: unknown option to `s'
File Output expected after replacement:
xyz
abc
/dev/sda/windows

Let's take this in small steps.
First we try changing "dev" to "other":
sed 's/dev/other/' file.txt
/other/linux-test1/
(Omitting the other lines.) So far, so good. Now "/dev/" => "/other/":
sed 's//dev///other//' file.txt
sed: 1: "s//dev///other//": bad flag in substitute command: '/'
Ah, it's confused, we're using '/' as both a command delimiter and literal text. So we use a different delimiter, like '|':
sed 's|/dev/|/other/|' file.txt
/other/linux-test1/
Good. Now we try to replace the whole line:
sed 's|^/dev/linux*|/other/|' file.txt
/other/-test1/
It didn't replace the whole line... Ah, in sed, '*' means the previous character repeated any number of times. So we precede it with '.', which means any character:
sed 's|^/dev/linux.*|/other/|' file.txt
/other/
Now to introduce the variable:
sed 's|^/dev/linux.*|$output|' file.txt
$output
The shell didn't expand the variable, because of the single quotes. We change to double quotes:
sed "s|^/dev/linux.*|$output|" file.txt
/dev/sda/windows
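The final command can be checked end to end against a throwaway copy of the sample file. A minimal sketch, assuming mktemp is available and that $output contains no |, & or backslash (any of those would need escaping first):

```shell
#!/bin/sh
# Recreate the sample file from the question in a temporary location.
tmp=$(mktemp)
printf '%s\n' 'xyz' 'abc' '/dev/linux-test1/' > "$tmp"

output=/dev/sda/windows
# Double quotes let the shell expand $output; | avoids clashing with the slashes.
result=$(sed "s|^/dev/linux.*|$output|" "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```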

This might work for you (GNU sed):
output="/dev/sda/windows"; sed -i '\#/dev/linux.*/#c'"$output" file
Set the shell variable and change the line addressed by /dev/linux.*/ to it.
N.B. The shell variable must be set before sed runs so that it can be interpolated, hence the ; (the assignment may equally go on a line of its own). The delimiter of the sed address must also be changed so that it does not clash with the slashes in the pattern, hence \#...#. Finally, the shell variable should be enclosed in double quotes so that it is interpolated as a single word.

I'd recommend not doing it this way. Here's why.
Sed is not a programming language. It's a stream editor with some constructs that look and behave like a language, but it offers very little in the way of arbitrary string manipulation, format control, etc.
Sed only takes data from a file or stdin (also a file). Embedding strings within your sed script is asking for errors -- constructs like s/re/$output/ are destined to fail at some point, almost regardless of what workarounds you build into your sed script. The best way to make sed commands like this work is to do your input sanitization OUTSIDE of sed.
Which brings me to ... this may be the wrong tool for this job, or might be only one component of the toolset for the job.
The error you're getting is obviously because the sed command you're using is horribly busted. The substitute command is:
s/pattern/replacement/flags
but the command you're running is:
s/^/dev/linux*/$output/g
The pattern you're searching for is ^, the null at the beginning of the line. Your replacement pattern is dev, then you have a bunch of text that might be interpreted as flags. This plainly doesn't work, when your search string contains the same character that you're using as a delimiter to the options for the substitute command.
In regular expressions and in sed, you can escape things. While you might get some traction with s/^\/dev\/linux.*/$output/, you'd still run into difficulty if $output contained slashes. If you're feeding this script to sed from bash, you could use ${output//\//\\\/}, but you can't handle those escapes within sed itself. Sed has no variables.
In a proper programming language, you'd have better separation of variable content and the commands used for the substitution.
output="/dev/sda/windows"
awk -v output="$output" '$1~/\/dev\/linux/ { $0=output } 1' file.txt
Note that I've used $1 here because in your question, your input lines (and output) appear to have a space at the beginning of each line. Awk automatically trims leading and trailing space when assigning field (positional) variables.
Or you could even do this in pure bash, using no external tools:
output="/dev/sda/windows"
while read -r line; do
[[ "$line" =~ ^/dev/linux ]] && line="$output"
printf '%s\n' "$line"
done < file.txt
This one isn't resilient in the face of leading whitespace. Salt to taste.
So .. yes, you can do this with sed. But the way commands get put together in sed makes something like this risky, and despite the available workarounds like switching your substitution command delimiter to another character, you'd almost certainly be better off using other tools.
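To illustrate the separation of data and commands argued for above, here is a sketch that compares the prefix as a literal string instead of a regex. The function name replace_prefixed_lines is hypothetical, and passing the values through ENVIRON is a choice made for this sketch so that awk does not interpret backslash escapes in them (as it would with -v).

```shell
#!/bin/sh
# Hypothetical helper: replace every line that starts with the literal
# string $1 by $2, reading from file $3. The values travel via the
# environment, so awk never parses them as regex or escape sequences.
replace_prefixed_lines() {
    prefix=$1 replacement=$2 awk '
        index($0, ENVIRON["prefix"]) == 1 { print ENVIRON["replacement"]; next }
        { print }
    ' "$3"
}

tmp=$(mktemp)
printf '%s\n' 'xyz' 'abc' '/dev/linux-test1/' > "$tmp"
result=$(replace_prefixed_lines '/dev/linux' '/dev/sda/windows' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Because index() does plain substring search, neither the prefix nor the replacement needs any escaping, whatever characters they contain. Lines with leading whitespace would not match; that would need an extra trimming step.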

Select lines between two patterns using variables inside SED command

I'm new to shell scripting. My requirement is to retrieve the lines between two patterns. It works fine if I run it from the terminal without using variables inside the sed command, but the problem arises when I put the commands below in a file and try to execute it.
#!/bin/sh
word="ajp-qdcls2228.us.qdx.com%2F156.30.35.204-8009-34"
upto="2017-01-03 23:00"
fileC=`cat test.log`
output=`echo $fileC | sed -e "n/\$word/$upto/p"`
printf '%s\n' "$output"
If I use the below cmd in the terminal it works fine
sed -n '/ajp-qdcls2228.us.qdx.com%2F156.30.35.204-8009-34/,/2017-01-03 23:00/ p' test.log
Please suggest a workaround.

If we put aside for a moment the fact you shouldn't cat a file to a variable and then echo it for sed filtering, the reason why your command is not working is because you're not quoting the file content variable, fileC when echoing. This will munge together multiple whitespace characters and turn them into a single space. So, you're losing newlines from the file, as well as multiple spaces, tabs, etc.
To fix it, you can write:
fileC=$(cat test.log)
output=$(echo "$fileC" | sed -n "/$word/,/$upto/p")
Note the double-quotes around fileC (and a fixed sed expression, similar to your second example). Without the quotes (try echo $fileC), your fileC is expanded (with the default IFS) into a series of words, each being one argument to echo, and echo will just print those words separated with a single space. Additionally, if the file contains some of the globbing characters (like *), those patterns are also expanded. This is a common bash pitfall.
Much better would be to write it like this:
output=$(sed -n "/$word/,/$upto/p" test.log)
And if your patterns include some of the sed metacharacters, you should really escape them before using with sed, like this:
escape() {
sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1";
}
output=$(sed -n "/$(escape "$word")/,/$(escape "$upto")/ p" test.log)

The correct approach will be something like:
word="ajp-qdcls2228.us.qdx.com%2F156.30.35.204-8009-34"
upto="2017-01-03 23:00"
awk -v beg="$word" -v end="$upto" '$0==beg{f=1} f{print; if ($0==end) exit}' file
but until we see your sample input and output we can't know for sure what it is you need to match on (full lines, partial lines, all text on one line, etc.) or what you want to print (include delimiters, exclude one, exclude both, etc.).
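If the two strings are expected to appear somewhere inside longer log lines rather than as whole lines, index() can stand in for the == comparison. This is a sketch under that assumption: between_literal is a hypothetical name and the log lines are made up. Note that -v still processes backslash escapes, which is harmless for these two values.

```shell
#!/bin/sh
word='ajp-qdcls2228.us.qdx.com%2F156.30.35.204-8009-34'
upto='2017-01-03 23:00'

# Hypothetical helper: print from the first line containing $1 through the
# first line containing $2, both compared as literal substrings.
between_literal() {
    awk -v beg="$1" -v end="$2" '
        !f && index($0, beg) { f = 1 }
        f { print; if (index($0, end)) exit }
    ' "$3"
}

# Made-up log lines, for illustration only.
tmp=$(mktemp)
printf '%s\n' \
    '2017-01-03 22:58 unrelated' \
    "2017-01-03 22:59 $word begin" \
    '2017-01-03 22:59 detail' \
    '2017-01-03 23:00 done' \
    '2017-01-03 23:01 later' > "$tmp"
result=$(between_literal "$word" "$upto" "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Unlike a sed range, this stops at once if the start line also contains the end string.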

convert this linux statement into a statement which is supported by windows command prompt

This is my statement supported by unix environment
"cat document.xml | grep \'<w:t\' | sed \'s/<[^<]*>//g\' | grep -v \'^[[:space:]]*$\'"
But I want to execute that statement in the Windows command prompt.
How do I do that, and which commands are similar to cat, grep and sed?
Please tell me the exact code for Windows that is equivalent to the above command.

The double quotes around the pipeline in your question are a syntax error, and the backslashed single quotes should apparently really not have backslashes, but I assume it's just an artefact of a slightly imprecise presentation.
Here's what the code does.
cat document.xml |
This is a useless use of cat but its purpose is to feed the contents of this file into the pipeline.
grep '<w:t' |
This looks for lines containing the literal string <w:t (probably the start of a tag in the XML format in the file). The single quotes quote the string so that it is not interpreted by the shell (otherwise the < would be interpreted as a redirection operator); they are consumed by the shell, and not passed through to grep.
sed 's/<[^<]*>//g' |
This replaces every pair of opening and closing angle brackets, together with whatever is between them, with an empty string. The regular expression [^<]* matches zero or more occurrences of a character which can be anything except <. If the XML is well-formed, these should always occur in pairs, and so we effectively remove all XML tags.
grep -v '^[[:space:]]*$'
This removes any line which is empty or consists entirely of whitespace.
Because sed is a superset of grep, the program could easily be rephrased as a single sed script. Perhaps the easiest solution for your immediate problem would be to obtain a copy of sed for your platform.
sed -e '/<w:t/!d' -e 's/<[^<]*>//g' -e '/[^[:space:]]/!d' document.xml
I understand quoting rules on Windows may be different; try with double quotes instead of single, or put the script in a file and use sed -f file document.xml where file contains the script itself, like this:
/<w:t/!d
s/<[^<]*>//g
/[^[:space:]]/!d
This is a rather crude way to extract the CDATA from an XML document, anyway; perhaps some XML processor would be the proper way forward. E.g. xmlstarlet appears to be available for Windows. It works even if the XML input doesn't have the beginning and ending <w:t> tags on the same line, with nothing else on it. (In fact, parsing XML with line-oriented tools is a massive antipattern.)
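The single-sed version can be tried on a small fragment before pointing it at a real document. A sketch with made-up input lines; it uses the POSIX character class [[:space:]] and assumes each <w:t> element sits on one line, which is exactly the limitation noted above.

```shell
#!/bin/sh
# Made-up fragment: one text run, one whitespace-only run, two wrapper lines.
result=$(printf '%s\n' \
    '<w:p>' \
    '<w:r><w:t>Hello world</w:t></w:r>' \
    '<w:t> </w:t>' \
    '</w:p>' |
    sed -e '/<w:t/!d' -e 's/<[^<]*>//g' -e '/[^[:space:]]/!d')
printf '%s\n' "$result"
```

Only the line with actual text survives: the wrapper lines fail the first test, and the whitespace-only run is dropped by the last one.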

You could try PowerShell.
I think it has been included since Windows 8; it is certainly included in Windows 10.
I've just tested a "cat" command and it works.
"grep" doesn't, but it can be adapted as described here:
PowerShell equivalent to grep -f
and
https://communary.wordpress.com/2014/11/10/grep-the-powershell-way/

The equivalent of grep on windows would be findstr and the equivalent of cat would be type.

Linux command to replace string in LARGE file with another string

I have a huge SQL file that gets executed on the server. The dump is from my machine and it contains a few settings specific to my machine. So basically, I want every occurrence of "c://temp" to be replaced by "//home//some//blah".
How can this be done from the command line?

sed is a good choice for large files.
sed -i.bak -e 's%C://temp%//home//some//blah%' large_file.sql
It is a good choice because it doesn't read the whole file at once to change it. Quoting the manual:
A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed's ability to filter text in a pipeline which particularly distinguishes it from other types of editors.
The relevant manual section is here. A small explanation follows
-i.bak enables in place editing leaving a backup copy with .bak extension
s%foo%bar% uses s, the substitution command, which substitutes matches of the first string between the % signs, 'foo', with the second string, 'bar'. It's usually written as s/foo/bar/, but because your strings have plenty of slashes, it's more convenient to use a different delimiter so you avoid having to escape them.
Example
vinko#mithril:~$ sed -i.bak -e 's%C://temp%//home//some//blah%' a.txt
vinko#mithril:~$ more a.txt
//home//some//blah
D://temp
//home//some//blah
D://temp
vinko#mithril:~$ more a.txt.bak
C://temp
D://temp
C://temp
D://temp
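One detail worth checking against the question's "every occurrence": without the g flag, s replaces only the first match on each line. The sample file above has one match per line, so the difference does not show there. A small sketch with a made-up line containing two matches:

```shell
#!/bin/sh
line='C://temp/a.sql C://temp/b.sql'   # made-up line with two matches

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's%C://temp%//home//some//blah%')
# With g: every match on the line is replaced.
every=$(printf '%s\n' "$line" | sed 's%C://temp%//home//some//blah%g')

printf '%s\n' "$first_only" "$every"
```

The match is also case-sensitive, so a dump that spells the path c://temp needs the pattern written that way.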

Just for completeness. In place replacement using perl.
perl -i -p -e 's{c://temp}{//home//some//blah}g' mysql.dmp
No backslash escapes required either. ;)

Try sed? Something like:
sed 's/c:\/\/temp/\/\/home\/\/some\/\/blah/' mydump.sql > fixeddump.sql
Escaping all those slashes makes this look horrible though, here's a simpler example which changes foo to bar.
sed 's/foo/bar/' mydump.sql > fixeddump.sql
As others have noted, you can choose your own delimiter, which would prevent the leaning toothpick syndrome in this case:
sed 's|c://temp|//home//some//blah|' mydump.sql > fixeddump.sql
The clever thing about sed is that it operates on a stream rather than on a file all at once, so you can process huge files using only a modest amount of memory.

There's also a non-standard UNIX utility, rpl, which does the exact same thing that the sed examples do; however, I'm not sure whether rpl operates streamwise, so sed may be the better option here.

The sed command can do that.
Rather than escaping the slashes, you can choose a different delimiter (_ in this case):
sed -e 's_c://temp/_/home//some//blah/_' file1.txt > file2.txt

perl -pi -e 's#c://temp#//home//some//blah#g' yourfilename
-p wraps the script in a loop: it reads the specified file line by line, runs the search and replace on each line, and prints the result.
-i is used together with -p here; it tells Perl to edit the file in place.
-e just means execute this Perl code.
Good luck

gawk
awk '{gsub("c://temp","//home//some//blah")}1' file
