How to get Substring from Filename in Unix shellscripting

I want a shell script that extracts MMDDYYYY from a file whose name looks like this:
linuxbox.23566.MMDDYYYYHHMMSS.zip

Using bash string functions:
for file in *.zip; do
file="${file%.*}"
file="${file##*.}"
echo "${file:0:8}"
done
Explanation:
file="${file%.*}": Removes the extension and stores the new name in the file variable
file="${file##*.}": Removes the longest match of *. from the beginning (everything up to and including the last remaining dot) and stores the result in the file variable
echo "${file:0:8}": Echoes the first 8 characters of what's left.
Demo:
$ ls
linuxbox.23566.MMDDYYYYHHMMSS.zip
$ for file in *; do file="${file%.*}"; file="${file##*.}"; echo "${file:0:8}"; done
MMDDYYYY
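The same expansions can be wrapped in a function and the result split into its parts. This is only a sketch: the function name date_from_name and the sample timestamp 01312024120000 are assumptions made for illustration, not part of the question.

```bash
#!/bin/bash
# Hypothetical helper: print the first 8 characters of the last
# dot-separated field that precedes the extension.
date_from_name() {
    local name=${1##*/}   # drop any leading directory
    name=${name%.*}       # drop the extension (.zip)
    name=${name##*.}      # keep only the text after the last remaining dot
    printf '%s\n' "${name:0:8}"
}

# Sample name with an assumed concrete timestamp in place of MMDDYYYYHHMMSS
stamp=$(date_from_name "linuxbox.23566.01312024120000.zip")
month=${stamp:0:2}
day=${stamp:2:2}
year=${stamp:4:4}
echo "$month/$day/$year"
```

Because everything is done with parameter expansion, no external command is started per file.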

With cut:
$ cut -d. -f3 <<< "linuxbox.23566.MMDDYYYYHHMMSS.zip" | cut -c-8
MMDDYYYY
Because the first part is returning:
$ cut -d. -f3 <<< "linuxbox.23566.MMDDYYYYHHMMSS.zip"
MMDDYYYYHHMMSS
And then it gets the first 8 chars.
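A different way to split on dots, not used in the answers above, is to let read do the splitting with IFS set to a dot. The sketch below assumes the name always has exactly four dot-separated parts; split_name and the variable names are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: split a name of the form host.id.stamp.ext on dots
# and print the first 8 characters of the third field.
split_name() {
    local host id stamp ext
    IFS=. read -r host id stamp ext <<< "$1"
    printf '%s\n' "${stamp:0:8}"
}

# Assumed concrete timestamp in place of MMDDYYYYHHMMSS
split_name "linuxbox.23566.01312024120000.zip"
```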

Related

Piping into a part of bash command stored in variable [duplicate]

EMPTY_VAR=''
MMDDYYYY='6.18.1997'
PIPE_VAR=' | xargs echo "1+" | bc'
echo "$MMDDYYYY" | cut -d "." -f 2${EMPTY_VAR}
>> 18
The command above gives me the correct output, which is 18, but if I try to use PIPE_VAR instead I get a bunch of errors:
echo "$MMDDYYYY" | cut -d "." -f 2${PIPE_VAR}
cut: '|': No such file or directory
cut: xargs: No such file or directory
cut: echo: No such file or directory
cut: '"1+"': No such file or directory
cut: '|': No such file or directory
cut: bc: No such file or directory
OR:
echo "$MMDDYYYY" | cut -d "." -f 2"$PIPE_VAR"
cut: invalid field value ‘| xargs echo "1+" | bc’
Try 'cut --help' for more information.
What I'm really trying to find out is whether it is even possible to combine commands like this.

You can't put control operators like | in a variable, at least not without resorting to something like eval. Syntax parsing comes before parameter expansion when evaluating the command line, so Bash is only ever going to see that | as a literal character and not pipeline syntax. See BashParsing for more details.
Conditionally adding a pipeline is hard to do well, but having a part of the pipeline conditionally execute one command or another is more straightforward. It might look something like this:
#!/bin/bash
MMDDYYYY='6.18.1997'
echo "$MMDDYYYY" | cut -d "." -f 2 |
if some_conditional_command ; then
xargs echo "1+" | bc
else
cat
fi
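One way to package the conditional stage is a small filter function that either transforms its input or passes it through. This is a sketch under assumptions: the function name maybe_increment and the INCREMENT variable are made up here, and shell arithmetic is swapped in for the xargs/bc stage of the question.

```bash
#!/bin/bash
# Hypothetical filter: add 1 to each number read from stdin when
# INCREMENT=yes, otherwise pass the input through unchanged.
maybe_increment() {
    if [ "${INCREMENT:-no}" = yes ]; then
        while IFS= read -r n; do
            # 10# forces base 10 so values such as 08 are not read as octal
            echo "$((10#$n + 1))"
        done
    else
        cat
    fi
}

MMDDYYYY='6.18.1997'
echo "$MMDDYYYY" | cut -d. -f2 | INCREMENT=yes maybe_increment   # prints 19
echo "$MMDDYYYY" | cut -d. -f2 | maybe_increment                 # prints 18
```

The pipeline itself stays fixed; only the behaviour of the last stage depends on the variable, so no eval is needed.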

It looks like you're trying to calculate the next day. That's hard to do with plain arithmetic, particularly with month/year ends.
Let date do the work. This is GNU date. It can't parse 6.18.1997 but it can parse 6/18/1997
for MMDDYYYY in '2.28.1996' '2.28.1997'; do
date_with_slashes=${MMDDYYYY//./\/}
next_day=$(date -d "$date_with_slashes + 1 day" '+%-m.%-d.%Y')
echo "$next_day"
done
2.29.1996
3.1.1997

UNIX: Grep a specific word and all the text following it

I have a variable in Unix, that stores multiple lines of alpha-numeric characters. I want to grep to a specific word and get all the text following it.
For example, $Variable contains:
Hello, User
Your files are:
File1 : Exists
File2 : None
Let us say I want to find File2, which is on the last line, and I want to know whether it is Yes or None or whatever text is present after the colon, and save that text to another variable.

Use sed instead
sed -n '/the word you are looking for/,$p' <file name>
or since you said it was in a variable something more like:
echo "$variable" | sed -n '/the word you are looking for/,$p'
sed -n suppresses the default printing of every line.
The address says: from the first line matching "the word you are looking for" to $, which is the last line of input, run the p command, which prints :)
If you have to stop before the end of the file, replace $ with a pattern that matches the last line you want.
If you just want to save the results to another variable:
new_variable=$(echo "$variable" | sed -n '/the word you are looking for/,$p')
Also note that if the string you are looking for has a / in it, you must escape it with \ so it would look like
new_variable=$(echo "$variable" | sed -n '/the word you are\/ looking for/,$p')
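Stopping before the end of the input, as mentioned above, means giving sed a second pattern instead of $. A minimal sketch, assuming both patterns are plain words without / or regex metacharacters; the helper name lines_between and the extra "Goodbye" line are made up for the example.

```bash
#!/bin/bash
# Hypothetical helper: print lines from the first one matching $1 through
# the next one matching $2 (both are sed basic regular expressions).
lines_between() {
    sed -n "/$1/,/$2/p"
}

variable='Hello, User
Your files are:
File1 : Exists
File2 : None
Goodbye'

# Prints the File1 and File2 lines, but not "Goodbye"
echo "$variable" | lines_between 'File1' 'File2'
```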

So you have a variable defined as:
$ var="abc\ndef\nghi\njkl\nmn"
Then, if you want to print the line containing "ghi" and everything after it:
$ echo -e "$var" | sed -n '/ghi/,$p'

grep is to Globally search for a Regular Expression and Print the matching string. That is not what you want to do, you want to take a Stream of input and EDit it to output part of it. Guess what tool does THAT in UNIX.
$ echo "$var"
Hello, User
Your files are:
File1 : Exists
File2 : None
$ var2=$(echo "$var" | sed -n 's/^File2 : //p')
$ echo "$var2"
None

Given:
variable="Hello, User
Your files are:
File1 : Exists
File2 : None"
You can get the information for File2 into another variable file2 using:
file2=$(echo "$variable" | sed -n '/File2/ s/File2 *: *//p')
The double quotes preserve newlines in the variable. The -n suppresses the default printing. The pattern matches the line containing File2 followed by any number of spaces, a colon and any number of additional spaces; it is replaced by nothing, and the remainder of the line is printed by sed and that is captured in the variable file2. If there can be spaces in front of File2 in the data, you can arrange to match and remove them too.
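The substitution can be generalised to any key by putting it in a function. This is a sketch, not a robust parser: value_for is a hypothetical name, and the key is assumed to contain no / and no regex metacharacters.

```bash
#!/bin/bash
# Hypothetical helper: read text on stdin and print what follows
# "KEY :" on the line that starts with the key given as $1.
value_for() {
    sed -n "s/^ *$1 *: *//p"
}

variable="Hello, User
Your files are:
File1 : Exists
File2 : None"

file2=$(echo "$variable" | value_for File2)
echo "$file2"
```

The leading " *" in the pattern also removes any spaces in front of the key.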

Extract part of a file name in bash

I have a folder with lots of files having a pattern, which is some string followed by a date and time:
BOS_CRM_SUS_20130101_10-00-10.csv (3 strings before date)
SEL_DMD_20141224_10-00-11.csv (2 strings before date)
SEL_DMD_SOUS_20141224_10-00-10.csv (3 strings before date)
I want to loop through the folder and extract only the part before the date and output into a file.
Output
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
This is my script but it is not working
#!/bin/bash
# script variables
FOLDER=/app/list/l088app5304d1/socles/Data/LEMREC/infa_shared/Shell/Check_Header_T24/
LOG_FILE=/app/list/l088app5304d1/socles/Data/LEMREC/infa_shared/Shell/Check_Header_T24/log
echo "Starting the programme at: $(date)" >> $LOG_FILE
# Getting part of the file name from FOLDER
for file in `ls $FOLDER/*.csv`
do
mv "${file}" "${file/date +%Y%m%d HH:MM:SS}" 2>&1 | tee -a $LOG_FILE
done #> $LOG_FILE

Use sed with extended-regex and groups to achieve this.
sed -E 's/(.*)[0-9]{8}_[0-9]{2}-[0-9]{2}-[0-9]{2}\.csv/\1/' filelist
where filelist is a file with all the names you care about. Of course, this is just a placeholder because I don't know how you are going to list all eligible files. If a glob will do, for example, you can do
ls mydir/*.csv | sed -E 's/(.*)[0-9]{8}_[0-9]{2}-[0-9]{2}-[0-9]{2}\.csv/\1/'

Assuming you won't have numbers in the first part, you could use:
$ for i in *csv;do str=$(echo $i|sed -r 's/[0-9]+.*//'); echo $str; done
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
Or with parameter substitution:
$ for i in *csv;do echo ${i%_*_*}_; done
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_

When you use ${var/pattern/replace}, the pattern must be a glob-style pattern, not a command to execute.
Instead of using the substitution operator, use the pattern removal operator:
mv "${file}" "${file%_*-*-*.csv}.csv"
% removes the shortest match of the pattern at the end of the variable, so this pattern matches only the trailing time part (_10-00-10.csv) and leaves the date in place. To remove the date as well, use the pattern _*_*-*-*.csv instead.

The substitution:
"${file/date +%Y%m%d HH:MM:SS}"
is unlikely to do anything, because it doesn't execute date +%Y%m%d HH:MM:SS. It just treats it as a pattern to search for, and it's not going to be found.
If you did execute the command, though, you would get the current date and time, which is also (apparently) not what you find in the filename.
If that pattern is precise, then you can do the following:
echo "${file%????????_??-??-??.csv}" >> "$LOG_FILE"
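Applied to the three sample names from the question, the fixed-width pattern can be sketched as follows. The function name name_prefix is hypothetical, and the names are assumed to always end in YYYYMMDD_HH-MM-SS.csv.

```bash
#!/bin/bash
# Hypothetical helper: print the part of a file name that precedes a
# trailing YYYYMMDD_HH-MM-SS.csv stamp.
name_prefix() {
    local name=${1##*/}                            # drop the directory part
    printf '%s\n' "${name%????????_??-??-??.csv}"  # drop the stamp and extension
}

for f in BOS_CRM_SUS_20130101_10-00-10.csv \
         SEL_DMD_20141224_10-00-11.csv \
         SEL_DMD_SOUS_20141224_10-00-10.csv
do
    name_prefix "$f"
done
```

Each ? matches exactly one character, so a name that does not end in the expected stamp is printed unchanged.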

using grep:
ls *.csv | grep -Po "\K^([A-Za-z]+_)+"
output:
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_

Assigning a variable after the contents are 'cut' in bash

I am iterating through a folder of files using bash, but I need to cut the preceding path. For instance if I have this '/temp/test/filename' I want to cut off the '/temp/test/' and store the file name to a variable so I can write a log with the filename in it.
Can anyone help me out? The problem is that the variable temp is always empty.
Here is my bash code:
#!/bin/bash
for file in /temp/test/*
do
if [[ ! -f "$file" ]]
then
continue
fi
temp="$file"|cut -d'/' -f3
$file > /var/log/$temp$(date +%Y%m%d%H%M%S).log
done
exit

Try this:
$ x=/temp/test/filename
$ echo ${x##*/}
filename
Another solution is to use basename :
$ basename /temp/test/filename
filename
The first solution is a parameter expansion, which bash performs itself without starting an external process, so it is faster.
Your line temp="$file"|cut -d'/' -f3 is broken:
when you want to store the output of a command in a variable, you should use var=$(command)
you need to pass the value to the STDIN of the command with a here-string (<<<) or with echo value | command
Finally, if you want to use cut (the field is 4, not 3, because the leading / produces an empty first field):
$ temp=$(cut -d/ -f4 <<< /temp/test/filename)
$ echo $temp
filename
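Put back into a loop like the one in the question, the parameter expansion might look like the sketch below. The function name list_basenames and the throwaway directory are assumptions for the demonstration; the real script would loop over /temp/test and build its log name from the result.

```bash
#!/bin/bash
# Hypothetical helper: print the base name of every regular file in
# the directory given as $1.
list_basenames() {
    local file
    for file in "$1"/*; do
        [[ -f $file ]] || continue     # skip directories and unmatched globs
        printf '%s\n' "${file##*/}"    # strip everything up to the last /
    done
}

# Demonstration with a temporary directory created only for this example
demo_dir=$(mktemp -d)
touch "$demo_dir/report" "$demo_dir/data"
list_basenames "$demo_dir"
rm -r "$demo_dir"
```

Inside the loop, temp=${file##*/} gives the value the question was trying to obtain with cut.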

Add a prefix string to beginning of each line

I have a file as below:
line1
line2
line3
And I want to get:
prefixline1
prefixline2
prefixline3
I could write a Ruby script, but it is better if I do not need to.
prefix will contain /. It is a path, /opt/workdir/ for example.

# If you want to edit the file in-place
sed -i -e 's/^/prefix/' file
# If you want to create a new file
sed -e 's/^/prefix/' file > file.new
If prefix contains /, you can use any other character not in prefix, or
escape the /, so the sed command becomes
's#^#/opt/workdir#'
# or
's/^/\/opt\/workdir/'
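When the prefix comes from a variable, it can be escaped before it is handed to sed. The following is a sketch under assumptions: add_prefix is a hypothetical name, # is chosen as the delimiter, and the prefix is assumed to contain no newline.

```bash
#!/bin/bash
# Hypothetical helper: prefix every line of stdin with the literal string $1.
add_prefix() {
    local esc
    # Escape the characters that are special in a sed replacement:
    # backslash, & and the # delimiter used below.
    esc=$(printf '%s' "$1" | sed 's/[\\&#]/\\&/g')
    sed "s#^#${esc}#"
}

printf 'line1\nline2\nline3\n' | add_prefix '/opt/workdir/'
```

Slashes need no escaping here because they are not the delimiter.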

awk '$0="prefix"$0' file > new_file
In awk the default action is '{print $0}' (i.e. print the whole line), so the above is equivalent to:
awk '{print "prefix"$0}' file > new_file
With Perl (in place replacement):
perl -pi -e 's/^/prefix/' file

You can use Vim in Ex mode:
ex -sc '%s/^/prefix/|x' file
% select all lines
s replace
x save and close

If your prefix is a bit complicated, just put it in a variable:
prefix=path/to/file/
Then, you pass that variable and let awk deal with it:
awk -v prefix="$prefix" '{print prefix $0}' input_file.txt

Here is a highly readable one-liner using the ts command from moreutils. Note that the final tr removes every space, so it only suits lines that contain no spaces of their own.
$ cat file | ts prefix | tr -d ' '
And how it's derived step by step:
# Step 0. create the file
$ cat file
line1
line2
line3
# Step 1. add prefix to the beginning of each line
$ cat file | ts prefix
prefix line1
prefix line2
prefix line3
# Step 2. remove spaces in the middle
$ cat file | ts prefix | tr -d ' '
prefixline1
prefixline2
prefixline3

If you have Perl:
perl -pe 's/^/PREFIX/' input.file

Using & (the whole part of the input that was matched by the pattern):
cat in.txt | sed -e "s/.*/prefix&/" > out.txt
OR using back references:
cat in.txt | sed -e "s/\(.*\)/prefix\1/" > out.txt

Using the shell:
#!/bin/bash
prefix="something"
file="file"
# IFS= keeps leading and trailing whitespace; -r keeps backslashes literal
while IFS= read -r line
do
printf '%s\n' "${prefix}${line}"
done < "$file" > newfile
mv newfile "$file"

While I don't think pierr had this concern, I needed a solution that would not delay output from the live "tail" of a file, since I wanted to monitor several alert logs simultaneously, prefixing each line with the name of its respective log.
Unfortunately, sed, cut, etc. introduced too much buffering and kept me from seeing the most current lines. Steven Penny's suggestion to use the -s option of nl was intriguing, and testing proved that it did not introduce the unwanted buffering that concerned me.
There were a couple of problems with using nl, though, related to the desire to strip out the unwanted line numbers (even if you don't care about the aesthetics of it, there may be cases where using the extra columns would be undesirable). First, using "cut" to strip out the numbers re-introduces the buffering problem, so it wrecks the solution. Second, using "-w1" doesn't help, since this does NOT restrict the line number to a single column - it just gets wider as more digits are needed.
It isn't pretty if you want to capture this elsewhere, but since that's exactly what I didn't need to do (everything was being written to log files already, I just wanted to watch several at once in real time), the best way to lose the line numbers and have only my prefix was to start the -s string with a carriage return (CR or ^M or Ctrl-M). So for example:
#!/bin/ksh
# Monitor the widget, framas, and dweezil
# log files until the operator hits <enter>
# to end monitoring.
PGRP=$$
for LOGFILE in widget framas dweezil
do
(
tail -f $LOGFILE 2>&1 |
nl -s"^M${LOGFILE}> "
) &
sleep 1
done
read KILLEM
kill -- -${PGRP}

Using ed:
ed infile <<'EOE'
,s/^/prefix/
wq
EOE
This substitutes, for each line (,), the beginning of the line (^) with prefix. wq saves and exits.
If the replacement string contains a slash, we can use a different delimiter for s instead:
ed infile <<'EOE'
,s#^#/opt/workdir/#
wq
EOE
I've quoted the here-doc delimiter EOE ("end of ed") to prevent parameter expansion. In this example, it would work unquoted as well, but it's good practice to prevent surprises if you ever have a $ in your ed script.

Here's a wrapped up example using the sed approach from this answer:
$ cat /path/to/some/file | prefix_lines "WOW: "
WOW: some text
WOW: another line
WOW: more text
The prefix_lines script:
function show_help()
{
IT=$(cat <<EOF
Usage: PREFIX {FILE}
e.g.
cat /path/to/file | prefix_lines "WOW: "
WOW: some text
WOW: another line
WOW: more text
EOF
)
echo "$IT"
exit
}
# Require a prefix
if [ -z "$1" ]
then
show_help
fi
# Check if input is from stdin or a file
FILE=$2
if [ -z "$2" ]
then
# If no stdin exists
if [ -t 0 ]; then
show_help
fi
FILE=/dev/stdin
fi
# Now prefix the output
PREFIX=$1
sed -e "s/^/$PREFIX/" "$FILE"

You can also achieve this using the backreference technique
sed -i.bak 's/\(.*\)/prefix\1/' foo.txt
You can also use with awk like this
awk '{print "prefix"$0}' foo.txt > tmp && mv tmp foo.txt

Using Pythonize (pz):
pz '"prefix"+s' <filename

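Simple solution using a for loop on the command line with bash (note that $(cat ...) splits on whitespace, so this only works when no line contains spaces, tabs, or glob characters):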
for i in $(cat yourfile.txt); do echo "prefix$i"; done
Save the output to a file:
for i in $(cat yourfile.txt); do echo "prefix$i"; done > yourfilewithprefixes.txt

You can do it using AWK
echo example| awk '{print "prefix"$0}'
or
awk '{print "prefix"$0}' file.txt > output.txt
For suffix: awk '{print $0"suffix"}'
For prefix and suffix: awk '{print "prefix"$0"suffix"}'

For people on BSD/OSX systems there's a utility called lam, short for laminate. lam -s prefix file will do what you want. I use it in pipelines, e.g.:
find . -type f -exec lam -s "{}: " "{}" \; | fzf
...which will find all files, exec lam on each of them, giving each file a prefix of its own filename. (And pump the output to fzf for searching.)

If you need to prepend a text at the beginning of each line that has a certain string, try following. In the following example, I am adding # at the beginning of each line that has the word "rock" in it.
sed -i -e 's/^.*rock.*/#&/' file_name

On Windows, in a batch file:
SETLOCAL ENABLEDELAYEDEXPANSION
set YourPrefix=blabla
set YourPath=C:\path
for /f "tokens=*" %%a in (!YourPath!\longfile.csv) do (echo !YourPrefix!%%a) >> !YourPath!\Archive\output.csv
