How to delete every character after a space? - linux

I am making a script that creates a backup of my /home/ directory called backup.sh. When the backup completes I want the script to spit out the size of the backup in megabytes. Here are the lines I am having trouble with:
# creates an approximate size of the file + the file location
backup_text=$(du $new_backup)
# take off the file name at the end and add an 'M' to specify Megabytes
backup_text=${backup_text%[:blank:]*}M
# print string to console
echo $backup_text
Here is the output I keep getting:
20 /backups/Thu_Aug_22_15:52M
As you can see, the backup size is 20M, which is correct, but the /backups/... part remains. What did I do wrong in my script?
Sorry, probably a noob question, just starting scripting =)

Replacement-Pattern Expansion
There are a number of ways to deal with this with Bash pattern matching, but I'd use replacement expansion with extglob. For example:
$ shopt -s extglob
$ backup_text='foo bar'
$ echo ${backup_text/+([[:blank:]])*/}
foo

Double the brackets on your character class:
backup_text=${backup_text%[[:blank:]]*}M
The character class [:blank:] is only valid inside a bracket expression ([...]), where it counts as a single "character" (token), so you need both pairs of brackets.
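A quick check of the corrected expansion, using a made-up value shaped like the tab-separated du output from the question:

```shell
# Hypothetical du-style output: "<size><TAB><path>"
backup_text=$'20\t/backups/Thu_Aug_22_15:52'
# With the doubled brackets, % trims the last blank and everything after it
backup_text=${backup_text%[[:blank:]]*}M
echo "$backup_text"   # prints: 20M
```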

Use the -h option to get the size in a human-readable format instead of appending 'M'.
Also, you don't need any magic to cut off the file name; just take the first field.
And do not forget the quotes around "$new_backup". This is important, because things will go wild if $new_backup contains a space.
sizeStr=$(du -h "$new_backup" | awk '{print $1}')
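To see what that pipeline does, here is a sketch that feeds awk a simulated line of du -h output (the path is made up, since no real backup exists here):

```shell
# Simulate "du -h" output ("<size><TAB><path>"), then keep the first field
size_str=$(printf '20M\t/backups/demo_dir\n' | awk '{print $1}')
echo "$size_str"   # prints: 20M
```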

Related

How to extract string in shell script

I have file names like Tarun_Verma_25_02_2016_10_00_10.csv. How can I extract the string like 25_02_2016_10_00_10 from it in shell script?
It is not fixed how many numeric parts there will be after "firstName"_"lastName".
A one-line solution would be preferred.
with sed
$ echo Tarun_Verma_25_02_2016_10_00_10.csv | sed -r 's/[^0-9]*([0-9][^.]*)\..*/\1/'
25_02_2016_10_00_10
This extracts everything between the first digit and the dot.
If you want some control over which parts you pick out (assuming the format is always like <firstname>_<lastname>_<day>_<month>_<year>_<hour>_<minute>_<second>.csv) awk would be pretty handy
echo "Tarun_Verma_25_02_2016_10_00_10.csv" | awk -F"[_.]" 'BEGIN{OFS="_"}{print $3,$4,$5,$6,$7,$8}'
Here awk splits on both underscores and periods, sets the Output Field Separator to an underscore, and then prints the parts of the file name that you are interested in.
ksh93 supports the syntax bash calls extglobs out-of-the-box. Thus, in ksh93, you can do the following:
f='Tarun_Verma_25_02_2016_10_00_10.csv'
f=${f##+([![:digit:]])} # trim everything before the first digit
f=${f%%+([![:digit:]])} # trim everything after the last digit
echo "$f"
To do the same in bash, you'll want to run the following command first
shopt -s extglob
Since this uses shell-native string manipulation, it runs much more quickly than invoking an external command (sed, awk, etc) when processing only a single line of input. (When using ksh93 rather than bash, it's quite speedy even for large inputs).

Bash separation of line with newlines instead of spaces

I did two following commands:
With the first one I listed content of directory and stored it in variable.
Second one shows content of variable.
Now I decided that I want to separate listing not with spaces but with newlines, I do the following:
I get a mess. Why?
It's worth noting that when I changed the command as follows, it worked as I wanted:
Could someone please explain why 0x20, or 32 (I tried that number too), is not treated by Bash as a space in this case?
tr simply doesn't recognize hex escapes, only octal ones. This would work:
tr '\040' '\n'
And the easier way to show your files is
shopt -s nullglob ## Optional.
printf '%s\n' *
The problem with tr '\0x20' is that tr treats the whole sequence as literal characters, namely \0 (an octal escape for NUL), x, 2, and 0. Note that all of these characters were replaced by \n in the output; that's why you have .t instead of txt, and why the 2 didn't appear either.
It's not bash, it's tr that is making you unhappy. If you really want to iterate over file names, there are better ways to do that:
for f in *; do
# do work with $f. But always use quotes. Like `"$f"`
done
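If the goal is just to turn a space-separated variable into one item per line, bash can also do the replacement natively, without tr (a sketch; listing is a hypothetical variable holding the directory listing):

```shell
listing='a.txt b.txt c.txt'
# ${var// /REPL} replaces every space; $'\n' is a literal newline
printf '%s\n' "${listing// /$'\n'}"
```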

Writing a bash script that writes a variable to the end of a line in another file

I am trying to add a line to a bash script that does a bunch of other stuff, and what I want it to do is write to the end of a line in another file. I have a file containing a single line of IP addresses. My script asks for user input, and one of the things it asks for is an IP address, which gets stored in a variable inside_ip. I want to write that value to the end of the line in the other file. I found a similar question, and the solution was
sed -i.bck '$s/$/yourText2/' list.txt
I tried to put that in a file with
sed -i.bck '$s/$/ $inside_ip/' list.txt
but it actually writes the literal string $inside_ip to the end of the file, so I just need it to expand the variable.
Does the following work for you?
echo $inside_ip >> list.txt
Use double quotes instead of single quotes, as in sed -i.bck "s/$/ $inside_ip/" list.txt
Single quotes stop variables from being expanded. Double quotes allow them to be expanded. Hence:
sed -i.bck '$s/$/ '"$inside_ip/" list.txt
That protects the $s in single quotes; you want sed to see the $ and the s, not the value of your (probably unset) shell variable $s. Of course, if the file only contains one line, then the leading $ is not critical; you could leave it out, or replace it with 1. The /$/ would be left alone anyway, but the double quotes following expand the variable, preserving any spaces inside it (though IP addresses don't usually contain spaces).
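The quoting difference is easy to see with echo (192.0.2.1 is just a placeholder address):

```shell
inside_ip='192.0.2.1'
echo '$inside_ip'   # single quotes: prints the literal text $inside_ip
echo "$inside_ip"   # double quotes: prints 192.0.2.1
```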

How do I insert the results of several commands on a file as part of my sed stream?

I use DJing software on linux (xwax) which uses a 'scanning' script (visible here) that compiles all the music files available to the software and outputs a string which contains a path to the filename and then the title of the mp3. For example, if it scans path-to-mp3/Artist - Test.mp3, it will spit out a string like so:
path-to-mp3/Artist - Test.mp3[tab]Artist - Test
I have tagged all my mp3s with BPM information via the id3v2 tool and have a commandline method for extracting that information as follows:
id3v2 -l name-of-mp3.mp3 | grep TBPM | cut -d: -f2
That spits out JUST the numerical BPM to me. What I'd like to do is prepend the BPM number from the above command as part of the xwax scanning script, but I'm not sure how to insert that command in the midst of the script. What I'd want it to generate is:
path-to-mp3/Artist - Test.mp3[tab][bpm]Artist - Test
Any ideas?
It's not clear to me where in that script you want to insert the BPM number, but the idea is this:
To embed the output of one command into the arguments of another, you can use the "command substitution" notation `...` or $(...). For example, this:
rm $(echo abcd)
runs the command echo abcd and substitutes its output (abcd) into the overall command; so that's equivalent to just rm abcd. It will remove the file named abcd.
The above doesn't work inside single-quotes. If you want, you can just put it outside quotes, as I did in the above example; but it's generally safer to put it inside double-quotes (so as to prevent some unwanted postprocessing). Either of these:
rm "$(echo abcd)"
rm "a$(echo bc)d"
will remove the file named abcd.
In your case, you need to embed the command substitution into the middle of an argument that's mostly single-quoted. You can do that by simply putting the single-quoted strings and double-quoted strings right next to each other with no space in between, so that Bash will combine them into a single argument. (This also works with unquoted strings.) For example, either of these:
rm a"$(echo bc)"d
rm 'a'"$(echo bc)"'d'
will remove the file named abcd.
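Applied to this question, the same splicing lets a variable land inside an otherwise single-quoted sed program. A sketch with stand-ins ($bpm and the printf line are placeholders for the real id3v2 | grep | cut pipeline and scanner output):

```shell
bpm=$(echo 128)   # stand-in for: id3v2 -l file.mp3 | grep TBPM | cut -d: -f2
# single-quoted sed code, then the double-quoted variable, then more single quotes
printf 'Artist - Test\n' | sed 's/^/'"$bpm"': /'   # prints: 128: Artist - Test
```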
Edited to add: O.K., I think I understand what you're trying to do. You have a command that either (1) outputs out all the files in a specified directory (and any subdirectories and so on), one per line, or (2) outputs the contents of a file, where the contents of that file is a list of files, one per line. So in either case, it's outputting a list of files, one per line. And you're piping that list into this command:
sed -n '
{
# /[<num>[.]] <artist> - <title>.ext
s:/\([0-9]\+.\? \+\)\?\([^/]*\) \+- \+\([^/]*\)\.[A-Z0-9]*$:\0\t\2\t\3:pi
t
# /<artist> - <album>[/(Disc|Side) <name>]/[<ABnum>[.]] <title>.ext
s:/\([^/]*\) \+- \+\([^/]*\)\(/\(disc\|side\) [0-9A-Z][^/]*\)\?/\([A-H]\?[A0-9]\?[0-9].\? \+\)\?\([^/]*\)\.[A-Z0-9]*$:\0\t\1\t\6:pi
t
# /[<ABnum>[.]] <name>.ext
s:/\([A-H]\?[A0-9]\?[0-9].\? \+\)\?\([^/]*\)\.[A-Z0-9]*$:\0\t\t\2:pi
}
'
which runs a sed script over that list. What you want is for all of the replacement strings to change from \0\t... to \0\tBPM\t..., where BPM is the BPM number computed from your command. Right? And you need to compute that BPM number separately for each file, so instead of relying on sed's implicit line-by-line looping, you need to handle the looping yourself and process one line at a time. Right?
So, you should change the above command to this:
while read -r LINE ; do # loop over the lines, saving each one as "$LINE"
BPM=$(id3v2 -l "$LINE" | grep TBPM | cut -d: -f2) # save BPM as "$BPM"
sed -n '
{
# /[<num>[.]] <artist> - <title>.ext
s:/\([0-9]\+.\? \+\)\?\([^/]*\) \+- \+\([^/]*\)\.[A-Z0-9]*$:\0\t'"$BPM"'\t\2\t\3:pi
t
# /<artist> - <album>[/(Disc|Side) <name>]/[<ABnum>[.]] <title>.ext
s:/\([^/]*\) \+- \+\([^/]*\)\(/\(disc\|side\) [0-9A-Z][^/]*\)\?/\([A-H]\?[A0-9]\?[0-9].\? \+\)\?\([^/]*\)\.[A-Z0-9]*$:\0\t'"$BPM"'\t\1\t\6:pi
t
# /[<ABnum>[.]] <name>.ext
s:/\([A-H]\?[A0-9]\?[0-9].\? \+\)\?\([^/]*\)\.[A-Z0-9]*$:\0\t'"$BPM"'\t\t\2:pi
}
' <<<"$LINE" # take $LINE as input, rather than reading more lines
done
(where the only change to the sed script itself was to insert '"$BPM"'\t in a few places to switch from single-quoting to double-quoting, then insert the BPM, then switch back to single-quoting and add a tab).

Perl line runs 30 times quicker with single quotes than with double quotes

We have a task to change some strings in binary files to lowercase (from mixed/upper/whatever). The relevant strings are references to the other files (it's in connection with an upgrade where we are also moving from Windows to linux as a server environment, so the case suddenly matters). We have written a script which uses a perl loop to do this. We have a directory containing around 300 files (total size of the directory is around 150M) so it's some data but not huge amounts.
The following perl code takes about 6 minutes to do the job:
for file_ref in `ls -1F $forms6_convert_dir/ | grep -v "/" | sed 's/\(.*\)\..*/\1/'`
do
(( updated++ ))
write_line "Converting case of string: $file_ref "
perl -i -pe "s{(?i)$file_ref}{$file_ref}g" $forms6_convert_dir/*
done
while the following perl code takes over 3 hours!
for file_ref in `ls -1F $forms6_convert_dir/ | grep -v "/" | sed 's/\(.*\)\..*/\1/'`
do
(( updated++ ))
write_line "Converting case of string: $file_ref "
perl -i -pe 's{(?i)$file_ref}{$file_ref}g' $forms6_convert_dir/*
done
Can anyone explain why? Is it that $file_ref is left as the literal string $file_ref in the single-quoted version instead of being substituted with the value? In that case, what is it replacing in this version? What we want is to replace all occurrences of any filename with itself in lowercase. If we run strings on the files before and after and search for the filenames, both versions appear to have made the same changes. However, if we run diff on the files produced by the two loops (diff firstloop/file1 secondloop/file1), it reports that they differ.
This is running from within a bash script on linux.
The shell doesn't do variable substitution for single quoted strings. So, the second one is a different program.
As the other answers said, the shell doesn't substitute variables inside single quotes, so the second version is executing the literal Perl statement s{(?i)$file_ref}{$file_ref}g for every line in every file.
As you said in a comment, if $ is the end-of-line metacharacter, $file_ref could never match anything. $ matches before the newline at end-of-line, so the next character would have to be a newline. Therefore, Perl doesn't interpret $ as the metacharacter; it interprets it as the beginning of a variable interpolation.
In Perl, the variable $file_ref is undef, which is treated as the empty string when interpolated. So you're really executing s{(?i)}{}g, which says to replace the empty string with the empty string, and do that for all occurrences in a case-insensitive manner. Well, there's an empty string between every pair of characters, plus one at the beginning and end of each line. Perl is finding each one and replacing it with the empty string. This is a no-op, but it's an expensive one, hence the 3-hour run time.
You must be mistaken about both versions making the same changes. As I just explained, the single-quoted version is just an expensive no-op; it doesn't make any changes at all to the file contents (it just makes a fresh copy of each file). The files you ran it on must have already been converted to lower case.
With double quotes you are using the shell variable, with single quotes Perl is trying to use a variable of that name.
You might wish to consider writing the whole lot in either Perl or Bash to speed things up. Both languages can read files and do pattern matching. In Perl you can change to lower-case using the lc built-in function, and in Bash 4 you can use ${file,,}.
