Split bash string by newline characters - string

I found this.
And I am trying this:
x='some
thing'
y=(${x//\n/})
And I had no luck, I thought it could work with double backslash:
y=(${x//\\n/})
But it did not.
To test I am not getting what I want I am doing:
echo ${y[1]}
Getting:
some
thing
Which I want to be:
some
I want y to be an array [some, thing]. How can I do this?

Another way:
x=$'Some\nstring'
readarray -t y <<<"$x"
Or, if you don't have bash 4, the bash 3.2 equivalent:
IFS=$'\n' read -rd '' -a y <<<"$x"
You can also do it the way you were initially trying to use:
y=(${x//$'\n'/ })
This, however, will not function correctly if your string already contains spaces, such as 'line 1\nline 2'. To make it work, you need to restrict the word separator before parsing it:
IFS=$'\n' y=(${x//$'\n'/ })
...and then, since you are changing the separator, you don't need to convert the \n to space anymore, so you can simplify it to:
IFS=$'\n' y=($x)
This approach will function unless $x contains a matching globbing pattern (such as "*") - in which case it will be replaced by the matched file name(s). The read/readarray methods require newer bash versions, but work in all cases.
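To see how the approaches behave on awkward input, here is a minimal sketch. The sample string is made up for illustration, and readarray assumes bash 4 or later. One line contains a space and one line is a lone *, which would normally be expanded to file names:

```bash
#!/usr/bin/env bash
# Hypothetical sample: a line with a space and a line that is a glob pattern.
x=$'line 1\n*\nline 3'

# readarray keeps each line intact: no word splitting, no globbing.
readarray -t y <<<"$x"

# Unquoted expansion, made safe by limiting IFS and disabling globbing.
old_ifs=$IFS
set -f              # turn off pathname expansion so * stays literal
IFS=$'\n'
z=($x)
IFS=$old_ifs
set +f

printf 'readarray: %d elements, second is [%s]\n' "${#y[@]}" "${y[1]}"
printf 'IFS split: %d elements, second is [%s]\n' "${#z[@]}" "${z[1]}"
```

Restoring IFS and re-enabling globbing afterwards keeps the rest of the script unaffected.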

There is another way if all you want is the text up to the first line feed:
x='some
thing'
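y=${x%%$'\n'*}
After that, y will contain some and nothing else (no line feed).
What is happening here?
We perform a parameter expansion substring removal (${PARAMETER%%PATTERN}) that removes the longest suffix matching the pattern: the first ANSI-C-quoted line feed ($'\n') and everything that follows it (*). With a single % the shortest matching suffix is removed instead, which cuts at the last line feed; for a two-line string both give the same result.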
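Once the string has more than two lines, the choice between % and %% matters. A small sketch comparing the two (the three-line sample string and the variable names are invented here):

```bash
#!/usr/bin/env bash
# Hypothetical three-line sample.
x=$'one\ntwo\nthree'

# %% removes the longest matching suffix: everything from the first line feed.
first_line=${x%%$'\n'*}

# % removes the shortest matching suffix: only the last line feed and what follows.
without_last=${x%$'\n'*}

printf 'first line:   [%s]\n' "$first_line"
printf 'without last: [%s]\n' "$without_last"
```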

Related

concatenate two strings and one variable using bash

I need to generate filename from three parts, two strings, and one variable.
for f in `cat files.csv`; do echo fastq/$f\_1.fastq.gze; done
files.csv has the following lines:
Sample_11
Sample_12
I need to generate the following:
fastq/Sample_11_1.fastq.gze
fastq/Sample_12_1.fastq.gze
My problem is that I got the below files:
_1.fastq.gze_11
_1.fastq.gze_12
the string after the variable deletes the string before it.
I appreciate any help
Regards
By the way, your idiom for f in `cat files.csv` should be avoided. Refer to: Dangerous Backticks
while IFS= read -r f
do
echo "fastq/${f}_1.fastq.gze"
done < files.csv
You can make it a one-liner with xargs and printf.
xargs printf 'fastq/%s_1.fastq.gze\n' <files.csv
The function of printf is to apply the first argument (the format string) to each argument in turn.
xargs says to run this command on as many files as it can fit onto the command line (splitting it up into multiple invocations if the input file is too large to fit all the arguments onto a single command line, subject to the ARG_MAX constant in your kernel).
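The way printf reuses its format string can be checked without xargs or an input file. A minimal sketch, with the two sample names from the question passed directly as arguments (the variable name is just for illustration):

```bash
#!/usr/bin/env bash
# printf applies the format once per argument, so two arguments give two lines.
names=$(printf 'fastq/%s_1.fastq.gze\n' Sample_11 Sample_12)
printf '%s\n' "$names"
```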
Your best bet, generally, is to wrap the variable name in braces. So, in this case:
echo "fastq/${f}_1.fastq.gze"
See this answer for some details about the general concept, as well.
Edit: An additional thought looking at the now-provided output makes me think that this isn't a coding problem at all, but rather a conflict between line-endings and the terminal/console program.
Specifically, if the CSV file ends its lines with just a carriage return (ASCII/Unicode 13), the end of Sample_11 might "rewind" the line to the start and overwrite.
In that case, based loosely on this article, I'd recommend replacing cat (if you understandably don't want to re-architect the actual script with something like while) with something that will strip the carriage returns, such as:
for f in $(tr -cd '\011\012\040-\176' < files.csv)
do
echo fastq/${f}_1.fastq.gze
done
As the cited article explains, Octal 11 is a tab, 12 a line feed, and 40-176 are typeable characters (Unicode will require more thinking). If there aren't any line feeds in the file, for some reason, you probably want to replace that with tr '\015' '\012', which will convert the carriage returns to line feeds.
Of course, at that point, better is to find whatever produces the file and ask them to put reasonable line-endings into their file...
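If the file cannot be fixed at the source, another option is to strip a trailing carriage return inside a while read loop. This is only a sketch: it assumes Windows-style CR+LF line endings, and it writes its own temporary sample file rather than using your files.csv:

```bash
#!/usr/bin/env bash
# Build a hypothetical sample file with CR+LF line endings.
tmpfile=$(mktemp)
printf 'Sample_11\r\nSample_12\r\n' >"$tmpfile"

result=$(
  while IFS= read -r f; do
    f=${f%$'\r'}                           # drop one trailing carriage return, if any
    printf 'fastq/%s_1.fastq.gze\n' "$f"
  done <"$tmpfile"
)
rm -f "$tmpfile"

printf '%s\n' "$result"
```

Lines without a carriage return pass through unchanged, so the loop is safe on files that already have Unix line endings.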

expr bash for sed a line in log does not work

My goal is to extract the 100th line with sed, store it as a string, and then split the sentence into words.
#!/bin/bash
fid=log.txt;
sentence=`expr sed -n '100p' ${fid}`;
for word in $sentence
do
echo $word
done
But apparently this fails:
expr: syntax error
Would somebody please let me know what I have done wrong? Previously it worked for numbers.
The expr does not seem to serve a useful purpose here, and if it did, a sed command would certainly not be a valid or useful thing to pass to it, under most circumstances. You should probably just take it out.
However, the following loop is also problematic. Unquoted variables in shell script are very frequently an error. In this case, you can't quote the thing you pass to the for loop (that would cause the loop to only run once, with the loop variable set to the quoted string) but you also cannot prevent the shell from performing wildcard expansion on the unquoted string. So if the string happened to contain *, the shell will expand that to a list of files in the current directory, for example.
Fortunately, this can all be done in an only slightly more complicated sed script.
sed '100!d;s/[ \t]\+/\n/g;q' "$fid"
That is, if the line number is not 100, delete this line and start over with the next line. Otherwise, we are at line 100; replace runs of whitespace with newlines, (print) and quit.
(The backslash escape codes \t and \n are not universally portable; and \+ for repetition is also an optional extension. I believe there are also sed variants which dislike semicolon as a command separator. Consult your sed manual page, experiment, and if everything else fails, maybe switch to Awk or Perl. Just in case, here is a version which works even on Mac OSX:
sed '100!d
s/[ ][ ]*/\
/g;q' log.txt
The stuff inside the square brackets is a space and a literal tab; in Bash, with default keybindings, type ctrl-V, tab to produce a literal tab.)
Incidentally, this also gets rid of the variable capture antipattern. There are good reasons to capture output to a variable, but if it can be avoided, you often end up with a simpler, more robust and efficient, as well as more idiomatic and elegant script. (I see no reason to put the log file name in a variable, either, in this isolated case; but in a larger script, it might make sense.)
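If the words really are needed in the shell afterwards, one way to split the line without triggering wildcard expansion is to read it into an array. A sketch, assuming bash; it builds its own throwaway log file whose 100th line contains a *, and the array name words is made up:

```bash
#!/usr/bin/env bash
# Hypothetical log file: 99 filler lines, then a 100th line containing a glob character.
fid=$(mktemp)
for i in {1..99}; do
  echo "filler $i"
done >"$fid"
echo 'alpha * beta' >>"$fid"

sentence=$(sed -n '100p' "$fid")

# read -a splits on IFS but never performs pathname expansion.
read -ra words <<<"$sentence"

for word in "${words[@]}"; do
  printf '%s\n' "$word"
done
rm -f "$fid"
```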
I don't think you need the expr command in this case.
expr is used for calculations, something like:
expr 1 + 1
Just this one is fine:
sentence=`sed -n '100p' ${fid}`;
#!/bin/bash
fid=log.txt;
sentence=$(sed -n '100p' ${fid});
for word in $sentence
do
echo $word
done
Dropping expr and using $( ... ) command substitution solves the problem.

Bash string manipulation -- removing characters?

I'm having a heck of a time removing characters in Bash. I have a string that's formatted like temp=53.0'C. I want to remove everything that's not 53.0.
I'm normally a Python programmer, and the way I'd do this in Python would be to split the string into an array of characters, and remove the unnecessary elements, before putting the array back into string form.
But I can't figure out how to do that in Bash.
How do I remove the desired characters?
You can use Bash parameter substitution like this:
a="temp=53.0'C"
a=${a/*=/} # Remove everything up to and including = sign
a=${a/\'*/} # Remove single quote and everything after it
echo $a
53.0
Further examples are available here.
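The same trimming can also be written with the anchored prefix (#) and suffix (%%) removal operators, which state more explicitly which end of the string is being cut. A small sketch using the sample string from the question:

```bash
#!/usr/bin/env bash
a="temp=53.0'C"
a=${a#*=}      # remove the shortest prefix ending in "=": leaves 53.0'C
a=${a%%\'*}    # remove the longest suffix starting at a single quote: leaves 53.0
printf '%s\n' "$a"
```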
You could use sed with a regex which corresponds to the format of the string you want to be returned:
$ var="temp=53.0'C"
$ echo "$var" | sed -r 's/.*=([0-9][0-9]\.[0-9]).*/\1/g'
53.0
What exactly are the "rules" around what your original string looks like, and what the section to output looks like?
Same thing with BASH_REMATCH
> [[ $var =~ [0-9]+\.[0-9]+ ]] && echo "${BASH_REMATCH[0]}"
53.0
I also agree with Josh, but would improve the pattern match to cover a wider range of floating-point numbers and to allow for the quote before the unit:
.*=[ ]*([0-9]*\.[0-9]+)'?[cC].*
If you do not understand the pattern above, take the time to find out. Learning pattern matching will be one of the most useful things you ever do.
Test your pattern with something like http://www.freeformatter.com/regex-tester.html and then tailor it for the platform you are using (e.g. tools that use basic regular expressions, such as sed without -r, need the parentheses escaped with a backslash)
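Putting the pieces together in Bash itself, a capture group lets you pull out just the number. This is a sketch under assumptions: the regex below is my own variation (optional minus sign, optional quote before the unit), and the names re and value are made up:

```bash
#!/usr/bin/env bash
var="temp=53.0'C"

# Assumed pattern: "=", optional spaces, a number (captured), optional quote, c or C.
re="=[[:space:]]*(-?[0-9]*\.?[0-9]+)'?[cC]"

if [[ $var =~ $re ]]; then
  value=${BASH_REMATCH[1]}    # group 1 holds only the number
  printf '%s\n' "$value"
else
  printf 'no temperature found in %s\n' "$var" >&2
fi
```

Keeping the pattern in a variable avoids quoting problems on the right-hand side of =~.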

A Linux Shell Script Problem

I have a dot-separated string in a Linux shell:
example=This.is.My.String
I want to:
1. Add some string before the last dot. For example, I want to add "Good.Long" before the last dot, so I get:
This.is.My.Goood.Long.String
2. Get the part after the last dot, so I get:
String
3. Turn every dot except the last one into an underscore, so I get:
This_is_My.String
If you have time, please explain a little bit; I am still learning regular expressions.
Thanks a lot!
I don't know what you mean by 'Linux Shell' so I will assume bash. This solution will also work in zsh, etcetera:
example=This.is.My.String
before_last_dot=${example%.*}
after_last_dot=${example##*.}
echo ${before_last_dot}.Goood.Long.${after_last_dot}
This.is.My.Goood.Long.String
echo ${before_last_dot//./_}.${after_last_dot}
This_is_My.String
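The interim variables before_last_dot and after_last_dot should explain my usage of the % and ## operators: % removes the shortest matching suffix, and ## removes the longest matching prefix. The // operator replaces every occurrence of a pattern; I think it is self-explanatory, but I'd be happy to clarify if you have any questions.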
This doesn't use sed (or even regular expressions), but bash's inbuilt parameter substitution. I prefer to stick to just one language per script, with as few forks as possible :-)
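For reference, here is a short sketch of all four removal operators applied to the same string, so the difference between the shortest and longest match is visible side by side (the variable names are only illustrative):

```bash
#!/usr/bin/env bash
example=This.is.My.String

without_last_field=${example%.*}     # shortest suffix removed
first_field=${example%%.*}           # longest suffix removed
without_first_field=${example#*.}    # shortest prefix removed
last_field=${example##*.}            # longest prefix removed

printf '%s\n' "$without_last_field" "$first_field" "$without_first_field" "$last_field"
```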
Other users have given good answers for #1 and #2. There are some disadvantages to some of the answers for #3. In one case, you have to run the substitution twice. In another, if your string has other underscores they might get clobbered. This command works in one go and only affects dots:
sed 's/\(.*\)\./\1\n./;h;s/[^\n]*\n//;x;s/\n.*//;s/\./_/g;G;s/\n//'
It splits the line before the last dot by inserting a newline and copies the result into hold space:
s/\(.*\)\./\1\n./;h
removes everything up to and including the newline from the copy in pattern space and swaps hold space and pattern space:
s/[^\n]*\n//;x
removes everything after and including the newline from the copy that's now in pattern space
s/\n.*//
changes all dots into underscores in the copy in pattern space and appends hold space onto the end of pattern space
s/\./_/g;G
removes the newline that the append operation adds
s/\n//
Then the sed script is finished and the pattern space is output.
At the end of each numbered step (some consist of two actual steps):
Step  Pattern Space          Hold Space
1     This.is.My\n.String    This.is.My\n.String
2     This.is.My\n.String    .String
3     This.is.My             .String
4     This_is_My\n.String    .String
5     This_is_My.String      .String
Solution
1. Two versions of this, too:
Complex: sed 's/\(.*\)\([.][^.]*$\)/\1.Goood.Long\2/'
Simple: sed 's/.*\./&Goood.Long./' - thanks Dennis Williamson
2. What do you want?
Complex: sed 's/.*[.]\([^.]*\)$/\1/'
Simpler: sed 's/.*\.//' - thanks, glenn jackman.
3. sed 's/\([^.]*\)[.]\([^.]*[.]\)/\1_\2/g'
With 3, you probably need to run the substitution (in its entirety) at least twice, in general.
Explanation
Remember, in sed, the notation \(...\) is a 'capture' that can be referenced as '\1' or similar in the replacement text.
1. Capture everything up to a string starting with a dot followed by a sequence of non-dots (which you also capture); replace by what came before the last dot, the new material, and the last dot and what came after it.
2. Ignore everything up to the last dot followed by a capture of a sequence of non-dots; replace with the capture only.
3. Find and capture a sequence of non-dots, a dot (not captured), followed by a sequence of non-dots and a dot; replace the first dot with an underscore. This is done globally, but the second and subsequent matches won't touch anything already matched. Therefore, I think you need ceil(log2(N+1)) passes, where N is the number of dots to be replaced. One pass deals with 1 dot to replace; two passes deal with 2 or 3; three passes deal with 4-7, and so on.
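A quick way to check how many passes a particular string needs is to repeat the substitution until the output stops changing. A sketch with a made-up sample string that has three dots to replace:

```bash
#!/usr/bin/env bash
# Hypothetical sample: four dots, of which the first three should become underscores.
s=a.b.c.d.e
passes=0

while :; do
  next=$(sed 's/\([^.]*\)[.]\([^.]*[.]\)/\1_\2/g' <<<"$s")
  [[ $next == "$s" ]] && break    # nothing changed, so we are done
  s=$next
  passes=$((passes + 1))
done

printf '%s after %d passes\n' "$s" "$passes"
```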
Here's a version that uses Bash's regex matching (Bash 3.2 or greater).
[[ $example =~ ^(.*)\.(.*)$ ]]
echo ${BASH_REMATCH[1]//./_}.${BASH_REMATCH[2]}
Here's a Bash version that uses IFS (Internal Field Separator).
saveIFS=$IFS
IFS=.
array=($example) # * split the string at each dot
lastword=${array[@]: -1}
unset "array[${#array[@]}-1]" # * remove the last element
IFS=_
echo "${array[*]}.$lastword" # The asterisk as a subscript when inside quotes causes IFS (an underscore in this case) to be inserted between each element of the array
IFS=$saveIFS
* use declare -p array after these steps to see what the array looks like.
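A variation on the same idea that never changes IFS for the rest of the script: set it only for the read command and for a subshell that does the joining. This is a sketch, assuming the string contains at least one dot; the names parts, joined and result are made up:

```bash
#!/usr/bin/env bash
example=This.is.My.String

# IFS=. applies only to this read command.
IFS=. read -ra parts <<<"$example"

last_index=$(( ${#parts[@]} - 1 ))
lastword=${parts[last_index]}
unset "parts[$last_index]"          # drop the last element

# Join the remaining elements with underscores inside a subshell,
# so the IFS change does not leak out.
joined=$(IFS=_; printf '%s' "${parts[*]}")

result="$joined.$lastword"
printf '%s\n' "$result"
```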
1.
$ echo 'This.is.my.string' | sed 's}[^\.][^\.]*$}Good Long.&}'
This.is.my.Good Long.string
Pattern: the run of non-dot characters at the end of the line. Replacement: the new text followed by &, which stands for whatever the pattern matched.
2.
$ echo 'This.is.my.string' | sed 's}.*\.}}'
string
sed matches greedily, so it will extend the first closure (.*) as far as possible, i.e. to the last dot.
3.
$ echo 'This.is.my.string' | tr . _ | sed 's/_\([^_]*\)$/\.\1/'
This_is_my.string
convert all dots to _, then turn the last _ to a dot.
(caveat: this will turn 'This.is.my.string_foo' to 'This_is_my_string.foo', not 'This_is_my.string_foo')
You don't need regular expressions at all (those complex things hurt my eyes!) if you use Awk and are a little creative.
1. echo $example| awk -v ins="Good.long" -F . '{OFS="."; $NF = ins"."$NF;print}'
What this does:
-v ins="Good.long" tells awk to create a variable called 'ins' with "Good.long" as content,
-F . tells awk to use the dot as a separator for your fields for input,
OFS="." tells awk to use the dot as the output field separator,
NF is the number of fields, so $NF represents the last field,
the $NF=... part replaces the last field, it appends the current last string to what you want to insert (the variable called "ins" declared earlier).
2. echo $example| awk -F . '{print $NF}'
$NF is the last field, so that's all!
3. echo $example| awk -F . '{OFS="_"; $(NF-1) = $(NF-1)"."$NF; NF=NF-1; print}'
Here we have to be creative, as Awk AFAIK doesn't allow deleting fields. Of course, we set the output field separator to underscore.
$(NF-1) = $(NF-1)"."$NF: First, we replace the second last field with the last glued to the second last, with a dot between.
Then, we fool awk to make it think the Number of fields is equal to the number of fields minus one, hence deleting the last field!
Note you can't say $NF="", because then it would display two underscores.

sed: Find pattern over two lines, not replace after that pattern

Wow, this one has really got me. Gonna need some tricky sed skill here I think. Here is the part of a command's output that I'm trying to replace:
...
fast
n : abstaining from food
The value I'd like to replace it with, is:
...
Noun
: abstaining from food
This turns out to be trickier than I thought, because 'fast' is listed a number of times and also appears at the beginning of the line in other places. So I came up with this to define the range:
sed '/fast/,/^ n : / s/fast/Noun/'
Which I thought would do, but... Unfortunately, this doesn't end the replacement, and every later occurrence in the rest of the output is replaced with Noun as well. How do I get sed to stop replacing after the match? Even better, can I find a two-line pattern match and replace it?
Try this:
sed "h; :b; \$b ; N; /^${1}\n n/ {h;x;s//Noun\n/; bb}; \$b ; P; D"
Unfortunately, Paul's answer reads the whole file in which makes any additional processing you might want to do difficult. This version reads the lines in pairs.
By enclosing the sed script in double quotes instead of single quotes, you can include shell variables such as positional parameters. I would recommend surrounding them with curly braces so they are set apart from the adjacent characters. When using double quotes, you'll have to be careful of the shell wanting to do its various expansions. In this example, I've escaped the dollar signs that signify the last line of the input file for the branch commands. Otherwise the shell will try to substitute the value of a variable $b which is likely to be null thus making sed unhappy.
Another technique would be to use single quotes and close and open them each time you have a shell variable:
sed 'h; :b; $b ; N; /^'${1}'\n n/ {h;x;s//Noun\n/; bb}; $b ; P; D'
#   ↑open        close↑    ↑open                            close↑
I'm assuming that the "[/code]" in your expected result is a typo. Let me know if it's not.
This seems to do what you want:
sed -e ':a;N;$!ba;s/fast\n n/Noun\n/'
I essentially stole the answer from here.
This might work for you:
sed '$!N;s/^fast\n\s*n :/Noun\n :/;P;D' file
...
Noun
: abstaining from food
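Combining the pair-of-lines approach with a shell variable, here is a sketch that avoids GNU-only escapes by writing the newline in the replacement as a backslash followed by a real line break. The sample input and the variable names are made up, and word is assumed to contain no regex metacharacters:

```bash
#!/usr/bin/env bash
word=fast
# Hypothetical sample: the two-line pattern, then another line starting with the word.
input=$'fast\n n : abstaining from food\nfast food'

# $!N pairs each line with the next one; P and D print and drop the first
# line of the pair, so only the two-line match is rewritten.
output=$(printf '%s\n' "$input" | sed '$!N
s/^'"$word"'\n *n :/Noun\
 :/
P
D')

printf '%s\n' "$output"
```

The later line that merely begins with the word is left alone, because it is not followed by the " n :" line.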

Resources