Substring string in bash - linux

I got this string:
xavier.blodot
wisoyo.hadi
And I want this output: firstletter+lastname
xblodot
whadi
Regards!!

You should use this approach:
sed -r 's:(.).*\.(.+):\1\2:' {YOUR_FILE.TXT}

Another, slightly shorter sed:
sed -E 's/^(.)[^.]+\./\1/' file
xblodot
whadi
Or using awk:
awk -F. '{print substr($1,1,1) $2}' file
xblodot
whadi

The title says in bash, so I’m assuming it must be in bash only (without using sed, awk or other external processes):
while read ns; do
echo "${ns::1}${ns#*.}"
done < the_input_file.txt
Making it more resilient, if needed, is up to you. It depends on how much you (dis)trust the input. This may include, for example, IFS= read -r ns, a check that [[ "$ns" == +([a-z]).+([a-z]) ]], and arbitrary other consistency checks.
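A minimal sketch of those checks put together. The function name shorten_all and the choice to report malformed lines on stderr are assumptions for illustration, not part of the answer above. Bash 4.1 and later recognise +( ) inside [[ ]] by default; the explicit shopt is only there as a precaution for older versions.

```shell
#!/usr/bin/env bash
shopt -s extglob   # precaution; [[ ]] already understands +( ) in bash 4.1+

# shorten_all is a hypothetical helper: reads names on stdin,
# prints first letter + last name for well-formed lines only.
shorten_all() {
  local ns
  while IFS= read -r ns; do
    if [[ $ns == +([a-z]).+([a-z]) ]]; then
      printf '%s\n' "${ns::1}${ns#*.}"
    else
      # assumption: malformed input is reported and skipped
      printf 'skipping malformed line: %s\n' "$ns" >&2
    fi
  done
}

printf 'xavier.blodot\nbad line\nwisoyo.hadi\n' | shorten_all
```

Called like this, only the two well-formed names reach stdout; the middle line is reported on stderr.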

name=xavier.blodot
shortened_name="${name:0:1}${name##*.}"
You would still have to handle the case where the name does not contain a period.
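One way to handle the missing period, as a sketch only. The helper name shorten_name is hypothetical, and passing such names through unchanged is an assumption about what you want to happen.

```shell
#!/usr/bin/env bash
# shorten_name is a hypothetical helper, not part of the original answer.
shorten_name() {
  local name=$1
  if [[ $name == *.* ]]; then
    # first character + everything after the last period
    printf '%s\n' "${name:0:1}${name##*.}"
  else
    # assumption: names without a period are returned unchanged
    printf '%s\n' "$name"
  fi
}

shorten_name xavier.blodot
shorten_name wisoyo.hadi
shorten_name nodot
```

Without the check, a name such as nodot would come out as nnodot, because ${name##*.} leaves the string untouched when there is nothing to strip.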

Related

Extract just file path from string

I have a file that contains strings in this format:
MD5 (TestImages/IMG_0627.JPG) = 6ed611b3e777c5f7b729fa2f2412d656
I am trying to figure out a way to extract the file path, so that I would get a string like this:
TestImages/IMG_0627.JPG
For a different part of my script, I am using this code to remove everything before and after the brackets, and I could of course do something similar, however I'm sure there is a better way?
shortFile=${line#*MD5 }
shortFile=${shortFile%%)*}
Anyone have any suggestions?
You could use sed but that has the overhead of starting a new process.
echo "$line" | sed -r 's/MD5 \((.*)\).*/\1/'
Just to throw a non-sed answer onto the pile. (Also slightly cheaper since it avoids the pipeline and sub-shell.)
awk -F '[()]' '{print $2}' <<<"$line"
That said the substring expansion option is a reasonable one if it does what you need. (Though it looks like you missed the ( in the first expansion.)
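For reference, a sketch of the two expansions with the opening parenthesis included. The sample line is taken from the question; quoting the literal parts of the pattern is my own precaution, and the sketch assumes the path itself contains no ")".

```shell
line='MD5 (TestImages/IMG_0627.JPG) = 6ed611b3e777c5f7b729fa2f2412d656'

# strip the shortest prefix ending in "MD5 ("
shortFile=${line#*"MD5 ("}
# strip everything from the first ")" to the end (assumes no ")" in the path)
shortFile=${shortFile%%")"*}

printf '%s\n' "$shortFile"
```

This stays entirely inside the shell, so no extra process is started per line.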
Another way, with cut:
echo "$line" | cut -d "(" -f2 | cut -d ")" -f1
sed -e 's/^.*(\([^)]*\)).*$/\1/' < infile.txt

Append text to file without line breaking

On a Linux machine, I have list of IPs as follows:
107.6.38.55
108.171.207.62
108.171.244.138
108.171.246.87
I want to use some function to add the word "or" at the end of each line without breaking each line, like this:
107.6.38.55 or
108.171.207.62 or
108.171.244.138 or
108.171.246.87 or
Every implementation I have experimented with in sed or awk has given me incorrect results as it keeps trying to line break or add input in strange spots. What is the easiest way to achieve this goal?
With awk '$0=$0" or"' and the sed suggestions I've tried thus far I get the following formatting:
107.6.38.55
or
108.171.207.62
or
108.171.244.138
or
108.171.246.87
or
Not sure what you have been trying but the following works for me on Ubuntu 12.04
awk '{print $0" or"}'
Or as fedorqui suggests
awk '$0=$0" or"'
Or as glenn jackman suggests
awk '{print $0, "or"}'
[EDIT]
It turns out the OP's file had CRLF line endings, so dos2unix had to be run first to fix the format.
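If dos2unix is not available, the carriage return can probably be stripped in the same awk call. This is a sketch rather than what the OP actually ran; the function name append_or is made up, and the CRLF input is simulated with printf.

```shell
# append_or is a hypothetical helper: removes a trailing carriage return
# from each line read on stdin, then appends " or".
append_or() {
  awk '{ sub(/\r$/, ""); print $0 " or" }'
}

# simulated CRLF input, using addresses from the question
printf '107.6.38.55\r\n108.171.207.62\r\n' | append_or
```

On files that already have plain LF endings the sub() call simply matches nothing, so the same command is safe for both formats.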
The following two worked for me:
sed 's/.*/& or/'
sed 's/$/ or/'
Or use ed, the standard text editor:
With bash you can use the lovely here-strings together with ANSI-C quoting
ed -s filename <<< $',s/.$/& or/\nwq'
or a pipe with printf
printf "%s\n" ',s/.$/& or/' 'wq' | ed -s filename
or if you like echo better
{ echo ',s/.$/& or/'; echo "wq"; } | ed -s filename
or interactively (if you love question marks):
$ ed filename
,s/.$/& or/
wq
Remark. I'm using the substitution s/.$/& or/ and not s/$/ or/ just so as not to append or in an empty line.
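The difference between the two substitutions is easy to see on input with an empty line. The sketch below uses sed instead of ed only so that it can run on a pipe without touching a file; the regular expressions are the same, and the sample addresses are made up.

```shell
# s/$/ or/ matches every line, including the empty one
printf '1.2.3.4\n\n5.6.7.8\n' | sed 's/$/ or/'

echo '---'

# s/.$/& or/ needs at least one character, so the empty line is left alone
printf '1.2.3.4\n\n5.6.7.8\n' | sed 's/.$/& or/'
```

In the first run the empty line becomes " or"; in the second it stays empty.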

Running awk on file, with regular expressions

I would like to find all occurrences of INPUT in a file, JUST INPUT. I have the following, but it finds everything with INPUT*
awk '{for(i=1;i<=NF;i++){if($i~/^INPUT/){print $i}}}'
I would like to support that though, so if I have INPUT* or INPUT? or INPUT. (any regular expression) instead of INPUT in the above, it should work for that.
Anyone know how to fix the above to do that? Thanks.
I'm trying to do the following in a perl script using $INPUT
`awk '{for(i=1;i<=NF;i++){if($i~/^$INPUT$/){print $i}}}' $file`
but I can't get it to work. Any ideas?
If you want to use backticks, then escape every dollar sign that is meant for awk rather than Perl (assuming you have something, e.g., 'INPUT', in $INPUT):
`awk '{for(i=1;i<=NF;i++){if(\$i~/^$INPUT\$/){print \$i}}}' $file | wc -l`;
awk can count the number of matches for you too (this one counts once per line; note that \y is a gawk-specific word-boundary operator):
`awk '/\y$INPUT\y/{s++} END{print s}' $file`;
and using native Perl, which is recommended:
my $cnt;
open my $f, "<", "input" or die("$!");
while (<$f>) {
$cnt++ while /\bINPUT\b/g;
}
close $f;
print $cnt;
The regular expression you use is anchored at the beginning with ^ but not at the end with $. Try:
awk '{for(i=1;i<=NF;i++){if($i~/^INPUT$/){print $i}}}'
If you want to match INPUT anywhere in the field, try:
awk '{for(i=1;i<=NF;i++){if($i~/INPUT/){print $i}}}'
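To support an arbitrary regular expression instead of a hard-coded INPUT, the pattern can be handed to awk as a variable. This is a sketch from the shell side, not the Perl script in the question; match_fields is a hypothetical name. Keep in mind that awk processes backslash escapes in -v values, so patterns containing backslashes need extra care.

```shell
# match_fields is a hypothetical helper: prints every whitespace-separated
# field on stdin that matches the regex in $1 from start to end.
match_fields() {
  # the shell builds ^(regex)$ and awk uses it as a dynamic regex
  awk -v pat="^($1)\$" '{ for (i = 1; i <= NF; i++) if ($i ~ pat) print $i }'
}

printf 'INPUT INPUTS xINPUT\nfoo INPUT\n' | match_fields 'INPUT'
echo '---'
printf 'INPUT INPUTS xINPUT\n' | match_fields 'INPUT.?'
```

The first call prints only the two exact INPUT fields; the second also accepts INPUTS because .? allows one optional extra character.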

How to stop sed from buffering?

I have a program that writes to fd3 and I want to process that data with grep and sed. Here is how the code looks so far:
exec 3> >(grep "good:"|sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3
Nothing is output until I do a
exec 3>&-
Then, everything that I wanted finally arrives as I expected:
I got: data2
It seems to reply immediately if I use only a grep or only a sed, but mixing them seems to cause some sort of buffering. How can I get immediate output from fd3?
I think I found it. For some reason, grep doesn't automatically do line buffering. I added a --line-buffered option to grep and now it responds immediately.
You only need to tell grep and sed to flush their output after every line instead of buffering it:
grep --line-buffered
and
sed -u
An alternate means to stop sed from buffering is to run it through the s2p sed-to-Perl translator and insert a directive to have it command-buffered, perhaps like
BEGIN { $| = 1 }
The other reason to do this is that it gives you the more convenient notation from EREs instead of the backslash-annoying legacy BREs. You also get the full complement of Unicode properties, which is often critical.
But you don’t need the translator for such a simple sed command. And you do not need both grep and sed, either. These all work:
perl -nle 'BEGIN{$|=1} if (/good:/) { s/.*:(.*)/I got: $1/; print }'
perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:(.*)/I got: $1/; print'
perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:/I got: /; print'
Now you also have access to the minimal quantifier, *?, +?, ??, {N,}?, and {N,M}?. These now allow things like .*? or \S+? or [\p{Pd}.]??, which may well be preferable.
You can merge the grep into the sed like so:
exec 3> >(sed -une '/^good:/s//I got: /p')
echo "bad:data1">&3
echo "good:data2">&3
Unpacking that a bit: You can put a regexp (between slashes as usual) before any sed command, which makes it only be applied to lines that match that regexp. If the first regexp argument to the s command is the empty string (s//whatever/) then it will reuse the last regexp that matched, which in this case is the prefix, so that saves having to repeat yourself. And finally, the -n option tells sed to print only what it is specifically told to print, and the /p suffix on the s command tells it to print the result of the substitution.
The -e option is not strictly necessary but is good style. It just means "the next argument is the sed script, not a filename".
Always put sed scripts in single quotes unless you need to substitute a shell variable in there, and even then I would put everything but the shell variable in single quotes (the shell variable is, of course, double-quoted). You avoid a bunch of backslash-related grief that way.
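A sketch of that quoting style. The helper name relabel and its two parameters are invented for the example. Whatever is spliced in is still interpreted by sed as part of the script, so this assumes the values contain no slashes or other characters that are special to sed.

```shell
# relabel is a hypothetical helper: $1 = line prefix to match, $2 = new label.
# Only the two shell variables are double-quoted; the rest of the sed script
# stays in single quotes, so the backslashes need no extra escaping.
relabel() {
  sed -n -e 's/^'"$1"'\(.*\)/'"$2"'\1/p'
}

printf 'bad:data1\ngood:data2\n' | relabel 'good:' 'I got: '
```

The same script written entirely in double quotes would need \\( and \\1 in some shells to be safe, which is the grief mentioned above.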
On a Mac, brew install coreutils and use gstdbuf to control buffering of grep and sed.
Turning off buffering in the pipe seems to be the easiest and most generic answer, using stdbuf (coreutils):
exec 3> >(stdbuf -oL grep "good:" | sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3
I got: data2
Buffering also depends on other factors, for example on whether mawk or gawk is reading this pipe:
exec 3> >(stdbuf -oL grep "good:" | awk '{ sub(".*:", "I got: "); print }')
In that case, mawk would retain the input, gawk wouldn't.
See also How to fix stdio buffering

how to read each line from a .dat file in unix?

trade.dat is my file, which consists of lines of data.
I have to concatenate each line of that file with a comma (,).
Please help.
If you mean just add a comma to the end of each line:
sed 's/$/,/' <oldfile >newfile
If you mean join all lines together into one line, separating each with a comma:
awk '{printf "%s,",$0}' <oldfile >newfile
Or the more correct one without a trailing comma (thanks, #hacker, for pointing out the error):
awk 'BEGIN {s=""} {printf "%s%s",s,$0;s=","}' <oldfile >newfile
If you want the output of any of those in a shell variable, simply use the $() construct, such as:
str=$(awk 'BEGIN {s=""} {printf "%s%s",s,$0;s=","}' <oldfile)
I find it preferable to use $() rather than backticks since it allows me to nest commands cleanly, something backticks can't do without escaping the inner backticks.
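A small sketch of such nesting. The path is hypothetical, and the backtick line is included only for comparison: it needs escaped inner backticks to work at all.

```shell
path=/var/log/app/trade.dat   # hypothetical path; the file need not exist

# $() nests without any escaping, and each level can be quoted separately
parent=$(basename "$(dirname "$path")")

# the backtick equivalent needs the inner backticks escaped
parent_bt=`basename \`dirname $path\``

printf '%s\n' "$parent" "$parent_bt"
```

Both variables end up holding the name of the directory that contains the file.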
Two obligatory perl versions (credit goes to William Pursell for the second one):
perl -i -p -e 'chomp($_); $_ = "$_,\n"' trade.dat
perl -i -p -e 's/$/,/' trade.dat
Note that
this does not make backups of the original file by default (use -i.bak for that).
this answer appends a comma to every line. To join all lines together into a single line, separated by commas, look at William Pursell's answer.
fullline=""
while IFS= read -r line
do
fullline="${fullline:+$fullline,}$line"
done < trade.dat
Then use $fullline to show your file concatenated.
Hope this helps ;p
perl -pe 's/\n/,/ unless eof'
First thing that comes into my head:
gawk -- '{ if(a) { printf ",%s",$0; } else { printf "%s",$0; a=1 } }' trade.dat
if I correctly understand what you want.
Answering the question in the title, one way to get each line in a variable in a loop in BASH is to:
cat file.dat | while read line; do echo -n "$line",; done
That will leave a trailing comma, but shows how to read each line.
But clearly a sed, awk, or perl solution is best suited to the problem described in the body of your question.
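For completeness, a sketch of a pure-bash loop that avoids the trailing comma and also copes with a last line that has no newline. The function name join_lines is hypothetical, and it reads standard input rather than a fixed file name.

```shell
#!/usr/bin/env bash
# join_lines is a hypothetical helper: joins the lines on stdin with commas.
join_lines() {
  local line joined='' sep=''
  # "|| [[ -n $line ]]" keeps a final line that lacks a trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    joined+="$sep$line"
    sep=','          # the separator is empty only before the first line
  done
  printf '%s\n' "$joined"
}

printf 'a b\nc\nd\n' | join_lines
```

With a real file this would be called as join_lines < trade.dat. IFS= and -r keep leading whitespace and backslashes in each line intact.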
