How to restrict the length of a string in a line using Linux

I have data of the following form:
num1 This is a string
num2 This is another string
I want to limit the length of every string that comes after the first tab, such that length(string)<4. The output I want is:
num1 This is a string
num2 This is another
I can do this using Python, but I am trying to find a Linux command-line equivalent that achieves the same.

In bash, you can use substring expansion to limit the string; in this case it takes 17 characters starting at index 0.
$ var="this is a another string"
$ echo ${var:0:17}
this is a another

Using awk, by columns :
$ awk '{print $1, $2, $3, $4}' file
or with sed :
sed -r 's#^(\S+\s+\S+\s+\S+\s+\S+).*#\1#' file
or by length using cut :
$ cut -c 1-23 file
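Note that cut -c counts from the start of the line, so the num1 prefix and the tab use up part of the limit. If only the text after the first tab should be limited, a minimal awk sketch could look like the following. The function name limit_after_tab and the limit of 17 are illustrative assumptions, not something taken from the question.

```shell
# Hypothetical helper: keep at most "max" characters of the second
# tab-separated field and leave the first field untouched.
limit_after_tab() {
    awk -F'\t' -v max="$1" '
        BEGIN { OFS = "\t" }
        NF >= 2 { $2 = substr($2, 1, max) }   # lines without a tab pass through unchanged
        { print }
    '
}

printf 'num1\tThis is a string\nnum2\tThis is another string\n' | limit_after_tab 17
```

This cuts purely by character count, so a long string may end in the middle of a word.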

If you'd like to truncate strings on word boundaries, you could use fold with the -s option:
awk -F"\t" '{
printf "%s\t", $1; system(sprintf("fold -sw 17 <<< \"%s\" | sed q", $2))
}'
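The drawback is that fold and sed need to be called for each line (sed q prints only the first line, the same as head -n1). Also note that <<< is a bash feature, so this relies on awk's system() invoking a shell that supports it.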
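The per-line process cost can be avoided by doing the word-boundary cut inside awk itself. The following is only a sketch under stated assumptions: the helper name truncate_on_words is made up, words are separated by plain spaces, and any fields after the second tab-separated field are dropped.

```shell
# Hypothetical helper: shorten the text after the first tab to at most
# "max" characters, cutting at the last space that still fits.
truncate_on_words() {
    awk -F'\t' -v max="$1" '
        BEGIN { OFS = "\t" }
        {
            s = $2
            if (length(s) > max) {
                s = substr(s, 1, max + 1)     # one extra character shows whether a word was split
                sub(/ +[^ ]*$/, "", s)        # drop the trailing partial word and the space before it
                if (length(s) > max)          # a single word longer than max: hard cut
                    s = substr(s, 1, max)
            }
            print $1, s
        }
    '
}

printf 'num1\tThis is a string\nnum2\tThis is another string\n' | truncate_on_words 17
```

With a limit of 17 the first line is short enough to stay as it is, and the second one loses its last word.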

Related

Select subdomains using print command

cat a.txt
a.b.c.d.e.google.com
x.y.z.google.com
rev a.txt | awk -F. '{print $2,$3}' | rev
This is showing:
e google
x google
But I want this output
a.b.c.d.e.google
b.c.d.e.google
c.d.e.google
e.google
x.y.z.google
y.z.google
z.google
With your shown samples, please try the following awk code. It was written and tested in GNU awk and should work in any awk.
awk '
BEGIN{
FS=OFS="."
}
{
nf=NF
for(i=1;i<(nf-1);i++){
print
$1=""
sub(/^[[:space:]]*\./,"")
}
}
' Input_file
Here is one more awk solution:
awk -F. '{while (!/^[^.]+\.[^.]+$/) {print; sub(/^[^.]+\./, "")}}' file
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
Using sed
$ sed -En 'p;:a;s/[^.]+\.(.*([^.]+\.){2}[[:alpha:]]+$)/\1/p;ta' input_file
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
Using bash:
IFS=.
while read -ra a; do
for ((i=${#a[@]}; i>2; i--)); do
echo "${a[*]: -i}"
done
done < a.txt
Gives:
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
(I assume the lack of d.e.google.com in your expected output is a typo?)
For a shorter and arguably simpler solution, you could use Perl.
To auto-split the line on the dot character into the @F array, and then print the range you want:
perl -F'\.' -le 'print join(".", @F[0..$#F-1])' a.txt
-F'\.' will auto-split each input line into the @F array. It splits on the given regular expression, so the dot needs to be escaped to be taken literally.
$#F is the index of the last element of the array. So @F[0..$#F-1] is the range of elements from the first one ($F[0]) to the penultimate one. If you wanted to leave out both "google" and "com", you would use @F[0..$#F-2] etc.
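Most of the answers above keep the trailing .com, while the expected output in the question drops it. A minimal awk sketch that prints every suffix without the last label could look like this. The function name list_subdomains is hypothetical, and the sketch assumes the top-level domain is always exactly one label (so it would not handle something like co.uk).

```shell
# Hypothetical helper: for each host name on stdin, print every
# suffix that still has a label in front of the registered name,
# leaving out the final label (the TLD).
list_subdomains() {
    awk -F. '{
        for (start = 1; start <= NF - 2; start++) {
            out = $start
            for (i = start + 1; i <= NF - 1; i++)
                out = out "." $i
            print out
        }
    }'
}

printf 'a.b.c.d.e.google.com\nx.y.z.google.com\n' | list_subdomains
```

Lines with fewer than three labels, such as google.com, produce no output with this approach.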

sed only print substring in a string

I am trying to get a substring in a string that is in a large line of data.
The regex (INC............) matches the substring I am trying to get the value of at https://regexr.com/, but I am unable to get the value of the substring into a variable or print it out.
The part of the string around this value is
......TemplateID2":null,"Incident Number":"INC000006743193","Priority":"High","mc_ueid":null,"Assint......
I am getting the error char 26: unknown option to `s' when I try this or the entire string is printed out.
cat /tmp/file1 | sed -n 's/\(INC............\)/\1/p'
cat /tmp/file1 | sed -n 's/./*\(INC............).*/\1/'
Using sed, you need to remove what precedes and follows the string:
sed 's/.*\(INC............\).*/\1/' file
But you can also use grep, if your implementation supports the -o option:
grep -o 'INC............' file
Perl can be used, too:
perl -lne 'print $1 if /(INC............)/' file
That looks like JSON. If it's got {braces} around it which you cut out before posting (tsk tsk), you should definitely use jq if it's available. That said, this page needs some awk!
POSIX (works everywhere):
awk 'match($0, /INC[^"]+/) {print substr($0, RSTART, RLENGTH)}' /tmp/file1
GNU (works on GNU/Linux):
gawk 'match($0, /INC[^"]+/, a) {print a[0]}' /tmp/file1
If you have more than one match per line (GNU):
gawk '{while(match($0=substr($0, RSTART+RLENGTH), /INC[0-9]+/, a)) print a[0]}' /tmp/file1
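Since the question also asks for the value in a variable, any of the commands above can be wrapped in command substitution. Here is a small sketch using sed, assuming the incident number is always INC followed only by digits. The variable names are made up, and the sample line is the fragment quoted in the question.

```shell
# Sample line copied from the question (the real input would come from /tmp/file1).
line='......TemplateID2":null,"Incident Number":"INC000006743193","Priority":"High","mc_ueid":null,"Assint......'

# -n plus the p flag prints only lines where the substitution succeeded,
# so lines without an incident number leave the variable empty.
incident=$(printf '%s\n' "$line" | sed -n 's/.*\(INC[0-9][0-9]*\).*/\1/p')
printf '%s\n' "$incident"
```

If a line holds several incident numbers, this sed form keeps only the last one.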

search for a string and after getting result cut that word and store result in variable

I have a file named abc.lst whose path I have stored in a variable. It contains a three-word string; from it I want to grep the second word, then cut out the part from expdp to .dmp and store that in a variable.
Example:
REFLIST_OP=/tmp/abc.lst
cat $REFLIST_OP
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
Desired output:
expdp_TEST_P119_*_18112017.dmp
I have tried the command below:
FULL_DMP_NAME=`cat $REFLIST_OP|grep /orabackup|awk '{print $2}'`
echo $FULL_DMP_NAME
/data/abc/GOon/expdp_TEST_P119_*_18112017.dmp
REFLIST_OP=/tmp/abc.lst
awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
Test Results:
$ REFLIST_OP=/tmp/abc.lst
$ cat "$REFLIST_OP"
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
$ awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
expdp_TEST_P119_*_18112017.dmp
To save in variable
myvar=$( awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP" )
The following awk may help you with this.
awk -F'/| ' '{print $6}' Input_file
OR
awk -F'/| ' '{print $6}' "$REFLIST_OP"
Explanation: this makes space and / the field separators (as per your shown Input_file) and then prints the 6th field of the line, which is what the OP needs.
To see the field number and field's value you could use following command too:
awk -F'/| ' '{for(i=1;i<=NF;i++){print i,$i}}' "$REFLIST_OP"
Using sed with one of these regexes:
sed -e 's/.*\/\([^[:space:]]*\).*/\1/' abc.lst
This captures the non-space characters after the last / and prints only the captured part.
sed -re 's|.*/([^[:space:]]*).*|\1|' abc.lst
Same as above, but with a different separator, which avoids escaping the /; -r allows an unescaped (.
sed -e 's|.*/||' -e 's|[[:space:]].*||' abc.lst
In two steps: remove everything up to the last /, then remove everything from the first space to the end. (This may be the easiest to read and understand.)
If you want to avoid an external command (sed):
myvar=$(<abc.lst); myvar=${myvar##*/}; myvar=${myvar%% *}; echo "$myvar"
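The parameter-expansion approach can also be written so that only the second word is examined and the * in the file name is never expanded by the shell. This is a sketch, assuming the path is always the second whitespace-separated word on the first line. The temporary file stands in for /tmp/abc.lst and the variable names are made up.

```shell
# Hypothetical sample file standing in for /tmp/abc.lst.
list_file=$(mktemp)
printf '%s\n' '34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM' > "$list_file"

# Read the first line: first word, second word (the path), then everything else.
read -r count dump_path rest < "$list_file"

dump_name=${dump_path##*/}       # strip everything up to the last slash
printf '%s\n' "$dump_name"       # quoted, so the * stays literal

rm -f "$list_file"
```

Quoting the variable when printing matters here, because an unquoted name containing * could be replaced by matching files in the current directory.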

Extract substring of string if position is known

First, I need to extract a substring at a known position from file.txt in bash, but counting from the second line:
>header
cgatgcgctctgtgcgtgcgtgcg
So let's assume I want position 10 from the second line; the output should be:
c
Second, I want to include the surrounding ±5 characters, resulting in:
gcgctctgtgc
{ read -r; read -r; echo "${REPLY:9:1}"; echo "${REPLY:4:11}"; } < file.txt
Output:
c
gcgctctgtgc
The ${parameter:offset:length} syntax for substrings is explained in https://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion.
The read command is explained in https://www.gnu.org/software/bash/manual/bashref.html#index-read.
Input redirection: https://www.gnu.org/software/bash/manual/bashref.html#Redirections.
With awk:
To get the character at position 10, 1-indexed:
awk 'NR==2 {print substr($0, 10, 1)}'
NR==2 checks whether the current record is the second one; if so, the statements inside {} are executed.
substr($0, 10, 1) extracts 1 character starting at position 10 of $0 (the whole record), i.e. only the 10th character. The format for substr() is substr(string, start, length), where start is 1-based.
Similarly, to get ±5 characters around 10-th:
awk 'NR==2 {print substr($0, (10-5), 11)}'
(10-5) is written instead of 5 only to show how the start position is derived.
Example:
% cat file.txt
>header
cgatgcgctctgtgcgtgcgtgcg
% awk 'NR==2 {print substr($0, 10, 1)}' file.txt
c
% awk 'NR==2 {print substr($0, (10-5), 11)}' file.txt
gcgctctgtgc
Use sed and cut:
sed -n '2p' file|cut -c 5-15
sed selects the 2nd line and cut prints the desired characters.
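The line number, position and flank width used in the answers above can be turned into parameters. The following awk sketch does that; the function name window and its argument order are assumptions made for illustration. It clamps the start at 1 so that a position close to the beginning of the line still works.

```shell
# Hypothetical helper: window LINE POS FLANK
# Prints the character at POS on line LINE plus FLANK characters on each side.
window() {
    awk -v line="$1" -v pos="$2" -v flank="$3" '
        NR == line {
            start = pos - flank
            if (start < 1) start = 1            # do not run off the left edge
            print substr($0, start, pos + flank - start + 1)
            exit                                # no need to read the rest of the input
        }
    '
}

printf '>header\ncgatgcgctctgtgcgtgcgtgcg\n' | window 2 10 0
printf '>header\ncgatgcgctctgtgcgtgcgtgcg\n' | window 2 10 5
```

A flank of 0 returns just the single character, and a flank of 5 returns the 11-character window from the question.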

awk system does not take hyphens

I want to pipe the output of some command to awk and use the system call in awk, but awk does not seem to accept flags with a hyphen. For example, let's say I have a bunch of files and I want to "cat" them. I would use ls -1 | awk '{ system(" cat " $1)}'
Now, if I also want to print the line numbers with -n, it does not work: ls -1 | awk '{ system(" cat -n" $1)}'
You need a space between -n and the file name:
ls -1 | awk '{ system(" cat -n " $1)}'
Notes
-1 is not needed. ls implicitly prints 1 file per line when its output goes to a pipe.
Any file name with whitespace in it will cause this code to fail.
Parsing the output of ls is generally a bad idea. Both find and the shell offer superior handling of difficult file names.
John1024's helpful answer fixes your problem and contains helpful advice, but let me focus on the syntax aspects:
As a command string, cat -n <file> requires at least 1 space (or tab) between the n, which is an option, and <file>, which is an operand.
String concatenation works differently in awk than in the shell:
" cat -n" $1, despite the presence of a space between " cat -n" and $1, does not insert that space in the resulting string, because awk's string concatenation works by directly joining strings placed next to one another irrespective of intervening whitespace.
For instance, the following commands all yield string literal ab, irrespective of any whitespace between the operands of the string concatenation:
awk 'BEGIN { print "a""b" }'
awk 'BEGIN { print "a" "b" }'
awk 'BEGIN { s = "b"; print "a"s }'
awk 'BEGIN { s = "b"; print "a" s }'
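Because the command string is assembled by plain concatenation, both the separating space and any quoting have to be written into the string explicitly. The sketch below builds a command that also survives file names containing spaces or single quotes. The function name number_lines is made up, and the sketch assumes one file name per input line and a cat that supports -n, as in the question.

```shell
# Hypothetical helper: run "cat -n" on every file name read from stdin.
number_lines() {
    awk '{
        name = $0
        gsub(/\047/, "\047\\\047\047", name)    # \047 is a single quote; escape it for the shell
        system("cat -n -- \047" name "\047")    # note the explicit space after -n
    }'
}

demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/my file.txt"
printf '%s\n' "$demo_dir/my file.txt" | number_lines
rm -rf "$demo_dir"
```

The whole line is used as the file name rather than $1, so embedded spaces are preserved.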
This is not a proper use case for awk; you're better off with something like this:
find . -maxdepth 1 -type f -exec cat -n {} \;
