I have a file that looks like this:
The number is %d0The number is %d1The number is %d2The number is %d3The number is %d4The number is %d5The number is %d6The...
The number is %d67The number is %d68The number is %d69The number is %d70The number is %d71The number is %d72The....
The number is %d117The number is %d118The number is %d119The number is %d120The number is %d121The number is %d122
I want to pad it like this:
The number is %d0 The number is %d1 The number is %d2 The number is %d3 The number is %d4 The number is %d5 The number is %d6
The number is %d63 The number is %d64 The number is %d65 The number is %d66 The number is %d67 The number is %d68 The number is %d69
d118The number is %d119The number is %d120The number is %d121The number is %d122The number is %d123The number is %d124The
Please tell me how to do it with a shell script.
I am working on Linux.
Edit:
This single command pipeline should do what you want:
sed 's/\(d[0-9]\+\)/\1   /g;s/\(d[0-9 ]\{3\}\) */\1/g' test2.txt >test3.txt
# note: there are three spaces after \1 in the first substitution
Explanation:
For each sequence of digits following a "d", add three spaces after it. (I'll use "X" to represent spaces.)
d1 becomes d1XXX
d10 becomes d10XXX
d100 becomes d100XXX
Now (the part after the semicolon), capture every "d" and the next three characters, which must be digits or spaces, and output them but not any spaces beyond.
d1XXX becomes d1XX
d10XXX becomes d10X
d100XXX becomes d100
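To see the two substitutions in action, here is a minimal sketch that runs them on a short, made-up line in the same format as the question (the sample string is an assumption, not the asker's real file). It assumes GNU sed, since \+ in a basic regular expression is a GNU extension. Note that a three-digit number such as d118 ends up with no space after it, which matches the desired output in the question.

```shell
#!/bin/sh
# Sketch: demonstrate the two-step sed padding on a hypothetical one-line sample.
# Assumes GNU sed (\+ in a basic regex is a GNU extension).
sample='The number is %d5The number is %d42The number is %d118'

# Step 1: append three spaces after every "d" followed by digits.
# Step 2: keep "d" plus exactly three following digit/space characters,
#         and drop any spaces beyond that.
padded=$(printf '%s\n' "$sample" |
  sed 's/\(d[0-9]\+\)/\1   /g;s/\(d[0-9 ]\{3\}\) */\1/g')

printf '%s\n' "$padded"
# The number is %d5  The number is %d42 The number is %d118
```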
If you want to wrap the lines as you seem to show in your sample data, then do this instead:
sed 's/\(d[0-9]\+\)/\1   /g;s/\(d[0-9 ]\{3\}\) */\1/g' test2.txt | fold -w 133 >test3.txt
You may need to adjust the argument of the fold command to make it come out right.
There's no need for if, grep, loops, etc.
Original answer:
First of all, you really need to say which shell you're using, but since you have elif and fi, I'm assuming it's Bourne-derived.
Based on that assumption, your script makes no sense.
The parentheses for the if and elif are unnecessary. In this context, they create a subshell which serves no purpose.
The sed commands in the if and elif say "if the pattern is found, copy the hold space (which is empty, by the way) to the pattern space and output it, and output all other lines".
The first sed command will always be true, so the elif will never be executed: sed always returns a success (zero) exit status unless there's an error.
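The exit-status point is easy to check. In this minimal sketch (the temporary file and its contents are hypothetical), sed exits with status 0 even though its pattern never matches, while grep -q exits with status 1, which is what an if test actually needs.

```shell
#!/bin/sh
# Sketch: sed's exit status does not say whether its pattern matched,
# but grep's does. The temporary input file is hypothetical.
tmp=$(mktemp)
printf 'no digits here\n' > "$tmp"

# sed prints the unchanged line and still exits 0.
sed_status=0
sed 's/d[0-9]/X/' "$tmp" > /dev/null || sed_status=$?

# grep -q prints nothing and exits 1 because no line matched.
grep_status=0
grep -q 'd[0-9]' "$tmp" || grep_status=$?

echo "sed exit status: $sed_status"    # 0
echo "grep exit status: $grep_status"  # 1
rm -f "$tmp"
```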
This may be what you intended:
if grep -Eqs 'd[0-9]([^0-9]|$)' test2.txt; then
sed 's/\(d[0-9]\)\([^0-9]\|$\)/\1 \2/g' test2.txt >test3.txt
elif grep -Eqs 'd[0-9][0-9]([^0-9]|$)' test2.txt; then
sed 's/\(d[0-9][0-9]\)\([^0-9]\|$\)/\1 \2/g' test2.txt >test3.txt
else
cat test2.txt >test3.txt
fi
But I wonder if all that could be replaced by something like this one-liner:
sed 's/\(d[0-9][0-9]\?\)\([^0-9]\|$\)/\1 \2/g' test2.txt >test3.txt
Since I don't know what test2.txt looks like, part of this is only guessing.
Related
I'm just starting out with Linux and stumbled upon something I don't really understand.
I have the following command:
echo "12345"|wc -w|tr "123" "321"
The output of this command is 3, so I thought it might count how many of these numbers had changed, but after some testing I concluded that it actually shows the first digit of the second tr argument, since that held in many cases.
For a while I thought I was done experimenting, since I believed I had the whole idea, but then I found a specific case:
echo "46817"|wc -w|tr "46817" "64194"
This outputs 9, and I have no idea why.
What does the whole command output in general?
The last command, tr, changes digits in the output of the second command. Since wc counts the words in its input (here there is just 1), tr then changes that 1 into another digit: 9 in the second case.
echo "12345"|wc -w|tr "123" "321" (outputs 3)
echo "46817"|wc -w|tr "46817" "64194" (outputs 9)
The above commands are pipelines, in which the output of each command is fed to the next one. Commands are separated by "|" (a symbol named, surprise!, "pipe"). Both pipelines do the following:
echo: outputs something (to wc).
wc: counts characters, or words, or lines. "wc -w" counts words, so it will output "1" because "12345" and "46817" are words not containing any word separator.
tr: "translates", i.e. replaces the characters it receives with other ones. When specifying "123" "321", every 1 (first char in 123) is translated into 3 (first char of 321); every 2 (second char in 123) is translated into 2 (second char of 321), and so on.
In both commands tr receives "1" as input and turns that "1" into the character at the same position in the second set: 3 in the first case, and 9 in the second (1 is the 4th character of 46817, and the 4th character of 64194 is 9).
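Here is a small sketch of the position-by-position mapping, using the two commands from the question. The tr -d ' ' step is an addition: some wc implementations pad the count with spaces, and stripping them keeps the demo portable.

```shell
#!/bin/sh
# Sketch: tr maps SET1[i] to SET2[i], position by position.
# wc -w sees one word, so it outputs 1; any padding spaces are stripped.
count=$(echo "46817" | wc -w | tr -d ' ')

# '1' is the 1st character of "123"; the 1st character of "321" is '3'.
first=$(echo "$count" | tr "123" "321")

# '1' is the 4th character of "46817"; the 4th character of "64194" is '9'.
second=$(echo "$count" | tr "46817" "64194")

echo "$count $first $second"   # 1 3 9
```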
I have a file, abc.txt, in the following format:
a:,b:,c:,d:,e:,f:,g:
a:0;b:,c:3,d:,e:,f:,g:1
a:9,b:8,c:6,d:5,e:2,f:,g:
a:0;b:,c:2,d:1,e:,f:,g:
Now, in Unix, I want to get only those rows where the regular expression :[0-9] (a colon followed by a digit) matches more than 2 times.
In other words, show rows where at least 3 attributes have numeric values.
The output should be only the 2nd and 3rd rows:
a:0;b:,c:3,d:,e:,f:,g:1
a:9,b:8,c:6,d:5,e:2,f:,g:
With basic grep:
grep '\(:[[:digit:]].*\)\{3,\}' file
:[[:digit:]].* matches a colon followed by a digit and zero or more arbitrary characters. This expression is put into a subpattern: \(...\). The expression \{3,\} means that the previous expression has to occur 3 or more times.
With POSIX extended regular expressions this can be written a little more simply, without the need to escape ( and {:
grep -E '(:[[:digit:]].*){3,}' file
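A minimal sketch that writes the question's sample rows to a temporary file and runs both forms. Note that the 4th row (a:0;b:,c:2,d:1,e:,f:,g:) also contains three colon-digit pairs, so it matches as well; this is consistent with the awk output in the next answer, even though the expected output in the question omits it.

```shell
#!/bin/sh
# Sketch: run both grep forms on the sample rows from the question
# (written to a temporary file here; the file itself is illustrative).
file=$(mktemp)
cat > "$file" <<'EOF'
a:,b:,c:,d:,e:,f:,g:
a:0;b:,c:3,d:,e:,f:,g:1
a:9,b:8,c:6,d:5,e:2,f:,g:
a:0;b:,c:2,d:1,e:,f:,g:
EOF

# Basic regex: the escaped group must repeat 3 or more times.
bre=$(grep '\(:[[:digit:]].*\)\{3,\}' "$file")
# Extended regex: the same pattern without the backslashes.
ere=$(grep -E '(:[[:digit:]].*){3,}' "$file")
# -c prints the number of matching rows instead of the rows themselves.
count=$(grep -Ec '(:[[:digit:]].*){3,}' "$file")

printf '%s\n' "$ere"
echo "matching rows: $count"   # 3
rm -f "$file"
```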
$ awk -F':[0-9]' 'NF>3' file
a:0;b:,c:3,d:,e:,f:,g:1
a:9,b:8,c:6,d:5,e:2,f:,g:
a:0;b:,c:2,d:1,e:,f:,g:
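The awk answer works because every :[0-9] becomes a field separator, so NF is one more than the number of matches and NF>3 means "three or more". This sketch prints the match count for each sample row; count_matches is a hypothetical helper name used only for the demo.

```shell
#!/bin/sh
# Sketch: with -F':[0-9]', each "colon + digit" separates fields,
# so NF - 1 is the number of matches on the line.
# count_matches is a hypothetical helper for this demo.
count_matches() {
  printf '%s\n' "$1" | awk -F':[0-9]' '{ print NF - 1 }'
}

for row in 'a:,b:,c:,d:,e:,f:,g:' \
           'a:0;b:,c:3,d:,e:,f:,g:1' \
           'a:9,b:8,c:6,d:5,e:2,f:,g:' \
           'a:0;b:,c:2,d:1,e:,f:,g:'
do
  echo "$(count_matches "$row") $row"
done
# 0 a:,b:,c:,d:,e:,f:,g:
# 3 a:0;b:,c:3,d:,e:,f:,g:1
# 5 a:9,b:8,c:6,d:5,e:2,f:,g:
# 3 a:0;b:,c:2,d:1,e:,f:,g:
```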
perl -nE '/:[0-9](?{$count++})(?!)/; print if $count > 2; $count=0' input
perl -ne 'print if /(.*?\:\d.*?){3,}/' yourfile
This matches rows containing a colon followed by a digit three or more times, as the question requires.
https://regex101.com/r/tRWtbY/1
I want to add the symbol " >>" at the end of the 1st line, then the 5th line, and so on: 1, 5, 9, 13, 17, ... I searched the web and went through the article below, but I'm unable to achieve it. Please help.
How can I append text below the specific number of lines in sed?
retentive
good at remembering
The child was very sharp, and her memory was extremely retentive.
— Rowlands, Effie Adelaide
unconscionable
greatly exceeding bounds of reason or moderation
For generations in the New York City public schools, this has become the norm with devastating consequences rooted in unconscionable levels of student failure.
— New York Times (Nov 4, 2011)
The output should look like this:
retentive >>
good at remembering
The child was very sharp, and her memory was extremely retentive.
— Rowlands, Effie Adelaide
unconscionable >>
greatly exceeding bounds of reason or moderation
For generations in the New York City public schools, this has become the norm with devastating consequences rooted in unconscionable levels of student failure.
— New York Times (Nov 4, 2011)
You can do it with awk:
awk '{if ((NR-1) % 5) {print $0} else {print $0 " >>"}}'
We check whether the line number minus 1 is a multiple of 5; if it is, we output the line followed by " >>", otherwise we just output the line.
Note: The above code outputs the suffix every 5 lines, because that's what is needed for your example to work.
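A quick way to see the 5-line period is to feed the command two shortened entries separated by a blank line. The entry text below is made up for the demo; only the layout (4 lines plus a blank line) matters.

```shell
#!/bin/sh
# Sketch: two hypothetical 4-line entries separated by one blank line.
# Lines 1 and 6 are the first lines of the entries, so they get " >>".
result=$(printf '%s\n' \
  'retentive' 'good at remembering' 'an example sentence' 'a source' '' \
  'unconscionable' 'beyond reason or moderation' 'an example sentence' 'a source' |
  awk '{if ((NR-1) % 5) {print $0} else {print $0 " >>"}}')

printf '%s\n' "$result"
```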
You can do it in multiple ways. sed is kind of odd when it comes to selecting lines, but it's doable. For example:
sed:
sed -i -e 's/$/ >>/;n;n;n;n' file
You can do it also as perl one-liner:
perl -pi.bak -e 's/(.*)/$1 >>/ if not (( $. - 1 ) % 5)' file
You're thinking about this wrong. You should append to the end of the first line of every paragraph and not worry about how many lines there happen to be in any given paragraph. That's just:
$ awk -v RS= -v ORS='\n\n' '{sub(/\n/," >>&")}1' file
retentive >>
good at remembering
The child was very sharp, and her memory was extremely retentive.
— Rowlands, Effie Adelaide
unconscionable >>
greatly exceeding bounds of reason or moderation
For generations in the New York City public schools, this has become the norm with devastating consequences rooted in unconscionable levels of student failure.
— New York Times (Nov 4, 2011)
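The advantage of paragraph mode shows up when entries have different lengths. In this sketch the three entries (hypothetical data) have 2, 4 and 2 lines, and each first line still gets the suffix.

```shell
#!/bin/sh
# Sketch: with RS empty (paragraph mode), each blank-line-separated entry
# is one record, and sub() replaces its first embedded newline with " >>\n".
result=$(printf '%s\n' \
  'alpha' 'first definition' '' \
  'beta' 'second definition' 'an extra quote line' 'source' '' \
  'gamma' 'third definition' |
  awk -v RS= -v ORS='\n\n' '{sub(/\n/," >>&")}1')

printf '%s\n' "$result"
```

One caveat: an entry that is a single line has no embedded newline for sub() to match, so it would get no suffix.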
This might work for you (GNU sed):
sed -i '1~4s/$/ >>/' file
Here are a couple more:
$ awk 'NR%5==1 && sub(/$/," >>") || 1 ' foo
$ awk '{$0=$0(NR%5==1?" >>":"")}1' foo
Here is a non-numeric way in Awk. This works if we have an Awk that supports the RS variable being more than one character long. We break the data into records based on the blank line separation: "\n\n". Inside these records, we break fields on newlines. Thus $1 is the word, $2 is the definition, $3 is the quote and $4 is the source:
awk 'BEGIN {OFS=FS="\n";ORS=RS="\n\n"} $1=$1" >>"'
We use the same output separators as input separators. Our only pattern/action step is then to edit $1 so that it has >> on it. The default action is { print }, which is what we want: print each record. So we can omit it.
Shorter: Initialize RS from catenation of FS.
awk 'BEGIN {OFS=FS="\n";ORS=RS=FS FS} $1=$1" >>"'
This is nicely expressive: it says that the format uses two consecutive field separators to separate records.
What if we use a counter, initially zero, that is reset on every blank line? This solution still doesn't depend on a hard-coded number, just on the blank-line separation. The rule fires on the first line, because C evaluates to zero, and then after every blank line, because we reset C to zero:
awk 'C++?1:$0=$0" >>";!NF{C=0}'
Shorter version of accepted Awk solution:
awk '(NR-1)%5?1:$0=$0" >>"'
We can use a ternary conditional expression cond ? then : else as a pattern, leaving the action empty so that it defaults to {print}, which of course means {print $0}. If the zero-based record number is not congruent to 0, modulo 5, then we produce 1 to trigger the print action. Otherwise we evaluate $0=$0" >>" to add the required suffix to the record. The result of this expression is also a Boolean true (a non-empty string), which triggers the print action.
Shave off one more character: we don't have to subtract 1 from NR and then test for congruence to zero. Basically whenever the 1-based record number is congruent to 1, modulo 5, then we want to add the >> suffix:
awk 'NR%5==1?$0=$0" >>":1'
Though we have to add ==1 (+3 chars), we win because we can drop two parentheses and -1 (-4 chars).
We can do better (with some assumptions): instead of editing $0, we can create a second field containing >> by assigning to the field $2. The implicit print action will print this, separated by a space (the default OFS):
awk 'NR%5==1?$2=">>":1'
But this only works when the headword line (the first line of each entry) contains exactly one word. If any of the words in this dictionary are compound nouns (separated by a space, not hyphenated), this fails, because assigning to $2 overwrites the second word. If we try to repair this flaw, we are sadly brought back to the same length:
awk 'NR%5==1?$++NF=">>":1'
Slight variation on the approach: instead of trying to tack >> onto the record or the last field, why don't we conditionally install " >>\n" as ORS, the output record separator?
awk 'ORS=(NR%5==1?" >>\n":"\n")'
Not the tersest, but worth mentioning. It shows how we can dynamically play with some of these variables from record to record.
A different way of testing NR == 1 (mod 5): a regexp!
awk 'NR~/[16]$/?$0=$0" >>":1'
Again, not tersest, but seems worth mentioning. We can treat NR as a string representing the integer as decimal digits. If it ends with 1 or 6 then it is congruent to 1, mod 5. Obviously, not easy to modify to other moduli, not to mention computationally disgusting.
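The claim that "ends in 1 or 6" is the same as "congruent to 1 mod 5" can be checked mechanically: since 10 is a multiple of 5, only the last decimal digit affects NR mod 5. This sketch compares the two tests for the first 1000 records and counts disagreements.

```shell
#!/bin/sh
# Sketch: compare the regexp test with the arithmetic test on 1000 records.
# a and b are each 1 or 0; n counts the records where they differ.
mismatches=$(seq 1000 | awk '{
  a = (NR ~ /[16]$/)
  b = (NR % 5 == 1)
  if (a != b) n++
} END { print n + 0 }')

echo "mismatches: $mismatches"   # 0
```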
I have a file with almost 5 million (5*10^6) lines of integers, so my file is quite big.
The question is all about extracting specific lines by filtering them on a condition.
For example, I'd like to:
1. Extract the first N lines without reading the entire file.
2. Extract the lines whose number is less than or equal to X (or >=, <, >).
3. Extract the lines whose number satisfies some condition (a math predicate).
Is there a clever way to perform these tasks (using sed, awk, cat, or head)?
Thanks in advance.
To extract the first $NUMBER lines,
head -n $NUMBER filename
Assuming every line contains just a number (although it will also work if the first token is one), 2 can be solved like this:
awk '$1 >= 1234 && $1 < 5678' filename
Keeping in the same spirit, 3 is just the generalization:
awk 'condition' filename
It would have helped if you had specified what the condition is supposed to be, though. As it is, you'll have to read the awk documentation to find out how to code it. Again, the number will be represented by $1.
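Since the question doesn't name a condition, here are a few hypothetical predicates to show the shape of item 3, run against the numbers 1 to 20 generated by seq in place of the real file. paste -s -d ' ' - only joins the matching lines onto one line for display.

```shell
#!/bin/sh
# Sketch: hypothetical math predicates; $1 is the number on each line.
file=$(mktemp)
seq 20 > "$file"

# Even numbers: the remainder modulo 2 is zero.
evens=$(awk '$1 % 2 == 0' "$file" | paste -s -d ' ' -)
# Perfect squares: the integer part of the square root squares back to $1.
squares=$(awk '{ r = int(sqrt($1)) } r * r == $1' "$file" | paste -s -d ' ' -)
# Item 2 combined with a predicate: between 5 and 15 and divisible by 3.
mixed=$(awk '$1 >= 5 && $1 <= 15 && $1 % 3 == 0' "$file" | paste -s -d ' ' -)

echo "evens:   $evens"    # 2 4 6 8 10 12 14 16 18 20
echo "squares: $squares"  # 1 4 9 16
echo "mixed:   $mixed"    # 6 9 12 15
rm -f "$file"
```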
There isn't much to explain about the head call; it does exactly what it says on the tin. As for the awk lines: awk, like sed, works line by line. awk fetches lines in a loop and applies your code to each line. This code takes the form
condition1 { action1 }
condition2 { action2 }
# and so forth
For every line awk fetches, the conditions are checked in the order they appear, and the associated action to each condition is performed if the condition is true. It would, for example, have been possible to extract the first $NUMBER lines of a file with awk like this:
awk -v number="$NUMBER" '1 { print } NR == number { exit }' filename
where 1 is synonymous with true (like in C) and NR is the line number. The -v command line option initializes the awk variable number to $NUMBER. If no action is specified, the default action is { print }, which prints the whole line. So
awk 'condition' filename
is shorthand for
awk 'condition { print }' filename
...which prints every line where the condition holds.
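A minimal check that the awk form stops in the same place as head. seq stands in for the question's large file, and NUMBER=5 is an arbitrary value chosen for the sketch. Both commands stop reading once they have printed the requested lines, so neither scans the whole file.

```shell
#!/bin/sh
# Sketch: head and the awk early-exit form print the same first N lines.
# The generated file and NUMBER=5 are hypothetical stand-ins.
file=$(mktemp)
seq 100000 > "$file"
NUMBER=5

from_head=$(head -n "$NUMBER" "$file")
from_awk=$(awk -v number="$NUMBER" '1 { print } NR == number { exit }' "$file")

if [ "$from_head" = "$from_awk" ]; then
  echo "same first $NUMBER lines"
fi
rm -f "$file"
```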
I have strings in a file in the following format:
fixedstring_1
fixedstring_23
fixedstring_456
...
fixedstring_[1 to n digits]
I tried grep -E "fixedstring_[.....n times]" filepath in a terminal, but it failed.
I want commands to get the count (-c) and list the lines.
If I understand correctly, given the following file...
fixedstring_1
bar
fixedstring_456
foo
fixedstring_45622
fixedstring_
fixedstring
You want to match (and get the count of) only these lines:
fixedstring_1
fixedstring_456
fixedstring_45622
This should work:
grep -Ec 'fixedstring_[[:digit:]]+' filename
The [[:digit:]]+ part matches 1 or more digits. More on grep regexes here: http://www.gnu.org/savannah-checkouts/gnu/grep/manual/grep.html#Regular-Expressions
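Here is a sketch that runs the command on the example file from this answer (written to a temporary file for the demo): -c gives the count, and dropping -c lists the matching lines.

```shell
#!/bin/sh
# Sketch: count and list lines of the form fixedstring_<one or more digits>.
file=$(mktemp)
cat > "$file" <<'EOF'
fixedstring_1
bar
fixedstring_456
foo
fixedstring_45622
fixedstring_
fixedstring
EOF

# -c prints how many lines match; without -c grep lists the lines themselves.
count=$(grep -Ec 'fixedstring_[[:digit:]]+' "$file")
lines=$(grep -E 'fixedstring_[[:digit:]]+' "$file")

echo "count: $count"   # 3
printf '%s\n' "$lines"
rm -f "$file"
```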
EDIT:
If you want to match strings with only a certain number of digits, you'll have to get a little more clever:
grep -E 'fixedstring_[[:digit:]]{MIN,MAX}([^[:digit:]]|$)' filename
Replace the MIN with the minimum number of digits you want to match, and MAX with the max.
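For example, with MIN=1 and MAX=3 (arbitrary values chosen for this sketch), the example file from this answer yields only the entries with one to three digits; fixedstring_45622 is rejected because it has more than three digits.

```shell
#!/bin/sh
# Sketch: fixedstring_ followed by MIN..MAX digits, then a non-digit or
# end of line. MIN=1 and MAX=3 are arbitrary example values.
MIN=1
MAX=3
pattern="fixedstring_[[:digit:]]{$MIN,$MAX}([^[:digit:]]|\$)"

file=$(mktemp)
cat > "$file" <<'EOF'
fixedstring_1
bar
fixedstring_456
foo
fixedstring_45622
fixedstring_
fixedstring
EOF

bounded=$(grep -E "$pattern" "$file")
bounded_count=$(grep -Ec "$pattern" "$file")

printf '%s\n' "$bounded"
echo "count: $bounded_count"   # 2
rm -f "$file"
```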