Find and remove in unix - linux

Below is my input file:
sample.txt:
3"
6-position
7' 4" to 10' 3-1/2"
4.8"
Adjustable from 99" to 111" - max 148
and in the output I only need 3 and 4.8, i.e.
output.txt:
3
4.8
So basically I need to print the numeric value from lines that consist only of a number followed by the " symbol; all other text needs to be removed entirely.
I tried to implement this with sed, but I was not able to get the desired result.
Is there any way to achieve this on UNIX?

awk is more suited to perform this type of task:
awk '/^ *[0-9]*(\.[0-9]+)?" *$/{sub(/"/, ""); print}' inFile
OUTPUT:
3
4.8
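To try this against the sample input from the question, here is a minimal sketch. The function name extract_inches is hypothetical, and the pattern is slightly stricter than the one above: it assumes at least one digit must come before the optional decimal part, so a line holding only an inch mark is not printed.

```shell
# Hypothetical helper: print the number from every line that holds
# nothing but a number followed by a straight inch mark (").
extract_inches() {
    awk '/^ *[0-9]+(\.[0-9]+)?" *$/ { gsub(/[" ]/, ""); print }' "$@"
}

# Feed the sample lines from the question on standard input.
printf '%s\n' '3"' '6-position' "7' 4\" to 10' 3-1/2\"" '4.8"' \
    'Adjustable from 99" to 111" - max 148' | extract_inches
```

With the sample lines this should print 3 and 4.8, each on its own line. Passing a file name instead of piping works too, since the arguments are handed straight to awk.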

One way with sed:
sed -n 's/^\([0-9]\+\(\.[0-9]\+\)\?\)"$/\1/p' sample.txt > out.txt
or with GNU sed
sed -rn 's/^([0-9]+(\.[0-9]+)?)"$/\1/p' sample.txt > out.txt
or with GNU grep
grep -oP '^[0-9]+(\.[0-9]+)?(?="$)' sample.txt > out.txt
Be sure to use the correct inch mark (” or "). Or you can match both with a character class [”"].
Edit: updated to work for floating point numbers.

I think you are asking for
grep -o '[0-9][0-9]*"' sample.txt
which will match one or more digits followed by a '"', and print each occurrence separately and without surrounding text.

This might work for you (GNU sed):
sed '/^[0-9.]\+"/!d;s/".*//' file

Related

Print between special characters with sed,grep

I need to print the string between these characters....
atob(' ')
I am using a = in the second part as an attempt to stop at an equals sign (which the base64 string I'm trying to get ends with).
I use this script, but it prints the entire line containing the above characters. I need just the data in between.
sed -n '/atob/,${p;/==/q;}'
I appreciate any help. Thank you.
Does this work (tested with GNU sed 4.2.2)?
sed -n -e "s/atob('\(.*\)')/\1/p" b.txt
where b.txt is
atob('safdasdfasf')
or you can try awk
awk -F\' '/atob/ {print $2}' b.txt
(tested with GNU awk 4.0.2, with the suggestion by Jotne added)
And another working sed:
echo "atob('safdasdfasf')" | sed -r "/atob/ s/^[^']+'([^']+)'.*/\1/"
safdasdfasf
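If the line holds more than the atob call itself, a sketch like the following prints only the quoted payload. The function name and the sample line are made up for illustration, and it assumes the payload contains no single quote.

```shell
# Hypothetical helper: print only the text between atob(' and the next '.
extract_atob() {
    sed -n "s/.*atob('\([^']*\)').*/\1/p" "$@"
}

# Made-up input line; the payload is just an example base64 string.
printf '%s\n' "var data = atob('aGVsbG8gd29ybGQ=');" | extract_atob
```

Because of -n and the p flag, lines without an atob('...') call produce no output at all.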

SED: Displaying the first 10 lines of sophisticated expression

How can I use sed to find lines containing the word linux, and then display only the first 10 of them?
Example:
cat file | sed -e '/linux/!d' -e '10!d' ### this does not display the first 10 lines containing linux
cat file | sed '/linux/!d' | sed '10!d' ### this works
How can I make it work with a single sed?
cat file | sed -e '/linux/!d; ...?; 10!d'
...? - store the linux lines in a buffer and cut them down to 10 afterwards?
Can someone explain this to me?
I would use awk:
awk '/linux/ && c<10 {print;c++} c==10 {exit}' file
This might work for you (GNU sed):
sed -nr '/linux/{p;G;/(.*\n){10}/q;h}' file
Print the line if it contains the required string. If the required number of lines has already been printed quit, otherwise store the line and previous lines in the hold space.
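The same print-and-count idea can be sketched in awk with the limit as a parameter. This is an awk variant, not the sed command above, and the function name first_matches is hypothetical.

```shell
# Hypothetical helper: print the first N lines matching a pattern, then stop reading.
# Usage: first_matches PATTERN N [FILE...]
first_matches() {
    pattern=$1
    count=$2
    shift 2
    awk -v pat="$pattern" -v n="$count" '$0 ~ pat { print; if (++seen >= n) exit }' "$@"
}

# Same test data as used elsewhere in this thread, limited to 2 matches.
printf '%s\n' windows linux windows linux linux windows | first_matches linux 2
```

With this input it should print linux twice and then exit without reading the rest.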
You could use perl:
perl -ne 'if (/linux/) {print; ++$a;}; last if $a==10' inputfile
Using GNU sed:
sed -rn "/linux/{p;x;s/^/P/;ta;:a;s/^P{10}$//;x;Tb;Q;:b}" filename
Thanks, you are great. All of the examples look very nice. It is a pity that I cannot do that.
I had not seen the 'r' option in sed before. I need to learn it.
echo -e 'windows\nlinux\nwindows\nlinux\nlinux\nwindows' | sed -nr '/linux/{p;G;/(.*\n){2}/q;h}'
It works very well.
echo -e 'windows\nlinux\nwindows\nlinux\nlinux\nwindows' | sed -nr '/linux/{p;G;/(.*\n){2}/q;h}' | sed '2s/linux/debian/'
Can I ask for one more example? How can I get this result with a single sed?

How to filter rows where there is a trailing whitespace in a certain field?

How can I filter records from file where there is a trailing whitespace in a certain field? E.g if I have a file containing rows like these (| as a field delimiter):
a232|var1|var2
a342 |var1|var2
a234|var1|var2
filtering should return a row
a342 |var1|var2
I do not want to remove these white spaces. I tried this:
awk '$1 ~ /[\s]+$/' myfile.txt
but it did not work.
Your line is almost correct, but you need to set FS if the delimiter is not whitespace; in your case it is the pipe:
awk -F\| '$1~/[[:space:]]+$/'
You can change $1 to $x to filter on a different field.
test:
kent$ echo "a232|var1|var2
a342 |var1|var2
a234|var1|var2"|awk -F\| '$1~/[[:space:]]+$/'
a342 |var1|var2
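To make the "change $1 to $x" part concrete, here is a small sketch that takes the field number as an argument. The function name trailing_ws_rows is hypothetical.

```shell
# Hypothetical helper: print rows whose FIELD-th pipe-delimited field
# ends in whitespace. Usage: trailing_ws_rows FIELD [FILE...]
trailing_ws_rows() {
    field=$1
    shift
    awk -F'|' -v f="$field" '$f ~ /[[:space:]]+$/' "$@"
}

# Sample rows from the question; only the second has trailing whitespace in field 1.
printf '%s\n' 'a232|var1|var2' 'a342 |var1|var2' 'a234|var1|var2' | trailing_ws_rows 1
```

The rows are printed unchanged, so the trailing whitespace is kept, as the question asks.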
You can do it using grep:
grep '^[^|]* |'
Try this one
sed -n 's/\(.*\) |.*/\n&/p'
Use grep like this:
grep " |" yourfile
or, more generally
grep "\s|" yourfile
Another sed option:
sed -n '/\(.*\)[[:blank:]]\|/ p' YourFile
Add --posix when using GNU sed.
From your example,
a232|var1|var2
a342 |var1|var2
a234|var1|var2
Try this:
sed -n 2p filename
Here, 2p means that you are printing the second row of the file.
If you want the third line instead, the command looks like this:
sed -n 3p filename
In general: sed -n Np filename
where N is the number of the line you want to print from the given file.

Linux cut string

In Linux (CentOS) I have a file that contains additional information that I want to remove. I want to generate a new file with all characters up to the first |.
The file has the following information:
ALFA12345|7890
Beta0-XPTO-2|30452|90 385|29
ZETA2334423 435; 2|2|90dd5|dddd29|dqe3
The output expected will be:
ALFA12345
Beta0-XPTO-2
ZETA2334423 435; 2
That is, all characters from the first | onward (inclusive) are removed.
Any suggestion for a script that reads File1 and generates File2 with this specific requirement?
Try
cut -d'|' -f1 oldfile > newfile
And, to round out the "big 3", here's the awk version:
awk -F\| '{print $1}' in.dat
You can use a simple sed script.
sed 's/^\([^|]*\).*/\1/g' in.dat
ALFA12345
Beta0-XPTO-2
ZETA2334423 435; 2
Redirect to a file to capture the output.
sed 's/^\([^|]*\).*/\1/g' in.dat > out.dat
And with grep:
$ grep -o '^[^|]*' file1
ALFA12345
Beta0-XPTO-2
ZETA2334423 435; 2
$ grep -o '^[^|]*' file1 > file2
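For completeness, the same cut can be sketched without any external tool, using shell parameter expansion. This is a different technique from the commands above, and the function name first_field is hypothetical.

```shell
# Hypothetical helper: copy standard input to standard output, keeping only
# the part of each line before the first "|".
first_field() {
    while IFS= read -r line || [ -n "$line" ]; do
        # %%|* strips the longest suffix that starts with "|".
        printf '%s\n' "${line%%|*}"
    done
}

# Sample lines from the question.
printf '%s\n' 'ALFA12345|7890' 'Beta0-XPTO-2|30452|90 385|29' \
    'ZETA2334423 435; 2|2|90dd5|dddd29|dqe3' | first_field
```

Usage would be first_field < File1 > File2. For large files, cut or awk is likely to be faster than a shell loop.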

Delete empty lines using sed

I am trying to delete empty lines using sed:
sed '/^$/d'
but I have no luck with it.
For example, I have these lines, with empty lines between them:
xxxxxx
yyyyyy
zzzzzz
and I want it to be like:
xxxxxx
yyyyyy
zzzzzz
What should be the code for this?
You may have spaces or tabs in your "empty" line. Use POSIX classes with sed to remove all lines containing only whitespace:
sed '/^[[:space:]]*$/d'
A shorter version that uses ERE, for example with GNU sed:
sed -r '/^\s*$/d'
(Note that sed does NOT support PCRE.)
I am missing the awk solution:
awk 'NF' file
Which would return:
xxxxxx
yyyyyy
zzzzzz
How does this work? Since NF stands for "number of fields", those lines being empty have 0 fields, so that awk evaluates 0 to False and no line is printed; however, if there is at least one field, the evaluation is True and makes awk perform its default action: print the current line.
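A quick way to see this is to print NF for a few lines. The sketch below uses made-up input with one empty line and one line holding only a space and tabs; the function name drop_blank is hypothetical.

```shell
# Show NF for four lines: text, empty, whitespace only, text.
printf 'xxxxxx\n\n \t \nyyyyyy\n' | awk '{ print NF }'

# Hypothetical helper: keep only lines with at least one field.
drop_blank() {
    awk 'NF' "$@"
}

printf 'xxxxxx\n\n \t \nyyyyyy\n' | drop_blank
```

The first command should print 1, 0, 0, 1, so both the empty line and the whitespace-only line count as having no fields; the second prints only xxxxxx and yyyyyy.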
sed:
sed '/^[[:space:]]*$/d'
sed '/^\s*$/d'
sed '/^$/d'
sed -n '/^\s*$/!p'
grep:
grep .
grep -v '^$'
grep -v '^\s*$'
grep -v '^[[:space:]]*$'
awk:
awk /./
awk 'NF'
awk 'length'
awk '/^[ \t]*$/ {next;} {print}'
awk '!/^[ \t]*$/'
sed '/^$/d' should be fine, are you expecting to modify the file in place? If so you should use the -i flag.
Maybe those lines are not really empty. If that's the case, look at the question "Remove empty lines from txtfiles, remove spaces from start and end of line"; I believe that's what you're trying to achieve.
I believe this is the easiest and fastest one:
cat file.txt | grep .
If you need to ignore all white-space lines as well then try this:
cat file.txt | grep '\S'
Example:
# Sample with empty lines, a line holding only a TAB, and a line holding only a space.
s=$(printf '\n\na\nb\n\nBelow is TAB:\n\t\nBelow is space:\n \nc\n\n')
echo "$s" | grep . | wc -l; echo "$s" | grep '\S' | wc -l
outputs
7
5
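Another option without sed, awk, perl, etc.:
strings "$file" > "$output"
strings prints the sequences of printable characters in files. Note that by default it only prints sequences of at least 4 characters, so shorter lines are dropped as well.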
With help from the accepted answer here and the accepted answer above, I have used:
$ sed 's/^ *//; s/ *$//; /^$/d; /^\s*$/d' file.txt > output.txt
`s/^ *//` => left trim
`s/ *$//` => right trim
`/^$/d` => remove empty line
`/^\s*$/d` => delete lines which may contain white space
This covers all the bases and works perfectly for my needs. Kudos to the original posters @Kent and @kev.
The command you are trying is correct; just use the -E flag with it.
sed -E '/^$/d'
The -E flag makes sed interpret the pattern as an extended regular expression.
You can say:
sed -n '/ / p' filename #there is a space between '//'
You are most likely seeing the unexpected behavior because your text file was created on Windows, so the end of line sequence is \r\n. You can use dos2unix to convert it to a UNIX style text file before running sed or use
sed -r "/^\r?$/d"
to remove blank lines whether or not the carriage return is there.
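The same optional-carriage-return pattern can be sketched in awk for a quick check. The input below is made up and uses \r\n line endings, and the function name drop_blank_crlf is hypothetical.

```shell
# Hypothetical helper: drop lines that are empty or hold only a carriage return.
drop_blank_crlf() {
    awk '!/^\r?$/' "$@"
}

# Made-up CRLF input with one blank line in the middle; od -c makes the \r visible.
printf 'xxxxxx\r\n\r\nyyyyyy\r\n' | drop_blank_crlf | od -c
```

The kept lines still end in \r\n, so run dos2unix (or tr -d '\r') afterwards if UNIX line endings are needed.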
This works in awk as well.
awk '!/^$/' file
xxxxxx
yyyyyy
zzzzzz
You can do something like that using "grep", too:
egrep -v "^$" file.txt
My bash-specific answer is to recommend using perl substitution operator with the global pattern g flag for this, as follows:
$ perl -pe s'/^\n|^[\ ]*\n//g' $file
xxxxxx
yyyyyy
zzzzzz
This answer illustrates accounting for whether or not the empty lines have spaces in them ([\ ]*), as well as using | to separate multiple search terms/fields. Tested on macOS High Sierra and CentOS 6/7.
FYI, the OP's original code sed '/^$/d' $file works just fine in bash Terminal on macOS High Sierra and CentOS 6/7 Linux at a high-performance supercomputing cluster.
If you want to use modern Rust tools, you can consider:
ripgrep:
cat datafile | rg '.'    # a line with spaces is considered non-empty
cat datafile | rg '\S'   # a line with spaces is considered empty
rg '\S' datafile         # a line with spaces is considered empty (-N can be added to remove line numbers for on-screen display)
sd:
cat datafile | sd '^\n' ''      # a line with spaces is considered non-empty
cat datafile | sd '^\s*\n' ''   # a line with spaces is considered empty
sd '^\s*\n' '' datafile         # in-place edit
Using vim editor to remove empty lines
:%s/^$\n//g
On FreeBSD 10.1, this was the only sed solution that worked for me:
sed -e '/^[ ]*$/d' "testfile"
Inside the [] there are a space and a tab character.
test file contains:
fffffff next 1 tabline ffffffffffff
ffffffff next 1 Space line ffffffffffff
ffffffff empty 1 lines ffffffffffff
============ EOF =============
NF is a built-in awk variable (the number of fields) that you can use to delete empty lines in a file:
awk NF filename
and by using sed
sed -r "/^\r?$/d"
