sed command to copy lines that have strings - linux

I want to copy the lines that contain certain strings to another file.
For example, a file contains the lines below:
ram 100 50
gopal 200 40
ravi 50 40
krishna 300 600
Govind 100 34
I want to copy the lines that contain 100 or 200 to another file, skipping all the characters before the first occurrence of 100 or 200 in each line.
I want it to copy
100 50
200 40
100 34
to another file
I am using sed -n '/100/p' filename > outputfile
Can you please help me match lines containing either one of the strings with a single command?

Short sed approach:
sed '/[12]00/!d; s/[^0-9[:space:]]*//g; s/^ *//g;' filename > outputfile
/[12]00/!d - delete all lines that don't contain 100 or 200
s/[^0-9[:space:]]*//g - remove every character except digits and whitespace
s/^ *//g - strip the leading spaces that remain
The outputfile contents:
100 50
200 40
100 34

This might work for you (GNU sed):
sed -n '/[12]00/w anotherFile' file
The w command writes every line matching the regexp (100 or 200) to anotherFile; -n suppresses the normal printing.

There are at least 2 possibilities:
sed -n '/100\|200/p' filename > outputfile
sed -n -e '/100/p' -e '/200/p' filename > outputfile
The latter is probably easier to remember and maintain (but maybe you should be using -f?), but note that it will print lines twice if they match both. You could fix this by using:
sed -n -e '/100/{p;b}' -e '/200/{p;b}' filename > outputfile
Then again, why are you using sed? This sounds like a job for grep.
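Picking up that last suggestion, a grep sketch (using -o to print only the matched part, from the first 100 or 200 to the end of the line; filename and outputfile are the names from the question):

```shell
# Print each matching line from the first occurrence of 100 or 200 onwards;
# non-matching lines are dropped entirely.
grep -oE '[12]00.*' filename > outputfile
```

For the sample input this writes 100 50, 200 40 and 100 34 to outputfile. Unlike the sed answer above it keeps any non-numeric text that follows the match, so check it against your real data first.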

Related

How to replace two lines with a blank line using SED command?

I want to replace the first two lines with a blank line as below.
Input:
sample
sample
123
234
235
456
Output:
(blank line)
123
234
235
456
Delete the first line, remove all the content from the second line but don't delete it completely:
$ sed -e '1d' -e '2s/.*//' input.txt
123
234
235
456
Or insert a blank line before the first, and delete the first two lines:
$ sed -e '1i\
' -e '1,2d' input.txt
123
234
235
456
Or use tail instead of sed to print all lines starting with the third, and an echo first to get a blank line:
(echo ""; tail -n +3 input.txt)
Or if you're trying to modify a file in place, use ed instead:
ed -s input.txt <<EOF
1,2c

.
w
EOF
(The c command changes the given range of lines to the new content; here that content is the single blank line before the terminating ".", so the empty line in the heredoc matters)

Find strings from one file that are not in lines of another file

In a bash shell script, I need to create a file containing the strings from file 1 that are not found in any line of file 2. File 1 is processed through a for loop over files in a directory.
files=./Output/*
for f in $files
do
done
I have very large files, so using grep isn't ideal. I previously tried:
awk 'NR==FNR{A[$2]=$0;next}!($2 in A){print }' file2 file1 > file3
file 1:
NB551674:136:HHVMJAFX2:1:11101:18246:1165
NB551674:136:HHVMJAFX2:1:11101:10296:1192
NB551674:136:HHVMJAFX2:1:11101:13281:1192
NB551674:136:HHVMJAFX2:2:21204:11743:6409
file 2:
aggggcgttccgcagtcgacaagggctgaaaaa|AbaeA1 NB551674:136:HHVMJAFX2:2:21204:11743:6409 100.000 32 0 0 1 32 83 114 7.30e-10 60.2
taccaacaattcagcgttacgccaacggtaac|AbaeB1 NB551674:136:HHVMJAFX2:4:21611:6341:1845 100.000 32 0 0 1 32 27 58 6.70e-10 60.2
taccaacaattcagcgttacgccaacggtaac|AbaeB1 NB551674:136:HHVMJAFX2:4:11504:1547:13124 100.000 32 0 0 1 32 88 119 6.70e-10 60.2
taccaacaattcagcgttacgccaacggtaac|AbaeB1 NB551674:136:HHVMJAFX2:3:11410:11337:15451 100.000 32 0 0 1 32 27 58 6.70e-10 60.2
expected output:
NB551674:136:HHVMJAFX2:2:21204:11743:6409
You were close - file1 only has 1 field ($1) but you were trying to use $2 in the hash lookup ($2 in A). Do this instead:
$ awk 'NR==FNR{a[$2]; next} !($1 in a)' file2 file1
NB551674:136:HHVMJAFX2:1:11101:18246:1165
NB551674:136:HHVMJAFX2:1:11101:10296:1192
NB551674:136:HHVMJAFX2:1:11101:13281:1192
Don't use all upper case for user-defined variable names in awk or shell btw to avoid clashes with builtin variables and other reasons.
Use comm, which requires sorted files. Print the second field of file2 using a Perl one-liner (or cut):
comm -23 <(sort file1) <(perl -lane 'print $F[1]' file2 | sort)
Don't do that line-by-line left/right comparison. Use gawk or, preferably, mawk, and preload every single line of file 1 into an array, using the strings themselves as the array's hash indices instead of just numeric 1, 2, 3, ....
Also set FS to the same value as ORS, to prevent awk from unnecessarily attempting to split each line into fields.
Close file 1. Open file 2, then for each string in file 2 delete the corresponding entry from the array.
Close file 2.
In the END section, print out whatever is left in the array; that's your set.
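A minimal sketch of the steps just described (file names are the ones from the question; the FS=ORS micro-optimisation is left out for clarity, and the END loop prints in unspecified order):

```shell
awk 'NR == FNR { seen[$0]; next }     # preload every line of file1 as a key
     $2 in seen { delete seen[$2] }   # each id in field 2 of file2 removes its entry
     END { for (line in seen) print line }' file1 file2
```

Pipe the result through sort if you need a deterministic order.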

Row count of each file in a `.zip` folder

I have one zip archive with 5 text files inside. I have to check the row count of each file without unzipping the archive.
I tried zcat file.zip | wc -l, but it gives the count of the first file only.
Can you guys help me to get the result as mentioned below:
File_Name Rowcount
file1 100
file2 100
file3 100
file4 100
file5 100
If your file is a gzipped tar archive, then you can simply loop over each filename in the archive to get the number of lines in each. For example, if your archive contains:
$ tar -tzf /tmp/tmp-david/zipfile.tar.gz
yon.c
yourmachinecode.s
zeroonect.c
zeros
zz
You can loop over the filenames with:
$ for i in $(tar -tzf /tmp/tmp-david/zipfile.tar.gz); do
printf "%8d lines - %s\n" $(tar -xzf /tmp/tmp-david/zipfile.tar.gz -O "$i" | wc -l) "$i"
done
61 lines - yon.c
5 lines - yourmachinecode.s
63 lines - zeroonect.c
0 lines - zeros
1 lines - zz
You can keep a sum and increment with each count as required to get the total.
If your file is a .zip (MS-DOS) archive, then you can do the same thing, but the parsing of the individual filenames from the output of unzip -l takes a bit more work, e.g.
$ unzip -l /tmp/tmp-david/zipfile.zip | grep -v '^-' | \
tail -n+3 | head -n-1 | awk '{print $4}'
(you would use the above in a command substitution to drive the for loop)

Linux: Number of characters in a text file on lines 'x' through 'y'

How do I print the number of characters on lines x - y of a text file?
I tried using wc -m filename.txt
but I couldn't figure out how to limit the search.
You could use
head -n y filename | tail -n "$((y - x + 1))" | wc -m
with your actual numbers substituted for x and y; for example, for lines 6 through 10:
head -n 10 filename | tail -n 5 | wc -m
You can use the sed command to select the lines you want and then pipe the output into wc. Something like this would select lines 6-10 and print the number of characters:
sed -n '6,10p' filename.txt | wc -m
Try this:
awk '{ print NR, "-", length($0)}' filename.txt
It will print the line number NR and the characters per line length($0) of filename.txt so output will be something like:
1 - 3 # line 1 with 3 characters
2 - 0 # line 2 with no characters
...
In case you just want to print the number of characters for a specific range, let's say from line 1 to 3, this could be used:
awk 'NR>=1 && NR<=3 { print length($0)}' filename.txt
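Note that wc -m also counts the newline at the end of each line; if you want the total excluding newlines, the counting can be folded into a single awk command (lines 6 through 10 are an assumed example range here):

```shell
# Sum the lengths of lines 6-10; length($0) excludes the trailing newline.
awk 'NR >= 6 && NR <= 10 { n += length($0) } END { print n + 0 }' filename.txt
```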

How to sum numbers from a file and write them in a particular way to another file in Linux?

Actually this is my assignment. I have three or four files holding student records; every file has two or three student records, like this:
Course Name: Operating System
Credit: 4
123456 1 1 0 1 1 0 1 0 0 0 1 5 8 0 12 10 25
243567 0 1 1 0 1 1 0 1 0 0 0 7 9 12 15 17 15
Every file has a different course name. I have already moved every course name and student id into one file, but now I don't know how to add up all the marks and move the totals to another file, in the same place as the id. Can you please tell me how to do it?
It looks like this:
Student# Operating Systems JAVA C++ Web Programming GPA
123456 76 63 50 82 67.75
243567 80 - 34 63 59
I did it like this:
#!/bin/sh
find ~/2011/Fall/StudentsRecord -name "*.rec" | xargs grep -l 'CREDITS' | xargs cat > rsh1
echo "STUDENT ID" > rsh2
sed -n /COURSE/p rsh1 | sed 's/COURSE NAME: //g' >> rsh2
echo "GPA" >> rsh2
sed -e :a -e '{N; s/\n/ /g; ta}' rsh2 > rshf
sed '/COURSE/d;/CREDIT/d' rsh1 | sort -uk 1,1 | cut -d' ' -f1 | paste -d' ' >> rshf
Some comments and a few pointers :
It would help to add comments for each line of code that is not self-evident; i.e. code like mv f f.bak doesn't need to be commented, but I'm not sure what the intent of many of your lines of code is.
You insert a comment with the '#' character, like:
find ~/2011/Fall/StudentsRecord -name "*.rec" | xargs grep -l 'CREDITS' | xargs cat > rsh1
Also note that you consistently use all uppercase for your search targets, i.e. CREDITS, while your sample files show mixed case. Either use the correct case for your search targets, i.e.
grep -l 'Credits'
OR tell grep to -i(gnore) case, i.e.
grep -il 'credits'
Your line
sed -n /COURSE/p rsh1 | sed 's/COURSE NAME: //g' >> rsh2
can be reduced to one call to sed (and you have the same case confusion thing going on there too), try
sed -n 's/COURSE NAME: //Igp' rsh1 >> rsh2
This means (-n: don't print every line by default):
I = ignore case when matching (a GNU sed extension)
g = substitute globally, i.e. every occurrence on the line
p = print only lines where a substitution was made
So you're editing out the string COURSE NAME on any line that contains it, and only printing those lines (you probably don't need the 'g' (global) specifier, given that you expect only 1 instance per line).
Your line
sed -e :a -e '{N; s/\n/ /g; ta}' rsh2 > rshf
Actually looks pretty good; very advanced. You're trying to 'fold' every 2 lines together into 1 line, right?
But,
sed '/COURSE/d;/CREDIT/d' rsh1 | sort -uk 1,1 | cut -d' ' -f1 | paste -d' ' >> rshf
I'm really confused by this; is this where you're trying to total a student's scores? (With a sort embedded, I guess not.) Why do you think you need a sort?
While it is possible to perform arithmetic in sed, it is super-crazy hard, so you can either use bash variables to calculate the values OR use a Unix tool that is designed to process text AND perform logical and mathematical operations on the data presented; awk or perl come to mind here.
Anyway, one solution to total each score is to use awk
echo "123456 1 1 0 1 1 0 1 0 0 0 1 5 8 0 12 10 25" |\
awk '{ tot=0; for (i=2;i<=NF;i++) tot+=$i; print $1 "\t" tot }'
Will give you a clue on how to proceed for that.
Awk has predefined variables that it populates for each file and each line of text that it reads, i.e.
$0 = the complete line of text (records are delimited by the internal variable RS (RecordSeparator),
which defaults to '\n', the new-line char, the Unix end-of-line char)
$1 = the first field in the text (fields are delimited by the internal variable FS (FieldSeparator),
which defaults to whitespace; a run of spaces and/or tabs counts as a single separator)
NF = Number (of) Fields in the current line of data (again, fields are defined by the value of FS as
described above)
(there are many others besides $0, $1..$NF, NF, FS, and RS).
You can programmatically step through fields like $1, $2, $3 by using a variable, as in the example code: $i, where i is a variable holding a number between 2 and NF. The leading '$'
says give me the value of field i (i.e. $2, $3, $4 ...)
Incidentally, your problem could be easily solved with a single awk script, but apparently, you're supposed to learn about cat, cut, grep, etc, which is a very worthwhile goal.
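For instance, a hypothetical sketch of the totalling step as one awk program (the glob path is the one from your find command; record lines are assumed to start with a numeric student id):

```shell
# For every .rec file, print each student id with the sum of that student's
# marks; header lines (Course Name, Credit) are skipped by the pattern.
awk 'NF > 2 && $1 ~ /^[0-9]+$/ {
         tot = 0
         for (i = 2; i <= NF; i++) tot += $i
         print $1, tot
     }' ~/2011/Fall/StudentsRecord/*.rec
```

You would still need a join step (or an awk array keyed by id) to lay the totals out per course as in your target table.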
I hope this helps.
