How to sum values from a file and write them in a particular way to another file in Linux?

Actually, this is my assignment. I have three or four files of student records. Every file has two or three student records, like this:
Course Name: Operating System
Credit: 4
123456 1 1 0 1 1 0 1 0 0 0 1 5 8 0 12 10 25
243567 0 1 1 0 1 1 0 1 0 0 0 7 9 12 15 17 15
Every file has a different course name. I managed to move every course name and student ID into one file, but now I don't know how to add up all the marks and write the totals into another file next to the matching ID. Can you please tell me how to do it?
The result should look like this:
Student# Operating Systems JAVA C++ Web Programming GPA
123456 76 63 50 82 67.75
243567 80 - 34 63 59
I did it like this:
#!/bin/sh
find ~/2011/Fall/StudentsRecord -name "*.rec" | xargs grep -l 'CREDITS' | xargs cat > rsh1
echo "STUDENT ID" > rsh2
sed -n /COURSE/p rsh1 | sed 's/COURSE NAME: //g' >> rsh2
echo "GPA" >> rsh2
sed -e :a -e '{N; s/\n/ /g; ta}' rsh2 > rshf
sed '/COURSE/d;/CREDIT/d' rsh1 | sort -uk 1,1 | cut -d' ' -f1 | paste -d' ' >> rshf

Some comments and a few pointers:
It would help to add comments for each line of code that is not self-evident; i.e. code like mv f f.bak doesn't need a comment, but I'm not sure what the intent of many of your lines is.
You insert a comment with the '#' char, like
# concatenate all files that contain the word CREDITS into a file called rsh1
find ~/2011/Fall/StudentsRecord -name "*.rec" | xargs grep -l 'CREDITS' | xargs cat > rsh1
Also note that you consistently use all uppercase for your search targets, i.e. CREDITS, while your sample files show mixed case. Either use the correct case for your search targets, i.e.
`grep -l 'Credits'`
OR tell grep to -i(gnore) case, i.e.
`grep -il 'Credits'`
Your line
sed -n /COURSE/p rsh1 | sed 's/COURSE NAME: //g' >> rsh2
can be reduced to one call to sed (and you have the same case confusion going on). With GNU sed, try
sed -n '/COURSE/I{s/COURSE NAME: //gIp}' rsh1 >> rsh2
This means (-n: don't print every line by default): the I after the /COURSE/ address makes that match case-insensitive (a GNU extension), and in the flags `gIp`,
g = global substitute,
I = ignore case in matching,
p = print only lines where a substitution was made.
So you're editing out the string COURSE NAME for any line that has COURSE in it, and only printing those lines (you probably don't need the g (global) flag, given that you expect only one instance per line).
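A quick sanity check of that substitution (GNU sed, since the I flag is a GNU extension):
echo "Course Name: Operating System" | sed -n 's/COURSE NAME: //Ip'
# prints: Operating System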
Your line
sed -e :a -e '{N; s/\n/ /g; ta}' rsh2 > rshf
Actually looks pretty good, very advanced; you're trying to 'fold' the lines together into one line, right?
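For example, with GNU sed:
printf 'STUDENT ID\nOperating System\nGPA\n' | sed -e :a -e '{N; s/\n/ /g; ta}'
# STUDENT ID Operating System GPA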
But,
sed '/COURSE/d;/CREDIT/d' rsh1 | sort -uk 1,1 | cut -d' ' -f1 | paste -d' ' >> rshf
I'm really confused by this; is this where you're trying to total a student's scores? (With a sort embedded, I guess not.) Why do you think you need a sort?
While it is possible to perform arithmetic in sed, it is extremely hard, so you can either use shell variables to calculate the values OR use a Unix tool that is designed to process text AND perform logical and mathematical operations on the data presented; awk or perl come to mind here.
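For instance, here is a minimal pure-shell sketch of the 'shell variables' route (assuming a file, marks_only.txt here, that contains only the mark lines, with the course header lines already stripped):
#!/bin/sh
# total the marks (fields 2..N) for each student line
while read -r id marks; do
    total=0
    for m in $marks; do          # unquoted on purpose: split marks on whitespace
        total=$((total + m))
    done
    printf '%s\t%s\n' "$id" "$total"
done < marks_only.txt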
Anyway, one solution to total each score is to use awk
echo "123456 1 1 0 1 1 0 1 0 0 0 1 5 8 0 12 10 25" |\
awk '{tot=0; for (i=2;i<=NF;i++) tot+=$i; print $1 "\t" tot}'
Will give you a clue on how to proceed for that (tot is reset at the start of each line, so totals don't carry over from one student to the next).
Awk has predefined variables that it populates for each file, and for each line of text that it reads, i.e.
$0 = the complete line of text, as delimited by the internal variable RS (Record Separator), which defaults to the '\n' newline char, the Unix end-of-line char
$1 = the first field in the text, as delimited by the internal variable FS (Field Separator), which defaults to runs of space chars OR tab chars (so a line whose separators are 2 adjacent space chars and 1 tab char still has only 3 fields)
NF = Number (of) Fields in the current line of data (again, fields defined by the value of FS as described above)
(There are many others besides $0, $1..$NF, NF, FS, RS.)
You can programmatically step through fields like $1, $2, $3 by using a variable, as in the example code: $i, where i is a variable holding a number between 2 and NF. The leading '$' says give me the value of field i (i.e. $2, $3, $4, ...).
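A tiny demo of those variables (the sample input is just an illustration):
# two adjacent spaces and one tab, but still only 3 fields
printf 'aa  bb\tcc\n' | awk '{print NF, $1, $NF}'
# prints: 3 aa cc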
Incidentally, your problem could easily be solved with a single awk script, but apparently you're supposed to learn about cat, cut, grep, etc., which is a very worthwhile goal.
I hope this helps.

Related

Check if values in a file are greater than or equal to a value in a bash script

I have file.txt which includes:
2
10
60
90
Now how can I check if any number in that file is equal to or greater than 50, and then do something? The 'something' in my case is sending an email; that part I have.
I have tried to do this with awk but it does not work in a script.
The following command will output the greatest value of your file:
sort -nr file.txt | head -1
Then just compare it to the value of your choice and voilà. Something like:
if [ `sort -nr file.txt | head -1` -ge 50 ]
then
<do something>
fi
Explanation:
sort -n sorts the file as numbers (otherwise 12 would be considered greater than 100).
sort -r reverses the sort (by default it displays lower numbers first; with -r it displays higher numbers first).
head -1 displays only the first output.
This will also do the job:
$ awk '$1 >= 50 { print $1 }' <file>
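Since your 'do something' is sending an email, here is one way to wire the check into that (a sketch; the mail invocation and address are placeholders for whatever you already have):
if awk '$1 >= 50 { found=1 } END { exit !found }' file.txt
then
    # replace this with your real notification command
    echo "a value >= 50 was found in file.txt" | mail -s "threshold reached" you@example.com
fi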

Finding the number of substrings in a file

I'm trying to write a very small program that will check the number of substrings in a large text file. All it will do is read the first 2000 lines of the text file, find any "TTT" substrings, count them, and set a variable to that total. I'm a bit new to shell, so any help would be greatly appreciated!
#!/bin/bash
$counter=(head -2000 [file name] | grep TTT | grep -o TTT | wc -l)
echo $counter
For what it's worth, you might find awk better suited for this task:
awk -F"ttt" '{j=(NF-1)+j}END{print j}' filename
This will split each record in your file by delimiter "ttt". Then it counts the number of fields, subtracts one, and adds that to the total.
A file like:
ttt tttttt something
1 5 ttt
tt
one more ttt record
Would be split (visualizing with pipe delim) like:
| || something
1 5 |
tt
one more | record
Counting the number of fields per record:
4
2
1
2
Subtracting one from that:
3
1
0
1
Which totals to 5, which is how many "ttt" substrings are present.
To incorporate this into your script (limiting it to the first 2000 lines, and fixing your other issue):
#!/bin/bash
counter=$(head -2000 filename | awk -F"ttt" '{j=(NF-1)+j}END{print j}')
echo $counter
The change here is that when we set a variable in Bash we don't include the $ sign at the front. Only in referencing the variable do we include the $.
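That is:
counter=42        # assignment: no $ on the left-hand side
echo "$counter"   # reference: with $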
You have some minor syntax errors there; you probably meant this:
counter=$(head -2000 [file name] | grep TTT | grep -o TTT | wc -l)
echo $counter
Notice the tiny changes I made there to make it work.
Btw the grep TTT in the middle is redundant, you can simply drop it, that is:
counter=$(head -2000 [file name] | grep -o TTT | wc -l)
grep can also count for you: counter=$(grep -c TTT "$infile"). But be aware that -c counts matching lines, not occurrences, so it undercounts lines that contain TTT more than once. You can additionally stop early with -m NUM (--max-count=NUM), which makes grep stop reading at the end of the file OR once NUM matching lines have been found.
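To see the difference between matching lines and occurrences (a throwaway demo file):
printf 'TTTTTT\nxTTTx\n' > demo.txt
grep -c TTT demo.txt           # 2 (lines containing TTT)
grep -o TTT demo.txt | wc -l   # 3 (non-overlapping occurrences)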

How to delete the first column (which is in fact row names) from a data file in Linux?

I have a data file with many thousands of columns and rows. I want to delete the first column, which is in fact the row counter. I used this command in Linux:
cut -d " " -f 2- input.txt > output.txt
but nothing changed in my output. Does anybody know why it does not work and what I should do?
This is what my input file looks like:
col1 col2 col3 col4 ...
1 0 0 0 1
2 0 1 0 1
3 0 1 0 0
4 0 0 0 0
5 0 1 1 1
6 1 1 1 0
7 1 0 0 0
8 0 0 0 0
9 1 0 0 0
10 1 1 1 1
11 0 0 0 1
.
.
.
I want my output look like this:
col1 col2 col3 col4 ...
0 0 0 1
0 1 0 1
0 1 0 0
0 0 0 0
0 1 1 1
1 1 1 0
1 0 0 0
0 0 0 0
1 0 0 0
1 1 1 1
0 0 0 1
.
.
.
I also tried the sed command:
sed '1d' input.file > output.file
But it deletes the first row, not the first column.
Could anybody guide me?
The idiomatic use of cut would be
cut -f2- input > output
if your delimiter is tab ("\t").
Or, simply with awk magic (will work for both space and tab delimiter)
awk '{$1=""}1' input | awk '{$1=$1}1' > output
the first awk deletes field 1 but leaves a delimiter; the second awk removes the delimiter. The default output delimiter is space; if you want to change it to tab, add -vOFS="\t" to the second awk.
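For example:
echo "1 0 0 0 1" | awk '{$1=""}1' | awk '{$1=$1}1'
# 0 0 0 1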
UPDATED
Based on your updated input, the problem is the leading spaces, which cut treats as delimiting extra empty columns. One way to address this is to remove them first, before feeding the data to cut:
sed 's/^ *//' input | cut -d" " -f2- > output
or use the awk alternative above which will work in this case as well.
@karakfa I had CSV files, so I added the "," separator (you can replace it with yours):
cut -d"," -f2- input.csv > output.csv
Then, I used a loop to go over all files inside the directory:
# files are in the directory tmp/
for f in tmp/*
do
    name=$(basename "$f")
    echo "processing file: $name"
    # keep all columns except the first one of each csv file
    cut -d"," -f2- "$f" > "new/$name"
    # files using the same names are stored in directory new/
done
You can use cut command with --complement option:
cut -f1 -d" " --complement input.file > output.file
This will output all columns except the first one.
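Note that --complement is a GNU coreutils extension, so this needs GNU cut:
echo "1 0 0 0 1" | cut -f1 -d" " --complement
# 0 0 0 1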
As @karakfa notes, it looks like it's the leading whitespace that is causing your issues.
Here's a sed one-liner to do the job (it will account for spaces or tabs):
sed -i.bak "s|^[ \t]\+[0-9]\+[ \t]\+||" input.txt
Explanation:
-i edit existing file in place
.bak backup original file and add .bak file extension (can use whatever you like)
s substitute
| separator (easiest character to read as sed separator IMO)
^ start match at start of the line
[ \t] match space or tab
\+ match one or more times (escape required so sed does not interpret '+' literally)
[0-9] match any number 0 - 9
As noted, the input.txt file will be edited in place. The original content of input.txt will be saved as input.txt.bak. Use just -i instead if you don't want a backup of the original file.
Also, if you know that they are definitely leading spaces (not tabs), you could shorten it to this:
sed -i.bak "s|^ \+[0-9]\+[ \t]\+||" input.txt
You can also achieve this with grep:
grep -E -o '[[:digit:]]([[:space:]][[:digit:]]){3}$' input.txt
This assumes single-digit columns separated by single spaces. To accommodate a variable number of spaces and digits you can do:
grep -E -o '[[:digit:]]+([[:space:]]+[[:digit:]]+){3}$' input.txt
If your grep supports the -P flag (--perl-regexp) you can do:
grep -P -o '\d+(\s+\d+){3}$' input.txt
And here are a few options if you are using GNU sed:
sed 's/^\s\+\w\+\s\+//' input.txt
sed 's/^\s\+\S\+\s\+//' input.txt
sed 's/^\s\+[0-9]\+\s\+//' input.txt
sed 's/^\s\+[[:digit:]]\+\s\+//' input.txt
Note that the grep regexes match the parts we want to keep, while the sed regexes match the parts we want to remove.

How to extract the integer or decimal at beginning of each input line, using Linux/Unix utilities?

Given input such as:
1
1a
1.1b
2.0c
How to extract the integer/decimal number at beginning of each input line, using only Linux/Unix command line utilities?
Using awk, you could say:
awk '{print $0+0}'
Awk is available in Linux, BSD, and many other Unix-like operating systems. It helps in this way:
echo "1" | awk '{a+=$0; print a}' # output 1
echo "1a" | awk '{a+=$0; print a}' # output 1
echo "1.1b" | awk '{a+=$0; print a}' # output 1.1
echo "2.0c" | awk '{a+=$0; print a}' # output 2
Some more awk
For extracting only the numeric part:
$ awk 'gsub(/[[:alpha:]].*/,x,$1) + 1' << EOF
1
1a
1.1b
2.0c
EOF
1
1
1.1
2.0
For integer
$ awk '{print int($0)}' << EOF
1
1a
1.1b
2.0c
EOF
1
1
1
2
---edit---
If there are any blank lines in the file, you can avoid printing a zero for them with the following (NF is nonzero only for non-empty lines, $0+=0 forces the numeric conversion, and the final 1 prints every line):
$ awk 'NF{$0+=0}1' << EOF
1
1a
1.1b
2foot4c
2
EOF
1
1
1.1
2
2
Here is a way to do this with sed:
echo "12.3abc" | sed -n 's/^\([0-9.][0-9.]*\).*/\1/p'
Output:
12.3
The block in parentheses matches all digits or periods '.' that occur at the beginning of the line. Everything after that is matched by the '.*'.
The \1 says to replace the entire line with just the portion that was matched in the parentheses.
Assuming your version of grep supports -o:
grep -o '^[0-9.]\+' data.in
NB: This will match any sequence of digits and decimal points at the start of the line.
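Run on the sample input (GNU grep, since \+ in a BRE is a GNU extension):
printf '1\n1a\n1.1b\n2.0c\n' | grep -o '^[0-9.]\+'
# 1
# 1
# 1.1
# 2.0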

Sorting space delimited numbers with Linux/Bash

Is there a Linux utility or a Bash command I can use to sort a space delimited string of numbers?
Here's a simple example to get you going:
echo "81 4 6 12 3 0" | tr " " "\n" | sort -g
tr translates the spaces delimiting the numbers into newlines, because sort works line by line (i.e. it is for sorting lines of text). The -g option tells sort to sort by "general numerical value".
man sort for further details about sort.
This is a variation on @JamesMorris's answer:
echo "81 4 6 12 3 0" | xargs -n1 | sort -g | xargs
Instead of tr, I use xargs -n1 to convert to new lines. The final xargs is to convert back, to a space separated sequence of numbers.
This is a variation on ghostdog74's answer that's too big to fit in a comment. It shows digits instead of names of numbers and both the original string and the result are in space-delimited strings (instead of an array which becomes a newline-delimited string).
$ s="3 2 11 15 8"
$ sorted=$(echo $(printf "%s\n" $s | sort -n))
$ echo $sorted
2 3 8 11 15
$ echo "$sorted"
2 3 8 11 15
If you didn't use the echo when setting the value of sorted, the string would have newlines in it. In that case, echoing it without quotes puts it all on one line, but, as echoing it with quotes shows, each number appears on its own line. This is the case whether the original is an array or a string.
# demo
$ s="3 2 11 15 8"
$ sorted=$(printf "%s\n" $s | sort -n)
$ echo $sorted
2 3 8 11 15
$ echo "$sorted"
2
3
8
11
15
$ s=(one two three four)
$ sorted=$(printf "%s\n" "${s[@]}" | sort)
$ echo $sorted
four one three two
Using Bash parameter expansion (to replace spaces with newlines) we can do:
str="3 2 11 15 8"
sort -n <<< "${str// /$'\n'}"
# alternative
NL=$'\n'
str="3 2 11 15 8"
sort -n <<< "${str// /${NL}}"
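If you need the result back as a single space-separated string, one sketch:
str="3 2 11 15 8"
sorted=$(sort -n <<< "${str// /$'\n'}" | xargs)
echo "$sorted"   # 2 3 8 11 15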
If you actually have a space-delimited string of numbers, then one of the other answers provided would work fine. If your list is a bash array, then:
oldIFS="$IFS"
IFS=$'\n'
array=($(sort -g <<< "${array[*]}"))
IFS="$oldIFS"
might be a better solution. The newline delimiter would help if you want to generalize to sorting an array of strings instead of numbers.
Improving on Evan Krall's nice Bash "array sort" by limiting the scope of IFS to the command substitution's subshell:
printf "%q\n" "${IFS}"
array=(3 2 11 15 8)
array=($(IFS=$'\n'; sort -n <<< "${array[*]}"))  # note the ';' here: a plain IFS=... prefix would not affect the "${array[*]}" expansion
echo "${array[#]}"
printf "%q\n" "${IFS}"
$ awk 'BEGIN{split(ARGV[1], numbers);for(i in numbers) {print numbers[i]} }' \
"6 7 4 1 2 3" | sort -n
I added this to my .zshrc (or .bashrc) file:
#sort a space-separated list of words (e.g. a list of HTML classes)
sortwords() {
echo $1 | xargs -n1 | sort -g | xargs
}
Call it from the terminal like this:
sortwords "banana date apple cherry"
# apple banana cherry date
Thanks to @FranMowinckel and others for inspiration.
