Grep in a for loop in Linux

I have a file containing a certain number of occurrences of a particular string, "Thermochemistry". I'm trying to write a script that greps the 500 lines above each occurrence of this string and creates a new file, numbered accordingly. Here is what I've tried:
occurrences="$(grep -c 'Thermochemistry' $FILE)"
for (( i=1; i<=$occurrences; i++)); do
    grep -B500 -m "$i" 'Thermochemistry' $FILE > newfile_"$i".tmp
done
If there are 20 occurrences of 'Thermochemistry' in the file, I wanted it to create 20 new files, called newfile_1.tmp to newfile_20.tmp, but it doesn't work.
Can anyone help?

Besides the magic command from oguz ismail, you could use the following awk approach. (Your loop does not behave as expected because grep -m "$i" stops after the i-th match but still prints the -B500 context of every match up to that point, so each newfile_i would contain the blocks for occurrences 1 through i rather than just occurrence i.)
awk '{b[FNR%501]=$0}                                  # ring buffer: the current line plus the 500 before it
     /Thermochemistry/{if (f) close(f); f="newfile_" (++c) ".tmp"
        for(i=(FNR>500?FNR-500:1); i<=FNR; ++i) print b[i%501] > f   # the match plus up to 500 preceding lines
     }' file
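A quick sanity check after running it (file is the same input as above): the number of .tmp files produced should match the number of matches.
grep -c 'Thermochemistry' file   # expected number of blocks
ls newfile_*.tmp | wc -l         # should print the same count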

Related

Shell script to prepend a string to column 6 of each row in a CSV file

I have 100 CSV files, each with 10 columns and 1000 rows. Column 6 contains a number, and I want to prepend 93 to it.
For example:
Source:
2014-06-20 00:05:44,2014-06-2000:08:46,x.x.x.x,091xxxx,x.x.x.x,**788950270**,,971xxx,479xxxx,9xxx
Result:
2014-06-20 00:05:44,2014-06-2000:08:46,x.x.x.x,091xxxx,x.x.x.x,**93788950270**,,971xxx,479xxxx,9xxx
What you want to do can be accomplished using the bash read -a builtin: loop through the data file, reading the 10 values of each CSV row into an array, prepend '93' to the sixth element (array element 5 in a 0-based array), and write the values back out to a tmp file; when done, replace the original (after a backup) with the tmp file. NOTE: depending on whether the original files have a trailing newline at the end, you may need to add/remove the newline that will be present at the end of the reformatted file.
NOTE: this is only valid for 10 csv values per line (any number of rows)
#!/bin/bash

test -r "$1" || { printf "error: invalid file: %s\n" "$1"; exit 1; }

tmpfile=./tmp.txt
declare -a array
IFS=$','                    # split input lines on commas

:> "$tmpfile"               # truncate/create the tmp file

while read -a array || test -n "${array[9]}"; do
    array[5]="93${array[5]}"                        # prepend 93 to the 6th field
    for ((i=0; i<9; i++)); do
        printf "%s," "${array[i]}" >> "$tmpfile"
    done
    printf "%s\n" "${array[9]}" >> "$tmpfile"       # last field, newline instead of comma
done < "$1"

cp -a "$1" "${1}.bak"       # back up the original
cp -a "$tmpfile" "$1"       # replace it with the reformatted copy
rm "$tmpfile"

exit 0
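A hypothetical way to run it over all 100 files (the script name prepend93.sh and the *.csv glob are assumptions about how the script and data files are named):
for f in *.csv; do
    bash prepend93.sh "$f"    # each run leaves a backup in "$f.bak"
done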
Input (taken from your example, with the date changed on each record to make them unique):
2014-03-20 00:05:44,2014-06-20 00:08:46,x.x.x.x,091xxxx,x.x.x.x,788950270,,971xxx,479xxxx,9xxx
2014-04-20 00:05:44,2014-06-20 00:08:46,x.x.x.x,091xxxx,x.x.x.x,788950270,,971xxx,479xxxx,9xxx
2014-05-20 00:05:44,2014-06-20 00:08:46,x.x.x.x,091xxxx,x.x.x.x,788950270,,971xxx,479xxxx,9xxx
2014-06-20 00:05:44,2014-06-20 00:08:46,x.x.x.x,091xxxx,x.x.x.x,788950270,,971xxx,479xxxx,9xxx
Output:
2014-03-20 00:05:44,2014-06-20 00:08:46,x.x.x.x,091xxxx,x.x.x.x,93788950270,,971xxx,479xxxx,9xxx
2014-04-20 00:05:44,2014-06-20 00:08:46,x.x.x.x,091xxxx,x.x.x.x,93788950270,,971xxx,479xxxx,9xxx
2014-05-20 00:05:44,2014-06-20 00:08:46,x.x.x.x,091xxxx,x.x.x.x,93788950270,,971xxx,479xxxx,9xxx
2014-06-20 00:05:44,2014-06-20 00:08:46,x.x.x.x,091xxxx,x.x.x.x,93788950270,,971xxx,479xxxx,9xxx
Again, NOTE: this is a non-trivial operation if your files are production files, so make a backup beforehand (the script also backs up the data file), and then verify the presence/absence of the trailing newline in the original.
Use this:
awk -F, -v OFS="," '{$6=substr($6,3); $6="**"93$6}1' FILENAME
If field #6 does not have the ** markers, use this:
awk -F, -v OFS="," '{$6=93$6}1' FILENAME
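If you want to apply this to all 100 files at once, here is a hedged sketch (assuming GNU awk 4.1+ for in-place editing and that the data files match *.csv; make backups first, as noted above):
gawk -i inplace -F, -v OFS="," '{$6=93$6}1' *.csv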

Bash script to list files periodically

I have a huge set of files, 64,000 of them, and I want to create a Bash script that lists the file names using
ls -1 > file.txt
for every 4,000 files and stores the resulting file.txt in a separate folder. So, every 4,000 files have their names listed in a text file that is stored in a folder. The result is:
folder01 contains file.txt that lists files #1-#4000
folder02 contains file.txt that lists files #4001-#8000
folder03 contains file.txt that lists files #8001-#12000
.
.
.
folder16 contains file.txt that lists files #60001-#64000
Thank you very much in advance
You can try
ls -1 | awk '
{
    if (! ((NR-1)%4000)) {
        if (j) close(fnn)
        fn  = sprintf("folder%02d", ++j)
        system("mkdir " fn)
        fnn = fn "/file.txt"
    }
    print >> fnn
}'
Explanation:
NR is the current record number in awk, that is: the current line number.
NR starts at 1, on the first line, so we subtract 1 such that the if statement is true for the first line
system runs a shell command from within awk
print in itself prints the current line to standard output, we can redirect (and append) the output to the file using >>
All uninitialized variables in awk will have a zero value, so we do not need to say j=0 in the beginning of the program
This will get you pretty close:
ls -1 | split -l 4000 -d - folder
Run the result of ls through split, breaking every 4000 lines (-l 4000), using numeric suffixes (-d), from standard input (-) and start the naming of the files with folder.
Results in folder00, folder01, ...
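If you then want the layout from the question (a file.txt inside each folder), a hedged follow-up sketch is to turn each split output file into a directory of that name:
for f in folder[0-9]*; do
    mkdir "$f.d" && mv "$f" "$f.d/file.txt" && mv "$f.d" "$f"   # folderNN becomes folderNN/file.txt
done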
Here is an exact solution using awk:
ls -1 | awk '
(NR-1) % 4000 == 0 {
    dir = sprintf("folder%02d", ++nr)
    system("mkdir -p " dir)
}
{ print >> (dir "/file.txt") }'
There are already some good answers above, but I would also suggest you take a look at the watch command. This will re-run a command every n seconds, so you can, well, watch the output.
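For example, a minimal sketch (the 10-second interval and the command being watched are placeholders):
watch -n 10 'ls -1 | wc -l'   # re-run every 10 seconds and show the current file count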

Extract Digits From String After Capturing It From File

I'm trying to retrieve a memory value from a file and compare it to a reference value. But one thing at a time...
I've attempted using set/source/grep/substring into a variable, but none of them actually worked. Then I found a way to do it using a for loop (see code).
The issue: I'm receiving the entire string from the file, but I can't manage to get rid of the last character in it.
#!/bin/bash
#source start_params.properties
#mem_val= "$default.default.minmaxmemory.main"
#mem_val= grep "default.default.minmaxmemory.main" start_params.properties
for mLine in $(grep 'default.default.minmaxmemory.main' start_params.properties)
do
    echo "$mLine"
done
echo "${mLine:4:5}" # didn't get rid of the last `m` in `-max4095m`
v1="max"
v2="m"
echo "$mLine" | sed -e "s/.*${v1}//;s/${v2}.*//" # this echoes the right value
The loop iterates twice:
First output: default.default.minmaxmemory.main=-min512m
Second output: -max4096m
Then the sed command outputs 4096, but how can I change the last line of the code so that it stores the value in a variable?
Thank you for your suggestions,
You could use grep to filter the max part and then another grep -o to extract the numbers:
echo "$mLine" | grep 'max' | grep -o '[[:digit:]]*'
$ sed '/max[0-9]/!d; s/.*max//; s/m//' start_params.properties
4096
remove lines not matching max[0-9]
remove first part of line until max
remove final m
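To answer the follow-up about storing the result in a variable, either command can be wrapped in command substitution; a minimal sketch (the variable name mem_val is just an example):
mem_val=$(sed '/max[0-9]/!d; s/.*max//; s/m//' start_params.properties)
echo "$mem_val"   # 4096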

Grouping lines from a txt file using filters in Linux to create multiple txt files

I have a txt file where each line starts with a participant number, followed by the date and other variables (numbers only), in this format:
S001_2 20090926 14756 93
S002_2 20090803 15876 13
I want to write a script that creates smaller txt files containing only 20 participants per file (so the first one will contain lines from S001_2 to S020_2, the second from S021_2 to S040_2; the total number of subjects is approximately 200). However, the subjects are not in order, so I can't set a range with sed.
What would be the best command to filter participants into chunks depending on what number (S001_2) the line starts with?
Thanks in advance.
Use the split command to split a file (or a filtered result) without ranges and sed. According to the documentation, this should work:
cat file.txt | split -l 20 - PREFIX
This will produce the files PREFIXaa, PREFIXab, ... (Note that it does not add the .txt extension to the file name!)
If you want to filter the files first, in the way @Sergey described:
cat file.txt | sort | split -l 20 - PREFIX
Sort without any parameters should be suitable, because there are leading zeros in your numbers like S001_2. So, first sort the file:
sort file.txt > sorted.txt
Then you will be able to set ranges with sed on sorted.txt.
Here is a whole script for splitting the sorted file into 20-line files:
num=1
i=1
lines=`wc -l sorted.txt | cut -d' ' -f 1`   # get the number of lines
while [ $i -le $lines ]; do
    sed -n $i,`echo $i+19 | bc`p sorted.txt > file$num
    num=`echo $num+1 | bc`
    i=`echo $i+20 | bc`
done
$ split -d -a3 -l 20 file.txt db_
produces: db_000, db_001, db_002, ..., db_N

How do I count the number of occurrences of a string in an entire file?

Is there an inbuilt command to do this or has anyone had any luck with a script that does it?
I am looking to count the number of times a certain string (not word) appears in a file. This can include multiple occurrences per line, so the count should include every occurrence, not just 1 for lines that have the string 2 or more times.
For example, with this sample file:
blah(*)wasp( *)jkdjs(*)kdfks(l*)ffks(dl
flksj(*)gjkd(*
)jfhk(*)fj (*) ks)(*gfjk(*)
If I am looking to count the occurrences of the string (*) I would expect the count to be 6, i.e. 2 from the first line, 1 from the second line and 3 from the third line. Note how the one across lines 2-3 does not count because there is a LF character separating them.
Update: great responses so far! Can I ask that the script handle the conversion of (*) to \(*\), etc? That way I could just pass any desired string as an input parameter without worrying about what conversion needs to be done to it so it appears in the correct format.
You can use basic tools such as grep and wc:
grep -o '(\*)' input.txt | wc -l
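Regarding the update about escaping: grep's -F option treats the pattern as a fixed string, so no conversion of (*) to \(*\) is needed:
grep -oF '(*)' input.txt | wc -l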
Using perl's "Eskimo kiss" operator with the -n switch to print a total at the end. Use \Q...\E to ignore any meta characters.
perl -lnwe '$a+=()=/\Q(*)/g; }{ print $a;' file.txt
Script:
use strict;
use warnings;

my $count = 0;
my $text  = shift;

while (<>) {
    $count += () = /\Q$text/g;
}
print "$count\n";
Usage:
perl script.pl "(*)" file.txt
This loops over the lines of the file, and on each line finds all occurrences of the string "(*)". Each time that string is found, $c is incremented. When there are no more lines to loop over, the value of $c is printed.
perl -ne'$c++ while /\(\*\)/g;END{print"$c\n"}' filename.txt
Update: Regarding your comment asking that this be converted into a solution that accepts a regex as an argument, you might do it like this:
perl -ne'BEGIN{$re=shift;}$c++ while /\Q$re/g;END{print"$c\n"}' 'regex' filename.txt
That ought to do the trick. If I felt inclined to skim through perlrun again I might see a more elegant solution, but this should work.
You could also eliminate the explicit inner while loop in favor of an implicit one by providing list context to the regexp:
perl -ne'BEGIN{$re=shift}$c+=()=/\Q$re/g;END{print"$c\n"}' 'regex' filename.txt
You can use the basic grep command.
Example: if you want to find the number of occurrences of the word "hello" in a file:
grep -c "hello" filename
If you want to find the number of occurrences of a pattern:
grep -c -P "Your Pattern" filename
Pattern examples: hell.w, \d+, etc.
Note that -c counts matching lines, not individual occurrences, so it undercounts when a line contains the string more than once.
I have used the command below to count occurrences of a particular string in a file (this, too, counts matching lines rather than every occurrence):
grep search_String fileName | wc -l
text="(\*)"
grep -o "$text" file | wc -l
You can make it into a script which accepts arguments like this:
Script count:
#!/bin/bash
text="$1"
file="$2"
grep -o "$text" "$file" | wc -l
Usage:
./count "(\*)" file_path
