How to read a specific line of files with a Linux command or shell script - linux

I need to be able to read the second-to-last line of all the files within a specific directory.
These files are log files, and that specific line contains the status of the tasks that ran: 'successful', 'fail', 'warning'.
I need to pull this out so I can dump it into reports later.
At this stage I am only looking to pull the data, i.e. the entire line, and will worry about the handling afterwards.
Since the line numbers are not fixed (they vary from file to file), I tried doing it with a 'while' loop so it reads through the whole file, but I am not actually getting the last two lines read, and I can only read one file, not all of them.
Any ideas for a nice little script to do this?
And does anyone know if this can be done with just a Linux command?

Use the tail command to get the last 2 lines, and then the head command to get the first of these:
for file in "$DIR"/*; do
    tail -n 2 "$file" | head -n 1    # second-to-last line of each file
done
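If it also helps to know which file each status line came from (a hypothetical refinement, not part of the original answer), the loop could print the filename alongside:
for file in "$DIR"/*; do
    printf '%s: %s\n' "$file" "$(tail -n 2 "$file" | head -n 1)"
done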

Related

Shell Script to loop over files, apply command and save each output to new file

I have read most questions regarding this topic, but can't get an answer to my specific question:
I have a number of files in a directory, and I want to apply a command to each of these files and then create a new file with the output for every single file. I can only manage to write everything into one file. As I expect to have ~500,000 files, I also need the script to be as efficient as possible.
for f in *.bed; do sort -k1,1 -k2,2n "$f"; done
This command sorts each file accordingly and writes the output to the shell, but I cannot manage to write to a file in the for loop without appending to it with ">>".
I'm thankful for any answer providing an approach or a pointer to an already answered question on this topic!
You can use a script like this:
for f in *.bed
do
    sort -k1,1 -k2,2n "$f" >> new_filename    # append each sorted file to one combined file
done
If you want to be sure new_filename is empty before running the loop, you can clear the file's contents with this command (before the for loop):
>new_filename
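If instead you want one sorted output file per input file, as the question title suggests, a minimal sketch (the .sorted suffix is just an example name, not something from the question):
for f in *.bed
do
    sort -k1,1 -k2,2n "$f" > "$f.sorted"    # one new output file per input file
done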

Set line maximum of log file

I am currently writing a simple logger to log messages from my bash script. The logger works fine, and I simply write the date plus the message to the log file. Since the log file will keep growing, I would like to limit the logger to, for example, 1000 lines. After reaching 1000 lines it should not delete or completely clear the log file; it should truncate the first line and replace it with the new log line, so the file keeps 1000 lines and doesn't grow further. The latest line should always be at the top of the file. Is there any built-in method? Or how could I solve this?
Why would you want to replace the first line with the new message, thereby causing a jump in the order of messages in your log file, instead of just deleting the first line and appending the new message? E.g., simplistically:
log() {
    # keep the last 999 lines of the log, then append the new message
    tail -n 999 logfile > tmp &&
    { cat tmp && printf '%s\n' "$*"; } > logfile
}
log "new message"
You don't even need a tmp file if your log file always stays small: just save the output of the tail in a variable and printf that too.
Note that unlike a sed -i solution, the above will not change the inode, hardlinks, permissions or anything else for logfile. It's the same file you started with, just with updated content; it is not replaced with a new file.
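That variable-based variant might look something like this (a sketch, not from the original answer; it relies on the command substitution reading logfile before the redirection truncates it):
log() {
    prev=$(tail -n 999 logfile)    # read the last 999 lines before truncating
    { [ -n "$prev" ] && printf '%s\n' "$prev"; printf '%s\n' "$*"; } > logfile
}
log "new message"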
Your chosen example may not be the best. As the comments have already pointed out, logrotate is the best tool to keep log file sizes at bay; furthermore, a line is not the best unit to measure size. Those commenters are both right.
However, I will take your question at face value and answer it.
You could achieve what you want with shell builtins, but it is much faster and simpler to use an external tool like sed. (awk is another option, but it lacks the -i switch, which simplifies your life in this case.)
So, suppose your file already exists and is named script.log; then
maxlines=1000
log_msg='Whatever the log message is'
sed -i -e1i"\\$log_msg" -e$((maxlines))',$d' script.log
does what you want.
-i means modify the given file in place.
-e1i"\\$log_msg" means insert $log_msg before the first (1) line.
-e$((maxlines))',$d' means delete each line from line number $((maxlines)) to the last one ($).
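Wrapped into a small reusable function it might look like this (a sketch, assuming GNU sed, a log file that already exists and is non-empty, and messages without embedded newlines or sed metacharacters):
maxlines=1000
logfile=script.log
log() {
    # insert the new message before line 1, then delete everything from line $maxlines on
    sed -i -e1i"\\$*" -e"$maxlines"',$d' "$logfile"
}
log 'Whatever the log message is'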

How does Linux redirect IO work internally

When we use the IO redirection operator for a shell script, does the operator keep all the data to be written in memory and write it all at once, or does it write it to the file line by line?
Here is what I am working on.
I have about 200 small files, ~1000 lines each, in a specific format. I want to process (apply a regex and change the format a little) each line in all the files and have the new transformed lines in a single combined file.
I have a transformscript.sh that takes a single file and applies the transformation. I run it in the following manner:
sh transformscript.sh somefile.txt > newfile.txt
This works fine and fast for a single file.
How do I extend this to all the files? Would it be efficient to change transformscript.sh to take a directory as an argument instead of a filename and add a for loop to transform all the lines of all the files together? Or should I run the above transformscript.sh for each file, create a new file for each one, and combine them separately?
Thanks.
The redirect operator simply opens the file for writing and passes that file descriptor to the shell as its standard output. The shell then writes to the file directly.
You probably do NOT want to run the script separately for each file since you will incur the overhead of bash process creation for each pass. For example:
# don't do it this way
for somefile in $(ls somefiles*.txt); do
newfile=${somefile//some/new}
sh transformscript.sh $somefile > $newfile
done
The above starts one shell for every file found, which is pretty inefficient. It would be better to rewrite transformscript.sh to handle multiple files if possible. Depending on how complicated your transform is and whether you need to keep the original filenames, you might be able to use a single sed process. For example, assume you have 200 files named test1.txt through test200.txt, all with a "Hello world" line you want to change to "Hello joe". You could do something as simple as this:
sed -i.save 's/Hello world/Hello joe/' test*.txt
The -i tells sed to do an "in place" edit (edit the original file), and the optional ".save" argument to -i makes a backup copy of the original file with a .save extension before editing it. Note that this will leave the original contents in the .save files and the new content in the files with the original names, which may not be what you want.
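Since the goal in the question is a single combined output file rather than in-place edits, a single-pass sketch along the same lines might look like this (the s/old/new/ expression is only a placeholder for the real per-line transform, and somefiles*.txt for the real file names):
# one sed process reads all the input files and writes one combined, transformed file
sed -e 's/old/new/' somefiles*.txt > combined.txt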

Sharing a writable variable between bash scripts

Using bash on Linux.
I'm running find piped into xargs, which launches a second bash script to do some processing on each file. I would like to maintain a "running tally" of file sizes and of how many errors are encountered by the second script. In other words, every time the second script runs, it calculates the file size and adds it to the total so far, and does the same if it encounters an error while processing the file. And I need this information to be available to the parent script after the find | xargs finishes.
I can do this by having the second script save and update a text file — a crude way to maintain a "global variable" — but I'm wondering if there's a nicer, more efficient way.
Can you use a pipe or Process Substitution to get the information back from the second script?
find ... | xargs second_script |
while read -r information
do
    : # do something useful with it
done
Or:
while read -r information
do
    : # do something useful with it
done < <(find ... | xargs second_script)
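As a concrete sketch of the second form, assuming second_script prints one line per file in the form "<size> <errors>" on standard output (that output format, and the find expression, are assumptions rather than details from the question):
total_size=0
total_errors=0
while read -r size errors
do
    total_size=$((total_size + size))          # running tally of bytes processed
    total_errors=$((total_errors + errors))    # running tally of errors reported
done < <(find . -type f -print0 | xargs -0 second_script)
echo "total size: $total_size, total errors: $total_errors"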

egrep not writing to a file

I am using the following command in order to extract domain names & the full domain extension from a file. Ex: www.abc.yahoo.com, www.efg.yahoo.com.us.
egrep '[a-z0-9\-]+\.com(\.[a-z]{2})?' source.txt | sort | uniq | sed -e 's/www.//' > dest.txt
The command writes correctly when I specify a small match limit, e.g. -m 100, after source.txt. The problem occurs if I don't specify it, or if I specify a huge number. However, I could previously write to files with grep (not egrep) using similarly huge numbers, and that was successful. I also checked the last-modified date and time while the command was executing, and it seems no modification happens to the destination file. What could be the problem?
As I mentioned in your earlier question, it's probably not an issue with egrep, but that your file is too big and that sort won't output anything (to uniq) until egrep is done. I suggested that you split the file into manageable chunks using the split command. Something like this:
split -l 10000000 source.txt split_source.
This will split the source.txt file into 10-million-line chunks called split_source.aa, split_source.ab, split_source.ac, etc. You can then run the entire command on each one of those files (changing the redirection at the end to append: >> dest.txt).
The problem here is that you can get duplicates across multiple files, so at the end you may need to run
sort dest.txt | uniq > dest_uniq.txt
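Putting that together, a rough sketch (it reuses the pattern and pipeline from the question; the chunk names follow split's default two-letter suffixes):
split -l 10000000 source.txt split_source.
for chunk in split_source.*
do
    egrep '[a-z0-9\-]+\.com(\.[a-z]{2})?' "$chunk" | sort | uniq | sed -e 's/www.//' >> dest.txt
done
# duplicates can still appear across chunks, so deduplicate once at the end
sort dest.txt | uniq > dest_uniq.txt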
Your question is missing information.
That aside, a few thoughts. First, to debug and isolate your problem:
Run the egrep <params> | less so you can see what egrep is doing, and eliminate any problem from sort, uniq, or sed (my bet is on sort).
How big is your input? Any chance sort is dying from too much input?
Gonna need to see the full command to make further comments.
Second, to improve your script:
You may want to sort | uniq AFTER sed, otherwise you could end up with duplicates in your result set, AND an unsorted result set. Maybe that's what you want.
Consider wrapping your regular expressions with "^...$", if it's appropriate to establish beginning of line (^) and end of line ($) anchors. Otherwise you'll be matching portions in the middle of a line.
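Putting both suggestions together might look roughly like this (a sketch: the leading egrep call is reconstructed from the question, the character class is widened to allow dots so whole hostnames can satisfy the ^...$ anchors, and sort -u is shorthand for sort | uniq):
egrep '^[a-z0-9.-]+\.com(\.[a-z]{2})?$' source.txt | sed -e 's/^www\.//' | sort -u > dest.txt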
