Shell Script to loop over files, aplly command and save each output to new file - linux

I have read most questions regarding this topic, but can't get an answer to my specific question:
I have a number of files in a directory, and I want to apply a command to each of these files and then create a new file with the outpot for every single file. I can only manage to write it into one file alltogether. As i expect to have ~ 500.000 files, i also would need the script to be as efficient as possible.
for f in *.bed; do sort -k1,1 -k2,2n; done
This command sorts each file accordingly and writes the ouput in the Shell - But i cannot manage to write to file in the for-loop without appending it with ">>" .
I'm thankful for any answer providing an approach or an already answered question on this topic!

You can use script like this:
for f in *.bed
do
sort -k1,1 -k2,2n $f >>new_filename
done
If you want to be sure new_filename is empty before run the loop you can clear the content in file with command (before for loop):
>new_filename

Related

Renaming All Files in a Directory

I split a large text file into 60 chunks, which are are named xaa, xab, xac,...xcg. I want to rename these files so that they all end with .txt
How can I do this from the linux command line?
Looked in the split command for the ability to customize the filenames. Looked on Stack Overflow for other solutions but the ones I've come across are all too specific to the OP's situation.
Assuming that your shell is the default Bash:
for f in x??; do mv "$f" "$f.txt"; done
If you want to be more specific, you could say x[abc][a-z] instead of x??.
This is good enough for a one-liner. In a script you would want to check that "$f" exists before trying to rename it.

Insert text in .txt file using cmd

So, I want to insert test in .txt but when I try
type file1.txt >> file2.txt
and sort it using cygwin with sort file1 | uniq >> sorted it will place it at the end of the file. But i want to write it to the start of the file. I don't know if this is possible in cmd and if it's not I can also do it in a linux terminal.
Is there a special flag or operator I need to use?
Thanks in regards, Davin
edit: the file itself (the file i'm writing to) is about 5GB big so i would have to write 5GB to a file every time i wanted to change anything
It is not possible to write to the start of the file. You can only replace the file content with content provided or append to the end of a file. So if you need to add the sorted output in front of the sorted file, you have to do it like that:
mv sorted sorted.old
sort file1 | uniq > sorted
cat sorted.old >> sorted
rm sorted.old
This is not a limitation of the shell but of the file APIs of pretty much every existing operating system. The size of a file can only be changed at the end, so you can increase it, in that case the file will grow at the end (all content stays as it is but now there is empty space after the content) or you can truncate it (in that case content is cut off at the end). It is possible to copy data around within a file but there exists no system function to do that, you have to do it yourself and this is almost as inefficient as the solution shown above.

How to read specific line of files with a Linux command or Shell Script

I need to be able to read the second last line of all the files within a specific directory.
These files are log files and, that specific line contains the status of tasks that ran, 'successful', 'fail', 'warning'.
I need to pull this to dump it after in reports.
At this stage i am looking only to pull the data, so the entire line, and will worry about the handling after.
As the line numbers are not set, they are irregular, I am looking at doing it with a 'while' loop, so it goes through the whole thing, but i am actually not getting the last 2 lines read, and also, i can read 1 file not all of them.
Any ideas on a nice little script to do this?
And anyone knows if this can be just done with just a linux command?
Use the tail command to get the last 2 lines, and then the head command to get the first of these:
for file in $DIR/*; do
tail -2 "$file" | head -1
done

how to use do loop to read several files with similar names in shell script

I have several files named scale1.dat, scale2.dat scale3.dat ... up to scale9.dat.
I want to read these files in do loop one by one and with each file I want to do some manipulation (I want to write the 1st column of each scale*.dat file to scale*.txt).
So my question is, is there a way to read files with similar names. Thanks.
The regular syntax for this is
for file in scale*.dat; do
awk '{print $1}' "$file" >"${file%.dat}.txt"
done
The asterisk * matches any text or no text; if you want to constrain to just single non-zero digits, you could say for file in scale[1-9].dat instead.
In Bash, there is a non-standard additional glob syntax scale{1..9}.dat but this is Bash-only, and so will not work in #!/bin/sh scripts. (Your question has both sh and bash so it's not clear which you require. Your comment that the Bash syntax is not working for you suggests that you may need a POSIX portable solution.) Furthermore, Bash has something called extended globbing, which allows for quite elaborate pattern matching. See also http://mywiki.wooledge.org/glob
For a simple task like this, you don't really need the shell at all, though.
awk 'FNR==1 { if (f) close (f); f=FILENAME; sub(/\.dat/, ".txt", f); }
{ print $1 >f }' scale[1-9]*.dat
(Okay, maybe that's slightly intimidating for a first-timer. But the basic point is that you will often find that the commands you want to use will happily work on multiple files, and so you don't need shell loops at all in those cases.)
I don't think so. Similar names or not, you will have to iterate through all your files (perhaps with a for loop) and use a nested loop to iterate through lines or words or whatever you plan to read from those files.
Alternatively, you can copy your files into one (say, scale-all.dat) and read that single file.

How does linux redirect IO work internally

When we use the redirect IO operator for a shell script does the operator keep all the data to be written in memory and write it all at once or does write it to file line by line.
Here is what i am working on.
I have about 200 small files ~1000 lines each in a specific format. I want to process (do a regex and change the format a little) each line in all the files and have the new transformed lines in a single combined file.
I have a transformscript.sh that takes a single file and applies the transformation. I run it in the following manner
sh transformscript.sh somefile.txt > newfile.txt
This works fine and fast for a single file.
How do i extend to do it for all the files. will it be efficient to change transformscript.sh to take a directory as argument instead of filename and add a for loop to transform all the lines of all the files together. Or should I run the above trnsformscript.sh for each file and create a new file for each one and combine then separately.
Thanks.
The redirect operator simply opens the file for writing and passes that file descriptor to the shell as its standard output. The shell then writes to the file directly.
You probably do NOT want to run the script separately for each file since you will incur the overhead of bash process creation for each pass. For example:
# don't do it this way
for somefile in $(ls somefiles*.txt); do
newfile=${somefile//some/new}
sh transformscript.sh $somefile > $newfile
done
The above starts one shell for every file found which is pretty inefficient. It would be better to rewrite transformscript.sh to handle multiple files if possible. Depending on how complicated your transform is and whether you need to keep the original filenames, you might be able to use a single sed process. For example, assume you have 200 files named test1.txt through test200.txt all with a "Hello world" line you want to change to "Hello joe". You could do something as simple a this:
sed -i.save 's/Hello world/Hello joe/' test*.txt
The -i tells sed to do an "in place" edit (edit the original file) and the optional ".save" argument to -i makes a backup copy of the original file with a .save extension before editing the original file. Note, this will leave the original contents in the .save files and the new content in the files with the original name which may not be what you want.

Resources