Insert text in .txt file using cmd - linux

So, I want to insert test in .txt but when I try
type file1.txt >> file2.txt
and sort it using cygwin with sort file1 | uniq >> sorted it will place it at the end of the file. But i want to write it to the start of the file. I don't know if this is possible in cmd and if it's not I can also do it in a linux terminal.
Is there a special flag or operator I need to use?
Thanks in regards, Davin
edit: the file itself (the file i'm writing to) is about 5GB big so i would have to write 5GB to a file every time i wanted to change anything

It is not possible to write to the start of the file. You can only replace the file content with content provided or append to the end of a file. So if you need to add the sorted output in front of the sorted file, you have to do it like that:
mv sorted sorted.old
sort file1 | uniq > sorted
cat sorted.old >> sorted
rm sorted.old
This is not a limitation of the shell but of the file APIs of pretty much every existing operating system. The size of a file can only be changed at the end, so you can increase it, in that case the file will grow at the end (all content stays as it is but now there is empty space after the content) or you can truncate it (in that case content is cut off at the end). It is possible to copy data around within a file but there exists no system function to do that, you have to do it yourself and this is almost as inefficient as the solution shown above.

Related

Shell Script to loop over files, aplly command and save each output to new file

I have read most questions regarding this topic, but can't get an answer to my specific question:
I have a number of files in a directory, and I want to apply a command to each of these files and then create a new file with the outpot for every single file. I can only manage to write it into one file alltogether. As i expect to have ~ 500.000 files, i also would need the script to be as efficient as possible.
for f in *.bed; do sort -k1,1 -k2,2n; done
This command sorts each file accordingly and writes the ouput in the Shell - But i cannot manage to write to file in the for-loop without appending it with ">>" .
I'm thankful for any answer providing an approach or an already answered question on this topic!
You can use script like this:
for f in *.bed
do
sort -k1,1 -k2,2n $f >>new_filename
done
If you want to be sure new_filename is empty before run the loop you can clear the content in file with command (before for loop):
>new_filename

How to split large file to small files with prefix in Linux/Bash

I have a file in Linux called test. Now I want to split the test into say 10 small files.
The test file has more than 1000 table names. I want the small files to have equal no of lines, the last file might have the same no of table names or not.
What I want is can we add a prefix to the split files while invoking the split command in the Linux terminal.
Sample:
test_xaa test_xab test_xac and so on..............
Is this possible in Linux.
I was able to solve my question with the following statement
split -l $(($(wc -l < test.txt )/10 + 1)) test.txt test_x
With this I was able to get the desired result
I would've sworn split did this on it's own, but to my surprise, it does not.
To get your prefix, try something like this:
for x in /path/to/your/x*; do
mv $x your_prefix_$x
done

Cygwin - How can I keep original file name when outputting results of a command on a file (cut command)?

I'm using a cut command to split up a file. I need the output of the file to keep the original file name. I will not know the name of the file, just what folder it is located in. I need to ultimately add a suffix and prefix to original file after the cut, which I've got figured out. My issue is that I do not know how to keep the original file name after I output the cut.
cut -d, -f1,2,3 for file in * $file > originalfilename.txt
There should only be 1 file in the "dropbox" folder at one time. So if I can store the variable of that file name somewhere and use later that works for me.
Also if there is a way to just modify the file using cut, rather than needing to output it somewhere this would satisfy my needs too, because I would obviously still have original file name then.
I just started using Cygwin a few days ago so I apologize if there is really an obvious answer to this! I have googled everything and couldn't find what I needed.
The answer is no, unix cut does not offer an in-place option. However you can look at alternate options here
You define a variable to store the name of the file and use that variable in the commands:
orig_file='originalfilename.txt'<br>
cut -d, -f1,2,3 for file in * $file > $orig_file <br>
echo "The name of the original file is $orig_file"

How does linux redirect IO work internally

When we use the redirect IO operator for a shell script does the operator keep all the data to be written in memory and write it all at once or does write it to file line by line.
Here is what i am working on.
I have about 200 small files ~1000 lines each in a specific format. I want to process (do a regex and change the format a little) each line in all the files and have the new transformed lines in a single combined file.
I have a transformscript.sh that takes a single file and applies the transformation. I run it in the following manner
sh transformscript.sh somefile.txt > newfile.txt
This works fine and fast for a single file.
How do i extend to do it for all the files. will it be efficient to change transformscript.sh to take a directory as argument instead of filename and add a for loop to transform all the lines of all the files together. Or should I run the above trnsformscript.sh for each file and create a new file for each one and combine then separately.
Thanks.
The redirect operator simply opens the file for writing and passes that file descriptor to the shell as its standard output. The shell then writes to the file directly.
You probably do NOT want to run the script separately for each file since you will incur the overhead of bash process creation for each pass. For example:
# don't do it this way
for somefile in $(ls somefiles*.txt); do
newfile=${somefile//some/new}
sh transformscript.sh $somefile > $newfile
done
The above starts one shell for every file found which is pretty inefficient. It would be better to rewrite transformscript.sh to handle multiple files if possible. Depending on how complicated your transform is and whether you need to keep the original filenames, you might be able to use a single sed process. For example, assume you have 200 files named test1.txt through test200.txt all with a "Hello world" line you want to change to "Hello joe". You could do something as simple a this:
sed -i.save 's/Hello world/Hello joe/' test*.txt
The -i tells sed to do an "in place" edit (edit the original file) and the optional ".save" argument to -i makes a backup copy of the original file with a .save extension before editing the original file. Note, this will leave the original contents in the .save files and the new content in the files with the original name which may not be what you want.

Adding content to middle of file..without reading it till the end

I have read various questions/answers here on unix.stackexchange on how to add or remove lines to/from a file without needing to create a temporary file.
https://unix.stackexchange.com/questions/11067/is-there-a-way-to-modify-a-file-in-place?lq=1
It appears all these answers need one to atleast read till end of the file, which can be time consuming if input is a large file. Is there a way around this? I would expect a file system to be implemented like a linked list...so there should be a way to reach the required "lines" and then just add stuff (node in linked lists). How do I go about doing this?
Am I correct in thinking so? Or Am I missing anything?
Ps: I need this to be done in 'C' and cannot use any shell commands.
As of Linux 4.1, fallocate(2) supports the FALLOC_FL_INSERT_RANGE flag, which allows one to insert a hole of a given length in the middle of a file without rewriting the following data. However, it is rather limited: the hole must be inserted at a filesystem block boundary, and the size of the inserted hole must be a multiple of the filesystem block size. Additionally, in 4.1, this feature was only supported by the XFS filesystem, with Ext4 support added in 4.2.
For all other cases, it is still necessary to rewrite the rest of the file, as indicated in other answers.
The short answer is that yes, it is possible to modify the contents of a file in place, but no, it is not possible to remove or add content in the middle of the file.
UNIX filesystems are implemented using an inode pointer structure which points to whole blocks of data. Each line of a text file does not "know" about its relationship to the previous or next line, they are simply adjacent to each other within the block. To add content between those two lines would require all of the following content to be shifted further "down" within the block, pushing some data into the next block, which in turn would have to shift into the following block, etc.
In C you can fopen a file for update and read its contents, and overwrite some of the contents, but I do not believe there is (even theoretically) any way to insert new data in the middle, or delete data (except to overwrite it with nulls.
You can modify a file in place, for example using dd.
$ echo Hello world, how are you today? > helloworld.txt
$ cat helloworld.txt
Hello world, how are you today?
$ echo -n Earth | dd of=helloworld.txt conv=notrunc bs=1 seek=6
$ cat helloworld.txt
Hello Earth, how are you today?
The problem is that if your change also changes the length, it will not quite work correctly:
$ echo -n Joe | dd of=helloworld.txt conv=notrunc bs=1 seek=6
Hello Joeth, how are you today?
$ echo -n Santa Claus | dd of=helloworld.txt conv=notrunc bs=1 seek=6
Hello Santa Clausare you today?
When you change the length, you have to re-write the file, if not completely then starting at the point of change you make.
In C this is the same as with dd. You open the file, you seek, and you write.
You can open a file in read/write mode. You read the file (or use "seek" to jump to the position you want, if you know it), and then write into the file, but you overwrite the data which are here (this is not an insertion).
Then you can choose to trunc the file from the last point you wrote or keep all the remaining data after the point you wrote without reading them.

Resources