Cygwin - How can I keep original file name when outputting results of a command on a file (cut command)?

I'm using a cut command to split up a file. I need the output of the file to keep the original file name. I will not know the name of the file, just what folder it is located in. I ultimately need to add a suffix and a prefix to the original file name after the cut, which I've got figured out. My issue is that I do not know how to keep the original file name after I output the cut.
cut -d, -f1,2,3 for file in * $file > originalfilename.txt
There should only be 1 file in the "dropbox" folder at one time. So if I can store the variable of that file name somewhere and use later that works for me.
Also if there is a way to just modify the file using cut, rather than needing to output it somewhere this would satisfy my needs too, because I would obviously still have original file name then.
I just started using Cygwin a few days ago so I apologize if there is really an obvious answer to this! I have googled everything and couldn't find what I needed.

The answer is no: unix cut does not offer an in-place option. However, you can work around that, for example by writing to a temporary file and moving it back over the original.

You define a variable to store the name of the file and use that variable in the commands:
orig_file='originalfilename.txt'
cut -d, -f1,2,3 "$orig_file" > output.txt
echo "The name of the original file is $orig_file"

Related

Shell Script to loop over files, apply command and save each output to new file

I have read most questions regarding this topic, but can't get an answer to my specific question:
I have a number of files in a directory, and I want to apply a command to each of these files and then create a new file with the output for every single file. I can only manage to write it all into one file altogether. As I expect to have ~500,000 files, I also need the script to be as efficient as possible.
for f in *.bed; do sort -k1,1 -k2,2n "$f"; done
This command sorts each file accordingly and writes the output to the shell, but I cannot manage to write to a file in the for-loop without appending to it with ">>".
I'm thankful for any answer providing an approach or an already answered question on this topic!
You can use a script like this:
for f in *.bed
do
    sort -k1,1 -k2,2n "$f" >> new_filename
done
If you want to be sure new_filename is empty before running the loop, you can clear the file's content with this command (before the for loop):
> new_filename
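If the goal is instead one new output file per input file, as the question actually asks, a sketch along these lines should work (the .sorted.bed naming for the output files is an assumption):
for f in *.bed
do
    # write each file's sorted output to its own new file
    sort -k1,1 -k2,2n "$f" > "${f%.bed}.sorted.bed"
done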

Insert text in .txt file using cmd

So, I want to insert text into a .txt file, but when I try
type file1.txt >> file2.txt
and sort it using Cygwin with sort file1 | uniq >> sorted, it will place it at the end of the file. But I want to write it to the start of the file. I don't know if this is possible in cmd, and if it's not, I can also do it in a Linux terminal.
Is there a special flag or operator I need to use?
Thanks in regards, Davin
Edit: the file itself (the file I'm writing to) is about 5 GB, so I would have to write 5 GB to a file every time I wanted to change anything.
It is not possible to write to the start of a file. You can only replace the file content with the content provided or append to the end of a file. So if you need to add the sorted output in front of the sorted file, you have to do it like this:
mv sorted sorted.old
sort file1 | uniq > sorted
cat sorted.old >> sorted
rm sorted.old
This is not a limitation of the shell but of the file APIs of pretty much every existing operating system. The size of a file can only be changed at the end: you can increase it, in which case the file grows at the end (all content stays as it is, but now there is empty space after it), or you can truncate it (in which case content is cut off at the end). It is possible to copy data around within a file, but there is no system function for that; you have to do it yourself, and that is almost as inefficient as the solution shown above.
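The same four steps can also be written as a single command group, assuming it is acceptable to build the combined result in a new file before renaming it over the old one:
# new sorted content first, then the existing file, written to a fresh file and renamed
{ sort file1 | uniq; cat sorted; } > sorted.new && mv sorted.new sorted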

Paste header line in multiple tsv (tab separated) files

I have multiple .tsv files named choochoo1.tsv, choochoo2.tsv, ... choochoo(nth).tsv. I also have a main.tsv file. I want to extract the header line from main.tsv and paste it over all choochoo(nth).tsv files. Please note that there are other .tsv files in the directory that I don't want to change or paste the header into, so I can't use *.tsv to select all the .tsv files (I need to select only the files whose names contain the choochoo string). This is what I have tried using a bash script, but I could not make it work. Please suggest the right way to do it.
for x in *choochoo; do
head -n1 main.tsv > $x
done
You have a problem with the file glob, as well as the redirect:
The file glob will catch things like AAchoochoo but not choochoo1.tsv, and not even AAchoochoo.tsv.
The redirect will overwrite the existing files instead of adding to them. The redirect operator for appending to a file is >>, but that appends text at the end, and you want to prepend text at the beginning.
The problem with prepending text to an existing file is that you have to open the file for both reading and writing and then stream both the prepended text and the original text, in order. That is usually where people fail, because the shell can't open files like that. (There is a slightly more complex way of doing this directly, by opening the file for both reading and writing, but I'm not going to address that further.)
You might want to use a temporary file, something like this:
for x in choochoo[0-9]*.tsv; do
    mv "$x"{,.orig}
    (head -n1 main.tsv; cat "$x.orig") > "$x"
    rm "$x.orig"
done
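For reference, the brace expansion mv "$x"{,.orig} expands to mv "$x" "$x.orig", so each target file is first moved aside and then recreated under its original name with the header from main.tsv prepended to its previous content.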

Find and Replace Incrementally Across Multiple Files - Bash

I apologize in advance if this belongs in SuperUser; I always have a hard time discerning whether these bash scripting questions are better placed here or there. Currently I know how to find and replace strings in multiple files, and how to find and replace strings within a single file incrementally (from searching for a solution to this issue), but how to combine them eludes me.
Here's the explanation:
I have a few hundred files, each in sets of two: a data file (.data) and a message file (.data.ms).
These files are linked via a key value unique to each set of two that looks like: ab.cdefghi
Here's what I want to do:
Step through each .data file and do the following:
Find:
MessageKey ab.cdefghi
Replace:
MessageKey xx.aaa0001
MessageKey xx.aaa0002
...
MessageKey xx.aaa0010
etc.
Incrementing by 1 every time I get to a new file.
Clarifications:
For reference, there is only one instance of "MessageKey" in every file.
The paired files have the same name, only their extensions differ, so I could simply step through all .data files and then all .data.ms files and use whatever incremental solution on both and they'd match fine, don't need anything too fancy to edit two files in tandem or anything.
For all intents and purposes whatever currently appears on the line after each MessageKey is garbage and I am completely throwing it out and replacing it with xx.aaa####
String length does matter, so I need xx.aaa0009, xx.aaa0010, not xx.aaa0009, xx.aa00010.
I'm using cygwin.
I would approach this by creating a mapping from old key to new and dumping that into a temp file.
grep MessageKey *.data \
| sort -u \
| awk '{ printf("%s:xx.aaa%04d\n", $1, ++i); }' \
> /tmp/key_mapping
From there I would confirm that the file looks right before I applied the mapping using sed to the files.
cat /tmp/key_mapping \
| while read old new; do
    sed -i -e "s:MessageKey $old:MessageKey $new:" *
done
This will probably work for you, but it's neither elegant nor efficient. This is how I would do it if I were only going to run it once. If I were going to run this regularly and efficiency mattered, I would probably write a quick Python script.
@Carl.Anderson got me started on the right track, and after a little tweaking I ended up implementing his solution, but with some syntax tweaks.
First of all, this solution only works if all of your files are located in the same directory. I'm sure anyone with even slightly more experience with UNIX than me could modify this to work recursively, but here goes:
First I ran:
-hr "MessageKey" . | sort -u | awk '{ printf("%s:xx.aaa%04d\n", $2, ++i); }' > MessageKey
This command was used to create a find-and-replace map file called "MessageKey". Its contents looked like:
In.Rtilyd1:aa.xxx0087
In.Rzueei1:aa.xxx0088
In.Sfricf1:aa.xxx0089
In.Slooac1:aa.xxx0090
etc...
Then I ran:
cat MessageKey | while IFS=: read old new; do sed -i -e "s/MessageKey $old/MessageKey $new/" *Data ; done
I had to use IFS=: (alternatively, I could have found and replaced all the : characters in the map file with spaces, but the former seemed easier).
Anyway, in the end this worked! Thanks Carl for pointing me in the right direction.

Using sed to delete lines present in similar file

I have a file listing from an original drive and a duplicate drive, consisting of 985257 lines and 984997 lines respectively.
As the line counts do not match, I am certain that some of the files have not been duplicated.
In order to establish which files are not present I wish to use sed to filter the original file listing by deleting any lines present in the duplicate listing from the source listing.
I had thought about using a match formula in excel but due to the number of lines the program crashes. I thought using this approach in sed would be a viable option.
I have had no success with my approach so far however.
echo "Start"
# Cat the passed argument which is the duplicate file listing
for line in $(cat $1)
do
#sed the $line variable over the larger file and remove
#sed "${line}/d" LiveList.csv
#sed -i "${line}/d" LiveList.csv
#sed -i '${line}' 'd' LiveList.csv
sed -i "s/'${line}'//" /home/listings/LiveList.csv
done
A temporary file is created and fills up to the 103.4 MB size of the listing file; however, the listing file itself is not altered at all.
My other concern is that, as the listing was created in Windows, the '\' character may be escaping the string, leading to no matches and therefore no alteration.
Example path:
Path,Length,Extension
Jimmy\tail\images\Jimmy\0001\0014\Text\A0\20\A056TH01-01.html,71982,.html
Please help.
This might work for you:
sort original_list.txt duplicate_list.txt | uniq -u
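If only the lines missing from the duplicate listing are wanted (sort ... | uniq -u reports lines unique to either file), comm can narrow that down. A sketch, assuming the listings are named original_list.txt and duplicate_list.txt as above, with the result written to missing_files.txt (a name chosen for illustration):
# -2 drops lines only in the duplicate, -3 drops lines common to both,
# leaving just the lines that appear solely in the original listing
comm -23 <(sort original_list.txt) <(sort duplicate_list.txt) > missing_files.txt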
The first thing that comes to my mind is just using rsync to take care of copying the missing files as fast as possible. It really works wonders.
If not, you can first sort both files to identify where they differ. You can use some paste trickery to put the differences side by side, or even use diff's side-by-side output. When the files are sorted, diff can easily identify which lines have been added.
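For the side-by-side view mentioned above, GNU diff supports it directly; a sketch, assuming the same listing file names as in the first answer:
# show the two sorted listings next to each other, hiding lines present in both
diff -y --suppress-common-lines <(sort original_list.txt) <(sort duplicate_list.txt) | less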
