What is the Windows equivalent of this Linux command? - linux

I am trying to merge all the files in a Windows batch job, then sort all the rows and keep only the unique rows, since the header can be repeated many times. I have only ever used Linux, where this is just the one-liner below, but I am not sure how to do the same thing on Windows:
sed 1d *.csv | sort -r | uniq > merged-file.csv

To do this without the sorting part, you can simply run this from the command line or in a batch file:
copy *.csv merged-file.csv
This will copy the content of each CSV file into merged-file.csv.
To do the sorting and uniq part, you would need a little more than a simple one-liner.
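If you have a bash environment available on Windows (Git Bash, Cygwin, or WSL), the original pipeline works there too; here is a minimal sketch that also drops the header from every file rather than only the first one (assumes GNU tail):
tail -q -n +2 *.csv | sort -r | uniq > merged-file.csv
The -n +2 starts output at line 2 of each file and -q suppresses tail's per-file "==> name <==" banners. If you re-run this, write the output to a name that does not match *.csv so it is not merged into itself.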

Related

Getting the latest file in shell with YYYYMMDD_HHMMSS.csv.gz format

I have a set of files in a directory and I want to get the latest file based on the timestamp in the file name.
For example:
test1_20180823_121545.csv.gz
test2_20180822_191545.csv.gz
test3_20180823_192050.csv.gz
test4_20180823_100510.csv.gz
test4_20180823_191040.csv.gz
Based on the date and time in the file names above, my output should be test3_20180823_192050.csv.gz.
Using find and sort:
find /path/to/mydirectory -type f | sort -t_ -k2,3 | tail -1
The options used for the sort command are -t for the delimiter and -k for selecting the key on which the sort is done.
tail -1 gets the last entry from the sorted list.
If the files also have corresponding modification times (shown by ls -l), then you can list them by modification time in reverse order and take the last one:
ls -1rt | tail -1
But if you cannot rely on this, then you need to write a script (e.g. in perl). You would read the file list into an array, extract the timestamps into another array, convert the timestamps to epoch time (which is easy to sort), and sort the file list along with the timestamps. Maybe hashes can help with this. Then print the last one.
You can try to write it, and if you run into issues, someone here can correct you.
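A rough bash sketch of that idea (without perl), assuming all names follow the name_YYYYMMDD_HHMMSS.csv.gz pattern shown above and contain no spaces:
for f in /path/to/mydirectory/*.csv.gz; do
  ts=$(basename "$f" .csv.gz | awk -F_ '{print $2 $3}')  # e.g. 20180823192050
  printf '%s %s\n' "$ts" "$f"
done | sort -n | tail -1 | cut -d' ' -f2-
It builds one "timestamp path" line per file, sorts numerically on the timestamp, and keeps only the path of the newest one.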

How to split large file to small files with prefix in Linux/Bash

I have a file in Linux called test. Now I want to split test into, say, 10 small files.
The test file has more than 1000 table names. I want the small files to have an equal number of lines; the last file may or may not end up with the same number of table names.
What I want to know is whether we can add a prefix to the split files while invoking the split command in the Linux terminal.
Sample:
test_xaa test_xab test_xac and so on...
Is this possible in Linux?
I was able to solve my question with the following statement:
split -l $(($(wc -l < test.txt)/10 + 1)) test.txt test_x
With this I was able to get the desired result.
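If your split is from GNU coreutils, there is also a built-in way to ask for a fixed number of line-based chunks, which avoids the wc -l arithmetic (the -n l/10 and -d options are GNU extensions):
split -n l/10 -d test.txt test_x
This splits test.txt into 10 roughly equal chunks without breaking lines, named test_x00, test_x01, and so on.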
I would've sworn split did this on its own, but to my surprise, it does not.
To get your prefix, try something like this:
for x in /path/to/your/x*; do
  mv "$x" "$(dirname "$x")/your_prefix_$(basename "$x")"
done
(The prefix has to go in front of the file name only, not the whole path, hence dirname/basename.)

Insert text in .txt file using cmd

So, I want to insert text into a .txt file, but when I try
type file1.txt >> file2.txt
and sort it using Cygwin with sort file1 | uniq >> sorted, it places the text at the end of the file. But I want to write it to the start of the file. I don't know if this is possible in cmd, and if it's not, I can also do it in a Linux terminal.
Is there a special flag or operator I need to use?
Thanks in regards, Davin
Edit: the file itself (the file I'm writing to) is about 5 GB, so I would have to write 5 GB to a file every time I wanted to change anything.
It is not possible to write to the start of a file. You can only replace the file's content with the content provided, or append to the end of a file. So if you need to add the sorted output in front of the already sorted file, you have to do it like this:
mv sorted sorted.old
sort file1 | uniq > sorted
cat sorted.old >> sorted
rm sorted.old
This is not a limitation of the shell but of the file APIs of pretty much every existing operating system. The size of a file can only be changed at the end: you can grow it (all content stays as it is, but there is now empty space after it) or truncate it (content is cut off at the end). It is possible to copy data around within a file, but there is no system function that does it for you; you have to do it yourself, and that is almost as inefficient as the solution shown above.
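An equivalent sketch with a single temporary file (it still has to rewrite the full 5 GB once, for the reason given above):
{ sort file1 | uniq; cat sorted; } > sorted.new && mv sorted.new sorted
The new content is written first, the old file is appended after it, and the original is only replaced if both steps succeed.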

How to sort content of a text file in Terminal Linux by splitting at a specific char?

I have an assignment in school to sort a files content in a specific order.
I had to do it with Windows batch-files first and now I have to do the same in Linux.
The file looks more or less like this the whole way through:
John Doe : Crocodiles : 1035
In Windows I solved the problem with this:
sort /r /+39 file.txt
The rows in the file are supposed to get sorted by the number of points (which is the number to the right) in decreasing order.
Also the second part of the assignment is to sort the rows by the center column.
How can I get the same result(s) in Linux? I have tried a couple of different variations of the sort command in Linux too but so far without success.
I'd do it with:
sort -nr -t: -k3
-nr - numeric sort, in reverse (descending) order
-t: - use the colon as the field separator
-k3 - sort on the third field
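For the second part of the assignment (sorting on the middle column), the same approach keyed on field 2 should work; a sketch, assuming the file is called file.txt:
sort -t: -k2,2 file.txt
The -k2,2 both starts and ends the key at field 2, so the comparison does not spill over into the points column.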
The Windows command sort /r /+39 file sorts starting at a character position rather than a field, so a closer Linux equivalent is:
sort -r -k1.39 file
where -k1.39 starts the sort key at character 39 of the line.

egrep not writing to a file

I am using the following command in order to extract domain names & the full domain extension from a file. Ex: www.abc.yahoo.com, www.efg.yahoo.com.us.
egrep '[a-z0-9\-]+\.com(\.[a-z]{2})?' source.txt | sort | uniq | sed -e 's/www.//' > dest.txt
The command writes output correctly when I specify a small maximum count parameter, -m 100, after source.txt. The problem is when I don't specify it, or when I specify a huge number. I could previously write to files with grep (not egrep) using huge numbers, similar to what I'm trying now, and that was successful. I also checked the last-modified date and time while the command was executing, and it seems no modification is happening to the destination file. What could be the problem?
As I mentioned in your earlier question, it's probably not an issue with egrep, but that your file is too big and sort won't output anything (to uniq) until egrep is done. I suggested that you split the file into manageable chunks using the split command. Something like this:
split -l 10000000 source.txt split_source.
This will split the source.txt file into 10-million-line chunks called split_source.aa, split_source.ab, split_source.ac, and so on, and you can run the entire command on each one of those files (changing the redirection at the end to append: >> dest.txt).
The problem here is that you can get duplicates across multiple files, so at the end you may need to run:
sort dest.txt | uniq > dest_uniq.txt
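Putting those two steps together, a rough sketch, assuming the split_source. prefix from above and the original regex:
for chunk in split_source.*; do
  egrep '[a-z0-9\-]+\.com(\.[a-z]{2})?' "$chunk" | sort | uniq | sed -e 's/www.//' >> dest.txt
done
sort dest.txt | uniq > dest_uniq.txt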
Your question is missing information.
That aside, a few thoughts. First, to debug and isolate your problem:
Run egrep <params> | less so you can see what egrep is doing, and eliminate any problem from sort, uniq, or sed (my bet is on sort).
How big is your input? Any chance sort is dying from too much input?
Gonna need to see the full command to make further comments.
Second, to improve your script:
You may want to sort | uniq AFTER sed, otherwise you could end up with duplicates in your result set, AND an unsorted result set. Maybe that's what you want.
Consider wrapping your regular expressions with "^...$", if it's appropriate to establish beginning of line (^) and end of line ($) anchors. Otherwise you'll be matching portions in the middle of a line.
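A sketch of the pipeline with that reordering applied, keeping the same regex and simply moving the de-duplication after sed (sort -u is shorthand for sort | uniq):
egrep '[a-z0-9\-]+\.com(\.[a-z]{2})?' source.txt | sed -e 's/www.//' | sort -u > dest.txt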
