Sorting numerically when the number is not at the start of a line - Linux

I used grep -Eo '[0-9]{1,}kg' *.dat to extract the values ending in kg. Now I'm trying to sort them in increasing order. My output from grep is:
blue_whale.dat:240kg
crocodile.dat:5kg
elephant.dat:6kg
giraffe.dat:15kg
hippopotamus.dat:4kg
humpback_whale.dat:5kg
ostrich.dat:1kg
sea_turtle.dat:10kg
I've tried using sort -n, but the sorting doesn't work.
edit:
I have a bunch of files recording each animal's weight and length. Filtering out the weights was the easy part. Then I wanted to put them in increasing order, which I thought was just sort -n.
edit:
In my directory I have many .dat files. They contain values like 110000kg 24m, and I need to order them by increasing weight.

You need to use the command in this manner:
grep -Eo '[0-9]{1,}kg' *.dat | sort -t: -n -k2
The -t option specifies the colon as the field separator, so the numeric sort (-n) is applied to the second field (-k2), which holds the weight. Add the -r option for decreasing (reverse) order.
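Applied to the sample data above, the pipeline produces the weights in increasing order (with GNU sort, the two 5kg entries tie on the key, so their relative order falls back to a whole-line comparison):
grep -Eo '[0-9]{1,}kg' *.dat | sort -t: -n -k2
ostrich.dat:1kg
hippopotamus.dat:4kg
crocodile.dat:5kg
humpback_whale.dat:5kg
elephant.dat:6kg
sea_turtle.dat:10kg
giraffe.dat:15kg
blue_whale.dat:240kg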

Related

Getting the latest file in shell with YYYYMMDD_HHMMSS.csv.gz format

I have a set of files in a directory, and I want to get the latest file based on the timestamp in the file name.
For example:
test1_20180823_121545.csv.gz
test2_20180822_191545.csv.gz
test3_20180823_192050.csv.gz
test4_20180823_100510.csv.gz
test4_20180823_191040.csv.gz
Based on the date and time in the file names above, my output should be test3_20180823_192050.csv.gz.
Using find and sort:
find /path/to/mydirectory -type f | sort -t_ -k2,3 | tail -1
The sort options are -t for the delimiter and -k for selecting the key on which the sort is done; tail -1 returns the last entry from the sorted list.
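You can check the behaviour against the sample names (assuming they are in the current directory); -k2,3 restricts the key to the date and time fields:
printf '%s\n' test*_*.csv.gz | sort -t_ -k2,3 | tail -1
test3_20180823_192050.csv.gz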
If the files' modification times (shown by ls -l) also correspond to the timestamps in their names, then you can list them by modification time in reverse order and take the last one:
ls -1rt | tail -1
But if you cannot rely on that, then you need to write a script (e.g. in perl): read the file list into an array, extract the timestamp from each name, convert the timestamps to epoch time (which is easy to sort), and sort the file list along with the timestamps. A hash mapping timestamps to file names can help here. Then print the last one.
You can try to write it; if you run into issues, someone here can correct you.
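The same idea also fits in a few lines of shell. This is only a sketch, assuming every name follows the prefix_YYYYMMDD_HHMMSS.csv.gz pattern and contains no spaces:
for f in *.csv.gz; do
  printf '%s %s\n' "${f#*_}" "$f"      # sort key: everything after the first underscore
done | sort | awk 'END { print $2 }'   # last line of the sorted list is the newest file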

How to join two files in shell

There are two files:
File1:
email
abc#gmail.com
dbc#yahoo.com
hbc#ymail.com
File2:
abc#gmail.com,dpk,25,India
dbc#yahoo.com,dpk,25,India
hbc#ymail.com,dpk,25,India
kbc#gmail.com,dpk,25,India
nbc#ymail.com,dpk,25,India
The required output should be:
abc#gmail.com,dpk,25,India
dbc#yahoo.com,dpk,25,India
hbc#ymail.com,dpk,25,India
We are not using grep because the actual files contain huge amounts of data, and grepping each email id from File1 in File2 takes a very long time.
Is it possible using the join or comm utilities? If yes, please help. I tried but did not get the desired result; also, these two utilities work on sorted data, and the data in the two files is not sorted.
grep -Ff File1 File2
This takes the fixed strings (-F) from File1 (-f File1) and uses them as the patterns to search for in File2. Grepping for fixed strings should speed up the operation significantly.
If that doesn't cut it...
join -t',' File1 File2
...should do as well, but requires both files to be sorted. (Joining on the first field is the default, so you only have to tell join to use the comma as the field delimiter.) If the files really are huge and require sorting first, I am not sure this will actually be faster.
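If your shell has process substitution (bash, zsh), you can do the sorting on the fly without creating temporary files:
join -t',' <(sort File1) <(sort File2)
File1's header line (email) simply finds no match in File2, so it drops out of the result on its own.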

Linux sort command: keys with the same start but different length are not sorted in the same order

I am trying to sort two files in order to join them. Some of the keys I am sorting by are very similar and this seems to be causing issues. For example I have two keys which are a1ke and a1k3-b3. I am using the command:
sort -nk1 file.txt -o file.txt
In one file they appear in this order and in the other they appear in reverse. This is causing issues when I try to join the files.
How can I sort these files so they are in the same order?
Thanks
Do not use the "-n" option, which compares according to the string numerical value.
-n
Compare according to the arithmetic value of an initial numeric string, consisting of optional white space, an optional minus sign, and zero or more digits, optionally followed by a decimal point and zero or more digits.
Your keys are strings, not numbers. Neither a1ke nor a1k3-b3 begins with a numeric string, so under -n both keys evaluate to the same value, and the order is then decided by sort's last-resort whole-line comparison; since the rest of each line differs between your two files, the files end up ordered differently.
Instead, you should just do:
sort -k1 file.txt -o file.txt
Additional info:
You can see that sort considers your keys identical when -n is used by doing a unique sort:
sort -un file
You will see that a1k3-b3 and a1ke are considered equal (and therefore only one is emitted). If instead you do:
sort -u file
The result will contain both a1k3-b3 and a1ke, which is what you want.
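A quick demonstration (GNU sort assumed):
printf 'a1ke\na1k3-b3\n' | sort -un   # prints a single line: the keys tie numerically
printf 'a1ke\na1k3-b3\n' | sort -u    # prints both lines: the keys differ as strings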

How to sort content of a text file in Terminal Linux by splitting at a specific char?

I have a school assignment to sort a file's content in a specific order.
I had to do it with Windows batch-files first and now I have to do the same in Linux.
The file looks more or less like this the whole way through:
John Doe : Crocodiles : 1035
In windows I solved the problem by this:
sort /r /+39 file.txt
The rows in the file are supposed to get sorted by the number of points (which is the number to the right) in decreasing order.
The second part of the assignment is to sort the rows by the center column.
How can I get the same result(s) in Linux? I have tried a couple of different variations of the sort command, but so far without success.
I'd do it with:
sort -nr -t: -k3 file.txt
-nr - numeric sort, in reverse (decreasing) order
-t: - use the colon as the field separator
-k3 - sort on the third field (the points)
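For the second part of the assignment, the same pattern applies; restricting the key to field 2 alone sorts by the center column (the space padding around the colons is identical on every row, so it does not affect the order):
sort -t: -k2,2 file.txt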
The Linux equivalent of your Windows command, sort /r /+39 file.txt, is:
sort -r -k1.39 file.txt
(-k1.39 starts the sort key at character 39 of the line, which is what /+39 does.)

egrep not writing to a file

I am using the following command in order to extract domain names & the full domain extension from a file. Ex: www.abc.yahoo.com, www.efg.yahoo.com.us.
egrep '[a-z0-9\-]+\.com(\.[a-z]{2})?' source.txt | sort | uniq | sed -e 's/www.//' > dest.txt
The command writes output correctly when I specify a small maximum match count, e.g. -m 100, after source.txt. The problem occurs when I don't specify it, or when I specify a huge number. I could previously write to files with grep (not egrep) using similarly huge numbers, and that was successful. I also checked the destination file's last-modified date and time while the command was executing, and it seems no modification happens to it. What could be the problem?
As I mentioned in your earlier question, it's probably not an issue with egrep, but that your file is too big and sort won't output anything (to uniq) until egrep is done. I suggested that you split the file into manageable chunks using the split command. Something like this:
split -l 10000000 source.txt split_source.
This will split the source.txt file into 10-million-line chunks called split_source.aa, split_source.ab, split_source.ac, etc. You can then run the entire command on each of those files (perhaps changing the final redirection to append: >> dest.txt).
The problem here is that you can get duplicates across multiple files, so at the end you may need to run
sort dest.txt | uniq > dest_uniq.txt
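Putting the pieces together, a sketch of the per-chunk run, assuming the pattern from the question:
for f in split_source.*; do
  egrep '[a-z0-9\-]+\.com(\.[a-z]{2})?' "$f" | sort | uniq | sed -e 's/www.//' >> dest.txt
done
sort dest.txt | uniq > dest_uniq.txt   # final pass removes duplicates across chunks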
Your question is missing information.
That aside, a few thoughts. First, to debug and isolate your problem:
Run egrep <params> | less so you can see what egrep is doing, and eliminate any problem coming from sort, uniq, or sed (my bet is on sort).
How big is your input? Any chance sort is dying from too much input?
Gonna need to see the full command to make further comments.
Second, to improve your script:
You may want to sort | uniq AFTER sed; otherwise you could end up with duplicates in your result set (sed can map distinct input lines to the same output), and an unsorted result set at that. Maybe that's what you want; see the sketch below.
Consider wrapping your regular expression with beginning-of-line (^) and end-of-line ($) anchors, if it's appropriate to match whole lines. Otherwise you'll also be matching portions in the middle of a line.
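A sketch of the reordered pipeline; -o prints only the matched domains (assuming that was the intent), sort -u is equivalent to sort | uniq, and the dot in the sed expression is escaped so it matches a literal dot:
egrep -o '[a-z0-9\-]+\.com(\.[a-z]{2})?' source.txt | sed -e 's/www\.//' | sort -u > dest.txt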
