Print last field in file and use it for name of another file - linux

I have a tab-delimited file with 3 rows and 7 columns. I want to use the number at the end of the file to rename another file.
Example of tab delimited file:
a b c d e f g
a b c d e f g
a b c d e f 1235
So, I want to extract the number from the tab-delimited file and then rename "file1" to the extracted number (mv file1 1235).
I can print the column, but I cannot seem to extract just the number from the file. And even if I could extract the number, I can't figure out how to store it to use as the new file name.

You can use this awk:
name=$(awk 'END {print $NF}' file)
mv file1 "$name"
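If the last field could ever be empty or non-numeric, you can guard the rename (a small sketch; the [[ ... ]] regex test assumes Bash):
[[ $name =~ ^[0-9]+$ ]] && mv file1 "$name"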

Something along these lines, perhaps?
num=$(tail -1 file | rev | awk '{print $1}' | rev)
mv file1 "$num"
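The two rev calls aren't strictly necessary; awk can print the last field directly, so an equivalent pipeline is:
num=$(tail -1 file | awk '{print $NF}')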

Using a Perl one-liner:
perl -ne 'BEGIN{($f) = @ARGV} ($n) = /(\d+)$/; END{rename($f, $n)}' file1
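If you'd rather check what the one-liner captures before letting it rename anything, you can print the number instead (a dry run of the same regex):
perl -ne '($n) = /(\d+)$/; END{print "$n\n"}' file1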

Related

Bash script: filter columns based on a character

My text file should be of two columns separated by a tab (represented by \t) as shown below. However, there are a few corrupted values where column 1 has two values separated by a space (represented by \s).
A\t1
B\t2
C\sx\t3
D\t4
E\sy\t5
My objective is to create a table as follows:
A\t1
B\t2
C\t3
D\t4
E\t5
i.e. discard the second value that appears after the space in column 1. For example, in C\sx\t3 I want to discard the x after the space and store the columns as C\t3.
I have tried a couple of things but with no luck.
I tried to cut the cols based on \t into independent columns and then cut the first column based on \s and join them again. However, it did not work.
Here is the snippet:
col1=($(cut -d$'\t' -f1 "$file" | cut -d' ' -f1))
col2=($(cut -d$'\t' -f2 "$file"))
myArr=()
for ((idx=0; idx<${#col1[@]}; idx++)); do
    echo "${col1[$idx]} ${col2[$idx]}"
    # I will append to myArr here
done
The output appends the list of col2 to col1, as A B C D E 1 2 3 4 5. On top of this, my file is very large (5,300,000 rows), so I would like to avoid looping over all the records and appending them one by one.
Any advice is very much appreciated.
Thank you. :)
And another sed solution:
Search for a literal space followed by any number of non-tab characters and replace it with nothing:
sed -E 's/ [^\t]+//' file
A 1
B 2
C 3
D 4
E 5
If there could be more than one actual space in there, just make it 's/ +[^\t]+//' ...
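Also note that \t inside a bracket expression is a GNU sed extension. If your sed doesn't support it, you can have Bash insert a literal tab with ANSI-C quoting (the same $'...' trick used in a later answer):
sed -E $'s/ [^\t]+//' file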
Assuming that when you say a space you mean a blank character, then using any awk:
awk 'BEGIN{FS=OFS="\t"} {sub(/ .*/,"",$1)} 1' file
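Since the file has around 5.3 million rows, write the result to a temporary file and move it back over the original (or, if you have GNU awk 4.1+, its -i inplace extension does the same thing):
awk 'BEGIN{FS=OFS="\t"} {sub(/ .*/,"",$1)} 1' file > file.tmp && mv file.tmp file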
Solution using Perl regular expressions (for me they are easier than sed's, and more portable, since there are several incompatible versions of sed):
$ cat ls
A 1
B 2
C x 3
D 4
E y 5
$ cat ls |perl -pe 's/^(\S+).*\t(\S+)/$1 $2/g'
A 1
B 2
C 3
D 4
E 5
This code captures the non-whitespace characters at the front of the line and the non-whitespace characters after the \t.
Try
sed $'s/^\\([^ \t]*\\) [^\t]*/\\1/' file
The ANSI-C quoting ($'...') feature of Bash is used so that tab characters can be written as \t.
Take advantage of FS and OFS and let them do all the hard work for you:
{m,g}awk NF=NF FS='[ \t].*[ \t]' OFS='\t'
A 1
B 2
C 3
D 4
E 5
If there's a chance of leading or trailing spaces and tabs, then perhaps:
mawk 'NF=gsub("^[ \t]+|[ \t]+$",_)^_+!_' OFS='\t' RS='[\r]?\n'

Linux - Delete lines from file 1 in file 2 BIG DATA

I have two files:
file1:
a
b
c
d
file2:
a
b
f
c
d
e
The output file (file2) should be:
f
e
I want the lines of file1 to be deleted directly in file2. The output should not be a new file; it should go straight into file2 (of course, a temp file can be created along the way).
My real file2 contains more than 300,000 lines. That is the reason why a solution like:
comm -13 file1 file2
doesn't work.
comm needs the input files to be sorted. You can use process substitution for that:
#!/bin/bash
comm -13 <(sort file1) <(sort file2) > tmp_file
mv tmp_file file2
Output:
e
f
Alternatively, if you have enough memory, you can use the following awk command which does not need the input to be sorted:
awk 'NR==FNR{a[$0];next} !($0 in a)' file1 file2
Output (input order preserved):
f
e
Keep in mind that the size of the array a directly depends on the size of file1.
PS: grep -vFf file1 file2 can also be used and the memory requirements are the same as for the awk solution. Given that, I would probably just use grep.
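For example, to filter file2 in place via a temp file, with -x added so only whole-line matches are removed (drop it if substring matching is acceptable):
grep -vFxf file1 file2 > tmp_file && mv tmp_file file2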

Fast extraction of lines based on line numbers

I am looking for a fast way to extract lines of a file based on a list of line numbers read from a different file in bash.
Define three files:
position_file: Containing a single column of integers
full_data_file: Containing a single column of data
extracted_data_file: Containing those lines in full_data_file whose line numbers match the integers in position_file
My current way of doing this is
while read position; do
    awk -v pos="$position" 'NR==pos {print; exit}' < full_data_file >> extracted_data_file
done < position_file
The problem is that this is painfully slow and I'm trying to do this for a large number of rather large files. I was hoping someone might be able to suggest a faster way.
Thank you for your help.
The right way with an awk command:
Input files:
$ head pos.txt data.txt
==> pos.txt <==
2
4
6
8
10
==> data.txt <==
a
b
c
d
e
f
g
h
i
j
awk 'NR==FNR{ a[$1]; next }FNR in a' pos.txt data.txt > result.txt
$ cat result.txt
b
d
f
h
j
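Since you mention running this over a large number of files, the same one-liner drops straight into a loop; a minimal sketch, assuming (hypothetically) that all data files sit in a data/ directory and share one pos.txt:
for f in data/*; do
    awk 'NR==FNR{ a[$1]; next }FNR in a' pos.txt "$f" > "$f.extracted"
done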

How to replace the first column of a tab-delimited file

690070 690070 A
690451 690451 B
690571 690571 C
690578 690578 D
690637 690637 F
How can I replace the first column values with a sequential number, starting from 1...n. So it becomes:
1 690070 A
2 690451 B
3 690571 C
4 690578 D
5 690637 F
Can this be done in Vim or with some Linux command?
You can use awk or a Vim macro.
awk is really great for such text manipulation:
awk '{count++; print count " " $2 " "$3;}' data.stat > /tmp/data.stat && mv /tmp/data.stat data.stat
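A slightly shorter equivalent uses the built-in line counter NR instead of a manual variable, and sets OFS so the output stays tab-delimited (the original prints space-separated output):
awk 'BEGIN{OFS="\t"} {$1=NR; print}' data.stat > /tmp/data.stat && mv /tmp/data.stat data.stat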
in Vim:
:let i=1 | g/^[^\t]*\t/s//\=i . "\t"/ | let i=i+1
Reference
Update
For splitting the first two columns and saving into another file,
I recommend using awk as in Tomáš Šíma's answer, specifically:
awk '{print $1 "\t" $2;}' data.stat > newfilename.txt
If you want to do everything in Vim:
Copy the current file to a new one:
:w newfilename.txt
Open the newly copied file:
:e newfilename.txt
Keep only the first two columns, dropping the rest of each line:
:%s/^\([^\t]*\)\t\([^\t]*\).*$/\1\t\2/g
Save your edits, of course:
:w newfilename.txt

Extract lines from File2 already found in File1

Using the Linux command line, I need to output the lines from text file2 that are already found in file1.
File1:
C
A
G
E
B
D
H
F
File2:
N
I
H
J
K
M
D
L
A
Output:
A
D
H
Thanks!
You are looking for the tool grep.
Check this out.
Let's say you have your inputs in the files file1 and file2:
grep -f file1 file2
will return:
H
D
A
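One caveat: plain grep -f treats every line of file1 as a regular expression and also matches substrings. For literal whole-line matching, which is usually what you want here, add -F and -x:
grep -Fxf file1 file2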
A more flexible tool to use would be awk:
awk 'NR==FNR{lines[$0]++; next} $1 in lines'
Example
$ awk 'NR==FNR{lines[$0]++; next} $1 in lines' file1 file2
H
D
A
What does it do?
NR==FNR{lines[$0]++; next}
NR==FNR compares the overall record number (NR) with the record number within the current file (FNR). The two are equal only while the first file, file1, is being read.
lines[$0]++ creates an associative array entry, using each line ($0) of file1 as its index.
$0 in lines runs only for the second file, because of the next in the previous action. It checks whether the line from file2 is present in the saved array lines; if it is, the default action of printing the entire line is taken.
awk is more flexible than grep here, since you can match any column of file1 against any column of file2 and choose to print any column rather than the entire line.
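For instance (hypothetical column numbers, purely to illustrate the flexibility): match column 2 of file1 against column 3 of file2 and print only the first column of the matching lines:
awk 'NR==FNR{lines[$2]; next} $3 in lines {print $1}' file1 file2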
This is what the comm utility does, but you have to sort the files first. To get the lines in common between the two files:
comm -12 <(sort File1) <(sort File2)
