I have to write a script that cuts the first column of each row and pastes it at the end of the same row in a new .arff file. I guess the file type doesn't matter.
Current file:
63,male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50'
67,male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1'
The output should be:
male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
How can I do this using a Linux script file?
sed -r 's/^([^,]*),(.*)$/\2,\1/' Input_file
Brief explanation:
^([^,]*) matches the first comma-separated field, and \1 in the replacement refers to that match.
(.*)$ matches the rest of the line after the first comma, and \2 refers to that match.
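For comparison, the same "split at the first comma, then swap the two halves" idea can be sketched in Python. This is only an illustration of the technique, not part of the sed answer; the function name move_first_field_to_end is made up for this example.

```python
def move_first_field_to_end(line: str, sep: str = ",") -> str:
    """Return line with its first sep-separated field moved to the end."""
    # partition() splits at the first separator only, like ^([^,]*),(.*)$
    first, found, rest = line.partition(sep)
    if not found:
        # No separator: the sed pattern would not match, so leave the line alone.
        return line
    return f"{rest}{sep}{first}"


print(move_first_field_to_end("63,male,typ_angina,145"))  # male,typ_angina,145,63
```

Like the sed command, a line without any comma is passed through unchanged.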
Shorter awk solution:
$ awk -F, '{$(NF+1)=$1;sub($1",","")}1' OFS=, input.txt
gives:
male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
Explanation:
{$(NF+1)=$1 # add extra field with value of field $1
sub($1",","") # treat "$1," as a regex and delete its first match in $0 (the leading field)
}1 # print $0
EDIT: Reading the comments under your question, it looks like you're swapping more columns than just moving the first one to the end of the line. You might consider using a swap function that you call multiple times:
func swap(i,j){s=$i; $i=$j; $j=s}
However, this won't work whenever you want to move a column to the end of the line. So let's change that function:
func swap(i,j){
    s=$i
    if (j>NF){
        for (k=i;k<NF;k++) $k=$(k+1)
        $NF=s
    } else {
        $i=$j
        $j=s
    }
}
So now you can do this:
$ cat tst.awk
BEGIN{FS=OFS=","}
{swap(1,NF+1); swap(2,5)}1
func swap(i,j){
    s=$i
    if (j>NF){
        for (k=i;k<NF;k++) $k=$(k+1)
        $NF=s
    } else {
        $i=$j
        $j=s
    }
}
and:
$ awk -f tst.awk input.txt
male,t,145,233,typ_angina,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,f,160,286,asympt,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
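The swap logic is easier to see on a plain list. Here is a rough Python sketch of the same rule (a target index past the last field means "move to the end", otherwise exchange the two fields). The names swap and reorder are just for this illustration, and indices are 1-based to mirror awk's field numbers.

```python
def swap(fields, i, j):
    """Mimic the awk swap(i, j) on a list; i and j are 1-based like awk fields."""
    if j > len(fields):
        # Target lies past the last field: move field i to the end,
        # which shifts every later field one position to the left.
        fields.append(fields.pop(i - 1))
    else:
        fields[i - 1], fields[j - 1] = fields[j - 1], fields[i - 1]


def reorder(line, sep=","):
    """Apply the same two calls as tst.awk: swap(1, NF+1) then swap(2, 5)."""
    fields = line.split(sep)
    swap(fields, 1, len(fields) + 1)
    swap(fields, 2, 5)
    return sep.join(fields)


print(reorder("63,male,typ_angina,145,233,t,left_vent_hyper"))
```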
Why use sed or awk? The shell can handle this easily:
while IFS= read -r l; do echo "${l#*,},${l%%,*}"; done < infile
If it's a Windows file with \r line endings:
while IFS= read -r l; do f=${l%[[:cntrl:]]}; echo "${f#*,},${l%%,*}"; done < infile
If you want to rewrite the file in place (the command substitution reads all of infile before the redirection truncates it):
printf '%s\n' "$(while IFS= read -r l; do f=${l%[[:cntrl:]]}; printf '%s\n' "${f#*,},${l%%,*}"; done < infile)" > infile
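If holding the whole file in a command substitution feels fragile, the usual alternative is to write to a temporary file and rename it over the original. A minimal Python sketch of that approach follows; rotate_file_in_place is a hypothetical helper name, and unlike the shell loop it leaves lines without a comma untouched.

```python
import os
import tempfile


def rotate_file_in_place(path, sep=","):
    """Move the first field of every line to the end, rewriting path itself."""
    directory = os.path.dirname(os.path.abspath(path))
    # newline="" keeps any "\r" in the data so it can be stripped explicitly.
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False, newline="\n"
    ) as tmp:
        for line in src:
            line = line.rstrip("\r\n")  # handles both LF and CRLF endings
            first, found, rest = line.partition(sep)
            tmp.write((f"{rest}{sep}{first}" if found else line) + "\n")
    # Replace the original only after the new content is completely written.
    os.replace(tmp.name, path)
```

The temporary file is created in the same directory as the target so that the final rename stays on one filesystem.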
I need to add a random guid to each line in a large text file. I need that guid to be different for each line.
This works except that the guid is the same for every line:
sed -e "s/$/$(uuidgen -r)/" text1.log > text2.log
Here is a way to do it using awk:
awk -v cmd='uuidgen' 'NF{cmd | getline u; print $0, u > "text2.log"; close(cmd)}' text1.log
Condition NF (or NF > 0) ensures we do it only for non-empty lines.
Since we are calling close(cmd) each time there will be a new call to uuidgen for every record.
However since uuidgen is called for every non-empty line, it might be slow for huge files.
That's because the command substitution is evaluated before the command starts.
The shell first executes uuidgen -r and replaces the command substitution with its result, say 0e4e5a48-82d1-43ea-94b6-c5de7573bdf8. The shell then executes sed like this:
sed -e "s/$/0e4e5a48-82d1-43ea-94b6-c5de7573bdf8/" text1.log > text2.log
You can use a while loop in the shell to achieve your goal:
while read -r line ; do
echo "$line $(uuidgen -r)"
done < file > file_out
Rather than run a whole new uuidgen process for each and every line, I generated a new UUID for each line in Perl which is just a function call:
#!/usr/bin/perl
use strict;
use warnings;
use UUID::Tiny ':std';
my $filename = 'data.txt';
open(my $fh,'<',$filename)
    or die "Could not open file '$filename' $!";
while (my $row = <$fh>) {
    chomp $row;
    my $uuid = create_uuid(UUID_V4);
    my $str = uuid_to_string($uuid);
    print "$row $str\n";
}
To test, I generated a 1,000,000 line CSV as shown here.
It takes 10 seconds to add the UUID to the end of each line of the 1,000,000 record file on my iMac.
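The same "one function call per line instead of one process per line" idea can be sketched with Python's standard uuid module. This is an illustration only; append_uuids and add_uuids_to_file are names invented here, and no timing is claimed for it.

```python
import uuid


def append_uuids(lines):
    """Yield each line with a fresh random (version 4) UUID appended."""
    for line in lines:
        text = line.rstrip("\n")
        # uuid.uuid4() is called once per line, so every line gets its own value.
        yield f"{text} {uuid.uuid4()}"


def add_uuids_to_file(src_path, dst_path):
    """Stream src_path to dst_path, appending a UUID to every line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for out_line in append_uuids(src):
            dst.write(out_line + "\n")
```

Calling add_uuids_to_file("text1.log", "text2.log") would correspond to the file names used in the question.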
I'm trying to remove lines that contain 0/0 or ./. in column 71 "FORMAT.1.GT" from a tab delimited text file.
I've tried the following code but it doesn't work. What is the correct way of accomplishing this? Thank you
my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6";
You can call a one-liner, as borodin and zdim suggested. Which one is right for you is still unclear, because you don't say whether "column 71" means the 71st tab-separated field of a line or the 71st character of that line. Consider
12345\t6789
Now what is the 2nd column? Is it the character 2 or the field 6789? Borodin's answer assumes it's 6789, while zdim's assumes it's 2. Each shows a solution for one of the two cases, but those are stand-alone programs meant to be run from the command line.
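To make the two readings concrete, here is a tiny Python illustration on that same sample line (the variable names are only for this example):

```python
line = "12345\t6789"

second_character = line[1]          # character-based reading of "column 2"
second_field = line.split("\t")[1]  # field-based reading of "column 2"

print(second_character)  # 2
print(second_field)      # 6789
```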
If you want to integrate that into your Perl script you could do it like this:
Replace this line:
my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6";
with this snippet:
open( my $fh_in, '<', $Variantlinestsvfile ) or die "cannot open $Variantlinestsvfile: $!\n";
open( my $fh_out, '>', $MDLtsvfile ) or die "cannot open $MDLtsvfile: $!\n";
while( my $line = <$fh_in> ) {
    # character-based:
    print $fh_out $line unless (substr($line, 70, 3) =~ m{(?:0/0|\./\.)});
    # tab/field-based (split on tabs, since the file is tab-delimited):
    my @fields = split(/\t/, $line);
    print $fh_out $line unless ($fields[70] =~ m|([0.])/\1|);
}
close($fh_in);
close($fh_out);
Use either the character-based line or the tab/field-based lines. Not both!
Borodin and zdim condensed this snippet to a one-liner, but you should not call that one-liner from inside a Perl script.
Since you need an exact position and know the string lengths, substr can find it:
perl -ne 'print if not substr($_, 70, 3) =~ m{(?:0/0|\./\.)}' filename
This prints a line only when the three-character string starting at the 71st character matches neither 0/0 nor ./.
The {} delimiters around the regex allow us to use / and | inside without escaping. The ?: is there so that the () are used only for grouping, and not capturing. It will work fine also without ?: which is there only for efficiency's sake.
perl -F'\t' -ane 'print unless $F[70] =~ m|([0.])/\1|' myfile > newfile
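The problem with your command is that you are attempting to capture the output of a command which produces no output: everything the second fgrep prints is redirected to a file, so that's where all the output goes. In addition, the first fgrep is given no file name, so it reads standard input instead of $Variantlinestsvfile.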
Anyway, calling grep from Perl is just wacky. Reading the file in Perl itself is the way to go.
If you do want a single shell command,
grep -Ev $'^([^\t]*\t){70}(\./\.|0/0)\t' file
would do what you are asking more precisely and elegantly. But you can use that regex straight off in your Perl program just as well.
Try this:
awk -F'\t' '{ if ($71 != "./." && $71 != "0/0") print }' old_file.txt > new_file.txt
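For readers who want the field-based filter spelled out step by step, here is a hedged Python sketch. It assumes that column 71 holds exactly the genotype string (so an exact comparison is enough) and that lines too short to have a 71st field should be kept; both are assumptions, and keep_line and filter_file are made-up names.

```python
def keep_line(line, column=71, rejected=("0/0", "./.")):
    """Return True unless the given 1-based tab-separated column is a rejected value."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < column:
        return True  # assumption: lines without that column are kept
    return fields[column - 1] not in rejected


def filter_file(src_path, dst_path, column=71):
    """Copy src_path to dst_path, dropping lines rejected by keep_line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(line for line in src if keep_line(line, column))
```

If the column can carry extra text around the genotype, the exact comparison would have to become a substring or regex test, as in the Perl versions.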
What is the best way to remove all lines from a text file starting at first empty line in Bash? External tools (awk, sed...) can be used!
Example
1: ABC
2: DEF
3:
4: GHI
Line 3 and 4 should be removed and the remaining content should be saved in a new file.
With GNU sed:
sed '/^$/Q' "input_file.txt" > "output_file.txt"
With AWK:
$ awk '/^$/{exit} 1' test.txt > output.txt
Contents of output.txt
$ cat output.txt
ABC
DEF
Walkthrough: For lines that match ^$ (start-of-line immediately followed by end-of-line), exit the whole script. For all other lines, print the whole line; of course, we never reach this part once a line has made us exit.
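The "print until the first empty line, then stop" rule maps directly onto itertools.takewhile. A small Python sketch follows, with the caveat that, unlike the /^$/ pattern, it also treats a line holding only a carriage return as empty; lines_before_first_blank is a name chosen for this example.

```python
from itertools import takewhile


def lines_before_first_blank(lines):
    """Return the leading lines up to, but not including, the first empty line."""
    # Stripping "\r\n" makes a CRLF-terminated empty line count as empty too.
    return list(takewhile(lambda line: line.rstrip("\r\n") != "", lines))


print(lines_before_first_blank(["ABC\n", "DEF\n", "\n", "GHI\n"]))
```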
There are probably more clever ways to do this, but here's one using bash's read builtin. It keeps the lines before the blank line on standard output and sends the lines after it to another file. You could reroute stdout mid-script with exec, but I'm going to take a simpler approach and use a command-line argument to say where the post-blank data should go:
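#!/bin/bash
# Takes one argument: the file that receives every line after the first
# blank line. Lines before the blank line go to stdout.
found_blank=0
: > "$1"    # start with an empty output file
while IFS= read -r stuff; do
    if [ "$found_blank" -eq 1 ]; then
        printf '%s\n' "$stuff" >> "$1"    # append, do not overwrite
    elif [ -z "$stuff" ]; then
        found_blank=1                     # the blank line itself is dropped
    else
        printf '%s\n' "$stuff"
    fi
done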
run it like this:
$ ./delete_from_empty.sh rest_of_stuff < demo
output is:
ABC
DEF
and 'rest_of_stuff' has
GHI
If you want the before-blank lines to go somewhere other than stdout, simply redirect:
$ ./delete_from_empty.sh after_blank < input_file > before_blank
and you'll end up with two new files: after_blank and before_blank.
Perl version
perl -e '
open $fh, ">","stuff";
open $efh, ">", "rest_of_stuff";
while(<>){
    if ($_ !~ /\w+/){
        $fh=$efh;
    }
    print $fh $_;
}
' demo
This creates two output files and iterates over the demo data. When it hits a blank line, it flips the output from one file to the other.
Creates
stuff:
ABC
DEF
rest_of_stuff:
<blank line>
GHI
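The "flip the output target at the first blank line" technique can also be sketched in Python with two lists standing in for the two files. In this sketch the blank line itself is dropped, which matches the output listed for the bash example, whereas the Perl version keeps it; split_at_first_blank is a hypothetical name.

```python
def split_at_first_blank(lines):
    """Split lines into (before, after) around the first blank line."""
    before, after = [], []
    target = before
    for line in lines:
        if target is before and line.strip() == "":
            target = after  # everything from here on goes to the second list
            continue        # the separating blank line is not kept
        target.append(line)
    return before, after


before, after = split_at_first_blank(["ABC\n", "DEF\n", "\n", "GHI\n"])
print(before)  # ['ABC\n', 'DEF\n']
print(after)   # ['GHI\n']
```

Only the first blank line acts as the separator; later blank lines are kept in the second list.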
Another awk would be:
awk -vRS= '1;{exit}' file
By setting the record separator RS to the empty string, we define records as paragraphs separated by one or more empty lines. It is now easy to adapt this to select the nth block, with n passed in via -v (here n=2 as an example):
awk -v RS= -v n=2 'FNR==n{print;exit}' file
There is a problem with this method when processing files with a DOS line-ending (CRLF). There will be no empty lines as there will always be a CR in the line. But this problem applies to all presented methods.
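As a rough illustration of paragraph-style records, here is a Python sketch that selects the nth blank-line-separated block and normalizes CRLF first, which is one possible way around the DOS line-ending issue. It is an approximation of awk's paragraph mode, not a reimplementation, and nth_block is a name invented for this example.

```python
import re


def nth_block(text, n):
    """Return the n-th (1-based) blank-line-separated block of text, or None."""
    text = text.replace("\r\n", "\n")  # treat DOS line endings like Unix ones
    # Leading and trailing newlines are ignored; runs of empty lines separate blocks.
    blocks = [b for b in re.split(r"\n{2,}", text.strip("\n")) if b]
    return blocks[n - 1] if 1 <= n <= len(blocks) else None


print(nth_block("ABC\nDEF\n\nGHI\n", 1))
```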
I have a nice .awk script that takes the second field ($2) and prints it. Because the data in the .txt files only go down 8192 lines, any lines after that are irrelevant (the script takes care of that). I have 400+ .tst files that need the same treatment, with the output placed into a single file. So how would I go through every .tst file in the current directory? I tried piping the cat output to a single-line version of the script, but it only processed the first file. Any suggestions?
BEGIN{
}
{
print $2 "\n";
if (NR==8192)
exit;
}
END {
print NR "\n";
}
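This should work. FNR restarts at 1 for each input file, whereas NR keeps counting across all files, and exit stops the whole run rather than just the current file: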
awk 'FNR<=8192{ print $2 }' *.tst > finalfile
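Just glob all the .tst files in the current directory and redirect the output to outfile. Here nextfile (available in gawk and most current awks) skips the rest of a file once its 8192nd line has been printed:
$ awk '{print $2"\n"} FNR==8192{print FNR"\n"; nextfile}' *.tst > outfile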
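For comparison, the per-file line limit can be sketched in Python with glob and itertools.islice. This is only an illustration under a few assumptions: fields are whitespace-separated as in awk's default mode, a missing second field is emitted as an empty string, and the names second_fields and main are invented here.

```python
import glob
from itertools import islice


def second_fields(paths, limit=8192):
    """Yield the second whitespace-separated field of the first limit lines of each file."""
    for path in sorted(paths):  # the shell expands *.tst in sorted order as well
        with open(path) as handle:
            # islice stops reading each file after limit lines, like FNR<=8192.
            for line in islice(handle, limit):
                fields = line.split()
                yield fields[1] if len(fields) > 1 else ""


def main():
    """Write the collected fields from every .tst file to finalfile."""
    with open("finalfile", "w") as out:
        for value in second_fields(glob.glob("*.tst")):
            out.write(value + "\n")
```

The limit restarts for every file because islice is applied to each file handle separately.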