How do I convert spaced columns to tabs? [duplicate] - linux

This question already has an answer here:
Fixed width to CSV
(1 answer)
Closed 5 years ago.
This question is not a duplicate as someone had suggested. Mods, pay attention
I'm running a for loop on multiple files that contain information like below
1 Leer Normal [status] — 100
1 Wrap Normal [physical] 15 90
4 Poison Sting Poison [physical] 15 100
9 Bite Dark [physical] 60 100
12 Glare Normal [status] — 100
17 Screech Normal [status] — 85
20 Acid Poison [special] 40 100
25 Spit Up Normal [special] — 100
25 Stockpile Normal [status] — —
25 Swallow Normal [status] — —
28 Acid Spray Poison [special] 40 100
33 Mud Bomb Ground [special] 65 85
36 Gastro Acid Poison [status] — 100
38 Belch Poison [special] 120 90
41 Haze Ice [status] — —
44 Coil Poison [status] — —
49 Gunk Shot Poison [physical] 120 80
I need to be able to extract data from it.
The problem is, each file has different column lengths.
Column 2 sometimes has spaces in it so squeezing all spaces and using space as a delimiter for cut is not an option. I need the columns separated by tabs without using specific information because the loop goes over about 800 files.
sed 's/ \+/ /g' | cut -f 2 -d " "
^ Not what I need since column 2 has spaces in it
cut -b "5-20"
^ Can't use this either because the columns lengths are different for each file.

With sed, to replace multiples consecutive spaces or tabs with one tab:
sed 's/[[:space:]]\{1,\}/\t/g' file
Explanations:
s: substitute
[[:space:]]: space or tab characters
\{1,\}: when at least one occurrence is found
g: apply substitution to all occurrences in line
Edit:
To preserve single spaces in second column, you can replace only when 2 spaces/tabs are found:
sed 's/[[:space:]]\{2,\}/\t/g' file

Related

awk split adds whole string to array position 1 (reason unknown)

So I have a .txt file that looks like this:
mona 70 77 85 77
john 85 92 78 80
andreja 89 90 85 94
jasper 84 64 81 66
george 54 77 82 73
ellis 90 93 89 88
I have created a grades.awk script that contains the following code:
{
FS=" "
names=$1
vi1=$2
vi2=$3
vi3=$4
rv=$5
#printf("%s ",names);
split(names,nameArray," ");
printf("%s\t",nameArray[1]); //prints the whole array of names for some reason, instead of just the name at position 1 in array ("john")
}
So my question is, how do I split this correctly? Am I doing something wrong?
How do you read line by line, word by word correctly. I need to add each column into its own array. I've been searching for the answer for quite some time now and can't fix my problem.
here is a template to calculate average grades per student
$ awk '{sum=0; for(i=2;i<=NF;i++) sum+=$i;
printf "%s\t%5.2f\n", $1, sum/(NF-1)}' file
mona 77.25
john 83.75
andreja 89.50
jasper 73.75
george 71.50
ellis 90.00
printf("%s\t",nameArray[1])
is doing exactly what you want it to do but you aren't printing any newline between invocations so it's getting called once per input line and outputting one word at a time but since you aren't outputting any newlines between words you just get 1 line of output. Change it to:
printf("%s\n",nameArray[1])
There are a few other issues with your code of course (e.g. you're setting FS in the wrong place and unnecessarily, names only every contains 1 word so splitting it into an array doesn't make sense, etc.) but I think that's what you were asking about specifically.
If that's not all you want then edit your question to clarify what you're trying to do and add concise, testable sample input and expected output.

How to find pattern and make operation in another field in awk?

I have a file with 4 columns separated by space like this bellow:
1_86500000 50 1_87500000 19
1_87500000 13 1_89500000 42
1_89500000 25 1_90500000 10
1_90500000 3 1_91500000 11
1_91500000 23 1_92500000 29
1_92500000 34 1_93500000 4
1_93500000 39 1_94500000 49
1_94500000 35 1_95500000 26
2_35500000 1 2_31500000 81
2_31500000 12 2_4150000 50
The First and Third columns are not in phase so I can not divide the value of one by another.
As there are only two or one possible columns $1 or $3, a solution would be look for the pattern and divide its value in the another column or set it to 0 if there is none like this expected result shows:
P.S. the second field in this expected result is just illustrative to shown the division.
1_86500000 0/50 0
1_87500000 19/13 1.46154
1_89500000 42/25 1.68
1_90500000 10/3 3.333
1_91500000 11/23 0.47826
1_92500000 29/34 0.85294
1_93500000 4/39 0.10256
1_94500000 49/35 1.4
2_35500000 0/1 0
2_31500000 81/12 6.75
2_4150000 50/0 50
I do not archived anything by myself other than this. So I do not have any starting point by now.
I tried separate the fields merged with _ to see if I could match by subtracting the coordinates. If I got 0 would mean that the columns was in phase and correct. But I could not go further.
awk '{if( ($5-$2)==0) print $1,$2,$3,$4,$5,$6}' file
I tried to match both columns but I only got phased results:
awk '{if(($1==$3)) print $1,$4/$2}' file
Can you help me?
awk to the rescue!
$ awk '{d[$1]=$2; n[$3]=$4}
END {for(k in n)
if(k in d) {print k,n[k]"/"d[k],n[k]/d[k]; delete d[k]}
else print k,n[k]"/0",n[k];
for(k in d) print k,"0/"d[k],0}' file | sort
1_86500000 0/50 0
1_87500000 19/13 1.46154
1_89500000 42/25 1.68
1_90500000 10/3 3.33333
1_91500000 11/23 0.478261
1_92500000 29/34 0.852941
1_93500000 4/39 0.102564
1_94500000 49/35 1.4
1_95500000 26/0 26
2_31500000 81/12 6.75
2_35500000 0/1 0
2_4150000 50/0 50
your division by zero result is little strange though!
Explanation keep two arrays for numerator and denominator. Once scanned the file, go over numerator array and find the corresponding denominator and make the division. For the denominators not used apply the convention given.

Remove lines from file using Hex locations

I've a big file from which i want to remove some content, the file is binary, and i don't have line numbers, but hex address, so how can i remove the region between:
0x13e70a00 and 0x1eaec03ff
With sed (both inclusive)
Will something like this, work?
sed -n 's/\x13e70a00/,s/\x1eaec03ff/ p' orig-data-file > new-file
from what you wrote it looks like you are trying to delete all the bytes between the two hex patterns. for that you will need
this deletes all the bytes between the patterns inclusive of the patterns.
sed 's/\x13\xe7\x0a\x00.*\x1e\xae\xc0\x3f//g' in >out
This deletes all bytes between patterns leaving the patterns intact. (there is a way to this with numbered parts of regexes but this is a bit clearer to beging with)
sed 's/\x13\xe7\x0a\x00.*\x1e\xae\xc0\x3f/\x13\xe7\x0a\x00\x1e\xae\xc0\x3f/g' in >out
They search s/ for a <pattern1> followed by any text .* followed by <pattern2> and replace it with either nothing //g or just the two edges /<pattern1><pattern2>/g throughout the file /g
If you want to delete (or replace) from byte 300 to byte 310:
sed 's/\(.\{300\}\).\{10\}/\1rep-str/' in>out
this matches the first 300 characters (.\{300\} )and remembers them (the \(\) ). It matches the next 10 characters too. It replaces this whole combined match with the first 300 characters (\1) followed by your replacement string rep-str this replacement string can be empty to just delete the text between bytes 300 and 310.
However, this is quite brittle if there are any newline characters. if you can live without replacement:
dd if=file bs=1 skip=310|dd of=file bs=1 seek=300 conv=notrunc
this does an in place replacement by copying from the 310th byte onwards till into the file starting from 300 position thus deleting 10 bytes
an even more general alternative is
dd if=in bs=1 count=300>out
printf "replacement text">>out
dd if=in bs=1 skip=310>>out
though the simplest thing to do will be to use a hex editor like Bless
You should be able to use a clever combination of converting bash numbers from hex to decimal, bash math to add 1 to the decimal offsets, and cut --complement -b to remove the correct segment from the file.
EDIT: Like this:
$ snip_out 0x0f 0x10 <<< "0123456789abcdeffedcba9876543210" | od -t x1
0000000 30 31 32 33 34 35 36 37 38 39 61 62 63 64 65 65
0000020 64 63 62 61 39 38 37 36 35 34 33 32 31 30
0000036
Where snip_out is a two-parameter shell-script that operates on stdin and stdout:
#!/bin/bash
START_RANGE_DEC=$(printf "%d" $1)
END_RANGE_DEC=$(printf "%d" $2)
# Most hex ranges begin with 0; cut begins with 1.
CUT_START_DEC=$(( $START_RANGE_DEC + 1 ))
CUT_END_DEC=$(( $END_RANGE_DEC + 1 ))
# cut likes to append a newline after output. Use head to remove it.
exec cut --complement -b $CUT_START_DEC-$CUT_END_DEC | head -c -1

merging two files based on two columns

I have a question very similar to a previous post:
Merging two files by a single column in unix
but i want to merge my data based on two columns (The orders are the same, so no need to sort).
Example,
subjectid subID2 name age
12 121 Jane 16
24 241 Kristen 90
15 151 Clarke 78
23 231 Joann 31
subjectid subID2 prob_disease
12 121 0.009
24 241 0.738
15 151 0.392
23 231 1.2E-5
And the output to look like
subjectid SubID2 prob_disease name age
12 121 0.009 Jane 16
24 241 0.738 Kristen 90
15 151 0.392 Clarke 78
23 231 1.2E-5 Joanna 31
when i use join it only considers the first column(subjectid) and repeats the SubID2 column.
Is there a way of doing this with join or some other way please? Thank you
join command doesn't have an option to scan more than one field as a joining criteria. Hence, you will have to add some intelligence into the mix. Assuming your files has a FIXED number of fields on each line, you can use something like this:
join f1 f2 | awk '{print $1" "$2" "$3" "$4" "$6}'
provided the the field counts are as given in your examples. Otherwise, you need to adjust the scope of print in the awk command, by adding or taking away some fields.
If the orders are identical, you could still merge by a single column and specify the format of which columns to output, like:
join -o '1.1 1.2 2.3 1.3 1.4' file_a file_b
as described in join(1).

How can I align columns where the biggest number or greatest string is the align indicator?

How can I right align (and left align?) a block of numbers or text in vim like this:
from:
45 209 25 1
2 4 2 3
34 5 300 5
34 120 34 12
to this:
45 209 25 1
2 4 2 3
34 5 300 5
34 120 34 12
That means the biggest number or greatest string in every column doesn't move.
In the first column it is 45+34, in the second column 209+120, in the third column 300 and in the last column 12.
Have a look at the align plugin, it can do this and much more. Great tool in your utility belt!
Found here
After some serious vimhelp/reading I found the correct AlignCtrl mapping...
Visually select the table, e.g. by using ggVG, then do a \Tsp i.e. <leader>Tsp
Then I get this:
45 209 25 1
2 4 2 3
34 5 300 5
34 120 34 12
From vimhelp:
\Tsp : use Align to make a table separated by blanks |alignmap-Tsp|
(right justified)
You can look into the Tabularize plugin. So if you have something like
45 209 25 1
2 4 2 3
34 5 300 5
34 120 34 12
just select those lines in the visual mode and type :Tab/ and it will format it as
45 209 25 1
2 4 2 3
34 5 300 5
34 120 34 12
Also, it looks like you don't have an equal number of spaces separating the numbers at the moment. So before you use the plugin, replace all the multiple spaces with a single space with the following regex:
%s![^ ]\zs \+! !g
With the Align plugin you can select the rows you want to align and hit :
<Leader>Tsp
From Align.txt
\Tsp : use Align to make a table separated by blanks |alignmap-Tsp|
(right justified)
(The help mention \ because it is the default leader but in case you have changed it to something else you must adapt accordingly)
Just trying on my install, I got the following result :
45 209 25 1
2 4 2 3
34 5 300 5
34 120 34 12
In my opinion Align plugin is great but the "align maps" and various commands are not really easy to remember.
With the Align and AlignMaps plugins: select using V, then \anum (AlignMaps comes with Align). One advantage of \anum is that it also handles decimal points (commas) and scientific notation.
I think the best thing to do is to first eat all multiple spaces with
:{range}s/ \+/ /g
And then call Tabularize
:Tab / /r1
Or change that r to l.

Resources