Concatenating columns from different files, while skipping the blank lines - linux

I know it's likely possible to do this with awk, but I have no idea how to do it.
Suppose I have the following 2 tab separated files, where there are blank lines that only contain \n:
file1:
A 1 4
B 2 5
C 3 6
D 7 10
E 8 11
A 9 12
file2:
E 13 16
F 14 17
G 15 18
H 19 22
I 20 23
J 21 24
I want to generate a new file which corresponds to the concatenation of the first 2 columns from file 1 with the third column from file 2, and then the third column from file 1:
final file:
A 1 16 4
B 2 17 5
C 3 18 6
D 7 22 10
E 8 23 11
A 9 24 12
Note that, in the final file, it's important that the blank lines should be kept blank, and no tabs should be inserted in there.

Simple paste + awk combination:
paste file1 file2 | awk '!NF{ print "" }NF{ print $1,$2,$6,$3 }'
The output:
A 1 16 4
B 2 17 5
C 3 18 6
D 7 22 10
E 8 23 11
A 9 24 12

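Alternatively, with awk alone: read file2 first to remember its third column, then splice that value in front of the third column of every non-blank line of file1. Assigning to $3 rebuilds the line with OFS, so add -v OFS='\t' if the output must stay tab separated.
awk 'NR==FNR{a[NR]=$3;next} NF{$3=a[FNR] OFS $3} 1' file2 file1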
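For readers without awk at hand, the same merge can be sketched in Python. This is only an illustration, not part of either answer: the function name `merge_columns` and the sample lines are hypothetical, and it assumes both inputs have the same number of lines with blank lines at the same positions.

```python
def merge_columns(lines1, lines2, sep="\t"):
    """Combine columns 1-2 of lines1, column 3 of lines2, column 3 of lines1."""
    merged = []
    for left, right in zip(lines1, lines2):
        a = left.split()
        b = right.split()
        if not a:
            # Blank line: keep it blank, with no separators inserted.
            merged.append("")
        else:
            merged.append(sep.join([a[0], a[1], b[2], a[2]]))
    return merged


# Hypothetical sample: a few rows from the question, with an assumed blank line.
file1 = ["A\t1\t4", "B\t2\t5", "", "D\t7\t10"]
file2 = ["E\t13\t16", "F\t14\t17", "", "H\t19\t22"]
print("\n".join(merge_columns(file1, file2)))
```

Blank lines are emitted as empty strings, so no stray tabs end up in them.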

Related

How to drop columns of csv data in J

I have a lot of CSV files from which I need to drop the date column.
I have a J line that reads a CSV file into a numeric array: rdtabfile =: (0&".;.2@:(TAB&,)@:}:);._2) @ ReadFile @<
If you know the column number of the date column, I would just use a mask across each line of the array and the copy # dyadic verb.
[ t =: i. 4 5
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
15 16 17 18 19
mask=: ~: [: i. # NB. x would be the column to be dropped, y is the numeric matrix
delcol=: (mask # ])"1
1 delcol t
0 2 3 4
5 7 8 9
10 12 13 14
15 17 18 19
delcola=: ((~: [: i. #) # ])"1 NB. can be done in one line
2 delcola t
0 1 3 4
5 6 8 9
10 11 13 14
15 16 18 19
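The same mask-and-copy idea can be sketched in Python for comparison. This is a rough equivalent rather than a translation of the J verbs: the name `delcol` is reused only for readability, and the matrix is assumed to be a non-empty list of equal-length rows.

```python
def delcol(col, matrix):
    """Return a copy of matrix (a list of rows) without column index col."""
    width = len(matrix[0])
    # Boolean mask: True for every column to keep, like `x ~: i. # y` in J.
    mask = [j != col for j in range(width)]
    return [[v for v, keep in zip(row, mask) if keep] for row in matrix]


# Same 4-by-5 table as `i. 4 5` in the J session.
t = [list(range(r * 5, r * 5 + 5)) for r in range(4)]
print(delcol(1, t))
```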

How to set a variable space with right alignment for a string in Python?

I'm trying to do this program where given a number N, one has to print out the decimal, octal, hexadecimal and binary for all the numbers in range 1 to N. The trouble is that the platform requires the solution in a particular format.
Suppose the number is 17, so the output should be like :
1 1 1 1
2 2 2 10
3 3 3 11
4 4 4 100
5 5 5 101
6 6 6 110
7 7 7 111
8 10 8 1000
9 11 9 1001
10 12 A 1010
11 13 B 1011
12 14 C 1100
13 15 D 1101
14 16 E 1110
15 17 F 1111
16 20 10 10000
17 21 11 10001
For 7 it would be like :
1 1 1 1
2 2 2 10
3 3 3 11
4 4 4 100
5 5 5 101
6 6 6 110
7 7 7 111
As you can see, the output has to be right-aligned: the decimal, octal and hexadecimal numbers need at least 2 spaces to their left, whereas the binary numbers need at least one. As the numbers get longer, the padding has to grow accordingly, so that the minimum space is still there for the longest number. So, how do I print them using a variable width? So far I have tried this:
Code
def print_formatted(number):
    space=len(str(bin(number))[2:])
    for i in range(1,number+1):
        print('{:2d}'.format(i), end='')
        print('{:>3s}'.format(str(oct(i))[2:]), end='')
        print('{:>3s}'.format(str(hex(i))[2:]), end='')
        print('{:>'+str(space)+'s}'.format(str(bin(i))[2:]))
print_formatted(17)
Here I only tried the variable width for the binary numbers, but it gives me an error:
print('{:>'+str(space)+'s}'.format(str(bin(i))[2:]))
ValueError: Single '}' encountered in format string
Is there any fix/alternative for this?
Your problem is operator precedence: the + for string concatenation binds less tightly than the method call in
'{:>' + str(space) + 's}'.format(str(bin(i))[2:])
That's why .format(...) is called only on 's}', not on the whole string, and that's where the
ValueError: Single '}' encountered in format string
comes from.
Putting the complete format string in parentheses before applying .format to it fixes that.
You also need 1 more space for binary and can skip some str() that are not needed:
def print_formatted(number):
    space=len(str(bin(number))[2:])+1 # fix here
    for i in range(1,number+1):
        print('{:2d}'.format(i), end='')
        print('{:>3s}'.format(oct(i)[2:]), end='')
        print('{:>3s}'.format(hex(i)[2:]), end='')
        print(('{:>'+str(space)+'s}').format(bin(i)[2:])) # fix here
print_formatted(17)
Output:
1 1 1 1
2 2 2 10
3 3 3 11
4 4 4 100
5 5 5 101
6 6 6 110
7 7 7 111
8 10 8 1000
9 11 9 1001
10 12 a 1010
11 13 b 1011
12 14 c 1100
13 15 d 1101
14 16 e 1110
15 17 f 1111
16 20 10 10000
17 21 11 10001
Judging from your expected output above, you might need to prepend 2 spaces to each line; I'm not sure whether that is a formatting error in your post or part of the requirements.
You could also shorten this by using f-strings (and removing the superfluous str() around bin, oct and hex: they all return strings already).
Then you need to calculate the widths used to space out your values:
def print_formatted(number):
    de,bi,oc,he = len(str(number)), len(bin(number)), len(oct(number)), len(hex(number))
    for i in range(1,number+1):
        print(f' {i:{de}d}{oct(i)[2:]:>{oc}s}{hex(i)[2:]:>{he}s}{bin(i)[2:]:>{bi}s}')
print_formatted(128)
to accommodate values other than 17, e.g. 128:
1 1 1 1
2 2 2 10
3 3 3 11
...
8 10 8 1000
...
16 20 10 10000
...
32 40 20 100000
...
64 100 40 1000000
...
128 200 80 10000000
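Another way to avoid the concatenation problem altogether is a nested replacement field, which lets format() fill in the width itself. The sketch below is an illustration rather than the answer's method: the helper name `format_rows` is hypothetical, and padding every column to the width of the largest binary value is an assumption about the required layout.

```python
def format_rows(number):
    """Return one right-aligned line per value from 1 to number."""
    # Width of the widest value: the binary form of `number`.
    width = len(format(number, "b"))
    rows = []
    for i in range(1, number + 1):
        # The nested {w} field supplies the width at format time;
        # d, o, X and b select decimal, octal, uppercase hex and binary.
        rows.append("{0:>{w}d} {0:>{w}o} {0:>{w}X} {0:>{w}b}".format(i, w=width))
    return rows


for line in format_rows(17):
    print(line)
```

Using the o, X and b presentation types also removes the need to slice off the 0o, 0x and 0b prefixes.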

Using IF and AND function

I am trying to use the IF and AND functions in Excel on values in two different cells. I have 25 conditions.
Below is the formula I've created, but Excel keeps reporting an error.
IF(AND(A10=“A”,B10=1),11,IF(AND(A=“A”,B10=2),16,IF(AND(A10=“A”,B10=3),20,IF(AND(A10=“A”,B10=4),23,IF(AND(A10=“A”,B10=5),25,IF(AND(A10=“B”,B10=1),7,IF(AND(A10=“B”,B10=2),12,IF(AND(A10=“B”,B10=3),17,IF(AND(A10=“B”,B10=4),21,IF(AND(A10=“B”,B10=5),24,IF(AND(A10=“C”,B10=1),4,IF(AND(A10=“C”,B10=2),8,IF(AND(A10=“C”,B10=3),13,IF(AND(A10=“C”,B10=4),18,IF(AND(A10=“C”,B10=5),22,IF(AND(A10=“D”,B10=1),2,IF(AND(A10=“D”,B10=2),5,IF(AND(A10=“D”,B10=3),9,IF(AND(A10=“D”,B10=4),14,IF(AND(A10=“D”,B10=5),19,IF(AND(A10=“E”,B10=1),1,IF(AND(A10=“E”,B10=2),3,IF(AND(A10=“E”,B10=3),6,IF(AND(A10=“E”,B10=4),10,15))))))))))))))))))))))))))))))))))))))))))))))))
I expected the output to be, for example; if cell1 is "A" and cell2 is 1 the result should be 11.
I would strongly advise using a lookup table. Simply list all of your options with their desired results and find the match with a criteria search, for example with the SUMIFS function.
For example, if you paste your possibilities into J1:L25:
A 1 11
A 2 16
A 3 20
A 4 23
A 5 25
B 1 7
B 2 12
B 3 17
B 4 21
B 5 24
C 1 4
C 2 8
C 3 13
C 4 18
C 5 22
D 1 2
D 2 5
D 3 9
D 4 14
D 5 19
E 1 1
E 2 3
E 3 6
E 4 10
E 5 15
You can then place the formula =SUMIFS($L$1:$L$25,$J$1:$J$25,$A$10,$K$1:$K$25,$B$10) to return your desired value.
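That is, =SUMIFS(range_of_results, criteria_range_of_A-E, A10, criteria_range_of_1-5, B10)
This works because each letter/number pair appears exactly once in the table, so the sum consists of a single value.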
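The lookup-table idea is not specific to Excel. As a rough analogy, here is the same table as a Python dictionary keyed by the (letter, number) pair; the variable names `results` and `lookup` are hypothetical, and the values are copied from the table above.

```python
# Results in the same order as the J1:L25 table:
# five values (for 1..5) per letter.
results = {
    "A": [11, 16, 20, 23, 25],
    "B": [7, 12, 17, 21, 24],
    "C": [4, 8, 13, 18, 22],
    "D": [2, 5, 9, 14, 19],
    "E": [1, 3, 6, 10, 15],
}

# One entry per (letter, number) pair, mirroring one table row each.
lookup = {
    (letter, number): value
    for letter, values in results.items()
    for number, value in enumerate(values, start=1)
}

print(lookup[("A", 1)])  # the pair used as the example in the question
```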

Delete duplicate rows but keep the first 2 instances

I have a column which contains duplicate values, and I would like to delete the duplicate rows but keep the first 2 instances.
That is, remove the lines whose value has been repeated more than 2 times.
Example input
i 10
i 10
a 12
a 12
b 12
b 12
c 14
c 14
x 14
x 14
y 14
y 14
a 14
a 14
n 13
n 13
m 13
m 13
x 13
x 13
Desired output:
i 10
i 10
a 12
a 12
c 14
c 14
n 13
n 13
I tried
awk '!a[$2]++' file
Appreciate your help
I think the problem with your command is that it checks whether a line is the first occurrence instead of checking whether it is one of the first two. Something like this should work:
awk 'a[$2]++<2' file
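The counting logic behind that one-liner can be spelled out in Python. This is a sketch for illustration only: the function name `keep_first_n` and its parameters are hypothetical, and it assumes every line has at least two whitespace-separated fields.

```python
from collections import Counter


def keep_first_n(lines, n=2, field=1):
    """Keep each line only while its key has been seen fewer than n times."""
    seen = Counter()
    kept = []
    for line in lines:
        key = line.split()[field]  # field=1 is the second column, like $2 in awk
        if seen[key] < n:
            kept.append(line)
        seen[key] += 1
    return kept


# The first rows of the example input from the question.
sample = ["i 10", "i 10", "a 12", "a 12", "b 12", "b 12", "c 14", "c 14", "x 14"]
print(keep_first_n(sample))
```

As in the awk version, the counter is read before it is incremented, so the first two lines for each key pass the test and later ones are dropped.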

How do I get each column of the same row in one line?

I have these columns in Excel:
A B C D E F
Nima1 1 2 3 4 5
Nima2 6 7 8 9 10
Nima3 11 12 13 14 15
Nima4 16 17 18 19 20
and I want to show them like this:
Nima1 1
Nima1 2
Nima1 3
Nima1 4
Nima1 5
Nima2 6
…
Nima4 20
So far I have come up with nothing; every formula that I write doesn't work.
Please, if anyone knows how to do it, guide me through it.
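In any unused cell to the right, put in this formula:
'for systems that use a comma as the list separator
=INDEX(A:E,(ROW(1:1)-1)/5+1,IF(COLUMN(A:A)=1,1,MOD(ROW(1:1)-1,5)+1))
'for systems that use a semicolon as the list separator
=INDEX(A:E;(ROW(1:1)-1)/5+1;IF(COLUMN(A:A)=1;1;MOD(ROW(1:1)-1;5)+1))
Fill right one column, then fill both down until you get zeroes. Filling right shifts the relative references (A:E becomes B:F and COLUMN(A:A) becomes COLUMN(B:B)), so the first column returns the label and the second returns the values.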
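The reshaping itself (one label/value pair per cell) can be sketched outside Excel as well. The Python below is only an illustration of the transformation: the function name `unpivot` is hypothetical, and the table is the one from the question.

```python
def unpivot(rows):
    """Turn [label, v1, v2, ...] rows into one (label, value) pair per value."""
    pairs = []
    for label, *values in rows:
        for value in values:
            pairs.append((label, value))
    return pairs


table = [
    ["Nima1", 1, 2, 3, 4, 5],
    ["Nima2", 6, 7, 8, 9, 10],
    ["Nima3", 11, 12, 13, 14, 15],
    ["Nima4", 16, 17, 18, 19, 20],
]
for label, value in unpivot(table):
    print(label, value)
```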
