Deleting columns at regular intervals - linux

I have a file with about 25,000 columns and 3,000 lines. Starting from column 4, I want to delete the 3rd through 5th columns of every block of 8 columns, through to the end of the line.
Example:
The input file
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20
d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 d14 d15 d16 d17 d18 d19 d20
Now I want the output file
c1 c2 c3 c4 c5 c9 c10 c11 c12 c13 c17 c18 c19 c20
d1 d2 d3 d4 d5 d9 d10 d11 d12 d13 d17 d18 d19 d20
I hope my question is not a confusing one. I know how to print at regular intervals but don't know how to solve this.

Simplest way is probably something like:
awk '{
    printf "%s %s %s", $1, $2, $3
    for (i=4; i<=NF; i++)
        if ( ((i-3)%8) !~ /^[345]$/ )
            printf " %s", $i
    print ""
}' file
c1 c2 c3 c4 c5 c9 c10 c11 c12 c13 c17 c18 c19 c20
d1 d2 d3 d4 d5 d9 d10 d11 d12 d13 d17 d18 d19 d20
It might need the math tweaked but hopefully you get the idea.
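To check the modular arithmetic before running it on the full file, the same filter can be sketched in Python. This is only an illustration of the awk logic above; the function name drop_columns and its parameters are made up for this example.

```python
def drop_columns(fields, first=4, period=8, drop=(3, 4, 5)):
    """Keep columns before `first`; from `first` on, drop a column when
    (i - (first - 1)) % period is in `drop` (i is the 1-based column number)."""
    kept = []
    for i, value in enumerate(fields, start=1):
        if i >= first and (i - (first - 1)) % period in drop:
            continue  # same test as ((i-3)%8) ~ /^[345]$/ in the awk script
        kept.append(value)
    return kept


line = " ".join(f"c{n}" for n in range(1, 21))
print(" ".join(drop_columns(line.split())))
# c1 c2 c3 c4 c5 c9 c10 c11 c12 c13 c17 c18 c19 c20
```

If the columns to drop shift, only the drop tuple (or the [345] character class in awk) needs to change.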

vim : copy one entire column from one file to another file

File #1:
F#1 A1 B1 C1
F#1 A2 B2 C2
F#1 A3 B3 C3
F#1 A4 B4 C4
F#1 A5 B5 C5
F#1 A6 B6 C6
File #2:
D1 E1 F1
D2 E2 F2
D3 E3 F3
D4 E4 F4
D5 E5 F5
D6 E6 F6
The wanted format in File#2:
F#1 D1 E1 F1
F#1 D2 E2 F2
F#1 D3 E3 F3
F#1 D4 E4 F4
F#1 D5 E5 F5
F#1 D6 E6 F6
In vim, how do I copy an entire column (the 1st column in this example) from File #1 to File #2, as in the wanted format shown above? Note that, in reality, the file is very long.
Thank you.
Blockwise visual mode (CTRL-v) can help you achieve this (see :help blockwise-visual for documentation).
The sequence of commands should be:
1. In the first file, go to the top-left corner.
2. Press CTRL-v. You should be in "VISUAL BLOCK" mode.
3. Press G - it will extend the selection to the last line of the file.
4. Press l to widen the selection over the column (3 times in this case, which covers "F#1" and the space after it).
5. Press y to copy (yank) the block.
6. Open the second file (:e <second-filename>).
7. Go to the top-left corner and press P to paste the block in front of the existing text.
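If the files are too long to handle comfortably in an editor, the same result can be scripted. The following is a rough Python sketch, not a vim feature; the function name prepend_first_column is hypothetical, and it assumes the columns are separated by whitespace and the two files have matching line order.

```python
def prepend_first_column(source_lines, target_lines):
    """Prefix each target line with the first whitespace-separated field
    of the source line at the same position."""
    return [
        f"{src.split()[0]} {dst}"
        for src, dst in zip(source_lines, target_lines)  # stops at the shorter input
    ]


file1 = ["F#1 A1 B1 C1", "F#1 A2 B2 C2"]
file2 = ["D1 E1 F1", "D2 E2 F2"]
print("\n".join(prepend_first_column(file1, file2)))
# F#1 D1 E1 F1
# F#1 D2 E2 F2
```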

I'd like to create a many-to-many table using unique values from multiple columns

I'm not sure if I got the term right, but I have lists of unique values, and I want to pair each value from one list with every value from the other lists.
From the following columns of unique values:
column1  column2  column3  column4
a1       b1       c1       d1
a2       b2       c2       d2
a3
I want to create a table like this.
column1  column2  column3  column4
a1       b1       c1       d1
a1       b1       c1       d2
a1       b1       c2       d1
a1       b1       c2       d2
a1       b2       c1       d1
a1       b2       c1       d2
a1       b2       c2       d1
a1       b2       c2       d2
a2       b1       c1       d1
a2       b1       c1       d2
a2       b1       c2       d1
a2       b1       c2       d2
a2       b2       c1       d1
a2       b2       c1       d2
a2       b2       c2       d1
a2       b2       c2       d2
a3       b1       c1       d1
a3       b1       c1       d2
a3       b1       c2       d1
a3       b1       c2       d2
a3       b2       c1       d1
a3       b2       c1       d2
a3       b2       c2       d1
a3       b2       c2       d2
Is there a way to do this in Excel, using built-in functions, without manually replicating each value using copy/paste?
With Office 365 we can use:
=LET(
rng,A1:D3,
arr,INDEX(rng,MID(BASE(SEQUENCE(ROWS(rng)^4,,0),ROWS(rng),4),SEQUENCE(,4),1)+1,{1,2,3,4}),
FILTER(arr,(INDEX(arr,0,1)<>0)*(INDEX(arr,0,2)<>0)*(INDEX(arr,0,3)<>0)*(INDEX(arr,0,4)<>0))
)
We can also change the formula a bit to not assume 4 columns:
=LET(
rng,A1:D3,
clm,COLUMNS(rng),
sclm,SEQUENCE(,clm),
rw,ROWS(rng),
arr,INDEX(rng,MID(BASE(SEQUENCE(rw^clm,,0),rw,clm),sclm,1)+1,sclm),
FILTER(arr,BYROW(arr,LAMBDA(a,COUNT(--a)=0))))
This will do any number of columns (within reason).
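To see what the formulas compute, here is a small Python sketch of the same idea: count from 0 to rows^columns - 1, read each number as a base-"rows" string with one digit per column, use each digit as a row index, and drop any row that lands on a blank cell. The grid layout and the helper name combos_by_digits are assumptions made for this illustration; itertools.product is included only as a cross-check.

```python
from itertools import product

# The question's range, with None standing in for the blank cells.
grid = [
    ["a1", "b1", "c1", "d1"],
    ["a2", "b2", "c2", "d2"],
    ["a3", None, None, None],
]


def combos_by_digits(grid):
    """Enumerate row indices as base-`rw` digits, one digit per column."""
    rw, clm = len(grid), len(grid[0])
    out = []
    for n in range(rw ** clm):
        digits, m = [], n
        for _ in range(clm):
            m, r = divmod(m, rw)
            digits.append(r)
        digits.reverse()  # most significant digit picks the row for column 1
        row = tuple(grid[digits[c]][c] for c in range(clm))
        if None not in row:  # same role as the FILTER step
            out.append(row)
    return out


rows = combos_by_digits(grid)
columns = [[cell for cell in col if cell is not None] for col in zip(*grid)]
print(len(rows), rows == list(product(*columns)))
# 24 True
```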

Fastest way of going through many arrays

In Python, I am trying to check all possible combinations of a^5 + b^5 + c^5 + d^5 = e^5.
I am starting my search with numbers under 200, but it takes a huge amount of time to go through all the possibilities. How could I make this code faster?
xL = [x*x*x*x*x for x in range(1, 200)]
for x1 in xL:
    for x2 in xL:
        for x3 in xL:
            for x4 in xL:
                for x5 in xL:
                    if x1 + x2 + x3 + x4 == x5:
                        print(x1, x2, x3, x4, x5)
WARNING: I'm not a python programmer and none of the example code is tested. Expect bugs and silly syntax errors. Focus on the concepts.
If a^5 + b^5 + c^5 + d^5 = e^5 then you can rearrange the formula to get a^5 = e^5 - b^5 - c^5 - d^5.
If the minimum value for each of b, c and d is 1, then you also know that a^5 <= e^5 - 1 - 1 - 1.
In other words, after you've chosen a value for e, you can limit a^5 to the range 1 to e^5 - 3 (in the code, the variables a5 and e5 hold the fifth powers a^5 and e^5); like maybe:
xL = [x*x*x*x*x for x in range(1, 200)]
for e5 in xL:
    for a5 in xL:
        if a5 > e5 - 3: break
You can rearrange the formula again to get b^5 = e^5 - a^5 - c^5 - d^5. If the minimum value for each of c and d is 1, then you also know that b^5 <= e^5 - a^5 - 1 - 1.
In other words, after you've chosen values for e and a, you can limit b^5 to the range 1 to e^5 - a^5 - 2; like maybe:
xL = [x*x*x*x*x for x in range(1, 200)]
for e5 in xL:
    for a5 in xL:
        if a5 > e5 - 3: break
        for b5 in xL:
            if b5 > e5 - a5 - 2: break
If you continue this logic you end up with:
xL = [x*x*x*x*x for x in range(1, 200)]
for e5 in xL:
    for a5 in xL:
        if a5 > e5 - 3: break
        for b5 in xL:
            if b5 > e5 - a5 - 2: break
            for c5 in xL:
                if c5 > e5 - a5 - b5 - 1: break
                for d5 in xL:
                    if d5 > e5 - a5 - b5 - c5: break
                    if a5 + b5 + c5 + d5 == e5:
                        print(e5, a5, b5, c5, d5)
                        break
However; after you've found one solution you can merely swap the values in a and b to find another solution. For example, you might be able to do this:
                    if a5 + b5 + c5 + d5 == e5:
                        print(e5, a5, b5, c5, d5)
                        print(e5, b5, a5, c5, d5)
                        break
The problem is that the same solution will be reported twice, unless you can find a way to avoid it. You can avoid that by making sure that b is never larger than a and doing this:
xL = [x*x*x*x*x for x in range(1, 200)]
for e5 in xL:
    for a5 in xL:
        if a5 > e5 - 3: break
        for b5 in xL:
            if b5 > a5: break
            if b5 > e5 - a5 - 2: break
            for c5 in xL:
                if c5 > e5 - a5 - b5 - 1: break
                for d5 in xL:
                    if d5 > e5 - a5 - b5 - c5: break
                    if a5 + b5 + c5 + d5 == e5:
                        print(e5, a5, b5, c5, d5)
                        if a5 != b5:
                            print(e5, b5, a5, c5, d5)
                        break
This is "very fortunate" because that extra "if b5 > a5: break" check will improve performance a lot.
However; after you've found "one solution that becomes 2 solutions" you can merely swap the values in a and c to find another solution, and also swap the values in b and c to find another solution; using the same technique to avoid reporting the same solution twice (and improving performance more).
xL = [x*x*x*x*x for x in range(1, 200)]
for e5 in xL:
    for a5 in xL:
        if a5 > e5 - 3: break
        for b5 in xL:
            if b5 > a5: break
            if b5 > e5 - a5 - 2: break
            for c5 in xL:
                if c5 > b5: break
                if c5 > e5 - a5 - b5 - 1: break
                for d5 in xL:
                    if d5 > e5 - a5 - b5 - c5: break
                    if a5 + b5 + c5 + d5 == e5:
                        print(e5, a5, b5, c5, d5)
                        # Untested, as warned above: when a5 != b5 this does not
                        # print every ordering (e.g. b5, c5, a5 never appears).
                        if a5 != b5:
                            print(e5, b5, a5, c5, d5)
                            if b5 != c5:
                                print(e5, c5, a5, b5, d5)
                        else:
                            if b5 != c5:
                                print(e5, a5, c5, b5, d5)
                            if a5 != c5:
                                print(e5, c5, b5, a5, d5)
                        break
However; after you've found "one solution that becomes several solutions" you can merely swap the values in a and d to find another solution, and also swap the values in b and d to find another solution, and also swap the value in c and d to find another solution; using the same technique to avoid reporting the same solution twice (and improving performance more).
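Carrying the ordering idea through to d gives a search that visits each unordered solution once. The following is a minimal sketch under that assumption (a >= b >= c >= d), not the answer's own code; the function name find_solutions and the power parameter are hypothetical, with power added only so the logic can be checked quickly on squares. Reordered variants of each solution can be generated afterwards if they are needed.

```python
def find_solutions(limit, power=5):
    """Return (a, b, c, d, e) with a >= b >= c >= d, all below `limit`,
    such that a**power + b**power + c**power + d**power == e**power."""
    powers = [x ** power for x in range(1, limit)]  # ascending, so `break` prunes the rest
    found = []
    for e, ep in enumerate(powers, start=1):
        for a, ap in enumerate(powers, start=1):
            if ap > ep - 3:
                break
            for b, bp in enumerate(powers, start=1):
                if bp > ap or bp > ep - ap - 2:
                    break
                for c, cp in enumerate(powers, start=1):
                    if cp > bp or cp > ep - ap - bp - 1:
                        break
                    rest = ep - ap - bp - cp  # the value d**power must equal
                    for d, dp in enumerate(powers, start=1):
                        if dp > cp or dp > rest:
                            break
                        if dp == rest:
                            found.append((a, b, c, d, e))
                            break
    return found


# Quick check on squares: 1+1+1+1 == 2**2 and 4+4+4+4 == 4**2.
print(find_solutions(5, power=2))
# [(1, 1, 1, 1, 2), (2, 2, 2, 2, 4)]
```

Calling find_solutions(200) runs the fifth-power search from the question; expect it to take a while in pure Python even with the pruning.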

Rolling available Calculation

What I think I am looking for is a rolling calculation, for example:
If A2 = F2:F4 then subtract B2 from G2 and add C2, with the result in D2. This would continue until it reaches the same part again, as in rows 3, 4, 8, and 10. Once a part is repeated, rather than looking at column G I would like it to use the latest value in D.
A3 = F2:F4 subtract B3 from G3 and add C3 with result in D3 (20-5+0=15)
A4 = F2:F4 subtract B4 from D4 and add C4 with result in D4 (15-2+0=13)
A8 = F2:F4 subtract B8 from D8 and add C8 with result in D8 (13-8+7=12)
A10 = F2:F4 subtract B10 from D10 and add C10 with result in D10 (12-2+0=10)
Use VLOOKUP to return the correct start, and SUMIFS with dynamic range to do the subtraction and addition.
=VLOOKUP([@PART],Table3,2,FALSE)-SUMIFS($B$2:B2,$A$2:A2,[@PART])+SUMIFS($C$2:C2,$A$2:A2,[@PART])
Note: to use only structured references we need to introduce the volatile OFFSET:
=VLOOKUP([@PART],Table3,2,FALSE)-
SUMIFS(OFFSET(Table2[[#Headers],[QTY]],1,,ROW()-ROW(Table2[[#Headers],[QTY]])),OFFSET(Table2[[#Headers],[PART]],1,,ROW()-ROW(Table2[[#Headers],[PART]])),[@PART])+
SUMIFS(OFFSET(Table2[[#Headers],[DELIVERED QTY]],1,,ROW()-ROW(Table2[[#Headers],[DELIVERED QTY]])),OFFSET(Table2[[#Headers],[PART]],1,,ROW()-ROW(Table2[[#Headers],[PART]])),[@PART])
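The arithmetic behind the formula (starting quantity, minus everything used so far for that part, plus everything delivered so far) can be sketched in Python. The part names and the second part's numbers are invented for illustration; the figures for part "X" follow the 20, 15, 13, 12, 10 sequence from the question.

```python
def rolling_available(rows, starting):
    """rows: (part, qty, delivered) tuples in sheet order.
    starting: part -> starting quantity (the lookup table).
    Returns the running balance after each row, like column D."""
    balance = dict(starting)  # copy so the lookup table is left untouched
    results = []
    for part, qty, delivered in rows:
        balance[part] += delivered - qty
        results.append(balance[part])
    return results


starting = {"X": 20, "Y": 50}  # hypothetical lookup table (columns F:G)
rows = [("X", 5, 0), ("X", 2, 0), ("Y", 10, 0), ("X", 8, 7), ("X", 2, 0)]
print(rolling_available(rows, starting))
# [15, 13, 40, 12, 10]
```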

Column Summation of Cell Labels Excel

I have (sparse) data (in the first few cells for example where PA is A1 etc). The labels are fixed but the numbers need to be arbitrary\variable.
PA 5 8
RA 7 2.55 11 7.1
Pils 8 4.5 6.4
S\P 4 3.5 4.2 5.2 7.2
GH ED CW TR PH FL DG
Now I want to add up some of these entries. I want to add up all combinations under the following rule:
There should be one element from each row
There should be at most one element from each column
In this example I was able to do this by hand and the cells I want to add up (vertically) are as follows (this data is sitting in the four rows of cells below cells from B7 to AV7 inclusive):
B1 B1 B1 B1 B1 B1 B1 B1 B1 B1
C2 C2 C2 C2 C2 C2 D2 D2 D2 D2
F3 F3 F3 G3 G3 G3 C3 C3 C3 F3
D4 E4 H4 D4 F4 H4 E4 F4 H4 C4
******************************************************************************
B1 B1 B1 B1 B1 B1 B1 B1 B1 B1
D2 D2 D2 D2 D2 D2 F2 F2 F2 F2
F3 F3 G3 G3 G3 G3 C3 C3 C3 G3
E4 H4 C4 E4 F4 H4 D4 E4 H4 C4
******************************************************************************
B1 B1 B1 B1 B1 B1 B1 B1 B1 B1
F2 F2 F2 H2 H2 H2 H2 H2 H2 H2
G3 G3 G3 C3 C3 F3 F3 F3 G3 G3
D4 E4 H4 D4 E4 C4 D4 E4 C4 D4
******************************************************************************
B1 B1 D1 D1 D1 D1 D1 D1 D1 D1
H2 H2 C2 C2 C2 C2 F2 F2 F2 F2
G3 G3 F3 F3 G3 G3 C3 C3 G3 G3
E4 F4 E4 H4 F4 H4 E4 H4 C4 E4
******************************************************************************
D1 D1 D1 D1 D1 D1 D1
F2 H2 H2 H2 H2 H2 H2
G3 C3 F3 F3 G3 G3 G3
H4 E4 C4 E4 C4 E4 F4
******************************************************************************
Frankly I am stumped. I thought I could do stuff like =(=B7) and such but it isn't working. I would have tried some IF stuff but since the state space was so small I thought it might have been easier to write out all the combinations.
I need to send the worksheet to someone who will input the data and find the minimum of the 47 combinations.
Thank you for your time. I have a basic, working use of Excel.
The equivalent of what you are trying with =(=B7) would be =INDIRECT(B7). That is, get the value that B7 refers to.
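As a rough picture of what that indirection does, the sketch below treats the sheet as a mapping from cell labels to numbers, looks up every label in each combination, sums them, and takes the minimum. It is written in Python purely for illustration; the cell positions and values are made up, since the real layout of the sparse data is not shown.

```python
# Hypothetical cell contents keyed by label, standing in for the worksheet.
cells = {"B1": 5, "D1": 8, "C2": 7, "D2": 2.5}

# Each combination lists the cell labels to add up, like one column of labels.
combos = [["B1", "C2"], ["B1", "D2"], ["D1", "C2"]]

# Looking up cells[label] plays the role of INDIRECT(label).
totals = [sum(cells[label] for label in combo) for combo in combos]
print(totals, min(totals))
# [12, 7.5, 15] 7.5
```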
