Calculate the number of times a combination is used in a log - python-3.x

I have a log of events in the form below:
A B C D
A B C D
A B C D
A B C D
D E F G
D E F G
D E F G
D E F G
D E F G
D E F G
D E F G
A D E F G
D E F G
A D E G
I am trying to calculate transition frequencies, for example how many times A is directly followed by B (A -> B).
With the code below I calculate the frequency of each trace.
from collections import Counter
flog = []
input_file = "test.txt"
with open(input_file, "r") as f:
    for line in f.readlines():
        line = line.split()
        flog.append(line)
trace_frequency = map(tuple, flog)
flog = list(Counter(trace_frequency).items())
That gives me :
(('A', 'B', 'C', 'D'), 4)
(('D', 'E', 'F', 'G'), 8)
(('A', 'D', 'E', 'F', 'G'), 1)
(('A', 'D', 'E', 'G'), 1)
So my question is: how can I go from the above to the format below, which counts every adjacent pair across the whole log?
A B 4
B C 4
C D 4
A D 2
D E 10...etc
Thanks to all for your time.

Instead of counting each line as a whole, split each line into adjacent pairs, then count the occurrences of each pair.
For example, instead of counting ('A', 'B', 'C', 'D'), count ('A', 'B'), ('B', 'C'), ('C', 'D') individually.
from collections import Counter
flog = []
input_file = "test.txt"
with open(input_file, "r") as f:
    for line in f.readlines():
        line = line.split()
        # note extend instead of append: each line contributes several pairs
        flog.extend(line[i: i + 2] for i in range(len(line) - 1))
trace_frequency = map(tuple, flog)
flog = list(Counter(trace_frequency).items())
flog is now
[(('A', 'B'), 4), (('B', 'C'), 4), (('C', 'D'), 4), (('D', 'E'), 10),
(('E', 'F'), 9), (('F', 'G'), 9), (('A', 'D'), 2), (('E', 'G'), 1)]
To get your desired format (with the bonus of ordering by count), keep the Counter instead of converting it to a list, i.e. replace the last line above with:
flog = Counter(trace_frequency)
for entry, count in flog.most_common():
    print(' '.join(entry), count)
Outputs
D E 10
E F 9
F G 9
A B 4
B C 4
C D 4
A D 2
E G 1
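The pair-splitting idea above can also be sketched with zip, which pairs each event with its successor. This is only a minimal sketch: the function name count_transitions is made up for illustration, and the log is given inline as a list of strings instead of being read from test.txt.

```python
from collections import Counter

def count_transitions(lines):
    """Count how often each event is directly followed by another event."""
    counts = Counter()
    for line in lines:
        events = line.split()
        # zip pairs each event with its successor: (e0, e1), (e1, e2), ...
        counts.update(zip(events, events[1:]))
    return counts

# Inline copy of the log from the question (hypothetical stand-in for test.txt)
log = ["A B C D"] * 4 + ["D E F G"] * 7 + ["A D E F G", "D E F G", "A D E G"]
transitions = count_transitions(log)
for (src, dst), n in transitions.most_common():
    print(src, dst, n)
```

Updating one Counter line by line avoids building the intermediate list of pairs, which may matter for a large log.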

Not sure if it's the best, but one possibility is to use Pandas. Given a file log.txt that looks like this:
0 1 2 3 4
A B C D
A B C D
A B C D
A B C D
D E F G
D E F G
D E F G
D E F G
D E F G
D E F G
D E F G
A D E F G
D E F G
A D E G
This code will work:
import pandas as pd
from collections import Counter
df = pd.read_csv('log.txt', sep=r'\s+')
# Pair each cell with its right-hand neighbour, row by row
combos = [(row.iloc[x], row.iloc[x + 1]) for _, row in df.iterrows() for x in range(len(row) - 1)]
# Short rows are padded with NaN, so drop any pair containing a missing value
combos = [pair for pair in combos if not any(pd.isna(v) for v in pair)]
for pair, count in Counter(combos).items():
    print(pair, count)
Giving you:
('A', 'B') 4
('B', 'C') 4
('C', 'D') 4
('D', 'E') 10
('E', 'F') 9
('F', 'G') 9
('A', 'D') 2
('E', 'G') 1

ES6 multiple assignments/updates in one line [duplicate]

Can anyone explain, why the following happens with ES6 array destructuring?
let a, b, c
[a, b] = ['A', 'B']
[b, c] = ['BB', 'C']
console.log(`a=${a} b=${b} c=${c}`)
// expected: a=A b=BB c=C
// actual: a=BB b=C c=undefined
http://codepen.io/ronkot/pen/WxRqXg?editors=0011
As others have said, you're missing semicolons. But…
Can anyone explain?
There are no semicolons automatically inserted between your lines to separate the "two" statements, because it is valid as a single statement. It is parsed (and evaluated) as
let a = undefined, b = undefined, c = undefined;
[a, b] = (['A', 'B']
[(b, c)] = ['BB', 'C']);
console.log(`a=${a} b=${b} c=${c}`);
wherein
[a, b] = …; is a destructuring assignment as expected
(… = ['BB', 'C']) is an assignment expression assigning the array to the left hand side, and evaluating to the array
['A', 'B'][…] is a property reference on an array literal
(b, c) is using the comma operator, evaluating to c (which is undefined)
If you want to omit semicolons and rely on automatic insertion wherever possible, you will need to put one at the start of every line that begins with (, [, /, +, - or `.
You've fallen into a trap of line wrapping and automatic semicolon insertion rules in JavaScript.
Take this example:
let x = [1, 2]
[2, 1]
It is interpreted as:
let x = [1, 2][2, 1] // === [1, 2][(2, 1)] === [1, 2][1] === 2
That weird [(2, 1)] thing above is due to how the comma operator works: it evaluates both operands and yields the last one.
Thus, your example:
let a, b, c
[a, b] = ['A', 'B']
[b, c] = ['BB', 'C']
console.log(`a=${a} b=${b} c=${c}`)
Is interpreted as:
let a, b, c
[a, b] = ['A', 'B'][b, c] = ['BB', 'C']
console.log(`a=${a} b=${b} c=${c}`)
Now, if you insert a semicolon, it will work as you intended:
let a, b, c
[a, b] = ['A', 'B']; // note a semicolon here
[b, c] = ['BB', 'C']
console.log(`a=${a} b=${b} c=${c}`)
Also, it's a good idea to check your code by pasting it into the Babel REPL to see the generated output:
'use strict';
var a = void 0,
b = void 0,
c = void 0;
var _ref = ['A', 'B'][(b, c)] = ['BB', 'C'];
a = _ref[0];
b = _ref[1];
console.log('a=' + a + ' b=' + b + ' c=' + c);
I believe you have forgotten the semicolons (';') at the line ends. Below is the corrected code. Please try:
let a,b,c
[a, b] = ['A', 'B'];
[b, c] = ['BB', 'C'];
console.log(`a=${a} b=${b} c=${c}`)
Only the semicolon after the first assignment is strictly required:
let a, b, c
[a, b] = ['A', 'B'];
[b, c] = ['BB', 'C']
console.log(`a=${a} b=${b} c=${c}`)
console: a=A b=BB c=C

How to make X number of random sets of 3 from pandas column?

I have a dataframe column that looks like this (roughly 200 rows):
col1
a
b
c
d
e
f
I want to create a new dataframe with one column containing 15 random sets of 3 items taken from the pandas column. For example:
new_df
combinations:
(a,b,c)
(a,c,d)
(a,d,c)
(b,a,d)
(d,a,c)
(a,d,f)
(e,a,f)
(a,f,e)
(b,e,f)
(f,b,e)
(c,b,e)
(b,e,a)
(a,e,f)
(e,f,a)
Currently my code generates every possible permutation and runs out of memory when I try to append the results to another dataframe:
import pandas as pd
from itertools import permutations
df = pd.read_csv('')
combo = df['col1'].tolist()
perm = permutations(combo,3)
combinations = pd.DataFrame(columns=['combinations'])
list_ = []
for i in list(perm):
    combinations['combinations'] = i
    list_.append(i)
How do I limit the output to X random sets, in this case 15 sets of 3?
Your code runs out of memory because of the call to list(perm), which generates EVERY possible permutation up front. So when you do
for i in list(perm):
    ...
you're telling Python to create a list of all permutations and then iterate over that list. If you instead iterate over the iterator that permutations returns (for i in perm: instead of for i in list(perm):), you visit each permutation without storing them all in memory at once. So if you break out of your for loop after 15 iterations, you achieve your desired result.
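As a rough illustration of the explicit-loop variant just described, the sketch below iterates the permutations lazily and stops after 15. The six-letter list is only a stand-in for df['col1'].tolist(), and the name first_15 is made up for this example.

```python
from itertools import permutations

items = list("abcdef")  # stand-in for df['col1'].tolist()

first_15 = []
# permutations() yields one tuple at a time; nothing is materialised up front
for perm in permutations(items, 3):
    first_15.append(perm)
    if len(first_15) == 15:
        break  # stop early, the remaining permutations are never generated

print(first_15[:3])
```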
However, since we're using itertools, we can vastly simplify that logic using islice to do the work of getting the first 15 without explicitly writing a for-loop and breaking at the 15th iteration:
import pandas as pd
from itertools import permutations, islice
# df = pd.read_csv('')
# combo = df['col1'].tolist()
combo = list("abcefg")
perm_generator = permutations(combo,3)
# lazily take the first 15 permutations (nothing is generated until they are consumed)
first_15_perms = islice(perm_generator, 15)
# Store the first 15 permutations into a Series object
series_perms = pd.Series(list(first_15_perms), name="permutations")
print(series_perms)
0 (a, b, c)
1 (a, b, e)
2 (a, b, f)
3 (a, b, g)
4 (a, c, b)
5 (a, c, e)
6 (a, c, f)
7 (a, c, g)
8 (a, e, b)
9 (a, e, c)
10 (a, e, f)
11 (a, e, g)
12 (a, f, b)
13 (a, f, c)
14 (a, f, e)
Name: permutations, dtype: object
If you want this as a single column in a DataFrame you can use the to_frame() method:
df_perms = series_perms.to_frame()
print(df_perms)
permutations
0 (a, b, c)
1 (a, b, e)
2 (a, b, f)
3 (a, b, g)
4 (a, c, b)
5 (a, c, e)
6 (a, c, f)
7 (a, c, g)
8 (a, e, b)
9 (a, e, c)
10 (a, e, f)
11 (a, e, g)
12 (a, f, b)
13 (a, f, c)
14 (a, f, e)
While not quite as elegant as the previous answer, if you truly want a random sampling of values rather than just the first few, you could do something along the lines of the following:
import random as rnd
import pandas as pd
def newFrame(df: pd.DataFrame, srccol: int, cmbs: int, rows: int) -> pd.DataFrame:
    # srccol is the position of the source column
    il = df.iloc[:, srccol].values.tolist()
    nw_df = pd.DataFrame()
    data = []
    for r in range(rows):
        rd = []
        for ri in range(cmbs):
            # choice() samples with replacement, so a value can repeat within a tuple
            rd.append(rnd.choice(il))
        data.append(tuple(rd))
    nw_df['Combinations'] = data
    return nw_df
Which, when passed a df as shown in your example in the form of:
new_df = newFrame(df, 0, 3, 15)
Produces:
Combinations
0 (a, f, e)
1 (a, d, f)
2 (b, c, d)
3 (a, a, d)
4 (f, b, c)
5 (e, b, b)
6 (e, e, d)
7 (c, f, f)
8 (f, e, b)
9 (d, c, e)
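Note that rnd.choice can repeat a value inside a tuple, as in (a, a, d) above. If each set should contain three different items and no set should appear twice, one possible sketch uses random.sample instead. The function name random_triples and its parameters are invented for illustration, the column values are assumed to be distinct, and math.perm requires Python 3.8 or later.

```python
import math
import random

def random_triples(values, n_sets, size=3, seed=None):
    """Return n_sets distinct random orderings of `size` items drawn from values."""
    # Guard against asking for more sets than exist (assumes values are distinct)
    if n_sets > math.perm(len(values), size):
        raise ValueError("not enough distinct permutations")
    rng = random.Random(seed)
    seen = set()
    result = []
    while len(result) < n_sets:
        # sample() draws without replacement, so no value repeats inside a tuple
        candidate = tuple(rng.sample(values, size))
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result

triples = random_triples(list("abcdef"), 15, seed=0)
print(len(triples))
```

The resulting list can then be wrapped as pd.DataFrame({"combinations": triples}). Because only the requested number of tuples is ever stored, memory use stays small even for a column of about 200 rows.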

Python - list concatenation with shallow referencing?

I'm working on a project and have a question about making my code more compact.
There are many ways to concatenate lists: a+b, a.extend(b), and so on.
My question is whether there is a way to concatenate lists shallowly, so that the result still refers to the original lists. For example:
a = [1,2,3]
b = [4,5]
c = a+b
c
>> [1,2,3,4,5]
b[0] = 10
c
>> [1,2,3,4,5]
But my desired result is [1,2,3,10,5]. How should I define c to get this?
If you create a nested list from a and b you can kinda achieve what you are looking for:
>>> a = [1, 2, 3]
>>> b = [4, 5]
>>> c = [a, b]
>>> b[0] = 10
>>> c
[[1, 2, 3], [10, 5]]
>>> c = [val for sub_list in c for val in sub_list]
>>> c
[1, 2, 3, 10, 5]
a = [1,2,3]
b = [4,5]
b[0] = 10
c = a+b
print(c)
or
import copy
a = [1,2,3]
b = [4,5]
c=[a,b]
bb = copy.copy(c)
c[1][0]=10
print([val for l in c for val in l])
# refer this link for shallow-deep-copy reference
# https://www.programiz.com/python-programming/shallow-deep-copy
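Another option, offered only as a sketch, is a small read-only wrapper that keeps references to the original lists and looks elements up on demand, so later changes to a or b are visible through it. The class name ChainedView is made up for this example, and it supports integer indices only (no slices).

```python
class ChainedView:
    """Read-only view over several lists that behaves like their concatenation."""

    def __init__(self, *lists):
        self._lists = lists  # references are stored, nothing is copied

    def __len__(self):
        return sum(len(lst) for lst in self._lists)

    def __iter__(self):
        for lst in self._lists:
            yield from lst

    def __getitem__(self, index):
        if index < 0:
            index += len(self)  # support negative indices like a list
        if index < 0:
            raise IndexError("index out of range")
        for lst in self._lists:
            if index < len(lst):
                return lst[index]
            index -= len(lst)  # skip past this list
        raise IndexError("index out of range")

    def __repr__(self):
        return repr(list(self))

a = [1, 2, 3]
b = [4, 5]
c = ChainedView(a, b)
b[0] = 10
print(c)  # [1, 2, 3, 10, 5]
```

If a single pass over the elements is enough, itertools.chain(a, b) gives the same lazy behaviour without a custom class, but the resulting iterator can only be consumed once.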

Diagonals with same characters - Python

The program must accept a character matrix of size RxC as the input. The program must print the number of diagonals that are parallel to the top-left to bottom-right diagonal and consist of identical characters.
def lower_diagonals(row,matrix):
    # a list to store the lower diagonals
    # which are || to top left to bottom right
    d=[]
    # Iterating from the second row till the last row
    for i in range(1,row):
        nop,dummy = [],0
        for j in range(i,row):
            try:
                nop.append(matrix[j][dummy])
            except IndexError:
                break
            dummy+=1
        d.append(nop)
    return d
def upper_diagonals(col,matrix):
    # a list to store the upper diagonals
    # which are || to top left to bottom right
    d=[]
    # Iterating from the second column till the last column
    for i in range(1,col):
        dum , nop = i,[]
        # Iterating till the last before row (uses the global row)
        for j in range(row-1):
            try:
                nop.append(matrix[j][dum])
            except IndexError:
                break
            dum+=1
        d.append(nop)
    return d
def diagonals(matrix,row,col):
    return lower_diagonals(row,matrix) + upper_diagonals(col,matrix)
row,col = map(int,input().split())
matrix =[input().strip().split(' ') for i in range(row)]
new_matrix = diagonals(matrix,row,col)
t=0
for i in new_matrix:
    if len(list(set(i))) == 1 : t+=1
print(t)
Example :
Input :
4 4
u m o a
h n a o
y h r w
b n h e
Output:
4
Input :
5 7
G a # z U p 3
e G b # n U p
a e G m # e U
L l e g k # t
j L a e G s #
Output:
6
My code works perfectly for the cases above, but it fails for the case below:
Input :
2 100
b h D k 2 D 9 I e Q # * B 5 H Z r q u n P C 4 a e K l 2 E p 6 R V v 0 d 8 x C F P M F C e m K H O y # 0 I T r P 8 P N 9 Z 7 S J J P c L g x X f 5 1 o i Y V Y G Y 9 A E O 2 r 2 # S 8 z D 6 a q r i k r
V o 4 T M m z p 6 G H D Y a 6 t O 7 # w y t 2 m A 1 a + 0 p t P D z 7 V N T x + I t 4 x x y 1 Q G M t M 0 v d G e u 4 b 8 m D # I v D i T 1 u L f e 1 Y E Y q Y c A 8 P 2 q 2 A 8 y b u E 3 c 1 s M n X
Expected Output:
9
My Output:
100
Can anyone help me structure the logic for this case? Thanks in advance.
Note :
2<=R,C<=100
Time limit : 500ms
I think I have found logic that solves my problem: reverse each row, so that the wanted diagonals become anti-diagonals, where row+col is constant.
r,c = map(int,input().strip().split())
mat = []
for i in range(r):
    l = list(map(str,input().strip().split()))
    mat.append(l[::-1])
count = 0
for i in range(r+c-1):
    l = []
    for row in range(r):
        for col in range(c):
            if row+col == i:
                l.append(mat[row][col])
    l = list(set(l))
    if len(l) == 1:
        count+=1
print(count)
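The same count can probably be obtained in a single pass over the matrix, because all cells on one top-left to bottom-right diagonal share the same value of row minus column. The sketch below groups characters by that key. The function name count_uniform_diagonals is invented here, and, like the code above, it assumes that the main diagonal and single-cell diagonals are counted too.

```python
from collections import defaultdict

def count_uniform_diagonals(matrix):
    """Count diagonals (top-left to bottom-right direction) made of one character."""
    diagonals = defaultdict(set)
    for r, row in enumerate(matrix):
        for c, ch in enumerate(row):
            # r - c is constant along each such diagonal
            diagonals[r - c].add(ch)
    return sum(1 for chars in diagonals.values() if len(chars) == 1)

# First example from the question
example = [line.split() for line in ["u m o a", "h n a o", "y h r w", "b n h e"]]
print(count_uniform_diagonals(example))  # 4
```

This visits each cell once, so the work is proportional to R*C rather than R*C*(R+C).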

Python assigning two lists in one line

Is it possible to declare two lists in the same one-liner? The following code uses two separate one-liners, so it has to loop over c twice:
c = [1,2,3,4]
a = [ d for d in c if (d % 2 == 0)]
b = [ d for d in c if (d % 2 != 0)]
Is this what you want? Note that it still loops over c twice:
c = [1, 2, 3, 4]
a, b = [d for d in c if (d % 2 == 0)], [d for d in c if (d % 2 != 0)]
Or how about a single loop:
a, b = [], []
for i in c:
    b.append(i) if i % 2 else a.append(i)
