sampling a fixed number of unique pairs from a population - python-3.x

I have a sample of n elements. I want to sub-sample m unique pairs from n.
Is there a simple off-the-shelf method to do this in Python?
For example, if n = [1,2,3,4,5,6,7] and m = 3, one such sample will be [(1,2),(3,4),(5,6)]

The random module has a sample function which picks k unique items from a collection. You can then pair them up, two at a time, to create your desired output (the pairwise recipe from the itertools docs yields overlapping pairs, so it is not what you want here):
import random
data = [1,2,3,4,5,6,7,8,9,10]
m = 3
def disjoint_pairs(items):
    # group consecutive items two at a time: (s0, s1), (s2, s3), ...
    return list(zip(items[::2], items[1::2]))
chosen = random.sample(data, m * 2)  # 2*m distinct elements
result = disjoint_pairs(chosen)      # m pairs with no shared elements

Depending on what exactly you mean by 'random', the answer will differ!
For uniform sampling of unique pairs, assuming all elements of your list are distinct:
import itertools, random
n, m = [1,2,3,4,5,6,7], 3
# combinations yields unordered pairs; use itertools.permutations instead
# if (1,2) and (2,1) should count as different pairs
x = random.sample(list(itertools.combinations(n, 2)), m)
print(x) # e.g. [(1,2),(3,4),(5,6)]
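For a large population, building the full list of pairs can get expensive. A minimal sketch of an alternative, assuming rejection sampling is acceptable: draw two distinct positions at a time and keep each unordered pair only once. The function name sample_unique_pairs is made up for illustration. Unlike the first answer, pairs may share an element here, e.g. (1,2) and (1,3).

```python
import random

def sample_unique_pairs(items, m):
    """Return m distinct unordered pairs from items (hypothetical helper).

    Pairs are drawn by position, so the elements of items should be distinct,
    as assumed in the answer above.
    """
    n = len(items)
    total = n * (n - 1) // 2          # number of distinct unordered pairs
    if m > total:
        raise ValueError("m exceeds the number of distinct pairs")
    seen = set()
    while len(seen) < m:
        i, j = random.sample(range(n), 2)    # two distinct positions
        seen.add((min(i, j), max(i, j)))     # store each unordered pair once
    return [(items[i], items[j]) for i, j in seen]

print(sample_unique_pairs([1, 2, 3, 4, 5, 6, 7], 3))
```

The loop slows down as m approaches the total number of pairs, so for dense sampling the list-based version above is probably the better choice.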

Related

Find the most frequent number in a 3d list

I can't find a way to show the most frequent number in these lists:
a = [1,2,4,5,6,7,15,16,19,23,24,26,27,28,29,30,31,33,36,37,38,39,40,41,42,43,44,45,47,48,49,50,51,52,56,57,58,60]
b = [1,3,4,5,6,8,9,10,15,16,17,18,20,21,22,24,26,28,29,31,32,33,36,37,38,40,41,43,44,47,48,50,52,53,54,56,57,58,60]
c = [2,3,5,6,8,9,12,13,17,19,20,23,25,26,27,28,29,30,31,33,34,35,36,37,40,44,45,47,48,52,53,54,55,56,57,58,60]
d = [2,5,7,9,11,12,13,14,16,18,20,22,23,26,29,30,33,34,36,38,40,41,42,43,44,46,47,49,50,51,53,56,57,58,60]
list_1 = [a,b]
list_2 = [c,d]
lists = [list_1, list_2]
I have tried the collections library with the most_common() function but it doesn't seem to work. The same happens with numpy arrays.
It would be perfect if I could get the top 10 most common numbers too.
The reason for the list to be multi-dimensional is for easy comparison between months:
Jan_22 = [jan_01, jan_02, jan_03, jan_04]
Fev_22 = [fev_01, fev_02, fev_03, fev_04]
months = [Fev_22, Jan_22]
Each month has 4 data sets. Nesting the lists this way lets me compare big chunks of data, e.g. the top 10 most common values from 2021, or the most common number in Jan, Feb, Mar, Apr, May, Jun. It would make things easier and clearer.
Thanks
Maybe I don't fully understand your question, but I don't understand why the list needs to be multi-dimensional if you only want to know the frequency of a given value.
import pandas as pd
a = [1,2,4,5,6,7,15,16,19,23,24,26,27,28,29,30,31,33,36,37,38,39,40,41,42,43,44,45,47,48,49,50,51,52,56,57,58,60]
b = [1,3,4,5,6,8,9,10,15,16,17,18,20,21,22,24,26,28,29,31,32,33,36,37,38,40,41,43,44,47,48,50,52,53,54,56,57,58,60]
c = [2,3,5,6,8,9,12,13,17,19,20,23,25,26,27,28,29,30,31,33,34,35,36,37,40,44,45,47,48,52,53,54,55,56,57,58,60]
d = [2,5,7,9,11,12,13,14,16,18,20,22,23,26,29,30,33,34,36,38,40,41,42,43,44,46,47,49,50,51,53,56,57,58,60]
values = pd.Series(a + b + c + d)
print(values.value_counts().head(10))
print(values.value_counts().head(10).index.to_list())
I don't get why you are nesting the lists at each step to get a 3D structure; you could just use arrays or something similar. Anyway, here is a function that returns the x most common elements of a nested list (here a list of lists):
import numpy as np
arr = [[1,2,4,5,6,7,15,16,19,23,24,26,27,28,29,30,31,33,36,37,38,39,40,41,42,43,44,45,47,48,49,50,51,52,56,57,58,60],
[1,3,4,5,6,8,9,10,15,16,17,18,20,21,22,24,26,28,29,31,32,33,36,37,38,40,41,43,44,47,48,50,52,53,54,56,57,58,60],
[2,3,5,6,8,9,12,13,17,19,20,23,25,26,27,28,29,30,31,33,34,35,36,37,40,44,45,47,48,52,53,54,55,56,57,58,60],
[2,5,7,9,11,12,13,14,16,18,20,22,23,26,29,30,33,34,36,38,40,41,42,43,44,46,47,49,50,51,53,56,57,58,60]]
def x_most_common(arr, x):
    flat = [el for sub in arr for el in sub]  # flatten the list of lists
    output = list(set([(el, flat.count(el)) for el in flat]))  # unique (value, count) pairs
    output.sort(key=lambda i: i[1], reverse=True)  # highest count first
    return output[:x]
# test:
print(x_most_common(arr, 5))
output:
[(56, 4), (36, 4), (47, 4), (58, 4), (5, 4)]
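The most_common() method from the question does work once the nesting is flattened first. Below is a minimal sketch, assuming the data is made of lists (or tuples) nested to any depth; flatten and top_n are illustrative names, and the small lists value is made-up sample data, not the data from the question.

```python
from collections import Counter

def flatten(nested):
    """Yield the leaf values of arbitrarily nested lists or tuples."""
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item

def top_n(nested, n=10):
    """Return the n most common leaf values as (value, count) pairs."""
    return Counter(flatten(nested)).most_common(n)

# small made-up 3D list: two "months" with two data sets each
lists = [[[1, 2, 5], [2, 5, 7]], [[5, 7, 9], [5, 2]]]
print(top_n(lists, 2))
```

Because flatten recurses, the same call should work on a single month such as top_n(Jan_22) or on the whole months list.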

Convert values in a list to RGB values

I'm trying the following to convert the values in a list to a list of [R,G,B] values.
import matplotlib.colors
from matplotlib import cm
data = range(0,6)
minima = min(data)
maxima = max(data)
norm = matplotlib.colors.Normalize(vmin=minima, vmax=maxima, clip=True)
mapper = cm.ScalarMappable(norm=norm, cmap=cm.Greys_r)
node_color = []
for d in data:
    node_color.append(mapper.to_rgba(d))
The above returns a 4th dimension A. I would like to know if there is a way to obtain only RGB values.
mapper.to_rgba(d) returns a tuple of the form (r, g, b, a). You can directly assign the result to a 4-tuple as r, g, b, a = mapper.to_rgba(d). And then create a triple as (r, g, b) to be stored in a list.
mapper.to_rgba also works when it gets a list or array as parameter, so calling mapper.to_rgba(data) directly gets the list of all rgba-tuples. Via a list comprehension, a new list of rgb-triples can be created:
import matplotlib
from matplotlib import cm
data = range(0, 6)
norm = matplotlib.colors.Normalize(vmin=min(data), vmax=max(data), clip=True)
mapper = cm.ScalarMappable(norm=norm, cmap=cm.Greys_r)
node_color = [(r, g, b) for r, g, b, a in mapper.to_rgba(data)]
PS: The above code gives r, g and b values between 0 and 1. Depending on the application, integer values from 0 to 255 could be needed:
node_color = [(r, g, b) for r, g, b, a in mapper.to_rgba(data, bytes=True)]

Creating tables in random order in Python

My aim is to create a table using two lists. I managed to create it, but I need the result in random order, not in sequence. How can I randomize the order of my output?
Is there any other method?
a = [2,3,4,5,6,7,8,9]
b = [12,13,14,15,16,17,19]
for i in b:
    for j in a:
        print(i,'x',j,'=,')
This should give you the desired result: build every pair first, then shuffle the pairs before printing them.
import random
from itertools import product
a = [2,3,4,5,6,7,8,9]
b = [12,13,14,15,16,17,19]
pairs = list(product(b, a))  # every (i, j) combination, as in the nested loops
random.shuffle(pairs)        # put the rows in random order
for i, j in pairs:
    print(i, 'x', j, '=')

How to avoid double for loops?

I want to calculate part of 2 matrices (inner, outer) using data from 2 other matrices. They are all the same size. The code below works but it is too slow on big matrices. I used np.fromfunction in another case but was calculating the entire matrix not only a subset.
What's the fastest replacement for the double for loops?
import numpy as np
F = np.random.rand(100,100)
S = 10*np.random.rand(100,100) + 1
L,C = F.shape
inner = np.zeros((L,C))
outer = np.zeros((L,C))
for l in range(5, L - 5):
    for c in range(5, C - 5):
        inner[l,c] = np.mean(F[l-5 : l+5 , c-5:c])
        outer[l,c] = np.mean(F[l-5 : l+5 , c+5 : c+5+int(S[l,c])])
It looks like inner is the result of convolving F with a 10x5 averaging filter. This is quite easy to rewrite as a convolution with scipy, and it will be about as fast as you can get on a CPU. However, since you are leaving out 5 rows and columns at the borders of the matrices, you have to truncate the inner and inner2 matrices accordingly to be able to compare them.
import numpy as np
from scipy.signal import convolve2d
F = np.random.rand(100,100)
S = 10*np.random.rand(100,100) + 1
L,C = F.shape
inner = np.zeros((L,C))
outer = np.zeros((L,C))
for l in range(5, L - 5):
    for c in range(5, C - 5):
        inner[l, c] = np.mean(F[l-5 : l+5 , c-5:c])
        outer[l, c] = np.mean(F[l-5 : l+5 , c+5 : c+ 5 + int(S[l, c])])
# if inner[l, c] = np.mean(F[l-5 : l+5 , c-5:c+5]),
# then you should use a 10x10 filter
avg_filter = np.ones((10, 5)) / (10*5)
inner2 = convolve2d(F, avg_filter, mode='valid')
# should be very small (1.262e-13 for me)
print((inner2[:89, :89] - inner[5:94, 5:94]).sum())
The expression for outer is quite strange, because of this int(S[l, c]) offset that you add to your expression. I don't think you can represent this as a matrix computation.
So to replace your double for loop you can use product from itertools, which iterates over the cartesian product of two iterables (this flattens the nesting but does not by itself make the loop faster), like this:
from itertools import product
for (l, c) in product(range(5, L - 5), range(5, C - 5)):
    outer[l, c] = np.mean(F[l-5 : l+5 , c+5 : c+ 5 + int(S[l, c])])
From a signal processing standpoint, I'm not sure what the outer matrix is supposed to be. It would be easier to write faster code with the desired effect if you told us what you are trying to do.

Python partial derivative

I am trying to plug numbers into a function that involves partial derivatives, but I can't find a correct way to do it. I have searched all over the internet and I always get an error. Here is the code:
from sympy import symbols,diff
import sympy as sp
import numpy as np
from scipy.misc import derivative
a, b, c, d, e, g, h, x= symbols('a b c d e g h x', real=True)
da=0.1
db=0.2
dc=0.05
dd=0
de=0
dg=0
dh=0
f = 4*a*b+a*sp.sin(c)+a**3+c**8*b
x = sp.sqrt(pow(diff(f, a)*da, 2)+pow(diff(f, b)*db, 2)+pow(diff(f, c)*dc, 2))
def F(a, b, c):
    return x
print(derivative(F(2 ,3 ,5)))
I get the following error: derivative() missing 1 required positional argument: 'x0'
I am new to Python so maybe it's a stupid question but I would be grateful if someone helped me.
You can find the three partial derivatives of the function foo with respect to the variables a, b and c at the point (2,3,5):
f = 4*a*b+a*sp.sin(c)+a**3+c**8*b
foo = sp.sqrt(pow(diff(f, a)*da, 2)+pow(diff(f, b)*db, 2)+pow(diff(f, c)*dc, 2))
foo_da = diff(foo, a)
foo_db = diff(foo, b)
foo_dc = diff(foo, c)
print(foo_da," = ", float(foo_da.subs({a:2, b:3, c:5})))
print(foo_db," = ", float(foo_db.subs({a:2, b:3, c:5})))
print(foo_dc," = ", float(foo_dc.subs({a:2, b:3, c:5})))
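As a cross-check that needs no sympy, the quantity from the question (the square root of the summed squares of each partial derivative of f times its error) can be approximated numerically. This is only a sketch under assumptions: it uses central differences with a step h = 1e-4 chosen by hand, and the helper name partial is made up for illustration.

```python
import math

def f(a, b, c):
    # same expression as f in the question, written with math instead of sympy
    return 4*a*b + a*math.sin(c) + a**3 + c**8 * b

def partial(func, point, index, h=1e-4):
    """Central-difference estimate of the partial derivative of func
    with respect to point[index] (hypothetical helper)."""
    up, down = list(point), list(point)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2 * h)

point = (2.0, 3.0, 5.0)      # a, b, c
errors = (0.1, 0.2, 0.05)    # da, db, dc from the question

grads = [partial(f, point, i) for i in range(3)]
sigma = math.sqrt(sum((g * e) ** 2 for g, e in zip(grads, errors)))
print(grads, sigma)
```

The estimates should agree closely with the analytic partials 4*b + sin(c) + 3*a**2, 4*a + c**8 and a*cos(c) + 8*c**7*b evaluated at the same point.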
I have used the Python package 'sympy' to compute the partial derivative. The point at which the partial derivative is evaluated is val, which can be passed as a list or tuple.
# Sympy implementation to return the derivative of a function in x,y
# Enter ginput as a string expression in x and y and val as 1x2 array
def partial_derivative_x_y(ginput,der_var,val):
    import sympy as sp
    x,y = sp.symbols('x y')
    function = lambda x,y: ginput
    derivative_x = sp.lambdify((x,y),sp.diff(function(x,y),x))
    derivative_y = sp.lambdify((x,y),sp.diff(function(x,y),y))
    if der_var == 'x' :
        return derivative_x(val[0],val[1])
    if der_var == 'y' :
        return derivative_y(val[0],val[1])
input1 = 'x*y**2 + 5*log(x*y +x**7) + 99'
partial_derivative_x_y(input1,'y',(3,1))
