Can a list with repeating values be converted into a multiset in python? - python-3.x

I have two lists with repeated values, and I want to take the intersection of the repeated values along with the values that occur only once in either of the lists.
I am just a beginner and would love to hear simple suggestions!

Method 1
l1 = [1, 2, 3, 4]
l2 = [1, 2, 5]
intersection = [value for value in l1 if value in l2]
for x in l1 + l2:
    if x not in intersection:
        intersection.append(x)
print(intersection)
Method 2
print(list(set(l1+l2)))
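If the goal is a true multiset (values stored together with their counts), one option from the standard library is collections.Counter. The sketch below is only an illustration: the sample lists with a repeated 2 and the helper names multiset_intersection and multiset_union are made up for this example and are not part of the question.

```python
from collections import Counter


def multiset_intersection(a, b):
    # & keeps each value with the smaller of its two counts
    return sorted((Counter(a) & Counter(b)).elements())


def multiset_union(a, b):
    # | keeps each value with the larger of its two counts
    return sorted((Counter(a) | Counter(b)).elements())


l1 = [1, 2, 2, 3, 4]
l2 = [1, 2, 2, 5]
print(multiset_intersection(l1, l2))  # [1, 2, 2]
print(multiset_union(l1, l2))         # [1, 2, 2, 3, 4, 5]
```

Unlike set, Counter does not discard the second 2, which is usually what people mean when they ask for a multiset.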

Related

Finding the optimal selections of x number per column and y numbers per row of an NxM array

Given an NxM array of positive integers, how would one select integers so that the sum of the selected values is maximized, with at most x selections in each row and at most y selections in each column? This is an abstraction of a problem I am facing in making NCAA swimming lineups. Each swimmer has a time in every event, which can be converted to an integer using the USA Swimming Power Points Calculator (the higher the better). Once the times are converted, I want to assign no more than 3 swimmers per event and no more than 3 races per swimmer, such that the total sum of power scores is maximized. I think this is similar to the weapon-target assignment (WTA) problem, but that problem allows a weapon type to attack the same target more than once (in my case, allowing a single swimmer to race the same event twice), and that does not work for my use case. Does anybody know what this variation on the WTA problem is called, and if so, do you know of any solutions or resources I could look at?
Here is a mathematical model:
Data
Let a[i,j] be the data matrix
and
x: max number of selected cells in each row
y: max number of selected cells in each column
(Note: this is a bit unusual: we normally reserve the names x and y for variables. These conventions can help with readability).
Variables
δ[i,j] ∈ {0,1} are binary variables indicating if cell (i,j) is selected.
Optimization Model
max sum((i,j), a[i,j]*δ[i,j])
sum(j,δ[i,j]) ≤ x ∀i
sum(i,δ[i,j]) ≤ y ∀j
δ[i,j] ∈ {0,1}
This can be fed into any MIP solver.
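To see what the model selects before reaching for a solver, it can be checked by brute force on a toy instance. The sketch below is not a MIP solver: it simply enumerates every 0/1 assignment of δ, so it is only usable for very small matrices. The function name best_selection and the 3x3 data are assumptions made up for this illustration.

```python
from itertools import product


def best_selection(a, x, y):
    """Exhaustively solve the model for a tiny matrix a (list of rows)."""
    n_rows, n_cols = len(a), len(a[0])
    best_value, best_delta = -1, None
    # Try every 0/1 assignment of delta[i][j]; 2**(n_rows*n_cols) cases.
    for bits in product((0, 1), repeat=n_rows * n_cols):
        delta = [bits[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]
        # Row constraint: at most x selected cells per row.
        if any(sum(row) > x for row in delta):
            continue
        # Column constraint: at most y selected cells per column.
        if any(sum(row[j] for row in delta) > y for j in range(n_cols)):
            continue
        value = sum(a[i][j] * delta[i][j]
                    for i in range(n_rows) for j in range(n_cols))
        if value > best_value:
            best_value, best_delta = value, [list(row) for row in delta]
    return best_value, best_delta


a = [[5, 1, 3],
     [2, 8, 4],
     [7, 6, 9]]
value, delta = best_selection(a, x=2, y=2)
print(value)  # 34
print(delta)  # [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
```

For a realistic lineup (dozens of swimmers and events) the enumeration is hopeless, which is why the model is handed to a MIP solver instead.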

Creating a dynamic array with given probabilities in Excel

I want to create a dynamic array that returns me X values based on given probabilities. For instance:
Imagine this is a gift box and you can open the box N times. What I want is to have N random results. For example, I want to get randomly 5 of these two rarities but based on their chances.
I have this following formula for now:
=index(A2:A3,randarray(5,1,1,rows(A2:A3),1)). And this is the output I get:
The problem here is that I get a dynamic array with the 5 results, but they are not based on the probabilities.
How can I add probabilities to the array?
Here is how you could generate a random outcome with defined probabilities for the entries (Google Sheets solution, not sure about Excel):
=ARRAYFORMULA(
  VLOOKUP(
    RANDARRAY(H1, 1),
    {
      {0; OFFSET(C2:C,,, COUNTA(C2:C) - 1)},
      OFFSET(A2:A,,, COUNTA(C2:C))
    },
    2
  )
)
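The idea behind this formula (draw a uniform random number, then look it up in a table of cumulative probabilities) can be sketched in Python. This is only an illustration of the technique, not a translation of the spreadsheet: the function name draw and the 0.9 / 0.1 probabilities are assumptions for the example.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def draw(outcomes, probabilities, n, rng=random):
    """Return n independent picks, each chosen with the given probabilities."""
    cumulative = list(accumulate(probabilities))  # e.g. [0.9, 1.0]
    picks = []
    for _ in range(n):
        r = rng.random() * cumulative[-1]         # uniform in [0, total)
        # First slot whose cumulative bound is greater than r.
        index = min(bisect_right(cumulative, r), len(outcomes) - 1)
        picks.append(outcomes[index])
    return picks


print(draw(["Normal", "Rare"], [0.9, 0.1], 5))
```

Each pick is independent here, so the same rarity can come up any number of times, as with opening a gift box repeatedly. The standard library's random.choices(outcomes, weights=probabilities, k=n) does the same job in one call.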
This whole subject of random selection is treated very thoroughly in Donald Knuth's The Art of Computer Programming, vol. 2, "Seminumerical Algorithms". In that book he presents an algorithm for selecting exactly X out of N items in a list using pseudo-random numbers. What you may not have considered is that after you have chosen your first item, the probability has changed to (X-1)/(N-1) if your first outcome was "Normal" or X/(N-1) if your first outcome was "Rare". This means you'll want to keep track of some running totals based on your prior outcomes so that your probabilities are updated with each pick. You can do this with formulas, but I'm not certain how the back-reference will perform inside an array formula. Microsoft's dynamic array documentation indicates that such internal array references are considered "circular" and are prohibited.
In any case, extending this to 3+ outcomes is very problematic. To implement that algorithm with 3 choices (X + Y + Z = N picks), you would need one random number for an "X or not X" choice and then a second random number for a "Y or not Y" choice. This becomes a recursive algorithm, beyond what Excel formulas can cope with.
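Outside of a spreadsheet the running-total bookkeeping is short. Assuming the algorithm referred to is Knuth's selection sampling (Algorithm S), a minimal Python sketch could look like the following; the function name selection_sample and the sample data are made up for the illustration.

```python
import random


def selection_sample(items, k, rng=random):
    """Pick exactly k of the items, keeping their original order."""
    chosen = []
    remaining = len(items)          # items not yet examined
    for item in items:
        needed = k - len(chosen)    # picks still to be made
        # Select this item with probability needed / remaining.
        if rng.random() * remaining < needed:
            chosen.append(item)
        remaining -= 1
    return chosen


print(selection_sample(list(range(10)), 3))
```

The probability is recomputed from the running totals at every step, which is exactly the part that is awkward to express in a single array formula.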

Optimization of Python comprehension expression

I was trying to get the frequency of max value in an integer list (intlist)
intlist.count(max(intlist))
This works and is fast as well.
I wanted to implement the max function with a comprehension:
[x if x>y else y for x in intlist for y in intlist if x!=y][-1]
The latter turns out to be very slow.
Can anyone point out what the issue is here?
testing with
intlist=np.array([1, 2, 3,3,-1])
in this case the value expected is 2 as 3 is the max value and it occurs 2 times.
The list comprehension does not calculate the maximum value in the first place. Each element of the result is the maximum of one pair (x, y) of differing values from intlist, and [-1] keeps only the last pair: the last item of the list and the last item that differs from it. So unless the last two items in the list are the same, it will calculate the maximum of the last two values.
Furthermore, it is not very efficient, since it runs in O(n^2) time and uses O(n^2) memory. For huge lists, this would thus require gigantic amounts of memory.
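A quick check makes the first point concrete. The wrapper name comprehension_max and the second test list are made up for this illustration; the comprehension itself is the one from the question, applied to plain Python lists.

```python
def comprehension_max(values):
    # The comprehension from the question, wrapped in a function.
    return [x if x > y else y for x in values for y in values if x != y][-1]


print(comprehension_max([1, 2, 3, 3, -1]))  # 3 (right, but only by luck)
print(comprehension_max([3, 1, 2]))         # 2 (wrong, the maximum is 3)
print(max([3, 1, 2]))                       # 3
```

In the first list the last item (-1) happens to be paired with a 3, so the result looks correct; in the second list the last pair is (2, 1), and the real maximum is never considered.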
Usually it is not a good idea to use a list comprehension if you do not need a list in the first place. You can calculate a maximum with a for loop, in which you compare each item with the maximum obtained so far:
def other_max(listlike):
    mmax = listlike[0]
    for x in listlike:
        if x > mmax:
            mmax = x
    return mmax
or with numpy we can sum up the array of booleans:
>>> (intlist == intlist.max()).sum()
2
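Without numpy, the same loop idea can be extended to count the maximum in a single pass instead of calling max and count separately. This is a sketch; the name count_max is an assumption for the example.

```python
def count_max(values):
    """Return how many times the largest value occurs (0 for empty input)."""
    it = iter(values)
    try:
        current_max = next(it)
    except StopIteration:
        return 0
    count = 1
    for x in it:
        if x > current_max:
            # New maximum found: restart the count.
            current_max, count = x, 1
        elif x == current_max:
            count += 1
    return count


print(count_max([1, 2, 3, 3, -1]))  # 2
```

In practice intlist.count(max(intlist)) is hard to beat on plain lists, since both calls run in C; the loop mainly shows that O(n) time and O(1) extra memory are enough.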

How can I subset the dataframe using if without facing truth value of a series is ambiguous error?

I have a list containing around 45 dataframes with 8 columns. Now I want to subset the dataframes based on specific values present in a particular column.
Code:
for z in list_dataframes:
    if(z['Segmentation']=="FAST"):
        list_fast.append(z)
This gives me an error stating that the truth value of a Series is ambiguous.
Can anyone tell me how to solve this?
P.S. An entirely different question: how do you remove empty dataframes from a list of dataframes containing both empty and non-empty dataframes?
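The comparison z['Segmentation']=="FAST" returns a whole boolean Series (one value per row), and if cannot reduce that to a single True or False, hence the error. You can just use in on the values:
for z in list_dataframes:
    if "FAST" in z['Segmentation'].values:
        list_fast.append(z)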
Or wrap it all in comprehension:
list_fast = [z for z in list_dataframes if 'FAST' in z['Segmentation'].values]
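The code above keeps every dataframe that contains "FAST" somewhere in the column. If the aim is instead to keep only the matching rows, and to drop empty dataframes as asked in the P.S., a sketch along these lines may help. The three sample dataframes and the column name value are made up for the illustration; .empty and boolean-mask indexing are standard pandas.

```python
import pandas as pd

# Hypothetical sample data standing in for the 45 dataframes.
list_dataframes = [
    pd.DataFrame({"Segmentation": ["FAST", "SLOW"], "value": [1, 2]}),
    pd.DataFrame({"Segmentation": ["SLOW"], "value": [3]}),
    pd.DataFrame({"Segmentation": [], "value": []}),
]

# Drop empty dataframes first (the P.S. in the question).
non_empty = [df for df in list_dataframes if not df.empty]

# Keep only the FAST rows of each dataframe, then drop results with no rows.
fast_rows = [df[df["Segmentation"] == "FAST"] for df in non_empty]
fast_rows = [df for df in fast_rows if not df.empty]

print(len(non_empty), len(fast_rows))  # 2 1
```

Here the boolean Series is used as a row mask rather than as an if condition, so no ambiguity error arises.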

String pre-processing step, to answer further queries in O(1) time

You are given a string made up of only 3 distinct characters, say x, y and z.
There will be a million queries.
Query format: x z i j
For each query, we need to find all possible different substrings which begin with x and end with z. i and j denote the lower and upper bounds of the range in which the substring must lie; it must not extend beyond this range.
My logic:
Read the string. Keep 3 arrays which store the running counts of x, y and z respectively, for i = 0 up to strlen.
Store the indices of each character separately in 3 more arrays: xlocation[], ylocation[], zlocation[].
Then, for a query (a b i j), find all the indices of b within the range i to j.
Calculate the answer for each index of b and sum these to get the result.
Is it possible to pre-process the string before the queries, so that each query takes O(1) time to answer?
As the others suggested, you can do this with a divide and conquer algorithm.
Optimal substructure:
If we are given a left half of the string and a right half and we know how many substrings there are in the left half and how many there are in the right half then we can add the two numbers together. We will be undercounting by all the strings that begin in the left and end in the right. This is simply the number of x's in the left substring multiplied by the number of z's in the right substring.
Therefore we can use a recursive algorithm.
This would be a problem, however, if we tried to solve every single i and j combination, as the bottom-level subproblems would be solved many, many times.
You should look into implementing this with a dynamic programming algorithm keeping track of substrings in range i,j, x's in range i,j, and z's in range i,j.
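One way to get O(1) per query after O(n) preprocessing is to use prefix sums rather than the divide and conquer described above; this is a swapped-in technique, sketched under two assumptions: "different substrings" means different (start, end) position pairs, and i and j are 0-based and inclusive. The names build_pair_counter and query are made up for the example. It relies on the same observation as the answer: pairs that start left of i and end inside the range number (x's before i) times (z's in the range), and must be subtracted.

```python
def build_pair_counter(s, first, last):
    """Preprocess s so that pair counts for (first, last) take O(1) each."""
    n = len(s)
    cnt_first = [0] * (n + 1)  # cnt_first[k]: count of `first` in s[:k]
    cnt_last = [0] * (n + 1)   # cnt_last[k]: count of `last` in s[:k]
    pairs = [0] * (n + 1)      # pairs[k]: pairs (p < q) lying inside s[:k]
    for k, ch in enumerate(s):
        cnt_first[k + 1] = cnt_first[k] + (ch == first)
        cnt_last[k + 1] = cnt_last[k] + (ch == last)
        # A `last` at position k closes one pair with every earlier `first`.
        pairs[k + 1] = pairs[k] + (cnt_first[k] if ch == last else 0)

    def query(i, j):
        # Pairs that start before i but end inside [i, j] must be removed.
        crossing = cnt_first[i] * (cnt_last[j + 1] - cnt_last[i])
        return pairs[j + 1] - pairs[i] - crossing

    return query


query = build_pair_counter("xyzxz", "x", "z")
print(query(0, 4))  # 3
print(query(1, 4))  # 1
```

To serve queries for any pair of characters, one such table can be built per ordered pair (a, b); with 3 characters that is a constant number of O(n) passes.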
