Min function while using strings in Python

I'm using a chi-square test on my data and appending the results in a loop (code below).
My .txt file looks like the excerpt below; it has 180 rows of strings like these. Now I want to find the minimum value across those 180 rows, i.e. the number in parentheses, such as (15.745037950673217,) in the example below, but I don't want to lose the information attached to that row's string, which is 201701241800 Chi for 75 degree model.
...
201701241800 Chi for 75 degree model (15.745037950673217,)
201701241800 Chi for 76 degree model (16.014744332924252,)
...
The code I use looks like this:
o = chisquare(f_obs=fin, f_exp=y)
rows = str(Date) + str(Start_time_hours_format) + str(Start_time_minutes_format) + " Chi for {} degree model ".format(r) + str(o[0:1])
table.append(rows)
The problem is that the number of these calculations is enormous. My task is to find the minimum value in each iteration of a for loop. The example above comes from one iteration (there are 180 degree models in each iteration). I cannot use min(table), because the table contains strings, but I cannot remove them either, because that information is important. Do you have any ideas how to find the minimum value here? I mean specifically the minimum value in parentheses.

If you have a list lst, min(lst) returns the minimum value without modifying the list. If you don't have a plain list of values but rather objects from which you want to compare one attribute, let's say obj[i].myvalue, then you can do something like
min_value = float('inf')  # start above any value you expect
for o in obj:
    if o.myvalue < min_value:
        min_value = o.myvalue
which assigns the minimum value to min_value (probably not the most elegant way, but it works for sure). Naming the variable min_value rather than min avoids shadowing the built-in min function.
[I would be more specific, but it is not clear what kind of object you need the minimum of. Please consider updating your question to make it clearer.]
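Applied to the question, a minimal sketch of that idea: keep each chi value as a number next to its label instead of only the formatted string, and let min compare the numbers through its key argument. The pairs below are just the two example rows from the question, and best_row is a hypothetical helper name.

```python
def best_row(rows):
    """rows: list of (label, chi_value) pairs; return the row with the smallest chi value."""
    label, chi = min(rows, key=lambda row: row[1])  # compare by the number, not the text
    return f"{label} ({chi},)"  # same layout as the lines in the .txt file


rows = [
    ("201701241800 Chi for 75 degree model", 15.745037950673217),
    ("201701241800 Chi for 76 degree model", 16.014744332924252),
]
print(best_row(rows))  # 201701241800 Chi for 75 degree model (15.745037950673217,)
```

In the real loop you would append (label, o[0]) for each of the 180 models and call best_row once per iteration, so the label is never lost.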

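OK, so I've found a way to solve this problem. Code below:
o = chisquare(f_obs=fin, f_exp=y)
rows = str(Date) + str(Start_time_hours_format) + str(Start_time_minutes_format) + " Chi for {} degree model ".format(r) + str(o[0:1])
print(rows)
table.append(rows)
# Once all 180 models of the iteration are in table, write the row with the smallest chi value.
# A plain min(table) compares the strings alphabetically, so the key extracts the number in parentheses.
with open('measurements.txt', 'a') as j:  # 'with' closes the file automatically
    j.write(min(table, key=lambda row: float(row.rsplit('(', 1)[1].rstrip(',)'))))
    j.write('\n')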

Related

Algorithm, that finds the k-greatest number in O(n*log(k))

I was wondering: given an unsorted array of length n >= k, how would you find the k-th greatest number in O(n*log(k)) time? For example, the k = 2 (second) greatest number of an array containing the numbers 1 to 9 would be 8.
I'm trying to code this in Python, if you have an idea how to do it in that time complexity :)
My answer is not Python-specific, but you should be able to implement the concepts in Python or find libraries that already implement them.
The basic idea is to iterate over the list and store the current greatest, second greatest, ..., k-th greatest number in a separate data structure. Since you iterate over all n entries in your array, the overall complexity is O(n * insertion_step_complexity).
Therefore the insertion step must not exceed O(log(k)). To achieve this, you can use an AVL tree, which has O(log(m)) complexity for inserting and deleting items, where m is the number of items stored in the tree (here at most k).
An algorithm would look like this:
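def find_k_greatest_number(k, array):
    avl_tree = AVLTree()  # placeholder: any AVL tree implementation with these methods
    avl_items = 0
    for number in array:
        if avl_items < k:
            # Fewer than k items stored so far: always keep the number
            avl_tree.insert(number)
            avl_items += 1
        elif number > avl_tree.smallest_number():
            # Tree is full: evict the current smallest to make room
            avl_tree.delete_smallest_number()
            avl_tree.insert(number)
    return avl_tree.smallest_number()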
Finding the smallest number in a binary search tree takes time proportional to its height. Since an AVL tree holding at most k items has height O(log(k)), finding the smallest number is also O(log(k)).
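For comparison, here is a minimal runnable sketch that swaps the AVL tree for Python's heapq module (a binary min-heap), which gives the same O(n*log(k)) bound. The heap holds the k greatest numbers seen so far, and its root heap[0] is the smallest of them, i.e. the current k-th greatest. The function name kth_greatest is illustrative.

```python
import heapq


def kth_greatest(k, numbers):
    """Return the k-th greatest value in numbers in O(n*log(k)) time."""
    if not 1 <= k <= len(numbers):
        raise ValueError("k must be between 1 and len(numbers)")
    heap = []  # min-heap holding the k greatest numbers seen so far
    for number in numbers:
        if len(heap) < k:
            heapq.heappush(heap, number)  # O(log k)
        elif number > heap[0]:
            # Pop the smallest of the current top k and push the new number, O(log k)
            heapq.heapreplace(heap, number)
    return heap[0]  # smallest of the k greatest = k-th greatest


print(kth_greatest(2, list(range(1, 10))))  # 8
```

Unlike the tree version, reading the minimum of a heap is O(1), although this does not change the overall bound.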

Pyomo: define objective Rule based on condition

In a transport problem, I'm trying to insert the following rule into the objective function:
If the supply of BC is below 19,000 tons, there is a penalty of $125/MT.
I added a constraint to check the condition but would like to apply the penalty in the objective function.
I was able to do this in Excel Solver, but the values do not match. I've already checked both and debugged the code, but I could not figure out what's wrong.
Here is the constraint:
def bc_rule(model):
    return sum(model.x[supplier, market]
               for supplier in model.suppliers
               for market in model.markets
               if 'BC' in supplier) >= 19000
model.bc_rules = Constraint(rule=bc_rule, doc='Minimum production')
The problem is in the objective rule:
def objective_rule(model):
    PENALTY_THRESHOLD = 19000
    PENALTY_COST = 125
    cost = sum(model.costs[supplier, market] * model.x[supplier, market] for supplier in model.suppliers for market in model.markets)
    # what is the problem here?
    bc = sum(model.x[supplier, market] for supplier in model.suppliers
             for market in model.markets
             if 'BC' in supplier)
    if bc < PENALTY_THRESHOLD:
        cost += (PENALTY_THRESHOLD - bc) * PENALTY_COST
    return cost
model.objective = Objective(rule=objective_rule, sense=minimize, doc='Define objective function')
I'm getting a much lower value than found in Excel Solver.
Your condition (the if) depends on a variable of your model, whose value is not known when the objective expression is built.
In general, if statements that depend on decision variables should not be used in a mathematical model, and that is not specific to Pyomo. Even in Excel, IF statements in formulas are simply evaluated to a scalar value before optimization, so I would be very careful about claiming that Excel's result is the true optimum.
The good news is that if statements are easily converted into mathematical constraints.
For that, you need to add a binary (0/1) variable to your model that takes the value 1 if bc < PENALTY_THRESHOLD. Let's call this variable y, defined as model.y = Var(domain=Binary).
You will add model.y * PENALTY_COST as a term of your objective function to include the penalty cost.
Then, for the constraint, add the following piece of code:
def y_big_M(model):
    # bigM must be at least as large as the biggest value the left-hand side can take
    # (PENALTY_THRESHOLD itself, when bc = 0), but no larger than needed: avoid huge
    # numbers like 1e12 and above unless you really need them, since very large
    # coefficients cause numerical problems in the solver.
    PENALTY_THRESHOLD = 19000
    bigM = PENALTY_THRESHOLD
    return PENALTY_THRESHOLD - sum(
        model.x[supplier, market]
        for supplier in model.suppliers
        for market in model.markets
        if 'BC' in supplier
    ) <= model.y * bigM
model.y_big_M = Constraint(rule=y_big_M)
The previous constraint forces y to take a value greater than 0 (i.e. 1) whenever bc, the sum over the BC suppliers, is smaller than PENALTY_THRESHOLD: any positive difference PENALTY_THRESHOLD - bc would violate the constraint with y = 0, while with y = 1 the right-hand side becomes 1 * bigM, which is large enough that the difference is always smaller than bigM. When bc is at least PENALTY_THRESHOLD, the left-hand side is zero or negative, so y = 0 is feasible, and since y is penalized in the objective, the minimization will choose it.
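Written out as equations, with T = PENALTY_THRESHOLD, P = PENALTY_COST, M = bigM, s ranging over suppliers, m over markets, and "s in BC" meaning suppliers whose name contains 'BC' (this shorthand is introduced here, not taken from the code), the big-M formulation described above is the following. M must be at least T, because the left-hand side reaches T when no BC supply is used (assuming x >= 0).

```latex
\begin{aligned}
\min\quad & \sum_{s,\,m} c_{s,m}\, x_{s,m} + P\, y \\
\text{s.t.}\quad & T - \sum_{s \in \mathrm{BC},\; m} x_{s,m} \le M\, y \\
& y \in \{0, 1\}, \qquad M \ge T
\end{aligned}
```

Note that this charges a flat P whenever the threshold is missed, whereas the question's objective charges P per ton of shortfall, (T - bc) * P. If the per-ton penalty is what you want, a binary is not needed: a continuous shortfall variable d with d >= 0 and d >= T - bc, plus the term P * d in the objective, models it linearly, because minimization drives d down to max(0, T - bc).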
Please also check your Excel model to see whether your IF statements really work during the Solver computations. Last time I checked, Excel Solver did not convert IF statements into big-M constraints. The modeling technique shown here works with any modeling tool, including Excel.

Search and remove algorithm

Say you have an ordered array of values representing x coordinates.
[0,25,50,60,75,100]
You might notice that without the 60, the values would be evenly spaced (25). This would be indicative of a repeating pattern, something that I need to extract using this list (regardless of the length and the values of the list). In this particular example, the algorithm should find and remove the 60.
There are no time or space complexity requirements.
Both the values in the list and the ideal spacing (e.g. 25) are unknown, so the algorithm must obtain the spacing by looking at the values. In addition, neither the number of values nor the positions of the outliers in the array are known in advance, and there may be more than one outlier. The algorithm should return a list with the outliers removed. Extra points if the algorithm uses a threshold for the spacing.
Edit: Here is an example image.
In it, there is one outlier on the x axis (green line) and two on the y axis. The coordinates in the array are the rho values of the lines on that axis.
import numpy as np
arr = [0,25,50,60,75,100]
First construct the distances array:
dist = np.diff(arr)  # consecutive differences arr[i+1] - arr[i]
print(dist)
>> [25 25 10 15 25]
Now I'm using np.where and np.percentile to split the array into 3 parts: the main body, the upper values and the lower values. I arbitrarily set the cut-offs at 5% on each side (the 95th and 5th percentiles).
cond_sup = np.where(dist > np.percentile(dist, 95))
print(cond_sup)
>> (array([]),)
cond_inf = np.where(dist < np.percentile(dist, 5))
print(cond_inf)
>> (array([2]),)
You now have the indices where the spacing differs from the others.
So dist[2] is the problem, which by construction means the problem lies between arr[2] and arr[2+1].
I don't know whether you want to remove one or more numbers from this array, so I think this problem can be solved like this:
array A[] = [0,25,50,60,75,100];
sort the array (if needed).
create a new array B[] with B[i] = A[i+1] - A[i]
find the value that appears most often in B[]. This is our distance.
find i such that A[i+1] - A[i] != distance
find the smallest k such that A[i+k] - A[i] == distance
then remove A[i+1] through A[i+k-1]
I hope it is right.
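A minimal stdlib-Python sketch of the steps above, with an optional tolerance for the spacing (the function name remove_outliers and the tolerance parameter are illustrative). Like the steps above, it assumes the first value is not an outlier and that no regular positions are missing, and it finds the spacing by exact equality of gaps, so noisy floating-point data would need rounding first.

```python
from collections import Counter


def remove_outliers(values, tolerance=0):
    """Drop values that break the most common spacing between neighbours."""
    values = sorted(values)
    if len(values) < 3:
        return values
    # B[i] = A[i+1] - A[i]; the most frequent gap is taken as the spacing
    gaps = [b - a for a, b in zip(values, values[1:])]
    spacing = Counter(gaps).most_common(1)[0][0]
    kept = [values[0]]
    for value in values[1:]:
        # Keep a value only if it sits one spacing (within tolerance) after the
        # last kept value; anything closer or farther is skipped as an outlier
        if abs((value - kept[-1]) - spacing) <= tolerance:
            kept.append(value)
    return kept


print(remove_outliers([0, 25, 50, 60, 75, 100]))  # [0, 25, 50, 75, 100]
```

Comparing each value with the last kept value, rather than with its raw neighbour, is what lets the sketch skip several consecutive outliers, which is the same effect as removing A[i+1] through A[i+k-1].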

Python 3 - calculate total in if else function using for loop

If anybody can give me some hints to point me in the right direction so I can solve it myself that would be great.
I am trying to calculate the total and average income depending on the number of employees. Do I have to make another list, or iterate over the current list (list1), to solve it?
def get_input():
    Name = input("Enter a name: ")
    Hours = float(input("Enter hours worked: "))
    Rate = float(input("Enter hourly rate: "))
    return Name, Hours, Rate
def calc_pay(Hours, Rate):
    if Hours > 40:
        overtime = (40 * Rate) + (Hours - 40) * (Rate * 1.5)
        print(list1[0], "should be paid", overtime)
    else:
        no_overtime = (Hours * Rate)
        print(list1[0], "should be paid", no_overtime)
    return Hours, Rate
x = int(input("Enter the number of employees: "))
for i in range(x):
    list1 = list(get_input())
    calc_pay(list1[1], list1[2])
    i += 1
If you want to keep track of the total pay for all the employees, you probably need to make two major changes to your code.
The first is to change calc_pay to return the calculated pay amount instead of only printing it (the current return value is pretty useless, since the caller already has those values). You may want to skip printing in the function (since calculating the value and returning it is the function's main job) and let that get done by the caller, if necessary.
The second change is to add the pay values together in your top level code. You could either append the pay values to a list and add them up at the end (with sum), or you could just keep track of a running total and add each employee's pay to it after you compute it.
There are a few other minor things I'd probably change in your code if I were writing it, but they're not problems with its correctness, just style issues.
The first is variable names. Python has a style guide, PEP 8, that makes a number of suggestions about coding style. It's only an official rule for the Python code that's part of the standard library, but many other Python programmers use it loosely as a baseline style for all Python projects. It recommends using lowercase_names_with_underscores for most variable and function names, and reserving CapitalizedNames for classes. So I'd use name, hours and rate instead of the capitalized versions of those names. I'd also strongly recommend that you use meaningful names instead of generic names like x. Some short names like i and x can be useful in some situations (like coordinates and indexes), but I'd avoid using them for any non-generic purpose. You also don't seem to be using your i variable for anything useful, so it might make sense to rename it _, which signals that it's not going to be used. I'd use num_employees or something similar instead of x. The name list1 is also bad, but I suggest doing away with that list entirely below. Variable names with numbers in them are often a bad idea. If you're using a lot of numbered names together (e.g. list1, list2, list3, etc.), you should probably put your values in a single list (a list of lists) instead of the numbered variables. If you just have a few, they should have more specific names (e.g. employee_data instead of list1).
My second suggestion is about handling the return value from get_input. You can unpack the tuple of values returned by the function into separate variables, rather than putting them into a list. Just put the names separated by commas on the left side of the = operator:
name, hours, rate = get_input()
calc_pay(hours, rate)
My last minor suggestion is about avoiding repetition in your code. A well known programming suggestion is "Don't Repeat Yourself" (often abbreviated DRY), since repeated (especially copy/pasted) code is hard to modify later and sometimes harbors subtle bugs. Your calc_pay function has a repeated print line that could easily be moved outside of the if/else block so that it doesn't need to be repeated. Just have both branches of the conditional code write the computed pay to the same variable name (instead of different names) and then use that single variable in the print line (and a return line if you follow my suggested fix above for the main issue of your question).
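Putting these suggestions together, a minimal sketch might look like this: calc_pay returns the pay (computed into a single variable, so the print is no longer repeated), and the caller keeps a running total. To stay self-contained it uses a hard-coded, made-up list of (name, hours, rate) tuples instead of calling get_input; the constant names and the summarize helper are illustrative.

```python
OVERTIME_HOURS = 40        # hours paid at the normal rate
OVERTIME_MULTIPLIER = 1.5  # rate multiplier for hours beyond OVERTIME_HOURS


def calc_pay(hours, rate):
    """Return the pay for one employee (the caller decides what to print)."""
    if hours > OVERTIME_HOURS:
        pay = OVERTIME_HOURS * rate + (hours - OVERTIME_HOURS) * rate * OVERTIME_MULTIPLIER
    else:
        pay = hours * rate
    return pay


def summarize(employees):
    """Print each employee's pay and return (total, average) using a running total."""
    total = 0.0
    for name, hours, rate in employees:
        pay = calc_pay(hours, rate)
        print(name, "should be paid", pay)
        total += pay
    average = total / len(employees) if employees else 0.0
    return total, average


# Made-up sample data standing in for get_input()
employees = [("Ann", 45, 20.0), ("Bob", 30, 15.0)]
total, average = summarize(employees)
print("The total amount to be paid is $", format(total, ",.2f"), sep="")
print("The average employee is paid $", format(average, ",.2f"), sep="")
```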
Thanks for the help, everyone. Here is what worked (this assumes calc_pay was changed to return the computed pay, as suggested above):
payList = []
num_of_emps = int(input("Enter number of employees: "))
for i in range(num_of_emps):
    name, hours, rate = get_input()
    pay = calc_pay(hours, rate)
    payList.append(pay)
total = sum(payList)
avg = total / num_of_emps
print("The total amount to be paid is $", format(total, ",.2f"), sep="")
print("\nThe average employee is paid $", format(avg, ",.2f"), sep="")

MATLAB: fastest way to do a root-mean-squared error between a vector and array of vectors

I have a question regarding the fastest way to compute the RMSE between a single vector and an array of vectors. Specifically, I have a vector A representing a point and would like to find the index of the point in a list B that A is closest to. Right now I am using:
tempmat = bsxfun(@minus,A,B);
tempmat1 = sqrt(sum(tempmat.^2,2));
index = find(tempmat1 == min(tempmat1));
This takes about 0.058 seconds to calculate the index. Is there a faster way of doing this in MATLAB? I am performing this calculation literally millions of times.
Many thanks for reading,
Joe
tempmat = bsxfun(@minus,A,B);
tempmat1 = sum(tempmat.^2,2);
[m,index] = min(tempmat1);
m = sqrt(m); % optional, only if you need the actual numerical value
This avoids calculating sqrt on the whole array, since the minimum of the squared differences has the same index. It also uses the second output of min to avoid a second pass with find.
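The same two tricks (compare squared distances and take the square root only of the winner, and get the index in a single pass) carry over to other languages. A minimal stdlib-Python sketch, assuming points are given as equal-length sequences; nearest_index is a hypothetical name.

```python
import math


def nearest_index(a, points):
    """Return (index, distance) of the point in points closest to a."""
    def squared_distance(p):
        # Sum of squared coordinate differences; no sqrt needed for comparison
        return sum((x - y) ** 2 for x, y in zip(a, p))

    # Single pass: min over indices, keyed by squared distance (like [m, index] = min(...))
    index = min(range(len(points)), key=lambda i: squared_distance(points[i]))
    # Take the square root only once, for the winner
    return index, math.sqrt(squared_distance(points[index]))


print(nearest_index((0, 0), [(3, 4), (1, 1), (-2, 0)]))  # (1, 1.4142135623730951)
```

Like MATLAB's min, Python's min returns the first of several equally small candidates, so ties resolve to the lowest index.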
You'll probably find that
tempmat = A - B(ones(1, size(A,1)), :)
is faster than the bsxfun version, unless size(A,1) is exceptionally large.
This assumes that A is your array and B is your vector. The RSS calculation implies that you have row vectors.
Also, I presume you know that you're calculating the RSS, not the RMS.
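To make the distinction concrete, take RSS as the root of the sum of squares, which is what sqrt(sum(tempmat.^2,2)) in the question computes, with A the query point and B_i the i-th point of B, each having n coordinates:

```latex
\begin{aligned}
\mathrm{RSS}_i &= \sqrt{\sum_{j=1}^{n} \left(A_j - B_{i,j}\right)^2},
\qquad
\mathrm{RMS}_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n} \left(A_j - B_{i,j}\right)^2} = \frac{\mathrm{RSS}_i}{\sqrt{n}} \\
\arg\min_i \mathrm{RMS}_i &= \arg\min_i \mathrm{RSS}_i = \arg\min_i \sum_{j=1}^{n} \left(A_j - B_{i,j}\right)^2
\end{aligned}
```

Since n is the same for every point, all three give the same index, so for finding the closest point the distinction only matters if you need the actual error value.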
