The code below is a slightly simplified version of code that reads a page with Selenium and stores the page's address in a variable depending on a condition on the page. (The original is too long, so I've cut some of it.)
a = ["70", "80", "90", "100", "110"]
b = [112, 1513, 14, 505, 36]
c = ["url_1", "url_2", "url_3", "url_4", "url_5"]
last = []
num = 0
for l in a:
    if 110 == int(l):
        last.insert(0, c[num])
    elif 100 == int(l):
        last.append(c[num])
    elif 90 == int(l):
        last.append(c[num])
    elif 80 == int(l):
        last.append(c[num])
    elif 70 == int(l):
        last.append(c[num])
    num += 1
print(last)
Originally, the idea was to put the elements of c into last according to the magnitude of the corresponding elements of a.
But I found that I also need to order the contents by the elements of list b.
What I want is to sort the elements of a in numerical order and, at the same time, sort entries with equal a values by the corresponding elements of b, largest first.
For example
a = ["70", "100", "90", "100", "100"]
b = [112, 1513, 14, 505, 36]
c = ["url_1", "url_2", "url_3", "url_4", "url_5"]
For example, when classifying the '100' entries in a above, the sizes of the numbers at the same indexes in b are also considered, and those entries are ranked by that size. The elements of c at the same indexes are then put into last in that order, so finally:
last = ["url_2", "url_4", "url_5", "url_3", "url_1"]
I want to build the list in this order. I've been failing at it all day. Help!
You can do this using built-ins:
>>> a = ["70", "100", "90", "100", "100"]
>>> b = [112, 1513, 14, 505, 36]
>>> c = ["url_1", "url_2", "url_3", "url_4", "url_5"]
>>>
>>> sorted(zip(map(int, a), b, c), key=lambda x: x[:2], reverse=True)
[(100, 1513, 'url_2'), (100, 505, 'url_4'), (100, 36, 'url_5'), (90, 14, 'url_3'), (70, 112, 'url_1')]
Then if you want to extract only the "urls":
>>> x = sorted(zip(map(int, a), b, c), key=lambda x: x[:2], reverse=True)
>>> [i[2] for i in x]
['url_2', 'url_4', 'url_5', 'url_3', 'url_1']
The way sorted() works is that it compares values element by element, index for index. That means you can have it sort based on multiple "columns" of each sliced value.
You customize that sorting via the key keyword: you can pass it a function (or, as I did above, a lambda).
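For instance, the lambda above could equally be written as a named key function (a small sketch; by_first_two is just an illustrative name):
def by_first_two(row):
    # each row is (int_value, b_value, url); compare on the first two fields
    return row[:2]

x = sorted(zip(map(int, a), b, c), key=by_first_two, reverse=True)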
Use pandas dataframe functions:
a = ["70", "100", "90", "100", "100"]
b = [112, 1513, 14, 505, 36]
c = ["url_1", "url_2", "url_3", "url_4", "url_5"]
import pandas as pd

df = pd.DataFrame([int(i) for i in a], columns=['a'])
df['b'] = b
df['c'] = c
df.sort_values(by=['a', 'b'], ascending=False, inplace=True)
print(df['c'].to_list())
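If you prefer, the same frame can be built and sorted in one step (an equivalent sketch, not a different method):
df = pd.DataFrame({'a': [int(i) for i in a], 'b': b, 'c': c})
print(df.sort_values(['a', 'b'], ascending=False)['c'].to_list())
# ['url_2', 'url_4', 'url_5', 'url_3', 'url_1']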
In my dataset, I am trying to get the margin between two values. The code below runs fine as long as the fourth race is not included. After grouping on a column, there will sometimes be only one value in a group, so there is no second value to compute a margin from. I want to ignore such groups. Here is my current code:
import pandas as pd
data = {'Name': ['A', 'B', 'B', 'C', 'A', 'C', 'A'],
        'RaceNumber': [1, 1, 2, 2, 3, 3, 4],
        'PlaceWon': ['First', 'Second', 'First', 'Second', 'First', 'Second', 'First'],
        'TimeRanInSec': [100, 98, 66, 60, 75, 70, 75]}
df = pd.DataFrame(data)
print(df)
def winning_margin(times):
    times = list(times)
    winner = min(times)
    times.remove(winner)
    return min(times) - winner
winning_margins = df[['RaceNumber', 'TimeRanInSec']] \
.groupby('RaceNumber').agg(winning_margin)
winning_margins.columns = ['margin']
winners = df.loc[df.PlaceWon == 'First', :]
winners = winners.join(winning_margins, on='RaceNumber')
avg_margins = winners[['Name', 'margin']].groupby('Name').mean()
avg_margins
How about returning a NaN if times does not have enough elements:
import numpy as np
def winning_margin(times):
    if len(times) <= 1:  # New code
        return np.NaN    # New code
    times = list(times)
    winner = min(times)
    times.remove(winner)
    return min(times) - winner
Your code runs with this change and seems to produce sensible results. You can also remove the NaNs later if you want, e.g. in this line:
winning_margins = df[['RaceNumber', 'TimeRanInSec']] \
.groupby('RaceNumber').agg(winning_margin).dropna() # note the addition of .dropna()
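With the sample data, the final aggregation should then give one mean margin per winning runner (a quick check, assuming pandas' default NaN-skipping mean):
print(avg_margins)
#       margin
# Name
# A        3.5
# B        6.0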
You could get the winner and margin in one step:
def get_margin(x):
    if len(x) < 2:
        return np.NaN
    i = x['TimeRanInSec'].idxmin()
    nl = x['TimeRanInSec'].nsmallest(2)
    margin = nl.max() - nl.min()
    return [x['Name'].loc[i], margin]
Then:
df.groupby('RaceNumber').apply(get_margin).dropna()
RaceNumber
1 [B, 2]
2 [C, 6]
3 [C, 5]
(Note that in the sample data, the 'First' indicator sometimes corresponds to the slower time.)
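If you would rather end up with named columns instead of lists, a variant returning a Series could look like this (a sketch; get_margin_series is a hypothetical name):
def get_margin_series(x):
    if len(x) < 2:
        return pd.Series({'winner': np.NaN, 'margin': np.NaN})
    nl = x['TimeRanInSec'].nsmallest(2)   # the two fastest times in the race
    winner = x['Name'].loc[nl.idxmin()]   # name of the fastest runner
    return pd.Series({'winner': winner, 'margin': nl.max() - nl.min()})

df.groupby('RaceNumber').apply(get_margin_series).dropna()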
I have a list of items. I want to replace the first item with the product of the remaining items (all items except the first), and do the same for every other position. How can I do that?
lst = [2,3,5,4,7]
The output should be:
New_lst = [420,280,168,210,120]
First get the product:
>>> import math
>>> p = math.prod([2,3,5,4,7])
>>> p
840
Then divide the product by each number (floor division // is exact here, since every element divides the total product):
>>> lst = [2,3,5,4,7]
>>> New_lst = [p//i for i in lst]
>>> New_lst
[420, 280, 168, 210, 120]
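Note that this division trick assumes no element is zero. If zeros can occur, a prefix/suffix-product pass avoids division entirely (a sketch, separate from the answer above):
def product_except_self(lst):
    n = len(lst)
    out = [1] * n
    left = 1
    for i in range(n):                # out[i] = product of everything left of i
        out[i] = left
        left *= lst[i]
    right = 1
    for i in range(n - 1, -1, -1):    # fold in product of everything right of i
        out[i] *= right
        right *= lst[i]
    return out

print(product_except_self([2, 3, 5, 4, 7]))  # [420, 280, 168, 210, 120]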
I have an input.txt file in the format below.
A = Xyz
B
Value:274:53:3
C = 1190
E
WQQQW
Value:554
A = UrR
B
Value:113:00:1
C = 34
E
WQQQW
Value:982
I'd like to store the data related to A, B and E in a dictionary, in order to get:
d = {
    'A': ['Xyz', 'UrR'],
    'B': ['274:53:3', '113:00:1'],
    'E': ['554', '982'],
}
I'm doing it like below, but I'm only storing the key/value pairs related to A, since the values for A are on the same line.
d = {"A":[],"B":[],"E":[]}
for line in open('input.txt'):
    lst_line = line.replace(":", "=", 1).split("=")
    if ("A" or "B" or "E") in lst_line[0]:
        k = lst_line[0].strip()
        v = lst_line[1].replace("\n", "").strip()
        d[k].append(v)
>>> d
{'A': ['Xyz', 'UrR'], 'B': [], 'E': []}
I'm stuck on how to store the values for B, which appear one line below on a Value: line, and for E, which appear two lines below.
Each key follows its own specific logic, which can be split into independent if conditions. The code below reads the value for each key based on the layout described in the question.
d = {"A": [], "B": [], "E": []}
with open("input.txt") as file:
    while True:
        line = file.readline()  # read next line
        if not line:
            break  # stop at end of file
        lst_line = line.replace(":", "=", 1).split("=")  # key from line
        if "A" in lst_line[0]:
            k = lst_line[0].strip()
            v = lst_line[1].replace("\n", "").strip()
            d[k].append(v)
        if "B" in lst_line[0]:
            k = lst_line[0].strip()
            line = file.readline()  # for B, the value is on the next line
            lst_line = line.replace(":", "=", 1).split("=")  # get value for B
            v = lst_line[1].replace("\n", "").strip()
            d[k].append(v)
        if "E" in lst_line[0]:
            k = lst_line[0].strip()
            file.readline()  # skip the junk line
            line = file.readline()  # for E, the value is two lines below
            lst_line = line.replace(":", "=", 1).split("=")  # get value for E
            v = lst_line[1].replace("\n", "").strip()
            d[k].append(v)
print(d)
Output:
{'A': ['Xyz', 'UrR'], 'B': ['274:53:3', '113:00:1'], 'E': ['554', '982']}
Here is how you can use regex:
import re

with open('input.txt', 'r') as f:
    text = f.read()

dct = {'A': re.findall(r'(?<=A = ).*', text),
       'B': re.findall(r'\d\d\d:\d\d:\d', text),
       'E': re.findall(r'(?<=Value:)\d\d\d(?!:)', text)}
print(dct)
Output:
{'A': ['Xyz', 'UrR'],
'B': ['274:53:3', '113:00:1'],
'E': ['554', '982']}
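These patterns are tied to the exact shapes in the sample (three-digit groups, a fixed field layout). If the real file varies, a looser variant under the same layout assumptions might be:
dct = {'A': re.findall(r'(?<=A = )\S+', text),
       'B': re.findall(r'\d+:\d+:\d+', text),
       'E': re.findall(r'(?<=Value:)\d+$', text, re.MULTILINE)}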
I need to get every 3rd value out of a list and add it to a new list.
This is what I have so far.
def make_reduced_samples(original_samples, skip):
    skipped_list = []
    for count in range(0, len(original_samples), skip):
        skipped_list.append(count)
    return skipped_list
skip is equal to 3
I get the indexes rather than the values of the numbers in the list: it gives me [0, 3, 6], which are the indexes, not the values at those indexes.
The example I am given is:
In this list [12,87,234,34,98,11,9,72], you should get [12,34,9].
I cannot use skipped_list = original_samples[::3] in any way.
You need to append the value of the original_samples list at that index, not the index (count) itself.
def make_reduced_samples(original_samples, skip):
    skipped_list = []
    for count in range(0, len(original_samples), skip):
        skipped_list.append(original_samples[count])
    return skipped_list
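For the sample input this now returns the values rather than the indices:
print(make_reduced_samples([12, 87, 234, 34, 98, 11, 9, 72], 3))  # [12, 34, 9]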
The correct, most pythonic, and most efficient way to do that is to use slicing.
lst = [12, 87, 234, 34, 98, 11, 9, 72]
skipped_list = lst[::3]
print(skipped_list) # [12, 34, 9]
If the wanted indices do not follow a regular step (here they do), you could use a list comprehension with enumerate and filter on the index.
skipped_list = [x for i, x in enumerate(lst) if i % 3 == 0]
print(skipped_list) # [12, 34, 9]
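The same shape works with any predicate on the index, for example keeping only perfect-square positions (purely illustrative):
import math
squares = [x for i, x in enumerate(lst) if math.isqrt(i) ** 2 == i]
print(squares)  # indices 0, 1, 4 -> [12, 87, 98]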
One-liner:
skipped_list = [x for i, x in enumerate(original_samples) if i % 3 == 0]  # [12, 34, 9]
I have written code which works but takes a very long time (~8 hrs) to finish execution.
I'm wondering if it can be optimized to execute quicker.
The aim is to group a lot of items by their (x, y, z) coordinates, based on their distance to one another. For example,
I would like to group them within a distance of +-0.5 in x, +-0.5 in y and +-0.5 in z; the output for the data below would then be [(0,3), (1), (2,4), ...].
        x     y     z
0  1000.1  20.2  93.1
1   647.7  91.7  87.7
2   941.2  44.3  50.6
3  1000.3  20.3  92.9
4   941.6  44.1  50.6
...
What I have done (and which works) is described below.
It compares the first row of data_frame with the 2nd, 3rd, 4th, ... until the end. For each row, if the x, y and z distances are each within +-0.5, the index is added to a list, group; otherwise it compares the next row, until the end of the loop.
After each pass, the indexes that matched (stored in group) are added to another list, groups, as a set and removed from the original list a; then the next a[0] is compared, and so on.
groups = []
group = []
data = [(x,y,z),(x,y,z),(etc)] # > 50,000 entries
data_frame = pd.DataFrame(data, columns=['x','y','z'])
a = list(i for i in range(len(data_frame)))
threshold = 0.5
for j in range(len(a) - 1):
    if len(a) > 0:
        group.append(a[0])
        for ii in range(a[0], len(data_frame) - 1):
            if ((data_frame.loc[a[0], 'x'] - data_frame.loc[ii, 'x']) < threshold) and ((data_frame.loc[a[0], 'y'] - data_frame.loc[ii, 'y']) < threshold) and ((data_frame.loc[a[0], 'z'] - data_frame.loc[ii, 'z']) < threshold):
                group.append(ii)
            else:
                continue
        groups.append(set(group))
        for iii in group:
            if iii in a:
                a.remove(iii)
            else:
                continue
        group = []
    else:
        break
which returns something like this, for example:
groups = [{0}, {1, 69}, {2, 70}, {3, 67}, {4}, {5}, {6}, {7, 9}, {8}, {10}, {11}, {12}, {13}, {14, 73}, {15}, {16}, {17, 21, 74}, {18, 20}, {19}, {22, 23}]
I have made many edits to this question as it was not very clear. Hopefully it makes sense now.
Below is an attempt using better, O(N log N) logic, which is much faster but doesn't return the correct answer. I used the same +-0.5 threshold for x, y and z.
Edit:
test_list = [(i,x,y,z), ... , (i,x,y,z)]
df3 = sorted(test_list, key=lambda x: x[1])
result = []
while df3:
    if len(df3) > 1:  # added this because it was crashing at the end of the loop
        a = df3.pop(0)
        alist = [a[0]]
        while ((abs(a[1] - df3[0][1]) < 0.5) and (abs(a[2] - df3[0][2]) < 0.5) and (abs(a[3] - df3[0][3]) < 0.5)):
            alist.append(df3.pop(0)[0])
            if df3:
                continue
            else:
                break
        result.append(alist)
    else:
        result.append(a[0])
        break
Since you compare each data point with every other one, your implementation has a worst-case time complexity of O(N^2). A better way is to sort first.
import random
df = [i for i in range(100)]
random.shuffle(df)
df2 = [(i,x) for i,x in enumerate(df)]
df3 = sorted(df2,key=lambda x: x[1])
df3
[(31, 0), (24, 1), (83, 2)......
Assume now that you want to group numbers within +5/-5 of each other into one list. You can then slice the numbers into lists based on a condition.
result = []
while df3:
    a = df3.pop(0)
    alist = [a[0]]
    # consume following items while they are within +5 of the current anchor;
    # the df3 guard avoids indexing an empty list on the last element
    while df3 and a[1] + 5 >= df3[0][1]:
        alist.append(df3.pop(0)[0])
    result.append(alist)
result
[[31, 24, 83, 58, 82, 35], [0, 65, 77, 41, 67, 56].......
Sorting takes O(N log N) and the grouping pass is linear, so this is much faster than the quadratic approach.
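For the full 3D problem, sorting on a single axis cannot by itself guarantee correct groups; a spatial index handles the neighbour search directly. A sketch using scipy's cKDTree (assuming scipy is available; the points are the sample rows from the question):
import numpy as np
from scipy.spatial import cKDTree

pts = np.array([(1000.1, 20.2, 93.1), (647.7, 91.7, 87.7),
                (941.2, 44.3, 50.6), (1000.3, 20.3, 92.9),
                (941.6, 44.1, 50.6)])
tree = cKDTree(pts)
# pairs whose Chebyshev distance is below 0.5, i.e. within +-0.5 on every axis
pairs = tree.query_pairs(r=0.5, p=np.inf)
print(pairs)  # {(0, 3), (2, 4)}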