Merge every x element in multiple lists to return new list - python-3.x

I'm writing a script that scrapes all of the data from my work's ticketing site. The end goal is to have it send a text when a new ticket enters the bucket, with all of the important info from the ticket.
Python 3.10
So far, it pulls from a scattered list and combines the elements into appropriate groups, i.e. ticket numbers, titles and priorities.
tn = rawTickets[0::14]
title = rawTickets[5::14]
priority = rawTickets[9::14]
With this I can say
num = x
wholeticket = tn[num], title[num], priority[num],
print(wholeticket)
and get ticket x in the list:
# Results: "tn0, title0, priority0"
I want it to print all of the available tickets in the list, based on a range:
totaltickets = 0
for lines in rawTickets:
    if lines == '':
        totaltickets += 1
numrange = range(totaltickets)
So let's say there are only 3 tickets in the queue.
I want it to print
tn0, title0, priority0,
tn1, title1, priority1,
tn2, title2, priority2,
But I want to avoid doing this:
ticket1 = tn[0], title[0], priority[0],
ticket2 = tn[1], title[1], priority[1],
ticket3 = tn[2], title[2], priority[2],
(The original post included a flowchart to help explain.)

You could use zip:
tickets = list(zip(rawTickets[0::14], rawTickets[5::14], rawTickets[9::14]))
This will give you a list of 3-tuples.
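A minimal sketch of the zip approach, assuming each ticket occupies 14 consecutive entries of rawTickets with the number, title and priority at offsets 0, 5 and 9 as in the question. The helper build_raw and the sample values are invented for illustration only.

```python
# Hypothetical helper: builds a flat list with 14 fields per ticket.
# Offsets 0, 5 and 9 follow the slicing used in the question.
def build_raw(n):
    raw = []
    for i in range(n):
        fields = [""] * 14
        fields[0] = f"tn{i}"
        fields[5] = f"title{i}"
        fields[9] = f"priority{i}"
        raw.extend(fields)
    return raw

rawTickets = build_raw(3)

# zip pairs up the i-th element of each slice into one tuple per ticket.
tickets = list(zip(rawTickets[0::14], rawTickets[5::14], rawTickets[9::14]))

for tn, title, priority in tickets:
    print(f"{tn}, {title}, {priority}")
```

Because zip stops at the shortest input, no separate ticket count or range is needed.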

You could do something like this:
l1 = [*range(0,5)]
l2 = [*range(5,10)]
l3 = [*range(10,15)]
all_lst = [(l1[i], l2[i], l3[i]) for i in range(len(l1))]
Or you could use zip, as trincot suggested.
Note that on large lists, zip is typically faster than indexing inside a comprehension.

Related

Is there any way to make this more efficient?

I have 24 more attempts to submit this task. I have spent hours on it and my brain does not work anymore. I am a beginner with Python; can you please help me figure out what is wrong? I would love to see the correct code if possible.
Here is the task itself, with the code I wrote below it.
You are given a content of CSV-file with information about set of trades. It contains the following columns:
TIME - Timestamp of a trade in format Hour:Minute:Second.Millisecond
PRICE - Price of one share
SIZE - Count of shares executed in this trade
EXCHANGE - The exchange that executed this trade
For each exchange, find the one-minute window during which the largest number of trades took place on this exchange.
Note that:
You need to send source code of your program.
You have only 25 attempts to submit a solution for this task.
You have access to all standard modules/packages/libraries of your language. But there is no access to additional libraries (numpy in python, boost in c++, etc).
Input format
Input contains several lines. You can read it from standard input or file “trades.csv”
Each line contains information about one trade: TIME, PRICE, SIZE and EXCHANGE. Numbers are separated by comma.
Lines are listed in ascending order of timestamps. Several lines can contain the same timestamp.
Size of input file does not exceed 5 MB.
See the example below to understand the exact input format.
Output format
If input contains information about k exchanges, print k lines to standard output.
Each line should contain a single number: the maximum number of trades during a one-minute window.
You should print answers for exchanges in lexicographical order of their names.
Sample
Input
09:30:01.034,36.99,100,V
09:30:55.000,37.08,205,V
09:30:55.554,36.90,54,V
09:30:55.556,36.91,99,D
09:31:01.033,36.94,100,D
09:31:01.034,36.95,900,V
Output
2
3
Notes
In the example four trades were executed on exchange “V” and two trades were executed on exchange “D”. Not all of the “V”-trades fit in one minute-window, so the answer for “V” is three.
X = []
with open('trades.csv', 'r') as tr:
    for line in tr:
        line = line.strip('\xef\xbb\xbf\r\n ')
        X.append(line.split(','))
dex = {}
for item in X:
    dex[item[3]] = []
for item in X:
    dex[item[3]].append(float(item[0][:2])*60.+float(item[0][3:5])+float(item[0][6:8])/60.+float(item[0][9:])/60000.)
for item in dex:
    count = 1
    ccount = 1
    if dex[item][len(dex[item])-1]-dex[item][0] <1:
        count = len(dex[item])
    else:
        for t in range(len(dex[item])-1):
            for tt in range(len(dex[item])-t-1):
                if dex[item][tt+t+1]-dex[item][t] <1:
                    ccount += 1
                else: break
            if ccount>count:
                count=ccount
            ccount=1
    print(count)
First of all, it is not necessary to use the datetime and csv modules for such a simple case (as in Ed-Ward's example).
If we remove the colons and dots from the time strings, they can be converted with int() directly, which is easier than the conversion in your example (the comparison below is only valid while the compared timestamps fall within the same hour).
CSV features like dialects and special formatting are not used here, so I suggest a simple split(",").
Now about efficiency. Efficiency here means time complexity.
The more times you go through your array of timestamps from beginning to end, the more expensive the algorithm becomes.
So our goal is to minimize the number of passes: ideally make a single pass over all rows, and especially avoid nested loops that re-scan collections from beginning to end.
For such a task it is better to use a deque instead of a tuple or list, because you can remove the first element with popleft() and add a last element with append(), each in O(1).
Just append each timestamp to the end of its exchange's queue. When the difference between the current and first elements reaches 1 minute or more, remove the first element with popleft() and continue. Because at most one element is removed per appended element, a queue never shrinks, so after the whole file is processed the length of each queue is the maximum number of trades in a one-minute window.
Example with linear time complexity O(n):
from collections import deque
ex_list = {}
s = open("trades.csv").read().replace(":", "").replace(".", "")
for line in s.splitlines():
    fields = line.split(",")
    curr_tm = int(fields[0])
    curr_ex = fields[3]
    if curr_ex not in ex_list:
        ex_list[curr_ex] = deque()
    ex_list[curr_ex].append(curr_tm)
    if curr_tm >= ex_list[curr_ex][0] + 100000:
        ex_list[curr_ex].popleft()
print("\n".join([str(len(ex_list[k])) for k in sorted(ex_list.keys())]))
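A variant of the same deque idea, sketched under the assumption that every timestamp has the form HH:MM:SS.mmm with exactly three millisecond digits. It converts each timestamp to milliseconds since midnight, so a window that spans an hour boundary (for example 09:59:30.000 to 10:00:10.000) is measured correctly. The function names to_ms and max_trades_per_minute are made up for this example.

```python
from collections import deque

def to_ms(stamp):
    """Convert 'HH:MM:SS.mmm' to milliseconds since midnight."""
    hms, ms = stamp.split(".")
    h, m, s = (int(part) for part in hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(ms)

def max_trades_per_minute(lines):
    windows = {}  # exchange name -> deque of trade times in ms
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        stamp, _price, _size, exchange = line.split(",")
        now = to_ms(stamp)
        window = windows.setdefault(exchange, deque())
        window.append(now)
        # Remove at most one old trade per new trade: the deque never
        # shrinks, so its final length is the largest window seen.
        if now - window[0] >= 60_000:
            window.popleft()
    # Exchanges in lexicographical order, as the task requires.
    return {name: len(window) for name, window in sorted(windows.items())}

sample = [
    "09:30:01.034,36.99,100,V",
    "09:30:55.000,37.08,205,V",
    "09:30:55.554,36.90,54,V",
    "09:30:55.556,36.91,99,D",
    "09:31:01.033,36.94,100,D",
    "09:31:01.034,36.95,900,V",
]
for count in max_trades_per_minute(sample).values():
    print(count)
```

On the sample from the task this prints 2 and then 3. To read the real input, the list can be replaced by an open file object, since the function only iterates over lines.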
This code should work:
import csv
import datetime
diff = datetime.timedelta(minutes=1)
def date_calc(start, dates):
    for i, date in enumerate(dates):
        if date >= start + diff:
            return i
    return i + 1
exchanges = {}
with open("trades.csv") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        this_exchange = row[3]
        if this_exchange not in exchanges:
            exchanges[this_exchange] = []
        time = datetime.datetime.strptime(row[0], "%H:%M:%S.%f")
        exchanges[this_exchange].append(time)
ex_max = {}
for name, dates in exchanges.items():
    ex_max[name] = 0
    for i, d in enumerate(dates):
        x = date_calc(d, dates[i:])
        if x > ex_max[name]:
            ex_max[name] = x
print('\n'.join([str(ex_max[k]) for k in sorted(ex_max.keys())]))
Output:
2
3
( obviously please check it for yourself before uploading it :) )
I think the issue with your current code is that you don't put the output in lexicographical order of their names...
If you want to use your current code, then here is a (hopefully) fixed version:
X = []
with open('trades.csv', 'r') as tr:
    for line in tr:
        line = line.strip('\xef\xbb\xbf\r\n ')
        X.append(line.split(','))
dex = {}
counts = []
for item in X:
    dex[item[3]] = []
for item in X:
    dex[item[3]].append(float(item[0][:2])*60.+float(item[0][3:5])+float(item[0][6:8])/60.+float(item[0][9:])/60000.)
for item in dex:
    count = 1
    ccount = 1
    if dex[item][len(dex[item])-1]-dex[item][0] <1:
        count = len(dex[item])
    else:
        for t in range(len(dex[item])-1):
            for tt in range(len(dex[item])-t-1):
                if dex[item][tt+t+1]-dex[item][t] <1:
                    ccount += 1
                else: break
            if ccount>count:
                count=ccount
            ccount=1
    counts.append((item, count))
counts.sort(key=lambda x: x[0])
print('\n'.join([str(x[1]) for x in counts]))
Output:
2
3
I do think you can make your life easier in the future by using Python's standard library, though :)

Identifying duplicate items in a list

I want to figure out how to identify any identical items in a list.
Currently, there is a list of people, and I want to first extract their surnames and put them in a separate list called list_surnames.
Then I want to loop through that list and find out whether any people share a surname, and if so, add that to the amount value.
This code currently does not identify the duplicates in that list.
I should say I am brand new to programming; I apologize if the code is horrible.
group = ["Jonas Hansen", "Bo Klaus Nilsen", "Ida Kari Lund Toftegaard", "Ole Hansen"]
amount = 0
list_surnames = []
for names in group:
    new_list = names.split(" ")
    extract_surname = new_list[-1:]
    for i in extract_surname:
        list_surnames.append(i)
for x in list_surnames:
    if x == list_surnames:
        amount += 1
print(list_surnames)
print(amount)
You can use collections.Counter to count how often each surname occurs:
from collections import Counter
l = ["Jonas Hansen", "Bo Klaus Nilsen", "Ida Kari Lund Toftegaard", "Ole Hansen"]
last = [names.split()[-1] for names in l]
print(last)
c = Counter(last)
print(c)
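Building on the Counter above, one possible way to get the amount value from the question is to keep only the surnames that occur more than once. This sketch assumes that amount means the number of distinct shared surnames; the variable names are illustrative.

```python
from collections import Counter

group = ["Jonas Hansen", "Bo Klaus Nilsen", "Ida Kari Lund Toftegaard", "Ole Hansen"]

# The surname is taken to be the last whitespace-separated word.
list_surnames = [name.split()[-1] for name in group]
counts = Counter(list_surnames)

# Keep only surnames shared by two or more people.
duplicates = {surname: n for surname, n in counts.items() if n > 1}
amount = len(duplicates)

print(duplicates)  # {'Hansen': 2}
print(amount)      # 1
```

If amount should instead count the people involved, sum(duplicates.values()) gives 2 for this list.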

extract just one element from list and write it into a csv as another name

I have a list:
IDs = ["111111111111", "222222222222"]
and I create a CSV file with this code:
for acc in IDs:
    with open("/tmp/test.csv", "a+") as f:
        test = csv.writer(f)
        test.writerow([IDs])
The result is:
{'111111111111', '222222222222'}
What I want to do is something like:
if IDs == "111111111111":
    IDs = "AccountA"
elif IDs == "222222222222":
    IDs = "AccountB"
Expected result in the CSV:
Account A
some information about account a i put later on it
Account B
some information about account a i put later on it
How can I achieve the result?
You could use a dictionary that holds all of the data. On the left side you have your input (the ID), and on the right side you have the value that you want to write. For this case, take a look at this dictionary:
data = {
    '111111111111':'AccountA',
    '222222222222':'AccountB'
}
Then loop over your list and build a new list that contains the mapped names from data.
new_ids = []
for x in IDs:
    new_ids.append(data[x])
Now you can use the new_ids list in your write function.
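A short sketch of how the mapping and the CSV writing could fit together. It writes to an in-memory buffer so it runs anywhere; the function name write_accounts and the fallback to the raw ID for unknown accounts are assumptions made for this example, not something from the question.

```python
import csv
import io

IDs = ["111111111111", "222222222222"]
data = {
    "111111111111": "AccountA",
    "222222222222": "AccountB",
}

def write_accounts(ids, names, handle):
    """Write one row per ID, using the mapped account name."""
    writer = csv.writer(handle)
    for acc in ids:
        # Assumption: unknown IDs are written unchanged.
        writer.writerow([names.get(acc, acc)])

buffer = io.StringIO()
write_accounts(IDs, data, buffer)
print(buffer.getvalue())
```

To write to a real file, pass a handle opened once outside the loop, for example open("/tmp/test.csv", "a", newline=""), rather than reopening the file for every ID.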
Hope it helps.
Sincerely, Chris Fowl.

Search the nth number of string in side the another list in python

add name, where name is a string denoting a contact name. This must be stored as a new contact in the application.
find partial, where partial is a string denoting a partial name to search the application for. It must count the number of contacts starting with partial and print the count on a new line.
Given sequential add and find operations, perform each operation in order.
Input:
4
add hack
add hackerrank
find hac
find hak
Sample Output
2
0
We perform the following sequence of operations:
1. Add a contact named hack.
2. Add a contact named hackerrank.
3. Find and print the number of contact names beginning with hac.
There are currently two contact names in the application
and both of them start with hac, so we print 2 on a new line.
4. Find and print the number of contact names beginning with hak.
There are currently two contact names in the application
but neither of them starts with hak, so we print 0 on a new line.
I solved it, but it takes a long time for a large number of strings. My code is:
addlist =[]
findlist=[]
n = int(input().strip())
for a0 in range(n):
    op, contact = input().strip().split(' ')
    if(op=='add'):
        addlist.append(contact)
    else:
        findlist.append(contact)
for item in findlist:
    count=0
    count=[count+1 for item2 in addlist if item in item2 if item==item2[0:len(item)]]
    print(sum(count))
Is there any other way to avoid the long computation time?
As far as optimizing goes, I broke your code apart a bit for readability and removed a redundant if condition. I'm not sure if it's possible to optimize any further.
addlist =[]
findlist=[]
n = int(input().strip())
for a0 in range(n):
    op, contact = input().strip().split(' ')
    if(op=='add'):
        addlist.append(contact)
    else:
        findlist.append(contact)
for item in findlist:
    count = 0
    for item2 in addlist:
        if item == item2[0:len(item)]:
            count += 1
    print(count)
I tested 10562 entries at once and it processed instantly, so if it lags for you, your processor may be the limiting factor.
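For larger inputs, a different technique (not the one in the answer above) is to count every prefix at add time, so that each find becomes a single dictionary lookup instead of a scan over all contacts. The sketch below assumes operations arrive as (op, text) pairs; run_operations is a made-up name, and the approach trades extra memory for speed.

```python
from collections import defaultdict

def run_operations(operations):
    """Process add/find operations in order; return the find results."""
    prefix_counts = defaultdict(int)  # prefix -> number of contacts with it
    results = []
    for op, text in operations:
        if op == "add":
            # Register every prefix of the new contact name.
            for end in range(1, len(text) + 1):
                prefix_counts[text[:end]] += 1
        elif op == "find":
            # .get avoids creating entries for prefixes never seen.
            results.append(prefix_counts.get(text, 0))
    return results

ops = [("add", "hack"), ("add", "hackerrank"), ("find", "hac"), ("find", "hak")]
print(run_operations(ops))  # [2, 0]
```

Unlike the code above, which answers every find only after all adds have been read, this version handles the operations sequentially, as the task statement asks.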

Sequential Search 2 Lists

For homework we were asked to write a function that takes 2 lists as input and uses a sequential/linear search to go through them; if any name appears in both lists, that name is appended to a new list. For the actual assignment two classes are specified, VoterList and VoterName, so we are not allowed to use 'in', and only VoterNames can be appended to a VoterList. (This task is going to be developed into finding people who voted twice at two different voting booths in an election.)
I have written a function that seems to work when I pass in lists of 3-4 people, but I'm not sure that it is actually a sequential search working as it should. Some advice would be awesome. Cheers
def fraud_detect_seq(first_booth_voters, second_booth_voters):
    fraud = []
    length_first = len(first_booth_voters)
    length_second = len(second_booth_voters)
    first_booth_position = 0
    second_booth_position = 0
    while first_booth_position < length_first:
        name_comparison = first_booth_voters[first_booth_position]
        if second_booth_position == length_second:
            first_booth_position += 1
            second_booth_position = 0
        elif second_booth_voters[second_booth_position] == name_comparison:
            fraud.append(second_booth_voters[second_booth_position])
            first_booth_position += 1
            second_booth_position += 1
        elif second_booth_voters[second_booth_position] != name_comparison:
            second_booth_position += 1
    print(fraud)
fraud_detect_seq(['Jackson', 'Dylan', 'Alice'],['Jackson', 'ylan', 'Alice'])
Gets the output:
['Jackson', 'Alice']
Which is correct. But I feel like I'm not doing it right.
def fraud_detect_seq(first_booth_voters, second_booth_voters):
    fraud = []
    for voter in first_booth_voters:
        if voter in second_booth_voters:
            fraud.append(voter)
    return fraud
This is a really simple way of checking whether names are in both lists. There isn't a wrong way to write the program, but since you're using Python you might as well use the most "pythonic" one. For loops combined with in are very useful in Python for checking membership in lists.
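Since the assignment rules out the in operator, here is a sketch of an explicit sequential search that restarts the scan of the second list for every name in the first. Plain Python lists stand in for the VoterList and VoterName classes, whose interfaces are not shown in the question. Restarting from position 0 matters: the while loop in the question carries second_booth_position forward after a match, so it can skip names when the two lists are ordered differently.

```python
def fraud_detect_seq(first_booth_voters, second_booth_voters):
    """Return names from the first list that also appear in the second."""
    fraud = []
    for i in range(len(first_booth_voters)):
        name = first_booth_voters[i]
        position = 0  # restart the linear scan for every name
        while position < len(second_booth_voters):
            if second_booth_voters[position] == name:
                fraud.append(name)
                break  # stop at the first match
            position += 1
    return fraud

print(fraud_detect_seq(['Jackson', 'Dylan', 'Alice'], ['Jackson', 'ylan', 'Alice']))
# ['Jackson', 'Alice']
```

Each name costs at most one pass over the second list, so the whole search is O(n * m) comparisons, which is what a sequential search is expected to do.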
