Count how many strings in a list start with a given string in Python - python-3.x

add name, where name is a string denoting a contact name. This must store name as a new contact in the application.
find partial, where partial is a string denoting a partial name to search the application for. It must count the number of contacts starting with partial and print the count on a new line.
Given sequential add and find operations, perform each operation in order.
Input:
4
add hack
add hackerrank
find hac
find hak
Sample Output
2
0
We perform the following sequence of operations:
1. Add a contact named hack.
2. Add a contact named hackerrank.
3. Find and print the number of contact names beginning with hac. There are currently two contact names in the application and both of them start with hac, so we print 2 on a new line.
4. Find and print the number of contact names beginning with hak. There are currently two contact names in the application but neither of them starts with hak, so we print 0 on a new line.
I solved it, but it takes a long time for a large number of strings. My code is:
addlist = []
findlist = []
n = int(input().strip())
for a0 in range(n):
    op, contact = input().strip().split(' ')
    if op == 'add':
        addlist.append(contact)
    else:
        findlist.append(contact)
for item in findlist:
    count = 0
    count = [count+1 for item2 in addlist if item in item2 if item == item2[0:len(item)]]
    print(sum(count))
Is there any other way to avoid the long computation time?

As far as optimizing goes, I broke your code apart a bit for readability and removed a redundant if statement. I'm not sure if it's possible to optimize any further.
addlist = []
findlist = []
n = int(input().strip())
for a0 in range(n):
    op, contact = input().strip().split(' ')
    if op == 'add':
        addlist.append(contact)
    else:
        findlist.append(contact)
for item in findlist:
    count = 0
    for item2 in addlist:
        if item == item2[0:len(item)]:
            count += 1
    print(count)
I tested 10562 entries at once and it processed instantly, so if it is slow for you, your processor may be the cause.
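Both versions above compare every find against every stored contact, so the work grows with the number of adds times the number of finds. One alternative, not taken from the answer above, is to count every prefix of a contact when it is added, so that each find becomes a single dictionary lookup. The sketch below is only an illustration: the function name run_operations and the list-of-pairs input are assumptions made here so the example can run without reading from standard input. It also answers each find at the point where it occurs, which seems to be what "perform each operation in order" asks for.

```python
from collections import defaultdict

def run_operations(operations):
    """Process (op, value) pairs in order; return one count per 'find'."""
    prefix_counts = defaultdict(int)  # prefix -> number of contacts starting with it
    results = []
    for op, value in operations:
        if op == "add":
            # Register every prefix of the new contact once, at insert time.
            for end in range(1, len(value) + 1):
                prefix_counts[value[:end]] += 1
        else:  # "find"
            # .get() avoids creating an entry for a prefix that was never added
            results.append(prefix_counts.get(value, 0))
    return results

ops = [("add", "hack"), ("add", "hackerrank"), ("find", "hac"), ("find", "hak")]
print(run_operations(ops))  # [2, 0]
```

The trade-off is memory: a contact of length k stores k prefixes, which is usually acceptable for short names.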

Related

How do I count multiple lines in a list?

I'm a very new Python user. My project is to take a very long (20k lines) file that lists movies and the actors in them and refine it. I'm trying to find out which of the movies listed has the highest number of actors.
I'm not sure how to do multiple counts on a single file.
This is the file that starts the project. It repeats like that with different movie titles for 20k lines. [Pic of original file] The first part of the project is to build a list containing every movie's full cast, which is what the code below does. Now I'm trying to get the program to count how many actors are in each movie and print which one has the most.
lines_seen = list()
fhand = open...
#opens but I don't want to show address
actors = list()
titles = list()
is_Actor = True
for line in fhand:
    line = line.rstrip()
    if (is_Actor):
        titles.append(line)
        if line not in lines_seen:
            lines_seen.append("The title of the movie is:")
            lines_seen.append(line)
            print(" ")
            print("The title of the movie is '", line, "'")
            print("The actors in the movie are:")
    elif not (is_Actor):
        lines_seen.append(line)
        print(line)
        actors.append(line)
    is_Actor = not(is_Actor)
fhand.close()
Here's what I've done so far:
actors = dict()
is_Title = True
for line in fhand:
    words = line.split()
    if (is_Title):
        if line not in actors:
            actors[line] = 1
        else:
            actors[line] = actors[line] + 1
    is_Title = not is_Title
Now I'm trying to get it to return the highest value. I've googled it and it tells me to use max() but that returns a value of 97 when I know the highest value is 207. What do I do from here?
Recommendation #1: Make yourself a small chunk of data that you can experiment with and read/print results. It will be 55x (my estimate) easier to troubleshoot than 20k lines. Maybe 2 movies, 1 with 2 actors, 1 with 1 actor.
Are you familiar with python dictionaries? It seems what you want to do is associate a list of actors with a movie title. Then you can inspect the sizes of the lists in the dictionary to find the one with the highest length.
In basic Python, you should ...
make an empty dictionary outside of your loop to hold the results, as you are doing with actors, etc.
start reading the file. It seems like your data is in a predictable pattern that the title is followed by a single actor name, so if you want to keep your current reading construct (an alternate would be to read 2 lines each pass through a different loop) you need to "hold onto the movie title" until the next loop to get the actor, so in pseudocode you could modify your loop to something like:
title = None
is_actor = False
for line in fhand:
    if not is_actor:  # you have a title...
        title = line
    else:  # you have an actor
        # get the list from the dictionary for the current title, or make a new list if no entry yet
        # add the actor to the list
        # put the list back into the dictionary
    is_actor = not is_actor
Then inspect your dictionary and manipulate it as needed
For a primer on dictionaries (and other introductory concepts) I strongly recommend Think Python. See the whole chapter on dictionaries.
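The pseudocode above could be filled in roughly as follows. This is a minimal sketch that assumes the file strictly alternates one title line and one actor line, as described in the answer; the function name cast_by_title and the sample data are made up for illustration. Note that max() over a dictionary iterates its keys, so a key function is needed to rank the titles by cast size.

```python
def cast_by_title(lines):
    """Group actors by title, assuming lines alternate: title, actor, title, actor, ..."""
    movies = {}
    title = None
    is_actor = False
    for line in lines:
        line = line.rstrip()
        if not is_actor:  # this line is a title
            title = line
        else:  # this line is an actor for the title seen just before
            movies.setdefault(title, []).append(line)
        is_actor = not is_actor
    return movies

# Small hand-made sample, as suggested in Recommendation #1
sample = ["Movie A", "Actor 1", "Movie A", "Actor 2", "Movie B", "Actor 3"]
movies = cast_by_title(sample)

# Pick the title whose cast list is longest
biggest = max(movies, key=lambda t: len(movies[t]))
print(biggest, len(movies[biggest]))  # Movie A 2
```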

In Python, how to formulate multiple if statements in one so conditions and actions are taken iteratively from 2 lists (conditions, actions)?

I have a folder with many images. I want to rename all those image files with consecutive numbers (I am using enumerate). I also want each new name to start (before the number) with a different prefix depending on the number assigned by enumerate. It is easier to explain with an example:
Imagine you need to photograph all items in 4 cabinets. So:
from the first cabinet you took 20 photos (because there were 20 items),
from the 2nd cabinet you took 15 photos,
from the 3rd cabinet you took 25 photos,
from the 4th cabinet you took 10 photos.
So you end up with a total of 70 photos. You want to rename them with consecutive numbers starting from 101 (or any other number), so in this case you will have new names from 101 to 170. But you also want to keep the cabinet they came from as part of the new name. So in the end you want all the images renamed as something like this:
cabA_0101,... cabA_0120, cabB_0121,... cabB_0135, cabC_0136,... cabC_0160, cabD_0161,... cabD_0170
I have accomplished it with the code below. However, as you can see, I had to include an if-statement for each cabinet (cabA, cabB, cabC, cabD) to assign its prefix to the new name. Although not relevant, notice that I have also kept the original image name (assigned automatically by my camera) at the end of the new name. The else-statement also includes a warning in case there is a mistake in the numbering.
import glob
import os
datadir = "myDirPath"
targetdata = os.path.join(datadir, "*.jpg")
# Sorting files by name because this code needs them sorted like that
sorted_files = sorted(glob.glob(targetdata))
# First item number in each cabinet (last one is last item number of last cabinet):
cabNUM_1 = 101
cabNUM_2 = 121
cabNUM_3 = 136
cabNUM_4 = 161
cabNUM_last = 170
# For-Loop to Create New Name
for i, f in enumerate(sorted_files, 101):  # numbering starting from 101
    aisNAME = ["none"]
    try:
        head, tail = os.path.split(f)
        # Assigning cabinet prefix according to numbering
        if cabNUM_1 <= i < cabNUM_2:
            aisNAME[0] = "cabA"
        elif cabNUM_2 <= i < cabNUM_3:
            aisNAME[0] = "cabB"
        elif cabNUM_3 <= i < cabNUM_4:  # strict upper bound so that 161 falls through to cabD
            aisNAME[0] = "cabC"
        elif cabNUM_4 <= i <= cabNUM_last:
            aisNAME[0] = "cabD"
        else:
            aisNAME[0] = "WRONG_Numbering__"
            print("ERROR - Drawer # assigned: ", i, "- OUT OF RANGE - Check Parameters")
        os.rename(f, os.path.join(head, aisNAME[0] + '_' + str(i).zfill(4) + '___Original-File-Name_' + tail))
    except OSError:
        print('Invalid operation')
This code will be used constantly, but the situation will be different each time: a different number of cabinets and of items per cabinet. I want the code to be more general so it can handle all those changes with minimal input. Otherwise I will have to adapt it each time, creating more or fewer elif-statements with their corresponding parameters.
My idea is to find a way to formulate one single if-statement that iterates through 2 lists, one with the actions (cabinet prefixes) and the other with the conditions (the first item number in each of those cabinets, including, of course, the very last item number in the last cabinet). Using the previous example, the 2 lists would be:
cabNAME = ["cabA", "cabB", "cabC", "cabD"]
cabNUM = [101, 121, 136, 161, 170]
The general if-statement would assign the first element of the cabNAME list as the prefix if the number being assigned (i) is between the first and second elements of the cabNUM list, then move to the second element of cabNAME if the number is between the second and third numbers in cabNUM, and so on. I assume the final else-statement might have to be specified separately on a different line (... or not?), since it is a safety measure in case any number is out of range or unexpected.
I have tried several ways to write and include this general if-statement (including writing a for-loop)... but so far haven't been successful.
Any help will be greatly appreciated!
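One possible way to express the general lookup described in the question is a binary search over the sorted start numbers, using bisect from the standard library. This is only a sketch under the question's own conventions: cabNUM holds the first number of each cabinet followed by the last number of the last cabinet. The helper name cabinet_prefix is hypothetical, and the renaming step itself is left out.

```python
from bisect import bisect_right

def cabinet_prefix(i, names, starts, last):
    """Return the prefix whose number range contains i, or None if i is out of range."""
    if i < starts[0] or i > last:
        return None  # plays the role of the final else-branch
    # bisect_right counts how many start numbers are <= i;
    # subtracting 1 gives the index of the cabinet that i falls in.
    return names[bisect_right(starts, i) - 1]

cabNAME = ["cabA", "cabB", "cabC", "cabD"]
cabNUM = [101, 121, 136, 161, 170]
starts, last = cabNUM[:-1], cabNUM[-1]

for i in (101, 120, 121, 160, 161, 170, 171):
    print(i, cabinet_prefix(i, cabNAME, starts, last))
```

With this shape, adding or removing a cabinet only means editing the two lists. The out-of-range check stays separate, which matches the guess in the question that the else-case needs its own line.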

Is there any way to make this more efficient?

I have 24 more attempts to submit this task. I have spent hours on it and my brain does not work anymore. I am a beginner with Python; can you please help me figure out what is wrong? I would love to see the correct code if possible.
Here is the task itself and the code I wrote below.
Note that you can have access to all standard modules/packages/libraries of your language. But there is no access to additional libraries (numpy in python, boost in c++, etc).
You are given a content of CSV-file with information about set of trades. It contains the following columns:
TIME - Timestamp of a trade in format Hour:Minute:Second.Millisecond
PRICE - Price of one share
SIZE - Count of shares executed in this trade
EXCHANGE - The exchange that executed this trade
For each exchange find the one minute-window during which the largest number of trades took place on this exchange.
Note that:
You need to send source code of your program.
You have only 25 attempts to submit a solution for this task.
You have access to all standard modules/packages/libraries of your language. But there is no access to additional libraries (numpy in python, boost in c++, etc).
Input format
Input contains several lines. You can read it from standard input or file “trades.csv”
Each line contains information about one trade: TIME, PRICE, SIZE and EXCHANGE. Numbers are separated by comma.
Lines are listed in ascending order of timestamps. Several lines can contain the same timestamp.
Size of input file does not exceed 5 MB.
See the example below to understand the exact input format.
Output format
If input contains information about k exchanges, print k lines to standard output.
Each line should contain the only number — maximum number of trades during one minute-window.
You should print answers for exchanges in lexicographical order of their names.
Sample
Input
09:30:01.034,36.99,100,V
09:30:55.000,37.08,205,V
09:30:55.554,36.90,54,V
09:30:55.556,36.91,99,D
09:31:01.033,36.94,100,D
09:31:01.034,36.95,900,V
Output
2
3
Notes
In the example four trades were executed on exchange “V” and two trades were executed on exchange “D”. Not all of the “V”-trades fit in one minute-window, so the answer for “V” is three.
X = []
with open('trades.csv', 'r') as tr:
    for line in tr:
        line = line.strip('\xef\xbb\xbf\r\n ')
        X.append(line.split(','))
dex = {}
for item in X:
    dex[item[3]] = []
for item in X:
    dex[item[3]].append(float(item[0][:2])*60.+float(item[0][3:5])+float(item[0][6:8])/60.+float(item[0][9:])/60000.)
for item in dex:
    count = 1
    ccount = 1
    if dex[item][len(dex[item])-1]-dex[item][0] <1:
        count = len(dex[item])
    else:
        for t in range(len(dex[item])-1):
            for tt in range(len(dex[item])-t-1):
                if dex[item][tt+t+1]-dex[item][t] <1:
                    ccount += 1
                else: break
            if ccount>count:
                count=ccount
            ccount=1
    print(count)
First of all, it is not necessary to use the datetime and csv modules for such a simple case (as in Ed-Ward's example).
If we remove the colons and dots from the time strings, they can be converted with int() directly, which is simpler than the conversion in your example. Adding 100000 to such a number then moves it one minute ahead, as long as the minute does not roll over into the next hour.
CSV features such as dialects and special formatting are not used here, so I suggest a simple split(",").
Now about efficiency. Efficiency here means time complexity.
The more times you go through your array of timestamps from beginning to end, the more work the algorithm does.
So our goal is to minimize the number of passes: ideally a single pass over all rows, avoiding nested loops that walk the collections from beginning to end.
For such a task it is better to use a deque instead of a tuple or list, because you can remove the first element with popleft() and add the last element with append(), both in O(1).
For each trade, append its time to the end of that exchange's queue. If the difference between the current and the first element has reached 1 minute, remove the first element with popleft(), then continue. The queue never shrinks, so once the whole file is processed, the length of each queue is the largest one-minute window.
Example with linear time complexity O(n):
from collections import deque
ex_list = {}
s = open("trades.csv").read().replace(":", "").replace(".", "")
for line in s.splitlines():
    s = line.split(",")
    curr_tm = int(s[0])
    curr_ex = s[3]
    if curr_ex not in ex_list:
        ex_list[curr_ex] = deque()
    ex_list[curr_ex].append(curr_tm)
    if curr_tm >= ex_list[curr_ex][0] + 100000:
        ex_list[curr_ex].popleft()
print("\n".join([str(len(ex_list[k])) for k in sorted(ex_list.keys())]))
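The digit-concatenation trick compares numbers such as 95930000 (09:59:30.000) and 100000000 (10:00:00.000), which differ by far more than 100000 even though the trades are only 30 seconds apart. A variant of the same deque idea that converts each timestamp to milliseconds since midnight avoids that case. The sketch below is an illustration, not part of the original answer: the function names to_millis and max_trades_per_minute are made up, the input is passed as a list of lines so the example runs without a file, and the fractional part is assumed to have at most three digits.

```python
from collections import deque

def to_millis(stamp):
    """Convert 'HH:MM:SS.mmm' to milliseconds since midnight."""
    hours, minutes, seconds = stamp.split(":")
    whole, _, frac = seconds.partition(".")
    millis = int(frac.ljust(3, "0")[:3]) if frac else 0
    return (int(hours) * 60 + int(minutes)) * 60_000 + int(whole) * 1000 + millis

def max_trades_per_minute(lines):
    """Return {exchange: largest number of trades within one minute}, sorted by name."""
    windows = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        stamp, _price, _size, exchange = line.strip().split(",")
        now = to_millis(stamp)
        window = windows.setdefault(exchange, deque())
        window.append(now)
        # Slide instead of shrinking: drop at most one old trade per new trade,
        # so the deque length only ever grows to the largest valid window.
        if now - window[0] >= 60_000:
            window.popleft()
    return {name: len(windows[name]) for name in sorted(windows)}

sample = [
    "09:30:01.034,36.99,100,V",
    "09:30:55.000,37.08,205,V",
    "09:30:55.554,36.90,54,V",
    "09:30:55.556,36.91,99,D",
    "09:31:01.033,36.94,100,D",
    "09:31:01.034,36.95,900,V",
]
for count in max_trades_per_minute(sample).values():
    print(count)
```

For the sample from the task this prints 2 and then 3, matching the expected output.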
This code should work:
import csv
import datetime
diff = datetime.timedelta(minutes=1)
def date_calc(start, dates):
    for i, date in enumerate(dates):
        if date >= start + diff:
            return i
    return i + 1
exchanges = {}
with open("trades.csv") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        this_exchange = row[3]
        if this_exchange not in exchanges:
            exchanges[this_exchange] = []
        time = datetime.datetime.strptime(row[0], "%H:%M:%S.%f")
        exchanges[this_exchange].append(time)
ex_max = {}
for name, dates in exchanges.items():
    ex_max[name] = 0
    for i, d in enumerate(dates):
        x = date_calc(d, dates[i:])
        if x > ex_max[name]:
            ex_max[name] = x
print('\n'.join([str(ex_max[k]) for k in sorted(ex_max.keys())]))
Output:
2
3
( obviously please check it for yourself before uploading it :) )
I think the issue with your current code is that you don't put the output in lexicographical order of their names...
If you want to use your current code, then here is a (hopefully) fixed version:
X = []
with open('trades.csv', 'r') as tr:
    for line in tr:
        line = line.strip('\xef\xbb\xbf\r\n ')
        X.append(line.split(','))
dex = {}
counts = []
for item in X:
    dex[item[3]] = []
for item in X:
    dex[item[3]].append(float(item[0][:2])*60.+float(item[0][3:5])+float(item[0][6:8])/60.+float(item[0][9:])/60000.)
for item in dex:
    count = 1
    ccount = 1
    if dex[item][len(dex[item])-1]-dex[item][0] <1:
        count = len(dex[item])
    else:
        for t in range(len(dex[item])-1):
            for tt in range(len(dex[item])-t-1):
                if dex[item][tt+t+1]-dex[item][t] <1:
                    ccount += 1
                else: break
            if ccount>count:
                count=ccount
            ccount=1
    counts.append((item, count))
counts.sort(key=lambda x: x[0])
print('\n'.join([str(x[1]) for x in counts]))
Output:
2
3
I do think you can make your life easier in the future by using Python's standard library, though :)

Identifying duplicate items in a list

I want to figure out how to identify any identical items in a list.
Currently, there is a list of people. I want to first extract their surnames and put them in a separate list called list_surnames.
Then I want to loop through that list and figure out whether any people share a surname, and if so, add that to the amount value.
This code currently does not identify the duplicates in that list.
I should say that I am brand new to programming, so I apologize if the code is horrible.
group = ["Jonas Hansen", "Bo Klaus Nilsen", "Ida Kari Lund Toftegaard", "Ole Hansen"]
amount = 0
list_surnames = []
for names in group:
    new_list = names.split(" ")
    extract_surname = new_list[-1:]
    for i in extract_surname:
        list_surnames.append(i)
for x in list_surnames:
    if x == list_surnames:
        amount += 1
print(list_surnames)
print(amount)
You can use Counter from collections to count:
from collections import Counter
l = ["Jonas Hansen", "Bo Klaus Nilsen", "Ida Kari Lund Toftegaard", "Ole Hansen"]
last = [names.split()[-1] for names in l]
print(last)
c = Counter(last)
print(c)
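To get from the Counter to the duplicates themselves, one option is to keep only the surnames whose count is greater than 1. The sketch below assumes that amount is meant to be the number of people who share a surname with someone else; the question does not pin that down, so treat it as one possible reading. The names shared and counts are chosen here for illustration.

```python
from collections import Counter

group = ["Jonas Hansen", "Bo Klaus Nilsen", "Ida Kari Lund Toftegaard", "Ole Hansen"]

# The last word of each full name is taken as the surname
list_surnames = [name.split()[-1] for name in group]
counts = Counter(list_surnames)

# Keep only surnames that occur more than once
shared = {surname: n for surname, n in counts.items() if n > 1}

# Assumed meaning of "amount": how many people share a surname with someone else
amount = sum(shared.values())

print(shared)  # {'Hansen': 2}
print(amount)  # 2
```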

String to dictionary word count and display

I have a homework question which asks:
Write a function print_word_counts(filename) that takes the name of a
file as a parameter and prints an alphabetically ordered list of all
words in the document converted to lower case plus their occurrence
counts (this is how many times each word appears in the file).
I am able to get an unordered set of each word with its occurrence count; however, when I sort it and put each word on a new line, the count disappears.
import re
def print_word_counts(filename):
    input_file = open(filename, 'r')
    source_string = input_file.read().lower()
    input_file.close()
    words = re.findall('[a-zA-Z]+', source_string)
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    sorted_count = sorted(counts)
    print("\n".join(sorted_count))
When I run this code I get:
a
aborigines
absence
absolutely
accept
after
and so on.
What I need is:
a: 4
aborigines: 1
absence: 1
absolutely: 1
accept: 1
after: 1
I'm not sure how to sort it and keep the values.
It's a homework question, so I can't give you the full answer, but here's enough to get you started. Your mistake is in this line:
sorted_count = sorted(counts)
First, a dictionary cannot be sorted in place. Second, sorted(counts) takes the keys of the dictionary, sorts them, and returns a list, so the counts are dropped.
You can just print the value of counts, or, if you really need them in sorted order, consider changing the dictionary items into a list, then sorting them.
lst = list(counts.items())
#sort and return lst
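As a small illustration of that hint, here is what sorting the items of a tiny hand-written dictionary looks like. The sample dictionary is made up; wiring this into print_word_counts is left to the reader. Sorting a list of (word, count) tuples orders them by word first, so each count stays attached to its word.

```python
# Made-up sample; in the homework this dictionary is built from the file
counts = {"absence": 1, "a": 4, "aborigines": 1}

# Each item is a (word, count) tuple; tuples sort by their first element first
pairs = sorted(counts.items())

for word, n in pairs:
    print(f"{word}: {n}")
```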
