Identifying duplicate items in a list - python-3.x

I want to figure out how to identify any case of identical items in a list.
Currently, there is a list of people and I want to first identify their surnames and put their surnames in a separate list called list_surnames.
Then I want to loop through that list and figure out whether there are instances of people having the same surname, and if so, add that to the amount value.
This code currently does not identify cases of duplication in that list.
I should say that I am brand new to programming, so I apologize if the code is horrible.
group = ["Jonas Hansen", "Bo Klaus Nilsen", "Ida Kari Lund Toftegaard", "Ole Hansen"]
amount = 0
list_surnames = []
for names in group:
    new_list = names.split(" ")
    extract_surname = new_list[-1:]
    for i in extract_surname:
        list_surnames.append(i)
for x in list_surnames:
    if x == list_surnames:
        amount += 1
print(list_surnames)
print(amount)

You can use Counter from the collections module to count how often each surname occurs:
from collections import Counter
l = ["Jonas Hansen", "Bo Klaus Nilsen", "Ida Kari Lund Toftegaard", "Ole Hansen"]
last = [names.split()[-1] for names in l]
print(last)
c = Counter(last)
print(c)
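To get the single number the question calls amount, the Counter can be reduced further. This is only a minimal sketch: the question does not define amount precisely, so it is assumed here to mean the number of people who share a surname with at least one other person. The names duplicates and surnames are introduced for illustration.

```python
from collections import Counter

group = ["Jonas Hansen", "Bo Klaus Nilsen", "Ida Kari Lund Toftegaard", "Ole Hansen"]

# Take the last word of each full name as the surname.
surnames = [name.split()[-1] for name in group]
counts = Counter(surnames)

# Surnames that occur more than once.
duplicates = [surname for surname, n in counts.items() if n > 1]

# Assumed meaning of "amount": people sharing a surname with someone else.
amount = sum(n for n in counts.values() if n > 1)

print(duplicates)
print(amount)
```

If amount should instead count distinct duplicated surnames, len(duplicates) would be the value to use.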

Related

Merge every x element in multiple lists to return new list

I'm writing a script that scrapes all of the data from my work's ticketing site, and the end goal is to have it send a text when a new ticket enters the bucket, with all of the important info of the ticket.
Python 3.10
So far, it pulls from a scattered list and combines all of the elements into appropriate groups, i.e. ticket numbers, titles and priorities.
tn = rawTickets[0::14]
title = rawTickets[5::14]
priority = rawTickets[9::14]
With this I can say
num = x
wholeticket = tn[num], title[num], priority[num],
print(wholeticket)
and get ticket x from the list
# Results: "tn0, title0, priority0"
I want it to print all of the available tickets in the list based on a range:
totaltickets = 0
for lines in rawTickets:
    if lines == '':
        totaltickets += 1
numrange = range(totaltickets)
So let's say there are only 3 tickets in the queue.
I want it to print
tn0, title0, priority0,
tn1, title1, priority1,
tn2, title2, priority2,
But I want to avoid doing this:
ticket1 = tn[0], title[0], priority[0],
ticket2 = tn[1], title[1], priority[1],
ticket3 = tn[2], title[2], priority[2],
You could use zip:
tickets = list(zip(rawTickets[0::14], rawTickets[5::14], rawTickets[9::14]))
This will give you a list of 3-tuples.
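To print every ticket without numbering variables by hand, the tuples can be unpacked in a loop. The sketch below uses made-up data: make_ticket is a hypothetical helper that fakes the 14-field layout implied by the slices in the question, with placeholder values in the three fields of interest.

```python
# Hypothetical stand-in for the scraped data: 14 fields per ticket,
# with the number at offset 0, the title at 5 and the priority at 9.
def make_ticket(i):
    fields = [""] * 14
    fields[0] = f"tn{i}"
    fields[5] = f"title{i}"
    fields[9] = f"priority{i}"
    return fields

# Flatten three fake tickets into one list, like the scraped rawTickets.
rawTickets = [field for i in range(3) for field in make_ticket(i)]

tickets = list(zip(rawTickets[0::14], rawTickets[5::14], rawTickets[9::14]))

# Each element of tickets is a (number, title, priority) tuple.
for tn, title, priority in tickets:
    print(f"{tn}, {title}, {priority}")
```

Because zip stops at the shortest input, no separate ticket count or range is needed.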
You could do something like this:
l1 = [*range(0,5)]
l2 = [*range(5,10)]
l3 = [*range(10,15)]
all_lst = [(l1[i], l2[i], l3[i]) for i in range(len(l1))]
Or you could use zip as trincot offered.
Note that for large lists, zip is generally faster than indexing inside a list comprehension.

How to make a list of students who have passed all the exams in the file

I have a file with lines like:
1. 'abc0123,spja,40'
2. 'sed0898,spja,15'
3. 'sed0898,spja,10'
4. 'abc0123,udbs,10'
5. 'bem0334,dim,18'
6. 'bem0334,dim,0'
7. 'bem0334,spja,30'
and so on. The first field before the comma is the student login, the second is the exam subject, and the third is the points scored on the exam. Each row represents one attempt at an exam. I need to return only the students who passed every exam they attempted. The order of the lines doesn't matter. In the case above, the students who passed are bem0334 and sed0898. To pass, a student must score 15 or more points. I started by saving the lines into a list of strings, but I don't know how to test whether a student has passed all of their exams.
def vrat_uspesne(soubor_vysledky):
    f = open(soubor_vysledky, "r")
    studens = []
    exams = []
    tmp = ""
    for line in f:
        spliter = line.split(',')
        exams.append(line.rstrip('\n'))
        student.append(spliter[0])
    student = set(student)
    student = list(student)
    return student
You appear to have a typo in that code snippet (student vs students).
The general approach I would suggest is to map lines to data structs, then group the data by student login using a dictionary.
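The grouping idea could be sketched roughly as follows. The function passed_students and the constant PASS_MARK are hypothetical names, not part of the question. It is assumed that an exam counts as passed when the best attempt reaches the pass mark, which matches the expected result given in the question (sed0898 passes spja with 15 and 10 points).

```python
from collections import defaultdict

PASS_MARK = 15  # from the question: 15 or more points is a pass


def passed_students(lines):
    """Return logins of students who passed every exam they attempted."""
    # best[login][subject] = highest score over all attempts
    best = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        login, subject, points = line.split(",")
        points = int(points)
        best[login][subject] = max(points, best[login].get(subject, points))
    # Keep only students whose best score in every subject is a pass.
    return sorted(
        login
        for login, subjects in best.items()
        if all(score >= PASS_MARK for score in subjects.values())
    )


sample = [
    "abc0123,spja,40",
    "sed0898,spja,15",
    "sed0898,spja,10",
    "abc0123,udbs,10",
    "bem0334,dim,18",
    "bem0334,dim,0",
    "bem0334,spja,30",
]
print(passed_students(sample))
```

Since the function accepts any iterable of lines, an open file object can be passed in directly instead of the sample list.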

Search the nth number of string inside another list in python

add name, where name is a string denoting a contact name. This must store name as a new contact in the application.
find partial, where partial is a string denoting a partial name to search the application for. It must count the number of contacts starting with partial and print the count on a new line.
Given n sequential add and find operations, perform each operation in order.
Input:
4
add hack
add hackerrank
find hac
find hak
Sample Output
2
0
We perform the following sequence of operations:
1. Add a contact named hack.
2. Add a contact named hackerrank.
3. Find and print the number of contact names beginning with hac. There are currently two contact names in the application and both of them start with hac, so we print 2 on a new line.
4. Find and print the number of contact names beginning with hak. There are currently two contact names in the application but neither of them starts with hak, so we print 0 on a new line.
I solved it, but it takes a long time for a large number of strings. My code is:
addlist =[]
findlist=[]
n = int(input().strip())
for a0 in range(n):
    op, contact = input().strip().split(' ')
    if(op=='add'):
        addlist.append(contact)
    else:
        findlist.append(contact)
for item in findlist:
    count=0
    count=[count+1 for item2 in addlist if item in item2 if item==item2[0:len(item)]]
    print(sum(count))
Is there any other way to avoid the long computation time?
As far as optimizing goes, I broke your code apart a bit for readability and removed a redundant if statement. I'm not sure if it's possible to optimize any further.
addlist =[]
findlist=[]
n = int(input().strip())
for a0 in range(n):
    op, contact = input().strip().split(' ')
    if(op=='add'):
        addlist.append(contact)
    else:
        findlist.append(contact)
for item in findlist:
    count = 0
    for item2 in addlist:
        if item == item2[0:len(item)]:
            count += 1
    print(count)
I tested 10562 entries at once and it processed instantly, so if it lags for you, your processor may be the cause.
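One other direction that might help with large inputs, offered as a sketch rather than as part of the answer above: count every prefix of a name when it is added, so that each find becomes a single dictionary lookup instead of a scan over all contacts. The function run_operations and its list-of-pairs input are hypothetical. It also handles operations strictly in the order given, instead of collecting all finds until the end.

```python
from collections import defaultdict


def run_operations(operations):
    """Process (op, value) pairs in order and return the result of each find."""
    prefix_counts = defaultdict(int)
    results = []
    for op, value in operations:
        if op == "add":
            # Record every prefix of the new contact name.
            for end in range(1, len(value) + 1):
                prefix_counts[value[:end]] += 1
        else:  # "find"
            # .get() avoids creating an entry for prefixes never seen.
            results.append(prefix_counts.get(value, 0))
    return results


ops = [("add", "hack"), ("add", "hackerrank"), ("find", "hac"), ("find", "hak")]
print(run_operations(ops))
```

The trade-off is memory: each added name stores one dictionary entry per character, in exchange for constant-time lookups on find.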

String to dictionary word count and display

I have a homework question which asks:
Write a function print_word_counts(filename) that takes the name of a
file as a parameter and prints an alphabetically ordered list of all
words in the document converted to lower case plus their occurrence
counts (this is how many times each word appears in the file).
I am able to get an out-of-order set of each word with its occurrence count; however, when I sort it and put each word on a new line, the count disappears.
import re
def print_word_counts(filename):
    input_file = open(filename, 'r')
    source_string = input_file.read().lower()
    input_file.close()
    words = re.findall('[a-zA-Z]+', source_string)
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    sorted_count = sorted(counts)
    print("\n".join(sorted_count))
When I run this code I get:
a
aborigines
absence
absolutely
accept
after
and so on.
What I need is:
a: 4
aborigines: 1
absence: 1
absolutely: 1
accept: 1
after: 1
I'm not sure how to sort it and keep the values.
It's a homework question, so I can't give you the full answer, but here's enough to get you started. Your mistake is in this line:
sorted_count = sorted(counts)
Firstly, a dictionary itself can't be sorted in place. Secondly, what this line does is take the keys of the dictionary, sort them, and return a list, so the values are lost.
You can just print the value of counts, or, if you really need them in sorted order, consider changing the dictionary items into a list, then sorting them.
lst = list(counts.items())
# sort and return lst
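As a small illustration of sorting dictionary items while keeping the values, here is a sketch on a made-up dictionary (not the homework file). It relies on sorted() comparing the (word, count) tuples by their first element first, which gives alphabetical order by word.

```python
# Made-up counts for illustration; not the output of the homework file.
counts = {"banana": 2, "apple": 3, "cherry": 1}

# items() yields (word, count) pairs; sorted() orders them by word first.
sorted_items = sorted(counts.items())

for word, count in sorted_items:
    print(f"{word}: {count}")
```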

Sequential Search 2 Lists

So for homework we were asked to write a function that takes two lists as input and uses a sequential/linear search to go through them; if any name appears in both lists, that name is appended to a new list. For the actual assignment, two classes are specified, VoterList and VoterName, so we are not allowed to use 'in', and only VoterNames can be appended to a VoterList. (This task is going to be developed into finding people who voted twice at two different voting booths in an election.)
So I have written a function that seems to work when I pass in lists of 3-4 people, but I'm not sure that it is actually a sequential search working how it should. Some advice would be awesome. Cheers
def fraud_detect_seq(first_booth_voters, second_booth_voters):
    fraud = []
    length_first = len(first_booth_voters)
    length_second = len(second_booth_voters)
    first_booth_position = 0
    second_booth_position = 0
    while first_booth_position < length_first:
        name_comparison = first_booth_voters[first_booth_position]
        if second_booth_position == length_second:
            first_booth_position += 1
            second_booth_position = 0
        elif second_booth_voters[second_booth_position] == name_comparison:
            fraud.append(second_booth_voters[second_booth_position])
            first_booth_position += 1
            second_booth_position += 1
        elif second_booth_voters[second_booth_position] != name_comparison:
            second_booth_position += 1
    print(fraud)
fraud_detect_seq(['Jackson', 'Dylan', 'Alice'],['Jackson', 'ylan', 'Alice'])
Gets the output:
['Jackson', 'Alice']
Which is correct. But I feel like I'm not doing it right.
def fraud_detect_seq(first_booth_voters, second_booth_voters):
    fraud = []
    for voter in first_booth_voters:
        if voter in second_booth_voters:
            fraud.append(voter)
    return fraud
This is a really simple way of checking whether names are in both lists. There isn't a wrong way to write the program, but since you're using Python you might as well use the most "pythonic" one. For loops are incredibly useful in Python for checking membership in lists.
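If the assignment rules out the in membership test, the same idea can be written as an explicit nested linear scan. This is a sketch under assumptions: plain lists and strings stand in for the VoterList and VoterName classes, whose interfaces are not shown in the question. Unlike the while-loop version in the question, it restarts the scan of the second list for every name, so a match is found regardless of where it sits in the second list.

```python
def fraud_detect_seq(first_booth_voters, second_booth_voters):
    """Return names that appear in both lists, using only linear scans."""
    fraud = []
    for first_name in first_booth_voters:
        # Scan the second list from the start for every name in the first.
        for second_name in second_booth_voters:
            if first_name == second_name:
                fraud.append(first_name)
                break  # stop at the first match so each voter is added once
    return fraud


print(fraud_detect_seq(['Jackson', 'Dylan', 'Alice'], ['Jackson', 'ylan', 'Alice']))
```

Here the for loops iterate over the elements only; the comparison itself is a plain equality check, which keeps the search sequential.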
