Time Complexity for Python built-ins?

Is there any good reference resource for the time complexity of Python's built-in functions like dict.fromkeys() and str.lower()? I found links like this UCI resource, which lists time complexity for basic list and set operations, but of course not for all built-ins. I also found Python Reference - The Right Way, but most of its references have #TODO for time complexity.
I also tried reading the source code of the Python built-ins to figure out how functions like dict.fromkeys() are implemented, but felt lost.

This is a great place to start:
https://wiki.python.org/moin/TimeComplexity
It says that Get Item is O(1) and Iteration is O(n) (Average Case).
So then, if .fromkeys() iterates over the supplied keys and inserts each one into a new dict while also setting its value, you'd expect on the order of 2n operations, which is still O(n), where n is the number of keys.
Sorry that I can't offer more than conjecture, but hopefully that link is helpful.
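To make that conjecture concrete, here is a rough sketch (mine, not CPython's actual implementation) of what dict.fromkeys(iterable, value) has to do: one pass over the iterable, with one average-O(1) hash insert per key, hence O(n) overall.

def fromkeys_sketch(iterable, value=None):
    d = {}
    for key in iterable:    # n iterations
        d[key] = value      # average O(1) hash insert per key
    return d

print(fromkeys_sketch("abc"))    # {'a': None, 'b': None, 'c': None}
print(dict.fromkeys("abc"))      # the built-in gives the same result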


python beginner: Dynamically modifying a Python list with indexing/slicing to perform arithmetic operations [duplicate]

This question already has answers here:
Sum a list of numbers in Python
(26 answers)
Closed last month.
I have been stuck trying to write code that will dynamically take user input into a list and perform general arithmetic operations on it. To work around this I used indexing and slicing, which did solve my problem temporarily, but a new problem arose from doing this.
listgrades = []
num_students = int(input("How many students are you evaluating? "))

def student_info():
    for i in range(0, num_students):
        student_name = input("Enter your name here: ")
        student_age = input("Enter your age here: ")
        student_total_grade = int(float(input("What is your total grade? ")))
        listgrades.append(student_total_grade)

student_info()
grades_sum = (listgrades[0] + listgrades[1] + listgrades[2]) / num_students
print(f"The average of all the student grades is {grades_sum}")
I'm trying to change the (listgrades[0] + listgrades[1] + listgrades[2]) part into something more flexible, workable, and scalable.
I tried to find a solution or a way to work around this, but I hit a dead end and ran out of ideas at this point.
I think a loop of some sort might work for this, but I'm not sure.
Side note: I looked into numpy, but I can't use it since my school lab computers won't allow anything outside the default Python standard library.
I have some general advice and some specific suggestions.
As general advice, the way to see what built-in methods are available is to use Python itself.
At the command line, type python3.
Then, within Python, type dir(list)
and you will see the available methods for lists.
You get more detailed information about any specific method by typing help, then the class you are interested in, followed by a dot, then the method name. For example: help(list.count).
You can also type help(list) to get a more in-depth list of all the functions and instructions for use. Press space to continue to the next screen, b to go back a screen, and q to end the help screens.
To exit Python, type exit()
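A short interactive session illustrating this (output abridged by me):

$ python3
>>> dir(list)
[..., 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
>>> help(list.count)    # documents count(value): return number of occurrences of value
>>> exit()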
More specifically, I agree that a loop is a more dynamic direction to go, given that you ask how many students to evaluate in your input.
One way to loop through your list would be:
for x in listgrades:
    total = total + x
Of course, you can perform other math operations inside the loop, or in similar loops. Presumably you will initialize the value (total = 0) before the loop; note that naming the accumulator sum would shadow Python's built-in sum() function, so total is a safer choice.
At some point you may need to know how many grades are in your list. Python's built-in len() function gives you that information:
number_of_grades = len(listgrades)
(The count method you'll see in dir(list) does something different: listgrades.count(x) returns how many times the value x occurs in the list, not the length of the list.)
I think this will point you in the direction you were thinking with the code, without giving away too much of exactly how, since that is what you are learning. Best of luck!
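For completeness, the duplicate linked at the top ("Sum a list of numbers in Python") covers the idiomatic built-in route, which the loop above re-implements:

average = sum(listgrades) / len(listgrades)    # built-in sum over built-in length
print(f"The average of all the student grades is {average}")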

What are the upsides of generators in python 3?

I know that you can use generators and list comprehensions as filters. You can do a lot with lists, but what can you do with generators? Python would hardly include something like generators if they were not useful.
The biggest benefit of a generator is that it doesn't need to reserve memory for every element of a sequence; it generates each item as needed.
Because of this, a generator doesn't need to have a defined size. It can generate an infinite sequence if needed.
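A minimal sketch of both points (the function name is mine): the generator below lazily yields an unbounded sequence, and only the values actually requested ever exist in memory.

def naturals():
    # yields 0, 1, 2, ... forever, producing one value per request
    n = 0
    while True:
        yield n
        n += 1

gen = naturals()
print(next(gen), next(gen), next(gen))    # 0 1 2
squares = (x * x for x in naturals())     # generator expressions are lazy too
print(next(squares))                      # 0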

Way to use bisect module for sets in python

I was looking for something similar to C++'s lower_bound() function for sets in Python.
The task is to have a data structure which inserts elements in sorted order, stores only a single instance of each distinct value, and returns the left neighbor of a given value, with both operations in O(log n) worst-case time in Python.
Something similar to the bisect module for lists, but with efficient insertion, might work.
Python sets are unordered, and the standard library does not offer tree structures.
Maybe you could look at sortedcontainers (a third-party library): http://www.grantjenks.com/docs/sortedcontainers/ It might offer a good approach to your problem.
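For instance, a sketch using sortedcontainers' SortedSet (pip install sortedcontainers), which keeps one copy of each value in sorted order and supports positional indexing:

from sortedcontainers import SortedSet

s = SortedSet([10, 20, 30])
s.add(20)                        # duplicate insert: the set still holds a single 20

def left_neighbor(sset, value):
    # greatest element strictly less than value, or None if there is none
    i = sset.bisect_left(value)  # index of the first element >= value
    return sset[i - 1] if i > 0 else None

print(left_neighbor(s, 25))      # 20
print(left_neighbor(s, 10))      # None

Note that sortedcontainers documents its operations as approximately O(log n) in practice, but it is built on lists of lists rather than a balanced tree, so its worst-case bounds are not identical to a C++ std::set.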

How to implement efficient string interning in f#?

What is the best way to implement a custom string type in F# for interning strings? I have to read large CSV files into memory. Given that most of the columns are categorical, values repeat, and it makes sense to create a new string the first time it is encountered and only refer to it on subsequent occurrences, to save memory.
In C# I do this by creating a global intern pool (a concurrent dictionary). Before setting a value, I look it up in the dictionary; if it already exists, I just point to the string already in the dictionary, and if not, I add it to the dictionary and set the value to the string just added.
I am new to F# and wondering what the best way to do this is. The new string type will be used in records, named tuples, etc., and it will have to work with concurrent processes.
Edit:
String.Intern uses the intern pool. My understanding is that it is not very efficient for large pools and is not garbage collected, i.e. any and all interned strings remain in the intern pool for the lifetime of the app. Imagine an application where you read a file, perform some operations, and write data: the intern-pool solution will probably work there. Now imagine you have to do the same 100 times, and the strings in each file have little in common. If the memory is allocated on the heap, then after processing each file we can let the garbage collector clear the unnecessary strings.
I should have mentioned that I could not figure out how to do the C# approach in F# (other than implementing a C# type and using it from F#).
The memoization pattern is slightly different from what I am looking for: we are not caching calculated results; we are ensuring each string object is created no more than once, and all subsequent creations of the same string are just references to the original. Using a dictionary is one way to do this; using String.Intern is another.
Sorry if I am missing something obvious here.
I have a few things to say, so I'll post them as an answer.
First, I guess String.Intern works just as well in F# as in C#:
open System
open System.Text

let x = "abc"
let y = StringBuilder("a").Append("bc").ToString()
printfn "1 : %A" (LanguagePrimitives.PhysicalEquality x y)  // false: y is a distinct object
let y2 = String.Intern y
printfn "2 : %A" (LanguagePrimitives.PhysicalEquality x y2) // true: the literal "abc" is already interned
Second, are you using a dictionary in combination with String.Intern in your C# solution? If so, why not just do s = String.Intern(s); as soon as the string has been read from the file?
Creating a type for use in your business domain to handle string deduplication is, in general, a very bad idea. You don't want your business domain polluted by that kind of low-level concern.
As for rolling your own: I did that some years ago, probably to avoid the problem you mentioned with the strings not being garbage collected, but I never tested whether that actually was a problem.
It might be a good idea to use a dictionary (or something) for each column (or type of column) where the same values are likely to repeat in great numbers. (This is pretty much what you said already.)
It makes sense to only keep these dictionaries alive while you read the information from the file and stuff it into internal data structures. You might be thinking that you need the dictionaries for subsequent reads, but I am not so sure about that.
The important thing is to deduplicate the great majority of strings, not necessarily every single duplicate. Because of this you can greatly simplify the solution as indicated. You most probably have nothing to gain by overcomplicating it to squeeze out the last fraction of memory savings.
Releasing the dictionaries after the file is read and the structures are filled has the advantage of not holding on to strings when they are no longer really needed. And of course you save memory by not holding onto the dictionaries themselves.
I see no need to handle concurrency issues in the implementation here. String.Intern must necessarily be immune to concurrency issues. If you roll your own with the design suggested, you would not use it concurrently. Each file being read would have its own set of dictionaries for its columns.
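The per-column pooling described above is language-agnostic; here is a rough sketch of the pattern in Python for illustration (all names are mine, and this shows the idea rather than an F# implementation):

def make_pool():
    pool = {}
    def intern_local(s):
        # return the pooled copy if this string was seen before;
        # otherwise remember this instance and return it
        return pool.setdefault(s, s)
    return intern_local

intern_city = make_pool()                     # one pool per categorical column
raw = ["".join(["lon", "don"]), "".join(["lon", "don"]), "paris"]
print(raw[0] is raw[1])                       # False: two separate string objects
pooled = [intern_city(s) for s in raw]
print(pooled[0] is pooled[1])                 # True: a single shared string object

Dropping intern_city once the file is processed releases the pool, so the strings it held become eligible for garbage collection.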

Why is it not possible to get dictionary values in O(1) time?

Can we write a data structure that searches directly by value in O(1) time?
For example, in this Python 3 code, we can get the Morse code for each character by looking up its key and printing the value:
morse = {'A': '.-', 'B': '-...', 'C': '-.-.', 'D': '-..', 'E': '.',
         'F': '..-.', 'G': '--.', 'H': '....', 'I': '..', 'J': '.---',
         'K': '-.-', 'L': '.-..', 'M': '--', 'N': '-.', 'O': '---',
         'P': '.--.', 'Q': '--.-', 'R': '.-.', 'S': '...', 'T': '-',
         'U': '..-', 'V': '...-', 'W': '.--', 'X': '-..-', 'Y': '-.--',
         'Z': '--..', '1': '.----', '2': '..---', '3': '...--', '4': '....-',
         '5': '.....', '6': '-....', '7': '--...', '8': '---..', '9': '----.',
         '0': '-----'}
n = input()
n = ''.join(i.upper() for i in n if i != ' ')
for i in n:
    print(morse[i], end=' ')
This gives the output:
>>>
S O S
... --- ...
If we want to search by taking the morse code as input and giving the string as output:
>>>
... --- ...
S O S
How do we do that without making another dictionary of Morse code?
Please explain the reasoning and any limitations.
Python dictionaries are hash maps behind the scenes. The keys are hashed to achieve O(1) lookups. The same is not done for values for a few reasons, one of which is the reason CrakC mentioned: the dict does not have to have unique values. Maintaining an automatic reverse lookup would be inconsistent at best. Another reason is that fundamental data structures are best kept to a minimal set of operations, for predictability.
Hence the correct and common pattern is to maintain a separate dict with the key-value pairs reversed if you want reverse lookups in O(1). If you cannot do that, you'll have to settle for a greater time complexity.
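For example, with the morse dict from the question (its values are unique, so nothing is lost), the reverse table is a one-liner: building it costs O(n) once, and every lookup afterwards is O(1).

inverse_morse = {code: letter for letter, code in morse.items()}
print(inverse_morse['...'], inverse_morse['---'], inverse_morse['...'])    # S O S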
Getting the key for a given value in a dictionary cannot be done in O(1) time in Python, and the reason is fairly straightforward. The keys in a dictionary are unique, i.e. there cannot be two entries in the dictionary for the same key, but the inverse is not always true: distinct keys may have non-unique values. It should be noted that the hashable, immutable nature of the keys is what defines the structure of the dictionary: since keys are unique, they can be indexed by their hash, so fetching the value of a given key executes in O(1) time. The inverse, as explained above, cannot be realized in O(1) time and will always take O(n) time on average. The most important point to take away is that Python dictionaries are not meant to be used this way.
Further reading: http://stupidpythonideas.blogspot.in/2014/07/reverse-dictionary-lookup-and-more-on.html
Can we write a data structure which will search directly by taking the values in O(1) time?
The answer to that question is yes: a HashMap or HashTable keyed by those values.
Following your example, what actually happens is that Python dictionaries are implemented as hash maps, from which it follows that key lookup is O(1). But, as I understand it, your real problem is how to search for a key by its value in O(1) as well. Dictionaries being hash maps, even if Python provided that reverse-search functionality (I am not 100% sure it doesn't), it would not be O(1), because hash maps are not designed to provide it.
You can see this by looking at how hash maps work: the table is indexed by the hash of the key alone, so a reverse search would need a hashing function mapping the key and the value to the same slot, which, if not impossible, is pretty hard to achieve.
I guess your best option is to define the inverse dictionary. It is not that uncommon to sacrifice memory to achieve better lookup times.
As CrakC has correctly stated, it is not possible to get the key for a value from the dictionary in O(1) time; you will need to traverse the dictionary once, in O(n) time, to search for the key. As you do not want to create another dictionary, this would be your only option.
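A sketch of that O(n) traversal, reusing the morse dict from above (the helper name is mine):

def key_for(d, target):
    # linear scan: inspect each (key, value) pair until the value matches
    for k, v in d.items():
        if v == target:
            return k
    return None    # value not present in the dictionary

print(key_for(morse, '...'))    # S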
