python deepcopy not deepcopying user classes? - python-3.x

I will get straight to the example that made me ask such a question:
Python 3.6.6 (default, Jul 19 2018, 14:25:17)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from copy import deepcopy
In [2]: class Container:
   ...:     def __init__(self, x):
   ...:         self.x = x
   ...:
In [3]: anobj = Container("something")
In [4]: outobj = Container(anobj)
In [5]: copy = deepcopy(outobj)
In [6]: id(copy) == id(outobj)
Out[6]: False
In [7]: id(copy.x) == id(outobj.x)
Out[7]: False
In [8]: id(copy.x.x) == id(outobj.x.x)
Out[8]: True
As per the documentation of deepcopy, I was expecting the last line to have False as a response, i.e. that deepcopy would also clone the string.
Why is that not the case?
How can I obtain the desired behaviour? My original code has several levels of nested custom classes whose "final" attributes are of predefined types.
Thanks in advance

At least in CPython, an object's ID is its address in memory. Because Python strings are immutable, deepcopy doesn't create a new string object at all; it simply returns the same one, so the ID is unchanged. There's really no need to create a different string in memory to hold the exact same data.
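For example, a quick check in CPython (this is an implementation detail, not a guarantee):
>>> from copy import deepcopy
>>> s = "something"
>>> deepcopy(s) is s
True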
The same happens for tuples that only hold immutable objects, for example:
>>> from copy import deepcopy
>>> a = (1, -1)
>>> b = deepcopy(a)
>>> id(a) == id(b)
True
If your tuple holds mutable objects, that won't happen:
>>> a = (1, [])
>>> b = deepcopy(a)
>>> id(a) == id(b)
False
So in the end the answer is: deepcopy is working just fine for your classes, you just found a gotcha about copying immutable objects.
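As a quick sanity check, here is a minimal sketch using the Container class from the question, with a mutable list as the innermost attribute; mutating the copy leaves the original untouched:

from copy import deepcopy

class Container:
    def __init__(self, x):
        self.x = x

outobj = Container(Container(["something"]))  # the innermost attribute is mutable
clone = deepcopy(outobj)

print(clone.x.x is outobj.x.x)  # False: the inner list really was copied
clone.x.x.append("else")
print(outobj.x.x)               # ['something']: the original is unaffected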

Related

I am learning Python and would appreciate it if someone could explain why I got the following output regarding tuples. The results seem contradictory to me [duplicate]

Two variables in Python have the same id:
>>> a = 10
>>> b = 10
>>> a is b
True
If I take two lists:
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a is b
False
According to this link, Senderle answered that immutable object references have the same id, while mutable objects like lists have different ids.
So now, according to his answer, tuples should have the same ids, meaning:
>>> a = (1, 2, 3)
>>> b = (1, 2, 3)
>>> a is b
False
Ideally, as tuples are not mutable, it should return True, but it is returning False!
What is the explanation?
Immutable objects don't necessarily have the same id, and in fact this isn't true for any kind of object that you define separately. Generally speaking, every time you define an object in Python, you create a new object with a new identity. However, for the sake of optimization (mostly), there are some exceptions: small integers (between -5 and 256) and interned strings with a special length (usually fewer than 20 characters)* are singletons and share the same id (really one object with multiple references). You can check this like the following:
>>> 30 is (20 + 10)
True
>>> 300 is (200 + 100)
False
>>> 'aa' * 2 is 'a' * 4
True
>>> 'aa' * 20 is 'a' * 40
False
And for a custom object:
>>> class A:
...     pass
...
>>> A() is A() # Every time you create an instance you'll have a new instance with new identity
False
Also note that the is operator will check the object's identity, not the value. If you want to check the value you should use ==:
>>> 300 == 3*100
True
And since there is no such caching or interning rule for tuples, or for any mutable type for that matter, two equal tuples of any size that you define separately will get their own identities, hence be different objects:
>>> a = (1,)
>>> b = (1,)
>>>
>>> a is b
False
It's also worth mentioning that the "singleton integer" and "interned string" rules hold even when the values are defined inside a container such as a tuple.
>>> a = (100, 700, 400)
>>>
>>> b = (100, 700, 400)
>>>
>>> a[0] is b[0]
True
>>> a[1] is b[1]
False
* A good and detailed article on this: http://guilload.com/python-string-interning/
Immutable != same object.*
An immutable object is simply an object whose state cannot be altered; and that is all. When a new object is created, a new address will be assigned to it. As such, checking if the addresses are equal with is will return False.
The fact that 1 is 1 or "a" is "a" returns True is due to integer caching and string interning performed by Python so do not let it confuse you; it is not related with the objects in question being mutable/immutable.
*Empty immutable objects do refer to the same object, and comparing them with is does return True; this is a special, implementation-specific case, though.
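A quick illustration of that footnote (CPython-specific behaviour, not something to rely on):
>>> a = (); b = ()
>>> a is b
True
>>> a = ""; b = ""
>>> a is b
True
>>> a = []; b = []
>>> a is b
False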
Take a look at this code:
>>> a = (1, 2, 3)
>>> b = (1, 2, 3)
>>> c = a
>>> id(a)
178153080L
>>> id(b)
178098040L
>>> id(c)
178153080L
In order to figure out why a is c evaluates to True whereas a is b yields False, I strongly recommend running the snippet above step by step in the Online Python Tutor. The graphical representation of the objects in memory will give you a deeper insight into this issue.
According to the documentation, immutables may have the same id, but it is not guaranteed that they do. Mutables always have different ids.
https://docs.python.org/3/reference/datamodel.html#objects-values-and-types
Types affect almost all aspects of object behavior. Even the importance of object identity is affected in some sense: for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed.
In previous versions of Python (pre 3.7), tuples were assigned different IDs.
As of Python 3.7+, two variables assigned the same tuple literal may have the same id:
>>> a = (1, 2, 3)
>>> b = (1, 2, 3)
>>> a is b
True
Integers above 256 also have different ids:
>>> a = 123
>>> b = 123
>>> a is b
True
>>>
>>> a = 257
>>> b = 257
>>> a is b
False
Check the code below.
Tuples a and b get their older references (IDs) back when we assign their older values back. (But this will not be the case with lists, as they are mutable.)
Initially a and b have the same value ((1, 2)) but different IDs. After altering their values and then reassigning (1, 2) to a and b, they reference their own original IDs again (88264264 and 88283400 respectively).
>>> a = (1,2)
>>> b = (1,2)
>>> a , b
((1, 2), (1, 2))
>>> id(a)
88264264
>>> id(b)
88283400
>>> a = (3,4)
>>> b = (3,4)
>>> id(a)
88280008
>>> id(b)
88264328
>>> a = (1,2)
>>> b = (1,2)
>>> id(a)
88264264
>>> id(b)
88283400
>>> a , b
((1, 2), (1, 2))
>>> id(a) , id(b)
(88264264, 88283400)
>>>
Also check the linked question Why don't tuples get the same ID when assigned the same values? after reading this; another case is discussed there as well.

np.copy(obj) vs obj.copy() vs copy.copy(obj) vs copy.deepcopy(obj)

I see that there are basically four methods we can use for copying an object in Python.
I am not crystal clear about the differences between the four.
Could someone please explain the differences from the ground up?
Thanks.
TL;DR they differ in copy method:
Shallow copy:
numpy.copy()
(dict | list | ...).copy()
copy.copy()
Deep copy:
copy.deepcopy()
Also by the purpose:
Copy data structure:
numpy.copy()
(dict | list | ...).copy()
Copy object (data structure ⊂ object):
copy.copy()
copy.deepcopy()
Plus, copy.copy() and copy.deepcopy() will internally call the obj.__copy__() and obj.__deepcopy__() methods respectively, if they exist, which means user classes can control the copying behavior.
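For example, here is a minimal sketch (the Node class and its attributes are made up for illustration) of how a user class can hook into deepcopy via __deepcopy__:

import copy

class Node:
    def __init__(self, payload, cache=None):
        self.payload = payload
        self.cache = cache if cache is not None else {}

    def __deepcopy__(self, memo):
        # Deep-copy the payload, but deliberately share the cache between copies.
        new = Node(copy.deepcopy(self.payload, memo), cache=self.cache)
        memo[id(self)] = new
        return new

a = Node([1, 2, 3])
b = copy.deepcopy(a)
print(b.payload is a.payload)  # False: the payload was deep-copied
print(b.cache is a.cache)      # True: the cache is intentionally shared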
Explanation about Shallow copy & Deep copy
There are two kinds of copy in Python: shallow copy and deep copy.
From Python documents:
The difference between shallow and deep copying is only relevant for compound objects (objects that contain other objects, like lists or class instances):
A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
Such difference is best shown when we copy class instance.
>>> class SomeClass:
...     pass
>>> a = [SomeClass()]
>>> b = a.copy()
>>> a[0] == b[0]
True
>>> id(a[0])
2778700770576
>>> id(b[0])
2778700770576
# Not actually copied, referencing to same instance.
>>> import copy
>>> a = [SomeClass()]
>>> b = copy.deepcopy(a)
>>> a[0] == b[0]
False
>>> id(a[0])
2778695702544
>>> id(b[0])
2778717746032
# Actually copied into different instance
With a shallow copy, if your data contains compound objects (most commonly lists, dicts, user classes, etc.) and you have to work on it, make sure to use a deep copy to avoid things like the following:
>>> a = [[0], [0]]
>>> b = a.copy()
>>> b[0].append(10)
>>> a
[[0, 10], [0]]
>>> b
[[0, 10], [0]]
# ---
>>> a = [[0], [0]]
>>> b = copy.deepcopy(a)
>>> b[0].append(10)
>>> a
[[0], [0]]
>>> b
[[0, 10], [0]]

Appending value to a list based on dictionary key

I started writing Python scripts for my research this past summer, and have been picking up the language as I go. For my current work, I have a dictionary of lists, sample_range_dict, that is initialized with descriptor_cols as the keys and empty lists for values. Sample code is below:
import numpy as np
import pandas as pd
def rangeFunc(arr):
    return (np.max(arr) - np.min(arr))
df_sample = pd.DataFrame(np.random.rand(2000, 4), columns=list("ABCD")) #random dataframe for testing
col_list = df_sample.columns
sample_range_dict = dict.fromkeys(col_list, []) #creates dictionary where each key pairs with an empty list
rand_df = df_sample.sample(n=20) #make a new dataframe with 20 random rows of df_sample
I want to go through each column from rand_df and calculate the range of values, putting each range in the list with the specified column name (e.g. sample_range_dict["A"] = [range in column A]). The following is the code I initially thought to use for this:
for d in col_list:
    sample_range_dict[d].append(rangeFunc(rand_df[d].tolist()))
However, instead of each key having one item in the list, printing sample_range_dict shows each key having an identical list of 4 values:
{'A': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744],
'B': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744],
'C': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744],
'D': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744]}
I've determined that the first value is the range for "A", second value is the range for "B", and so on. My question is about why this is happening, and how I could rewrite the code in order to get one item in the list for each key.
P.S. I'm looking to make this an iterative process, hence using lists instead of single numbers.
The issue is this line:
sample_range_dict = dict.fromkeys(col_list, [])
You only created one list. You don't have four lists with the same elements; you have one list, and four references to it. When you add to it via one reference, the element is visible through the other references, because it's the same list:
>>> a = dict.fromkeys(['x', 'y', 'z'], [])
>>> a['x'] is a['y']
True
>>> a['x'].append(5)
>>> a['y']
[5]
If you want each key to have a different list, either create a new list for each key:
>>> a = { k: [] for k in ['x', 'y', 'z'] }
>>> a['x'] is a['y']
False
>>> a['x'].append(5)
>>> a['y']
[]
Or use a defaultdict which will do it for you:
>>> from collections import defaultdict
>>> a = defaultdict(list)
>>> a['x'] is a['y']
False
>>> a['x'].append(5)
>>> a['y']
[]
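Applied to the code from the question (same variable names as above, just swapping out the dict.fromkeys call), the fix looks like this:

sample_range_dict = {col: [] for col in col_list}  # each key gets its own list

for d in col_list:
    sample_range_dict[d].append(rangeFunc(rand_df[d].tolist()))
# Now sample_range_dict["A"] holds only the range computed for column A.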

Assign a pandas dataframe to an object as a static class variable - memory use (Python)

I have a Python object called DNA. I want to create 100 instances of DNA. Each instance contains a pandas dataframe that is identical across all instances. To avoid duplication, I want to incorporate this dataframe as a static/class attribute.
import pandas as pd
some_df = pd.DataFrame()
class DNA(object):
    df = some_variable # Do I declare here?
    def __init__(self, df=pd.DataFrame(), name='1'):
        self.name = name
        self.instance_df = instance_df # I want to avoid this
        DNA.some_df = df # Does this duplicate the data for every instance?
What is the correct way to do this?
Can I use the __init__ function to create the class variable? Or will it create a separate class variable for every instance of the class?
Do I need to declare the class variable between the 'class ...' line and 'def __init__(...)'?
Some other way?
I want to be able to change the dataframe that I use as a class variable but once the class is loaded, it needs to reference the same value (i.e. the same memory) in all instances.
I've answered your question in the comments:
import pandas as pd
some_df = pd.DataFrame()
class DNA(object):
    df = some_variable # You assign here. I would use `some_df`
    def __init__(self, df=pd.DataFrame(), name='1'):
        self.name = name
        self.instance_df = instance_df # Yes, avoid this
        DNA.some_df = df # This does not duplicate; assignment **never copies in Python**. However, I advise against this
So, using
DNA.some_df = df
inside __init__ does work. Since default arguments are evaluated only once at function definition time, that df is always the same df, unless you explicitly pass a new df to __init__, but that smacks of bad design to me. Rather, you probably want something like:
class DNA(object):
    def __init__(self, df=pd.DataFrame(), name='1'):
        self.name = name
        <some work to construct a dataframe>
        df = final_processing_function()
        DNA.df = df
Suppose, then you want to change it, at any point you can use:
DNA.df = new_df
Note:
In [5]: class A:
   ...:     pass
   ...:
In [6]: a1 = A()
In [7]: a2 = A()
In [8]: a3 = A()
In [9]: A.class_member = 42
In [10]: a1.class_member
Out[10]: 42
In [11]: a2.class_member
Out[11]: 42
In [12]: a3.class_member
Out[12]: 42
Be careful, though, when you assign to an instance Python takes you at your word:
In [14]: a2.class_member = 'foo' # this shadows the class variable with an instance variable in this instance...
In [15]: a1.class_member
Out[15]: 42
In [16]: a2.class_member # really an instance variable now!
Out[16]: 'foo'
And that is reflected by examining the namespace of the instances and the class object itself:
In [17]: a1.__dict__
Out[17]: {}
In [18]: a2.__dict__
Out[18]: {'class_member': 'foo'}
In [19]: A.__dict__
Out[19]:
mappingproxy({'__dict__': <attribute '__dict__' of 'A' objects>,
'__doc__': None,
'__module__': '__main__',
'__weakref__': <attribute '__weakref__' of 'A' objects>,
'class_member': 42})
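To tie this back to the question, here is a minimal sketch (build_shared_df() is a hypothetical stand-in for however the frame is actually built) showing that a class-level DataFrame is shared by all instances rather than duplicated:

import pandas as pd

def build_shared_df():
    # Hypothetical helper standing in for the real construction code.
    return pd.DataFrame({"A": [1, 2, 3]})

class DNA:
    df = build_shared_df()  # one DataFrame object, stored on the class

    def __init__(self, name):
        self.name = name

instances = [DNA(str(i)) for i in range(100)]
print(all(inst.df is DNA.df for inst in instances))  # True: no duplication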

How to turn a string list into a list?

There are other threads about turning strings inside a list into different data types. I want to turn a string that is in the form of a list into a list. Like this: "[5,1,4,1]" = [5,1,4,1]
I need this because I am writing a program that requires the user to input a list.
Example of problem:
>>> x = input()
[3,4,1,5]
>>> x
'[3,4,1,5]'
>>> type(x)
<class 'str'>
If you mean evaluating Python objects from a string, you can use eval like this:
x = eval('[3,4,1,5]')
print(x)
print(type(x) is list)
[3, 4, 1, 5]
True
Use this with caution, as it can execute anything the user inputs. It is better to use a parser to get native lists: use JSON for input and parse it.
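For instance, a minimal sketch of the JSON approach:
>>> import json
>>> x = json.loads('[3,4,1,5]')
>>> x
[3, 4, 1, 5]
>>> type(x) is list
True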
Use eval() for your purpose. eval() is used for converting code within a string to real code:
>>> mystring = '[3, 5, 1, 2, 3]'
>>> mylist = eval(mystring)
>>> mylist
[3, 5, 1, 2, 3]
>>> mystring = '{4: "hello", 2:"bye"}'
>>> eval(mystring)[4]
'hello'
>>>
Use exec() to actually run functions:
>>> while True:
... inp = raw_input('Enter your input: ')
... exec(inp)
...
Enter your input: print 'hello'
hello
Enter your input: x = 1
Enter your input: print x
1
Enter your input: import math
Enter your input: print math.sqrt(4)
2.0
In your scenario:
>>> x = input()
[3,4,1,5]
>>> x = eval(x)
>>> x
[3, 4, 1, 5]
>>> type(x)
<type 'list'>
>>>
Thanks for your input, but I would prefer not to use eval() because it is unsafe.
Someone actually posted the answer that allowed me to solve this, but then they deleted it. I am going to repost that answer:
import json

values = input("Enter values as lists here")
l1 = json.loads(values)
You can use ast.literal_eval for this purpose.
Safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.
This can be used for safely evaluating strings containing Python expressions from untrusted sources without the need to parse the values oneself.
>>> import ast
>>> val = ast.literal_eval('[1,2,3]')
>>> val
[1, 2, 3]
Just remember to check that it's actually a list:
>>> isinstance(val, list)
True

Resources