Destroying python object after removing it from list of objects - python-3.x

I have an object:
class Flow:
This object belongs to a list of objects: flowList = [Flow1, Flow2, Flow3]
If I do:
del flowList[0]
Do I need to destroy the Flow1 object as well? I want my script to process thousands of flows and be memory efficient. A processed flow should die after processing, as it won't be needed anymore.

No, this is not required, and there are no reliable ways that I know of for explicitly freeing particular objects anyway.
Python uses garbage collection, which means it automatically disposes of objects once they are no longer reachable. (CPython primarily uses reference counting, so an object is typically freed as soon as its last reference goes away.) Once you're done with an object, you can simply leave it. If the only reference to Flow1 was the one inside the list, del flowList[0] is sufficient to allow it to be freed. If there's another reference outside the list, say Other_Flow1, you can do del Other_Flow1 to delete that reference as well.
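As a minimal, dependency-free sketch of this (the Flow class and flowList name mirror the question; the finalizer callback is just for illustration), you can observe the object actually being reclaimed with weakref.finalize:

```python
import weakref

class Flow:
    pass

flowList = [Flow(), Flow(), Flow()]
collected = []

# Register a callback that runs when the first Flow is reclaimed.
weakref.finalize(flowList[0], collected.append, "Flow1 freed")

del flowList[0]  # drop the only strong reference

# In CPython, reference counting reclaims the object immediately.
print(collected)  # → ['Flow1 freed']
```

Note that the immediate reclamation relies on CPython's reference counting; on other implementations (e.g. PyPy) the finalizer may run later, when the collector gets around to it.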
If you have something like this:
huge_list = [1, 2, 3, ...]
use_huge_list(huge_list)
You could add something like this after the call to use_huge_list to help free huge_list more readily:
del huge_list
This deletes the reference huge_list to the [1, 2, 3, ...] object. If that's the only reference to the object, the object can be freed. I'd only do this, though, if memory is a huge concern and huge_list would otherwise remain in scope for an extended period without being used (although that may be a smell suggesting you need to refactor anyway).
I would not use del like that constantly, though. It's unnecessary to delete names in 99% of cases. Don't overthink it. Unless you've profiled and know for sure that objects staying alive are causing you problems, don't worry about it.
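A small sketch of the pattern above (the names and sizes are made up for illustration; sum stands in for use_huge_list):

```python
# Build a large throwaway structure, use it, then drop the name.
huge_list = [i * i for i in range(1_000_000)]

total = sum(huge_list)  # stand-in for use_huge_list(huge_list)

del huge_list  # the name is gone; the list itself can now be reclaimed

print(total)  # → 333332833333500000
```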

Related

Python - How to remove reference to an object without copying it

I have the following problem:
In Ray, all references to an object must be removed in order to release the memory used by that object. So, I need to:
def some_func():
    # results is a very complex object with arrays, lists, dictionaries
    results = ray.get(hash_for_given_job)
    results_copy = copy.deepcopy(results)
    del results
    return results_copy
Nevertheless, this means the object occupies twice the memory at some point in time, increasing RAM usage. How can I remove a reference and return the object without copying it?
I strongly suspect you don't actually need to deep copy results. Objects in the Ray object store are immutable. When you call ray.get() you're already getting a copy of the object, so if you do something like
results = ray.get(hash_for_given_job)
results_copy = ray.get(hash_for_given_job)
results.a = "hello"
assert results.a != results_copy.a
This would hold true regardless of whether results and results_copy are in the same function, different tasks, different machines, etc.
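To see why the deepcopy doubles peak memory while a plain return does not, here is a dependency-free sketch (plain Python, no Ray; make_results stands in for ray.get and the nested structure is made up):

```python
import copy

def make_results():
    # Stand-in for ray.get(hash_for_given_job): some nested structure.
    return {"a": list(range(1000)), "b": {"nested": [1, 2, 3]}}

def with_deepcopy(results):
    # Duplicates every container: peak memory is roughly doubled.
    results_copy = copy.deepcopy(results)
    del results
    return results_copy

def without_copy(results):
    # Returning the object just hands back the same reference: no duplication.
    return results

r = make_results()
assert without_copy(r) is r        # same object, nothing copied
c = with_deepcopy(make_results())  # a fresh, duplicated structure
assert c == r and c is not r
```

The point is that return never copies in Python; if the stored object really is immutable, returning what ray.get gave you is already safe.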

How can I share the data without locking whole part of it?

Consider the following scenario.
let mut map = HashMap::new();
map.insert(2, 5);
thread::scope(|s| {
    s.spawn(|_| {
        map.insert(1, 5);
    });
    s.spawn(|_| {
        let d = map.get(&2).unwrap();
    });
}).unwrap();
This code cannot be compiled because we borrow the variable map mutably in the first spawned closure and borrow it again in the second. The classical solution is wrapping map in Arc<Mutex<...>>. But in the above code, we don't need to lock the whole hashmap: although the two threads concurrently access the same hashmap, they access completely different regions of it.
So I want to share map across threads without using a lock - how can I achieve that? I'm also open to using unsafe Rust...
in the above code, we don't need to lock whole hashmap
Actually, we do.
Every insert into the HashMap may trigger its reallocation, if the map is at capacity at that point. Now, imagine the following sequence of events:
1. The second thread calls get and retrieves a reference to the value (at runtime it'll be just an address).
2. The first thread calls insert.
3. The map gets reallocated; the old chunk of memory is now invalid.
4. The second thread dereferences the previously retrieved reference - boom, we get UB!
So, if you need to insert something in the map concurrently, you have to synchronize that somehow.
For the standard HashMap, the only way to do this is to lock the whole map, since reallocation invalidates every element. If you used something like DashMap, which synchronizes access internally and therefore allows insertion through a shared reference, no locking would be required on your side. However, it can be more cumbersome in other parts of its API (e.g. you can't return a reference to a value inside the map - the get method returns an RAII guard, which is used for synchronization), and you can run into unexpected deadlocks.

How to explicitly release object after creating it with `ray.put`?

I'm trying to get rid of an object pinned in shared memory with ray.put.
Here is code sample:
import ray
<create obj>
for ...:
    obj_id = ray.put(obj)
    <do stuff with obj_id on ray Actors using ray.get(obj_id)>
    del obj_id
After this is finished, I look at the Ray dashboard and see that all obj_ids are still in Ray shared memory with reference type LOCAL_REFERENCE.
The official docs do not elaborate on whether there is any way of explicitly controlling object lifetime. As far as I understand, they basically suggest waiting until all memory is used and then relying on Ray to clean things up.
Question: how do I explicitly purge object from ray shared memory?
Note: I'm using Jupyter; could the object still be alive because of that?
The function ray.internal.internal_api.free() does exactly this. I can't find any documentation for it in the Ray docs, but it has a good docstring, which I've copy-pasted below.
Free a list of IDs from the in-process and plasma object stores.
This function is a low-level API which should be used in restricted
scenarios.
If local_only is false, the request will be sent to all object stores.
This method will not return any value to indicate whether the deletion is
successful or not. This function is an instruction to the object store. If
some of the objects are in use, the object stores will delete them later
when the ref count is down to 0.
Examples:
>>> x_id = f.remote()
>>> ray.get(x_id) # wait for x to be created first
>>> free([x_id]) # unpin & delete x globally
Args:
object_refs (List[ObjectRef]): List of object refs to delete.
local_only (bool): Whether only deleting the list of objects in local
object store or all object stores.

Placing a small object at the beginning of a memory block

I need to store an object describing the memory details of a memory block allocated by sbrk(), at the beginning of the memory block itself.
for example:
metaData det();
void* alloc = sbrk(sizeof(det)+50000);
//a code piece to place det at the beginning of alloc.
I am not allowed to use placement new, and not allowed to allocate memory using new/malloc, etc.
I know that simply assigning it to the memory block would cause undefined behaviour.
I was thinking about using memcpy (I think that can cause problems, as det is not dynamically allocated).
Could assigning a pointer to the object at the beginning work (only if there's no other choice), or memcpy?
Thanks.
I am not allowed to use placement new
Placement new is the only way to place an object in an existing block of memory.
[intro.object] An object is created by a definition, by a new-expression, when implicitly changing the active member of a union, or when a temporary object is created
You cannot make a definition refer to an existing memory region, so that's out.
There's no union, so that's also out.
A temporary object cannot be created in an existing block of memory, so that's also out.
The only remaining way is with a new expression, and of those, only placement new can refer to an existing block of memory.
So you are out of luck as far as the C++ standard goes.
However, the same problem exists with malloc. There is tons of code out there that uses malloc without bothering with placement new: it just casts the result of malloc to the target type and proceeds from there. This method works in practice, and there is no sign of it ever being broken.
metaData *det = static_cast<metaData *>(alloc);
On an unrelated note, metaData det(); declares a function, and sizeof is not applicable to functions.

Thread safety for arrays in D?

Please bear with me on this as I'm new to this.
I have an array and two threads.
First thread appends new elements to the array when required
myArray ~= newArray;
Second thread removes elements from the array when required:
extractedArray = myArray[0..10];
myArray = myArray[10 .. $];
Is this thread safe?
What happens when the two threads interact on the array at the exact same time?
No, it is not thread-safe. If you share data across threads, then you need to deal with making it thread-safe yourself via facilities such as synchronized statements, synchronized functions, core.atomic, and mutexes.
However, the other major thing that needs to be pointed out is that all data in D is thread-local by default, so you can't access data across threads unless it's explicitly shared. That means you don't normally have to worry about thread safety at all; it's only when you explicitly share data that it's an issue.
This is not thread-safe.
It has the classic lost-update race:
Appending means examining the array to see if it can expand in place; if not, it needs to make an (O(n) time) copy. While the copy is in progress, the other thread can slice off a piece, and when the copy is done, that piece will be back - its removal is lost.
You should look into using a linked-list implementation, which is easier to make thread-safe.
Java's ConcurrentLinkedQueue uses the list described here for its implementation, and you can implement it with core.atomic.cas() in the standard library.
It is not thread-safe. The simplest way to fix this is to surround array operations with the synchronized block. More about it here: http://dlang.org/statement.html#SynchronizedStatement
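In Python terms (not D - this is a cross-language sketch of the same synchronized-block idea, using threading.Lock; all names are illustrative), the key point is that the append and the slice-then-shrink must both hold the same lock, and the slice-off must be a single critical section:

```python
import threading

my_array = []
lock = threading.Lock()

def producer():
    for i in range(1000):
        with lock:  # guard the append
            my_array.append(i)

def consumer(extracted):
    for _ in range(100):
        with lock:  # guard the read-then-shrink as one unit
            extracted.extend(my_array[:10])
            del my_array[:10]

extracted = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(extracted,))
t1.start(); t2.start()
t1.join(); t2.join()

# Every element ends up in exactly one place: nothing lost, nothing duplicated.
assert sorted(extracted + my_array) == list(range(1000))
```

The final assertion holds for any interleaving precisely because both operations are serialized by the lock; without it, the copy-during-append race described above could resurrect removed elements.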