Memory leak in IPython.parallel module? - memory-leaks

I'm using IPython.parallel to process a large amount of data on a cluster. The remote function I run looks like:
def evalPoint(point, theta):
# do some complex calculation
return (cost, grad)
which is invoked by this function:
def eval(theta, client, lview, data):
async_results = []
for point in data:
# evaluate current data point
ar = lview.apply_async(evalPoint, point, theta)
async_results.append(ar)
# wait for all results to come back
client.wait(async_results)
# and retrieve their values
values = [ar.get() for ar in async_results]
# unzip data from original tuple
totalCost, totalGrad = zip(*values)
avgGrad = np.mean(totalGrad, axis=0)
avgCost = np.mean(totalCost, axis=0)
return (avgCost, avgGrad)
If I run the code:
client = Client(profile="ssh")
client[:].execute("import numpy as np")
lview = client.load_balanced_view()
for i in xrange(100):
eval(theta, client, lview, data)
the memory usage keeps growing until I eventually run out (76GB of memory). I've simplified evalPoint to do nothing in order to make sure it wasn't the culprit.
The first part of eval was copied from IPython's documentation on how to use the load balancer. The second part (unzipping and averaging) is fairly straight-forward, so I don't think that's responsible for the memory leak. Additionally, I've tried manually deleting objects in eval and calling gc.collect() with no luck.
I was hoping someone with IPython.parallel experience could point out something obvious I'm doing wrong, or would be able to confirm this in fact a memory leak.
Some additional facts:
I'm using Python 2.7.2 on Ubuntu 11.10
I'm using IPython version 0.12
I have engines running on servers 1-3, and the client and hub running on server 1. I get similar results if I keep everything on just server 1.
The only thing I've found similar to a memory leak for IPython had to do with %run, which I believe was fixed in this version of IPython (also, I am not using %run)
update
Also, I tried switching logging from memory to SQLiteDB, in case that was the problem, but still have the same problem.
response(1)
The memory consumption is definitely in the controller (I could verify this by: (a) running the client on another machine, and (b) watching top). I hadn't realized that non SQLiteDB would still consume memory, so I hadn't bothered purging.
If I use DictDB and purge, I still see the memory consumption go up, but at a much slower rate. It was hovering around 2GB for 20 invocations of eval().
If I use MongoDB and purge, it looks like mongod is taking around 4.5GB of memory and ipcluster about 2.5GB.
If I use SQLite and try to purge, I get the following error:
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/hub.py", line 1076, in purge_results
self.db.drop_matching_records(dict(completed={'$ne':None}))
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 359, in drop_matching_records
expr,args = self._render_expression(check)
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 296, in _render_expression
expr = "%s %s"%null_operators[op]
TypeError: not enough arguments for format string
So, I think if I use DictDB, I might be okay (I'm going to try a run tonight). I'm not sure if some memory consumption is still expected or not (I also purge in the client like you suggested).

Is it the controller process that is growing, or the client, or both?
The controller remembers all requests and all results, so the default behavior of storing this information in a simple dict will result in constant growth. Using a db backend (sqlite or preferably mongodb if available) should address this, or the client.purge_results() method can be used to instruct the controller to discard any/all of the result history (this will delete them from the db if you are using one).
The client itself caches all of its own results in its results dict, so this, too, will result in growth over time. Unfortunately, this one is a bit harder to get a handle on, because references can propagate in all sorts of directions, and is not affected by the controller's db backend.
This is a known issue in IPython, but for now, you should be able to clear the references manually by deleting the entries in the client's results/metadata dicts and if your view is sticking around, it has its own results dict:
# ...
# and retrieve their values
values = [ar.get() for ar in async_results]
# clear references to the local cache of results:
for ar in async_results:
for msg_id in ar.msg_ids:
del lview.results[msg_id]
del client.results[msg_id]
del client.metadata[msg_id]
Or, you can purge the entire client-side cache with simple dict.clear():
view.results.clear()
client.results.clear()
client.metadata.clear()
Side note:
Views have their own wait() method, so you shouldn't need to pass the Client to your function at all. Everything should be accessible via the View, and if you really need the client (e.g. for purging the cache), you can get it as view.client.

Related

"shmop_open(): unable to attach or create shared memory segment 'No error':"?

I get this every time I try to create an account to ask this on Stack Overflow:
Oops! Something Bad Happened!
We apologize for any inconvenience, but an unexpected error occurred while you were browsing our site.
It’s not you, it’s us. This is our fault.
That's the reason I post it here. I literally cannot ask it on Overflow, even after spending hours of my day (on and off) repeating my attempts and solving a million reCAPTCHA puzzles. Can you maybe fix this error soon?
With no meaningful/complete examples, and basically no documentation whatsoever, I've been trying to use the "shmop" part of PHP for many years. Now I must find a way to send data between two different CLI PHP scripts running on the same machine, without abusing the database for this. It must work without database support, which means I'm trying to use shmop, but it doesn't work at all:
$shmopid = shmop_open(1, 'w', 0644, 99999); // I have no idea what the "key" is supposed to be. It says: "System's id for the shared memory block. Can be passed as a decimal or hex.", so I've given it a 1 and also tried with 123. It gave an error when I set the size to 64, so I increased it to 99999. That's when the error changed to the one I now face above.
shmop_write($shmopid, 'meow 123', 0); // Write "meow 123" to the shared variable.
while (1)
{
$shared_string = shmop_read($shmopid, 0, 8); // Read the "meow 123", even though it's the same script right now (since this is an example and minimal test).
var_dump($shared_string);
sleep(1);
}
I get the error for the first line:
shmop_open(): unable to attach or create shared memory segment 'No error':
What does that mean? What am I doing wrong? Why is the manual so insanely cryptic for this? Why isn't this just a built-in "superarray" that can be accessed across the scripts?
About CLI:
It cannot work in standalone CLI processes, as an answer here says:
https://stackoverflow.com/a/34533749
The master process is the one to hold the shared memory block, so you will have to use php-fpm or mod_php or some other web/service-running version, and maybe even start/request/stop it all from a CLI php script.
About shmop usage itself:
Use "c" mode in shmop_open() for creating the segment before it can be used with "a" or "w".
I stumbled on this error in a different scenario where shared memory is completely optional to speed up some repeated operations. So I wanted to try reading first without knowing memory size and then allocate from actual data when needed. In my case I had to call it as #shmop_open() to hide this error output.
About shmop on Windows:
PHP 7 crashed Apache worker process causing its restart with status 3221225477 when trying to reallocate a segment with the same predefined (arbitrary number) key but different size, even after shmop_delete(). As a workaround for this, I took filemtime() of the source file which contains data to be stored in memory, and used that timestamp as the key in shmop_open(). It still was not flawless IIRC, and I don't know if it would cause memory leaks, but it was enough to test my code which would mainly run on Linux anyway.
Finally, as of PHP version 8.0.7, shmop seems to work fine with Apache 2.4.46 and mod_php in Windows 10.

1000 rows limit for chef-api module/wrapper

So im using this Node module to connect to chef from my API.
https://github.com/normanjoyner/chef-api
The same contains a method called "partialSearch" which will fetch determined data for all nodes that match a given criteria. The problem I have, on of our environments have 1386 nodes attached to it, but it seems the module only returns 1000 as a maximum.
There does not seem to be any method to "offset" the results. This module works pretty well and its a shame this feature is not implemented since its lack really breaks the utility of such.
Does someone bumped into a similar issue with this module and can advise how to workaround it?
Here its an extract of my code :
chef.config(SetOptions(environment));
console.log("About to search for any servers ...");
chef.partialSearch('node',
{
q: "name:*"
},
{
name: ['name'] ,
'ipaddress': ['ipaddress'] ,
'chef_environment': ['chef_environment'] ,
'ip6address': ['ip6address'],
'run_list': ['run_list'],
'chef_client': ['chef_client'],
'ohai_time': ['ohai_time']
}
, function(err, chefRes) {
Regards!
The maximum is 1000 results per page, you can still request pages in order. The Chef API doesn't have a formal cursor system for pagination so it's just separate requests with a different start value, which can sometimes lead to minor desync (as in an item at the end of one page might shift in ordering and also show up at the start of the next page) so just make sure you handle that. That said, the fancy API in the client library you linked doesn't seem to expose that option, so you'll have to add it or otherwise workaround the problem. Check out https://github.com/sethvargo/chef-api/blob/master/lib/chef-api/resources/partial_search.rb#L34 for a Ruby implementation that does handle it.
We have run into similar issues with Chef libraries. One work-around you might find useful is if you have some node attribute that you can use to segment all of your nodes into smaller groups that are less than 1000.
If you have no such natural segmentation friendly already, a simple implementation would be to create a new attribute called segment and during your chef runs set the attribute's value randomly to a number between 1 and 5.
Now you can perform 5 queries (each query will only search for a single segment) and you should find all your nodes and if the randomness is working each group will be sized about 275 (1386/5).
As your node population grows you'll need to keep increasing the number of segments to ensure the segment sizes are less than 1000.

Objective C For loops with #autorelease and ARC

As part of an app that allows auditors to create findings and associate photos to them (Saved as Base64 strings due to a limitation on the web service) I have to loop through all findings and their photos within an audit and set their sync value to true.
Whilst I perform this loop I see a memory spike jumping from around 40MB up to 500MB (for roughly 350 photos and 255 findings) and this number never goes down. On average our users are creating around 1000 findings and 500-700 photos before attempting to use this feature. I have attempted to use #autorelease pools to keep the memory down but it never seems to get released.
for (Finding * __autoreleasing f in self.audit.findings){
#autoreleasepool {
[f setToSync:#YES];
NSLog(#"%#", f.idFinding);
for (FindingPhoto * __autoreleasing p in f.photos){
#autoreleasepool {
[p setToSync:#YES];
p = nil;
}
}
f = nil;
}
}
The relationships and retain cycles look like this
Audit has a strong reference to Finding
Finding has a weak reference to Audit and a strong reference to FindingPhoto
FindingPhoto has a weak reference to Finding
What am I missing in terms of being able to effectively loop through these objects and set their properties without causing such a huge spike in memory. I'm assuming it's got something to do with so many Base64 strings being loaded into memory when looping through but never being released.
So, first, make sure you have a batch size set on the fetch request. Choose a relatively small number, but not too small because this isn't for UI processing. You want to batch a reasonable number of objects into memory to reduce loading overhead while keeping memory usage down. Try 50 or 100 and see how it goes, then consider upping the batch size a little.
If all of the objects you're loading are managed objects then the correct way to evict them during processing is to turn them into faults. That's done by calling refreshObject:mergeChanges: on the context. BUT - that discards any changes, and your loop is specifically there to make changes.
So, what you should really be doing is batch saving the objects you've modified and then turning those objects back into faults to remove the data from memory.
So, in your loop, keep a counter of how many you've modified and save the context each time you hit that count and refresh all the objects that were processed so far. The batch on the fetch and the batch size to save should be the same number.
There's probably a big difference in size between your "Finding" objects and the associated images. So your primary aim should be to redesign your database in a way so that unfaulting (loading) a Finding object does not automatically load the base64 encoded image.
That's actually one of the major strengths of Code Data: Loading part of an object hierarchy. Just try to move the base64 encoded data to an own (managed) object so that Core Data does not load it. It will still be loaded as needed when the reference is touched.

Give reads priority over writes in Elasticsearch

I have an EC2 server running Elasticsearch 0.9 with a nginx server for read/write access. My index has about 750k small-medium documents. I have a pretty continuous stream of minimal writes (mainly updates) to the content. The speeds/consistency I receive with search is fine with me, but I have some sporadic timeout issues with multi-get (/_mget).
On some pages in my app, our server will request a multi-get of a dozen to a few thousand documents (this usually takes less than 1-2 seconds). The requests that fail, fail with a 30,000 millisecond timeout from the nginx server. I am assuming this happens because the index was temporarily locked for writing/optimizing purposes. Does anyone have any ideas on what I can do here?
A temporary solution would be to lower the timeout and return a user friendly message saying documents couldn't be retrieved (however they still would have to wait ~10 seconds to see an error message).
Some of my other thoughts were to give read priority over writes. Anytime someone is trying to read a part of the index, don't allow any writes/locks to that section. I don't think this would be scalable and it may not even be possible?
Finally, I was thinking I could have a read-only alias and a write-only alias. I can figure out how to set this up through the documentation, but I am not sure if it will actually work like I expect it to (and I'm not sure how I can reliably test it in a local environment). If I set up aliases like this, would the read-only alias still have moments where the index was locked due to information being written through the write-only alias?
I'm sure someone else has come across this before, what is the typical solution to make sure a user can always read data from the index with a higher priority over writes. I would consider increasing our server power, if required. Currently we have 2 m2x-large EC2 instances. One is the primary and the replica, each with 4 shards.
An example dump of cURL info from a failed request (with an error of Operation timed out after 30000 milliseconds with 0 bytes received):
{
"url":"127.0.0.1:9200\/_mget",
"content_type":null,
"http_code":100,
"header_size":25,
"request_size":221,
"filetime":-1,
"ssl_verify_result":0,
"redirect_count":0,
"total_time":30.391506,
"namelookup_time":7.5e-5,
"connect_time":0.0593,
"pretransfer_time":0.059303,
"size_upload":167002,
"size_download":0,
"speed_download":0,
"speed_upload":5495,
"download_content_length":-1,
"upload_content_length":167002,
"starttransfer_time":0.119166,
"redirect_time":0,
"certinfo":[
],
"primary_ip":"127.0.0.1",
"redirect_url":""
}
After more monitoring using the Paramedic plugin, I noticed that I would get timeouts when my CPU would hit ~80-98% (no obvious spikes in indexing/searching traffic). I finally stumbled across a helpful thread on the Elasticsearch forum. It seems this happens when the index is doing a refresh and large merges are occurring.
Merges can be throttled at a cluster or index level and I've updated them from the indicies.store.throttle.max_bytes_per_sec from the default 20mb to 5mb. This can be done during runtime with the cluster update settings API.
PUT /_cluster/settings HTTP/1.1
Host: 127.0.0.1:9200
{
"persistent" : {
"indices.store.throttle.max_bytes_per_sec" : "5mb"
}
}
So far Parmedic is showing a decrease in CPU usage. From an average of ~5-25% down to an average of ~1-5%. Hopefully this can help me avoid the 90%+ spikes I was having lock up my queries before, I'll report back by selecting this answer if I don't have any more problems.
As a side note, I guess I could have opted for more balanced EC2 instances (rather than memory-optimized). I think I'm happy with my current choice, but my next purchase will also take more CPU into account.

oracle query - ORA-01652: unable to extend temp segment but only in some versions of sql*plus

This one has me rather confused. I've written a query which runs fine from my development client but fails on the production client with error "ORA-01652: unable to extend temp segment by....". In both instances, the database and user is the same. On my development machine (MS Windows) I've got SQL*PLUS (Release 9.0.1.4.0) and Toad 9.0 (both using version 9.0.4.0.1 of the oci.dll). Both run the code without errors.
However when I run the same file, against the same database, using the same username/password from a different machine, this time version 10.2.0.4.0 (from the 10.2.0.4-1 Oracle instant client) I get the error.
It does occur reproducibly.
Unfortunately I've only got limited access to the dictionary views on the database which is set up as read-only (can't even get an explain plan!).
I've tried working around the problem by tuning the query (I suspect that there is a large interim result set which is subsequently trimmed down) but have not managed to change the behaviour at either client.
It may be possible to deploy a different version of the client on the machine causing the problems - but currently that looks like downgrading to a previous version.
Any ideas?
TIA
Update
Based on Gary's answer below, I had a look at the glogin.sql scripts - the only difference was that 'SET SQLPLUSCOMPATIBILITY 8.1.7' was present on the working client but absent on failing client - but adding it in did not resolve the problem.
I also tried
alter session set workarea_size_policy=manual;
alter session set hash_area_size=1048576000;
and
alter session set sort_area_size=1048576000;
to no avail :(
Update 2
I managed to find the same behaviour, this time talking to an Oracle 8i backend. In this case the database was RW. That allowed me to confirm that the different clients were, as I suspected, generating different plans. But why????
Looking at the output of 'SHOW PARAMETERS' both clients reported exactly the same settings!
Years ago I worked on a DR database that was fully READONLY, and even the TEMP tablespace wasn't writable. Any query that tried to spill to temp would fail (even if the temp space to be used was pretty trivial).
If this is the same situation, I wouldn't be surprised if there was a login.sql (or glogin.sql or a logon trigger) that does an ALTER SESSION to set larger PGA memory value for the session, and/or changes the optimizer goal to FIRST_ROWS.
If you can, compare the results of the following from both clients:
select * from v$parameter
where ismodified != 'FALSE';
Also from each client for the problem SQL, try EXPLAIN PLAN FOR SELECT...
and SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
See if it is coming up with different query plans.
Not really an answer - but a bit more information....
Our local DBAs were able to confirm that the 16Gb (!) TEMP tablespace was indeed being used and had filled up, but only from the Linux clients (I was able to recreate the error making an oci8 call from PHP). In the case of the sqlplus client I was actually using exactly the same file to run the query on both clients (copied via scp without text conversion - so line endings were CRLF - i.e. byte for byte the same as was running on the Windows client).
So the only rational solution was that the 2 client stacks were resulting in different execution plans!
Running the query from both clients approx simultaeneously on a DBMS with very little load gave the same result - meaning that the two clients also generated different sqlids for the query.
(and also Oracle was ignoring my hints - I hate when it does that).
There is no way Oracle should be doing this - even if it were doing some internal munging of the query before presenting it to the DBMS (which would give rise to the different sqlids) the client stack used should be totally transparent regarding the choice of an execution plan - this should only ever change based on the content of the query and the state of the DBMS.
The problem was complicated by not being to see any explain plans - but for the query to use up so much temporary tablespace, it had to be doing a very ugly join (at least partially cartesian) before filtering the resultset. Adding hints to override this had no effect. So I resolved the problem by splitting the query into 2 cursors and doing a nested lookup using PL/SQL. A very ugly solution, but it solved my immediate problem. Fortunately I just need to generate a text file.
For the benefit of anyone finding themselves in a similar pickle:
BEGIN
DECLARE
CURSOR query_outer IS
SELECT some_primary_key, some_other_stuff
FROM atable
WHERE....
CURSOR query_details (p_some_pk) IS
SELECT COUNT(*), SUM(avalue)
FROM btable
WHERE fk=p_some_pk
AND....
FOR m IN query_outer
LOOP
FOR n IN query_details(m.some_primary_key)
LOOP
dbms_out.put_line(....);
END LOOP;
END LOOP;
END;
The more I use Oracle, the more I hate it!
A bit more information - I've run into the same problem connecting to a different database - this time an Oracle 8i. And I can get EXPLAIN plans.
Although in this case the query completed on both clients, it took 50% longer running from Linux sql*plus 10.2.0.4.0 vs WinXP sql*plus 8.0.6.0
As I suspected, the plans are different - but both are using the default settings on the client, both are using the same optimizer mode. It seems to think the plan generated from the Linux client has a lower cost than that from the XP client (although it does take longer to run). How does the optimizer prune the search path for potential plans?

Resources