Memory leak where CPython extension returns a 'PyList_New' instance to Python, which is never deallocated

I've been trying to debug a memory leak for a few days and I'm running out of ideas.
High-level: I've written a CPython extension that allows querying against binary data files, and it returns the results as a Python list of objects. Usage is similar to this pseudocode:
for config in configurations:
    s = Strategy(config)
    for date in alldates:
        data = extension.getData(date)
        # do analysis on 'data', capture/save statistics
I've used tracemalloc, memory_profiler, objgraph, sys.getrefcount, and gc.get_referrers to try to find the root cause, and these tools all point to this extension as the source of an exorbitant amount of memory (many gigs). For context, a single record in the binary file is 64 bytes and there are typically 390 records per day, so each date iteration works with roughly 24 KB. Now, there are many iterations happening (synchronously), but in each iteration the data is used as a local variable, so I expected each subsequent assignment to deallocate the previous object. The results from memory_profiler suggest otherwise...
Line # Mem usage Increment Occurences Line Contents
============================================================
86 33.7 MiB 33.7 MiB 1 @profile
87 def evaluate(self, date: int, filterConfidence: bool, limitToMaxPositions: bool, verbose: bool) -> None:
92 112.7 MiB 0.0 MiB 101 for symbol in self.symbols:
93 111.7 MiB 0.0 MiB 100 fromdate: int = TradingDays.getAdjacentDay(date, -(self.config.analysisPeriod - 1))
94 111.7 MiB 0.0 MiB 100 throughdate: int = date
95
96 111.7 MiB 0.0 MiB 100 maxtime: int = self.config.maxTimeToGain
97 111.7 MiB 0.0 MiB 100 target: float = self.config.profitTarget
98 111.7 MiB 0.0 MiB 100 islong: bool = self.config.isLongStrategy
99
100 111.7 MiB 0.8 MiB 100 avgtime: Optional[int] = FileStore.getAverageTime(symbol, maxtime, target, islong, fromdate, throughdate, verbose)
101 111.7 MiB 0.0 MiB 100 if avgtime is None:
102 110.7 MiB 0.0 MiB 11 continue
103
104 112.7 MiB 78.3 MiB 89 weightedModel: WeightedModel = self.testAverageTimes(symbol, avgtime, fromdate, throughdate)
105 112.7 MiB 0.0 MiB 89 if weightedModel is not None:
106 112.7 MiB 0.0 MiB 88 self.watchlist.append(weightedModel)
107 112.7 MiB 0.0 MiB 88 self.averageTimes[symbol] = avgtime
108
109 112.7 MiB 0.0 MiB 1 if verbose:
110 print('\nFull Evaluation Results')
111 print(self.getWatchlistTableString())
112
113 112.7 MiB 0.0 MiB 1 self.watchlist.sort(key=WeightedModel.sortKey, reverse=True)
114
115 112.7 MiB 0.0 MiB 1 if filterConfidence:
116 112.7 MiB 0.0 MiB 91 self.watchlist = [ m for m in self.watchlist if m.getConfidence() >= self.config.winRate ]
117
118 112.7 MiB 0.0 MiB 1 if limitToMaxPositions:
119 self.watchlist = self.watchlist[:self.config.maxPositions]
120
121 112.7 MiB 0.0 MiB 1 return
This is from the first iteration of the evaluate function (there are 30 iterations total). Line 104 is where it seems to be accumulating memory. What's strange is that the weightedModel contains only basic stats about the data queried, and that data is stored in a loop-local variable. I can't figure out why the memory used is not cleaned up after each inner iteration.
I've tried to del the objects in question after an iteration completes, but it has no effect. The refcount does seem high for the containing objects, and gc.get_referrers shows an object as referring to itself (?).
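For reference, the kind of checks I've been running look roughly like this (a sketch: the placeholder comment stands for one full date iteration, and data is the loop-local result from the pseudocode above):
import gc
import sys
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()
# ... run one full date iteration here ...
after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, 'lineno')[:10]:
    print(stat)  # top allocation sites that grew during the iteration

gc.collect()
print(sys.getrefcount(data))       # 'data' is the loop-local query result
for ref in gc.get_referrers(data):
    print(type(ref))               # what is still holding on to it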
I'm happy to provide additional information/code, but I've tried so many things at this point that a braindump would be a complete mess :) I'm hoping someone with more experience might be able to help me focus my thought process.
Cheers!

Found it! The leak was one layer deeper, where the extension function builds an instance of a Python object.
This was the leaky version:
PyObject* obj = PyObject_CallObject(PRICEBAR_CLASS_DEF, args);
// the new reference returned by PyLong_FromLong is never released
PyObject_SetAttrString(obj, "id", PyLong_FromLong(bar->id));
// ... a bunch of other attrs ...
return obj;
This is the fixed version:
PyObject* obj = PyObject_CallObject(PRICEBAR_CLASS_DEF, args);
PyObject* id = PyLong_FromLong(bar->id);  // new reference, owned by this code
// ... others ...
PyObject_SetAttrString(obj, "id", id);    // stores its own reference; does not steal mine
// ... others ...
Py_DECREF(id);                            // release the local reference
// ... others ...
return obj;
For some reason I had it in my head that the PyLong_FromLong function did NOT hand back a reference I owned, but it returns a new reference, and PyObject_SetAttrString adds its own reference rather than stealing mine. This is how I wound up with a stranded extra reference on every attribute value of every bar object that was created.
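For anyone else tripped up by this, the same ownership rule can be demonstrated in pure Python (a sketch; PriceBar below is a plain stand-in class, not the extension type):
import sys

class PriceBar:                      # stand-in for the extension's class
    pass

bar = PriceBar()
value = object()                     # plays the role of the new PyLong
print(sys.getrefcount(value))        # 2: the 'value' name plus getrefcount's argument
bar.id = value                       # setattr adds its own reference, like PyObject_SetAttrString
print(sys.getrefcount(value))        # 3: 'value', the attribute, and the argument
del value                            # the Python-level equivalent of Py_DECREF(id)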

Related

Memory usage of List vs Generator seems almost the same. Why?

I have the following Python code. I am trying to understand Python generators. If my understanding is correct, print_list will take much more memory than print_generator. I am using memory_profiler to profile the two functions below.
from memory_profiler import profile
import logging

my_list = [i for i in range(100000)]
my_generator = (i for i in range(1000000))

@profile
def print_generator():
    try:
        while True:
            item = next(my_generator)
            logging.info(item)
    except StopIteration:
        pass
    finally:
        print('Printed all elements')

@profile
def print_list():
    for item in my_list:
        logging.info(item)
    pass

logging.basicConfig(filename='app.log', filemode='w', format='%(name)s - %(levelname)s - %(message)s')
print_list()
print_generator()
The result of the profiling is pasted below.
Memory usage for the generator.
Line # Mem usage Increment Occurences Line Contents
============================================================
10 23.0 MiB 23.0 MiB 1 @profile
11 def print_generator():
12 23.0 MiB 0.0 MiB 1 try:
13 23.0 MiB 0.0 MiB 1 while True:
14 23.0 MiB -26026.5 MiB 1000001 item = next(my_generator)
15 23.0 MiB -26026.5 MiB 1000000 logging.info(item)
16 23.0 MiB -0.1 MiB 1 except StopIteration:
17 23.0 MiB 0.0 MiB 1 pass
18 finally:
19 23.0 MiB 0.0 MiB 1 print('Printed all elements')
Memory usage for the list
Line # Mem usage Increment Occurences Line Contents
============================================================
22 23.0 MiB 23.0 MiB 1 @profile
23 def print_list():
24 23.0 MiB 0.0 MiB 100001 for item in my_list:
25 23.0 MiB 0.0 MiB 100000 logging.info(item)
26 23.0 MiB 0.0 MiB 100000 pass
The memory usage for the list and the generator seems almost identical.
So what am I missing here? Why isn't the generator using less memory?
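For what it's worth, asking the containers for their own sizes (a minimal sketch, independent of memory_profiler) does show the asymmetry I expected:
import sys

my_list = [i for i in range(100000)]
my_generator = (i for i in range(1000000))

print(sys.getsizeof(my_list))       # ~800 KB just for the list's pointer array
print(sys.getsizeof(my_generator))  # ~200 bytes: only the generator frame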

How to reduce/free memory when using xarray datasets?

This is the output of the memory profiler of a function in my code, using xarray (v.0.16.1) datasets:
Line # Mem usage Increment Line Contents
================================================
139 94.195 MiB 94.195 MiB @profile
140 def getMaps(ncfile):
141 335.914 MiB 241.719 MiB myCMEMSdata = xr.open_dataset(ncfile).resample(time='3H').reduce(np.mean)
142
143 335.945 MiB 0.031 MiB plt.figure(figsize=(20.48, 10.24))
144
145 # projection, lat/lon extents and resolution of polygons to draw
146 # resolutions: c - crude, l - low, i - intermediate, h - high, f - full
147 336.809 MiB 0.863 MiB map = Basemap(projection='merc', llcrnrlon=-10.,
148 335.945 MiB 0.000 MiB llcrnrlat=30., urcrnrlon=36.5, urcrnrlat=46.)
149
150
151 339.773 MiB 2.965 MiB X, Y = np.meshgrid(myCMEMSdata.longitude.values,
152 336.809 MiB 0.000 MiB myCMEMSdata.latitude.values)
153 348.023 MiB 8.250 MiB x, y = map(X, Y)
154
155 # reduce arrows density (1 out of 15)
156 348.023 MiB 0.000 MiB yy = np.arange(0, y.shape[0], 15)
157 348.023 MiB 0.000 MiB xx = np.arange(0, x.shape[1], 15)
158 348.023 MiB 0.000 MiB points = np.meshgrid(yy,xx)
159
160 #cycle time to save maps
161 348.023 MiB 0.000 MiB i=0
162 742.566 MiB 0.000 MiB while i < myCMEMSdata.time.values.size:
163 742.566 MiB 305.996 MiB map.shadedrelief(scale=0.65)
164 #waves height
165 742.566 MiB 0.000 MiB waveH = myCMEMSdata.VHM0.values[i, :, :]
166 742.566 MiB 0.000 MiB my_cmap = plt.get_cmap('rainbow')
167 742.566 MiB 0.043 MiB map.pcolormesh(x, y, waveH, cmap=my_cmap, norm=matplotlib.colors.LogNorm(vmin=0.07, vmax=4.,clip=True))
168 # waves direction
169 742.566 MiB 0.000 MiB wDir = myCMEMSdata.VMDR.values[i, :, :]
170 742.566 MiB 0.242 MiB map.quiver(x[tuple(points)],y[tuple(points)],np.cos(np.deg2rad(270-wDir[tuple(points)])),np.sin(np.deg2rad(270-wDir[tuple(points)])),
171 742.566 MiB 0.000 MiB edgecolor='lightgray', minshaft=4, width=0.007, headwidth=3., headlength=4., linewidth=.5)
172 # save plot
173 742.566 MiB 0.000 MiB filename = pd.to_datetime(myCMEMSdata.time[i].values).strftime("%Y-%m-%d_%H")
174 742.566 MiB 0.086 MiB plt.show()
175 742.566 MiB 39.406 MiB plt.savefig(TEMPDIR+filename+".jpg", quality=75)
176 742.566 MiB 0.000 MiB plt.clf()
177 742.566 MiB 0.000 MiB del wDir
178 742.566 MiB 0.000 MiB del waveH
179 742.566 MiB 0.000 MiB i += 1
180
181 #out of loop
182 581.840 MiB 0.000 MiB plt.close("all")
183 581.840 MiB 0.000 MiB myCMEMSdata.close()
184 441.961 MiB 0.000 MiB del myCMEMSdata
As you can see, the allocated memory is not freed, and after many runs of the program it simply fails ("Killed") due to low memory.
How can I free the memory allocated by the dataset? I have tried both dataset.close() and deleting the variable, with no success.
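The only other pattern I can think of trying is opening the dataset in a with block, so the underlying file is closed deterministically. A minimal sketch (whether this actually returns the memory is exactly my question):
import numpy as np
import xarray as xr

def getMaps(ncfile):
    # the with block guarantees the NetCDF file handle is closed on exit,
    # but the resampled arrays still live until the last reference is dropped
    with xr.open_dataset(ncfile) as ds:
        myCMEMSdata = ds.resample(time='3H').reduce(np.mean)
        # ... plotting loop as in the profile above ...
    del myCMEMSdata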

Too much memory used when dealing with BeautifulSoup4 in Python 3

I wrote a script which fetches the HTML content of a page and analyzes it.
Running this code in a loop over several URLs, I noticed that memory usage was growing too much, too quickly.
Profiling and debugging the code with several tools, I believe the problem comes from the bit of code that uses BeautifulSoup4.
Line Mem usage Increment Line Contents
59 40.5 MiB 40.5 MiB @profile
60 def crawl(self):
70 40.6 MiB 0.0 MiB self.add_url_to_crawl(self.base_url)
71
72 291.8 MiB 0.0 MiB for url in self.page_queue:
74 267.4 MiB 0.0 MiB if url in self.crawled_urls:
75 continue
76
77 267.4 MiB 0.0 MiB page = Page(url=url, base_domain=self.base_url, log=self.log)
78
79 267.4 MiB 0.0 MiB if page.parsed_url.netloc != page.base_domain.netloc:
80 continue
81
82 291.8 MiB 40.1 MiB page.analyze()
83
84 291.8 MiB 0.0 MiB self.content_hashes[page.content_hash].add(page.url)
94
95 # Add crawled page links into the queue
96 291.8 MiB 0.0 MiB for url in page.links:
97 291.8 MiB 0.0 MiB self.add_url_to_crawl(url)
98
100 291.8 MiB 0.0 MiB self.crawled_pages.append(page.getData())
101 291.8 MiB 0.0 MiB self.crawled_urls.add(page.url)
102
103 291.8 MiB 0.0 MiB mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
104 291.8 MiB 0.0 MiB print('Memory usage is: {0} KB'.format(mem))
Here is what line 104 is printing each run:
Memory usage is: 69216 KB
Memory usage is: 92092 KB
Memory usage is: 105796 KB
Memory usage is: 134704 KB
Memory usage is: 158604 KB
Memory usage is: 184068 KB
Memory usage is: 225324 KB
Memory usage is: 248708 KB
Memory usage is: 273780 KB
Memory usage is: 298768 KB
Using tracemalloc in the main file, which calls all the modules and runs the crawl method above, I got the following list from a tracemalloc snapshot:
/usr/lib/python3.8/site-packages/bs4/element.py:744: size=23.3 MiB, count=210391, average=116 B
/usr/lib/python3.8/site-packages/bs4/builder/__init__.py:215: size=17.3 MiB, count=335036, average=54 B
/usr/lib/python3.8/site-packages/bs4/element.py:628: size=9028 KiB, count=132476, average=70 B
/usr/lib/python3.8/html/parser.py:327: size=7804 KiB, count=147140, average=54 B
/usr/lib/python3.8/site-packages/bs4/element.py:121: size=6727 KiB, count=132476, average=52 B
/usr/lib/python3.8/site-packages/bs4/element.py:117: size=6702 KiB, count=40848, average=168 B
/usr/lib/python3.8/html/parser.py:324: size=6285 KiB, count=85986, average=75 B
/usr/lib/python3.8/site-packages/bs4/element.py:772: size=5754 KiB, count=105215, average=56 B
/usr/lib/python3.8/html/parser.py:314: size=5334 KiB, count=105196, average=52 B
/usr/lib/python3.8/site-packages/bs4/__init__.py:587: size=4932 KiB, count=105197, average=48 B
Most of the files listed above are under the /bs4/ folder. Now, given that the page variable (line 82) is not stored anywhere, page.getData() returns a dictionary, and page.url is a string, why is BeautifulSoup taking up so much space in memory?
On line 72 you can see how the memory usage went from ~40 MB to ~291 MB over the 10 URLs the loop processed; that's a big change considering that the data I'm actually saving is a small dictionary and a string.
Am I having a problem with the garbage collector, or did I write something wrong?
I'm not very experienced with Python, so I hope my read of the profiling and debugging is correct.
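One thing I plan to try is explicitly tearing down the parse tree once a page has been analyzed. A toy sketch of the idea (analyze here is a stand-in, not the real Page.analyze):
from bs4 import BeautifulSoup

def analyze(html):
    soup = BeautifulSoup(html, 'html.parser')
    data = {'title': soup.title.string if soup.title else None}
    soup.decompose()  # unlink the tree so its cross-referencing nodes can be collected
    return data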

Improving memory usage during serialization (Data.Binary)

I'm still kinda new to Haskell and learning new things every day. My problem is very high memory usage during serialization with the Data.Binary library. Maybe I'm just using the library the wrong way, but I can't figure it out.
The actual idea is that I read binary data from disk, add new data, and write everything back to disk. Here's the code:
module Main where

import Data.Binary
import System.Environment
import Data.List (foldl')

data DualNo = DualNo Int Int deriving (Show)

instance Data.Binary.Binary DualNo where
    put (DualNo a b) = do
        put a
        put b
    get = do
        a <- get
        b <- get
        return (DualNo a b)

-- read DualNo from HDD
readData :: FilePath -> IO [DualNo]
readData filename = do
    no <- decodeFile filename :: IO [DualNo]
    return no

-- write DualNo to HDD
writeData :: [DualNo] -> String -> IO ()
writeData no filename = encodeFile filename (no :: [DualNo])

writeEmptyDataToDisk :: String -> IO ()
writeEmptyDataToDisk filename = writeData [] filename

-- feed the list with a new dataset
feedWithInputData :: [DualNo] -> [(Int, Int)] -> [DualNo]
feedWithInputData existData newData = foldl' func existData newData
  where
    func dataset (a, b) = DualNo a b : dataset

main :: IO ()
main = do
    [newInputData, toPutIntoExistingData] <- System.Environment.getArgs
    if toPutIntoExistingData == "empty"
        then writeEmptyDataToDisk "myData.dat"
        else return ()
    loadedData <- readData "myData.dat"
    let newData = case newInputData of
            "dataset1" -> feedWithInputData loadedData dataset1
            "dataset2" -> feedWithInputData loadedData dataset2
            _          -> feedWithInputData loadedData dataset3
    writeData newData "myData.dat"

dataset1 = zip [1..100000] [2,4..200000]
dataset2 = zip [5,10..500000] [3,6..300000]
dataset3 = zip [4,8..400000] [6,12..600000]
I'm pretty sure there's a lot to improve in this code, but my biggest problem is the memory usage with big datasets.
I profiled my program with GHC.
$ ghc -O2 --make -prof -fprof-auto -auto-all -caf-all -rtsopts -fforce-recomp Main.hs
$ ./Main dataset1 empty +RTS -p -sstderr
165,085,864 bytes allocated in the heap
70,643,992 bytes copied during GC
12,298,128 bytes maximum residency (7 sample(s))
424,696 bytes maximum slop
35 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 306 colls, 0 par 0.035s 0.035s 0.0001s 0.0015s
Gen 1 7 colls, 0 par 0.053s 0.053s 0.0076s 0.0180s
INIT time 0.001s ( 0.001s elapsed)
MUT time 0.059s ( 0.062s elapsed)
GC time 0.088s ( 0.088s elapsed)
RP time 0.000s ( 0.000s elapsed)
PROF time 0.000s ( 0.000s elapsed)
EXIT time 0.003s ( 0.003s elapsed)
Total time 0.154s ( 0.154s elapsed)
%GC time 57.0% (57.3% elapsed)
Alloc rate 2,781,155,968 bytes per MUT second
Productivity 42.3% of total user, 42.5% of total elapsed
Looking at the prof-file:
Tue Apr 12 18:11 2016 Time and Allocation Profiling Report (Final)
Main +RTS -p -sstderr -RTS dataset1 empty
total time = 0.06 secs (60 ticks @ 1000 us, 1 processor)
total alloc = 102,613,008 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
put Main 48.3 53.0
writeData Main 30.0 18.8
dataset1 Main 13.3 23.4
feedWithInputData Main 6.7 0.0
feedWithInputData.func Main 1.7 4.7
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 68 0 0.0 0.0 100.0 100.0
main Main 137 0 0.0 0.0 86.7 76.6
feedWithInputData Main 150 1 6.7 0.0 8.3 4.7
feedWithInputData.func Main 154 100000 1.7 4.7 1.7 4.7
writeData Main 148 1 30.0 18.8 78.3 71.8
put Main 155 100000 48.3 53.0 48.3 53.0
readData Main 147 0 0.0 0.1 0.0 0.1
writeEmptyDataToDisk Main 142 0 0.0 0.0 0.0 0.1
writeData Main 143 0 0.0 0.1 0.0 0.1
CAF:main1 Main 133 0 0.0 0.0 0.0 0.0
main Main 136 1 0.0 0.0 0.0 0.0
CAF:main2 Main 132 0 0.0 0.0 0.0 0.0
main Main 139 0 0.0 0.0 0.0 0.0
writeEmptyDataToDisk Main 140 1 0.0 0.0 0.0 0.0
writeData Main 141 1 0.0 0.0 0.0 0.0
CAF:main7 Main 131 0 0.0 0.0 0.0 0.0
main Main 145 0 0.0 0.0 0.0 0.0
readData Main 146 1 0.0 0.0 0.0 0.0
CAF:dataset1 Main 123 0 0.0 0.0 5.0 7.8
dataset1 Main 151 1 5.0 7.8 5.0 7.8
CAF:dataset4 Main 122 0 0.0 0.0 5.0 7.8
dataset1 Main 153 0 5.0 7.8 5.0 7.8
CAF:dataset5 Main 121 0 0.0 0.0 3.3 7.8
dataset1 Main 152 0 3.3 7.8 3.3 7.8
CAF:main4 Main 116 0 0.0 0.0 0.0 0.0
main Main 138 0 0.0 0.0 0.0 0.0
CAF:main6 Main 115 0 0.0 0.0 0.0 0.0
main Main 149 0 0.0 0.0 0.0 0.0
CAF:main3 Main 113 0 0.0 0.0 0.0 0.0
main Main 144 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 107 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 103 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 101 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 94 0 0.0 0.0 0.0 0.0
CAF GHC.IO.FD 86 0 0.0 0.0 0.0 0.0
Now I add further data:
$ ./Main dataset2 myData.dat +RTS -p -sstderr
343,601,008 bytes allocated in the heap
175,650,728 bytes copied during GC
34,113,936 bytes maximum residency (8 sample(s))
971,896 bytes maximum slop
78 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 640 colls, 0 par 0.082s 0.083s 0.0001s 0.0017s
Gen 1 8 colls, 0 par 0.140s 0.141s 0.0176s 0.0484s
INIT time 0.001s ( 0.001s elapsed)
MUT time 0.138s ( 0.139s elapsed)
GC time 0.221s ( 0.224s elapsed)
RP time 0.000s ( 0.000s elapsed)
PROF time 0.000s ( 0.000s elapsed)
EXIT time 0.006s ( 0.006s elapsed)
Total time 0.370s ( 0.370s elapsed)
%GC time 59.8% (60.5% elapsed)
Alloc rate 2,485,518,518 bytes per MUT second
Productivity 39.9% of total user, 39.8% of total elapsed
Looking at the new prof-file:
Tue Apr 12 18:15 2016 Time and Allocation Profiling Report (Final)
Main +RTS -p -sstderr -RTS dataset2 myData.dat
total time = 0.14 secs (139 ticks @ 1000 us, 1 processor)
total alloc = 213,866,232 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
put Main 41.0 50.9
writeData Main 25.9 18.0
get Main 25.2 16.8
dataset2 Main 4.3 11.2
readData Main 1.4 0.8
feedWithInputData.func Main 1.4 2.2
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 68 0 0.0 0.0 100.0 100.0
main Main 137 0 0.0 0.0 95.7 88.8
feedWithInputData Main 148 1 0.7 0.0 2.2 2.2
feedWithInputData.func Main 152 100000 1.4 2.2 1.4 2.2
writeData Main 145 1 25.9 18.0 66.9 68.9
put Main 153 200000 41.0 50.9 41.0 50.9
readData Main 141 0 1.4 0.8 26.6 17.6
get Main 144 0 25.2 16.8 25.2 16.8
CAF:main1 Main 133 0 0.0 0.0 0.0 0.0
main Main 136 1 0.0 0.0 0.0 0.0
CAF:main7 Main 131 0 0.0 0.0 0.0 0.0
main Main 139 0 0.0 0.0 0.0 0.0
readData Main 140 1 0.0 0.0 0.0 0.0
CAF:dataset2 Main 126 0 0.0 0.0 0.7 3.7
dataset2 Main 149 1 0.7 3.7 0.7 3.7
CAF:dataset6 Main 125 0 0.0 0.0 2.2 3.7
dataset2 Main 151 0 2.2 3.7 2.2 3.7
CAF:dataset7 Main 124 0 0.0 0.0 1.4 3.7
dataset2 Main 150 0 1.4 3.7 1.4 3.7
CAF:$fBinaryDualNo1 Main 120 0 0.0 0.0 0.0 0.0
get Main 143 1 0.0 0.0 0.0 0.0
CAF:main4 Main 116 0 0.0 0.0 0.0 0.0
main Main 138 0 0.0 0.0 0.0 0.0
CAF:main6 Main 115 0 0.0 0.0 0.0 0.0
main Main 146 0 0.0 0.0 0.0 0.0
CAF:main5 Main 114 0 0.0 0.0 0.0 0.0
main Main 147 0 0.0 0.0 0.0 0.0
CAF:main3 Main 113 0 0.0 0.0 0.0 0.0
main Main 142 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 107 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 103 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 101 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 94 0 0.0 0.0 0.0 0.0
CAF GHC.IO.FD 86 0 0.0 0.0 0.0 0.0
The more often I add new data, the higher the memory usage becomes. It's clear that I need more memory for a bigger dataset, but isn't there a better solution (like gradually writing the data back to disk)?
Edit:
Actually the most important thing, that bothers me, is the following observation:
I run the program for the first time and add new data to an existing (empty) file on my disk.
The size of the saved file on my disk is: 1.53 MByte.
But (looking at the first prof-file) the program allocated more than 102 MByte. More than 50% was allocated by the put function from the Data.Binary package.
I run the program a second time and add new data to an existing (not empty) file on my disk.
The size of the saved file on my disk is 3.05 MByte.
But (looking at the second prof-file) the program allocated more than 213 MByte. More than 66% was allocated by the put and get function together.
=> Conclusion: In the first example I needed 102/1.53 ≈ 67 times more memory running the program than the binary file takes on disk.
In the second example I needed 213/3.05 ≈ 70 times more memory running the program than the binary file takes on disk.
Question:
Is the Data.Binary package so efficient (and awesome) at serialization that it can shrink the data on disk to such an extent?
Analogous question:
Do I really need so much more memory for the data loaded in my program than the same data takes in a binary file on disk?

How to find the average of every six cells in Excel

This is a relatively common question, so I don't want to be voted down for asking something that has been asked before. I will explain the steps I took to answer it using Stack Overflow and other sources, so you can see that I have made attempts to solve it myself.
I have a set of values as below:
O P Q "R" Z
6307 586 92.07 1.34
3578 195 94.83 6.00
3147 234 93.08 4.29
3852 227 94.43 15.00
3843 171 95.74 5.10
3511 179 95.15 7.18
6446 648 90.87 1.44
4501 414 91.58 0.38
3435 212 94.19 6.23
I want to take the average of the first six values in column "R" and then put that average in column Z on the sixth row, as such:
O P Q "R" Z
6307 586 92.07 1.34
3578 195 94.83 6.00
3147 234 93.08 4.29
3852 227 94.43 15.00
3843 171 95.74 5.10
3511 179 95.15 7.18 6.49
6446 648 90.87 1.44
4501 414 91.58 0.38
3435 212 94.19 6.23
414 414 91.58 3.49
212 212 94.19 11.78
231 231 93.44 -1.59 3.6
191 191 94.59 2.68
176 176 91.45 .75
707 707 91.96 2.68
792 420 90.95 0.75
598 598 92.15 7.45
763 763 90.66 -4.02
652 652 91.01 3.75
858 445 58.43 2.30 2.30
I used the following formula, which I found online:
=AVERAGE(OFFSET(R1510,COUNTA(R:R)-6,0,6,1))
but I received an answer different from what I obtained by simply taking the average of the six previous cells, as in:
=AVERAGE(R1505:R1510)
I then tried the following from a Stack Overflow question (excel averaging every 10 rows) that was tangentially similar to what I wanted:
=AVERAGE(INDEX(R:R,1+6*(ROW()-ROW($B$1))):INDEX(R:R,10*(ROW()-ROW($B$1)+1)))
but I was unable to get an answer that resembled a rote
=AVERAGE(R1517:R1522)
I also found another approach, below, but was unable to adapt the references correctly (F3 to R1510, for example):
=AVERAGE(OFFSET(F3,COUNTA($R$1510:$R$1517)-1,,-6,))
Doing so gave me a negative number, -6.95, for a clearly positive set of data.
Put this in Z1 and copy down:
=IF(MOD(ROW(),6)=0,AVERAGE(INDEX(R:R,ROW()-5):INDEX(R:R,ROW())),"")
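This works because MOD(ROW(),6)=0 is true only on every sixth row, where AVERAGE(INDEX(R:R,ROW()-5):INDEX(R:R,ROW())) averages the current cell in column R together with the five cells above it; every other row gets an empty string.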
