Maximum number of activewidth objects allowed - python-3.x

My program runs fine with limited data, but when I put in all four databases, activewidth won't work.
Database 1 has 29990 entries.
Database 2 has around 27000 entries.
Database 3 has roughly 17000 entries.
Database 4 has 430 entries.
Each database is grouped by business kind and includes business type, name, address, city, state, phone number, longitude, latitude, sales tax info, and daily hours of operation.
In total that comes to 12.1 MB of data.
With only database 1 in the program it works fine: I can hover over a point on the map, activewidth increases the size of the dot, and the program brings up the underlying data on the left-hand side of the screen, just like it is supposed to do.
Now that I have added all four maps and can toggle them on and off separately, activewidth won't work even with only #1 turned on, and the underlying data won't show on the left. The points on the map are there, and I can click through all four checkbuttons to turn the points on and off. I don't yet have the code in for the underlying data for databases 2-4, just the ability to turn them on and off. Only activewidth stopped working once I made the points for all four databases viewable.
I tried commenting out all the code for databases 2-4, and it went back to working fine. Then I added database 2 back into the mix and it stopped working again. Then I tried database 2 only, and the activewidth worked fine as long as database 1 was commented out. With database 1 active, the activewidth was very slow to respond or did not work at all.
Is there a practical maximum number of entries I can use? Hopefully not, because I still have several more databases to finish and add to the program, which will take the total number of entries over 100K before all is said and done.
Nothing else makes sense, since all I'm doing when adding databases 2-4 is changing self.alocation to self.blocation, and so on: I change the identifier to show which database is being worked with and copy the rest of the code between routines, since everything is the same, just different businesses separated into appropriately grouped databases. It seems the problem lies in the amount of data being used, not in the way the program is written.
I figured splitting up the files, both for my own benefit and to keep each file smaller, would help alleviate the problem, but so far it hasn't. Is there any other way to work around the data overload?
self.alocation = []
for x in range(0, len(self.atype)):
    # Convert longitude/latitude to pixel coordinates on the 714x714 map canvas
    pix1x = round((float(self.along[x]) + (-self.longitudecenter + (self.p / 2))) / (self.p / 714), 0)
    pix1y = round(((self.p / 2) + self.latitudecenter - float(self.alat[x])) / (self.p / 714), 0)
    # One short diagonal line per point; activewidth enlarges it on mouse-over
    z = self.canvas.create_line(pix1x, pix1y, pix1x + 4, pix1y + 4,
                                activewidth="10.0", fill='', width=5)
    self.alocation.append((z, x))
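For what it's worth, one way to avoid putting an activewidth on every one of tens of thousands of canvas items is to let a shared tag handle the hover events instead. The sketch below is a standalone illustration only; the coordinates, tag name, and handler functions are made up and not taken from the program above:

import tkinter as tk

root = tk.Tk()
canvas = tk.Canvas(root, width=714, height=714)
canvas.pack()

# Illustrative pixel coordinates standing in for the computed pix1x/pix1y values.
points = [(100, 100), (200, 150), (300, 300)]
for px, py in points:
    canvas.create_line(px, py, px + 4, py + 4, width=5, tags=("point",))

def on_enter(event):
    # "current" is the canvas's built-in tag for the item under the pointer.
    canvas.itemconfigure("current", width=10)

def on_leave(event):
    canvas.itemconfigure("current", width=5)

canvas.tag_bind("point", "<Enter>", on_enter)
canvas.tag_bind("point", "<Leave>", on_leave)

root.mainloop()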

Related

How to aggregate data by period in a rrdtool graph

I have an rrd file with average ping times to a server (GAUGE) every minute, and when the server is offline (which is very frequent, for reasons that don't matter now) it stores a NaN/unknown.
I'd like to create a graph with the percentage of each hour that the server is offline, which I think can be achieved by counting every NaN within 60 samples and then dividing by 60.
For now I get to the point where I define a variable that is 1 when the server is offline and 0 otherwise, but I already read the docs and don't know how to aggregate this:
DEF:avg=server.rrd:rtt:AVERAGE CDEF:offline=avg,UN,1,0,IF
Is it possible to do this when creating a graph? Or will I have to store that info in another rrd?
I don't think you can do exactly what you want, but you have a couple of options.
You can define a sliding-window average that shows the percentage of the previous hour that was unknown, and graph that using TREND (or TRENDNAN).
DEF:avg=server.rrd:rtt:AVERAGE:step=60
CDEF:offline=avg,UN,100,0,IF
CDEF:pcavail=offline,3600,TREND
LINE:pcavail#ff0000:Availability
This defines avg as the 1-minute time series of ping data. Note we use step=60 to ensure we get the best resolution of data even in a smaller graph. Then we define offline as 100 when the sample is unknown (the server is not responding) and 0 when it is. Then, pcavail is a 1-hour sliding-window average of this, which in effect is the percentage of time during the previous hour that the server was offline; swap the 100 and 0 in the CDEF if you want the availability percentage instead.
However, there's a problem: RRDTool will silently summarise the source data before you get your hands on it if there are many data points per pixel in the graph (this won't happen when doing a fetch, of course). To get around that, you'd need to have the offline calculation done at store time, i.e. have a COMPUTE-type DS that is 100 or 0 depending on whether the rtt DS is known. Then, any averaging will preserve the data (normal averaging omits the unknowns, or the XFF setting makes the whole CDP unknown).
rrdtool create ...
DS:rtt:GAUGE:120:0:9999
DS:offline:COMPUTE:rtt,UN,100,0,IF
rrdtool graph ...
DEF:offline=server.rrd:offline:AVERAGE:step=3600
LINE:offline#ff0000:Availability
If you are able to modify your RRD, and do not need historical data, then use of a COMPUTE in this way will allow you to display your data in a 1-hour stepped graph as you wanted.
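For reference, here is how the same COMPUTE-based setup might look via the python-rrdtool bindings. This is only a sketch under the assumption that those bindings are installed; the RRA definitions, file names, and time range are illustrative and not part of the question:

import rrdtool

# Create the RRD with a COMPUTE DS so the offline flag is stored at update time
# rather than derived at graph time. The RRA choices below are illustrative only.
rrdtool.create(
    "server.rrd",
    "--step", "60",
    "DS:rtt:GAUGE:120:0:9999",
    "DS:offline:COMPUTE:rtt,UN,100,0,IF",
    "RRA:AVERAGE:0.5:1:1440",    # 1-minute averages kept for one day
    "RRA:AVERAGE:0.5:60:720",    # 1-hour averages kept for 30 days
)

# Graph the hourly offline percentage for the last day.
rrdtool.graph(
    "offline.png",
    "--start", "end-1d",
    "DEF:offline=server.rrd:offline:AVERAGE:step=3600",
    "LINE1:offline#ff0000:Offline %",
)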

Getting Multiple Last Price Quotes from Interactive Brokers's API

I have a question regarding the Python API of Interactive Brokers.
Can multiple asset and stock contracts be passed into the reqMktData() function to obtain the last prices? (I can set snapshot = TRUE in reqMktData to get the last price. You can assume that I have subscribed to the appropriate data services.)
To put things in perspective, this is what I am trying to do:
1) Call reqMktData, get last prices for multiple assets.
2) Feed the data into my prediction engine, and do something
3) Go to step 1.
When I contacted Interactive Brokers, they said:
"Only one contract can be passed to reqMktData() at one time, so there is no bulk request feature in requesting real time data."
Obviously one way to get around this is to use a loop, but that is too slow. Another way is multithreading, but that is a lot of work, plus I can't afford the extra expense of a new computer. I am not interested in either one.
Any suggestions?
You can only specify one contract in each reqMktData call, so there is no choice but to use a loop of some type. The speed shouldn't be an issue, as you can make up to 50 requests per second, maybe even more for snapshots.
The speed issue could be that you want too much data (> 50/s), or that you're using an old version of the IB Python API; check connection.py for lock.acquire (I've deleted all of them). Also, if there has been no trade for more than 10 seconds, IB will wait for a trade before sending a snapshot, so test with active symbols.
However, what you should do is request live streaming data by setting snapshot to false and just keep track of the last price in the stream. You can stream up to 100 tickers with the default minimums. You keep them separate by using unique ticker ids.
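For illustration, here is a minimal sketch of that streaming approach with the official ibapi package. It assumes a TWS or IB Gateway instance listening locally; the port, client id, symbols, and tick-type handling are illustrative:

import time
from threading import Thread

from ibapi.client import EClient
from ibapi.wrapper import EWrapper
from ibapi.contract import Contract

class LastPriceApp(EWrapper, EClient):
    """Keeps the most recent last-trade price per request id."""
    def __init__(self):
        EClient.__init__(self, self)
        self.last_prices = {}

    def tickPrice(self, reqId, tickType, price, attrib):
        # Tick type 4 is LAST (68 is DELAYED_LAST when on delayed data).
        if tickType in (4, 68):
            self.last_prices[reqId] = price

def stock(symbol):
    c = Contract()
    c.symbol = symbol
    c.secType = "STK"
    c.exchange = "SMART"
    c.currency = "USD"
    return c

app = LastPriceApp()
app.connect("127.0.0.1", 7497, clientId=1)   # assumes TWS/Gateway is running locally
Thread(target=app.run, daemon=True).start()
time.sleep(1)  # crude wait for the handshake; a real program would wait for nextValidId

for req_id, sym in enumerate(["AAPL", "MSFT", "IBM"], start=1):
    # snapshot=False -> streaming data; each symbol gets its own unique ticker id
    app.reqMktData(req_id, stock(sym), "", False, False, [])

# The prediction engine can read app.last_prices whenever it needs the latest quotes.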

Excel Get & Transform (Power Query) M Code Style and Performance

I've created a few reasonably complex M queries and have started running into some severe performance issues. I'm wondering if it has to do with how I sometimes organize my code.
The issues I've been having are:
1) Power Query constantly keeps several CPU cores busy calculating something, even when I'm not waiting for a result.
2) In Task Manager I can sometimes see that the Power Query threads ("Microsoft.mashup.Container.NetFX40.exe") are nearly idle, while Excel.exe is using 100% of one core for tens of minutes, even though at most I'm looking up values in a few parameter tables that don't contain more than a couple dozen cells.
3) Some steps take extremely long to calculate, even though the operations involved are trivial. For example, I have a list of 10 text values taken from an Excel table. This list appears as one of my query steps when I preview it. Then I want to remove a single value, so the next step = List.RemoveItems(myList, {"val"}). It still hadn't computed after 30 minutes, even though I could see the list was correctly loaded in the previous step.
4) The UI sometimes becomes unresponsive for several minutes after changing code. I can still right-click on the Queries pane on the left-hand side to enter the advanced editor, and click the red X at top right and choose to keep my changes, but everything else is unresponsive; not greyed out, just unresponsive.
Anyway, I just wanted to ask if anyone's had similar trouble, and if anyone knows what triggers particularly bad performance in PQ.
I'll often use something like the following pattern to keep the total number of queries down while still being able to easily inspect individual steps:
let
    ThisWB = Excel.CurrentWorkbook(),
    CfgTbl = ThisWB{[Name="myCfgTbl"]}[Content],
    x = aFn(CfgTbl),
    y = bFn(CfgTbl),
    output = [ThisWB=ThisWB, CfgTbl=CfgTbl, x=x, y=y]
in
    output
Is this likely to lead to any issues? I only ask because at one point, after waiting a very long time for a simple function result, I created a new query = Excel.CurrentWorkbook(){[Name="myCfgTbl"]}[Content], referenced it from the other query, and my result calculated immediately. No idea why.
It calculates previews. Turn off auto preview generation.
I ran into something like this in cases with formula-heavy tables.
The rest probably requires code examples, especially your last case.
BTW, is your version of power query (or Excel 2016) up-to-date?

Excel Data Table Repeating Results

I am working with a model that utilizes the Data Table functionality in Excel to process many scenarios and collect the output. However, I will occasionally find that the data table will repeat results for sequential scenarios, which would be impossible.
The data table is generated with the following VBA code:
Sheets("ProjSheet").Range(Range("DTAnchor").Offset(-1, -1),Range("DTAnchor").Offset(NumScns - 1, 1))
.Table ColumnInput:=Range("CurrScen")
The calculation mode is semiautomatic throughout the code, and the data table is updated using Calculate.
The erroneous output might look something like this:
Scn | Result
------------
1 | 341.5
2 | 0
3 | 861.4
4 | 861.4 <- Wrong!
5 | 861.4 <- Wrong!
6 | 10.5
7 | 64.9
...
The workbook is not small at ~22MB, and the data table typically takes a minute or so to churn through 1,000 scenarios.
I can verify the result of any particular scenario by running it individually, so we know that the results in this case should not be identical.
These models are typically saved with different model settings and run simultaneously in different instances of Excel. They are also generally run on remote computers where multiple users could log in, kick off some models, and then disconnect and leave them running in the background. These computers have 8 cores so they could run up to 8 models simultaneously. Sometimes, the models are opened and kicked off in different Excel instances using a macro that writes a VB Script which is then kicked off by a .bat file.
I've had difficulty replicating the error reliably. One theory is that, when more Excel models are running than there are cores on the remote computer, Excel freezes up during the data table process and cannot always complete its calculation for some scenarios, so it spits out the same results it currently has stored. I don't know enough about Excel's inner workings to decide whether this makes sense, though.
Has anyone else come across data tables that repeat results before? If you know more about the mechanics of data tables, do you know what might cause this error and how to prevent it in the future?
Let me know if you need any more information about the error or things that I've tried to identify the cause. Thanks!

Strange data access time in Azure Table Storage while using .Take()

This is our situation:
We store user messages in Table Storage. The partition key is the UserId and the RowKey is used as a message id.
When a user opens his message panel we want to just .Take(x) messages; we don't care about the sort order. But what we have noticed is that the time it takes to get the messages varies a lot with the number of messages we take.
We did some small tests:
We did 50 * .Take(X) and compared the differences:
So we did .Take(1) 50 times and .Take(100) 50 times etc.
To make an extra check we did the same test 5 times.
Here are the results (chart not shown; X axis: number of Takes, Y axis: test number):
As you can see there are some HUGE differences. The difference between 1 and 2 is very strange. The same for 199-200.
Does anybody have any clue why this is happening? The Table Storage is on a live server, by the way, not development storage.
Many thanks.
Update
The problem only seems to occur when I'm using a wireless network. When I'm using the cable, the times are normal.
Possibly the data is collected in batches of a certain size x. When you request x+1 rows, it has to fetch two batches and then drop a number of rows.
Try running your test with increments of 1 as the Take() parameter to confirm or rule out this assumption.
