Getting Multiple Last Price Quotes from Interactive Brokers's API - python-3.x

I have a question regarding the Python API of Interactive Brokers.
Can multiple asset and stock contracts be passed into reqMktData() function and obtain the last prices? (I can set the snapshots = TRUE in reqMktData to get the last price. You can assume that I have subscribed to the appropriate data services.)
To put things in perspective, this is what I am trying to do:
1) Call reqMktData, get last prices for multiple assets.
2) Feed the data into my prediction engine, and do something
3) Go to step 1.
When I contacted Interactive Brokers, they said:
"Only one contract can be passed to reqMktData() at one time, so there is no bulk request feature in requesting real time data."
Obviously one way to get around this is to do a loop but this is too slow. Another way to do this is through multithreading but this is a lot of work plus I can't afford the extra expense of a new computer. I am not interested in either one.
Any suggestions?

You can only specify 1 contract in each reqMktData call. There is no choice but to use a loop of some type. The speed shouldn't be an issue as you can make up to 50 requests per second, maybe even more for snapshots.
The speed issue could be that you want too much data (> 50/s) or you're using an old version of the IB python api, check in connection.py for lock.acquire, I've deleted all of them. Also, if there has been no trade for >10 seconds, IB will wait for a trade before sending a snapshot. Test with active symbols.
However, what you should do is request live streaming data by setting snapshot to false and just keep track of the last price in the stream. You can stream up to 100 tickers with the default minimums. You keep them separate by using unique ticker ids.

Related

Tracking a counter value in application insights

I'm trying to use application insights to keep track of a counter of number of active streams in my application. I have 2 goals to achieve:
Show the current (or at least recent) number of active streams in a dashboard
Activate a kind of warning if the number exceeds a certain limit.
These streams can be quite long lived, and sometimes brief. So the number can sometimes change say 100 times a second, and sometimes remain unchanged for many hours.
I have been trying to track this active streams count as an application insights metric.
I'm incrementing a counter in my application when a new stream opens, and decrementing when one closes. On each change I use the telemetry client something like this
var myMetric = myTelemetryClient.GetMetric("Metricname");
myMetric.TrackValue(myCount);
When I query my metric values with Kusto, I see that because of these clusters of activity within a 10 sec period, my metric values get aggregated. For the purposes of my alarm, I can live with that, as I can look at the max value of the aggregate. But I can't present a dashboard of the number of active streams, as I have no way of knowing the number of active streams between my measurement points. I know the min value, max and average, but I don't know the last value of the aggregate period, and since it can be somewhere between 0 and 1000, its no help.
So the solution I have doesn't serve my needs, I thought of a couple of changes:
Adding a scheduled pump to my counter component, which will send the current counter value, once every say 5 minutes. But I don't like that I then have to add a thread for each of these counters.
Adding a timer to send the current value once, 5 minutes after the last change. Countdown gets reset each time the counter changes. This has the same problem as above, and does an excessive amount of work to reset the counter when it could be changing thousands of times a second.
In the end, I don't think my needs are all that exotic, so I wonder if I'm using app insights incorrectly.
Is there some way I can change the metric's behavior to suit my purposes? I appreciate that it's pre-aggregating before sending data in order to reduce ingest costs, but it's preventing me from solving a simple problem.
Is a metric even the right way to do this? Are there alternative approaches within app insights?
You can use TrackMetric instead of the GetMetric ceremony to track individual values withouth aggregation. From the docs:
Microsoft.ApplicationInsights.TelemetryClient.TrackMetric is not the preferred method for sending metrics. Metrics should always be pre-aggregated across a time period before being sent. Use one of the GetMetric(..) overloads to get a metric object for accessing SDK pre-aggregation capabilities. If you are implementing your own pre-aggregation logic, you can use the TrackMetric() method to send the resulting aggregates.
But you can also use events as described next:
If your application requires sending a separate telemetry item at every occasion without aggregation across time, you likely have a use case for event telemetry; see TelemetryClient.TrackEvent (Microsoft.ApplicationInsights.DataContracts.EventTelemetry).

Running a repetitive task in Node.js for each row in a postgres table on a different interval for each row

What would be a good approach to running a repetitive task for each row in a large postgres db table on a different per row interval in Node.js.
To give you some more context, here's a quick description of the application:
It's a chat based customer support app.
It consists of teams, which can be either a client team or a support team. Teams have users, which can be either client users or support users.
Client users send messages to a support team and wait for one of that team's users to answer their question.
When there's an unanswered client message waiting for a response, every agent for the receiving support team will receive a notification every n seconds (n being set on a per-team basis by the team admin).
So this task needs to infinitely loop through the rows in the teams table and send notifications if:
The team has messages waiting to be answered.
N seconds have passed since the last notification was sent (N being the number of seconds set by the team admin).
There might be a better approach to this condition altogether.
So my questions are:
What is an efficient way to infinitely loop through a postgres table with no upper limit on the number rows?
Should I load 1 row at a time? Several at a time?
What would be a good way to do this in Node?
I'm using Knex. Does Knex provide a mechanism for lazy loading a table and iterating through the rows?
A) Running a repetitive task via node can be done via a the js built-in function 'setInterval'.
// run the intervalFnc() every 5 seconds
const timerId = setTimeout(intervalFnc, 5000);
function intervalFnc() { console.log("Hello"); }
// to quit running it:
clearTimeout(timerId);
Then your interval function can do the actual work. An alternative would be to use cron (linux), or some OS process scheduler to trigger the function. I would use this method if you want to do it every minute, and a cron job if you want to do it every hour (in between these times becomes more debatable).
B) An efficient way...
B-1) Retrieving a block of records from a DB will be more efficient than one at a time. Knex has .offset and .limit clauses to choose a group of records to retrieve. A sample from the knex doc:
knex.select('*').from('users').limit(10).offset(30)
B-2) Database indexed access is important for performance if your tables are very large. I would recommend including an status flag field in your table to note which records are 'in-process', and also include a "next-review-timestamp" field with both fields being both indexed. Retrieve the records that have status_flag='in-process' AND next_review_timestamp <= now(). Sample:
knex('users').where('status_flag', 'in-process').whereRaw('next_review_timestamp <= now()')
Hope this helps!

Spark streaming - waiting for data for window aggregations?

I have data in the format { host | metric | value | time-stamp }. We have hosts all around the world reporting metrics.
I'm a little confused about using window operations (say, 1 hour) to process data like this.
Can I tell my window when to start, or does it just start when the application starts? I want to ensure I'm aggregating all data from hour 11 of the day, for example. If my window starts at 10:50, I'll just get 10:50-11:50 and miss 10 minutes.
Even if the window is perfect, data may arrive late.
How do people handle this kind of issue? Do they make windows far bigger than needed and just grab the data they care about on every batch cycle (kind of sliding)?
In the past, I worked on a large-scale IoT platform and solved that problem by considering that the windows were only partial calculations. I modeled the backend (Cassandra) to receive more than 1 record for each window. The actual value of any given window would be the addition of all -potentially partial- records found for that window.
So, a perfect window would be 1 record, a split window would be 2 records, late-arrivals are naturally supported but only accepted up to a certain 'age' threshold. Reconciliation was done at read time. As this platform was orders of magnitude heavier in terms of writes vs reads, it made for a good compromise.
After speaking with people in depth on MapR forums, the consensus seems to be that hourly and daily aggregations should not be done in a stream, but rather in a separate batch job once the data is ready.
When doing streaming you should stick to small batches with windows that are relatively small multiples of the streaming interval. Sliding windows can be useful for, say, trends over the last 50 batches. Using them for tasks as large as an hour or a day doesn't seem sensible though.
Also, I don't believe you can tell your batches when to start/stop, etc.

How to deal with a large amount of logs and redis?

Say I have about 150 requests coming in every second to an api (node.js) which are then logged in Redis. At that rate, the moderately priced RedisToGo instance will fill up every hour or so.
The logs are only necessary to generate daily\monthly\annual statistics: which was the top requested keyword, which was the top requested url, total number of requests daily, etc. No super heavy calculations, but a somewhat time-consuming run through arrays to see which is the most frequent element in each.
If I analyze and then dump this data (with a setInterval function in node maybe?), say, every 30 minutes, it doesn't seem like such a big deal. But what if all of sudden I have to deal with, say, 2500 requests per second?
All of a sudden I'm dealing with 4.5 ~Gb of data per hour. About 2.25Gb every 30 minutes. Even with how fast redis\node are, it'd still take a minute to calculate the most frequent requests.
Questions:
What will happen to the redis instance while 2.25 gb worth of dada is being processed? (from a list, I imagine)
Is there a better way to deal with potentially large amounts of log data than moving it to redis and then flushing it out periodically?
IMO, you should not use Redis as a buffer to store your log lines and process them in batch afterwards. It does not really make sense to consume memory for this. You will better served by collecting your logs in a single server and write them on a filesystem.
Now what you can do with Redis is trying to calculate your statistics in real-time. This is where Redis really shines. Instead of keeping the raw data in Redis (to be processed in batch later), you can directly store and aggregate the statistics you need to calculate.
For instance, for each log line, you could pipeline the following commands to Redis:
zincrby day:top:keyword 1 my_keyword
zincrby day:top:url 1 my_url
incr day:nb_req
This will calculate the top keywords, top urls and number of requests for the current day. At the end of the day:
# Save data and reset counters (atomically)
multi
rename day:top:keyword tmp:top:keyword
rename day:top:url tmp:top:url
rename day:nb_req tmp:nb_req
exec
# Keep only the 100 top keyword and url of the day
zremrangebyrank tmp:top:keyword 0 -101
zremrangebyrank tmp:top:url 0 -101
# Aggregate monthly statistics for keyword
multi
rename month:top:keyword tmp
zunionstore month:top:keyword 2 tmp tmp:top:keyword
del tmp tmp:top:keyword
exec
# Aggregate monthly statistics for url
multi
rename month:top:url tmp
zunionstore month:top:url 2 tmp tmp:top:url
del tmp tmp:top:url
exec
# Aggregate number of requests of the month
get tmp:nb_req
incr month:nb_req <result of the previous command>
del tmp:nb_req
At the end of the month, the process is completely similar (using zunionstore or get/incr on monthly data to aggregate the yearly data).
The main benefit of this approach is the number of operations done for each log line is limited while the monthly and yearly aggregation can easily be calculated.
how about using flume or chukwa (or perhaps even scribe) to move log data to a different server (if available) - you could store log data using hadoop/hbase or any other disk based store.
https://cwiki.apache.org/FLUME/
http://incubator.apache.org/chukwa/
https://github.com/facebook/scribe/

Strange data access time in Azure Table Storage while using .Take()

this is our situation:
We store user messages in table Storage. The Partition key is the UserId and the RowKey is used as a message id.
When a users opens his message panel we want to just .Take(x) number of messages, we don't care about the sortOrder. But what we have noticed is that the time it takes to get the messages varies very much by the number of messages we take.
We did some small tests:
We did 50 * .Take(X) and compared the differences:
So we did .Take(1) 50 times and .Take(100) 50 times etc.
To make an extra check we did the same test 5 times.
Here are the results:
As you can see there are some HUGE differences. The difference between 1 and 2 is very strange. The same for 199-200.
Does anybody have any clue how this is happening? The Table Storage is on a live server btw, not development storage.
Many thanks.
X: # Takes
Y: Test Number
Update
The problem only seems to come when I'm using a wireless network. But I'm using the cable the times are normal.
Possibly the data is collected in batches of a certain number x. When you request x+1 rows, it would have to take two batches and then drop a certain number.
Try running your test with increments of 1 as the Take() parameter, to confirm or dismiss this assumption.

Resources