How to do scalability testing for a JTAPI application - voip

I have an existing JTAPI application that I'm going to be enhancing and refactoring. One of the first concerns is whether the new enhancements will scale reasonably well to a larger number of IP phones. I've done scalability testing of web services by simulating web service clients with threads, and that approach works well for determining if web services will scale.
Now I'm trying to come up with a way to simulate increasing numbers of IP phones on a network because I obviously don't want to have to build a real network with hundreds or thousands of IP phones.
I'll start with simple JTAPI operations like querying each device on the network to determine which ones are busy, but more complex operations will also have to be tested.
I could build a network of 10 IP phones and "scale out" that network by repeating each JTAPI operation N times for each phone. I would test with N = 1, 3, 7, 10, 30, 70, 100, 300, ... One potential problem with this approach is that the results could end up heavily skewed by the latency of the IP phones in responding to multiple JTAPI operations instead of showing the scalability of a larger network of IP phones responding to a single JTAPI operation.
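To make the scale-out idea concrete, this is roughly the kind of driver I have in mind: a plain thread pool that fires N copies of a query per phone and measures latency. The device names and the queryDeviceBusy() helper below are placeholders for the real JTAPI calls, not any specific JTAPI provider API.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Minimal load-driver sketch: fan out N copies of a single query per phone
// and measure average per-operation latency. queryDeviceBusy() is a placeholder
// for the real JTAPI call (e.g. inspecting an Address/Terminal's connections).
public class JtapiLoadDriver {

    static boolean queryDeviceBusy(String deviceName) {
        // ... real JTAPI call goes here ...
        return false;
    }

    public static void main(String[] args) throws Exception {
        String[] phones = {"SEP001", "SEP002", "SEP003"};            // hypothetical device names
        int n = Integer.parseInt(args.length > 0 ? args[0] : "10");  // scale factor N

        ExecutorService pool = Executors.newFixedThreadPool(50);
        AtomicLong totalNanos = new AtomicLong();
        CountDownLatch done = new CountDownLatch(phones.length * n);

        for (String phone : phones) {
            for (int i = 0; i < n; i++) {
                pool.submit(() -> {
                    long start = System.nanoTime();
                    queryDeviceBusy(phone);
                    totalNanos.addAndGet(System.nanoTime() - start);
                    done.countDown();
                });
            }
        }
        done.await();
        pool.shutdown();
        System.out.printf("ops=%d avg=%.1f ms%n",
                phones.length * n,
                totalNanos.get() / 1e6 / (phones.length * n));
    }
}
```

Whether the averages this produces reflect a genuinely larger network, or just the same 10 phones answering N times each, is exactly the skew I'm worried about.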
Another approach is to set up a network of IP softphones and scale that out, but I don't think IP softphones would behave like real IP phones in terms of latency and responding to JTAPI operations.
How have others tested the scalability of their JTAPI applications?
Thanks.

Related

Prevent bottleneck on bandwidth for mobile internet

I am sure this question has already been answered, but unfortunately I do not know the right keywords, so my searches have been unsuccessful so far.
Scenario: I want to transmit a live stream over mobile internet from a Raspberry Pi and, depending on the bandwidth, scale the stream down and scale it up again when more bandwidth is available.
My two questions for the network specialists among you:
I know I can actively check the bandwidth, but how would you do this without interfering with the processes that are already transmitting? Should I reserve a fixed bandwidth for those processes and then gradually probe the remaining bandwidth with a test tool? Or are there already practical solutions?
Can I determine from the mobile internet connection, or from the network interface, when a bottleneck is reached?
Passive methods would be my preference, where I wouldn't have to load the connection myself. For example, I already know how much bandwidth the stream uses and how much of it arrives. But how do I make sure there is enough capacity before I increase the bitrate?
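For passive sampling, something like the sketch below is what I have in mind: read the kernel's per-interface byte counters once a second and compare the achieved send rate against the encoder's configured bitrate (the interface name "wwan0" is just a placeholder for my modem interface). It only tells me when I'm already hitting the ceiling, not how much headroom is left.

```java
import java.nio.file.*;

// Passive throughput check: sample the kernel's per-interface byte counters
// once per second; no probe traffic is generated. "wwan0" is an assumed
// interface name, substitute whatever interface carries the stream.
public class TxRateSampler {
    public static void main(String[] args) throws Exception {
        Path counter = Paths.get("/sys/class/net/wwan0/statistics/tx_bytes");
        long prev = Long.parseLong(Files.readString(counter).trim());
        while (true) {
            Thread.sleep(1000);
            long now = Long.parseLong(Files.readString(counter).trim());
            double kbps = (now - prev) * 8 / 1000.0;   // bits sent in the last second
            prev = now;
            // If this stays well below what the encoder is trying to send,
            // the uplink is the bottleneck.
            System.out.printf("uplink throughput: %.0f kbit/s%n", kbps);
        }
    }
}
```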
Thanks for your wisdom ;)

Why are Google Page Speed insight scores so different from GTMetrix, WebPageTest.org, Pingdom, etc?

Is this because it uses a slower connection to the website? I have read that it's a fast 3G connection? Is that used as well as field data?
I have websites that load in under 2 seconds but they fail the PSI tests.
Ryan - Google PageSpeed is a very robust tool. It is also stricter than other tools such as GTMetrix or Pingdom.
There are several factors that impact speed. Expect a variance of 5 to 7 points depending on the location of Google's servers relative to your server. If you are seeing a larger variation, that could be your CDN rather than your server.
Double-check the results by running Google Lighthouse. You can find it under Chrome DevTools.
Late answer but hopefully this will help people understand the difference.
Short Answer
PageSpeed Insights (PSI) simulates a mid-tier mobile phone on a slow 4G connection. You will always score lower on PSI mobile tests because the other sites do not use throttling.
The desktop tab of PSI should be similar, but it again uses different scoring metrics that the other tools do not appear to have updated to (at the time of writing).
Longer Answer
Is this because it uses a slower connection to the website? I have read that it's a fast 3G connection?
PageSpeed Insights (PSI) is powered by Lighthouse.
As part of this it uses simulated network throttling to simulate network latency and slower connection speeds (comparable to fast 3G / slow 4G).
It also simulates a slower CPU.
It does both of these to simulate a mid-tier mobile phone on a 4G connection. Mobiles have lower processing power and may be used "on the go" without WiFi.
GTMetrix, WebPageTest.org, Pingdom etc. all check the desktop version of the site.
This is the main reason you will see vastly different scores as they do not apply any form of throttling to the CPU or network speeds.
You should find that you get similar scores if you compare the desktop tab of the PSI report to them, as that is unthrottled.
Another difference (although I am not 100% sure) is that I think those sites are still using Lighthouse version 5 scoring at their core. Lighthouse changed to version 6 scoring earlier this year, to reflect the items that really matter to the end user. This is why I said "similar" scores in the previous paragraph.
Is that used as well as field data?
No. Field data is real-world data, also known as RUM (Real User Metrics). It is collected from real visitors to your site.
It has no effect on your PSI score, as that is calculated each time from "lab data".
Field data is there for diagnostics (RUM is far more reliable and helps identify issues automated testing may miss, such as an overloaded server, problems at certain screen sizes, etc.).
I have websites that load in under 2 seconds but they fail the PSI tests.
Are you sure? It may show 2 seconds on automated tests (for desktop) but in the real world how can you know that?
One way to check is to actually monitor this information on your site. This answer I gave has all the relevant metrics you may want to gather and monitor for site performance.
If you combine that information with screen size and device information you have everything you need to identify issues in near real time.

Load and Performance testing on android and ios app

I need to perform a load test with 200+ concurrent devices on the android and ios apps. Is there any tool that can do that?
It depends on the network protocol(s) which your application is using for communicating with the backend.
You can identify which protocol(s) are in scope by installing the application into Android Emulator or iOS Simulator and use a sniffer tool like Wireshark to capture the network traffic.
Once you figure out which protocol(s) are being used, you can choose a load testing tool which supports them; an example comparison of free and open source load testing tools can be found in the Open Source Load Testing Tools: Which One Should You Use? article.
After you decide which tool you will be using, you will need to replicate the mobile device traffic with it so that it matches the device's network footprint 100% (you might need to perform parameterization of credentials and correlation of dynamic parameters). Once that is done, you should be able to replay the requests with an increased number of virtual users.
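As an illustration of that replay step, here is a bare-bones sketch using Java's built-in HttpClient: each "virtual user" re-issues a captured request with its own parameterized credentials. The endpoint and token format are placeholders; a real load testing tool additionally gives you pacing, correlation and reporting.

```java
import java.net.URI;
import java.net.http.*;
import java.util.concurrent.*;

// Bare-bones "virtual user" replay: each task re-issues a captured request with
// its own parameterized credentials. URL and token values are placeholders.
public class MobileApiReplay {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        int virtualUsers = 200;
        ExecutorService pool = Executors.newFixedThreadPool(virtualUsers);

        for (int u = 0; u < virtualUsers; u++) {
            String token = "user-" + u + "-token";           // parameterized per user
            pool.submit(() -> {
                HttpRequest req = HttpRequest.newBuilder(
                        URI.create("https://api.example.com/v1/feed"))  // captured endpoint
                        .header("Authorization", "Bearer " + token)
                        .GET()
                        .build();
                try {
                    HttpResponse<String> resp =
                            client.send(req, HttpResponse.BodyHandlers.ofString());
                    System.out.println("status=" + resp.statusCode());
                } catch (Exception e) {
                    System.err.println("request failed: " + e.getMessage());
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
    }
}
```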
Try AWS Device Farm; it has a lot of configurations, devices and global options for testing.
Typically:
you capture the device's network requests using a proxy (we use Charles Proxy) while functional testing the app,
take out static resources: CSS, images and scripts (which are served from a CDN) as well as third-party resources,
then parameterise the dynamic requests to create a load test script.
While you are perf testing, navigate through the app to see the end-user impact while the back end is under heavy load.
Yes, there are many solutions. The governing factor is going to be the communications model between your handheld device and the application/system under test.
In most cases (but not all) the protocol for communication is HTTP. In this case you may leverage a proxy to record the conversation between client and server and reproduce a single session. You may then modify this session to handle dynamic server data for session, date, time, account information and user inputs. Once that is done, you may replay 200+ sessions representing the load of 200+ users on your system.
I would recommend a network simulator be involved in your test. Mobile networks are particularly dirty, leading to higher error rates and longer latch times (protocol, layer 3) on sites. Having the impairment from the network simulator will better allow you to understand the response times for your client. Look for impairment solutions which can ingest OOKLA data for various locations and times of day matching your high load windows.

How fast is the BGP protocol?

I watched a CBT Nuggets video in which it was said that the BGP protocol is slow, so if you brought up a domain it would take days for it to be fully accessible. However, at work a change was made on the router concerning BGP routes and it took minutes for the change to be seen. So is the BGP protocol slow, or is it fast? Thanks
It’s common knowledge that BGP has no ability to make performance-based routing decisions and often routes traffic through paths that are congested or affected by routing anomalies.
Since BGP is focused on reachability and its own stability, in case some problems occur the traffic may only be rerouted due to hard failures. Hard failures are total losses of reachability as opposed to degradation. This means that even though service may be so degraded that it is unusable for an end user, BGP will continue to assume that a degraded route is valid until and unless the route is invalidated by a total lack of reachability.
One way to detect this problem is to monitor reachability of key remote services.
Another is looking at total traffic, which will be much lower than usual in the presence of a black hole. Once detected, recovering from a routing black hole affecting one ISP/peer is very simple: shut down the BGP session towards that provider until they've fixed the problem.
Alternatively you can automate the process of selecting the best performing transit provider or peer by deploying a route optimizer which evaluates all ISPs, IXes, and partial peers in terms of packet loss and latency and automatically reroutes traffic through the most reliable path.
So, BGP can be fast if correctly configured and optimized.
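For the "monitor reachability of key remote services" idea mentioned above, a crude probe can be as simple as periodically opening TCP connections to a few important endpoints and flagging timeouts; the host list below is just a placeholder for whatever services matter to you.

```java
import java.net.InetSocketAddress;
import java.net.Socket;

// Crude reachability probe: a timeout or refusal on a path that BGP still
// considers valid is a hint of a degraded or black-holed route.
public class ReachabilityProbe {
    public static void main(String[] args) throws Exception {
        String[][] targets = {{"example.com", "443"}, {"example.org", "443"}};  // placeholders
        while (true) {
            for (String[] t : targets) {
                long start = System.currentTimeMillis();
                try (Socket s = new Socket()) {
                    s.connect(new InetSocketAddress(t[0], Integer.parseInt(t[1])), 3000);
                    System.out.printf("%s reachable in %d ms%n",
                            t[0], System.currentTimeMillis() - start);
                } catch (Exception e) {
                    System.out.printf("%s UNREACHABLE: %s%n", t[0], e.getMessage());
                }
            }
            Thread.sleep(30_000);   // probe every 30 seconds
        }
    }
}
```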
Regarding the change you saw propagate in a few minutes: when working with ISPs, it depends on how large their network is and how their topology is laid out, since decisions are made per AS. The larger the AS, the more time it will take; by larger I mean the number of hops, advertisements and network segments. In a simulation environment it takes around 30 seconds, but a production network is different.
In general it is about the definition of "fast".
Compared to IGPs, BGP is really slow.
Inside a production network you may need sub-second reconvergence; that is very fast compared to the several minutes BGP can take. (iBGP is not discussed here, because it can be highly optimized with IGPs, BFD and timers; eBGP can't, because you lose control at the border of your AS.) The delay gets bigger with every hop (maybe the next provider) through which your route has to be propagated. You might have seen your change after minutes, but that does not mean the change was propagated around the globe.
You can check the changes here with some kind of history:
https://stat.ripe.net/special/bgplay

How many open udp or tcp/ip connections can a linux machine have?

There are limits imposed by available memory, bandwidth, CPU, and of course, the network connectivity. But those can often be scaled vertically. Are there any other limiting factors on linux? Can they be overcome without kernel modifications? I suspect that, if nothing else, the limiting factor would become the gigabit ethernet. But for efficient protocols it could take 50K concurrent connections to swamp that. Would something else break before I could get that high?
I'm thinking that I want a software UDP and/or TCP/IP load balancer. Unfortunately, nothing like that seems to exist in the open-source community, except for the HTTP protocol. But it is not beyond my abilities to write one using epoll. I expect it would take a lot of tweaking to get it to scale, but that's work that can be done incrementally, and I would be a better programmer for it.
The one parameter you will probably have some difficulty with is jitter. As you scale the number of connections per box, you will undoubtedly put strain on all the resources of that system. As a result, the jitter characteristics of the forwarding function will likely suffer.
Depending on your target requirements, that might or might not be an issue: if you plan to support mainly elastic traffic (traffic which does not suffer much from jitter and latency), then it's OK. If the proportion of inelastic traffic is high (e.g. interactive voice/video), then this might be more of an issue.
Of course you can always over engineer in this case ;-)
If you intend to have a server which holds one socket open per client, then it needs to be designed carefully so that it can efficiently check for incoming data from 10k+ clients. This is known as the C10K problem.
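As a sketch of what such an event-driven design looks like, here is a minimal single-threaded echo server using Java NIO's Selector (which is epoll-backed on Linux); the same event loop structure applies if you write it directly against epoll in C. The port and buffer size are arbitrary.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

// Single-threaded, event-driven echo server: one Selector multiplexes all
// client sockets instead of dedicating a thread per connection.
public class SelectorEchoServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000));          // arbitrary port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buf = ByteBuffer.allocate(4096);
        while (true) {
            selector.select();                             // block until something is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buf.clear();
                    int n = client.read(buf);
                    if (n < 0) { client.close(); continue; }
                    buf.flip();
                    client.write(buf);                     // echo the data back
                }
            }
        }
    }
}
```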
Modern Linux kernels can handle a lot more than 10k connections, generally at least 100k. You may need some tuning, particularly the many TCP timeouts (if using TCP) to avoid closing / stale sockets using up lots of resource if a lot of clients connect and disconnect frequently.
If you are using netfilter's conntrack module, that may also need tuning to track that many connections (this is independent of tcp/udp sockets).
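On kernels with the nf_conntrack module loaded, the current usage and the limit are exposed under /proc, so headroom can be checked programmatically; a trivial sketch (paths assume the standard nf_conntrack sysctl layout):

```java
import java.nio.file.*;

// Reads the netfilter conntrack usage and limit; if count approaches max,
// new connections start being dropped regardless of per-socket limits.
public class ConntrackHeadroom {
    public static void main(String[] args) throws Exception {
        long count = Long.parseLong(Files.readString(
                Paths.get("/proc/sys/net/netfilter/nf_conntrack_count")).trim());
        long max = Long.parseLong(Files.readString(
                Paths.get("/proc/sys/net/netfilter/nf_conntrack_max")).trim());
        System.out.printf("conntrack: %d / %d (%.1f%% used)%n",
                count, max, 100.0 * count / max);
    }
}
```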
There are lots of technologies for load balancing; the most well-known is LVS (Linux Virtual Server), which can act as the front end to a cluster of real servers. I don't know how many connections it can handle, but I think we use it with at least 50k in production.
To your question: you are only constrained by hardware limitations. This was the design philosophy for Linux systems. You have described exactly what your limiting factors would be.
Try HAProxy software load balancer:
http://haproxy.1wt.eu/
