How to structure Blazor Server Side App to have up to date information with folders of CSVs as the Data Source? - memory-leaks

To give a brief introduction: I'm new to Blazor, so I could very well be missing an obvious feature.
The assigned project calls for a Blazor Server application that displays charts of information retrieved from 10 accessible system folders, each containing hourly CSVs; the latest CSV gains a row of data every 1-3 seconds. Once the hour has passed, a new CSV is created, and this continues indefinitely for the purposes of this question. Each CSV has 100 columns; we're only focusing on 3 for now. Before the argument for SQL comes up: they do not want to upload anywhere from 1 to 1.5 million rows to SQL each day. CSVs currently hold anywhere from 1,500 to 7,200 rows of data.
Currently, when the page loads, a chunk of data is retrieved (in this case the last 4 hours), and then every 5 seconds data from the last two files is retrieved (to avoid missing any hour-turnover rows); only new data is added to the data source in the site. The date timestamp of each row is treated as unique within each cluster. The charts only show a rolling 15- to 30-minute buffer of activity for demo purposes, though a longer window may very well be requested.
Read access to the CSVs is not a concern, and all methods that access them are wrapped in using statements. My concern is that, even in this program's infancy, each client opening the page re-initiates the data retrieval and the background polling loop, which means we effectively have no scalability and the process memory is just an upward slope.
What are my actions to reduce server load, if there are any? Should I push harder for SQL based?

OK, some thoughts and ideas. I'm reading a bit between the lines here, so forgive me if I'm way off the mark. (Sounds a bit like an NHS COVID dashboard app; they love their spreadsheets!)
What the customer wants is a front end that "reacts", i.e. redraws when new data arrives. That is an event-driven update (triggered either by time or by a new entry being written). The backend can detect this by watching for file updates: System.IO.FileSystemWatcher comes to mind. A file change drives a new read of the updated file, a refresh of the service's data set, and an event raised by the service. Use a singleton backend data service to hold the data. Any user with an open SPA session registers with that event (probably through a scoped or transient service), which precipitates a UI update.
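A minimal sketch of what that singleton service could look like, assuming hypothetical folder paths, type names, and CSV layout (parsing and multi-folder handling are elided):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Registered once for the whole server in Program.cs:
//   builder.Services.AddSingleton<CsvDataService>();
public sealed class CsvDataService : IDisposable
{
    private readonly FileSystemWatcher _watcher;
    private readonly ConcurrentDictionary<DateTime, CsvRow> _rows = new();

    // Raised after new rows are merged in; open sessions subscribe to this.
    public event Action? DataUpdated;

    public CsvDataService()
    {
        // Hypothetical folder; in practice you would create one watcher per cluster folder.
        _watcher = new FileSystemWatcher(@"C:\data\cluster1", "*.csv")
        {
            NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.FileName
        };
        _watcher.Changed += (_, e) => MergeFile(e.FullPath);
        _watcher.Created += (_, e) => MergeFile(e.FullPath);
        _watcher.EnableRaisingEvents = true;
    }

    public IReadOnlyCollection<CsvRow> Snapshot() => _rows.Values.ToArray();

    private void MergeFile(string path)
    {
        // Key rows by timestamp so re-reading a file never duplicates data.
        // (FileSystemWatcher can fire several Changed events per write, so a short
        // debounce is worth adding in a real implementation.)
        foreach (var row in ParseCsv(path))
            _rows.TryAdd(row.Timestamp, row);

        DataUpdated?.Invoke();
    }

    private static IEnumerable<CsvRow> ParseCsv(string path)
    {
        // CSV parsing elided; open with FileShare.ReadWrite since the file is still being written.
        yield break;
    }

    public void Dispose() => _watcher.Dispose();
}

public sealed record CsvRow(DateTime Timestamp, double ColumnA, double ColumnB, double ColumnC);
```

Because there is exactly one watcher and one in-memory buffer no matter how many browser sessions are open, the per-client polling loops (and the upward memory slope they cause) go away; trimming rows older than the rolling window keeps the buffer bounded.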
A SQL solution I'm sure will work better, but you don't often get to choose!
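On the consuming side, a hedged sketch of how a component might register with that singleton (names are again hypothetical; chart markup is elided):

```razor
@inject CsvDataService Data
@implements IDisposable

@* Chart rendering elided; it would read Data.Snapshot() and keep the last 15-30 minutes. *@

@code {
    protected override void OnInitialized()
    {
        Data.DataUpdated += OnDataUpdated;
    }

    private void OnDataUpdated()
    {
        // The event fires on a watcher thread, so marshal back to the renderer.
        _ = InvokeAsync(StateHasChanged);
    }

    public void Dispose()
    {
        Data.DataUpdated -= OnDataUpdated;
    }
}
```

Unsubscribing in Dispose matters here; otherwise every closed session leaves a dangling delegate on the singleton and the memory graph keeps climbing.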

Related

How to make Copy Data work faster and have better performance (Azure Synapse)

A bit of context: my Azure Synapse pipeline makes a GET Request to a REST API in order to import data to the Data Lake (ADLSGen2) in parquet file format.
I plan to request data from the API on an hourly basis in order to get the information for the previous hour. I have also considered setting the trigger to run every half hour to get the data for the previous 30 minutes.
The thing is: this last GET request and Copy Data debug run took a bit less than 20 minutes. The DIU setting was "Auto", and it stayed at 4 even when I set it manually to 8 in the activity settings.
I was wondering if there are any useful suggestions to make a Copy Data activity run faster, whatever the cost may be (I would really appreciate information about the cost, if you consider it pertinent).
Thanks in advance!
Mateo
You need to check which part is running slow.
You can click on the glasses icon to see the copy data details.
If the latency is in "Time to first byte" or "Reading from source", the issue is on the REST API side.
If the latency is in "Writing to sink", the problem is likely in writing to the Data Lake.
If the issue is on the API side, try to contact the provider. Another option, if applicable, is to use a few Copy Data activities, each copying a part of the data.
If the issue is on the Data Lake side, you should check the settings on the sink side.

In React, speed up large initial fetch of data when web app loads

Similar to the project I am working on, this website has a search bar at the top of its home page:
On the linked website, the search bar works seemingly instantly when you visit. According to their own site, there have been roughly 20K MLB players in MLB history, and this is a good estimate for the number of dropdown items in this select widget.
On my project, it currently takes 10-15 seconds to make this fetch (from MongoDB, using Node + Express) for a table with ~15 MB of data that contains the data for the select's dropdown items. This 15 MB is as small as I could make the table, as it includes only two keys (one for the id and one for the name of each dropdown item). The table is large because there are more than 150K options to choose from in my project's select widget. I currently have to disable the widget for the first 15 seconds while the data loads, which results in a bad user experience.
Is there any way to make the data required for the select widget immediately available to the select when users visit, that way the widget does not have to be disabled? In particular:
Can I use localStorage to store this table in the user's browser? Is 15 MB too big for localStorage? This table changes and grows in size daily (so it isn't very persistent), and a table in localStorage would then be outdated the next day, no?
Can I avoid having to do this fetch altogether? Perhaps there is a way to load the correct data into React only when a user searches for it?
Some other approach?
Fetching (or caching) this 15 MB of data more quickly for this select would improve our React app's user experience by quite a bit.
The data on the site you link to is basically 20k in size. It does not contain all the players but fetches the data as needed when you click on a link in the drop-down. So if you have 20 MB of searchable data, then you need to find a way to load it only as required. How to do that sensibly depends on the nature of the data. Many search bars with large result sets behind them use a typeahead search, where the user's input is posted back as they type (with a decent debounce interval) and the search results matching that input are sent back in real time (usually with a limit of, say, the first 20 or 50 results).
So basically the answer is to find a way to serve up only the data that the user needs rather than downloading the entire database to the browser (option 2 in your list). You will obviously need to provide a search API to allow that to happen.
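The question's stack is Node + Express + MongoDB, but the shape of such a typeahead endpoint is the same everywhere; here is a rough sketch using ASP.NET Core minimal APIs, kept in the same language as the rest of this page (the route, parameter, and in-memory list are all hypothetical stand-ins):

```csharp
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Hypothetical stand-in for the 150K-row id/name table.
var players = new List<(string Id, string Name)>();

app.MapGet("/api/players/search", (string q) =>
{
    // Don't bother searching on zero or one characters.
    if (string.IsNullOrWhiteSpace(q) || q.Length < 2)
        return Results.Ok(Array.Empty<object>());

    var matches = players
        .Where(p => p.Name.Contains(q, StringComparison.OrdinalIgnoreCase))
        .Take(20)                       // cap the payload: 20 rows, not 15 MB
        .Select(p => new { p.Id, p.Name });

    return Results.Ok(matches);
});

app.Run();
```

The client then debounces keystrokes (250-300 ms is typical) and calls the endpoint with the current input, so the browser never holds more than a handful of rows at a time.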

Managing constantly changing data in Database

I need some advice on how to architect my data in MongoDB. I have an app where users can view, add, edit, and remove credit and debit transactions. Below is how the data looks.
The balance column here is dynamic. For example, if someone adds a transaction dated 10-09-2017, all the amounts in the balance field thereafter need to change at that moment to reflect the new transaction. Right now, I am not saving this balance field in the database at all and am calculating it every time the user loads or reloads the page, and also when editing, deleting, or adding a transaction. For now it is fast, but I assume that in the future, when the user has a lot of transactions, it will become slow, as these calculations need to be done before the user is shown the data table. Is there a more efficient way to do this?
Also, I am doing the calculations on the client side, so the load is on the client's device and not on the server. I think that if it were on the server side and a lot of users started using it, the API requests would become much slower and eventually unusable. Is this the right approach?
PS: It was hard to make sure the reader understands my question, but I have tried my best. Please let me know if I should explain this in more detail or add any more details.
This is not really a question about MongoDB; it is a question about the user interface.
Will you really display the whole history of transactions at once?
You should either use pagination (simplest) or load data on scroll.
Before you run into problems because of the balance-cell calculation, you are more likely to experience problems because of:
Slow loading from network (almost certainly)
Slow page interaction because of DOM size (maybe)
Show the first 100 to 500 transactions and provide the user with some way to load earlier entries.
Update - Regarding server-side balance calculation:
You could calculate the balance on the server side and store it in a second collection, which serves as a cache. If a transaction is inserted in the past, you recalculate the cache. To speed this up, you can use snapshots:
Within a third collection, you could store the current balance at certain intervals, e.g. with the following data structure:
{ Balance: 150000, Date: 2017-02-03, LastTransactionId: 546 }
When a transaction is inserted in the past, take the most recent snapshot before that past moment and recalculate the cache based on that. This way, you can keep the number of recalculated transactions pretty small.
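The recalculation itself is a small amount of arithmetic; here is a language-neutral sketch of the idea, written in C# for consistency with the rest of this page (types and field names are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record Transaction(int Id, DateTime Date, decimal Amount);   // credit positive, debit negative
public sealed record Snapshot(decimal Balance, DateTime Date, int LastTransactionId);

public static class BalanceCache
{
    // Recompute cached balances for every transaction after the nearest snapshot
    // preceding the back-dated insert; everything before the snapshot is untouched.
    public static Dictionary<int, decimal> RecalculateFrom(
        Snapshot snapshot, IEnumerable<Transaction> transactionsAfterSnapshot)
    {
        var balances = new Dictionary<int, decimal>();
        var running = snapshot.Balance;

        foreach (var tx in transactionsAfterSnapshot.OrderBy(t => t.Date).ThenBy(t => t.Id))
        {
            running += tx.Amount;
            balances[tx.Id] = running;   // persist these rows into the cache collection
        }
        return balances;
    }
}
```

Because only transactions after the chosen snapshot are re-walked, the cost of a back-dated insert is bounded by the snapshot interval rather than by the user's full history.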

Lotus notes agent runs slower in server compared to development PC

I have an attendance recording system that has 2 databases, one for current, another for archiving. The server processes attendance records, and puts records marked completed into the archive. There is no processing done in the archive database.
Here's the issue. One of the requirements was to build a blank record for each staff member every day, into which attendance records are put. The agent that does this calls a few procedures and does some checking within the database. Currently, roughly 1,800 blank records are created daily. On the development PC, processing each record takes roughly 2 to 3 seconds, which translates to an average of an hour and a half. However, when we deployed it on the server, processing each record takes roughly 7 seconds, which roughly translates into three and a half hours to complete. We have had instances where the agent takes 4.5 to 5 hours to complete.
Note that in both instances the agents are scheduled. There are no other Lotus apps on the server, and the server is free and idle most of the time (no other applications except Windows Server and Lotus Notes). Is there anything that could cause the additional processing time on the server compared to the development PC?
Your process is generating 1800 new documents every day, and you have said that you are also archiving documents regularly, so I presume that means that you are deleting them after you archive them. Performance problems can build up over time in applications like this. You probably have a large number of deletion stubs in the database, and the NSF file is probably highly fragmented (internally and/or externally).
You should use the free NotesPeek utility to examine the database and see how many deletion stubs it contains. Then you should check the purge interval setting and consider lowering it to the smallest value that you are comfortable with. (I.e., big enough so you know that all servers and users will replicate within that time, but small enough to avoid allowing a large buildup of deletion stubs.) If you change the purge interval, you can wait 24 hours for the stubs to be purged, or you can manually run updall against the database on the server console to force it.
Then you should run compact -c on the NSF file, and also run a defrag on the server disk volume where the NSF lives.
If these steps do improve your performance, then you may want to take steps in your code to prevent recurrence of the problem by using coding techniques that minimize deletion stubs, database growth and fragmentation.
I.e., go into your code for archiving and change it so it doesn't delete documents after archiving. Instead, have your code mark them with a field such as FreeDocList := "1". Then add a hidden view called (FreeDocList) with a selection formula of FreeDocList = "1". Also go into every other view in the database and add & (!(FreeDocList = "1")) to the selection formulas. Then change the code that adds the new blank documents so that, instead of creating new docs, it just goes to the FreeDocList view, finds the first document, sets FreeDocList = "0", and clears all the previous field values. Of course, if there aren't enough documents in the FreeDocList view, your code would revert to the old behavior and create a new document.
With the above changes, you will be re-using your existing documents whenever possible instead of deleting and creating new ones. I've run benchmarks on code like this and found that it can help; but I can't guarantee it in all cases. Much would depend on what else is going on in the application.

Which is the best method of pagination so that the load on the server is minimal?

I have done a bit of research on pagination, and from what I have read there are two contradictory ways of doing it:
Load a small set of data from the database each time a user clicks next
Problem: suppose there are a million rows that meet the WHERE conditions. That means a million rows are retrieved, stored, and filesorted, then most of them are discarded and only 20 are returned. If the user clicks the "next" button, the same process happens again, only a different 20 are returned. (ref: http://www.mysqlperformanceblog.com/2008/09/24/four-ways-to-optimize-paginated-displays/)
Load all the data from the database and cache it. This has a few problems too, mentioned here: http://www.javalobby.org/java/forums/t63849.html
So I know I will have to use a hybrid of both. However, the question boils down to which operation is more expensive:
making repeated queries to the database for small chunks of data
or
transferring a large result set over the network
My company has exactly this situation, and we've chosen a bit of a hybrid. Our data is tabular, so we send it via AJAX to DataTables. This allows for good UI formatting, sorting, filtering, and show/hide of columns. DataTables has a great solution called "pipelining" that will "queue ahead": it grabs a quantity of data ahead of the user's action (in our case, up to 5 times the records they request) and then pages through without further requests until it runs out of data. It's EXTREMELY easy to implement with DataTables, but I suspect a similar solution would not be difficult to write by hand using jQuery's AJAX functionality.
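The pipelining idea is simple enough to sketch independently of DataTables; here is a rough client-side page-ahead cache, written in C# to keep one language on this page (the fetch delegate is a hypothetical stand-in for your AJAX call):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// "Queue ahead": fetch several pages in one round trip and serve subsequent
// "next" clicks from memory until the buffer runs out.
public sealed class PageAheadCache<T>
{
    private readonly Func<int, int, Task<IReadOnlyList<T>>> _fetch; // (offset, count) -> rows
    private readonly int _pageSize;
    private readonly int _pagesAhead;

    private List<T> _buffer = new();
    private int _bufferOffset;   // absolute offset of _buffer[0]

    public PageAheadCache(Func<int, int, Task<IReadOnlyList<T>>> fetch,
                          int pageSize = 20, int pagesAhead = 5)
    {
        _fetch = fetch;
        _pageSize = pageSize;
        _pagesAhead = pagesAhead;
    }

    public async Task<IReadOnlyList<T>> GetPageAsync(int pageIndex)
    {
        var start = pageIndex * _pageSize;
        var end = start + _pageSize;

        // Cache miss: pull pageSize * pagesAhead rows starting at the requested page.
        // (A real implementation would also remember when the server has no more rows.)
        if (start < _bufferOffset || end > _bufferOffset + _buffer.Count)
        {
            _bufferOffset = start;
            _buffer = new List<T>(await _fetch(start, _pageSize * _pagesAhead));
        }

        return _buffer.Skip(start - _bufferOffset).Take(_pageSize).ToList();
    }
}
```

This is essentially the behaviour described above: one request for several pages' worth of rows, then page turns that never touch the server until the buffer is exhausted.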
I tried doing a full load and cache on a 1.5 million record database and it was a trainwreck. The client almost dumped me because they got mad it was so slow. After a solid overnight of AJAX goodness, the client was happy once again. But best never to get to that point.
Good Luck.
