CSV Playback with Node-RED (Node.js)

Disclaimer - I am not a software guy so please bear with me while I learn.
I am looking to use Node-RED as a parser/translator, taking data from a CSV file and sending the rows of data out at about 1 Hz; let's say 5-10 rows of data being read and published per second.
Eventually, I will publish that data to some Modbus registers but I'm not there yet.
I have scoured the web and tried several examples; however, as soon as I trigger the flow, Node-RED stops responding, and I have to delete the source CSV (so the flow can't run any more) and restart Node-RED to get it back up and running.
I have many of the Big Nodes from this guy installed and have tried a variety of different methods but I just can't seem to get it.
If I can get a single column of data from a CSV file being sent out one row at a time, I think that would keep me busy for a bit.

There is a file node that will read a file one line at a time; you can then feed that through the csv node to parse the fields of each line into an object you can work with.
The delay node has a rate-limiting mode that can be used to limit the flow to one message per second, which gives you the rate you want.
All the nodes mentioned here are in the core set that ships with Node-RED.
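For reference, the same pattern (read a line, parse it, hold the output to one message per second) can be sketched outside Node-RED in plain Python; the file name and rate below are placeholders:

```python
import csv
import time

# Read the CSV one row at a time and "publish" one row per second,
# mirroring the file-in -> csv -> delay (rate limit) flow described above.
def playback(path="data.csv", rate_hz=1.0):
    with open(path, newline="") as f:
        reader = csv.DictReader(f)      # parses each line into a dict, like the csv node
        for row in reader:
            print(row)                  # stand-in for publishing (e.g. to Modbus later)
            time.sleep(1.0 / rate_hz)   # rate limit, like the delay node at 1 msg/sec

if __name__ == "__main__":
    playback()
```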

Related

Google Sheets API: adding cells without hitting the read/write limit?

Problem:
I have huge .csv files with 16,000+ lines of data that I intend to upload to a Google Sheet via AWS Lambda (Node.js). The only problem is that the Google Sheets API has a read/write limit of 300 actions per minute (it would take 53 minutes until all 16,000 lines are added), and it would take me too long to split the dataset into pieces of 300, add them, and wait. I have tried uploading it in one go (which I'm currently doing), but then the data ends up in just one cell, which is not my goal. Is there a way (I'd be grateful for any documentation or article) to upload the data in one go and split it later? Or maybe tell Google Sheets to put the data into individual cells itself instead of me doing that?
What have I tried?
I have tried several things already. I uploaded the entire file in one go, but that resulted in just one cell being written with roughly 16,000 lines of data.
I have also tried waiting a minute each time my read/write limit of 300 is reached and then writing again, but this resulted in huge wait times of 50+ minutes (which would be too expensive since it runs on AWS Lambda).
I'm at my wits' end and can't seem to find a solution to my problem. I'd be really grateful for a solution, a piece of documentation, or even an article. I tried to find resources on my own but to no avail. Thank you in advance; if questions arise, feel free to ask and I'll provide more information.
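For what it's worth, the Sheets v4 values API accepts a two-dimensional list, so a single update call can place every parsed row into its own cells rather than dumping the whole file into one cell; the same idea carries over to the Node.js client. A rough Python sketch with google-api-python-client, where the spreadsheet ID, range, and credentials file are placeholders:

```python
import csv
from google.oauth2 import service_account
from googleapiclient.discovery import build

SPREADSHEET_ID = "your-spreadsheet-id"      # placeholder
CREDS_FILE = "service-account.json"         # placeholder

creds = service_account.Credentials.from_service_account_file(
    CREDS_FILE, scopes=["https://www.googleapis.com/auth/spreadsheets"])
sheets = build("sheets", "v4", credentials=creds)

# Parse the CSV into a 2D list: one inner list per row, one element per cell.
with open("data.csv", newline="") as f:
    rows = list(csv.reader(f))

# One values.update call writes all rows at once; each value lands in its own cell.
sheets.spreadsheets().values().update(
    spreadsheetId=SPREADSHEET_ID,
    range="Sheet1!A1",
    valueInputOption="RAW",
    body={"values": rows},
).execute()
```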

Live Connection to Database for Excel PowerQuery?

I currently have approximately 10M rows and ~50 columns in a table that I wrap up and share as a pivot. However, this also means that it takes approximately 30 minutes to 1 hour to download the CSV, or much longer to run a Power Query ODBC connection directly against Redshift.
So far the best solution I've found is to use Python with redshift_connector to run update queries and perform an UNLOAD of a zipped result set to an S3 bucket, then use boto3/gzip to download and unzip the file, and finally perform a refresh from the CSV. This results in a 600 MB Excel file compiled in ~15-20 minutes.
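For reference, the download-and-decompress step described above is only a few lines with boto3 and gzip; the bucket, key, and file names here are placeholders:

```python
import gzip
import shutil
import boto3

s3 = boto3.client("s3")

# Fetch the zipped UNLOAD result from S3...
s3.download_file("my-bucket", "unload/result000.gz", "result.csv.gz")

# ...and decompress it to a plain CSV for the Excel refresh.
with gzip.open("result.csv.gz", "rb") as src, open("result.csv", "wb") as dst:
    shutil.copyfileobj(src, dst)
```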
However, this process still feels clunky, and sharing a 600 MB Excel file among teams isn't great either. I've searched for several days but I'm no closer to finding an alternative: what would you use if you had to share a drillable table/pivot among a team with a 10 GB datastore?
As a last note: I thought about programming a couple of PHP scripts, but my office doesn't have the infrastructure to support that.
Any help or ideas would be most appreciated!
Call a meeting with the team and let them know about the constraints; you will get some suggestions and you can offer some of your own.
Suggestions from my side:
For the file part:
reduce the data; for example, if it is time dependent, increase the interval, e.g. hourly data can be reduced to daily data
if the data is related to groups, you can divide the file into parts, each file belonging to one group
or send them only the final reports and numbers they require, rather than the full data.
For a fully functional app (see the sketch after this list):
you can buy a desktop PC (if budget is a constraint, buy a used one or use any desktop or laptop from old inventory) and create a PHP/Python web application that does all the steps automatically
create a local database and link it to the application
build the charting, pivoting, etc. modules in that application, and remove Excel from the process altogether
you can even use pre-built applications for the charting and pivoting part; Oracle APEX is one example.
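As a minimal illustration of the "local database plus pivoting module" idea, here is a Python sketch that loads the CSV extract into a local SQLite database and serves a pre-aggregated pivot instead of the raw rows; the file, table, and column names are made up for the example:

```python
import sqlite3
import pandas as pd

# Load the CSV extract into a local SQLite database once...
df = pd.read_csv("result.csv")                      # placeholder file name
con = sqlite3.connect("warehouse.db")
df.to_sql("sales", con, if_exists="replace", index=False)

# ...then share pre-aggregated "pivots" instead of the full 10M-row table.
pivot = pd.read_sql_query(
    """
    SELECT region, product, SUM(amount) AS total_amount, COUNT(*) AS n_rows
    FROM sales
    GROUP BY region, product
    """,
    con,
)
print(pivot.head())
```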

Limiting Kismet log files to a size or duration

Looking for a solid way to limit the size of Kismet's database files (*.kismet) through the conf files located in /etc/kismet/. The version of Kismet I'm currently using is 2021-08-R1.
The end state would be to limit the file size (10MB for example) or after X minutes of logging the database is written to and closed. Then, a new database is created, connected, and starts getting written to. This process would continue until Kismet is killed. This way, rather than having one large database, there will be multiple smaller ones.
In the kismet_logging.conf file there are some timeout options, but that's for expunging old entries in the logs. I want to preserve everything that's being captured, but break the logs into segments as the capture process is being performed.
I'd appreciate anyone's input on how to do this either through configuration settings (some that perhaps don't exist natively in the conf files by default?) or through plugins, or anything else. Thanks in advance!
Two interesting ways:
One could let the old entries be expunged, but reach in with SQL and extract what you want as a time-bound query.
A second way would be to automate restarting Kismet... which is a little less elegant... but seems to work.
https://magazine.odroid.com/article/home-assistant-tracking-people-with-wi-fi-using-kismet/
If you read that article carefully, there are lots of bits of interesting information there.
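On the first suggestion: a .kismet log is an SQLite database, so a time-bound slice can be pulled out directly. A rough Python sketch, assuming a packets table with ts_sec, sourcemac, destmac, and signal columns as in recent kismetdb schemas (verify the exact names for your version with .schema in the sqlite3 shell); the file name and time window are placeholders:

```python
import sqlite3

SRC = "Kismet-example.kismet"        # placeholder log file
START, END = 1672531200, 1672534800  # example Unix-time window (1 hour)

con = sqlite3.connect(SRC)
cur = con.execute(
    "SELECT ts_sec, sourcemac, destmac, signal FROM packets "
    "WHERE ts_sec BETWEEN ? AND ? ORDER BY ts_sec",
    (START, END),
)
for row in cur:
    print(row)   # or write the slice out to its own smaller database/CSV
con.close()
```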

How to structure Blazor Server Side App to have up to date information with folders of CSVs as the Data Source?

To give the brief introduction, I'm new to Blazor. I could very well be missing an obvious feature.
Project assigned is requesting a Blazor, Server Side Application to display charts of information retrieved from 10 accessible system folders, each with hourly CSVs, the last of which adds a row of data every 1-3 seconds. Once the hour has passed, a new CSV is created, and we continue, ad infinitum for purposes of this argument. Each CSV has 100 columns; we're only focusing on 3 for now. If the argument of SQL comes up, they do not want to upload anywhere from 1 to 1.5 million rows to SQL each day. CSVs currently have anywhere from 1500 to 7200 rows of data.
Currently, the page loads, a chunk of data is retrieved (in this case the last 4 hours), and then every 5 seconds data from the last two files is retrieved (to avoid missing any hour-turnover rows); only new data is added to the data source in the site. The date timestamp of each row is treated as unique for each cluster. The charts only show the last 15 to 30 minutes of a rolling buffer of activity for demo purposes, though they may well request longer.
Read access to the CSVs is not a concern. All methods that access the CSVs are wrapped in using statements. My concern is that, in this program's infancy, each client opening the page re-initiates the data retrieval and background looper, which means we effectively have no scalability, and the process memory is just an upward slope.
What are my options to reduce server load, if there are any? Should I push harder for a SQL-based approach?
OK, some thoughts and ideas. I'm reading a bit between the lines here, so forgive me if I'm way off the mark. (It sounds a bit like an NHS COVID dashboard app; they love their spreadsheets!)
What the customer wants is a front end that "reacts", i.e. redraws from new data when new data arrives. So that's an event-driven update (either timed or when a new entry is made). This event can be detected in the backend by watching for file updates; System.IO.FileSystemWatcher comes to mind. That drives a new read of the updated file, a refresh of the service's data set, and the triggering of an event in the service. Use a singleton backend data service to hold the data. Any user with an open SPA session registers with the event (probably through a scoped or transient controller service), which precipitates a UI update.
A SQL solution I'm sure will work better, but you don't often get to choose!
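The shape of that backend (one shared watcher and data service, many sessions notified on change) is language-agnostic. Purely for illustration, here is a rough Python sketch of the same idea using the watchdog package; in Blazor this would be System.IO.FileSystemWatcher inside a singleton service with an event the components subscribe to, and the folder path below is a placeholder:

```python
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class CsvDataService(FileSystemEventHandler):
    """Single shared service: re-reads on change and notifies all subscribers."""

    def __init__(self):
        self.subscribers = []   # open UI sessions register callbacks here
        self.latest_rows = []

    def on_modified(self, event):
        if event.src_path.endswith(".csv"):
            with open(event.src_path) as f:
                self.latest_rows = f.readlines()[-100:]   # refresh shared data once
            for notify in self.subscribers:
                notify(self.latest_rows)                  # push to every open session

service = CsvDataService()
service.subscribers.append(lambda rows: print(f"client sees {len(rows)} rows"))

observer = Observer()
observer.schedule(service, path="/data/csv_folder", recursive=False)  # placeholder folder
observer.start()
try:
    time.sleep(60)
finally:
    observer.stop()
    observer.join()
```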

Does Spark Streaming divide the load if two instances of an application run simultaneously?

I have noticed recently that when I have two instances of a streaming application submitted and working in parallel, the input rate somehow changes.
This image is from one of the applications after I killed the other one: the input rate increases.
I am subscribing to an MQTT message broker to get the data into the application. Does this mean that the load gets divided between the two applications?
More info: after the data is processed by the application, it gets written to HBase; the write is idempotent, so nothing happens if the same data gets written twice.
There are multiple ways to check this.
If you are maintaining timestamp-based versions of HBase cell data, you can check how many timestamped versions are present for a particular cell. You can easily check this through the hbase shell (see the sketch at the end of this answer).
Another way: log the data along with the streaming application ID from both streams, and check whether the same data is being inserted from both.
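A rough sketch of the first check using the happybase client, where the Thrift host, table, row key, and column are placeholders (the same check can be done from the hbase shell with a get that requests multiple versions):

```python
import happybase

# Connect to HBase via Thrift; host/table/row/column below are placeholders.
connection = happybase.Connection("hbase-thrift-host")
table = connection.table("my_table")

# Ask for up to 5 timestamped versions of one cell; if both streaming apps
# wrote the same data, duplicate versions of the value show up here
# (assuming the column family keeps more than one version).
versions = table.cells(b"row-key", b"cf:qualifier",
                       versions=5, include_timestamp=True)
for value, ts in versions:
    print(ts, value)
```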
