I recently came across some code on GitHub for a multi-threaded Inventory Item import using web services:
https://github.com/Acumatica/InventoryItemImportMultiThreaded
I'd like to know whether this is faster or better than using an import scenario for mass import, and what the advantages of one over the other might be.
We have used the multi-threaded inventory import and it is MUCH faster than an import scenario. We were importing 250K stock items. Doing it via an import scenario, which only uses one thread, took 26 hours. Using the tool on GitHub brought that down to 6 hours. Be aware, though, that Acumatica limits your processing power to the licensed number of cores. The license we used was an enterprise license, so we could use a lot of cores. Here is a great article that talks about bulk loading: Optimizing Large Import
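For reference, the parallel pattern looks roughly like the sketch below. This is not the GitHub tool itself, just a minimal illustration using Acumatica's contract-based REST API; the site URL, endpoint version, credentials and field list are placeholders.

# Minimal sketch of a multi-threaded StockItem import over the contract-based
# REST API. Placeholders: instance URL, endpoint version, credentials, fields.
from concurrent.futures import ThreadPoolExecutor

import requests

SITE = "https://example.acumatica.com"              # placeholder instance URL
ENDPOINT = f"{SITE}/entity/Default/20.200.001"      # assumed Default endpoint version

session = requests.Session()
session.post(f"{SITE}/entity/auth/login",
             json={"name": "admin", "password": "***", "company": ""})  # placeholder credentials

def put_stock_item(item):
    # One PUT per item; the shared session carries the login cookies.
    payload = {
        "InventoryID": {"value": item["inventory_id"]},
        "Description": {"value": item["description"]},
    }
    resp = session.put(f"{ENDPOINT}/StockItem", json=payload)
    return item["inventory_id"], resp.status_code

items = [{"inventory_id": f"ITEM{n:05d}", "description": "Imported item"}
         for n in range(1000)]

# Keep the worker count at or below your licensed core count.
with ThreadPoolExecutor(max_workers=8) as pool:
    for inventory_id, status in pool.map(put_stock_item, items):
        if status >= 400:
            print(f"{inventory_id} failed with HTTP {status}")

session.post(f"{SITE}/entity/auth/logout")  # end the session when done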
After doing a lot of research on Node.js vs Flask, I came to the conclusion that when it comes to speed and performance, Node.js does outperform Flask. On the other hand, I have already built an optimized application in pandas to perform data analysis.
Though I was planning to build my REST API with Flask, since my user base is going to be large and grow gradually over time, I want to make sure performance doesn't suffer.
I have built my frontend in React, and this frontend is going to make multiple API calls to my backend to fetch data and perform CRUD operations. I am currently left with two options:
Either rebuild my entire optimized data analysis architecture using pandas alternatives in JavaScript, such as Danfo.js, and host everything on Node. I would prefer not to go this way, as I don't have enough experience with data analytics packages in JavaScript, and I have already done decent work optimizing my pandas code.
Or build my server in Flask so that it listens for Node.js requests locally, does all the data computation, and returns the response to Node, which then sends it to the frontend. I am not sure how this architecture would perform compared to using Flask alone for both the RESTful API and the backend processing.
Note that my frontend users are going to increase gradually over time and I want to ensure a great user experience.
Do you think this combination has a better chance than using Node or Flask as a standalone system? Do you have any better alternative to what I proposed? My aim is the best user experience, with a backend that can handle many requests with the least wait time and without being overloaded.
Please note that I am not using Flask just as a data store (so please don't suggest replacing it with a database) but as a processing unit. Even when users aren't making requests, some data modeling keeps running inside my Flask application via pandas (in a parallel thread or process), independent of the Flask API calls.
I would suggest starting with Flask, given that your pandas code is highly optimized, but with a few additions beyond vanilla Flask. A few things to consider:
track the volume of incoming requests
measure how long each request takes to process with pandas
introduce a distributed task worker early on (e.g. Celery), especially if the processing should run asynchronously, outside the request/response cycle; see the sketch after this list
consider cloud functions (e.g. on GCP, or the AWS equivalent), since it sounds like you want per-request processing, and cloud functions handle bursts of requests and charge per use
The list can go on, but hopefully this gets you started with fewer doubts. Cheers!
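As a rough illustration of the Celery point: the sketch below shows a Flask app that your Node server could call at http://127.0.0.1:5000 and that hands the pandas work to a worker instead of blocking the request. It is only a sketch; the Redis broker URL, the /analyze and /result routes, and the run_model task body are assumptions, not an existing API.

# Sketch: Flask accepts the request, Celery runs the pandas work outside the
# request/response cycle. Broker URL, routes and task body are assumptions.
import pandas as pd
from celery import Celery
from flask import Flask, jsonify, request

app = Flask(__name__)
celery_app = Celery("tasks",
                    broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/0")

@celery_app.task
def run_model(records):
    # Stand-in for the existing optimized pandas pipeline.
    df = pd.DataFrame(records)
    return df.describe().to_dict()

@app.route("/analyze", methods=["POST"])
def analyze():
    # Enqueue the heavy computation and return a task id right away,
    # so Node (and the React frontend) never waits on pandas.
    task = run_model.delay(request.get_json())
    return jsonify({"task_id": task.id}), 202

@app.route("/result/<task_id>")
def result(task_id):
    res = celery_app.AsyncResult(task_id)
    if not res.ready():
        return jsonify({"status": "pending"}), 202
    return jsonify({"status": "done", "result": res.get()})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)

This keeps the request/response cycle short even while background data modeling continues in the Celery worker.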
I am new to the world of Python programming and I need help. I have 10 GB of data and I have written Python code in Spyder to process it.
The code works fine on a small sample of the data, but with 10 GB my laptop cannot handle it, so I need to use Google Compute Engine. How can I upload the data and run the code on Google Compute Engine? Part of the code is provided below:
import os
import glob
import pickle

import numpy as np
import pandas as pd

# Load the full dataset once.
df = pd.read_pickle(r'C:\user\mydata.pkl')

# For each year from 2018 down to 1995, keep only the rows whose
# OverlapYearStart falls at or before that year and save them to a pickle.
i = 2018
while i >= 1995:
    df = df[df.OverlapYearStart <= i]
    df.to_pickle(r'C:\user\done\{}.pkl'.format(i))
    i = i - 1
I agree with the previous answer; just to complement it, you can take a look at AI Platform Notebooks, a managed service that offers an integrated JupyterLab environment, can pull your data from BigQuery, and allows you to scale your application on demand.
On the other hand, I don't know how you have stored your 10 GB of data: in CSV files? In a database? As mentioned in the first answer, Cloud Storage lets you create buckets to store your data. Once the data is in Cloud Storage, you can load it into BigQuery tables and work with it from your app using Google App Engine, or from the AI Platform Notebooks suggested earlier; this will depend on your solution.
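For the Cloud Storage to BigQuery step, a minimal sketch could look like the following; it assumes the data has been exported as CSV, and the project, bucket, dataset and table names are placeholders.

# Sketch: load a CSV already sitting in Cloud Storage into a BigQuery table.
# "my-project", "my-bucket", dataset and table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # skip the header row
    autodetect=True,       # let BigQuery infer the schema
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/mydata.csv",
    "my-project.my_dataset.my_table",
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish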
Probably the easiest thing to start digging into is going to be App Engine, to run the code itself:
https://cloud.google.com/appengine/docs/python/
And use Google Cloud Storage to hold your data objects:
https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-python
I don't know what the output of your application is, so depending on what you want to do with the output, Google Compute Engine may be the right answer if AppEngine doesn't quite fit what you're doing.
https://cloud.google.com/compute/
The first two links take you to the documentation on how to get going with Python for AppEngine and Google Cloud Storage.
Edit to add, from the comments: you'll also need to manage the memory footprint of your app. If you're really doing everything in one giant while loop, you'll have memory problems no matter where you run the application, as all 10 GB of your data will likely be loaded into memory. Definitely still shift the work into the cloud, in my opinion, but that data will need to be broken up somehow and handled in smaller chunks.
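One way to break the work up, sketched below, is to re-export the pickle as CSV so it can be read in chunks, filter each chunk per year, and append the results to per-year files. The bucket and object names are placeholders, and it assumes the google-cloud-storage and pandas packages are installed on the VM.

# Sketch: chunked processing so the full 10 GB never sits in memory at once.
# Assumes the data has been re-exported as CSV and uploaded to a bucket;
# bucket and object names are placeholders.
import os

import pandas as pd
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")                     # placeholder bucket
bucket.blob("mydata.csv").download_to_filename("/tmp/mydata.csv")

years = range(1995, 2019)
for chunk in pd.read_csv("/tmp/mydata.csv", chunksize=500_000):
    for year in years:
        subset = chunk[chunk.OverlapYearStart <= year]
        out = f"/tmp/{year}.csv"
        # Append this chunk's rows, writing the header only for a new file.
        subset.to_csv(out, mode="a", header=not os.path.exists(out), index=False)

# Upload the per-year results back to the bucket.
for year in years:
    local = f"/tmp/{year}.csv"
    if os.path.exists(local):
        bucket.blob(f"done/{year}.csv").upload_from_filename(local)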
Currently Acumatica ERP provides import scenarios for importing from other systems, but to import a closed invoice you have to import it and then process it.
We don't want to process the invoice; we want to import it as an already closed invoice.
How can I do this? Will I have to import the data directly into the database?
I would try to avoid direct database entries. What if you add a customization to the page you are importing into that allows the record to go directly to closed?
You can use the graph's IsImport property to detect that the screen is running as part of an import and perform special functionality for import scenarios only. I am not sure, but keep in mind that the screen-based API might also set/use IsImport. There is also IsContractBasedAPI, but I don't think that applies to your scenario.
For example, maybe you enable the status field when the process is running with IsImport set, and adjust your import scenario to set the status. Or add a custom button, available only during import, that performs a direct close.
Is there anybody out there using Node-Neo4j-embedded in production?
What kind of limits should I expect?
Because this module seems to push the Cypher queries directly to the node-java module, which runs them directly against the Neo4j Java libraries, I believe there shouldn't be any limits.
I feel it is risky to decide to use a library that hasn't been maintained for about two years (see GitHub), and it shouldn't be in the Neo4j docs if it isn't maintained (see the dead API docs link in the README.md).
It looks like there could be a new trend among other vendors of (in-memory) graph databases to support Node.js like a first-class language. Maybe Neo4j should also review this and the unmaintained Node module (as OrientDB did). The trend was started by a benchmark battle between ArangoDB and OrientDB.
I would love to see a Node-Neo4j-embedded benchmark answering ArangoDB's open-source benchmark, done by professional Neo4j people, as the OrientDB people did. But note: they weren't entirely fair (read the last lines about enabling query caches...).
Or it could be a new benchmark focused on the most first-class access possible from Node.js. There are three possible solutions to test. I am not experienced enough to run such a test in a way that would be widely accepted, but I would like to help by verifying it.
Please support this call to action with comments and (several types of) answers. Better (native-like) access to, and a wider range of, supported in-memory and graph solutions would help the Node community very much. A new benchmark would drive innovation.
A short note about ArangoDB's benchmark: they tested the REST APIs. But if you care about performance, you don't want to use a REST API; you want direct library access.
Editors: you are welcome to improve this post.
We (ArangoDB) think that the scalability of embedded databases is too limited. Embedding also limits the number of databases you might want to compare. Users prefer to implement their solutions in their application stack of choice, so you would limit the number of people potentially interested in your comparison.
The better way to do this is to compare the database vendors' officially supported interfaces into a client stack that is commonly supported among all players in the field. This is why we chose Node.js.
There is enough chatter about benchmarks and how to compare them on Stack Overflow, so if in doubt, start by creating a use case and implementing code for it, present your results in a reproducible way, and ask for comments, instead of demanding that others do it for you.
I would like to script a benchmark of my socket.io implementation.
After some research I have identified several Node.js modules, but they either have not been updated for years (wsbench), only support the WebSocket protocol (wsbench, thor), or test the socket.io project itself rather than a socket.io implementation (socket.io-benchmark).
Since the socket.io project has been highly active over the past year, I wonder: what is the latest and greatest tool/module to use for benchmarking?
My requirements:
Easy to script and run the tests
Test reports that give a good overview of the test runs
Test reports that are easy to save, so they can be compared with later benchmark runs
Just came across this in search of some benchmarking for my Socket.IO project.
I found socket.io-benchmark; however, I had some additional items I wanted to work through, and found one of the forks nearly there:
https://github.com/slowthinker/socket.io-benchmark
I also forked it to add a cap on messages/second sent, to give it more realistic parameters.
Hope that helps!
I would suggest Artillery, a modern, powerful, easy-to-use, open-source load-testing toolkit: https://github.com/shoreditch-ops/artillery
Here are some of its features:
Multiple protocols: load-test HTTP, WebSocket and Socket.io applications
Scenarios: Specify scenarios to test multi-step interactions in your API or web app
Performance metrics: get detailed performance metrics (latency, requests per second, concurrency, throughput)
Scriptable: write custom logic in JS to do pretty much anything
High performance: generate serious load on modest hardware
Integrations: statsd support out of the box for real-time reporting (integrate with Datadog, Librato, InfluxDB etc)
Extensible: custom reporting plugins, custom protocol engines etc
and more! HTML reports, nice CLI, parameterization with CSV files
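If you also want a quick, scriptable sanity check to keep alongside a full toolkit like Artillery, a small round-trip probe is easy to write. The sketch below uses the python-socketio client rather than a Node module; it assumes a Socket.IO server on localhost:3000 that acknowledges a "ping" event, and both the URL and the event name are placeholders.

# Sketch: measure round-trip latency of an acknowledged Socket.IO event.
# The server must call the ack callback for sio.call() to return.
import statistics
import time

import socketio

sio = socketio.Client()
sio.connect("http://localhost:3000")   # placeholder server URL

latencies_ms = []
for i in range(200):
    start = time.perf_counter()
    sio.call("ping", {"seq": i}, timeout=5)   # placeholder event name
    latencies_ms.append((time.perf_counter() - start) * 1000)

latencies_ms.sort()
print(f"mean {statistics.mean(latencies_ms):.1f} ms, "
      f"p95 {latencies_ms[int(len(latencies_ms) * 0.95)]:.1f} ms")

sio.disconnect()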