BadGateway in Power Automate using the Run Script action - SharePoint

I'm using the Run Script action to transfer data between Excel sheets and I'm hitting a BadGateway error saying that the action has timed out. The export script converts the data to JSON, which is then used as input to the import script that transforms the JSON back into table format. The amount of data transferred is large (>10,000 rows), but even with a smaller dataset (1,000 rows x 3 columns) the error sometimes appears, which leads me to think the problem may not be the amount of data but rather that Microsoft's service is under heavy load when the flow runs and cannot fulfill my request.
I would like to know whether any Power Automate plan can help solve this problem - whether any license gives the user greater capacity, or dedicated resources, so that the connection to the server does not fail when the flow executes and my request is processed reliably. Or is this a problem I must solve by reducing the amount of data transferred in this format? If so, how can I measure how much data Power Automate can process?

Related

How to make Copy Data work faster and have better performance (Azure Synapse)

A bit of context: my Azure Synapse pipeline makes a GET request to a REST API in order to import data into the Data Lake (ADLS Gen2) in Parquet format.
I plan to request data from the API on an hourly basis in order to get the previous hour's information. I have also considered setting the trigger to run every half hour to get the data from the previous 30 minutes.
The thing is: this last GET request and Copy Data debug run took a bit less than 20 minutes. The DIU setting was "Auto", and it stays at 4 even if I set it manually to 8 in the activity settings.
I was wondering if there are any useful suggestions to make a Copy Data activity work faster, whatever the cost may be (I would really like info about it, if you consider it pertinent).
Thanks in advance!
Mateo
You need to check which part is running slow.
You can click on the glasses icon to see the copy data details.
If the latency is on "Time to first byte" or "Reading from source" the issue is on the REST API side.
If the latency is on "Writing to sink" the problem may be from writing to data lake.
If the issue is on the API side, try to contact the provider. Another option, if applicable, is to use a few Copy Data activities, each copying a part of the data.
If the issue is on the data lake side, you should check the settings on the sink side.

How to structure Blazor Server Side App to have up to date information with folders of CSVs as the Data Source?

To give a brief introduction: I'm new to Blazor, so I could very well be missing an obvious feature.
Project assigned is requesting a Blazor, Server Side Application to display charts of information retrieved from 10 accessible system folders, each with hourly CSVs, the last of which adds a row of data every 1-3 seconds. Once the hour has passed, a new CSV is created, and we continue, ad infinitum for purposes of this argument. Each CSV has 100 columns; we're only focusing on 3 for now. If the argument of SQL comes up, they do not want to upload anywhere from 1 to 1.5 million rows to SQL each day. CSVs currently have anywhere from 1500 to 7200 rows of data.
Currently, the page loads, a data chunk is retrieved (in this case the last 4 hours), and then every 5 seconds data from the last two files is retrieved (to avoid missing any hour-turnover rows), and only new data is added to the data source in the site. The date timestamp of each row is treated as unique for each cluster. The charts only show the last 15 to 30 minutes of a rolling buffer of activity for demo purposes, though they may very well request longer.
Read access to CSVs is not a concern. All methods to access CSVs are wrapped in using statements. My concern is, in this program's infancy, each client opening the page reinitiates the data retrieval and background looper, and that means we effectively have no scalability, and the process memory is just an upward slope.
What are my options to reduce server load, if there are any? Should I push harder for a SQL-based approach?
OK, some thoughts and ideas. I'm reading a bit between the lines here, so forgive me if I'm way off the mark. (Sounds a bit like an NHS COVID dashboard app; they love their spreadsheets!)
What the customer wants is a front end that "reacts", i.e. redraws when new data arrives. So that's an event-driven update (either on a timer or when a new entry is made). This event can be detected in the backend by watching for file updates - System.IO.FileSystemWatcher comes to mind. That drives a new read of the updated file, a refresh of the service's data set, and the triggering of an event in the service. Use a singleton backend data service to hold the data. Any users with an open SPA session register with the event (probably through a scoped or transient controller service), which precipitates a UI update.
A SQL solution I'm sure will work better, but you don't often get to choose!

Custom Validation error reporting in Data Factory

I'm using Azure Data Factory to build some file-to-db imports, and one of the requirements I have is that if a file isn't valid - e.g. a column is missing or contains incorrect data (wrong data type, a lookup value doesn't exist in the db) - then an alert is sent detailing the errors. Errors should be human readable, so rather than a SQL error saying the insert would violate a foreign key, it should say an incorrect value was entered for x.
This doc (https://learn.microsoft.com/en-us/azure/data-factory/how-to-data-flow-error-rows) describes a way of using conditional splits to add custom validation, which would certainly let me import the good data and write the bad data to another file with custom error messages. But how can I then trigger an alert from this? As far as I can tell, the data flow will report success, and something like calling a Logic App to send an email needs to be done in the pipeline rather than in the data flow.
That's a good point, but couldn't you write the bad records to an error table/file, then produce an aggregated summary of how many records erred and counts of specific errors, which is passed to a Logic App/SendGrid API to alert interested parties of the status? It would be a post-data-flow completion activity that checks whether there is an error file or error records in the table; if so, aggregate and classify, then alert (a sketch of that step follows below).
I have a similar notification process that gives me successful/erred pipeline notifications, as well as 30-day pipeline statistics... % of pipelines successful, average duration, etc.
I’m not at my computer right now, otherwise I’d give more detail with examples.
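As a rough illustration of that aggregate-and-alert step (not part of the original answers), here is a minimal Python sketch. It assumes the data flow's error sink writes the rejected rows, including the custom error message column from the conditional split, to a CSV file, and that a Logic App with an HTTP trigger formats and sends the email. The file name, column name, and Logic App URL are placeholders.

```python
# Sketch: aggregate the rejected rows written by the data flow's error sink
# and post a summary to a Logic App HTTP trigger that sends the alert email.
import pandas as pd
import requests

ERROR_FILE = "rejected_rows.csv"  # assumed output of the conditional split's error sink
LOGIC_APP_URL = "https://example.logic.azure.com/workflows/send-alert"  # hypothetical HTTP trigger

errors = pd.read_csv(ERROR_FILE)

if not errors.empty:
    # Count the human-readable error messages added by the conditional split
    counts = {msg: int(n) for msg, n in errors["error_message"].value_counts().items()}
    payload = {"total_rejected": int(len(errors)), "errors_by_type": counts}
    # The Logic App turns this payload into the notification email
    requests.post(LOGIC_APP_URL, json=payload, timeout=30)
```

Inside Data Factory itself the same check could be done with a Get Metadata or Lookup activity on the error output followed by a Web activity calling the Logic App; the Python version just shows the shape of the summary payload.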
To catch the scenario where the rows copied and the rows written are not equal, maybe you can use the output of the Copy activity and, if the difference is not 0, send an alert.

Differences between BigQuery BQ.insert_rows_json and BQ.load_from_json?

I want to stream data into BigQuery and I was thinking of using Pub/Sub + Cloud Functions, since there is no transformation needed (for now, at least) and using Cloud Dataflow feels like a bit of overkill for just inserting rows into a table. Am I correct?
The data is streamed from a GCP VM using a Python script into Pub/Sub and it has the following format:
{'SEGMENT':'datetime':'2020-12-05 11:25:05.64684','values':(2568.025,2567.03)}
The BigQuery schema is datetime:timestamp, value_A: float, value_B: float.
My questions with all this are:
a) Do I need to push this into BigQuery as JSON/dictionary with all values as strings, or does it have to match the data types of the table?
b) What's the difference between using BQ.insert_rows_json and BQ.load_table_from_json, and which one should I use for this task?
EDIT:
What I'm trying to get is actually market data for some assets - around 28 instruments - and to capture all their ticks. On an average day there are ~60k ticks per instrument, so we are talking about ~33.6 M invocations per month. What is needed (for now) is to insert them into a table for further analysis. I'm currently not sure whether real streaming should be performed or batch loads. Since the project is still at the analysis stage, I don't feel that Dataflow is needed, but Pub/Sub should be used since it makes it easier to scale up to Dataflow when the time comes. This is my first implementation of a streaming pipeline and I'm using everything I've learned through courses and reading. Please correct me if I'm taking the wrong approach :).
What I would absolutely love to do is, for example, perform another insert into another table when the price difference between one tick and the n-th tick is, for example, 10. For this, should I use Dataflow, or is the Cloud Function approach still valid? Because this is like a trigger condition. Basically, the trigger would be something like:
if price difference >= 10:
    process all these ticks
    insert the results in this table
But I'm unsure how to implement this trigger.
In addition to the great answer of Marton (Pentium10)
a) You can stream JSON into BigQuery - a VALID JSON; your example isn't. About the types, there is automatic coercion/conversion according to your schema. You can see this here.
b) The load job loads a file from GCS or content that you put in the request. The batch is asynchronous and can take seconds or minutes. In addition, you are limited to 1,500 loads per day and per table, so 1 per minute works (1,440 minutes per day). There are several interesting aspects of the load job:
Firstly, it's free!
Your data are immediately loaded into the correct partition and immediately queryable in that partition.
If the load fails, no data are inserted, so it's easy to replay a file without ending up with duplicated values.
By contrast, the streaming job inserts the data into BigQuery in real time. It's interesting when you have real-time constraints (especially for visualisation, anomaly detection, ...). But there are some downsides:
You are limited to 500k rows per second (in EU and US), 100k rows per second in other regions, and 1 GB max per second.
The data aren't immediately in the partition; they sit in a buffer named UNPARTITIONED for a while, or until this buffer is full. So you have to take this specificity into account when you build and test your real-time application.
It's not free. The cheapest region is $0.05 per GB.
Now that you are aware of this, ask yourself about your use case.
If you need real time (less than 2 minutes of delay), no doubt, streaming is for you.
If you have a few GB per month, streaming is also the easiest solution, for a few dollars.
If you have a huge volume of data (more than 1 GB per second), BigQuery isn't the right service; consider Bigtable (which you can query from BigQuery as a federated table).
If you have a significant volume of data (1 or 2 GB per minute) and your use case requires data freshness of a minute or more, you can consider a special design (a minimal sketch follows the list):
Create a Pub/Sub pull subscription.
Create an HTTP-triggered Cloud Function (or a Cloud Run service) that pulls the subscription for 1 minute, then submits the pulled content to BigQuery as a load job (no file needed, you can post in-memory content directly to BigQuery), and then exits gracefully.
Create a Cloud Scheduler job that triggers your service every minute.
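Here is a minimal Python sketch of that design, under some assumptions not in the original answer: the subscription path and table name are placeholders, and instead of pulling continuously for a full minute the function does a single pull of up to 1,000 messages per run.

```python
# Sketch: Cloud Scheduler hits this HTTP Cloud Function every minute; it pulls
# the accumulated Pub/Sub messages and submits them to BigQuery as a load job.
import json
from google.cloud import bigquery, pubsub_v1

SUBSCRIPTION = "projects/my-project/subscriptions/ticks-pull"  # placeholder
TABLE_ID = "my-project.market_data.ticks"                      # placeholder

def pull_and_load(request):
    subscriber = pubsub_v1.SubscriberClient()
    bq = bigquery.Client()

    # Pull up to 1000 messages accumulated since the last run
    response = subscriber.pull(
        request={"subscription": SUBSCRIPTION, "max_messages": 1000}
    )
    if not response.received_messages:
        return "nothing to load"

    rows = [json.loads(m.message.data) for m in response.received_messages]

    # Submit the in-memory rows as a (free) load job instead of streaming them
    job = bq.load_table_from_json(rows, TABLE_ID)
    job.result()  # wait for completion; raises on failure

    # Acknowledge only after the load succeeded, so a failed run is replayed
    subscriber.acknowledge(
        request={
            "subscription": SUBSCRIPTION,
            "ack_ids": [m.ack_id for m in response.received_messages],
        }
    )
    return f"loaded {len(rows)} rows"
```

Acknowledging after job.result() keeps the "easy to replay without duplicates" property of load jobs, though the subscription's ack deadline must be longer than the load time, otherwise messages can be redelivered and inserted twice.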
Edit 1:
The cost shouldn't drive your use case.
If, for now, it's only for analytics, you can simply trigger your job once per day to pull the full subscription. With your metrics - 60k ticks * 28 instruments * 100 bytes (24 bytes of values plus overhead) - you have only 168 MB. You can hold this in Cloud Functions or Cloud Run memory and perform a load job.
Streaming is really important for real time!
Dataflow, in streaming mode, will cost you at least $20 per month (1 small worker of type n1-standard-1) - much more than 1.5 GB of streaming inserts into BigQuery with Cloud Functions.
Finally, about your smart trigger to switch between streaming and batch inserts: it's not really possible; you would have to redesign the data ingestion if you change your logic. But above all, only do it if your use case requires it!
To answer your questions:
a) You need to push to BigQuery using the formats the library accepts, usually a collection or a JSON document formatted to the table's definition.
b) To add data to BigQuery you can either stream data or load a file.
For your example you need to stream data, so use the streaming API methods from the insert_rows* family.
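For illustration, a minimal sketch of a streaming insert with the Python client, assuming the incoming message has already been reshaped into a flat dict matching the table schema (datetime, value_A, value_B); the table name is a placeholder.

```python
from google.cloud import bigquery

TABLE_ID = "my-project.market_data.ticks"  # placeholder table

client = bigquery.Client()

# One dict per row, keys matching the column names; BigQuery coerces
# compatible values (e.g. this timestamp string) to the column types.
rows = [
    {"datetime": "2020-12-05 11:25:05.64684", "value_A": 2568.025, "value_B": 2567.03},
]

errors = client.insert_rows_json(TABLE_ID, rows)  # streaming insert (tabledata.insertAll)
if errors:
    print(f"Rows with errors: {errors}")
```

insert_rows_json returns an empty list on success and a list of per-row errors otherwise, so checking that return value is the whole error handling.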

Need to create an SSIS Package for users to directly modify a table

I need to allow a couple of users to modify a table in my database, preferably as part of an integrated package that then submits the changes into our live database.
Please allow me to explain further:
We have an automated import task from one database system into another, with data transformation on the way through.
As part of this task, various checks are run before the final import and any rows with incomplete or incorrect data are sent to a rejections table and deleted from the import table.
I now need to allow a couple of senior users the ability to view and correct the missing/incorrect entries in the rejection table, before re-staging it and submitting it to the live database.
(Obviously, it will be re-checked before submission and re-rejected if it is still wrong).
Can anyone tell me what I need to do in SSIS to display the contents of a specific table (e.g. MyDatabase.dbo.Reject_Table) to the user running this package from their local PC (the package will, of course, be located on the server).
Then they need the ability to modify the contents of the table (either one row at a time or en masse; not bothered which).
When that is done, they hit a "Continue" or "Next" type button, which then continues to run the remainder of the package, which I am more than comfortable writing.
It is only the interactive stage(s) that I am struggling with and I would really appreciate some advice.
Thanks
Craig
That is non-native functionality in SSIS.
You can write pretty much anything you want in a script task and that includes GUI components. (I once had a package play music). In your data flow, you would have a Script Component that edits each row passing through the component.
Why this is a bad idea
Suitability - This isn't really what SSIS is for. The biggest challenge you'll run into is that the data flow is tightly bound to the shape of the data. The reject table for Customer is probably different from the reject table for Phone.
Cost - How are you going to allow those senior users to run SSIS packages? If the answer involves installing SSIS on their machines, you are looking at a production license for SQL Server. That's $8k to $23k-ish per socket for SQL Server 2005-2008R2 and something insane per core for SQL Server 2012+.
What is a better approach
As always, I would decompose the problem into smaller tasks until I can solve it. I'd make two problem statements:
As a data steward, I need the ability to correct (edit) incomplete data so that data can be imported into our application.
As an X, I need the ability to import (workflow) corrected rejected data so that we can properly bill our customers (or whatever the reason is).
Editing data. I'd make a basic web page or thick-client app to provide edit capability. A DataGridView would be one way of doing it. Heck, you could forgo custom development and just slap an Access front end on the tables and let them edit the data through that.
Import corrected data. This is where I'd use SSIS but possibly not exclusively. I'd probably look at adding a column to all the reject tables that indicates whether it's ready for reprocessing. For each reject table, I'd have a package that looks for any rows flagged as ready. I'd probably use a Delete first pattern to remove the flagged data and either insert it into the production tables or route it back into the reject table for further fixing. The mechanism for launching the packages could be whatever makes sense. Since I'm lazy,
I'd have a SQL Agent job that runs the packages and
Create a stored proc which can start that job
Grant security on that stored proc to the data stewards
Provide the stewards a big red button that says Import.
How that's physically implemented would depend on how you solved the edit question.
