Need to create an SSIS Package for users to directly modify a table - user-controls

I need to allow a couple of users to modify a table in my database, preferably as part of an integrated package that then submits the changes into our live database.
Please allow me to explain further:
We have an automated import task from one database system into another, with data transformation on the way through.
As part of this task, various checks are run before the final import and any rows with incomplete or incorrect data are sent to a rejections table and deleted from the import table.
I now need to allow a couple of senior users that ability to view and correct the missing/incorrect entries from the rejection table, before re-staging it and submitting to the live database.
(Obviously, it will be re-checked before submission and re-rejected if it is still wrong).
Can anyone tell me what I need to do in SSIS to display the contents of a specific table (e.g. MyDatabase.dbo.Reject_Table) to the user running this package from their local PC (the package will, of course, be located on the server).
Then they need the ability to modify the contents of the table - Either 1 row at a time or en-masse. Not bothered which).
When that is done, they hit a "Continue" or "Next" type button, which then continues to run the remainder of the package, which I am more than comfortable writing.
It is only the interactive stage(s) that I am struggling with and I would really appreciate some advice.
Thanks
Craig

That is non-native functionality in SSIS.
You can write pretty much anything you want in a script task and that includes GUI components. (I once had a package play music). In your data flow, you would have a Script Component that edits each row passing through the component.
Why this is a bad idea
Suitability - This isn't really what SSIS is for. The biggest challenge you'll run into is the data flow is tightly bound to the shape of the data. The reject table for Customer is probably different than the reject table for Phone.
Cost - How are you going to allow those senior users to run SSIS packages? If the answer involves installing SSIS on their machines, you are looking a production license for SQL Server. That's 8k to 23k ish per socket for SQL Server 2005-2008R2 and something insane per core for SQL Server 2012+.
What is a better approach
As always, I would decompose the problem into smaller tasks until I can solve it. I'd make 2 problem statements
As a data steward, I need the ability to correct (edit) incomplete data so that data can be imported into our application.
As an X, I need the ability to import (workflow) corrected rejected data so that we can properly bill our customers (or whatever the reason is).
Editing data. I'd make a basic web page or thick client app to provide edit capability. A DataGridView would be one way of doing. Heck, you could forgo custom development and just slap an Access front end to the tables and let them edit the data through that.
Import corrected data. This is where I'd use SSIS but possibly not exclusively. I'd probably look at adding a column to all the reject tables that indicates whether it's ready for reprocessing. For each reject table, I'd have a package that looks for any rows flagged as ready. I'd probably use a Delete first pattern to remove the flagged data and either insert it into the production tables or route it back into the reject table for further fixing. The mechanism for launching the packages could be whatever makes sense. Since I'm lazy,
I'd have a SQL Agent job that runs the packages and
Create a stored proc which can start that job
Grant security on that stored proc to the data stewards
Provide the stewards a big red button that says Import How that's physically implemented would depend on how you solved the edit question.

Related

Live Connection to Database for Excel PowerQuery?

I currently have approximately 10M rows, ~50 columns in a table that I wrap up and share as a pivot. However, this also means that it takes approximately 30mins-1hour to download the csv or much longer to do a powerquery ODBC connection directly to Redshift.
So far the best solution I've found is to use Python -- Redshift_connector to run update queries and perform an unload a zipped resultset to an S3 bucket then use BOTO3/gzip to download and unzip the file, then finally performing a refresh from the CSV. This resulted in a 600MB excel file compiled in ~15-20 mins.
However, this process still feel clunky and sharing a 600MB excel file among teams isn't the best either. I've searched for several days but I'm not closer to finding an alternative: What would you use if you had to share a drillable table/pivot among a team with a 10GB datastore?
As a last note: I thought about programming a couple of PHP scripts, but my office doesn't have the infrastructure to support that.
Any help would or ideas would be most appreciated!
Call a meeting with the team and let them know about the constraints, you will get some suggestions and you can give some suggestions
Suggestions from my side:
For the file part
reduce the data, for example if it is time dependent, increase the interval time, for example an hourly data can be reduced to daily data
if the data is related to some groups you can divide the file into different parts each file belonging to each group
or send them only the final reports and numbers they require, don't send them full data.
For a fully functional app:
you can buy a desktop PC (if budget is a constraint buy a used one or use any desktop laptop from old inventory) and create a PHP/Python web application that can do all the steps automatically
create a local database and link it with the application
create the charting, pivoting etc modules on that application, and remove the excel altogether from your process
you can even use some pre build applications for charting and pivoting part, Oracle APEX is one examples that can be used.

CRM 365 Workflows and importing data file from Excel

When designing Workflows you have a chance to indicate how it is triggered.
In my particular case I am interested to detect changes in the Status Reason and, for specific states, do something. I can use the "After" filed change on the Status Reason or a Wait condition and everything looks to be OK.
The question I have is in the relation to an Excel Export/Import used for bulk operations. In this case the user can change (using Excel) the Status Reason field to a value matching the condition in the workflow.
Assuming the workflow is Activated at the time of Excel import, does the workflow get triggered for every row imported?
It might be very inefficient from a timing perspective but for small data sets might be beneficial and acting as a bulk update, which in fact I am looking for.
Thank you!
For your question,
Yes workflow does get triggered every time you Import data using Excel and it matches the criteria for your Workflow.
Workflow run on server side that means, they will trigger every time value changes in Database and matches criteria. You could run your workflow in asynchronous mode and Crm Async job will take care of allocating resources as and when it has capacity. In this way you will not see performance impact when you Import data via Excel.

Robot's Tracker Threads and Display

Application: The purposed application has an tcp server able to handle several connections with the robots.
I choosed to work with database/ no files, so i'm using a sqlite db to save information about the robots and their full history, models of robots, tasks, etc...
The robots send us several data like odometry, tasks information, and so on...
I create a thread for every new robot's connection to handle the messages and update the informations of the robots on the database. Now lets start talk about my problems:
The application got to show information about the robots in realtime, and I was thinking about using QSqlQueryModel, set the right query and the show it on a QTableView but then I got to some problems/ solutions to think about:
Problem number 1: There are informations to show on the QTableView that are not on the database: I have the current consumption on the database and the actual charge on the database in capacity, but I want to show also on my table the remaining battery time, how can I add that column with the right behaviour (math implemented) in my TableView.
Problem number 2: I will be receiving messages each second for each robot, so, updating the db and the the gui(loading the query) may not be the best solution when I have a big number of robots connected? Is it better to update the table, and only update the db each minute or something like this? If I use this method I cant work with the table with the QSqlQueryModel to update the tables, so what is the approach that you recommend me to use?
Thanks
SancheZ
I have run into similar problem before; my conclusion was QSqlQueryModel is not the best option for display purposes. You may want some processing on query results, or you may want to create, remove, change display data based on the result for a fancier gui. I think best is to implement your own delegates and override the view related methods - setData, setEditor
This way you have the control over all your columns and direct union of raw data and its display equivalent (i.e. EditData, UserData).
Yes, it is better if you update your view real-time and run a batch execute at lower frequency to update the big data. In general app is the middle layer and db is a bottom layer for data monitoring, unless you use db in memory shared cache.
EDIT: One important point, you cannot run updates in multiple threads (you can, but sqlite blocks the thread until it gets the lock) so it is best to run update from a single thread

How do production Cassandra DBA's do table changes & additions?

I am interested in how the Cassandra production DBA's processes change when using Cassandra and performing many releases over a year. During the releases, columns in tables would change frequently and so would the number of Cassandra tables, as new features and queries are supported.
In the relational DB, in production, you create the 'view' and BOOM you get the data already there - loaded from the view's query.
With Cassandra, does the DBA have to create a new Cassandra table AND have to write/run a script to copy all the required data into that table? Can a production level Cassandra DBA provide some pointers on their processes?
We run a small shop, so I can tell you how I manage table/keyspace changes, and that may differ from how others get it done. First, I keep a text .cql file in our (private) Git repository that has all of our tables and keyspaces in their current formats. When changes are made, I update that file. This lets other developers know what the current tables look like, without having to use SSH or DevCenter. This also has the added advantage of giving us a file that allows us to restore our schema with a single command.
If it's a small change (like adding a new column) I'll try to get that out there just prior to deploying our application code. If it's a new table, I may create that earlier, as a new table without code to use it really doesn't hurt anything.
However, if it is a significant change...such as updating/removing an existing column or changing a key...I will create it as a new table. That way, we can deploy our code to use the new table(s), and nobody ever knows that we switched something behind the scenes. Obviously, if the table needs to have data in it, I'll have export/import scripts ready ahead of time and run those right after we deploy.
Larger corporations with enterprise deployments use tools like Chef to manage their schema deployments. When you have a large number of nodes or clusters, an automated deployment tool is really the best way to go.

alternative to polling database?

I have an application that works as follows: Linux machines generate 28 different types of letter to customers. The letters must be sent in .docx (Microsoft Word format). A secretary maintains MS Word templates, which are automatically used as necessary. Changing from using MS Word is not an option.
To coordinate all this, document jobs are placed into a database table and a python program running on each of the windows machines polls the database frequently, locking out jobs and running them as necessary.
We use a central database table for the job information to coordinate different states ("new", "processing", "finished", "printed")... as well to give accurate status information.
Anyway, I don't like the clients polling the database frequently, seeing as they aren't working most of the time. Clients hpoll every 5 seconds.
To avoid polling, I kind of want a broadcast "there's some work to do" or "check your database for some work to do" message sent to all the client machines.
I think some kind of publish/subscribe message queue would be up to the job, but I don't want any massive extra complexity.
Is there a zero or near zero config/maintenance piece of software that would achieve this? What are the options?
X
Is there any objective evidence that any significant load is being put on the server? If it works, I'd make sure there's really a problem to solve here.
It must be nice to have everything running so smoothly that you're looking at things that might only possibly be improved!
Is there a zero or near zero config/maintenance piece of software that would achieve this? What are the options?
Possibly, but what you would save in configuration and implementation time would likely hurt performance more than your polling service ever could. SQL Server isn't made to do a push really (not easily anyway). There are things that you could use to push data out (replication service, log shipping - icky stuff), but they would be more complex and require more resources than your simple polling service. Some options would be:
some kind of trigger which runs your executable using command-line calls (sp_cmdshell)
using a COM object which SQL Server could open and run
using a SQL Agent job to run a VBScript (which would again be considered "polling")
These options are a bit ridiculous considering what you have already done is much simpler.
If you are worried about the polling service using too many cycles or something - you can always throttle it back - polling every minute, every 10 minutes, or even just once a day might be more appropriate - this would be a business decision, so go ask someone in the business how fast it needs to be.
Simple polling services are fairly common, because they are, well... simple. In addition they are also low overhead, remotely stable, and error-tolerant. The down side is that they can hammer the database into dust if not carefully controlled.
A message queue might work well, as they're usually setup to be able to block for a while without wasting resources. But with MySQL, I don't think that's an option.
If you just want to reduce load on the DB, you could create a table with a single row: the latest job ID. Then clients just need to compare that to their last ID to see if they need to run a full poll against the real table. This way the overhead should be greatly reduced, if it's an issue now.
Unlike Postgres and SQL Server (or object stores like CouchDb), MySQL does not emit database events. However there are some coding patterns you can use to simulate this.
If you have one or more tables that you wish to monitor, you can create triggers on these tables that add a row to a "changes" table that records a queue of events to process. Your triggers filter the subset of data changes that you care about and create records in your changes table for each event you wish to perform. Because this pattern queues and persists events it works well even when the workers that process these events have outages.
You might think that MyISAM is the best choice for the changes table since it's mostly performing writes (or even MEMORY if you don't need to persist the events between database server outages). However, keep in mind that both Memory and MEMORY and MyISAM have only full-table locks so your trigger on an InnoDB table might hit a bottle neck when performing an insert into a MEMORY and MyISAM table. You may also require InnoDB for the changes table if you're using a ON DELETE CASCADE with another InnoDB table (requires both tables to use the same engine).
You might also use SHOW TABLE STATUS to check the last update time of you changes table to check if there's something to perform. This feature wont work for InnoDB tables.
These articles describes in more depth some of alternative ways to implement queues in MySQL and even avoid polling!
How to notify event listeners in MySQL
How to implement a queue in SQL
5 subtle ways you're using MySQL as a queue, and why it'll bite you

Resources