This week, as a quick-and-dirty short-term fix to get a file out the door, I took a SQL statement with 11 CASE statements that used subquery lookups and converted them into 11 joins against a temp table created at the beginning of the query. The purpose of this part of the query is to flatten rows into columns in the output file. I would like to convert this query into an efficient SSIS package, since there are additional "massaging" steps currently done manually that the package could take care of.
What is the best way to work these CASE statements into SSIS so that it executes efficiently? I tried using the temp table but ran into multiple issues with it and wasn't sure whether that is the best approach.
Thanks for your help.
Without more information about the issue you are trying to solve, all I can really say is that you can use the Conditional Split data flow component to separate your rows and apply further processing to each branch. However, this might not be the fastest option, depending on your additional "massaging" steps.
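For what it's worth, the row-to-column flattening you describe can often be done in a single pass in the source query itself, using conditional aggregation instead of eleven joins to a temp table, and that query can then feed the SSIS data flow directly. This is only a sketch with made-up names (dbo.SourceTable, EntityID, AttributeName, AttributeValue stand in for your actual schema):

SELECT  s.EntityID,
        MAX(CASE WHEN s.AttributeName = 'Attr01' THEN s.AttributeValue END) AS Attr01,
        MAX(CASE WHEN s.AttributeName = 'Attr02' THEN s.AttributeValue END) AS Attr02,
        /* ...repeat for the remaining attributes... */
        MAX(CASE WHEN s.AttributeName = 'Attr11' THEN s.AttributeValue END) AS Attr11
FROM    dbo.SourceTable AS s
GROUP BY s.EntityID;

Whether that beats a Conditional Split inside the package depends on where the "massaging" has to happen, but it avoids the temp table entirely.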
I am having a memory issue when I execute a DAX query inside the code shown in the referenced picture. With around 10,000 rows it works, but anything beyond that causes a memory error. The query may return up to 50 million rows.
Question 1: What is the most efficient way to execute the query?
Question 2: What settings or properties could I change to handle this volume of data?
Question 3: Is it possible to use partitioning and split the data when filling the data table?
I am new to Python coding. Please suggest whether the code needs to change, or any other efficient way to pull the data and load it into a data frame. My end goal is to send all of the data, as CSV, to a data lake. It currently works, but only for smaller row counts; I have tested up to 10,000 rows, which finishes in a few minutes and still seems very inefficient to me.
Thanks in Advance!
I'm not sure how I would automate it, but the Export Data function in DAX Studio works really fast; I just did 500k rows in under a minute. It works against tables in the model, so if you are working with a DAX expression, you would have to create it as a table in the model first.
I am wondering: if I make a PivotTable in Excel from a recordset of about 50,000 rows, it takes about 30 seconds to produce a running total on a date field, yet when I try to achieve the same result in an Access table, DSUM takes over 30 minutes. Same data... why is there such a performance difference? What does Excel do in the background?
You might find this article helpful:
http://azlihassan.com/apps/articles/microsoft-access/queries/running-sum-total-count-average-in-a-query-using-a-correlated-subquery
Here's what it says about Dsum, DLookup, etc.
They involve VBA calls, Expression Service calls, and they waste resources (opening additional connections to the data file). Particularly if JET must perform the operation on each row of a query, this really bogs things down.
Alternatives include looping through the recordset in VBA or creating a subquery. If you need to use DSUM, make sure your field is indexed and avoid text fields.
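For illustration, the correlated-subquery approach mentioned above looks roughly like this; tblSales, SaleID, SaleDate, and Amount are placeholder names, not anything from your database:

SELECT t.SaleID,
       t.SaleDate,
       t.Amount,
       (SELECT SUM(t2.Amount)
        FROM   tblSales AS t2
        WHERE  t2.SaleDate <= t.SaleDate) AS RunningTotal
FROM   tblSales AS t
ORDER BY t.SaleDate;

With an index on the date column this runs as a single query inside JET, instead of paying the VBA/Expression Service cost of a DSum call on every row.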
I have around 200,000 rows of data in Excel, recorded at 15-minute intervals for each day over two years. Now I want a total for each day (summing all of that day's 15-minute values at once), e.g. from 01/01/2014 to 12/31/2016. I did it using a basic formula (=SUM(range)), but it is very time consuming. Can anyone help me find an easier way to solve this problem?
It's faster and more reliable to work with big data sets using MS Power Query, where you can do the analysis and processing in a single query, or using Power View over the dataset loaded into the Data Model, which really is fast.
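Purely as an illustration of the grouping involved (and assuming, which the question does not say, that the readings could be loaded into a SQL Server table with hypothetical names dbo.Readings, ReadingTime, and Value), the per-day totals reduce to a single GROUP BY; Power Query's Group By step does essentially the same thing:

SELECT  CAST(r.ReadingTime AS date) AS ReadingDay,
        SUM(r.Value)                AS DailyTotal
FROM    dbo.Readings AS r
GROUP BY CAST(r.ReadingTime AS date)
ORDER BY ReadingDay;

One grouped pass over the 200,000 rows replaces the thousands of separate =SUM(range) formulas.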
I'm new to SSIS. I'm trying to load data from Excel into a SQL Server table. If a row already exists in the table, I have to write it to a temp table or file; if it does not exist, I have to insert it into the table.
We are using SQL Server 2005. I'm using a Lookup transformation to achieve this, but it's not working. Is there any other way I can achieve it?
Please suggest some tips. Your help is greatly appreciated.
Regards,
VG.
I will write down the conceptual steps, as opposed to giving a click-by-click solution; in my opinion this is more helpful for developing understanding. If you get stuck on any step, please let us know.
Step 1:
First of all, load the file into a temporary table. You do not need to create the table manually; let BIDS create it for you. Then alter the table to add a new column, ALREADY_EXISTS, with the BIT data type.
You will need a Data Flow Task for this. Within it, use an Excel source and an ADO NET destination.
Step 2a:
Write a SQL statement in SSMS that inner joins your temp table to the final destination table. Make sure the query you come up with gives the result you are expecting, then use that SELECT as the basis of an UPDATE that sets the ALREADY_EXISTS column in the temp table (a rough sketch is shown below).
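A rough T-SQL sketch of Steps 1 and 2a, with placeholder names (dbo.TempStage, dbo.FinalTable, BusinessKey) standing in for your own tables and matching key:

ALTER TABLE dbo.TempStage ADD ALREADY_EXISTS BIT NOT NULL DEFAULT 0;

UPDATE s
SET    s.ALREADY_EXISTS = 1
FROM   dbo.TempStage AS s
       INNER JOIN dbo.FinalTable AS f
           ON f.BusinessKey = s.BusinessKey;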
Step 2b:
Put an Execute SQL Task on the control flow surface and have it run the UPDATE from Step 2a.
Step 3:
Put another Data Flow Task on the control flow surface. In its source, write a plain SELECT statement against the temp table that picks up all columns, including ALREADY_EXISTS.
Use a Conditional Split on ALREADY_EXISTS to separate new and existing records and route them to their respective destinations.
Also, read up on the MERGE statement, a feature introduced in SQL Server 2008 (so not available on your 2005 instance, but worth knowing if you upgrade).
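For reference, on SQL Server 2008 and later the insert-if-missing part can be expressed in a single statement; dbo.FinalTable, dbo.TempStage, BusinessKey, and Col1 below are placeholders:

MERGE dbo.FinalTable AS tgt
USING dbo.TempStage AS src
    ON tgt.BusinessKey = src.BusinessKey
WHEN NOT MATCHED BY TARGET THEN
    INSERT (BusinessKey, Col1)
    VALUES (src.BusinessKey, src.Col1);

You would still need a separate step (for example the ALREADY_EXISTS flag above) to write the rows that already exist out to your temp table or file.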
Please share your experience with this solution.
I have a data flow task set up in SSIS.
The source is an Excel file, not a SQL database.
The problem I seem to get is that the package imports empty rows.
My data has values in 555,200 rows, but the SSIS package imports over 900,000 rows. The extra rows are imported even though they are empty.
When I then export this table back to Excel, there are empty rows in between the data.
Is there any way I can avoid this?
Thanks
Gerard
The best thing to do, if you can, is to export the data to a flat file (CSV or tab-delimited) and then read that in. The problem is that even though those rows look blank, they are not really empty, so when you hop across that ODBC-Excel bridge you get those rows back as blanks.
You could possibly adjust the way the spreadsheet is generated to eliminate this problem, or manually delete the rows, but those solutions are not scalable or maintainable over the long term, and you would still be stuck with that rickety ODBC bridge. The best long-term solution is to avoid using the ODBC-Excel bridge entirely. By dumping the data to a flat file you have total control over how to read, validate, and interpret the data, and you are not at the mercy of a translation layer that is to this day riddled with bugs and is at the best of times "quirky".
You can also add a Conditional Split component in your Data Flow Task, between the source and the destination. In it, check whether some column is null or empty; pick a column that behaves consistently, meaning every valid row has data in it and every invalid row leaves it empty or null.
Then discard the output for that condition and send the rest of the rows to the destination. You should then get only the rows with valid data from Excel.
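If you do land the Excel rows in a SQL Server staging table first, the same cleanup can be done in T-SQL instead of (or in addition to) the Conditional Split; dbo.Stage and KeyColumn here are placeholder names:

DELETE FROM dbo.Stage
WHERE  KeyColumn IS NULL
    OR LTRIM(RTRIM(KeyColumn)) = '';

Either way, the idea is the same: filter on a column that is always populated in real rows.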