I'm trying to create a report in Power BI using Azure Synapse. I have a few live tables that should only include today's data, and I've used Bulk Insert to create the Azure pipeline. I'm noticing that historical data is also saved in my tables, even though my script specifies that only the current day's data should be saved. Here's an example for context: I don't see prior days in the preview in Synapse, but when I import the tables into Power BI, days prior to the 14th show up. This issue is slowing down the timeliness of my data, as I work with large datasets that can hit the 1,000,000-row limit at any moment:
```sql
SELECT
    CAST([Date] AS date)   AS [Date],
    COUNT(DISTINCT id)     AS [IDs],
    Step                   AS [Steps],
    DATEPART(HOUR, [Date]) AS [hour]
FROM dbo.XExample
WHERE CAST([Date] AS date) = CAST(GETDATE() AS date)
GROUP BY DATEPART(HOUR, [Date]), CAST([Date] AS date), Step
```
I've tried clearing the cache in Power BI and that didn't help. I also tried setting the data cache to 0, but that didn't solve the issue either, as I couldn't import any tables afterwards.
I have an ADF pipeline which executes a Data Flow. The Data Flow has a Source (table A, which has around 1 million rows), a Filter with a query that selects only yesterday's records from the source table, an Alter Row transformation set to upsert, and a Sink, which is the archival table where the records get upserted.
This whole pipeline takes around 2 hours, which is not acceptable, given that only around 3,000 records are actually transferred/upserted.
The core count is 16, and I've tried partitioning with round robin and 20 partitions.
A similar archival for another table, which has around 100K records, doesn't take more than 15 minutes.
I thought of creating a source that would select only yesterday's records, but in the dataset we can only select a table.
Please suggest if I am missing anything to optimize it.
The table of the Data Set really doesn't matter. Whichever activity you use to access that Data Set can be toggled to use a query instead of the whole table, so that you can pass in a value to select only yesterday's data from the database.
Of course, if you have the ability to create a stored procedure on the source, you could also do that.
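Either way, here's a minimal sketch of the yesterday-only filter; the table name dbo.SourceA and the ModifiedDate column are assumptions for illustration, and the same SELECT could sit inline in the activity or inside a stored procedure:

```sql
-- Hypothetical source query: select only yesterday's rows.
-- dbo.SourceA and ModifiedDate are assumed names.
SELECT *
FROM dbo.SourceA
WHERE ModifiedDate >= CAST(DATEADD(day, -1, GETDATE()) AS date)
  AND ModifiedDate <  CAST(GETDATE() AS date);
```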
When migrating really large sets of data, you'll get much better performance by using a Copy activity to stage the data in an Azure Storage blob before using another Copy activity to pull from that blob into the destination. But for what you're describing here, that doesn't seem necessary.
I am migrating data from a SAP HANA view to an ODS with Azure Data Factory. From there, a third-party company moves the data to a Salesforce database. Currently, when I migrate, we do a truncate and load in the sink.
There is no column in the source that shows the date, or a last-updated date, when new rows are added in SAP HANA.
Do we need to have the date in the source, or is there another way we can write it in the ODS?
It must show a last-updated date, or something to denote when a row was inserted or changed after the initial load, so that they can track changes when loading into the Salesforce database.
Truncate and load a staging table, then run a stored procedure to MERGE into your target table, marking inserted and updated rows with the current SYSDATETIME(). Alternatively, MERGE from the staging table into a temporal table, or into a table with Change Tracking enabled, to track the changes automatically.
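A rough sketch of that MERGE, assuming a staging table dbo.Staging, a target dbo.Target, a key column BusinessKey, and a LastUpdated column (all of these names are assumptions):

```sql
-- Upsert from staging into the target, stamping each inserted or
-- updated row with the current SYSDATETIME().
MERGE dbo.Target AS t
USING dbo.Staging AS s
    ON t.BusinessKey = s.BusinessKey
WHEN MATCHED THEN
    UPDATE SET t.SomeColumn  = s.SomeColumn,
               t.LastUpdated = SYSDATETIME()
WHEN NOT MATCHED BY TARGET THEN
    INSERT (BusinessKey, SomeColumn, LastUpdated)
    VALUES (s.BusinessKey, s.SomeColumn, SYSDATETIME());
```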
I wish to incrementally copy data from an Azure table to an Azure blob. I have created the linked services, datasets, and pipelines, and I want to copy data from the table to the blob every hour. The table has a timestamp column. I want to transfer the data in such a way that rows added to the table between 7am and 8am are pushed to the blob in the activity window starting at 8am. In other words, I don't want to miss any data flowing into the table.
I have changed the query used to extract data from the Azure table:
"azureTableSourceQuery": "$$Text.Format('PartitionKey gt \\'{0:yyyyMMddHH} \\' and PartitionKey le \\'{1:yyyyMMddHH}\\'', Time.AddHours(WindowStart, -2), Time.AddHours(WindowEnd, -2))"
This query picks up data that was added to the table two hours back, so I won't miss any data.
I've been working on creating workbooks and sharing them on the Power BI Preview service, and today I found that I couldn't schedule a refresh on my workbook.
Inside this workbook, I connect to my data source (an Azure SQL database) using Excel Power Query. When I add the scheduled refresh, I get this message:
you can't schedule refresh because this dataset contains data sources that do not yet support refresh.
Does anyone see why this didn't work? Any help will be really appreciated.
Answer: I should load my data directly into the data model instead of into a worksheet; now the refresh works fine!
Now I have another question. I have two tables, like below.
Table devices:

| deviceid | network_type | location | language |
|----------|--------------|----------|----------|
| id001    | wifi         | us       | english  |
| id002    | gsm          | france   | french   |
| id003    | wifi         | italy    | italian  |
| ...      | ...          | ...      | ...      |
Table data consuming:

| deviceid | volume_consuming | date       |
|----------|------------------|------------|
| id001    | 200              | 04-03-2015 |
| id001    | 300              | 04-05-2015 |
| id002    | 500              | 04-06-2015 |
| id002    | 600              | 04-05-2015 |
| id003    | 800              | 04-03-2015 |
| id003    | 1000             | 04-06-2015 |
I need to calculate the average data consumption per device, aggregated by date, so I created the table below.
Table aggregation by date:

| date       | avg_data_per_device |
|------------|---------------------|
| 04-03-2015 | 500                 |
| 04-05-2015 | 450                 |
| 04-06-2015 | 750                 |
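For reference, a minimal sketch of the SQL that would produce this third table, assuming the second table is named dbo.data_consuming as shown:

```sql
-- With one row per device per date, AVG over the rows for a given
-- date is the average consumption per device for that date.
SELECT
    [date],
    AVG(volume_consuming) AS avg_data_per_device
FROM dbo.data_consuming
GROUP BY [date];
```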
Now comes my question: I want to add a filter to my graph, which is based on the third table. Since there is no deviceid in this table (it's an aggregation table), can I do some manipulation in Power BI to achieve this? Does anyone have any ideas? Thanks in advance!
There are two common issues you might be hitting. First, to use SQL Azure in Excel workbooks with refresh, you'll need to use the Power Query UI to build the query (if you specify a custom query, it won't work). Second, if you have other queries that load data from Excel worksheets, it won't work either. I would suggest paring down your queries until you have just one for SQL Azure. If that doesn't fix the issue, then you should use the "Contact Support" feature in the Power BI UI, under the question mark icon.
Appreciate your using Power BI,
-Lukasz
http://dev.powerbi.com
http://blogs.msdn.com/powerbidev
Make a feature request: https://support.powerbi.com/forums/265200-power-bi
Sign up for Power BI: http://www.powerbi.com
I have a table containing more than 50 million records in Azure. I'm trying to create a nonclustered index on it using the following statement:
```sql
CREATE NONCLUSTERED INDEX market_index_1
    ON MarketData (symbol, currency)
    WITH (ONLINE = ON);
```
But I get an error message:
Msg -2, Level 11, State 0, Line 0
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Any suggestions would be greatly appreciated.
Check out the Azure SQL Database Resource Limits document, then compare the error code against the error codes listed there.
With data of that size, I believe the only way to create a new index on that table would be to:
1. Create a new table with the same structure and only one clustered index.
2. Copy the data from the original table into the new one.
3. Truncate the original table.
4. Create the desired indexes.
5. Copy the data back into the original table.
Note that moving the data between the tables could once again exceed the resource limits, so you might have to do these operations in chunks, as in the sketch below.
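A hedged sketch of those steps in T-SQL; every name other than MarketData, symbol, and currency is an assumption, and the batch size is only illustrative:

```sql
-- 1. New table with the same structure and a single clustered index.
CREATE TABLE dbo.MarketData_copy (
    MarketDataId BIGINT        NOT NULL,
    symbol       VARCHAR(16)   NOT NULL,
    currency     CHAR(3)       NOT NULL,
    price        DECIMAL(18,6) NULL,
    CONSTRAINT PK_MarketData_copy PRIMARY KEY CLUSTERED (MarketDataId)
);

-- 2. Copy the data across in chunks so no single statement
--    hits the resource limits.
DECLARE @batch INT = 500000, @copied INT = 1;
WHILE @copied > 0
BEGIN
    INSERT INTO dbo.MarketData_copy (MarketDataId, symbol, currency, price)
    SELECT TOP (@batch) m.MarketDataId, m.symbol, m.currency, m.price
    FROM dbo.MarketData AS m
    WHERE NOT EXISTS (SELECT 1 FROM dbo.MarketData_copy AS c
                      WHERE c.MarketDataId = m.MarketDataId);
    SET @copied = @@ROWCOUNT;
END;

-- 3. Truncate the original table.
TRUNCATE TABLE dbo.MarketData;

-- 4. Create the desired indexes while the table is empty.
CREATE NONCLUSTERED INDEX market_index_1
    ON dbo.MarketData (symbol, currency);

-- 5. Copy the data back (chunked the same way if necessary), then clean up.
INSERT INTO dbo.MarketData (MarketDataId, symbol, currency, price)
SELECT MarketDataId, symbol, currency, price
FROM dbo.MarketData_copy;

DROP TABLE dbo.MarketData_copy;
```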
Another possible approach is to upgrade the database server to the new preview version of Azure SQL Database (note: you cannot downgrade the server later!).