PowerPivot get excluded data from Access

[Screenshot: sample data]
I have an Access database with more than 1 million rows of data, as you can see from the screenshot. I want to dedupe the data in terms of BRUIDREQID, as that column has duplicates. Is there any way to get a deduped dataset when I connect the data from Access to Power Pivot?
What I am doing now is using Python to dedupe the data and export it as a csv file. I want to know whether I can use Power Pivot instead and save time deduping a large dataset.

When connecting to the Access database, you should be able to write arbitrary SQL, so you could just do a
SELECT DISTINCT *
FROM Table
which would de-dupe the table.
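Note that SELECT DISTINCT * only collapses rows that are identical in every column. If your duplicates differ in columns other than BRUIDREQID, a GROUP BY query can keep one row per key instead. A minimal sketch in Access SQL, assuming a table named MyTable with one other column SomeField (both names are hypothetical, since the real schema is only visible in the screenshot):
SELECT BRUIDREQID, First(SomeField) AS FirstOfSomeField
FROM MyTable
GROUP BY BRUIDREQID;
Access's First() aggregate keeps an arbitrary value from each group; substitute Min() or Max() if you need a deterministic pick.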
Power Pivot does not offer any functionality to change the existing data in a table once imported - you cannot add or remove rows, nor can you alter the values of any imported fields.

Related

Load Data from Excel Power Query to Access table

I have an Excel file that collects data from multiple txt files into individual connection-only tables (one table per file). I have done this because some of them contain >1m rows. In Excel, I have appended those tables using the Power Query Append function. I need to create a new table that contains all the data; however, the resulting data is >1m rows, so I can't load it back into an Excel worksheet.
Is there a way to load my connection (the combination of all tables) into Access?
When I try to do that using the import function in Access, it does not recognise the connection as a table, so I am not sure how to do this.
Thank you,
Load it to the Excel Data Model, which doesn't have a 1m-row limit.
You cannot load from Power Query to Access (nor would you want to).

how can I manually update some table values in an Excel data model table imported from csv with power query

I am using Excel Power Query to import csv files containing transactions from a directory. That way, adding a new file to the directory automatically makes it available when refreshing the query/data model. I load the table from the csv files into the data model and do some cleaning and data transformation in the query.
However, there are some things that I can't do in the query that loads the raw data:
There may be missing data that I need to enter manually (a column missing some values).
I may need to split a transaction/row into multiple transactions/rows to categorize the parts correctly.
It seems like there should be a way to make my changes without having them overwritten when I refresh the query to import new transactions.
Currently I am experimenting with creating a column with a unique id for the transaction table as part of the query, then creating an aux table in Excel related to the raw transactions by unique id. I make my changes in the aux table, and finally I create a new table that merges the raw transactions with the aux table to produce the working transaction table. This works for missing data or incorrect values, but it still doesn't allow me to split a row into multiple rows.
I would welcome any suggestions or references.

Limit data coming into Spotfire by a different data table

I have Table A prompted on Year/Month and Table B. Table B also has a Year/Month column. Table A is the default data table (gets pulled in first). I have set up a relationship between Table A and B on the common Year/Month column.
The goal is to get Table B to only pull through data where the Year/Month matches the Year/Month on Table A (what the user entered). The purpose is to keep the user from entering the Year/Month multiple times.
The issue is that Table B contains almost 35 million records, and I do not want Spotfire to pull all 35 million across. What currently happens is that Spotfire pulls all of those records, and then, by setting filtering to Filtered Rows Only on Table B, I limit what is seen in the visualization to under 200,000 rows. I would much rather just pull across those 200,000 rows to start with.
The question: is there a way to force Spotfire to filter Table B by Table A as it pulls Table B across, so that only a small number of records comes into memory?
I'm writing this on the basis that most people use information links to get data into Spotfire, especially for large data sets where the data is not embedded in the analysis. With that said, I prefer to handle as much of the joining / filtering / massaging as possible at the data source rather than in the Spotfire application. Here are my views on the best practices and why.
Tables / Views vs Procedures as Information Links
Most people are familiar with the table / view structure and get data into Spotfire in one of two ways:
1. Create all joins / links in Information Designer, based on data relations defined by the author, by selecting individual tables from the available data sources.
2. Create a view (or similar object) at the data source where all joining / data relations are done, thus giving Spotfire a single flat file of data.
Personally, option 2 is much easier IF you have access to the data source, since the data source is designed to handle this type of work. Spotfire just makes the data available, but with limited functionality (complex queries, IntelliSense, etc. aren't available; there's no native IDE). What's even better, IMHO, is stored procedures, and here is why.
In options 1 and 2 above, if you want to add a column you have to change the view / source code at the data source, or individually add a column in Information Designer. This creates dwarfed objects and clutters up your library. For example, when you create an information link there is a folder with all the elements associated with it; if you want to add columns later, you'll have another folder for any columns added, and this gets confusing and hard to manage.
If you instead create a procedure at the data source to return the data you need, and later want to add some columns, you only have to change it at the data source, i.e. change the procedure. Everything else is inherited by Spotfire: all you have to do is click the "reload data" button in Spotfire. You don't have to change anything in Information Designer. Additionally, you can easily add new parameters, set default parameter properties, or prompt the user, making this a very efficient method of data retrieval. This is perfect when the data source is an OLTP rather than a data mart / data warehouse (i.e. the data isn't already aggregated / cleansed), but it can be powerful in data warehouse environments as well.
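As a sketch of option 2, a flattening view at the data source might look like the following (the table and column names here are all hypothetical):
CREATE VIEW sales_flat AS
SELECT s.order_id,
       s.year_month,
       c.customer_name,
       s.amount
FROM sales s
JOIN customers c
  ON c.customer_id = s.customer_id;
Spotfire then sees a single flat object, and adding a column later means altering only this view (or, in the stored-procedure approach, the procedure) at the source.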
Ditch the GUI, Edit the SQL
I find managing conditions, parameters, join paths, etc. in the GUI a bit annoying -- but that's me. Instead, when possible, I prefer to click "Edit SQL" next to each of the elements in my information link and alter the SQL there. This also lets database people work in an environment that is more familiar to them.
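Applied to the Year/Month question above, the Edit SQL route might look something like the sketch below, where ?yearMonth is an information link parameter bound to the user's prompt (the table and column names are hypothetical):
SELECT b.*
FROM table_b b
WHERE b.year_month = ?yearMonth
Because the predicate is evaluated at the database, only the matching rows (on the order of 200,000 here rather than 35 million) are pulled across into memory.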

spotfire new table from file filtered

I am using the Spotfire client.
I have identified some records within a data table that I would like to send to a new data table. Is there some way to create a new table from marked or isolated data, or by using a data-limiting expression on the source table? I have had to export my filtered data and then import it back in, but I am hoping there is a more direct way.
Thanks!
If you know the restrictions you need to set on your data to identify the records, you can create a second table based on the source data.
Go to the properties of the table / visualization, then go to the Data tab and scroll all the way to the bottom. There you can edit the "Limit data using expression".
You could also create a details visualization if you want, but that is only useful if you can quickly identify the records.
Or insert a calculated column (e.g. a case statement) and use this column to filter your data, as sketched below.
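As a rough sketch of that last option, with hypothetical column names: the calculated column could be a Spotfire custom expression such as
case when [Region] = "West" and [Amount] > 1000 then "Keep" else "Drop" end
and the second table's "Limit data using expression" (or a filter) would then just test it:
[Flag] = "Keep"
where [Flag] is whatever you named the calculated column.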

Create a Volatile table in teradata

I have a SharePoint list which I have linked to in MS Access.
The information in this table needs to be compared to information in our data warehouse, based on keys both sets of data have.
I want to be able to create a query which will upload the ishare data into our data warehouse under my login, run the comparison, and then export the details to Excel somewhere. MS Access seems to be the way to go here.
I have managed to link the ishare list (with difficulties due to the attachment fields) and then create a local table based on it.
I have managed to create the temp table in my volatile space.
How do I append the newly created table from the list into my temporary space?
I am using Access 2010 and SharePoint 2007.
Thank you for your time
If you can avoid using Access, I'd recommend it, since it adds an extra step for what you are trying to do. You can easily manipulate or mesh the data within the Teradata session and export the results.
You can run the following types of queries using the standard Teradata SQL Assistant:
CREATE VOLATILE TABLE NewTable (
    column1 DEC(18,0),
    column2 DEC(18,0)
)
PRIMARY INDEX (column1)
ON COMMIT PRESERVE ROWS;  -- keep the rows for the session instead of discarding them at commit
Change SQL Assistant to Import Mode (File -> Import Data), then run:
INSERT INTO NewTable VALUES (?,?);
Browse for your file; this example expects a comma-delimited file with two numeric columns, with column one being the index.
You can now query or join this table to any information in the uploaded database.
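For example, the comparison against the warehouse could be a simple join, along the lines of the sketch below (the warehouse database, table, and key column names are hypothetical):
SELECT w.*
FROM warehouse_db.customer_table w
JOIN NewTable n
  ON n.column1 = w.customer_key;
Here column1 is the indexed key column loaded from the file into the volatile table created above.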
When you are finished, you can drop it with:
DROP TABLE NewTable;
You can export results using File->Export Data as well.
If this is something you plan on running frequently, there are many ways to do these types of imports and exports easily. For instance, the Python library pandas can read a query directly into a DataFrame object and drop that object into Excel through its pandas.read_sql() and DataFrame.to_excel() functions.
