Is it possible to use Amazon Redshift as the data source for an Excel pivot table? Googling this question didn't yield any obvious answers. Thanks.
Yes, I have. However, since the other answers were written, Amazon has released customised Redshift drivers; you should use those rather than the generic Postgres drivers.
The answers you are looking for are here:
http://docs.aws.amazon.com/redshift/latest/mgmt/configure-odbc-connection.html
You can consume Amazon Redshift databases with the PostgreSQL ODBC drivers.
Download and install the driver.
Set up a DSN on the box pointing to your Redshift server with your AWS credentials (you can find the ODBC connection string in the settings area of your cluster).
Use that connection in Excel or any other product that can connect to ODBC data sources.
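If you want to sanity-check the DSN outside Excel first, here is a minimal sketch using pyodbc; the DSN name and credentials are placeholders for whatever you configured in the ODBC Data Source Administrator:

```python
# Quick test of the ODBC DSN outside Excel.
# "RedshiftDSN" and the credentials are placeholders.
import pyodbc

conn = pyodbc.connect("DSN=RedshiftDSN;UID=awsuser;PWD=my_password")
cursor = conn.cursor()
cursor.execute("SELECT current_database(), version();")
print(cursor.fetchone())
conn.close()
```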
You can convert Excel to CSV and upload it to S3. Once the files are uploaded to S3, you can run the COPY command to copy the data from S3 to the Redshift cluster. You can run the COPY command via the PostgreSQL JDBC connector or tools like SQL Workbench.
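A rough sketch of that pipeline in Python (pandas for the Excel-to-CSV step, boto3 for the upload); the bucket, table, and IAM role names are made up:

```python
# Convert an Excel sheet to CSV, push it to S3, then COPY into Redshift.
# "my-bucket", "my_table", and the IAM role ARN are placeholders.
import boto3
import pandas as pd

pd.read_excel("testdata.xlsx").to_csv("testdata.csv", index=False)
boto3.client("s3").upload_file("testdata.csv", "my-bucket", "testdata.csv")

# Then, from any SQL client connected to the cluster:
# COPY my_table FROM 's3://my-bucket/testdata.csv'
# IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
# CSV IGNOREHEADER 1;
```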
I referred to this documentation:
https://www.javatpoint.com/talend-jdbc-connection
for how to configure a DB connection in Talend. In that documentation, a MySQL JDBC connector is used to connect a MySQL DB to Talend. In my case, I needed a Cassandra JDBC connector to connect my Cassandra DB to Talend, and that connection was established successfully.
The documentation says that when you right-click on the database connection, a pop-up menu appears with a "retrieve schema" option, which is used to show the tables. But when I right-click on the DB connection, that pop-up menu does not appear in Talend Open Studio. How can I fix this?
I suspect the problem is that you're using the wrong JDBC driver, although I'm unable to confirm that since you didn't say which one you're using.
You will need to download the Simba JDBC Driver for Apache Cassandra from DataStax Downloads in order to connect to a Cassandra cluster. Then you'll need to install the driver into your Talend installation.
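If you want to verify the driver works at all before wiring it into Talend, here is a minimal sketch using Python's jaydebeapi; the driver class name, JDBC URL format, and jar path are assumptions based on Simba's usual conventions, so check the install guide shipped with the driver for the exact values:

```python
# Smoke-test the Simba Cassandra JDBC driver outside Talend.
# Class name, URL format, jar path, and credentials are assumptions;
# verify them against the driver's own documentation.
import jaydebeapi

conn = jaydebeapi.connect(
    "com.simba.cassandra.jdbc42.Driver",   # assumed driver class
    "jdbc:cassandra://127.0.0.1:9042",     # assumed URL format
    ["cassandra_user", "cassandra_password"],
    "/path/to/CassandraJDBC42.jar",        # assumed jar name
)
cur = conn.cursor()
cur.execute("SELECT release_version FROM system.local")
print(cur.fetchall())
conn.close()
```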
I don't have the detailed steps for doing that but I've previously written instructions for connecting clients like Pentaho Data Integration and DBeaver to Astra DB which is a Cassandra-as-a-service. The instructions for those should give you an idea of how to configure Talend. Cheers!
I encountered the same problem. You're supposed to make the connection under the 'NoSQL Connections' tab, since Cassandra is a NoSQL database.
I followed the instructions here
I have a testdata.dmp file in an AWS S3 bucket and want to load the data into a pandas DataFrame. I'm looking for a solution; I have boto3 installed.
Your Oracle dump file testdata.dmp has a proprietary binary format maintained by Oracle. This means that Oracle controls which tools can process it correctly. One such tool is Oracle Data Pump.
A workflow to extract data from an Oracle dump file and write it as Parquet files (readable with pandas) could look as follows:
Create an Oracle DB. As you are already using AWS S3, I suggest setting up an AWS RDS instance with Oracle engine.
Download testdata.dmp from S3 to the created Oracle DB. This can be done with RDS's S3 integration.
Run Oracle Data Pump Import on the RDS instance. This tool is installed by default. The RDS docs provide a detailed walk-through. Now the content of testdata.dmp lives as tables with data and other objects inside the Oracle DB.
Dump all tables (and other objects) with a tool that can query Oracle DBs and write the result as Parquet. Some choices:
Sqoop (Hadoop-based command line tool, but deprecated)
(Py)Spark (popular data processing tool and, imho, the unofficial successor of Sqoop)
python-oracledb + Pandas
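For the last option, a minimal sketch; the host, credentials, and table name are placeholders:

```python
# Pull a table out of the RDS Oracle instance and write it as Parquet.
# DSN, credentials, and table name are placeholders; to_parquet needs
# pyarrow (or fastparquet) installed.
import oracledb
import pandas as pd

conn = oracledb.connect(
    user="admin",
    password="my_password",
    dsn="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com:1521/ORCL",
)
df = pd.read_sql("SELECT * FROM imported_table", conn)
df.to_parquet("imported_table.parquet", index=False)
```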
Is there an efficient way to copy a table from Redshift to Postgres using Node.js? I couldn't find any concrete examples.
There does not seem to be any pre-written utility. The process you must adopt (set up) for anything more than just a few rows is:
Push data to S3
Use the AWS COPY command (via the SDK) to copy from S3 to Redshift
Transform data in Redshift (optional)
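The question asks for Node.js, but to illustrate the steps, here is a sketch in Python (boto3 + psycopg2); the same sequence applies with the AWS SDK for JavaScript and a Postgres client such as node-postgres. All names are placeholders:

```python
# Illustration of the flow: push a file to S3, then COPY it into Redshift.
# Bucket, cluster endpoint, table, credentials, and IAM role are placeholders.
import boto3
import psycopg2

boto3.client("s3").upload_file("rows.csv", "my-staging-bucket", "rows.csv")

conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="my_password",
)
with conn, conn.cursor() as cur:
    cur.execute("""
        COPY my_table FROM 's3://my-staging-bucket/rows.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
        CSV;
    """)
```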
I have a server with a Hadoop instance running on it.
Basically, I'd like to connect to an HDFS table via Excel on my local machine. I know the Power Query add-in helps with that and provides a way to establish a connection with HDFS. But here's the thing: I have Excel 2016, and according to the Microsoft documentation, Power Query is already built into Excel. Yet when I try "Data > Get Data > From Other Sources", there is simply no option like "From Hadoop File (HDFS)".
What am I doing wrong and what exact steps do I need to take to get access to HDFS from Excel?
For me HDFS shows up here:
but not here:
What does the first New Query > From Other Sources look like for you?
I'm currently driving an R&D project testing hard against Azure's HDInsight Hadoop service. We use SQL Server Integration Services (SSIS) to manage ETL workflows, so making HDInsight work with SSIS is a must.
I've had good success with a few of the Azure Feature Pack tasks. But there is no native HDInsight/Hadoop destination for use with Data Flow Tasks (DFTs).
Problem With Microsoft's Hive ODBC Driver Within An SSIS DFT
I created a DFT with a simple SQL Server "OLE DB Source" pointing to the cluster through an "ODBC Destination" that uses the Microsoft Hive ODBC Driver. (Ignore the red error; it has detected that the cluster has been destroyed.)
I've tested the cluster ODBC connection after entering all parameters, and it tests "OK". It is even able to read the Hive table and map all the columns. The problem arrives at run time: it generally just locks up with no rows in the counter, or it gets a handful of rows into the buffer and freezes.
Troubleshooting steps I've tried:
Verified connection string and Hadoop cluster username/password.
Recreated cluster and task several times.
The source is SQL Server, and the task runs fine if I point it at only a file destination or recordset destination.
Tested a smaller number of rows to see if it is a simple performance issue (SELECT TOP 100 * FROM stupidTable). Also tested with only 4 columns.
Tested on a separate workstation to make sure it wasn't related to the machine.
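In case it helps anyone reproduce this, the same Hive ODBC DSN can be exercised outside SSIS with a few lines of pyodbc; the DSN name and credentials here are placeholders for the ones configured above:

```python
# Exercise the Hive ODBC DSN outside SSIS to isolate driver vs. SSIS issues.
# "HiveDSN" and the credentials are placeholders.
import pyodbc

conn = pyodbc.connect("DSN=HiveDSN;UID=cluster_user;PWD=cluster_password",
                      autocommit=True)
cur = conn.cursor()
cur.execute("SELECT * FROM stupidtable LIMIT 100")
print(len(cur.fetchall()))
conn.close()
```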
All that said, I can't figure out what else to try. I'm not doing much differently from examples on the web like this one, except that I'm using the ODBC driver as a destination and not a source.
Has anyone had success using the Hive driver, or another one, within an SSIS destination task? Thanks in advance.