How can we export a Cassandra table into CSV format using its snapshot files?

I have taken a snapshot of a Cassandra table. These are the files that were generated:
manifest.json schema.cql
mc-9-big-CompressionInfo.db mc-9-big-Data.db mc-9-big-Digest.crc32 mc-9-big-Filter.db mc-9-big-Index.db mc-9-big-Statistics.db mc-9-big-Summary.db mc-9-big-TOC.txt
mc-10-big-CompressionInfo.db mc-10-big-Data.db mc-10-big-Digest.crc32 mc-10-big-Filter.db mc-10-big-Index.db mc-10-big-Statistics.db mc-10-big-Summary.db mc-10-big-TOC.txt
mc-11-big-CompressionInfo.db mc-11-big-Data.db mc-11-big-Digest.crc32 mc-11-big-Filter.db mc-11-big-Index.db mc-11-big-Statistics.db mc-11-big-Summary.db mc-11-big-TOC.txt
Is there a way to use these files to extract the table's data into a CSV file?

Yes, you can do that with the sstable2json tool (replaced by sstabledump in Cassandra 3.x, which the mc-* file names indicate you are running).
Run the tool against the *-Data.db files.
The output is JSON, so you will need to convert it to CSV afterwards.
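For example, a minimal Python sketch, assuming you produced the dump with sstabledump mc-10-big-Data.db > mc-10.json, that flattens the regular-column cells into CSV (partition and clustering key values live under separate keys in the dump and must be named from schema.cql):

    import csv
    import json

    # Load the output of: sstabledump mc-10-big-Data.db > mc-10.json
    with open("mc-10.json") as f:
        partitions = json.load(f)

    # Collect one record per live row; "cells" holds the regular columns.
    # Partition/clustering key values appear under "partition"/"clustering"
    # and need schema.cql to be mapped back to column names.
    records = []
    for part in partitions:
        for row in part.get("rows", []):
            if row.get("type") != "row":
                continue  # skip range tombstones and other markers
            records.append({c["name"]: c.get("value") for c in row.get("cells", [])})

    # Union of all column names, since rows may have different cells present
    fieldnames = sorted({name for rec in records for name in rec})

    with open("mc-10.csv", "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)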

Related

How to convert Delta file format to Parquet file only

Delta Lake is the default storage format. I understand how to convert a Parquet table to Delta. My question is: is there any way to revert it back to Parquet? Any options?
What I need is a single Parquet file when writing, without the extra log folder!
If you run VACUUM on the table and then delete the _delta_log folder, you end up with regular Parquet files; see the sketch below.
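A minimal PySpark sketch of that approach, with a hypothetical table path. Note that VACUUM ... RETAIN 0 HOURS requires disabling Delta's retention safety check, so only do this on a table nothing else is reading:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    path = "/mnt/data/my_table"  # hypothetical Delta table path

    # Rewrite the table as a single file so only one data file remains
    (spark.read.format("delta").load(path)
        .coalesce(1)
        .write.format("delta").mode("overwrite").save(path))

    # Drop every file that is not referenced by the latest table version
    spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
    spark.sql(f"VACUUM delta.`{path}` RETAIN 0 HOURS")

    # Deleting the _delta_log folder afterwards (e.g. dbutils.fs.rm on
    # Databricks, or hadoop fs -rm -r) leaves an ordinary Parquet directory.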

Parse gz files in AWS S3 using Python

I am trying to bulk copy tables from Snowflake to PostgreSQL. From Snowflake, I was able to extract the tables in CSV format using COPY, which compresses the extracts in gz format in AWS S3.
The second step is to load these files into PostgreSQL. I am planning to use the PostgreSQL COPY utility to ingest the data. However, I don't want to unzip the files first; I would rather buffer the data directly from the gz files and pass that buffer as input to the psycopg2 copy_from function.
Is there a way to parse gz files in AWS S3 using Python? Thanks in advance!
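A minimal sketch of that approach, with hypothetical bucket, key, table, and connection details: it wraps the S3 body in gzip.GzipFile so the data is decompressed on the fly, and hands the file-like object to psycopg2's copy_expert (the more flexible sibling of copy_from, which lets you state the CSV options explicitly):

    import gzip

    import boto3
    import psycopg2

    s3 = boto3.client("s3")
    # Hypothetical bucket and key names
    obj = s3.get_object(Bucket="my-bucket", Key="extracts/table1.csv.gz")

    # obj["Body"] is a streaming, file-like object; GzipFile decompresses
    # it lazily, so the file is never fully unzipped on disk.
    with gzip.GzipFile(fileobj=obj["Body"]) as uncompressed:
        conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical DSN
        with conn, conn.cursor() as cur:
            cur.copy_expert("COPY table1 FROM STDIN WITH CSV", uncompressed)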

Write data to a text file in Azure Data Factory version 2

It seems ADF v2 does not support writing data to a TEXT file (.txt): after selecting File System as the sink, TextFormat does not appear on the next screen.
So is there any method to write data to a TEXT file?
Thanks,
Thai
Data Factory only supports these six file formats: Avro, Binary, DelimitedText, JSON, ORC, and Parquet.
Please see: Supported file formats and compression codecs in Azure Data Factory.
If you want to write data to a .txt file, the only format you can use is DelimitedText; when the pipeline finishes, you will get a .txt file.
Reference: Delimited text: follow this article when you want to parse delimited text files or write data in delimited text format.
For example, I created a pipeline to copy data from Azure SQL to Blob and chose the DelimitedText format for the sink dataset; the file I get in Blob Storage is a .txt file.
Hope this helps.
I think what you are looking for is the DelimitedText dataset. You can specify the extension as part of the file name, as in the sketch below.
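For example, a sketch of a DelimitedText dataset definition whose file name carries the .txt extension (the dataset, container, and linked service names are hypothetical):

    {
        "name": "TxtOutput",
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": "MyBlobLinkedService",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": "output",
                    "fileName": "result.txt"
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": true
            }
        }
    }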

How to convert CSV to ORC format using Azure Data Factory

I am copying comma-separated, partitioned data files into ADLS using Azure Data Factory.
The requirement is to convert the comma-separated files to ORC format with SNAPPY compression.
Is it possible to achieve this with ADF? If yes, could you please help me?
Unfortunately, Data Factory can read ORC files compressed with either ZLIB or SNAPPY, but it can only write ZLIB, which is the default for the ORC file format.
More info here: https://learn.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs#orc-format
Hope this helped!

Cassandra CQL: insert data from an existing file

I have a JSON file that I want to insert into a Cassandra table using CQL.
According to the DataStax documentation, you can insert JSON with the following command:
INSERT INTO data JSON '{My_Json}';
But I can't find a way to do that directly from an existing JSON file. Is this possible, or do I need to write some Java code to do the insert?
Note: I am using Cassandra 3.9.
The only file format cqlsh supports for importing is CSV. You could convert your JSON file to CSV format and import it with the COPY command. If that is not an option for you, you will need code, in Java or any other language with a Cassandra driver, to parse your file and insert it into Cassandra; a Python sketch follows.
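If Java feels heavy, any driver will do. A minimal sketch with the DataStax Python driver, assuming a hypothetical contact point and keyspace and a file that holds one JSON document per line; INSERT ... JSON accepts a bind marker for the JSON string:

    import json
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])          # hypothetical contact point
    session = cluster.connect("my_keyspace")  # hypothetical keyspace

    # Table name "data" comes from the question; JSON ? binds the payload
    stmt = session.prepare("INSERT INTO data JSON ?")

    with open("data.json") as f:              # hypothetical file name
        for line in f:
            line = line.strip()
            if line:
                json.loads(line)              # fail fast on malformed JSON
                session.execute(stmt, [line])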
