Export file in csv.gz format in Cassandra

How do I export data from a Cassandra database to a csv.gz file without going through stdout or a tool like DSBulk?
Is there any way to export directly in gzip format, like we do in Postgres with "COPY tablename TO PROGRAM 'gzip > /path/to/file.csv.gz'"?

It's much the same with DSBulk. You just have to pipe the output to gzip.
Here's an example:
$ dsbulk unload -k ks_name -t table_name | gzip > export.csv.gz
For more info, see Unloading data examples with DSBulk. Cheers!
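If installing DSBulk is not an option, cqlsh's built-in COPY command can be piped the same way, since it can write to STDOUT. This is only a sketch (ks_name and table_name are placeholders, and COPY is considerably slower than DSBulk on large tables); check on your cqlsh version that no progress chatter ends up in the output:
# sketch only: ks_name/table_name are placeholders; verify the CSV output is clean on your cqlsh version
$ cqlsh -e "COPY ks_name.table_name TO STDOUT WITH HEADER = TRUE" | gzip > export.csv.gz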

Related

Read csv file from Hadoop using Spark

I'm using spark-shell to read csv files from HDFS.
I can read the csv file using the following command in bash:
bin/hadoop fs -cat /input/housing.csv | tail -5
so this suggests housing.csv is indeed in HDFS right now.
How can I read it using spark-shell?
Thanks in advance.
sc.textFile("hdfs://input/housing.csv").first()
I tried this, but it failed.
Include the csv package in the shell and run:
var df = spark.read.format("csv").option("header", "true").load("hdfs://x.x.x.x:8020/folder/file.csv")
8020 is the default port.
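For reference, on Spark 1.x the csv format comes from the external spark-csv package, which can be pulled in when launching the shell (on Spark 2.x and later the csv reader is built in). The exact artifact and version below are just an example; pick the one matching your Scala and Spark versions:
# example only: substitute the spark-csv artifact that matches your Scala/Spark version
spark-shell --packages com.databricks:spark-csv_2.10:1.5.0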
Thanks,
Ash
You can read this easily with Spark using the csv method or by specifying format("csv"). In your case you should either drop the hdfs:// prefix or specify the complete path, e.g. hdfs://localhost:8020/input/housing.csv.
Here is a snippet of code that reads the csv:
val df = spark
  .read
  .schema(dataSchema)
  .csv("/input/housing.csv")
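For completeness, the full-path form mentioned above would look something like this (the host and port below are placeholders for your NameNode, and the header option is shown only as an example):
// sketch: replace localhost:8020 with your actual NameNode host and port
val df = spark.read
  .option("header", "true")
  .csv("hdfs://localhost:8020/input/housing.csv")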

how to use /dev/null for bash command

I am using the command below to get the result of my SQL query.
su - postgres -c 'psql -d dbname' with stdin "COPY ( my SQL query ) TO STDOUT WITH CSV HEADER"
This works fine on my server, but on a different machine it prints bash warnings along with the output of the SQL query.
For example:
/etc/profile: line 46: HISTSIZE: readonly variable
/etc/profile: line 50: HISTCONTROL: readonly variable
/etc/profile.d/20-tmout.sh: line 1: TMOUT: readonly variable
/etc/profile.d/history.sh: line 6: hcmnt_tty: readonly variable
name
abc
Please let me know a way to skip the above warning messages and get only the data.
If I should use /dev/null in this case, how do I modify the above command to get the data only?
If what you mean is "how do I discard only the error output?", the way to go is to redirect the standard error stream to oblivion (/dev/null), like so:
your-command 2>/dev/null
That way, if the command outputs data to standard out, it passes through, but any output to the standard error stream is discarded, so you won't see those error messages.
By the way, 2 here is the shorthand file descriptor for standard error.
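Applied to the command above (otherwise unchanged), that would look something like this:
# assumes the bash warnings on that machine really are written to stderr
su - postgres -c 'psql -d dbname' with stdin "COPY ( my SQL query ) TO STDOUT WITH CSV HEADER" 2>/dev/null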
Sorry, this is untested, but I hit this same error; your db session isn't read/write. You can echo the statements to psql to force a proper session as follows. I'm unsure how stdin may be affected:
echo 'SET TRANSACTION READ WRITE; SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE ; COPY ( my SQL query ) TO STDOUT WITH CSV HEADER' | su - postgres -c 'psql -d dbname' with stdin
caution - bash hack
su - postgres -c 'psql -d dbname' with stdin "COPY ( my SQL query ) TO STDOUT WITH CSV HEADER" | grep -v "readonly"

Writing JSON to an output file using the sstable2json tool in Cassandra

I want to export the SSTables to JSON, so I am using sstable2json.bat. I am able to run this bat from the command prompt and can see the JSON result printed in the command prompt itself. I used the following command:
sstable2json H:/cassandra/db/data/191/191/191-191-hd-1-Data.db
I have to write this JSON content to an output file. For that I used the following command:
sstable2json -f H:/output.json H:/cassandra/db/data/191/191/191-191-hd-1-Data.db
But this command shows me an exception like:
You must supply exactly one sstable
Usage: org.apache.cassandra.tools.SSTableExport <sstable> [-k key [-k key [...]]
-x key [-x key [...]]]
Can anyone correct my mistake, if any? I am using Cassandra version 1.1.2.
Just redirect stdout to a file. You can find the documentation for redirection here: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/redirection.mspx?mfr=true
For example:
sstable2json H:/cassandra/db/data/191/191/191-191-hd-1-Data.db > mysstable.json
The contents will then be in a file named mysstable.json.

PostgreSQL import file

In PostgreSQL and bash (Linux), is there a way to directly import a file from the filesystem, like:
[execute.sh]
pgsql .... -f insert.txt
[insert.txt]
insert into table(id,file) values(1,import('/path/to/file'))
There seems to be no import() function (and no bytea_import either); lo_import stores the file but only returns an int (an OID), and I don't know how to get the file contents back (these files are small, so using lo_import seems inappropriate).
And how do I pass the insert.txt statement to PostgreSQL?
I'm not sure what you're after, but if you have a script with SQL statements, for example the insert statements that you mention, you can start psql and then run the script from within psql. For example:
postgres#server:~$ psql dbname
psql (8.4.1)
Type "help" for help.
dbname=# \i /tmp/queries.sql
This will run the statements in /tmp/queries.sql.
Hope this was what you asked for.
If you need more detailed parameters:
$ psql -h [host] -p [port] -d [databaseName] -U [user] -f [/absolute/path/to/file]
The manual has some examples:
testdb=> \set content '''' `cat my_file.txt` ''''
testdb=> INSERT INTO my_table VALUES (:content);
See http://www.postgresql.org/docs/8.4/interactive/app-psql.html
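Putting the manual's trick into a script, a minimal sketch might look like this (untested; my_table, its column layout, and the file path are placeholders, and note that quotes inside the file are not escaped by this approach):
#!/bin/sh
# sketch: feed the manual's \set trick to psql via a here-document
psql -d dbname <<'EOF'
\set content '''' `cat /path/to/my_file.txt` ''''
INSERT INTO my_table VALUES (:content);
EOF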

restoring mysql db from the contents of split up mysqldump

Hi, my database backup has started to go over 2GB in size, so I'm looking at options for splitting the file and then reassembling it to restore the database.
I've got a series of files from running the following backup shell script:
DATE_STRING=`date +%u%a`
BACKUP_DIR=/home/myhome/backups
/usr/local/mysql_versions/mysql-5.0.27/bin/mysqldump --defaults-file=/usr/local/mysql_versions/mysql-5.0.27/my.cnf \
  --user=myuser \
  --password=mypw \
  --add-drop-table \
  --single-transaction \
  mydb |
split -b 100000000 - rank-$DATE_STRING.sql-;
This produces a sequence of files like:
mydb-3Wed.sql-aa
mydb-3Wed.sql-ab
mydb-3Wed.sql-ac
...
My question is: what is the corresponding sequence of commands I need to use on Linux to do the restore?
Previously I was using this command:
/usr/local/mysql_versions/mysql-5.0.27/bin/mysql \
  --defaults-file=/usr/local/mysql_versions/mysql-5.0.27/my.cnf \
  --user=myuser \
  --password=mypw \
  -D mydb < the_old_big_dbdump.sql
Any suggestions, even if they don't involve split/cat, would be greatly appreciated.
I don't see why you can't just do:
cat mydb-3Wed.sql-* | /usr/local/mysql_versions/mysql-5.0.27/bin/mysql --defaults-file=/usr/local/mysql_versions/mysql-5.0.27/my.cnf --user=myuser --password=mypw -D mydb
The * glob should expand the files in sorted order; check with ls mydb-3Wed.sql-* that they actually are, though.
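If you'd rather reassemble the dump into a single file first and restore it exactly the way you did before, something along these lines should also work (file names assumed to match the ones shown above):
# reassemble the pieces in order, then restore as before
cat mydb-3Wed.sql-* > the_reassembled_dbdump.sql
/usr/local/mysql_versions/mysql-5.0.27/bin/mysql --defaults-file=/usr/local/mysql_versions/mysql-5.0.27/my.cnf --user=myuser --password=mypw -D mydb < the_reassembled_dbdump.sql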
