Cassandra: Update gc_grace_seconds for all tables in a keyspace

Is it possible to update gc_grace_seconds for all the tables in a keyspace, or does it have to be done per table?

There is no built-in command to change table options for all tables in a keyspace, but it's easy to implement with bash + cqlsh. Something like this (replace keyspace_name and new_value with the actual values):
cqlsh -e 'DESCRIBE FULL SCHEMA;'|grep -e '^CREATE TABLE keyspace_name\.'|\
sed -e 's|^CREATE TABLE \(.*\) (|ALTER TABLE \1 WITH gc_grace_seconds = new_value;|'|\
tee schema-changes.cql
cqlsh -f schema-changes.cql
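To see what the sed step produces without touching a live cluster, here is a minimal sketch run against a couple of sample `CREATE TABLE` header lines (the ks1 keyspace, the table names, and the 86400 value are all illustrative, standing in for real schema output and new_value):

```shell
# Two sample lines of the kind `grep '^CREATE TABLE ...'` would keep
# (keyspace/table names and 86400 are illustrative only)
printf 'CREATE TABLE ks1.users (\nCREATE TABLE ks1.orders (\n' |
  sed -e 's|^CREATE TABLE \(.*\) (|ALTER TABLE \1 WITH gc_grace_seconds = 86400;|' \
  > schema-changes.cql
cat schema-changes.cql
```

Reviewing schema-changes.cql before feeding it to `cqlsh -f` is a cheap safety check, since the file is kept on disk by `tee`.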

Related

Is it possible to suppress the column header in the cqlsh output?

I want one of my several SELECT statements to not print the column headers, just the selected records. Is this possible in Cassandra 3.0?
I tried the below but it returns the column name:
cqlsh -e "select count(1) from system_schema.keyspaces where keyspace_name='test'";
count
-------
1
The MySQL client has options like -s -N to suppress them.
There isn't a built-in option in cqlsh that would allow you to suppress the output header from CQL SELECT.
Your best option is to use shell scripting to parse the output. There are several Linux utilities available you can use depending on the outcome you're after. Here are just some examples in a long list of possibilities:
EXAMPLE 1 - To print the first row of results (4th line of the cqlsh output), you can use the awk utility:
$ cqlsh -e "SELECT ... FROM ..." | awk 'NR==4'
EXAMPLE 2 - The sed utility equivalent is:
$ cqlsh -e "SELECT ... FROM ..." | sed -n '4p'
EXAMPLE 3 - If you want to print all the rows, not just the first (assuming your query returns multiple rows):
$ cqlsh -e "SELECT ... FROM ..." | tail -n +4 | head -n -2
The tail -n +4 prints all lines from the 4th onwards, and head -n -2 strips the last 2 lines (the blank line + the (# rows) footer at the end). Cheers!
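The line positions above can be checked against a mocked-up copy of the cqlsh output (the blank first line, header, rule, data row, blank line, and footer below are invented to mirror the sample in the question):

```shell
# Simulated `cqlsh -e "SELECT ..."` output: blank line, header, rule,
# one data row, blank line, and the "(N rows)" footer
cat > cqlsh.out <<'EOF'

 count
-------
     1

(1 rows)
EOF

awk 'NR==4' cqlsh.out               # first data row only
tail -n +4 cqlsh.out | head -n -2   # all data rows
```

Note that `head -n -2` is a GNU coreutils extension; on BSD/macOS a different tail-trimming approach would be needed.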
Try this option as a workaround:
# cqlsh -e "select count(1) from system_schema.keyspaces where keyspace_name='test'" | tail -n +4
0
(1 rows)
@Dexter, for selecting records, why can't you simply leverage SELECT * FROM system_schema.keyspaces WHERE keyspace_name='test';?
What are you trying to achieve here, i.e. what is the end result?
If you simply want to count the number of records, you could simply leverage DataStax Bulk Loader to perform the count operation.
References:
https://www.datastax.com/blog/datastax-bulk-loader-counting
https://docs.datastax.com/en/dsbulk/docs/dsbulkAbout.html
./dsbulk count -k system_schema -t keyspaces
Alternatively, you could leverage dsbulk unload -query <...> to selectively unload records based on the query that you pass in.

Shell script to pull row counts from all Hive tables in multiple Hive databases

I am trying to create a shell script that will pull row counts from all tables in multiple databases. All of the databases follow the same naming convention, "the_same_databasename_<%>", except the final part of the name, which varies. I am trying to run the following:
use <database_name>;
show tables;
select count(*) from <table_name>;
Since I have 40 different databases, I would need to run the first two queries 40 times, plus the select count query even more times depending on how many tables are in each database (very time consuming). I have my PuTTY configuration set to save my sessions into a .txt in my local directory, so I can have the row-count results displayed right in my command-line interface. So far this is what I have, but I'm not sure how to include the final commands that get the actual row counts from the tables in each database.
#!/bin/bash
for db in $(hive -e "show databases like 'the_same_databasename_*';")
do
    tbl_count=$(hive -S -e "use $db; show tables;" | wc -l)
    echo "Database $db contains $tbl_count tables."
done
I'm not very experienced in shell scripting so any guidance/help is greatly appreciated. Thanks in advance.
You can use a nested for-loop:
#!/bin/bash
for db in $(hive -e "show databases like 'the_same_databasename_*';")
do
    tbl_count=$(hive -S -e "use $db; show tables;" | wc -l)
    echo "Database $db contains $tbl_count tables."
    for table in $(hive -S -e "use $db; show tables;")
    do
        count=$(hive -S -e "use $db; select count(*) from $table;")
        echo "Table $db.$table contains $count rows."
    done
done
Or you can use a variable to increment the count of tables:
#!/bin/bash
for db in $(hive -e "show databases like 'the_same_databasename_*';")
do
    tbl_count=0
    for table in $(hive -S -e "use $db; show tables;")
    do
        (( tbl_count++ ))
        count=$(hive -S -e "use $db; select count(*) from $table;")
        echo "Table $db.$table contains $count rows."
    done
    echo "Database $db contains $tbl_count tables."
done
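Spawning a hive process per table is slow at 40 databases. One way to cut the overhead, sketched here under the assumption that a db<TAB>table list has already been collected from show tables, is to generate a single HiveQL script and run it with one hive -f invocation (tables.txt and the db1 names below are made up for illustration):

```shell
# tables.txt: one "db<TAB>table" pair per line (illustrative names)
printf 'db1\tusers\ndb1\torders\n' > tables.txt

# Emit one big UNION ALL query instead of one hive call per table
{
  first=1
  while IFS=$'\t' read -r db tbl; do
    [ "$first" -eq 1 ] || echo 'UNION ALL'
    echo "SELECT '$db.$tbl' AS tbl_name, COUNT(*) AS row_cnt FROM $db.$tbl"
    first=0
  done < tables.txt
} > counts.hql
# hive -S -f counts.hql    # one cluster round-trip for all counts
```

Depending on the Hive version, each SELECT may need to be wrapped in a subquery for the top-level UNION ALL to parse.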

How to get comma/tab separated output of HDFS Quota?

I am using the below command to retrieve the HDFS quota, but I don't want the fancy output. Instead, I need the output stored in a comma- or tab-separated format; by default it is not tab-separated. Can anyone suggest how?
Command:
hdfs dfs -count -q -h -v /path/to/directory
Output is like this:
none inf 250 G 114.9 G 518 2.8 K 45.0 G /new/directory/X
Expected Output:
none,inf,250 G,114.9 G,518,2.8 K,45.0 G,/new/directory/X
How about using sed? The key thing is to identify a unique string to mark the field separator in the hdfs output. That could be a tab, since you said the fields are tab-separated; the sample output you posted, though, uses spaces.
Once you decide on a unique string, use sed to search for it and replace it with a comma. It looks like two or more spaces separate the fields in the hdfs output in all cases except the start of the line and the path. Perhaps you can accept a leading comma and do a second pass of sed for the path.
This Stack Overflow question covers sed replacing consecutive spaces.
hdfs dfs -count -q -h -v /path/to/directory | sed -e "s/[[:space:]]\{2,\}/,/g" | sed -e "s/[[:space:]]\//,\//g"
The solution is even simpler if they are tabs.
hdfs dfs -count -q -h -v /path/to/directory | sed -e $'s/\t/,/g'
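Here is the space-based approach run on a mocked-up line of -count output (spacing approximated from the sample above), with a leading-whitespace trim added so no leading comma is produced:

```shell
# One mocked-up line of `hdfs dfs -count -q -h -v` output (spacing approximated)
line='        none             inf          250 G        114.9 G          518        2.8 K         45.0 G /new/directory/X'
printf '%s\n' "$line" |
  sed -e 's/^[[:space:]]*//' \
      -e 's/[[:space:]]\{2,\}/,/g' \
      -e 's/[[:space:]]\//,\//g' > quota.csv
cat quota.csv
```

The single spaces inside values like "250 G" survive because only runs of two or more spaces become commas; the single space before the path is handled by the third expression.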

Pass a list of values from a bash parameter into a variable used by the called SQL script

I have an SQL script and a shell script that run some operations on my database.
My main problem is that I'm looking for a way to do the following: send an array of parameters in my bash arguments when launching the script. My actual command is:
./myscript.sh databaseName user thirdParameterToPassAsAString 'fourthParameterElement1','fourthParameterElement2','fourthParameterElement3'
the content of my script:
#!/bin/bash
set -e
psql $1 $2 <<EOF
set search_path = search_path;
set firstParameterusedinSQLScript = $3;
set Param_List = $4;
\i my_script.sql
EOF
and the sql part where I have the problem:
where ae.example in (:Param_List)
I have, of course, some issues with this where clause.
So the question is: how could I do this?
Have you considered changing the SQL itself (not changing the original SQL file that contains it) before executing it, replacing the parameters via sed?
If that is an option for you, you could define a helper function like:
function prepare_script() {
cat <<EOF
set search_path = search_path;
EOF
sed -e "s|:Param_List|$4|g" -e "s|firstParameterusedinSQLScript|$3|g" Requetes_retour_arriere_fiab_x_siret.sql
}
You could then call it like:
prepare_script "$1" "$2" "$3" "$4" | psql "$1" "$2"
Note that you do not change the file on disk itself; you just read it using sed and have it output the altered SQL on stdout, piping it to psql.
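To check just the substitution step in isolation, here is a sketch with a throwaway SQL file (the demo_query.sql name is made up; $params stands in for the fourth script argument):

```shell
# Hypothetical throwaway SQL file using the :Param_List placeholder
cat > demo_query.sql <<'EOF'
select * from ae where ae.example in (:Param_List);
EOF

params="'a','b','c'"   # stands in for the script's fourth argument
sed -e "s|:Param_List|$params|g" demo_query.sql > demo_out.sql
cat demo_out.sql
```

Using | as the sed delimiter avoids clashes with the quotes and commas inside the parameter list; it would break only if a value itself contained a | or &.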

Pipe envsubst output to hive

Working with Hive 0.13.0, I would like to evaluate variables against a template and then immediately execute the resulting Hive code (avoiding a temporary intermediate file is preferable).
Here is a (non-working) example of what I'd like to do:
template.hql
SELECT COUNT(*) FROM ${TABLE};
In the shell:
export TABLE=DEFAULT.FOOTABLE
envsubst < template.hql | hive
Is there a particular reason this does not work, and is there a proper way to achieve it?
The substitution works as expected:
$ cat template.hql
SELECT COUNT(*) FROM ${TABLE};
$ export TABLE=DEFAULT.FOOTABLE
$ envsubst < template.hql
SELECT COUNT(*) FROM DEFAULT.FOOTABLE;
So I suspect hive does not read queries from standard input. I see from an online manual that it supports the -f parameter, so you can create the file manually:
TMPFILE=$(mktemp)
envsubst < template.hql > $TMPFILE
hive -f $TMPFILE
rm $TMPFILE
If you're on a newish version of bash, you can avoid an intermediate file by using process substitution:
hive -f <( envsubst < template.hql )
I'm not sure, but also check if hive -f - might read from stdin.
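If envsubst happens not to be installed, the same single-variable substitution can be approximated with sed. This is a sketch that handles only the one ${TABLE} placeholder from the template above:

```shell
# template.hql as in the question
cat > template.hql <<'EOF'
SELECT COUNT(*) FROM ${TABLE};
EOF

TABLE=DEFAULT.FOOTABLE
sed -e "s|\${TABLE}|$TABLE|g" template.hql > resolved.hql
cat resolved.hql
# hive -f resolved.hql
# or, without the intermediate file:
# hive -f <( sed -e "s|\${TABLE}|$TABLE|g" template.hql )
```

Unlike envsubst, this substitutes only the named variable, so other ${...} strings in the template are left untouched.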
