I am stuck in a very odd situation related to Hbase design i would say.
Hbase version >> Version 2.1.0-cdh6.2.1
So, the problem statement is, in Hbase, we have a row in our table.
We perform new insert and then subsequent updates of the same Hbase row, as we receive the data from downstream.
say we received data like below
INSERT of {a=1,b=1,c=1,d=1,rowkey='row1'}
UPDATE of {b=1,c=1,d=1,rowkey='row1'}
and
say the final row is like this in our Hbase table
hbase(main):008:0> get 'test', 'row1'
COLUMN CELL
cf:b timestamp=1288380727188, value=value1
cf:c timestamp=1288380727188, value=value1
cf:d timestamp=1288380727188, value=value1
1 row(s) in 0.0400 seconds
So, cf:a, column qualifier is missing in above data as visible above when fetched via scan or get commands. But as per our ingestion flow/process, it should have been there. So, we are triaging as to where it went or what happened and so on. Still the analysis is in process and we are kind of clueless as to where it is.
Now, cut story short, we have a spark util to read the Hbase table into a Rdd, via
hbasecontext.hbaseRdd API function, convert it into a dataframe and display the tabular data. So, we ran this spark util on the same table to help locate this row and very surprisingly it returned 2 rows for the this same rowkey 'row1', where 1st row was the same as above get/scan row (above data) and the 2nd row had our missing column cf:a (surprising it had the same value which was expected). Say the output dataframe appeared something like below.
rowkey |cf:a |cf:b|cf:c|cf:d
row1 |null | 1 | 1 | 1 >> cf:a col qualifier missing (same as in Hbase shell)
row1 | 1 | 1 | 1 | 1 >> This cf:a was expected
We checked our Hbase table schema as well, so we dont have multiple versions of the cf:a in the describe or we dont do versioning on the table. The schema of the Hbase table describe has
VERSIONS => '1'
Anyways, i am clueless as to how hbaseRdd is able to read that row or missing col qualifier, but the Hbase shell cmds via get, scans does not read the missing col qualifier or row.
Any Hbase expert or suggestions please.
Fyi, i tried Hbase shell cmds as well via get - versions on the row, but it only returns the above get data and not the missing cf:a.
Is the col qualifier cf:a marked for deletion or something like that, which the Hbase shell cmd doesn't show ?
Any help would be appreciated.
Thanks !!
This is a strange problem, which I suspect has to do with puts with the same rowkey having different column qualifiers at different times. However, I just tried to recreate this behaviour and I don't seem to be getting this problem. But I have a regular HBase 2.x build, as opposed to yours.
One option I would recommend to explore the problem more closely is to inspect the HFiles physically, outside of hbase shell. You can use the HBase HFile utility to print the physical key-value content at the HFile level. Obviously try to do this on a small HFile! Don't forget to flush and major-compact your table before you do it though, because HBase stores all updates in memory while it can.
You can launch the utility as below, and it will print all key-values sequentially:
hbase hfile -f hdfs://HDFS-NAMENODE:9000/hbase/data/default/test/29cfaecf083bff2f8aa2289c6a078678/f/09f569670678405a9262c8dfa7af8924 -p --printkv
In the above command, HDFS-NAMENODE is your HDFS server, default is your namespace (assuming you have none), test is your table name, and f is the column family name. You can find out the exact path to your HFiles by using the HDFS browse command recursively:
hdfs dfs -ls /hbase/data
[Updated] We worked with Cloudera and found the issue was due to the Hbase region servers getting overlapped. Cloudera fixed it for us. I dont have the full details how they did it.
create table if not exists map_table like position_map_view;
While using this it is giving me operation not allowed error
As pointed in documentation, you need to use CREATE TABLE AS, just use LIMIT 0 in SELECT:
create table map_table as select * from position_map_view limit 0;
I didn't find an easy way of getting CREATE TABLE LIKE to work, but I've got a workaround. On DBR in Databricks you should be able to use SHALLOW CLONE to do something similar:
%sql
CREATE OR REPLACE TABLE $new_database.$new_table
SHALLOW CLONE $original_database.$original_table`;
You'll need to replace $templates manually.
Notes:
This has an added side-effect of preserving the table content in case you need it.
Ironically, creating empty table is much harder and involves manipulating show create table statement with custom code
I'm using cassandra. I am trying to update the gc_grace value using new bind variable.
ALTER table keyspace.table_name with gc_grace_seconds = ? ;
I got the following error:
no viable alternative at input '?'
How can I solve this?
As I see from the source code (maybe I'm wrong), but ALTER TABLE doesn't support bindings, so you can't use them for this command (and all DDL commands), and need to just use execute with specific value
It looks like you're trying to bind parameters programatically to set GC grace on a table. It isn't possible to do that using the Cassandra drivers.
It will only work through cqlsh. For example:
cqlsh> ALTER TABLE community.maptbl WITH gc_grace_seconds = 3600;
It doesn't make sense to do it in your app and it is not recommended. Cheers!
CREATE TYPE IF NOT EXISTS myks.profiles (
"field" text
);
It gives me the below exception when I try to create a UDT with name profiles com.datastax.driver.core.exceptions.SyntaxError: line 1:17 no viable alternative at input 'profiles' (CREATE TYPE myks.[profiles]...)
Update: it looks like a bug. I suggest to use word profile instead...
Original answer:
Keyspaces in Cassandra are created with CREATE KEYSPACE command, and you're trying to create a new user-defined type instead. This error is returned because the keyspace myks is not yet defined.
In your case full command will be:
CREATE KEYSPACE IF NOT EXISTS profiles WITH replication =
{'class': 'NetworkTopologyStrategy', 'your_dc': 'rep_factor'};
you need to substitute the name of your datacenter(s) instead of your_dc, and adjust rep_factor to match number of nodes.
But for beginning I suggest to watch at least DS201 course on DatStax Academy - it should give your overview of basic operations, etc.
$describe = new Cassandra\SimpleStatement(<<<EOD
describe keyspace.tablename
EOD
);
$session->execute($describe);
i used above code but it is not working.
how can i fetch field name and it's data type from Cassandra table ?
Refer to CQL documentation. Describe expects a table/schema/keyspace.
describe table keyspace.tablename
Its also a cqlsh command, not an actual cql command. To get this information query the system tables. try
select * from system.schema_columns;
- or for more recent versions -
select * from system_schema.columns ;
if using php driver may want to check out http://datastax.github.io/php-driver/features/#schema-metadata
Try desc table keyspace.tablename;