Pig CQL not working with greater/less than condition - Cassandra

I am using Pig 0.11.1 with Cassandra 1.2.12. The primary key for "employees" is ((date, id), joindatems). When I write a Pig CQL LOAD like
data = LOAD
'cql://company/employees?where_clause=+joindatems+%3C+1388769238536746+and+joindatems+%3E+1388768338536746'
using CqlStorage();
Here joindatems is a datetime in milliseconds, and the condition is joindatems < 1388769238536746 and joindatems > 1388768338536746.
When I run this I get:
Caused by: InvalidRequestException(why:joindatems cannot be restricted by both an equal and an inequal relation)
But this kind of query (joindatems < 1388769238536746 and joindatems > 1388768338536746) is supported in Cassandra CQL. Am I doing anything wrong in the Pig script?
Thanks in advance
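For reference, the where_clause value in the LOAD URL above is the URL-encoded form of the raw condition. A minimal Python sketch (standard library only) that reproduces the encoding:

from urllib.parse import quote_plus

# Raw condition from the question; quote_plus encodes spaces as '+',
# '<' as '%3C' and '>' as '%3E', matching the URL in the LOAD statement.
where = "joindatems < 1388769238536746 and joindatems > 1388768338536746"
print("cql://company/employees?where_clause=" + quote_plus(where))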

Related

User does not have privileges for ALTERTABLE_ADDCOLS while using spark.sql to read the data

Select query in spark.sql is resulting in the following error:
User *username* does not have privileges for ALTERTABLE_ADDCOLS
Spark version - 2.1.0
Trying to execute the following query:
dig = spark.sql("""select col1, col2 from dbname.tablename""")
It's caused by the spark.sql.hive.caseSensitiveInferenceMode property. By default (INFER_AND_SAVE), Spark infers the table's case-sensitive schema from the underlying files and then saves it back into the table's metastore properties, which requires ALTER TABLE privileges, hence the ALTERTABLE_ADDCOLS error. To avoid it, change the configuration to INFER_ONLY.
Considering a spark session named spark, the code below should work:
spark.conf.set("spark.sql.hive.caseSensitiveInferenceMode", "INFER_ONLY")
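The property can also be set when the session is built, before the first Hive table is read; a short sketch (the application name is illustrative):

from pyspark.sql import SparkSession

# Configure the inference mode up front so no schema is written back
# to the metastore.
spark = (SparkSession.builder
         .appName("example")
         .config("spark.sql.hive.caseSensitiveInferenceMode", "INFER_ONLY")
         .enableHiveSupport()
         .getOrCreate())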

How can I extract values from Cassandra output using Python?

I'm connecting to a Cassandra database through Python using the Cassandra driver, and the connection works without any problem. But when I fetch values from Cassandra, the output comes formatted like Row(values).
Python version: 3.6
Package: cassandra-driver
from cassandra.cluster import Cluster
cluster = Cluster()
session = cluster.connect('employee')
k=session.execute("select count(*) from users")
print(k[0])
Output :
Row(count=11)
Expected :
11
From the documentation:
By default, each row in the result set will be a named tuple. Each row will have a matching attribute for each column defined in the schema, such as name, age, and so on. You can also treat them as normal tuples by unpacking them or accessing fields by position.
So you can access your data by name as k[0].count, or by position as k[0][0].
Please read Getting started document from driver's documentation - it will answer most of your questions.
The Cassandra driver returns every result through something called a row factory, which by default produces named tuples.
In your case, you can access the output as k[0].count.
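If you'd rather get plain values back, the driver also lets you swap the row factory; a short sketch using the driver's tuple_factory (keyspace and query taken from the question):

from cassandra.cluster import Cluster
from cassandra.query import tuple_factory

cluster = Cluster()
session = cluster.connect('employee')
session.row_factory = tuple_factory  # rows come back as plain tuples

k = session.execute("select count(*) from users")
print(k[0][0])  # 11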

Existing column can't be found by DataFrame#filter in PySpark

I am using PySpark to perform SparkSQL on my Hive tables.
records = sqlContext.sql("SELECT * FROM my_table")
which retrieves the contents of the table.
When I use the filter argument as a string, it works okay:
records.filter("field_i = 3")
However, when I try to use the filter method, as documented here
records.filter(records.field_i == 3)
I am encountering this error
py4j.protocol.Py4JJavaError: An error occurred while calling o19.filter.
: org.apache.spark.sql.AnalysisException: resolved attributes field_i missing from field_1,field_2,...,field_i,...field_n
even though this field_i column clearly exists in the DataFrame object.
I prefer to use the second way because I need to use Python functions to perform record and field manipulations.
I am using Spark 1.3.0 in Cloudera Quickstart CDH-5.4.0 and Python 2.6.
From the Spark DataFrame documentation:
In Python it’s possible to access a DataFrame’s columns either by attribute (df.age) or by indexing (df['age']). While the former is convenient for interactive data exploration, users are highly encouraged to use the latter form, which is future proof and won’t break with column names that are also attributes on the DataFrame class.
It seems that the name of your field may be a reserved word; try:
records.filter(records['field_i'] == 3)
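Equivalently, the column can be referenced through pyspark.sql.functions.col, which also avoids attribute access:

from pyspark.sql.functions import col

# Same filter, expressed with a column function instead of indexing.
records.filter(col("field_i") == 3)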
What I did was upgrade my Spark from 1.3.0 to 1.4.0 in Cloudera QuickStart CDH-5.4.0, and the second filtering form works. I still can't explain why 1.3.0 has problems with it.

Not able to load simple table from Cassandra using Pig

I am trying to load a simple table created in Cassandra using a CQL command, but the load fails when I try to DUMP it. My Pig script looks like this:
A = LOAD 'cql://pigtest/myusers' USING CqlStorage()
AS (user_id:int,fname:chararray,lname:chararray);
describe A;
DUMP A;
My users table schema looks like:
CREATE TABLE users (
user_id int PRIMARY KEY,
fname text,
lname text
)
I am getting the following exception (I tried with Cassandra 2.0.9 and 2.1.0, and Pig 0.13). Please help us find the root cause.
ERROR 1002: Unable to store alias A
Caused by: InvalidRequestException(why:Expected 8 or 0 byte long (7))
at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:54918)
at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultStandardScheme.read(Cassandra.java:54895)
at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result.read(Cassandra.java:54810)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_prepared_cql3_query(Cassandra.java:1861)
at org.apache.cassandra.thrift.Cassandra$Client.execute_prepared_cql3_query(Cassandra.java:1846)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:635)
... 28 more
Verify that the partitioner is the same on the server and the client (Murmur3Partitioner vs RandomPartitioner). An "Expected 8 or 0 byte long" error typically means the Hadoop client is configured for a different partitioner than the cluster, since Murmur3Partitioner tokens are 8-byte longs.
> cqlsh -e "describe cluster" | head
Cluster: Test Cluster
Partitioner: Murmur3Partitioner
Pig script:
set cassandra.input.partitioner.class org.apache.cassandra.dht.Murmur3Partitioner;
set cassandra.output.partitioner.class org.apache.cassandra.dht.Murmur3Partitioner;

How to insert data in cassandra using Pig

I am trying to copy data from a file in HDFS to a table in Cassandra using Pig, but the job fails with a null pointer exception while storing the data in Cassandra. Can someone help me with this?
Users table structure:
CREATE TABLE users (
user_id text PRIMARY KEY,
age int,
first text,
last text
)
My pig script
A = load '/user/hduser/user.txt' using PigStorage(',') as (id:chararray,age:int,fname:chararray,lname:chararray);
C = foreach A GENERATE TOTUPLE(TOTUPLE('user_id',id)), TOTUPLE('age',age),TOTUPLE('first',fname),TOTUPLE('last',lname);
STORE C into 'cql://ram_keyspace/users' USING CqlStorage();
Exception:
java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.&lt;init&gt;(CqlRecordWriter.java:123)
at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.&lt;init&gt;(CqlRecordWriter.java:90)
at org.apache.cassandra.hadoop.cql3.CqlOutputFormat.getRecordWriter(CqlOutputFormat.java:76)
at org.apache.cassandra.hadoop.cql3.CqlOutputFormat.getRecordWriter(CqlOutputFormat.java:57)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:84)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.&lt;init&gt;(MapTask.java:627)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:753)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.NullPointerException
at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.&lt;init&gt;(CqlRecordWriter.java:109)
... 12 more
Can someone who has used Pig with Cassandra help me fix this?
You are using CqlStorage, which requires you to specify an output_query: a prepared statement that will be used to insert the data into the column family. The DSE Pig documentation provides an example:
grunt> STORE insertformat INTO
'cql://cql3ks/simple_table1?output_query=UPDATE+cql3ks.simple_table1+set+b+%3D+%3F'
USING CqlStorage;
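Adapted to the users table from the question, the same pattern would look roughly like the sketch below. This is untested: the key pair names the primary key column, the remaining values are grouped into one tuple following the DSE example's layout, and the commas in the SET clause are URL-encoded as %2C.

-- Key tuple carries the primary key; remaining columns bind to the ?s in order.
insertformat = FOREACH A GENERATE TOTUPLE(TOTUPLE('user_id', id)), TOTUPLE(age, fname, lname);
STORE insertformat INTO
'cql://ram_keyspace/users?output_query=UPDATE+ram_keyspace.users+set+age+%3D+%3F%2C+first+%3D+%3F%2C+last+%3D+%3F'
USING CqlStorage();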
