JSR223 Sampler returns Response same as Request for EDIFACT message in JMeter - performance-testing

The JSR223 Sampler returns a Response identical to the Request for an EDIFACT message.
Request:
def payload = "UNB+IATA:1+1S+XX+121103+FF168019110033++ETK1+O'\n" +
"UNH+1+TKCREQ:00:1:IA'\n" +
"MSG+:131'\n" +
"ORG+1S+99999999:X7HH+VZX++T+GR+CXN'\n" +
"TKT+676713121:T'\n" +
"UNT+5+1'\n" +
"UNZ+1+FF168019110033
Response:
"UNB+IATA:1+1S+XX+121103+FF168019110033++ETK1+O'\n" +
"UNH+1+TKCREQ:00:1:IA'\n" +
"MSG+:131'\n" +
"ORG+1S+99999999:X7HH+VZX++T+GR+CXN'\n" +
"TKT+676713121:T'\n" +
"UNT+5+1'\n" +
"UNZ+1+FF168019110033
Log:
2023-02-07 15:33:39,890 DEBUG o.a.j.p.t.s.TCPSampler: Created org.apache.jmeter.protocol.tcp.sampler.TCPSampler#4a6c4b41
2023-02-07 15:33:39,912 DEBUG o.a.j.p.t.s.TCPSampler: Created org.apache.jmeter.protocol.tcp.sampler.TCPSampler#7bd45f7b
2023-02-07 15:33:39,939 INFO o.a.j.e.StandardJMeterEngine: Running the test!
2023-02-07 15:33:39,939 INFO o.a.j.s.SampleEvent: List of sample_variables: []
2023-02-07 15:33:39,942 INFO o.a.j.g.u.JMeterMenuBar: setRunning(true, *local*)
2023-02-07 15:33:40,147 INFO o.a.j.e.StandardJMeterEngine: Starting ThreadGroup: 1 : Thread Group
2023-02-07 15:33:40,147 INFO o.a.j.e.StandardJMeterEngine: Starting 1 threads for group Thread Group.
2023-02-07 15:33:40,147 INFO o.a.j.e.StandardJMeterEngine: Thread will start next loop on error
2023-02-07 15:33:40,147 INFO o.a.j.t.ThreadGroup: Starting thread group... number=1 threads=1 ramp-up=1 perThread=1000.0 delayedStart=false
2023-02-07 15:33:40,150 INFO o.a.j.t.ThreadGroup: Started thread group number 1
2023-02-07 15:33:40,150 INFO o.a.j.e.StandardJMeterEngine: All thread groups have been started
2023-02-07 15:33:40,150 INFO o.a.j.t.JMeterThread: Thread started: Thread Group 1-1
2023-02-07 15:33:40,162 INFO o.a.j.t.JMeterThread: Thread is done: Thread Group 1-1
2023-02-07 15:33:40,162 INFO o.a.j.t.JMeterThread: Thread finished: Thread Group 1-1
2023-02-07 15:33:40,162 INFO o.a.j.e.StandardJMeterEngine: Notifying test listeners of end of test
2023-02-07 15:33:40,162 INFO o.a.j.g.u.JMeterMenuBar: setRunning(false, *local*)
SampleResult fields:
ContentType:
DataEncoding: windows-1252
Steps followed:
Set up TCP in JMeter properties:
tcp.handler=TCPClientImpl
eolByte = 111
tcp.eolByte=1000
tcp.charset=
tcp.status.prefix=Status=
tcp.status.suffix=.
tcp.binarylength.prefix.length=2
TCP Sampler Config
TCPClient classname=TCPClientImpl
Servername=xxxxxx
Port: 3432
Timeouts: Connect: 2000 ms, Response: 2000 ms
Reuse Connection - enabled
JSR223 Sampler
Payload Request:
def payload = "UNB+IATA:1+1S+XX+121103+FF168019110033++ETK1+O'\n" +
"UNH+1+TKCREQ:00:1:IA'\n" +
"MSG+:131'\n" +
"ORG+1S+99999999:X7HH+VZX++T+GR+CXN'\n" +
"TKT+676713121:T'\n" +
"UNT+5+1'\n" +
"UNZ+1+FF168019110033'

If this code:
def payload = "UNB+IATA:1+1S+XX+121103+FF168019110033++ETK1+O'\n" +
"UNH+1+TKCREQ:00:1:IA'\n" +
"MSG+:131'\n" +
"ORG+1S+99999999:X7HH+VZX++T+GR+CXN'\n" +
"TKT+676713121:T'\n" +
"UNT+5+1'\n" +
"UNZ+1+FF168019110033"
is the complete code for the JSR223 Sampler - it does absolutely nothing.
If you want to send it over TCP, you need to put the payload into the "Text to send" field of the TCP Sampler.
If you want to send the payload using the JSR223 Sampler, you need to add the relevant code to do it; the simplest option is:
def payload = "UNB+IATA:1+1S+XX+121103+FF168019110033++ETK1+O'\n" +
"UNH+1+TKCREQ:00:1:IA'\n" +
"MSG+:131'\n" +
"ORG+1S+99999999:X7HH+VZX++T+GR+CXN'\n" +
"TKT+676713121:T'\n" +
"UNT+5+1'\n" +
"UNZ+1+FF168019110033"
def socket = new Socket('xxxxxx', 3432)
socket.withStreams { input, output ->
    output << payload                          // send the request first
    log.info(input.newReader().readLine())     // then read one line of the reply
}
The output will go to the jmeter.log file.
More information on Groovy scripting in JMeter: Apache Groovy: What Is Groovy Used For?
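If the goal is for the server's reply (rather than the request) to show up as the JSR223 Sampler's response in listeners, the reply also has to be written into the sampler's result. A minimal sketch, assuming the same placeholder host/port and that the server terminates its reply with a newline:
def payload = "UNB+IATA:1+1S+XX+121103+FF168019110033++ETK1+O'\n" +
        "UNH+1+TKCREQ:00:1:IA'\n" +
        "MSG+:131'\n" +
        "ORG+1S+99999999:X7HH+VZX++T+GR+CXN'\n" +
        "TKT+676713121:T'\n" +
        "UNT+5+1'\n" +
        "UNZ+1+FF168019110033'"

def socket = new Socket('xxxxxx', 3432)             // placeholder host and port
socket.withStreams { input, output ->
    output << payload                               // send the EDIFACT request
    output.flush()
    def reply = input.newReader().readLine()        // assumes the reply ends with a newline
    SampleResult.setResponseData(reply ?: '', 'windows-1252')  // expose it as this sampler's response
}
socket.close()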

Sending the request in JMeter as below:
import org.apache.jmeter.util.JsseSSLManager;
import org.apache.jmeter.util.SSLManager;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;
int your_app_port = 7XX0;
String your_app_host = 'XXXXXXX';
JsseSSLManager sslManager = (JsseSSLManager) SSLManager.getInstance();
SSLSocketFactory sslsocketfactory = sslManager.getContext().getSocketFactory();
SSLSocket sslsocket = (SSLSocket) sslsocketfactory.createSocket('XXXXXXX', 7XX0)
def sessInboundQueue = System.getProperties().get("SessionInbound")
def destinationInboundQueue = System.getProperties().get("DestinationInbound")
def payload = """UNB+IATA:1+1S+1D+201112:0204+FF168019119008++ETK1+O'
UNH+1+TKCREQ:00:1:IA'
MSG+:131'
ORG+1S+99999909:X7HG+ATH++T+GR+ABK'
TKT+9992170003108:T'
UNT+5+1'
UNZ+1+FF168019119008'"""
def msg = sessInboundQueue.createTextMessage(payload)
Response:
Response code:500
Response message:javax.script.ScriptException: java.lang.NullPointerException: Cannot invoke method createTextMessage() on null object
Can someone help me?
#java #jmeter

Related

Use RabbitMQ as producer and Celery as consumer

I'm trying to use RabbitMQ, Celery, and a Flask app to simply update the database. ProcedureAPI.py is an API that gets the data, inserts records into the database, and pushes data to the RabbitMQ server. Celery gets the data from the RabbitMQ queue and updates the database.
I'm new to this, so please point out what I'm doing wrong.
consumer.py
from celery import Celery
import sqlite3
import time
# app = Celery('Task_Queue')
# default_config = 'celeryconfig'
# app.config_from_object(default_config)
app = Celery('tasks', backend='rpc://', broker='pyamqp://guest:guest@localhost')

@app.task(serializer='json')
def updateDB(x):
    x = x["item"]
    with sqlite3.connect("test.db") as conn:
        time.sleep(5)
        conn.execute('''updateQuery''', [x])
        # app.log(f"{x['item']} status is updated as completed!")
    return x
ProcedureAPI.py
from flask import Flask,request,jsonify
import pandas as pd
import sqlite3
import json
import pika
import configparser
parser = configparser.RawConfigParser()
configFilePath = 'appconfig.conf'
parser.read(configFilePath)
# RabbitMQ Config
rmq_username = parser.get('general', 'rmq_USERNAME')
rmq_password = parser.get('general', 'rmq_PASSWORD')
host= parser.get('general', 'rmq_IP')
port= parser.get('general', 'rmq_PORT')
# Database
DATABASE= parser.get('general', 'DATABASE_FILE')
app = Flask(__name__)
conn_credentials = pika.PlainCredentials(rmq_username, rmq_password)
connection = pika.BlockingConnection(pika.ConnectionParameters(
    host=host,
    port=port,
    credentials=conn_credentials))
channel = connection.channel()
@app.route('/create', methods=['POST'])
def create_main():
    if request.method == "POST":
        print(DATABASE)
        with sqlite3.connect(DATABASE) as conn:
            conn.execute('''CREATE TABLE table1
                (feild1 INTEGER PRIMARY KEY, -- AUTOINCREMENT
                 feild2 varchar(20) NOT NULL,
                 feild3 varchar(20) DEFAULT 'pending');''')
        return "Table created", 202
@app.route('/getData', methods=['GET'])
def display_main():
    if request.method == "GET":
        with sqlite3.connect(DATABASE) as conn:
            df = pd.read_sql_query("SELECT * from table1", conn)
        df_list = df.values.tolist()
        JSONP_data = jsonify(df_list)
        return JSONP_data, 200
@app.route('/', methods=['POST'])
def update_main():
    if request.method == "POST":
        updatedata = request.get_json()
        with sqlite3.connect(DATABASE) as conn:
            conn.execute("INSERT_Query")
            print("Records Inserted successfully")
        channel.queue_declare(queue='celery', durable=True)
        channel.basic_publish(exchange='celery',
                              routing_key='celery',
                              body=json.dumps(updatedata),
                              properties=pika.BasicProperties(delivery_mode=2))
        return updatedata, 202

# main driver function
if __name__ == '__main__':
    app.run()
configfile
[general]
# RabbitMQ server (broker) IP address
rmq_IP=127.0.0.1
# RabbitMQ server (broker) TCP port number (5672 or 5671 for SSL)
rmq_PORT=5672
# queue name (storage node hostname)
rmq_QUEUENAME=Task_Queue
# RabbitMQ authentication
rmq_USERNAME=guest
rmq_PASSWORD=guest
DATABASE_FILE=test.db
# log file
receiver_LOG_FILE=cmdmq_receiver.log
sender_LOG_FILE=cmdmq_sender.log
Run Celery:
celery -A consumer worker --pool=solo -l info
The error I got:
(env1) PS C:\Users\USER\Desktop\Desktop\Jobs Search\nodepython\flaskapp> celery -A consumer worker --pool=solo -l info
-------------- celery@DESKTOP-FRBNH77 v5.2.0 (dawn-chorus)
--- ***** -----
-- ******* ---- Windows-10-10.0.19041-SP0 2021-11-12 17:35:04
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: tasks:0x1ec10c9c5c8
- ** ---------- .> transport: amqp://guest:**@localhost:5672//
- ** ---------- .> results: rpc://
- *** --- * --- .> concurrency: 12 (solo)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> celery exchange=celery(direct) key=celery
[tasks]
. consumer.updateDB
[2021-11-12 17:35:04,546: INFO/MainProcess] Connected to amqp://guest:**@127.0.0.1:5672//
[2021-11-12 17:35:04,571: INFO/MainProcess] mingle: searching for neighbors
[2021-11-12 17:35:05,594: INFO/MainProcess] mingle: all alone
[2021-11-12 17:35:05,605: INFO/MainProcess] celery@DESKTOP-FRBNH77 ready.
[2021-11-12 17:35:14,952: WARNING/MainProcess] Received and deleted unknown message. Wrong destination?!?
The full contents of the message body was: body: '{"item": "1BOOK"}' (17b)
{content_type:None content_encoding:None
delivery_info:{'consumer_tag': 'None4', 'delivery_tag': 1, 'redelivered': False, 'exchange':
'celery', 'routing_key': 'celery'} headers={}}
Any reference code or suggestion will be a great help.
Looks like you haven't declared the exchange and bound it to the queue that you want to route to:
channel.exchange_declare(exchange='exchange_name', exchange_type="type_of_exchange")
channel.queue_bind(exchange='exchange_name', queue='your_queue_name')
Producer : P
Exchange : E
Queue : Q
Bind : B
The producer (your pika script) is not able to send a message directly to a queue; it needs an intermediary, so the message is routed as
P >> E >> B >> Q
The exchange routes the message to one or more queues, depending on the exchange type.
A binding (as the name suggests) ties an exchange to a queue, again depending on the exchange type; a minimal sketch follows below.
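A minimal pika sketch of that flow, assuming a local broker and the 'celery' exchange/queue names shown in the worker output (connection details are placeholders, and durable=True is assumed to match Celery's defaults):
import json
import pika

# Placeholder connection details; adjust to your broker.
connection = pika.BlockingConnection(pika.ConnectionParameters(host='127.0.0.1', port=5672))
channel = connection.channel()

# Declare the exchange and the queue, then bind them so that messages published
# with routing_key='celery' actually reach the 'celery' queue.
channel.exchange_declare(exchange='celery', exchange_type='direct', durable=True)
channel.queue_declare(queue='celery', durable=True)
channel.queue_bind(exchange='celery', queue='celery', routing_key='celery')

channel.basic_publish(
    exchange='celery',
    routing_key='celery',
    body=json.dumps({"item": "1BOOK"}),
    properties=pika.BasicProperties(delivery_mode=2),  # persistent message
)
connection.close()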
For more detail, please refer to:
https://hevodata.com/learn/rabbitmq-exchange-type/

Custom synchronization using java.util.concurrent with cats.effect

I have a requirement for pretty custom, non-trivial synchronization which can be implemented with a fair ReentrantLock and a Phaser. It does not seem possible (without non-trivial customization) to implement it on fs2 and cats.effect.
Since it's required to wrap all blocking operations in a Blocker, here is the code:
private val l: ReentrantLock = new ReentrantLock(true)
private val c: Condition = l.newCondition
private val blocker: Blocker = //...

// F is declared on the class level
def lockedMutex(conditionPredicate: Int => Boolean): F[Unit] = blocker.blockOn {
  Sync[F].delay(l.lock()).bracket(_ => Sync[F].delay {
    while (!conditionPredicate(2)) {
      c.await()
    }
  })(_ => Sync[F].delay(l.unlock()))
}
QUESTION:
Is it guaranteed that the code containing c.await() will be executed in the same Thread which acquires/releases the ReentrantLock?
This is a crucial part, since if it is not, an IllegalMonitorStateException will be thrown.
You really do not need to worry about threads when using something like cats-effect; rather, you can describe your problem at a higher level.
This should get the behavior you want: it will run high-priority jobs until there are no more, and only then pick low-priority jobs. After finishing a low-priority job, each fiber will first check whether there are more high-priority jobs before trying to pick a low-priority one again:
import cats.effect.Async
import cats.effect.std.Queue
import cats.effect.syntax.all._
import cats.syntax.all._
import scala.concurrent.ExecutionContext

object HighLowPriorityRunner {
  final case class Config[F[_]](
      highPriorityJobs: Queue[F, F[Unit]],
      lowPriorityJobs: Queue[F, F[Unit]],
      customEC: Option[ExecutionContext]
  )

  def apply[F[_]](config: Config[F])
                 (implicit F: Async[F]): F[Unit] = {
    val processOneJob =
      config.highPriorityJobs.tryTake.flatMap {
        case Some(hpJob) => hpJob
        case None => config.lowPriorityJobs.tryTake.flatMap {
          case Some(lpJob) => lpJob
          case None => F.unit
        }
      }

    val loop: F[Unit] = processOneJob.start.foreverM

    config.customEC.fold(ifEmpty = loop)(ec => loop.evalOn(ec))
  }
}
You can use the customEC to provide your own ExecutionContext to control the number of real threads that are running your fibers under the hood.
The code can be used like this:
import cats.effect.{Async, IO, IOApp, Resource}
import cats.effect.std.Queue
import cats.effect.syntax.all._
import cats.syntax.all._
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._

object Main extends IOApp.Simple {
  override final val run: IO[Unit] =
    Resource.make(IO(Executors.newFixedThreadPool(2)))(ec => IO.blocking(ec.shutdown())).use { ec =>
      Program[IO](ExecutionContext.fromExecutor(ec))
    }
}

object Program {
  private def createJob[F[_]](id: Int)(implicit F: Async[F]): F[Unit] =
    F.delay(println(s"Starting job ${id} on thread ${Thread.currentThread.getName}")) *>
    F.delay(Thread.sleep(1.second.toMillis)) *> // Blocks the Fiber! - Only for testing, use F.sleep on real code.
    F.delay(println(s"Finished job ${id}!"))

  def apply[F[_]](customEC: ExecutionContext)(implicit F: Async[F]): F[Unit] = for {
    highPriorityJobs <- Queue.unbounded[F, F[Unit]]
    lowPriorityJobs <- Queue.unbounded[F, F[Unit]]
    runnerFiber <- HighLowPriorityRunner(HighLowPriorityRunner.Config(
      highPriorityJobs,
      lowPriorityJobs,
      Some(customEC)
    )).start
    _ <- List.range(0, 10).traverse_(id => highPriorityJobs.offer(createJob(id)))
    _ <- List.range(10, 15).traverse_(id => lowPriorityJobs.offer(createJob(id)))
    _ <- F.sleep(5.seconds)
    _ <- List.range(15, 20).traverse_(id => highPriorityJobs.offer(createJob(id)))
    _ <- runnerFiber.join.void
  } yield ()
}
Which should produce an output like this:
Starting job 0 on thread pool-1-thread-1
Starting job 1 on thread pool-1-thread-2
Finished job 0!
Finished job 1!
Starting job 2 on thread pool-1-thread-1
Starting job 3 on thread pool-1-thread-2
Finished job 2!
Finished job 3!
Starting job 4 on thread pool-1-thread-1
Starting job 5 on thread pool-1-thread-2
Finished job 4!
Finished job 5!
Starting job 6 on thread pool-1-thread-1
Starting job 7 on thread pool-1-thread-2
Finished job 6!
Finished job 7!
Starting job 8 on thread pool-1-thread-1
Starting job 9 on thread pool-1-thread-2
Finished job 8!
Finished job 9!
Starting job 10 on thread pool-1-thread-1
Starting job 11 on thread pool-1-thread-2
Finished job 10!
Finished job 11!
Starting job 15 on thread pool-1-thread-1
Starting job 16 on thread pool-1-thread-2
Finished job 15!
Finished job 16!
Starting job 17 on thread pool-1-thread-1
Starting job 18 on thread pool-1-thread-2
Finished job 17!
Finished job 18!
Starting job 19 on thread pool-1-thread-1
Starting job 12 on thread pool-1-thread-2
Finished job 19!
Starting job 13 on thread pool-1-thread-1
Finished job 12!
Starting job 14 on thread pool-1-thread-2
Finished job 13!
Finished job 14!
Thanks to Gavin Bisesi (@Daenyth) for refining my original idea into this!
Full code available here.

Task scheduling in airflow is not working

I am scheduling a DAG in Airflow to run every 10 minutes, but it is not doing anything.
Here is my DAG's code:
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime.now(),
    'email': ['airflow@airflow.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}
dag = DAG('Python_call', default_args=default_args, schedule_interval='*/10 * * * *')
t1 = BashOperator(
    task_id='testairflow',
    bash_command='python /var/www/projects/python_airflow/airpy/hello.py',
    dag=dag)
and the scheduler log looks like this:
[2018-01-05 14:05:08,536] {jobs.py:351} DagFileProcessor484 INFO - Processing /var/www/projects/python_airflow/airpy/airflow_home/dags/scheduler.py took 2.278 seconds
[2018-01-05 14:05:09,712] {jobs.py:343} DagFileProcessor485 INFO - Started process (PID=29795) to work on /var/www/projects/python_airflow/airpy/airflow_home/dags/scheduler.py
[2018-01-05 14:05:09,715] {jobs.py:534} DagFileProcessor485 ERROR - Cannot use more than 1 thread when using sqlite. Setting max_threads to 1
[2018-01-05 14:05:09,717] {jobs.py:1521} DagFileProcessor485 INFO - Processing file /var/www/projects/python_airflow/airpy/airflow_home/dags/scheduler.py for tasks to queue
[2018-01-05 14:05:09,717] {models.py:167} DagFileProcessor485 INFO - Filling up the DagBag from /var/www/projects/python_airflow/airpy/airflow_home/dags/scheduler.py
[2018-01-05 14:05:10,057] {jobs.py:1535} DagFileProcessor485 INFO - DAG(s) dict_keys(['example_passing_params_via_test_command', 'latest_only_with_trigger', 'example_branch_operator', 'example_subdag_operator', 'latest_only', 'example_skip_dag', 'example_subdag_operator.section-1', 'example_subdag_operator.section-2', 'tutorial', 'example_http_operator', 'example_trigger_controller_dag', 'example_bash_operator', 'example_python_operator', 'test_utils', 'Python_call', 'example_trigger_target_dag', 'example_xcom', 'example_short_circuit_operator', 'example_branch_dop_operator_v3']) retrieved from /var/www/projects/python_airflow/airpy/airflow_home/dags/scheduler.py
[2018-01-05 14:05:12,039] {jobs.py:1169} DagFileProcessor485 INFO - Processing Python_call
[2018-01-05 14:05:12,048] {jobs.py:566} DagFileProcessor485 INFO - Skipping SLA check for <DAG: Python_call> because no tasks in DAG have SLAs
[2018-01-05 14:05:12,060] {models.py:322} DagFileProcessor485 INFO - Finding 'running' jobs without a recent heartbeat
[2018-01-05 14:05:12,061] {models.py:328} DagFileProcessor485 INFO - Failing jobs without heartbeat after 2018-01-05 14:00:12.061146
command line airflow scheduler :
[2018-01-05 14:31:20,496] {dag_processing.py:627} INFO - Started a process (PID: 32222) to generate tasks for /var/www/projects/python_airflow/airpy/airflow_home/dags/scheduler.py - logging into /var/www/projects/python_airflow/airpy/airflow_home/logs/scheduler/2018-01-05/scheduler.py.log
[2018-01-05 14:31:23,122] {jobs.py:1002} INFO - No tasks to send to the executor
[2018-01-05 14:31:23,123] {jobs.py:1440} INFO - Heartbeating the executor
[2018-01-05 14:31:23,123] {jobs.py:1450} INFO - Heartbeating the scheduler
[2018-01-05 14:31:24,243] {jobs.py:1404} INFO - Heartbeating the process manager
[2018-01-05 14:31:24,244] {dag_processing.py:559} INFO - Processor for /var/www/projects/python_airflow/airpy/airflow_home/dags/scheduler.py finished
Airflow is an ETL/data pipelining tool. This means it's meant to execute things over periods that have already gone by. E.g. using:
Task parameter 'start_date': datetime(2018,1,4)
Default DAG parameter schedule_interval='@daily'
means that the DAG won't run until a whole schedule interval unit (one day) has gone by since the start date; thus at Airflow server time equal to datetime(2018,1,5).
Since you have a start_date of datetime.now() with a @daily interval (which again is the default), the aforementioned condition is never fulfilled (refer to the official FAQ).
You can change the start_date parameter to, e.g., yesterday, using timedelta for a relative start_date earlier than today (although this is not recommended). I would advise using 'start_date': datetime(2018,1,1) and adding schedule_interval='@once' to the DAG parameters for test purposes, as sketched below. This should get your DAG to run.
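A minimal sketch of that test setup, reusing the question's operator and script path; the concrete start_date value is just an example:
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    # A fixed date in the past instead of datetime.now(), so a full schedule
    # interval has already elapsed when the scheduler evaluates the DAG.
    'start_date': datetime(2018, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('Python_call', default_args=default_args, schedule_interval='@once')

t1 = BashOperator(
    task_id='testairflow',
    bash_command='python /var/www/projects/python_airflow/airpy/hello.py',
    dag=dag)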

Apache airflow: Cannot use more than 1 thread when using sqlite. Setting max_threads to 1

I have set up Airflow with a PostgreSQL database, and I am creating multiple DAGs:
def subdag(parent_dag_name, child_dag_name, currentDate, batchId, category, subCategory, yearMonth, utilityType, REALTIME_HOME, args):
    dag_subdag = DAG(
        dag_id='%s.%s' % (parent_dag_name, child_dag_name),
        default_args=args,
        schedule_interval="@once",
    )
    # get site list to run bs reports
    site_list = getSiteListforProcessing(category, subCategory, utilityType, yearMonth)
    print(site_list)

    def update_status(siteId, **kwargs):
        createdDate = getCurrentTimestamp()
        print('N', siteId, batchId, yearMonth, utilityType, 'N')
        updateJobStatusLog('N', siteId, batchId, yearMonth, utilityType, 'P')

    def error_status(siteId, **kwargs):
        createdDate = getCurrentTimestamp()
        print('N', siteId, batchId, yearMonth, utilityType, 'N')

    BS_template = """
    echo "{{ params.date }}"
    java -cp xx.jar com.xGenerator {{params.siteId}} {{params.utilityType}} {{params.date}}
    """

    for index, siteid in enumerate(site_list):
        t1 = BashOperator(
            task_id='%s-task-%s' % (child_dag_name, index + 1),
            bash_command=BS_template,
            params={'date': currentDate, 'realtime_home': REALTIME_HOME, 'siteId': siteid, "utilityType": utilityType},
            default_args=args,
            dag=dag_subdag)
        t2 = PythonOperator(
            task_id='%s-updatetask-%s' % (child_dag_name, index + 1),
            dag=dag_subdag,
            python_callable=update_status,
            op_kwargs={'siteId': siteid})
        t2.set_upstream(t1)
    return dag_subdag
It creates the dynamic tasks, but whatever the number of dynamic tasks, the last one always fails and logs the error:
"Cannot use more than 1 thread when using sqlite. Setting max_threads to 1"
E.g., if 4 tasks are created, 3 run; and if 2 tasks are created, 1 runs.

Error with CASSANDRA + PIG + CQL + Counter Column

I'm using Pig to access a column family in Cassandra with a counter column. When I try to dump the data I get the error below:
cqlsh:pollkan> CREATE TABLE votes_count_period_1 (
... period int,
... poll text,
... votes counter,
... PRIMARY KEY (period, poll)
... );
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_1 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> select * from votes_count_period_1;
period | poll | votes
----------+--------------------------------------+-------
20130830 | 605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a | 5
20130831 | 405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a | 2
20130831 | 505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a | 3
root@batch:/usr/share/cassandra# pig -x local
2013-08-31 23:02:06,135 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459164) compiled Mar 21 2013, 06:14:38
2013-08-31 23:02:06,136 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/share/cassandra/pig_1377982926133.log
2013-08-31 23:02:06,154 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2013-08-31 23:02:06,252 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> register /usr/share/cassandra/apache-cassandra-1.2.9.jar
grunt> register /usr/share/cassandra/apache-cassandra-thrift-1.2.9.jar
grunt> register /usr/share/cassandra/lib/libthrift-0.7.0.jar
grunt> A = LOAD 'cql://pollkan/votes_count_period_1' USING org.apache.cassandra.hadoop.pig.CqlStorage();
grunt> DUMP A;
Causes:
2013-08-31 23:01:35,397 [pool-4-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed ColumnFamilySplit((-69569900416187863, '-54603788994328078] @[cassandra001, cassandra002, cassandra003])
2013-08-31 23:01:35,417 [pool-4-thread-1] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2013-08-31 23:01:35,418 [pool-4-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: A[2,4] C: R:
2013-08-31 23:01:35,424 [Thread-10] INFO org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2013-08-31 23:01:35,428 [Thread-10] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local712790083_0002
java.lang.Exception: java.lang.IndexOutOfBoundsException
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkIndex(Buffer.java:538)
at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:410)
at org.apache.cassandra.db.context.CounterContext.total(CounterContext.java:477)
at org.apache.cassandra.db.marshal.AbstractCommutativeType.compose(AbstractCommutativeType.java:34)
at org.apache.cassandra.db.marshal.AbstractCommutativeType.compose(AbstractCommutativeType.java:25)
at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.columnToTuple(AbstractCassandraStorage.java:137)
at org.apache.cassandra.hadoop.pig.CqlStorage.getNext(CqlStorage.java:110)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:531)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
I read that https://issues.apache.org/jira/browse/CASSANDRA-5234 resolved issues with CQL3 tables and counter columns, but I am still having issues.
By the way, I tried re-creating the table with old-style COMPACT STORAGE, and I advanced a little further, but got stuck on a new issue with the error below:
cqlsh:pollkan> CREATE TABLE votes_count_period_2 (
... period int,
... poll text,
... votes counter,
... PRIMARY KEY (period, poll)
... ) WITH COMPACT STORAGE;
cqlsh:pollkan>
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130831 AND poll = '505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan> UPDATE votes_count_period_2 SET votes = votes + 1 WHERE period = 20130830 AND poll = '605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a';
cqlsh:pollkan>
cqlsh:pollkan> select * from votes_count_period_2;
period | poll | votes
----------+--------------------------------------+-------
20130830 | 605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a | 5
20130831 | 405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a | 2
20130831 | 505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a | 3
root@batch:/usr/share/cassandra# pig -x local
2013-08-31 23:02:06,135 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459164) compiled Mar 21 2013, 06:14:38
2013-08-31 23:02:06,136 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/share/cassandra/pig_1377982926133.log
2013-08-31 23:02:06,154 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2013-08-31 23:02:06,252 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> register /usr/share/cassandra/apache-cassandra-1.2.9.jar
grunt> register /usr/share/cassandra/apache-cassandra-thrift-1.2.9.jar
grunt> register /usr/share/cassandra/lib/libthrift-0.7.0.jar
grunt> A = LOAD 'cql://pollkan/votes_count_period_2' USING org.apache.cassandra.hadoop.pig.CqlStorage();
grunt> DUMP A;
2013-08-31 23:05:59,454 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2013-08-31 23:05:59,458 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2013-08-31 23:05:59,465 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-08-31 23:05:59,466 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
((period,20130830),(poll,605bd9c0-aa05-11e3-8c9a-4d42ba09ab2a),(votes,5))
((period,20130831),(poll,405bd9c0-0d05-11e3-8c9a-4d42ba09ab2a),(votes,2))
((period,20130831),(poll,505bd9c0-ff05-11e3-8c9a-4d42ba09ab2a),(votes,3))
grunt> A = LOAD 'cql://pollkan/votes_count_period_2' USING org.apache.cassandra.hadoop.pig.CqlStorage();
grunt> B = FOREACH A GENERATE poll, votes;
grunt> describe B;
B: {poll: chararray,votes: long}
grunt> C = GROUP B BY poll;
grunt> describe C;
C: {group: chararray,B: {(poll: chararray,votes: long)}}
grunt> D = FOREACH C GENERATE group AS pollgroup, SUM(B.votes);
grunt> describe D;
D: {pollgroup: chararray,long}
grunt> dump D;
2013-08-31 23:53:32,577 [pool-33-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: A[13,4],B[14,4],D[18,4],C[17,4] C: D[18,4],C[17,4] R: D[18,4]
2013-08-31 23:53:32,586 [pool-33-thread-1] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
2013-08-31 23:53:32,589 [Thread-65] INFO org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2013-08-31 23:53:32,591 [Thread-65] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local814297309_0018
java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to java.lang.String
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to java.lang.String
at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:76)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:112)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
My versions are Pig 0.11.1 and Cassandra 1.2.9.
Any help?
Thanks
I found the same problem earlier today while testing the latest Pig cql3 integration with similar data structures.
The JIRA issue you mentioned, https://issues.apache.org/jira/browse/CASSANDRA-5234, does contain a patch which has been verified to work for one of the commenters. However, a quick look through the cassandra git reveals that it has not been applied either on the 1.2 branch or on the trunk. I have added a comment to that effect to the JIRA issue.
Until the patch gets committed and a new stable version gets released, a solution would be to apply the patch on a fresh checkout of 1.2.9, recompile and deploy to your hadoop nodes, if that is an option for you.
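A rough sketch of that workaround, assuming the 1.2.9 source tarball and the patch attached to CASSANDRA-5234 have already been downloaded (file names and the patch level are placeholders):
tar xzf apache-cassandra-1.2.9-src.tar.gz
cd apache-cassandra-1.2.9-src
patch -p1 < ../CASSANDRA-5234.patch   # may need -p0, depending on how the patch was generated
ant jar                               # Cassandra 1.2.x builds with ant
# Then replace the cassandra jars registered in the Pig script on the Hadoop/Pig
# nodes with the freshly built ones from build/.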

Resources