Kafka Connect error while starting with AvroConverter - Cassandra

I'm trying to sink data from one topic to Cassandra. The topic has an Avro schema registered in the Schema Registry, produced by a Kafka Streams application.
This is my connect-standalone.properties:
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/usr/share/java
This is my cassandra-sink.properties:
connector.class=io.confluent.connect.cassandra.CassandraSinkConnector
cassandra.contact.points=ip.to.endpoint
cassandra.port=9042
cassandra.keyspace=my_keyspace
cassandra.write.mode=Update
tasks.max=1
topics=avro-sink
key.converter.schema.registry.url=http://localhost:8081
value.converter.schema.registry.url=http://localhost:8081
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.avro.AvroConverter
name=cassandra-sink-connector
internal.key.converter=org.apache.kafka.connect.storage.StringConverter
internal.value.converter=org.apache.kafka.connect.avro.AvroConverter
transforms=createKey
transforms.createKey.fields=id,timestamp
transforms.createKey.type=org.apache.kafka.connect.transforms.ValueToKey
And this is the error:
[2019-03-15 16:32:05,238] ERROR Failed to create job for /etc/kafka/cassandra-sink.properties (org.apache.kafka.connect.cli.ConnectStandalone:108)
[2019-03-15 16:32:05,238] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:119)
java.util.concurrent.ExecutionException: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector configuration is invalid and contains the following 1 error(s):
Invalid value org.apache.kafka.connect.avro.AvroConverter for configuration value.converter: Class org.apache.kafka.connect.avro.AvroConverter could not be found.
You can also find the above list of errors at the endpoint `/{connectorType}/config/validate`
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:79)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:66)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:116)
Caused by: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector configuration is invalid and contains the following 1 error(s):
Invalid value org.apache.kafka.connect.avro.AvroConverter for configuration value.converter: Class org.apache.kafka.connect.avro.AvroConverter could not be found.
You can also find the above list of errors at the endpoint `/{connectorType}/config/validate`
at org.apache.kafka.connect.runtime.AbstractHerder.maybeAddConfigErrors(AbstractHerder.java:423)
at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:188)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:113)
This broker is the one from https://github.com/confluentinc/cp-docker-images
Any ideas? Thanks!

I solved it by removing all the converter properties, since on the Confluent Docker container the defaults are already Avro for the value and String for the key.
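For reference, if you do want to set the converters explicitly, here is a sketch of what the properties could look like; note that Confluent's Avro converter lives in the io.confluent.connect.avro package rather than org.apache.kafka.connect.avro, and the Schema Registry URL and plugin path below are assumptions for a local Confluent setup, not values from the original configuration:
# cassandra-sink.properties (sketch)
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
# connect-standalone.properties (sketch) - the jars providing the converter must be reachable by the worker
plugin.path=/usr/share/java,/usr/share/confluent-hub-components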

Related

Getting py4j.protocol.Py4JJavaError: An error occurred while calling o65.jdbc. : java.sql.SQLException: Unsupported type TIMESTAMP_WITH_TIMEZONE

I am making a JDBC connection to a Denodo database using pyspark. The table I am connecting to contains the "TIMESTAMP_WITH_TIMEZONE" data type in 2 columns. Since Spark provides built-in JDBC support for only a handful of databases, of which Denodo is not one, it is not able to recognize "TIMESTAMP_WITH_TIMEZONE" and hence cannot map it to any of its Spark SQL data types.
To overcome this I am providing my own custom schema (c_schema here), but this is not working either and I am getting the same error. Below is the code snippet.
c_schema="game start date TIMESTAMP,game end date TIMESTAMP"
df = spark.read.jdbc("jdbc_url", "schema.table_name",properties={"user": "user_name", "password": "password","customSchema":c_schema,"driver": "com.denodo.vdp.jdbc.Driver"})
Please let me know how I can fix this.
For anyone else facing this issue while connecting to Denodo using Spark: use the CAST function to convert the "TIMESTAMP_WITH_TIMEZONE" data type into another data type such as String, Date or Timestamp. I had posted this question on the Denodo community page too, and this is its official response:
CAST("PLANNED START DATE" as DATE) as "PLANNED_START_DATE"

Logstash 6.2 - full persistent queue (wrong mapping?)

My queue is almost full and I see these errors in my log file:
[2018-05-16T00:01:33,334][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"2018.05.15-el-mg_papi-prod", :_type=>"doc", :_routing=>nil}, #<LogStash::Event:0x608d85c1>], :response=>{"index"=>{"_index"=>"2018.05.15-el-mg_papi-prod", "_type"=>"doc", "_id"=>"mHvSZWMB8oeeM9BTo0V2", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [papi_request_json.query.disableFacets]", "caused_by"=>{"type"=>"i_o_exception", "reason"=>"Current token (VALUE_TRUE) not numeric, can not use numeric value accessors\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper#56b8442f; line: 1, column: 555]"}}}}}
[2018-05-16T00:01:37,145][INFO ][org.logstash.beats.BeatsHandler] [local: 0:0:0:0:0:0:0:1:5000, remote: 0:0:0:0:0:0:0:1:50222] Handling exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 69
[2018-05-16T00:01:37,147][INFO ][org.logstash.beats.BeatsHandler] [local: 0:0:0:0:0:0:0:1:5000, remote: 0:0:0:0:0:0:0:1:50222] Handling exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 84
...
[2018-05-16T15:28:09,981][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2018-05-16T15:28:09,982][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2018-05-16T15:28:09,982][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
If I understand the first warning correctly, the problem is with a mapping. I have a lot of files in my Logstash queue folder. My questions are:
1. How do I empty the queue? Can I just delete all the files from the Logstash queue folder (all those logs will be lost) and then resend the data to Logstash into the proper index?
2. How can I determine where exactly the problem in the mapping is, or which servers are sending data of the wrong type?
3. I have a pipeline on port 5000 named testing-pipeline, just for checking from Nagios whether Logstash is active. What are those [INFO ][org.logstash.beats.BeatsHandler] logs?
4. If I understand correctly, the [INFO ][logstash.outputs.elasticsearch] lines are just logs about retrying to process the Logstash queue?
All servers run Filebeat 6.2.2. Thank you for your help.
All pages in the queue could be deleted, but that is not the proper solution. In my case the queue was full because there were events whose mapping conflicted with the index. In Elasticsearch 6 you cannot send documents with different mappings to the same index, so those logs got stuck in the queue (even if there is only one wrong event, all the others behind it will not be processed). So how do you process all the data you can and skip the wrong events? The solution is to configure a DLQ (dead letter queue). Every event rejected with response code 400 or 404 is moved to the DLQ, so the others can be processed. The data from the DLQ can be processed later with a separate pipeline, as sketched below.
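A minimal sketch of that DLQ setup, assuming Logstash 6.x installed from a package (the paths, pipeline id and target index below are assumptions):

# logstash.yml - turn the dead letter queue on
dead_letter_queue.enable: true

# separate pipeline that replays the DLQ once the events have been fixed
input {
  dead_letter_queue {
    path => "/var/lib/logstash/dead_letter_queue"
    pipeline_id => "main"
    commit_offsets => true
  }
}
filter {
  # correct or drop the fields that broke the index mapping here
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "dlq-replay-%{+YYYY.MM.dd}"
  }
}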
The wrong mapping can be identified from the error log: "error"=>{"type"=>"mapper_parsing_exception", ..... }. To pinpoint the exact place with the wrong mapping, you have to compare the mapping of the events with the mapping of the indices.
The [INFO ][org.logstash.beats.BeatsHandler] messages were caused by the Nagios server. The check was not a valid Beats request, which is why the exception was handled. The check should only test whether the Logstash service is active, so I now check the Logstash service on localhost:9600 (the Logstash monitoring API) instead.
The [INFO ][logstash.outputs.elasticsearch] lines mean that Logstash is trying to process the queue but the index is locked ([FORBIDDEN/12/index read-only / allow delete (api)]) because the indices were put into a read-only state. When there is not enough disk space on the server, Elasticsearch automatically switches indices to read-only. This behaviour is controlled by the cluster.routing.allocation.disk.watermark settings; see the Elasticsearch disk-based shard allocation documentation for more info.
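Once disk space has been freed up, the read-only block has to be removed by hand; a sketch of that request (host and port are assumptions):

curl -XPUT 'http://localhost:9200/_all/_settings' -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only_allow_delete": null}'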

Data Pipeline failing for EMR Activity

I am trying to run a Spark step on AWS Data Pipeline and I am getting the following exception:
amazonaws.datapipeline.taskrunner.TaskExecutionException: Failed to complete EMR transform.
    at amazonaws.datapipeline.activity.EmrActivity.runActivity(EmrActivity.java:67)
    at amazonaws.datapipeline.objects.AbstractActivity.run(AbstractActivity.java:16)
    at amazonaws.datapipeline.taskrunner.TaskPoller.executeRemoteRunner(TaskPoller.java:136)
    at amazonaws.datapipeline.taskrunner.TaskPoller.executeTask(TaskPoller.java:105)
    at amazonaws.datapipeline.taskrunner.TaskPoller$1.run(TaskPoller.java:81)
    at private.com.amazonaws.services.datapipeline.poller.PollWorker.executeWork(PollWorker.java:76)
    at private.com.amazonaws.services.datapipeline.poller.PollWorker.run(PollWorker.java:53)
    at java.lang.Thread.run(Thread.java:748)
Caused by: amazonaws.datapipeline.taskrunner.TaskExecutionException: EMR job '#DefaultEmrActivity1_2017-11-20T12:13:08_Attempt=1' with jobFlowId 'j-2E7PU1OK3GIJI' is failed with status 'FAILED' and reason 'Cluster ready after last step completed.'. Step 'df-0693981356F3KEDFQ6GG_#DefaultEmrActivity1_2017-11-20T12:13:08_Attempt=1' is in status 'FAILED' with reason 'null'
    at amazonaws.datapipeline.cluster.EmrUtil.runSteps(EmrUtil.java:286)
    at amazonaws.datapipeline.activity.EmrActivity.runActivity(EmrActivity.java:63)
    ... 7 more
The cluster is getting spun up correctly.
Here is a screenshot of the pipeline:
I think there is some issue with the 'step' in the activity. Any input would be helpful.
The issue was twofold:
1) The step script should have been comma-separated, something like:
command-runner.jar,spark-submit,--deploy-mode,cluster,--class,com.amazon.Main
Link: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-emrcluster.html
2) EmrActivity does not support staging, so we cannot use ${INPUT1_STAGING_DIR} in the step instruction. For now, I have replaced it with hardcoded S3 URLs.
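Putting both points together, a sketch of what the full step string could look like with hardcoded S3 locations (the bucket, jar name and input path below are placeholders, not values from the original pipeline):
command-runner.jar,spark-submit,--deploy-mode,cluster,--class,com.amazon.Main,s3://your-bucket/jars/your-app.jar,s3://your-bucket/input/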

Error Launching Presto Server while trying to access Hive

I am trying to launch Presto to query Hive on a RHEL machine.
But while launching the Presto server via "./launcher run", I am getting the following error:
com.google.inject.CreationException: Unable to create injector, see the following errors:
1) Configuration property 'http-server.http.port=8080 ' was not used
at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:235)
2) Error: Could not coerce value '8080 ' to int (property 'http-server.http.port') in order to call [public io.airlift.http.server.HttpServerConfig io.airlift.http.server.HttpServerConfig.setHttpPort(int)]
at io.airlift.http.server.HttpServerModule.configure(HttpServerModule.java:74)
3) Error: Invalid configuration property node.id: is malformed (for class io.airlift.node.NodeConfig.nodeId)
at io.airlift.node.NodeModule.configure(NodeModule.java:34)
3 errors
at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:466)
at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:155)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:107)
at com.google.inject.Guice.createInjector(Guice.java:96)
at io.airlift.bootstrap.Bootstrap.initialize(Bootstrap.java:242)
at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:116)
at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:67)
A few things. You need to remove the space after "http-server.http.port=8080".
Also, your node.id property is invalid. The node.id should match the following regex: "[A-Za-z0-9][_A-Za-z0-9-]*" (see https://github.com/airlift/airlift/blob/1e5694fb13ac6ca9cbdae1a1e60909c62fc7a64e/node/src/main/java/io/airlift/node/NodeConfig.java#L30).
Same kind of issue here ("No factory for connector jmx"): removing the spaces from all the /etc/* config files fixed it.
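For reference, a sketch of cleaned-up etc/config.properties and etc/node.properties with no trailing whitespace and a node.id that matches the required pattern (all values below are examples, not the original configuration):

# etc/config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery-server.enabled=true
discovery.uri=http://localhost:8080

# etc/node.properties
node.environment=production
node.id=presto-node-1
node.data-dir=/var/presto/data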

Web Api Returning Json - [System.NotSupportedException] Specified method is not supported. (Sybase Ase)

I'm using Web API with Entity Framework 4.2 and the Sybase ASE connector.
This was working without issues returning JSON, until I tried to add a new table.
return db.car
.Include("tires")
.Include("tires.hub_caps")
.Include("tires.hub_caps.colors")
.Include("tires.hub_caps.sizes")
.Include("tires.hub_caps.sizes.units")
.Where(c => c.tires == 13);
The above works without issues if the following line is removed:
.Include("tires.hub_caps.colors")
However, when that line is included, I am given the error:
""An error occurred while preparing the command definition. See the inner exception for details."
The inner exception reads:
"InnerException = {"Specified method is not supported."}"
"source = Sybase.AdoNet4.AseClient"
The following also results in an error:
List<car> cars = db.car.AsNoTracking()
.Include("tires")
.Include("tires.hub_caps")
.Include("tires.hub_caps.colors")
.Include("tires.hub_caps.sizes")
.Include("tires.hub_caps.sizes.units")
.Where(c => c.tires == 13).ToList();
The error is as follows:
An exception of type 'System.Data.EntityCommandCompilationException' occurred in System.Data.Entity.dll but was not handled in user code
Additional information: An error occurred while preparing the command definition. See the inner exception for details.
Inner exception: "Specified method is not supported."
This points to a fault with the Sybase ASE data connector.
I am using data annotations on all tables to control which fields are returned. On the colors table, I have tried the following annotations to limit the properties returned to just the key:
[JsonIgnore]
[IgnoreDataMember]
Any ideas what might be causing this issue?
Alternatively, if I keep colors in and remove
.Include("tires.hub_caps.sizes")
.Include("tires.hub_caps.sizes.units")
then this works as well. It seems that the Sybase ASE connector does not support cases where the Include statements fork from one object in two directions. Is there a way around this? The same issue occurs with Sybase ASE and the Progress data connector.
The issue does not occur in a standard ASP.NET MVC controller class; the problem is with serializing two one-to-many relationships on a single table to JSON.
This issue still occurs if lazy loading is turned on.
It seems to me that this is a bug in Sybase ASE that none of the connectors are able to work around.
