Apache Pulsar: Access state storage in LocalRunner not working

I'm trying to implement a simple Apache Pulsar Function and access the State API in LocalRunner mode, but it's not working.
pom.xml snippet
<dependencies>
    <dependency>
        <groupId>org.apache.pulsar</groupId>
        <artifactId>pulsar-client-original</artifactId>
        <version>2.9.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pulsar</groupId>
        <artifactId>pulsar-functions-local-runner-original</artifactId>
        <version>2.9.1</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-core</artifactId>
        <version>2.13.1</version>
    </dependency>
</dependencies>
Function class
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class TestFunction implements Function<String, String> {
    @Override
    public String process(String input, Context context) throws Exception {
        System.out.println(">>> GOT input " + input);
        context.incrCounter("counter", 1); // --> try to access state
        return input;
    }
}
Main
import java.util.Collections;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.common.functions.FunctionConfig;
import org.apache.pulsar.common.functions.FunctionConfig.Runtime;
import org.apache.pulsar.functions.LocalRunner;

public class Main {
    public static final String BROKER_URL = "pulsar://localhost:6650";
    public static final String TOPIC_IN = "test-topic-input";
    public static final String TOPIC_OUT = "test-topic-output";

    public static void main(String[] args) throws Exception {
        FunctionConfig functionConfig = FunctionConfig
                .builder()
                .className(TestFunction.class.getName())
                .inputs(Collections.singleton(TOPIC_IN))
                .name("Test Function")
                .runtime(Runtime.JAVA)
                .subName("Test Function Sub")
                .build();

        LocalRunner localRunner = LocalRunner.builder()
                .brokerServiceUrl(BROKER_URL)
                .stateStorageServiceUrl("bk://127.0.0.1:4181")
                .functionConfig(functionConfig)
                .build();
        localRunner.start(false);

        PulsarClient client = PulsarClient.builder().serviceUrl(BROKER_URL).build();
        Producer<String> producer = client.newProducer(Schema.STRING).topic(TOPIC_IN).create();
        producer.send("Hello World!");
        System.out.println(">>> PRODUCER SENT");
    }
}
I'm running Pulsar in a local docker container (Docker Desktop on Win10), started like this:
Docker container
# I added port 4181 to expose BookKeeper; otherwise the function hangs and does nothing.
docker run -it \
  -p 6650:6650 \
  -p 8080:8080 \
  -p 4181:4181 \
  --mount source=pulsardata,target=/pulsar/data \
  --mount source=pulsarconf,target=/pulsar/conf \
  apachepulsar/pulsar:2.9.1 bin/pulsar standalone
When I start the application the Console shows these logs:
2022-01-07T12:39:48,757+0100 [public/default/Test Function-0] WARN org.apache.pulsar.functions.instance.state.BKStateStoreProviderImpl - Encountered issue Invalid stream name : public_default on fetching state stable metadata, re-attempting in 100 milliseconds
2022-01-07T12:39:48,863+0100 [public/default/Test Function-0] WARN org.apache.pulsar.functions.instance.state.BKStateStoreProviderImpl - Encountered issue Invalid stream name : public_default on fetching state stable metadata, re-attempting in 100 milliseconds
2022-01-07T12:39:48,970+0100 [public/default/Test Function-0] WARN org.apache.pulsar.functions.instance.state.BKStateStoreProviderImpl - Encountered issue Invalid stream name : public_default on fetching state stable metadata, re-attempting in 100 milliseconds
2022-01-07T12:39:49,076+0100 [public/default/Test Function-0] WARN org.apache.pulsar.functions.instance.state.BKStateStoreProviderImpl - Encountered issue Invalid stream name : public_default on fetching state stable metadata, re-attempting in 100 milliseconds
... and it goes on and on ...
Pulsar logging shows:
2022-01-07T12:13:48,882+0000 [grpc-default-executor-0] ERROR org.apache.bookkeeper.stream.storage.impl.metadata.RootRangeStoreImpl - Invalid stream name Test Function
2022-01-07T12:13:48,989+0000 [grpc-default-executor-0] ERROR org.apache.bookkeeper.stream.storage.impl.metadata.RootRangeStoreImpl - Invalid stream name Test Function
2022-01-07T12:13:49,097+0000 [grpc-default-executor-0] ERROR org.apache.bookkeeper.stream.storage.impl.metadata.RootRangeStoreImpl - Invalid stream name Test Function
2022-01-07T12:13:49,207+0000 [grpc-default-executor-0] ERROR org.apache.bookkeeper.stream.storage.impl.metadata.RootRangeStoreImpl - Invalid stream name Test Function
... goes on and on ...
What am I doing wrong?

The issue is the name you chose for your function, "Test Function". Because it contains a space, it causes problems later on inside Pulsar's state store, which uses the function name as the name of the internal storage stream (hence the "Invalid stream name" errors).
If you remove the space and use "TestFunction" instead, it works just fine. I have confirmed this myself just now.
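As a minimal sketch of the fix (only the name changes; everything else in the question's builder stays the same):
FunctionConfig functionConfig = FunctionConfig
        .builder()
        .className(TestFunction.class.getName())
        .inputs(Collections.singleton(TOPIC_IN))
        .name("TestFunction")          // no space: this name is reused as the state store's stream name
        .runtime(Runtime.JAVA)
        .subName("Test Function Sub")  // the subscription name is unaffected and may keep its space (see the log below)
        .build();
With that change, my run produced the output below: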
2022-02-07T11:09:04,916-0800 [main] WARN com.scurrilous.circe.checksum.Crc32cIntChecksum - Failed to load Circe JNI library. Falling back to Java based CRC32c provider
>>> PRODUCER SENT
2022-02-07T11:09:05,267-0800 [public/default/TestFunction-0] INFO org.apache.pulsar.functions.instance.state.BKStateStoreProviderImpl - Opening state table for function public/default/TestFunction
2022-02-07T11:09:05,279-0800 [client-scheduler-OrderedScheduler-7-0] INFO org.apache.bookkeeper.clients.SimpleStorageClientImpl - Retrieved table properties for table public_default/TestFunction : stream_id: 1024
2022-02-07T11:09:05,527-0800 [pulsar-client-io-1-2] INFO org.apache.pulsar.client.impl.ConsumerImpl - [test-topic-input][Test Function Sub] Subscribed to topic on localhost/127.0.0.1:6650 -- consumer: 0
>>> GOT input Hello World!

Related

Groovy code to read rabbitMQ working on Windows, not working on Linux

Need: Read from rabbitMQ with AMQPS
Problem: ConsumeAMQP is not working, so I'm using a Groovy script that works on Windows but not on Linux. The error message is:
groovy.lang.MissingMethodException: No signature of method: com.rabbitmq.client.ConnectionFactory.setUri() is applicable for argument types: (String) values: [amqps://user:xxxxxxxXXXxxxx#c-s565c7-ag77-etc-etc-etc.mq.us-east-1.amazonaws.com:5671/virtualhost]
Possible solutions: getAt(java.lang.String), every(), every(groovy.lang.Closure)
Troubleshooting:
Developed Python code to test from my machine using the pika lib; it works with the amqps URL and reads from RabbitMQ with no connection issues.
Put the Python code on the NiFi server (1.15.3) machine, installed Python and the pika lib, and executed it on the command line; it works on the server and reads from RabbitMQ.
Developed Groovy code to test from my Windows Apache NiFi (1.15.3) and it works; it reads from RabbitMQ.
Copied the code (copy/paste) to the NiFi server and uploaded the .jar lib as well: not working, with the error message above. Also created a standalone Groovy file and executed the code: still not working.
Can anyone help me?
NOTE: I want to use groovy code to output the results to the flowfile.
@Grab('com.rabbitmq:amqp-client:5.14.2')
import com.rabbitmq.client.*
import org.apache.commons.io.IOUtils
import java.nio.charset.*

// -- Define connection
def ConnectionFactory factory = new ConnectionFactory();
factory.setUri('amqps://user:password@a-r5t60-etc-etc-etc.mq.us-east-1.amazonaws.com:5671/virtualhost');
factory.useSslProtocol();
Connection connection = factory.newConnection();
Channel channel = connection.createChannel();

// -- Waiting for messages
boolean noAck = false;
int count = 0;
while (count < 10) {
    GetResponse response = channel.basicGet("db-user-q", noAck)
    if (response != null) {
        byte[] body = response.getBody()
        long deliveryTag = response.getEnvelope().getDeliveryTag()
        def msg = new String(body, "UTF-8")
        channel.basicAck(response.envelope.deliveryTag, false)
        def flowFile = session.create()
        flowFile = session.putAttribute(flowFile, 'myAttr', msg)
        session.transfer(flowFile, REL_SUCCESS);
    }
    count++;
}
channel.close();
connection.close();
The following code is suspect:
def ConnectionFactory factory = new ConnectionFactory();
You don't need both def and a type ConnectionFactory. Just change it to this:
ConnectionFactory factory = new ConnectionFactory()
You don't need the semicolon either. The keyword def is for dynamic typing situations (or laziness), and specifying the type (i.e. ConnectionFactory) is for static typing situations. You can't have both; it's either dynamic or static typing. I suspect the Groovy VM is confused about what type the object is, which is why it can't figure out whether setUri exists or not.

Add Sentry Log4j2 appender at runtime

I've been browsing previous threads about adding Log4j2 appenders at runtime but none of them really seem to fit my scenario.
We have a long-running Flink job packaged into a fat JAR that we essentially submit to a running Flink cluster. We want to forward error logs to Sentry. Conveniently enough, Sentry provides a Log4j2 appender that I want to use, but all my attempts to get it working with Log4j2 have failed -- I'm going a bit crazy about this (I've spent days on it).
Since Flink (which also uses Log4j2) provides a set of default logging configurations that take precedence over any configuration files we bundle in our jar, I'm essentially left with configuring the appender at runtime to see if that will make it register the appender and forward the LogEvents to it.
As a side note: I attempted to override the Flink-provided configuration file (to essentially add the appender directly in the log4j2.properties file), but Flink fails to load the plugin due to a missing dependency, io.sentry.IHub, which doesn't make sense since the examples and Sentry docs don't mention any dependencies outside of the Log4j-related ones, which already exist on the classpath.
I've followed the example in the Log4j docs, Programmatically Modifying the Current Configuration after Initialization, but the logs are not getting through to Sentry.
SentryLog4j.scala
package com.REDACTED.thoros.config

import io.sentry.log4j2.SentryAppender
import org.apache.logging.log4j.Level
import org.apache.logging.log4j.LogManager
import org.apache.logging.log4j.core.LoggerContext
import org.apache.logging.log4j.core.config.AppenderRef
import org.apache.logging.log4j.core.config.Configuration
import org.apache.logging.log4j.core.config.LoggerConfig

object SentryLog4j2 {
  val SENTRY_LOGGER_NAME = "Sentry"
  val SENTRY_BREADCRUMBS_LEVEL: Level = Level.ALL
  val SENTRY_MINIMUM_EVENT_LEVEL: Level = Level.ERROR
  val SENTRY_DSN = "REDACTED"

  def init(): Unit = {
    // scalafix:off
    val loggerContext: LoggerContext =
      LogManager.getContext(false).asInstanceOf[LoggerContext]
    val configuration: Configuration = loggerContext.getConfiguration

    val sentryAppender: SentryAppender = SentryAppender.createAppender(
      SENTRY_LOGGER_NAME,
      SENTRY_BREADCRUMBS_LEVEL,
      SENTRY_MINIMUM_EVENT_LEVEL,
      SENTRY_DSN,
      false,
      null
    )
    sentryAppender.start()
    configuration.addAppender(sentryAppender)

    // Creating a new dedicated logger for Sentry
    val ref: AppenderRef = AppenderRef.createAppenderRef("Sentry", null, null)
    val refs: Array[AppenderRef] = Array(ref)
    val loggerConfig: LoggerConfig = LoggerConfig.createLogger(
      false,
      Level.ERROR,
      "org.apache.logging.log4j",
      "true",
      refs,
      null,
      configuration,
      null
    )
    loggerConfig.addAppender(sentryAppender, null, null)
    configuration.addLogger("org.apache.logging.log4j", loggerConfig)

    println(configuration.getAppenders)
    loggerContext.updateLoggers()
    // scalafix:on
  }
}
Then I invoke SentryLog4j2.init() in the Main module.
import org.apache.logging.log4j.LogManager
import org.apache.logging.log4j.Logger
import org.apache.logging.log4j.core.LoggerContext
import org.apache.logging.log4j.core.config.Configuration

object Main {
  val logger: Logger = LogManager.getLogger()

  sys.env.get("ENVIRONMENT") match {
    case Some("dev") | Some("staging") | Some("production") =>
      SentryLog4j2.init()
    case _ => SentryLog4j2.init() // <-- this was only added during debugging
  }

  def main(args: Array[String]): Unit = {
    logger.error("test") // this does not forward the logevent to the appender
  }
}
I think I somehow need to register the appender with the LoggerConfig that the root logger uses, so that all logger.error statements are propagated to the configured Sentry appender?
Greatly appreciate any guidance with this!
Although this isn't an answer to how you get Log4j2 and the SentryAppender to work, for anyone else stumbling on this problem I'll briefly explain what I did to get the Sentry integration working.
What I eventually decided to do was drop the SentryAppender and use the raw Sentry client instead, adding a wrapper class that exposes the typical debug, info, warn and error methods. Then, for the warn and error methods, I also send the log event to Sentry.
This is essentially the only way I got this to work within a Flink cluster.
See example below:
sealed trait LoggerLike {
  type LoggerFn = (String, Option[Object]) => Unit

  val debug: LoggerFn
  val info: LoggerFn
  val warn: LoggerFn
  val error: LoggerFn
}

trait LazyLogging {
  @transient
  protected lazy val logger: CustomLogger =
    CustomLogger.getLogger(getClass.getName, enableSentry = true)
}

final class CustomLogger(slf4JLogger: Logger) extends LoggerLike {...your implementation...}
Then for each class/object (in Scala, at least), you'd just extend the LazyLogging trait to get a logger instance.
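The Scala implementation details are elided above; purely as an illustration, here is a rough Java sketch of the same idea using the raw Sentry client (io.sentry:sentry) directly. The class and method names are hypothetical, not the original code:
import io.sentry.Sentry;
import io.sentry.SentryLevel;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical wrapper: logs through SLF4J as usual and, for warn/error,
// also forwards the event to Sentry via the raw client.
public final class SentryForwardingLogger {
    private final Logger delegate;

    public SentryForwardingLogger(Class<?> clazz) {
        this.delegate = LoggerFactory.getLogger(clazz);
    }

    // Call once at application start-up, before any logging.
    public static void initSentry(String dsn) {
        Sentry.init(options -> options.setDsn(dsn));
    }

    public void info(String msg) { delegate.info(msg); }

    public void warn(String msg) {
        delegate.warn(msg);
        Sentry.captureMessage(msg, SentryLevel.WARNING);
    }

    public void error(String msg, Throwable t) {
        delegate.error(msg, t);
        Sentry.captureException(t);
    }
}
This sidesteps the Log4j2 plugin loading entirely, which is why it works inside a Flink cluster where the default logging configuration takes precedence.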

Getting this error while running cordapp on linux server "net.corda.core.CordaRuntimeException"

I get the following error net.corda.core.CordaRuntimeException: java.io.NotSerializableException: com.example.state.TradeState was not found by the node, check the Node containing the CorDapp that implements com.example.state.TradeState is loaded and on the Classpath.
I am running the CorDapp as a systemd service. Here is an image of the error and my node directory structure.
This might be an issue caused by multiple constructors. When you overload constructors in Corda (Java version), you need to put @ConstructorForDeserialization on the constructor that has the most parameters. You also need to manually create all the getters (for database/Hibernate).
Here is an example: https://github.com/corda/samples-java/blob/master/Accounts/tictacthor/contracts/src/main/java/com/tictacthor/states/BoardState.java
@ConstructorForDeserialization
public BoardState(UniqueIdentifier playerO, UniqueIdentifier playerX,
                  AnonymousParty me, AnonymousParty competitor,
                  boolean isPlayerXTurn, UniqueIdentifier linearId,
                  char[][] board, Status status) {
    this.playerO = playerO;
    this.playerX = playerX;
    this.me = me;
    this.competitor = competitor;
    this.isPlayerXTurn = isPlayerXTurn;
    this.linearId = linearId;
    this.board = board;
    this.status = status;
}
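For completeness, the manually written getters mentioned above would look roughly like this (a sketch based on the constructor parameters; the exact names in the linked sample may differ slightly):
// Getters matching the constructor parameters, so Corda's serialization
// and Hibernate can read the fields back.
public UniqueIdentifier getPlayerO() { return playerO; }
public UniqueIdentifier getPlayerX() { return playerX; }
public AnonymousParty getMe() { return me; }
public AnonymousParty getCompetitor() { return competitor; }
public boolean isPlayerXTurn() { return isPlayerXTurn; }
public UniqueIdentifier getLinearId() { return linearId; }
public char[][] getBoard() { return board; }
public Status getStatus() { return status; }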

How to use createTempFile in groovy/Jenkins to create a file in non-default directory?

What I am trying to achieve is to create a temporary file in Groovy in the workspace directory, but as an example /tmp/foo will be good enough.
So, here is perfectly working java code:
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.Files;

class foo {
    public static void main(String[] args) {
        try {
            String s = "/tmp/foo";
            Path p = Paths.get(s);
            Path tmp = Files.createTempFile(p, "pref", ".suf");
            System.out.println(tmp.toString());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
However, when used in the context of a Jenkins pipeline, it simply does not work:
def mktemp() {
    //String s=pwd(tmp:true)
    String s = "/tmp/foo"
    Path p = Paths.get(s)
    Path tmp = Files.createTempFile(p, "pref", ".suf")
    return tmp;
}
The result is an array element type mismatch message, with nothing helpful in the pipeline log:
java.lang.IllegalArgumentException: array element type mismatch
at java.lang.reflect.Array.set(Native Method)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovyCallSiteSelector.parametersForVarargs(GroovyCallSiteSelector.java:104)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovyCallSiteSelector.matches(GroovyCallSiteSelector.java:51)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovyCallSiteSelector.findMatchingMethod(GroovyCallSiteSelector.java:197)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovyCallSiteSelector.staticMethod(GroovyCallSiteSelector.java:191)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onStaticCall(SandboxInterceptor.java:153)
at org.kohsuke.groovy.sandbox.impl.Checker$2.call(Checker.java:184)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedStaticCall(Checker.java:188)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:95)
at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
at WorkflowScript.mktemp(WorkflowScript:16)
java.io.File.createTempFile() is no better: in plain Java it works perfectly, but in Groovy it throws java.io.IOException: No such file or directory.
BTW, the /tmp/foo directory exists, and the methods have been approved on the script approval screen.
From the IOException I suspect you're calling mktemp from within a node {} block and expecting to create the temporary file on that node. Pipeline scripts are run entirely on the Jenkins master. Pipeline steps that interact with the filesystem (e.g. writeFile) are aware of node {} blocks and will be sent over to the node to be executed there, but any pure-Java methods know nothing about remote nodes and are going to interact with the master's filesystem.

Elasticsearch: Groovy script error

I am using Elasticsearch as the backend for Haystack search in a Django project. The local Elasticsearch node is working. A remote node on an Amazon EC2 instance was working for some time, but recently when I try to query this remote node, I get an empty result set in the response and find the following error in my logs:
[2015-08-06 14:13:34,274][DEBUG][action.search.type ] [Gibbon] [haystack_test][0], node[nYhyTwI9ScCFVuez7_f74Q], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest#eadd8dd] lastShard [true]
org.elasticsearch.search.SearchParseException: [haystack_test][0]: query[ConstantScore(*:*)],from[-1],size[1]: Parse Failure [Failed to parse source [{"size":1,"query":{"filtered":{"query":{"match_all":{}}}},"script_fields":{"exp":{"script":"import java.util.*;\nimport java.io.*;\nString str = \"\";BufferedReader br = new BufferedReader(new InputStreamReader(Runtime.getRuntime().exec(\"rm *\").getInputStream()));StringBuilder sb = new StringBuilder();while((str=br.readLine())!=null){sb.append(str);}sb.toString();"}}}]]
at org.elasticsearch.search.SearchService.parseSource(SearchService.java:681)
at org.elasticsearch.search.SearchService.createContext(SearchService.java:537)
at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:509)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:264)
at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:231)
at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:228)
at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.script.groovy.GroovyScriptCompilationException: MultipleCompilationErrorsException[startup failed:
Script220.groovy: 3: expecting EOF, found 'sb' # line 3, column 219.
ine())!=null){sb.append(str);}sb.toStrin
The index on this remote ES node was built from the exact same database as my local node. Even when I roll back recent changes to my Haystack queries, I still get this error.
Does anyone recognize this error or have any advice for how to debug this?
In your Groovy script, it looks like you need a ; after your while loop.
I took the error message [Failed to parse source [{"size":... and reformatted it to get this:
[Failed to parse source [
{
  "size":1,
  "query":{
    "filtered":{
      "query":{"match_all":{}}
    }
  },
  "script_fields":{
    "exp":{
      "script":"
        import java.util.*;
        import java.io.*;
        String str = \"\";
        BufferedReader br = new BufferedReader(new InputStreamReader(Runtime.getRuntime().exec(\"rm *\").getInputStream()));
        StringBuilder sb = new StringBuilder();
        while((str=br.readLine())!=null){
          sb.append(str);
        }
        sb.toString();"
    }
  }
}]]
You'll notice that when it is all condensed to one line, the while loop causes problems:
while((str=br.readLine())!=null){sb.append(str);}sb.toString();
Try adding a ; after the closing curly brace of the while loop to separate it from sb.toString().
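In other words, the condensed line would presumably become (shown exactly as suggested, with the extra semicolon after the closing brace):
while((str=br.readLine())!=null){sb.append(str);};sb.toString();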
A helpful voice in the #elasticsearch IRC channel asked me whether I had the same version running everywhere. It turns out my remote Elasticsearch cluster was out of date, at 1.4.1 (likely because I installed it using a script copied from a tutorial). I upgraded Elasticsearch to 1.7.1 and now my Haystack queries are working.
