WSO2 BAM with offset 1: Cassandra error - cassandra

I have a problem when starting the BAM server.
My machine has the IP 1.33.33.127 and hostname "srv-lc-presen".
I have configured it using this document:
Monitoring and statistics.
I have modified the port offset in carbon.xml and set it to 1.
I've modified master-datasources.xml and set:
WSO2BAM_CASSANDRA_DATASOURCE url = jdbc:cassandra://srv-lc-presen:9161/EVENT_KS
WSO2BAM_UTIL_DATASOURCE url = jdbc:cassandra://srv-lc-presen:9161/BAM_UTIL_KS
I have tried localhost, 1.33.33.127, and srv-lc-presen as the host.
I always get the same error:
ERROR {me.prettyprint.cassandra.connection.HConnectionManager} - Could not start connection pool for host srv-lc-presen(1.33.33.127):9161
[2014-05-07 12:04:24,983] WARN {me.prettyprint.cassandra.connection.CassandraHostRetryService} - Downed srv-lc-presen(1.33.33.127):9161 host still appears to be down: Unable to open transport to srv-lc-presen(1.33.33.127):9161 , java.net.ConnectException: Connection refused
[2014-05-07 12:04:24,987] ERROR {org.wso2.carbon.bam.notification.task.internal.NotificationDispatchComponent} - All host pools marked down. Retry burden pushed out to client.
me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
at me.prettyprint.cassandra.connection.HConnectionManager.getClientFromLBPolicy(HConnectionManager.java:393)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:249)
at me.prettyprint.cassandra.service.ThriftCluster.addKeyspace(ThriftCluster.java:168)
at org.wso2.carbon.bam.datasource.utils.DataSourceUtils.createKeyspaceIfNotExist(DataSourceUtils.java:80)
at org.wso2.carbon.bam.datasource.utils.DataSourceUtils.getClusterKeyspaceFromRDBMSConfig(DataSourceUtils.java:92)
at org.wso2.carbon.bam.datasource.utils.DataSourceUtils.getClusterKeyspaceFromRDBMSDataSource(DataSourceUtils.java:96)
New information:
I have tried reconfiguring it, but I can't find the problem.
In the BAM console I see this error:
[2014-05-08 09:10:57,531] ERROR {me.prettyprint.cassandra.connection.HConnectionManager} - Could not start connection pool for host 1.33.33.127(1.33.33.127):9161
[2014-05-08 09:10:57,564] ERROR {org.wso2.carbon.bam.notification.task.internal.NotificationDispatchComponent} - All host pools marked down. Retry burden pushed out to client.
me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
at me.prettyprint.cassandra.connection.HConnectionManager.getClientFromLBPolicy(HConnectionManager.java:393)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:249)
at me.prettyprint.cassandra.service.ThriftCluster.addKeyspace(ThriftCluster.java:168)
at org.wso2.carbon.bam.datasource.utils.DataSourceUtils.createKeyspaceIfNotExist(DataSourceUtils.java:80)
at org.wso2.carbon.bam.datasource.utils.DataSourceUtils.getClusterKeyspaceFromRDBMSConfig(DataSourceUtils.java:92)
at org.wso2.carbon.bam.datasource.utils.DataSourceUtils.getClusterKeyspaceFromRDBMSDataSource(DataSourceUtils.java:96)
at org.wso2.carbon.bam.notification.task.internal.NotificationDispatchComponent.initRecordStore(NotificationDispatchComponent.java:72)
at org.wso2.carbon.bam.notification.task.internal.NotificationDispatchComponent.activate(NotificationDispatchComponent.java:64)
And in the API Manager console, this:
[2014-05-08 09:14:52,096] ERROR - ReceiverGroup No receiver is reachable at reconnection, can't publish the events
[2014-05-08 09:14:55,102] ERROR - AsyncDataPublisher Reconnection failed for for tcp://1.33.33.127:7612/
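Both consoles fail against offset ports: 9161 for Cassandra and 7612 for the receiver that API Manager publishes to. "Connection refused" means nothing accepted a TCP connection on that host and port. A minimal Java sketch to check which of the two ports is actually listening, assuming the stock defaults of 9160 (embedded Cassandra Thrift) and 7611 (Thrift data receiver) shifted by the carbon.xml offset; those defaults are assumptions that happen to match the 9161/7612 seen in the logs.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class OffsetPortCheck {
    // Assumed stock defaults, shifted by the <Offset> value in carbon.xml.
    static final int CASSANDRA_THRIFT_DEFAULT = 9160; // port in jdbc:cassandra://host:PORT/...
    static final int DATA_RECEIVER_DEFAULT = 7611;    // Thrift data receiver used by API Manager

    static int withOffset(int defaultPort, int offset) {
        return defaultPort + offset;
    }

    // True if something accepts a TCP connection on host:port within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) { // "Connection refused", timeout or unknown host
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "srv-lc-presen";
        int offset = args.length > 1 ? Integer.parseInt(args[1]) : 1;
        int[] ports = {
            withOffset(CASSANDRA_THRIFT_DEFAULT, offset),
            withOffset(DATA_RECEIVER_DEFAULT, offset)
        };
        for (int port : ports) {
            System.out.println(host + ":" + port + " listening: " + isListening(host, port, 2000));
        }
    }
}
```

Run it on the BAM machine with both `srv-lc-presen 1` and `localhost 1` as arguments. If 9161 is never listening, the embedded Cassandra is probably not running or is bound to a different port or interface, and its startup messages in the BAM log are the next place to look.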

If you are not using the notification feature, start the server with this command (or add the flag to wso2server.sh): sh wso2server.sh -Ddisable.notification.task
https://docs.wso2.org/display/BAM240/Notifications

Related

FileSync local endpoint offline

I have 3 servers (one running Windows Server 2012 R2 and two running Windows Server 2019) and I use Azure File Sync to sync files between them.
For the past few days the 2012 R2 server has been appearing offline in the Azure portal (it shows "No activity"). I ran the Test-StorageSyncNetworkConnectivity cmdlet and it fails with the following output:
Discovery service connectivity result:
Result: Success
HostUri: unknown
HostIPv4Addr: Fail. DNS name does not exist. Resolution through GetAddrInfo failed with error: 11001
HostIPv6Addr: Fail. DNS name does not exist. Resolution through GetAddrInfo failed with error: 11001
Management service connectivity result:
Result: Fail. Failed to run test
HostUri: unknown
HostIPv4Addr: Fail. DNS name does not exist. Resolution through GetAddrInfo failed with error: 11001
HostIPv6Addr: Fail. DNS name does not exist. Resolution through GetAddrInfo failed with error: 11001
HostNetworkLatency [min,avg,max]: Network Latency Request Failed.
Monitoring service connectivity result:
Result: No response from monitoring agent process.
HostUri: unknown
HostIPsAddr: IPv4 and Ipv6 addresses do not exist
ServerEndpoint: faf66731-1e22-47eb-93eb-b8d3331f0de2
SyncServiceResult:
SyncServiceHostUri:
SyncServiceHostIPsAddr: IPv4 and Ipv6 addresses do not exist
SyncServiceHostNetworkLatency: Request Failed.
ServerEndpoint: 80f3bb96-463b-4f86-9e26-8dcf0c92f915
SyncServiceResult:
SyncServiceHostUri:
SyncServiceHostIPsAddr: IPv4 and Ipv6 addresses do not exist
SyncServiceHostNetworkLatency: Request Failed.
ServerEndpoint: b9a874b4-7acd-4174-b5e8-26ac23c84c7e
SyncServiceResult:
SyncServiceHostUri:
SyncServiceHostIPsAddr: IPv4 and Ipv6 addresses do not exist
SyncServiceHostNetworkLatency: Request Failed.
Remediation Steps
For Azure File Sync to work correctly, you will need to configure your servers to communicate with multiple Azure services
Refer the following public document for details on proxy settings or firewall settings for Azure File Sync - https://aka.ms/AFS/ProxyAndFirewall
If you have configured a private endpoint refer the following public document for configuring private endpoint for Azure File Sync - https://aka.ms/AFS/PrivateEndpoint
NetworkTestPassed Report
----------------- ------
False ...
The problem seems to be DNS-related, but I ran the Test-NetConnection -ComputerName <remote-host> -Port 443 cmdlet against the correct URLs (taken from https://learn.microsoft.com/it-it/azure/storage/file-sync/file-sync-firewall-and-proxy#test-network-connectivity-to-service-endpoints) and all the endpoints seem to be working fine (the ping fails, but I think that is expected behavior). E.g.:
PS C:\Program Files\Azure\StorageSyncAgent> Test-NetConnection -ComputerName tm-kailani7.one.microsoft.com -Port 443
WARNING: Ping to tm-kailani7.one.microsoft.com failed -- Status: TimedOut
ComputerName : tm-kailani7.one.microsoft.com
RemoteAddress : 20.38.85.153
RemotePort : 443
InterfaceAlias : Ethernet 2
SourceAddress : 192.168.0.185
PingSucceeded : False
PingReplyDetails (RTT) : 0 ms
TcpTestSucceeded : True
I also ran FileSyncErrorsReport.ps1, but it doesn't report any errors either:
WARNING: There are no file sync errors to report. Either the last completed sync session did not have per-item errors or
the ItemResults event log on the server wrapped due to too many per-item errors and the event log no longer contains
errors for this sync group. To learn more, see the Azure File Sync troubleshooting documentation:
https://aka.ms/AFS/FileSyncErrorReport
I think the problem is that the AzureStorageSyncMonitor.exe process is not running; if I start it manually, it exits on its own after a few seconds.
There is no event ID 9301 (described here: https://learn.microsoft.com/it-it/azure/storage/file-sync/file-sync-troubleshoot?tabs=portal1%2Cazure-portal#server-endpoint-health), and after searching the other Event Viewer folders I could only find event 4104, which shows an error dated to the last time the server reached the Azure endpoint:
Querying for new jobs failed.
HttpErrorCode: 0x80C8700C
InternalErrorCode: 0x80C80300
Any help would be greatly appreciated, thank you.
• Check for event ID 9302 in the FileSync Telemetry log under Applications and Services Logs in Event Viewer. It is logged every 5 to 10 minutes while a sync session is active, so check whether it shows the session making progress. The AzureStorageSyncMonitor.exe process is what reports the server endpoint's status to the Storage Sync Service shown in the portal.
• You can also use Performance Monitor (perfmon.msc) with the performance counters built into Azure File Sync to monitor sync activity locally on the server.
• Also check the server's IP address settings, since Test-StorageSyncNetworkConnectivity reports DNS resolution failures. Verify that the configured DNS servers (preferred and alternate) are correct and reachable.
Also check the hosts file in C:\Windows\System32\drivers\etc and confirm it contains the correct IP address and expected DNS hostname of the Windows Server 2012 R2 machine. Name resolution on the server consults the hosts file before querying DNS, and services such as AzureStorageSyncMonitor depend on it both to reach the configured external services and to communicate with the other local services.
• Finally, I suggest disabling negative caching on the DNS client, putting the suffix with the matching host A record as the last entry in the suffix search list, and using AF_UNSPEC as the address family so that the resolver returns both A and AAAA results and your code can choose between them.
For more details, see:
https://learn.microsoft.com/en-us/troubleshoot/windows-server/networking/getaddrinfo-fails-error-11001-call-af-inet6-family#workaround
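Error 11001 is WSAHOST_NOT_FOUND, i.e. the resolver found no address for the name. The AF_UNSPEC advice amounts to asking for every address family and letting the caller pick. A minimal Java sketch of that pattern, assuming InetAddress.getAllByName (which returns every IPv4 and IPv6 address the platform resolver gives back) as a rough stand-in for getaddrinfo with AF_UNSPEC; this is an analogy for testing resolution from the server, not how the Azure File Sync agent itself resolves names.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class ResolveAll {
    // Result of one lookup: A records, AAAA records, or an error message.
    static final class Resolution {
        final List<String> ipv4 = new ArrayList<>();
        final List<String> ipv6 = new ArrayList<>();
        String error; // null when the lookup succeeded
    }

    // Ask for all address families and split the answers by family.
    static Resolution resolve(String host) {
        Resolution r = new Resolution();
        try {
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                if (addr instanceof Inet4Address) r.ipv4.add(addr.getHostAddress());
                else if (addr instanceof Inet6Address) r.ipv6.add(addr.getHostAddress());
            }
        } catch (UnknownHostException e) { // the Java equivalent of "DNS name does not exist"
            r.error = e.getMessage();
        }
        return r;
    }

    public static void main(String[] args) {
        // Pass the endpoint names used with Test-NetConnection as arguments.
        for (String host : args) {
            Resolution r = resolve(host);
            if (r.error != null) System.out.println(host + ": lookup failed (" + r.error + ")");
            else System.out.println(host + " A=" + r.ipv4 + " AAAA=" + r.ipv6);
        }
    }
}
```

Running it on the 2012 R2 server and on one of the 2019 servers with the same host names makes any difference in resolver behavior between the machines visible.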

Why does Cassandra Kerberos Connection to second keyspace fail?

We are trying to connect to two keyspaces of Cassandra (3.x) in the same application with the same Kerberos credentials. The application is able to connect to one keyspace but not the other. Access to both keyspaces has been verified.
Error on connection:
2022-08-22 13:15:10,972 [cluster-reconnection-0] DEBUG c.d.d.c.ControlConnection [--]- [Control connection] error on 169.24.167.109:9042 connection, trying next host
javax.security.auth.login.LoginException: No LoginModules configured for CassandraJavaClient
at javax.security.auth.login.LoginContext.init(LoginContext.java:264)
at javax.security.auth.login.LoginContext.<init>(LoginContext.java:417)
The JAAS configuration (pointing at the ticket cache) is:
CassandraJavaClient {
com.sun.security.auth.module.Krb5LoginModule required useTicketCache=true ticketCache="/var//krb5cc_userlogin";
};
The first connection, which succeeds, uses the same ticket cache file, but the second connection fails. I am not sure how to debug this: I tried remote debugging, but because the initial control connection is made asynchronously, I could not get to the actual error.
We are using com.datastax.cassandra:cassandra-driver-core:jar:3.6.0
Any ideas on how to debug or resolve this would be highly appreciated.
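"No LoginModules configured for CassandraJavaClient" is thrown by javax.security.auth.login.LoginContext when the JVM's active JAAS Configuration returns no entry for that name, so the failure happens before Kerberos or the ticket cache is consulted. One thing worth ruling out is whether the entry is still visible when the second session is built (for example, if something in the application replaces the configuration in between). A minimal sketch of that check, plus a hypothetical in-code equivalent of the CassandraJavaClient block from the question; the helper names are illustrative, not part of the DataStax driver.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

public class JaasCheck {
    static final String ENTRY = "CassandraJavaClient";

    // True if the given JAAS configuration has at least one login module for this name.
    static boolean hasEntry(Configuration config, String name) {
        AppConfigurationEntry[] entries = config.getAppConfigurationEntry(name);
        return entries != null && entries.length > 0;
    }

    // Hypothetical programmatic equivalent of the jaas.conf block in the question.
    static Configuration krb5TicketCacheConfig(String entryName, String ticketCache) {
        Map<String, String> options = new HashMap<>();
        options.put("useTicketCache", "true");
        options.put("ticketCache", ticketCache);
        AppConfigurationEntry krb5 = new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options);
        return new Configuration() {
            @Override
            public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
                return entryName.equals(name) ? new AppConfigurationEntry[] {krb5} : null;
            }
        };
    }

    public static void main(String[] args) {
        // Log this immediately before building each Cluster/Session.
        try {
            Configuration current = Configuration.getConfiguration();
            System.out.println(ENTRY + " visible: " + hasEntry(current, ENTRY));
        } catch (SecurityException e) {
            System.out.println("No JAAS configuration could be loaded: " + e.getMessage());
        }
    }
}
```

If the entry turns out to be missing for the second session, Configuration.setConfiguration(...) can install the programmatic version, but it replaces the configuration for the whole JVM, so it must also return every other entry the application relies on.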

Could not get connection while getPartitionedTopicMetadata - io.netty.channel.ConnectTimeoutException: connection timed out

I have a basic Pulsar app, and when I try to connect to Pulsar, I get this exception:
2021-03-10 14:38:26.107 WARN 7 --- [r-client-io-1-1] o.a.pulsar.client.impl.ConnectionPool : Failed to open connection to my-pulsar-server-ms-tls.domain.com:6651 : io.netty.channel.ConnectTimeoutException: connection timed out: my-pulsar-server-ms-tls.domain.com/10.80.13.38:6651
2021-03-10 14:38:26.212 WARN 7 --- [al-listener-3-1] o.a.pulsar.client.impl.PulsarClientImpl : [topic: persistent://myTenant/myNamespace/myTopic] Could not get connection while getPartitionedTopicMetadata -- Will try again in 100 ms
My Pulsar client is pretty basic:
PulsarClient.builder()
.serviceUrl(serviceUrl)
.authentication(AuthenticationFactory.token(authToken))
.tlsTrustCertsFilePath(serverCertificateFilePath.toString())
.enableTlsHostnameVerification(false)
.allowTlsInsecureConnection(false)
.build();
The producer is also pretty basic and looks like this:
pulsarClient.newProducer(Schema.STRING)
.topic(topic)
.create();
I've verified that the token and TLS cert are correct. I've also tried connecting a consumer from this same environment and got a similar exception, and I know that others with the same code are able to connect to the same Pulsar cluster from other environments. What is the issue?
Your connection is most likely being blocked by a firewall or another network issue: a ConnectTimeoutException means the TCP connection was never established.
Verify that you can establish a connection to your endpoint my-pulsar-server-ms-tls.domain.com:6651 from your environment.
If you're able to run a network packet dump (like tcpdump), that should make it obvious if you're not able to establish a connection.
You can also try running curl -v my-pulsar-server-ms-tls.domain.com:6651. Port 6651 speaks Pulsar's binary protocol over TLS rather than HTTP, so don't expect an HTML page back; what matters is whether curl reports that it connected. If curl times out, the connection is being dropped by the network configuration (such as a missing route) or a firewall. If it reports Could not resolve host, the problem is DNS instead.
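The log already shows the name resolving (my-pulsar-server-ms-tls.domain.com/10.80.13.38), so what remains is telling a dropped connection from a rejected one. A minimal Java sketch of the same reachability test that classifies the outcome; the host and port defaults come from the question, and the 3-second timeout is an arbitrary assumption.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

public class TcpProbe {
    enum Outcome { CONNECTED, UNRESOLVED, REFUSED, TIMED_OUT, OTHER_ERROR }

    // Map a connect failure to a likely cause.
    static Outcome classify(IOException e) {
        if (e instanceof UnknownHostException) return Outcome.UNRESOLVED;  // DNS problem
        if (e instanceof SocketTimeoutException) return Outcome.TIMED_OUT; // no reply: firewall drop or missing route
        if (e instanceof ConnectException) return Outcome.REFUSED;         // usually "Connection refused"
        return Outcome.OTHER_ERROR;                                        // e.g. NoRouteToHostException
    }

    static Outcome probe(String host, int port, int timeoutMs) {
        InetSocketAddress address = new InetSocketAddress(host, port); // DNS lookup happens here
        if (address.isUnresolved()) {
            return Outcome.UNRESOLVED;
        }
        try (Socket socket = new Socket()) {
            socket.connect(address, timeoutMs);
            return Outcome.CONNECTED;
        } catch (IOException e) {
            return classify(e);
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "my-pulsar-server-ms-tls.domain.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6651;
        System.out.println(host + ":" + port + " -> " + probe(host, port, 3000));
    }
}
```

A TIMED_OUT result from the environment where the Pulsar client fails matches the ConnectTimeoutException in the log; CONNECTED would instead suggest looking at TLS or client configuration.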

How to extend connection timeout in node-template, when joining a private network?

I was following the Creating Your Private Network tutorial.
I have the bootnode running on my local machine, and I want a new participant on AWS to join my network, but I keep getting 0 peers.
I enabled debug logging with RUST_LOG=debug and found this:
2021-08-04 02:06:40.563 DEBUG tokio-runtime-worker libp2p_dns: Dialing /ip4/130.105.xxx.xxx/tcp/30333/p2p/12D3KooxxxNr
2021-08-04 02:06:40.563 DEBUG tokio-runtime-worker libp2p_tcp: dialing 130.105.xxx.xxx:30333
2021-08-04 02:06:40.563 DEBUG tokio-runtime-worker libp2p_swarm: Connection attempt to PeerId("12D3KooxxxNr") via "/ip4/130.105.xxx.xxx/tcp/30333/p2p/12D3KooxxxNr" failed with Transport(Other(Custom { kind: Other, error: Timeout })). Attempts remaining: 2.
2021-08-04 02:06:40.563 DEBUG tokio-runtime-worker libp2p_kad::behaviour: Last remaining address '/ip4/130.105.xxx.xxx/tcp/30333/p2p/12D3KooxxxNr' of peer '12D3KooxxxNr' is unreachable: Pending connection: Transport error: Timeout has been reached.
I read somewhere that it takes 5 minutes to connect.
How do I increase the timeout period?
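I don't think the timeout is the problem: your local machine probably doesn't have a public IP address, so the node on AWS cannot open a connection to it.
I recommend making the AWS instance the bootnode and then connecting to it from your local machine.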
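To check the premise of this answer, a small Java sketch that lists the machine's interface addresses and flags the ones that look publicly routable. It relies only on the classification methods of java.net.InetAddress, so it is a heuristic: carrier-grade NAT space (100.64.0.0/10) and IPv6 unique-local addresses (fc00::/7) are not recognized as private. If no interface address looks public, the machine is behind NAT, and other nodes can reach port 30333 only if it is forwarded to the machine.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.UnknownHostException;
import java.util.Collections;

public class PublicAddressCheck {
    // False for loopback, link-local, RFC 1918 private, wildcard and multicast addresses.
    // Heuristic only: 100.64.0.0/10 and IPv6 fc00::/7 are not detected as private.
    static boolean looksPublic(InetAddress a) {
        return !(a.isLoopbackAddress() || a.isLinkLocalAddress() || a.isSiteLocalAddress()
                || a.isAnyLocalAddress() || a.isMulticastAddress());
    }

    // Convenience overload for IP literals (a literal is parsed, not looked up in DNS).
    static boolean looksPublic(String ipLiteral) {
        try {
            return looksPublic(InetAddress.getByName(ipLiteral));
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        boolean anyPublic = false;
        for (NetworkInterface ni : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            for (InetAddress a : Collections.list(ni.getInetAddresses())) {
                boolean pub = looksPublic(a);
                anyPublic |= pub;
                System.out.println(ni.getName() + " " + a.getHostAddress() + (pub ? " (public)" : ""));
            }
        }
        System.out.println(anyPublic
                ? "At least one interface address looks public."
                : "No public interface address: this machine is probably behind NAT.");
    }
}
```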

PrestoDB: worker not found, and v1/collector/general is not returning any values

Hi, I have configured PrestoDB with one coordinator and one worker.
When I run the worker, I get messages like:
Discovery server connect succeeded for refresh (presto/general)
Discovery server connect succeeded for refresh (collector/general)
io.airlift.discovery.client.Announcer Discovery server connect succeeded for announce
However, when I run a query, it says the worker is not available.
I also checked whether the following URLs work:
http://<master>/v1/service/presto/general - works (I can see both nodes)
But when I use
http://<master>/v1/service/collector/general - it doesn't work; below is the result:
{"environment":"dev","services":[]}
Server (coordinator) config.properties:
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8000
query.max-memory=50GB
query.max-memory-per-node=3GB
discovery-server.enabled=true
discovery.uri=http://gdcrtdev01.[domain]:8000
Worker config.properties:
coordinator=false
http-server.http.port=8000
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery.uri=http://gdcrtdev01.[domain]:8000
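As a first sanity check on the two files above, a small hypothetical Java helper that loads both config.properties files with java.util.Properties and cross-checks the values that have to agree: the coordinator runs the embedded discovery server, the worker is not a coordinator, both point at the same discovery.uri (which must not end in a slash), and that URI's port is the coordinator's HTTP port. The example host name is a placeholder, and passing these checks does not rule out other causes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class PrestoConfigCheck {
    // Load config.properties content from a string (use a FileReader for real files).
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
        return p;
    }

    // Returns the problems found; an empty list only means these particular checks passed.
    static List<String> check(Properties coordinator, Properties worker) {
        List<String> problems = new ArrayList<>();
        if (!"true".equals(coordinator.getProperty("coordinator")))
            problems.add("coordinator: coordinator should be true");
        if (!"true".equals(coordinator.getProperty("discovery-server.enabled")))
            problems.add("coordinator: discovery-server.enabled should be true");
        if (!"false".equals(worker.getProperty("coordinator")))
            problems.add("worker: coordinator should be false");

        String cUri = coordinator.getProperty("discovery.uri");
        String wUri = worker.getProperty("discovery.uri");
        if (cUri == null || !cUri.equals(wUri))
            problems.add("discovery.uri differs between coordinator and worker");
        if (wUri != null && wUri.endsWith("/"))
            problems.add("worker: discovery.uri must not end in a slash");

        // Discovery is embedded in the coordinator, so the URI port should be its HTTP port.
        String httpPort = coordinator.getProperty("http-server.http.port");
        if (cUri != null && httpPort != null) {
            try {
                int uriPort = URI.create(cUri).getPort();
                if (uriPort != -1 && uriPort != Integer.parseInt(httpPort.trim()))
                    problems.add("discovery.uri port does not match coordinator http-server.http.port");
            } catch (IllegalArgumentException e) { // bad URI or non-numeric port
                problems.add("could not parse discovery.uri or http-server.http.port");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Properties coordinator = parse(String.join("\n",
                "coordinator=true",
                "node-scheduler.include-coordinator=false",
                "http-server.http.port=8000",
                "discovery-server.enabled=true",
                "discovery.uri=http://coordinator.example.com:8000"));
        Properties worker = parse(String.join("\n",
                "coordinator=false",
                "http-server.http.port=8000",
                "discovery.uri=http://coordinator.example.com:8000"));
        System.out.println(check(coordinator, worker));
    }
}
```

If these checks pass, node.properties (not shown in the question) is worth comparing next, since node.environment must be identical on every node in the cluster.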
