OpsCenter LCM HTTP 401 with Public DataStax Repository - cassandra

I have installed OpsCenter 6.0 on Ubuntu 16.04 LTS.
I am using Lifecycle Manager (LCM) to provision a new DSE 5.0.3 cluster on Ubuntu 16.04 LTS using the DataStax public repository.
Both OpsCenter and the DSE cluster nodes are running in Amazon EC2.
I have configured the repository in LCM using my DataStax login credentials.
However, LCM is reporting HTTP 401 errors when attempting to access the repository:
2016-11-14 08:02:46,975 [opscenterd] ERROR: Received error from node event-subtype="meld-error" job-id="71c7e70d-3c1d-479b-b1e1-dabb71758c33" name="Cassandra1" ssh-management-address="xxx.xxx.xxx.xxx" node-id="20cbe1cc-61f3-4218-b73d-cdd71167d488" event-type="error" message="Received an HTTP 401 Unauthorized response while attempting to access the package repository. Check your repository credentials." (opscd-pool-0)
Here are a couple of screenshots of the Job Details and Event Details screens:
Job Details
Event Details
I've checked that I provided the correct credentials many times now, and am pretty confident I haven't made a mistake.
Furthermore, on one of the nodes where the error is reported, I created a /etc/apt/sources.list.d/datastax.sources.list file with the same credentials, used curl to download the DataStax repository key, and successfully installed the DSE package manually. This suggests my credentials and connectivity to the DataStax repository are fine.
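For reference, this is roughly what that manual check looked like; the repository and key URLs are as I remember them from the DataStax install docs, and the e-mail/password are placeholders:

 # credentials are placeholders; note the '@' in the account e-mail is written as %40
 echo "deb https://my%40email.com:mypassword@debian.datastax.com/enterprise stable main" \
   | sudo tee /etc/apt/sources.list.d/datastax.sources.list
 curl -L https://debian.datastax.com/debian/repo_key | sudo apt-key add -
 sudo apt-get update && sudo apt-get install -y dse-full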
I'm currently a bit stuck, so if anyone can offer any help on how to resolve this it would be much appreciated.
Thanks
Austin

OpsCenter developer here, this was a newly introduced bug in OpsCenter 6.0.4. We added an assertion early in the job to verify that repository credentials were entered correctly (it previously took longer to fail and gave a more confusing message). Unfortunately, the assertion did not correctly handle certain special characters (like the '#' sign commonly present in DataStax Academy account names). OpsCenter 6.0.5 was released yesterday afternoon as a single-fix release to address this specific issue, and we've improved our test coverage to ensure this kind of issue doesn't slip through again.
Thanks everyone for your detailed reports, this SO thread was one important source of information that helped us characterize the bug to the point where we could fix it promptly.

OpsCenter developer here, I work on LCM. It's hard to know exactly what's up given the information you provided, but some hints:
Post the full content of the job event. It might have useful context that you haven't otherwise provided.
Compare the /etc/apt/sources.list.d/datastax.sources.list that you created manually with the /etc/apt/sources.list.d/opsc.list that LCM creates automatically. Apt requires that the credentials be provided in the URL, which means that special characters must be escaped. It's possible you have some special character in your password that needs to be escaped but isn't. But even if it's not an escaping problem, comparing your manually created file and the automatically created one may give some insight into where things are going wrong (see the illustrative sources line after these hints).
Ensure that you're using your DataStax Academy credentials from https://academy.datastax.com/, and not something else.
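To make the escaping point concrete, here's an illustrative sources line (the account name, password, and repository URL are placeholders, not your actual values). If the DataStax Academy account were user#1@example.com with password p#ssword, apt needs the '@' and '#' percent-encoded:

 deb https://user%231%40example.com:p%23ssword@debian.datastax.com/enterprise stable main

If the opsc.list that LCM generated contains the raw characters instead of the encoded ones, that's a likely source of the 401; a quick diff between your manually created file and opsc.list should make any difference obvious.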

Related

Can't connect to cassandra, authentication error, please carefully check your auth settings, retrying soon

I'm stuck with the below error while configuring the DSE address.yaml.
INFO [async-dispatch-1] 2021-04-17 07:50:06,487 Starting DynamicEnvironmentComponent
 ;;;;;
 ;;;;;
 INFO [async-dispatch-1] 2021-04-17 07:50:06,503 Starting monitored database connection.
 ERROR [async-dispatch-1] 2021-04-17 07:50:08,717 Can't connect to Cassandra, authentication error, please carefully check your Auth settings, retrying soon.
 INFO [async-dispatch-1] 2021-04-17 07:50:08,720 Finished starting system.
I have configured the Cassandra username and password in cluster_name.conf as well as in address.yaml.
Any advice would be appreciated.
You've provided very little information about the issue you're running into so our ability to assist you is very limited.
In any case, my suspicion is that (a) you haven't configured the correct seed_hosts in cluster_name.conf, or (b) the seeds are unreachable from the agents.
If this is your first time installing OpsCenter, I highly recommend that you let OpsCenter install and configure the agents automatically. When you add the cluster to OpsCenter, you will be prompted with a choice of whether to do this automatically.
For details, see Installing DataStax Agents automatically.
As a side note, we don't recommend setting any properties in address.yaml unless it's a last resort. Wherever possible and in almost all cases, configure the agent properties in cluster_name.conf so they are managed centrally instead of on individual nodes; a sketch of the relevant section is below.
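As a sketch (the section name is from the OpsCenter docs; the hosts and credentials below are placeholders), the relevant part of cluster_name.conf on the OpsCenter host looks something like this:

 [cassandra]
 seed_hosts = 10.0.0.1, 10.0.0.2
 username = cassandra_user
 password = cassandra_password

The agents then pick these values up from OpsCenter, so they don't need to be repeated in address.yaml on every node.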
It's difficult to help you troubleshoot your problem in a Q&A forum. If you're still experiencing issues, my suggestion is to log a DataStax Support ticket so one of our engineers can assist you directly. Cheers!

Is the MemSQL reported version 5.5.8 adjustable?

In the MemSQL documentation FAQ page (https://docs.memsql.com/v7.0/introduction/faqs/memsql-faq/), it says:
MemSQL reports the version engine variable as 5.5.8. Client drivers look for this version to determine how to interact with the server.
This is understandable, but an unfortunate side effect is that MemSQL fails the security scans run by our security team and raises a lot of red flags. On the same page, MemSQL says it is not necessarily impacted by any of the security vulnerabilities found in MySQL:
The MemSQL and MySQL servers are separate database engines which do not share any code, so security issues in the MySQL server are not applicable to MemSQL.
But red flags are red flags, so I wonder whether this reported version is user-adjustable so that we can calm the security scans. I'd also like to know what known impacts changing the reported version could have.
Yes, the "Mysql compatibility" version can be changed via the compat_version global variable. You should set it to the version string you want returned via select ##version (i.e., '8.0.20'). Keep in mind now and then client drivers and mysql applications check this version to enable\disable features so you need to test out the impact of the change on your applications.

ArangoDB - Help diagnosing database corruption after system restart

I've been working with Arango for a few months now within a local, single-node development environment that regularly gets restarted for maintenance reasons. About 5 or 6 times now my development database has become corrupted after a controlled restart of my system. When it occurs, the corruption is subtle in that the Arango daemon seems to start ok and the database structurally appears as expected through the web interface (collections, documents are there). The problems have included the Foxx microservice system failing to upload my validated service code (generic 500 service error) as well as queries using filters not returning expected results (damaged indexes?). When this happens, the only way I've been able to recover is by deleting the database and rebuilding it.
I'm looking for advice on how to debug this issue - such as what to look for in log files, server configuration options that may apply, etc. I've read most of the development documentation, but only skimmed over the deployment docs, so perhaps there's an obvious setting I'm missing somewhere to adjust reliability/resilience? (this is a single-node local instance).
Thanks for any help/advice!
Please note that issues like this should rather be discussed on GitHub.

Jenkins error trying to raise on-demand linux ec2 slave

Whenever I try to trigger a job that depends on that EC2 slave, it just sits in the queue. I looked at the logs and saw this exception:
com.amazonaws.services.ec2.model.AmazonEC2Exception: Network interfaces and an instance-level security groups may not be specified on the same request
Whenever I click on the build executor status on the left, there is a button that says "provision via ". I click on it and see the correct Amazon Linux image name that I entered under Cloud in Jenkins' System Configuration, but when I click on that, I see the same exception as well. I just don't know how to fix this and cannot find any helpful information on it.
Any help would be much appreciated.
OK, I'm not exactly sure what was causing the error since I don't really know how the Jenkins plugin interfaces with the AWS API. But after a good amount of trial and error, I was able to provision the on-demand worker by adding more details/parameters in Configuration, under Cloud.
Adding a subnet ID for the VPC and an IAM instance profile did the trick (I already had everything else, including security groups, availability zone, instance type, etc.). So it seems like you either leave out security groups, or go all in and fill in pretty much everything.
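For what it's worth, the restriction appears to come from the EC2 API itself rather than from Jenkins; you can reproduce roughly the same failure with the AWS CLI by passing both an instance-level security group and an explicit network interface (all IDs below are placeholders):

 # placeholder IDs; combining --security-group-ids with --network-interfaces triggers the conflict
 aws ec2 run-instances \
   --image-id ami-0123456789abcdef0 \
   --instance-type t2.micro \
   --security-group-ids sg-0123456789abcdef0 \
   --network-interfaces "DeviceIndex=0,SubnetId=subnet-0123456789abcdef0,Groups=sg-0123456789abcdef0"

That request is rejected with the same "Network interfaces and an instance-level security groups may not be specified on the same request" complaint, which is consistent with the either/or behaviour above.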
As an FYI, if you see this with Jenkins EC2 Plugin v1.46, it looks like a genuine bug:
https://issues.jenkins-ci.org/browse/JENKINS-59543
The solution is to use 1.45 until it's fixed (see link above for more details).

Collectd server not writing down received client data

I have a pretty strange problem with collectd. I'm not new to collectd and used it for a long time on CentOS-based boxes, but now we have Ubuntu 12.04 LTS boxes, and I'm seeing a really strange issue.
So, I'm using version 5.2 on Ubuntu 12.04 LTS, on two boxes hosted at Rackspace (maybe important, but I'm not sure). The network plugin is configured using two local IPs, without any firewall in between and without any security (just to try a simple client/server scenario).
On both machines collectd writes to its configured folders as it should, but the server doesn't write the data it receives from the client.
I troubleshot with tcpdump, and I can clearly see UDP traffic and collectd data, including the hostname and plugin names from my client machine, arriving at the server, but it is never flushed to the appropriate folder (as configured in collectd). I'm also running everything as root to rule out permission problems.
Does anyone have any idea or similar experience with this? Or maybe a suggestion for troubleshooting beyond crawling the internet (I think I clicked on every sensible link Google gave me in the last two days) and checking the network layer (which looks fine)?
Just a small note: exactly the same thing happened with the official 4.10.2 version from Ubuntu's repo. After trying to troubleshoot that for hours, I upgraded to version 5.
I'd suggest trying the fairly generic troubleshooting procedure based on the csv and logfile plugins, as described in this answer. As everything seems to be fine locally, follow this procedure on the server, activating only the network plugin (in addition to logfile, csv and possibly rrdtool); a minimal example configuration is sketched below.
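As a starting point, a minimal server-side collectd.conf for that procedure could look like the following (the listen address and paths are placeholders; 25826 is the network plugin's default port):

 LoadPlugin logfile
 LoadPlugin network
 LoadPlugin csv

 # log at debug level so dispatched values are visible
 <Plugin logfile>
   LogLevel "debug"
   File "/var/log/collectd.log"
 </Plugin>

 # listen on the server's local IP (placeholder address)
 <Plugin network>
   Listen "10.0.0.2" "25826"
 </Plugin>

 # write received values as plain CSV files for easy inspection
 <Plugin csv>
   DataDir "/var/lib/collectd/csv"
 </Plugin>

If the client's values show up in the log but no files appear under the csv DataDir, the problem is on the write side; if they never reach the log at all, the network plugin isn't dispatching them even though the packets arrive.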
So, after finding no way to fix this, I upgraded my Ubuntu to 12.04.2 LTS (3.2.0-24-virtual) and it just started working fine, without any further intervention.
