I have a persistent volume in my cluster (an Azure disk) with a capacity of 8Gi.
I resized it to 9Gi, then changed my PV YAML to 9Gi as well (since it is not updated automatically), and everything worked fine.
Then, as a test, I changed the YAML of my PV to 1000Gi (expecting to see an error) and received this error from the PVC that claims this PV: "NodeExpand failed to expand the volume : rpc error: code = Internal desc = resize requested for 10, but after resizing volume size was 9"
However, when I run kubectl get pv, it still looks like this PV's capacity is 1000Gi (and of course in Azure it is still 9Gi, since I did not resize it).
Any advice?
As a general rule, you should not have to change anything on your PersistentVolumes.
When you request more space by editing a PersistentVolumeClaim, a controller (either a CSI driver or an in-tree driver/kube-controllers) implements that change against your storage provider (Ceph, AWS, ...).
Once the backend volume has been expanded, that same controller updates the corresponding PV. At that point, you may (or may not) have to restart the Pods attached to your volume for its filesystem to be grown.
While I'm not certain how to fix the error you saw, one way to avoid such errors is to refrain from editing PVs directly.
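For illustration, a minimal sketch of requesting the expansion on the claim rather than the PV, assuming a PVC named my-claim (a placeholder) and a StorageClass with allowVolumeExpansion enabled:

# Request the new size on the PVC; the controller expands the Azure disk
# and then updates the bound PV's capacity for you.
kubectl patch pvc my-claim --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"9Gi"}}}}'

# Watch the claim and the PV converge on the new capacity.
kubectl get pvc my-claim -w
kubectl get pv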
I have set up a custom Docker image registry on GitLab, and AKS for some reason fails to pull the image from there.
The error that is being thrown is:
Failed to pull image "{registry}/{image}:latest": rpc error: code = FailedPrecondition desc =
failed to pull and unpack image "{registry}/{image}:latest": failed commit on ref "layer-sha256:e1acddbe380c63f0de4b77d3f287b7c81cd9d89563a230692378126b46ea6546": "layer-sha256:e1acddbe380c63f0de4b77d3f287b7c81cd9d89563a230692378126b46ea6546" failed size validation: 0 != 27145985: failed precondition
What is interesting is that the image does not have the layer with id
sha256:e1acddbe380c63f0de4b77d3f287b7c81cd9d89563a230692378126b46ea6546
Perhaps something is cached on the AKS side? I deleted the pod along with the deployment before redeploying.
I couldn't find much about this kind of error and I have no idea what may be causing it. Pulling the same image from my local Docker environment works flawlessly.
Any tip would be much appreciated!
• You can try scaling up the registry to run on all nodes. The Kubernetes controller tries to be smart and routes node requests internally instead of sending traffic to the load balancer IP. The issue, though, is that if there is no registry service on that node, the packets go nowhere. So scale up, or route through a non-AKS load balancer.
• Also, clean the image layer cache folder at ${containerd folder}/io.containerd.content.v1.content/ingest (see the sketch at the end of this answer). containerd does not clean this cache automatically when some layer data is broken, so purging the contents of that path can clear the corrupted layer.
• This might also be a TCP connection issue between the AKS cluster and the Docker image registry on GitLab. You can try using a proxy and configuring it to close the connection between them after 'X' bytes of data are transferred: the retry of the pull starts over at 0% for the layer, which then results in the same error, because after some time the connection is closed again and the layer is still not pulled completely. So I would recommend using a registry located near the cluster to get higher throughput.
• Also try restarting the communication pipeline between the AKS cluster and the Docker image registry on GitLab; this fixes the issue temporarily, until it re-occurs.
Please find the link below for more information:
https://docs.gitlab.com/ee/user/packages/container_registry/
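A minimal sketch of clearing that ingest cache on an AKS node, assuming containerd's default root of /var/lib/containerd (verify the actual path on your nodes first):

# Run on the affected node (e.g. via node SSH or a privileged debug pod).
# Stopping containerd first avoids racing with an in-flight pull.
sudo systemctl stop containerd
sudo rm -rf /var/lib/containerd/io.containerd.content.v1.content/ingest/*
sudo systemctl start containerd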
Sorry for the noob question... I'm trying to figure out a way to have shared resources between my tf scripts, but I can't find anything, so probably I'm looking for the wrong keywords...
Let's say I have 3 scripts:
base/base.tf
one/one.tf
two/two.tf
base creates an AWS VPC and a Network Load Balancer.
one and two are two ECS Fargate services. They create the task definition and add the mapping to the Network Load Balancer.
My goal is to have something that keeps track of the last mapped port on the load balancer, so that one and two can read it and update it.
Something like
base sets last_port to 14000
one reads last_port, increases by 1 and updates the value
two reads last_port, increases by 1 and updates the value
Is it possible at all?
thanks
The general solution to this problem in Terraform is Data Sources, which are special resources that retrieve data from elsewhere rather than creating and managing objects themselves.
In order to use data sources, the data you wish to share must be published somewhere. For sharing between Terraform configurations, you need a data storage location that can be both written to and read from by Terraform.
Since you mentioned ECS Fargate I will assume you're using AWS, in which case a reasonable choice is to store your data in AWS SSM Parameter Store and then have other configurations read it out.
The configuration that creates the data would use the aws_ssm_parameter resource type to create a new parameter:
resource "aws_ssm_parameter" "foo" {
name = "port_number"
type = "String"
value = aws_lb_listener.example.port
}
The configurations that will make use of this data can then read it using the corresponding data source:
data "aws_ssm_parameter" "foo" {
name = "port_number"
}
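If you want to double-check what is currently stored, the same parameter can be read outside Terraform with the standard AWS CLI (the name port_number matches the example above):

# Print the current value of the shared parameter.
aws ssm get-parameter --name port_number --query Parameter.Value --output text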
However, your question talks about the possibility of one configuration reading the value, incrementing it, and writing the new value back into the same place. That is not possible with Terraform, because Terraform is a declarative system that works with descriptions of a static desired state. Only one configuration can manage each object, though many configurations can read an object.
Instead of dynamically allocating port numbers, then, Terraform will require one of two solutions:
• Use some other system to manage the port allocations persistently, such that once a port number is allocated for a particular caller it will always get the same port number. I don't know of any existing system built for this, so it may not be a tenable option in this case, but if such a system did exist then we'd model it in Terraform with a resource type representing a particular port allocation, which Terraform can eventually destroy along with all of the other infrastructure when the port is no longer needed.
• Decide on a systematic way to assign consistent port numbers to each system, such that each configuration can just know (either by hard-coding or by some deterministic calculation) which port number it should use; a small sketch follows below.
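Purely as an illustration of the second option, one deterministic scheme could derive each service's port from a stable hash of its name, so one and two each compute their own port without any shared state (the names and port range here are placeholders):

# Bash sketch: map a service name to a stable port in the 14000-14999 range.
# Collisions between names are possible; a hard-coded table is the simpler choice.
svc="one"                                              # e.g. "one" or "two"
port=$((14000 + $(cksum <<< "$svc" | cut -d' ' -f1) % 1000))
echo "$svc -> $port"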
I have a Node.js task that converts values from my database to MP3 files, then uploads them to s3 storage. The code works beautifully when executing it on my laptop. I decided to migrate it to Lambda so I can run it automatically every couple hours. I made a few minor modifications, and again, it works great. But here's the catch: it's only working when my RDS instance is set to allow connections from ANY IP. Obviously, that's an unacceptable security risk.
I put my database and Lambda code in the same VPC and security group, but even so, my code wouldn't connect to S3. Then, I added an endpoint for S3, and it looked like everything was working per my console logs. However, the file in S3 storage is empty (0 bytes).
What do I need to change? I've heard that I might need to configure my VPC to have internet access, but I'm not sure if that's what I need to do. And honestly, those tutorials seem confusing to me.
Can someone point me in the right direction?
It is a known problem (known by users, not really acknowledged by AWS, as far as I've seen). The Lambda VPC docs say:
http://docs.aws.amazon.com/lambda/latest/dg/vpc.html
"When a Lambda function is configured to run within a VPC, it incurs
an additional ENI start-up penalty. This means address resolution may
be delayed when trying to connect to network resources."
And
"If your Lambda function accesses a VPC, you must make sure that your
VPC has sufficient ENI capacity to support the scale requirements of
your Lambda function. "
Source: https://forums.aws.amazon.com/thread.jspa?messageID=767285
This means it has serious drawbacks that make it unworkable:
• speed penalty
• having to manually set up scaling
• having to pay for a NAT gateway at $0.059 per hour (https://aws.amazon.com/vpc/pricing/); see the sketch below
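For reference, a rough sketch of that NAT gateway setup with the AWS CLI, using placeholder subnet, allocation, and route table IDs (yours will differ): the NAT gateway sits in a public subnet, and the private subnets hosting the Lambda ENIs route 0.0.0.0/0 through it.

# Allocate an Elastic IP for the NAT gateway (note the returned AllocationId).
aws ec2 allocate-address --domain vpc

# Create the NAT gateway in a public subnet of the VPC.
aws ec2 create-nat-gateway --subnet-id subnet-0aaa111 --allocation-id eipalloc-0bbb222

# Point the private subnets' default route at the NAT gateway.
aws ec2 create-route --route-table-id rtb-0ccc333 \
  --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-0ddd444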
So I'm trying to configure OpenNMS to check the disk space on my Linux servers.
After some work I got it to check one server through SNMP:
I installed snmpd on the server I'm monitoring, defined a threshold (in fact I use the predefined default one) and connected it to an event that triggers when ns-dskPercent goes too high. Up until here all went well.
Now I added a second server and installed the same stuff on it. OpenNMS seems to monitor the SNMP daemon and notifies me when the service is down, but it doesn't seem to see the threshold.
When I make changes to the threshold (for example, lowering it to 20% to force it to trigger), only the first server sees the change (and also sends a notification that the configuration has changed) and fires the alarm, but the second server doesn't respond.
(These are the notifications I get on the first server:)
High threshold rearmed for SNMP datasource ns-dskPercent on interface
xxx.xxx.xxx.xxx, parms: label="/" ds="ns-dskPercent" description="ns-dskPercent"
value="NaN (the threshold definition has been changed)" instance="1"
instanceLabel="_root_fs" resourceId="node[9].dskIndex[_root_fs]"
threshold="20.0" trigger="1" rearm="75.0" reason="Configuration has been changed"
High threshold exceeded for SNMP datasource ns-dskPercent on interface
xxx.xxx.xxx.xxx, parms: label="/" ds="ns-dskPercent" description="ns-dskPercent"
value="52" instance="1" instanceLabel="_root_fs"
resourceId="node[9].dskIndex[_root_fs]" threshold="20.0" trigger="1" rearm="75.0"
Any ideas why, or how I can make the second server respond as well?
The issue could be related to the source of the data collected. Thresholding in modern versions of OpenNMS (14+) is evaluated inline, in memory, as data is collected, so you must ensure that the threshold is evaluated against the exact metrics that the node you are interested in actually reports.
File system metrics on Linux systems usually come in one of two forms: MIB-2 host resources table metrics (hrStorageSize, etc., in $OPENNMS_HOME/etc/datacollection/mib2.xml) or Net-SNMP metrics from the net-snmp MIB (ns-dskTotal, etc., in $OPENNMS_HOME/etc/datacollection/netsnmp.xml).
So, first verify that you are getting good data from the new server and that it is indeed collecting metrics from the same MIB table you are trying to threshold against.
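One quick way to check what the second server actually exposes is to walk both tables from the OpenNMS host (the community string and address below are placeholders). If the UCD dskTable comes back empty, snmpd on that host has no "disk" entries configured and ns-dskPercent will never be collected:

# Walk the Net-SNMP (UCD) disk table that ns-dskPercent comes from
# (numeric OID .1.3.6.1.4.1.2021.9 if the MIB file is not installed).
snmpwalk -v2c -c public xxx.xxx.xxx.xxx UCD-SNMP-MIB::dskTable

# Walk the MIB-2 host resources storage table for comparison
# (numeric OID .1.3.6.1.2.1.25.2.3).
snmpwalk -v2c -c public xxx.xxx.xxx.xxx HOST-RESOURCES-MIB::hrStorageTable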
As a new user of SSH and the Amazon AWS EC2 Dashboard, I am trying to test whether I can save data onto a volume from one instance and then, after terminating that instance, access the data from another instance by attaching the volume to it.
When I create the first instance, the AMI is "Amazon Linux AMI 2014.03.2 (HVM)" and the family is "general purpose" with EBS storage only. I automatically assign a public IP address to the instance. I configure the root volume so that it does NOT delete on termination.
As soon as the instance is launched, I open up PuTTY and set the host name to the instance's Public IP Address under Port 22, and authenticate using a private key saved onto the disc that I have already generated earlier.
Upon signing into the instance, I create a text file by typing the following command:
echo "testing" > test.txt
I then confirm that the text "testing" is saved to the file "test.txt":
less test.txt
I see the text "testing", thus confirming that it is saved to the file. (I am assuming at this point that it is saved onto the volume, but I am not entirely sure.)
I then proceed to terminate the instance. I launch another one using the same AMI, the same instance type, and a different public IP address. In addition to the root volume, I attempt to add the volume that was used as the root volume for the previous instance. (Oddly enough, the snapshot IDs for the previous volume and the root volume of the new instance are identical.) In addition, I use the same instance tag, the same security group, and the same key pair as the previous instance.
I open up PuTTY again, this time using the Public IP Address of the new instance, but still using the same private key and port as for the previous instance. Upon logging in, I type:
less test.txt
but I am greeted with this message:
test.txt: No such file or directory
Is there any advice anyone can offer me regarding this issue? Is it even possible to store a text file on a volume? If so, am I performing this operation incorrectly?
Since the secondary volume has the same UUID (it comes from the same snapshot) and Amazon Linux uses UUID-based identification for the root filesystem, there is a chance that the secondary volume was taken as the root volume. That would explain the mix-up in choosing the root volume and why the initial attempt to find test.txt failed.
The reboot might have caused the volumes to be picked up in a different order, which is why you were then able to find it.
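For what it's worth, a small sketch of inspecting and mounting the old root volume explicitly, assuming it was attached as /dev/xvdf and using /mnt/oldroot as an example mount point (device names vary):

# List block devices to confirm how the attached volume shows up.
lsblk

# Mount the old root filesystem at a separate mount point instead of
# relying on it being selected as the boot volume.
sudo mkdir -p /mnt/oldroot
sudo mount /dev/xvdf1 /mnt/oldroot   # use /dev/xvdf if there is no partition

# The file created on the first instance should be under its old home directory.
ls /mnt/oldroot/home/ec2-user/test.txt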