Auto scaling not happening on Amazon EC2 - Linux

I am trying to set up auto scaling on Amazon EC2 with the commands below:
elb-create-lb nalb1 --headers --listener "lb-port=80,instance-port=80,protocol=http" --availability-zones us-east-1c
elb-register-instances-with-lb nalb1 --headers --instances i-1ecef57c
elb-configure-healthcheck nalb1 --headers --target "HTTP:80/" --interval 30 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 10
as-create-launch-config nalc1 --image-id ami-cdd306a4 --instance-type t1.micro
as-create-auto-scaling-group naasg1 --launch-configuration nalc1 --availability-zones us-east-1c --min-size 0 --max-size 10 --load-balancers nalb1
as-put-scaling-policy --auto-scaling-group naasg1 --name policy-scaleup --adjustment 100 --type PercentChangeInCapacity
as-put-scaling-policy --auto-scaling-group naasg1 --name policy-scaledown --adjustment=-1 --type ChangeInCapacity
as-create-or-update-trigger nat1 \
--auto-scaling-group naasg1 --namespace "AWS/EC2" \
--measure CPUUtilization --statistic Average \
--dimensions "AutoScalingGroupName=naasg1" \
--period 60 --lower-threshold 30 --upper-threshold 60 \
--lower-breach-increment=-1 --upper-breach-increment=1 \
--breach-duration 120
The following commands show the status of the various resources after running the commands above.
root#domU-12-31-39-09-B8-12 ~# elb-describe-lbs
LOAD_BALANCER nalb1 nalb1-1717211844.us-east-1.elb.amazonaws.com 2012-01-24T09:45:11.440Z
root#domU-12-31-39-09-B8-12 ~# as-describe-launch-configs
LAUNCH-CONFIG nalc1 ami-cdd306a4 t1.micro
root#domU-12-31-39-09-B8-12 ~# as-describe-auto-scaling-groups
AUTO-SCALING-GROUP naasg1 nalc1 us-east-1c nalb1 0 10 0
root#domU-12-31-39-09-B8-12 ~# as-describe-policies
No policies found
root#domU-12-31-39-09-B8-12 ~# as-describe-triggers --auto-scaling-group naasg1
DEPRECATED: This command is deprecated and included only to facilitate migration to the new trigger mechanism. You should use this command for migration purposes only.
TRIGGER nat1 naasg1 NoData AWS/EC2 CPUUtilization Average 60
root#domU-12-31-39-09-B8-12 ~#
Despite all this, auto scaling is not happening.
What might be the reason?
Thanks for the help.

The commands below worked :)
elb-create-lb nalb1 --headers --listener "lb-port=80,instance-port=80,protocol=http" --availability-zones us-east-1c
elb-register-instances-with-lb nalb1 --headers --instances i-1ecef57c
elb-configure-healthcheck nalb1 --headers --target "HTTP:80/" --interval 30 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 10
as-create-launch-config nalc1 --image-id ami-cdd306a4 --instance-type t1.micro
as-create-auto-scaling-group naasg1 --launch-configuration nalc1 --availability-zones us-east-1c --min-size 2 --max-size 10 --load-balancers nalb1
as-put-scaling-policy --auto-scaling-group naasg1 --name policy-scaleup --adjustment=2 --type ChangeInCapacity
as-put-scaling-policy --auto-scaling-group naasg1 --name policy-scaledown --adjustment=-1 --type ChangeInCapacity
as-set-desired-capacity naasg1 -c 2
Of course, you also need to create alarms in CloudWatch and associate these policies with two alarms, one handling scale-up and the other scale-down.
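The CloudWatch side is not shown above, so here is a rough sketch using the legacy CloudWatch command-line tools (mon-put-metric-alarm). The alarm names are made up, the thresholds mirror the 30/60 CPU values from the old trigger, and the placeholder ARNs stand for the PolicyARN values printed by as-put-scaling-policy; check mon-put-metric-alarm --help for the exact flag spelling in your tool version.
# Scale-up alarm: fire the scale-up policy when average CPU stays above 60%.
mon-put-metric-alarm --alarm-name naasg1-cpu-high \
  --metric-name CPUUtilization --namespace "AWS/EC2" --statistic Average \
  --dimensions "AutoScalingGroupName=naasg1" \
  --period 60 --evaluation-periods 2 \
  --comparison-operator GreaterThanThreshold --threshold 60 \
  --alarm-actions <arn-of-policy-scaleup>
# Scale-down alarm: fire the scale-down policy when average CPU drops below 30%.
mon-put-metric-alarm --alarm-name naasg1-cpu-low \
  --metric-name CPUUtilization --namespace "AWS/EC2" --statistic Average \
  --dimensions "AutoScalingGroupName=naasg1" \
  --period 60 --evaluation-periods 2 \
  --comparison-operator LessThanThreshold --threshold 30 \
  --alarm-actions <arn-of-policy-scaledown>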

Related

How to increase pytorch timeout?

I'm training an ML model with YOLOv5; this is my command:
python3 -m torch.distributed.run --nproc_per_node 2 train.py --batch 100 --epochs 1000 --data /home/username/Documents/folder_name/numbers.yaml --weights yolov5s.pt --device 0,1 --hyp data/hyps/hyp.scratch-high.yaml --name folder_name --patience 0
It cuts out after 30 minutes because of the default PyTorch distributed timeout of 1800 s. How can I increase it?
https://pytorch.org/docs/stable/distributed.html#torch.distributed.init_process_group
Thanks
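For reference, per the linked documentation, torch.distributed.init_process_group accepts a timeout argument (a datetime.timedelta; the 1800 s mentioned above is its default). YOLOv5's train.py makes that call itself, so the argument would go wherever init_process_group is invoked in your copy. A minimal sketch, with the two-hour value chosen arbitrarily:
import datetime

import torch.distributed as dist

# Raise the collective-operation timeout from the 1800 s default to 2 hours
# (arbitrary example value); backend/rank/world size still come from the
# environment variables set by torch.distributed.run.
dist.init_process_group(backend="nccl", timeout=datetime.timedelta(hours=2))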

Varnish 6 reload

I've upgraded my Varnish from 6.2.x to 6.6.x. Almost everything works OK, but reload does not.
After "start", ps shows:
root 10919 0.0 0.0 18960 5288 ? Ss 22:38 0:00 /usr/sbin/varnishd -j unix,user=vcache -F -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -p thread_pools=8 -p thread_pool_min=100 -p thread_pool_max=4000 -p workspace_client=128k -p workspace_backend=128k -l 200m -S /etc/varnish/secret -s malloc,256m -s static=file,/data/varnish_storage.bin,80g
Now I try to reload:
Apr 8 22:42:10 xxx varnishd[10919]: CLI telnet 127.0.0.1 5282 127.0.0.1 6082 Rd auth 0124ef9602b9e6aad2766e52755d02a0d17cd6cfe766304761d21ea058bd8b3b
Apr 8 22:42:10 xxx varnishd[10919]: CLI telnet 127.0.0.1 5282 127.0.0.1 6082 Wr 200 -----------------------------#012Varnish Cache CLI 1.0#012-----------------------------#012Linux,5.4.0-107-generic,x86_64,-junix,-smalloc,-sfile,-sdefa
ult,-hcritbit#012varnish-6.6.1 revision e6a8c860944c4f6a7e1af9f40674ea78bbdcdc66#012#012Type 'help' for command list.#012Type 'quit' to close CLI session.
Apr 8 22:42:10 xxx varnishd[10919]: CLI telnet 127.0.0.1 5282 127.0.0.1 6082 Rd ping
Apr 8 22:42:10 xxx varnishd[10919]: CLI telnet 127.0.0.1 5282 127.0.0.1 6082 Wr 200 PONG 1649450530 1.0
Apr 8 22:42:10 xxx varnishd[10919]: CLI telnet 127.0.0.1 5282 127.0.0.1 6082 Rd vcl.load reload_20220408_204210_11818 /etc/varnish/default.vcl
Apr 8 22:42:15 xxx varnishreload[11818]: VCL 'reload_20220408_204210_11818' compiled
Apr 8 22:42:20 xxx varnishreload[11818]: Command: varnishadm -n '' -- vcl.use reload_20220408_204210_11818
Apr 8 22:42:20 xxx varnishreload[11818]: Rejected 400
Apr 8 22:42:20 xxx varnishreload[11818]: CLI communication error (hdr)
Apr 8 22:42:20 xxx systemd[1]: varnish.service: Control process exited, code=exited, status=1/FAILURE
Apr 8 22:42:20 xxx systemd[1]: Reload failed for Varnish Cache, a high-performance HTTP accelerator.
And now ps shows:
vcache 10919 0.0 0.0 19048 5880 ? SLs 22:38 0:00 /usr/sbin/varnishd -j unix,user=vcache -F -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -p thread_pools=8 -p thread_pool_min=100 -p thread_pool_max=4000 -p workspace_client=128k -p workspace_backend=128k -l 200m -S /etc/varnish/secret -s malloc,256m -s static=file,/data/varnish_storage.bin,80g
vcache 10959 0.4 0.2 84585576 23088 ? SLl 22:39 0:01 /usr/sbin/varnishd -j unix,user=vcache -F -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -p thread_pools=8 -p thread_pool_min=100 -p thread_pool_max=4000 -p workspace_client=128k -p workspace_backend=128k -l 200m -S /etc/varnish/secret -s malloc,256m -s static=file,/data/varnish_storage.bin,80g
I see the process owner was changed to vcache. What is wrong with it? Another reload will fail too, with the same reject code.
Can you try removing -j unix,user=vcache from your varnishd runtime command? If I remember correctly, Varnish will automatically drop privileges on the worker process without needing explicit jailing settings.
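That is, keeping everything else from the command in the question and only dropping the jail flag (a sketch, not a verified fix):
# Same varnishd invocation as above, with '-j unix,user=vcache' removed.
/usr/sbin/varnishd -F -a :80 -T localhost:6082 -f /etc/varnish/default.vcl \
    -p thread_pools=8 -p thread_pool_min=100 -p thread_pool_max=4000 \
    -p workspace_client=128k -p workspace_backend=128k \
    -l 200m -S /etc/varnish/secret -s malloc,256m \
    -s static=file,/data/varnish_storage.bin,80g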
If that doesn't work, please also explain which commands you used to start Varnish and reload Varnish.

Unable to connect to docker.sock when it exists with permissions, running inside a Docker container

I'm setting up a base image that has Docker installed and configured so that when I run my Jenkins pipeline I can do Anchore scanning. I have to pull the Anchore image inside the Docker image because my pipeline runs on a Docker agent. However, even when building the image locally, trying to run a simple hello-world container or do a docker pull fails to connect to the Docker socket. I added the root user to the docker group, and I even ran chmod 777 and a+xX on docker.sock. For some reason it shows up as both /run/docker.sock and /var/run/docker.sock; it seems to get symlinked. I'm building from the ubuntu:18.04 (bionic) release and installing from the Ubuntu repository. The Ubuntu image doesn't have systemd installed, and when I install systemd it complains that the system wasn't booted with it, so I start the daemon with service docker start instead.
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Processing triggers for dbus (1.12.2-1ubuntu1) ...
/usr/bin/docker
/usr/share/bash-completion/completions/docker
/etc/init.d/docker
/etc/default/docker
/etc/docker
* Starting Docker: docker
...done.
/run/docker.sock
total 32K
drwxr-xr-x 1 root root 4.0K Oct 31 17:49 .
drwxr-xr-x 1 root root 4.0K Oct 31 17:49 ..
drwxr-xr-x 2 dnsmasq nogroup 4.0K Oct 31 17:49 dnsmasq
drwx------ 4 root root 4.0K Oct 31 17:49 docker
-rw-r--r-- 1 root root 6 Oct 31 17:49 docker-ssd.pid
srwxrwxrwx 1 root docker 0 Oct 31 17:49 docker.sock
drwxrwxrwt 2 root root 4.0K Oct 18 21:02 lock
drwxr-xr-x 2 root root 4.0K Oct 18 21:02 mount
drwxr-xr-x 2 root root 4.0K Oct 19 00:47 systemd
-rw-rw-r-- 1 root utmp 0 Oct 18 21:02 utmp
total 32K
drwxr-xr-x 1 root root 4.0K Oct 31 17:49 .
drwxr-xr-x 1 root root 4.0K Oct 31 17:49 ..
drwxr-xr-x 2 dnsmasq nogroup 4.0K Oct 31 17:49 dnsmasq
drwx------ 4 root root 4.0K Oct 31 17:49 docker
-rw-r--r-- 1 root root 6 Oct 31 17:49 docker-ssd.pid
srwxrwxrwx 1 root docker 0 Oct 31 17:49 docker.sock
drwxrwxrwt 2 root root 4.0K Oct 18 21:02 lock
drwxr-xr-x 2 root root 4.0K Oct 18 21:02 mount
drwxr-xr-x 2 root root 4.0K Oct 19 00:47 systemd
-rw-rw-r-- 1 root utmp 0 Oct 18 21:02 utmp
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
See 'docker run --help'.
My Dockerfile
FROM ubuntu:bionic
#requirements
#docker
#kubectl
#terraform
#kops
#mysql
#systemd
ENV DEBIAN_FRONTEND=noninteractive \
NVM_VERSION=0.33.11 \
NODE_VERSION=9.11.1
RUN set -e && \
echo "NODE_VERSION: $NODE_VERSION" && \
apt-get update --yes && \
apt-get install git \
gnupg \
wget \
curl \
apt-utils \
gcc \
g++ \
make \
build-essential \
nginx \
python \
vim \
gnupg \
gnupg2 \
net-tools \
software-properties-common \
npm \
curl \
libxss1 \
libappindicator1 \
libindicator7 \
apt-utils \
fonts-liberation \
xfonts-cyrillic \
xfonts-100dpi \
xfonts-75dpi \
xfonts-base \
xfonts-scalable \
libappindicator3-1 \
libasound2 \
libatk-bridge2.0-0 \
libgtk-3-0 \
libnspr4 \
libnss3 \
libx11-xcb1 \
libxtst6 \
xdg-utils \
lsb-release \
xvfb \
python-pip \
default-jre \
gtk2-engines-pixbuf -y && \
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb && \
dpkg -i google-chrome*.deb && \
NVM_DIR="$HOME/.nvm" && \
PROFILE="$HOME/.profile" && \
git clone --branch "v$NVM_VERSION" --depth 1 https://github.com/creationix/nvm.git "$NVM_DIR" && \
echo >> "$PROFILE" && \
echo 'export NVM_DIR="$HOME/.nvm"' >> "$PROFILE" && \
echo '[ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh" # This loads nvm' >> "$PROFILE" && \
echo '[ -s "$NVM_DIR/bash_completion" ] && . "$NVM_DIR/bash_completion" # This loads nvm bash_completion' >> "$PROFILE" && \
. $NVM_DIR/nvm.sh && \
nvm install $NODE_VERSION && \
apt-get install npm --yes && \
rm -rf /usr/lib/openssh/ssh-keysign && \
mkdir -p /tmp/nginx && \
pip install awscli && \
wget https://github.com/kubernetes/kops/releases/download/1.10.0/kops-linux-amd64 && \
wget https://storage.googleapis.com/kubernetes-release/release/v1.8.4/bin/linux/amd64/kubectl && \
cp kops-linux-amd64 /usr/local/bin/kops && \
cp kubectl /usr/local/bin/kubectl && \
chmod a+xX /usr/local/bin/kubectl && \
chmod a+xX /usr/local/bin/kops && \
apt install docker.io -y && \
find / -name 'docker' && \
usermod -aG docker root && \
service docker start && \
find / -name 'docker.sock' && \
chmod a+xX /run/docker.sock && \
chmod 777 /run/docker.sock && \
# this doesn't work because it exists in /var/run as well, which doesn't make sense because it's not there in the find command. ln -s /run/docker.sock /var/run/docker.sock && \
ls -lah /run/ && \
ls -lah /var/run/ && \
docker run hello-world
RUN npm install --global lerna
EXPOSE 80
Tried everything I can think of. Looking for ideas...
If you’re just trying to access the Docker daemon on the host, you don’t need to also (attempt to) start the daemon inside the container; you just need a compatible /usr/bin/docker, and to use docker run -v to bind-mount the host’s Docker socket into the container at startup time.
If you need to do basic Docker operations (docker pull, docker build, docker push) then the usual approach is to use the host’s Docker daemon and not try to run your own. There are a couple of old blog posts advising against running Docker inside Docker; it’s theoretically possible but leads to confusing questions about “which Docker am I talking to” and the setup is difficult in any case.
(All of the following statements are about 80% true: you can’t start a background daemon in a Dockerfile; you can’t service or systemctl anything inside Docker ever; you can’t run the Docker daemon inside a Docker container. Trying to work around these usually isn’t a best practice.)
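As a rough sketch of the bind-mount approach (the image name here is just a placeholder for whatever your Jenkins agent image ends up being called):
# Mount the host's Docker socket so the docker CLI inside the container talks
# to the host daemon; no daemon is started inside the container itself.
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  my-jenkins-agent:latest \
  docker run hello-world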

Spark GraphX Out of memory error

I am running GraphX on Spark on AWS EMR with an input file size of around 100 GB.
My cluster configuration is as follows:
Nodes - 10
Memory - 122GB each
HDD - 320GB each
No matter what I do, I get an out-of-memory error when I run the Spark job as:
spark-submit --deploy-mode cluster \
--class com.news.ncg.report.graph.NcgGraphx \
ncgaka-graphx-assembly-1.0.jar true s3://<bkt>/<folder>/run=2016-08-19-02-06-20/part* output
Error
AM Container for appattempt_1474446853388_0001_000001 exited with exitCode: -104
For more detailed output, check application tracking page:http://ip-172-27-111-41.ap-southeast-2.compute.internal:8088/cluster/app/application_1474446853388_0001Then, click on links to logs of each attempt.
Diagnostics: Container [pid=7902,containerID=container_1474446853388_0001_01_000001] is running beyond physical memory limits. Current usage: 1.4 GB of 1.4 GB physical memory used; 3.4 GB of 6.9 GB virtual memory used. Killing container.
Dump of the process-tree for container_1474446853388_0001_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 7907 7902 7902 7902 (java) 36828 2081 3522265088 359788 /usr/lib/jvm/java-openjdk/bin/java -server -Xmx1024m -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1474446853388_0001/container_1474446853388_0001_01_000001/tmp -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError=kill -9 %p -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1474446853388_0001/container_1474446853388_0001_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class com.news.ncg.report.graph.NcgGraphx --jar s3://discover-pixeltoucher/jar/ncgaka-graphx-assembly-1.0.jar --arg true --arg s3://discover-pixeltoucher/ncgus/run=2016-08-19-02-06-20/part* --arg s3://discover-pixeltoucher/output/20160819/ --properties-file /mnt/yarn/usercache/hadoop/appcache/application_1474446853388_0001/container_1474446853388_0001_01_000001/__spark_conf__/__spark_conf__.properties
|- 7902 7900 7902 7902 (bash) 0 0 115810304 687 /bin/bash -c LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native /usr/lib/jvm/java-openjdk/bin/java -server -Xmx1024m -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1474446853388_0001/container_1474446853388_0001_01_000001/tmp '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1474446853388_0001/container_1474446853388_0001_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class 'com.news.ncg.report.graph.NcgGraphx' --jar s3://discover-pixeltoucher/jar/ncgaka-graphx-assembly-1.0.jar --arg 'true' --arg 's3://discover-pixeltoucher/ncgus/run=2016-08-19-02-06-20/part*' --arg 's3://discover-pixeltoucher/output/20160819/' --properties-file /mnt/yarn/usercache/hadoop/appcache/application_1474446853388_0001/container_1474446853388_0001_01_000001/__spark_conf__/__spark_conf__.properties 1> /var/log/hadoop-yarn/containers/application_1474446853388_0001/container_1474446853388_0001_01_000001/stdout 2> /var/log/hadoop-yarn/containers/application_1474446853388_0001/container_1474446853388_0001_01_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt
Any idea how I can stop getting this error?
I created the SparkSession as below:
val spark = SparkSession
.builder()
.master(mode)
.config("spark.hadoop.validateOutputSpecs", "false")
.config("spark.driver.cores", "1")
.config("spark.driver.memory", "30g")
.config("spark.executor.memory", "19g")
.config("spark.executor.cores", "5")
.config("spark.yarn.executor.memoryOverhead","2g")
.config("spark.yarn.driver.memoryOverhead ","1g")
.config("spark.shuffle.compress","true")
.config("spark.shuffle.service.enabled","true")
.config("spark.scheduler.mode","FAIR")
.config("spark.speculation","true")
.appName("NcgGraphX")
.getOrCreate()
It seems like you want to deploy your Spark application on YARN. If that is the case, you should not set application properties in code, but rather pass them via spark-submit:
$ ./bin/spark-submit --class com.news.ncg.report.graph.NcgGraphx \
--master yarn \
--deploy-mode cluster \
--driver-memory 30g \
--executor-memory 19g \
--executor-cores 5 \
<other options>
ncgaka-graphx-assembly-1.0.jar true s3://<bkt>/<folder>/run=2016-08-19-02-06-20/part* output
In client mode the driver JVM has already been started by the time your code runs, so I would personally use the CLI to pass those options.
After passing the memory options to spark-submit, change your code to pick up the configuration dynamically: SparkSession.builder().getOrCreate()
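For example, the builder from the question would shrink to something like this (a sketch; the master, memory, and core settings now come from spark-submit):
import org.apache.spark.sql.SparkSession

// Deployment settings (master, driver/executor memory, cores, YARN overheads)
// are supplied by spark-submit; only application-level configuration stays in code.
val spark = SparkSession
  .builder()
  .config("spark.hadoop.validateOutputSpecs", "false")
  .appName("NcgGraphX")
  .getOrCreate()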
PS: You might also want to increase the memory for the application master via the spark.yarn.am.memory property.

Snort http_inspect preprocessor will not alert to traffic

I am currently testing the Snort IDS for a project, and I followed the Snort 2.9.5.3 installation guide. I am having an issue configuring http_inspect correctly so that it alerts on traffic.
The (virtual) network Snort is monitoring consists of the Snort sensor itself, an Ubuntu machine running DVWA (192.168.9.30), and a Kali Linux VM (192.168.9.20). I have created a local rule that matches /etc/passwd in a packet's contents. This rule has detected fragmented packets sent from the Kali VM to the DVWA VM (using file inclusion).
I believe I have configured http_inspect to generate alerts for URL encoding, multiple slashes, and self-referencing directories (see below). After running the evasion methods, I checked Snort's terminal output; it shows that it did detect the use of these methods, but it doesn't generate an alert.
snort.conf
# HTTP normalization and anomaly detection. For more information, see README.http_inspect
preprocessor http_inspect: global iis_unicode_map unicode.map 1252 compress_depth 65535 decompress_depth 65535
preprocessor http_inspect_server: server default \
http_methods { GET POST PUT SEARCH MKCOL COPY MOVE LOCK UNLOCK NOTIFY POLL BCOPY BDELETE BMOVE LINK UNLINK OPTIONS HEAD DELETE TRACE TRACK CONNECT SOURCE SUBSCRIBE UNSUBSCRIBE PROPFIND PROPPATCH BPROPFIND BPROPPATCH RPC_CONNECT PROXY_SUCCESS BITS_POST CCM_POST SMS_POST RPC_IN_DATA RPC_OUT_DATA RPC_ECHO_DATA } \
chunk_length 500000 \
server_flow_depth 0 \
client_flow_depth 0 \
post_depth 65495 \
oversize_dir_length 500 \
max_header_length 750 \
max_headers 100 \
max_spaces 200 \
small_chunk_length { 10 5 } \
ports { 36 80 81 82 83 84 85 86 87 88 89 90 311 383 591 593 631 801 818 901 972 1158 1220 1414 1533 1741 1830 2301 2381 2809 3029 3037 3057 3128 3443 3702 4000 4343 4848 5117 5250 6080 6988 7000 7001 7144 7145 7510 7770 7777 7779 8000 8008 8014 8028 8080 8082 8085 8088 8090 8118 8123 8180 8181 8222 8243 8280 8300 8500 8509 8800 8888 8899 9000 9060 9080 9090 9091 9443 9999 10000 11371 12601 34443 34444 41080 50000 50002 55252 55555 } \
non_rfc_char { 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 } \
enable_cookie \
extended_response_inspection \
inspect_gzip \
normalize_utf \
unlimited_decompress \
normalize_javascript \
apache_whitespace no \
ascii yes \
bare_byte no \
directory yes \
double_decode yes \
iis_backslash no \
iis_delimiter no \
iis_unicode no \
multi_slash yes \
utf_8 yes \
u_encode yes \
webroot no
Local rule
alert tcp any any -> 192.168.9.30 80 (msg:"Potential File Inclusion of /etc/passwd"; flow:to_server,established; classtype:attempted-recon; content:"/etc/passwd"; nocase; sid:1122; rev:1;)
Discovered the answer, more through luck than anything else. It turns out that the rule I supplied in the question, rather than the snort.conf file, needed a slight modification: the 'content' keyword needed to be changed to 'uricontent'. With this modification, the http_inspect preprocessor will examine the URI field of the packets it inspects.
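So the rule from the question becomes (only the content keyword swapped for uricontent):
alert tcp any any -> 192.168.9.30 80 (msg:"Potential File Inclusion of /etc/passwd"; flow:to_server,established; classtype:attempted-recon; uricontent:"/etc/passwd"; nocase; sid:1122; rev:1;)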
