Kafka Consumer optimization and should I use Apache Spark?

I have written a Kafka consumer to consume a stream of encrypted data (~1 MB), decrypting the records before adding them to an S3 bucket. It takes ~20 minutes to process 1000 records; if I remove the decryption logic and run the same consumer, it takes less than 3 minutes for the same 1000 records.
Here are the configs I am currently using:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = latest
check.crcs = true
client.dns.lookup = use_all_dns_ips
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1000000
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 655360
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
send.buffer.bytes = 131072
session.timeout.ms = 10000
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
The topic has 10 partitions. I consumed with multiple consumers (1 to 10) assigned to the same consumer group, but no matter how many consumers I use, the same amount of data is consumed in the given time.
How do I make the consumers faster? And can Apache Spark help with this?
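For illustration, since the decryption step is CPU-bound, adding consumers mostly parallelizes network I/O rather than the crypto work. A minimal sketch of decoupling polling from decryption with a process pool (Python with kafka-python; decrypt() and upload_to_s3() are hypothetical placeholders for your own logic, and the topic/group/broker names are assumptions):
from concurrent.futures import ProcessPoolExecutor
from kafka import KafkaConsumer  # pip install kafka-python

def decrypt_and_upload(value):
    # hypothetical helpers standing in for the real decryption/S3 logic
    data = decrypt(value)
    upload_to_s3(data)

consumer = KafkaConsumer(
    "my-topic",                           # assumed topic name
    group_id="my-group",                  # assumed group id
    bootstrap_servers=["localhost:9092"]  # assumed broker
)

# Decryption is CPU-bound: a pool sized to the machine's cores parallelizes
# it while a single consumer keeps polling.
with ProcessPoolExecutor() as pool:
    for msg in consumer:
        pool.submit(decrypt_and_upload, msg.value)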

Related

Can't run the spaCy spancat (SpanCategorizer) model?

I am trying to train the spancat model without luck.
I am getting:
ValueError: [E143] Labels for component 'spancat' not initialized. This can be fixed by calling add_label, or by providing a representative batch of examples to the component's 'initialize' method.
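As an aside, the error text itself points at one fix, calling add_label on the component before training; a minimal sketch of that route (the label names here are placeholders):
import spacy

nlp = spacy.blank("en")
spancat = nlp.add_pipe("spancat", config={"spans_key": "sc"})
for label in ["LABEL_A", "LABEL_B"]:  # placeholder labels for your spans
    spancat.add_label(label)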
I did convert my NER ents to spans:
from pathlib import Path

import spacy
from spacy.tokens import DocBin

def main(loc: Path, lang: str, span_key: str):
    """
    Set the NER data into the doc.spans, under a given key.
    The SpanCategorizer component uses the doc.spans, so that it can work with
    overlapping or nested annotations, which can't be represented on the
    per-token level.
    """
    nlp = spacy.blank(lang)
    docbin = DocBin().from_disk(loc)
    docs = list(docbin.get_docs(nlp.vocab))
    for doc in docs:
        doc.spans[span_key] = list(doc.ents)
    DocBin(docs=docs).to_disk(loc)
Here is my config file:
[paths]
train = null
dev = null
vectors = null
init_tok2vec = null
[system]
gpu_allocator = null
seed = 444
[nlp]
lang = "en"
pipeline = ["tok2vec","spancat"]
batch_size = 1000
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
[components]
[components.spancat]
factory = "spancat"
max_positive = null
scorer = {"@scorers":"spacy.spancat_scorer.v1"}
spans_key = "sc"
threshold = 0.5
[components.spancat.model]
@architectures = "spacy.SpanCategorizer.v1"
[components.spancat.model.reducer]
@layers = "spacy.mean_max_reducer.v1"
hidden_size = 128
[components.spancat.model.scorer]
@layers = "spacy.LinearLogistic.v1"
nO = null
nI = null
[components.spancat.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "*"
[components.spancat.suggester]
@misc = "spacy.ngram_suggester.v1"
sizes = [1,2,3]
[components.tok2vec]
factory = "tok2vec"
[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v2"
[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = ${components.tok2vec.model.encode.width}
attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
rows = [5000,1000,2500,2500]
include_static_vectors = true
[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = 256
depth = 8
window_size = 1
maxout_pieces = 3
[corpora]
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null
[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
max_epochs = 70
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1600
max_steps = 20000
eval_frequency = 200
frozen_components = []
annotating_components = []
before_to_disk = null
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null
[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
t = 0.0
[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false
[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.001
[training.score_weights]
spans_sc_f = 1.0
spans_sc_p = 0.0
spans_sc_r = 0.0
[pretraining]
[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null
[initialize.components]
[initialize.tokenizer]
I am using the "sc" key. Please advise how to solve it.
I have solved it using the following function, but you should construct the spans Span(doc, start, end, label) according to your own project/text. It worked for me because all the text (a few words in my case) is labeled with one label, which is what I need.
import spacy
from spacy.tokens import DocBin, Span, SpanGroup

def convert_to_docbin(input, output_path="./train.spacy", lang='en'):
    """ Convert a pair of text annotations into DocBin then save """
    # Load a new spacy model:
    nlp = spacy.blank(lang)
    # Create a DocBin object:
    db = DocBin()
    for text, annotations in input:  # Data in previous format
        doc = nlp(text)
        ents = []
        spans = []
        for start, end, label in annotations:  # Add character indexes
            # Label the whole doc with this label (fits my data: short texts
            # with one label each); adjust the boundaries for your own task.
            spans.append(Span(doc, 0, len(doc), label=label))
            span = doc.char_span(start, end, label=label)
            ents.append(span)
        doc.ents = ents  # Label the text with the ents
        group = SpanGroup(doc, name="sc", spans=spans)
        doc.spans["sc"] = group
        db.add(doc)
    db.to_disk(output_path)
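For reference, a hypothetical call illustrating the input format the function expects, a list of (text, [(start, end, label), ...]) pairs (the text and label below are placeholders):
train_data = [
    ("some short text", [(0, 15, "LABEL_A")]),  # placeholder example
]
convert_to_docbin(train_data, output_path="./train.spacy", lang="en")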

How to solve issue of maximum iteration getting exceeded in python gekko in this case (explained in the body)?

I am trying to track multiple set-points in an interacting quadruple-tank process. Here, the upper limits of the tanks are 25 and the lower limits are 0. I want to track the set-point values 5, 12, 7, and 5. Although I am able to track the first 3 set-points (i.e. 5, 12, and 7), I am not able to track the last set-point because the solver exceeds its maximum iterations. I have attached the code below:
#MHE+MPC model
#to measure computational time of the code
start=time.time()
#Process Model
p = GEKKO(remote=False)
process=0
p.time = [0,0.5]
noise = 0.25
#Constants
g = 981
g1 = .9
g2 = .9
A1=32
A3=32
A2=32
A4=32
a1=0.057
a3=0.057
a2=0.057
a4=0.057
init_h=5
#Controlled process variables
p.h1=p.SV(lb=0,ub=25)
p.h2=p.SV(lb=0,ub=25)
p.h3=p.SV(lb=0,ub=25)
p.h4=p.SV(lb=0,ub=25)
#Manipulated process variables
p.v1=p.MV(value=3.15,lb=0.1,ub=8)
p.v2=p.MV(value=3.15,lb=0.1,ub=8)
#Parameters of process
p.k1=p.Param(value=3.14,lb=0,ub=10)
p.k2=p.Param(value=3.14,lb=0,ub=10)
#Equations process
p.Equation(A1*p.h1.dt()==a3*((2*g*p.h3)**0.5)-(a1*((2*g*p.h1)**0.5))+(g1*p.k1*p.v1))
p.Equation(A2*p.h2.dt()==a4*((2*g*p.h4)**0.5)-(a2*((2*g*p.h2)**0.5))+(g2*p.k2*p.v2))
p.Equation(A3*p.h3.dt()==-a3*((2*g*p.h3)**0.5)+((1-g2)*p.k2*p.v2))
p.Equation(A4*p.h4.dt()==-a4*((2*g*p.h4)**0.5)+((1-g1)*p.k1*p.v1))
#options
p.options.IMODE = 4
#p.h1.TAU=-10^10
#p.h2.TAU=-10^10
#%% MHE Model
m = GEKKO(remote=False)
#prediction horizon
m.time = np.linspace(0,40,41) #0-20 by 0.5 -- discretization must match simulation
#MHE control, manipulated variables and parameters
m.h1=m.CV(lb=0,ub=25)
m.h2=m.CV(lb=0,ub=25)
m.h3=m.SV(lb=0,ub=25)
m.h4=m.SV(lb=0,ub=25)
m.v1=m.MV(value=3.15,lb=0.10,ub=8)
m.v2=m.MV(value=3.15,lb=0.10,ub=8)
m.k1=m.FV(value=3.14,lb=0,ub=10)
m.k2=m.FV(value=3.14,lb=0,ub=10)
#m.h1.TAU=0
#m.h2.TAU=0
#Equations
m.Equation(A1*m.h1.dt()==a3*((2*g*m.h3)**0.5)-(a1*((2*g*m.h1)**0.5))+(g1*m.k1*m.v1))
m.Equation(A2*m.h2.dt()==a4*((2*g*m.h4)**0.5)-(a2*((2*g*m.h2)**0.5))+(g2*m.k2*m.v2))
m.Equation(A3*m.h3.dt()==-a3*((2*g*m.h3)**0.5)+((1-g2)*m.k2*m.v2))
m.Equation(A4*m.h4.dt()==-a4*((2*g*m.h4)**0.5)+((1-g1)*m.k1*m.v1))
#Options
m.options.IMODE = 5 #MHE
m.options.EV_TYPE = 2
# STATUS = 0, optimizer doesn't adjust value
# STATUS = 1, optimizer can adjust
m.v1.STATUS = 0
m.v2.STATUS = 0
m.k1.STATUS=1
m.k2.STATUS=1
m.h1.STATUS = 1
m.h2.STATUS = 1
#m.h3.STATUS = 0
#m.h4.STATUS = 0
# FSTATUS = 0, no measurement
# FSTATUS = 1, measurement used to update model
m.v1.FSTATUS = 1
m.v2.FSTATUS = 1
m.k1.FSTATUS=0
m.k2.FSTATUS=0
m.h1.FSTATUS = 1
m.h2.FSTATUS = 1
m.h3.FSTATUS = 1
m.h4.FSTATUS = 1
#m.options.MAX_ITER=1000
m.options.SOLVER=3
m.options.NODES=3
#%% MPC Model
c = GEKKO(remote=False)
c.time = np.linspace(0,10,11) #0-5 by 0.5 -- discretization must match simulation
c.v1=c.MV(value=3.15,lb=0.10,ub=8)
c.v2=c.MV(value=3.15,lb=0.10,ub=8)
c.k1=c.FV(value=3.14,lb=0,ub=10)
c.k2=c.FV(value=3.14,lb=0,ub=10)
#Variables
c.h1=c.CV(lb=0,ub=25)
c.h2=c.CV(lb=0,ub=25)
c.h3=c.SV(lb=0,ub=25)
c.h4=c.SV(lb=0,ub=25)
#Equations
c.Equation(A1*c.h1.dt()==a3*((2*g*c.h3)**0.5)-(a1*((2*g*c.h1)**0.5))+(g1*c.k1*c.v1))
c.Equation(A2*c.h2.dt()==a4*((2*g*c.h4)**0.5)-(a2*((2*g*c.h2)**0.5))+(g2*c.k2*c.v2))
c.Equation(A3*c.h3.dt()==-a3*((2*g*c.h3)**0.5)+((1-g2)*c.k2*c.v2))
c.Equation(A4*c.h4.dt()==-a4*((2*g*c.h4)**0.5)+((1-g1)*c.k1*c.v1))
#Options
c.options.IMODE = 6 #MPC
c.options.CV_TYPE = 2
# STATUS = 0, optimizer doesn't adjust value
# STATUS = 1, optimizer can adjust
c.v1.STATUS = 1
c.v2.STATUS = 1
c.k1.STATUS=0
c.k2.STATUS=0
c.h1.STATUS = 1
c.h2.STATUS = 1
#c.h3.STATUS = 0
#c.h4.STATUS = 0
# FSTATUS = 0, no measurement
# FSTATUS = 1, measurement used to update model
c.v1.FSTATUS = 0
c.v2.FSTATUS = 0
c.k1.FSTATUS=1
c.k2.FSTATUS=1
c.h1.FSTATUS = 1
c.h2.FSTATUS = 1
c.h3.FSTATUS = 1
c.h4.FSTATUS = 1
sp=5
c.h1.SP=sp
c.h2.SP=sp
p1 = GEKKO(remote=False)
p1.time = [0,0.5]
#Parameters
p1.h1=p1.CV(lb=0,ub=25)
p1.h2=p1.CV(lb=0,ub=25)
p1.h3=p1.CV(lb=0,ub=25)
p1.h4=p1.CV(lb=0,ub=25)
p1.v1=p1.MV(value=3.15,lb=0.1,ub=8)
p1.v2=p1.MV(value=3.15,lb=0.1,ub=8)
p1.k1=p1.Param(lb=0,ub=10,value=3.14)
p1.k2=p1.Param(lb=0,ub=10,value=3.14)
#Equations
p1.Equation(A1*p1.h1.dt()==a3*((2*g*p1.h3)**0.5)-a1*((2*g*p1.h1)**0.5)+g1*p1.k1*p1.v1)
p1.Equation(A2*p1.h2.dt()==a4*((2*g*p1.h4)**0.5)-a2*((2*g*p1.h2)**0.5)+g2*p1.k2*p1.v2)
p1.Equation(A3*p1.h3.dt()==-a3*((2*g*p1.h3)**0.5)+(1-g2)*p1.k2*p1.v2)
p1.Equation(A4*p1.h4.dt()==-a4*((2*g*p1.h4)**0.5)+(1-g1)*p1.k1*p1.v1)
#options
p1.options.IMODE = 4
#%% problem configuration
# number of cycles
cycles = 480
# noise level
#%% run process, estimator and control for cycles
h1_meas = np.empty(cycles)
h2_meas =np.empty(cycles)
h3_meas =np.empty(cycles)
h4_meas=np.empty(cycles)
h1_est = np.empty(cycles)
h2_est = np.empty(cycles)
h3_est = np.empty(cycles)
h4_est = np.empty(cycles)
h1_plant=np.empty(cycles)
h2_plant=np.empty(cycles)
h3_plant=np.empty(cycles)
h4_plant=np.empty(cycles)
h1_measured=np.empty(cycles)
h2_measured=np.empty(cycles)
h3_measured=np.empty(cycles)
h4_measured=np.empty(cycles)
v1_est = np.empty(cycles)
v2_est = np.empty(cycles)
k1_est = np.empty(cycles)
k2_est = np.empty(cycles)
u_cont_k1 = np.empty(cycles)
u_cont_k2 = np.empty(cycles)
sp_store = np.empty(cycles)
sum_est=np.empty(cycles)
sum_model=np.empty(cycles)
# Create plot
plt.figure(figsize=(10,7))
plt.ion()
plt.show()
p.MAX_ITER=20
c.MAX_ITER=20
m.MAX_ITER=20
p1.MAX_ITER=20
for i in range(cycles):
    print(i)
    # set point changes
    if i==cycles/4:
        sp = 12
    elif i==2*cycles/4:
        sp = 7
    elif i==3*cycles/4:
        sp = 5
    sp_store[i] = sp
    c.h1.SP=sp
    c.h2.SP=sp
    c.k1.MEAS = m.k1.NEWVAL
    c.k2.MEAS = m.k2.NEWVAL
    if p.options.SOLVESTATUS == 1:
        # print("going:",i)
        c.h1.MEAS = p.h1.MODEL
        c.h2.MEAS = p.h2.MODEL
        c.h3.MEAS = p.h3.MODEL
        c.h4.MEAS = p.h4.MODEL
    print(i,'Plant Model:',p.h1.MODEL,p.h2.MODEL,p.h3.MODEL,p.h4.MODEL)
    c.solve(disp=False,debug=0)
    #print("NEWVAL:",i,c.u,c.u.NEWVAL)
    u_cont_k1[i] = c.v1.NEWVAL
    u_cont_k2[i] = c.v2.NEWVAL
    #print("Horizon:",i,c.h1[0:],c.h2[0:])
    #print("Move:",i,c.v1.NEWVAL,c.v2.NEWVAL)
    ## process simulator
    #load control move
    p.v1.MEAS = u_cont_k1[i]
    p.v2.MEAS = u_cont_k2[i]
    #simulate
    p.solve(disp=False,debug=0)
    #plant model
    p1.k1=3.14
    p1.k2=3.14
    p1.v1.MEAS = u_cont_k1[i]
    p1.v2.MEAS = u_cont_k2[i]
    p1.solve(disp=False,debug=0)
    h1_plant[i]=p1.h1.MODEL
    h2_plant[i]=p1.h2.MODEL
    h3_plant[i]=p1.h3.MODEL
    h4_plant[i]=p1.h4.MODEL
    h1_measured[i]=p1.h1.MODEL+(random()*2)*noise
    h2_measured[i]=p1.h2.MODEL+(random()*2)*noise
    h3_measured[i]=p1.h3.MODEL+(random()*2)*noise
    h4_measured[i]=p1.h4.MODEL+(random()*2)*noise
    #print("Model process output:",i,p.h1.MODEL,p.h2.MODEL,p.h3.MODEL,p.h4.MODEL)
    #load output with white noise
    h1_meas[i] = p.h1.MODEL+(random()-0.5)*noise
    h2_meas[i] = p.h2.MODEL+(random()-0.5)*noise
    h3_meas[i] = p.h3.MODEL+(random()-0.5)*noise
    h4_meas[i] = p.h4.MODEL+(random()-0.5)*noise
    #Only MPC
    ## estimator
    #load input and measured output
    m.v1.MEAS = u_cont_k1[i]
    m.v2.MEAS = u_cont_k2[i]
    #m.h1.MEAS = h1_meas[i]+(random()*2)*noise
    #m.h2.MEAS = h2_meas[i]+(random()*2)*noise
    #m.h3.MEAS = h3_meas[i]+(random()*2)*noise
    #m.h4.MEAS = h4_meas[i]+(random()*2)*noise
    m.h1.MEAS = h1_meas[i]
    m.h2.MEAS = h2_meas[i]
    m.h3.MEAS = h3_meas[i]
    m.h4.MEAS = h4_meas[i]
    #m.COLDSTART=2
    #optimize parameters
    m.solve(disp=False,debug=0)
    #store results
    if i>=process:
        h1_est[i] = m.h1.MODEL
        h2_est[i] = m.h2.MODEL
        h3_est[i] = m.h3.MODEL
        h4_est[i] = m.h4.MODEL
        v1_est[i] = m.v1.NEWVAL
        v2_est[i] = m.v2.NEWVAL
        k1_est[i] = m.k1.NEWVAL
        k2_est[i] = m.k2.NEWVAL
    print("Estimated h:",i,h1_est[i],h2_est[i],h3_est[i],h4_est[i])
    print("Estimated k:",i,k1_est[i],k2_est[i],p.k1[0],p.k2[0])
    print("Estimated v:",i,v1_est[i],v2_est[i])
    print("dh1/dt:",(a3*((2*g*h3_est[i])**0.5)-(a1*((2*g*h3_est[i])**0.5))+(g1*k1_est[i]*v1_est[i]))/A3)
    print("dh2/dt:",(a4*((2*g*h4_est[i])**0.5)-(a2*((2*g*h2_est[i])**0.5))+(g2*k2_est[i]*v2_est[i]))/A2)
    print("dh3/dt:",(-a3*((2*g*h3_est[i])**0.5)+((1-g2)*k2_est[i]*v2_est[i]))/A3)
    print("dh4/dt:",(-a4*((2*g*h4_est[i])**0.5)+((1-g1)*k1_est[i]*v1_est[i]))/A4)
    if i%1==0:
        plt.clf()
        plt.subplot(4,1,1)
        #plt.plot(h1_meas[0:i])
        #plt.plot(h2_meas[0:i])
        #plt.plot(h3_meas[0:i])
        #plt.plot(h4_meas[0:i])
        plt.plot(h1_est[0:i])
        plt.plot(h2_est[0:i])
        plt.plot(sp_store[0:i])
        plt.subplot(4,1,2)
        plt.plot(h3_est[0:i])
        plt.plot(h4_est[0:i])
        #plt.legend(('h1_pred','h2_pred','h3_pred','h4_pred'))
        plt.subplot(4,1,3)
        plt.plot(k1_est[0:i])
        plt.plot(k2_est[0:i])
        plt.subplot(4,1,4)
        plt.plot(v1_est[0:i])
        plt.plot(v2_est[0:i])
        plt.draw()
        plt.pause(0.05)
end=time.time()
print("total time:",end-start)
I feel there is some issue with my MHE+MPC code, but I am not able to find the mistake.
Nice application. I needed a few imports to make the script work. These may be loaded automatically for you.
from gekko import GEKKO
import time
import numpy as np
import matplotlib.pyplot as plt
from random import random
The script solves successfully if a lower bound (1e-6) is included on all the level variables. There is a problem when a level goes to zero or below inside the square-root terms (here written as (2*g*h)**0.5): this small adjustment keeps the solver out of a region where the model is undefined, since Gekko solvers can't deal with imaginary numbers.
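Concretely, the change amounts to something like this for each of the four levels (a sketch shown for the p model; repeat for m, c, and p1):
p.h1=p.SV(lb=1e-6,ub=25)  # lower bound slightly above zero keeps
p.h2=p.SV(lb=1e-6,ub=25)  # (2*g*h)**0.5 away from the undefined
p.h3=p.SV(lb=1e-6,ub=25)  # region h < 0
p.h4=p.SV(lb=1e-6,ub=25)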
Although the solution is successful, the control performance appears to oscillate, so the application may need some tuning.

BOW Chunking dataset

I am attempting to chunk my data set while using CountVectorizer and TfidfVectorizer. Here is what I have so far.
from sklearn.feature_extraction.text import CountVectorizer

def BOW(data):
    df_temp = data.copy(deep=True)
    df_temp = basic_preprocessing(df_temp)
    # number of words in each chunk
    num_words = 2000
    chunks = []
    counter = 0
    df_temp = splitter(df_temp['text'], num_words)
    for text in df_temp:
        chunk = {'index': counter, 'text': text}
        chunks.append(chunk)
        counter += 1
    count_vectorizer = CountVectorizer(max_df=0.85)
    count_vectorizer.fit([chunk['text'] for chunk in chunks])
    list_corpus = df_temp["text"].tolist()
    list_labels = df_temp["class_label"].tolist()
    X = count_vectorizer.transform(list_corpus)
    return X, list_labels
Not sure if this is correct or not. Here is the link to the post I tried to reproduce:
How to avoid memory overloads using SciKit Learn
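For the TfidfVectorizer half of the question, the same fit/transform pattern applies; a minimal sketch reusing the chunks list built inside BOW above (shown at top level for illustration only):
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(max_df=0.85)
texts = [chunk['text'] for chunk in chunks]
tfidf.fit(texts)
X_tfidf = tfidf.transform(texts)  # sparse TF-IDF matrix, one row per chunk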

Why do processes take different times to execute in multiprocessing?

I am running a program that processes a 3 lakh (300,000) row dataframe using multiprocessing. I do this on a 64-core VM using 62 processes created with multiprocessing.Process in Python. Each process is fed 4900 rows.
Oddly, the processes take different amounts of time to finish: the first process finished in 15 minutes, while the last one took more than 70. Below is the code block for multiprocessing that I've used.
import multiprocessing

# define dataframe here
data_thread = data
uid = "final"  ### make sure to change uid
batch_size = 4900
counter = 0
datalen = len(data_thread)
Flag = True
processes = []
indices = []  # (start, end) row ranges, one per process
while(Flag):
    start = counter*batch_size
    end = min(datalen, start+batch_size)
    if end>=datalen:
        Flag = False
    indices.append((start, end))
    data_split = data_thread.iloc[start:end]
    threadName = "process_"+str(counter)
    processes.append(multiprocessing.Process(target=process, args=(data_split, uid, threadName, start, end)))
    counter = counter+1
startCount = 0
while(startCount<len(processes)):
    t = processes[startCount]
    try:
        t.start()
    except:
        print("Error encountered while starting process_%d: %s"%(startCount, str(indices[startCount])))
    print("Started: process_" + str(startCount))
    startCount = startCount + 1
endCount = 0
while(endCount<len(processes)):
    t = processes[endCount]
    t.join()
    print("Joined: process_" + str(endCount))
    endCount = endCount + 1
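As an aside, one common way to even out per-process run times is to submit many smaller batches to a fixed-size pool, so workers that finish early pick up the remaining work. A sketch under the assumption that process, data, uid, and indices come from the code above:
import multiprocessing

def run_batch(args):
    # unpack one batch and call the existing worker function
    data_split, uid, name, start, end = args
    return process(data_split, uid, name, start, end)

if __name__ == "__main__":
    batches = [(data.iloc[s:e], uid, "process_%d" % i, s, e)
               for i, (s, e) in enumerate(indices)]
    # fewer, reusable workers; map() hands out batches as workers free up
    with multiprocessing.Pool(processes=62) as pool:
        pool.map(run_batch, batches)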

CentOS - sysctl.conf not working

The settings I have applied in my sysctl.conf are not working, neither on reboot nor after applying them using sysctl -p.
Below are my sysctl.conf settings. Can someone take a look and tell me what may be causing the issue?
# Controls IP packet forwarding
net.ipv4.ip_forward = 0
# Controls source route verification
net.ipv4.conf.default.rp_filter = 1
# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0
# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0
# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1
# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1
# Controls the default maximum size of a message queue
kernel.msgmnb = 65536
# Controls the maximum size of a message, in bytes
kernel.msgmax = 65536
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736
# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296
net.ipv4.tcp_keepalive_time = 4
net.ipv4.tcp_keepalive_probes = 1
net.ipv4.tcp_keepalive_intvl = 2
net.ipv4.tcp_fin_timeout = 4
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_rfc1337 = 1
net.ipv4.tcp_congestion_control= cubic
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.route.flush = 1
net.ipv4.tcp_orphan_retries = 0
net.core.netdev_max_backlog = 400000
net.core.optmem_max = 10000000
net.core.rmem_default = 10000000
net.core.rmem_max = 10000000
net.core.somaxconn = 100000
net.core.wmem_default = 10000000
net.core.wmem_max = 10000000
net.ipv4.conf.all.rp_filter = 1
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_ecn = 0
net.ipv4.tcp_max_syn_backlog = 12000
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_mem = 10240 8738000 825829120
net.ipv4.tcp_rmem = 10240 8738000 825829120
net.ipv4.tcp_sack = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_wmem = 10240 8738000 825829120
net.netfilter.nf_conntrack_max = 768000
net.netfilter.nf_conntrack_generic_timeout = 4
net.netfilter.nf_conntrack_tcp_timeout_established = 4
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 4
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 4
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 4
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 4
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 4
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 4
net.netfilter.nf_conntrack_tcp_timeout_close = 4
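One quick way to check whether the values actually took effect is to read them back from the kernel: each sysctl key maps to a /proc/sys path with the dots replaced by slashes. A minimal sketch, with the key list as an arbitrary sample from the file above:
from pathlib import Path

# sysctl keys map to /proc/sys paths, e.g. net.core.somaxconn ->
# /proc/sys/net/core/somaxconn
for key in ["net.core.somaxconn", "net.ipv4.tcp_fin_timeout", "kernel.shmmax"]:
    path = Path("/proc/sys") / key.replace(".", "/")
    print(key, "=", path.read_text().strip())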
