Optimizing memory usage of for loops in Python - python-3.x

Let's say I have multiple for loops that insert data into a database like the example below:
import random
from random import randint

# Entidade and AreaAtuacao are Django models defined elsewhere in the project.
categoria = ('Associação Nacional', 'Federação Nacional', 'Organização Não-Governamental', 'Sociedade Beneficente', 'Fundação', 'União', 'Instituto', 'Entidade')
nome = ('do Meio Ambiente', 'do Voluntariado', 'da Saúde', 'da Boa Vontade', 'da Preservação da Natureza', 'do Idoso', 'da Informática', 'do Empreendedorismo', 'da Assistência Social')
areas_atuacao = ('Ambientalismo Conservação', 'Conservação de recursos naturais', 'Controle da poluição Proteção de animais', 'Tecnologias alternativas')
group_nome = []
for x in range(5):
    nome_entidade = random.choice(categoria) + " " + random.choice(nome)
    group_nome.append(nome_entidade)
group_atuacao = []
for x in range(5):
    atuacao = random.choice(areas_atuacao)
    group_atuacao.append(AreaAtuacao.objects.filter(nome=atuacao)[0])
group_cep = []
for x in range(5):
    cep = randint(10000000, 99999999)
    group_cep.append(cep)
for x in range(5):
    entidade = Entidade(razao_social=group_nome[x],
                        cep=group_cep[x],
                        area_atuacao=group_atuacao[x])
    entidade.save()
My question: how can I restructure the code so that it uses less RAM in the process? I've tried, but it didn't work. Thanks in advance.
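One possible restructuring, offered as a minimal sketch rather than a definitive answer: build each record inside a single loop (here a generator) and hand it off immediately, so the three intermediate lists are never created. The shortened tuples, the `gerar_entidades` helper and the `salvos` counter are hypothetical stand-ins; the Django models are not used so that the example runs on its own.

```python
import random
from random import randint

# Shortened versions of the tuples from the question.
categoria = ('Associação Nacional', 'Federação Nacional', 'Instituto')
nome = ('do Meio Ambiente', 'da Saúde', 'do Idoso')
areas_atuacao = ('Tecnologias alternativas', 'Conservação de recursos naturais')


def gerar_entidades(n):
    """Yield one record at a time instead of filling three parallel lists."""
    for _ in range(n):
        yield {
            'razao_social': random.choice(categoria) + ' ' + random.choice(nome),
            'cep': randint(10000000, 99999999),
            'area_atuacao': random.choice(areas_atuacao),
        }


salvos = 0  # hypothetical stand-in for the database: it only counts "saved" rows
for registro in gerar_entidades(5):
    # In the Django code, the AreaAtuacao lookup and Entidade(...).save()
    # would go here, using the values in registro.
    salvos += 1
```

With only five rows the saving is negligible; the pattern matters when the loop count is large, because only one record exists in memory at a time.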

Related

Extracting the MAX value from a Naive Bayes Classifier

I am using Naive Bayes with NLTK to classify phrases according to emotions such as sadness, fear, happiness, etc.
classificador = nltk.NaiveBayesClassifier.train(base_completa_treinamento)
and applying this function to a phrase:
def avalia(teste):
    #teste = 'Pqp, que trânsito chato da porra!'
    testeStemming = []
    stemmer = nltk.stem.RSLPStemmer()
    for (palavras_treinamento) in teste.split():
        comStem = [p for p in palavras_treinamento.split()]
        testeStemming.append(str(stemmer.stem(comStem[0])))
    novo = extrator_palavras(testeStemming)
    distribuicao = classificador.prob_classify(novo)
    for classe in distribuicao.samples():
        print('%s: %f' % (classe, (distribuicao.prob(classe))))
I call it like this:
avalia('he died')
and get this result:
alegria: 0.117609
nojo: 0.050533
medo: 0.207932
raiva: 0.226550
surpresa: 0.045293
tristeza: 0.352083
How do I change the function avalia() to show only the highest value ('tristeza: 0.35')? I tried to use the max function, but it didn't work.
Thanks
Instead of this part:
    for classe in distribuicao.samples():
        print('%s: %f' % (classe, (distribuicao.prob(classe))))
try this:
    # Build (class, probability) pairs, then flip them so max() compares probabilities.
    classe_array = [(classe, (distribuicao.prob(classe))) for classe in distribuicao.samples()]
    inverse = [(value, key) for key, value in classe_array]
    max_key = max(inverse)[1]
    for each in classe_array:
        if each[0] == max_key:
            print(each)
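The same result can be had more directly by passing a key function to max. The sketch below uses a plain dict holding the probabilities from the question's output as a hypothetical stand-in for the NLTK distribution; with the real object, the equivalent call would presumably be max(distribuicao.samples(), key=distribuicao.prob), using only the two methods already shown in the question.

```python
# Probabilities copied from the question's output, held in a plain dict
# (a hypothetical stand-in for the NLTK probability distribution).
probabilidades = {
    'alegria': 0.117609,
    'nojo': 0.050533,
    'medo': 0.207932,
    'raiva': 0.226550,
    'surpresa': 0.045293,
    'tristeza': 0.352083,
}

# max() walks the class names and ranks them by their probability.
melhor_classe = max(probabilidades, key=probabilidades.get)
melhor = (melhor_classe, probabilidades[melhor_classe])
print('%s: %f' % melhor)  # prints "tristeza: 0.352083"
```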

update value in a column of a table based on input in shiny

I am new to Shiny and I need your help. I will try to explain in detail:
In my application, the user uploads a CSV file. This information is displayed in two tabPanels. In the first tabPanel, named Original, the user can see the original dataset. In the other one, named Procesado, the user can see a select input with values from 1 to 4 and a table with only 2 columns (before the information is shown, the dataset is edited internally by removing columns and rows).
I want to update the values of the first column every time the user changes the value in the select input. I tried renderUI and reactive, but it is not working. Thanks
shinyUI(
  pageWithSidebar(
    headerPanel("Ciclismo - Calidad de la sesion"),
    sidebarPanel(
      p("Escoge el dataset generado por la aplicacion de ciclismo Polar"),
      fileInput("file1", "Escoge un archivo CSV",
                multiple = FALSE,
                accept = c(".csv")),
      tags$hr(),
      submitButton("Evaluar Dataset")
    ),
    mainPanel(
      tabsetPanel(
        tabPanel("Original", tableOutput("contents")),
        tabPanel("Procesado",
                 p("Selecciona la zona de entrenamiento a evaluar"),
                 selectInput("zone", "Zona de entrenamiento",
                             choices = c("1", "2", "3", "4")),
                 uiOutput("ui"),
                 tags$hr(),
                 tableOutput("modified")),
        tabPanel("Graficos",
                 h3(textOutput("output_text")),
                 plotOutput("output_plot")
        )
      )
    )
  )
)
The code on the server is:
shinyServer(function(input, output) {
  output$contents <- renderTable({
    #data.frame(x=auto)
    req(input$file1)
    # when reading semicolon separated files,
    # having a comma separator causes `read.csv` to error
    #tryCatch(
    #{
    df <- read.csv(input$file1$datapath)
    #},
    #error = function(e) {
    # return a safeError if a parsing error occurs
    # stop(safeError(e))
    # }
    #)
    return(df)
  })
  output$ui <- renderUI({
    switch(input$zone,
           "1" = cbind(ZONA = 1, dfProc),
           "2" = cbind(ZONA = 2, dfProc),
           "3" = cbind(ZONA = 3, dfProc),
           "4" = cbind(ZONA = 4, dfProc))
  })
  output$modified <- renderTable({
    req(input$file1)
    dff <- read.csv(input$file1$datapath)
    #return(df)
    dfN = dff[-c(1,2,3),-c(1,2)]
    FC = dfN[,-c(2:26)]
    # Convert to a data frame so that operations can be performed on it
    dfProc = data.frame(FC)
    newdata_test = cbind(ZONA = 1,dfProc)
    return(newdata_test)
  })
})
I finally got it working.
I used two eventReactive expressions. The first one reads the CSV file. The second one reacts to the zone input and uses the cbind function to insert the selected value as a column. I hope this answer helps other people.
df <- eventReactive(input$csv, {
  req(input$csv)
  dff <- read.csv(input$csv$datapath)
  # The dataset is updated, eliminating rows and columns
  dfN = dff[-c(1,2,3),-c(1,2)]
  FC = dfN[,-c(2:26)]
  dfProc = data.frame(FC)
  return(dfProc)
})
datos <- eventReactive(input$zone, {
  newtable = cbind(input$zone, df())
})
output$modified <- renderTable({datos()})

How can I get a separate max for each of several lists in Python

I want to get a separate maximum for each list, but I get the same maximum every time: the maximum of the first list is repeated for all of them. What is wrong with this code, and what should I change to get the maximum of each list?
def best(contactList_id,ntf_DeliveredCount):
    maxtForEvryDay = []
    yPredMaxForDay = 0
    for day in range(1,8):
        for marge in range(1,5):
            result = predictUsingNewSample([[contactList_id,ntf_DeliveredCount,day,marge]])
            if (result > yPredMaxForDay):
                yPredMaxForDay = 0
                yPredMaxForDay = result
        maxtForEvryDay.append(yPredMaxForDay)
    return maxtForEvryDay
best(contactList_id = 13.0,ntf_DeliveredCount = 5280.0)
Result:
[1669.16010381]
[1708.32915255]
[1747.49820129]
[1786.66725003]
[1570.05500351]
[1609.22405225]
[1648.39310099]
[1687.56214973]
[1491.60792629]
[1510.11895195]
[1549.28800069]
[1588.45704943]
[1402.21845533]
[1420.73953501]
[1450.18290039]
[1489.35194913]
[1367.15490803]
[1356.21411426]
[1345.27532239]
[1390.24684884]
[1378.1190426]
[1367.17824883]
[1419.23588013]
[1486.78241686]
[1450.21261674]
[1516.04342599]
[1581.87423524]
[1647.7050445]
[array([1786.66725003]),
array([1786.66725003]),
array([1786.66725003]),
array([1786.66725003]),
array([1786.66725003]),
array([1786.66725003]),
array([1786.66725003])]
This is my function predictUsingNewSample(X_test):
import pickle

def predictUsingNewSample(X_test):
    #print(X_test)
    # Load from file
    with open("pickle_model.pkl", 'rb') as file:
        pickle_model = pickle.load(file)
    Ypredict = pickle_model.predict(X_test)
    print(Ypredict)
    return Ypredict
Try this:
def best(contactList_id,ntf_DeliveredCount):
    maxtForEvryDay = []
    for day in range(1,8):
        # Reset the running maximum at the start of each day
        yPredMaxForDay = 0
        for marge in range(1,5):
            result = predictUsingNewSample([[contactList_id,ntf_DeliveredCount,day,marge]])
            if (result > yPredMaxForDay):
                yPredMaxForDay = result
        maxtForEvryDay.append(yPredMaxForDay)
    return maxtForEvryDay
best(contactList_id = 13.0,ntf_DeliveredCount = 5280.0)
I think the problem comes from the fact that you never reset your yPredMaxForDay variable for each day, so the maximum found on the first day carries over to every later day.
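The reset can also be made implicit by letting max() compute each day's maximum from scratch. The sketch below assumes a hypothetical `predict` function that returns a plain number; it stands in for predictUsingNewSample, which needs the pickled model.

```python
def predict(day, marge):
    # Hypothetical stand-in for predictUsingNewSample: any numeric score works.
    return 100 * day - 10 * marge


def best_per_day(days=range(1, 8), marges=range(1, 5)):
    # max() starts fresh for every day, so no running maximum leaks across days.
    return [max(predict(day, marge) for marge in marges) for day in days]


maxima = best_per_day()
print(maxima)  # one maximum per day, seven values in total
```

Unlike initializing the running maximum to 0, this version also behaves correctly if every prediction for a day is negative.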

Value error in assigning to dataframe

I am assigning different data to one dataframe, and I got the following error:
ValueError: If using all scalar values, you must pass an index
I followed a question posted by someone else here,
but it did not work out.
The following is my code. All you have to do is copy and paste it into an IDE.
import pandas as pd
import numpy as np
#Loading Team performance Data (ExpG (Home away)) For and against
epl_1718 = pd.read_csv("http://www.football-data.co.uk/mmz4281/1718/E0.csv")
epl_1718 = epl_1718[['HomeTeam','AwayTeam','FTHG','FTAG']]
epl_1718 = epl_1718.rename(columns={'FTHG': 'HomeGoals', 'FTAG': 'AwayGoals'})
Home_goal_avg = epl_1718['HomeGoals'].mean()
Away_goal_avg = epl_1718['AwayGoals'].mean()
Home_team_goals = epl_1718.groupby(['HomeTeam'])['HomeGoals'].sum()
Home_count = epl_1718.groupby(['HomeTeam'])['HomeTeam'].count()
Home_team_avg_goal = Home_team_goals/Home_count
Home_team_concede = epl_1718.groupby(['HomeTeam'])['AwayGoals'].sum()
EPL_Home_average_score = epl_1718['HomeGoals'].mean()
EPL_Home_average_conc = epl_1718['HomeGoals'].mean()
Home_team_avg_conc = Home_team_concede/Home_count
Away_team_goals = epl_1718.groupby(['AwayTeam'])['AwayGoals'].sum()
Away_count = epl_1718.groupby(['AwayTeam'])['AwayTeam'].count()
Away_team_avg_goal = Away_team_goals/Away_count
Away_team_concede = epl_1718.groupby(['AwayTeam'])['HomeGoals'].sum()
EPL_Away_average_score = epl_1718['AwayGoals'].mean()
EPL_Away_average_conc = epl_1718['HomeGoals'].mean()
Away_team_avg_conc = Away_team_concede/Away_count
Home_attk_sth = Home_team_avg_goal/EPL_Home_average_score
Home_attk_sth = Home_attk_sth.sort_index().reset_index()
Home_def_sth = Home_team_avg_conc/EPL_Home_average_conc
Home_def_sth = Home_def_sth .sort_index().reset_index()
Away_attk_sth = Away_team_avg_goal/EPL_Away_average_score
Away_attk_sth = Away_attk_sth .sort_index().reset_index()
Away_def_sth = Away_team_avg_conc/EPL_Away_average_conc
Away_def_sth = Away_def_sth.sort_index().reset_index()
Home_def_sth
HomeTeam = epl_1718['HomeTeam'].drop_duplicates().sort_index().reset_index().set_index('HomeTeam')
AwayTeam = epl_1718['AwayTeam'].drop_duplicates().sort_index().reset_index().sort_values(['AwayTeam']).set_index(['AwayTeam'])
#HomeTeam = HomeTeam.sort_index().reset_index()
Team = HomeTeam.append(AwayTeam).drop_duplicates()
Data = pd.DataFrame({"Team":Team,
"Home_attkacking":Home_attk_sth,
"Home_def": Home_def_sth,
"Away_Attacking":Away_attk_sth,
"Away_def":Away_def_sth,
"EPL_Home_avg_score":EPL_Home_average_score,
"EPL_Home_average_conc":EPL_Home_average_conc,
"EPL_Away_average_score":EPL_Away_average_score,
"EPL_Away_average_conc":EPL_Away_average_conc},
columns =['Team','Home_attacking','Home_def','Away_attacking','Away_def',
'EPL_Home_avg_score','EPL_Home_avg_conc','EPL_Away_avg_score','EPL_Away_average_conc'])
In this code, I am trying to get the average goals scored per team per game and the average goals conceded per team per game.
Then I calculate other performance factors such as attacking strength, defensive strength, etc.
I had to paste the full code because, when I use a small example, creating the data frame works.
Thanks for understanding.
Thanks in advance for the advice too.
The format (the columns) of the final data frame will look as follows:
Team, Home attacking, Home defensive, Away attacking, Away defensive
and so on, as listed in the data frame.
That means there will be only 20 teams under the Team column.
The shape of the dataframe will be (20, 9).
Regards,
Zep
The main idea here is to remove reset_index so that each Series stays indexed by team. The Team variable is then unnecessary, and the Team column is created as the last step by reset_index. Also be careful with the column names in the DataFrame constructor: if a key in the dictionary (such as EPL_Home_average_conc) differs from the name listed in columns (such as EPL_Home_avg_conc), that column is filled with NaNs:
Home_team_goals = epl_1718.groupby(['HomeTeam'])['HomeGoals'].sum()
Home_count = epl_1718.groupby(['HomeTeam'])['HomeTeam'].count()
Home_team_avg_goal = Home_team_goals/Home_count
Home_team_concede = epl_1718.groupby(['HomeTeam'])['AwayGoals'].sum()
EPL_Home_average_score = epl_1718['HomeGoals'].mean()
# Goals conceded at home are the goals scored by the away side
EPL_Home_average_conc = epl_1718['AwayGoals'].mean()
Home_team_avg_conc = Home_team_concede/Home_count
Away_team_goals = epl_1718.groupby(['AwayTeam'])['AwayGoals'].sum()
Away_count = epl_1718.groupby(['AwayTeam'])['AwayTeam'].count()
Away_team_avg_goal = Away_team_goals/Away_count
Away_team_concede = epl_1718.groupby(['AwayTeam'])['HomeGoals'].sum()
EPL_Away_average_score = epl_1718['AwayGoals'].mean()
EPL_Away_average_conc = epl_1718['HomeGoals'].mean()
Away_team_avg_conc = Away_team_concede/Away_count
#removed reset_index
Home_attk_sth = Home_team_avg_goal/EPL_Home_average_score
Home_attk_sth = Home_attk_sth.sort_index()
Home_def_sth = Home_team_avg_conc/EPL_Home_average_conc
Home_def_sth = Home_def_sth .sort_index()
Away_attk_sth = Away_team_avg_goal/EPL_Away_average_score
Away_attk_sth = Away_attk_sth .sort_index()
Away_def_sth = Away_team_avg_conc/EPL_Away_average_conc
Away_def_sth = Away_def_sth.sort_index()
Data = pd.DataFrame({"Home_attacking":Home_attk_sth,
"Home_def": Home_def_sth,
"Away_attacking":Away_attk_sth,
"Away_def":Away_def_sth,
"EPL_Home_average_score":EPL_Home_average_score,
"EPL_Home_average_conc":EPL_Home_average_conc,
"EPL_Away_average_score":EPL_Away_average_score,
"EPL_Away_average_conc":EPL_Away_average_conc},
columns =['Home_attacking','Home_def','Away_attacking','Away_def',
'EPL_Home_average_score','EPL_Home_average_conc',
'EPL_Away_average_score','EPL_Away_average_conc'])
#column from index
Data = Data.rename_axis('Team').reset_index()
print (Data)
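To see the mechanism in isolation, here is a small sketch with made-up numbers (the two teams and their values are hypothetical, not taken from the EPL file). It shows that a dictionary containing only scalars triggers the ValueError, while scalars mixed with Series are broadcast across the Series index.

```python
import pandas as pd

# Hypothetical per-team values, indexed by team name.
home_attack = pd.Series({'Arsenal': 1.2, 'Chelsea': 0.9})
away_attack = pd.Series({'Arsenal': 0.8, 'Chelsea': 1.1})
league_avg = 1.5  # a scalar, playing the role of EPL_Home_average_score

# Only scalars: pandas has no index to build rows from, so it raises.
try:
    pd.DataFrame({'x': league_avg, 'y': league_avg})
    raised = False
except ValueError:
    raised = True

# Series plus a scalar: the scalar is repeated for every team in the index.
data = pd.DataFrame({'Home_attacking': home_attack,
                     'Away_attacking': away_attack,
                     'EPL_Home_average_score': league_avg})
# Turn the team index into an ordinary column named Team.
data = data.rename_axis('Team').reset_index()
print(data)
```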

Daily average function that creates figures out of excel data

I need to write a function that creates plot figures from Excel data.
The data I have are hourly, and the plots I need are of the hourly data, the daily average and the monthly average. I wrote some code, but unfortunately it isn't working.
Code:
function makegraph
    [a,b] = xlsread('C:\..matlabdata.xlsx'); %read data
    lengte = size(b);
    datumstring = b(2:end,:);
    formatin = 'mm/dd/yyyy' ;
    Datums= datenum(datumstring(:,1),formatin);
    [~,j] = size(a);
    timeplusphysicalflow = [datums, a(:,4)];
    nrDates = size(datums,1);
    subplot(3,1,1), plot(datums,a(1:nrDates,4)); %plot of the hourly flow
    datetick('x','mm/yy')
    u = unique(datums); %find the unique dates
    [YU,MU,DU] = datevec(u);
    CurrentYear = YU(1);
    CurrentMonth = MU(1);
    CurrentDay = DU(1);
    nrMonth = unique(strcat(num2str(YU) ,num2str(MU)));
    nrDays = unique(strcat(num2str(YU) ,num2str(MU), num2str(DU)));
    AverageMonths = zeros(length(nrMonth),1);
    for k=1:length(nrMonth)
        index = (MU == CurrentMonth & YU == CurrentYear);
        AverageMonth(k) = mean(timeplusphysicalflow(index,2));
        AverageMonths(k) = [AverageMonth(k)];
    end
    AverageDays = zeros(length(nrDays),1);
    for o=1:length(nrDays)
        index = (MU == CurrentMonth & YU == CurrentYear & DU == CurrentDay);
        AverageDay(o) = mean(timeplusphysicalflow(index,2));
        AverageDays(o) = [AverageDay(o)];
    end
    subplot(3,1,2), plot(unique(strcat(num2str(YU),num2str(MU)),AverageMonths));
    subplot(3,1,3), plot(unique(strcat(num2str(YU),num2str(MU), num2str(DU))));
end
Thanks in advance
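In both loops above, CurrentYear, CurrentMonth and CurrentDay never change, so every iteration averages the same period. The grouping logic the loops seem to be aiming for can be sketched in Python as follows (plotting omitted). The sample data and the `average_by` helper are hypothetical; this illustrates the idea and is not a translation of the MATLAB function.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical hourly samples (timestamp, flow): 48 hours starting 31 Jan 2020.
start = datetime(2020, 1, 31, 0)
samples = [(start + timedelta(hours=h), float(h)) for h in range(48)]


def average_by(samples, key):
    """Group the values by key(timestamp) and average each group."""
    groups = defaultdict(list)
    for stamp, value in samples:
        groups[key(stamp)].append(value)
    return {k: mean(v) for k, v in sorted(groups.items())}


# The grouping key changes with every sample, unlike a fixed CurrentMonth.
daily = average_by(samples, lambda t: (t.year, t.month, t.day))
monthly = average_by(samples, lambda t: (t.year, t.month))
```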

Resources