Adding numbers from a string - python-3.x

So, my problem is that I can get the numbers to print individually, but I can't get them to add together into a single total. Right now, I am using the following code:
if command == 4:
    total = 0
    for item in items:
        number = item.split()[-1]
        total += float(number)
        print(total)
and my output ends up being:
2.99
1.87
when I would just want:
4.86
It's pretty simple code, and I feel like I'm missing something fairly obvious, but for the life of me I can't figure it out. Not sure why this is printing all weird, but here is a picture of the code as well (partial).
The list of items is populated from user input; here is a picture of the rest of the code I am using (full code). I have been entering the items as 'socks, 2.99', if that helps clarify things.

You see one number per line because you're printing the total inside the loop, so it runs on every iteration; move print(total) outside the loop and it will print once, after everything has been added. If you're interested, you can also do the sum in one line (splitting off the price the same way your loop does):
total = sum(float(item.split()[-1]) for item in items)
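For completeness, here is a minimal sketch of the corrected loop, assuming each entry looks like 'socks, 2.99' as described in the question (the sample list is made up for illustration):
items = ["socks, 2.99", "gloves, 1.87"]   # sample data in the format described
command = 4

if command == 4:
    total = 0
    for item in items:
        number = item.split()[-1]   # take the trailing price token, e.g. "2.99"
        total += float(number)
    print(total)                    # printed once, after the loop has finished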


Python Warning Pandas Dataframe "Simple Issue!" - "A value is trying to be set on a copy of a slice from a DataFrame"

first post / total Python novice so be patient with my slow understanding!
I have a dataframe containing a list of transactions by order of transaction date.
I've appended an additional new field/column called ["DB/CR"] that, depending on the presence of "-" in the ["Amount"] field, is populated with 'Debit', or with 'Credit' when "-" is absent.
Since the transactions are in date order, I've included another new field/column called [Top x]. I want to populate it with an incremental number (starting at 1), maintained independently for debits and for credits.
To do this, I have created a simple loop with an associated 'if' / 'elif' (it could probably be an 'else', as the choice is binary) that walks the DataFrame from row 0 to the last row and, depending on whether the row is a "Debit" or a "Credit", increments the corresponding counter: integer 'i' for debits and integer 'ii' for credits.
The code works as expected in terms of output of the 'Top x'; however, I always receive a warning "A value is trying to be set on a copy of a slice from a DataFrame".
Trying to perfect my script so it runs without warnings, I've been trying to understand what I'm doing incorrectly, but I can't see it in terms of my use case.
I'd appreciate it if someone could kindly shed light on this and propose how the code should be refactored to avoid the warning.
Code (the df source data is an imported csv):
# top x debits/credits
i = 0
ii = 0
for ind in df.index:
    if df["DB/CR"][ind] == "Debit":
        i = i + 1
        df["Top x"][ind] = i
    elif df["DB/CR"][ind] == "Credit":
        ii = ii + 1
        df["Top x"][ind] = ii
Interpreter output:
G:\Finances Backup\venv\Statementsv.03.py:173: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Top x"][ind] = i
  df["Top x"][ind] = ii
Many thanks :)
You should use .loc for the assignment, i.e. df.loc[ind, "Top x"] = i (and likewise with ii), instead of the chained indexing df["Top x"][ind] = i that triggers the warning.
Use iterrows() to iterate over the DataFrame. However, updating a DataFrame while iterating over it is not recommended.
Refer to the documentation on iterrows() here, which notes:
You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
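Putting the two answers together, here is a minimal sketch of the loop rewritten with .loc so the assignment writes into the original DataFrame (the sample data is made up; column names follow the question):
import pandas as pd

# Hypothetical sample data shaped like the question describes
df = pd.DataFrame({"Amount": ["-10.00", "25.00", "-3.50", "7.00"]})
df["DB/CR"] = ["Debit" if "-" in a else "Credit" for a in df["Amount"]]
df["Top x"] = 0

i = 0   # running count of debits
ii = 0  # running count of credits
for ind in df.index:
    if df.loc[ind, "DB/CR"] == "Debit":
        i += 1
        df.loc[ind, "Top x"] = i    # .loc assigns into df itself, avoiding the chained-indexing warning
    else:
        ii += 1
        df.loc[ind, "Top x"] = ii

print(df)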

What built-in function can I use to get the last 3 elements in a list in Python

I want to get the last 3 items or elements in a list, and make sure my code works no matter how many items are in the list.
sports_goods=["bat","ball","base_ball","hockey","basket_ball"]
I have tried to use splice(:-1) and also sports_goods[-1], but couldn't get it working.
sports_goods=[bat,ball,base_ball,hockey,basket_ball]
My code is
sports_goods.splice(-3:-1)
And
sports_goods[-1]
Expected result:
The last 3 elements which are base_ball, hockey and basket_ball.
Actual results:
It shows an error, a NameError to be specific.
You should at least post valid Python code: your list isn't valid, since the words are not wrapped in quotes to make them strings. However, once you have a valid list you can simply select the elements from the third-last index to the end, like so:
sports_goods=['bat', 'ball', 'base_ball', 'hockey', 'basket_ball']
print('the last 3 elements are: ', ", ".join(sports_goods[-3:]))
OUTPUT
the last 3 elements are: base_ball, hockey, basket_ball
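On the point about the code working no matter how many items are in the list: a slice with a negative start never raises an error, it just returns whatever is available. A quick illustrative check:
short_list = ["bat", "ball"]
print(short_list[-3:])   # ['bat', 'ball'] -- fewer than 3 items, still no error
print(short_list[-1])    # 'ball' -- indexing returns a single element, not a sublist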

FuzzyWuzzy for very similar records in Python

I have a dataset with which I want to find the closest string match. For that purpose I'm using FuzzyWuzzy in this way
sol=process.extract(t,dev2,scorer=fuzz.token_sort_ratio)
Where t is the string and dev2 is the list to compare against. My problem is that sometimes the dataset has very similar records, and the options provided by FuzzyWuzzy seem to be lacking. I've tested with token_sort, token_set, partial_token_sort, partial_token_set, ratio, partial_ratio, and WRatio.
For example, the string Italy - Serie A gives me the following 2 closest matches.
Token_sort_ratio: (92, 'Italy - Serie D');(86, 'Italian - Serie A')
The one I want is obviously the second, but character by character the first is closer, and it is a different league.
This happens with teams as well. If, say, I have the string Buchtholz, I obtain Buchtholz II before I get TSV Buchtholz.
My main guess now would be to weight the presence or absence of certain characters more heavily, such as single capital letters at the end of the string, so that a difference or absence there counts as less close; the same could go for () and other special characters.
I don't know if there is a way to take this into account, or whether you have a better approach to get the string that really matches.
Similarity matches often require knowledge of the data being analysed. i.e. it is not just a blind single round of matching. I recommend that you pass your results through more steps of matching, starting with inclusive/optimistic approaches (like token_set_ratio) with low cut off scores and working toward more exclusive/pessimistic approaches with higher cut off scores until you have a clear winner. If you know more about the text you're analyzing, you can even modify the strings as you progress.
In a case I worked on, I did similarity matches of goods movement descriptions. In the descriptions the numbers sequences were more important than the text. e.g. when looking for a match for "SLURRY VALVE 250MM RAGMAX 2000" the 250 and 2000 part of the string are important, otherwise I get a "SLURRY VALVE 50MM RAGMAX 2000" as the best match instead of "VALVE B/F 250MM,RAGMAX 250RAG2000 RAGON" which is a better result.
I put the similarity match process through two steps: 1. get a bunch of similar matches using an optimistic matching scorer (token_set_ratio); 2. get the number sequences of these results and pass them through another round of matching with a stricter scorer (token_sort_ratio). Doing this gave me the better result in the example above.
Below are some blocks of code that could be of assistance.
Here's a function to get the numbers from a description. (In your case you might use this to exclude numbers from your string instead?)
def get_numbers_from_string(description):
    numbers = ''.join((ch if ch in '0123456789.-' else ' ') for ch in description)
    numbers = ' '.join([nr for nr in numbers.split()])
    return numbers
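Assuming the helper above has been defined, a quick check of what it returns for the descriptions quoted earlier (expected output shown in the comments):
print(get_numbers_from_string("SLURRY VALVE 250MM RAGMAX 2000"))
# -> 250 2000
print(get_numbers_from_string("VALVE B/F 250MM,RAGMAX 250RAG2000 RAGON"))
# -> 250 250 2000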
and here is a portion of the code I used to put the description match through two rounds:
try:
    # get close match from goods move that has material numbers
    df_material = pd.DataFrame(process.extract(description,
                                               corpus_material,
                                               scorer=fuzz.token_set_ratio),
                               columns=['Similar Text', 'Score'])
    if df_material['Score'][df_material['Score'] >= cut_off_accuracy_materials].count() >= 1:
        similar_text = df_material['Similar Text'].iloc[0]
        score = df_material['Score'].iloc[0]
        if nr_description_numbers > 4:
            # if there are multiple matches found, then get best number combination match
            df_material = df_material[df_material['Score'] >= cut_off_accuracy_materials]
            new_corpus = list(df_material['Similar Text'])
            new_corpus = np.vectorize(get_numbers_from_string)(new_corpus)
            df_material['numbers'] = new_corpus
            df_numbers = pd.DataFrame(process.extract(description_numbers,
                                                      new_corpus,
                                                      scorer=fuzz.token_sort_ratio),
                                      columns=['numbers', 'Score'])
            similar_text = df_material['Similar Text'][df_material['numbers'] == df_numbers['numbers'].iloc[0]].iloc[0]
            nr_score = df_numbers['Score'].iloc[0]
hope it helps, and good luck
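For readers who want something runnable end to end, here is a minimal, self-contained sketch of the same two-round idea; the corpus, the 60-point cut-off, and the helper are illustrative assumptions rather than the answerer's exact pipeline, and real data will need its own tuning:
from fuzzywuzzy import fuzz, process

def get_numbers_from_string(description):
    # keep digits, '.' and '-'; collapse everything else to single spaces
    return ' '.join(''.join(ch if ch in '0123456789.-' else ' ' for ch in description).split())

query = 'SLURRY VALVE 250MM RAGMAX 2000'
corpus = ['SLURRY VALVE 50MM RAGMAX 2000',
          'VALVE B/F 250MM,RAGMAX 250RAG2000 RAGON',
          'GATE VALVE 100MM']

# Round 1: optimistic scorer with a low cut-off to collect plausible candidates
candidates = [text for text, score in
              process.extract(query, corpus, scorer=fuzz.token_set_ratio, limit=10)
              if score >= 60]

# Round 2: re-rank the surviving candidates on their number sequences with a stricter scorer
numbers_to_text = {get_numbers_from_string(c): c for c in candidates}
best_numbers, best_score = process.extractOne(get_numbers_from_string(query),
                                              list(numbers_to_text),
                                              scorer=fuzz.token_sort_ratio)
print(numbers_to_text[best_numbers], best_score)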

MATLAB with csvread for a string (probably easy as I am a beginner)

Hey, I have a problem where there are two potential ways the input data for a function can be entered.
One way is through a CSV file containing the data, the other is directly through a matrix.
Here is my code to get around this:
function printRouteStats(routeData)
% displays route info based on data for mountain bike tracks
if isa(routeData,'string')
    gpsPoints = csvread(routeData); % fix this later as currently giving a string
else
    gpsPoints = routeData;
end
end
This works for say
printRouteStats([0 0 0; 0 300 10; 100 300 20; 100 500 -10; 400 500 -10])
but for printRouteStats('exampledata.csv') it gives the output of exampledata.csv as a string, rather than producing the desired matrix.
Any suggestions?
Thanks
Sorry, I've only been learning MATLAB for two weeks, so it's probably a really simple question for everyone here.
class(routeData) is char, not string, so the isa(routeData,'string') test fails and the file name falls through to the else branch. To check for it, use isa(routeData,'char') or ischar(routeData).

fminsearch is overwriting in loop while storing data

I have the code below in MATLAB. I want to write all 162 rows and 4 columns of the calculated values into an Excel file.
When I use xlswrite in the code I get only one row and 4 columns, because the value of P gets overwritten in each iterative step.
If I use another loop inside the for loop, the execution time increases drastically. Please help me at least write the values of P into an array that I can later write to an Excel file (when I tried, the error 'In an assignment A(I) = B, the number of elements in B and I must be the same' appeared).
function FitSMC_BC
clc
% Parameters: P(1)=theta_S; P(2)=theta_r; P(3)=psib; P(4)=lamda;
smcdata=xlsread('asimdata');
nn=length(smcdata)-1;
for i=1:nn
    psi=smcdata(:,1);
    thetaObs=smcdata(:,i+1);
    % Make an initial guess:
    Pini=[0.5 0.1 -1 1.5];
    P=fminsearch(@ObFun,Pini,[],psi,thetaObs);
    disp(['result',num2str(i),': P=',num2str(P)]);
    theta=Gettheta(P,psi);
end

function OF=ObFun(P,psi,thetaObs)
theta=Gettheta(P,psi);
OF=sqrt(mean((theta - thetaObs).^2));

function theta=Gettheta(P,psi)
SoilPars.theta_S=P(1);
SoilPars.theta_r=P(2);
SoilPars.psib=P(3);
SoilPars.lamda=P(4);
[theta]=thetaFun(psi,SoilPars);

function [theta]=thetaFun(psi,SoilPars)
theta_S=SoilPars.theta_S;
theta_r=SoilPars.theta_r;
psib=SoilPars.psib;
lamda=SoilPars.lamda;
theta=theta_r+((theta_S-theta_r)*((psib./psi).^lamda));
theta(psi>psib)=theta_S;
You can modify the P line to
P(i,:) = fminsearch(@ObFun,Pini,[],psi,thetaObs);
so that P stores each result (a 4-element vector) in a new row.
You may also initialise P before the for loop with P = nan(nn, 4);
Then write P to an Excel file using xlswrite.
I haven't studied your code in-depth, but as far as I can tell, you have two options:
Create a matrix P and use xlswrite on the entire matrix. This seems to me like the most reasonable approach.
Use xlswrite1 from the File Exchange in a loop. This will increase execution time a bit, but not nearly as much as using regular xlswrite, as it is specially designed to be used inside loops. The reason it is so much faster is that it only opens and closes the Excel file once, whereas regular xlswrite opens and closes it every time you call the function.
You seem to know how to use indexing, so I'm not sure why you're not simply doing something like this:
P = zeros(4,nn);   % one column of 4 fitted parameters per data set
for i=1:nn
    ...
    P(:,i) = fminsearch(@ObFun,Pini,[],psi,thetaObs);
    disp(['result',num2str(i),': P=',num2str(P(:,i))]);
    theta = Gettheta(P(:,i),psi); % Why is this here? Are you writing it to file too?
end
xlswrite('My_FileName.xls',P);
Or you could call xlswrite on each iteration of the loop (probably slower) and append the new data using something like this:
for i=1:nn
    ...
    P = fminsearch(@ObFun,Pini,[],psi,thetaObs);
    disp(['result',num2str(i),': P=',num2str(P)]);
    theta = Gettheta(P,psi); % Why is this here? Are you writing it to file too?
    xlswrite('My_FileName.xls',P,1,['A' int2str((i-1)*size(P,2)+1)]);
end
Of course your code isn't runnable so you'll have to debug any other little errors. Also, since smcdata seems to be a matrix rather than a vector, you should be careful using length with it. You probably should use size.
