Extract value from column with pandas lib (data frame) - python-3.x

Original data frame:
| Date | Detail |
| 31/03/22 | I watch Netflix at home with my family 4 hours |
| 01/04/22 | I walk to the market for 3km and I spent 11.54 dollar |
| 02/04/22 | my dog bite me, I go to hospital, spend 29.99 dollar |
| 03/04/22 | I bought a game on steam 7 games spen 19.23 dollar |
Result data frame:
| Date | Detail | Cost |
| 31/03/22 | I watch Netflix at home with my family 4 hours | 0 |
| 01/04/22 | I walk to the market for 3km and I spent 11.54 dollar | 11.54 |
| 02/04/22 | my dog bite me, I go to hospital, spend 29.99 dollar | 29.99 |
| 03/04/22 | I bought a game on steam 7 games spen 19.23 dollar | 19.23 |
My question:
If the Detail column does not contain a substring that begins with "sp" and ends with "dollar", the value in the Cost column should be zero.
If the Detail column does contain such a substring, the value in the Cost column should be the number in the middle of that substring.
I tried to use a regex, but it returns the first integer found in the column, like this:
| 01/04/22 | I walk to the market for 3km and I spent 11.54 dollar | 3 |

You should be able to use a regex pattern of a form such as:
df['Cost'] = df['Detail'].str.extract(r'sp\D*([\d\.]*)\D*dollar')
This will look for the literal string sp and then any non-digit characters after it. The capture group (denoted by the ()) looks for any digits or period characters, representing the dollar amount. This is what is returned to the Cost column. The final part of the pattern allows any number of non-digit characters after the dollar amount, followed by the literal string dollar.
The pd.NA for rows which don't have a cost can then be replaced with 0:
df['Cost'] = df['Cost'].replace({pd.NA: 0})
If you want to make any enhancements I used this site to test the regex: https://regexr.com/6ir6o
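A minimal end-to-end sketch of the approach above, using the four sample rows from the question. Two things here are my assumptions rather than part of the answer: the pattern is a slightly stricter variant that requires at least one digit, and the result is converted with pd.to_numeric and fillna(0) so that Cost ends up as a numeric column instead of a column of strings.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["31/03/22", "01/04/22", "02/04/22", "03/04/22"],
    "Detail": [
        "I watch Netflix at home with my family 4 hours",
        "I walk to the market for 3km and I spent 11.54 dollar",
        "my dog bite me, I go to hospital, spend 29.99 dollar",
        "I bought a game on steam 7 games spen 19.23 dollar",
    ],
})

# "sp", any non-digits, the amount (captured), any non-digits, "dollar"
pattern = r"sp\D*(\d+(?:\.\d+)?)\D*dollar"

# expand=False returns a Series; rows without a match are NaN
cost = df["Detail"].str.extract(pattern, expand=False)

# Convert the captured strings to floats and use 0 where nothing matched
df["Cost"] = pd.to_numeric(cost).fillna(0)
print(df[["Date", "Cost"]])
```

Note that the "3" in "3km" is skipped because the pattern only starts capturing after "sp".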

Related

Excel Formula Extract any number greater than x characters from a string

I have a file which contains a list of data. In each cell there is a name, a number, and a date. The date is either mm/yy, mm-yy, mm-yyyy, etc. (never the day, just month and year).
The number I need is always going to be longer than 5 characters. Is there a way that I can get just the number from the string?
xx company holding - 96923432 -02-22. (number required 96923432)
yy Company (HOLDINGS) LTD - 131002204 - 02/2023 (number required 131002204)
ab HOLDINGS LIMITED / 115472907 / Feb-23 (number required 115472907)
... prior removed
=========UPDATE=========
This formula will work for you, which splits your data by space, then converts to a number and then extracts the max. Adjust as needed if you have occasions where you may not have a number greater than 5 by wrapping with an IF().
=MAX(IFERROR(NUMBERVALUE(TEXTSPLIT(A2," ")),0))
This is interesting since you use 2 different delimiters. However, no worries: you can use the following to capture both instances. If you have more possible delimiters, just add them between the {} in both the TEXTBEFORE and TEXTAFTER functions. Here is an example of the formula:
=TEXTBEFORE(TEXTAFTER(A2, {"-","/"}), {"-","/"})
This should work for you if you want to return nothing when the output is not longer than 5 characters:
=IF(LEN(TEXTBEFORE(TEXTAFTER(A1,{"-","/"}),{"-","/"}))>5,TEXTBEFORE(TEXTAFTER(A1,{"-","/"}),{"-","/"}),"")
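As a cross-check outside Excel, the same extraction can be sketched in Python. This assumes the wanted number is the only run of more than five consecutive digits in the cell. The function name extract_long_number is hypothetical.

```python
import re

def extract_long_number(text, min_digits=6):
    """Return the first run of at least min_digits digits, or None."""
    match = re.search(r"\d{%d,}" % min_digits, text)
    return match.group(0) if match else None

samples = [
    "xx company holding - 96923432 -02-22",
    "yy Company (HOLDINGS) LTD - 131002204 - 02/2023",
    "ab HOLDINGS LIMITED / 115472907 / Feb-23",
]
for s in samples:
    print(extract_long_number(s))
```

The date parts never reach six consecutive digits in these samples, so they are ignored.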

Haskell: Exercise problem (Convert Currencies that consist of two separate Integers)

So I have three datatypes: Euro, Dollar and Yen. The datatype Currency is one of those.
data Euro = MkEuro Integer Integer
data Dollar = MkDollar Integer Integer
data Yen = MkYen Integer
data Currency = MkE Euro | MkD Dollar | MkY Yen
Now I want to convert, for example, Dollar to Euro. Let's say 1 Dollar is 0.90 Euro.
I really don't know how to implement that in Haskell. I need a function toEuro that takes a Currency, converts it into Euro, and returns it as a Currency as well. The problem is that, for example, dollars and cents are split into two separate Integers, and I am not allowed to use any split or connection functions (if there even are any). I have no idea how to calculate with two separate Integers. Let's say I have 12,20 Dollars and I want it as 10,98 Euros. How do I get it into Euros if 1 Dollar is 0.90 Euro? So I need 12 20 to become 10 98. I just don't see it.
"I am not allowed to use any split or connection functions (if there even are any)."
It's not clear what you mean by that. I strongly suspect that you're supposed to use pattern matching. Joseph's comment is fine, and possibly helpful, but it sounds like the thing you're missing is how to get the integers you need out of the Currency. Try completing this fragment:
toEuro :: Currency -> Currency
toEuro (MkE e) = MkE e
toEuro (MkD (MkDollar d c)) = let usCents = (100 * d) + c
                              in MkE (MkEuro ... ...)
...
Protips:
That last ellipsis isn't a mistake, there's a whole line missing.
The first pattern seems awkward; we didn't unpack e into MkEuro eE eC, so why did we have to unpack (MkE e)? The answer is because we had to check that it was actually a Euro; obviously we couldn't just write toEuro e = e. But a "better" compromise may have been to use an "as" pattern: toEuro e@(MkE _) = e.
You suggested using 0.9 as a conversion factor; it seems inevitable that you'll want that to be an argument to your function. It should be your first argument; in Haskell your "subject" argument, the most "data-like" argument, should always go last. (Configuration arguments come first.) But it's more complicated than that because you also have to worry about Yen. I don't know how you're going to want to handle that...
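The arithmetic behind the usCents hint can be checked independently of Haskell. Here is a sketch in Python that leaves the Haskell fragment for you to complete. It assumes the rate of 0.90 from the question, expressed as 90 euro cents per dollar, and it assumes fractional cents are simply truncated. Both the function name and the truncation rule are my assumptions.

```python
def dollars_to_euros(dollars, cents, euro_cents_per_dollar=90):
    # Work in the smallest unit so only integer arithmetic is needed
    us_cents = 100 * dollars + cents
    # Each dollar is worth 90 euro cents, so scale and drop fractional cents
    euro_cents = us_cents * euro_cents_per_dollar // 100
    # Split back into whole euros and remaining cents
    return divmod(euro_cents, 100)

print(dollars_to_euros(12, 20))  # (10, 98)
```

Combining the two Integers into one amount of cents, converting, and splitting again with division and remainder is the whole trick; no string splitting is involved.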

Search and get row from large single string

Hi, I have a single large string, and I need to search it for a set of strings, get the matching rows, and create a data frame from those rows.
large String:
This is democracy’s day.
A day of history and hope.
Of renewal and resolve.
Through a crucible for the ages America has been tested anew and America has risen to the challenge.
Today, we celebrate the triumph not of a candidate, but of a cause, the cause of democracy.
The will of the people has been heard and the will of the people has been heeded.
We have learned again that democracy is precious.
Now I want to search for a few strings in the text above,
and my final output dataframe should look like this:
Searching string
democracy’s day
America has been tested
celebrate the triumph
democracy is precious
Thanks in advance
You can create a regex out of your search strings and compare them for a match against the Large String column using extract. Where there's a match, the match string will be the value in the Searching String column, otherwise it will be null. The dataframe can then be filtered on the Searching String value being not null:
import re
import pandas as pd
df = pd.DataFrame({ 'Large String': ["This is democracy's day.", "A day of history and hope.","Of renewal and resolve.","Through a crucible for the ages America has been tested anew and America has risen to the challenge.","Today, we celebrate the triumph not of a candidate, but of a cause, the cause of democracy.","The will of the people has been heard and the will of the people has been heeded.","We have learned again that democracy is precious."] })
search_strings = ["democracy's day", "America has been tested", "celebrate the triumph", "democracy is precious"]
regex = '|'.join(map(re.escape, search_strings))
df['Searching String'] = df['Large String'].str.extract(f'({regex})')
df = df[~df['Searching String'].isna()]
print(df)
Output:
Large String Searching String
0 This is democracy's day. democracy's day
3 Through a crucible for the ages America has be... America has been tested
4 Today, we celebrate the triumph not of a candi... celebrate the triumph
6 We have learned again that democracy is precious. democracy is precious
Note:
We use re.escape on the search strings in case they contain characters that are special in a regex, e.g. . or ( etc.
If one of the search strings is a substring of another, the list should be sorted in order of decreasing length to ensure the longer matches are captured.
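The second note can be illustrated with plain re, which follows the same alternation rules as the pattern passed to str.extract. The two search strings below are hypothetical and chosen so that one is a prefix of the other.

```python
import re

text = "We have learned again that democracy is precious."
search_strings = ["democracy", "democracy is precious"]

# Unsorted: the shorter alternative comes first and wins
unsorted_regex = "|".join(map(re.escape, search_strings))
print(re.search(unsorted_regex, text).group(0))

# Longest first, so the longer phrase is tried before its prefix
by_length = sorted(search_strings, key=len, reverse=True)
sorted_regex = "|".join(map(re.escape, by_length))
print(re.search(sorted_regex, text).group(0))
```

The first print shows only "democracy"; after sorting, the full phrase "democracy is precious" is matched.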

Excel: How to find six different combinations of words in string?

I have been working for several days on this and have researched everything looking for this answer. I'd appreciate any help you can give.
In Excel I am searching a string of text in column A:
Bought 1 HD Sep 3 2021 325.0 Call # 2.75
I am detecting the first word (in this case "Bought") and detecting the last word before "#" symbol (in this case "Call").
I am then detecting the price following the "#" symbol (in this case "2.75"). This number will go into column B (header "Open") or column C (header "Close") depending on the combination of words found:
Sold/Put=Close
Sold/Call=Open
Bought/Put=Open
Bought/Call=Close
Sold (by itself)=Open
Sold (by itself)=Close.
Bought 1 HD Sep 3 2021 325.0 Call # 2.75
The combination found in the above string is: "Bought Call". Therefore the number at the end ("2.75"), goes into "Open" column.
Here's another example:
Sold 4 AI Sep 17 2021 50.0 Put # 1.5
The combination found in the above string is: "Sold Put". Therefore the number at the end ("1.5") goes into "Close" column.
I am currently using this formula to determine if the string contains "Sold" and "Call" and get the desired number and it does work:
=IF(AND(
ISNUMBER(SEARCH({"Sold","Call"},A10))),
TRIM(MID(A10,SEARCH("#",A10)+LEN("#"),255))," ")
But, I don't know how to search for all the other possible combinations.
The point behind this is to be able to paste the transaction from the broker and have most of the entry process automated. I'm sure many will benefit from this as I've not found anything like this.
I'd appreciate any help and if possible, an explanation of the formula so I can better learn.
Thanks!
I think you have the right idea, but would just extend the IF statement.
Something like the below might work for you:
=IF(ISNUMBER(SEARCH("Call", $A1)),
IF(ISNUMBER(SEARCH({"Bought","Sold"}, $A1)),
NUMBERVALUE(RIGHT($A1, LEN($A1)-SEARCH("#", $A1))),""),
IF(ISNUMBER(SEARCH({"!!!","!!!","Bought","Sold"}, $A1)),
NUMBERVALUE(RIGHT($A1, LEN($A1)-SEARCH("#", $A1))),""))
Just enter in column B and drag down; columns B through E should fill as needed.
Note that the search for "!!!" is just random characters, it can be anything that you don't think has a good chance of appearing in the string.
Here/screenshots refer:
(requires Office 365 compatible version Excel)
Main lookup
=LET(fn_1,MATCH("*"&$H$7:$H$12&"*",B4,0),fn_2,MATCH("*"&$I$7:$I$12&"*",B4,0),IFERROR(INDEX($J$7:$J$12,MATCH(1,IF($I$7:$I$12="",fn_1*ISNUMBER(fn_2),fn_1*fn_2),0)),))
EDIT:
Other Excel versions:
=IFERROR(INDEX($J$7:$J$12,MATCH(1,IF($I$7:$I$12="",MATCH("*"&$H$7:$H$12&"*",B4,0)*ISNUMBER(MATCH("*"&$I$7:$I$12&"*",B4,0)),MATCH("*"&$H$7:$H$12&"*",B4,0)*MATCH("*"&$I$7:$I$12&"*",B4,0)),0)),)
(all that falls away is the 'Let' formula, replacing fn_1 and fn_2 with respective functions in index formula within the let making first equation somewhat longer, but otherwise identical)
Example applications
Have provided 2 examples of how one might customize to insert numeric in one of the columns (the key part to this question is really how to do lookup in first instance, from thereon it's a matter of finetuning/taking appropriate action)...
Assuming calls/buys are "long" position and strike price go in first col (here, D), and puts/sales are "short" position with strike price going in 2nd col (here, E):
Long - insert strike price col D
=IF(LET(fn_1,MATCH("*"&$H$7:$H$12&"*",B4,0),fn_2,MATCH("*"&$I$7:$I$12&"*",B4,0),IFERROR(INDEX($K$7:$K$12,MATCH(1,IF($I$7:$I$12="",fn_1*ISNUMBER(fn_2),fn_1*fn_2),0)),))=1,MID(SUBSTITUTE(B4," ",""),SEARCH("#",SUBSTITUTE(B4," ",""))+1,LEN(SUBSTITUTE(B4," ",""))),"")
EDIT
Other Excel versions:
=IF(IFERROR(INDEX($K$7:$K$12,MATCH(1,IF($I$7:$I$12="",MATCH("*"&$H$7:$H$12&"*",B4,0)*ISNUMBER(MATCH("*"&$I$7:$I$12&"*",B4,0)),MATCH("*"&$H$7:$H$12&"*",B4,0)*MATCH("*"&$I$7:$I$12&"*",B4,0)),0)),)=1,MID(SUBSTITUTE(B4," ",""),SEARCH("#",SUBSTITUTE(B4," ",""))+1,LEN(SUBSTITUTE(B4," ",""))),"")
Short - insert strike price col E
=IF(LET(fn_1,MATCH("*"&$H$7:$H$12&"*",B4,0),fn_2,MATCH("*"&$I$7:$I$12&"*",B4,0),IFERROR(INDEX($K$7:$K$12,MATCH(1,IF($I$7:$I$12="",fn_1*ISNUMBER(fn_2),fn_1*fn_2),0)),))=2,MID(SUBSTITUTE(B4," ",""),SEARCH("#",SUBSTITUTE(B4," ",""))+1,LEN(SUBSTITUTE(B4," ",""))),"")
EDIT
Other Excel versions:
Follow same routine in previous Edits (remove Let, replace fn_1 & fn_2 with respective formulae...)
Note the similarity in all 3 equations above: the 2nd and 3rd contain the 1st (effectively they just wrap a big old 'if' statement around the 1st, use the lookup_2 col (here, col K), and use mid/search to extract the rate after the hashtag).
Assumes you don't have other hashtags in the sentence..
Customize as required.
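For anyone who wants to test the parsing step outside Excel, here is a rough Python sketch of the three pieces the question describes: the first word, the last word before "#", and the price after "#". It assumes exactly one "#" per line, as noted above. The function name parse_transaction is hypothetical, and the Open/Close lookup is deliberately left out.

```python
def parse_transaction(text):
    # Split once at "#": description on the left, price on the right
    before, _, after = text.partition("#")
    words = before.split()
    # First word (e.g. Bought/Sold), last word before "#", and the price
    return words[0], words[-1], float(after.strip())

print(parse_transaction("Bought 1 HD Sep 3 2021 325.0 Call # 2.75"))
print(parse_transaction("Sold 4 AI Sep 17 2021 50.0 Put # 1.5"))
```

The returned pair of words is what the lookup table in the formulas above would be matched against.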

Want to calculate capital

I want to save a certain amount of money into savings account at the rate of 5% for 10 years. I need $10000.00 at the end of the 10 years.
The task is to find the capital, i.e. the initial investment.
futureValue=(10000)
interestRate=(0.05)
depositePeriod=(10)
capital=futureValue/(1+interestRate)*depositePeriod
print(capital)
I get 95238.xxxxx, but I expect something like 952.00,
because the capital should not be more than the amount at the end of the period.
This is more of a math problem than a coding one, but this for loop would give you the answer you're looking for:
futureValue=(10000)
interestRate=(0.05)
depositePeriod=(10)
capital = futureValue
# Discount the target amount by one year of interest, once per year
for no_of_years in range(depositePeriod):
    capital = capital/(1+interestRate)
print(capital)
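The loop divides by (1 + interestRate) once per year, which is the same as dividing by (1 + interestRate) raised to the number of years. The formula in the question multiplies by the period where that exponent is needed. A closed-form sketch, assuming interest is compounded once a year (the snake_case variable names are mine):

```python
future_value = 10000
interest_rate = 0.05
deposit_period = 10

# Present value: discount the target by (1 + rate) once per year
capital = future_value / (1 + interest_rate) ** deposit_period
print(round(capital, 2))  # 6139.13
```

So roughly 6139.13 has to be deposited today to reach 10000 after 10 years at 5%.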
