Extract 2 columns out of a Dataframe - python-3.x

I have a very simple problem, I guess.
I have loaded a CSV file into Python of the form:
Date        Time
18.07.2018  12:00 AM
18.07.2018  12:30 AM
...         ...
19.07.2018  12:00 AM
19.07.2018  12:30 AM
...         ...
I basically just want to extract all rows with the date 18.07.2018 and the single row from 19.07.2018 at 12:00 AM, to calculate some statistical measures from the data.
My current code (Klimadaten is the name of the DataFrame):
Klimadaten = pd.read_csv ("Klimadaten_18-20-July.csv")
Day_1 = Klimadaten[Klimadaten.Date == "18.07.2018"]
I guess it could be solved with something like an if statement?
I have only a very basic knowledge of Python, but I'm willing to learn the necessary steps. I'm currently writing my bachelor's thesis with simulated climate data, and I will have to perform statistical tests and work with a lot of data, so maybe someone could also tell me which concepts I should look into further (I have access to an online Python course but will not have the time to watch all lessons).
Thanks in advance.

From what I understand, you want to keep only the data recorded on 18.07.2018. The example below prints a row's date and time only if the date equals 18.07.2018. By changing the row and column indices you can search along an entire column or an entire row (depending on how your file is laid out).
for i in range(len(Klimadaten)):
    # column 0 holds the date, column 1 the time
    date = Klimadaten.values[i][0]
    if date == "18.07.2018":
        print(date)
        element = Klimadaten.values[i][1]
        print(element)
Hope I was helpful.

Klimadaten = pd.read_csv("Klimadaten_18-20-July.csv")
Day_1 = Klimadaten[(Klimadaten.Date == "18.07.2018") | ((Klimadaten.Date == "19.07.2018") & (Klimadaten.Time == "12:00 AM"))]
Basically, what it means is: "bring me all the rows where the date is 18.07.2018 OR (the date is 19.07.2018 AND the time is 12:00 AM)". Each comparison needs its own parentheses, because | and & bind more tightly than ==. You can construct more complex queries the same way :)

With the help of the pandas documentation I figured out the right syntax for my problem:
Day_1 = Klimadaten[(Klimadaten["Date"] == "18.07.2018") | (Klimadaten["Date"] == "19.07.2018") & (Klimadaten["Time"] == "12:00:00 AM")]
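For anyone who wants to try the mask approach without the original file, here is a minimal sketch. The small DataFrame and its Temp column are made-up stand-ins for Klimadaten_18-20-July.csv, and the time strings follow the "12:00 AM" form shown in the question's table (adjust them if your file stores "12:00:00 AM").

```python
import pandas as pd

# Hypothetical sample standing in for Klimadaten_18-20-July.csv
klimadaten = pd.DataFrame({
    "Date": ["18.07.2018", "18.07.2018", "19.07.2018", "19.07.2018"],
    "Time": ["12:00 AM", "12:30 AM", "12:00 AM", "12:30 AM"],
    "Temp": [17.5, 17.1, 16.8, 16.4],  # assumed measurement column
})

# Build each condition as a named boolean Series, then combine them
is_day_1 = klimadaten["Date"] == "18.07.2018"
is_next_midnight = (klimadaten["Date"] == "19.07.2018") & (klimadaten["Time"] == "12:00 AM")

day_1 = klimadaten[is_day_1 | is_next_midnight]
print(day_1)
print(day_1["Temp"].describe())  # basic statistics on the selected rows
```

Naming the two masks first keeps the final selection readable and avoids the operator-precedence trap of writing everything on one line.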

Related

Python: Convert time expressed in seconds to datetime for a series

I have a column of times expressed as seconds since Jan 1, 1990, that I need to convert to a DateTime. I can figure out how to do this for a constant (e.g. add 10 seconds), but not a series or column.
I eventually tried writing a loop to do this one row at a time (probably not the right way; I'm new to Python).
This code works for a single row:
from datetime import datetime, timedelta

def addSecs(secs):
    # seconds are counted from 1 January 1990
    fulldate = datetime(1990, 1, 1)
    fulldate = fulldate + timedelta(seconds=secs)
    return fulldate

b = addSecs(intag112['outTags_1_2'].iloc[1])
print(b)
2018-06-20 01:05:13
Does anyone know an easy way to do this for a whole column in a dataframe?
I tried this:
for i in range(len(intag112)):
    intag112['TransactionTime'].iloc[i] = addSecs(intag112['outTags_1_2'].iloc[i])
but it errored out.
If you want to do something with a column (Series) in a DataFrame, you can use the apply method, for example:
import datetime
# New column 'datetime' is created from old 'seconds'
df['datetime'] = df['seconds'].apply(lambda x: datetime.datetime.fromtimestamp(x))
Note that fromtimestamp counts from 1 January 1970 in local time, so for seconds since 1 January 1990 you would pass your own addSecs function to apply instead.
Check the documentation for more examples. Overall advice: try to think in terms of vectors (or Series) of values. Most operations in pandas can be done on an entire Series or even a DataFrame.
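As a sketch of the "think in Series" advice, pandas can also convert the whole column in one call via pd.to_datetime with its unit and origin parameters. The frame below is a made-up stand-in for intag112, and the third value was picked so that it reproduces the timestamp printed in the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame
intag112 = pd.DataFrame({"outTags_1_2": [0, 10, 898304713]})

# Interpret the values as seconds counted from 1 January 1990
intag112["TransactionTime"] = pd.to_datetime(
    intag112["outTags_1_2"], unit="s", origin=pd.Timestamp("1990-01-01")
)
print(intag112)
```

This avoids both the explicit loop and the per-row Python function call, which tends to matter once the column gets long.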

Excel: Calculate average time (duration), with criteria, between 2 columns

Breaking my head over this, time to look for help :(
I have a sheet with raw data, as illustrated below.
I'd like to calculate the average duration per TestName (column A) between the 2 timestamps (B and C) in another sheet.
How can I do this in 1 formula?
Note 1: The correct answer is (done manually)
test1 = 26:41:23
test2 = 08:23:10
Note 2: 1 formula please, without first adding an extra column to calculate the duration of each row
Note 3: I cannot change the format of the raw data
Note 4: ignore empty fields
Thank you!
Use a new Column D to calculate the difference between start and end date on each row:
=DATEVALUE(C2)+TIMEVALUE(C2) - (DATEVALUE(B2)+TIMEVALUE(B2))
Next calculate the average on test1 and test2
=AVERAGEIFS($D:$D, $A:$A, "test1")
=AVERAGEIFS($D:$D, $A:$A, "test2")
Note that I'm using a comma as the argument separator; in some locales another separator, such as a semicolon, is needed to write the formulas properly.
Now format the result to display at least the days in addition to the time: "DD - hh:mm:ss". Going beyond 31 days is a bit difficult, as the month will start counting up. A custom format of [h]:mm:ss shows the total elapsed hours instead, e.g. 26:41:23.
If you don't like the formatting, go with the raw number format and extract the information through a bit of math. If it shows, for example, 1,5, it means one and a half days. I hope you can handle converting decimals to hours, minutes and seconds. MOD(ulo) and ROUNDDOWN are going to be your friends. :-)
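The decimal-to-time arithmetic mentioned above can be sketched in a few lines of Python. The function name is made up for illustration; it only mirrors the MOD/ROUNDDOWN idea and is not an Excel formula.

```python
def day_fraction_to_hms(days: float) -> str:
    """Convert a duration in days (Excel's raw number) to hh:mm:ss."""
    total_seconds = round(days * 86400)             # 86400 seconds per day
    hours, remainder = divmod(total_seconds, 3600)  # whole hours, like ROUNDDOWN
    minutes, seconds = divmod(remainder, 60)        # leftover, like MOD
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

print(day_fraction_to_hms(1.5))    # one and a half days
print(day_fraction_to_hms(0.375))  # 9/24 of a day
```

Hours are deliberately not wrapped at 24, so durations longer than a day stay readable as elapsed time.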

Vectorized version of df.apply(lambda x: x.value_counts())

I've got a dataframe with a somewhat large amount of time series balances in it. It looks something like
Run1 Run2 Run3 ... Run10000
2018 100 100 100 100
2019 101.2 99.2 101.0 ... 101.6
...
2038 142.2 151.3 102.7 ... 173.0
Essentially, I want to check how many runs dipped below a certain number, for example 90% of the starting balance.
Currently I am doing
((portfolio_values < starting_value*0.9).apply(lambda x: x.value_counts()).loc[True] > 0).value_counts().loc[True]
Sorry that one liner is pretty atrocious, but the idea is that it creates a mask based on whether a value in the table is below 90% of the starting value, then it goes through and does a count of True and False values. It then checks which of those columns has some non-zero number of True values (meaning yes, it did dip below 90%), then it counts up how many of those values are true.
The problem is that this is really slow, and I'm sure Pandas has some kind of function that does exactly what I'm looking for, as it normally does.
Thanks in advance!
Can you use:
(portfolio_values < starting_value*0.9).any().sum()
any returns True for each column in which the condition is met at least once; sum then counts those columns, or "runs" in your case.
Try this:
mask_df = portfolio_values < starting_value*0.9
# any() flags each run that dipped at least once; sum() counts those runs
result = mask_df.any().sum()
I tested it in a console on a dummy example and it appears to work.
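Here is a small self-contained check of the any().sum() idea. The three runs and their balances are invented for illustration; only the variable names portfolio_values and starting_value come from the question.

```python
import pandas as pd

starting_value = 100
# Hypothetical balances: one column per run, one row per year
portfolio_values = pd.DataFrame(
    {
        "Run1": [100, 101.2, 95.0],   # never below 90
        "Run2": [100, 89.5, 104.0],   # dips below 90 once
        "Run3": [100, 92.0, 88.0],    # ends below 90
    },
    index=[2018, 2019, 2020],
)

below = portfolio_values < starting_value * 0.9  # element-wise boolean mask
dipped = below.any()                             # one True/False per run (column)
n_dipped = int(dipped.sum())                     # True counts as 1
print(dipped)
print(n_dipped)
```

Both steps run column-wise inside pandas, so there is no Python-level lambda per column as in the original one-liner.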

Excel split cost figure between months with partial dates

I have a list of cost figures with start dates and end dates which I need to split between months. I have searched for a solution to this problem but cannot find one that works with partial months, e.g. (startdate: 01/01/2015, enddate: 15/04/2015, cost: 10000), which should give figures like Jan: 2857, Feb: 2857, Mar: 2857, Apr: 1429.
I have been trying to modify this example: http://www.excel-university.com/excel-formula-to-allocate-an-amount-into-monthly-columns/ but having no luck getting the partial months working.
Any suggestions or help would be most welcome. Thanks in Advance
If you calculate it on a daily basis, would that be OK? The result would be:
01.01.2015 01.02.2015 01.03.2015 15.04.2015
2.857,14 2.857,14 2.857,14 1.428,57
Your daily amount is:
=10.000/(DAYS360(startdate;enddate;TRUE)+1)
(be careful with the TRUE/FALSE argument: TRUE selects the European 30/360 method)
Under the dates, or in place of 2.857,14 etc., insert the formula:
=IF(DAY("your date")>1;DAY("your date");30) * daily amount
This formula assumes that you want to have 30 days in each month:
=IF(DAY(01.01.2015)>1;DAY(01.01.2015);30)
result = 30
=IF(DAY(15.04.2015)>1;DAY(15.04.2015);30)
result = 15
So if a month's date is not the 1st, the formula returns the day of the month as the number of days.
If you want to match months with your startdate and enddate (if I understood your comment correctly), you could do:
=IF(OR(
AND(MONTH(startdate)=MONTH(your date);YEAR(startdate)=YEAR(your date));
AND(MONTH(enddate)=MONTH(your date);YEAR(enddate)=YEAR(your date))
);"match";"no match")
This makes sure that both month and year correspond.
If you want to get the number of days in a month automatically, you could use:
=DAY(DATE(YEAR("your date");MONTH("your date")+1;1)-1)
but this no longer assumes 30 days per month; you can change that with an IF statement.
I hope this helps,
Best - AB
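To double-check the arithmetic of the daily-amount approach, here is a rough Python sketch. The helper days360_eu is a hand-written approximation of DAYS360 with the European method (day 31 is treated as day 30), not an Excel API, and the dates and cost are the ones from the question.

```python
from datetime import date

def days360_eu(start: date, end: date) -> int:
    """European 30/360 day count: every month has 30 days, day 31 becomes 30."""
    d1 = min(start.day, 30)
    d2 = min(end.day, 30)
    return (end.year - start.year) * 360 + (end.month - start.month) * 30 + (d2 - d1)

start, end, cost = date(2015, 1, 1), date(2015, 4, 15), 10000

# +1 so that both the start day and the end day are counted
daily = cost / (days360_eu(start, end) + 1)

full_month = round(30 * daily, 2)          # Jan, Feb, Mar
partial_month = round(end.day * daily, 2)  # Apr, 15 days
print(full_month, partial_month)
```

Three full months of 30 days plus 15 days in April add up to the 105 days used for the daily amount, so the monthly figures sum back to the original cost.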

Formula to compare time values

The Excel formula below works fine, but in some cases it does not give me the proper value.
Input:
19:20:42
24:58:36
26:11:18
After using this formula:
=IF(TIMEVALUE(K7)>TIMEVALUE("09:00:00"),TRUE,FALSE)
I got the below output:
FALSE
TRUE
TRUE
What I observe is that if the time value is greater than or equal to 24:00:00, it does not give me the proper answer.
How do I fix this?
As an alternative to Captain's excellent answer, you could also use:
=IF(K7>(9/24),TRUE,FALSE)
DateTime values are internally stored as a number of days since 1/1/1900,
therefore 1 = 1 day = 24 hours. So 9/24 = 0.375 = 9 hours :-)
You can easily verify this by clearing the format of your DateTime cells.
Edit: note that such a Boolean formula can be expressed in a shorter way without losing legibility:
=K7>(9/24)
When you go over 24 hours, Excel counts it as the next day, and then the TIMEVALUE is the time on the next day (i.e. 00:58:36 and 02:11:18 in your examples) and can, therefore, be before 09:00.
You could do DATEVALUE(K7)+TIMEVALUE(K7) to ensure that you count the day part too...
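The wrap-around described above can be illustrated outside Excel with a short Python sketch. It models a time as a fraction of a day and models TIMEVALUE as simply dropping the whole-day part, which is an assumption based on the explanation in this answer; the helper name is made up.

```python
from datetime import timedelta

def day_fraction(hours: int, minutes: int, seconds: int) -> float:
    """Duration expressed as a fraction of a day (1.0 = 24 hours)."""
    return timedelta(hours=hours, minutes=minutes, seconds=seconds) / timedelta(days=1)

threshold = 9 / 24  # 09:00:00 as a fraction of a day

results = []
for h, m, s in [(19, 20, 42), (24, 58, 36), (26, 11, 18)]:
    full = day_fraction(h, m, s)  # keeps the day part
    time_only = full % 1          # day part dropped, as assumed for TIMEVALUE
    results.append((full > threshold, time_only > threshold))

print(results)
```

Comparing the full value treats all three inputs as later than 09:00, while the time-only comparison flips for the two inputs past 24 hours.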
