Check the amount of time between different rows of data (time, date and employee name) - python-3.x

I have a df with the columns ['Name', 'Department', 'Date', 'Time', 'Activity'],
so for example it looks like this:
Acosta, Hirto 225 West 28th Street 9/18/2019 07:25:00 Punch In
Acosta, Hirto 225 West 28th Street 9/18/2019 11:57:00 Punch Out
Acosta, Hirto 225 West 28th Street 9/18/2019 12:28:00 Punch In
Adams, Juan 225 West 28th Street 9/16/2019 06:57:00 Punch In
Adams, Juan 225 West 28th Street 9/16/2019 12:00:00 Punch Out
Adams, Juan 225 West 28th Street 9/16/2019 12:28:00 Punch In
Adams, Juan 225 West 28th Street 9/16/2019 15:30:00 Punch Out
Adams, Juan 225 West 28th Street 9/18/2019 07:04:00 Punch In
Adams, Juan 225 West 28th Street 9/18/2019 11:57:00 Punch Out
I need to calculate the time between the punch in and the punch out on the same day for the same employee.
So far I have only managed to clean the data, like this:
self.raw_data['Time'] = pd.to_datetime(self.raw_data['Time'], format='%H:%M').dt.time
sorted_db = self.raw_data.sort_values(['Name', 'Date'])
sorted_db = sorted_db[['Name', 'Department', 'Date', 'Time', 'Activity']]
Any suggestions will be appreciated.

So I found the answer to my problem and I wanted to share it.
First I separate the "Punch In" and "Punch Out" times into two columns:
def process_info(self):
    # filter and organize the data
    self.raw_data['in'] = self.raw_data[self.raw_data['Activity'].str.contains('In')]['Time']
    self.raw_data['pre_out'] = self.raw_data[self.raw_data['Activity'].str.contains('Out')]['Time']
Then I sort the information based on date and name:
sorted_data = self.raw_data.sort_values(['Date', 'Name'])
After that I use the shift function to move the 'out' column up one row so it lines up with the 'in' column:
sorted_data['out'] = sorted_data.shift(-1)['Time']
Finally I remove the extra 'out' rows that were created in the first step, by keeping only the rows where 'pre_out' is null:
filtered_data = sorted_data[sorted_data['pre_out'].isnull()]
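For reference (not part of the original post), a more direct way to pair each "Punch In" with the following "Punch Out" per employee and day is a sketch along these lines, assuming the column layout shown above:

```python
import pandas as pd

# a small sample in the shape described above
df = pd.DataFrame({
    'Name': ['Acosta, Hirto'] * 3 + ['Adams, Juan'] * 2,
    'Department': ['225 West 28th Street'] * 5,
    'Date': ['9/18/2019'] * 3 + ['9/16/2019'] * 2,
    'Time': ['07:25:00', '11:57:00', '12:28:00', '06:57:00', '12:00:00'],
    'Activity': ['Punch In', 'Punch Out', 'Punch In', 'Punch In', 'Punch Out'],
})

# combine date and time into one timestamp so durations are easy to compute
df['when'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
df = df.sort_values(['Name', 'when'])

# within each employee/day group, look at the next row's timestamp and activity
grp = df.groupby(['Name', 'Date'])
df['next_when'] = grp['when'].shift(-1)
df['next_act'] = grp['Activity'].shift(-1)

# keep only punch-ins that are directly followed by a punch-out
worked = df[(df['Activity'] == 'Punch In') & (df['next_act'] == 'Punch Out')].copy()
worked['duration'] = worked['next_when'] - worked['when']
print(worked[['Name', 'Date', 'Time', 'duration']])
```

Unpaired punch-ins (like the 12:28 one above) simply drop out, which makes missing punch-outs easy to audit separately.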


echo coming in new line in bash

My code is
cd /home/XXX/db-new
while read -r line; do
data=$(echo $line | awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "", $i) } 1' | awk '{gsub(/\"/,"")};1' | tr -d \'\" )
d2=$(echo $data | awk -F, '{print $2}')
d3=$(echo $data | awk -F, '{print $3}')
d17=$(echo $data | awk -F, '{print $17}')
d4=$(echo $data | awk -F, '{print $4","$5","$6","$7","$8","$9","$10","$11","$12","$13","$14","$15","$16","$17","$18","$19","$20","$21","$22","$23","$24","$25","$26","$27","$28","$29","$30","$31","$32","$33","$34","$35","$36","$37","$38","$39","$40","$45","$46","$47","$48","$49","$50","$51","$52","$53","$54","$55","$56","$57","$58}')
d1=$d2+$d3
d59=$(echo $d2 | cut -d "." -f 2,3)
d60=$(echo $data | awk -F, '{print $19}' | awk 'BEGIN{FS=OFS=","} {gsub(/[[:punct:] ]/,"",$1)} 1' | sed 's/[^0-9]*//g' )
echo $d1,$d2,$d4,$d59,$d17,$d60 >> abc.csv
done < /home/XXX/db-new/2021-09-04.csv
/home/domainsanalytics/db-new/2021-09-04.csv is very big, so I am adding only the first 3 lines:
head -3 /home/domainsanalytics/db-new/2021-09-04.csv
"num","domain_name","query_time","create_date","update_date","expiry_date","domain_registrar_id","domain_registrar_name","domain_registrar_whois","domain_registrar_url","registrant_name","registrant_company","registrant_address","registrant_city","registrant_state","registrant_zip","registrant_country","registrant_email","registrant_phone","registrant_fax","administrative_name","administrative_company","administrative_address","administrative_city","administrative_state","administrative_zip","administrative_country","administrative_email","administrative_phone","administrative_fax","technical_name","technical_company","technical_address","technical_city","technical_state","technical_zip","technical_country","technical_email","technical_phone","technical_fax","billing_name","billing_company","billing_address","billing_city","billing_state","billing_zip","billing_country","billing_email","billing_phone","billing_fax","name_server_1","name_server_2","name_server_3","name_server_4","domain_status_1","domain_status_2","domain_status_3","domain_status_4"
"1","accounting-fwppool.com","2021-09-04 00:53:04","2021-08-10","2021-08-10","2022-08-10","303","PDR Ltd. d/b/a PublicDomainRegistry.com","whois.publicdomainregistry.com","http://www.publicdomainregistry.com","Micael brown","","4941 Maui Cir Huntington Beach, CA 92649","CA","CA","92649","United States","michbrown7654gh#gmail.com","+1.9169136369","","Micael brown","","4941 Maui Cir Huntington Beach, CA 92649","CA","CA","92649","United States","michbrown7654gh#gmail.com","+1.9169136369","","Micael brown","","4941 Maui Cir Huntington Beach, CA 92649","CA","CA","92649","United States","michbrown7654gh#gmail.com","+1.9169136369","","","","","","","","","","","","ns1.verification-hold.suspended-domain.com","ns2.verification-hold.suspended-domain.com","","","clientTransferProhibited","","",""
"2","xjava.com","2021-09-04 00:53:11","2001-03-06","2021-03-12","2022-03-06","472","Dynadot, LLC","whois.dynadot.com","http://www.dynadot.com","Super Privacy Service LTD c/o Dynadot","","PO Box 701","San Mateo","California","94401","United States","xjava.com#superprivacyservice.com","+1.6505854708","","Super Privacy Service LTD c/o Dynadot","","PO Box 701","San Mateo","California","94401","United States","xjava.com#superprivacyservice.com","+1.6505854708","","Super Privacy Service LTD c/o Dynadot","","PO Box 701","San Mateo","California","94401","United States","xjava.com#superprivacyservice.com","+1.6505854708","","","","","","","","","","","","ns1.sedoparking.com","ns2.sedoparking.com","","","clientTransferProhibited","","",""
My code gives me good results, but $d59, $d17 and $d60 come out on a new line...
$d59 is just the TLD I am extracting,
$d17 is a reprint of the country,
$d60 is the phone number without special characters.
All I want is everything in one row.
My output is
domain_name+query_time domain_name create_date update_date expiry_date domain_registrar_id domain_registrar_name domain_registrar_whois domain_registrar_url registrant_name registrant_company registrant_address registrant_city registrant_state registrant_zip registrant_country registrant_email registrant_phone registrant_fax administrative_name administrative_company administrative_address administrative_city administrative_state administrative_zip administrative_country administrative_email administrative_phone administrative_fax technical_name technical_company technical_address technical_city technical_state technical_zip technical_country technical_email technical_phone technical_fax billing_state billing_zip billing_country billing_email billing_phone billing_fax name_server_1 name_server_2 name_server_3 name_server_4 domain_status_1 domain_status_2 domain_status_3 domain_status_4
domain_name registrant_country
accounting-fwppool.com+2021-09-04 00:53:04 accounting-fwppool.com 10/08/21 10/08/21 10/08/22 303 PDR Ltd. d/b/a PublicDomainRegistry.com whois.publicdomainregistry.com http://www.publicdomainregistry.com Micael brown 4941 Maui Cir Huntington Beach CA 92649 CA CA 92649 United States michbrown7654gh#gmail.com 1.916913637 Micael brown 4941 Maui Cir Huntington Beach CA 92649 CA CA 92649 United States michbrown7654gh#gmail.com 1.916913637 Micael brown 4941 Maui Cir Huntington Beach CA 92649 CA CA 92649 United States michbrown7654gh#gmail.com 1.916913637 ns1.verification-hold.suspended-domain.com ns2.verification-hold.suspended-domain.com clientTransferProhibited
com United States 19169136369
xjava.com+2021-09-04 00:53:11 xjava.com 06/03/01 12/03/21 06/03/22 472 Dynadot LLC whois.dynadot.com http://www.dynadot.com Super Privacy Service LTD c/o Dynadot PO Box 701 San Mateo California 94401 United States xjava.com#superprivacyservice.com 1.650585471 Super Privacy Service LTD c/o Dynadot PO Box 701 San Mateo California 94401 United States xjava.com#superprivacyservice.com 1.650585471 Super Privacy Service LTD c/o Dynadot PO Box 701 San Mateo California 94401 United States xjava.com#superprivacyservice.com 1.650585471 ns1.sedoparking.com ns2.sedoparking.com clientTransferProhibited
com United States 16505854708
accuratetactics.com+2021-09-04 00:53:14 accuratetactics.com 26/08/20 30/08/21 26/08/21 1660 Domainshype.com Inc. whois.domainshype.com http://www.domainshype.com This Domain For Sale Worldwide 339 222 5132 Buydomains.com 738 Main Street #389 Waltham Massachusetts 2451 United States brokerage#buydomains.com 1.339222513 1.78183928 This Domain For Sale Worldwide 339 222 5132 Buydomains.com 738 Main Street #389 Waltham Massachusetts 2451 United States brokerage#buydomains.com 1.339222513 1.78183928 This Domain For Sale Worldwide 339 222 5132 Buydomains.com 738 Main Street #389 Waltham Massachusetts 2451 United States brokerage#buydomains.com 1.339222513 1.78183928 dns7.parkpage.foundationapi.com dns8.parkpage.foundationapi.com OK
com United States 13392225132
vej.com+2021-09-04 00:53:16 vej.com 16/09/99 31/08/21 16/09/23 128 DomainRegistry.com Inc. nswhois.domainregistry.com http://www.domainregistry.com Scottcraft Label Co. Scottcraft Label Co. c/o Admin Svcs. PO Box 145 Marlton NJ 8053 United States itadmin#scottcraftlabel.com 1.215870212 IT Admin MS 445 Scottcraft Label Co. c/o Admin Svcs. PO Box 145 Marlton NJ 8053 United States itadmin#scottcraftlabel.com 1.215870212 IT Admin MS 445 Scottcraft Label Co. c/o Admin Svcs. PO Box 145 Marlton NJ 8053 United States itadmin#scottcraftlabel.com 1.215870212 colohost1.domainregistry.com cs03.domainregistry.com clientDeleteProhibited clientTransferProhibited clientUpdateProhibited
com United States 12158702120
accutekware.com+2021-09-04 00:53:24 accutekware.com 26/08/03 26/08/21 26/08/21 303 PDR Ltd. d/b/a PublicDomainRegistry.com whois.publicdomainregistry.com http://www.publicdomainregistry.com R Benedict Accutek Systems Inc PO Box 591125 Houston Texas 77259 United States rbeny09#hotmail.com 1.281461701 R Benedict Accutek Systems Inc PO Box 591125 Houston Texas 77259 United States rbeny09#hotmail.com 1.281461701 R Benedict Accutek Systems Inc PO Box 591125 Houston Texas 77259 United States rbeny09#hotmail.com 1.281461701 dns10.parkpage.foundationapi.com dns11.parkpage.foundationapi.com clientTransferProhibited
com United States 12814617007
crmxon.com+2021-09-04 00:53:27 crmxon.com 04/09/20 04/11/20 04/09/21 303 PDR Ltd. d/b/a PublicDomainRegistry.com whois.publicdomainregistry.com http://www.publicdomainregistry.com GDPR Masked GDPR Masked GDPR Masked GDPR Masked Newcastleupon Tyne(Cityof) GDPR Masked United Kingdom gdpr-masking#gdpr-masked.com GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked gdpr-masking#gdpr-masked.com GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked gdpr-masking#gdpr-masked.com GDPR Masked GDPR Masked ns1.edagent.com ns2.edagent.com ns3.edagent.com ns4.edagent.com clientTransferProhibited
com United Kingdom
Expected output
domain_name+query_time domain_name create_date update_date expiry_date domain_registrar_id domain_registrar_name domain_registrar_whois domain_registrar_url registrant_name registrant_company registrant_address registrant_city registrant_state registrant_zip registrant_country registrant_email registrant_phone registrant_fax administrative_name administrative_company administrative_address administrative_city administrative_state administrative_zip administrative_country administrative_email administrative_phone administrative_fax technical_name technical_company technical_address technical_city technical_state technical_zip technical_country technical_email technical_phone technical_fax billing_state billing_zip billing_country billing_email billing_phone billing_fax name_server_1 name_server_2 name_server_3 name_server_4 domain_status_1 domain_status_2 domain_status_3 domain_status_4 domain_name registrant_country
accounting-fwppool.com+2021-09-04 00:53:04 accounting-fwppool.com 10/08/21 10/08/21 10/08/22 303 PDR Ltd. d/b/a PublicDomainRegistry.com whois.publicdomainregistry.com http://www.publicdomainregistry.com Micael brown 4941 Maui Cir Huntington Beach CA 92649 CA CA 92649 United States michbrown7654gh#gmail.com 1.91691364 Micael brown 4941 Maui Cir Huntington Beach CA 92649 CA CA 92649 United States michbrown7654gh#gmail.com 1.91691364 Micael brown 4941 Maui Cir Huntington Beach CA 92649 CA CA 92649 United States michbrown7654gh#gmail.com 1.91691364 ns1.verification-hold.suspended-domain.com ns2.verification-hold.suspended-domain.com clientTransferProhibited com United States 1.9169E+10
xjava.com+2021-09-04 00:53:11 xjava.com 06/03/01 12/03/21 06/03/22 472 Dynadot LLC whois.dynadot.com http://www.dynadot.com Super Privacy Service LTD c/o Dynadot PO Box 701 San Mateo California 94401 United States xjava.com#superprivacyservice.com 1.65058547 Super Privacy Service LTD c/o Dynadot PO Box 701 San Mateo California 94401 United States xjava.com#superprivacyservice.com 1.65058547 Super Privacy Service LTD c/o Dynadot PO Box 701 San Mateo California 94401 United States xjava.com#superprivacyservice.com 1.65058547 ns1.sedoparking.com ns2.sedoparking.com clientTransferProhibited com United States 1.6506E+10
accuratetactics.com+2021-09-04 00:53:14 accuratetactics.com 26/08/20 30/08/21 26/08/21 1660 Domainshype.com Inc. whois.domainshype.com http://www.domainshype.com This Domain For Sale Worldwide 339 222 5132 Buydomains.com 738 Main Street #389 Waltham Massachusetts 2451 United States brokerage#buydomains.com 1.33922251 1.78183928 This Domain For Sale Worldwide 339 222 5132 Buydomains.com 738 Main Street #389 Waltham Massachusetts 2451 United States brokerage#buydomains.com 1.33922251 1.78183928 This Domain For Sale Worldwide 339 222 5132 Buydomains.com 738 Main Street #389 Waltham Massachusetts 2451 United States brokerage#buydomains.com 1.33922251 1.78183928 dns7.parkpage.foundationapi.com dns8.parkpage.foundationapi.com OK com United States 1.3392E+10
Suggesting a single awk script to process all the data. Starting with this:
script.awk
BEGIN { FS = "\",\"|\"[[:space:]]*$|^[[:space:]]*\""; OFS = " " }
{
    $1 = $1; # recalculate fields
    # num field starts from $2
    arr[1] = $3 "+" $4;
    arr[2] = $4;
    arr[4] = $5;
    # right-append fields 6-41 to arr[4]
    for (i = 6; i <= 41; i++) arr[4] = arr[4] "," $i;
    # right-append fields 46-59 to arr[4]
    for (i = 46; i <= 59; i++) arr[4] = arr[4] "," $i;
    arr[17] = $18;
    arr[59] = $3;
    # in the 3rd field remove the text after the first "."
    sub(/\..*$/, "", arr[59]);
    # remove all punctuation and digits from the 20th field
    gsub(/[[:punct:]]|[[:digit:]]*/, "", $20);
    arr[60] = $20;
    # output to stdout
    print arr[1], arr[2], arr[4], arr[59], arr[17], arr[60];
}
Running:
awk -f script.awk input.csv > output.csv
Did not test since the sample data did not contain numeric values.
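Not part of the answer above, but for comparison: in Python the standard csv module parses the quoted, comma-containing fields directly, so no quote-stripping is needed. A sketch of the same per-row transformation (field indices follow the question's 1-based awk numbering, shifted to Python's 0-based lists):

```python
import csv  # used in the file-processing loop sketched at the bottom

def transform(row):
    """Rebuild d1, d2, d4, d59, d17, d60 from one parsed 58-field CSV row."""
    d2, d3, d17 = row[1], row[2], row[16]     # domain_name, query_time, registrant_country
    d1 = d2 + '+' + d3                        # the shell's d1=$d2+$d3 concatenates literally
    d4 = row[3:40] + row[44:58]               # awk fields $4-$40 and $45-$58
    d59 = '.'.join(d2.split('.')[1:3])        # cut -d. -f2,3 -> the TLD
    d60 = ''.join(c for c in row[18] if c.isdigit())  # phone, digits only
    return [d1, d2, *d4, d59, d17, d60]

# synthetic 58-field row, filled in only where transform() reads
row = [''] * 58
row[0], row[1], row[2] = '1', 'accounting-fwppool.com', '2021-09-04 00:53:04'
row[16], row[18] = 'United States', '+1.9169136369'
print(transform(row)[0])    # -> accounting-fwppool.com+2021-09-04 00:53:04
print(transform(row)[-3:])  # -> ['com', 'United States', '19169136369']

# streaming the real file would then be:
# with open('2021-09-04.csv', newline='') as src, open('abc.csv', 'w', newline='') as dst:
#     writer = csv.writer(dst)
#     for fields in csv.reader(src):
#         writer.writerow(transform(fields))
```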

How do I merge two rows with the same value but keep all the other column data for both in Excel?

This should be pretty simple I think.
In Excel I've got a list of data I'm gathering from various sources:
name     time 1   time 2   time 3   Time 4
jimmy    00:30    1:30
john     01:09    1:45
bobby    01:09    2:49
elaine   00:39    1:19
greg     01:09    1:45
jimmy    0:33     1:29
bobby    0:45     1:15
elaine   1:24     2:01
jack     0:10     0:50
Desired result:
name     time 1   time 2   time 3   Time 4
jimmy    00:30    1:30     0:33     1:29
john     01:09    1:45
bobby    01:09    2:49     0:45     1:15
elaine   00:39    1:19     1:24     2:01
greg     01:09    1:45
jack     0:10     0:50
I'm either not searching for this the proper way or something, because my normally pretty good google-fu is failing me today.
Edit to clarify:
name     time 1   time 2   time 3   Time 4
jimmy    Burger   HotDog
john     Salami   Samosa
bobby    Burger   Paella
elaine   Sorbet   Muffin
greg     HotDog   Wonton
jimmy    Tamale   Waffle
bobby    Paella   Tamale
elaine   Waffle   Toffee
jack     Quinoa   Kiwano
Desired result:
name     time 1   time 2   time 3   Time 4
jimmy    Burger   HotDog   Tamale   Waffle
john     Salami   Samosa
bobby    Burger   Paella   Paella   Tamale
elaine   Sorbet   Muffin   Waffle   Toffee
greg     HotDog   Wonton
jack     Quinoa   Kiwano
If you have Excel-365 then use the formulas below.
G2 cell: =UNIQUE(A2:A10)
H2 cell: =FILTER(B$2:B$10,($A$2:$A$10=$G2)*(B$2:B$10<>""),"")
Drag the H2 cell formula down and across as needed.
You can use the SUMIFS formula:
=SUMIFS(B:B,$A:$A,$K2)
For example:
I have kept the unique names in the K column.
Use the SUMIFS formula for the B, C, D and E columns.
Change the format of the cells to time.

Remove custom stop words from pandas dataframe not working

I am trying to remove a custom list of stop words, but it's not working.
desc = pd.DataFrame(description, columns =['description'])
print(desc)
Which gives the following results
description
188693 The Kentucky Cannabis Company and Bluegrass He...
181535 Ohio County Sheriff
11443 According to new reports from federal authorit...
213919 KANSAS CITY, Mo. (AP)The Chiefs will be withou...
171509 The crew of Insight, WCNY's weekly public affa...
... ...
2732 The Arkansas Supreme Court on Thursday cleared...
183367 Larry Pegram, co-owner of Pure Ohio Wellness, ...
134291 Joe Biden will spend the next five months pres...
239270 Find out where your Texas representatives stan...
246070 SAN TAN VALLEY — Two men have been charged wit...
[9875 rows x 1 columns]
I found the following code here, but it doesn't seem to work
remove_words = ["marijuana", "cannabis", "hemp", "thc", "cbd"]
pat = '|'.join([r'\b{}\b'.format(w) for w in remove_words])
desc.assign(new_desc=desc.replace(dict(string={pat: ''}), regex=True))
Which produces the following results
description new_desc
188693 The Kentucky Cannabis Company and Bluegrass He... The Kentucky Cannabis Company and Bluegrass He...
181535 Ohio County Sheriff Ohio County Sheriff
11443 According to new reports from federal authorit... According to new reports from federal authorit...
213919 KANSAS CITY, Mo. (AP)The Chiefs will be withou... KANSAS CITY, Mo. (AP)The Chiefs will be withou...
171509 The crew of Insight, WCNY's weekly public affa... The crew of Insight, WCNY's weekly public affa...
... ... ...
2732 The Arkansas Supreme Court on Thursday cleared... The Arkansas Supreme Court on Thursday cleared...
183367 Larry Pegram, co-owner of Pure Ohio Wellness, ... Larry Pegram, co-owner of Pure Ohio Wellness, ...
134291 Joe Biden will spend the next five months pres... Joe Biden will spend the next five months pres...
239270 Find out where your Texas representatives stan... Find out where your Texas representatives stan...
246070 SAN TAN VALLEY — Two men have been charged wit... SAN TAN VALLEY — Two men have been charged wit...
9875 rows × 2 columns
As you can see, the stop words weren't removed. Any help you can provide would be greatly appreciated.
Handle the case sensitivity and simplify the pattern:
remove_words = ["marijuana", "cannabis", "hemp", "thc", "cbd"]
pat = '|'.join(remove_words)
desc['new_desc'] = desc.description.str.lower().replace(pat,'', regex=True)
description new_desc
0 The Kentucky Cannabis Company and Bluegrass He... the kentucky company and bluegrass he...
1 Ohio County Sheriff ohio county sheriff
2 According to new reports from federal authorit... according to new reports from federal authorit...
3 KANSAS CITY, Mo. (AP)The Chiefs will be mariju... kansas city, mo. (ap)the chiefs will be witho...
4 The crew of Insight, WCNY's weekly public affa... the crew of insight, wcny's weekly public affa...
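If you'd rather keep the original casing instead of lower-casing the whole column, pandas' str.replace also accepts a case flag; a sketch combining that with the word boundaries from the question's original pattern:

```python
import pandas as pd

desc = pd.DataFrame({'description': [
    'The Kentucky Cannabis Company and Bluegrass Hemp Oil',
    'Ohio County Sheriff',
]})

remove_words = ['marijuana', 'cannabis', 'hemp', 'thc', 'cbd']
# \b keeps short words like "thc" from matching inside longer words
pat = r'\b(?:' + '|'.join(remove_words) + r')\b'

desc['new_desc'] = (desc['description']
                    .str.replace(pat, '', case=False, regex=True)
                    .str.replace(r'\s{2,}', ' ', regex=True)  # collapse leftover gaps
                    .str.strip())
print(desc['new_desc'].tolist())
# -> ['The Kentucky Company and Bluegrass Oil', 'Ohio County Sheriff']
```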

How to identify which cities each person has lived in at each time?

Here is a small set of the dataset that I am currently working on.
FirstName LastName cities occupation time
---------------------------------------------------------------
---------------------------------------------------------------
Alice Oumi Queens software engineer 1/1/2019
Alice Oumi New York software engineer 12/3/2018
Sam Charles Santa Clara Engineer 2/5/2017
Sam Charles Santa Monica Engineer 8/9/2018
Sam Charles Santa Clara Engineer 12/12/2019
Alice Oumi New York software engineer 1/2/2017
As you see above, the same person could be living in the same place but for different durations of time. I want to clean this dataset to show which places Alice and Sam lived in. For example, instead of having two rows of Alice living in New York, I only need to have one, something similar to the following table:
FirstName LastName cities FirstTime SecondTime
---------------------------------------------------------------
---------------------------------------------------------------
Alice Oumi Queens 1/1/2019 NA
Alice Oumi New York 1/2/2017 12/3/2018
Sam Charles Santa Clara 2/5/2017 12/12/2019
Sam Charles Santa Monica 8/9/2018 NA
I am kind of new to Python and trying to learn. I have tried for loops using iterrows(), but that didn't work.
What can I use to achieve this table?
Thank you so much in advance.
You can do that as follows:
# number the times a person lived in the same city (with the same occupation)
df['sequence']= df.groupby(['FirstName', 'LastName', 'cities', 'occupation']).cumcount()+1
# now create the "pivot" table
result= df.set_index(['FirstName', 'LastName', 'cities', 'occupation', 'sequence']).unstack()
# rename the columns
result.columns= ['FirstTime', 'SecondTime']
# reset the index (it was just needed for "pivoting")
result.reset_index(inplace=True)
The result looks like:
Out[483]:
FirstName LastName cities occupation FirstTime SecondTime
0 Alice Oumi New York software engineer 12/3/2018 1/2/2017
1 Alice Oumi Queens software engineer 1/1/2019 NaN
2 Sam Charles Santa Clara Engineer 2/5/2017 12/12/2019
3 Sam Charles Santa Monica Engineer 8/9/2018 NaN
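A minimal runnable version of the same steps, using the sample rows from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'FirstName':  ['Alice', 'Alice', 'Sam', 'Sam', 'Sam', 'Alice'],
    'LastName':   ['Oumi', 'Oumi', 'Charles', 'Charles', 'Charles', 'Oumi'],
    'cities':     ['Queens', 'New York', 'Santa Clara', 'Santa Monica',
                   'Santa Clara', 'New York'],
    'occupation': ['software engineer', 'software engineer', 'Engineer',
                   'Engineer', 'Engineer', 'software engineer'],
    'time':       ['1/1/2019', '12/3/2018', '2/5/2017', '8/9/2018',
                   '12/12/2019', '1/2/2017'],
})

# number each repeat visit to the same city
df['sequence'] = df.groupby(['FirstName', 'LastName', 'cities', 'occupation']).cumcount() + 1

# pivot so there is one column per visit number
result = df.set_index(['FirstName', 'LastName', 'cities', 'occupation', 'sequence']).unstack()
result.columns = ['FirstTime', 'SecondTime']
result = result.reset_index()
print(result)
```

Cities visited only once get a missing value (NaN) in SecondTime, matching the NA column in the desired table.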

Multi Criterion Max If Statement

My dataset looks like this...
State Close Date Probability Highest Prob/State
WA 12/31/2016 50% FALSE
WA 12/19/2016 80% FALSE
WA 10/15/2016 80% TRUE
My objective is to build a formula to populate the right-most column. The formula should assess Close Dates and Probabilities within each state. First, it should select the highest probability, then it should select the nearest close date if there is a tie on probability (as in the example). For that record, it should read "TRUE".
I assume this would include a MAX IF statement but haven't been able to get it to work.
Here is a more robust set of data I'm working with. It may actually be easier to first find the highest probability within each Region then select the minimum (oldest) date if there is a tie on probability. This too will serve my purposes.
Region Forecast Close Date Probability (%)
Okeechobee FL 6/27/2016 90
Okeechobee West FL 7/1/2016 40
Albany GA 3/11/2016 100
Emerald Coast FL 6/30/2016 60
Emerald Coast FL 10/1/2016 40
Cullman_Hartselle TN 4/30/2016 10
North MS 10/1/2016 25
Roanoke VA 8/31/2016 25
Roanoke VA 8/1/2016 40
Gardena CA 6/1/2016 80
Gardena CA 6/1/2016 80
Lomita-Harbor City 6/30/2016 60
Lomita-Harbor City 6/30/2016 0
Lomita-Harbor City 6/30/2016 40
Eastern NC 6/30/2016 60
Northwest NC 9/16/2016 10
Fort Collins_Greeley CO 3/1/2016 100
Northwest OK 6/30/2016 100
Southwest MO 7/29/2016 90
Northern NH-VT 3/1/2016 20
South DE 12/1/2016 0
South DE 12/1/2016 20
Kingston NY 12/30/2016 5
Longview WA 11/30/2016 5
North DE 12/1/2016 20
North DE 12/1/2016 0
Salt Lake City UT 8/31/2016 20
Idaho Panhandle 8/26/2016 0
Bridgeton_Salem NJ 7/1/2016 25
Bridgeton_Salem NJ 7/1/2016 65
Layton_Ogden UT 3/25/2016 5
Central OR 6/30/2016 10
The following Array formula should work:
=(ABS(B2-$F$2)=MIN(IF(($A$2:$A$33=A2)*(C2=MAX(IF($A$2:$A$33=A2,$C$2:$C$33))),ABS($B$2:$B$33-$F$2))))*(C2=MAX(IF($A$2:$A$33=A2,$C$2:$C$33)))>0
Being an array formula, use Ctrl-Shift-Enter when exiting edit mode. If done properly, Excel will put {} around the formula.
Edit
Added #tigeravatar suggestion to avoid volatile functions.
I think this is OK now but needs to be checked against the more complete set of data provided by OP.
It counts:
(1) Any rows with same state but higher probability
(2) Any rows with same state and probability, in the future (or present) and nearer to today's date
(3) Any rows with same state and probability, in the past and nearer to today's date.
If all these are zero, you should have the right one.
=COUNTIFS($A$2:$A$100,$A2,$C$2:$C$100,">"&$C2)
+COUNTIFS($A$2:$A$100,$A2,$C$2:$C$100,$C2,$B$2:$B$100,"<"&$G$2+IF($B2>=$G$2,DATEDIF($G$2,$B2,"d"),DATEDIF($B2,$G$2,"d")),$B$2:$B$100,">="&$G$2)
+COUNTIFS($A$2:$A$100,$A2,$C$2:$C$100,$C2,$B$2:$B$100,">"&$G$2-IF($B2>=$G$2,DATEDIF($G$2,$B2,"d"),DATEDIF($B2,$G$2,"d")),$B$2:$B$100,"<"&$G$2)
=0
If the dates are all in the future, it can be simplified a lot:-
=COUNTIFS($A$2:$A$100,$A2,$C$2:$C$100,">"&$C2)
+COUNTIFS($A$2:$A$100,$A2,$C$2:$C$100,$C2,$B$2:$B$100,"<"&$G$2+DATEDIF($G$2,$B2,"d"))
=0
