Index Match Cell Contains Text - excel

I have no idea what I'm doing wrong, but I'm getting an #N/A error.
I have a cell that contains comma-separated addresses. If one of those addresses matches an address in another table, I need a value from another cell in the first table (the one holding the comma-separated list).
=INDEX(GoogleFormResults[B], MATCH("*"&[@Address]&"*", GoogleFormResults[A], 0))
Example Address in the Table:
707 W Cesar Chavez Ave LOS ANGELES CA 90067
Example GoogleFormResults[A]
4101 Crenshaw Blvd LOS ANGELES CA 90008, 707 W Cesar Chavez Ave LOS ANGELES CA 90067, 6820 Eastern Ave BELL GARDENS CA 90201, 12270 Paramount Blvd DOWNEY CA 90242, 1399 Artesia Blvd. GARDENA CA 90247, 14441 Inglewood Ave HAWTHORNE CA 90250, 4651 Firestone Blvd SOUTH GATE CA 90280, 5871 Firestone Blvd SOUTH GATE CA 90280, 19503 Normandie Ave TORRANCE CA 90501, 19340 Hawthorne Blvd TORRANCE CA 90503, 22015 Hawthorne Blvd TORRANCE CA 90503, 2601 Skypark Dr TORRANCE CA 90505, 8450 La Palma Ave BUENA PARK CA 90620, 5420 Lapalma Ave LA PALMA CA 90623, 1000 E Imperial Hwy LA HABRA CA 90631, 1340 SOUTH BEACH BLVD LA HABRA CA 90631, 1390 S. Beach Blvd. LA HABRA CA 90631, 14865 Telegraph Rd LA MIRADA CA 90638, 11729 Imperial Hwy NORWALK CA 90650, 8500 Washington Blvd PICO RIVERA CA 90660, 13310 TELEGRAPH ROAD SANTA FE SPRINGS CA 90670, 12540 Beach Blvd. STANTON CA 90680, 12840 Beach Blvd STANTON CA 90680, 12701 TOWNE CENTER D CERRITOS CA 90703, 2770 CARSON STREET LAKEWOOD CA 90712, 12120 Carson Street HAWAIIAN GARDENS CA 90716, 14501 LAKEWOOD BLVD PARAMOUNT CA 90723, 20226 AVALON BLVD. CARSON CA 90746, 151 EAST 5TH STREET LONG BEACH CA 90802, 3705 E. South Street LONG BEACH CA 90805, 7250 Carson Blvd LONG BEACH CA 90808, 7480 Carson Blvd LONG BEACH CA 90808, 6750 Kimball Ave Chino CA 91708, 3943 Grand Ave CHINO CA 91710, 3951 Grand Ave CHINO CA 91710, 1275 N Azusa Ave COVINA CA 91722, 4901 Santa Anita Ave EL MONTE CA 91731, 1425 N Hacienda Blvd LA PUENTE CA 91744, 17150 Gale Ave CITY OF INDUSTRY CA 91745, 17835 E. Gale Ave. HACIENDA HEIGHTS CA 91745, 4155 Wineville Ave MIRA LOMA CA 91752, 4250 Hamner Ave MIRA LOMA CA 91752, 1333 N Mountain Ave ONTARIO CA 91762, 951 N. Milliken Ave. ONTARIO CA 91764, 1180 S Diamond Bar Blvd Diamond Bar CA 91765, 80 RIO RANCHO ROAD POMONA CA 91766, 780 E Arrow Hwy Pomona CA 91767, 1827 WALNUT GROVE BLVD ROSEMEAD CA 91770, 1445 E Foothill Blvd UPLAND CA 91786, 1540 W. 
FOOTHILL BLVD UPLAND CA 91786, 2735 E Eastland Center Dr WEST COVINA CA 91791, 1550 Leucadia Blvd ENCINITAS CA 92024, 1266 E Valley Parkway ESCONDIDO CA 92025, 1330 East Grand Ave ESCONDIDO CA 92027, 1046 Mission Ave OCEANSIDE CA 92054, 2100 VISTA WAY OCEANSIDE CA 92054, 3405 Marron Rd OCEANSIDE CA 92056, 705 COLLEGE BLVD OCEANSIDE CA 92057, 2121 Imperial Ave SAN DIEGO CA 92102, 4840 Shawline St SAN DIEGO CA 92111, 3412 COLLEGE AVE. SAN DIEGO CA 92115, 6336 College Grove Way SAN DIEGO CA 92115, 3382 Murphy Canyon Rd SAN DIEGO CA 92123, 575 Saturn Blvd SAN DIEGO CA 92154, 710 DENNERY ROAD SAN DIEGO CA 92154, 13553-A San Bernardino Avenue FONTANA CA 92334, 4210 EAST HIGHLAND A HIGHLAND CA 92346, 16555 Von Karman Ave IRVINE CA 92606, 26502 TOWNE CENTER DRI FOOTHILL RANCH CA 92610, 71 Technology Dr IRVINE CA 92618, 8230 TALBERT AVENUE HUNTINGTON BEACH CA 92646, 6912 Edinger Ave HUNTINGTON BEACH CA 92647, 21134 Beach Blvd HUNTINGTON BEACH CA 92648, 951 Avenida Pico SAN CLEMENTE CA 92673, 27470 ALICIA PKWY LAGUNA NIGUEL CA 92677, 13331 BEACH BLVD WESTMINSTER CA 92683, 30491 Avenida De Las Flores RANCHO SANTA MARGARITA CA 92688, 3600 W McFadden Ave SANTA ANA CA 92704, 17099 Brookhurst St. FOUNTAIN VALLEY CA 92708, 121 N Beach Ave ANAHEIM CA 92801, 440 N Euclid St ANAHEIM CA 92801, 1120 S Anaheim Blvd ANAHEIM CA 92805, Lemon and Orange thorpe ANAHEIM CA 92817, 2595 EAST IMPERIAL HGWY BREA CA 92821, 629 S. Placentia Ave. FULLERTON CA 92831, 10912 Katella Ave GARDEN GROVE CA 92840, 11822 Gilbert St GARDEN GROVE CA 92841, 2300 NORTH TUSTIN ST ORANGE CA 92865, 479 N McKinley St CORONA CA 92879, 1290 E Ontario Ave CORONA CA 92881, 1375 E. Ontario Ave. CORONA CA 92881, 1560 West 6th St CORONA CA 92882

Here is an example of using FIND to return a value from another column in the row where there is a partial match.
If you execute FIND against the column of the table, it will return an array of #VALUE! errors and numbers, depending on whether the item exists in each string. You can then use LOOKUP to find the last match by making the lookup value a very large number and the result vector the column you want to return.
You should be able to adapt the following to your data:
Given this Table: (note your long list of addresses is in B3)
and with the Address you are searching for in B9, you can use this formula to return the contents of the cell in column D:
=LOOKUP(1E+307,FIND(B9,Table1[Long List of Addresses]),Table1[Cell to Return])
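To make the logic of that formula easier to follow, here is a minimal Python sketch of the same idea. It is only an illustration, not something Excel runs: the function name `lookup_last_match` and the sample values are hypothetical. Like the formula, it does a case-sensitive substring search in every row and keeps the result from the last row that matched.

```python
def lookup_last_match(needle, haystacks, results):
    """Return the result paired with the last haystack that contains needle."""
    found = None
    for text, result in zip(haystacks, results):
        # str.find returns -1 when the text is absent; Excel's FIND
        # returns #VALUE! in that case, which LOOKUP then skips.
        if text.find(needle) != -1:
            found = result
    return found


# Hypothetical data standing in for the two table columns.
long_lists = [
    "4101 Crenshaw Blvd LOS ANGELES CA 90008, "
    "707 W Cesar Chavez Ave LOS ANGELES CA 90067",
    "6820 Eastern Ave BELL GARDENS CA 90201",
]
cells_to_return = ["row 1 value", "row 2 value"]

print(lookup_last_match("707 W Cesar Chavez Ave LOS ANGELES CA 90067",
                        long_lists, cells_to_return))
```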

echo coming in new line in bash

My code is
cd /home/XXX/db-new
while read -r line; do
data=$(echo $line | awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "", $i) } 1' | awk '{gsub(/\"/,"")};1' | tr -d \'\" )
d2=$(echo $data | awk -F, '{print $2}')
d3=$(echo $data | awk -F, '{print $3}')
d17=$(echo $data | awk -F, '{print $17}')
d4=$(echo $data | awk -F, '{print $4","$5","$6","$7","$8","$9","$10","$11","$12","$13","$14","$15","$16","$17","$18","$19","$20","$21","$22","$23","$24","$25","$26","$27","$28","$29","$30","$31","$32","$33","$34","$35","$36","$37","$38","$39","$40","$45","$46","$47","$48","$49","$50","$51","$52","$53","$54","$55","$56","$57","$58}')
d1=$d2+$d3
d59=$(echo $d2 | cut -d "." -f 2,3)
d60=$(echo $data | awk -F, '{print $19}' | awk 'BEGIN{FS=OFS=","} {gsub(/[[:punct:] ]/,"",$1)} 1' | sed 's/[^0-9]*//g' )
echo $d1,$d2,$d4,$d59,$d17,$d60 >> abc.csv
done < /home/XXX/db-new/2021-09-04.csv
The file /home/domainsanalytics/db-new/2021-09-04.csv is very big, so I am showing only the first 3 lines.
head -3 /home/domainsanalytics/db-new/2021-09-04.csv
"num","domain_name","query_time","create_date","update_date","expiry_date","domain_registrar_id","domain_registrar_name","domain_registrar_whois","domain_registrar_url","registrant_name","registrant_company","registrant_address","registrant_city","registrant_state","registrant_zip","registrant_country","registrant_email","registrant_phone","registrant_fax","administrative_name","administrative_company","administrative_address","administrative_city","administrative_state","administrative_zip","administrative_country","administrative_email","administrative_phone","administrative_fax","technical_name","technical_company","technical_address","technical_city","technical_state","technical_zip","technical_country","technical_email","technical_phone","technical_fax","billing_name","billing_company","billing_address","billing_city","billing_state","billing_zip","billing_country","billing_email","billing_phone","billing_fax","name_server_1","name_server_2","name_server_3","name_server_4","domain_status_1","domain_status_2","domain_status_3","domain_status_4"
"1","accounting-fwppool.com","2021-09-04 00:53:04","2021-08-10","2021-08-10","2022-08-10","303","PDR Ltd. d/b/a PublicDomainRegistry.com","whois.publicdomainregistry.com","http://www.publicdomainregistry.com","Micael brown","","4941 Maui Cir Huntington Beach, CA 92649","CA","CA","92649","United States","michbrown7654gh#gmail.com","+1.9169136369","","Micael brown","","4941 Maui Cir Huntington Beach, CA 92649","CA","CA","92649","United States","michbrown7654gh#gmail.com","+1.9169136369","","Micael brown","","4941 Maui Cir Huntington Beach, CA 92649","CA","CA","92649","United States","michbrown7654gh#gmail.com","+1.9169136369","","","","","","","","","","","","ns1.verification-hold.suspended-domain.com","ns2.verification-hold.suspended-domain.com","","","clientTransferProhibited","","",""
"2","xjava.com","2021-09-04 00:53:11","2001-03-06","2021-03-12","2022-03-06","472","Dynadot, LLC","whois.dynadot.com","http://www.dynadot.com","Super Privacy Service LTD c/o Dynadot","","PO Box 701","San Mateo","California","94401","United States","xjava.com#superprivacyservice.com","+1.6505854708","","Super Privacy Service LTD c/o Dynadot","","PO Box 701","San Mateo","California","94401","United States","xjava.com#superprivacyservice.com","+1.6505854708","","Super Privacy Service LTD c/o Dynadot","","PO Box 701","San Mateo","California","94401","United States","xjava.com#superprivacyservice.com","+1.6505854708","","","","","","","","","","","","ns1.sedoparking.com","ns2.sedoparking.com","","","clientTransferProhibited","","",""
My code produces the right values, but $d59, $d17 and $d60 end up on a new line.
$d59 is just the TLD,
$d17 is a repeat of the country,
$d60 is the phone number without special characters.
All I want is everything on one row.
My output is
domain_name+query_time domain_name create_date update_date expiry_date domain_registrar_id domain_registrar_name domain_registrar_whois domain_registrar_url registrant_name registrant_company registrant_address registrant_city registrant_state registrant_zip registrant_country registrant_email registrant_phone registrant_fax administrative_name administrative_company administrative_address administrative_city administrative_state administrative_zip administrative_country administrative_email administrative_phone administrative_fax technical_name technical_company technical_address technical_city technical_state technical_zip technical_country technical_email technical_phone technical_fax billing_state billing_zip billing_country billing_email billing_phone billing_fax name_server_1 name_server_2 name_server_3 name_server_4 domain_status_1 domain_status_2 domain_status_3 domain_status_4
domain_name registrant_country
accounting-fwppool.com+2021-09-04 00:53:04 accounting-fwppool.com 10/08/21 10/08/21 10/08/22 303 PDR Ltd. d/b/a PublicDomainRegistry.com whois.publicdomainregistry.com http://www.publicdomainregistry.com Micael brown 4941 Maui Cir Huntington Beach CA 92649 CA CA 92649 United States michbrown7654gh#gmail.com 1.916913637 Micael brown 4941 Maui Cir Huntington Beach CA 92649 CA CA 92649 United States michbrown7654gh#gmail.com 1.916913637 Micael brown 4941 Maui Cir Huntington Beach CA 92649 CA CA 92649 United States michbrown7654gh#gmail.com 1.916913637 ns1.verification-hold.suspended-domain.com ns2.verification-hold.suspended-domain.com clientTransferProhibited
com United States 19169136369
xjava.com+2021-09-04 00:53:11 xjava.com 06/03/01 12/03/21 06/03/22 472 Dynadot LLC whois.dynadot.com http://www.dynadot.com Super Privacy Service LTD c/o Dynadot PO Box 701 San Mateo California 94401 United States xjava.com#superprivacyservice.com 1.650585471 Super Privacy Service LTD c/o Dynadot PO Box 701 San Mateo California 94401 United States xjava.com#superprivacyservice.com 1.650585471 Super Privacy Service LTD c/o Dynadot PO Box 701 San Mateo California 94401 United States xjava.com#superprivacyservice.com 1.650585471 ns1.sedoparking.com ns2.sedoparking.com clientTransferProhibited
com United States 16505854708
accuratetactics.com+2021-09-04 00:53:14 accuratetactics.com 26/08/20 30/08/21 26/08/21 1660 Domainshype.com Inc. whois.domainshype.com http://www.domainshype.com This Domain For Sale Worldwide 339 222 5132 Buydomains.com 738 Main Street #389 Waltham Massachusetts 2451 United States brokerage#buydomains.com 1.339222513 1.78183928 This Domain For Sale Worldwide 339 222 5132 Buydomains.com 738 Main Street #389 Waltham Massachusetts 2451 United States brokerage#buydomains.com 1.339222513 1.78183928 This Domain For Sale Worldwide 339 222 5132 Buydomains.com 738 Main Street #389 Waltham Massachusetts 2451 United States brokerage#buydomains.com 1.339222513 1.78183928 dns7.parkpage.foundationapi.com dns8.parkpage.foundationapi.com OK
com United States 13392225132
vej.com+2021-09-04 00:53:16 vej.com 16/09/99 31/08/21 16/09/23 128 DomainRegistry.com Inc. nswhois.domainregistry.com http://www.domainregistry.com Scottcraft Label Co. Scottcraft Label Co. c/o Admin Svcs. PO Box 145 Marlton NJ 8053 United States itadmin#scottcraftlabel.com 1.215870212 IT Admin MS 445 Scottcraft Label Co. c/o Admin Svcs. PO Box 145 Marlton NJ 8053 United States itadmin#scottcraftlabel.com 1.215870212 IT Admin MS 445 Scottcraft Label Co. c/o Admin Svcs. PO Box 145 Marlton NJ 8053 United States itadmin#scottcraftlabel.com 1.215870212 colohost1.domainregistry.com cs03.domainregistry.com clientDeleteProhibited clientTransferProhibited clientUpdateProhibited
com United States 12158702120
accutekware.com+2021-09-04 00:53:24 accutekware.com 26/08/03 26/08/21 26/08/21 303 PDR Ltd. d/b/a PublicDomainRegistry.com whois.publicdomainregistry.com http://www.publicdomainregistry.com R Benedict Accutek Systems Inc PO Box 591125 Houston Texas 77259 United States rbeny09#hotmail.com 1.281461701 R Benedict Accutek Systems Inc PO Box 591125 Houston Texas 77259 United States rbeny09#hotmail.com 1.281461701 R Benedict Accutek Systems Inc PO Box 591125 Houston Texas 77259 United States rbeny09#hotmail.com 1.281461701 dns10.parkpage.foundationapi.com dns11.parkpage.foundationapi.com clientTransferProhibited
com United States 12814617007
crmxon.com+2021-09-04 00:53:27 crmxon.com 04/09/20 04/11/20 04/09/21 303 PDR Ltd. d/b/a PublicDomainRegistry.com whois.publicdomainregistry.com http://www.publicdomainregistry.com GDPR Masked GDPR Masked GDPR Masked GDPR Masked Newcastleupon Tyne(Cityof) GDPR Masked United Kingdom gdpr-masking#gdpr-masked.com GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked gdpr-masking#gdpr-masked.com GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked GDPR Masked gdpr-masking#gdpr-masked.com GDPR Masked GDPR Masked ns1.edagent.com ns2.edagent.com ns3.edagent.com ns4.edagent.com clientTransferProhibited
com United Kingdom
Expected output
domain_name+query_time domain_name create_date update_date expiry_date domain_registrar_id domain_registrar_name domain_registrar_whois domain_registrar_url registrant_name registrant_company registrant_address registrant_city registrant_state registrant_zip registrant_country registrant_email registrant_phone registrant_fax administrative_name administrative_company administrative_address administrative_city administrative_state administrative_zip administrative_country administrative_email administrative_phone administrative_fax technical_name technical_company technical_address technical_city technical_state technical_zip technical_country technical_email technical_phone technical_fax billing_state billing_zip billing_country billing_email billing_phone billing_fax name_server_1 name_server_2 name_server_3 name_server_4 domain_status_1 domain_status_2 domain_status_3 domain_status_4 domain_name registrant_country
accounting-fwppool.com+2021-09-04 00:53:04 accounting-fwppool.com 10/08/21 10/08/21 10/08/22 303 PDR Ltd. d/b/a PublicDomainRegistry.com whois.publicdomainregistry.com http://www.publicdomainregistry.com Micael brown 4941 Maui Cir Huntington Beach CA 92649 CA CA 92649 United States michbrown7654gh#gmail.com 1.91691364 Micael brown 4941 Maui Cir Huntington Beach CA 92649 CA CA 92649 United States michbrown7654gh#gmail.com 1.91691364 Micael brown 4941 Maui Cir Huntington Beach CA 92649 CA CA 92649 United States michbrown7654gh#gmail.com 1.91691364 ns1.verification-hold.suspended-domain.com ns2.verification-hold.suspended-domain.com clientTransferProhibited com United States 1.9169E+10
xjava.com+2021-09-04 00:53:11 xjava.com 06/03/01 12/03/21 06/03/22 472 Dynadot LLC whois.dynadot.com http://www.dynadot.com Super Privacy Service LTD c/o Dynadot PO Box 701 San Mateo California 94401 United States xjava.com#superprivacyservice.com 1.65058547 Super Privacy Service LTD c/o Dynadot PO Box 701 San Mateo California 94401 United States xjava.com#superprivacyservice.com 1.65058547 Super Privacy Service LTD c/o Dynadot PO Box 701 San Mateo California 94401 United States xjava.com#superprivacyservice.com 1.65058547 ns1.sedoparking.com ns2.sedoparking.com clientTransferProhibited com United States 1.6506E+10
accuratetactics.com+2021-09-04 00:53:14 accuratetactics.com 26/08/20 30/08/21 26/08/21 1660 Domainshype.com Inc. whois.domainshype.com http://www.domainshype.com This Domain For Sale Worldwide 339 222 5132 Buydomains.com 738 Main Street #389 Waltham Massachusetts 2451 United States brokerage#buydomains.com 1.33922251 1.78183928 This Domain For Sale Worldwide 339 222 5132 Buydomains.com 738 Main Street #389 Waltham Massachusetts 2451 United States brokerage#buydomains.com 1.33922251 1.78183928 This Domain For Sale Worldwide 339 222 5132 Buydomains.com 738 Main Street #389 Waltham Massachusetts 2451 United States brokerage#buydomains.com 1.33922251 1.78183928 dns7.parkpage.foundationapi.com dns8.parkpage.foundationapi.com OK com United States 1.3392E+10
I suggest a single awk script to process all the data.
Starting with this:
script.awk
BEGIN{FS="\",\"|\"[[:space:]]*$|^[[:space:]]*\""; OFS=","}
{
# the leading quote makes $1 empty, so the "num" column is $2
# drop commas inside fields so they cannot break the CSV output
for (i = 2; i <= NF; i++) gsub(/,/, "", $i);
arr[1] = $3 "+" $4; # domain_name + query_time
arr[2] = $3; # domain_name
arr[4] = $5;
# right append to arr[4] fields 6-41
for (i = 6; i <= 41; i++) arr[4] = arr[4] "," $i;
# right append to arr[4] fields 46-59
for (i = 46; i <= 59; i++) arr[4] = arr[4] "," $i;
arr[17] = $18; # registrant_country
arr[59] = $3;
# in the domain name keep only the text after the first "." (the TLD)
sub(/^[^.]*\./,"",arr[59]);
# keep only the digits of the 20th field (registrant_phone)
gsub(/[^[:digit:]]/,"",$20);
arr[60] = $20;
# output to stdout
print arr[1],arr[2],arr[4],arr[59],arr[17],arr[60];
}
Running:
awk -f script.awk input.csv > output.csv
Did not test since the sample data did not contain numeric values.
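For comparison, here is a minimal Python sketch of the same reshaping using the standard `csv` module, which parses quoted fields (including commas inside quotes) on its own. The function names `reshape_row` and `convert` are hypothetical, and the column positions are taken from the header shown in the question. One possible cause of the stray line break is a carriage return at the end of each input line; that is only an assumption, since the file itself is not shown, but `csv.reader` strips such line endings either way.

```python
import csv
import io
import re


def reshape_row(row):
    """Build one output row from one parsed input row of 58 fields."""
    # Drop commas inside fields, as the original shell pipeline does.
    row = [field.replace(",", "") for field in row]
    domain, query_time = row[1], row[2]
    # Input columns 4-40 and 45-58 (1-based), as selected in the question.
    middle = row[3:40] + row[44:58]
    tld = domain.split(".", 1)[1] if "." in domain else ""
    country = row[16]
    phone_digits = re.sub(r"\D", "", row[18])
    return [f"{domain}+{query_time}", domain, *middle, tld, country, phone_digits]


def convert(src, dst):
    """Read CSV rows from src and write the reshaped rows to dst."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        writer.writerow(reshape_row(row))


# Demo on one synthetic 58-field line ending in CRLF (values are made up).
fields = [f"f{i}" for i in range(1, 59)]
fields[1], fields[2] = "xjava.com", "2021-09-04 00:53:11"
fields[7], fields[16], fields[18] = "Dynadot, LLC", "United States", "+1.6505854708"
line = ",".join(f'"{value}"' for value in fields) + "\r\n"
out = io.StringIO()
convert(io.StringIO(line, newline=""), out)
print(out.getvalue())
```

With real files, opening both with `newline=""` (as the `csv` documentation recommends) lets the module handle line endings itself.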

Remove custom stop words from pandas dataframe not working

I am trying to remove a custom list of stop words, but it's not working.
desc = pd.DataFrame(description, columns =['description'])
print(desc)
Which gives the following results
description
188693 The Kentucky Cannabis Company and Bluegrass He...
181535 Ohio County Sheriff
11443 According to new reports from federal authorit...
213919 KANSAS CITY, Mo. (AP)The Chiefs will be withou...
171509 The crew of Insight, WCNY's weekly public affa...
... ...
2732 The Arkansas Supreme Court on Thursday cleared...
183367 Larry Pegram, co-owner of Pure Ohio Wellness, ...
134291 Joe Biden will spend the next five months pres...
239270 Find out where your Texas representatives stan...
246070 SAN TAN VALLEY — Two men have been charged wit...
[9875 rows x 1 columns]
I found the following code here, but it doesn't seem to work
remove_words = ["marijuana", "cannabis", "hemp", "thc", "cbd"]
pat = '|'.join([r'\b{}\b'.format(w) for w in remove_words])
desc.assign(new_desc=desc.replace(dict(string={pat: ''}), regex=True))
Which produces the following results
description new_desc
188693 The Kentucky Cannabis Company and Bluegrass He... The Kentucky Cannabis Company and Bluegrass He...
181535 Ohio County Sheriff Ohio County Sheriff
11443 According to new reports from federal authorit... According to new reports from federal authorit...
213919 KANSAS CITY, Mo. (AP)The Chiefs will be withou... KANSAS CITY, Mo. (AP)The Chiefs will be withou...
171509 The crew of Insight, WCNY's weekly public affa... The crew of Insight, WCNY's weekly public affa...
... ... ...
2732 The Arkansas Supreme Court on Thursday cleared... The Arkansas Supreme Court on Thursday cleared...
183367 Larry Pegram, co-owner of Pure Ohio Wellness, ... Larry Pegram, co-owner of Pure Ohio Wellness, ...
134291 Joe Biden will spend the next five months pres... Joe Biden will spend the next five months pres...
239270 Find out where your Texas representatives stan... Find out where your Texas representatives stan...
246070 SAN TAN VALLEY — Two men have been charged wit... SAN TAN VALLEY — Two men have been charged wit...
9875 rows × 2 columns
As you can see, the stop words weren't removed. Any help you can provide would be greatly appreciated.
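The nested dict in your replace call targets a column named string, which this frame does not have, and the lowercase patterns would not match capitalized words such as "Cannabis" anyway. Lowercase the text to handle the case, and simplify the pattern: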
remove_words = ["marijuana", "cannabis", "hemp", "thc", "cbd"]
pat = '|'.join(remove_words)
desc['new_desc'] = desc.description.str.lower().replace(pat,'', regex=True)
description new_desc
0 The Kentucky Cannabis Company and Bluegrass He... the kentucky company and bluegrass he...
1 Ohio County Sheriff ohio county sheriff
2 According to new reports from federal authorit... according to new reports from federal authorit...
3 KANSAS CITY, Mo. (AP)The Chiefs will be mariju... kansas city, mo. (ap)the chiefs will be witho...
4 The crew of Insight, WCNY's weekly public affa... the crew of insight, wcny's weekly public affa...
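Note that the simplified pattern also removes the words when they occur inside longer words, and lowercasing changes the rest of the text. If that is not wanted, a variant is to keep the word boundaries and match case-insensitively. The sketch below shows this with the standard `re` module only; the helper name `strip_words` is hypothetical. In pandas, the same pattern string could presumably be passed to `Series.str.replace` with `regex=True` and `flags=re.IGNORECASE`.

```python
import re

remove_words = ["marijuana", "cannabis", "hemp", "thc", "cbd"]

# \b keeps the match to whole words; IGNORECASE covers "Cannabis", "CBD", etc.
pattern = re.compile(
    r"\b(?:" + "|".join(map(re.escape, remove_words)) + r")\b",
    re.IGNORECASE,
)


def strip_words(text):
    """Remove the stop words and tidy up the whitespace left behind."""
    cleaned = pattern.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


print(strip_words("The Kentucky Cannabis Company and Bluegrass Hemp"))
print(strip_words("Hempstead police"))
```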

check amount of time between different rows of data (time) and date and name of employee

I have a df with this info ['Name', 'Department', 'Date', 'Time', 'Activity'],
so for example looks like this:
Acosta, Hirto 225 West 28th Street 9/18/2019 07:25:00 Punch In
Acosta, Hirto 225 West 28th Street 9/18/2019 11:57:00 Punch Out
Acosta, Hirto 225 West 28th Street 9/18/2019 12:28:00 Punch In
Adams, Juan 225 West 28th Street 9/16/2019 06:57:00 Punch In
Adams, Juan 225 West 28th Street 9/16/2019 12:00:00 Punch Out
Adams, Juan 225 West 28th Street 9/16/2019 12:28:00 Punch In
Adams, Juan 225 West 28th Street 9/16/2019 15:30:00 Punch Out
Adams, Juan 225 West 28th Street 9/18/2019 07:04:00 Punch In
Adams, Juan 225 West 28th Street 9/18/2019 11:57:00 Punch Out
I need to calculate the time between the punch in and the punch out on the same day for the same employee.
So far I have only managed to clean the data:
self.raw_data['Time'] = pd.to_datetime(self.raw_data['Time'], format='%H:%M').dt.time
sorted_db = self.raw_data.sort_values(['Name', 'Date'])
sorted_db = sorted_db[['Name', 'Department', 'Date', 'Time', 'Activity']]
Any suggestions will be appreciated.
I found the answer to my problem and wanted to share it.
First I separate the "Punch In" and "Punch Out" times into two columns:
def process_info(self):
    # filter and organize the data --------------------------------------------------------
    self.raw_data['in'] = self.raw_data[self.raw_data['Activity'].str.contains('In')]['Time']
    self.raw_data['pre_out'] = self.raw_data[self.raw_data['Activity'].str.contains('Out')]['Time']
Next I sort the rows by date and name:
sorted_data = self.raw_data.sort_values(['Date', 'Name'])
After that I use the shift function to move the time of the following row up one level, so that each "out" sits in parallel with its "in":
sorted_data['out'] = sorted_data.shift(-1)['Time']
Finally I drop the extra "Punch Out" rows left over from the first step by keeping only the rows whose pre_out is empty:
filtered_data = sorted_data[sorted_data['pre_out'].isnull()]
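The steps above pair each punch in with the following row but stop short of the subtraction. A minimal sketch of the whole calculation is below, assuming pandas is available and using a few rows from the question. The column names `Stamp`, `NextStamp`, `NextActivity` and `Worked` are hypothetical. It shifts within each employee and day, so a punch in is never paired with a row from another person or date.

```python
import pandas as pd

rows = [
    ("Acosta, Hirto", "9/18/2019", "07:25:00", "Punch In"),
    ("Acosta, Hirto", "9/18/2019", "11:57:00", "Punch Out"),
    ("Acosta, Hirto", "9/18/2019", "12:28:00", "Punch In"),
    ("Adams, Juan", "9/16/2019", "06:57:00", "Punch In"),
    ("Adams, Juan", "9/16/2019", "12:00:00", "Punch Out"),
    ("Adams, Juan", "9/16/2019", "12:28:00", "Punch In"),
    ("Adams, Juan", "9/16/2019", "15:30:00", "Punch Out"),
]
df = pd.DataFrame(rows, columns=["Name", "Date", "Time", "Activity"])

# Combine date and time so that differences can be taken directly.
df["Stamp"] = pd.to_datetime(df["Date"] + " " + df["Time"],
                             format="%m/%d/%Y %H:%M:%S")
df = df.sort_values(["Name", "Stamp"])

# Within each employee and day, look at the row that follows.
df["NextStamp"] = df.groupby(["Name", "Date"])["Stamp"].shift(-1)
df["NextActivity"] = df.groupby(["Name", "Date"])["Activity"].shift(-1)

# Keep only punch-ins that are directly followed by a punch-out.
pairs = df[(df["Activity"] == "Punch In")
           & (df["NextActivity"] == "Punch Out")].copy()
pairs["Worked"] = pairs["NextStamp"] - pairs["Stamp"]

# Total time per employee per day.
totals = pairs.groupby(["Name", "Date"])["Worked"].sum()
print(pairs[["Name", "Date", "Worked"]])
print(totals)
```

A punch in without a matching punch out on the same day (the last Acosta row here) is simply left out of the result.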

if username in dataframe 1 is equal to username in dataframe 2 then place the next column in dataframe 1

dataframe 1 is
View Name member user id
Admin_Case_View Catherine Kear ckear
Admin_IT Atul Dhiwar adhiwar-sa
Admin_IT Costin Bulisache cbulisac
Admin_IT Deepa Gopal SA
Admin_IT Geoff Semonian SA
Admin_IT Glenn Castan SA
Admin_IT Nikhil Manekar nmanekar
Admin_Questions Chaitanya Kondury kkondury
Admin_Questions Geetha Maddala gmaddala
Admin_Questions Kelly Kim jungeunk
Admin_Questions Megan Yeh megany
dataframe 2 is
Case Owner Alias Owner Region
cbulisac Other
aandiapp India
gmaddala North America
abarak Europe
abell Europe
nmanekar India
abhghos India
kkondury India
abhishuk India
acai China
megany North America
adasari India
adhiwar-sa North America
Here, if the username in dataframe 1 equals the username in dataframe 2, the region should be placed in dataframe 1.
The output should be:
View Name member user id region
Admin_Case_View Catherine Kear ckear
Admin_IT Atul Dhiwar adhiwar-sa North America
Admin_IT Costin Bulisache cbulisac Other
Admin_IT Deepa Gopal SA
Admin_IT Geoff Semonian SA
Admin_IT Glenn Castan SA
Admin_IT Nikhil Manekar nmanekar India
Admin_Questions Chaitanya Kondury kkondury india
Admin_Questions Geetha Maddala gmaddala North America
Admin_Questions Kelly Kim jungeunk Europe
Admin_Questions Megan Yeh adhiwar-sa North America
Try this; you just need a merge:
df3=pd.merge(df1,df2,left_on=['user id'],right_on=['Case Owner Alias'],how='left').rename(columns={'Owner Region':'region'}).drop(columns='Case Owner Alias').fillna('')
Output:
View Name member user id region
0 Admin_Case_View Catherine Kear ckear
1 Admin_IT Atul Dhiwar adhiwar-sa North America
2 Admin_IT Costin Bulisache cbulisac Other
3 Admin_IT Deepa Gopal SA
4 Admin_IT Geoff Semonian SA
5 Admin_IT Glenn Castan SA
6 Admin_IT Nikhil Manekar nmanekar India
7 Admin_Questions Chaitanya Kondury kkondury India
8 Admin_Questions Geetha Maddala gmaddala North America
9 Admin_Questions Kelly Kim jungeunk
10 Admin_Questions Megan Yeh megany North America
Note: Map is not advisable when you have a large Dataframe.

PDF and report image files to table

I am using Spotfire and I have hundreds of print image reports that a client receives daily. I would like to extract the data in these reports into table form and combine them together into a single table so that we can do analysis on the entire time period.
The example of the report is as follows
$CL04303 UNCLAIMED MAIL REPORT 02122015
PAGE: 1
UNCLAIMED MAIL REPORT
TRANSACTION DATE = 02/11/2015
POLICY NUMBER: ABC80230 CLIENT NAME: Andrew auditid: H8G3J1AY
PRIOR ADDRESS:
428 SANDOVAL ST
SANTA FE NM 87501-7312
NEW ADDRESS:
1583 PACHECO ST STE B
POLICY NUMBER: XYZ05720 CLIENT NAME: Mike auditid: H8G3HIZE
PRIOR ADDRESS:
6047 WEST END BLVD
NEW ORLEANS LA 70124-1933
NEW ADDRESS:
6047 WEST END BLVD
NEW ORLEANS LA 70124-1933
POLICY NUMBER: KJU10110 CLIENT NAME: TOM auditid: H8G3HIZE
PRIOR ADDRESS:
6047 WEST END BLVD
NEW ORLEANS LA 70124-1933
NEW ADDRESS:
6047 WEST END BLVD
NEW ORLEANS LA 70124-1933
page 1
INDIVIDUAL POLICY SERVICES PAGE: 2
DALLAS SERVICE CENTER
UNCLAIMED MAIL REPORT
TRANSACTION DATE = 02/11/2015
POLICY NUMBER: LIP60004 CLIENT NAME: Eric auditid: H8G3HIZE
PRIOR ADDRESS:
6047 WEST END BLVD
NEW ORLEANS LA 70124-1933
NEW ADDRESS:
6047 WEST END BLVD
NEW ORLEANS LA 70124-1933
POLICY NUMBER: PYT04785 CLIENT NAME: Linda auditid: H8G3HIZE
PRIOR ADDRESS:
6047 WEST END BLVD
NEW ORLEANS LA 70124-1933
NEW ADDRESS:
6047 WEST END BLVD
NEW ORLEANS LA 70124-1933
I want the resulting data table to look like this:
Policy_Number Audit_Trail Prior_Address_01 Prior_Address_02 Prior_Address_03 Prior_Address_04 Prior_Address_05 New_Address_01 New_Address_02 New_Address_03 New_Address_04 New_Address_05
KYT24045 JAYANT JUYHIZE 19 GREENGROVE AVE INDIANAPOLIS IN 46234-2722 72 WEIR LAKE RD SOMERS NY 10589-1735
KYT63030 MARYNETTA JUYJJ6A 19 GREENGROVE AVE INDIANAPOLIS IN 46234-2722 72 WEIR LAKE RD SOMERS NY 10589-1736
KYT63051 MARYNETTA JUYJJ6A 858 W 83RD ST BRONX NY 10457-2713 858 W 83RD ST SOMERS NY 10589-1737
KYT65454 MARYNETTA JUYJJ6A 858 W 83RD ST BRONX NY 10457-2713 858 W 83RD ST NEW ORLEANS LA 70124-1933
KYT73439 MARYNETTA JUYJJ6A 858 W 83RD ST BRONX NY 10457-2713 858 W 83RD ST NEW ORLEANS LA 70124-1934
KYT87866 KAREN JUYJ1AY 858 W 83RD ST BRONX NY 10457-2713 858 W 83RD ST NEW ORLEANS LA 70124-1935
KYT03747 CRYSTAL JUYJX4N 35 JOHNSON RD APT 70 YONKERS NY 10705-2422 165 BLOOMFIELD DR APT L5 NEW ORLEANS LA 70124-1936
KYT01138 ROSALINDA JUYJ5IX 119 JOSEPHINE ST YONKERS NY 10705-2422 119 JOSEPHINE ST NEW ORLEANS LA 70113-1428
KYT43124 ROSALINDA JUYJ5IX 4724 VICTORY BLVD APT 202 YONKERS NY 10705-2422 3316 GOLDMEDAL AVE SANTA FE NM 87501-7312
KYT46829 LINDA JUYJ1AY 4724 VICTORY BLVD APT 202 CHICAGO IL 60621-1129 3316 GOLDMEDAL AVE SANTA FE NM 87501-7313
KYT44940 LINDA JUYJ1AY 8 CREEKVIEW LN CHICAGO IL 60621-1129 620 ARRINGTON RD SANTA FE NM 87501-7314
KYT44946 LINDA JUYJ1AY 8 CREEKVIEW LN CHICAGO IL 60621-1129 620 ARRINGTON RD SANTA FE NM 87501-7315
Any suggestions on how this can be done in Spotfire or any other method?
Thanks!
That's a job for an OCR tool rather than for Spotfire. After the OCR step you would need some form of text mining to get everything into the right shape; only then can the result be used for reporting in Spotfire.
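As a rough illustration of that text-mining step, here is a minimal Python sketch. It assumes the reports are already available as plain text in the layout shown in the question. The function name `parse_report`, the dictionary keys and the list of page-header patterns are all hypothetical and would need adjusting to the real reports.

```python
import re

RECORD = re.compile(
    r"POLICY NUMBER:\s*(\S+)\s+CLIENT NAME:\s*(.*?)\s+auditid:\s*(\S+)")
# Lines assumed to be page headers or footers rather than address text.
SKIP = re.compile(
    r"^(?:page \d+|.*PAGE:\s*\d+|UNCLAIMED MAIL REPORT"
    r"|TRANSACTION DATE\b.*|.*SERVICE CENTER)$",
    re.IGNORECASE)


def parse_report(text):
    """Return one dict per policy with its prior and new address lines."""
    records, current, target = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = RECORD.search(line)
        if match:
            current = {"policy": match.group(1), "client": match.group(2),
                       "audit_id": match.group(3), "prior": [], "new": []}
            records.append(current)
            target = None
        elif line == "PRIOR ADDRESS:":
            target = "prior"
        elif line == "NEW ADDRESS:":
            target = "new"
        elif SKIP.match(line):
            target = None  # a page header or footer ends the current address
        elif current is not None and target is not None:
            current[target].append(line)
    return records


sample = """POLICY NUMBER: ABC80230 CLIENT NAME: Andrew auditid: H8G3J1AY
PRIOR ADDRESS:
428 SANDOVAL ST
SANTA FE NM 87501-7312
NEW ADDRESS:
1583 PACHECO ST STE B
page 1
INDIVIDUAL POLICY SERVICES PAGE: 2
DALLAS SERVICE CENTER
UNCLAIMED MAIL REPORT
TRANSACTION DATE = 02/11/2015
POLICY NUMBER: LIP60004 CLIENT NAME: Eric auditid: H8G3HIZE
PRIOR ADDRESS:
6047 WEST END BLVD
NEW ORLEANS LA 70124-1933
NEW ADDRESS:
6047 WEST END BLVD
NEW ORLEANS LA 70124-1933
"""
records = parse_report(sample)
for record in records:
    print(record)
```

The address lists can then be spread over the Prior_Address_01 to _05 and New_Address_01 to _05 columns and written out as one table for Spotfire to load.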
