PayPal is giving me this format: '03:00:00 Mar 14, 2023 PDT'. I've tried different solutions but I can't find the right one. How can I separate each item into a string using Python?
Assuming you want a datetime object:
from datetime import datetime
time_string = "03:00:00 Mar 14, 2023 PDT"
d = datetime.strptime(time_string[:-4], "%H:%M:%S %b %d, %Y")
or
import parsedatetime as pdt # $ pip3 install parsedatetime
cal = pdt.Calendar()
time_string = "03:00:00 Mar 14, 2023 PDT"
d = cal.parseDT(time_string)[0]
Once you have your datetime, output the part(s) you want using strftime.
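As a rough illustration of that last step, here is a minimal sketch that pulls each piece out as its own string. It assumes the timestamp always ends with a space followed by a zone abbreviation, and the names `stamp`, `zone`, and `parts` are made up for this example. The zone is split off as plain text because strptime's %Z does not reliably parse abbreviations such as "PDT".

```python
from datetime import datetime

time_string = "03:00:00 Mar 14, 2023 PDT"

# Assumption: the last space-separated token is the zone abbreviation.
stamp, zone = time_string.rsplit(" ", 1)
d = datetime.strptime(stamp, "%H:%M:%S %b %d, %Y")

# Format each component separately with strftime.
parts = {
    "time": d.strftime("%H:%M:%S"),
    "month": d.strftime("%b"),
    "day": d.strftime("%d"),
    "year": d.strftime("%Y"),
    "zone": zone,
}
print(parts)
```

Note that the resulting datetime is naive: the zone is kept only as a label and no offset is applied.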
For a date range such as 15 July - 3 September 2022, I came up with this regex pattern:
[\d]{1,2} [ADFJMNOS]\w* [\-] [\d]{1,2} [ADFJMNOS]\w* [\d]{4}
My questions are:
What would the regex pattern be if the date is not on a single line?
Example:
15 July - 3 September
2022
22
July
Desired output:
15 July - 3 September 2022
22 July
Also, what if the date appears as part of another word?
Example:
delayed15 July – 3 Septemer 2022
Here it is attached to the word "delayed". The word can be anything.
Desired output:
15 July – 3 Septemer 2022
Code I am trying:
from selenium import webdriver
from bs4 import BeautifulSoup
import re
url_list = ['https://alabamasymphony.org/event/bachmozart']
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.3"
}
driver = webdriver.Chrome('/home/ubuntu/selenium_drivers/chromedriver')
format_list = [r"[\d]{1,2} [ADFJMNOS]\w* [\-] [\d]{1,2} [ADFJMNOS]\w* [\d]{4}"]
for URL in url_list:
    date = []
    driver.get(URL)
    driver.implicitly_wait(2)
    data = driver.page_source
    cleantext = BeautifulSoup(data, "lxml").text
    cleanr = re.compile('<.*?>')
    x = re.sub(cleanr, ' ', cleantext)
    print(URL)
    for pattern in format_list:
        all_dates = re.findall(pattern, x)
        if all_dates == []:
            continue
        else:
            date.append(all_dates)
    for s in date:
        print(s)
You can try this regex:
(\d{1,2})\s*([A-Za-z]+)[ \t]*(?:(-\s*\d{1,2}\s*[A-Za-z]+)\s+(\d{4}))?
The idea is to break the date text down into four groups and print whichever of them you need.
Source code:
import re
regex = r"(\d{1,2})\s*([A-Za-z]+)[ \t]*(?:(-\s*\d{1,2}\s*[A-Za-z]+)\s+(\d{4}))?"
test_str = ("15 July - 3 September 2022 as\n\n\n"
"15 July - 3 September \n"
"2022\n"
"22\n"
"July\n\n\n\n"
" Also if it is seen as a part of another word\n\n"
"example\n\n"
"delayed15 July - 3 Septemer 2022\n\n"
"Here it is attached with the word \"delayed\". The word can be anything.\n\n\n")
matches = re.finditer(regex, test_str, re.MULTILINE)
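for match in matches:
    if match.group(3) is not None:
        # Date range: first day and month, then "- day month", then the year
        print(match.group(1)+" "+match.group(2)+match.group(3)+" "+match.group(4))
    else:
        # Single date: day and month only
        print(match.group(1)+" "+match.group(2))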
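The second example in the question uses an en dash (–) rather than a hyphen, which the pattern above does not match. A possible variant is sketched below, assuming the separator is always either "-" or an en dash. It regroups the pattern into six groups instead of four, and `find_dates` and `sample` are hypothetical names used only for this illustration.

```python
import re

# Same idea as above, but the separator may be "-" or an en dash (\u2013),
# and each piece of the range gets its own group.
pattern = re.compile(
    r"(\d{1,2})\s*([A-Za-z]+)[ \t]*"
    r"(?:([-\u2013])\s*(\d{1,2})\s*([A-Za-z]+)\s+(\d{4}))?"
)

def find_dates(text):
    """Return each date or date range found in text as a single-line string."""
    results = []
    for m in pattern.finditer(text):
        day1, month1, dash, day2, month2, year = m.groups()
        if dash:
            results.append(f"{day1} {month1} {dash} {day2} {month2} {year}")
        else:
            results.append(f"{day1} {month1}")
    return results

sample = (
    "delayed15 July \u2013 3 Septemer 2022\n"
    "15 July - 3 September \n"
    "2022\n"
    "22\n"
    "July\n"
)
for found in find_dates(sample):
    print(found)
```

Like the original, this pattern matches any one- or two-digit number followed by a word, so on real page text it may need a stricter month check, such as the [ADFJMNOS]\w* prefix from the question.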
The code below, in Python 3,
for val in A:
    print(val)
Returns:
[(b'3 (RFC822 {821}', b'MIME-Version: 1.0\r\nDate: Sun, 2 Feb 2020
22:12:19 +0530\r\nMessage-ID:
\r\nSubject:
code\r\nFrom: abc >\r\nTo: abc \r\nContent-Type:
multipart/alternative;
boundary="43434343"\r\n\r\n--0000000000008ecb2e059d9a7dfe\r\nContent-Type:
text/plain; charset="UTF-8"\r\n\r\n1. 4549 3867 6. 1755 6816\r\n2.
3068 0287 7. 8557 7000\r\n3. 3827 1727 8. 4177 1609\r\n4. 5093 4909 9.
9799 3366\r\n5. 1069 7992 10. 5141
2029\r\n\r\n--0000000000008ecb2e059d9a7dfe\r\nContent-Type: text/html;
charset="UTF-8"\r\n\r\ntest
code\r\n\r\n--0000000000008ecb2e059d9a7dfe--'), b')']
whereas for
for val in A:
    for v in val:
        print(v)
returns:
b'3 (RFC822 {821}' b'MIME-Version: 1.0\r\nDate: Sun, 2 Feb 2020
22:12:19 +0530\r\nMessage-ID:
\r\nSubject:
code\r\nFrom: >\r\nTo: \r\nContent-Type: multipart/alternative;
boundary="0000000000008ecb2e059d9a7dfe"\r\n\r\n--0000000000008ecb2e059d9a7dfe\r\nContent-Type:
text/plain; charset="UTF-8"\r\n\r\n1. 4549 3867 6. 1755 6816\r\n2.
3068 0287 7. 8557 7000\r\n3. 3827 1727 8. 4177 1609\r\n4. 5093 4909 9.
9799 3366\r\n5. 1069 7992 10. 5141
2029\r\n\r\n--0000000000008ecb2e059d9a7dfe\r\nContent-Type: text/html;
charset="UTF-8"\r\n\r\n\r\n\r\n--0000000000008ecb2e059d9a7dfe--' 41
I don't understand why I am getting the ASCII value of ')', i.e. 41. How can I just read it as ')'?
In Python 3, iterating over a bytes object yields the integer value of each byte in it.
For example:
>>> phrase = b'Hello, world!'
>>> chars = list(phrase)
>>> chars
[72, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100, 33]
In your example, the last element contains only one byte, so only one number is printed. The other bytes strings are not printed as integers because they are encased in an extra tuple, so it is the tuple that is iterated over instead.
Fixing your code:
A = [
(
b'3 (RFC822 {821}',
b'MIME-Version: 1.0\r\nDate: Sun, 2 Feb 2020 22:12:19 +0530\r\nMessage-ID: \r\nSubject: code\r\nFrom: abc >\r\nTo: abc \r\nContentType: multipart/alternative; boundary="43434343"\r\n\r\n--0000000000008ecb2e059d9a7dfe\r\nContent-Type: text/plain; charset="UTF-8"\r\n\r\n1. 4549 3867 6. 1755 6816\r\n2. 3068 0287 7. 8557 7000\r\n3. 3827 1727 8. 4177 1609\r\n4. 5093 4909 9. 9799 3366\r\n5. 1069 7992 10. 5141 2029\r\n\r\n--0000000000008ecb2e059d9a7dfe\r\nContent-Type: text/html; charset="UTF-8"\r\n\r\ntest code\r\n\r\n--0000000000008ecb2e059d9a7dfe--'
),
b')'
]
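for val in A:
    if isinstance(val, bytes):
        # A bare bytes object: print it whole instead of byte by byte
        print(val)
    else:
        # A tuple of bytes objects: print each item
        for v in val:
            print(v)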
Output:
b'3 (RFC822 {821}'
b'MIME-Version: 1.0\r\nDate: Sun, 2 Feb 2020 22:12:19 +0530\r\nMessage-ID: \r\nSubject: code\r\nFrom: abc >\r\nTo: abc \r\nContentType: multipart/alternative; boundary="43434343"\r\n\r\n--0000000000008ecb2e059d9a7dfe\r\nContent-Type: text/plain; charset="UTF-8"\r\n\r\n1. 4549 3867 6. 1755 6816\r\n2. 3068 0287 7. 8557 7000\r\n3. 3827 1727 8. 4177 1609\r\n4. 5093 4909 9. 9799 3366\r\n5. 1069 7992 10. 5141 2029\r\n\r\n--0000000000008ecb2e059d9a7dfe\r\nContent-Type: text/html; charset="UTF-8"\r\n\r\ntest code\r\n\r\n--0000000000008ecb2e059d9a7dfe--'
b')'
Basically, if the object that we've encountered in A is a bytes object, we print it as it is. Otherwise (e.g. if it's a tuple), we iterate over it and print its items.
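If you do end up with a single byte and want the character rather than the number, a few standard options are sketched below. The variable names are only for illustration, and the decode step assumes the byte is plain ASCII.

```python
tail = b')'

as_int = tail[0]                # indexing a bytes object gives an int: 41
as_bytes = tail[0:1]            # slicing keeps it as bytes: b')'
as_text = tail.decode("ascii")  # decoding gives a str: ')'
from_int = chr(as_int)          # converting the int back to a character: ')'

print(as_int, as_bytes, as_text, from_int)
```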
def post_driver(l):
    def id_generator(size = 6, chars = string.ascii_uppercase + string.digits):
        return ''.join(random.choice(chars) for _ in range(size))
    data = {"startLocation":{"address":{"fullAddress":"\n, ","fullAddressInLine":", , ","line1":"10407 Water Hyacinth Dr","city":"Orlando","state":"FL","zip":"32825"},"latitude":28.52144,"longitude":-81.23301},"endLocation":{"address":{"fullAddress":"\n, ","fullAddressInLine":", , ","line1":"1434 N Alafaya Trail","city":"Orlando","state":"FL","zip":"32819"},"latitude":28.52144,"longitude":-81.23301},"user":{"firstName":"Test" + id_generator(),"lastName":"doe","Username":"","Password":"","userId":"","role":"","email":"zbaker#productivityapex.com","username":"Test" + id_generator() + "","newPassword":"test"},"vehicle":{"vehicleNumber":"" + id_generator(),"licensePlate":"3216549877","maxCapacity":200000,"maxCapacityUnits":"units","costPerMile":3500,"fixedCost":35000,"costPerHour":35,"odometer":0,"tags":[{"text":"vehicle"}],"constraintTags":[],"id":"59fcca46520a6e2bb4a397ed"},"earliestStartTimeDate":"Mon Nov 13 2017 07:00:00 GMT-0500 (Eastern Standard Time)","restBreakDurationDate":"Mon Nov 13 2017 01:00:00 GMT-0500 (Eastern Standard Time)","maximumWorkingHours":9,"maximumDrivingHours":8,"fixedCost":0,"costPerMile":0,"costPerHour":0,"restBreakWindows":[{"startTime":"2017-11-03T19:30:35.275Z","endTime":"2017-11-04T05:00:35.275Z"}],"color":"#1c7c11","tags":[{"text":"test"}],"driverNumber":"" + id_generator(),"phoneNumber":"7894651231","earliestStartTime":"07:00:00","restBreakDuration":"01:00:00","customFields":{}}
    # json_str = json.dumps(data)
    r = l.client.post('/api/v1/drivers', data=str(data))
    # content = r.json()
    # return content["id"]
I am currently having issues with posting the data variable and having it succeed in my locust file. I've done this with other locust files with similar formats, but it's failing every time on this particular request... can someone please help me?
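One thing worth checking, assuming the endpoint expects a JSON body (the post does not say so): str(data) produces Python's repr of the dict, with single-quoted keys, which is not valid JSON. The sketch below shows the difference using a small made-up payload; `payload` and `is_valid_json` are hypothetical names for this illustration.

```python
import json

# A small stand-in for the real request body.
payload = {"user": {"firstName": "Test"}, "customFields": {}, "odometer": 0}

as_repr = str(payload)         # Python repr: {'user': {'firstName': 'Test'}, ...}
as_json = json.dumps(payload)  # JSON text:   {"user": {"firstName": "Test"}, ...}

def is_valid_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json(as_repr))
print(is_valid_json(as_json))
```

If that is the cause, sending json.dumps(data) together with a Content-Type: application/json header would be the thing to try. Locust's client is built on a requests session, so passing json=data instead of data=str(data) should have the same effect, though I have not verified this against your API.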
I am working with Census data and I need to combine four character columns into a single column.
Example:
LOGRECNO STATE COUNTY TRACT BLOCK
60 01 001 021100 1053
61 01 001 021100 1054
62 01 001 021100 1055
63 01 001 021100 1056
64 01 001 021100 1057
65 01 001 021100 1058
I want to create a new column that adds the strings of STATE, COUNTY, TRACT, and BLOCK together into a single string. Example:
LOGRECNO STATE COUNTY TRACT BLOCK BLOCKID
60 01 001 021100 1053 010010211001053
61 01 001 021100 1054 010010211001054
62 01 001 021100 1055 010010211001055
63 01 001 021100 1056 010010211001056
64 01 001 021100 1057 010010211001057
65 01 001 021100 1058 010010211001058
I've tried:
AL_Blocks$BLOCK_ID<- paste(c(AL_Blocks$STATE, AL_Blocks$County, AL_Blocks$TRACT, AL_Blocks$BLOCK), collapse = "")
But this combines all rows of all four columns into a single string.
Try this:
AL_Blocks$BLOCK_ID<- with(AL_Blocks, paste0(STATE, COUNTY, TRACT, BLOCK))
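There was a typo in County; it should have been COUNTY. Also, you don't need the collapse parameter: paste0 is vectorised, so it joins the columns row by row, whereas collapse merges every row into one string.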
I hope that helps.
You can use do.call and paste0. Try:
AL_Blocks$BLOCK_ID <- do.call(paste0, AL_Blocks[c("STATE", "COUNTY", "TRACT", "BLOCK")])
Example output:
do.call(paste0, AL_Blocks[c("STATE", "COUNTY", "TRACT", "BLOCK")])
# [1] "010010211001053" "010010211001054" "010010211001055" "010010211001056"
# [5] "010010211001057" "010010211001058"
do.call(paste0, AL_Blocks[2:5])
# [1] "010010211001053" "010010211001054" "010010211001055" "010010211001056"
# [5] "010010211001057" "010010211001058"
You can also use unite from "tidyr", like this:
library(tidyr)
library(dplyr)
AL_Blocks %>%
unite(BLOCK_ID, STATE, COUNTY, TRACT, BLOCK, sep = "", remove = FALSE)
# LOGRECNO BLOCK_ID STATE COUNTY TRACT BLOCK
# 1 60 010010211001053 01 001 021100 1053
# 2 61 010010211001054 01 001 021100 1054
# 3 62 010010211001055 01 001 021100 1055
# 4 63 010010211001056 01 001 021100 1056
# 5 64 010010211001057 01 001 021100 1057
# 6 65 010010211001058 01 001 021100 1058
where "AL_Blocks" is provided as:
AL_Blocks <- structure(list(LOGRECNO = c("60", "61", "62", "63", "64", "65"),
STATE = c("01", "01", "01", "01", "01", "01"), COUNTY = c("001", "001",
"001", "001", "001", "001"), TRACT = c("021100", "021100", "021100",
"021100", "021100", "021100"), BLOCK = c("1053", "1054", "1055", "1056",
"1057", "1058")), .Names = c("LOGRECNO", "STATE", "COUNTY", "TRACT",
"BLOCK"), class = "data.frame", row.names = c(NA, -6L))
You can try this too:
AL_Blocks <- transform(AL_Blocks, BLOCKID = paste(STATE, COUNTY,
                                                  TRACT, BLOCK, sep = ""))
Or try this:
DF$BLOCKID <-
  paste(DF$STATE, DF$COUNTY,
        DF$TRACT, DF$BLOCK, sep = "")
(Here is a method to set up the dataframe for people coming into this discussion later)
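DF <-
  data.frame(LOGRECNO = c(60, 61, 62, 63, 64, 65),
             # The codes are stored as character so the leading zeros survive
             STATE = c("01", "01", "01", "01", "01", "01"),
             COUNTY = c("001", "001", "001", "001", "001", "001"),
             TRACT = c("021100", "021100", "021100", "021100", "021100", "021100"),
             BLOCK = c("1053", "1054", "1055", "1056", "1057", "1058"),
             stringsAsFactors = FALSE)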
You can use the tidyverse package. Note that unite separates the values with "_" by default, so set sep = "":
DF %>% unite(new_var, STATE, COUNTY, TRACT, BLOCK, sep = "")
The new kid on the block is the glue package. Use glue_data so that the column names are looked up in the data frame:
library(glue)
glue_data(my_data, "{STATE}{COUNTY}{TRACT}{BLOCK}")
You can both write and read text files with any specified string separator, not necessarily a single-character one. This is useful when the data contains practically every printable symbol, so that no single symbol can safely be used as a separator. Here are example write and read functions:
Write out text with a special separator:
writeSepText <- function(df, fileName, separator) {
  con <- file(fileName)
  # Join the fields of each row into one line using the separator string
  data <- apply(df, 1, paste, collapse = separator)
  writeLines(data, con)
  close(con)
  invisible(NULL)
}
Test writing out a text file separated by the string "<break>":
writeSepText(df=as.data.frame(Titanic), fileName="/Users/user/break_sep.txt", separator="<break>")
Read in text files with a special separator string:
readSepText <- function(fileName, separator) {
  con <- file(fileName)
  data <- readLines(con)
  close(con)
  # fixed = TRUE treats the separator as a literal string, not a regex
  records <- strsplit(data, split = separator, fixed = TRUE)
  # Assumes every line has the same number of fields
  dataFrame <- data.frame(do.call(rbind, records), stringsAsFactors = FALSE)
  rownames(dataFrame) <- seq_len(nrow(dataFrame))
  return(dataFrame)
}
Test reading in a text file separated by "<break>":
df <- readSepText(fileName="/Users/user/break_sep.txt", separator="<break>"); df