I get dates on the site for some events:
>>> parse(event.find_element_by_xpath('../td[@data-dt]').get_attribute('data-dt'))
datetime.datetime(2019, 11, 26, 19, 15, tzinfo=<StaticTzInfo 'Z'>)
How can I convert this time to a local time zone, so that I can count down to the start of the event?
I found a solution:
from tzlocal import get_localzone
parse(event.find_element_by_xpath('../td[@data-dt]').get_attribute('data-dt')).astimezone(get_localzone())
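For the countdown itself, here is a minimal sketch, assuming the parsed value is timezone-aware as in the output above. It uses only the standard library: calling astimezone() with no argument converts to the system's local zone, and subtracting two aware datetimes gives a timedelta. The names time_until, event_utc and fixed_now are made up for illustration, and the sample values are not from the site.

```python
from datetime import datetime, timedelta, timezone


def time_until(event, now=None):
    """Return the time remaining until the aware datetime `event`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return event - now


# Stand-in for the value parsed from the page (hypothetical sample).
event_utc = datetime(2019, 11, 26, 19, 15, tzinfo=timezone.utc)

# astimezone() with no argument converts to the system's local time zone.
event_local = event_utc.astimezone()

# Aware datetimes subtract correctly regardless of the zone they are shown in.
fixed_now = datetime(2019, 11, 26, 18, 0, tzinfo=timezone.utc)
remaining = time_until(event_local, now=fixed_now)
print(remaining)
```

In real use you would call time_until(event_local) without the now argument, so the current time is taken at the moment of the call.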
I have a very simple timestamp I need to parse:
10/2/2020 3:19:42 PM (UTC-7)
But using python 3.6, when I try to parse this, I get the following:
>>> datetime.strptime('10/2/2020 3:19:42 PM (UTC-7)', '%m/%d/%Y %I:%M:%S %p (%Z%z)')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "_strptime.py", line 565, in _strptime_datetime
tt, fraction = _strptime(data_string, format)
File "_strptime.py", line 362, in _strptime
(data_string, format))
ValueError: time data '10/2/2020 3:19:42 PM (UTC-7)' does not match format '%m/%d/%Y %I:%M:%S %p (%Z%z)'
I have tried dateutil.parser, as well as several variations of the format string. The piece that's tripping up strptime is the (UTC-7) portion.
Is the string format wrong? How can I parse this string and receive the timezone information as well? Any help is appreciated.
Edit: If the string is (UTC-0700) then the parsing works. But I cannot control how the timestamps are being formatted, is there a way to parse them in their current format (UTC-7)?
Ah, it turned out to be quite silly:
>>> import dateutil.parser
>>> dateutil.parser.parse(dt, fuzzy=True)
datetime.datetime(2020, 10, 2, 15, 19, 42, tzinfo=tzoffset(None, 25200))
Should have used fuzzy logic before. :-)
EDIT: The above does NOT work (thanks to @wim for pointing it out): the fuzzy flag ignores the sign of the offset string.
Here is code that works:
>>> from datetime import datetime
>>> import re
>>> dt = '10/2/2020 3:19:42 PM (UTC-7)'
>>> sign, offset = re.search(r'\(UTC([+-])(\d+)\)', dt).groups()
>>> offset = f"0{offset}00" if len(offset) == 1 else f"{offset}00"
>>> dt = re.sub(r'\(UTC.\d+\)', f'(UTC{sign}{offset})', dt)
>>> datetime.strptime(dt, '%m/%d/%Y %I:%M:%S %p (%Z%z)')
datetime.datetime(2020, 10, 2, 15, 19, 42, tzinfo=datetime.timezone(datetime.timedelta(-1, 61200), 'UTC'))
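An alternative sketch that avoids rewriting the string: pull the offset out with a regular expression, parse the rest with strptime, and attach a fixed-offset timezone. It assumes the offset is always whole hours written as (UTC+N) or (UTC-N); the helper name parse_short_offset is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches e.g. "10/2/2020 3:19:42 PM (UTC-7)"; whole-hour offsets only (assumption).
_STAMP_RE = re.compile(r'^(?P<stamp>.+?)\s*\(UTC(?P<sign>[+-])(?P<hours>\d{1,2})\)$')


def parse_short_offset(text):
    """Parse a timestamp with a trailing (UTC+N) or (UTC-N) marker."""
    match = _STAMP_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised timestamp: {text!r}")
    naive = datetime.strptime(match.group('stamp'), '%m/%d/%Y %I:%M:%S %p')
    hours = int(match.group('hours'))
    if match.group('sign') == '-':
        hours = -hours
    # Attach the offset as a fixed-offset tzinfo.
    return naive.replace(tzinfo=timezone(timedelta(hours=hours)))


parsed = parse_short_offset('10/2/2020 3:19:42 PM (UTC-7)')
print(parsed.isoformat())
```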
I need to add one hour to the current time and then subtract the minutes.
For example: current time = 7:31, added hour = 7:31 + 1 hour = 8:31, required time = 8:31 - 31 minutes = 8:00.
Any help or a workaround will be greatly appreciated.
from datetime import datetime, timedelta
import time
addedtime = (datetime.now() + timedelta(hours=1)).strftime('%H:%M')
requiredtime = addedtime - timedelta(now.minutes).strftime('%H:%M')
You're setting addedtime to a string rather than a datetime, then getting into trouble because you're trying to subtract a timedelta from that string:
>>> addedtime = (datetime.now() + timedelta(hours=1)).strftime('%H:%M')
>>> addedtime
'23:30'
>>> addedtime - timedelta(minutes=4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'str' and 'datetime.timedelta'
Instead, keep them as timepoints for as long as you need to manipulate them as timepoints, converting them to a string when you need the final result:
>>> time1 = datetime.now()
>>> time1
datetime.datetime(2019, 10, 17, 22, 23, 55, 860195)
>>> time2 = time1 + timedelta(hours=1)
>>> time2
datetime.datetime(2019, 10, 17, 23, 23, 55, 860195)
>>> time3 = time2 - timedelta(minutes=time2.minute)
>>> time3
datetime.datetime(2019, 10, 17, 23, 0, 55, 860195)
>>> time3.strftime("%H:%M")
'23:00'
Of course, you can also do it as a single operation since you can both add one hour and subtract some minutes with a single timedelta:
>>> final = (time1 + timedelta(hours=1, minutes=-time1.minute)).strftime("%H:%M")
>>> final
'23:00'
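If the goal is simply "the start of the next hour", another way to sketch it is to truncate to the hour and then add one hour. Unlike the timedelta arithmetic above, this also zeroes the seconds and microseconds. The function name next_full_hour is made up for illustration.

```python
from datetime import datetime, timedelta


def next_full_hour(moment):
    """Return the start of the hour following `moment`."""
    # Drop minutes, seconds and microseconds, then step forward one hour.
    truncated = moment.replace(minute=0, second=0, microsecond=0)
    return truncated + timedelta(hours=1)


sample = datetime(2019, 10, 17, 22, 23, 55, 860195)
print(next_full_hour(sample).strftime("%H:%M"))
```

Because the addition is done on a datetime rather than on the hour number, a time such as 23:31 rolls over to 00:00 on the following day.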
Why not explore one of Python's many amazing datetime libraries ...
pip install parsedatetime
import parsedatetime as pdt
from datetime import datetime
if __name__ == '__main__':
    cal = pdt.Calendar()
    dt, result = cal.parse("10 minutes before an hour from now")
    print(datetime(*dt[:6]))
I'm trying to connect to an API but am having trouble figuring out the right datetime format. The documentation says it follows this format: YYYY-MM-DDThh:mm:ss.000Z
Example: "2019-03-07T10:30:00-0400"
I can only get it working if I use datetime.datetime.now().isoformat().
But if I try a specific time, such as datetime.datetime(2019, 8, 18, 12, 0, 0).isoformat(), I get a 500 error: 'message': 'Internal server error', 'type': 'INTEGRATION_FAILURE', 'statusCode': '500'.
What datetime format should I use in this case?
Try using .strftime
Ex:
import datetime
print(datetime.datetime(2019, 8, 18, 12, 0, 0).strftime("%Y-%m-%dT%H:%M:%S.%fZ")) #or "%Y-%m-%dT%H:%M:%S.%f"
#--> 2019-08-18T12:00:00.000000Z
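The documented pattern and the documented example differ (one ends in .000Z, the other in -0400), so here is a sketch that produces both styles from a timezone-aware datetime. Which one the API actually accepts is not known from the question; the UTC-4 offset and the variable names are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical example: noon on 2019-08-18 at UTC-4.
utc_minus_4 = timezone(timedelta(hours=-4))
local_dt = datetime(2019, 8, 18, 12, 0, 0, tzinfo=utc_minus_4)

# Style of the documented example "2019-03-07T10:30:00-0400":
# %z renders the offset of an aware datetime as +HHMM or -HHMM.
with_offset = local_dt.strftime("%Y-%m-%dT%H:%M:%S%z")

# Style of the documented pattern "YYYY-MM-DDThh:mm:ss.000Z":
# convert to UTC and keep three digits of milliseconds.
utc_dt = local_dt.astimezone(timezone.utc)
with_millis = utc_dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc_dt.microsecond // 1000:03d}Z"

print(with_offset)
print(with_millis)
```

Note that %z produces an empty string for a naive datetime, so the offset style only works once tzinfo is set.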
I'm trying to save scraped data to a database but got stuck.
First I save the scraped data to a CSV file, then use the glob library to find the newest CSV and upload its data to the database.
I'm not sure what I'm doing wrong here; please find the code and error below.
I have created the table yahoo_data in the database with the same column names as the CSV. My code and its output:
import scrapy
from scrapy.http import Request
import MySQLdb
import os
import csv
import glob
class YahooScrapperSpider(scrapy.Spider):
    name = 'yahoo_scrapper'
    allowed_domains = ['in.news.yahoo.com']
    start_urls = ['http://in.news.yahoo.com/']
    def parse(self, response):
        news_url=response.xpath('//*[@class="Mb(5px)"]/a/@href').extract()
        for url in news_url:
            absolute_url=response.urljoin(url)
            yield Request (absolute_url,callback=self.parse_text)
    def parse_text(self,response):
        Title=response.xpath('//meta[contains(@name,"twitter:title")]/@content').extract_first()
        # response.xpath('//*[@name="twitter:title"]/@content').extract_first(),this also works
        Article= response.xpath('//*[@class="canvas-atom canvas-text Mb(1.0em) Mb(0)--sm Mt(0.8em)--sm"]/text()').extract()
        yield {'Title':Title,
               'Article':Article}
    def close(self, reason):
        csv_file = max(glob.iglob('*.csv'), key=os.path.getctime)
        mydb = MySQLdb.connect(host='localhost',
                               user='root',
                               passwd='prasun',
                               db='books')
        cursor = mydb.cursor()
        csv_data = csv.reader(csv_file)
        row_count = 0
        for row in csv_data:
            if row_count != 0:
                cursor.execute('INSERT IGNORE INTO yahoo_data (Title,Article) VALUES(%s, %s)', row)
            row_count += 1
        mydb.commit()
        cursor.close()
I'm getting this error:
ana. It should be directed not to disrespect the Sikh community and hurt its sentiments by passing such arbitrary and uncalled for orders," said Badal.', 'The SAD president also "brought it to the notice of the Haryana chief minister that Article 25 of the constitution safeguarded the rights of all citizens to profess and practices the tenets of their faith."', '"Keeping these facts in view I request you to direct the Haryana Public Service Commission to rescind its notification and allow Sikhs as well as candidates belonging to other religions to sport symbols of their faith during all examinations," said Badal. (ANI)']}
2019-04-01 16:49:41 [scrapy.core.engine] INFO: Closing spider (finished)
2019-04-01 16:49:41 [scrapy.extensions.feedexport] INFO: Stored csv feed (25 items) in: items.csv
2019-04-01 16:49:41 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method YahooScrapperSpider.close of <YahooScrapperSpider 'yahoo_scrapper' at 0x2c60f07bac8>>
Traceback (most recent call last):
File "C:\Users\prasun.j\AppData\Local\Continuum\anaconda3\lib\site-packages\MySQLdb\cursors.py", line 201, in execute
query = query % args
TypeError: not enough arguments for format string
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\prasun.j\AppData\Local\Continuum\anaconda3\lib\site-packages\twisted\internet\defer.py", line 151, in maybeDeferred
result = f(*args, **kw)
File "C:\Users\prasun.j\AppData\Local\Continuum\anaconda3\lib\site-packages\pydispatch\robustapply.py", line 55, in robustApply
return receiver(*arguments, **named)
File "C:\Users\prasun.j\Desktop\scrapping\scrapping\spiders\yahoo_scrapper.py", line 44, in close
cursor.execute('INSERT IGNORE INTO yahoo_data (Title,Article) VALUES(%s, %s)', row)
File "C:\Users\prasun.j\AppData\Local\Continuum\anaconda3\lib\site-packages\MySQLdb\cursors.py", line 203, in execute
raise ProgrammingError(str(m))
MySQLdb._exceptions.ProgrammingError: not enough arguments for format string
2019-04-01 16:49:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 7985,
'downloader/request_count': 27,
'downloader/request_method_count/GET': 27,
'downloader/response_bytes': 2148049,
'downloader/response_count': 27,
'downloader/response_status_count/200': 26,
'downloader/response_status_count/301': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2019, 4, 1, 11, 19, 41, 350717),
'item_scraped_count': 25,
'log_count/DEBUG': 53,
'log_count/ERROR': 1,
'log_count/INFO': 8,
'request_depth_max': 1,
'response_received_count': 26,
'scheduler/dequeued': 27,
'scheduler/dequeued/memory': 27,
'scheduler/enqueued': 27,
'scheduler/enqueued/memory': 27,
'start_time': datetime.datetime(2019, 4, 1, 11, 19, 36, 743594)}
2019-04-01 16:49:41 [scrapy.core.engine] INFO: Spider closed (finished)
This error
MySQLdb._exceptions.ProgrammingError: not enough arguments for format string
seems to be caused by the row you passed containing fewer values than the query has placeholders.
You can print the row to see what is going wrong.
In any case, if you want to save scraped data to a database, I suggest writing a simple item pipeline that exports the data directly, without going through CSV.
For further information about item pipelines, see http://doc.scrapy.org/en/latest/topics/item-pipeline.html#topics-item-pipeline
You can find a useful example in "Writing items to a MySQL database in Scrapy".
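As a rough sketch of what such a pipeline might look like: the class below follows Scrapy's pipeline method names (open_spider, process_item, close_spider), but uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs without a database server. The class name SqlitePipeline and the in-memory database are assumptions for illustration; with MySQLdb the connection call would differ and the placeholders would be %s instead of ?.

```python
import sqlite3


class SqlitePipeline:
    """Sketch of a Scrapy item pipeline; sqlite3 stands in for MySQL."""

    def __init__(self, db_path=":memory:"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the connection.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS yahoo_data (Title TEXT, Article TEXT)"
        )

    def process_item(self, item, spider):
        # extract() returns a list of paragraphs, so join them into one string.
        article = "\n".join(item.get("Article") or [])
        self.conn.execute(
            "INSERT INTO yahoo_data (Title, Article) VALUES (?, ?)",
            (item.get("Title"), article),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: release the connection.
        self.conn.close()
```

A pipeline like this is switched on through the ITEM_PIPELINES setting of the project.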
It seems you are passing a list where the parameters are expected as separate, comma-separated arguments.
Try adding an asterisk to the row variable, changing:
cursor.execute('INSERT IGNORE INTO yahoo_data (Title,Article) VALUES(%s, %s)', row)
to:
cursor.execute('INSERT IGNORE INTO yahoo_data (Title,Article) VALUES(%s, %s)', *row)
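Another thing worth checking, offered as a likely explanation rather than a confirmed diagnosis: in the question, csv.reader receives the file name (a string) rather than an open file. csv.reader accepts any iterable of strings, so given a string it treats each character as a line and yields one-element rows, which would not fill two placeholders. The sketch below shows the difference using a temporary file with made-up sample data.

```python
import csv
import os
import tempfile

# Build a small CSV like the one the feed export writes (sample data).
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "items.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as handle:
    writer = csv.writer(handle)
    writer.writerow(["Title", "Article"])
    writer.writerow(["Some headline", "Some text"])

# Passing the file name: the reader iterates over the characters of the name.
rows_from_name = list(csv.reader("items.csv"))

# Passing an open file: the reader yields one list per CSV line.
with open(csv_path, newline="", encoding="utf-8") as handle:
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    rows_from_file = list(reader)

print(rows_from_name[:3])
print(rows_from_file)
```

Each row read from the open file has two values, matching the two %s placeholders in the INSERT statement.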
I'm using Python 3.6.2.
I've learnt from this question how to convert from the standard datetime type to the np.datetime64 type, as follows.
from datetime import datetime
import numpy as np
dt = datetime.now()
print(dt)
print(np.datetime64(dt))
Output:
2017-12-19 17:20:12.743969
2017-12-19T17:20:12.743969
But I can't convert an iterable of standard datetime objects into a Numpy array. The following code ...
np.fromiter([dt], dtype=np.datetime64)
... gives the following error.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-14-46e4618bda89> in <module>()
----> 1 np.fromiter([dt], dtype=np.datetime64)
TypeError: Cannot cast datetime.datetime object from metadata [us] to according to the rule 'same_kind'
However, using np.asarray() works.
np.asarray([dt])
Output:
array([datetime.datetime(2017, 12, 19, 17, 20, 12, 743969)], dtype=object)
Might this be a bug with either np.fromiter() or np.datetime64?
It may just be a matter of setting the datetime units:
In [368]: dt = datetime.now()
In [369]: dt
Out[369]: datetime.datetime(2017, 12, 19, 12, 48, 45, 143287)
Default action for np.array (don't really need fromiter with a list) is to create an object dtype array:
In [370]: np.array([dt,dt])
Out[370]:
array([datetime.datetime(2017, 12, 19, 12, 48, 45, 143287),
datetime.datetime(2017, 12, 19, 12, 48, 45, 143287)], dtype=object)
Looks like plain 'datetime64' produces days:
In [371]: np.array([dt,dt], dtype='datetime64')
Out[371]: array(['2017-12-19', '2017-12-19'], dtype='datetime64[D]')
and specifying the units:
In [373]: np.array([dt,dt], dtype='datetime64[m]')
Out[373]: array(['2017-12-19T12:48', '2017-12-19T12:48'], dtype='datetime64[m]')
This also works with fromiter.
In [374]: np.fromiter([dt,dt], dtype='datetime64[m]')
Out[374]: array(['2017-12-19T12:48', '2017-12-19T12:48'], dtype='datetime64[m]')
In [384]: x= np.fromiter([dt,dt], dtype='M8[us]')
In [385]: x
Out[385]: array(['2017-12-19T12:48:45.143287', '2017-12-19T12:48:45.143287'], dtype='datetime64[us]')
I've learned to use the string name of the datetime64, which allows me to specify the units, rather than the most generic np.datetime64.