Extract dates from text with spaCy in relation to a given date - python-3.x

I want to extract dates given in text form, like 'next week' or 'February', from a news article, given the date the article was published. For example, if the article was published on Feb 13 2019 and mentions 'next week', I want the function to return Feb 20 2019 for 'next week'. Does anybody know how to do that? I was thinking of using spaCy's entity recognizer and then manually writing a function for every 'DATE' instance, but there must be something better.
Here is my example:
text = """Chancellor Angela Merkel and some of her ministers will
discuss at a cabinet retreat next week ways to avert driving
bans in major cities after Germany's top administrative court
in February allowed local authorities to bar heavily polluting
diesel cars."""
article_date = '2019-02-13'
My ideal result would be something like the following:
ref_dates = {'next_week': '2019-02-20',
             'february': '2019-02-01'}

With SUTime from CoreNLP this can be done quite easily:
https://github.com/FraBle/python-sutime
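
For comparison, the manual approach mentioned in the question can be sketched with only the standard library. This is a minimal sketch, not SUTime or spaCy: the helper names `resolve_date_phrase` and `find_ref_dates` are hypothetical, only a handful of phrases are covered, and a bare month name is assumed to mean the first day of that month in the article's year.

```python
import re
from datetime import date, timedelta

# Month name -> month number (1..12).
MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Note: a bare "may" or "march" can also be an ordinary word; a real
# solution needs context (e.g. an NER 'DATE' label) to tell them apart.
PHRASE_RE = re.compile(
    r"\b(next\s+week|last\s+week|" + "|".join(MONTHS) + r")\b",
    re.IGNORECASE,
)

def resolve_date_phrase(phrase, reference):
    """Resolve one supported phrase against the reference date."""
    if phrase == "next week":
        return reference + timedelta(weeks=1)
    if phrase == "last week":
        return reference - timedelta(weeks=1)
    # Assumption: a month name refers to the 1st of that month in the
    # reference year.
    return date(reference.year, MONTHS[phrase], 1)

def find_ref_dates(text, article_date):
    """Map each supported phrase found in text to an ISO date string."""
    reference = date.fromisoformat(article_date)
    results = {}
    for match in PHRASE_RE.finditer(text):
        # Normalise case and whitespace (the phrase may span a line break).
        phrase = " ".join(match.group(1).lower().split())
        resolved = resolve_date_phrase(phrase, reference)
        results[phrase.replace(" ", "_")] = resolved.isoformat()
    return results
```

Calling `find_ref_dates(text, article_date)` on the example from the question yields the two entries of the desired `ref_dates` dictionary. Every new kind of phrase needs its own rule, which is exactly the maintenance burden a dedicated temporal tagger avoids.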

Related

Netsuite: Creating Usage Report by Period

I am trying to create a saved search in NetSuite that gives the quantity/sales amount for the previous months, to track usage at the item, customer and sales rep level. I have a partial solution, but the way my formula is set up, it works from the system date rather than covering the whole previous month(s). For example, if I ran the search today, it would give me the usage from today (Feb 7) back to Jan 7, Dec 7 and Nov 7. I know NetSuite has 'Period' as a date function, but I am struggling to incorporate it instead.
My formula is below for an example.
NULLIF(CASE WHEN {trandate} > ADD_MONTHS(SYSDATE,-1) THEN CASE WHEN {type} = 'Invoice' THEN NVL({amount},0)*1 ELSE 0 END ELSE 0 END,0)
ADD_MONTHS(SYSDATE,-1) <---I change this to -2 and -3 to show previous 2 and 3 months as well
I am attempting to avoid doing 12 of these as a CASE WHEN posting period = P1 2021 THEN amount etc etc so we can have more of a rolling sales/quantity search.
NetSuite does not seem to like pulling a point-in-time value; it prefers to base searches and reports on the time they are run. But that impression could come from being new to NetSuite, as we converted a few months ago.
TRUNC(ADD_MONTHS(SYSDATE,-2),'Month')-1
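
To make the date arithmetic in that expression concrete, here is a small sketch in plain Python. It is not NetSuite formula syntax; the helper names `month_start` and `previous_month_window` are hypothetical and only illustrate how truncating to the month and subtracting one day yields whole-month boundaries instead of a rolling "today minus N months" cutoff.

```python
from datetime import date, timedelta

def month_start(day, months_back=0):
    """First day of the month that lies months_back months before day's month."""
    index = day.year * 12 + (day.month - 1) - months_back
    return date(index // 12, index % 12 + 1, 1)

def previous_month_window(today, months_back=1):
    """(first day, last day) of the whole calendar month months_back months ago."""
    start = month_start(today, months_back)
    # The last day of a month is the day before the next month starts.
    end = month_start(today, months_back - 1) - timedelta(days=1)
    return start, end

today = date(2021, 2, 7)
# Counterpart of TRUNC(ADD_MONTHS(SYSDATE,-2),'Month')-1 evaluated on Feb 7, 2021:
print(month_start(today, 2) - timedelta(days=1))   # 2020-11-30
print(previous_month_window(today, 1))             # January 2021, first to last day
```

Comparing the transaction date against such month boundaries, rather than against the shifted system date, is what makes each bucket cover a full calendar month.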

Amazon Quicksight Dynamic date filter

Every day I receive sales data from the previous day. So today, November 15, I have data from July 2021 until November 14, 2021. What I want is to show this data for the current month, aggregated by day. I use a QuickSight visual with an MTD (Month To Date) filter. Everything is fine so far.
The problem is that on the first day of each month I see "No Data" on my visual. That is expected, since I never have data for the current day, only up to the day before.
So what I want to achieve is:
Each 1st of current month: show data from the whole previous month
From 2nd to last day of current month: show data from the current month
Can someone help me please to know how I can achieve this?
I looked for ways to do this and found dynamic default parameters, but this option does not work for me: according to the documentation (https://docs.aws.amazon.com/quicksight/latest/user/parameters-set-up.html) I have to fill in a username column, and I have many users, so listing all of them would be impractical.
You can assign parameters to a group rather than a specific user, which is much quicker.
There is also new functionality that allows you to set today or the beginning/end of the month/quarter/year as the default.
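
The rule described in the question can be written down as plain date arithmetic. The sketch below is Python, not QuickSight expression syntax, and the function name `reporting_window` is hypothetical. It assumes data is always complete through yesterday, in which case both cases collapse into "month to date as of yesterday".

```python
from datetime import date, timedelta

def reporting_window(today):
    """Return (start, end) of the period to display when run on `today`."""
    end = today - timedelta(days=1)   # the most recent day that has data
    start = end.replace(day=1)        # first day of that day's month
    return start, end

print(reporting_window(date(2021, 11, 1)))   # the whole of October 2021
print(reporting_window(date(2021, 11, 15)))  # November 1 through November 14
```

On the 1st of the month, yesterday falls in the previous month, so the window covers that whole month; on any other day it covers the current month up to yesterday.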

Named Entity Extraction of dates

I am absolutely new to NER, extraction and programming in general. I am trying to figure out a way to extract the due dates and start dates of certain documents. Is there a way to do this, or a place where I can start? I have been looking around, but I keep running into the same problem: I can extract dates, but not whether a date is a due date or a posting date. And if a document only has one date, is it the posting date or the due date? Any help would be appreciated.
Example:
"Essay on Medieval Asia was due on September 3rd."
"Your last assignment that was given on April 6th was supposed to be submitted in 10 days."
"The bid is due no later than a month from the date it was posted(today)."
The number of ways to express dates in free text is huge. There are a few solutions:
You can come up with a set of regular expressions and try to parse the dates yourself.
Another option is to train a supervised sequence classifier such as a CRF, if you have documents with the dates annotated.
A third option, which can give quick results, is to use Duckling, a framework from Facebook (https://github.com/facebookincubator/duckling). It identifies date and time expressions and even normalises them into a single unique date.
Yet another option is ctparse, a pure Python package based on Duckling that parses time expressions from natural language in German and English.
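
As an illustration of the first option, here is a minimal regular-expression sketch for the example sentences in the question. Everything in it is an assumption made for illustration: the cue-word lists, the function names, and the `year` argument (the examples give no year). Labelling a date by the closest preceding cue word is a crude heuristic, not a substitute for a trained model.

```python
import re
from datetime import date, timedelta

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

# "September 3rd", "April 6th", "May 12"
ABSOLUTE = re.compile(
    rf"\b({'|'.join(MONTHS)})\s+(\d{{1,2}})(?:st|nd|rd|th)?\b", re.IGNORECASE)
# "in 10 days"
RELATIVE = re.compile(r"\bin\s+(\d+)\s+days?\b", re.IGNORECASE)

# Hypothetical cue words; extend them for your own documents.
CUES = {"due": ("due", "submitted", "deadline"),
        "start": ("given", "posted", "assigned")}

def label_for(sentence, position):
    """Label a match by the closest cue word that precedes it."""
    before = sentence[:position].lower()
    best_label, best_pos = "unknown", -1
    for label, words in CUES.items():
        for word in words:
            pos = before.rfind(word)
            if pos > best_pos:
                best_label, best_pos = label, pos
    return best_label

def extract_dates(sentence, year):
    """Return (label, value) pairs; value is a date or a timedelta offset."""
    found = []
    for m in ABSOLUTE.finditer(sentence):
        month = MONTHS.index(m.group(1).lower()) + 1
        found.append((label_for(sentence, m.start()),
                      date(year, month, int(m.group(2)))))
    for m in RELATIVE.finditer(sentence):
        found.append((label_for(sentence, m.start()),
                      timedelta(days=int(m.group(1)))))
    return found
```

For the second example sentence this yields a start date of April 6th and a due offset of 10 days, which can be added together to get the due date. Expressions such as "a month from the date it was posted" are not covered and show how quickly hand-written rules grow.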

How to create tabular output in python

Currently, I'm looking to scrape the signatures table from the EDGAR filings of specific companies. I have created a Python program that goes down into each document and finds the tables I need to scrape. I'm having trouble figuring out how to output the data to a file in a 'pretty' way.
Here's a link for a bit of a visual (just scroll to the bottom of the document, there will be a page of signatures there):
Example Document
What I'm looking to do is format the table, the same way it is formatted on the website, with each cell taking up a specific amount of space, and filling in unused space with... well, spaces!
My current output:
|Signature, Date, Title|
|/s/ Stanley M. Kuriyama| Chairman of the Board, February 29th, 2016|
|Stanley M. Kuriyama|
|/s/ Christopher J. Benjamin, President, Chief Executive, February 29th, 2016|
|Christopher J. Benjamin, Officer and Director (and so on...)|
|-----------------------------------------------------------------------------|
What I'm looking to do (periods are spaces):
|Signature......................,Title......................,Date...............|
|/s/ Stanley M. Kuriyama,.......,Chairman of the Board,.....,February 29th, 2016|
|Stanley M. Kuriyama............................................................|
|/s/ Christopher J. Benjamin....,President, Chief Executive,February 29th,2016..|
|Christopher J. Benjamin,.......,Officer and Director (and so on...)............|
|-------------------------------------------------------------------------------|
Is there any way to print out each string plus (maxSize - stringSize) spaces per cell, so the data looks more tabular? I'd like to do this with vanilla Python 3 and no additional downloads, because the people using this program may not be as tech savvy as I am.
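
One way to do this with only the standard library is `str.ljust`, which pads a string with spaces up to a given width. The following is a minimal sketch: the function name `format_table` is hypothetical, and it assumes the scraped table is already available as a list of rows, each row being a list of strings.

```python
def format_table(rows, sep=" | "):
    """Pad every cell to the width of the widest cell in its column."""
    n_cols = max(len(row) for row in rows)
    # Short rows (e.g. a name on its own line) get empty trailing cells.
    padded = [list(row) + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in padded) for i in range(n_cols)]
    lines = []
    for row in padded:
        cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        lines.append("|" + sep.join(cells) + "|")
    return "\n".join(lines)

rows = [
    ["Signature", "Title", "Date"],
    ["/s/ Stanley M. Kuriyama", "Chairman of the Board", "February 29th, 2016"],
    ["Stanley M. Kuriyama"],
    ["/s/ Christopher J. Benjamin", "President, Chief Executive", "February 29th, 2016"],
    ["Christopher J. Benjamin", "Officer and Director"],
]
print(format_table(rows))
```

Because the column widths are computed from the data, every output line has the same length, and the result can be written to a file with `file.write(format_table(rows))`.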

View Collation with Couchbase

We are using couchbase as our nosql store and loving it for its capabilities.
There is however an issue that we are running in with creating associations
via view collation. This can be thought of akin to a join operation.
While our data sets are confidential I am illustrating the problem with this model.
The volume of data is considerable, so it cannot be processed in memory. Let's say we have data on ice creams, zip codes and the average temperature of the day.
One type of document contains a zipcode to icecream mapping
and the other one has transaction data of an ice-cream being sold in a particular zip.
The problem is to be able to determine a set of top ice-creams sold by the temperature of a given day.
We crunch this corpus with a view that emits two outputs: one is a zipcode-to-temperature mapping, while the other
represents an ice-cream sale in a zip code:
Key Value
[zip1] temp1
[zip1,ice_cream1] 1
[zip2,ice_cream2] 1
The view collation here is a mechanism to create an association between the ice-cream sale, the zip and the average temperature, i.e. a join.
We have a constraint that the temperature lookup happens only once in 24 hours, when the zip is first seen, and that is the valid
average temperature to use for that day. E.g. if a lookup happened at 12:00 pm on Jan 1st, the next lookup does not happen until 12:00 pm on Jan 2nd.
However, the average temperature accepted in the 1st lookup is valid only for Jan 1st, and that of the 2nd lookup only for Jan 2nd,
including the first half of the day.
Now things get complicated when I want to do the same query with a time component involved, concretely associating the average temperature of a
day with the ice creams that were sold on that day in that zip, e.g. x vanilla ice creams were sold when the average temperature for that day was 70 F:
Key Value
[y,m,d,zip1] temp1
[y,m,d,zip2,ice_cream2 ] 1
[y,m,d2,zip1,ice_cream1] 1
This has an interesting impact on the queries. Say I query for the last 1 day: I cannot make any associations between the ice cream and the temperature before the
first lookup happens, since that is when the two keys align. The net effect is that I lose the ice-cream counts for that day from before the temperature lookup
happens. I was wondering if any of you have faced similar issues and if you are aware of a pattern or solution so as not to lose those counts.
First, welcome to StackOverflow, and thank you for the great question.
I understand the specific issue that you are having, but what I don't understand is the scale of your data - so please forgive me if I appear to be leading down the wrong path with what I am about to suggest. We can work back and forth on this answer depending on how it might suit your specific needs.
First, you have discovered that CB does not support joins in its queries. I am going to suggest that this is not really an issue if CB is used properly. The conceptual model for how Couchbase should be used to filter out data is as follows:
1. Create the CB view to be as precise as possible.
2. Select records as precisely as possible from CB using the view.
3. Fine-filter records as necessary in the data-access layer (also perform any joins) before sending them on to the rest of the application.
From your description, it sounds to me as though you are trying to be too clever with your CB view query. I would suggest one of two courses of action:
Manually look up the value that you want, when this happens, with a second view query.
Look up more records than you need, then fine-filter afterward (step 3 above).
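
A join in the data-access layer could look roughly like the sketch below. It assumes the rows of the two kinds of view output have already been fetched and are available as `(key, value)` pairs shaped like the `[y,m,d,zip]` and `[y,m,d,zip,ice_cream]` keys in the question; the Couchbase query itself is not shown, and the function name is hypothetical.

```python
from collections import defaultdict

def join_sales_with_temperature(temp_rows, sale_rows):
    """Count sales per (temperature, ice_cream), joining on (y, m, d, zip)."""
    # Index the temperature rows by their full key for constant-time lookup.
    temps = {tuple(key): value for key, value in temp_rows}
    totals = defaultdict(int)
    for key, count in sale_rows:
        *day_zip, ice_cream = key
        # None marks sales whose day/zip has no temperature row (yet);
        # they are kept rather than dropped.
        temperature = temps.get(tuple(day_zip))
        totals[(temperature, ice_cream)] += count
    return dict(totals)

temp_rows = [([2014, 1, 1, "zip1"], 70)]
sale_rows = [
    ([2014, 1, 1, "zip1", "vanilla"], 1),
    ([2014, 1, 1, "zip1", "vanilla"], 1),
    ([2014, 1, 1, "zip2", "mint"], 1),
]
print(join_sales_with_temperature(temp_rows, sale_rows))
```

Because the join is keyed on the day rather than on the moment of the lookup, sales recorded before the temperature lookup still pick up that day's temperature once it exists, and sales without any temperature stay visible under `None` instead of disappearing from the counts.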
