Predicting future calls from past call logs? - statistics

I have a lot of call log data. For example
John +1789456123 4.29.2013 13:45 34s
Jane +1321654987 4.20.2013 10:03 12s
Bob +1459623847 4.15.2013 20:12 45s
Mark +1379164986 4.10.2013 15:11 33s
etc...
Can you please point me to some statistical or probabilistic algorithms that could summarize this data into something like
'You usually call Jane every Thursday around 10am'
I have no idea how to tackle this problem. Any intro or directions would be much appreciated!
Marko
EDIT
So far I have the following possibilities, and I have no idea which one is really usable for my case:
Inference engine
Inductive reasoning
Poisson process
Time series
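One simple way to get from raw logs to statements like the one above is plain frequency counting: bucket each contact's calls by weekday and hour, then report the buckets that hold most of that contact's calls. The sketch below does that and then takes the Poisson process view from the list. If calls to a contact in one (weekday, hour) bucket are assumed to arrive as a homogeneous Poisson process, the estimated weekly rate is rate = k / n_weeks and P(at least one call next week) = 1 - exp(-rate). This is a minimal sketch under stated assumptions: the log format is inferred from the sample lines (whitespace-separated fields, month.day.year dates, durations with an "s" suffix), and the function names and thresholds (min_calls, min_share, n_weeks) are hypothetical.

```python
import math
from collections import Counter, defaultdict
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def parse_line(line):
    """Parse 'John +1789456123 4.29.2013 13:45 34s' into (name, number, datetime, seconds).

    Assumes single-word names and the month.day.year format of the sample data.
    """
    name, number, date, clock, duration = line.split()
    when = datetime.strptime(f"{date} {clock}", "%m.%d.%Y %H:%M")
    return name, number, when, int(duration.rstrip("s"))


def weekly_patterns(calls, min_calls=2, min_share=0.5):
    """Report a contact's most common (weekday, hour) bucket if it holds
    at least min_share of that contact's calls (both thresholds are arbitrary)."""
    buckets = defaultdict(Counter)
    for name, _, when, _ in calls:
        buckets[name][(DAYS[when.weekday()], when.hour)] += 1
    patterns = []
    for name, counter in buckets.items():
        (day, hour), count = counter.most_common(1)[0]
        total = sum(counter.values())
        if total >= min_calls and count / total >= min_share:
            patterns.append(f"You usually call {name} every {day} around {hour}:00")
    return patterns


def prob_call_next_week(calls, name, day, hour, n_weeks):
    """Poisson-process estimate of P(at least one call to `name` in this bucket
    next week), given a log covering n_weeks weeks."""
    k = sum(1 for n, _, when, _ in calls
            if n == name and DAYS[when.weekday()] == day and when.hour == hour)
    rate = k / n_weeks  # estimated calls per week in this bucket
    return 1 - math.exp(-rate)
```

Feeding it a made-up log with four Thursday-morning calls to Jane over four weeks yields "You usually call Jane every Thursday around 10:00" and a next-week probability of 1 - e^-1 (about 0.63). Time-series or inference-engine approaches can come later; counting like this is often enough to see whether any regular pattern exists at all.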

Related

GPT-3 cannot mix two actions into one prompt (summarisation and tense changing)

(Just a heads-up: this feels like a weird question since there isn't really any code involved, and I'm not sure this is the right place to ask.)
I am trying to summarise a journal entry and convert it into second person past tense (i.e. "I went to the shop" -> "You went to the shop").
When I give the following prompt to GPT-3 (Davinci, all other parameters at their defaults), it gives me a summary as expected:
Summarise this text:
We took to the streets of London on the London hire bikes aka Boris Bikes / BoJo Bikes; previously Barclays Bikes and now Santander Bikes – bloomin heck this is complicated. I knew the direction where I wanted to get to and knew how to get there except I didn’t really.
We started our journey at one of bike hire station in St John’s Wood and continued around Regents Park (the wrong way) Simon got us to one of the gateways to the path along the Regents Canal. Sometimes they can be quite difficult to find; this was one of those times. This particular one was located at the back of a housing estate; only that Simon knew where it was there was no way I would have found it.
Off down the canal we went. Sunday afternoons are a busy time along the canal with local people mixed in with tourists from all over the world; so cycling along a narrow path is not easy as everyone walks on different sides of the path (according to where they come from)! We got towards Camden Market and the path got very busy, to the point that I almost went into the canal but with a wibble and a wobble I managed to stay in. At that point the decision was easily made to get off that bike and walk it. The Santander App showed us where the nearest parking station was and that there was space available to park up.
Coffee time! Forget the major chains, we found a small local place called T & G for some cups of coffee and a sarnie before we went out to find out next bike to get us to Granary Square in Kings Cross for our next stop. From the canal path there is a grassed set of steps going up to the Square but first we parked up the bikes on the other side of the canal. So many places to choose from to hang out, for drinks and for food or trains to Paris, Lille, Edinburgh or Manchester to start off with.
All in all, we went out and achieved what we intended to – a cycle along the canal with a couple of stops along the way for some food and drinks. What better way to spend a Sunday afternoon.
// GPT-3's answer:
The author takes a Boris Bike (a bike available for rent in London) and cycles along the Regents Canal. They note that the path is busy on a Sunday afternoon, but manage to find a parking spot for their bike before exploring the area around Granary Square in Kings Cross.
That is a very good summary.
Now, I can convert the summary to second person POV like so:
Convert the following to past tense second person:
The author takes a Boris Bike (a bike available for rent in London) and cycles along the Regents Canal. They note that the path is busy on a Sunday afternoon, but manage to find a parking spot for their bike before exploring the area around Granary Square in Kings Cross.
// GPT-3's answer:
You took a Boris Bike and cycled along the Regents Canal. You noted that the path was busy on a Sunday afternoon, but managed to find a parking spot for your bike before exploring the area around Granary Square in Kings Cross.
Again, excellent! But if I combine the prompts like so:
Summarise the following and convert the result to past tense second person:
It doesn't work well at all. In fact, it seems to ignore the summarisation part of the prompt and just converts the whole passage to second person past tense. How can I fix this?
Try the following format and see whether it works well enough for you; I use this approach in many scenarios to solve the issue you describe. Use a temperature of zero (unless you are willing to risk variations).
Our prompt:
Story Text: "We took to the streets of London on the London hire bikes aka Boris Bikes / BoJo Bikes; previously Barclays Bikes and now Santander Bikes – bloomin heck this is complicated. I knew the direction where I wanted to get to and knew how to get there except I didn’t really.
We started our journey at one of bike hire station in St John’s Wood and continued around Regents Park (the wrong way) Simon got us to one of the gateways to the path along the Regents Canal. Sometimes they can be quite difficult to find; this was one of those times. This particular one was located at the back of a housing estate; only that Simon knew where it was there was no way I would have found it.
Off down the canal we went. Sunday afternoons are a busy time along the canal with local people mixed in with tourists from all over the world; so cycling along a narrow path is not easy as everyone walks on different sides of the path (according to where they come from)! We got towards Camden Market and the path got very busy, to the point that I almost went into the canal but with a wibble and a wobble I managed to stay in. At that point the decision was easily made to get off that bike and walk it. The Santander App showed us where the nearest parking station was and that there was space available to park up.
Coffee time! Forget the major chains, we found a small local place called T & G for some cups of coffee and a sarnie before we went out to find out next bike to get us to Granary Square in Kings Cross for our next stop. From the canal path there is a grassed set of steps going up to the Square but first we parked up the bikes on the other side of the canal. So many places to choose from to hang out, for drinks and for food or trains to Paris, Lille, Edinburgh or Manchester to start off with."
Following is the Summary of the Story Text (1) and Second Person Past Tense of that Summary (2):
1)
By ending the prompt with "Following is the Summary of the Story Text (1) and Second Person Past Tense of that Summary (2): 1) ", we bias GPT-3 in a simple, syntactically well-defined way, and this bias is the very last thing in the prompt. We also help it along by already providing the "1) " while leaving its content empty.
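If you build this prompt in code, the layout is easy to reproduce. A minimal sketch, assuming the story arrives as a plain string; build_prompt and INSTRUCTION are hypothetical names, and the exact layout simply mirrors the prompt shown above: story first, the biasing instruction last, and the answer slot "1)" pre-filled.

```python
INSTRUCTION = ("Following is the Summary of the Story Text (1) and "
               "Second Person Past Tense of that Summary (2):")


def build_prompt(story: str) -> str:
    """Put the story first and the biasing instruction last, then pre-fill
    the first answer slot so the model continues directly from "1)"."""
    return f'Story Text: "{story}"\n{INSTRUCTION}\n1)'
```

Keeping the instruction as the final line before the open "1)" slot is the point of the technique: it is the last thing the model reads before it starts generating.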
GPT-3's zero-temperature result (model text-davinci-002):
We took the London hire bikes for a ride and ended up at a coffee shop near Camden Market.
2) You took the London hire bikes for a ride and ended up at a coffee shop near Camden Market.
I suggest you also add "3)" as a stop sequence in case GPT-3 generates too much. The result is then easy to parse: split it along newlines, remove any "2) ", trim each line, and take lines[0] and lines[1].
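The parsing step can be sketched in a few lines of Python. This is one possible implementation, assuming the completion text has already been retrieved (no API call is shown). parse_completion and STOP_SEQUENCES are hypothetical names; the sketch strips "2)" only as a line prefix and also drops blank lines, a small deviation from the description so that an empty line between the two answers does not shift the indices.

```python
from typing import Tuple

# Pass this as the stop sequence of the completion request so that
# generation ends before a third numbered item.
STOP_SEQUENCES = ["3)"]


def parse_completion(completion: str) -> Tuple[str, str]:
    """Return (summary, second_person_summary) from a completion that
    continues after the pre-filled "1)" in the prompt."""
    lines = [line.strip() for line in completion.split("\n")]
    lines = [line for line in lines if line]  # ignore blank lines
    cleaned = []
    for line in lines:
        if line.startswith("2)"):
            line = line[len("2)"):].strip()  # drop the "2)" label
        cleaned.append(line)
    return cleaned[0], cleaned[1]
```

With the zero-temperature result above, the first element is the summary ("We took the London hire bikes ...") and the second is its second-person version ("You took the London hire bikes ...").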

Performing T-Test on Time Series

My boss asked me to perform a t-test to test the significance of a metric we use called conversion rate.
I have collected 18 months' worth of data for this metric, from April 1, 2017 to September 30, 2018.
He initially told me to collect 12-14 months of data and run a t-test to look for significance in the metric. (A higher conversion rate is better!)
I'm not really sure how to go about it. Do I split the data into 9-month samples, i.e. Sample 1: April 2017 - December 2017, Sample 2: January 2018 - September 2018, and run a two-sample t-test? Or would it make sense to compare all of the data against a mean like 0?
Is there a better approach to this? The bottom line is he wants to see that the conversion rate has significantly increased over time.
Thanks,
- Keith
My advice is to dump the t-test and look only at the magnitude of the change in the conversion rate. After all, the conversion rate is what's important to your business. By the way, looking at the magnitude of something practically relevant is called "effect size analysis"; a web search for that should turn up a lot of resources. To get started, just make a plot of the available data -- is conversion rate going up or going down or what?
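To put a number on "how much did it change" alongside the plot, a minimal sketch in Python, assuming the 18 monthly conversion rates are available as a chronological list. It reports the difference in means between the two 9-month halves proposed in the question, Cohen's d (one common effect-size measure, chosen here as an assumption), and a least-squares slope in conversion-rate units per month. The function name is hypothetical, and no real data is used.

```python
from statistics import mean, stdev


def effect_size_summary(rates):
    """Summarize the size of the change in a chronological list of monthly rates."""
    half = len(rates) // 2
    before, after = rates[:half], rates[half:]
    diff = mean(after) - mean(before)

    # Pooled standard deviation of the two halves, used for Cohen's d.
    pooled_var = (((len(before) - 1) * stdev(before) ** 2
                   + (len(after) - 1) * stdev(after) ** 2)
                  / (len(before) + len(after) - 2))
    pooled_sd = pooled_var ** 0.5
    cohens_d = diff / pooled_sd if pooled_sd > 0 else float("nan")

    # Ordinary least-squares slope: average change in the rate per month.
    xs = range(len(rates))
    x_bar, y_bar = mean(xs), mean(rates)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, rates))
             / sum((x - x_bar) ** 2 for x in xs))

    return {"difference_in_means": diff, "cohens_d": cohens_d, "slope_per_month": slope}
```

A rise of, say, 0.9 percentage points between halves, or a slope of 0.1 points per month, is something the business can judge directly, which is the point of effect size analysis.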
Further questions should be directed to stats.stackexchange.com instead of SO. Good luck and have fun.

Constrained Optimization Usage - Creating Optimal Volunteer Schedule for Community Service Organization

I'm a college student who volunteers as a program manager for a local community service organization. One big part of my job involves matching volunteer schedules (submitted to me via text and email) with tutee schedules (submitted by teachers via a Google Form). For the past two years, I've been matching the requested time slots with volunteer availabilities manually, using Excel sheets and color coding. This has been easy so far because I've received a relatively small number of tutor requests and volunteer sign-ups.
Over the past two months, I've worked hard to grow the tutoring program at the school I manage. This semester, I received 18 request forms for over 25 students. Matching volunteer schedules manually for this many people will take hours, if not days. Given my workload, I figured there had to be a better way to approach this problem.
I am curious whether any of you with constraint programming experience could help me (1) solve my scheduling problem or (2) recommend software that can help. Below I outline the scheduling process in more detail and list the constraints that must be taken into consideration when scheduling shifts.
THE SCHEDULING PROCESS
I ask my volunteers to send me their Monday - Thursday availability in a format like so:
M: 9:30 - 12:00
T: 2:00 - 4:30
W: 12:00 - 1:30
Th: 10:00 - 11:30
The school is a 15-20 minute drive away, so I rely on 'drivers' to carpool other volunteers to their 1-hour shifts. If a volunteer has a car and is willing to carpool, I try to match at least 2 other volunteers with the same availability to that driver (provided the car has enough room and a teacher has requested a tutor for that time slot).
I then pray that a teacher has requested a tutor for that carpool's time-slot. If not, then the carpool is no good and I have to manually come up with another solution.
THE CONSTRAINTS AND VARIABLES
Obviously, there are several constraints and variables that come into play when making the schedule. I will list as many as I can below:
Is the tutor a 'driver'?
If the tutor is a 'driver,' how many seats does he have?
Does the driver's availability match up with any of the requested time slots?
Are there other, non-driving tutors that have the same availability as the driver? (i.e. is there anyone that can carpool with the driver)
Do all volunteers in the carpool have a student they could tutor?
Did the teacher request individual or group tutoring (i.e. one or more tutors for one or multiple students)?
If group tutoring, how many tutors did the teacher request?
REMEMBER, one volunteer can tutor more than one student
That is a small list of the constraints and variables I can come up with off the top of my head.
So can anyone offer a solution to this scheduling problem? Would someone who has no knowledge about constraint programming be able to use OptaPlanner to solve this problem?
Thank you for giving this a read and offering your advice.
You can use OptaPlanner to solve this problem, but MiniZinc may be a better option. With MiniZinc you describe the required properties of a solution, whereas with OptaPlanner you implement the workflows and algorithms that manage variables, constraints, parsers for input data, etc.
If you are not experienced with constraint satisfaction, this could be much simpler: just describe the allowed and disallowed configurations in a text configuration file and run a solver. You can even provide a simple GUI for generating the input data.
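To make "describe allowed configurations and search for one" concrete, here is a toy backtracking search in plain Python. It is not MiniZinc or OptaPlanner, and the data model is a simplifying assumption: slots are strings like "M 10:00", each request is the number of tutors wanted in a slot (so one tutor may cover several students), each volunteer works at most one shift, and a driver's seats counts passenger seats. Volunteer, schedule and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Volunteer:
    name: str
    available: FrozenSet[str]  # e.g. frozenset({"M 10:00", "T 14:00"})
    is_driver: bool = False
    seats: int = 0             # passenger seats; only meaningful for drivers


def schedule(volunteers: List[Volunteer],
             requests: Dict[str, int]) -> Optional[Dict[str, Optional[str]]]:
    """Assign each volunteer to one requested slot (or None) so that every slot
    gets exactly the requested number of tutors and every non-driver has a seat
    in a driver's car for the same slot. Returns None if no assignment exists."""
    assignment: Dict[str, Optional[str]] = {}
    counts = {slot: 0 for slot in requests}

    def complete_and_valid() -> bool:
        for slot, needed in requests.items():
            if counts[slot] != needed:
                return False
            members = [v for v in volunteers if assignment[v.name] == slot]
            seats = sum(v.seats for v in members if v.is_driver)
            riders = sum(1 for v in members if not v.is_driver)
            if riders > seats:  # someone would have no ride
                return False
        return True

    def backtrack(i: int) -> bool:
        if i == len(volunteers):
            return complete_and_valid()
        v = volunteers[i]
        # Only requested slots the volunteer can make, or leave them unassigned.
        options = [s for s in sorted(v.available) if s in requests] + [None]
        for slot in options:
            if slot is not None and counts[slot] >= requests[slot]:
                continue  # slot already full: prune this branch
            assignment[v.name] = slot
            if slot is not None:
                counts[slot] += 1
            if backtrack(i + 1):
                return True
            if slot is not None:
                counts[slot] -= 1
            del assignment[v.name]
        return False

    return dict(assignment) if backtrack(0) else None
```

For example, a driver with two passenger seats plus two non-drivers free on "M 10:00" fill a 3-tutor request there, while two non-drivers alone cannot (nobody drives), so schedule returns None. This brute-force search can become slow as the number of volunteers grows, which is where a real solver such as MiniZinc or OptaPlanner pays off.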

Bagging and max_feature in Random Forest

I do not understand the claim that "a smaller max_feature reduces overfitting in Random Forest".
Also, what is meant by bagging in Random Forest?
I'll do my best to explain. You seem to be new to machine learning algorithms, but I'm sure you know what overfitting is! So let me explain why it happens so easily with Random Forest, borrowing this user's answer: https://cs.stackexchange.com/users/30903/kyle-hale
Once there was a girl named Mel. And every day Mel played with a different friend, and every day she played it was a sunny, wonderful day.
Mel played with
Jordan on Monday,
Lily on Tuesday,
Mimi on Wednesday,
Olive on Thursday... and then
on Friday Mel played with Brianna, and it rained. It was a terrible thunderstorm!
More days, more friends!
Mel played with Kwan on Saturday,
Grayson on Sunday,
Asa on Monday... and then
on Tuesday Mel played with Brooke and it rained again, even worse than before!
Now Mel's mom made a time-table for Mel deciding who she should play with and when, so that night during dinner she starts telling Mel all about the new time-table she has lined up. "Luis on Wednesday, Ryan on Thursday, Jemini on Friday, Bianca on Saturday -"
Mel frowned.
Mel's mom asked, "What's the matter, Mel, don't you like Bianca?"
Mel replied, "Oh, sure, she's great, but every time I play with a friend whose name starts with B, it rains!"
What's wrong with Mel's answer?
Well, it might not rain on Saturday.
Well, I don't know, I mean, Brianna came and it rained, Brooke came and it rained...
Yeah, I know, but rain doesn't depend on your friends.
This is exactly what happens with Random Forest. When each split is allowed to consider too many features (the number set by the "max_feature" parameter, max_features in scikit-learn), the trees try very hard to find patterns where none exist and end up fitting the existing data so exactly that they overfit. A decision tree grown without limits can fit its training data almost perfectly, since it keeps breaking into branches until every feature and every possibility has been taken into account (if that makes sense). Letting each split see fewer features also makes the trees in the forest less alike, so averaging them cancels out more of their individual mistakes.
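As for what bagging is in Random Forest: bagging (bootstrap aggregating) means that each tree is trained on a bootstrap sample, i.e. rows drawn at random with replacement from the training set, and the forest then combines the trees' predictions by averaging or majority vote. To really see why this helps, you need to understand how Random Forest grows its many trees and decides which features to consider important, so it is worth taking the time to fully understand how it works. A YouTube walkthrough is a good place to start.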
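Both ideas fit in a few lines of pure Python. This is a toy, not scikit-learn: each "tree" is a one-split decision stump, so the random feature subset is drawn once per tree (a real Random Forest draws it at every split). Bagging shows up as the bootstrap sample per tree plus the majority vote, and max_features as the number of features each stump may look at. All function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature_ids):
    """Fit a one-split decision 'tree' that may only use the given features."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        # Bagging: each tree sees a bootstrap sample (n rows drawn with replacement).
        idx = [rng.randrange(n) for _ in range(n)]
        # max_features: each tree may only consider a random subset of the features.
        feats = rng.sample(range(d), max_features)
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees


def predict(trees, row):
    # Aggregation step of bagging: majority vote over all trees.
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]
```

Because every tree is trained on slightly different rows (and possibly different features), their individual quirks differ, and the vote smooths them out; that is the variance reduction bagging is after.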

Auto summarization of elasticsearch text

I need help summarizing media content in Elasticsearch and storing the summary in a different field during data ingestion.
Is there an option in Elasticsearch itself to summarize larger content into readable text and store it?
If this is not available in Elastic, what alternative options are there?
Sample use case:
content: "Mother and daughters run London Marathon in memory of friend who died from meningitis : A MOTHER and her two daughters decided to run the London Marathon in memory of their friend who died from meningitis. xxx, 47, and her daughters xx , 20, and xxx , 18, raised nearly £1,000 for the Meningitis Research Foundation. The trio completed the marathon on Sunday, April 23, in just under six and a half hours. They had planned to run it with family friend xxx but sadly the 20-year-old passed away last August. xx represented Waltham Forest in the 2013 Youth Games hockey championships. She is a student at Woodford County High School in Woodford Green while xx is studying law at the University of Essex. Mrs xx of Goodmayes said although there were times when she thought of quitting, she and her daughters wanted to finish the marathon for xx. Mrs xxx said: \"When I hit mile 11 I was thinking of quitting because it was so difficult but I thought I've got to finish it for xxx. \"Just the thought of her kept me going. So I went to the ambulance and took painkillers for my back and my knee and I carried on and finished it. I think out of the four of us it would have been xx there encouraging me and carrying me through. \"It is still really emotional for us all. She was just like a daughter to me and we were all really close to her and her family. \"Last year we all went to watch the marathon together and after seeing the buzz of it we decided we would run it the following year.\" xx'x family were among the supporters who turned out on Sunday to cheer the girls on. The pharmacology student at the University of East London died just two days after developing flu-like symptoms. xx of Hounslow died from a strain of Men W. University students are entitled to a free meningitis ACWY vaccine on the NHS but xx was not made aware of this by her doctor. Mrs xxx is now working to raise awareness about the vaccine and the signs to look out for. 
She said: \"We wanted to raise money for the foundation but also raise awareness about the meningitis vaccine. A lot of university students don't know about it. \"Know the signs is really important because it is so easy to mistake meningitis for the flu. \"Just a few weeks after xxx passed away xx had to start uni and that was difficult for her. It's a lot for a young person to deal with when someone their own age passes away.\" Cases of Men W have been on the increase in the UK in recent years. "
Summarize the above paragraph into a 4-sentence text.
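One alternative is to compute the summary in your own ingestion code before the document is sent to Elasticsearch. The sketch below is a simple frequency-based extractive summarizer (in the spirit of Luhn's method) in plain Python: it scores each sentence by the average corpus frequency of its non-stopword words and keeps the top n sentences in their original order. It is a swapped-in technique, not an Elasticsearch feature; the stopword list, the sentence splitter, summarize, build_doc and the "summary" field name are all illustrative assumptions, and indexing the resulting dict is left to whichever client you already use.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is",
             "was", "it", "that", "this", "with", "as", "at", "by", "her", "his",
             "she", "he", "they", "we", "i", "but", "so", "be", "had", "have",
             "from", "their", "our"}

SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
WORD = re.compile(r"[a-z']+")


def summarize(text, n_sentences=4):
    """Return the n highest-scoring sentences of `text`, in original order."""
    sentences = [s.strip() for s in SENTENCE_SPLIT.split(text.strip()) if s.strip()]
    if len(sentences) <= n_sentences:
        return " ".join(sentences)
    freq = Counter(w for w in WORD.findall(text.lower()) if w not in STOPWORDS)

    def score(sentence):
        tokens = [w for w in WORD.findall(sentence.lower()) if w not in STOPWORDS]
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


def build_doc(content):
    """Document to index: original text plus a summary in a separate field."""
    return {"content": content, "summary": summarize(content)}
```

Extractive summaries only reuse existing sentences, so they will not paraphrase the article; for more fluent four-sentence output, an abstractive model would have to be called from the same ingestion step.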
