Series design in InfluxDB for showing the number of repeat customers

Consider an analytics requirement where you need to find repeat customers for a date range. Repeat customers for a date range are defined as customers who used the service within 3×(the date-range interval) before the start of the range and who also used the service within the range itself.
For example, the repeat customers for this week are all customers who used the service in the three weeks before the start of this week and who also used it this week.
I am using InfluxDB. I haven't decided on the series yet; I am looking for input on how to define a series such that I can use the operations available in InfluxDB to obtain this analytic.
The data available to me is the timestamp at which the user used the service, plus user_id, service_category, service_instance_id, and a JSON dump of further details about the service.
Maybe my thought process is limited; I need some intervention on how to approach this, and any input is welcome.

So I thought about this and came to a decent solution: save the last time a user visited along with each entry. That way, at least one reference exists within any time period if the user is a repeat for that period.
This is similar to a linked list, except that we get direct, time-based filtering of the nodes.
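As a minimal sketch of that idea in InfluxQL, assuming a hypothetical measurement service_usage in which user_id is written as a field (DISTINCT only works on fields, not tags) and a last_visit field holds the epoch timestamp of the same user's previous visit:

-- Repeat customers for the week starting 2023-05-08: points inside the
-- range whose previous visit fell in the 3-week lookback window.
-- Measurement, field names and timestamps are all assumptions.
SELECT COUNT(DISTINCT("user_id")) AS repeat_customers
FROM "service_usage"
WHERE time >= '2023-05-08T00:00:00Z' AND time < '2023-05-15T00:00:00Z'
  AND "last_visit" >= 1681689600000000000   -- range start minus 3 weeks, epoch ns
  AND "last_visit" <  1683504000000000000   -- range start, epoch ns

The application writing the point supplies last_visit, so no join is needed at query time; the lookback becomes a simple field filter.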

Related

How to Subtract a value from the oldest expiration date and move to the next once it hits zero?

I work for an organization, and I am working on a way to better organize our part numbers that have expiration dates. I started off by making an Excel sheet that lists each part number multiple times, once per expiration date, with the matching quantity for that expiration date. What I'm trying to do is what you always do in life: if something is going to go bad, use up the one that will expire first. Now I'm trying to do this in an Excel table automatically. My master log is where users will enter the Part #, expiration date, and quantity; this is where most of the info will be pulled from.
I am now on to subtracting our company's forecasted usage of these parts from those quantities, to show overall usage over the months and to pair up the intersecting expiration dates. I got the usage by using a pivot table to total the quantities per part number, and I made a chart mapping overall usage against the part-quantity forecast, flagged when an expiration date interferes with the actual date. This is the forecasting for what we plan to use of each part #.
Here I am subtracting the usage number from the inventory of those part numbers, with a flag when the part expires:
=IF(VLOOKUP($F60,Expires!$1:$1048576,10,FALSE)<Sheet1!G$58,"Expired",T33)
I used this formula: the VLOOKUP pulls the part number's expiration date from my first sheet, and if it is earlier than the date at the top of the column (9/1, 10/1, etc.), the cell is flagged as expired and won't even show the usage. T33 simply pulls the formula from another table, which is just quantity − usage. Please let me know if this looks right; as it turns out, my VLOOKUP pulls the latest expiration date from among the multiple rows for a part #.
But that works with total quantities, not quantities separated by each part number's expiration date. What I am wondering is: how can I make a table like this that subtracts the usage from the quantity of the part number that will expire first, and then, once that quantity hits zero, takes the usage away from the next one due to expire? I also want the numbers to auto-populate from the sheet where the user enters the data (the master log).
So this is what it will be doing; it will look like the table above, with multiples of the same part number (that's what's happening now).
But it needs to look like the final table, adding the part number and quantity to the list as they are entered in the master log. I was trying a VLOOKUP(MIN(...)) to take the part number from the other table with the lowest expiration date first, but I don't think Excel is capable of that; I'm not sure here.
Please help, thank you!
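Not a full answer, but one common spreadsheet pattern for this kind of first-expired-first-out depletion is to let each lot absorb only the usage left over after all earlier-expiring lots of the same part. A sketch, assuming part numbers in column A, expiration dates in column B, lot quantities in column C, and the part's total forecasted usage in a helper cell $G$2 (all hypothetical placements):

=MAX(0, C2 - MAX(0, $G$2 - SUMIFS($C:$C, $A:$A, A2, $B:$B, "<"&B2)))

The SUMIFS totals the quantities of the same part that expire before this row's lot, so the inner MAX(0, …) is the usage still unallocated when this lot's turn comes, and the outer MAX(0, …) stops the lot's remaining quantity at zero once it is used up.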

What is the best way to store daily prices on Apache Druid?

I'd like to store daily prices on Apache Druid in a way that makes it possible to group them by day, week and month. I was thinking of defining open and close prices as metrics, not dimensions, but for that I would have to use first and last aggregators to group open and close prices, respectively. The documentation says:
(Double/Float/Long) First and Last aggregator cannot be used in ingestion spec, and should only be specified as part of queries.
So it seems I cannot roll the data up this way at ingestion time. I would like to know the best way to load daily price data so that I can use all of Druid's capabilities when grouping by day, week or month.
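One way to work within that restriction is to ingest the raw daily rows as-is (timestamp, symbol, price) and apply the first/last aggregation at query time, which is exactly where Druid allows it. A sketch in Druid SQL, assuming a hypothetical datasource prices with one row per instrument per day (EARLIEST and LATEST are Druid SQL's query-time first/last aggregators; availability depends on your Druid version):

SELECT
  TIME_FLOOR(__time, 'P1W') AS week,
  symbol,
  EARLIEST(price) AS "open",
  LATEST(price)   AS "close",
  MAX(price)      AS high,
  MIN(price)      AS low
FROM prices
GROUP BY 1, 2

Swapping TIME_FLOOR's period to 'P1D' or 'P1M' gives the daily and monthly groupings.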

Using QDigest over a date range

I need to keep a 28-day history for some dashboard data. Essentially, I have an event/action that is recorded through our BI system. I want to count the number of events and the distinct users who performed that event over the past 1 day, 7 days and 28 days. I also use grouping sets (cube) to get the fully segmented data by country/browser/platform etc.
The old way was to keep a 28-day history per user, for all segments. So if a user accessed the site from mobile and desktop every day for all 28 days, they would have 54 rows in the DB. This ends up being a large table, and it is time-consuming to calculate even approx_distinct, let alone distinct. The further issue is that I also wish to calculate approx_percentiles.
So I started investigating the use of HyperLogLog: https://prestodb.io/docs/current/functions/hyperloglog.html
This works great; it's much more efficient to store daily sketches than the entire list of unique users per day. As I am using approx_distinct, the values are close enough, and it works.
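For reference, the daily-sketch pattern from the Presto HLL docs looks roughly like this, with hypothetical table and column names: build one sketch per day with approx_set, then merge the sketches over the date range:

-- One sketch per day (sketches are stored as varbinary)
INSERT INTO daily_sketches
SELECT event_date, CAST(approx_set(user_id) AS varbinary) AS sketch
FROM events
GROUP BY event_date;

-- Distinct users across 7 days: merge the daily sketches
SELECT cardinality(merge(CAST(sketch AS HyperLogLog))) AS weekly_uniques
FROM daily_sketches
WHERE event_date BETWEEN DATE '2020-01-01' AND DATE '2020-01-07';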
I then noticed a similar structure for medians: qdigest.
https://prestodb.io/docs/current/functions/qdigest.html
Unfortunately the documentation on that page is not nearly as good as on the previous one, so it took me a while to figure it out. It works great for calculating daily medians, but it does not work if I want to calculate the median actions per user over a longer time period. The HyperLogLog page demonstrates how to calculate approx_distinct users over a time period, but the qdigest docs give no such example.
When I try something analogous to the HLL date-range example with qdigest, I get results similar to the one-day results.
Because you need medians of per-user counts summed across multiple days, you'll have to perform that aggregation before inserting into the qdigest for 7- and 28-day per-user counts to work. In other words, the units of the values inside the digest must match the units of the question: if daily per-user counts are what goes into each qdigest, merging those digests gives you the median daily count, not the median 7- or 28-day count.
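Concretely, a sketch with hypothetical table and column names: total each user's events across the whole window first, then build the digest over those totals.

-- Median 7-day actions per user: sum per user first, then digest.
-- Merging seven daily qdigests would give the median *daily* count instead.
SELECT value_at_quantile(qdigest_agg(cnt), 0.5) AS median_weekly_actions
FROM (
  SELECT user_id, COUNT(*) AS cnt
  FROM events
  WHERE event_date BETWEEN DATE '2020-01-01' AND DATE '2020-01-07'
  GROUP BY user_id
) AS t;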

Access DB: Daily cumulative count and daily number on assignment

I have a table that tracks the dispatching of personnel. The table has the employee name and the date the person went out and the date they returned.
The table has hundreds of entries from 1988 to current.
In Excel I track the cumulative count per day of the year of how many people have been sent out, and I also track the number of people out on any given day. The table lists the month and day in the first column (every day of the year, including leap days) and the years on the first row. There is data for every date: a zero is entered until the first person is sent out that year, then the count climbs as there are more dispatches; for the number of people out each day, it shows zero if no one is out, or, say, "5" if five people were out that day. I then use the data in Excel to construct a graph with the number of dispatches on the y axis and the day of the year on the x axis (along with the current year's number, the average, and the max over the 27-year history). Currently I track this manually: I keep a running count of each and enter it into Excel by hand.

I would like to build queries of my Access data that return the same information, which I could then import into my Excel spreadsheet. One query would show the day and month in the first column and the years along the top row, and for each day show that year's cumulative count of how many people have been sent out. Another query would have the day and month in the first column and the years along the top, and a count of how many people were out on that particular day of that particular year. There shouldn't be any gaps (every day has data, even if it is "0"). I would then import those queries into Excel to replace the manual tracking I am doing now.
I know how to construct the Excel side (I have that running already) and how to import info from Access to Excel; what I need to know is how to construct these two Access queries.
Any help/ideas on how to construct those two queries would be greatly appreciated!
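For what it's worth, here is a sketch of the second query (people out on each day) as an Access crosstab. Names are hypothetical: a Dispatches table (Employee, DateOut, DateReturned) and a helper Calendar table with one row per date, which is what guarantees the no-gaps requirement:

TRANSFORM Count(d.Employee)
SELECT Format(c.CalDate, "mm/dd") AS MonthDay
FROM Calendar AS c LEFT JOIN Dispatches AS d
  ON (c.CalDate >= d.DateOut AND c.CalDate <= d.DateReturned)
GROUP BY Format(c.CalDate, "mm/dd")
PIVOT Year(c.CalDate);

Count(d.Employee) ignores the NULLs the LEFT JOIN produces, so days with nobody out come back as 0. The cumulative-count query can be built the same way with a correlated subquery that counts dispatches from January 1 of each year up to the calendar date, though it will run slower; both are sketches to adapt, not drop-in SQL.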
I'd recommend that you migrate this app to a web based solution that uses a real database - SQL Server or MySQL, not Access.
"Desk drawer software" is what I call homegrown apps that someone creates for themselves to perform some small task that eventually become integral to running a business and grow out of hand. Your truck factor is 1: if anything happened to you, no one would know how to do this function. The software may not be backed up or checked into a source code management system. There's no QA. There's no way to migrate new features to production: if you alter the app, then that is what you have.
I'd recommend a web app to mitigate all the risks I've described:
You have to deploy a web app to a server, which takes it off your desktop and puts it in a central place where anyone who's authorized can access it.
It separates the database from display concerns.
It makes you think about how to archive historical data; partitioning by year makes sense.
Likely you'll put this in a source code management system like Subversion or Git.

Attendance Calculations / Period Calendar

This is a multi-tiered project, so let me give a quick overview. I have attendance data: card/timestamp punches. I would like to have a pivot table with slicers in Excel. Ideally you'd be able to choose a department / last name / associate number, and also a period of time. Ideally that period picker would be a table of the company period/week, perhaps defaulting to the most recent weeks.
I can get at timecard data in two ways:
(1) Generate a CSV that automatically performs the timecard math to figure out how many hours someone worked; it is smart enough to understand 3rd-shift workers. The format of that CSV is:
Last Name, First Name, Personnel Type, Associate Number, Facility, Department, TimeIn, TimeOut, Total Hours
The problem with this method is that I would have to manually append the information to the CSV tables, or come up with some AutoIt script.
(2) Get at the raw data via SQL/ODBC. This way the math is not done; it is just all of the associates' timestamps. I would have to figure out the daily hours myself, including a 3rd-shift formula. Schedules are not fixed: many people swing shifts, and others get called in a lot.
Lastly, I would like to be able to filter the dates using our company fiscal calendar. I have a spreadsheet that runs from 2000 to 2093, with every day listed along with its corresponding year/period/week.
Example period info spreadsheet:
Date       Year  Period  Week  WeekTotal  PeriodTotal
12/3/2007  2008  1       1     2008.1.1   2008.1
12/4/2007  2008  1       1     2008.1.1   2008.1
I know there is a lot going on here, but what would be the best way to approach this project?
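As a starting point for option (2), a hedged sketch in T-SQL-style SQL, with hypothetical punches and fiscal_calendar tables. The trick for 3rd shift is to shift each punch-in back a few hours so an overnight pair lands on a single business day; the 6-hour offset here is an assumption to tune:

SELECT
    p.associate_number,
    CAST(DATEADD(HOUR, -6, p.time_in) AS DATE) AS business_day,  -- shift so 3rd-shift punches group to one day
    SUM(DATEDIFF(MINUTE, p.time_in, p.time_out)) / 60.0 AS hours_worked,
    c.fiscal_year, c.fiscal_period, c.fiscal_week
FROM punches AS p
JOIN fiscal_calendar AS c
    ON c.cal_date = CAST(DATEADD(HOUR, -6, p.time_in) AS DATE)
GROUP BY
    p.associate_number,
    CAST(DATEADD(HOUR, -6, p.time_in) AS DATE),
    c.fiscal_year, c.fiscal_period, c.fiscal_week;

The fiscal_calendar table is just the 2000–2093 spreadsheet imported as-is, and its year/period/week columns become the slicer fields in the pivot table.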
First, I have not been able to post any script, but the last time I tried this I used two options:
1. A PHP conversion where the time was stored as numbers (which makes calculations easier).
2. Tables where I deliberately placed the time in separate columns/fields for hours, minutes, and seconds; this eases input, but I still have to calculate the output in PHP, especially for totals, averages and differences.
Hope it helps a bit.
