I have a use case in which I have to generate 10 million discount codes for a campaign.
There can be multiple campaigns.
Hence the final figure could be close to 200-300 million discount codes.
A simple solution would be to generate that many discount codes and store them in the DB (Postgres). But that would lead to slowdowns and put an unnecessary burden on the system.
The alternate solution in my mind is to:
Store a range_of_discount and campaign in the DB.
Run Python code to generate x million discount codes in the range mentioned in step 1.
Upload the output of the above to merchants like GPAY/CRED etc.
When a user comes in and applies a code, and it is in the range, then it is a valid discount code.
This way, I won't have to store millions of records.
Or is there any easier way to achieve this?
One limitation is that the codes cannot be sequential.
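One way to get non-sequential codes that are still verifiable against a range, without storing each code, is a keyed Feistel permutation: sequential indices map to scrambled codes, and validation is just inverting the permutation and checking the index falls inside the campaign's range. This is only a minimal sketch of the idea; the bit width, round count, key handling, and function names are all illustrative assumptions, not a production design.

```python
import hashlib

BITS = 32            # codes live in [0, 2**32)
HALF = BITS // 2
MASK = (1 << HALF) - 1
ROUNDS = 4

def _round(value, key, rnd):
    # keyed round function: hash of (key, round number, half-block)
    data = f"{key}:{rnd}:{value}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") & MASK

def encode(index, key):
    """Map a sequential index to a scrambled, reversible code."""
    left, right = index >> HALF, index & MASK
    for rnd in range(ROUNDS):
        left, right = right, left ^ _round(right, key, rnd)
    return (left << HALF) | right

def decode(code, key):
    """Invert encode(); used to check whether a code is in range."""
    left, right = code >> HALF, code & MASK
    for rnd in reversed(range(ROUNDS)):
        left, right = right ^ _round(left, key, rnd), left
    return (left << HALF) | right

def is_valid(code, key, campaign_size):
    # a code is valid iff it decodes to an index inside the campaign range
    return 0 <= decode(code, key) < campaign_size
```

Because a Feistel network is a bijection, every index produces a distinct code, so only the per-campaign key and range need storing; the integer codes could then be rendered in whatever alphanumeric format the merchants expect.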
I have a database in Tableau from an Excel file. Every row in the database is one ticket (assigned to an Id of a customer) for a different theme park across two years.
The structure is like the following:
Every Id can buy tickets for different parks (or same park several times), also in different years.
What I am not able to do is flagging those customers who have been in the same park in two different years (in the example, customer 004 has been to the park a in 2016 and 2017).
How do I create this calculated field in Tableau?
(I managed to solve this in Excel with a SUMPRODUCT function, but the database has more than 500k rows and after a while it crashes. Plus, I want to use a calculated field in case I update the Excel file with a new park or a new year.)
Ideally, the structure of the output I thought should be like the following (but I am open to different views, as long I get to the result): flag with 1 those customers who have visited the same park in two different years.
Create a calculated field called customer_park_years =
{ FIXED [Customerid], [Park] : COUNTD([year]) }
You can use that on the filter shelf to only include data for customer_park_years >= 2
Then you will be able to visualize only the data related to those customers visiting specific parks that they visited in multiple years. If you also want to then look at their behavior at other parks, you'll have to adjust your approach instead of just simply filtering out the other data. Changes depend on the details of your question.
But to answer your specific question, this should be an easy way to go.
Note that COUNTD() can be slow for very large data sets, but it makes answering questions without reshaping your data easy, so it's often a good tradeoff.
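Outside Tableau, the logic the FIXED LOD expression computes is just "count distinct years per (customer, park) pair". Here is a plain-Python sketch of that same calculation; the sample rows are illustrative, matching the customer 004 example from the question.

```python
from collections import defaultdict

rows = [
    ("001", "a", 2016),
    ("002", "b", 2016),
    ("004", "a", 2016),
    ("004", "a", 2017),
]

# collect the distinct years each customer visited each park
years_per_pair = defaultdict(set)
for customer, park, year in rows:
    years_per_pair[(customer, park)].add(year)

# flag = 1 when a customer visited the same park in 2+ distinct years
flags = {pair: int(len(years) >= 2) for pair, years in years_per_pair.items()}
```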
Try this!
IFNULL(STR({FIXED [Customerid],[Park]: IF COUNTD([year]) > 1 THEN 1 ELSE 0 END}),'0')
I am adding two tables (Transactions, Customers) to QlikView, and I need a text object showing the number of customers who have spent over $1000.
I am trying to achieve this through an aggregate function with no luck so far. Is this possible, or should I try an alternative route?
Num(Count( {$ < Aggr(Sum(Total),Customer) = {">1000"}>} Distinct Customer), '###.###.###')
Total is the amount spent on each transaction and customer the customer who made the transaction.
I also tried something like the below code:
count({<Customer= {"=sum(Total)> =100"} >} distinct Customer)
but I still haven't gotten anywhere.
I think this is what you want to do. Assuming Total is the number you want to add, I've used Spend to avoid confusion. This would give you the number of customers with a spend above 1000, based on the current selections.
Num(Count(if(Aggr(Sum(Spend),Customer)>1000,1)), '###.###.###')
Note that using the TOTAL qualifier inside the Aggr() function would skew the results.
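For reference, the aggregation itself is straightforward outside QlikView: sum the spend per customer, then count the customers above the threshold. A short Python sketch with illustrative sample data:

```python
from collections import defaultdict

transactions = [
    ("alice", 600), ("alice", 700),   # total 1300 -> counted
    ("bob",   400), ("bob",   500),   # total 900  -> not counted
    ("carol", 1500),                  # total 1500 -> counted
]

# sum spend per customer
spend = defaultdict(float)
for customer, amount in transactions:
    spend[customer] += amount

# count distinct customers whose total spend exceeds 1000
big_spenders = sum(1 for total in spend.values() if total > 1000)
```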
I use Excel to store information about purchased products and invoices (ID, Item, Quantity, Price (per unit), Date, etc.). Every time, I have to put the new information about orders into the table. For example: I have an order. First I create new rows in the table for each item of the order. Then I purchase the items, and after delivery I can fill in the rows. But sometimes I order the same item with a different price and quantity, or I need to order more, since last time was not enough.
Therefore, I want to create a "program" in Excel to organize the orders and to check each order's status, i.e. whether the task is already done or not.
The problem is that I can't use the same cell for different prices. And a single reorder may also not be sufficient.
Any kind of help is welcome!
P.S.
Can I do it in MS Excel?
Do I maybe have to learn MS Access?
Is there any freeware relevant for my purpose?
I have several documents which contain statistical data of performance of companies. There are about 60 different excel sheets representing different months and I want to collect data into one big table. Original tables looks something like this, but are bigger:
Each company takes two rows, which represent their profit from the sales of the product and the cost to manufacture the product. I need both of these numbers.
As I said, there are ~60 of these tables, and I want to extract information about Product2. I want to put everything into one table where the columns would represent months and the rows the profit and costs of each company. It could be done easily (I think) with the INDEX function, as all sheets are named similarly. The problem I faced is that at some periods of time other companies enter the market:
Some of them stay, some of them fail. I would like to collect information on all companies that exist today or ever existed, but newly founded companies distort the list (in the second picture we see that company BA is in the 4th row, not BB). As a company's row changes from time to time, using INDEX becomes problematic, because in some cases results of different companies get into one row. Adjusting them one by one seems very painful.
Maybe there is some quick and efficient method to solve such problem?
Any help or ideas would be appreciated.
One thing you may want to try is linking the Excel spreadsheets as tables in Access. From there you can create a query that ties the tables together. As data changes in the spreadsheets, the query will reflect those changes.
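Whatever tool does the consolidation, the fix for shifting rows is the same: key every figure by company name rather than by row position, so a new entrant cannot displace anyone else. A minimal Python sketch of that idea, where the per-month dicts stand in for the ~60 sheets (with real files you would first read each sheet, e.g. via openpyxl, into this shape; all names and numbers here are illustrative):

```python
# one dict per month, keyed by company name: (profit, cost)
months = {
    "2016-01": {"AA": (120, 80), "AB": (90, 60)},
    "2016-02": {"AA": (130, 85), "AB": (95, 62), "BA": (10, 9)},
}

# union of every company that ever appears across all months
companies = sorted({c for data in months.values() for c in data})

# build one (company, month) -> figures table; missing months stay None
table = {}
for company in companies:
    for month in sorted(months):
        profit, cost = months[month].get(company, (None, None))
        table[(company, month)] = {"profit": profit, "cost": cost}
```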
I'm using Windows Azure and venturing into Azure Table Storage for the first time in order to make my application scalable to high density traffic loads.
My goal is simple, log every incoming request against a set of parameters and for reporting count or sum the data from the log. In this I have come up with 2 options and I'd like to know what more experienced people think is the better option.
Option 1: Use Boolean Values and Count the "True" rows
Because each row is written once and never updated, store each count parameter as a bool and in the summation thread, pull the rows in a query and perform a count against each set of true values to get the totals for each parameter.
This would save space if there are a lot of parameters because I imagine Azure Tables store bool as a single bit value.
Option 2: Use Int Values and Sum the rows
Each row is written as above, but instead each parameter column is added as a value of 0 or 1. Summation would occur by querying all of the rows and using a Sum operation on each column. This would be quicker because the summation could happen in a single query, but am I losing something by storing 32-bit integers for a Boolean value?
I think at this point for query speed, Option 2 is best, but I want to ask out loud to get opinions on the storage and retrieval aspect because I don't know Azure Tables that well (and I'm hoping this helps other people down the road).
Table storage doesn't do aggregation server-side, so for both options, you'd end up pulling all the rows (with all their properties) locally and counting/summing. That makes them both equally terrible for performance. :-)
I think you're better off keeping a running total instead of re-summing everything every time. We talked about a few patterns for that on Cloud Cover Episode 43: http://channel9.msdn.com/Shows/Cloud+Cover/Cloud-Cover-Episode-43-Scalable-Counters-with-Windows-Azure
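The running-total idea is often implemented as a sharded counter: increments go to one of N shard rows (so concurrent writers don't contend on a single entity), and a read sums the N shards instead of scanning every logged request. This in-memory sketch only illustrates the shape of the pattern; in Azure Table Storage each shard would be its own entity, and the class and shard count are illustrative assumptions.

```python
import random

SHARDS = 8

class ShardedCounter:
    def __init__(self):
        # one slot per shard; in table storage these would be N entities
        self.shards = [0] * SHARDS

    def increment(self, amount=1):
        # pick a random shard so concurrent writers spread their load
        self.shards[random.randrange(SHARDS)] += amount

    def value(self):
        # one read per shard instead of re-summing every logged row
        return sum(self.shards)
```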