Excel/GSheets: count unique versus recurring values in a dynamic list

So I have a large data set of webinar conference attendees - about 30k+ rows. I need to write a function (either in Excel or Google Sheets) to determine if each participant is a first time or returning attendee. The data set is sorted chronologically by event from oldest to most recent. Let's use the simplified version below to better illustrate the business case here.
Check out the simple sample set here.
I considered using named ranges for each of the webinar sets and searching to see if the user exists in that range, but given the scale and the varying webinar attendee counts that would be tough. I also thought about INDEX/MATCH-ing the various attendees, but because the number of attendees per event is so variable (anywhere from 5 to 500) that too would be impractical. I feel like I might be missing a relatively simple solution here. Please assist! Thanks!

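Try the formula below; it works in both Excel and Google Sheets. Enter it in the first data row (for example C2) and fill down: because only the end of $B$2:$B2 moves, COUNTIF counts how many times the attendee has appeared up to and including the current row.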
=IF(COUNTIF($B$2:$B2,B2)>1,"Returning","New")
If you need to show N/A for the first webinar (event A in the sample), use this formula instead:
=IF(A2="A","N/A",IF(COUNTIF($B$2:$B2,B2)>1,"Returning","New"))
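The expanding-range COUNTIF rescans every earlier row, so for 30,000 rows it does on the order of n²/2 (roughly 450 million) cell comparisons, which may be slow. As a sketch of the same first-time/returning logic in Python rather than spreadsheet formulas, a set of names already seen lets each row be checked once. The function name, the (event, attendee) row shape and the first_event parameter are assumptions for illustration.

```python
def classify_attendees(rows, first_event=None):
    """Label each (event, attendee) row "New", "Returning" or "N/A".

    Same idea as =IF(COUNTIF($B$2:$B2,B2)>1,"Returning","New"): a row is
    "Returning" when the attendee already appeared in an earlier row, so
    rows must be in chronological order. If first_event is given, rows of
    that event get "N/A" (like the A2="A" variant) but still count as seen.
    """
    seen = set()  # normalised names of attendees from earlier rows
    labels = []
    for event, attendee in rows:
        # COUNTIF compares text case-insensitively, so normalise the same way.
        key = attendee.casefold()
        if event == first_event:
            labels.append("N/A")
        elif key in seen:
            labels.append("Returning")
        else:
            labels.append("New")
        seen.add(key)
    return labels


sample = [("A", "Ann"), ("A", "Bob"), ("B", "ann"), ("B", "Cat"), ("C", "Bob")]
print(classify_attendees(sample))       # ['New', 'New', 'Returning', 'New', 'Returning']
print(classify_attendees(sample, "A"))  # ['N/A', 'N/A', 'Returning', 'New', 'Returning']
```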


Dynamically generate a list of payment dates considering first/last date and client - ideal for controlling receivables of SaaS and recurring contracts

I'm building an accounts receivable sheet in Google Sheets.
I would like to register the clients and their contract characteristics (client, payment frequency and price) in one sheet and I would like to dynamically generate the payment dates in another sheet.
The input sheet would look like this:
The output sheet would look like this:
I think it might be something in the QUERY and ARRAYFORMULA universe but I don't know how to configure it. Is there a way to dynamically generate the combination of Date and Client, taking into consideration first and last payment dates?
A sample is at this link. If you'd like to use it, please feel free to create a copy for yourself and post it in your answer.
Creating a 2D array of concatenated strings of dates and values can be a good first step in these kinds of problems.
I've demonstrated the idea in a tab called MK.Help on the sheet I also shared in the comment above. This formula, in cell A2, generates the whole list:
=ARRAYFORMULA(QUERY(SPLIT(FLATTEN(Input!A2:A5&"|"&Input!D2:D5+SEQUENCE(1,CEILING(MAX(IFERROR((Input!E2:E5-Input!D2:D5)/Input!C2:C5))),0)*Input!C2:C5&"|"&Input!E2:E5),"|",0,0),"select Col2, Col1 where Col2<=Col3 order by Col2"))
Once you have the data in a big 2D array, you can flatten it and then split it into its component parts to make it queryable. I've tried to outline the process to the right of the solution.
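The expand-then-filter idea can be sketched in Python (an illustration of the logic, not Apps Script). It assumes each contract holds client, payment frequency in days, first payment date and last payment date, matching how the formula uses Input!A, C, D and E; the function name and tuple layout are hypothetical. It loops per contract instead of building a 2D grid, but yields the same kind of (date, client) list sorted by date.

```python
from datetime import date, timedelta


def payment_schedule(contracts):
    """Expand each contract into (payment_date, client) rows, sorted by date.

    Each contract is (client, frequency_days, first_date, last_date).
    Payments run from first_date up to and including last_date, which is
    what the QUERY's "where Col2<=Col3" filter keeps.
    """
    rows = []
    for client, frequency_days, first_date, last_date in contracts:
        step = timedelta(days=frequency_days)
        current = first_date
        while current <= last_date:
            rows.append((current, client))
            current += step
    # Like "order by Col2": sort by payment date (stable for equal dates).
    rows.sort(key=lambda row: row[0])
    return rows


contracts = [
    ("Acme", 30, date(2021, 1, 1), date(2021, 3, 2)),
    ("Beta", 15, date(2021, 1, 10), date(2021, 2, 10)),
]
for payment_date, client in payment_schedule(contracts):
    print(payment_date, client)
```

One difference, if I read the formula correctly: when the longest contract's span is an exact multiple of its frequency, CEILING gives exactly n periods and SEQUENCE(1,n,0) stops at n-1, so that contract's last payment date is not generated; the loop above includes it.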
@MattKing's answer was really good, but I had problems because all my inputs have dynamic sizes, and following his steps I couldn't figure out how to adapt it to this situation.
So, with a lot of inspiration from Matt and some extra research (including this new question), I came to a solution that worked better for me, using multiple tabs to reach the final result. Not so elegant, but it works.
I left my solution available in this sheet.
Even so, I've chosen to accept Matt's answer: it worked, it helped me, it looks "more pythonic", and maybe the need to be so dynamic wasn't clear in the question.

Crop order scheduling in excel

I am looking for some advice. I have a small microgreens business and an Excel sheet that breaks down the seeds, the seed batch amount required, yield and so on.
I want to create a tab where I can input a customer order, and then have Excel schedule that order, based on the information above, in a calendar format on another tab.
I also want Excel to calculate the amount of seed required and the number of trays, and to assign each tray a number. All trays are numbered in this format: "A123, A124, A125", etc.
I'd also like Excel to then assign a seed batch and a tray number to the order.
Firstly, is this possible? I've used Excel a fair bit in previous work, but this is quite new for me and I'm keen to learn, so I'd be grateful if someone could point me in the right direction on a possible method and what I should be looking at!
It sounds like you are looking for a data(base) structure for this challenge. Yes, that could be done in Excel, though some VBA skills will probably come in handy if you want it to happen automatically. As a first step, you could, for example, set up something like this:
Mockup of a data structure
So you would at least need one table where you enter your orders, one list of trays and one table where you link your incoming orders to the trays. You might need more columns than I added in my mockup.
Hope that gets you started.
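To make the suggested structure concrete, here is a Python sketch of the logic that would fill the linking table for one order: trays needed = quantity ÷ yield per tray, rounded up; seed needed = trays × seed per tray; sow date = harvest date minus grow days; then sequential tray IDs in the "A123" style and the first seed batch with enough seed left. All class and field names, the units (grams, days) and the first-fit batch rule are hypothetical; in the workbook the same steps would be formulas or VBA.

```python
import math
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Crop:
    # Hypothetical per-crop figures, like those in the existing seed sheet.
    name: str
    seed_per_tray_g: float
    yield_per_tray_g: float
    grow_days: int


@dataclass
class Order:
    customer: str
    crop: str
    quantity_g: float
    harvest_date: date


def allocate(order, crops, next_tray_number, seed_batches):
    """Return (link-table rows, next free tray number) for one order.

    seed_batches maps crop name -> list of [batch_id, grams_left]; the
    first batch with enough seed is used and its stock is reduced.
    """
    crop = crops[order.crop]
    trays = math.ceil(order.quantity_g / crop.yield_per_tray_g)
    seed_needed = trays * crop.seed_per_tray_g
    sow_date = order.harvest_date - timedelta(days=crop.grow_days)

    batch_id = None
    for batch in seed_batches.get(order.crop, []):
        if batch[1] >= seed_needed:
            batch[1] -= seed_needed
            batch_id = batch[0]
            break
    if batch_id is None:
        raise ValueError(f"not enough seed for {order.crop}")

    rows = []
    for i in range(trays):
        tray_id = f"A{next_tray_number + i}"  # e.g. "A123", as in the question
        rows.append((order.customer, order.crop, tray_id, batch_id,
                     sow_date, order.harvest_date))
    return rows, next_tray_number + trays
```

Each returned row corresponds to one line of the order-to-tray table in the mockup; the sow and harvest dates are what a calendar tab would be built from.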

How can this lookup (find the last relevant item) be improved?

One of the reports that wastes a bunch of my time at work is the Roster. It's a multi-site, multi-contract listing of every employee currently assigned to a specific client. Currently, it has a little over 6,000 lines by 20-something columns, indexed against 3 different datasets. Not the largest mess in the world, but still a pain. And it's almost all in Excel, because I somehow don't have a business case for Access.
But one part of this monster stands apart. One tab per site, Site Totals, lists every time any agent has gone through training. A second tab (again, one per site), Site Data, displays only the most recent training class and the credentials they had during that class.
That second tab is driven by variations of this array formula - Last_Row is a named range on another tab, and column A is a pivot of the UID column on Site Totals. I've broken it apart for readability:
=IF(INDEX('Site Totals'!B:B,LARGE(($A2=INDIRECT("'Site Totals'!$A$1:$A$"&Last_Row))*
(INDIRECT("'Site Totals'!B1:B"&Last_Row)<>"")*
ROW(INDIRECT("'Site Totals'!$A$1:$A$"&Last_Row)),1))="Trainer",
"",
INDEX('Site Totals'!B:B,LARGE(($A2=INDIRECT("'Site Totals'!$A$1:$A$"&Last_Row))*
(INDIRECT("'Site Totals'!B1:B"&Last_Row)<>"")*
ROW(INDIRECT("'Site Totals'!$A$1:$A$"&Last_Row)),1)))
I know what this formula does, but I don't know how to improve it. It needs to change, because it currently runs on the order of 500 million calculations (I'm not allowed to delete historical data), and it takes 3 hours to calculate the workbook ... if it doesn't crash Excel first.
I'm open to VBA and/or custom functions, but would prefer native Excel functions. I'm not able to install anything, so any solution must be native Excel and must be compatible with Excel 2007.
If your source is a pivot table, try the GETPIVOTDATA function. You might be able to accomplish what you want without INDIRECT and INDEX.
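Most of the cost comes from repeating a full scan of Site Totals for every UID: each cell's LARGE(...*ROW(...)) walks all Last_Row rows. One common way to avoid that is to find each UID's last non-blank row in a single pass. The Python below only illustrates that algorithm (it cannot run inside Excel 2007); the function name and the (uid, value) row shape are assumptions.

```python
def latest_training(site_totals):
    """Map each UID to the column-B value on its last non-blank row.

    site_totals is a list of (uid, value) pairs in sheet order. One pass
    replaces the per-UID scan, so the work grows with the row count
    rather than with rows x UIDs.
    """
    latest = {}
    for uid, value in site_totals:
        if value not in (None, ""):   # same test as <>"" in the formula
            latest[uid] = value       # later rows overwrite earlier ones
    # The original formula shows a blank when the latest entry is "Trainer".
    return {uid: ("" if v == "Trainer" else v) for uid, v in latest.items()}
```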
As I understand it, each person either has or has not attended a training, and you want to retrieve the name of that training; if they have not, you want a blank cell. If this description is correct, you can try this formula, entered with Ctrl+Shift+Enter:
=IFERROR(INDEX('Site Totals'!B$1:B$12,MATCH(A2&"Trainer",'Site Totals'!A$1:A$12&'Site Totals'!B$1:B$12)),"")
Here A2 contains the name of the person. I can be more precise with this formula if you can provide some sample data, but I would recommend not referencing entire columns (such as B:B) in the Site Totals worksheet, as this will definitely slow down calculation; instead, use B1:B8000 or a smaller range to speed things up. Hope that helps.

Book ordering comparison between spreadsheets for existing catalogue of a Library

I recently asked this question on Google's spreadsheet page.
I have a significant data comparison problem I would like to solve. It relates to purchasing books for a library. We have a catalogue of over 11,000 books. When we order new books we need to compare our proposed purchases to the current stock. Currently we compare them manually against our catalogue, very laboriously, book by book.
We need to do 3 things to make our life easier:
1. Easily clean out bad data/characters in the ISBNs: spaces, hyphens (-) or periods/full stops (.). A simple formula to run over all ISBN fields would be great.
2. Compare data between one spreadsheet with 11,000 books (current library stock), a second with up to 1,000 books (currently on order) and a third, currently active one (about to be ordered) with 50 to 200 books.
All spreadsheets use the same column configuration, as below:
Library orders
Title | Author | Publisher | ISBN (long version) | US$ | UKgpd | HK$ | Other$ | P/O no. | Date ordered
UNNATURAL SELECTION | MARA HVISTENDAHL | Public Affairs Publishing; Reprint edition (May 1, 2012) | 978610391511 | | | | | |
Finally, the output of these comparisons should quickly and easily identify which lines have matches and what type of match each is (Author only, Author and Title, or Author, Title and ISBN, etc.) for all possible combinations. To make this easier, assume spreadsheet 1 is an unalterable master table, with spreadsheet 2 similar. It is really only in spreadsheet 3 that we need to see clearly whether we are about to reorder materials.
If it is possible to have these as different sheets in a workbook it would be ideal. The only additional feature is that any scripts that run need to be able to cope with spreadsheet 1 increasing in size as new acquisitions arrive and are included. Both spreadsheets 2 and 3 will vary (increase and decrease) as the ordering process proceeds.
Finally the absolute ideal would be for this comparison process to be instant (live) and ongoing as data is included.
If anyone would like to take this on, three library staff will be eternally grateful.
regards
Nick
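One way to compute the match type for each row of spreadsheet 3 is to index every combination of author, title and ISBN in the catalogue once, then look up each proposed book from the most specific combination down. This is a Python sketch with hypothetical names, assuming each book is a dict with "author", "title" and "isbn" keys; text is compared case-insensitively, and ISBNs have spaces, hyphens and full stops stripped as requested in point 1. Blank fields never count as a match.

```python
from itertools import combinations

FIELDS = ("author", "title", "isbn")


def normalise(field, value):
    """Case-insensitive text; ISBNs also lose spaces, hyphens and full stops."""
    text = str(value or "").strip().casefold()
    if field == "isbn":
        text = text.replace(" ", "").replace("-", "").replace(".", "")
    return text


def key_for(book, combo):
    """Normalised values for the fields in combo, or None if any is blank."""
    values = tuple(normalise(f, book.get(f)) for f in combo)
    return values if all(values) else None


def build_index(catalogue):
    """Every (field combination, values) pair found in the catalogue."""
    index = set()
    for book in catalogue:
        for size in range(1, len(FIELDS) + 1):
            for combo in combinations(FIELDS, size):
                values = key_for(book, combo)
                if values is not None:
                    index.add((combo, values))
    return index


def match_type(book, index):
    """Most specific field combination matching one catalogue entry, else None."""
    for size in range(len(FIELDS), 0, -1):
        for combo in combinations(FIELDS, size):
            values = key_for(book, combo)
            if values is not None and (combo, values) in index:
                return " + ".join(combo)
    return None
```

Building the index once keeps each lookup cheap, so it can be rebuilt whenever the master list grows; it reports a single best match per book, and listing every matching combination would be a small change.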
This would be very much easier had you one sheet rather than three (simply add a column to each existing sheet to show whether in stock, on order or to be ordered – three individual letters would be sufficient, then append each of the smaller two files to the largest). Then for example you could apply Conditional Formatting to highlight duplicates one column at a time (Author, Title etc). Apart from the initial data cleansing it would mean in the future switching ‘between sheets’ would merely involve changing a one-letter flag. Filtering would allow you and your colleagues to appear to have three separate sheets and if anyone asks for a particular Title the search would be one-time, not in triplicate.
Also, http://www.microsoft.com/en-gb/download/details.aspx?id=15011 may be of interest, as may =SUBSTITUTE for stripping unwanted characters from the ISBNs. And with data validation you could prevent entry of a new ISBN that is already in your list.
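In Excel the ISBN clean-up would be nested SUBSTITUTE calls, e.g. =SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(D2," ",""),"-",""),".","") with ISBNs in column D as in the question's layout. The Python sketch below mirrors that clean-up and the single-sheet idea: with one combined list and a status letter per row, finding ISBNs that appear more than once is one grouping pass. The letters S/O/T (in stock, on order, to be ordered) and the function names are hypothetical.

```python
from collections import defaultdict


def clean_isbn(raw):
    """Same effect as nested SUBSTITUTE calls removing " ", "-" and "."."""
    return str(raw).replace(" ", "").replace("-", "").replace(".", "")


def duplicate_isbns(rows):
    """rows: (status_flag, isbn) pairs from one combined sheet.

    Returns {clean_isbn: sorted list of flags} for ISBNs seen more than once,
    e.g. a title flagged both "S" (in stock) and "T" (to be ordered).
    """
    flags = defaultdict(list)
    for flag, isbn in rows:
        key = clean_isbn(isbn)
        if key:  # ignore rows with no ISBN
            flags[key].append(flag)
    return {isbn: sorted(f) for isbn, f in flags.items() if len(f) > 1}
```

Checking clean_isbn(new_isbn) against the existing keys before adding a row is the same check the data validation rule would perform.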

Grouping rows by area codes

I have a table of customers to which my company ships products. The problem is that these customers need to be sorted by their area codes, so that the products can be sent to the appropriate shipping companies (we have two partner companies that ship to certain parts of the country). Each company sent us a list of area code numbers to which they can ship and I need to divide the Excel sheet into two sheets, each containing the customers with the area codes compatible with the respective company.
I tried to solve this problem with the VLOOKUP function, but it only works on an individual-row basis, and I need a solution that will find all rows containing a number from the specified group of area codes.
Another way would be an IF function (one per company) that puts a TRUE or FALSE value in a new column; I could then sort by that value and copy the data into a new sheet. This approach would work, but the IF function would be extremely long and hard to control.
Can you suggest a way to solve this problem?
Edit to incorporate details provided via Comment:
Presently I have about 5,000 rows; in future there might be more, though I doubt more than 10,000.
A VLOOKUP seems very promising, of the kind =VLOOKUP($B2,F:G,1,0) in C2 copied across and down as required, with a layout as below:
This does not group as you say you require (but do you really need to?) because it seems possible some locations will be served by both shippers. You might resolve this by flagging those rows where both are viable and then by sorting to split into three groups (Shipper1 only, Shipper2 only, both) before transferring the ranges as desired.
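Splitting into the three groups suggested above amounts to two membership tests per customer. A Python sketch with hypothetical names, where each customer is a (name, area_code) pair and each shipper's list of area codes plays the role of columns F and G; a fourth "Neither" group catches codes neither shipper covers.

```python
def assign_shippers(customers, shipper1_codes, shipper2_codes):
    """Group customers into Shipper1 only, Shipper2 only, Both, or Neither.

    Set lookups stand in for the VLOOKUP/MATCH against columns F and G.
    """
    s1, s2 = set(shipper1_codes), set(shipper2_codes)
    groups = {"Shipper1": [], "Shipper2": [], "Both": [], "Neither": []}
    for name, code in customers:
        in1, in2 = code in s1, code in s2
        if in1 and in2:
            groups["Both"].append(name)
        elif in1:
            groups["Shipper1"].append(name)
        elif in2:
            groups["Shipper2"].append(name)
        else:
            groups["Neither"].append(name)
    return groups
```

As with VLOOKUP/MATCH, a code stored as text will not match the same code stored as a number, so both lists need consistent types.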
Edit in response to OP's comment
If you can be certain there is no overlap between shippers, a single column with this formula, say in E2 and copied down, might be preferable:
=IF(ISERROR(MATCH(B2,F:F,0)>0),"Shipper2","Shipper1")
and would not routinely show #N/A. (This assumes no area is outside the range of both shippers.)
