Solr rangesearch within strings containing chains of characters - string

I hope I can find some help. I'm pretty new to Solr and recently had the opportunity to attend a talk about it.
Even the consultant who gave the talk was unsure about the following scenario, so I hope someone here has had the same problem.
I have a list of objects identified by a specific key. For example:
There are 500000 employees identified by IDs (1-500000). Each of them has to work for the next 2 years. Every day of these years is represented by a character ("A" if the employee will work, "B" if the employee won't be at work). So every employee has a string of up to 730 characters, but not every employee has the full 730 characters (for instance, because a specific employee joined the company later).
Example String for employee 256:
AABBAAABAAAABBAB
=> Employee 256 will work 2 days, then is off for 2 days, then works 3 days in a row, has 1 day off, works 4 days, is off for 2 days, works 1 day, is home 1 day, and so on.
Example String for employee 542:
ABBAABABAAABAAAABABBAABAAAAABBABBABABBBABAABABBABABABBABAAAA
Example String for employee 2:
AAAABABBABABAAAABABABABABA
For scheduling purposes I now want to find the employees who are at work 4 days in a row, to go to dinner with them or whatever.
I want to receive the following results:
employee 256 4 days free after day 8
employee 542 4 days free after day 12, after day 23, after day 56
employee 2 4 days free after day 0, after day 12
I hope the problem is clear. The example is only meant as an illustration. Is it possible to implement a solution with Solr?
Other approaches (including a different representation of the days) are highly welcome. Right now we depend on the daily representation (one character per day), but if you can offer a high-performance solution, even this is open to discussion. The number of entries (500000) is realistic for the project.

I would not model this as employees, but as availability, perhaps with availability as a nested/child object of an employee. An availability object would then be (StartDay, NumberOfDays).
The query then becomes a simpler join with a condition on the child of NumberOfDays>=4.
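The conversion from the per-day string to such availability objects could be done before indexing. A minimal sketch in Python, assuming "A" marks a work day and days are counted from 0 as in the question's expected results; the function name `work_runs` and the tuple layout are illustrative, not part of Solr:

```python
from itertools import groupby

def work_runs(schedule, min_days=4, work_char="A"):
    """Return (start_day, number_of_days) for every run of work days
    that is at least min_days long. Days are counted from 0."""
    runs = []
    day = 0
    for char, group in groupby(schedule):
        length = sum(1 for _ in group)
        if char == work_char and length >= min_days:
            runs.append((day, length))
        day += length
    return runs

# Schedules taken from the question.
schedules = {
    256: "AABBAAABAAAABBAB",
    542: "ABBAABABAAABAAAABABBAABAAAAABBABBABABBBABAABABBABABABBABAAAA",
    2: "AAAABABBABABAAAABABABABABA",
}

for employee_id, schedule in schedules.items():
    print(employee_id, work_runs(schedule))
```

Each returned pair would become one child document, so the search itself only needs a numeric condition on NumberOfDays instead of pattern matching inside a 730-character string.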

Related

Is there any way to Count how many times a calculated measure occurs in the table?

I have a PowerPivot table that gives me the daily amount a company gives its employees for food. For example, a company gives an employee 130$ per month, which is the same as saying (130/22 days) = 5,91$ daily.
This is the simple explanation.
The PowerPivot has 4 scenarios 19 days, 20 days, 21 days and 22 days (The possible weekdays per month in a year).
Now, I already have a measure with the mode (the most recurring value) for each number of days, for example:
"In 19 days the most recurring value is 5,91$, for this company"
What I need your help with is calculating how many times that 5,91$ appears in the table.
I already built something with pivot tables and COUNTIFS that basically goes through the pivot table searching for the value. But the file is heavy, really heavy.
I'll leave you an example and the answer:
I hope it's clear to everyone, and thank you.
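To make the target calculation concrete outside PowerPivot, here is a minimal Python sketch that finds the most recurring daily value and counts its occurrences. The monthly amounts are made up for illustration, and rounding to 2 decimals is an assumption about how the values are compared:

```python
from collections import Counter

def mode_and_count(values, ndigits=2):
    """Return (most frequent value, number of occurrences) after rounding."""
    rounded = [round(value, ndigits) for value in values]
    value, count = Counter(rounded).most_common(1)[0]
    return value, count

# Hypothetical monthly food allowances, converted to a daily amount for 22 days.
monthly_allowances = [130, 130, 120, 130, 150]
daily_amounts = [amount / 22 for amount in monthly_allowances]

print(mode_and_count(daily_amounts))
```

In the data model the same idea would be a count of rows whose daily value equals the mode measure, rather than a COUNTIFS scan over a pivot table.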

SUM UNIQUE except...?

Okay, this may get confusing, but I'll do my best to describe it.
I'm trying to calculate the amount of days a task takes to complete. So I have a spreadsheet that lists the tasks, assigns hours, and assigns a person to do that task.
So, say Task 1 has 5 sub-tasks, and each take 8 hours. If I assign Bill to each task, Bill will take 5 days to complete Task 1. That's easy.
Now, if it takes Bill 5 days to complete Task 1, but the deadline is in 4 days, I simply take 1 of the sub-tasks from Bill and assign it to Bob, and voilà... we now have 40 hours of work being done in 4 days.
I have all this working with the following formula:
=MAX(SUMIFS('Task List'!Q:Q,'Task List'!T:T,UNIQUE('Task List'!T:T),'Task List'!D:D,"<>"&"*Summary"))
The formula looks down column T and finds any UNIQUE names. It then basically puts those hours happening at the same time as the other hours (from column Q) since Bill and Bob can work at the same time. Another way to look at it is: the person with the most tasks assigned to them is controlling the end date, since that's the "longest" everything will take to get done.
Now the question...
There will ALWAYS be a task at the end of each project called "baking". "Baking" will always be in column H. Baking will always be assigned to a UNIQUE person not on the project... so Bill and Bob will be part of the team, but Baking will always be a different Unique person.
So what happens is the above formula is subtracting out the Baking line because it's a unique name.
But I don't want that. The Baking task will always happen after the other tasks are done and can't be done while Bill and Bob are doing their tasks.
So, what I need the formula to say is, "Look down column T for any unique names and add the hours of the person with the most hours, but skip this logic for any row where column G has "baking" in it, and always add those hours normally."
If it helps, the first image below should return the full Value of "80" because Bill is doing all the tasks and Baker is baking once Bill is done, so the total time for the project is 80 hours. However, the above formula returns "70", because "Baker" is Unique.
This one SHOULD return 70, because Bob took 10 hours from Bill's plate, but again, Baker happens after all tasks are done (Bill: 40+10+10 + Baker 10):
=SUMPRODUCT(--(E:E<>""),Q:Q)+MAX(SUMIFS(Q:Q,T:T,UNIQUE(T:T),F:F,"<>"&""))+SUMPRODUCT(--(G:G<>""),Q:Q)
This would do the trick as well. Looking for the time for gathering ingredients + max time of unique employees + time for baking.
Answered in the other thread...
=MAX(SUMIFS(Q:Q,T:T,UNIQUE(T:T),D:D,"<>"&"*Summary",G:G,"<>Baking"))+SUMIFS(Q:Q,G:G,"Baking")
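The logic of this last formula, sketched in Python for readers who want to check it against their own numbers. This is only an illustration: the task rows below are hypothetical values chosen to match the two cases described in the question, and summary rows are assumed to have been filtered out already:

```python
from collections import defaultdict

def project_hours(tasks, final_task="Baking"):
    """tasks: iterable of (task_name, person, hours).
    People work in parallel, so the busiest person sets the duration;
    the final task always runs afterwards and is added on top."""
    per_person = defaultdict(float)
    final_hours = 0.0
    for task_name, person, hours in tasks:
        if task_name == final_task:
            final_hours += hours
        else:
            per_person[person] += hours
    longest = max(per_person.values(), default=0.0)
    return longest + final_hours

# Hypothetical rows: Bill does everything, then the baker bakes.
bill_only = [
    ("Mix", "Bill", 40), ("Prep", "Bill", 10), ("Shape", "Bill", 10),
    ("Pack", "Bill", 10), ("Baking", "Baker", 10),
]
# Same project after Bob takes one 10-hour task from Bill.
with_bob = [
    ("Mix", "Bill", 40), ("Prep", "Bill", 10), ("Shape", "Bill", 10),
    ("Pack", "Bob", 10), ("Baking", "Baker", 10),
]

print(project_hours(bill_only), project_hours(with_bob))
```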

Assign specific dates to an employee-level dataset based on weighted averages in Excel

I have a dataset in Excel which shows headcount by employee level and which department each employee falls under (sales, ops, or support). I would like to send a survey to each employee once every 26 weeks (2 times a year), but I would also like to keep sending surveys every week, so that each week's batch covers a share of the population split between the sales, ops, and support departments according to their weight in the total population.
This way, I am sending surveys every week to a tiny bit of my overall headcount but only repeating people every 26 weeks.
Can anyone please help on how to solve this in excel with a formula?
From the attached sample data, how can I split the headcount to send surveys for 26 weeks straight, but to a different population every week without repeats? This population should be split by each department's % of the total headcount. Meaning that if I have 10 people every week and the % split is 40% sales, 30% operations, and 30% support, the survey should be sent to 4 sales, 3 operations, and 3 support people. Please note that the 10 people and the %s may vary every week because of new hires and resignations.
Thank you!
Sample Data
In the data sheet, create a helper column D, labeled MOD, in which you hand out a number to each employee. Enter this formula in cell D2 and fill it down for each employee:
=MOD(ROWS(A$2:A2)-1;$H$2)+1
That way each employee is assigned a number from 1 to whatever is in cell H2, e.g. 26. Then contact all employees with 1 and you have the first batch; continue each week until you reach the employees with 26 in week 26. This way everyone gets the survey, but just once.
Of course the share of the individual departments cannot be achieved every time, as some departments have fewer employees. If you wanted to keep the shares, some employees of the smaller departments would get the survey more often.
If you want to get some randomness into the order, just mix the order of MOD numbers, e.g. start with 7, continue with 23 etc.
I hope I got the question right, I am not sure in some parts.
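For reference, the same round-robin numbering can be sketched in Python. This mirrors the MOD formula only (it does not balance departments), and the employee names are placeholders:

```python
from collections import Counter

def assign_batches(employees, weeks=26):
    """Give each employee a week number from 1 to weeks, cycling in list order,
    like MOD(ROWS(A$2:A2)-1; weeks)+1 does row by row."""
    return {employee: index % weeks + 1 for index, employee in enumerate(employees)}

# Placeholder employee list; in the sheet this would be column A.
employees = [f"employee_{n}" for n in range(1, 61)]
batches = assign_batches(employees)

# How many people are surveyed in each week.
per_week = Counter(batches.values())
print(batches["employee_1"], batches["employee_27"], per_week[1], per_week[26])
```

Sorting the list by department before numbering would spread each department across the weeks more evenly, although exact weekly percentages still cannot be guaranteed for small departments.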

Sum product for all the months if 2 tables match

So I have this issue, I have two tables one is employees, and another one is the projects.
Employees Table:
Year  Name   Type       Jan   Feb
2018  Kevin  Salary     5000  2000
2018  Kevin  Insurance  200   400
2018  Alex   Salary     3000  4000
2018  Alex   Insurance  300   400
Projects Table:
Year  Project_Name  Employee_Name  Jan_Hours_Worked  Feb_Hours_Worked
2018  Apple         Alex           7                 5
2018  Apple         Kevin          5                 0
2018  LG            Kevin          0                 3
Now I am creating a result list of all the projects and the costs incurred for them. For each project in table 2, I need to find which employees are involved, then use that to find the related costs for each employee in table 1 and calculate the total cost for that project.
(E.g. for project LG, Kevin worked on it in Feb; the company paid 2400 (salary + insurance) for him in Feb, so the cost incurred for the LG project would be 2400 divided by the 3 hours Kevin spent on the project. E.g. 2: for the project Apple it would be the same, but as the sum of Kevin's and Alex's costs from Jan and Feb, so Kevin: 5200/5 + Alex: 3300/7 + 4400/5.)
Now I have a formula to calculate this for 1 month, which is something like this:
=SUMPRODUCT(SUMIFS(Employees[Jan], Employees[Name],Project[Employee_Name], Employees[Year], 2018 )/Project[Jan_Hours_Worked],--(Project[Project_Name]=K14))
I need to find out how to get the yearly result per project without repeating the formula 12 times. Also, with this formula I get a #DIV/0! error when an employee didn't work in a particular month, so that needs to be sorted out somehow. Any help?
I suggest you change how you store your data. If you can make some minor changes, you get an easy way to obtain the information you want, plus a Pivot Table with a summary of the cost incurred for each project and which employee generated that cost.
IMPORTANT: For this answer to work, you must make sure that every employee's name is UNIQUE. If not, adapt the example by creating an employee ID or something similar.
Also, please note that I have a Spanish version of Excel, so the screenshots are in Spanish, but I will translate the formulas :)
OK, first of all, I changed the design of your Employees table. Creating a column for each month is kind of annoying. Use a single column for the month instead. You can type the month in a cell as 01/2018 and Excel will instantly change it to the format mmm-yy (Jan-18).
This is how your Employees table should look:
The column TOTAL COST is just a sum of SALARY + INSURANCE. If you have any other concept, just add it as a column and modify the TOTAL COST COLUMN to include it.
Second, the table Project, I think it should be like this:
The column Employee Cost has an Array Formula.
IMPORTANT: Array formulas are inserted pressing CTRL+SHIFT+ENTER
The formula is (I used same names for tables, so copy-pasting should work for you):
=INDEX(Employees;MATCH(Project[[#This row];[Employee_Name]]&Project[[#This row];[Month]];Employees[Name]&Employees[Month];0);COLUMN(Employees[[#Headers];[TOTAL COST]]))
If you typed the formula right, you should see { at start and } at end.
The formula in Cost Recurred to Project is just Employee Cost divided by Hours, wrapped in IFERROR so that it shows 0 when the hours worked are 0.
=IFERROR(Project[[#This row];[Employee Cost]]/Project[[#This row];[Hours]];0)
And as the last step, your Pivot Table. Create one and organise it to get the sums per hours, month and project you want. You can get one like the one below:
As you can see, e.g. for project Apple, the total cost is 2.391,43,
but you can also see the cost of each employee. Pretty cool, I think.
I really hope you can modify the design of your data, because Excel is designed to work going down (I mean using rows) more than across columns. Excel 2007 and later has more than 1 million rows (1,048,576) but only about 16,000 columns (16,384), so it is designed to work vertically.
I hope this helps, or at least gives you a clue about how to proceed :)
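The long (one row per employee and month) layout and the guarded division can be checked with a small Python sketch. It uses the figures from the question's two tables; the variable and function names are illustrative and are not Excel objects:

```python
# (employee, month) -> total cost = salary + insurance, from the Employees table.
employee_cost = {
    ("Kevin", "Jan"): 5000 + 200,
    ("Kevin", "Feb"): 2000 + 400,
    ("Alex", "Jan"): 3000 + 300,
    ("Alex", "Feb"): 4000 + 400,
}

# One row per project, employee and month, from the Projects table.
hours_rows = [
    ("Apple", "Alex", "Jan", 7), ("Apple", "Alex", "Feb", 5),
    ("Apple", "Kevin", "Jan", 5), ("Apple", "Kevin", "Feb", 0),
    ("LG", "Kevin", "Jan", 0), ("LG", "Kevin", "Feb", 3),
]

def cost_per_project(rows, costs):
    """Sum employee cost / hours per project; rows with 0 hours count as 0,
    which is what the IFERROR wrapper does in the sheet."""
    totals = {}
    for project, employee, month, hours in rows:
        cost = costs[(employee, month)] / hours if hours else 0.0
        totals[project] = totals.get(project, 0.0) + cost
    return totals

totals = cost_per_project(hours_rows, employee_cost)
print({project: round(total, 2) for project, total in totals.items()})
```

Because the month is a value in a column rather than a column name, adding the remaining ten months only adds rows and leaves the calculation unchanged.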

Excel - What is the best way to analyse the following data?

I'm looking for the best way to analyse the following data.
This is the amount of holiday, in hours, that an employee has taken in a year and what remains. The data bars show the hours taken as a percentage of what they're allotted, which is shown in column K. Col I is how many hours they have left, and col J shows the cumulative holiday they've taken throughout the year.
I need the relevant info shown on one row, where each row will be an employee's holiday history. Different employees have different allotted hours, and I need a way to show who is most in need of taking holiday when filtering from largest to smallest, etc.
Where I find this tricky is if an employee had 10 hours allotted, and has taken 2 hours, that's 20%, which is the same if someone had 100 hours and has only taken 20 hours. But it's clearly more important that the second employee uses up some of their leave. I'm struggling with the best way to represent this.
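The arithmetic behind this concern can be made explicit with a short Python sketch. It only illustrates one possible ordering (by hours remaining, which is an assumption about what "most in need" should mean), with made-up employee names:

```python
def rank_by_remaining(employees):
    """employees: iterable of (name, allotted_hours, taken_hours).
    Returns rows sorted by remaining hours, largest first."""
    rows = [
        {
            "name": name,
            "pct_taken": taken * 100 / allotted,
            "remaining": allotted - taken,
        }
        for name, allotted, taken in employees
    ]
    return sorted(rows, key=lambda row: row["remaining"], reverse=True)

# The two cases from the question: same percentage, very different balance.
ranking = rank_by_remaining([("Employee A", 10, 2), ("Employee B", 100, 20)])
for row in ranking:
    print(row)
```

Both employees show 20% taken, but the remaining hours (8 versus 80) separate them, so a column of remaining hours, possibly alongside the percentage, gives a sort key that reflects the difference.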
