I would like to know if it is possible to read the next record, based on a certain condition, when we are using SyncSORT (SyncTool).
Example of the input
Sort key will be account nbr + descending record type + amount
account nbr amount record type
11111111111 10 reversal not in the output
11111111111 10 deposit not in the output
33333333333 20 deposit in the output
44444444444 15 deposit in the output
55555555555 20 reversal in the output
55555555555 10 deposit in the output
66666666666 30 reversal in the output no match
When a reversal record is read, a deposit should follow with the same amount. In that case both records, the reversal and the deposit, should be left out of the output file. It is possible that the amount is not the same for the reversal and the deposit; in that case both records should be in the output file.
output
33333333333 20 deposit
44444444444 15 deposit
55555555555 20 reversal
55555555555 10 deposit
66666666666 30 reversal
Yes. As long as your SyncSORT is up-to-date enough.
You need to use JOINKEYS. Specify the same DSN for both input datasets, and indicate that they are SORTED. There is an undocumented feature which allows the use of JNFnCNTL files, like DFSORT.
In JNF1CNTL (which is a "preprocessor" for the first JOINKEYS dataset) temporarily add a sequence number to each record. The default is that the sequence starts at one. Here it is useful to be explicit...
Because, in JNF2CNTL you want to do the same thing, but start the sequence at zero (START=0).
The key for each of the JOINKEYS is the sequence number.
Use JOIN UNPAIRED,F1. Define a REFORMAT with all the data from the first file, and data for comparison from the second file.
This is what a four-record dataset would look like if you imagine the join:
- - A 0
A 1 B 1
B 2 C 2
C 3 D 3
D 4 - -
Because you specify JOIN UNPAIRED,F1 you won't actually see the mismatched A 0 (because that is on F2) but you will see the mismatched D 4.
If you look at your REFORMAT record, you now have data from the "current" record, and data from the "next" record.
Then there's a little more work to select only the records you want. But, dinner first...
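The selection logic that the join makes possible can be sketched outside the mainframe. The following is plain Python, not SyncSORT control statements, and the function name and record layout are assumptions made for illustration. It pairs each record with the one after it (the effect of joining on sequence numbers that start at one and at zero) and drops a reversal together with the deposit that follows it when account and amount match.

```python
# Illustration only: plain Python, not SyncSORT control statements.
# Each record is looked at together with the record that follows it,
# which is what the sequence-number join provides on the REFORMAT record.

def drop_matched_reversals(records):
    """records: list of (account, amount, rec_type) tuples, already sorted
    so that a reversal comes directly before the deposit it may cancel."""
    kept = []
    i = 0
    while i < len(records):
        current = records[i]
        # The "next" record, or None when current is the last record.
        nxt = records[i + 1] if i + 1 < len(records) else None
        if (
            nxt is not None
            and current[2] == "reversal"
            and nxt[2] == "deposit"
            and current[0] == nxt[0]  # same account
            and current[1] == nxt[1]  # same amount
        ):
            i += 2  # drop both the reversal and its matching deposit
            continue
        kept.append(current)
        i += 1
    return kept


# Sample input taken from the question.
sample = [
    ("11111111111", 10, "reversal"),
    ("11111111111", 10, "deposit"),
    ("33333333333", 20, "deposit"),
    ("44444444444", 15, "deposit"),
    ("55555555555", 20, "reversal"),
    ("55555555555", 10, "deposit"),
    ("66666666666", 30, "reversal"),
]

for account, amount, rec_type in drop_matched_reversals(sample):
    print(account, amount, rec_type)
```

In the sort step itself the same decision has to be made per record, so a dropped deposit needs to know about the reversal before it as well as a dropped reversal needing to know about the deposit after it.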
I have a spreadsheet with order item in column A, order quantity in column B, order start date in column C, and order finish date in column D. What I would like to do is treat orders on consecutive start dates for the same item as one single order. So until there is a break of at least one day between order start dates for an order item, treat it as one single order. Then I need to count the orders, sum the order quantities and calculate the average gap in days between orders (the gap between one order's finish date and the next order's start date). So if an order item was ordered on the 1st, 2nd, 3rd and 4th of March, then again on the 10th and 11th of March, and then again on the 20th of March (with all orders having the same start and finish date), there would be 2 gaps, with the average gap being 7.5 days ((6+9)/2). So the input and output will look like this:
Any help would be much appreciated. Many thanks!
Discussion...
The fields I've defined are OrderItem, OrderQty, OrderStartDate, and OrderEndDate, plugging in values identical to those you provided.
The Select tool just forces OrderQty to Int32
MultiRow Formula, creates new Int32 variable Gap using this expression:
IIF(IsNull([Row-1:OrderStartDate]), 1, DateTimeDiff([OrderStartDate], [Row-1:OrderStartDate],"Days"))
First Summary tool:
Group By OrderItem ...
Group By Gap ...
Sum OrderQty to new output field OrdersPerGap
a. Top avenue Summary tool:
Group By OrderItem ...
Sum OrdersPerGap to output field name OrderQty ...
Count OrderItem to output field name NumOrders
b. Bottom avenue, simple filter as shown Gap > 1 and then another summary:
Group By OrderItem ...
Avg Gap to new output field AvgGap
Join the two strains back together on OrderItem and exclude Right_OrderItem from the output (uncheck its checkbox).
In Alteryx, this provides the output requested. There may be other ways, but this is straightforward, without too much going on in any one step.
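For readers without Alteryx, the same grouping idea can be sketched in Python. This is a minimal sketch, not a translation of the workflow above: the function name, the item name "widget", the quantities and the year are all made up for illustration. Rows whose start date is at most one day after the previous start date for the same item are merged into one order, and the gap is measured from one merged order's finish date to the next one's start date, as described in the question.

```python
from datetime import date, timedelta


def summarise_orders(rows):
    """rows: (item, qty, start, finish) tuples.
    Returns {item: (num_orders, total_qty, avg_gap_days or None)}."""
    by_item = {}
    for item, qty, start, finish in sorted(rows, key=lambda r: (r[0], r[2])):
        by_item.setdefault(item, []).append((qty, start, finish))

    summary = {}
    for item, orders in by_item.items():
        # Each merged order: [total_qty, first_start, last_start, last_finish]
        merged = []
        for qty, start, finish in orders:
            if merged and start - merged[-1][2] <= timedelta(days=1):
                # Consecutive start date: fold into the current order.
                merged[-1][0] += qty
                merged[-1][2] = start
                merged[-1][3] = max(merged[-1][3], finish)
            else:
                merged.append([qty, start, start, finish])
        # Gap = next order's first start minus this order's last finish.
        gaps = [(b[1] - a[3]).days for a, b in zip(merged, merged[1:])]
        avg_gap = sum(gaps) / len(gaps) if gaps else None
        summary[item] = (len(merged), sum(m[0] for m in merged), avg_gap)
    return summary


# Hypothetical data matching the dates in the question (year and qty assumed).
march_days = [1, 2, 3, 4, 10, 11, 20]
example_rows = [
    ("widget", 5, date(2024, 3, d), date(2024, 3, d)) for d in march_days
]

print(summarise_orders(example_rows))
```

With the seven March dates from the question this yields three merged orders and two gaps of 6 and 9 days.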
I'm creating an Excel spreadsheet to track when an item is received as well as when a response to the item having been received has been made (e.g. my mail was delivered at 1:00pm (item received) but I didn't check the mail until 5:00pm (response to item having been received)).
I need to track both the date and time of the item being received and want to separate these in two separate columns. At the moment this translates to:
Column A: Date item received
Column B: Time item received
Column L: Date item was responded to having been received
Column M: Time item was responded to having been received
In essence I'm looking to run calculations on the response time between when the item is received and when it has been responded to (ie: average response time, number of responses in less than an hour, and even things like the number of responses that took between 2 and 3 hours where Bob was the person who responded).
The per-line pseudo code would look something like:
(Lr + Mr) - (Ar + Br) ' where L,M,A,B are the columns and 'r' is the row number.
An example, with the following data:
1. A B L M
2. 1/5/19 10:00 1/5/19 12:00
3. 1/5/19 21:00 1/6/19 1:00
4. 1/5/19 22:00 1/5/19 23:00
5. 1/6/19 3:00 1/6/19 4:00
The outcome for the average response time would be 2 hours (average(rows 2-5) = average(2, 4, 1, 1) = 2)
The number of items in each response-time range would be as follows:
(<=1 hour) = 2
(>1 & <=2) = 1
(>2 & <=3) = 0
(>3) = 1
I don't know of (and can't find) a function that will perform this and then let me use it within something like a countifs() or averageifs() function.
While I could do this (fairly easily) in VBA, the practical implementation of this spreadsheet limits me to standard Excel. I suspect that sumproduct() will be fundamental to making this work, but I feel that I need something like a sumsum() function (which doesn't exist), and I'm not familiar enough with sumproduct() to know what to even look for to set something like this up.
If you are not so familiar with SUMPRODUCT() or the likes I would suggest one helper column. Like so:
You can see the formula used is:
=((C2+D2)-(A2+B2))
You can probably do all types of calculations on this helper column. Note that the column is formatted hh:mm. However, if you want to look into SUMPRODUCT() you could think about these:
Formula in H2:
=SUMPRODUCT(--(ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)<=1))
Formula in H3:
=SUMPRODUCT((ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)>1)*(ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)<=2))
Formula in H4:
=SUMPRODUCT((ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)>2)*(ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)<=3))
Formula in H5:
=SUMPRODUCT(--(ROUND((((A2:A5+B2:B5)-(C2:C5+D2:D5))*-24),2)>3))
The helper column is the easiest approach. It gives you the time differences that you can then easily analyse however you want. Analysis without the helper column is possible, but the approach differs depending on what type of analysis you want to do.
For the example you provided, which is counting the number of time differences grouped into ranges, you would use the FREQUENCY function:
=FREQUENCY(C2:C5+D2:D5-A2:A5-B2:B5,F2:F4)
In F2:F4 (called the "bins"), enter the upper limit of each range you want to count. The Frequency function counts up to and including the first value, then counts from there up to and including the second value, and so on. Enter the bins as times, e.g. 1:00 for 1 hour.
Note that Frequency is an array-entered and array-returning function. This means you need to first select the range that will contain all output values, G2:G5 in this example, then enter the function, then press CTRL+SHIFT+ENTER.
Also note that Frequency returns an array that is one element larger than the number of bins specified. The extra element is the count of all values greater than the largest bin specified.
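To check the spreadsheet results independently, the helper-column difference and the FREQUENCY-style binning can be sketched in Python. This is only a cross-check under assumptions: the function names are made up, the dates are read as month/day/year, and bins are given in hours rather than as Excel times. As with FREQUENCY, each bin includes its upper limit and one extra count is returned for values above the largest bin.

```python
from bisect import bisect_left
from datetime import datetime


def response_hours(rows):
    """rows: (received_date, received_time, responded_date, responded_time)
    strings such as ("1/5/19", "21:00", "1/6/19", "1:00")."""
    fmt = "%m/%d/%y %H:%M"  # assumes month/day/year ordering
    hours = []
    for recv_date, recv_time, resp_date, resp_time in rows:
        received = datetime.strptime(f"{recv_date} {recv_time}", fmt)
        responded = datetime.strptime(f"{resp_date} {resp_time}", fmt)
        hours.append((responded - received).total_seconds() / 3600)
    return hours


def bin_counts(values, bins):
    """Count values per bin; bins are ascending upper limits (inclusive).
    The result has one more element than bins, for values above the last."""
    counts = [0] * (len(bins) + 1)
    for value in values:
        counts[bisect_left(bins, value)] += 1
    return counts


# The four example rows from the question.
example_rows = [
    ("1/5/19", "10:00", "1/5/19", "12:00"),
    ("1/5/19", "21:00", "1/6/19", "1:00"),
    ("1/5/19", "22:00", "1/5/19", "23:00"),
    ("1/6/19", "3:00", "1/6/19", "4:00"),
]

hours = response_hours(example_rows)
print("hours per row:", hours)
print("average:", sum(hours) / len(hours))
print("counts for bins 1h, 2h, 3h, over 3h:", bin_counts(hours, [1, 2, 3]))
```

Filtering by responder (the "where Bob was the person who responded" case) would amount to selecting the matching rows before calling `bin_counts`.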
I have been having a tough time building the logic for this problem for a while. I hope someone can help.
I have 3 columns of data. Let's call them Customer ID, Call ID and Agent ID.
Customer ID and Agent ID can repeat; however, Call ID is unique.
Now I have a table with these columns, stacked in chronological order based on date or time. One customer can also call multiple times to multiple agents, generating a unique Call ID every time.
Here I want to count the number of times a customer has called after a certain agent ID has received the call. So the count or frequency function will need a rule embedded in the chronological order, i.e. "count after a certain rule has been met".
Below is the table
CusID CalID Agent
1 123 a
1 22 b
1 112 a
1 222 a
1 54 a
1 334 a
2 221 a
2 312 b
2 334 b
2 129 b
2 986 a
4 98 b
In the above table I want to calculate the number of observations for customer ID '1' after he has called agent 'b', so the answer will be 4. I have done a couple of unique counts based on multiple criteria using a combination of SUMIF and 1/COUNTIF; however, the major problem is counting after a certain observation.
Can anyone help?
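You can use this formula, which moves the start of the counted range to the row just after the first place where b is found for that customer. Judging by the references, E2 holds the customer ID and E3 holds the agent ID to look for: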
=COUNTIF(INDEX(A:A,AGGREGATE(15,6,ROW($C$2:$C$13)/(($C$2:$C$13=E3)*($A$2:$A$13=E2)),1)+1):$A$13,E2)
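The logic of that formula (find the first row where both the customer and the agent match, then count the customer's rows below it) can be sketched in Python for checking results. The function name and the tuple layout are assumptions made for illustration.

```python
def count_after_first(rows, customer, agent):
    """rows: chronological (customer_id, call_id, agent_id) tuples.
    Counts the customer's calls that come after their first call to agent."""
    first = next(
        (
            i
            for i, (cus, _call, ag) in enumerate(rows)
            if cus == customer and ag == agent
        ),
        None,
    )
    if first is None:
        return 0  # the customer never reached this agent
    return sum(1 for cus, _call, _ag in rows[first + 1:] if cus == customer)


# The table from the question.
calls = [
    (1, 123, "a"), (1, 22, "b"), (1, 112, "a"), (1, 222, "a"),
    (1, 54, "a"), (1, 334, "a"), (2, 221, "a"), (2, 312, "b"),
    (2, 334, "b"), (2, 129, "b"), (2, 986, "a"), (4, 98, "b"),
]

print(count_after_first(calls, 1, "b"))
```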
So I have this data in excel right now
A B C
2015-1 Test 1 23
2015-2 Test 1 12
2015-3 Test 1 43
2015-4 Test 1 32
2015-5 Test 1 3
2015-6 Test 1 90
2015-1 Test 2 200
2015-2 Test 2 123
2015-3 Test 2 21
2015-4 Test 2 40
2015-5 Test 2 17
2015-6 Test 2 138
2015-1 Test 3 160
2015-2 Test 3 55
2015-3 Test 3 30
2015-4 Test 3 74
2015-5 Test 3 67
2015-6 Test 3 89
Right now, I have it set up so that the user can look at a specific time period of data, not necessarily all of the dates (for example, from 2015-1 to 2015-4). So when the user selects the dates that they want, I want to take the percentile of the data (column C) for those dates across all of the different test scenarios in column B. Right now there are only 3, but there will be up to 100 different test cases.
I know it's possible to do =Percentile((test1_data,test2_data,test3_data),1),
but I'm going to have to do the percentile across over 100 different test cases, and the way I have it set up now seems highly inefficient. Is there a way to do this without having to enter all of the 100 different arrays by hand?
Based on your table, something along the lines of the following formula should work. (It is an array formula, so you should press CTRL+SHIFT+ENTER as you enter the formula into the cell to activate the function.)
{=PERCENTILE(
IF(NUMBERVALUE(LEFT($A$1:$A$18,4))<=EndYear,
IF(NUMBERVALUE(LEFT($A$1:$A$18,4))>=BegYear,
IF(NUMBERVALUE(RIGHT($A$1:$A$18,1))<=EndMonth,
IF(NUMBERVALUE(RIGHT($A$1:$A$18,1))>=BegMonth,
$C$1:$C$18)))),1)}
EndYear is a reference to the cell that has the LAST year you want included
BegYear is a reference to the cell that has the FIRST year you want included
EndMonth is a reference to the cell that has the LAST month (or whatever the second unit is) you want included
BegMonth is a reference to the cell that has the FIRST month (or whatever the second unit is) you want included
Just expand the references $A$1:$A$18 and $C$1:$C$18 to include however many test cases you want.
FORMULA EXPLANATION
The first two IF statements focus on the year. They take the LEFT() four characters as a string, and NUMBERVALUE() then turns that string into a number. The IF statement can then logically evaluate whether the test dates fall into the desired range of years.
The second two IF statements do precisely the same thing on the last single digit (the month?).
The nested IF statements return an array holding the associated value from column C where all the conditions are true, and FALSE where any one of them is not.
PERCENTILE() will take the array, ignore the items that returned as FALSE, and provide you with the k-th percentile of the range of values in which all four if statements are true.
*As a note, I don't know the significance of your second digit. If it ever goes above 9, you might need to adjust for your data. In that case you could either replace all the 2015-9 entries with 2015-09 and change the second argument of the RIGHT() function to 2, or you could do something like MID($A$1:$A$18,6,2) or the last digit could just be replaced by however many characters you have after the year argument.
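The filter-then-percentile idea can also be sketched in Python for checking the spreadsheet result. This is a sketch under assumptions: the function names are made up, the period labels are assumed to be "year-unit" with any number of digits after the hyphen, and the percentile uses the inclusive linear-interpolation definition that Excel's PERCENTILE uses. Unlike the array formula, which tests year and month separately, this version compares (year, unit) pairs, so a range that spans more than one year is handled as a continuous interval.

```python
def parse_period(label):
    """Turn a label such as "2015-4" into a comparable (year, unit) pair."""
    year, unit = label.split("-")
    return int(year), int(unit)


def percentile_inc(values, k):
    """k-th percentile (0 <= k <= 1) with inclusive linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values selected")
    rank = k * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])


def percentile_for_range(rows, first, last, k):
    """rows: (period, test_name, value); first/last: inclusive period labels."""
    lo, hi = parse_period(first), parse_period(last)
    selected = [
        value for period, _test, value in rows
        if lo <= parse_period(period) <= hi
    ]
    return percentile_inc(selected, k)


# The table from the question, rebuilt per test case.
scores = {
    "Test 1": [23, 12, 43, 32, 3, 90],
    "Test 2": [200, 123, 21, 40, 17, 138],
    "Test 3": [160, 55, 30, 74, 67, 89],
}
rows = [
    (f"2015-{n}", test, value)
    for test, values in scores.items()
    for n, value in enumerate(values, start=1)
]

print(percentile_for_range(rows, "2015-1", "2015-4", 1))
```

Adding more test cases only means adding more rows; nothing in the selection depends on how many distinct values column B has.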
I am reading a file called Expenses.txt...I want to store it in a hashmap with repeated entries of items
The text file contains data on several lines, where each line (a record) consists of two fields: category name (a string), and its value (a number). For example, the file below shows expenses by category.
Input
Expenses.txt
cosmetics 100.00
medicines 120.00
cosmetics 50.00
books 250.00
medicines 80.00
medicines 100.00
The program should generate a summary report showing the sums and averages by category, sorted by category. The summary should be displayed on the console. The program should prompt the user and read in the name of the input file.
For example, for the above data, the summary will be:
output
Category Total Average
books $250.00 $250.00
cosmetics $150.00 $75.00
medicines $300.00 $100.00
a) The first field is a string and the second field is a floating point number.
b) The number of records for each category may vary. For example, in the above example, there are 2 records for cosmetics, 3 for medicines and 1 for books.
c) The total number of records (lines) may vary. Do not limit them to any fixed number.
d) The records are not in any sorted order.
It really depends on the language you are using, but I would recommend using some kind of structure or tuple as the value stored in the hashmap. Read each line, split it in two (the label and the value), and check whether the label is already in the hashmap. If it is, increment the record count by one and add the value to the running total; if it is not, add a new entry with a count of one and the value as the total.
At the end, traverse the hashmap in sorted key order and print each total and the average (total divided by count).
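As the question does not name a language, here is a minimal sketch of that approach in Python, where a dict plays the role of the hashmap. The function names are made up for illustration, and lines that do not split into exactly two fields are skipped, which is an assumption rather than a stated requirement.

```python
def summarise_expenses(lines):
    """lines: iterable of "category amount" strings.
    Returns (category, total, average) tuples sorted by category."""
    totals = {}  # category -> [running total, record count]
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # assumption: ignore blank or malformed lines
        category, amount = parts[0], float(parts[1])
        entry = totals.setdefault(category, [0.0, 0])
        entry[0] += amount
        entry[1] += 1
    return [
        (category, total, total / count)
        for category, (total, count) in sorted(totals.items())
    ]


def print_report(summary):
    print(f"{'Category':<12}{'Total':>10}{'Average':>10}")
    for category, total, average in summary:
        print(f"{category:<12}{'$' + format(total, '.2f'):>10}"
              f"{'$' + format(average, '.2f'):>10}")


# The sample records from the question.
sample_lines = [
    "cosmetics 100.00",
    "medicines 120.00",
    "cosmetics 50.00",
    "books 250.00",
    "medicines 80.00",
    "medicines 100.00",
]

print_report(summarise_expenses(sample_lines))
```

To read from a file chosen by the user, the list can be replaced by an open file object, for example `with open(input("Input file: ")) as f: print_report(summarise_expenses(f))`.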