Read a text file and store it in a hashmap

I am reading a file called Expenses.txt, and I want to store its contents in a hashmap, where the same item can appear in multiple entries.
The text file contains data on several lines, where each line (a record) consists of two fields: category name (a string), and its value (a number). For example, the file below shows expenses by category.
Input
Expenses.txt
cosmetics 100.00
medicines 120.00
cosmetics 50.00
books 250.00
medicines 80.00
medicines 100.00
The program should generate a summary report showing the sums and averages by category, sorted by category. The summary should be displayed on the console. The program should prompt the user and read in the name of the input file.
For example, for the above data, the summary will be:
output
Category   Total    Average
books      $250.00  $250.00
cosmetics  $150.00  $75.00
medicines  $300.00  $100.00
a) The first field is a string and the second field is a floating point number.
b) The number of records for each category may vary. For example, in the above example, there are 2 records for cosmetics, 3 for medicines and 1 for books.
c) The total number of records (lines) may vary. Do not limit them to any fixed number.
d) The records are not in any sorted order.

It really depends on the language you are using, but I would recommend storing some kind of tuple-like structure in the hashmap. You can read each line, split it in two (the label and the value), and check whether the label is already in the hashmap. If it is, increment the number of units by one and add the value to the running cost.
At the end, just do a hashmap traversal and print all the values needed.
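To make the approach concrete, here is a minimal Python sketch (the question is language-agnostic, so take the dict as the "hashmap"); it accumulates a [sum, count] pair per category and prints the sorted summary. The in-memory `data` string stands in for the file; a real program would read the filename with `input()` and `open()` it.

```python
from collections import defaultdict

def summarize(lines):
    """Accumulate [total, count] per category in a hashmap (dict)."""
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum, count]
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        category, value = parts[0], float(parts[1])
        totals[category][0] += value
        totals[category][1] += 1
    return totals

def print_summary(totals):
    print(f"{'Category':<12}{'Total':>10}{'Average':>10}")
    for category in sorted(totals):  # sorted by category name
        total, count = totals[category]
        print(f"{category:<12}{f'${total:.2f}':>10}{f'${total / count:.2f}':>10}")

# Stand-in for the contents of Expenses.txt; in the real program use
# open(input("Input file name: ")) and iterate over the file object.
data = """cosmetics 100.00
medicines 120.00
cosmetics 50.00
books 250.00
medicines 80.00
medicines 100.00""".splitlines()

print_summary(summarize(data))
```

Because `defaultdict` creates the `[0.0, 0]` pair on first access, there is no separate "is the label already in the hashmap" branch; the check the answer describes happens implicitly.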


Excel - getting a value based on the max value of another row in a Table

I'm looking for a solution for a problem I'm facing in Excel. This is my table simplified:
Every sale has a unique ID, but more than one person can have contributed to a sale. The columns "Name" and "Share of sales(%)" show who contributed and what their percentage was.
Sale_ID  Name      Share of sales(%)
1        Person A  100
2        Person B  100
3        Person A  30
3        Person C  70
Now I want to add a column to my table that shows the name of the person that has the highest share of sales percentage per Sales_ID. Like this:
Sale_ID  Name      Share of sales(%)  Highest sales
1        Person A  100                Person A
2        Person B  100                Person B
3        Person A  30                 Person C
3        Person C  70                 Person C
So when multiple people have contributed the new column shows only the one with the highest value.
I hope someone can help me, thanks in advance!
You can try this on cell D2:
=LET(maxSales, MAXIFS(C2:C5,A2:A5,A2:A5),
INDEX(B2:B5, XMATCH(A2:A5&maxSales,A2:A5&C2:C5)))
or just removing the LET since maxSales is used only one time:
=INDEX(B2:B5, XMATCH(A2:A5&MAXIFS(C2:C5,A2:A5,A2:A5),A2:A5&C2:C5))
On cell E2 I provided another solution via MAP/XLOOKUP:
=LET(maxSales, MAXIFS(C2:C5,A2:A5,A2:A5),
MAP(A2:A5, maxSales, LAMBDA(a,b, XLOOKUP(a&b, A2:A5&C2:C5, B2:B5))))
similarly without LET:
=MAP(A2:A5, MAXIFS(C2:C5,A2:A5,A2:A5),
LAMBDA(a,b, XLOOKUP(a&b, A2:A5&C2:C5, B2:B5)))
Explanation
The trick here is to identify the max share of sales per each group and this can be done via MAXIFS(max_range, criteria_range1, criteria1, [criteria_range2, criteria2], ...). The size and shape of the max_range and criteria_rangeN arguments must be the same.
MAXIFS(C2:C5,A2:A5,A2:A5)
it produces the following output:
maxSales
100
100
70
70
MAXIFS will provide an output of the same size as criteria1, so it returns for each row the corresponding maximum sales for each Sale_ID column value.
It is the array version equivalent to the following formula expanding it down:
MAXIFS($C$2:$C$5,$A$2:$A$5,A2)
INDEX/XMATCH Solution
Having the array with the maximum Share of sales per row, we just need to identify the row position via XMATCH and return the corresponding B2:B5 cell via INDEX. We use concatenation (&) to combine more than one criterion in the XMATCH input arguments.
MAP/XLOOKUP Solution
We use MAP so that, for each pair of values (a, b) taken row by row from the first two MAP input arguments, XLOOKUP finds the row where the maximum value for that group occurs and returns the corresponding Name column value. To look up on more than one criterion, we use concatenation (&) in XLOOKUP's first two input arguments.
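The group-max-then-lookup logic behind both formulas can be sketched outside Excel; here is a minimal Python equivalent (illustrative only, not Excel code), using the example table's rows:

```python
# Rows of the example table: (Sale_ID, Name, Share of sales %)
rows = [(1, "Person A", 100), (2, "Person B", 100),
        (3, "Person A", 30), (3, "Person C", 70)]

# Step 1 - the MAXIFS equivalent: max share per Sale_ID,
# evaluated so every row can see its group's maximum.
max_share = {sid: max(s for i, _, s in rows if i == sid) for sid, _, _ in rows}

# Step 2 - the XLOOKUP equivalent: for each row, find the name whose
# (Sale_ID, share) pair matches (Sale_ID, group maximum). The pair plays
# the role of the concatenated A2:A5&C2:C5 lookup key.
highest = [next(n for i, n, s in rows if (i, s) == (sid, max_share[sid]))
           for sid, _, _ in rows]
print(highest)  # ['Person A', 'Person B', 'Person C', 'Person C']
```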

How to find row and column names of the n-th highest values in a 2D array in Excel?

I want to find the row and column names of the n-th highest values in a 2D array in Excel.
My array has a header row (the Coins) and a header column (the Markets). The data itself displays if a coin is supported on the market and if so what the approximate return of investment (ROI) will be in percent.
Example
An example of the array could look like this:
ROI       Coin A  Coin B  Coin C
Market 1  N/A     7.8%    5.7%
Market 2  0.4%    6.8%    N/A
Market 3  0.45%   7.6%    12.3%
Pay attention: some values are set to N/A (or is there a better way to show that a market doesn't support a specific coin? I don't want to enter 0% as it makes it harder to spot if a coin is supported by the market. I also don't want to leave the field blank because then I don't know if I already checked that market for that coin.)
Preferred output
The output for the example table from above with n=3 should then look like this (from high ROI to low):
Coin  Market  ROI
C     3       12.3%
B     1       7.8%
A     3       0.45%
Requirements
Each coin must only be shown once. So, for example, Coin B must not be listed twice in the Top3 output (once for Market 1: 7.8% and once for Market 3: 7.6%)
What I tried
So I thought about how to split that problem up into smaller parts. I think it comes down to these main parts:
find header/row name
here I found something to find the column name for the highest value per row but I wasn't able to adapt it to a working solution for a 2D array
find max in 2D array
here they describe to find the max value in a 2D array but not how to find the n-th highest values
find n-th highest values
here is a good explanation on how to find the highest n values of a 1D array but not how to apply that for a 2D array
only include each coin once
So I really tried to solve this myself but I struggle with adding these different parts together.
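Not an Excel formula, but the decomposition above can be prototyped to check the expected output; here is a hedged Python sketch of the logic (best ROI per coin first, which enforces "each coin only once", then top-n), using `None` for the N/A cells:

```python
# ROI table from the example; None marks an unsupported coin (the N/A cells).
coins = ["A", "B", "C"]
markets = [1, 2, 3]
roi = [[None, 7.8, 5.7],
       [0.4, 6.8, None],
       [0.45, 7.6, 12.3]]

# Best (roi, market) per coin - deduplicates before ranking, so a coin
# can never appear twice in the top-n.
best = {}
for m, row in zip(markets, roi):
    for c, value in zip(coins, row):
        if value is not None and (c not in best or value > best[c][0]):
            best[c] = (value, m)

# Top-n coins by their best ROI, highest first.
n = 3
top = sorted(best.items(), key=lambda kv: kv[1][0], reverse=True)[:n]
for coin, (value, market) in top:
    print(coin, market, f"{value}%")
```

Run against the example, this reproduces the preferred output: C/3/12.3%, B/1/7.8%, A/3/0.45%.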

Compare successive rows Alteryx

I have a spreadsheet with the order item in column A, order quantity in column B, order start date in column C, and order finish date in column D. What I would like to do is treat orders on consecutive start dates for the same item as one single order: until there is at least one day's break between order start dates for an order item, treat it as one single order. Then I need to count the orders, sum the order quantities, and calculate the average gap in days between orders (the gap between an order's finish date and the next order's start date). So if an order item was ordered on the 1st, 2nd, 3rd and 4th of March, then again on the 10th and 11th of March, and then again on the 20th of March (with all orders having the same start and finish date), there would be 2 gaps, with the average gap being 7.5 days ((6+9)/2). So the input and output will look like this:
Any help would be much appreciated. Many thanks!
Discussion...
The fields I've defined are OrderItem, OrderQty, OrderStartDate, and OrderEndDate, plugging in values identical to those you provided.
The Select tool just forces OrderQty to Int32
MultiRow Formula, creates new Int32 variable Gap using this expression:
IIF(IsNull([Row-1:OrderStartDate]), 1, DateTimeDiff([OrderStartDate], [Row-1:OrderStartDate],"Days"))
First Summary tool:
Group By OrderItem ...
Group By Gap ...
Sum OrderQty to new output field OrdersPerGap
a. Top branch, Summary tool:
Group By OrderItem ...
Sum OrdersPerGap to output field name OrderQty ...
Count OrderItem to output field name NumOrders
b. Bottom branch, a simple filter on Gap > 1 and then another Summary tool:
Group By OrderItem ...
Avg Gap to new output field AvgGap
Join the two streams back together on OrderItem and exclude Right_OrderItem from the output (uncheck its checkbox).
In Alteryx, this provides the output requested. There may be other ways but this is straight-forward without too much going on any step.
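Outside Alteryx, the same grouping-and-gap logic can be sketched in a few lines of Python, which makes the 7.5-day example easy to verify (dates are assumed to be in March of an arbitrary year, with start == finish on each day, as in the question):

```python
from datetime import date

# One order item; (start, finish) pairs for the example:
# ordered 1st-4th March, then 10th-11th, then 20th.
orders = [(date(2023, 3, d), date(2023, 3, d)) for d in (1, 2, 3, 4, 10, 11, 20)]

# Group runs of consecutive start dates into single orders
# (the MultiRow Formula / Gap = 1 idea).
groups = [[orders[0]]]
for prev, cur in zip(orders, orders[1:]):
    if (cur[0] - prev[0]).days == 1:
        groups[-1].append(cur)   # consecutive day: same order
    else:
        groups.append([cur])     # break in dates: new order

num_orders = len(groups)
# Gap = next group's start date minus this group's finish date.
gaps = [(groups[i + 1][0][0] - groups[i][-1][1]).days
        for i in range(len(groups) - 1)]
avg_gap = sum(gaps) / len(gaps)
print(num_orders, gaps, avg_gap)  # 3 [6, 9] 7.5
```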

What's the right pattern to reduce the number of rows within one table in Cassandra?

By reducing I mean some form of caching where you can replace 100 rows with 1 row (accumulated counters, etc.).
I want to be able to answer queries like "how many people are from %EACH_COUNTRY", so basically return an array of pairs / a map of (Country, COUNT). I have a huge number of people (think 50 * 10^8), so I can't allocate 1 row for each person; I'd like to cache the results somehow to keep PeopleTable under 10^6 entries at least (and then merge the results with a fast read from CacheTable). By caching I mean: count the number of people with country=%SPECIFIC_COUNTRY, write (%SPECIFIC_COUNTRY, COUNT(*)) to CacheTable (to be precise, increment the count for %SPECIFIC_COUNTRY), and then remove those rows from PeopleTable:
personId, country
1132312312, Russia
2344333333, the USA
1344111112, France
1133555555, Russia
1132666666, Russia
3334124124, Russia
....
and then
CacheTable
country, count
Russia, 4
France, 1
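The accumulate-then-merge idea the question describes can be sketched as follows (a Python illustration of the logic only, not Cassandra code; in Cassandra the CacheTable increments would typically be a counter column, and the sample IDs below are made up):

```python
from collections import Counter

# CacheTable: already-aggregated counts (their source rows were
# removed from PeopleTable during the last aggregation pass).
cache = Counter({"Russia": 4, "France": 1})

# PeopleTable: rows that arrived after the last aggregation pass
# (hypothetical person IDs for illustration).
people = [(9990000001, "Russia"), (9990000002, "Germany")]

def country_counts(cache, people):
    """Answer the query by merging the pre-aggregated cache
    with a fresh count of the live (not yet aggregated) rows."""
    return cache + Counter(country for _, country in people)

print(country_counts(cache, people))
```

This keeps PeopleTable small: only rows newer than the last aggregation pass need to be scanned at query time.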

Test data on next record

I would like to know if it is possible to read the next record when using SyncSORT (SyncTool), based on a certain condition.
Example of the input
Sort key will be account nbr + descending record type + amount
account nbr  amount  record type
11111111111  10      reversal    <- not in the output
11111111111  10      deposit     <- not in the output
33333333333  20      deposit     <- in the output
44444444444  15      deposit     <- in the output
55555555555  20      reversal    <- in the output
55555555555  10      deposit     <- in the output
66666666666  30      reversal    <- in the output, no match
When a reversal record is read and a deposit with the same amount follows, both the reversal and the deposit should be excluded from the output file. If the amount differs between the reversal and the deposit, both records should be in the output file.
output
33333333333 20 deposit
44444444444 15 deposit
55555555555 20 reversal
55555555555 10 deposit
66666666666 30 reversal
Yes. As long as your SyncSORT is up-to-date enough.
You need to use JOINKEYS. Specify the same DSN for both input datasets, and indicate that they are SORTED. There is an undocumented feature which allows the use of JNFnCNTL files, as in DFSORT.
In JNF1CNTL (which is a "preprocessor" for the first JOINKEYS dataset) temporarily add a sequence number to each record. The default is that the sequence starts at one. Here it is useful to be explicit...
Because, in JNF2CNTL you want to do the same thing, but start the sequence at zero (START=0).
The key for each of the JOINKEYS is the sequence number.
Use JOIN UNPAIRED,F1. Define a REFORMAT with all the data from the first file, and data for comparison from the second file.
This is what a four-record dataset would look like if you imagine the join:
- - A 0
A 1 B 1
B 2 C 2
C 3 D 3
D 3 - -
Because you specify JOIN UNPAIRED,F1 you won't actually see the mismatched A 0 (because that is on F2) but you will see the mismatched D 3.
If you look at your REFORMAT record, you now have data from the "current" record, and data from the "next" record.
Then there's a little more work to select only the records you want. But, dinner first...
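The JOINKEYS trick above effectively pairs each record with its successor. That pairing, and the "little more work" of selecting records, can be sketched in Python (not SyncSORT control cards; the reversal/deposit matching rule is as stated in the question):

```python
records = [
    ("11111111111", 10, "reversal"),
    ("11111111111", 10, "deposit"),
    ("33333333333", 20, "deposit"),
    ("44444444444", 15, "deposit"),
    ("55555555555", 20, "reversal"),
    ("55555555555", 10, "deposit"),
    ("66666666666", 30, "reversal"),
]

# Pair each record with the "next" one (None after the last record),
# mirroring the sequence-number join: file 1 numbered from 1, file 2 from 0.
pairs = list(zip(records, records[1:] + [None]))

output = []
skip_next = False
for cur, nxt in pairs:
    if skip_next:
        skip_next = False  # this deposit was consumed by its reversal
        continue
    # Drop a reversal and the following deposit when account and amount match.
    if (cur[2] == "reversal" and nxt is not None
            and nxt[2] == "deposit" and nxt[0] == cur[0] and nxt[1] == cur[1]):
        skip_next = True
        continue
    output.append(cur)

for rec in output:
    print(*rec)
```

On the sample data this leaves exactly the five records shown in the question's expected output, including the unmatched 66666666666 reversal.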
