We use UOM conversions at this client. We stock in Eaches and sell in Cases. The problem we are having with the Pick ticket is that both the quantity to be picked and the UOM being picked are the stocking unit and not the selling unit.
For example, the customer orders 73 cases (12 each per case). The pick ticket prints 876 each. This requires the warehouse person to look up each item, determine whether there is a selling UOM and ratio, and then manually convert 876 eaches to 73 cases.
Obviously, the pick ticket should print 73 cases, but I cannot find a way to do this. The items are lotted, and an order of 73 cases might have 50 cases of Lot A and 23 cases of Lot B. This is represented in the SOShipLineSplit table. The quantities and UOM in this table are based on stocking units.
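To make the manual step concrete, here is a minimal Python sketch of the conversion, assuming a fixed ratio of 12 each per case and the lot split from the example above. The function name and the dictionary layout are made up for illustration and are not Acumatica APIs.

```python
from fractions import Fraction


def to_selling_units(base_qty, unit_rate):
    """Convert a stocking-unit quantity to selling units.

    Falls back to the stocking quantity when no conversion rate exists,
    mirroring the ISNULL(UnitRate, 0) = 0 check a SQL query would need.
    """
    if not unit_rate:
        return Fraction(base_qty)
    return Fraction(base_qty, unit_rate)


# Lot splits as stored in stocking units (each); values follow the example above.
splits_in_each = {"Lot A": 600, "Lot B": 276}
splits_in_cases = {lot: to_selling_units(qty, 12) for lot, qty in splits_in_each.items()}
total_cases = sum(splits_in_cases.values())
```

Fraction keeps the result exact when a split is not a whole number of cases.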
Ideally, I could join the INUnit table to both the SOShipLine and SOShipLineSplit tables. See below.
SELECT CASE WHEN ISNULL(U.UnitRate, 0) = 0 THEN S.Qty ELSE S.Qty / U.UnitRate END AS ShipQty,
       CASE WHEN ISNULL(U.UnitRate, 0) = 0 THEN S.UOM ELSE U.FromUnit END AS UOM
FROM SOShipLineSplit S
INNER JOIN SOShipLine SL
    ON S.CompanyID = SL.CompanyID AND S.ShipmentNbr = SL.ShipmentNbr AND S.LineNbr = SL.LineNbr AND S.InventoryID = SL.InventoryID
LEFT OUTER JOIN INUnit U
    ON S.CompanyID = U.CompanyID AND S.InventoryID = U.InventoryID AND S.UOM = U.ToUnit AND SL.UOM = U.FromUnit
WHERE S.ShipmentNbr = '000161' AND S.CompanyID = 4
The problem is that the Acumatica report writer does not support joining one table to multiple tables at once, which this join requires:
Left Outer Join INUnit U
On S.CompanyID = U.CompanyID and S.InventoryID = U.InventoryID and s.UOm = U.ToUnit and SL.UOM = U.FromUnit
I believe I must be missing something. This cannot be the only client using Acumatica who utilizes Selling Units of Measure. Is there another table I could use that would contain the quantities and UOM already converted for this order to Selling Units?
Or another solution?
Thanks in advance.

If the goal is to display accurate quantities before and after conversion, then the INUnit DAC can't be used. It doesn't store historical data: INUnit values can be changed after an order has been finalized, so reusing them to compute quantities will not yield accurate results.
For that scenario you would need to use the historical data fields with Base prefixes, such as ShippedQuantity/BaseShippedQuantity. If you need to store more historical data, add a custom field to hold these values and update it when the shipment is created or modified.
The main issue appears to be a logical error in the requirement:
The problem is that the INUnit table has to be joined to BOTH the
SOShipLine and the SOShipLineSplit tables.
The INUnit DAC has a single parent, not two, so you need to change your requirement to reflect that constraint.
If SOShipLine and SOShipLineSplit values differ then you'll never get any record.
If they are identical then there's no need to join on both since they have the same value.
I suggest adding two joins, one for SOShipLine and another for SOShipLineSplit. In the report you can choose which one to display (first, second, or both).
You can also add visibility conditions or an IIF formula in the report if you want to handle null values for display purposes.
Use the Child Alias property in the schema builder to join the same table twice without name conflicts. In the report formulas (for displayed fields or formula conditions), use the Child Alias table name too.


Custom partitioning on JDBC in PySpark

I have a huge table in an Oracle database that I want to work on in PySpark, but I want to partition it using a custom query. For example, imagine there is a column in the table that contains the user's name, and I want to partition the data based on the first letter of the user's name. Or imagine that each record has a date, and I want to partition it based on the month. Because the table is huge, I absolutely need the data for each partition to be fetched directly by its executor and NOT by the master. Can I do that in PySpark?
P.S.: The reason that I need to control the partitioning, is that I need to perform some aggregations on each partition (partitions have meaning, not just to distribute the data) and so I want them to be on the same machine to avoid any shuffles. Is this possible? or am I wrong about something?
I don't care about even or skewed partitioning! I want all the related records (like all the records of a user, or all the records from a city etc.) to be partitioned together, so that they reside on the same machine and I can aggregate them without any shuffling.
It turns out that Spark has a way of controlling the partitioning logic exactly: the predicates option of spark.read.jdbc.
What I came up with eventually is as follows:
(For the sake of the example, imagine that we have the purchase records of a store and need to partition them by userId and productId, so that all the records of an entity are kept together on the same machine and we can perform aggregations on these entities without shuffling.)
First, produce the histogram of every column that you want to partition by (count of each value):
Then, use the multifit algorithm to divide the values of each column into n balanced bins (n being the number of partitions that you want).
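For reference, a minimal sketch of MULTIFIT in plain Python, assuming the histogram is a dict that maps each value to its record count. The function names and the sample histogram are made up for illustration. The algorithm binary-searches a bin capacity and tests each candidate capacity with first-fit decreasing.

```python
def first_fit_decreasing(counts, n_bins, capacity):
    """Pack values into at most n_bins bins of the given capacity.

    Returns {value: bin_index}, or None if the values do not fit.
    """
    loads = []
    assignment = {}
    # Largest counts first; ties broken by value name for determinism.
    for value, count in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        for i in range(len(loads)):
            if loads[i] + count <= capacity:
                loads[i] += count
                assignment[value] = i
                break
        else:
            if len(loads) == n_bins or count > capacity:
                return None
            loads.append(count)
            assignment[value] = len(loads) - 1
    return assignment


def multifit(counts, n_bins, iterations=20):
    """Binary-search the smallest capacity for which first-fit decreasing succeeds."""
    total = sum(counts.values())
    largest = max(counts.values())
    low = max(total / n_bins, largest)
    high = max(2 * total / n_bins, largest)
    best = first_fit_decreasing(counts, n_bins, high)
    for _ in range(iterations):
        mid = (low + high) / 2
        trial = first_fit_decreasing(counts, n_bins, mid)
        if trial is None:
            low = mid
        else:
            best, high = trial, mid
    return best


# Hypothetical histogram: number of records per userId.
histogram = {"u1": 5, "u2": 4, "u3": 3, "u4": 3, "u5": 3}
bins = multifit(histogram, n_bins=2)
loads = [sum(c for v, c in histogram.items() if bins[v] == b) for b in range(2)]
```

The resulting value-to-bin mapping is what gets stored in the database in the next step.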
Then, store these in the database
Then update your query and join on these tables to get the bin numbers for every record:
url = 'jdbc:oracle:thin:username/password@address:port:dbname'
query = ```
df = spark.read \
    .option('driver', 'oracle.jdbc.driver.OracleDriver') \
    .jdbc(url=url, table=query, predicates=predicates)
And finally, generate the predicates. One for each partition, like these:
predicates = [
The predicates are added to the query as WHERE clauses, which means that all the records of the users in partition 1 go to the same machine. Also, all the records of the products in partition 1 go to that same machine as well.
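A list like this can be generated rather than typed out. The sketch below assumes the joined query exposes two bin columns named USER_BIN and PRODUCT_BIN and that a record belongs to partition i when either bin equals i. Both the column names and the OR rule are assumptions for illustration.

```python
def build_predicates(bin_columns, n_partitions):
    """Return one SQL predicate string per partition number (1-based)."""
    return [
        " OR ".join(f"{col} = {i}" for col in bin_columns)
        for i in range(1, n_partitions + 1)
    ]


# USER_BIN and PRODUCT_BIN are hypothetical column names.
predicates = build_predicates(["USER_BIN", "PRODUCT_BIN"], 3)
```

Spark runs one query per predicate, so with an OR rule like this a record whose user and product fall in different bins is read into both partitions.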
Note that there are no relations between the user and the product here. We don't care which products are in which partition or are sent to which machine.
But since we want to perform some aggregations on both the users and the products (separately), we need to keep all the records of an entity (user or product) together. And using this method, we can achieve that without any shuffles.
Also, note that if there are some users or products whose records don't fit in the workers' memory, then you need to do a sub-partitioning. Meaning that you should first add a new random numeric column to your data (between 0 and some chunk_size like 10000 or something), then do the partitioning based on the combination of that number and the original IDs (like userId). This causes each entity to be split into fixed-sized chunks (i.e., 10000) to ensure it fits in the workers' memory.
And after the aggregations, you need to group your data on the original IDs to aggregate all the chunks together and make each entity whole again.
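The two-stage idea can be sketched in plain Python without Spark, assuming the aggregation is a simple sum. The function name, the record layout and the sample data are made up for illustration.

```python
import random
from collections import defaultdict


def two_stage_sum(records, n_chunks, seed=0):
    """Sum amounts per user via (user, chunk) partial sums, then merge the chunks."""
    rng = random.Random(seed)
    partial = defaultdict(int)
    for user_id, amount in records:
        chunk = rng.randrange(n_chunks)  # random sub-partition key
        partial[(user_id, chunk)] += amount  # stage 1: aggregate per chunk
    totals = defaultdict(int)
    for (user_id, _chunk), subtotal in partial.items():
        totals[user_id] += subtotal  # stage 2: group on the original ID
    return dict(totals)


records = [("alice", 10), ("bob", 5), ("alice", 7), ("alice", 3), ("bob", 1)]
totals = two_stage_sum(records, n_chunks=4)
```

This only works directly for aggregations that can be merged from partial results, such as sums and counts.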
The shuffle at the end is inevitable because of our memory restriction and the nature of our data, but this is the most efficient way you can achieve the desired results.

SSAS MDX Calculated Measure - COUNT of [ITEMS].[Item] grouped [Items].[Item Group]

In Excel connected to SSAS, I am trying to build a pivot table and add a custom Measure Calculation using "OLAP Tools" and/or "OLAP Pivot Table Extensions". I am trying to add a calculation that is really simple in my mind, but I cannot get it to work. The calc I need is:
GOAL: A record count of the [Items] dimension records grouped by any of the
[Items] dimension fields.
In particular I am trying to group by [Items].[Items Groups] and [Items].[Item]. Item is the lowest grain, so the count should return the value "1". I have created a couple of calculations that are kind of in the ballpark (see below), but the calcs don't appear to be working as desired.
What I have tried:
Attempt #1 -- [Measures].[Items Count (With net amount values)]
The calc 'Items Count (With net amount values)' appears to be
returning a decent count value, but it appears it only counts the Item
if there are transactional records found (not sure why). Also, when
at the lowest grain level the calc returns that value for the parent
group, not the dimension level selected on the rows.
Attempt #2 -- [Measures].[Items Count (All)]
This calc returns the TOTAL item count for the entire dimension
regardless of the dimension level placed on the rows.
Attempt #3 -- [Measures].[Items Count]
This calc freezes up Excel and I have to quit Excel. No idea why. I have seen this syntax recommended on a few different sites.
Help please? This seems really simple, but I am not very skilled with MDX. In DAX and SSAS Tabular this would be a very simple expression, but I'm struggling to count the rows with MDX in SSAS MD.
The "Outside Purchased Beef" group has 18 items with transactions, but 41 items in total. I do not know how to calculate the "41" value.
Take a look at the following samples on AdventureWorks.
with member [Measures].[CountTest]
as
count(existing [Product].[Subcategory].members - [Product].[Subcategory].[All])
select
{[Measures].[Internet Sales Amount],[Measures].[CountTest]}
on columns,
{
([Product].[Category].[Category] -- assumed parent level; this line was missing from the post
,[Product].[Subcategory].[Subcategory] -- comment this line for the second result
)
}
on rows
from [Adventure Works]
Now comment the indicated line for the parent view.

How to debug "Each GROUP BY expression must contain at least one column that is not an outer reference error"

Since SSRS doesn't allow filters on aggregates, I found some code which helped me come up with the below query. However, when I run it I get:
Each GROUP BY expression must contain at least one column that is not an outer reference
I have searched everywhere but can't find how to fix this. I've even removed the two extra tables from the query so there were no joins at all. I need to not return any order where the total of the lines on the order is less than $500 and greater than 0.
SELECT ...
FROM tdsls041_sales_order_lines AS tdsls041_sales_order_lines
WHERE (tdsls041_sales_order_lines.company = 610) AND
(tdsls041_sales_order_lines.order_number IN
(SELECT ...
FROM tdsls041_sales_order_lines AS tdsls041_sales_order_lines_1
GROUP BY ...
HAVING (SUM(tdsls041_sales_order_lines.amount) <= 500) OR
SUM(tdsls041_sales_order_lines.amount) > 0))
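The issue that SQL Server is complaining about is the grouping in your subquery. Note that the subquery aliases the table as tdsls041_sales_order_lines_1, but its HAVING clause (and presumably its GROUP BY) still refers to tdsls041_sales_order_lines, which is the outer query's alias, so those columns are outer references. Unfortunately, you also want to use IN, which needs a list of order numbers.
You just need to add an aggregate function to your subquery under its own alias, and then add another layer that selects just the order numbers from that.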
SELECT T1.company, T1.order_number, T1.amount, T1.item, T1.container
FROM tdsls041_sales_order_lines AS T1
WHERE (T1.company = 610) AND (T1.order_number IN
(SELECT order_number FROM
(SELECT TSOL.order_number, SUM(TSOL.amount) AS TTL
FROM tdsls041_sales_order_lines AS TSOL
GROUP BY TSOL.order_number
HAVING (SUM(TSOL.amount) <= 500) OR
SUM(TSOL.amount) > 0) AS T2) )
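To check the pattern quickly without SQL Server, here is a small sketch using Python's built-in sqlite3 module. The table name, columns and sample rows are made up, and the HAVING condition uses AND on the assumption that the intended range is "greater than 0 and at most 500". Note that the subquery refers only to its own alias (t2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_lines (company INTEGER, order_number INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [
        (610, 1, 200.0), (610, 1, 100.0),  # order 1 totals 300
        (610, 2, 400.0), (610, 2, 300.0),  # order 2 totals 700
        (610, 3, 0.0),                     # order 3 totals 0
    ],
)

rows = conn.execute(
    """
    SELECT t1.order_number, t1.amount
    FROM order_lines AS t1
    WHERE t1.company = 610
      AND t1.order_number IN (
          SELECT t2.order_number
          FROM order_lines AS t2
          GROUP BY t2.order_number
          HAVING SUM(t2.amount) > 0 AND SUM(t2.amount) <= 500
      )
    ORDER BY t1.order_number, t1.amount
    """
).fetchall()
```

Only the lines of order 1 come back here, because it is the only order whose total falls inside the assumed range.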
You can filter on aggregates in charts and tables. You have to put the aggregate filter on your group instead of on the table itself (Group Properties -> Filters tab).

Cassandra design approach for my sample use case

I've been learning Cassandra for the past few days and tried to create a data model for the following use case:
"Each Zipcode in US has a list of stores sorted based on a defined rank"
"Each store/warehouse has millions of SKUs and the inventory is tracked"
"If I search using a zipcode and SKU, it should return the best possible 100 stores
with inventory, based on the rank"
Assume store count is 1000+ and sku count is in millions
Design tried
One table with
primary key (ZipCode, Rank)
Another table with
Primary Key (Sku, Store)
Now, if I want to search top 100 stores for each ZipCode, SKU
I have to search in table 1 for the top 100 stores and
then pull inventory of each store from the second table.
Since the SKU count is in the millions and the store count is 1000+, I'm not
sure whether we can store all this in one table, with zipcode_sku as the row
key and the stores and inventory stored as a wide row sorted by rank.
Am I thinking right? What could be other possible data models for this use case?
UPDATE: Data Loader Code (as mentioned in below comments)
println "Loading data started.."
(1..1000000).each { // SKUs
sku = it.toString()
(1..42000).each { // Zip Codes
zipcode = it.toString().padLeft(5,"0")
(1..1500).each { // Stores
store = it.toString()
int inventory = Math.abs(new Random().nextInt() % 10000) + 1
session.execute("INSERT INTO ritz.rankedStoreByZipcodeAndSku(sku, zipcode, store, store_rank, inventory) " +
println "Data Loaded"
Cassandra is a wide-column store, so you typically create one wide-row table for each kind of query you want to make. In this case:
CREATE TABLE storeByZipcodeAndSku (
    sku text,
    zipcode int,
    store text,
    store_rank int,
    inventory int,
    PRIMARY KEY ((sku, zipcode), store)
);
This way the row key is sku + zipcode, so it's a very fast lookup, and you can store up to 2 billion stores in it. When you update your inventory, also update this table. To get the top 100, you just pull down all of them and sort (thousands is not many), but if this operation is very common and you need it faster, you can instead use
CREATE TABLE rankedStoreByZipcodeAndSku (
    sku text, zipcode int, store text, store_rank int, inventory int,
    PRIMARY KEY ((sku, zipcode), store_rank)
);
to have it sorted for you automatically and you just grab the top 100. Then when you update it you will want to use the lightweight transactions to move things around atomically.
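For the first table, the "pull down all of them and sort" step on the client could look roughly like this in Python. The sketch assumes each row arrives as a dict, that a lower store_rank is better, and that only stores with inventory above zero count. The function name and sample rows are made up.

```python
import heapq


def top_stores(rows, limit=100):
    """Return up to `limit` in-stock rows ordered by ascending store_rank."""
    in_stock = (row for row in rows if row["inventory"] > 0)
    return heapq.nsmallest(limit, in_stock, key=lambda row: row["store_rank"])


# Hypothetical rows for one (sku, zipcode) partition.
rows = [
    {"store": "s1", "store_rank": 3, "inventory": 12},
    {"store": "s2", "store_rank": 1, "inventory": 0},
    {"store": "s3", "store_rank": 2, "inventory": 5},
]
best = top_stores(rows, limit=2)
```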
It sounds like you want to get a list of StoreIDs from the first table based on ZipCode and a list of StoreIDs from the second table based on Sku, and then do a join. Since Cassandra is a simple key-value store, it doesn't do joins. So you would have to either write code in your client to run the two queries and do the join manually, or connect Cassandra to Spark, which has a join function.
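A manual client-side join of the two query results might look like the following Python sketch. It assumes the first query returns store IDs already in rank order and the second returns a store-to-inventory mapping for one SKU. All names and sample values are made up.

```python
def join_ranked_inventory(ranked_stores, inventory_by_store, limit=100):
    """Walk the stores in rank order and keep the first `limit` that have stock."""
    result = []
    for store in ranked_stores:
        quantity = inventory_by_store.get(store, 0)
        if quantity > 0:
            result.append((store, quantity))
            if len(result) == limit:
                break
    return result


ranked_stores = ["s2", "s3", "s1"]          # result of the zipcode query
inventory_by_store = {"s1": 12, "s3": 5}    # result of the SKU query
matches = join_ranked_inventory(ranked_stores, inventory_by_store, limit=2)
```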
As you say, trying to denormalize the two tables into one table so that you could do this as one query might result in a very large and difficult to maintain table. If this is the only query pattern you will have, then that might be worth it, but if this is a general inventory system with a lot of different query patterns, then it might be too inflexible.
The other option would be to use an RDBMS instead of Cassandra, and then joins are super easy.

Excel - Power Query 2016

I have data from two tables:
Customers (containing customer ID and the total value of orders/funding)
Orders (containing customer ID and each order)
I created a Power Query and chose the option "Merge Queries as New". I selected the matching columns (Customer ID) and chose the option Left Outer (all from the first, matching from the second, i.e. all from the customer table and matching rows from the order table). Then I expanded the last column of the query to include what I wanted from the Order table, resulting in the table below on the left. The one on the right is what I'm after. The problem is that the funding amounts are already totals per customer, so I don't need the value of each order broken down. I still need the orders displayed, but not their values (just the total per customer). Is it possible to get the result shown on the right? Otherwise, the grand total is way off.
I think what you're trying to do is join with only the first instance of each value in your Customer column. There doesn't appear to be any feature or GUI element that allows you to do that (I had a look at the reference documentation for Power Query M, maybe I missed something).
To replicate your data, I'm starting off with some tables (the left table is named Customers, the right table is named Orders):
I then use the M code below (the first few lines are just to get my tables from the sheet):
let
    customers = Excel.CurrentWorkbook(){[Name = "Customers"]}[Content],
    orders = Excel.CurrentWorkbook(){[Name = "Orders"]}[Content],
    merged = Table.NestedJoin(orders, {"CUSTOMER"}, customers, {"CUSTOMER"}, "merged", JoinKind.LeftOuter),
    indexColumn = Table.AddIndexColumn(merged, "Temporary", 0, 1),
    indexes =
        let
            uniqueCustomers = Table.Distinct(Table.SelectColumns(indexColumn, {"CUSTOMER"})), // Want to keep as table
            listOfRecords = Table.ToRecords(uniqueCustomers),
            firstOccurenceIndexes = List.Accumulate(listOfRecords, {}, (listState, currentItem) =>
                List.Combine({listState, {Table.PositionOf(indexColumn, currentItem, Occurrence.First, "CUSTOMER")}})
            )
        in
            firstOccurenceIndexes,
    expandSelectively =
        let
            toBoolean = Table.TransformColumns(indexColumn, {{"Temporary", each List.Contains(indexes, _), type logical}}),
            tableOrNull = Table.AddColumn(toBoolean, "toExpand", each if [Temporary] then [merged] else null),
            dropRedundantColumns = Table.RemoveColumns(tableOrNull, {"merged", "Temporary"}),
            expand = Table.ExpandTableColumn(dropRedundantColumns, "toExpand", {"FUNDING"})
        in
            expand
in
    expandSelectively
If your table names and column names match mine (including case sensitivity), then you might just be able to copy-paste all of the M code above into the Advanced Editor and have it work for you. Otherwise, you may need to tweak as necessary.
This is what I get when I load the query to the worksheet.
There might be better (more efficient) ways of doing this, but this is what I have for now.
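If it helps to see the logic outside of M, the same "only the first row per customer gets the funding value" idea can be sketched in Python. The order IDs and the function name are made up; the funding figures are sample values.

```python
def funding_on_first_row(orders, funding_by_customer):
    """Attach each customer's funding total to their first order row only."""
    seen = set()
    result = []
    for customer, order_id in orders:
        if customer in seen:
            result.append((customer, order_id, None))  # later rows stay blank
        else:
            seen.add(customer)
            result.append((customer, order_id, funding_by_customer.get(customer)))
    return result


orders = [("A", 1), ("A", 2), ("B", 3), ("A", 4), ("B", 5)]
funding = {"A": 2394, "B": 4323}
rows = funding_on_first_row(orders, funding)
grand_total = sum(value for _, _, value in rows if value is not None)
```

Because each funding total appears exactly once, the grand total equals the sum of the per-customer totals.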
If you're not using the order ID column, then I would suggest doing a Group By on the OrderTable before merging in the funding so that you'd end up with a table like this instead:
Region  Customer  OrderCount  Funding
South   A         3           2394
South   B         2           4323
South   C         1           1234
South   D         2           3423
This way you don't have mixed levels of granularity that cause problems like you are seeing with the totals.
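As a rough illustration of that Group By followed by a merge, here is a Python sketch. The order IDs and the function name are made up, the Region column is left out for brevity, and the counts and funding values follow the table above.

```python
from collections import Counter


def summarize(orders, funding_by_customer):
    """Count orders per customer, then attach the per-customer funding total."""
    counts = Counter(customer for customer, _order_id in orders)
    return [
        (customer, counts[customer], funding_by_customer.get(customer))
        for customer in sorted(counts)
    ]


orders = [("A", 1), ("A", 2), ("A", 3), ("B", 4), ("B", 5), ("C", 6), ("D", 7), ("D", 8)]
funding = {"A": 2394, "B": 4323, "C": 1234, "D": 3423}
summary = summarize(orders, funding)
grand_total = sum(value for _, _, value in summary)
```

Every row is now at customer level, so summing the funding column gives the correct grand total.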
