Rust DataFusion - Float Operations in dataframe select

I am calculating the fraction of orders that were not filled from a large list of order lines, using the datafusion crate for the analysis. I want to build a table that looks like this:
+--------+--------------+---------------+--------------+
| Month | Total Orders | Missed Orders | Missed Ratio |
+--------+--------------+---------------+--------------+
| 201803 | 10 | 3 | 0.3 |
+--------+--------------+---------------+--------------+
To achieve this I have written the following code:
let result = record_count
    .select(vec![
        col("Month"),
        col("Total Orders"),
        col("Missed Orders"),
        (col("Missed Orders").cast_to(&DataType::Float64, &m_order_schema).unwrap()
            / col("Total Orders").cast_to(&DataType::Float64, &t_order_schema).unwrap())
        .alias("Service Level"),
    ])?;
The Total Orders and Missed Orders columns are integers, so I am casting them to float to get a fraction. However, the Service Level column (labeled Missed Ratio in the tables here) comes out as an integer with all zeros. The result looks like this:
+--------+--------------+---------------+--------------+
| Month | Total Orders | Missed Orders | Missed Ratio |
+--------+--------------+---------------+--------------+
| 201803 | 10 | 3 | 0 |
+--------+--------------+---------------+--------------+
Question: How do I perform float operations on integer columns?

I don't think many people are monitoring Stack Overflow for DataFusion issues, and you might get a quicker response by filing an issue at https://github.com/apache/arrow-datafusion/issues
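For reference, an all-zero column is what integer division would produce: 3 divided by 10 truncates to 0, while the same values divided as floats give 0.3. The Python sketch below only illustrates that arithmetic. It does not use DataFusion and is not a diagnosis of why the casts in the question had no effect; the function names are made up for this illustration.

```python
def ratio_as_integers(missed: int, total: int) -> int:
    # Floor division drops the fractional part, which is how
    # integer division behaves for the non-negative counts here.
    return missed // total


def ratio_as_floats(missed: int, total: int) -> float:
    # Converting both operands to float first keeps the fraction.
    return float(missed) / float(total)


print(ratio_as_integers(3, 10))
print(ratio_as_floats(3, 10))
```

If the real query behaves like the first function, the cast is probably not being applied before the division.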

Related

Visualise percentage of sales over time Power Bi for each salesman

I have a dataset that looks like.
+------------+-------+----------+--+
| Date | Sales | Salesmen | |
+------------+-------+----------+--+
| 12/31/1999 | 100 | P1 | |
| 12/31/1999 | 100 | P2 | |
| 12/31/1999 | 300 | P3 | |
| 01/31/2000 | 500 | P2 | |
| ... | | | |
| 07/31/2020 | 500 | p3 | |
+------------+-------+----------+--+
I want to visualise this as a line chart (one line per salesman), showing each salesman's share of the total sales for each year, over the full period (1999-2020).
A Power BI line chart has 3 fields: Axis, Legend and Values.
I placed Date in Axis, Salesmen in Legend, and for Values I created a measure of the Sales (displayed with the "As a % of Grand total" option); however, it doesn't return what I would like.
Measure = CALCULATE(SUM('Table1'[Sales]),FILTER('Table1','Table1'[Date].[Year]=2020))
However, this only returns percentage values of the sum total for one year, and the visualisation outputs a table that looks like this:
So I used another measure:
Measure = SUMX(CALCULATETABLE('Table1','Table1'[Date].[Year]), 'Table1'[Sales])
However, this returns the same value for each year and produces a table that looks like this:
I need the total sales per year split as a percentage among the 3 salesmen, then visualised.
I know there's something I'm missing in the measure. Any suggestions?
Follow these steps:
Step-1: Create a measure to calculate total sales as below-
total_sales = SUM(Table1[sales])
Step-2: Create a measure that will calculate SUM of amount per year as below-
current_year_sales =
var current_row_year = MIN(Table1[Date].[Year])
return
    CALCULATE(
        SUM(Table1[sales]),
        FILTER(
            ALL(Table1),
            Table1[Date].[Year] = current_row_year
        )
    )
Step-3: Now create the final measure as below. Convert this measure type to %
year_wise_percentage = [total_sales]/[current_year_sales]
Step-4: Configure your line chart as below-
Axis: Year
Legend: salesmen
Values: year_wise_percentage
You should have your expected output now.
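As a rough cross-check of what the three measures compute together, here is a minimal Python sketch. It assumes that, in the chart, [total_sales] is evaluated for one year and one salesman at a time, so the ratio is that salesman's sales in the year divided by all sales in the year. The rows are the 1999 sample rows from the question; the function name is made up for this illustration.

```python
from collections import defaultdict

# Sample rows from the question: (year, salesman, sales).
rows = [(1999, "P1", 100), (1999, "P2", 100), (1999, "P3", 300)]


def yearly_share(rows):
    year_total = defaultdict(int)  # all sales in a year
    cell_total = defaultdict(int)  # one salesman's sales in a year
    for year, salesman, sales in rows:
        year_total[year] += sales
        cell_total[(year, salesman)] += sales
    # Share of each (year, salesman) cell within its year.
    return {key: value / year_total[key[0]] for key, value in cell_total.items()}


shares = yearly_share(rows)
for (year, salesman), share in sorted(shares.items()):
    print(year, salesman, f"{share:.0%}")
```

Within each year the shares add up to 1, which is a quick way to check the measure in the chart.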

EXCEL: SUMIFS criterion applied to a INDEX MATCH search equals a value

I've spent pretty much all day trying to figure this out. I've read so many threads on here and on various other sites. This is what I'm trying to do:
I've got the total sales output. It's large and the number of items on it varies depending on the time frame it's looked at. There is a major lack in the system where I cannot get the figures by region. That information is not stored in the system. The records only store the customer's name, the product information, number of units, price, and purchase date. I want to get the total number of each item sold by region so that I can compare item popularity across regions.
There are only about 50 customers, so it is feasible for me to create a separate sheet assigning a region to the customers.
So, I have three sheets:
Sheet 1: Sales
+-----------------------------------------------------+
|Customer Name | Product | Amount | Price | Date |
-------------------------------------------------------
| Joe's Fish | RT-01 | 7 | 5.45 | 2020/5/20 |
-------------------------------------------------------
| Joe's Fish | CB-23 | 17 | 0.55 | 2020/5/20 |
-------------------------------------------------------
| Mack's Bugs | RT-01 | 4 | 4.45 | 2020/4/20 |
-------------------------------------------------------
| Joe's Fish | VX-28 | 1 | 1.20 | 2020/5/13 |
-------------------------------------------------------
| Karen's \/ | RT-01 | 9 | 3.45 | 2020/3/20 |
+-----------------------------------------------------+
Sheet 2: Regions
+----------------------+
| Customer | Region |
------------------------
| Joe's Fish | NA |
------------------------
| Mack's Bugs | NA |
------------------------
| Karen's \/ | EU |
+----------------------+
And my results are going in Sheet 3:
+----------------------+
| | NA | EU |
------------------------
| RT-01 | 11 | 9 |
+----------------------+
So looking at the data I made up for this question, I want to compare the number of RT-01s sold in North America to those sold in Europe. I can do it if I add an INDEX MATCH column to the end of the sales sheet, but I would have to do that every time I update the sales information.
Is there some way to do a SUMIFS like:
SUMIFS(Sheet1!$D:$D,Sheet1!$A:$A,INDEX(Sheet2!$B:$B,MATCH(Sheet1!#Current A#,Sheet2!$A:$A))=Sheet3!$B2,Sheet1!$B:$B,Sheet3!$A3)
?
I think it's difficult to do it with a SUMIFS because the columns you're matching have to be ranges, but you can certainly do it with a SUMPRODUCT and COUNTIFS:
=SUMPRODUCT(Sheet1!$C$2:$C$10*(Sheet1!$B$2:$B$10=$A2)*COUNTIFS(Sheet2!$A$2:$A$5,Sheet1!$A$2:$A$10,Sheet2!$B$2:$B$5,B$1))
I don't recommend using full-column references because it could be slow.
BTW I was assuming that there were no duplicates in Sheet2 for a particular combination of customer and region - if there were, you could use
=SUMPRODUCT(Sheet1!$C$2:$C$10*(Sheet1!$B$2:$B$10=$A2)*
(COUNTIFS(Sheet2!$A$2:$A$5,Sheet1!$A$2:$A$10,Sheet2!$B$2:$B$5,B$1)>0))
EDIT
It is worth using a dynamic version of the formula, though it is not elegant:
=SUM(Sheet1!$C2:INDEX(Sheet1!$C:$C,MATCH(2,1/(Sheet1!$C:$C<>"")))*(Sheet1!$B2:INDEX(Sheet1!$B:$B,MATCH(2,1/(Sheet1!$B:$B<>"")))=$A2)*
(COUNTIFS(Sheet2!$A$2:INDEX(Sheet2!$A:$A,MATCH(2,1/(Sheet2!$A:$A<>""))),Sheet1!$A2:INDEX(Sheet1!$A:$A,MATCH(2,1/(Sheet1!$A:$A<>""))),Sheet2!$B$2:INDEX(Sheet2!$B:$B,MATCH(2,1/(Sheet2!$B:$B<>""))),B$1)>0))
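To check what the SUMPRODUCT/COUNTIFS formulas should return, the same lookup-then-sum logic can be sketched in Python. This is only an illustration using the sample rows from the question, not a replacement for the Excel formula; the function name is made up, and customers missing from the region sheet are assumed to be skipped.

```python
from collections import defaultdict

# Sheet 1 (Sales): customer, product, amount. Price and date are not needed.
sales = [
    ("Joe's Fish", "RT-01", 7),
    ("Joe's Fish", "CB-23", 17),
    ("Mack's Bugs", "RT-01", 4),
    ("Joe's Fish", "VX-28", 1),
    (r"Karen's \/", "RT-01", 9),
]

# Sheet 2 (Regions): customer -> region.
regions = {"Joe's Fish": "NA", "Mack's Bugs": "NA", r"Karen's \/": "EU"}


def units_by_product_and_region(sales, regions):
    totals = defaultdict(int)
    for customer, product, amount in sales:
        region = regions.get(customer)
        if region is not None:  # assumption: unknown customers are ignored
            totals[(product, region)] += amount
    return dict(totals)


totals = units_by_product_and_region(sales, regions)
print(totals[("RT-01", "NA")], totals[("RT-01", "EU")])
```

For RT-01 this gives 11 for NA and 9 for EU, matching the expected Sheet 3 in the question.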
Since the match would need to be made in memory, I don't think it's feasible with Excel formulas; you would have to use a VBA dictionary.
On the other hand, if the number of columns in your sales sheet is fixed, you can format the data as a table and add your INDEX MATCH in column F.
When updating the sales data, delete all rows from row 3 onward and paste the updated values. Excel will automatically apply the INDEX MATCH to all rows.

Creating an MDX measure in Excel to count members of one dimension based on another dimension

I am trying to create an MDX measure in Excel (in OLAP Tools) that will count how many members there are for every other item in another dimension. As I don't know the exact syntax and notation for MDX and OLAP cubes I will try to simply explain what I want to do:
I have a pivot table based on an OLAP Cube. I have a Machine Number field stored in one dimension, that is the "parent" and for every machine number there is a number of articles that were produced (in certain period of time). Those articles are represented by Order Numbers. Those numbers are stored in another dimension. I would like the measure to count how many order numbers there are for every machine number.
So the table looks like this:
+------------------+----------------+
| [Machine Number] | [Order Number] |
+------------------+----------------+
| Machine001 | |
| | 111111111 |
| | 222222222 |
| | 333333333 |
| Machine002 | |
| | 444444444 |
| | 555555555 |
| | 666666666 |
| | 777777777 |
+------------------+----------------+
and I would like the result to be:
+------------------+----------------+------------+
| [Machine Number] | [Order Number] | [Measure1] |
+------------------+----------------+------------+
| Machine001 | | 3 |
| | 111111111 | |
| | 222222222 | |
| | 333333333 | |
| Machine002 | | 4 |
| | 444444444 | |
| | 555555555 | |
| | 666666666 | |
| | 777777777 | |
+------------------+----------------+------------+
I've tried using the COUNT function, with EXISTING as well, but it wouldn't work (it always showed 1, or the same wrong number for every machine). I believe that I have to somehow connect those two dimensions so that the Order Number depends on the Machine Number, but lacking knowledge of MDX and OLAP cubes, I don't even know how to ask Google how to do that.
Thanks in advance for any tips and solutions.
Your problem basically is that you have two attributes in different dimensions. You want to retrieve the valid combinations of these attributes, and then count the number of attribute values available in the second attribute based on the value of the first attribute.
Based on the above problem statement: in an OLAP cube, a fact table or a measure defines the relations between attributes of different dimensions linked to that measure/fact table. Take a look at the example below. (I have used the SSAS sample database Adventure Works.)
-- I am trying to find the promotions that were offered for each product category.
select
[Measures].[Internet Sales Amount]
on columns,
([Product].[Category].[Category],[Promotion].[Promotion].[Promotion])
on rows
from
[Adventure Works]
Result
The result is the cross-product of all the product categories and the promotions. Now let's make the cube return the valid combinations only.
select
[Measures].[Internet Sales Amount]
on columns,
nonempty(
([Product].[Category].[Category],[Promotion].[Promotion].[Promotion])
,[Measures].[Internet Sales Amount])
on rows
from
[Adventure Works]
Result
Now we have indicated that it needs to return only valid combinations. Note that we provided a measure that belongs to the fact connecting the two dimensions. Now let's count them:
with member
[Measures].[test]
as
count(
nonempty(([Product].[Category].currentmember,[Promotion].[Promotion].[Promotion]),[Measures].[Internet Sales Amount])
)
select
[Measures].[Test]
on columns,
[Product].[Category].[Category]
on rows
from
[Adventure Works]
Result
Alternate query
with member
[Measures].[test]
as
{nonempty(([Product].[Category].currentmember,[Promotion].[Promotion].[Promotion]),[Measures].[Internet Sales Amount]) }.count
select
[Measures].[Test]
on columns,
[Product].[Category].[Category]
on rows
from
[Adventure Works]
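The idea behind the nonempty-plus-count queries is that only combinations that actually occur in the fact data are counted. A minimal Python sketch of that idea, applied to the machine and order numbers from the question, could look like this. The fact rows are hypothetical, reconstructed from the question's table, and the function name is made up for this illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: each row links a machine to an order number.
fact_rows = [
    ("Machine001", "111111111"),
    ("Machine001", "222222222"),
    ("Machine001", "333333333"),
    ("Machine002", "444444444"),
    ("Machine002", "555555555"),
    ("Machine002", "666666666"),
    ("Machine002", "777777777"),
]


def orders_per_machine(fact_rows):
    orders = defaultdict(set)
    for machine, order in fact_rows:
        # A set keeps each order once even if it has several fact rows.
        orders[machine].add(order)
    return {machine: len(found) for machine, found in orders.items()}


counts = orders_per_machine(fact_rows)
print(counts)
```

An order number with no fact row for a machine never enters that machine's set, which mirrors what nonempty does with the measure in the queries above.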

Excel: How to sum a column to date from a specified date where date ranges are involved

I'm sure I have seen this done with a short formula, but I am struggling to remember how to do it.
I am trying to find where a date falls in an interval and sum another column, from that point in the interval, to the end of all specified date intervals.
So, in the image below, the intervals are in columns D and E and the date to find is in cell F1, i.e. 12/12/2016.
I want to find where 12/12/2016 falls within the ranges and sum column F accordingly, i.e. 12/12/2016 - 13/12/2016 (2 days) plus all later intervals: 2 + 6 + 2 + 1 + 3 = 14. I am returning this result in cell J14.
I currently use the idea of histograms to calculate this, but the formula is large and unwieldy, and I know I have seen a similar question somewhere on SO (which I can't find) that does this with SUMPRODUCT and OFFSET only. I guess FREQUENCY could also be used.
So what i have currently is:
=SUMPRODUCT(OFFSET(F6,MATCH(TRUE,OFFSET(E6,0,0,COUNT(E6:E1048576),)>F1,0)-1,0,COUNTA(F6:F1048576),))-(F1-OFFSET(D6,MATCH(TRUE,OFFSET(E6,0,0,COUNT(E6:E1048576),)>F1,0)-1,,,))
Broken down into stages, I first find which bucket (range) contains the target value:
={MATCH(TRUE,OFFSET(E6,0,0,COUNT(E6:E1048576),)>F1,0)}
I calculate the distance into the range with:
=F1-OFFSET(D6,H13-1,,,)
And I sum from this point until the end of the range with:
=SUMPRODUCT(OFFSET(F6,H13-1,0,COUNTA(F6:F1048576),))-I13
So, can anyone help me with a shorter, more efficient formula please?
Data:
| Start of measurement | 12/12/2016 | |
|----------------------|------------|----------------|
| | | |
| | | |
| | | |
| From | To | Number of days |
| 13/11/2016 | 17/11/2016 | 5 |
| 10/12/2016 | 13/12/2016 | 4 |
| 03/02/2017 | 08/02/2017 | 6 |
| 06/12/2017 | 07/12/2017 | 2 |
| 09/12/2017 | 09/12/2017 | 1 |
| 12/12/2017 | 14/12/2017 | 3 |
This should work for you:
=SUMIF(E6:E11,">="&F1,F6:F11)-F1+INDEX(D6:D11,MATCH(F1,D6:D11))
How about this?
=SUMIFS(F6:F11,D6:D11,">="&F1)+MINIFS(E6:E11,E6:E11,">="&F1)-F1+1
The SUMIFS() gives the 6+2+1+3 part and the MINIFS() - F1 + 1 gives the 2 part.
Note that the ___IFS() functions are more recent and not available in older versions of Excel.
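To check the expected total of 14 independently of either formula, the calculation can be restated directly in Python: for every interval that ends on or after the start date, count the days from the later of the interval start and the start date to the interval end, inclusive. This is a sketch using the data from the question (dates read as dd/mm/yyyy), not a translation of the Excel formulas; the function name is made up for this illustration.

```python
from datetime import date

# (From, To) pairs from the question's data.
intervals = [
    (date(2016, 11, 13), date(2016, 11, 17)),
    (date(2016, 12, 10), date(2016, 12, 13)),
    (date(2017, 2, 3), date(2017, 2, 8)),
    (date(2017, 12, 6), date(2017, 12, 7)),
    (date(2017, 12, 9), date(2017, 12, 9)),
    (date(2017, 12, 12), date(2017, 12, 14)),
]


def days_from(start, intervals):
    total = 0
    for first, last in intervals:
        if last >= start:
            # Count inclusive days, clipping the interval at the start date.
            total += (last - max(first, start)).days + 1
    return total


print(days_from(date(2016, 12, 12), intervals))
```

With the start date 12/12/2016 the second interval contributes 2 days and the later ones 6, 2, 1 and 3.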

CSV file always picks the lowest price

Hi, I am using Magento with an extension called Matrix Rates, which uses .CSV files to store the delivery charge for each postcode. The problem is that it picks the smallest delivery charge for every single postcode, when it should first look for a rate for the postcode and only fall back to the flat rate if there isn't one. I was just wondering if anybody has any ideas.
My Excel sheet looks like this:
Country | Region/State | City | Zip/Postal Code From |Zip/Postal Code To | Weight From | Weight To | Shipping Price | Delivery Type
GBR | * | - | - | 0 | 9999 | 9999 | Websales Shipping (WS01)
GBR | * | - | - | AB37% | 0 | 9999 | 65 | Websales Shipping (WS01)
Without more details, your pseudo-code (or, pseudo-formula) could look like this:
=if(
vlookup([postCode], [PostCodeShipRateTable], [rateColumn]) >0,
[postcodeRate],
[flatRate]
)
[flatRate] could also be a vlookup to another rate table if necessary.
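The rule described in the question (use the postcode rate if one exists, otherwise the flat rate) can be sketched in Python as follows. This is not Magento or Matrix Rates code. It assumes that a pattern such as AB37% means "postcodes starting with AB37"; the rate of 65 comes from the CSV excerpt, while the flat rate value and all names are hypothetical.

```python
FLAT_RATE = 10.0  # hypothetical value; the flat rate is not clear from the excerpt

# Postcode prefix -> shipping price (AB37 -> 65 is from the CSV excerpt).
POSTCODE_RATES = {"AB37": 65.0}


def shipping_price(postcode, postcode_rates=POSTCODE_RATES, flat_rate=FLAT_RATE):
    normalized = postcode.replace(" ", "").upper()
    # Collect every prefix that matches and prefer the most specific one.
    matches = [prefix for prefix in postcode_rates if normalized.startswith(prefix)]
    if matches:
        return postcode_rates[max(matches, key=len)]
    # No postcode-specific rate: fall back to the flat rate.
    return flat_rate


print(shipping_price("AB37 9XY"))
print(shipping_price("SW1A 1AA"))
```

The point of the sketch is the order of the checks: the specific rate is looked up first, and the flat rate is used only when nothing matches, rather than taking the minimum of all applicable rows.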
