Select last row from CSV in Azure Data Factory

I'm pulling in a small (less than 100 KB) dataset as CSV. All I want to do is select the last row of that data and sink it into a different location.
I cannot seem to find a simple way to do this.
I have tried a Wrangling Data Flow, but the "keep rows" M function is not supported: you can select it, but it just results in an error. That's annoying, because it does exactly what I need in one fell swoop.
I can sort of get it working by using a last() function on each field, but that is a lot of messing around and it's slow.
Surely there is a better way to do this simple task?
Would greatly appreciate any assistance.
Thanks

Mapping Data Flows: add a Surrogate Key column, take the Aggregate (max) of that key, then Filter to the single row whose key equals that max.
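Purely for illustration, here is the same three-step pattern expressed in pandas rather than ADF data flow syntax (file names are hypothetical): add an incrementing key, take its max, and keep only the row where the key equals that max.

```python
import pandas as pd

df = pd.read_csv("input.csv")              # hypothetical source file
df["row_key"] = range(1, len(df) + 1)      # Surrogate Key: incrementing row number
max_key = df["row_key"].max()              # Aggregate: max of the surrogate key
last_row = df[df["row_key"] == max_key]    # Filter: keep only the max-key row
last_row.drop(columns="row_key").to_csv("output.csv", index=False)  # sink
```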

Related

Assigning indexes across rows within a Python DataFrame

I'm currently trying to assign a unique index across rows, rather than along columns. The main problem is that these values can never repeat, and they must be preserved with every monthly report that I run.
I've thought about merging the columns and assigning an index to that, but my concern is that I won't be able to easily modify the DataFrame and still preserve the same index values for each cell with this method.
I'm expecting my df to look something like this below:
Sample DataFrame
I haven't yet found a viable solution so haven't got any code to show yet. Any solutions would be much appreciated. Thank you.
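A hedged sketch of one common way to keep ids stable across monthly runs (all file and column names below are hypothetical): persist the key-to-id mapping between runs, reuse existing ids for rows already seen, and only mint new ids for rows that are new this month.

```python
import os
import pandas as pd

# Hypothetical: key_cols uniquely identify a row across reports,
# and row_ids.csv persists the key -> id mapping between runs.
key_cols = ["account", "category"]
id_file = "row_ids.csv"

df = pd.read_csv("monthly_report.csv")            # hypothetical monthly report

if os.path.exists(id_file):
    ids = pd.read_csv(id_file)
    df = df.merge(ids, on=key_cols, how="left")   # reuse existing ids
    next_id = int(ids["row_id"].max()) + 1
else:
    df["row_id"] = pd.NA                          # first run: no ids yet
    next_id = 1

new = df["row_id"].isna()
df.loc[new, "row_id"] = list(range(next_id, next_id + int(new.sum())))
df["row_id"] = df["row_id"].astype(int)

# Save the mapping so the same rows keep the same ids next month
df[key_cols + ["row_id"]].drop_duplicates().to_csv(id_file, index=False)
```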

Building timeline using Excel Table?

I'm trying to build a kind of timeline based on a table in Excel. The table has several columns that represent due dates. Column A is the key/identifier, Column B is a company name, and Columns C through H are the task due dates.
My goal is to find a way to set up a second table that automatically puts the items in order of which key is due when. I've included an image of the table and of what I'm hoping the end result would be. I haven't been able to find anything that does this. I was thinking maybe a pivot table, but it's not doing what I want.
I'm not even sure if this is possible or not but any help or push in the right direction would be GREATLY appreciated!
Thanks!!
-Deke
You can easily do this with Power Query. Just load the table, sort on date, and select which columns you want.
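The answer names Power Query; purely as a hedged illustration of the reshaping idea in pandas (Power Query's equivalent would be an Unpivot Columns step followed by a sort), with hypothetical column names, the due-date columns can be unpivoted into one row per task and then sorted by date:

```python
import pandas as pd

# Hypothetical layout mirroring the table described above:
# Key, Company, then one column per task holding its due date.
df = pd.read_excel("tasks.xlsx")

timeline = (
    df.melt(id_vars=["Key", "Company"], var_name="Task", value_name="DueDate")
      .sort_values("DueDate")            # order items by which key is due when
      .reset_index(drop=True)
)
print(timeline)
```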

Tableau Changing Data When a Connection is Made

I'm trying to analyze a set of data across 3 excel tables in Tableau.
When I just use one table (transactions), the data is correct and works normally; however, when I make a connection, it seems like Tableau takes the originally correct data and deletes/duplicates entries seemingly at random (I've looked for patterns to see what the problem is, but can't find any).
Here are some screenshots to illustrate the problem. I'm using inner joins. The tables I'm joining should have no impact on the data that I'm currently talking about.
This is a large data set (> 400,000 rows).
Correct Data:
From Excel, ordered by date
Tableau's Data Set (notice the missing entries (PK field) and incorrect order despite order by date)
Looking at PK and Distribution Amount
This is my first time using Stack Exchange, so I apologize if I breached any kind of etiquette requirements! Any suggestions as to how to correct the data set Tableau is detecting would be extremely helpful!
EDIT: Also, going deeper into the bad data, it seems as though even when sorted chronologically the data in Tableau is not in the correct order.

Can you nest Excel data tables?

I have an Excel workbook that utilises a data table (A).
I now want to create another data table (B) that effectively sits on top of the other data table. That is, each "iteration" of B calls A.
This approach fails, although I cannot find any documentation about data tables that indicates it would not work.
Basically I'd like to know if anyone has tried this before and whether I am missing something?
Is there a workaround? Do you know of any documentation that spells out whether and why this is not supported?
No.
I tried this at length some years ago, in both xl03 and xl07, and my conclusion was that it can't be done: each data table seems to be an independent one-off run, and they don't talk to each other if you try to link them.
I couldn't find any documentation on this issue either, whether about the process itself or from anyone else looking at a similar problem.
I want to share my experience using data tables.
We found a workaround for this problem.
Suppose you have two variables, A & B, that need to run through a data table to get one or more results.
What we did was:
Enumerate every combination of A & B (a binary combination) and assign an id to each combination (e.g. A=0 & B=0 => id=1).
You then run a single data table with a length of A*B.
The drawback here is the time it takes to calculate the data (about 7 minutes with 25 data tables & 2 data tables with a length of 8,000 rows).
Hope it helps!
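A hedged sketch of the enumeration step this answer describes, in Python (the values and names are hypothetical): build one row per (A, B) combination with its own id, which is what the single flattened data table then iterates over.

```python
from itertools import product

# Hypothetical input ranges for the two variables the answer calls A and B
a_values = [0, 1, 2]
b_values = [0, 1]

# One row per (A, B) combination, each with its own id; the single
# flattened data table then has len(a_values) * len(b_values) rows.
combos = [
    {"id": i + 1, "A": a, "B": b}
    for i, (a, b) in enumerate(product(a_values, b_values))
]
for row in combos:
    print(row)   # e.g. {'id': 1, 'A': 0, 'B': 0}
```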

What's a better counting algorithm for Azure Table Storage log data?

I'm using Windows Azure and venturing into Azure Table Storage for the first time in order to make my application scale to high-density traffic loads.
My goal is simple: log every incoming request against a set of parameters, and for reporting, count or sum the data from the log. I have come up with two options, and I'd like to know which one more experienced people think is better.
Option 1: Use Boolean Values and Count the "True" rows
Because each row is written once and never updated, store each count parameter as a bool; then, in the summation thread, pull the rows in a query and count each set of true values to get the totals for each parameter.
This would save space if there are a lot of parameters, because I imagine Azure Tables stores a bool as a single-bit value.
Option 2: Use Int Values and Sum the rows
Each row is written as above, but instead each parameter column is stored as a value of 0 or 1. Summation would occur by querying all of the rows and using a Sum operation on each column. This would be quicker because the summation could happen in a single query, but am I losing something by storing 32-bit integers for what is really a Boolean value?
I think at this point for query speed, Option 2 is best, but I want to ask out loud to get opinions on the storage and retrieval aspect because I don't know Azure Tables that well (and I'm hoping this helps other people down the road).
Table storage doesn't do aggregation server-side, so for both options, you'd end up pulling all the rows (with all their properties) locally and counting/summing. That makes them both equally terrible for performance. :-)
I think you're better off keeping a running total instead of re-summing everything every time. We talked about a few patterns for that on Cloud Cover Episode 43: http://channel9.msdn.com/Shows/Cloud+Cover/Cloud-Cover-Episode-43-Scalable-Counters-with-Windows-Azure
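As a hedged sketch only (using the current azure-data-tables Python SDK rather than the SDK of that era, with hypothetical table and entity names): keep one counter entity per parameter and increment it as requests arrive, so reporting reads a handful of small entities instead of scanning every log row. A production version would also need ETag-based retries for concurrent writers.

```python
from azure.core.exceptions import ResourceNotFoundError
from azure.data.tables import TableClient, UpdateMode

# Hypothetical connection string and table name
conn_str = "<storage-connection-string>"
table = TableClient.from_connection_string(conn_str, table_name="counters")

def increment_counter(parameter: str, by: int = 1) -> None:
    """Read-modify-write a running total for one logged parameter."""
    try:
        entity = table.get_entity(partition_key="requests", row_key=parameter)
    except ResourceNotFoundError:
        entity = {"PartitionKey": "requests", "RowKey": parameter, "Count": 0}
    entity["Count"] = entity.get("Count", 0) + by
    table.upsert_entity(entity, mode=UpdateMode.REPLACE)

increment_counter("param_a")   # called from the request-logging path
```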
