I have a column of data [Sales ID] that bringing in duplicate data for an analysis. My goal is to try and limit the data to pull unique sales ID's for the max day of every month in the analysis only (instead of daily). Im basically trying to get it to only pull in unique sales ID values for the last the day of every month in the analysis ,and if the current day is the last day so far then it should pull that in. So it should pull in the MAX date in any given month. Please how do i write an expresion with the [Sales ID] column and [Date ] column to acieve this?
Probably the two easiest options are to
1) Adjust the SQL as niko mentioned
2) Limit the visualization with the "Limit Data Using Expression" option, using the following:
Rank(Day([DATE]), "desc", Month([DATE]), Year([DATE])) = 1
If you had to do it in the Data on Demand section (maybe the IL itself is a usp or you don't have permission to edit it), my preference would be to create another data table that only has the max dates for each month, and then filter your first data table by that.
However, if you really need to do it in the Data on Demand section, then I'm guessing you don't have the ability to create your own information links. This would mean you can't key off additional data tables, and you're probably going to have to get creative.
Constraints of creativity include needing to know the "rules" of your data -- are you pulling the data in daily? Once a week? Do you have today's data, or today - 2? You could probably write a python script to grab the last day of every month for the last 10 years, and then whatever yesterday's date was, and throw all those values into a document property. This would allow you to do a "Values from Property".
(Side Note: I want to say you could also do it directly in the expression portion with something like an extremely long
Date(DateTimeNow()),DateAdd("dd",-1,Date(Year(DateTimeNow()), Month(DateTimeNow()), 1))
But Spotfire is refusing to accept that as multiple values. Interestingly, when I pull the logic for a StringList property, it gives this: $map("${udDates}", ","), which suggests commas are an accurate methodology, but I get an error reading "Expected 'End of expression' but found ','" . Uncertain if this is a Spotfire issue, or related to my database connection)
tl;dr -- Doing it in the Data on Demand section is probably convoluted. Recommend adjusting in SQL if possible, and otherwise limiting in the visualization
Related
I have been working on this for several days now and am having no luck in figuring it out, but feel like I am very close.
In Excel, I am trying to create a reverse running total for a budget by referencing a cell which can be modified or updated.
The only way I have been able to make this work so far is by hard-wiring the starting amount in manually. this is what I want, except every time we receive money I have to change the starting amount manually again.
= Excel.CurrentWorkbook(){[Name="Q4Rec"]}[Content][QT4_Received]{0}
I used this to break down one of my input tables into queries that result in singular numeric values. (The table sums all of the revenue received into annual and quarter) How do I how to reference the new single-quantity queries in my previously built queries to conduct the transformation. I keep getting null values.
here's my input and my desidered output. Problem Is: I have a columns for product, price, and date and I want to build a cross table where I can see the price of the day before for every product.
How can I calculate column d-1 (day - 1) in my data table? Should I use Intersect/Over functions? What If I want to calculate my d-1 column on-the-fly in a cross table?
You have some options here, depending on how you want to display everything.
If all you want is an overall snapshot of how it changed between Today and Yesterday, probably the simplest thing to do is the following:
Avg(If([DATE]=Date(DateTimeNow()),[Price],NULL)) as [Today's Price],
Avg(If([DATE]=Date(DateAdd("day",-1,DateTimeNow())),[Price],NULL)) as [Yesterday's Price],
Avg(If([DATE]=Date(DateTimeNow()),[Price],NULL)) - Avg(If([DATE]=Date(DateAdd("day",-1,DateTimeNow())),[Price],NULL)) as [d-1]
This can be very useful for a dashboard or summary, and if you replace the "DateTimeNow" component with a Document Property controlled by an Input field, this can quickly show your user the change on a specific day.
The Over functions could still be quite useful, especially if you're using a date hierarchy of some form.
It will update on the fly to accommodate the columns grouped across the top, so if you're displaying by date, you can see how it's been changing each day, and if you're displaying by month, you can see it that way instead.
Avg([Price]) as [Avg Price],
Avg([Price]) - Avg([Price]) OVER (PreviousPeriod([Axis.Columns])) as [Avg Price Change]
I've a query regarding some tweaking my Hive query in the requirement defined below; couldn't get my head around on this.
Case: The data gets generated only on business days i.e., weekdays & non-holidays dates. This data I load in Hive. The source & target, both are HDFS.
Stringent process: The data should be replicated for every day. So, for Saturday & Sunday, I'll copy the same data of Friday. Same is the case for public holidays.
Current process: As of now I'm executing it manually to load weekends' data.
Requirement: I need to automate this in the query itself.
Any suggestions? A solution in spark for the same is also welcome if feasible.
Though clear what the issue is, it is unclear when you say " in the query itself".
Two options
When querying results, look for data using a scalar sub query (using Impala) that looks first for the max date relative to a given select date i.e. max less than or dqual to given seldct date; thus no replication.
Otherwise use scheduling and when scheduled a) check date for weekend via Linux or via SQL b) maintain a table of holiday dates and check for existence. If either or both of the conditions are true, then copy from the existing data as per bullet 1 whereby select date is today, else do your regular processing.
Note you may need to assume that you are running processing to catch up due to some error. Implies some control logic but is more robust.
Apologies if this has been asked before. I would be surprised if it hasn't but I am just not hitting the correct syntax to search and get the answer.
I have a table of raw data for my staff, it contains data on the name of the employee who completed a job and the start and finish times, among other things. I have no unique ID's other than name, and I cant change that as I'm part of a large organisation and I have to make do with the data I'm given.
what I would like to do it present a table (Table 2) that shows the name of the employee and then takes the start/finish times for all of their jobs on table 1 and presents the average time taken across all of their jobs.
I have used Vlookup in the past but I'm not sure it will cut it here. the raw data table contains approx 6000 jobs each month.
On table 1 i work out the time taken for each job with this formula;
=IF(V6>R6,V6-R6,24-R6+V6) (R= started Time) (V= Completed Time) in 24hr clock.
I have gone this route as some jobs are started before midnight and completed afterwards. Although my raw data also contains dates (started/completed) in separate columns so I am open to an experts feedback on this and if there is a better way to work out the total time form start to completion.
I believe the easiest way to tackle this would be with a Pivot Table. Calculate the time taken for each Name and Job combination in Table 1; create a pivot table with the Name in the Row Labels and the Time in the Values -- change the Time Values to be an average instead of a sum:
Alternatively, you could create a unique list of names, perhaps with Data > Remove Duplicates and then use an =AVERAGEIF formula:
Thanks this give me the thread to pull on, I have unique names as its the persons full name, but ill try pivot tables to hopefully make it a little more future proof for other things to be reports on later.
In the past I have created websites that extract data from a database and format it using tables.
Now, I am trying to do the same thing but with Excel, and I'm lost. I am used to using SQL commands to extract data from given fields and then sort/manipulate it.
Currently, I am able to print a report that provides me with an Excel spreadsheet full of raw data, but I would like to make my life easier and organize it into a report.
The column that I would like to reference contains duplicates, but the data in the adjacent columns is different.
To give an example, assume I had a spreadsheet of sales transactions. One column would be the Customer ID, and the adjacent columns would contain the quantity, the cost per unit, total cost, order ID, etc.
What I would want to do in this case would be to select all the transactions with the same Customer ID and add them together based on their Order ID. Then, I would want to print the result to a second sheet.
I realize that I can use built-in functions to accomplish this, but I would also like to format this report evenually using VBA. Also, since I will have a variable number of rows that differ from one report to the next, I haven't encountered a fucnction that will allow you to add rows.
I'm assuming this must be done with VBA.
Well you can do it manually, but it would take ages. So VBA would be good, particularly as you would be able to generate future reports quickly.
My interpretation of what your saying is that each row in your report will be the total for one customer ID. If it's something else, I imagine the below will still be mostly relevant.
I think it would be a bit much to give you the full answer, particularly as you haven't provided full detail but to take a stab at what you'd do:
Create your empty report page, whether it be a new worksheet or a new workbook
Loop through the table (probably using While next is not empty)
a. Identifying if a row is for a customer ID you haven't covered yet
i. If so then add a new entry in your report
ii. Else add it to the existing customer ID record (loop through until you find it)
Format your report so it looks pretty, e.g:
a. Fill the background in white
b. Throw in some filled bars
c. Put in good titles and totals etc.
For part 1, it might be better building an array first and then dumping the contents into the report. It depends how process intensive it will be - if very intense, an array should shave off time.