SQL aggregating a text field based on latest date text value


I have the following script:
DROP TABLE IF EXISTS [dbo].[test]
CREATE TABLE [dbo].[test]
(
[Name] [varchar](50) NULL,
[Amount] [int] NULL
)
GO
INSERT INTO [dbo].[test]
(
[Name]
,[Amount]
)
VALUES
('Abc - 20 april to 7 june 2020',25)
,('Abc - 20 april to 7 june 2020',33)
,('Abc - 20 april to 29 june 2020',15)
,('Abc - 20 april to 29 june 2020',55)
,('Abc - 20 april to 10 may 2020',20)
,('Abc - 20 april to 10 may 2020',75)
,('Abc - 20 april to 10 may 2020',89)
GO
SELECT *
FROM [dbo].[test]
Selecting from the table gives the following results:
Name | Amount
-------------------------------|-------
Abc - 20 april to 7 june 2020 | 25
Abc - 20 april to 7 june 2020 | 33
Abc - 20 april to 29 june 2020 | 15
Abc - 20 april to 29 june 2020 | 55
Abc - 20 april to 10 may 2020 | 20
Abc - 20 april to 10 may 2020 | 75
Abc - 20 april to 10 may 2020 | 89
I would like to determine the most recent end date in the text field and remove the repetition by grouping the rows, showing the name with the latest end date together with the summed amount. The result should be a single record with the following data:
Name | Amount
-------------------------------|-------
Abc - 20 april to 29 june 2020 | 312
I've played around with some GROUP BY clauses and text-manipulation functions, and this is as far as I got using the following code:
select (case when [Name] like '%-%'
then trim(left([Name], charindex('-', reverse([Name])) - 1))
else [Name]
end) as [Name]
,sum(Amount) as Amount
from [dbo].[test]
group by
(case when [Name] like '%-%'
then trim(left([Name], charindex('-', reverse([Name])) - 1))
else [Name]
end)
The code above only aggregates rows whose extracted text is identical. I am looking for something smarter that recognizes that all of my records really refer to the same thing, determines that the latest end date in the Name field is 29 june 2020, and displays only that single record with the total Amount.
Any help would be much appreciated.
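The requirement boils down to three steps: pull the end date out of Name, find the latest one per group, and sum Amount over the whole group. Here is a minimal sketch of that logic in Python rather than T-SQL, assuming every Name looks like `<prefix> - <start> to <day> <month name> <year>` and that rows belong together when they share the text before ` - ` (that grouping rule is an assumption; the sample has only one group). `end_date` and `collapse` are hypothetical helper names. In T-SQL the same idea would typically mean converting the end-date text to a date and using a window function or subquery to pick the row with the latest one.

```python
from collections import defaultdict
from datetime import datetime

rows = [
    ("Abc - 20 april to 7 june 2020", 25),
    ("Abc - 20 april to 7 june 2020", 33),
    ("Abc - 20 april to 29 june 2020", 15),
    ("Abc - 20 april to 29 june 2020", 55),
    ("Abc - 20 april to 10 may 2020", 20),
    ("Abc - 20 april to 10 may 2020", 75),
    ("Abc - 20 april to 10 may 2020", 89),
]

def end_date(name):
    """Parse the text after the last ' to ' as a date, e.g. '29 june 2020'."""
    end_text = name.rsplit(" to ", 1)[1].strip()
    # %B matches full English month names case-insensitively (default C locale assumed).
    return datetime.strptime(end_text, "%d %B %Y")

def collapse(rows):
    """Group by the prefix before ' - ', keep the name with the latest end date, sum Amount."""
    groups = defaultdict(list)
    for name, amount in rows:
        prefix = name.split(" - ", 1)[0].strip()  # assumed grouping key, e.g. "Abc"
        groups[prefix].append((name, amount))
    result = []
    for members in groups.values():
        latest_name = max((name for name, _ in members), key=end_date)
        result.append((latest_name, sum(amount for _, amount in members)))
    return result

print(collapse(rows))
# [('Abc - 20 april to 29 june 2020', 312)]
```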

Related

How to take the words I need from a string?

I have many strings like these.
Roliffe (Day) - Thursday, 15 June 2019
Tadcorp Pk Munangle (Day) - Tuesday, 10 July 2019
Gecester Park (Night) - Friday, 26 June 2019
I need to extract the names, for example Roliffe, Tadcorp Pk Munangle, Gecester Park,
and the dates, 15 June 2019, 10 July 2019, 26 June 2019.
How can I do this?

I would use regular expressions like this:
import re
string = """Roliffe (Day) - Thursday, 15 June 2019
Tadcorp Pk Munangle (Day) - Tuesday, 10 July 2019
Gecester Park (Night) - Friday, 26 June 2019"""
places = re.findall(r'([\w ]*) \(.*\)', string)
dates = re.findall(r'\d{2} \w* \d{4}', string)
print(', '.join(places))
print(', '.join(dates))
Output
Roliffe, Tadcorp Pk Munangle, Gecester Park
15 June 2019, 10 July 2019, 26 June 2019
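The regex answer above returns names and dates as two separate lists. If each name should stay paired with its own date, one option is to match a whole line at a time. This is a sketch assuming every line has the shape `<name> (<tag>) - <weekday>, <day> <month> <year>`; `\d{1,2}` also accepts single-digit days such as "5 June 2019", which the `\d{2}` pattern above would miss.

```python
import re

text = """Roliffe (Day) - Thursday, 15 June 2019
Tadcorp Pk Munangle (Day) - Tuesday, 10 July 2019
Gecester Park (Night) - Friday, 26 June 2019"""

# One match per line: lazy name, a parenthesised tag, " - Weekday, ", then the date.
LINE = re.compile(
    r"^(?P<name>.+?) \([^)]*\) - \w+, (?P<date>\d{1,2} \w+ \d{4})$",
    re.MULTILINE,
)

pairs = [(m["name"], m["date"]) for m in LINE.finditer(text)]
print(pairs)
```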

If the data always follows the same pattern, this will work, although it is not the most efficient approach:
const s = 'Roliffe (Day) - Thursday, 15 June 2019';
const firstSplit = s.split('(');
const name = firstSplit[0].trim();                // "Roliffe"
const date = firstSplit[1].split(',')[1].trim();  // "15 June 2019"

Count rows in multiple columns left to right until specific criteria is met

I have the table below. I will be referencing a specific number that is derived from other information; let's say the number is 30. I first need to count 30 values down my September list, then move on to October and then November until the count reaches 30. Then I need to count all the missing values up to the point where the next value would reach the 30th count from the previous step. So in this example the 30th value would be November 19th. The count of the missing values should be 55, at November 15th (if I counted that right). That value would then be stored in a cell.
I obtained the missed days with the following formula: =IFERROR(SMALL(IF(ISERROR(MATCH(ROW(L$1:INDEX(L:L,N$2)),M$2:INDEX(M:M,COUNT(M:M)+ROW(M$1)),0)),ROW(L$1:INDEX(L:L,N$2))),ROW()-ROW(L$1)),"") (see table 2 for column reference)
The max column will be blank if there is no data in the month column, so the missed column will also have no data. I set that up with the following formula:
=IF(COUNTA(M:M)>1,31,"") (see table 2 for column reference)
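For reference, the two formulas above work out, for each month, which day numbers are absent from the month column, and leave the cells blank when the month has no data. A minimal Python equivalent, assuming the month's values are already in a list; `missed_days` is a hypothetical helper, it uses the real month length from `calendar.monthrange` rather than the fixed 31 in the max formula, and the year 2018 in the example is an assumption.

```python
import calendar

def missed_days(year, month, present):
    """Day numbers of the month that do not appear in `present`; empty if `present` is empty."""
    if not present:
        return []  # mirrors the blank max/missed cells when the month column has no data
    last_day = calendar.monthrange(year, month)[1]  # 30 for September, 31 for October
    present_set = set(present)
    return [d for d in range(1, last_day + 1) if d not in present_set]

september = [1, 2, 3, 5, 8, 9, 10, 15, 22, 23, 24, 25, 29]
print(missed_days(2018, 9, september))
# [4, 6, 7, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 26, 27, 28, 30]
```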
Table 1
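September | max | missed | October | max | missed | November | max | missed
----------|-----|--------|---------|-----|--------|----------|-----|-------
1 | 30 | 4 | 1 | 31 | 2 | 2 | 30 | 1
2 |  | 6 | 3 |  | 6 | 7 |  | 3
3 |  | 7 | 4 |  | 7 | 9 |  | 4
5 |  | 11 | 5 |  | 8 | 10 |  | 5
8 |  | 12 | 12 |  | 9 | 11 |  | 6
9 |  | 13 | 15 |  | 10 | 16 |  | 8
10 |  | 14 | 20 |  | 11 | 17 |  | 12
15 |  | 16 | 28 |  | 13 | 18 |  | 13
22 |  | 17 | 30 |  | 14 | 19 |  | 14
23 |  | 18 | 31 |  | 16 | 20 |  | 15
24 |  | 19 |  |  | 17 | 22 |  | 21
25 |  | 20 |  |  | 18 | 27 |  | 23
29 |  | 21 |  |  | 19 | 28 |  | 24
 |  | 26 |  |  | 21 |  |  | 25
 |  | 27 |  |  | 22 |  |  | 26
 |  | 28 |  |  | 23 |  |  | 29
 |  | 30 |  |  | 24 |  |  | 30
 |  |  |  |  | 25 |  |  | 
 |  |  |  |  | 26 |  |  | 
 |  |  |  |  | 27 |  |  | 
 |  |  |  |  | 29 |  |  | 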
Table 2
L | M | N | O
--|---|---|--
(blank) | September | max | missed
I have an idea of how I would write this, but do not know the syntax:
x = Select(Range("G8").Value)
'value that holds specific value (30 for above example)
If x < 31 Then
'30 days in September
y = Count(M2:M32) Until = x
'values in September
z = Count(O2:O32) Until = value of y - 1
'What if the last value is the 30th of September, how would you stop on August 31st?
Range("A1").Value = z
'value of z is stored in cell A1
Elseif x < 62 Then
'61 days in September and October
y2 = Count(M2:M32) & Count(Q2:Q32) Until = x
'Values in September and October
z2 = Count(R2:R32) & Count(S2:S32) Until = value of y2 - 1
'Again, if the last value is the 31st of October, how would you stop on September 30th?
Range("A1").Value = z2
'Value of z2 is stored in cell A1
Elseif
'continue for each month (12 times)
End If
There are a couple of things I just thought of that could cause problems with my suggestion. How would I dictate my starting month? Let's say I reference a specific cell that contains the number 4. Then I would want to start in April, even if I had data in March. Another way to think about this is that March is in 2019 and April is in 2018. So how could I get the code to jump from December back to January? Say column Z is December and column A is January. I wouldn't necessarily want my code to read only left to right; it would need to start at the month referenced by another cell and then wrap back to the start when the year changes.
I apologize for the length, but that's my best effort at explaining. Let me know if you have any questions or if I can provide more examples, pictures, etc.

I think you should reorganize your data table into something like this:
Day Status
01.09.2018 ok
02.09.2018 ok
03.09.2018 ok
04.09.2018 missed
05.09.2018 ok
06.09.2018 missed
07.09.2018 missed
08.09.2018 ok
09.09.2018 ok
10.09.2018 ok
11.09.2018 missed
12.09.2018 missed
13.09.2018 missed
14.09.2018 missed
15.09.2018 ok
16.09.2018 missed
17.09.2018 missed
18.09.2018 missed
19.09.2018 missed
20.09.2018 missed
21.09.2018 missed
22.09.2018 ok
23.09.2018 ok
24.09.2018 ok
25.09.2018 ok
26.09.2018 missed
27.09.2018 missed
28.09.2018 missed
29.09.2018 ok
30.09.2018 missed
01.10.2018 ok
02.10.2018 ok
03.10.2018 ok
04.10.2018 ok
05.10.2018 ok
06.10.2018 ok
07.10.2018 ok
08.10.2018 ok
09.10.2018 ok
10.10.2018 ok
11.10.2018 ok
12.10.2018 ok
13.10.2018 ok
14.10.2018 ok
15.10.2018 ok
16.10.2018 ok
17.10.2018 ok
18.10.2018 ok
19.10.2018 ok
20.10.2018 ok
21.10.2018 ok
22.10.2018 ok
23.10.2018 ok
24.10.2018 ok
25.10.2018 ok
26.10.2018 ok
27.10.2018 ok
28.10.2018 ok
29.10.2018 ok
30.10.2018 ok
31.10.2018 missed
After that, you can easily manage your counts, find anything you want via filtering, specify a starting date, and so on.
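As an illustration of why the Day/Status layout simplifies the counting, here is a minimal Python sketch. It assumes the rows are sorted by full date (so a start in April 2018 naturally runs past December into January 2019 with no wrap-around logic) and one particular reading of the question: find the date of the n-th "ok" day on or after a start date and count the "missed" days before it. The function name `nth_ok_and_missed` and the year 2018 for the Table 1 values are assumptions.

```python
from datetime import date, timedelta

def nth_ok_and_missed(days, n, start):
    """Return (date of the n-th 'ok' day on/after start, number of 'missed' days before it).

    days: (date, status) pairs sorted by date; status is "ok" or "missed".
    Returns None if fewer than n 'ok' days exist after start.
    """
    ok_seen = 0
    missed_seen = 0
    for day, status in days:
        if day < start:
            continue  # ignore anything before the chosen starting date
        if status == "ok":
            ok_seen += 1
            if ok_seen == n:
                return day, missed_seen
        else:
            missed_seen += 1
    return None

# Day numbers present in each month, taken from Table 1 (year assumed to be 2018).
ok_days = {
    9: [1, 2, 3, 5, 8, 9, 10, 15, 22, 23, 24, 25, 29],
    10: [1, 3, 4, 5, 12, 15, 20, 28, 30, 31],
    11: [2, 7, 9, 10, 11, 16, 17, 18, 19, 20, 22, 27, 28],
}
days = []
d = date(2018, 9, 1)
while d <= date(2018, 11, 30):
    days.append((d, "ok" if d.day in ok_days[d.month] else "missed"))
    d += timedelta(days=1)

print(nth_ok_and_missed(days, 30, date(2018, 9, 1)))
# (datetime.date(2018, 11, 17), 48)
```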

Conditional Cumulative Sum based on multiple columns

I have a simple inventory table in Excel that looks like this:
Number of Items | Date Incoming | Date Out
----------------|---------------|------------
10 | 1 Jan 2018 | 30 Jan 2018
30 | 15 Jan 2018 | 1 May 2018
20 | 1 Feb 2018 | 15 Mar 2018
I would like something that gives me the total number of items present in the inventory at each date, that is:
1 Jan 2018 | 10
15 Jan 2018 | 40
30 Jan 2018 | 30
1 Feb 2018 | 50
15 Mar 2018 | 30
1 May 2018 | 0
What I was thinking of is some sort of cumulative sum where the number of items is added on "Date Incoming" and subtracted on "Date Out".
Can you help me? I would prefer to avoid macros, but even a VBA solution is fine.

For a given date, you can do:
=sumif(#DateIn, "<="&#CellWithGivenDate, #NumberOfItems) - sumif(#DateOut, "<="&#CellWithGivenDate, #NumberOfItems)
With #NumberOfItems, #DateIn, and #DateOut being columns 1 to 3 of your sample, and #CellWithGivenDate being the relevant cell in column 1 of your expected result sample.
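The SUMIF formula computes, for a given date, everything that came in on or before that date minus everything that went out on or before it. The same arithmetic in Python, using the sample rows from the question (the helper name `on_hand` is just for illustration), reproduces the expected list when evaluated at every incoming and outgoing date:

```python
from datetime import date

# (number of items, date incoming, date out) from the question's sample table.
inventory = [
    (10, date(2018, 1, 1), date(2018, 1, 30)),
    (30, date(2018, 1, 15), date(2018, 5, 1)),
    (20, date(2018, 2, 1), date(2018, 3, 15)),
]

def on_hand(rows, day):
    """Items in on/before `day` minus items out on/before `day`, like the two SUMIFs."""
    incoming = sum(n for n, d_in, _ in rows if d_in <= day)
    outgoing = sum(n for n, _, d_out in rows if d_out <= day)
    return incoming - outgoing

# Every date on which the stock level changes, in order.
event_dates = sorted({d for _, d_in, d_out in inventory for d in (d_in, d_out)})
for day in event_dates:
    print(day.isoformat(), on_hand(inventory, day))
```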

Return value based on most recent "completed year"?

I have data that lists a Term Year ("A", "B", "C", ...) and some data.
A term year is a complete calendar year that includes all 12 months.
I am trying to determine the most recent, complete, term year with a formula. (Not a UDF if possible).
Example data:
Term Month Year Misc. Data
A January 2017 32
A February 2017 35
A March 2017 448
A April 2017 747
A May 2017 656
A June 2017 370
A June 2017 1892
A July 2017 373
A August 2017 387
A August 2017 3
A August 2017 32992
A September 2017 815
A October 2017 479
A November 2017 753
A December 2017 413
B August 2018 544
B September 2018 541
B October 2018 435
B November 2018 17
B December 2018 270
B January 2018 309
B February 2018 488
(Edit: added data; there will be multiple entries per month.)
So, since Term A is the most recent term (as of today, in 2019) that has all 12 months, I just want the formula to return A.
As for my current attempts, I can't think of how to build an Index/Match formula. I am "afraid" I'll need a UDF, or at least some type of helper column. So far I've only got =Index(A2:A20 but can't think of how to build it from there. I have a hunch Aggregate() may be needed, but I can't figure out how.

IF you only have a single entry per month, and IF the years are sorted ascending as you show, then try:
=LOOKUP(2,1/(COUNTIFS(Table1[Year],Table1[Year])=12),Table1[[#All],[Term]])
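The LOOKUP formula counts rows per year, so it relies on one row per month; with several entries per month (as the edit says), term A has 15 rows for 2017 and the count of 12 no longer matches. Below is a sketch of the underlying logic in Python that counts distinct month names per term instead, assuming "most recent" means the complete term with the latest year. `latest_complete_term` is a hypothetical helper, not an Excel function; in a worksheet the equivalent would still need a distinct-count of months per term.

```python
from collections import defaultdict

# (Term, Month, Year, Misc. Data) rows from the question.
rows = [
    ("A", "January", 2017, 32), ("A", "February", 2017, 35), ("A", "March", 2017, 448),
    ("A", "April", 2017, 747), ("A", "May", 2017, 656), ("A", "June", 2017, 370),
    ("A", "June", 2017, 1892), ("A", "July", 2017, 373), ("A", "August", 2017, 387),
    ("A", "August", 2017, 3), ("A", "August", 2017, 32992), ("A", "September", 2017, 815),
    ("A", "October", 2017, 479), ("A", "November", 2017, 753), ("A", "December", 2017, 413),
    ("B", "August", 2018, 544), ("B", "September", 2018, 541), ("B", "October", 2018, 435),
    ("B", "November", 2018, 17), ("B", "December", 2018, 270), ("B", "January", 2018, 309),
    ("B", "February", 2018, 488),
]

def latest_complete_term(rows):
    """Most recent term (by year) with all 12 distinct months; None if there is none."""
    months = defaultdict(set)   # term -> distinct month names seen
    latest_year = {}            # term -> latest year seen
    for term, month, year, _ in rows:
        months[term].add(month)  # duplicates within a month collapse here
        latest_year[term] = max(year, latest_year.get(term, year))
    complete = [term for term, seen in months.items() if len(seen) == 12]
    return max(complete, key=lambda term: latest_year[term], default=None)

print(latest_complete_term(rows))
# A
```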

Parse CSV file with some dynamic columns

I have a CSV file that I receive once a week that is in the following format:
"Item","Supplier Item","Description","1","2","3","4","5","6","7","8" ...Linefeed
"","","","Past Due","Day 13-OCT-2014","Buffer 14-OCT-2014","Week 20-OCT-2014","Week 27-OCT-2014", ...LineFeed
"Part1","P1","Big Part","0","0","0","100","50", ...LineFeed
"Part4","P4","Red Part","0","0","0","35","40", ...LineFeed
"Part92","P92","White Part","0","0","0","10","20", ...LineFeed
...
An explanation of the data: row 2 is dynamic and gives the dates on which parts are due. Row 3 onward lists the part numbers with a description and the number of parts due on each date. So in the data above, row 3, column 7 shows that Part1 has 100 parts due in the week of OCT 20, 2014 and 50 due in the week of OCT 27, 2014.
How can I parse this csv to show the data like this:
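Item | Supplier Item | Description | Past Due | Due Date | Amount Due
-----|---------------|-------------|----------|----------|-----------
Part1 | P1 | Big Part | 0 | 20 OCT 2014 | 100
Part1 | P1 | Big Part | 0 | 27 OCT 2014 | 50
Part4 | P4 | Red Part | 0 | 20 OCT 2014 | 35
Part4 | P4 | Red Part | 0 | 27 OCT 2014 | 40
....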
Is there a way to rearrange the data like this in Excel, or what is the best method to solve this?
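One way to do this unpivot outside Excel is a short script. This Python sketch uses only the standard `csv` module and a trimmed copy of the sample (five date columns rather than the full file). It assumes the first four columns are always Item, Supplier Item, Description and Past Due, that every later label in row 2 ends in a `DD-MON-YYYY` date, and that zero quantities are dropped, which matches the desired output but is not stated in the question. `unpivot` is a hypothetical helper name.

```python
import csv
import io

# Trimmed sample of the weekly file (the real file has more date columns).
SAMPLE = '''"Item","Supplier Item","Description","1","2","3","4","5"
"","","","Past Due","Day 13-OCT-2014","Buffer 14-OCT-2014","Week 20-OCT-2014","Week 27-OCT-2014"
"Part1","P1","Big Part","0","0","0","100","50"
"Part4","P4","Red Part","0","0","0","35","40"
"Part92","P92","White Part","0","0","0","10","20"
'''

def unpivot(lines):
    """Turn one row per part into one row per (part, due date) with a non-zero amount."""
    reader = csv.reader(lines)
    next(reader)            # row 1: column numbers, not needed
    labels = next(reader)   # row 2: "Past Due" followed by the dynamic date labels
    out = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        item, supplier_item, description, past_due = row[:4]
        for label, amount in zip(labels[4:], row[4:]):
            if int(amount) == 0:
                continue  # assumption: zero quantities are left out
            due_date = label.split()[-1].replace("-", " ")  # "Week 20-OCT-2014" -> "20 OCT 2014"
            out.append((item, supplier_item, description, int(past_due), due_date, int(amount)))
    return out

for record in unpivot(io.StringIO(SAMPLE)):
    print(record)
```

To read the real file instead, pass an open file object, for example `open(path, newline="")`, in place of the `io.StringIO` sample.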
