Parse CSV file with some dynamic columns - excel

I have a CSV file that I receive once a week that is in the following format:
"Item","Supplier Item","Description","1","2","3","4","5","6","7","8" ...Linefeed
"","","","Past Due","Day 13-OCT-2014","Buffer 14-OCT-2014","Week 20-OCT-2014","Week 27-OCT-2014", ...LineFeed
"Part1","P1","Big Part","0","0","0","100","50", ...LineFeed
"Part4","P4","Red Part","0","0","0","35","40", ...LineFeed
"Part92","P92","White Part","0","0","0","10","20", ...LineFeed
...
An explanation of the data: row 2 is dynamic data giving the dates on which parts are due. Row 3 onward lists the part numbers with their descriptions and the number of parts due on each date. So, looking at the above data, row 3, column 7 shows that Part1 has 100 parts due in the week of 20 OCT 2014 and 50 due in the week of 27 OCT 2014.
How can I parse this csv to show the data like this:
Item, Supplier Item, Description, Past Due, Due Date, Amount Due
Part1, P1, Big Part, 0, 20 OCT 2014, 100
Part1, P1, Big Part, 0, 27 OCT 2014, 50
Part4, P4, Red Part, 0, 20 OCT 2014, 35
Part4, P4, Red Part, 0, 27 OCT 2014, 40
....
Is there a way to manipulate the format in Excel to rearrange the data like I need or what is the best method to resolve this?
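Outside Excel, the reshaping described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the sample text is a hypothetical, complete version of the rows shown in the question (the elided columns are left out), the function name `unpivot` is made up, and zero quantities are assumed to be dropped because the desired output omits them.

```python
import csv
import io

# Hypothetical sample mirroring the layout in the question (elided columns left out).
SAMPLE = '''"Item","Supplier Item","Description","1","2","3","4","5"
"","","","Past Due","Day 13-OCT-2014","Buffer 14-OCT-2014","Week 20-OCT-2014","Week 27-OCT-2014"
"Part1","P1","Big Part","0","0","0","100","50"
"Part4","P4","Red Part","0","0","0","35","40"
"Part92","P92","White Part","0","0","0","10","20"
'''


def unpivot(csv_text):
    """Return one tuple per non-zero dated bucket:
    (item, supplier_item, description, past_due, due_date, amount)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    labels = rows[1]  # the second row carries the dynamic bucket labels
    result = []
    for rec in rows[2:]:
        item, supplier_item, description, past_due = rec[:4]
        # Every column after "Past Due" is a dated bucket.
        for label, amount in zip(labels[4:], rec[4:]):
            qty = int(amount or 0)
            if qty == 0:
                continue  # assumption: zero buckets are omitted, as in the desired output
            # "Week 20-OCT-2014" -> "20 OCT 2014"
            due_date = label.split()[-1].replace("-", " ")
            result.append(
                (item, supplier_item, description, int(past_due or 0), due_date, qty)
            )
    return result


for row in unpivot(SAMPLE):
    print(*row, sep=", ")
```

Reading the label row by position rather than by name is what lets the date columns change from week to week without touching the code.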

Related

Point in time calculation #2

Incident number | Received date | Closed Date | Time taken to close
----------------|---------------|-------------|--------------------
111             | 01 Jan 2021   | 01 Feb 2021 | 31
222             | 01 Jan 2021   | 07 Feb 2021 | 37
333             | 01 Jan 2021   |             |
444             | 01 Jan 2021   |             |
I want to calculate the average number of days incidents have been open at a point in time. Using the example above, at the end of Feb 2021 you would apply these rules:
Received date has to be less than the metric date (the metric date in this case being the end of Feb 2021).
Closed date has to be either greater than the metric date or empty (if the closed date is empty, the time taken to close is calculated from the received date to the metric date).
Using the example above, the first two incidents would not be included, but the last two would be. The difference between 01 Jan 2021 and 28 Feb 2021 is 58 days for each of them, so the total of 116 is divided by 2 (the number of incidents included in the calculation) to give an average of 58. Using the same example, the calculation for Jan 2021 would be 31 days for each incident, as no incident was closed by 31 Jan, so it is (31*4) / 4. I would be repeating this for Jan to Dec 2020 and 2021.
Because an unclosed incident is encoded with a missing closed date, a CASE expression (or an IF statement) is needed to compute the days-open metric correctly for a given as-of date.
Example:
The days open average is computed for a variety of asof dates stored in a data set.
data have;
  call streaminit(2022);
  do id = 1 to 10;
    opened = '01jan2021'd + rand('integer', 60);
    closed = opened + rand('integer', 90);
    if rand('uniform') < 0.25 then call missing(closed);
    output;
  end;
  format opened closed yymmdd10.;
run;

data asof;
  do asof = '01jan2021'd to '01jun2021'd-1;
    output;
  end;
  format asof yymmdd10.;
run;

proc sql;
  create table averageDaysOpen_asof
  as
  select
    asof
  , mean (days_open) as days_open_avg format=6.2
  , count(days_open) as id_count
  from
    ( select asof
      , opened
      , closed
      , case
          when closed is not null and asof between opened and closed then asof-opened
          when closed is null and asof > opened then asof-opened
          else .
        end as days_open
      from asof
      cross join have
    )
  group by asof
  ;
quit;
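For readers without SAS, the same point-in-time idea can be sketched in Python. This is a hedged illustration, not a translation of the query above: it applies the two rules exactly as the question states them (received before the metric date, closed after it or still empty), and the names `INCIDENTS` and `average_days_open` are made up for the example.

```python
from datetime import date

# Incidents from the question: (id, received, closed); None marks a still-open incident.
INCIDENTS = [
    (111, date(2021, 1, 1), date(2021, 2, 1)),
    (222, date(2021, 1, 1), date(2021, 2, 7)),
    (333, date(2021, 1, 1), None),
    (444, date(2021, 1, 1), None),
]


def average_days_open(incidents, metric_date):
    """Average of (metric_date - received) over incidents open at metric_date."""
    days = [
        (metric_date - received).days
        for _id, received, closed in incidents
        # Rule 1: received before the metric date.
        # Rule 2: not closed yet, or closed after the metric date.
        if received < metric_date and (closed is None or closed > metric_date)
    ]
    # No qualifying incident: return None instead of dividing by zero.
    return sum(days) / len(days) if days else None


print(average_days_open(INCIDENTS, date(2021, 2, 28)))  # 58.0
```

Calling the function once per month-end date gives the series of monthly metrics the question asks for.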

Convert Daily dates to weekly using a specific first day of the week

I am currently working on grouping/aggregating data based on date range for a weekly plot.
Below is what my dataframe looks like for daily data:
daily_dates | registered | attended
------------|------------|---------
02/10/2022  | 0          | 0
02/09/2022  | 0          | 0
02/08/2022  | 1          | 0
02/07/2022  | 1          | 0
02/06/2022  | 20         | 06
02/05/2022  | 05         | 03
02/04/2022  | 15         | 12
02/03/2022  | 10         | 08
02/02/2022  | 10         | 05
The first day of the week I'd want is Sunday.
My current code to perform weekly group is:
weekly_df = weekly_df.resample('w').sum().reset_index()
The output I am desiring is:
weekly_dates | registered | attended
-------------|------------|---------
02/06/2022   | 22         | 06
01/30/2022   | 40         | 28
A bit of explanation about the desired output: 02/06/2022 and 01/30/2022 appear because each is the start date of its week, which is a Sunday. For the week of 01/30/2022, only the rows for 02/05/2022, 02/04/2022, 02/03/2022 and 02/02/2022 are counted, as those are the only dates from that week in the daily dataframe.
My current implementation follows the instructions provided here.
I am looking for any suggestion to achieve my Desired Output
Try:
df.resample('W-SUN', label='left', closed='left').sum().reset_index()
Output:
  daily_dates  registered  attended
0  2022-01-30          40        28
1  2022-02-06          22         6
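To see what the Sunday-anchored bins are doing, here is a small standard-library sketch of the same grouping. It is not how pandas implements `resample`; it only reproduces the bucketing by hand, and the helper names `week_start_sunday` and `weekly_totals` are made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Daily rows from the question: (day, registered, attended).
DAILY = [
    (date(2022, 2, 10), 0, 0),
    (date(2022, 2, 9), 0, 0),
    (date(2022, 2, 8), 1, 0),
    (date(2022, 2, 7), 1, 0),
    (date(2022, 2, 6), 20, 6),
    (date(2022, 2, 5), 5, 3),
    (date(2022, 2, 4), 15, 12),
    (date(2022, 2, 3), 10, 8),
    (date(2022, 2, 2), 10, 5),
]


def week_start_sunday(d):
    """Return the Sunday on or before d (date.weekday(): Monday=0 ... Sunday=6)."""
    return d - timedelta(days=(d.weekday() + 1) % 7)


def weekly_totals(rows):
    """Sum registered/attended per week, keyed by the week's Sunday."""
    totals = defaultdict(lambda: [0, 0])
    for day, registered, attended in rows:
        bucket = totals[week_start_sunday(day)]
        bucket[0] += registered
        bucket[1] += attended
    return {start: tuple(values) for start, values in sorted(totals.items())}


for start, (registered, attended) in weekly_totals(DAILY).items():
    print(start, registered, attended)
```

Each week is the half-open interval from one Sunday up to, but not including, the next, which is what `closed='left'` and `label='left'` express in the pandas call.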

SQL aggregating a text field based on latest date text value

I have the following script:
DROP TABLE IF EXISTS [dbo].[test]
CREATE TABLE [dbo].[test]
(
    [Name] [varchar](50) NULL,
    [Amount] [int] NULL
)
GO
INSERT INTO [dbo].[test]
(
    [Name]
    ,[Amount]
)
VALUES
    ('Abc - 20 april to 7 june 2020',25)
    ,('Abc - 20 april to 7 june 2020',33)
    ,('Abc - 20 april to 29 june 2020',15)
    ,('Abc - 20 april to 29 june 2020',55)
    ,('Abc - 20 april to 10 may 2020',20)
    ,('Abc - 20 april to 10 may 2020',75)
    ,('Abc - 20 april to 10 may 2020',89)
GO
SELECT *
FROM [dbo].[test]
The resulting table gives the following results:
Name | Amount
-------------------------------|-------
Abc - 20 april to 7 june 2020 | 25
Abc - 20 april to 7 june 2020 | 33
Abc - 20 april to 29 june 2020 | 15
Abc - 20 april to 29 june 2020 | 55
Abc - 20 april to 10 may 2020 | 20
Abc - 20 april to 10 may 2020 | 75
Abc - 20 april to 10 may 2020 | 89
I would like to be able to determine the most recent end date in the text field and eliminate the repetitive data by grouping it showing the record with the latest end date and aggregating the amount. The result should be a single record with the following data:
Name | Amount
-------------------------------|-------
Abc - 20 april to 29 june 2020 | 312
I've played around with some group bys and text manipulation functions and this is as far as I got using the following code:
select (case when [Name] like '%-%'
             then trim(left([Name], charindex('-', reverse([Name])) - 1))
             else [Name]
        end) as [Name]
      ,sum(Amount) as Amount
from [dbo].[test]
group by
    (case when [Name] like '%-%'
          then trim(left([Name], charindex('-', reverse([Name])) - 1))
          else [Name]
     end)
The above code does little more than aggregate the distinct Name values. I am looking for something more intelligent that recognizes that all of my records really mean the same thing, finds that the latest end date in the Name field is 29 june 2020, and displays only that single record with the total aggregated Amount.
Any help would be much appreciated.
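The logic being asked for (parse the end date out of the text, keep the name with the latest end date, sum all amounts) can be sketched in Python to make the steps concrete. This is an illustration of the idea, not T-SQL. It assumes every Name has the form "prefix - start to D month YYYY" with an English month name, and the helper names `end_date` and `collapse` are made up.

```python
from datetime import date

MONTHS = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]

# Rows from the question: (Name, Amount).
ROWS = [
    ("Abc - 20 april to 7 june 2020", 25),
    ("Abc - 20 april to 7 june 2020", 33),
    ("Abc - 20 april to 29 june 2020", 15),
    ("Abc - 20 april to 29 june 2020", 55),
    ("Abc - 20 april to 10 may 2020", 20),
    ("Abc - 20 april to 10 may 2020", 75),
    ("Abc - 20 april to 10 may 2020", 89),
]


def end_date(name):
    """Parse the text after the last ' to ', e.g. '29 june 2020'."""
    day, month, year = name.rsplit(" to ", 1)[1].split()
    return date(int(year), MONTHS.index(month.lower()) + 1, int(day))


def collapse(rows):
    """Per prefix (text before ' - '): keep the latest-ending name, sum all amounts."""
    groups = {}
    for name, amount in rows:
        prefix = name.split(" - ", 1)[0]
        latest, total = groups.get(prefix, (name, 0))
        if end_date(name) > end_date(latest):
            latest = name
        groups[prefix] = (latest, total + amount)
    return groups


print(collapse(ROWS))
```

In SQL the same shape would need the end date extracted from the text and converted to a date before grouping on the prefix; that conversion is the step the attempt above is missing.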

Subtracting row values in Cognos

I have the following report:
Id Product Revenue Month
1 A 302 jan
1 A 342 feb
1 A 133 mar
For this report, I need to find the month-over-month difference in revenue, i.e. the change in revenue from Jan to Feb and from Feb to Mar.
I want to add another column like this:
Id Product Revenue Month Profit
1 A 302 jan
1 A 342 feb 40
1 A 142 mar -200
I can write a case statement for it and create a calculated field, but I don't know how to refer to the previous row to do the subtraction. Can someone help me with this?
What happens when you try running-difference([Revenue])?
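As a rough picture of what a running difference produces, here is a tiny Python sketch. It is not Cognos syntax and makes no claim about how Cognos evaluates the function; it only shows each row minus the previous row, using the revenue values from the desired-output table (302, 342, 142). The name `running_difference` here is just a local Python function.

```python
def running_difference(values):
    """Each value minus the one before it; the first row has no predecessor."""
    if not values:
        return []
    return [None] + [cur - prev for prev, cur in zip(values, values[1:])]


revenue = [302, 342, 142]  # jan, feb, mar
print(running_difference(revenue))  # [None, 40, -200]
```

The first row stays empty because there is no earlier month to subtract from, which matches the blank Profit cell for jan in the desired output.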

Sum IFs of total count without recounting Multiple instances, only the closest date prior to the AS OF DATE

I need a formula that will SUM the count of, let's say, animal types as of a given AS OF DATE, WITHOUT adding the earlier counts for the same animal type: for each type, only the row with the closest date prior to or on the AS OF DATE should be counted. Animal types may be added or taken away, so the list is not fixed.
I would prefer not to do this in VBA or with a Pivot Table, but any help will be appreciated.
A        B             C
DATE     ANIMAL TYPE   COUNT
JAN 01   DOG           1
JAN 02   CAT           2
JAN 04   Fish          1
JAN 12   DOG           2
JAN 20   CAT           3
FEB 01   PIG           1
FEB 02   CAT           2
AS OF DATE   TOTAL ANIMALS
JAN 03       3
JAN 13       5
JAN 21       6
FEB 01       7
FEB 02       6
So:
As of Jan 03, there were 3 animals total: 1 Dog and 2 Cats.
As of Jan 13, there were 5 animals total: 2 Dogs, 1 Fish and 2 Cats, NOT 6.
As of Jan 21, there were 6 animals total: 2 Dogs, 1 Fish and 3 Cats, NOT 9.
As of Feb 01, there were 7 animals total: 2 Dogs, 1 Fish, 1 Pig and 3 Cats, NOT 10.
So far this is what I have. By using a helper column to filter the Animal Types I get a list without duplicates. Then I put that in a cell with Data Validation to pick the Type. Same for the Dates. However I would like to drop the Type input and just choose the Date. And be able to get a total.
Here is what works but not what I need.
=SUMIFS(TabData1[Count],TabData1[Date],MAX(IF(TabData1[Animal Type]=$G$2,IF(TabData1[Date]<=$F$2,TabData1[Date]))))
I want to do away with the single-cell reference ($G$2) to one Animal Type and replace it with a range, so that I get the latest count for every Animal Type as of a certain date. Something like this, but it does not work:
=SUMIFS(TabData1[Count],TabData1[Date],MAX(IF(TabData1[Animal Type]=(OFFSET($J$2,0,0,COUNT(IF(ListAnimalType="","",1)),1)),IF(TabData1[Date]<=$F$2,TabData1[Date]))))
To simplify (OFFSET($J$2,0,0,COUNT(IF(ListAnimalType="","",1)),1)) you can use $J$2:$J$5
=SUMIFS(TabData1[Count],TabData1[Date],MAX(IF(TabData1[Animal Type]=$J$2:$J$5,IF(TabData1[Date]<=$F$2,TabData1[Date]))))
And it looks like this
=SUMIFS(TabData1[Count],TabData1[Date],MAX(IF({"Dog";"Cat";"Fish";"Dog";"Cat";"Pig";"Cat";0;0;0;0;0;0;0;0;0}={"Cat";"Dog";"Fish";"Pig"},IF(TabData1[Date]<=$F$2,TabData1[Date]))))
Like I said, I want one formula that will take each Animal Type find the latest date from a specified cell and return the sum for each Animal Type then sum them all up.
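The calculation described here (for each animal type, take the count from its latest row on or before the as-of date, then add those counts up) can be written out in Python to pin down the expected results. This is a sketch of the logic only, not an Excel formula. Dates are stored as (month, day) tuples because the question gives no year, type names are compared case-insensitively as an assumption, and `total_as_of` is a made-up name.

```python
# Rows from the question: ((month, day), animal type, count).
ROWS = [
    ((1, 1), "DOG", 1),
    ((1, 2), "CAT", 2),
    ((1, 4), "Fish", 1),
    ((1, 12), "DOG", 2),
    ((1, 20), "CAT", 3),
    ((2, 1), "PIG", 1),
    ((2, 2), "CAT", 2),
]


def total_as_of(rows, as_of):
    """Sum the most recent count of each animal type on or before as_of."""
    latest = {}
    # Walk rows in date order so a later row overwrites an earlier one of the same type.
    for day, animal, count in sorted(rows):
        if day <= as_of:
            latest[animal.upper()] = count
    return sum(latest.values())


for as_of in [(1, 3), (1, 13), (1, 21), (2, 1), (2, 2)]:
    print(as_of, total_as_of(ROWS, as_of))
```

The printed totals are 3, 5, 6, 7 and 6, matching the TOTAL ANIMALS table, so any candidate formula can be checked against the same five dates.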
