How to use Python 3 to read specific columns of an .xlsx file, sort them by time, and put them into new rows? - python-3.x

I have .xlsx data like this.
NAME a b c ...
2012 1246108359 190153864 NA ...
2013 1089521299 181339787 -122350575 ...
2015 2092545545 648831005 -69981000 ...
2014 802730996 435162019 -69644809 ...
2017 1681536957 690355938 -1210327000 ...
2016 1149898973 491972036 -226538000 ...
First, I want to extract every column and sort it by time (the NAME column).
Then I want to put each value into its own row.
How can I do this?
The result should look like this.
2012 a 1246108359
2013 a 1089521299
2014 a 802730996
2015 a 2092545545
2016 a 1149898973
2017 a 1681536957
2012 b 190153864
2013 b 181339787
2014 b 435162019
2015 b 648831005
2016 b 491972036
2017 b 690355938
... ... ...

You can use melt to do this.
First sort the values:
df.sort_values('NAME', inplace=True)
df.melt(id_vars='NAME', value_vars=df.columns[1:])
You can use the value_name and var_name parameters to change the column names.
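As a rough end-to-end sketch of the answer above, the snippet below builds the sample data in memory so it runs without a file. The file name in the comment and the output column names "column" and "value" are assumptions for illustration, not part of the original question.

```python
import pandas as pd

# In practice the frame would come from the workbook, for example:
#   df = pd.read_excel("data.xlsx", usecols=["NAME", "a", "b"])
# ("data.xlsx" is a hypothetical file name). Here the sample rows are typed in.
df = pd.DataFrame({
    "NAME": [2012, 2013, 2015, 2014, 2017, 2016],
    "a": [1246108359, 1089521299, 2092545545, 802730996, 1681536957, 1149898973],
    "b": [190153864, 181339787, 648831005, 435162019, 690355938, 491972036],
})

# Sort by year first, then unpivot: melt stacks column "a" on top of column "b",
# so each block keeps the sorted year order.
long_df = (
    df.sort_values("NAME")
      .melt(id_vars="NAME", var_name="column", value_name="value")
)
print(long_df)
```

Omitting value_vars makes melt unpivot every column other than NAME, which has the same effect here as passing df.columns[1:].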

Related

Compute on columns with a year as their name

I have data with columns named 2020 and 2019,
and a variable refYear that holds the value 2020:
dataFr <- data.table(`2020`=c(120,110),
`2019`= c(105,98))
I want to compute the relative change from 2019 to 2020, that is, (refYear column - previous-year column) / previous-year column, where the previous-year column name is as.character(as.numeric(refYear)-1).
How can I write this (with dplyr)?

How to fetch records within a date range, with multiple date ranges, in Excel

I have two Excel sheets. I am trying to find all user details, along with request IDs, for accesses that fall within a particular date and time range.
Sheet 1: Request
A | B | C | D | E
User | Request ID | Startdate | enddate | Business reason
100 | 1234567 | Jul 01, 2022 03:24:11 | Jul 01, 2022 06:10:11 | SRQ123456
101 | 1234568 | Jul 01, 2022 06:24:11 | Jul 01, 2022 08:24:11 | CHG123456
Sheet 2: Access details
A | B | C | D
OBJECTNAME | ACTION | ACCESSBY | ACCESSTIME
Business User | Update | 100 | Jul 01, 2022 05:59:12
Workflow | Update | 100 | Jul 01, 2022 06:05:20
Roles Add Workflow | Update | 100 | Jul 01, 2022 06:10:32
SFA | Delete | 101 | Jul 01, 2022 06:24:12
I tried the formula below to flag the records that fall within the date range, but I am not able to get the entire matching row from Sheet 1 (Request). I used Name Manager (on the Formulas tab) to define the named ranges Startdate, enddate, and User.
=IF(COUNTIFS(Startdate,"<="&D3,enddate,">="&D3,User,C3)>0,"Yes","No")
Sheet 2: Access details
A | B | C | D | E
OBJECTNAME | ACTION | ACCESSBY | ACCESSTIME |
Business User | Update | 100 | Jul 01, 2022 05:59:12 | Yes
Workflow | Update | 100 | Jul 01, 2022 06:05:20 | Yes
Roles Add Workflow | Update | 100 | Jul 01, 2022 06:10:32 | No
SFA | Delete | 101 | Jul 01, 2022 06:24:12 | Yes
The expected output in Sheet2 should be as below
A | B | C | D | E | F | G | H | I | J
OBJECTNAME | ACTION | ACCESSBY | ACCESSTIME | | User | Request ID | Startdate | enddate | Business reason
Business User | Update | 100 | Jul 01, 2022 05:59:12 | Yes | 100 | 1234567 | Jul 01, 2022 03:24:11 | Jul 01, 2022 06:10:11 | SRQ123456
Workflow | Update | 100 | Jul 01, 2022 06:05:20 | Yes | 100 | 1234567 | Jul 01, 2022 03:24:11 | Jul 01, 2022 06:10:11 | SRQ123456
Roles Add Workflow | Update | 100 | Jul 01, 2022 06:10:32 | No | Record not matched
SFA | Delete | 101 | Jul 01, 2022 06:24:12 | Yes | 100 | 1234568 | Jul 01, 2022 06:24:11 | Jul 01, 2022 08:24:11 | CHG123456
There are multiple ways to do this.
I will assume that the date values in both Sheet 1 and Sheet 2 are formatted as dates and not as text. If that is not the case, additional formulas to convert the text to dates will be needed.
You could try the following solution:
=OFFSET(Sheet1!$A$1:$E$1,MATCH(1,($C3=Sheet1!$A:$A)*($D3>=Sheet1!$C:$C)*($D3<=Sheet1!$D:$D),0)-1,0)
However, there is no error handling here yet, so with the example data you provided the formula returns an error for the 3rd row of data on Sheet 2: the date and time of the access in that row is not within the start and end dates given for that user on Sheet 1.
Therefore, for those 4 rows of data the formula returns the matching Sheet 1 row for rows 1, 2, and 4, and #N/A for row 3.
If you want the formula to return something other than the #N/A error, wrap it in the IFERROR(value,value_if_error) function, where 'value' is the formula above.
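For readers who do the same lookup outside Excel, here is a minimal pandas sketch of the idea behind the MATCH formula: pair each access row with the requests of the same user, then keep the pair whose time window contains the access time. This is a swapped-in technique, not the answer's method; the frame names, the helper column "index", and the "Matched" column are assumptions for illustration.

```python
import pandas as pd

fmt = "%b %d, %Y %H:%M:%S"  # matches strings such as "Jul 01, 2022 03:24:11"

requests = pd.DataFrame({
    "User": [100, 101],
    "Request ID": [1234567, 1234568],
    "Startdate": pd.to_datetime(["Jul 01, 2022 03:24:11", "Jul 01, 2022 06:24:11"], format=fmt),
    "enddate": pd.to_datetime(["Jul 01, 2022 06:10:11", "Jul 01, 2022 08:24:11"], format=fmt),
    "Business reason": ["SRQ123456", "CHG123456"],
})
access = pd.DataFrame({
    "OBJECTNAME": ["Business User", "Workflow", "Roles Add Workflow", "SFA"],
    "ACTION": ["Update", "Update", "Update", "Delete"],
    "ACCESSBY": [100, 100, 100, 101],
    "ACCESSTIME": pd.to_datetime(
        ["Jul 01, 2022 05:59:12", "Jul 01, 2022 06:05:20",
         "Jul 01, 2022 06:10:32", "Jul 01, 2022 06:24:12"], format=fmt),
})

# Pair every access row with all requests of the same user.
pairs = access.reset_index().merge(
    requests, left_on="ACCESSBY", right_on="User", how="left")

# Keep only pairs where the access time lies inside the request window,
# and at most one request per access row (the first match, like MATCH).
in_window = pairs["ACCESSTIME"].between(pairs["Startdate"], pairs["enddate"])
matched = pairs[in_window].drop_duplicates("index").set_index("index")

# Attach the request columns; unmatched access rows get missing values.
result = access.join(matched[requests.columns])
result["Matched"] = result["Request ID"].notna().map({True: "Yes", False: "No"})
print(result)
```

Rows without a containing window keep missing values in the request columns, which plays the role of the #N/A result in the spreadsheet.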

Pandas dataframe non unique values of one column based on unique values of another column

I have a Pandas dataframe and I want to get a list of the years, one per unique event (ID). I don't care about the DIRECTION column, I just want a list of DATEs. I don't want the DATEs themselves to be unique, because several IDs can share the same date, but I don't need one entry per DIRECTION for the same ID and date.
Pandas df
ID DIRECTION DATE
ABA Z 2019
ABA N 2019
ABA E 2019
ABB Z 2019
ABB N 2019
ABB E 2019
ABC Z 2020
ABC N 2020
ABC E 2020
Expected Output
[2019, 2019, 2020]
Actual Output
[2019, 2020]
Current Code
import numpy as np
ids = df['ID'].unique().tolist()
dates = df['DATE'].unique().tolist()  # unique over the whole column, so 2019 appears only once
labels, counts = np.unique(dates, return_counts=True)
len(counts) == 2
# I want len(counts) == 3
You want the unique date per id, and then concatenate them into one array:
np.concatenate(df.groupby('ID')['DATE'].unique().values)
Output:
array([2019, 2019, 2020])
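A self-contained sketch of the answer above, using the sample frame from the question. The drop_duplicates variant is an alternative added here for comparison, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": ["ABA"] * 3 + ["ABB"] * 3 + ["ABC"] * 3,
    "DIRECTION": ["Z", "N", "E"] * 3,
    "DATE": [2019] * 6 + [2020] * 3,
})

# One array of unique dates per ID, then flatten the arrays into a single list.
per_id = df.groupby("ID")["DATE"].unique()
dates = np.concatenate(per_id.values).tolist()

# Alternative: keep one row per (ID, DATE) pair and read off the DATE column.
dates_alt = df.drop_duplicates(["ID", "DATE"])["DATE"].tolist()

print(dates)
print(dates_alt)
```

Both variants keep a date once per ID, so a year shared by two IDs appears twice.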

Stripping out a piece of text string from a column in Python using REGEX

I need to strip out the date and time string from a column in a data frame that has rows of uneven delimiters i.e. some with three and some with four commas.
I am using Python3, pandas
Example:
df['sample field'].head(3)
returns
"4294-Skateboard Foundation (MSF) Advanced Rider Course (ARC) , 1134123 , Oct 24 2016 12:00AM ,"
"1254-Skateboard Foundation (MSF) Experienced Rider Courses (ERC/BRC 2) , 3217121 , May 15 2015 12:00AM ,"
"4457-Total Control, Level 1 (Advanced Skateboarding Clinic) (TCL1) , 6743468 , Nov 11 2013 12:00AM ," 
Intended Return
"4294-Skateboard Foundation (MSF) Advanced Rider Course (ARC) 1134123"
"1254-Skateboard Foundation (MSF) Experienced Rider Courses (ERC/BRC 2) 3217121"
"4457-Total Control Level 1 (Advanced Skateboarding Clinic) (TCL1) 6743468" 
 
I am trying to figure out how to strip the date and time values from the end of the text strings and put the remaining text into a new column, Intended Return.
To do the reverse I used the following:
df3_1['Date'] = df3_1['Course ID'].str.extract(r'([A-Za-z]+\s+\d+\s+\d+\s+\d+:[0-9A-Z]+(?=\s+,+))')
This worked extremely well in stripping off the date but I am now trying to find out how to keep the text without the date.
import pandas as pd
df = pd.DataFrame({'Text': ['4457-I only, need, this, Nov 11 2013 12:00AM ,',
                            '2359-I only need, this, Apr 11 2013 12:00AM ,']})
# Drop the trailing comma, split on commas, discard the last piece (the date), and rejoin with spaces
df['extract'] = df.Text.str.strip(',').str.split(',').str[:-1].str.join(' ')
df
Text extract
0 4457-I only, need, this, Nov 11 2013 12:00AM , 4457-I only need this
1 2359-I only need, this, Apr 11 2013 12:00AM , 2359-I only need this
Alternatively, assuming you already have the Date column:
df['Course ID'].replace(regex=r'(?i)'+ df.Date,value="")
0 4457-I only, need, this,
1 2359-I only need, this,
Name: Course ID, dtype: object
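Since the question asks for a regex, here is a hedged sketch that removes the trailing date with Series.str.replace and then collapses the remaining commas. The pattern is adapted from the asker's extract pattern and assumes every date looks like "Oct 24 2016 12:00AM" followed by a final comma; the names date_pattern and cleaned are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"sample field": [
    "4294-Skateboard Foundation (MSF) Advanced Rider Course (ARC) , 1134123 , Oct 24 2016 12:00AM ,",
    "4457-Total Control, Level 1 (Advanced Skateboarding Clinic) (TCL1) , 6743468 , Nov 11 2013 12:00AM ,",
]})

# A comma, then "Mon DD YYYY HH:MM" plus AM/PM, then the final comma at the end of the string.
date_pattern = r"\s*,\s*[A-Za-z]+\s+\d+\s+\d+\s+\d+:\d+[AP]M\s*,\s*$"

cleaned = (
    df["sample field"]
    .str.replace(date_pattern, "", regex=True)  # drop the trailing ", <date> ,"
    .str.replace(r"\s*,\s*", " ", regex=True)   # turn each remaining comma into one space
)
df["Intended Return"] = cleaned
print(df["Intended Return"].tolist())
```

Anchoring the pattern with $ keeps it from touching commas inside the course name, so rows with three or four commas are handled the same way.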

Excel Doc: sum column B where column A is between dates

I have a Google spreadsheet with data as below
Date SomeVal
Aug 01, 2013 1
Aug 02, 2013 5
And so on. I want another sheet with monthly grouping as follows (this is what I want in the end, but I have not been able to get there)
Month Total
Apr 2013 200
May 2013 300
I tried the following but could not get the correct result
=SUMIFS(Sheet1!B:B,Sheet1!A:A,"> Aug 1, 2013") //Returns SUM(B:B) as all dates are above Aug 1 2013
=SUMIFS(Sheet1!B:B,Sheet1!A:A,"< Jun 1, 2014") //Returns 0 which is wrong
This is what I expected to get:
the sum of the values whose date is after Aug 2013
the sum of the values whose date is before Jun 2014
My date format is Mon DD, YYYY.
Any idea where I am going wrong?
This format works for me
Raw Data in Sheet2
(Column A) (Column B)
Date Val
----------- ---------
7/1/2013 1
7/2/2013 2
7/3/2013 3
7/4/2013 4
And so on and so forth up until 2/9/2016 (as long as they are date values you can format them however you want)
Summed Data in Sheet1
(Column A) (Column B)
Month Total
---------- ---------
Jul 2013 496
Aug 2013 1457
Sep 2013 2325
Oct 2013 3348
Nov 2013 4155
So on and so forth for each month that I want, using this formula. Each month is formatted as MMM YYYY, and the date is the first of the month (7/1/2013, 8/1/2013, etc...)
=SUMIFS(Sheet2!$B:$B,Sheet2!$A:$A,">="&Sheet1!$A2,Sheet2!$A:$A,"<"&Sheet1!$A3)
This says:
Sum the range in Sheet 2 Column B where:
The date in column A of that row is on or after the first of the month supplied
The date in column A of that row is before the first of the following month
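The same half-open window (on or after the first of the month, before the first of the next month) can be sketched in pandas. The data below is an assumption that mimics the answer's example, with one row per day and values 1, 2, 3, and so on starting on 7/1/2013; the variable names are made up for illustration.

```python
import pandas as pd

# One row per day from 7/1/2013, with values 1, 2, 3, ... as in the answer's example.
dates = pd.date_range("2013-07-01", "2013-11-30", freq="D")
df = pd.DataFrame({"Date": dates, "Val": range(1, len(dates) + 1)})

# SUMIFS equivalent for a single month: start <= Date < first day of the next month.
start = pd.Timestamp("2013-08-01")
end = start + pd.offsets.MonthBegin(1)
august_total = df.loc[(df["Date"] >= start) & (df["Date"] < end), "Val"].sum()

# All months at once: group the rows by calendar month and sum each group.
monthly = df.groupby(df["Date"].dt.to_period("M"))["Val"].sum()

print(august_total)
print(monthly)
```

Comparing real date values, rather than text such as "Aug 1, 2013", is what makes the two boundary conditions behave as intended.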
