How to select a set of values in pandas data frame (multiple colums with multiple row conditions) - python-3.x

I have a huge ass csv file like given below which I opened as dataframe using pandas. I want to extract data from multiple columns at different date sets.
I want to select from a particular date and hour to another for the last 3 column values. The slicing options I tried and googled were for single column.
date heure PM10 NO2 O3
0 01/01/2016 1 27 22 36
1 01/01/2016 2 25 29 27
2 01/01/2016 3 26 47 10
3 01/01/2016 4 16 40 13
4 01/01/2016 5 15 34 13
5 02/01/2016 1 15 34 13
6 02/01/2016 2 15 34 13
Target output - taking data from a particular data and hour to another one.
3 01/01/2016 4 16
4 01/01/2016 5 15
Thank you. The data set is obviously way bigger than 4 No.

You can do this:
df_selected = df[(df.date >= "01/01/2016") &
(df['hour']>=4) &
(df.date < "02/01/2016") &
(df['hour']<6)
].iloc[:,:3] #first three columns
Alternatively, for the columns selection you can use .loc[:,['name', 'of', 'columns']] or for the last n columns .iloc[:,-n:].
Be careful with date because I'm not sure what happens with an "English" date, maybe you have to change the date using df['date'] = pd.to_datetime(df.date).

Related

Excel MERGE two tables

I have SET 1
CLASS
Student
TEST
SCORE
A
1
1
46
A
1
2
50
A
1
3
45
A
2
1
45
A
2
2
47
A
2
3
31
A
3
1
34
A
3
2
45
B
1
1
36
B
2
1
31
B
2
2
41
B
3
1
50
C
1
1
42
C
3
1
31
and SET 2
CLASS
SIZE
YEARS
A
39
7
B
20
12
C
31
6
and wish to COMBINE to make SET 3
CLASS
STUDENT
TEST
SCORE
SIZE
YEARS
A
1
1
46
39
7
A
1
2
50
39
7
A
1
3
45
39
7
A
2
1
45
39
7
A
2
2
47
39
7
A
2
3
31
39
7
A
3
1
34
39
7
A
3
2
45
39
7
B
1
1
36
20
12
B
2
1
31
20
12
B
2
2
41
20
12
B
3
1
50
20
12
C
1
1
42
31
6
C
3
1
31
31
6
so basically add the SIZE and YEARS columns from SET 2 and merge on CLASS onto SET 1. In excel how you can do this? I need to match on CLASS
Define both sets as tables and “left join” in PowerQuery. There you can choose the columns of the resulting table.
https://learn.microsoft.com/en-us/power-query/merge-queries-left-outer
If you have Set 1 on the top left of a worksheet "Set1" and Set 2 on the top left of a worksheet "Set2", then you can use the formula
=VLOOKUP(A2;'Set2'!$A$2:$C$4;2;FALSE), where $A$2:$C$4 is the range of Set2, and A2 is the class value from Set1, which is what is used to do the lookup in Set2. The next argument, 2, means to take the second row from Set2, and the FALSE at the end means that you only want exact matches on the CLASS. You can do auto-fill with this formula, and do similar steps for the years. If you look up the help for VLOOKUP within Excel, that should help you to understand how it works.
Your first set of data is essentially your primary set of data that you just want to add attribute columns to. I built this example on Google Sheets which should help explain. Using spill formulas, only a few cells are needed with their own formulas. You can see them as they are highlighted in yellow. When you use in Excel, obviously make sure you change the column references, but this would get you the answer.
Note you have to have SpillRange in Excel for this to work. To test, see if you have the formula =unique()
This solution may work for you if both sets start in the same column. As example in my image, both of them start at column A. You can get all data with a single VLOOKUP formula:
Formula in cell E2 is:
=VLOOKUP($A2;$A$22:$R$25;COLUMN($B22);FALSE)
Notice the mixed references at first and third argument and absolute references in the second one. Third argument is critical, because is the relational position between both sets, that's the reason it's easier if both sets start at same column. If not, you'll need to adjust this argument substracting or adding, depending on the case.
Anyways, with a single formula, you can get any number of columns. The only disavantage of this formula is that you need to manually drag to right until you got all the columns (10, 30 or whatever). You'll notice you are done because the formula will raise an error:
This error means you are trying to get a referenced outside of your column area.

Grouping data based on month-year in pandas and then dropping all entries except the latest one- Python

Below is my example dataframe
Date Indicator Value
0 2000-01-30 A 30
1 2000-01-31 A 40
2 2000-03-30 C 50
3 2000-02-27 B 60
4 2000-02-28 B 70
5 2000-03-31 C 90
6 2000-03-28 C 100
7 2001-01-30 A 30
8 2001-01-31 A 40
9 2001-03-30 C 50
10 2001-02-27 B 60
11 2001-02-28 B 70
12 2001-03-31 C 90
13 2001-03-28 C 100
Desired Output
Date Indicator Value
2000-01-31 A 40
2000-02-28 B 70
2000-03-31 C 90
2001-01-31 A 40
2001-02-28 B 70
2001-03-31 C 90
I want to write a code that groups data by particular month-year and then keep the entry of latest date in that particular month-year and drop the rest. The data is till year 2020
I was only able to fetch the count by month-year. I am not able to drop create a proper code that helps to group data as per month-year and indicator and get the correct results
Use Series.dt.to_period for months periods, aggregate index of maximal date per groups by DataFrameGroupBy.idxmax and then pass to DataFrame.loc:
df['Date'] = pd.to_datetime(df['Date'])
print (df['Date'].dt.to_period('m'))
0 2000-01
1 2000-01
2 2000-03
3 2000-02
4 2000-02
5 2000-03
6 2000-03
7 2001-01
8 2001-01
9 2001-03
10 2001-02
11 2001-02
12 2001-03
13 2001-03
Name: Date, dtype: period[M]
df = df.loc[df.groupby(df['Date'].dt.to_period('m'))['Date'].idxmax()]
print (df)
Date Indicator Value
1 2000-01-31 A 40
4 2000-02-28 B 70
5 2000-03-31 C 90
8 2001-01-31 A 40
11 2001-02-28 B 70
12 2001-03-31 C 90

Computing most recent smaller value

I have an excel sheet with dates (sorted) in one column and values in another. Ex:
1/1/2019 10
1/2/2019 12
1/3/2019 8
1/4/2019 20
1/10/2019 8
1/12/2019 22
I want to compute in a third column, the most recent date such that value was less than or equal to the current value (if the current is the lowest, then use the current date). So, for the sample data above,
1/1/2019 10 1/1/2019
1/2/2019 12 1/1/2019
1/3/2019 8 1/3/2019
1/4/2019 20 1/3/2019
1/10/2019 8 1/3/2019
1/12/2019 22 1/10/2019
Is there a way of accomplishing this without VBA macros?
Here's a way. Paste these in and copy down the column.
Column C: =IF(COUNTIF(B2:B6,D1)=0,A1,MINIFS(A2:A6,B2:B6,D1))
Column D: =CONCATENATE("<",TEXT(VALUE(B1),"#"))
You can hide column D to make it prettier. It's the criteria being used by the COUNTIF and MINIFS. Column C is the output.
1/1/2019 10 1/3/2019 <10
1/2/2019 12 1/3/2019 <12
1/3/2019 8 1/3/2019 <8
1/4/2019 20 1/10/2019 <20
1/10/2019 8 1/10/2019 <8
1/12/2019 22 1/12/2019 <22
Formula view:
43466 10 =IF(COUNTIF(B2:B6,D1)=0,A1,MINIFS(A2:A6,B2:B6,D1)) =CONCATENATE("<",TEXT(VALUE(B1),"#"))
43467 12 =IF(COUNTIF(B3:B7,D2)=0,A2,MINIFS(A3:A7,B3:B7,D2)) =CONCATENATE("<",TEXT(VALUE(B2),"#"))
43468 8 =IF(COUNTIF(B4:B8,D3)=0,A3,MINIFS(A4:A8,B4:B8,D3)) =CONCATENATE("<",TEXT(VALUE(B3),"#"))
43469 20 =IF(COUNTIF(B5:B9,D4)=0,A4,MINIFS(A5:A9,B5:B9,D4)) =CONCATENATE("<",TEXT(VALUE(B4),"#"))
43475 8 =IF(COUNTIF(B6:B10,D5)=0,A5,MINIFS(A6:A10,B6:B10,D5)) =CONCATENATE("<",TEXT(VALUE(B5),"#"))
43477 22 =IF(COUNTIF(B7:B11,D6)=0,A6,MINIFS(A7:A11,B7:B11,D6)) =CONCATENATE("<",TEXT(VALUE(B6),"#"))
This is a little sloppy in that you could use a named value or absolute value for the end of the range, e.g. B$6. Otherwise you're going to be looking at cells below your table, which is fine as long as they're empty, but kind of sloppy.
Column C: =IF(COUNTIF(B2:B$6,D1)=0,A1,MINIFS(A2:A$6,B2:B$6,D1))

Excel: Subtracting One Value from the Next Entry in Column

I have a set of data that has two columns, one for enumerating entries, and the second for storing a value:
1 0.000000000
2
3
4
5
6
7
8
9 0.664076596
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34 0.668394223
The values in column 2 are separated by varying numbers of rows throughout my data sheet. I want to subtract each value (Value B) in column 2 by the previous value above it (Value A) (ie 0.664076596 - 0.000000000, and 0.668394223 - 0.664076596 etc.). I want to put these results in the third column and next to Value B. How would I do this in Excel?
Use this which will find the last number value in the column before the current row.
=IF(B2="","",IFERROR(B2-INDEX($B$1:B1,MATCH(1E+99,$B$1:B1)),B2))

Table transformation in Excel

I have a spreadsheet with data in following format:
CarID Day DistanceTraveled
Ford1 1 10
Ford1 2 12
Nissan1 1 13
Ford1 3 41
Nissan1 2 20
Nissan1 3 10
...
And so on. There are a few hundreds of records in format like this, with a few dozens of cars.
I have to transform it into a following format:
Day Ford1 Nissan1
1 10 13
2 12 20
3 41 10
Is it any fast and automatic way to achieve it in Excel?
Just for the sake of an answer:

Resources