This question already has answers here:
How to extract the n-th maximum/minimum value in a column of a DataFrame in pandas?
(3 answers)
Closed 3 years ago.
I have a data frame with a DateTime column. I can get the minimum value by using
df['Date'].min()
How can I get the second, third, ... smallest values?
Use nlargest or nsmallest.
For the second largest (nsmallest works the same way for the second smallest):
series.nlargest(2).iloc[-1]
Make sure your dates are in datetime first:
df['Sampled_Date'] = pd.to_datetime(df['Sampled_Date'])
Then drop the duplicates, take the nsmallest(2), and take the last value of that:
df['Sampled_Date'].drop_duplicates().nsmallest(2).iloc[-1]
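The same idea can be generalised to the n-th smallest distinct value. The following is only a sketch under assumptions: the helper name nth_smallest and the sample dates are made up for illustration, and it relies only on drop_duplicates, nsmallest and iloc as used above.

```python
import pandas as pd


def nth_smallest(series: pd.Series, n: int):
    """Return the n-th smallest distinct value (n=1 is the minimum)."""
    distinct = series.drop_duplicates()
    if n < 1 or n > len(distinct):
        raise ValueError("n is out of range for the distinct values")
    # nsmallest(n) returns the n smallest values in ascending order,
    # so the last one is the n-th smallest.
    return distinct.nsmallest(n).iloc[-1]


# Hypothetical sample data; the real column would come from your DataFrame.
df = pd.DataFrame(
    {"Sampled_Date": ["2021-03-01", "2021-01-15", "2021-01-15", "2021-02-10"]}
)
df["Sampled_Date"] = pd.to_datetime(df["Sampled_Date"])

second = nth_smallest(df["Sampled_Date"], 2)
third = nth_smallest(df["Sampled_Date"], 3)
```

Dropping duplicates first matters here: without it, the repeated 2021-01-15 would be counted as both the smallest and the second smallest value.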
This question already has answers here:
Get date column from numpy loadtxt()
(2 answers)
Closed 4 years ago.
I have an array of dates in the format 'yyyy-mm-dd' and another array of integers, each corresponding to a value in the date array. But when I tried to plot the graph using:
matplotlib.pyplot.plot(dates, values, label='Price')
It gives the error:
ValueError: could not convert string to float: '2017-07-26'
How do I fix this error?
Your dates are strings; convert them to datetime objects first:
import datetime
x = [datetime.datetime.strptime(date, "%Y-%m-%d") for date in dates]
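A slightly fuller sketch of that conversion, with made-up sample data (the dates and values lists here are hypothetical stand-ins for the asker's arrays). The plotting call is left as a comment so the snippet runs without matplotlib installed.

```python
from datetime import datetime

# Hypothetical sample data standing in for the asker's arrays.
dates = ["2017-07-26", "2017-07-27", "2017-07-28"]
values = [101, 99, 104]

# Parse each 'yyyy-mm-dd' string into a datetime object.
x = [datetime.strptime(date, "%Y-%m-%d") for date in dates]

# With datetime objects on the x axis, the original call should no longer
# try to convert the strings to floats:
# matplotlib.pyplot.plot(x, values, label='Price')
```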
This question already has answers here:
Spark SQL: apply aggregate functions to a list of columns
(4 answers)
Closed 4 years ago.
I have a data frame with int values and I'd like to sum every column individually and then test if that column's sum is above 5. If the column's sum is above 5 then I'd like to add it to feature_cols. The answers I've found online only work for pandas and not PySpark. (I'm using Databricks)
Here is what I have so far:
working_cols = df.columns
for x in range(0, len(working_cols)):
    if df.agg(sum(working_cols[x])) > 5:
        feature_cols.append(working_cols[x])
The current output for this is that feature_cols has every column, even though some have a sum less than 5.
Out[166]:
['Column_1',
'Column_2',
'Column_3',
'Column_4',
'Column_5',
'Column_6',
'Column_7',
'Column_8',
'Column_9',
'Column_10']
I am not an expert in Python, but in your loop you are comparing a DataFrame (DataFrame[sum(a): bigint]) with 5 rather than comparing a number with 5, and for some reason that comparison evaluates to True.
df.agg(sum(working_cols[x])).collect()[0][0] should give you what you want. This collects the DataFrame to the driver, selects the first row (there is only one) and then the first column (also the only one).
Note that your approach is not optimal in terms of performance, because it launches one aggregation per column. You could compute all the sums in a single pass over the DataFrame like this:
from pyspark.sql import functions as F
sums = [F.sum(x).alias(str(x)) for x in df.columns]
d = df.select(sums).collect()[0].asDict()
With this code, you get a dictionary that associates each column name with its sum, to which you can apply whatever logic is of interest to you.
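Once the sums are in a dictionary, the threshold test from the question is plain Python. This is a sketch only: the dictionary below is a hypothetical stand-in for d (the numbers are invented), so that it runs without a Spark session.

```python
# Hypothetical stand-in for the dictionary `d` returned by asDict();
# the sums are invented for illustration.
d = {"Column_1": 12, "Column_2": 3, "Column_3": 5, "Column_4": 8}

threshold = 5

# Keep only the columns whose sum is strictly above the threshold.
# A sum can come back as None when a column contains only nulls,
# so those columns are skipped explicitly.
feature_cols = [
    name for name, total in d.items()
    if total is not None and total > threshold
]
```

A column whose sum is exactly 5 is excluded, which matches the "above 5" wording in the question.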
This question already has answers here:
multiple excel if statements to produce value 1,2 or 3
(3 answers)
Closed 4 years ago.
Screenshot
I'm trying to use an IF statement to return one of three possible values. I'm trying to determine if a customer re-ordered this year after ordering last year.
My results should be "Re-Ordered", "2018 Only" or "Q4 not Q1". My formula is: =IF(AND(I15>0,J15<1),"Q4 not Q1","Re-Ordered")
But as you can see I'm not sure if I should be adding an or statement or what the layout of the formula should be. Any help is greatly appreciated. Everything I've found on this keeps returning my statement as false.
An IF statement can only return 2 values, so if you want to return 3, you have to nest it inside another IF like this:
=IF(A=B,"A equals B",IF(A=C, "A equals C", "A does not equal B and A does not equal C"))
Based on your screenshot:
=IF(AND(I12>0,J12<1),"Q4 not Q1",IF(AND(I12>0,J12>0),"Re-Ordered","2018 Only"))
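To make the order of the branches easier to follow, here is the same nested logic written as a small Python function. This is only an illustration of the formula, not something Excel needs: the function name classify is made up, and i_value and j_value simply stand for the values in columns I and J (what those columns hold is visible only in the screenshot).

```python
def classify(i_value, j_value):
    """Mirror the nested IF: the first matching condition wins."""
    # Outer IF: AND(I>0, J<1)
    if i_value > 0 and j_value < 1:
        return "Q4 not Q1"
    # Inner IF, evaluated only when the outer condition is false: AND(I>0, J>0)
    if i_value > 0 and j_value > 0:
        return "Re-Ordered"
    # Final "else" value of the inner IF
    return "2018 Only"


results = [classify(3, 0), classify(3, 2), classify(0, 2)]
```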
This question already has answers here:
Lexicographic minimum permutation such that all adjacent letters are distinct
(6 answers)
re-arrange items into an array with no similar items next to each other
(3 answers)
Closed 8 years ago.
How can I re-arrange a string so that identical characters are never next to each other, and, if there are several valid arrangements, choose the one that comes first alphabetically?
For example:
AAABBBB -> BABABAB
AAABBB -> ABABAB
BCDDEEEF -> BCDEDEFE
BACHH -> ABHCH
Pseudo code or something would be useful.
A naive solution:
Find all permutations of the string
Find all that don't have repeating characters
Find the first alphabetically
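The naive steps above can be sketched in Python as follows. The function name rearrange is made up for illustration. The sketch relies on the documented behaviour of itertools.permutations, which yields tuples in sorted order when its input is sorted, so the first valid permutation found is also the alphabetically first one. It is factorial in the length of the string and only practical for short inputs.

```python
from itertools import permutations


def rearrange(s):
    """Return the alphabetically first arrangement of s with no two equal
    adjacent characters, or None if no such arrangement exists."""
    # Sorting the input makes permutations() yield candidates in
    # lexicographic order, so we can stop at the first valid one.
    for perm in permutations(sorted(s)):
        if all(a != b for a, b in zip(perm, perm[1:])):
            return "".join(perm)
    return None


examples = {s: rearrange(s) for s in ["AAABBBB", "AAABBB", "BACHH", "AAAB"]}
```

For longer strings a greedy or priority-queue approach would be needed instead of enumerating permutations; this version only demonstrates the naive idea.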
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Counting values by day/hour with timeseries in MATLAB
This is an elementary question, but I cannot find it:
I have a 3000x25 character array:
2000-01-01T00:01:01+00:00
2000-01-01T00:01:02+00:00
2000-01-01T00:01:03+00:00
2000-01-01T00:01:04+00:00
These are obviously times. I want to reformat the array to be a 3000x1 array. How can I redefine each row to be one entry in an array?
(Again, this is simple, I'm sorry)
Other than converting to serial date numbers as others have shown, I think you simply wanted to convert to a cell array of strings:
A = cellstr(c)
where c is the 3000x25 matrix of characters.
You need to specify a format for the array and feed it to datenum, like this:
>> d = datenum(c,'YYYY-MM-DDTHH:mm:ss')
d =
1.0e+005 *
7.3487
7.3487
7.3487
7.3487
The times are now stored as datenums, i.e. as floating-point numbers representing the number of days elapsed since the start of the MATLAB epoch. If you want to convert these to numbers representing the fraction of the day elapsed, you can do
>> t = d - fix(d);
and if you want the number of seconds since midnight, you can do
>> t = 86400 * (d - fix(d));
t =
61.0000
62.0000
63.0000
64.0000
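As a cross-check of the seconds-since-midnight values, the same arithmetic can be reproduced outside MATLAB. The sketch below uses Python's standard datetime module rather than datenum, so it is an independent check and not a translation of the MATLAB code; the function name seconds_since_midnight is made up for illustration.

```python
from datetime import datetime

# The four sample rows from the question.
rows = [
    "2000-01-01T00:01:01+00:00",
    "2000-01-01T00:01:02+00:00",
    "2000-01-01T00:01:03+00:00",
    "2000-01-01T00:01:04+00:00",
]


def seconds_since_midnight(stamp):
    """Parse an ISO 8601 timestamp and return the seconds elapsed since
    midnight in the timestamp's own UTC offset."""
    t = datetime.fromisoformat(stamp)
    return t.hour * 3600 + t.minute * 60 + t.second


seconds = [seconds_since_midnight(r) for r in rows]
```

For these rows the result is 61, 62, 63 and 64 seconds, in line with the MATLAB output above.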