Panda remove duplicates but keep relationship [duplicate] - python-3.x

I have a dataframe with repeat values in column A. I want to drop duplicates, keeping the row with the highest value in column B.
So this:
A B
1 10
1 20
2 30
2 40
3 10
Should turn into this:
A B
1 20
2 40
3 10
I'm guessing there's probably an easy way to do this—maybe as easy as sorting the DataFrame before dropping duplicates—but I don't know groupby's internal logic well enough to figure it out. Any suggestions?

This takes the last. Not the maximum though:
In [10]: df.drop_duplicates(subset='A', keep="last")
Out[10]:
A B
1 1 20
3 2 40
4 3 10
You can do also something like:
In [12]: df.groupby('A', group_keys=False).apply(lambda x: x.loc[x.B.idxmax()])
Out[12]:
A B
A
1 1 20
2 2 40
3 3 10

The top answer is doing too much work and looks to be very slow for larger data sets. apply is slow and should be avoided if possible. ix is deprecated and should be avoided as well.
df.sort_values('B', ascending=False).drop_duplicates('A').sort_index()
A B
1 1 20
3 2 40
4 3 10
Or simply group by all the other columns and take the max of the column you need. df.groupby('A', as_index=False).max()

Simplest solution:
To drop duplicates based on one column:
df = df.drop_duplicates('column_name', keep='last')
To drop duplicates based on multiple columns:
df = df.drop_duplicates(['col_name1','col_name2','col_name3'], keep='last')

I would sort the dataframe first with Column B descending, then drop duplicates for Column A and keep first
df = df.sort_values(by='B', ascending=False)
df = df.drop_duplicates(subset='A', keep="first")
without any groupby

Try this:
df.groupby(['A']).max()

I was brought here by a link from a duplicate question.
For just two columns, wouldn't it be simpler to do:
df.groupby('A')['B'].max().reset_index()
And to retain a full row (when there are more columns, which is what the "duplicate question" that brought me here was asking):
df.loc[df.groupby(...)[column].idxmax()]
For example, to retain the full row where 'C' takes its max, for each group of ['A', 'B'], we would do:
out = df.loc[df.groupby(['A', 'B')['C'].idxmax()]
When there are relatively few groups (i.e., lots of duplicates), this is faster than the drop_duplicates() solution (less sorting):
Setup:
n = 1_000_000
df = pd.DataFrame({
'A': np.random.randint(0, 20, n),
'B': np.random.randint(0, 20, n),
'C': np.random.uniform(size=n),
'D': np.random.choice(list('abcdefghijklmnopqrstuvwxyz'), size=n),
})
(Adding sort_index() to ensure equal solution):
%timeit df.loc[df.groupby(['A', 'B'])['C'].idxmax()].sort_index()
# 101 ms ± 98.7 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit df.sort_values(['C', 'A', 'B'], ascending=False).drop_duplicates(['A', 'B']).sort_index()
# 667 ms ± 784 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

I think in your case you don't really need a groupby. I would sort by descending order your B column, then drop duplicates at column A and if you want you can also have a new nice and
clean index like that:
df.sort_values('B', ascending=False).drop_duplicates('A').sort_index().reset_index(drop=True)

Easiest way to do this:
# First you need to sort this DF as Column A as ascending and column B as descending
# Then you can drop the duplicate values in A column
# Optional - you can reset the index and get the nice data frame again
# I'm going to show you all in one step.
d = {'A': [1,1,2,3,1,2,3,1], 'B': [30, 40,50,42,38,30,25,32]}
df = pd.DataFrame(data=d)
df
A B
0 1 30
1 1 40
2 2 50
3 3 42
4 1 38
5 2 30
6 3 25
7 1 32
df = df.sort_values(['A','B'], ascending =[True,False]).drop_duplicates(['A']).reset_index(drop=True)
df
A B
0 1 40
1 2 50
2 3 42

You can try this as well
df.drop_duplicates(subset='A', keep='last')
I referred this from https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html

Here's a variation I had to solve that's worth sharing: for each unique string in columnA I wanted to find the most common associated string in columnB.
df.groupby('columnA').agg({'columnB': lambda x: x.mode().any()}).reset_index()
The .any() picks one if there's a tie for the mode. (Note that using .any() on a Series of ints returns a boolean rather than picking one of them.)
For the original question, the corresponding approach simplifies to
df.groupby('columnA').columnB.agg('max').reset_index().

When already given posts answer the question, I made a small change by adding the column name on which the max() function is applied for better code readability.
df.groupby('A', as_index=False)['B'].max()

Very similar method to the selected answer, but sorting data frame by multiple columns might be an easier way to code.
Firstly, sort the date frame by both "A" and "B" columns, the ascending=False ensure it is ranked from highest value to lowest:
df.sort_values(["A", "B"], ascending=False, inplace=True)
Then, drop duplication and keep only the first item, which is already the one with the highest value:
df.drop_duplicates(inplace=True)

this also works:
a=pd.DataFrame({'A':a.groupby('A')['B'].max().index,'B':a.groupby('A') ['B'].max().values})

I am not going to give you the whole answer (I don't think you're looking for the parsing and writing to file part anyway), but a pivotal hint should suffice: use python's set() function, and then sorted() or .sort() coupled with .reverse():
>>> a=sorted(set([10,60,30,10,50,20,60,50,60,10,30]))
>>> a
[10, 20, 30, 50, 60]
>>> a.reverse()
>>> a
[60, 50, 30, 20, 10]

Related

groupby consecutive identical values in pandas dataframe and cumulative count of the number of occurences

I have a problem where I would like to count the number of times the current value has not changed in a dataframe over rolling periods.
For example:
df = pd.DataFrame({'col':list('aaaabbab')})
would somehow give output of
0
1
2
3
0
1
0
0
I have been trying something along the following
df['col'] = df['col'] == df['col'].shift(1)
df.rolling(window=3).sum().reset_index(drop=True, level=0)
I have added in the rolling as I will want to look at the full data set in terms of rolling periods but even without having it over rolling periods I can not quite figure out the logic.
I am not sure if I am missing something simple or this may not be possible using shift
You need to generate a grouper for the change in values. For this compare each value with the previous one and apply a cumsum. This gives you groups in the itertools.groupby style ([1, 1, 1, 1, 2, 2, 3, 4]), finally group and apply a cumcount.
df['count'] = (df.groupby(df['col'].ne(df['col'].shift()).cumsum())
.cumcount()
)
output:
col count
0 a 0
1 a 1
2 a 2
3 a 3
4 b 0
5 b 1
6 a 0
7 b 0
edit: for fun here is a solution using itertools (much faster):
from itertools import groupby, chain
df['count'] = list(chain(*(list(range(len(list(g))))
for _,g in groupby(df['col']))))
NB. this runs much faster (88 µs vs 707 µs on the provided example)
I can't comment so just to add some more to #mozway answer.
My goal was to count consecutives value for an entire huge dataframe effectively.
The pb I encounter is that by construction
np.nan == np.nan
will return False so you could have a whole column full of only NaN and yet the counter will be at 0.
A simple workaround would be to replace all NaN in your df by a value not already in it.
For instance in the case of a float dataset you could do
df.fillna('NA')
which will work but by changing the dtype of your columns to Object the following code will be much slower (20x on my set up).
I would rather advised something like :
all_values = list(np.unique(np.array(df)))
all_values = [a for a in all_values if a==a]
unik_val = min(all_values)-1
temp = df.fillna(unik_val).copy()
from itertools import groupby, chain
for col in temp.columns:
temp[col] = list(chain(*(list(range(len(list(g))))
for _,g in groupby(temp[col]))))
count_df

Use of Replace() in Python Dataframe for Multiple Columns but same value

Query: I need to replace the 1 old value with the 1 new value for a bunch of columns (not all columns) in a dataframe. The question is about the syntax to be used. Is there a shorter syntax?
Sample Dataframe:
df = pd.DataFrame({'A': [0,1,2,3,4],
'B': [5,6,7,0,9],
'C': [2,0,9,3,0],
'D': [1,3,0,5,2]})
I need all 0 to be replaced with 10 in the above df but only for column A and C (Not for B or D).
Code that I use to do this:
Method 1: Two separate commands.
df['A'].replace({0:10},inplace=True)
df['C'].replace({0:10},inplace=True)
Method 2: One command using dictionary in dictionary
df.replace({'A': {0:10}, 'C': {0:10}},inplace=True)
Method 3: Keeping new value out of dictionary
df.replace({'A':0,'C':0},10,inplace=True)
Expected Outcome:
A B C D
0 10 5 2 1
1 1 6 10 3
2 2 7 9 0
3 3 0 3 5
4 4 9 10 2
I am able to get expected outcome using all three methods. But I have a doubt that can we give a list of columns and enter old and new values for replacement only once?
Something like:
df.replace({['col_ref'...]:{'old':'new'})
#OR
df['col_ref'...].replace()
In my scenario, there are 26 columns out of 52 that need replacing, and the value is to be replaced through a regex command. Now I can store the regex command as a variable and use the method 2 to do this. But this also requires entering the variable name for 26 times. Is there any shorter way where I can enter these 26 columns and the regex replacement {'r':'r2'} only once?
I was looking on how to do this quicker myself this week and found this method and setup to handle instead of a for loop:
col_list = ['A', 'B']
df[col_list] = df[col_list.replace(0,10,inplace=True)
If you are using regex for a string:
col_list = ['A', 'B']
df[col_list] = df[col_list.replace('[\$,]','',regex=True, inplace=True)
I tried this.
for col in [list of columns]:
df.replace({col:{'r':'r2'}},regex=True,inplace=True)
This is the shortest way I could think of to write minimum code characters.
However, if there is a faster way, other answers are welcome.

Count from a dataframe, then sort based on that count [duplicate]

I have a dataset
category
cat a
cat b
cat a
I'd like to be able to return something like (showing unique values and frequency)
category freq
cat a 2
cat b 1
Use value_counts() as #DSM commented.
In [37]:
df = pd.DataFrame({'a':list('abssbab')})
df['a'].value_counts()
Out[37]:
b 3
a 2
s 2
dtype: int64
Also groupby and count. Many ways to skin a cat here.
In [38]:
df.groupby('a').count()
Out[38]:
a
a
a 2
b 3
s 2
[3 rows x 1 columns]
See the online docs.
If you wanted to add frequency back to the original dataframe use transform to return an aligned index:
In [41]:
df['freq'] = df.groupby('a')['a'].transform('count')
df
Out[41]:
a freq
0 a 2
1 b 3
2 s 2
3 s 2
4 b 3
5 a 2
6 b 3
[7 rows x 2 columns]
If you want to apply to all columns you can use:
df.apply(pd.value_counts)
This will apply a column based aggregation function (in this case value_counts) to each of the columns.
df.category.value_counts()
This short little line of code will give you the output you want.
If your column name has spaces you can use
df['category'].value_counts()
df.apply(pd.value_counts).fillna(0)
value_counts - Returns object containing counts of unique values
apply - count frequency in every column. If you set axis=1, you get frequency in every row
fillna(0) - make output more fancy. Changed NaN to 0
In 0.18.1 groupby together with count does not give the frequency of unique values:
>>> df
a
0 a
1 b
2 s
3 s
4 b
5 a
6 b
>>> df.groupby('a').count()
Empty DataFrame
Columns: []
Index: [a, b, s]
However, the unique values and their frequencies are easily determined using size:
>>> df.groupby('a').size()
a
a 2
b 3
s 2
With df.a.value_counts() sorted values (in descending order, i.e. largest value first) are returned by default.
Using list comprehension and value_counts for multiple columns in a df
[my_series[c].value_counts() for c in list(my_series.select_dtypes(include=['O']).columns)]
https://stackoverflow.com/a/28192263/786326
As everyone said, the faster solution is to do:
df.column_to_analyze.value_counts()
But if you want to use the output in your dataframe, with this schema:
df input:
category
cat a
cat b
cat a
df output:
category counts
cat a 2
cat b 1
cat a 2
you can do this:
df['counts'] = df.category.map(df.category.value_counts())
df
If your DataFrame has values with the same type, you can also set return_counts=True in numpy.unique().
index, counts = np.unique(df.values,return_counts=True)
np.bincount() could be faster if your values are integers.
You can also do this with pandas by broadcasting your columns as categories first, e.g. dtype="category" e.g.
cats = ['client', 'hotel', 'currency', 'ota', 'user_country']
df[cats] = df[cats].astype('category')
and then calling describe:
df[cats].describe()
This will give you a nice table of value counts and a bit more :):
client hotel currency ota user_country
count 852845 852845 852845 852845 852845
unique 2554 17477 132 14 219
top 2198 13202 USD Hades US
freq 102562 8847 516500 242734 340992
Without any libraries, you could do this instead:
def to_frequency_table(data):
frequencytable = {}
for key in data:
if key in frequencytable:
frequencytable[key] += 1
else:
frequencytable[key] = 1
return frequencytable
Example:
to_frequency_table([1,1,1,1,2,3,4,4])
>>> {1: 4, 2: 1, 3: 1, 4: 2}
I believe this should work fine for any DataFrame columns list.
def column_list(x):
column_list_df = []
for col_name in x.columns:
y = col_name, len(x[col_name].unique())
column_list_df.append(y)
return pd.DataFrame(column_list_df)
column_list_df.rename(columns={0: "Feature", 1: "Value_count"})
The function "column_list" checks the columns names and then checks the uniqueness of each column values.
#metatoaster has already pointed this out.
Go for Counter. It's blazing fast.
import pandas as pd
from collections import Counter
import timeit
import numpy as np
df = pd.DataFrame(np.random.randint(1, 10000, (100, 2)), columns=["NumA", "NumB"])
Timers
%timeit -n 10000 df['NumA'].value_counts()
# 10000 loops, best of 3: 715 µs per loop
%timeit -n 10000 df['NumA'].value_counts().to_dict()
# 10000 loops, best of 3: 796 µs per loop
%timeit -n 10000 Counter(df['NumA'])
# 10000 loops, best of 3: 74 µs per loop
%timeit -n 10000 df.groupby(['NumA']).count()
# 10000 loops, best of 3: 1.29 ms per loop
Cheers!
The following code creates frequency table for the various values in a column called "Total_score" in a dataframe called "smaller_dat1", and then returns the number of times the value "300" appears in the column.
valuec = smaller_dat1.Total_score.value_counts()
valuec.loc[300]
n_values = data.income.value_counts()
First unique value count
n_at_most_50k = n_values[0]
Second unique value count
n_greater_50k = n_values[1]
n_values
Output:
<=50K 34014
>50K 11208
Name: income, dtype: int64
Output:
n_greater_50k,n_at_most_50k:-
(11208, 34014)
your data:
|category|
cat a
cat b
cat a
solution:
df['freq'] = df.groupby('category')['category'].transform('count')
df = df.drop_duplicates()

Comparison over > and < between values of 2 data frames in pandas

Am having two dataframes,as df1 & df2
df1 df2
A 10 A 7
B 1500 B 1100
C 5 C 10
D 100 D 60
Now i need to write code for comparing each df1['A'] over ">" or "<" with df2['A']
and so on..can someone suggest the function to write the comparison and return greater value not any boolean result.
If I understand correctly: you want a function that returns a dataframe (df3) where df3.iloc[i] == max(df1.iloc[i], df2.iloc[i])?
There might be a simpler way, but here's something that works:
s1 = pd.Series((10, 1500, 5, 100), index=['A', 'B', 'C', 'D'])
s2 = pd.Series((7, 1100, 10, 60), index=['A', 'B', 'C', 'D'])
s3 = pd.concat((s1[s1 > s2], s2[s2 > s1]), axis=0)
s3
A 10
B 1500
D 100
C 10
When you call s1[s1 > s2], it returns all elements of s1 that are greater than the corresponding element of s2 (likewise for s2[s2 > s1]). Then you just have to concatenate these two together; axis=0 means to do it row-wise.
This does get your index out of order. You can fix that, if you want, by just sorting it after:
s3.sort_index()
A 10
B 1500
C 10
D 100
#unutbu's answer (linked in his comment to the OP's question) is probably more straightforward:
df_3 = df_1.where(df_1 > df_2, df_2)
This returns the value from df_1 when the condition holds (the value is larger), otherwise it returns the value from df_2.
(This does seem a duplicate of the question he links, but the title here is perhaps easier to find in a Google search for this task? Something like "Elementwise < (Greater Than) Between Pandas DataFrames" might also go well)

Python Pandas: Get index of rows which column matches certain value (max) [duplicate]

How can I find the row for which the value of a specific column is maximal?
df.max() will give me the maximal value for each column, I don't know how to get the corresponding row.
Use the pandas idxmax function. It's straightforward:
>>> import pandas
>>> import numpy as np
>>> df = pandas.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
>>> df
A B C
0 1.232853 -1.979459 -0.573626
1 0.140767 0.394940 1.068890
2 0.742023 1.343977 -0.579745
3 2.125299 -0.649328 -0.211692
4 -0.187253 1.908618 -1.862934
>>> df['A'].idxmax()
3
>>> df['B'].idxmax()
4
>>> df['C'].idxmax()
1
Alternatively you could also use numpy.argmax, such as numpy.argmax(df['A']) -- it provides the same thing, and appears at least as fast as idxmax in cursory observations.
idxmax() returns indices labels, not integers.
Example': if you have string values as your index labels, like rows 'a' through 'e', you might want to know that the max occurs in row 4 (not row 'd').
if you want the integer position of that label within the Index you have to get it manually (which can be tricky now that duplicate row labels are allowed).
HISTORICAL NOTES:
idxmax() used to be called argmax() prior to 0.11
argmax was deprecated prior to 1.0.0 and removed entirely in 1.0.0
back as of Pandas 0.16, argmax used to exist and perform the same function (though appeared to run more slowly than idxmax).
argmax function returned the integer position within the index of the row location of the maximum element.
pandas moved to using row labels instead of integer indices. Positional integer indices used to be very common, more common than labels, especially in applications where duplicate row labels are common.
For example, consider this toy DataFrame with a duplicate row label:
In [19]: dfrm
Out[19]:
A B C
a 0.143693 0.653810 0.586007
b 0.623582 0.312903 0.919076
c 0.165438 0.889809 0.000967
d 0.308245 0.787776 0.571195
e 0.870068 0.935626 0.606911
f 0.037602 0.855193 0.728495
g 0.605366 0.338105 0.696460
h 0.000000 0.090814 0.963927
i 0.688343 0.188468 0.352213
i 0.879000 0.105039 0.900260
In [20]: dfrm['A'].idxmax()
Out[20]: 'i'
In [21]: dfrm.iloc[dfrm['A'].idxmax()] # .ix instead of .iloc in older versions of pandas
Out[21]:
A B C
i 0.688343 0.188468 0.352213
i 0.879000 0.105039 0.900260
So here a naive use of idxmax is not sufficient, whereas the old form of argmax would correctly provide the positional location of the max row (in this case, position 9).
This is exactly one of those nasty kinds of bug-prone behaviors in dynamically typed languages that makes this sort of thing so unfortunate, and worth beating a dead horse over. If you are writing systems code and your system suddenly gets used on some data sets that are not cleaned properly before being joined, it's very easy to end up with duplicate row labels, especially string labels like a CUSIP or SEDOL identifier for financial assets. You can't easily use the type system to help you out, and you may not be able to enforce uniqueness on the index without running into unexpectedly missing data.
So you're left with hoping that your unit tests covered everything (they didn't, or more likely no one wrote any tests) -- otherwise (most likely) you're just left waiting to see if you happen to smack into this error at runtime, in which case you probably have to go drop many hours worth of work from the database you were outputting results to, bang your head against the wall in IPython trying to manually reproduce the problem, finally figuring out that it's because idxmax can only report the label of the max row, and then being disappointed that no standard function automatically gets the positions of the max row for you, writing a buggy implementation yourself, editing the code, and praying you don't run into the problem again.
You might also try idxmax:
In [5]: df = pandas.DataFrame(np.random.randn(10,3),columns=['A','B','C'])
In [6]: df
Out[6]:
A B C
0 2.001289 0.482561 1.579985
1 -0.991646 -0.387835 1.320236
2 0.143826 -1.096889 1.486508
3 -0.193056 -0.499020 1.536540
4 -2.083647 -3.074591 0.175772
5 -0.186138 -1.949731 0.287432
6 -0.480790 -1.771560 -0.930234
7 0.227383 -0.278253 2.102004
8 -0.002592 1.434192 -1.624915
9 0.404911 -2.167599 -0.452900
In [7]: df.idxmax()
Out[7]:
A 0
B 8
C 7
e.g.
In [8]: df.loc[df['A'].idxmax()]
Out[8]:
A 2.001289
B 0.482561
C 1.579985
Both above answers would only return one index if there are multiple rows that take the maximum value. If you want all the rows, there does not seem to have a function.
But it is not hard to do. Below is an example for Series; the same can be done for DataFrame:
In [1]: from pandas import Series, DataFrame
In [2]: s=Series([2,4,4,3],index=['a','b','c','d'])
In [3]: s.idxmax()
Out[3]: 'b'
In [4]: s[s==s.max()]
Out[4]:
b 4
c 4
dtype: int64
df.iloc[df['columnX'].argmax()]
argmax() would provide the index corresponding to the max value for the columnX. iloc can be used to get the row of the DataFrame df for this index.
A more compact and readable solution using query() is like this:
import pandas as pd
df = pandas.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
print(df)
# find row with maximum A
df.query('A == A.max()')
It also returns a DataFrame instead of Series, which would be handy for some use cases.
Very simple: we have df as below and we want to print a row with max value in C:
A B C
x 1 4
y 2 10
z 5 9
In:
df.loc[df['C'] == df['C'].max()] # condition check
Out:
A B C
y 2 10
If you want the entire row instead of just the id, you can use df.nlargest and pass in how many 'top' rows you want and you can also pass in for which column/columns you want it for.
df.nlargest(2,['A'])
will give you the rows corresponding to the top 2 values of A.
use df.nsmallest for min values.
The direct ".argmax()" solution does not work for me.
The previous example provided by #ely
>>> import pandas
>>> import numpy as np
>>> df = pandas.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
>>> df
A B C
0 1.232853 -1.979459 -0.573626
1 0.140767 0.394940 1.068890
2 0.742023 1.343977 -0.579745
3 2.125299 -0.649328 -0.211692
4 -0.187253 1.908618 -1.862934
>>> df['A'].argmax()
3
>>> df['B'].argmax()
4
>>> df['C'].argmax()
1
returns the following message :
FutureWarning: 'argmax' is deprecated, use 'idxmax' instead. The behavior of 'argmax'
will be corrected to return the positional maximum in the future.
Use 'series.values.argmax' to get the position of the maximum now.
So that my solution is :
df['A'].values.argmax()
mx.iloc[0].idxmax()
This one line of code will give you how to find the maximum value from a row in dataframe, here mx is the dataframe and iloc[0] indicates the 0th index.
Considering this dataframe
[In]: df = pd.DataFrame(np.random.randn(4,3),columns=['A','B','C'])
[Out]:
A B C
0 -0.253233 0.226313 1.223688
1 0.472606 1.017674 1.520032
2 1.454875 1.066637 0.381890
3 -0.054181 0.234305 -0.557915
Assuming one want to know the rows where column "C" is max, the following will do the work
[In]: df[df['C']==df['C'].max()])
[Out]:
A B C
1 0.472606 1.017674 1.520032
The idmax of the DataFrame returns the label index of the row with the maximum value and the behavior of argmax depends on version of pandas (right now it returns a warning). If you want to use the positional index, you can do the following:
max_row = df['A'].values.argmax()
or
import numpy as np
max_row = np.argmax(df['A'].values)
Note that if you use np.argmax(df['A']) behaves the same as df['A'].argmax().
Use:
data.iloc[data['A'].idxmax()]
data['A'].idxmax() -finds max value location in terms of row
data.iloc() - returns the row
If there are ties in the maximum values, then idxmax returns the index of only the first max value. For example, in the following DataFrame:
A B C
0 1 0 1
1 0 0 1
2 0 0 0
3 0 1 1
4 1 0 0
idxmax returns
A 0
B 3
C 0
dtype: int64
Now, if we want all indices corresponding to max values, then we could use max + eq to create a boolean DataFrame, then use it on df.index to filter out indexes:
out = df.eq(df.max()).apply(lambda x: df.index[x].tolist())
Output:
A [0, 4]
B [3]
C [0, 1, 3]
dtype: object
what worked for me is:
df[df['colX'] == df['colX'].max()
You then get the row in your df with the maximum value of colX.
Then if you just want the index you can add .index at the end of the query.

Resources