A few days ago I installed HBase 1.0.1 on Hadoop 2.5.
My issue is that the get command doesn't return any rows.
I tried with different tables, via the shell and the API ... and nothing.
If you have any thoughts about this, please share them with me.
hbase(main):020:0> get 'teste', 'camp:name'
COLUMN CELL
0 row(s) in 0.0930 seconds
hbase(main):021:0> scan 'teste'
ROW COLUMN+CELL
1 column=camp:nume, timestamp=1431128619811, value=David
1 row(s) in 0.1720 seconds
That's because you're missing the row key: the second argument to get is treated as the row key, so HBase looks for a row with that name, and no such row exists.
Use this to get all columns from row 1:
hbase(main):020:0> get 'teste', '1'
Use this to get the 'camp:nume' column from row 1:
hbase(main):020:0> get 'teste', '1', 'camp:nume'
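On the API side the same rule applies. A minimal Python sketch using the happybase client (an assumption — the question doesn't say which client is in use, and happybase needs the HBase Thrift gateway running):

import happybase

# Hypothetical host; point this at the machine running the HBase Thrift server.
connection = happybase.Connection('localhost')
table = connection.table('teste')

# The row key comes first here too; without it there is nothing to fetch.
whole_row = table.row(b'1')                           # all columns of row '1'
one_cell = table.row(b'1', columns=[b'camp:nume'])    # just the camp:nume column

print(whole_row)  # e.g. {b'camp:nume': b'David'}
print(one_cell)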
Just FYI, in the HBase Shell you can run a simple command with no arguments to see the help:
hbase(main):005:0> get
ERROR: wrong number of arguments (0 for 2)
Here is some help for this command:
Get row or cell contents; pass table name, row, and optionally
a dictionary of column(s), timestamp, timerange and versions. Examples:
hbase> get 't1', 'r1'
hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
hbase> get 't1', 'r1', {COLUMN => 'c1'}
hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
hbase> get 't1', 'r1', 'c1'
hbase> get 't1', 'r1', 'c1', 'c2'
hbase> get 't1', 'r1', ['c1', 'c2']
The same commands also can be run on a reference to a table (obtained via get_table or
create_table). Suppose you had a reference t to table 't1', the corresponding commands would be:
hbase> t.get 'r1'
hbase> t.get 'r1', {TIMERANGE => [ts1, ts2]}
hbase> t.get 'r1', {COLUMN => 'c1'}
hbase> t.get 'r1', {COLUMN => ['c1', 'c2', 'c3']}
hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
hbase> t.get 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
hbase> t.get 'r1', 'c1'
hbase> t.get 'r1', 'c1', 'c2'
hbase> t.get 'r1', ['c1', 'c2']
Related
I have a pandas dataframe which looks like this:
df =
Index1 Index2 Index3 column1 column2
i11 i12 i13 2 5
i11 i12 i23 3 8
i21 i22 i23 4 5
How can I convert this into a list of dictionaries with Index3, column1, and column2 as keys and the respective cell values as values?
So, expected output:
[[{Index3: i13, column1: 2, column2: 5}, {Index3: i23, column1: 3, column2: 8}], [{Index3: i23, column1: 4, column2: 5}]]
Please note that rows with the same values of Index1 and Index2 form one inner list, and those values won't be repeated.
d = {'Index1': ["i11", "i12", "i13"],
'Index2': ["i21", "i22", "i23"],
'Index3': ["i31", "i32", "i33"],
'column1': [2, 3,4],
'column2': [5, 8, 5]}
df = pd.DataFrame(data=d)
This should work:
a = []
for i in range(df.shape[0]):
    a.append({"Index3": df.iloc[i, 2], "column1": df.iloc[i, 3], "column2": df.iloc[i, 4]})
Res:
[{'Index3': 'i31', 'column1': 2, 'column2': 5},
 {'Index3': 'i32', 'column1': 3, 'column2': 8},
 {'Index3': 'i33', 'column1': 4, 'column2': 5}]
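The answer above returns a flat list. If the nested grouping by Index1/Index2 from the question is needed, a groupby-based sketch (assuming Index1, Index2 and Index3 are ordinary columns, as in the question's table) could look like this:

import pandas as pd

# The question's data, with the index levels as plain columns (assumption).
df = pd.DataFrame({
    'Index1':  ['i11', 'i11', 'i21'],
    'Index2':  ['i12', 'i12', 'i22'],
    'Index3':  ['i13', 'i23', 'i23'],
    'column1': [2, 3, 4],
    'column2': [5, 8, 5],
})

# One inner list per (Index1, Index2) pair; each record keeps Index3/column1/column2.
nested = (df.groupby(['Index1', 'Index2'], sort=False)
            .apply(lambda g: g[['Index3', 'column1', 'column2']].to_dict('records'))
            .tolist())

print(nested)
# [[{'Index3': 'i13', 'column1': 2, 'column2': 5},
#   {'Index3': 'i23', 'column1': 3, 'column2': 8}],
#  [{'Index3': 'i23', 'column1': 4, 'column2': 5}]]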
I made a summarized table like the one below using the pandas groupby function:
        I       II
A   apple        3
    banana       4
B   dog          1
    cat          2
C   seoul        9
    tokyo        5
I want to keep, in each category, only the row that has the max value in column II.
For example, in category A I want to keep only the banana row, because it has the max value in II.
The result table I want is below:
        I       II
A   banana       4
B   cat          2
C   seoul        9
Thanks.
The DataFrame I used:
df=pd.DataFrame({'II': {('A', 'apple'): 3,
('A', 'banana'): 4,
('B', 'dog'): 1,
('B', 'cat'): 2,
('C', 'seoul'): 9,
('C', 'tokyo'): 5}})
Try it via sort_values(), reset_index() and drop_duplicates():
out=(df.sort_values('II',ascending=False)
.reset_index()
.drop_duplicates('level_0')
.set_index('level_0')
.rename_axis(index=None)
.rename(columns={'level_1':'I'}))
OR
out=(df.reset_index()
.sort_values('II',ascending=False)
.groupby('level_0')
.first()
.rename(columns={'level_1':'I'})
.rename_axis(index=None))
output of out:
I II
C seoul 9
A banana 4
B cat 2
Not sure if this is the most elegant solution, but if you want to work with a groupby object, this should do it.
# Creating the Dummy DataFrame
d = {
    'Letter': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Word': ['apple', 'banana', 'dog', 'cat', 'seoul', 'tokyo'],
    'II': [3, 4, 1, 2, 9, 5]
}
df = pd.DataFrame(data=d)
df_max = df.groupby('Letter')[['II']].agg('max')
df_max = df_max.merge(df, how='left', on='II') # merge the "Word" column back into df_max
You could then reorder the columns if you need them to be in a specific order.
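A shorter alternative on the same dummy frame (a sketch, not part of the answer above): groupby plus idxmax picks the row with the maximum II per Letter directly.

df_max = df.loc[df.groupby('Letter')['II'].idxmax()]
print(df_max)
#   Letter    Word  II
# 1      A  banana   4
# 3      B     cat   2
# 4      C   seoul   9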
[{'name': 'Test Item1',
'column_values': [{'title': 'col2', 'text': 'Oladimeji Olaolorun'},
{'title': 'col3', 'text': 'Working on it'},
{'title': 'col4', 'text': '2019-09-17'},
{'title': 'col5', 'text': '1'}],
'group': {'title': 'Group 1'}},
{'name': 'Test Item2',
'column_values': [{'title': 'col2', 'text': 'Lucie Phillips'},
{'title': 'col3', 'text': 'Done'},
{'title': 'col4', 'text': '2019-09-20'},
{'title': 'col5', 'text': '2'}],
'group': {'title': 'Group 1'}},
{'name': 'Test Item3',
'column_values': [{'title': 'col2', 'text': 'David Binns'},
{'title': 'col3', 'text': None},
{'title': 'col4', 'text': '2019-09-25'},
{'title': 'col5', 'text': '3'}],
'group': {'title': 'Group 1'}},
{'name': 'Item 4',
'column_values': [{'title': 'col2', 'text': 'Lucie Phillips'},
{'title': 'col3', 'text': 'Stuck'},
{'title': 'col4', 'text': '2019-09-06'},
{'title': 'col5', 'text': '4'}],
'group': {'title': 'Group 2'}},
{'name': 'Item 5',
'column_values': [{'title': 'col2', 'text': 'David Binns'},
{'title': 'col3', 'text': 'Done'},
{'title': 'col4', 'text': '2019-09-28'},
{'title': 'col5', 'text': '5'}],
'group': {'title': 'Group 2'}},
{'name': 'item 6',
'column_values': [{'title': 'col2', 'text': 'Lucie Phillips'},
{'title': 'col3', 'text': 'Done'},
{'title': 'col4', 'text': '2020-03-05'},
{'title': 'col5', 'text': '76'}],
'group': {'title': 'Group 2'}}]
I'm currently extracting data from Monday.com's API; the call returns the response above (a list of dicts), and I'm trying to find the best way to flatten it into a DataFrame.
I'm currently using json_normalize(results['data']['boards'][0]['items']), which gives the result below.
The desired output is a table like the one below.
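Since the question already uses json_normalize, here is a hedged sketch of how record_path and meta could do most of the flattening (items is assumed to be the list shown above, i.e. results['data']['boards'][0]['items']):

import pandas as pd

flat = pd.json_normalize(
    items,
    record_path='column_values',        # one row per column entry
    meta=['name', ['group', 'title']],  # carry the item name and group title along
)

# Pivot back to one row per item, with col2..col5 as the columns.
table = flat.set_index(['name', 'group.title', 'title'])['text'].unstack()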
Using the glom module, it becomes easy to extract the required 'text' keys from the nested list. Read the data into a pandas DataFrame, split the names column, and finally merge it back into the parent DataFrame.
from glom import glom
import pandas as pd

spec = {'names': ('column_values', ['text']),
        'group': 'group.title',
        'Name': 'name'}

# data is the list of dicts shown above; applying the spec per entry gives a list of flat dicts
M = glom(data, [spec])
The function below replaces None entries with the string 'None':
def replace_none(val_list):
    val_list = ['None' if v is None else v for v in val_list]
    return val_list

for i in M:
    i['names'] = replace_none(i['names'])

df = pd.DataFrame(M)
df_split = df['names'].str.join(',').str.split(',', expand=True).add_prefix('Col')
df = df.drop('names', axis=1)
pd.concat([df, df_split], axis=1)
group Name Col0 Col1 Col2 Col3
0 Group 1 Test Item1 Oladimeji Olaolorun Working on it 2019-09-17 1
1 Group 1 Test Item2 Lucie Phillips Done 2019-09-20 2
2 Group 1 Test Item3 David Binns None 2019-09-25 3
3 Group 2 Item 4 Lucie Phillips Stuck 2019-09-06 4
4 Group 2 Item 5 David Binns Done 2019-09-28 5
5 Group 2 item 6 Lucie Phillips Done 2020-03-05 76
Update: all of the code above is unnecessary; the code below is simpler, less verbose, and clearer.
d = []
for ent in data:
    for entry in ent['column_values']:
        entry.update({'name': ent['name']})
        entry.update({'group': ent['group']['title']})
        d.append(entry)

res = pd.DataFrame(d)
res.set_index(['name', 'group', 'title']).unstack()
text
title col2 col3 col4 col5
name group
Item 4 Group 2 Lucie Phillips Stuck 2019-09-06 4
Item 5 Group 2 David Binns Done 2019-09-28 5
Test Item1 Group 1 Oladimeji Olaolorun Working on it 2019-09-17 1
Test Item2 Group 1 Lucie Phillips Done 2019-09-20 2
Test Item3 Group 1 David Binns None 2019-09-25 3
item 6 Group 2 Lucie Phillips Done 2020-03-05 76
There are many ways to do this, but this is the one I like most; you can adapt the dictionary to your needs.
In the code below, we delete the unnecessary data and update the keys and values as needed; the dict can then be converted into a DataFrame.
# d is the list of dicts shown above
for i in range(len(d)):
    data = d[i]
    # lift each column_values entry up to a top-level key
    for value in d[i]['column_values']:
        data[value['title']] = value['text']
    data['group'] = data['group']['title']
    del d[i]['column_values']

import pandas as pd
data = pd.DataFrame(d)
data.head()
I think this will help.
First, I convert each list into a dict with title as the key and text as the value, then turn it into a Series; applied across the column, this builds a data frame:
def solve(list_d: list) -> pd.Series:
    data = dict()
    for item in list_d:
        # take each item in the list and
        # assign title as the key and text as the value
        data[item['title']] = item['text']
    return pd.Series(data)

df = pd.DataFrame(data)  # data is the list of dicts shown above
df['column_values'].apply(solve).join(df)
Drop the unnecessary columns and your dataset is ready.
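For instance (a small sketch, not from the answer above), the raw columns can be dropped and the group title pulled out after the join:

out = (df['column_values'].apply(solve)
         .join(df[['name']])
         .assign(group=df['group'].apply(lambda g: g['title'])))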
If you have any difficulty understanding this, feel free to ping me.
I need to duplicate the last row of each id group (the one with max(num)) after each row of that group.
import pandas as pd
data = [{'id': 110, 'val1': 'A', 'num': 0},
{'id': 110, 'val1': 'B', 'num': 1},
{'id': 110, 'val1': 'C', 'num': 2},
{'id': 220, 'val1': 'E', 'num': 0},
{'id': 220, 'val1': 'F', 'num': 1},
{'id': 220, 'val1': 'G', 'num': 2},
{'id': 220, 'val1': 'X', 'num': 3},
{'id': 300, 'val1': 'H', 'num': 0},
{'id': 300, 'val1': 'I', 'num': 1}]
df = pd.DataFrame(data)
df
My dataframe:
What I'm looking for:
Here is one way: merge with wide_to_long. The drop_duplicates call assumes the data frame is well ordered; if not, use sort_values first.
s = df.merge(df.drop_duplicates('id', keep='last'), on='id').query('val1_x != val1_y').reset_index()

newdf = (pd.wide_to_long(s, ['val1', 'num'], i=['index', 'id'], j='drop', suffix='\\w+')
           .reset_index('id')
           .reset_index(drop=True))
newdf
id val1 num
0 110 A 0
1 110 C 2
2 110 B 1
3 110 C 2
4 220 E 0
5 220 X 3
6 220 F 1
7 220 X 3
8 220 G 2
9 220 X 3
10 300 H 0
11 300 I 1
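An alternative sketch without the merge (an assumption, not part of the answer above): build each group's output explicitly by appending the group's last row after every other row.

def interleave_last(g):
    # the group's last row, i.e. the one with max(num), assuming num is sorted within the group
    tail = g.iloc[[-1]]
    pieces = [pd.concat([g.iloc[[i]], tail]) for i in range(len(g) - 1)]
    return pd.concat(pieces) if pieces else tail

out = df.groupby('id', group_keys=False).apply(interleave_last).reset_index(drop=True)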
I have the data frame below and I want to create a key-value pair list from its columns. How can I do it in Python?
df=
city code qty1 type
hyd 1 10 a
hyd 2 12 b
ban 2 15 c
ban 4 25 d
pune 1 10 e
pune 3 12 f
I want to create a new data frame as below:
df1 =
city list
hyd [{"1":"10","type":"a"},{"2":"12","type":"b"}]
ban [{"2":"15","type":"c"},{"4":"25","type":"d"}]
pune [{"1":"10","type":"e"},{"3":"12","type":"f"}]
defaultdict
from collections import defaultdict
d = defaultdict(list)
for t in df.itertuples():
    d[t.city].append({t.code: t.qty1, 'type': t.type})
pd.Series(d).rename_axis('city').to_frame('list')
list
city
ban [{2: 15, 'type': 'c'}, {4: 25, 'type': 'd'}]
hyd [{1: 10, 'type': 'a'}, {2: 12, 'type': 'b'}]
pune [{1: 10, 'type': 'e'}, {3: 12, 'type': 'f'}]
groupby
pd.Series([
{c: q, 'type': t}
for c, q, t in zip(df.code, df.qty1, df.type)
]).groupby(df.city).apply(list).to_frame('list')
list
city
ban [{2: 15, 'type': 'c'}, {4: 25, 'type': 'd'}]
hyd [{1: 10, 'type': 'a'}, {2: 12, 'type': 'b'}]
pune [{1: 10, 'type': 'e'}, {3: 12, 'type': 'f'}]
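If the string keys and values from the question's expected output are needed (both groupby outputs above keep the original int types), casting inside the comprehension is a small tweak (a sketch):

pd.Series([
    {str(c): str(q), 'type': t}
    for c, q, t in zip(df.code, df.qty1, df.type)
]).groupby(df.city).apply(list).to_frame('list')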