I have a set of results from a query that looks like this:
ID Type
6011 I
6411 I
6811 I
6911 I
6311 I
1021 L
1321 L
1421 L
1821 L
1921 L
2031 M
2431 M
2831 M
2931 M
2331 M
3041 S
3341 S
3441 S
3841 S
3941 S
The result set above is produced by this query:
SELECT rec_id, rec_type
FROM Table1
I would like to show records like this:
ID Type
6011, 6411, 6811, 6911, 6311 I
1021, 1321, 1421, 1821, 1921 L
2031, 2431, 2831, 2931, 2331 M
3041, 3341, 3441, 3841, 3941 S
I do not know how to achieve this in Sybase. Is this a good fit for PIVOT or UNPIVOT? If anyone knows a way to get this to look like the result set above, please let me know. Thank you.
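For comparison only, and not as a definitive Sybase answer: if your server supports the LIST() aggregate (SQL Anywhere and IQ do; I'm not sure about ASE), a GROUP BY on rec_type with LIST(rec_id, ', ') does this kind of group-concatenation directly. If you end up post-processing in a client instead, here is a minimal pandas sketch of the same reshaping, using the rec_id/rec_type columns from the query above and only the first few rows of your data:
import pandas as pd

df = pd.DataFrame({'rec_id': [6011, 6411, 6811, 6911, 6311, 1021],
                   'rec_type': ['I', 'I', 'I', 'I', 'I', 'L']})

# concatenate all IDs of each type into one comma-separated string
out = (df.groupby('rec_type')['rec_id']
         .apply(lambda s: ', '.join(s.astype(str)))
         .reset_index())
print(out)
#   rec_type                        rec_id
# 0        I  6011, 6411, 6811, 6911, 6311
# 1        L                          1021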
I have the output from:
df = pd.DataFrame.from_records(get_data)
display(df)
Output
'f_data':[{'fid': '9.3', 'lfid': '39.3'}, {'fid': '839.4', 'lfid': '739.3'}]
I need the output formatted like below:
f_data
     fid   lfid
     9.3   39.3
   839.4  739.3
Try indexing the dict with the correct key:
d = {'f_data':[{'fid': '9.3', 'lfid': '39.3'}, {'fid': '839.4', 'lfid': '739.3'}]}
out = pd.DataFrame(d['f_data'])
Out[147]:
fid lfid
0 9.3 39.3
1 839.4 739.3
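If get_data is the whole record shown in your output (an assumption on my part, since you didn't show it), the same indexing works directly with from_records too; a minimal sketch:
import pandas as pd

get_data = {'f_data': [{'fid': '9.3', 'lfid': '39.3'},
                       {'fid': '839.4', 'lfid': '739.3'}]}

# from_records accepts the list of dicts directly
df = pd.DataFrame.from_records(get_data['f_data'])
print(df)
#      fid   lfid
# 0    9.3   39.3
# 1  839.4  739.3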
The data comes in 3 columns after orderbook = pd.DataFrame(orderbook_data):
timestamp bids asks
UNIX timestamp [bidprice, bidvolume] [askprice, askvolume]
Each list has 100 values, and the timestamp is the same for all of them.
The problem is that I don't know how to access/index the values inside each row's [price, volume] list in each column.
I know that by running ---> bids = orderbook["bids"]
I get the list of 100 lists ---> [bidprice, bidvolume]
I'm looking to avoid doing a loop... there has to be a way to just plot the data.
I hope someone can understand my problem. I just want to plot price on x and volume on y. The goal is to make it live.
As you didn't present your input file, I prepared one on my own:
timestamp;bids
1579082401;[123.12, 300]
1579082461;[135.40, 220]
1579082736;[130.76, 20]
1579082801;[123.12, 180]
To read it I used:
orderbook = pd.read_csv('Input.csv', sep=';')
orderbook.timestamp = pd.to_datetime(orderbook.timestamp, unit='s')
Its content is:
timestamp bids
0 2020-01-15 10:00:01 [123.12, 300]
1 2020-01-15 10:01:01 [135.40, 220]
2 2020-01-15 10:05:36 [130.76, 20]
3 2020-01-15 10:06:41 [123.12, 180]
Now:
timestamp has been converted to the native pandas datetime type,
but bids is of object type (actually, a string),
and I suppose the same happens when it is read from your input file.
And now the main task: the first step is to extract both numbers from bids,
convert them to float and int, and save them in respective columns:
orderbook = orderbook.join(orderbook.bids.str.extract(
r'\[(?P<bidprice>\d+\.\d+), (?P<bidvolume>\d+)]'))
orderbook.bidprice = orderbook.bidprice.astype(float)
orderbook.bidvolume = orderbook.bidvolume.astype(int)
Now orderbook contains:
timestamp bids bidprice bidvolume
0 2020-01-15 10:00:01 [123.12, 300] 123.12 300
1 2020-01-15 10:01:01 [135.40, 220] 135.40 220
2 2020-01-15 10:05:36 [130.76, 20] 130.76 20
3 2020-01-15 10:06:41 [123.12, 180] 123.12 180
and you can generate e.g. a scatter plot, calling:
orderbook.plot.scatter('bidprice', 'bidvolume');
or other plotting function.
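Since your stated goal is a live plot, here is a minimal sketch of how the scatter could be refreshed in a loop with matplotlib; fetch_orderbook is a made-up stand-in for your real data feed, so treat it as an assumption:
import random
import matplotlib.pyplot as plt
import pandas as pd

def fetch_orderbook():
    # stand-in for your live feed: returns 100 random [price, volume] pairs
    return {'bidprice': [random.uniform(120, 140) for _ in range(100)],
            'bidvolume': [random.randint(1, 300) for _ in range(100)]}

plt.ion()                            # interactive mode: draw without blocking
fig, ax = plt.subplots()
for _ in range(20):                  # bounded demo loop; use while True when live
    orderbook = pd.DataFrame(fetch_orderbook())
    ax.clear()
    ax.scatter(orderbook['bidprice'], orderbook['bidvolume'])
    ax.set_xlabel('bidprice')
    ax.set_ylabel('bidvolume')
    plt.pause(0.5)                   # redraw, then wait before the next refresh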
Another possibility
Or maybe your orderbook_data is a dictionary? Something like:
orderbook_data = {
'timestamp': [1579082401, 1579082461, 1579082736, 1579082801],
'bids': [[123.12, 300], [135.40, 220], [130.76, 20], [123.12, 180]] }
In this case, when you create a DataFrame from it, the column types
are initially:
timestamp - int64,
bids - also object, but this time each cell contains a plain
pythonic list.
Then you can also convert timestamp column to datetime just like
above.
But to split bids (a column of lists) into 2 separate columns,
you should run:
orderbook[['bidprice', 'bidvolume']] = pd.DataFrame(orderbook.bids.tolist())
Then you have 2 new columns with the respective components of the
source column, and you can create your plot just like above.
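Putting the dictionary variant together end to end, a self-contained sketch with the sample data from above:
import pandas as pd

orderbook_data = {
    'timestamp': [1579082401, 1579082461, 1579082736, 1579082801],
    'bids': [[123.12, 300], [135.40, 220], [130.76, 20], [123.12, 180]]}

orderbook = pd.DataFrame(orderbook_data)
orderbook.timestamp = pd.to_datetime(orderbook.timestamp, unit='s')
# split the column of lists into 2 numeric columns
orderbook[['bidprice', 'bidvolume']] = pd.DataFrame(orderbook.bids.tolist())
orderbook.plot.scatter('bidprice', 'bidvolume')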
I have the following data:
High Low Open Close Volume Adj Close
Date
1999-12-31 1472.420044 1458.189941 1464.469971 1469.250000 374050000 1469.250000
2000-01-03 1478.000000 1438.359985 1469.250000 1455.219971 931800000 1455.219971
2000-01-04 1455.219971 1397.430054 1455.219971 1399.420044 1009000000 1399.420044
2000-01-05 1413.270020 1377.680054 1399.420044 1402.109985 1085500000 1402.109985
2000-01-06 1411.900024 1392.099976 1402.109985 1403.449951 1092300000 1403.449951
... ... ... ... ... ... ...
2020-01-06 3246.840088 3214.639893 3217.550049 3246.280029 3674070000 3246.280029
2020-01-07 3244.909912 3232.429932 3241.860107 3237.179932 3420380000 3237.179932
2020-01-08 3267.070068 3236.669922 3238.590088 3253.050049 3720890000 3253.050049
2020-01-09 3275.580078 3263.669922 3266.030029 3274.699951 3638390000 3274.699951
2020-01-10 3282.989990 3268.010010 3281.810059 3273.739990 920449258 3273.739990
5039 rows × 6 columns
Since this is daily data, it was resampled to weekly to find the 52-week high and low.
weekly_high = data.High.groupby(pd.Grouper(freq='M')).tail(52)
weekly_low = data.Low.groupby(pd.Grouper(freq='M')).tail(52)
Here is the problem:
weekly_high.max()
yields: 3282.989990234375
weekly_low.min()
yeilds: 666.7899780273438
These values are an issue because 3283.0 is the high, so why am I getting decimals? Secondly, the weekly low is 666, which I know for a fact is incorrect. How can I fix this?
Hi, you can try the following code:
data['52weekhigh'] = data.High.rolling(252).max()
data['52weeklow'] = data.Low.rolling(252).min()
This avoids having to resample on a monthly basis and gives you the rolling 52-week high (52 weeks == 252 trading days). Let me know if you need any further clarification.
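To see what rolling does on a small scale, a toy example with a 3-row window (synthetic numbers, not your data):
import pandas as pd

data = pd.DataFrame({'High': [10, 12, 11, 15, 14],
                     'Low':  [ 9, 10,  8, 12, 11]})
# rolling max/min over the last 3 rows; the first 2 rows are NaN
# because a full window is not yet available
data['3high'] = data.High.rolling(3).max()
data['3low'] = data.Low.rolling(3).min()
print(data)
#    High  Low  3high  3low
# 0    10    9    NaN   NaN
# 1    12   10    NaN   NaN
# 2    11    8   12.0   8.0
# 3    15   12   15.0   8.0
# 4    14   11   15.0   8.0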
One of my dataframes contains the columns
WR K ID
SP-RS-001 K001
SP-RS-001 K002
SP-RS-001 K006
SP-RS-002 K002
SP-RS-002 K007
SP-RS-002 K008
and the other has:
U Code CO Code K ID
C001 C001.01 K001
C001 C001.02 K002
C001 C001.03 K006
C002 C002.01 K001
C002 C002.02 K006
I need another column in this dataframe which gives
U Code K ID WR
C001 K001, K002, K006 SP-RS-001, SP-RS-002
C002 K001, K006 SP-RS-001
C003 K002, K007 SP-RS-001, SP-RS-002
How can I do that? Thanks! :)
First of all, I'm assuming the C003 entry was a mistake (in your original question); I believe the following will work for you. It wasn't apparent which type of merge you wanted, so I assumed an inner merge.
Load the DataFrames:
df1 = pd.DataFrame({'WR': ['SP-RS-001', 'SP-RS-001', 'SP-RS-001', 'SP-RS-002', 'SP-RS-002', 'SP-RS-002'],
'K_ID': ['K001', 'K002', 'K006', 'K002', 'K007', 'K008']})
df2 = pd.DataFrame({'U_Code': ['C001', 'C001', 'C001', 'C002', 'C002'],
'C0_Code': ['C001.01', 'C001.02', 'C001.03', 'C002.01', 'C002.02'],
'K_ID': ['K001', 'K002', 'K006', 'K001', 'K006']})
Merge on K_ID:
df = df2.merge(df1, on='K_ID', how='inner')[['U_Code', 'K_ID', 'WR']]
This gives us:
  U_Code  K_ID         WR
0   C001  K001  SP-RS-001
1   C001  K002  SP-RS-001
2   C001  K002  SP-RS-002
3   C001  K006  SP-RS-001
4   C002  K001  SP-RS-001
5   C002  K006  SP-RS-001
And finally, a groupby on U_Code with the following aggregating function:
def f(x):
    return pd.Series(dict(K_ID=', '.join(x['K_ID'].unique()),
                          WR=', '.join(x['WR'].unique())))
df = df.groupby(['U_Code']).apply(f)
Which gives us:
                    K_ID                    WR
U_Code
C001    K001, K002, K006  SP-RS-001, SP-RS-002
C002          K001, K006             SP-RS-001
Hope this helps.
I think you are looking for this:
df3 = df1.merge(df2, on='K ID')
df4 = df3.groupby('U Code').agg({'K ID': lambda x: ', '.join(x.unique()),
                                 'WR': lambda x: ', '.join(x.unique())})
Input:
dfB=dfA.groupby('labelA').labelB.nlargest(3)
Output:
labelA
G 5309 415004880.00
6016 268492764.00
5570 191452396.00
PG 6687 486295561.00
5943 400738009.00
5987 368061265.00
PG-13 6380 936662225.00
6391 652270625.00
5723 623357910.00
R 6616 363070709.00
6184 350126372.00
5569 254464305.00
Name: labelB, dtype: float64
I would now like to create a new data frame, which I can visualise, containing the mean of each group (G, PG, PG-13, R). I tried the following; however, as shown below, the output is the mean of all 4 groups combined.
Input:
dfB.mean()
Output:
442499751.75
You can use apply to chain the mean function onto nlargest:
dfB = dfA.groupby('labelA').labelB.apply(lambda x: x.nlargest(3).mean())
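A self-contained toy example of the pattern, with made-up labels and numbers standing in for dfA:
import pandas as pd

dfA = pd.DataFrame({'labelA': ['G', 'G', 'G', 'G', 'R', 'R', 'R', 'R'],
                    'labelB': [5, 3, 9, 1, 7, 2, 8, 4]})

# mean of the 3 largest labelB values within each labelA group
dfB = dfA.groupby('labelA').labelB.apply(lambda x: x.nlargest(3).mean())
print(dfB)
# labelA
# G    5.666667
# R    6.333333
# Name: labelB, dtype: float64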