How to merge multiple rows into one row in Sybase? - pivot

I have set of results from query that looks like this:
ID Type
6011 I
6411 I
6811 I
6911 I
6311 I
1021 L
1321 L
1421 L
1821 L
1921 L
2031 M
2431 M
2831 M
2931 M
2331 M
3041 S
3341 S
3441 S
3841 S
3941 S
The result set above is produced by this query:
SELECT rec_id, rec_type
FROM Table1
I would like to show records like this:
ID Type
6011, 6411, 6811, 6911, 6311 I
1021, 1321, 1421, 1821, 1921 L
2031, 2431, 2831, 2931, 2331 M
3041, 3341, 3441, 3841, 3941 S
I do not know how to achieve this in Sybase. Is this a good fit for PIVOT or UNPIVOT? If anyone knows a way to get the result to look like the set above, please let me know. Thank you.
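This is string aggregation (a GROUP_CONCAT-style operation) rather than a pivot: PIVOT rotates rows into a fixed, known set of columns, whereas here a variable number of IDs must be folded into one string per type. A minimal sketch, assuming Sybase SQL Anywhere or IQ, whose LIST() aggregate does exactly this (Adaptive Server Enterprise has no built-in equivalent; there you would typically concatenate in a cursor loop or on the client side):
-- LIST() concatenates the grouped values into one delimited string per group.
-- The CAST assumes rec_id is numeric, as the sample suggests.
SELECT LIST(CAST(rec_id AS VARCHAR(20)), ', ') AS ids,
       rec_type
FROM Table1
GROUP BY rec_type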

Related

Convert dataframe dict to table in pandas

I have the output from
df = pd.DataFrame.from_records(get_data)
display(df)
Output
'f_data':[{'fid': '9.3', 'lfid': '39.3'}, {'fid': '839.4', 'lfid': '739.3'}]
Needed output format like below (the f_data records expanded into columns):
fid    lfid
9.3    39.3
839.4  739.3
Try indexing the dict with the correct key, then build the DataFrame from the list of records:
d = {'f_data':[{'fid': '9.3', 'lfid': '39.3'}, {'fid': '839.4', 'lfid': '739.3'}]}
out = pd.DataFrame(d['f_data'])
Out[147]:
fid lfid
0 9.3 39.3
1 839.4 739.3
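A one-step variant, sketched with pd.json_normalize (available in recent pandas; same dict d as above):
import pandas as pd

d = {'f_data': [{'fid': '9.3', 'lfid': '39.3'}, {'fid': '839.4', 'lfid': '739.3'}]}
# record_path selects the list under 'f_data'; each dict becomes one row
out = pd.json_normalize(d, record_path='f_data')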

How to graph Binance API Orderbook with Pandas-matplotlib?

The data comes in 3 columns after orderbook = pd.DataFrame(orderbook_data):
timestamp        bids                   asks
UNIX timestamp   [bidprice, bidvolume]  [askprice, askvolume]
Each list has 100 values; the timestamp is the same for all of them.
The problem is that I don't know how to access/index the values inside each row's [price, volume] list in each column.
I know that by running bids = orderbook["bids"] I get a list of 100 [bidprice, bidvolume] lists.
I'm looking to avoid a loop; there has to be a way to just plot the data.
I hope someone can understand my problem. I just want to plot price on x and volume on y. The goal is to make it live.
As you didn't present your input file, I prepared it on my own:
timestamp;bids
1579082401;[123.12, 300]
1579082461;[135.40, 220]
1579082736;[130.76, 20]
1579082801;[123.12, 180]
To read it I used:
orderbook = pd.read_csv('Input.csv', sep=';')
orderbook.timestamp = pd.to_datetime(orderbook.timestamp, unit='s')
Its content is:
timestamp bids
0 2020-01-15 10:00:01 [123.12, 300]
1 2020-01-15 10:01:01 [135.40, 220]
2 2020-01-15 10:05:36 [130.76, 20]
3 2020-01-15 10:06:41 [123.12, 180]
Now:
timestamp has been converted to the native pandasonic datetime type,
but bids is of object type (actually, a string),
and, as I suppose, this is the same when read from your input file.
And now the main task: the first step is to extract both numbers from bids, convert them to float and int, and save them in respective columns:
orderbook = orderbook.join(orderbook.bids.str.extract(
    r'\[(?P<bidprice>\d+\.\d+), (?P<bidvolume>\d+)]'))
orderbook.bidprice = orderbook.bidprice.astype(float)
orderbook.bidvolume = orderbook.bidvolume.astype(int)
Now orderbook contains:
timestamp bids bidprice bidvolume
0 2020-01-15 10:00:01 [123.12, 300] 123.12 300
1 2020-01-15 10:01:01 [135.40, 220] 135.40 220
2 2020-01-15 10:05:36 [130.76, 20] 130.76 20
3 2020-01-15 10:06:41 [123.12, 180] 123.12 180
and you can generate e.g. a scatter plot, calling:
orderbook.plot.scatter('bidprice', 'bidvolume');
or other plotting function.
Another possibility
Or maybe your orderbook_data is a dictionary? Something like:
orderbook_data = {
    'timestamp': [1579082401, 1579082461, 1579082736, 1579082801],
    'bids': [[123.12, 300], [135.40, 220], [130.76, 20], [123.12, 180]]}
In this case, when you create a DataFrame from it, the column types are initially:
timestamp - int64,
bids - also object, but this time each cell contains a plain pythonic list.
Then you can also convert the timestamp column to datetime just like above.
But to split bids (a column of lists) into 2 separate columns, you should run:
orderbook[['bidprice', 'bidvolume']] = pd.DataFrame(orderbook.bids.tolist())
Then you have 2 new columns with the respective components of the source column and you can create your graphics just like above.
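Putting the dictionary variant together, here is a self-contained sketch that uses only the sample values shown above:
import pandas as pd

orderbook_data = {
    'timestamp': [1579082401, 1579082461, 1579082736, 1579082801],
    'bids': [[123.12, 300], [135.40, 220], [130.76, 20], [123.12, 180]]}

orderbook = pd.DataFrame(orderbook_data)
# Seconds since the epoch -> datetime
orderbook.timestamp = pd.to_datetime(orderbook.timestamp, unit='s')
# Split each [price, volume] list into two numeric columns
orderbook[['bidprice', 'bidvolume']] = pd.DataFrame(orderbook.bids.tolist())
# Price on x, volume on y
orderbook.plot.scatter('bidprice', 'bidvolume')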

Max and min values in pandas

I have the following data:
High Low Open Close Volume Adj Close
Date
1999-12-31 1472.420044 1458.189941 1464.469971 1469.250000 374050000 1469.250000
2000-01-03 1478.000000 1438.359985 1469.250000 1455.219971 931800000 1455.219971
2000-01-04 1455.219971 1397.430054 1455.219971 1399.420044 1009000000 1399.420044
2000-01-05 1413.270020 1377.680054 1399.420044 1402.109985 1085500000 1402.109985
2000-01-06 1411.900024 1392.099976 1402.109985 1403.449951 1092300000 1403.449951
... ... ... ... ... ... ...
2020-01-06 3246.840088 3214.639893 3217.550049 3246.280029 3674070000 3246.280029
2020-01-07 3244.909912 3232.429932 3241.860107 3237.179932 3420380000 3237.179932
2020-01-08 3267.070068 3236.669922 3238.590088 3253.050049 3720890000 3253.050049
2020-01-09 3275.580078 3263.669922 3266.030029 3274.699951 3638390000 3274.699951
2020-01-10 3282.989990 3268.010010 3281.810059 3273.739990 920449258 3273.739990
5039 rows × 6 columns
Since this is daily data, it was resampled to weekly to find the 52-week high and low.
weekly_high = data.High.groupby(pd.Grouper(freq='M')).tail(52)
weekly_low = data.Low.groupby(pd.Grouper(freq='M')).tail(52)
Here is the problem:
weekly_high.max()
yields: 3282.989990234375
weekly_low.min()
yields: 666.7899780273438
These values are an issue because 3283.0 is the high, so why am I getting decimals? Secondly, the weekly low is 666, which I know for a fact is incorrect. How can I fix this?
Hi, you can try the following code:
data['52weekhigh'] = data.High.rolling(252).max()
data['52weeklow'] = data.Low.rolling(252).min()
This allows you to avoid having to resample on a monthly basis and gives you the rolling 52-week high (52 weeks == 252 trading days). Let me know if you need any further clarification.
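As a possible variant (not from the answer above): if data has a DatetimeIndex, as in the sample, pandas also accepts an offset-based window, which avoids approximating 52 weeks with 252 trading days:
# '365D' is a calendar window measured on the DatetimeIndex,
# so weekends and holidays no longer have to be approximated.
data['52weekhigh'] = data.High.rolling('365D').max()
data['52weeklow'] = data.Low.rolling('365D').min()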

Pandas - Merging a grouped column to another dataframe

One of my dataframes contains columns
WR K ID
SP-RS-001 K001
SP-RS-001 K002
SP-RS-001 K006
SP-RS-002 K002
SP-RS-002 K007
SP-RS-002 K008
and the other has
U Code CO Code K ID
C001 C001.01 K001
C001 C001.02 K002
C001 C001.03 K006
C002 C002.01 K001
C002 C002.02 K006
I need another column in this dataframe which gives
U Code K ID WR
C001 K001, K002, K006 SP-RS-001, SP-RS-002
C002 K001, K006 SP-RS-001
C003 K002, K007 SP-RS-001, SP-RS-002
How can I do that? Thanks! :)
First of all, I'm assuming the C003 entry was a mistake (in your original question). I believe the following will work for you. It wasn't apparent which type of merge you wanted, so I assumed an inner merge.
Load the DataFrames:
df1 = pd.DataFrame({'WR': ['SP-RS-001', 'SP-RS-001', 'SP-RS-001', 'SP-RS-002', 'SP-RS-002', 'SP-RS-002'],
                    'K_ID': ['K001', 'K002', 'K006', 'K002', 'K007', 'K008']})
df2 = pd.DataFrame({'U_Code': ['C001', 'C001', 'C001', 'C002', 'C002'],
                    'C0_Code': ['C001.01', 'C001.02', 'C001.03', 'C002.01', 'C002.02'],
                    'K_ID': ['K001', 'K002', 'K006', 'K001', 'K006']})
Merge on K_ID:
df = df2.merge(df1, on='K_ID', how='inner')[['U_Code', 'K_ID', 'WR']]
This gives us:
  U_Code  K_ID         WR
0   C001  K001  SP-RS-001
1   C001  K002  SP-RS-001
2   C001  K002  SP-RS-002
3   C001  K006  SP-RS-001
4   C002  K001  SP-RS-001
5   C002  K006  SP-RS-001
And finally, a groupby on U_Code with the following aggregating function:
def f(x):
    return pd.Series(dict(K_ID=', '.join(x['K_ID'].unique()),
                          WR=', '.join(x['WR'].unique())))
df = df.groupby(['U_Code']).apply(f)
Which gives us:
                    K_ID                    WR
U_Code
C001    K001, K002, K006  SP-RS-001, SP-RS-002
C002          K001, K006             SP-RS-001
Hope this helps.
I think you are looking for this:
df3 = df1.merge(df2, on='K ID')
df4 = df3.groupby('U Code')[['K ID', 'WR']].agg({'K ID': lambda x: ', '.join(x.unique()),
                                                 'WR': lambda x: ', '.join(x.unique())})
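A more compact spelling of the same idea is named aggregation (pandas 0.25+); this sketch assumes the df1/df2 with underscored column names from the first answer:
out = (df2.merge(df1, on='K_ID')
          .groupby('U_Code', as_index=False)
          .agg(K_ID=('K_ID', lambda s: ', '.join(s.unique())),
               WR=('WR', lambda s: ', '.join(s.unique()))))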

How can I create a new dataframe with the products of another dataframe which has been grouped?

Input:
dfB = dfA.groupby('labelA').labelB.nlargest(3)
Output:
labelA
G 5309 415004880.00
6016 268492764.00
5570 191452396.00
PG 6687 486295561.00
5943 400738009.00
5987 368061265.00
PG-13 6380 936662225.00
6391 652270625.00
5723 623357910.00
R 6616 363070709.00
6184 350126372.00
5569 254464305.00
Name: labelB, dtype: float64
I would now like to create a new data frame, which I can visualise, containing the mean of each group (G, PG, PG-13, R). I tried the following; however, as shown below, the output is the mean of all 4 groups combined.
Input:
dfB.mean()
Output:
442499751.75
You can use apply to chain the mean function to nlargest:
dfB = dfA.groupby('labelA').labelB.apply(lambda x: x.nlargest(3).mean())
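With the sample above, this returns one mean per label; since every group holds exactly 3 values, averaging these four numbers reproduces the 442499751.75 that the ungrouped mean gave (the values below are computed from the twelve amounts shown, rounded to two decimals):
labelA
G        291650013.33
PG       418364945.00
PG-13    737430253.33
R        322553795.33
Name: labelB, dtype: float64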
