Dataframe concatenate columns


I have a dataframe with a MultiIndex (ID, Date, LID) and columns 0 to N. It looks something like this:
                      0  1  2  3  4
ID    Date       LID
00112 11-02-2014 I    0  1  5  6  7
00112 11-02-2014 II   2  4  5  3  4
00112 30-07-2015 I    5  7  1  1  2
00112 30-07-2015 II   3  2  8  7  1
I would like to group the dataframe by ID and Date and concatenate the columns to the same row such that it looks like this:
                  0  1  2  3  4  5  6  7  8  9
ID    Date
00112 11-02-2014  0  1  5  6  7  2  4  5  3  4
00112 30-07-2015  5  7  1  1  2  3  2  8  7  1

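Using pd.concat and pd.DataFrame.xs:
import pandas as pd
# one slice per LID (xs drops level 2, leaving an (ID, Date) index), placed side by side
out = pd.concat(
    [df.xs(x, level=2) for x in df.index.levels[2]],
    axis=1, ignore_index=True  # renumber the combined columns 0..9
)
print(out)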
                0  1  2  3  4  5  6  7  8  9
ID  Date
112 11-02-2014  0  1  5  6  7  2  4  5  3  4
    30-07-2015  5  7  1  1  2  3  2  8  7  1

Use unstack + sort_index:
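import numpy as np

# move LID into the columns, then order the columns by LID (all I first, then II)
df = df.unstack().sort_index(axis=1, level=1)
# replace the (column, LID) MultiIndex with plain labels 0..N-1
df.columns = np.arange(len(df.columns))
print(df)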
                0  1  2  3  4  5  6  7  8  9
ID  Date
112 11-02-2014  0  1  5  6  7  2  4  5  3  4
    30-07-2015  5  7  1  1  2  3  2  8  7  1
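For an end-to-end check, here is a minimal sketch that rebuilds the sample frame from the question and runs both answers on it. It assumes ID is stored as the string "00112" (which keeps the leading zeros that the outputs above lose) and that the LID labels are "I" and "II"; the names via_concat and via_unstack are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (ID kept as a string).
index = pd.MultiIndex.from_tuples(
    [("00112", "11-02-2014", "I"), ("00112", "11-02-2014", "II"),
     ("00112", "30-07-2015", "I"), ("00112", "30-07-2015", "II")],
    names=["ID", "Date", "LID"],
)
df = pd.DataFrame(
    [[0, 1, 5, 6, 7], [2, 4, 5, 3, 4], [5, 7, 1, 1, 2], [3, 2, 8, 7, 1]],
    index=index,
)

# Approach 1: one cross-section per LID, concatenated side by side.
via_concat = pd.concat(
    [df.xs(x, level="LID") for x in df.index.levels[2]],
    axis=1, ignore_index=True,
)

# Approach 2: unstack LID into the columns, sort by LID, relabel 0..N-1.
via_unstack = df.unstack().sort_index(axis=1, level=1)
via_unstack.columns = np.arange(len(via_unstack.columns))

print(via_concat)
print(via_unstack)
```

Both frames end up with an (ID, Date) index and the I values followed by the II values in each row.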

Related

How to convert column to rows

I have a CSV file with 6 rows of 8 space-separated values, like this:
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
I need to convert these columns to rows, like this:
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4 4
5 5 5 5 5 5
6 6 6 6 6 6
7 7 7 7 7 7
8 8 8 8 8 8
How can I do that?

Try:
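import csv

# newline="" lets the csv module control line endings on both files
with open("input.csv", "r", newline="") as f_in, open("output.csv", "w", newline="") as f_out:
    reader = csv.reader(f_in, delimiter=" ")
    writer = csv.writer(f_out, delimiter=" ")
    # zip(*rows) turns the i-th value of every input row into the i-th output row
    writer.writerows(zip(*reader))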
Contents of input.csv:
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
Contents of output.csv after running the script:
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4 4
5 5 5 5 5 5
6 6 6 6 6 6
7 7 7 7 7 7
8 8 8 8 8 8
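The same zip(*reader) idea can be tried without touching the filesystem. This sketch swaps the two files for in-memory io.StringIO buffers holding the sample data (an assumption made only so it runs standalone), and sets lineterminator="\n" so the result does not use the csv module's default "\r\n" line endings.

```python
import csv
import io

# In-memory stand-in for input.csv: 6 rows of 8 space-separated values.
source = io.StringIO("1 2 3 4 5 6 7 8\n" * 6)
target = io.StringIO()

reader = csv.reader(source, delimiter=" ")
writer = csv.writer(target, delimiter=" ", lineterminator="\n")
# Each tuple from zip holds one column of the input, written out as a row.
writer.writerows(zip(*reader))

result = target.getvalue()
print(result, end="")
```

zip stops at the shortest row, so ragged input is silently truncated; itertools.zip_longest(*reader, fillvalue="") pads the short rows instead.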

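You are looking for a pivot, or more precisely a transpose (rows become columns and columns become rows).
If you are using pandas, pivot_table is documented at https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html, but for a plain transpose like this one DataFrame.T (DataFrame.transpose()) does the trick.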
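A minimal pandas sketch of the transpose, assuming the file has no header row and exactly one space between values; the data is read from an in-memory string that stands in for input.csv.

```python
import io

import pandas as pd

# In-memory stand-in for input.csv (6 rows x 8 space-separated values).
text = "1 2 3 4 5 6 7 8\n" * 6
df = pd.read_csv(io.StringIO(text), sep=" ", header=None)

# .T swaps rows and columns: the 6x8 frame becomes 8x6.
transposed = df.T

# With no path, to_csv returns the CSV text; pass "output.csv" to write a file.
out = transposed.to_csv(sep=" ", header=False, index=False)
print(out)
```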

how to split data in groups by two column conditions pandas

I have a dataframe and want to split it into groups based on the flag_0 and flag_1 columns: each group should be a continuous run of rows where flag_0 is 3 and flag_1 is 1.
Here is my dataframe example:
df=pd.DataFrame({'flag_0':[1,2,3,1,2,3,1,2,3,3,3,3,1,2,3,1,2,3,4,4],'flag_1':[1,2,3,1,2,3,1,2,1,1,1,1,1,2,1,1,2,3,4,4],'dd':[1,1,1,7,7,7,8,8,8,1,1,1,7,7,7,8,8,8,5,7]})
    flag_0  flag_1  dd
0        1       1   1
1        2       2   1
2        3       3   1
3        1       1   7
4        2       2   7
5        3       3   7
6        1       1   8
7        2       2   8
8        3       1   8
9        3       1   1
10       3       1   1
11       3       1   1
12       1       1   7
13       2       2   7
14       3       1   7
15       1       1   8
16       2       2   8
17       3       3   8
18       4       4   5
19       4       4   7
Desired output:
group 1:
    flag_0  flag_1  dd
9        3       1   1
10       3       1   1
11       3       1   1
group 2:
    flag_0  flag_1  dd
14       3       1   7

You can use a mask and groupby to split the dataframe:
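# True on rows where every column listed in cond equals its target value
cond = {'flag_0': 3, 'flag_1': 1}
mask = df[list(cond)].eq(cond).all(axis=1)
# (~mask).cumsum() stays constant across a run of matching rows,
# so it serves as a run id for consecutive matches
groups = [g for k, g in df[mask].groupby((~mask).cumsum())]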
output:
[ flag_0 flag_1 dd
8 3 1 8
9 3 1 1
10 3 1 1
11 3 1 1,
flag_0 flag_1 dd
14 3 1 7]
groups[0]
    flag_0  flag_1  dd
8        3       1   8
9        3       1   1
10       3       1   1
11       3       1   1
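Here is a self-contained version of that answer run on the question's data, so the groups can be checked directly. It passes the conditions as pd.Series(cond) rather than a plain dict (an equivalent, explicit spelling that aligns on column names) and labels each run with (~mask).cumsum(), as above. Row 8 also has flag_0 == 3 and flag_1 == 1, so it lands in the first group even though the desired output in the question starts at row 9.

```python
import pandas as pd

df = pd.DataFrame({
    'flag_0': [1, 2, 3, 1, 2, 3, 1, 2, 3, 3, 3, 3, 1, 2, 3, 1, 2, 3, 4, 4],
    'flag_1': [1, 2, 3, 1, 2, 3, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 3, 4, 4],
    'dd':     [1, 1, 1, 7, 7, 7, 8, 8, 8, 1, 1, 1, 7, 7, 7, 8, 8, 8, 5, 7],
})

cond = {'flag_0': 3, 'flag_1': 1}
# Boolean mask: True where flag_0 == 3 and flag_1 == 1.
mask = df[list(cond)].eq(pd.Series(cond)).all(axis=1)

# Run id: the number of non-matching rows seen so far. It does not change
# inside a run of consecutive matching rows, so each run gets its own id.
run_id = (~mask).cumsum()

groups = [g for _, g in df[mask].groupby(run_id[mask])]
for i, g in enumerate(groups, start=1):
    print(f"group {i}:")
    print(g)
```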

Creating a variable using conditionals python using vectorization

I have a pandas dataframe as below:
    flag  a  b  c
0      1  5  1  3
1      1  2  1  3
2      1  3  0  3
3      1  4  0  3
4      1  5  5  3
5      1  6  0  3
6      1  7  0  3
7      2  6  1  4
8      2  2  1  4
9      2  3  1  4
10     2  4  1  4
I want to create a column 'd' based on the conditions below:
1) For the first row of each flag, if a > c then d = b, else d = NaN.
2) For every other row of a flag, if (a > c) & ((previous d is NaN) | (b > previous d)) then d = b, else d = previous d.
I expect the following output:
    flag  a  b  c  d
0      1  5  1  3  1
1      1  2  1  3  1
2      1  3  0  3  1
3      1  4  0  3  1
4      1  5  5  3  5
5      1  6  0  3  5
6      1  7  0  3  5
7      2  6  1  4  1
8      2  2  1  4  1
9      2  3  1  4  1
10     2  4  1  4  1

Here's how I would translate your logic:
import numpy as np

df['d'] = np.nan
# first row of each flag
s = df.flag.ne(df.flag.shift())
# where a > c
a_gt_c = df['a'].gt(df['c'])
# fill the first rows where a > c
df.loc[s & a_gt_c, 'd'] = df['b']
# mask for the second fill
mask = ((~s)                                # not a first row
        & a_gt_c                            # a > c
        & (df['d'].shift().isna()           # previous d is null
           | df['b'].gt(df['d'].shift()))   # or b > previous d
        )
# fill those values
df.loc[mask, 'd'] = df['b']
# forward-fill the rest
df['d'] = df['d'].ffill()
Output:
    flag  a  b  c    d
0      1  5  1  3  1.0
1      1  2  1  3  1.0
2      1  3  0  3  1.0
3      1  4  0  3  0.0
4      1  5  5  3  5.0
5      1  6  0  3  0.0
6      1  7  0  3  0.0
7      2  6  1  4  1.0
8      2  2  1  4  1.0
9      2  3  1  4  1.0
10     2  4  1  4  1.0
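The translation above gives 0.0 on rows 3, 5 and 6, where the expected output has 1, 5 and 5: each row's rule depends on the d of the row before, which a single vectorized pass has not filled in yet. Reading the two rules together, d works out to the running maximum of b over the rows where a > c, restarting at each new flag block (this is my reading of the rules, not something stated in the question). A minimal sketch of that reformulation with groupby, ffill and cummax, where a block is a contiguous run of the same flag (the same definition of "first row" the answer uses):

```python
import pandas as pd

df = pd.DataFrame({
    "flag": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "a":    [5, 2, 3, 4, 5, 6, 7, 6, 2, 3, 4],
    "b":    [1, 1, 0, 0, 5, 0, 0, 1, 1, 1, 1],
    "c":    [3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4],
})

# Block id: increments wherever flag differs from the row above,
# so each contiguous run of the same flag gets its own id.
block = df["flag"].ne(df["flag"].shift()).cumsum()

# Candidate values: b where a > c, NaN elsewhere.
candidates = df["b"].where(df["a"].gt(df["c"]))

# Within each block, carry the last candidate forward, then keep the
# running maximum; rows before the first candidate stay NaN.
filled = candidates.groupby(block).ffill()
df["d"] = filled.groupby(block).cummax()
print(df)
```

The forward fill matters: cummax alone would leave NaN on the rows where a <= c instead of repeating the previous d.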

I need to write a program to print the following pattern:

I need to produce the following pattern:
1 2 3 4 5 6
  1 2 3 4 5
    1 2 3 4
      1 2 3
        1 2
          1
I have written code that produces the same shape, but right side up. I don't understand how to flip it over.
rows = 6
for i in range(1, 6 + 1):
    for j in range(1, rows + 1):
        if(j < i):
            print(' ', end = ' ')
        else:
            print(i, end = ' ')
    print()

Edit: this breaks the alignment once the rows contain two-digit numbers. Honorable mention to alexanderhurst for finding the bug in this implementation and providing another clean solution. However, we can mimic tabulate by using tabs (\t) instead of spaces (see Hotfix 1 at the bottom).
Why not something simpler?
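rows = 6
l = list(range(1, rows + 1))  # the numbers 1..rows
for i in range(rows):
    # 2*i spaces shift row i right by i numbers, then print the first rows-i numbers
    print(" " * 2*i + " ".join(str(x) for x in l[:rows-i]))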
1 2 3 4 5 6
  1 2 3 4 5
    1 2 3 4
      1 2 3
        1 2
          1
Edit: if you want the other orientations, try these:
>>> for i in range(rows):
...     x = " " * 2*i + " ".join(str(x) for x in l[:rows-i])
...     print(x[::-1])
...
6 5 4 3 2 1
5 4 3 2 1
4 3 2 1
3 2 1
2 1
1
>>> for i in range(rows - 1, -1, -1):
...     print(" " * 2*i + " ".join(str(x) for x in l[:rows-i]))
...
          1
        1 2
      1 2 3
    1 2 3 4
  1 2 3 4 5
1 2 3 4 5 6
>>> for i in range(rows - 1, -1, -1):
...     x = " " * 2*i + " ".join(str(x) for x in l[:rows-i])
...     print(x[::-1])
...
1
2 1
3 2 1
4 3 2 1
5 4 3 2 1
6 5 4 3 2 1
Bug for larger numbers of rows:
>>> rows = 14
>>> l = list(range(rows))
>>> for i in range(rows):
...     print(" " * 2*i + " ".join(str(x) for x in l[:rows-i]))
...
0 1 2 3 4 5 6 7 8 9 10 11 12 13
  0 1 2 3 4 5 6 7 8 9 10 11 12
    0 1 2 3 4 5 6 7 8 9 10 11
      0 1 2 3 4 5 6 7 8 9 10
        0 1 2 3 4 5 6 7 8 9
          0 1 2 3 4 5 6 7 8
            0 1 2 3 4 5 6 7
              0 1 2 3 4 5 6
                0 1 2 3 4 5
                  0 1 2 3 4
                    0 1 2 3
                      0 1 2
                        0 1
                          0
Hotfix 1: use tabs. This works reasonably well if your tab width matches mine and you stay under about 20 rows at full screen width (beyond that, alexanderhurst's solution might not solve your problem either).
>>> for i in range(rows):
...     print("\t" * i + "\t".join(str(x) for x in l[:rows-i]))
...
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 1 2 3 4 5 6 7 8 9 10 11 12
0 1 2 3 4 5 6 7 8 9 10 11
0 1 2 3 4 5 6 7 8 9 10
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8
0 1 2 3 4 5 6 7
0 1 2 3 4 5 6
0 1 2 3 4 5
0 1 2 3 4
0 1 2 3
0 1 2
0 1
0
Hotfix 2: pad each number according to its length (e.g. using len(str(x)) or log(x)), but that becomes too complex.
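If every number is padded to the width of the largest one, Hotfix 2 stays fairly short. A sketch of that idea using str.rjust and counting from 1 as in the question; the function name build_triangle is made up for this example.

```python
def build_triangle(rows):
    """Return an inverted, right-aligned triangle of 1..rows as one string."""
    width = len(str(rows))  # digits in the largest number, e.g. 2 for rows = 14
    lines = []
    for i in range(rows):
        # Pad every number to the same width so the columns line up.
        cells = [str(n).rjust(width) for n in range(1, rows - i + 1)]
        # Shift row i right by i whole cells (width chars plus one separator each).
        lines.append(" " * ((width + 1) * i) + " ".join(cells))
    return "\n".join(lines)


print(build_triangle(14))
```

For rows below 10 the width is 1 and the result is the same as the simpler two-spaces-per-row version.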

This solution resembles yours with a few changes. It first prints the number of spaces needed for the triangle shape, then counts from 1 up to i, and then moves to the next line.
num = 6
for i in range(num, 0, -1):
    print(' ' * (num - i), end='')  # leading spaces that shape the triangle
    for j in range(i):
        print(j + 1, end=' ')
    print()
This does give an odd result once the numbers reach two digits, e.g. with num = 16:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  1 2 3 4 5 6 7 8 9 10 11 12 13 14
   1 2 3 4 5 6 7 8 9 10 11 12 13
    1 2 3 4 5 6 7 8 9 10 11 12
     1 2 3 4 5 6 7 8 9 10 11
      1 2 3 4 5 6 7 8 9 10
       1 2 3 4 5 6 7 8 9
        1 2 3 4 5 6 7 8
         1 2 3 4 5 6 7
          1 2 3 4 5 6
           1 2 3 4 5
            1 2 3 4
             1 2 3
              1 2
               1
You can use tabulate to keep everything in its column. Here I also used a list comprehension to reduce code size.
code:
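from tabulate import tabulate  # third-party package: pip install tabulate
count = 16
# row for i: (count - i) empty cells on the left, then the numbers 1..i
numbers = [[''] * (count - i) + [j + 1 for j in range(i)] for i in range(count, 0, -1)]
print(tabulate(numbers))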
output:
- - - - - - - - - -- -- -- -- -- -- --
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 2 3 4 5 6 7 8 9 10 11 12 13
1 2 3 4 5 6 7 8 9 10 11 12
1 2 3 4 5 6 7 8 9 10 11
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7
1 2 3 4 5 6
1 2 3 4 5
1 2 3 4
1 2 3
1 2
1
- - - - - - - - - -- -- -- -- -- -- --

You can count backwards with range():
rows = 6
for i in range(rows, 0, -1):
    for j in range(1, rows + 1):
        if rows - j >= i:  # if i = 6, doesn't activate. i=5, activates once. i=4, activates twice, etc.
            print(' ', end=' ')
        else:
            print(j - (rows - i), end=' ')  # number the remaining i cells 1..i
    print()

So from what I can see you are trying to make the form:
     1
    21
   321
  4321
 54321
654321
So the loops need to be reversed and you need to add a space filler section.
rows = 6
for i in range(1, rows+1):
    out = ''
    for j in range(1, rows):
        out += ' '
    for j in range(i, 0, -1):
        out += str(j)
    print(out)
    rows -= 1

A one-line statement using a generator expression would be:
pattern = '\n'.join((' ' * 2 * i) + ' '.join(str(n) for n in range(1, num + 1)) for i, num in enumerate(range(6, 0, -1)))
For clarity, here are the same commands run in the Python interactive shell:
>>> pattern = '\n'.join((' ' * 2 * i) + ' '.join(str(n) for n in range(1, num + 1)) for i, num in enumerate(range(6, 0, -1)))
>>>
>>> print(pattern)
1 2 3 4 5 6
  1 2 3 4 5
    1 2 3 4
      1 2 3
        1 2
          1
>>>
Wrapping it in a function helps for this kind of repetitive work (for example, to try several sizes):
def print_num_triangle(n=6):
    """
    1 2 3 4 5 6
      1 2 3 4 5
        1 2 3 4
          1 2 3
            1 2
              1
    """
    # row i: 2*i leading spaces, then the numbers 1..num
    pattern = '\n'.join((' ' * 2 * i) + ' '.join(str(k) for k in range(1, num + 1))
                        for i, num in enumerate(range(n, 0, -1)))
    print(pattern)

if __name__ == "__main__":
    print_num_triangle(10)
    # 1 2 3 4 5 6 7 8 9 10
    #   1 2 3 4 5 6 7 8 9
    #     1 2 3 4 5 6 7 8
    #       1 2 3 4 5 6 7
    #         1 2 3 4 5 6
    #           1 2 3 4 5
    #             1 2 3 4
    #               1 2 3
    #                 1 2
    #                   1
    print_num_triangle(7)
    # 1 2 3 4 5 6 7
    #   1 2 3 4 5 6
    #     1 2 3 4 5
    #       1 2 3 4
    #         1 2 3
    #           1 2
    #             1
    print_num_triangle()  # default -> 6
    # 1 2 3 4 5 6
    #   1 2 3 4 5
    #     1 2 3 4
    #       1 2 3
    #         1 2
    #           1

Table wise value count

I have a table like this, and I want to draw a histogram of how many 0s, 1s, 2s and 3s there are across the whole table. Is there a way to do it?

You can melt the table into a single column of values, then count them (or plot them with hist).
For example:
df
   A  B  C  D
0  3  1  1  1
1  3  3  2  2
2  1  0  1  1
3  3  2  3  0
4  3  1  1  3
5  3  0  3  1
6  3  1  1  0
7  1  3  3  0
8  3  1  3  3
9  3  3  1  3
df.melt()['value'].value_counts()
3 18
1 14
0 5
2 3
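To go from these counts to the histogram the question asks for, here is a sketch that rebuilds the example frame, counts every cell, and cross-checks the count by flattening the underlying array instead of melting. The plotting line is left as a comment because it assumes matplotlib is installed.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [3, 3, 1, 3, 3, 3, 3, 1, 3, 3],
    "B": [1, 3, 0, 2, 1, 0, 1, 3, 1, 3],
    "C": [1, 2, 1, 3, 1, 3, 1, 3, 3, 1],
    "D": [1, 2, 1, 0, 3, 1, 0, 0, 3, 3],
})

# melt stacks every cell into one long "value" column; count each value
# and sort by the value itself so the bars come out in order 0, 1, 2, 3.
counts = df.melt()["value"].value_counts().sort_index()
print(counts)

# Same counts without melt: flatten the 2-D array into one Series.
counts_flat = pd.Series(df.to_numpy().ravel()).value_counts().sort_index()

# Draw the histogram as a bar chart (requires matplotlib):
# counts.plot.bar()
```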
