2d numpy array to 3 column DataFrame [duplicate] - python-3.x

This question already has answers here:
Generalise slicing operation in a NumPy array
(4 answers)
Closed 4 years ago.
I have a 2d numpy matrix, for example:
arr = np.arange(0, 12).reshape(3,4)
I would like to convert this into a DataFrame such that:
X Y Z
0 0 0
0 1 1
0 2 2
0 3 3
1 0 4
1 1 5
1 2 6
1 3 7
2 0 8
2 1 9
2 2 10
2 3 11
How would I do this (efficiently)?

You can use numpy functions:
import numpy as np
import pandas as pd
# Row index repeated once per column; column index tiled once per row
x1 = np.repeat(np.arange(arr.shape[0]), arr.shape[1])
x2 = np.tile(np.arange(arr.shape[1]), arr.shape[0])
x3 = arr.flatten()
pd.DataFrame(np.array([x1, x2, x3]).T, columns=['X', 'Y', 'Z'])
Output
X Y Z
0 0 0 0
1 0 1 1
2 0 2 2
3 0 3 3
4 1 0 4
5 1 1 5
6 1 2 6
7 1 3 7
8 2 0 8
9 2 1 9
10 2 2 10
11 2 3 11
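As a possible alternative, the row and column indices can be generated in one step with np.indices. This is only a sketch that assumes the same 3x4 example array as in the question; the names row_idx, col_idx and df are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same example array as in the question
arr = np.arange(0, 12).reshape(3, 4)

# np.indices returns one index grid per axis, each with the shape of arr
row_idx, col_idx = np.indices(arr.shape)

# Flatten everything in row-major order so the three columns line up
df = pd.DataFrame({
    "X": row_idx.ravel(),
    "Y": col_idx.ravel(),
    "Z": arr.ravel(),
})
print(df)
```

Building the frame from a dict of 1-D arrays avoids stacking and transposing an intermediate 2-D array, and each column keeps its own dtype.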

Related

Subtract value in Specific columns in CSV file

I would like to subtract a value (for example, 2) from specific columns of a data frame.
csv1=
X Y Subdie 1v 2v 5v 10v
0 1 0 4 2 4 2 2
1 2 0 2 3 4 4 6
2 3 0 3 5 4 6 8
3 4 0 4 2 5 4 4
4 5 0 4 2 5 8 4
I want to subtract 2 from the 1v and 2v columns. I tried this code:
Cv=(csv1.loc[:,' 1v':' 5v'])-2
I got an output like
1v 2v 5v
0 0 2 0
1 1 2 2
2 3 2 4
3 0 3 2
4 0 3 6
Expected output (the other columns should be kept as well):
x y 1v 2v 5v 10v
0 1 0 0 2 0 2
1 2 0 1 2 2 6
2 3 0 3 2 4 8
3 4 0 0 3 2 4
4 5 0 0 3 6 4
Don't create a copy, perform an in place modification:
csv1.loc[:, ' 1v':' 5v'] -= 2
This modifies csv1:
X Y Subdie 1v 2v 5v 10v
0 1 0 4 0 2 0 2
1 2 0 2 1 2 2 6
2 3 0 3 3 2 4 8
3 4 0 4 0 3 2 4
4 5 0 4 0 3 6 4
NB. I kept your slice as in the question, but you should avoid having leading spaces in the column names. Also, ' 1v':' 5v' selects 1v, 2v, and 5v (included).
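A minimal sketch of the cleanup suggested in the note above: strip the whitespace from the column names first, then slice by the clean labels. The frame is rebuilt by hand from the question's data, and the assumption that only the voltage columns carry a leading space is mine.

```python
import pandas as pd

# Data copied from the question; the leading spaces in some names are assumed
csv1 = pd.DataFrame({
    "X": [1, 2, 3, 4, 5],
    "Y": [0, 0, 0, 0, 0],
    "Subdie": [4, 2, 3, 4, 4],
    " 1v": [2, 3, 5, 2, 2],
    " 2v": [4, 4, 4, 5, 5],
    " 5v": [2, 4, 6, 4, 8],
    " 10v": [2, 6, 8, 4, 4],
})

# Remove leading/trailing whitespace from every column name
csv1.columns = csv1.columns.str.strip()

# Label slices include both endpoints, so this touches 1v, 2v and 5v
csv1.loc[:, "1v":"5v"] -= 2
print(csv1)
```

If only 1v and 2v should change, the slice would be "1v":"2v" instead.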

I need to write a program to print the following pattern:

I need to produce the following pattern:
1 2 3 4 5 6
  1 2 3 4 5
    1 2 3 4
      1 2 3
        1 2
          1
I have written code that produces the same pattern, but right side up. I don't understand how to flip it over.
rows = 6
for i in range(1, rows + 1):
    for j in range(1, rows + 1):
        if j < i:
            print(' ', end=' ')
        else:
            print(i, end=' ')
    print()
Edit: This somewhat fails with rows >= 12, honorable mention to alexanderhurst for finding the bug in this implementation, and providing another clean solution. However, we can mimic tabulate by using tabs (\t) instead of spaces (see at the bottom).
Why not something simpler?
rows = 6
l = list(range(1, rows + 1))
for i in range(rows):
    print(" " * 2*i + " ".join(str(x) for x in l[:rows-i]))
1 2 3 4 5 6
  1 2 3 4 5
    1 2 3 4
      1 2 3
        1 2
          1
Edit: If you want permutations, try these:
>>> for i in range(rows):
...     x = " " * 2*i + " ".join(str(x) for x in l[:rows-i])
...     print(x[::-1])
6 5 4 3 2 1
5 4 3 2 1
4 3 2 1
3 2 1
2 1
1
>>> for i in range(rows - 1, -1, -1):
...     print(" " * 2*i + " ".join(str(x) for x in l[:rows-i]))
...
          1
        1 2
      1 2 3
    1 2 3 4
  1 2 3 4 5
1 2 3 4 5 6
>>> for i in range(rows - 1, -1, -1):
...     x = " " * 2*i + " ".join(str(x) for x in l[:rows-i])
...     print(x[::-1])
...
1
2 1
3 2 1
4 3 2 1
5 4 3 2 1
6 5 4 3 2 1
Bug for larger numbers of rows:
>>> rows = 14
>>> l = list(range(rows))
>>> for i in range(rows):
...     print(" " * 2*i + " ".join(str(x) for x in l[:rows-i]))
...
0 1 2 3 4 5 6 7 8 9 10 11 12 13
  0 1 2 3 4 5 6 7 8 9 10 11 12
    0 1 2 3 4 5 6 7 8 9 10 11
      0 1 2 3 4 5 6 7 8 9 10
        0 1 2 3 4 5 6 7 8 9
          0 1 2 3 4 5 6 7 8
            0 1 2 3 4 5 6 7
              0 1 2 3 4 5 6
                0 1 2 3 4 5
                  0 1 2 3 4
                    0 1 2 3
                      0 1 2
                        0 1
                          0
Hotfix 1: use tabs. This can work okay if your tab width is the same as mine and you use fewer than 20 rows at maximum screen width (otherwise alexanderhurst's solution might not solve your problem either).
>>> for i in range(rows):
...     print("\t" * i + "\t".join(str(x) for x in l[:rows-i]))
...
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 1 2 3 4 5 6 7 8 9 10 11 12
0 1 2 3 4 5 6 7 8 9 10 11
0 1 2 3 4 5 6 7 8 9 10
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8
0 1 2 3 4 5 6 7
0 1 2 3 4 5 6
0 1 2 3 4 5
0 1 2 3 4
0 1 2 3
0 1 2
0 1
0
Hotfix 2: add / remove spaces according to number length (e.g. using log(x) or len(str(x)) or similar) but it becomes too complex.
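For what it is worth, Hotfix 2 need not be very complex if every cell is padded to the width of the largest number. The following is only a sketch of that idea; triangle_lines and width are hypothetical names, and the numbering starts at 1 as in the question rather than at 0 as in the bug demonstration above.

```python
def triangle_lines(rows):
    """Return the inverted triangle as a list of strings with aligned columns."""
    # Every cell is as wide as the largest number that will be printed
    width = len(str(rows))
    lines = []
    for i in range(rows):
        # i blank cells of the same width, then the numbers 1 .. rows - i
        cells = [" " * width] * i
        cells += [str(n).rjust(width) for n in range(1, rows - i + 1)]
        lines.append(" ".join(cells))
    return lines


for line in triangle_lines(12):
    print(line)
```

Because blank cells and number cells share one width, every line has the same length and the columns stay aligned for any number of rows, without depending on the terminal's tab width.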
This solution resembles yours, with a few changes.
For each row, it first prints the spaces needed for the triangle shape,
then it counts up from 1 to the length of that row,
and then it moves to the next line.
num = 6
for i in range(num, 0, -1):
    # two spaces per skipped number, to match the width of "n "
    print('  ' * (num - i), end='')
    for j in range(i):
        print(j + 1, end=' ')
    print()
This does have an odd effect once the numbers reach two digits, because those entries are wider than the indentation step:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    1 2 3 4 5 6 7 8 9 10 11 12 13 14
      1 2 3 4 5 6 7 8 9 10 11 12 13
        1 2 3 4 5 6 7 8 9 10 11 12
          1 2 3 4 5 6 7 8 9 10 11
            1 2 3 4 5 6 7 8 9 10
              1 2 3 4 5 6 7 8 9
                1 2 3 4 5 6 7 8
                  1 2 3 4 5 6 7
                    1 2 3 4 5 6
                      1 2 3 4 5
                        1 2 3 4
                          1 2 3
                            1 2
                              1
You can use tabulate to keep everything in its column. Here I also used a list comprehension to reduce code size.
code:
from tabulate import tabulate
count = 16
numbers = [[''] * (count - i) + [j+1 for j in range(i)] for i in range(count, 0, -1)]
print(tabulate(numbers))
output:
- - - - - - - - - -- -- -- -- -- -- --
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 2 3 4 5 6 7 8 9 10 11 12 13
1 2 3 4 5 6 7 8 9 10 11 12
1 2 3 4 5 6 7 8 9 10 11
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7
1 2 3 4 5 6
1 2 3 4 5
1 2 3 4
1 2 3
1 2
1
- - - - - - - - - -- -- -- -- -- -- --
You can count backwards with range():
rows = 6
for i in range(rows, 0, -1):
    for j in range(1, rows + 1):
        if rows - j >= i:  # i = 6: never true; i = 5: true once; i = 4: true twice, etc.
            print(' ', end=' ')
        else:
            print(i, end=' ')
    print()
So from what I can see you are trying to make the form:
     1
    21
   321
  4321
 54321
654321
So the loops need to be reversed and you need to add a space filler section.
rows = 6
for i in range(1, rows+1):
    out = ''
    for j in range(1, rows):
        out += ' '
    for j in range(i, 0, -1):
        out += str(j)
    print(out)
    rows -= 1
A one-line statement using a generator expression would be:
pattern = '\n'.join((' ' * 2 * i) + ' '.join(str(n) for n in range(1, num + 1)) for i, num in enumerate(range(6, 0, -1)))
For clarification, here is the same statement executed in the Python interactive terminal:
>>> pattern = '\n'.join((' ' * 2 * i) + ' '.join(str(n) for n in range(1, num + 1)) for i, num in enumerate(range(6, 0, -1)))
>>>
>>> print(pattern)
1 2 3 4 5 6
  1 2 3 4 5
    1 2 3 4
      1 2 3
        1 2
          1
>>>
Wrapping this in a function is a better fit for this kind of repetitive work, for example if you want to try several sizes.
def print_num_triangle(n=6):
    """
    1 2 3 4 5 6
      1 2 3 4 5
        1 2 3 4
          1 2 3
            1 2
              1
    """
    pattern = '\n'.join((' ' * 2 * i) + ' '.join(str(k) for k in range(1, num + 1)) for i, num in enumerate(range(n, 0, -1)))
    print(pattern)

if __name__ == "__main__":
    print_num_triangle(10)
    # 1 2 3 4 5 6 7 8 9 10
    #   1 2 3 4 5 6 7 8 9
    #     1 2 3 4 5 6 7 8
    #       1 2 3 4 5 6 7
    #         1 2 3 4 5 6
    #           1 2 3 4 5
    #             1 2 3 4
    #               1 2 3
    #                 1 2
    #                   1
    print_num_triangle(7)
    # 1 2 3 4 5 6 7
    #   1 2 3 4 5 6
    #     1 2 3 4 5
    #       1 2 3 4
    #         1 2 3
    #           1 2
    #             1
    print_num_triangle()  # default -> 6
    # 1 2 3 4 5 6
    #   1 2 3 4 5
    #     1 2 3 4
    #       1 2 3
    #         1 2
    #           1

Table wise value count

I have a table like this. I want to draw a histogram of the number of 0s, 1s, 2s and 3s across the whole table. Is there a way to do it?
You can apply melt and then value_counts (or hist, to draw the histogram directly).
for example:
df
A B C D
0 3 1 1 1
1 3 3 2 2
2 1 0 1 1
3 3 2 3 0
4 3 1 1 3
5 3 0 3 1
6 3 1 1 0
7 1 3 3 0
8 3 1 3 3
9 3 3 1 3
df.melt()['value'].value_counts()
3 18
1 14
0 5
2 3
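To make the snippet above reproducible, here is a small self-contained sketch that rebuilds the example table and counts every value across all columns. The variable name counts is my own, and the plotting call is left as a comment because it assumes matplotlib is installed.

```python
import pandas as pd

# Example table copied from the answer above
df = pd.DataFrame({
    "A": [3, 3, 1, 3, 3, 3, 3, 1, 3, 3],
    "B": [1, 3, 0, 2, 1, 0, 1, 3, 1, 3],
    "C": [1, 2, 1, 3, 1, 3, 1, 3, 3, 1],
    "D": [1, 2, 1, 0, 3, 1, 0, 0, 3, 3],
})

# melt stacks all columns into a single 'value' column;
# value_counts then tallies each distinct value, sort_index orders them 0..3
counts = df.melt()["value"].value_counts().sort_index()
print(counts)

# With matplotlib available, a bar chart of the counts could be drawn with:
# counts.plot.bar()
```

Sorting by index puts the bars in value order rather than in order of frequency, which is usually what a histogram is expected to show.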

Dataframe concatenate columns

I have a dataframe with a multiindex (ID, Date, LID) and columns from 0 to N that looks something like this:
0 1 2 3 4
ID Date LID
00112 11-02-2014 I 0 1 5 6 7
00112 11-02-2014 II 2 4 5 3 4
00112 30-07-2015 I 5 7 1 1 2
00112 30-07-2015 II 3 2 8 7 1
I would like to group the dataframe by ID and Date and concatenate the columns to the same row such that it looks like this:
0 1 2 3 4 5 6 7 8 9
ID Date
00112 11-02-2014 0 1 5 6 7 2 4 5 3 4
00112 30-07-2015 5 7 1 1 2 3 2 8 7 1
Using pd.concat and pd.DataFrame.xs
pd.concat(
    [df.xs(x, level=2) for x in df.index.levels[2]],
    axis=1, ignore_index=True
)
0 1 2 3 4 5 6 7 8 9
ID Date
112 11-02-2014 0 1 5 6 7 2 4 5 3 4
30-07-2015 5 7 1 1 2 3 2 8 7 1
Use unstack + sort_index:
df = df.unstack().sort_index(axis=1, level=1)
# assign new column names
df.columns = np.arange(len(df.columns))
print(df)
0 1 2 3 4 5 6 7 8 9
ID Date
112 11-02-2014 0 1 5 6 7 2 4 5 3 4
30-07-2015 5 7 1 1 2 3 2 8 7 1
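For anyone who wants to run the unstack approach end to end, here is a sketch that rebuilds the question's frame by hand. Storing ID and Date as strings is an assumption on my part, and wide is an illustrative name.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; ID and Date are assumed to be strings
index = pd.MultiIndex.from_tuples(
    [
        ("00112", "11-02-2014", "I"),
        ("00112", "11-02-2014", "II"),
        ("00112", "30-07-2015", "I"),
        ("00112", "30-07-2015", "II"),
    ],
    names=["ID", "Date", "LID"],
)
df = pd.DataFrame(
    [[0, 1, 5, 6, 7], [2, 4, 5, 3, 4], [5, 7, 1, 1, 2], [3, 2, 8, 7, 1]],
    index=index,
)

# Move LID into the columns, then order the columns by LID first
wide = df.unstack("LID").sort_index(axis=1, level=1)

# Replace the two-level column labels with a flat 0..N-1 range
wide.columns = np.arange(wide.shape[1])
print(wide)
```

Naming the level ("LID") instead of relying on the default last level makes the intent explicit if the index order ever changes.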

convert dataframe columns value into digital number

I have the following data in a column of my data frame. How can I replace each domain name with a number? I tried using replace in a for loop, but since I have more than 1200 unique domain names, this does not seem like an ideal way to do it:
for i, v in np.ndenumerate(np.unique(df['domain'])):
    df['domain'] = df['domain'].replace(to_replace=[v], value=i[0]+1, inplace=True)
but it does not work
data frame:
type domain
0 1 yahoo.com
1 1 google.com
2 0 google.com
3 0 aa.com
4 0 google.com
5 0 aa.com
6 1 abc.com
7 1 msn.com
8 1 abc.com
9 1 abc.com
....
I want to convert to
type domain
0 1 1
1 1 2
2 0 2
3 0 3
4 0 2
5 0 3
6 1 4
7 1 5
8 1 4
9 1 4
....
Let's use pd.factorize:
df.assign(domain=pd.factorize(df.domain)[0]+1)
Output:
type domain
0 1 1
1 1 2
2 0 2
3 0 3
4 0 2
5 0 3
6 1 4
7 1 5
8 1 4
9 1 4
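A self-contained sketch of the factorize approach, using the ten sample rows from the question. One likely reason the loop in the question fails is that replace with inplace=True returns None, which is then assigned back to the column. The names codes and uniques are illustrative.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "type": [1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
    "domain": [
        "yahoo.com", "google.com", "google.com", "aa.com", "google.com",
        "aa.com", "abc.com", "msn.com", "abc.com", "abc.com",
    ],
})

# factorize returns integer codes (starting at 0, in order of first
# appearance) together with the unique values those codes refer to
codes, uniques = pd.factorize(df["domain"])

# Shift by one so numbering starts at 1, as in the desired output
df = df.assign(domain=codes + 1)
print(df)
print(list(uniques))
```

Keeping uniques around makes it possible to map the numbers back to domain names later, since uniques[code] is the original name for each zero-based code.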
If it does not really matter which number each domain gets, you can try this:
import pandas as pd
df.domain.astype('category').cat.codes
Out[154]:
0 4
1 2
2 2
3 0
4 2
5 0
6 1
7 3
8 1
9 1
dtype: int8
If the order does matter, you can try:
maplist=df[['domain']].drop_duplicates(keep='first').reset_index(drop=True).reset_index().set_index('domain')
maplist['index']=maplist['index']+1
df.domain=df.domain.map(maplist['index'])
Out[177]:
type domain
0 1 1
1 1 2
2 0 2
3 0 3
4 0 2
5 0 3
6 1 4
7 1 5
8 1 4
9 1 4
