Convert a dataframe into a matrix form - python-3.x

I have a dataset that looks like this:
state VDM MDM OM
AP 1 2 5
GOA 1 2 1
GU 1 2 4
KA 1 5 1
MA 1 4 4
I have tried this code:
aMat=df1000.as_matrix()
print(aMat)
here df1000 is the dataset.
But the above code gives this output:
[['AP' 1 2 5]
['GOA' 1 2 1]
['GU' 1 2 4]
['KA' 1 5 1]
['MA' 1 4 4]]
I want to create a 2d list or matrix which looks like this:
[[1, 2, 5], [1, 2, 1], [1, 2, 4], [1, 5, 1], [1, 4, 4]]

You can use df.iloc[]:
df.iloc[:,1:].to_numpy()
array([[1, 2, 5],
[1, 2, 1],
[1, 2, 4],
[1, 5, 1],
[1, 4, 4]], dtype=int64)
Or for string matrix:
df.astype(str).iloc[:,1:].to_numpy()
array([['1', '2', '5'],
['1', '2', '1'],
['1', '2', '4'],
['1', '5', '1'],
['1', '4', '4']], dtype=object)
Note on why we are not using as_matrix(): it is deprecated (and was removed in pandas 1.0). The deprecation warning reads:
".as_matrix will be removed in a future version. Use .values instead."

Select all columns except the first with DataFrame.iloc, convert the integer values to strings with DataFrame.astype, and finally convert to a numpy array with to_numpy or DataFrame.values:
#pandas 0.24+
aMat=df1000.iloc[:, 1:].astype(str).to_numpy()
#pandas below 0.24
aMat=df1000.iloc[:, 1:].astype(str).values
Or remove first column by DataFrame.drop:
#pandas 0.24+
aMat=df1000.drop('state', axis=1).astype(str).to_numpy()
#pandas below 0.24
aMat=df1000.drop('state', axis=1).astype(str).values
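The question asks for a plain 2D list of integers rather than a NumPy array. A minimal sketch of that variant follows; it assumes the frame is rebuilt by hand for illustration (df1000 is the name from the question, a_list is a hypothetical name) and calls tolist() on the array to get nested Python lists.

```python
import pandas as pd

# Rebuild the question's data for illustration; in practice df1000 already exists.
df1000 = pd.DataFrame({
    "state": ["AP", "GOA", "GU", "KA", "MA"],
    "VDM": [1, 1, 1, 1, 1],
    "MDM": [2, 2, 2, 5, 4],
    "OM": [5, 1, 4, 1, 4],
})

# Drop the first (string) column, then convert the array to nested lists of ints.
a_list = df1000.iloc[:, 1:].to_numpy().tolist()
print(a_list)
```

Leaving out astype(str) keeps the values as integers, which matches the output requested in the question.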

Related

Extending a list

Currently I am tidying up one of my projects by using the more pythonic way of doing things. Now I am struggling to extend a list with values from a dictionary, where those values are themselves lists.
my_dict = {'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [4, 5, 6]}
criteria = ['a', 'c']
my_list = []
for c in criteria:
    my_list.extend(my_dict[c])
This results in [1, 2, 3, 4, 5, 6], which is the sought-for result, whereas
my_list = []
my_list.extend(my_dict[c] for c in criteria)
results in a nested list [[1, 2, 3], [4, 5, 6]]. I can't quite find a reason why this is happening.
Your code does not work because extend adds each item produced by the generator expression, and each of those items is a whole list:
>>> list(my_dict[c] for c in criteria)
[[1, 2, 3], [4, 5, 6]]
This is because my_dict[c] is itself a list.
A more Pythonic way is to use a list comprehension:
my_dict = {'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [4, 5, 6]}
criteria = ['a', 'c']
my_list = [item for k in criteria for item in my_dict[k]]
>>> my_list
[1, 2, 3, 4, 5, 6]
This uses a nested loop to select the values from the dict by criteria and to flatten the lists that are those values.
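If you would rather keep the extend call, one option is to flatten the generator before passing it in. The sketch below uses itertools.chain.from_iterable from the standard library; the names nested and flat are only for illustration.

```python
from itertools import chain

my_dict = {'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [4, 5, 6]}
criteria = ['a', 'c']

# extend() adds each item the iterable yields; here every item is a whole list.
nested = []
nested.extend(my_dict[c] for c in criteria)

# chain.from_iterable flattens one level, so extend() receives the numbers.
flat = []
flat.extend(chain.from_iterable(my_dict[c] for c in criteria))

print(nested)  # [[1, 2, 3], [4, 5, 6]]
print(flat)    # [1, 2, 3, 4, 5, 6]
```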

Python 3 ~ How to take rows from a csv file and put them into a list

I would like to know how to take this file:
name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5
and put it in a list like the following:
[['Alice', 'Bob', 'Charlie'], [2, 8, 3], [4, 1, 5], [3, 2, 5]]
I'm fairly new to Python, so excuse me.
My current code looks like this:
file = open(argv[1] , "r")
file1 = open(argv[2] , "r")
text = file1.read()
strl = []
with file:
    reader = csv.reader(file, delimiter=",")
    for row in reader:
        strl = row[1:9]
        break
df = pd.read_csv(argv[1],header=0)
df = [df[col].tolist() for col in df.columns]
Ignore the strl part; it's for something unrelated.
The code outputs this:
[['Alice', 'Bob', 'Charlie'], [2, 4, 3], [8, 1, 2], [3, 5, 5]]
I want it to output this instead:
[['Alice', 'Bob', 'Charlie'], [2, 8, 3], [4, 1, 5], [3, 2, 5]]
Using pandas
In [13]: import pandas as pd
In [14]: df = pd.read_csv("a.csv",header=None)
In [15]: df
Out[15]:
0 1 2 3
0 Alice 2 8 3
1 Bob 4 1 5
2 Charlie 3 2 5
In [16]: [df[col].tolist() for col in df.columns]
Out[16]: [['Alice', 'Bob', 'Charlie'], [2, 4, 3], [8, 1, 2], [3, 5, 5]]
Update:
In [51]: import pandas as pd
In [52]: df = pd.read_csv("a.csv",header=None)
In [53]: data = df[df.columns[1:]].to_numpy().tolist()
In [57]: data.insert(0,df[0].tolist())
In [58]: data
Out[58]: [['Alice', 'Bob', 'Charlie'], [2, 8, 3], [4, 1, 5], [3, 2, 5]]
Update:
In [51]: import pandas as pd
In [52]: df = pd.read_csv("a.csv")
In [94]: df
Out[94]:
name AGATC AATG TATC
0 Alice 2 8 3
1 Bob 4 1 5
2 Charlie 3 2 5
In [97]: data = df.loc[:, df.columns != 'name'].to_numpy().tolist()
In [98]: data.insert(0, df["name"].tolist())
In [99]: data
Out[99]: [['Alice', 'Bob', 'Charlie'], [2, 8, 3], [4, 1, 5], [3, 2, 5]]
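The same result can also be built without pandas. The following is a minimal sketch using only the standard csv module; it assumes the file has a header row and that every column after the first holds integers, and it uses an in-memory io.StringIO as a stand-in for the real file.

```python
import csv
import io

# Stand-in for the file on disk; in practice use open("a.csv", newline="").
sample = io.StringIO(
    "name,AGATC,AATG,TATC\n"
    "Alice,2,8,3\n"
    "Bob,4,1,5\n"
    "Charlie,3,2,5\n"
)

reader = csv.reader(sample)
next(reader)  # skip the header row

names = []
counts = []
for row in reader:
    names.append(row[0])                              # first column: the name
    counts.append([int(value) for value in row[1:]])  # remaining columns as ints

data = [names] + counts
print(data)
```

Reading row by row keeps each person's counts together, which is the grouping the question asks for.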

Python: Converting a list of lists of numbers to a list of strings [duplicate]

I have a list of lists of numbers in this form:
[[2, 3, 4], [0, 2, 3, 4], [1, 3, 4], [1, 2, 4], [2]]
I want this list to be converted to:
['2 3 4', '0 2 3 4', '1 3 4', '1 2 4', '2']
How do I go about it?
You can use a list comprehension like so:
my_list = [[2, 3, 4], [0, 2, 3, 4], [1, 3, 4], [1, 2, 4], [2]]
formatted = [" ".join(map(str, x)) for x in my_list]
print(formatted)
Prints
['2 3 4', '0 2 3 4', '1 3 4', '1 2 4', '2']
You need to iterate over all sublists in the list, join each sublist into a string, and append it to a new list. The code would look like this:
my_list = [[2, 3, 4], [0, 2, 3, 4], [1, 3, 4], [1, 2, 4], [2]]
result = []
for sublist in my_list:
    # join accepts only strings, so map(str, sublist) converts each integer first
    result.append(" ".join(map(str, sublist)))
Now print(result) will output:
['2 3 4', '0 2 3 4', '1 3 4', '1 2 4', '2']

Pandas Get First Element of Each Tuple in Cell

Given the following data frame:
import pandas as pd
df=pd.DataFrame({'A':['a','b','c'],
'B':[[[1,2],[3,4],[5,6]],[[1,2],[3,4],[5,6]],[[1,2],[3,4],[5,6]]]})
df
A B
0 a [[1, 2], [3, 4], [5, 6]]
1 b [[1, 2], [3, 4], [5, 6]]
2 c [[1, 2], [3, 4], [5, 6]]
I'd like to create a new column ('C') containing the first value in each element of the tuple of column B like this:
A B C
0 a [[1, 2], [3, 4], [5, 6]] [1,3,5]
1 b [[1, 2], [3, 4], [5, 6]] [1,3,5]
2 c [[1, 2], [3, 4], [5, 6]] [1,3,5]
So far, I've tried:
df['C']=df['B'][0]
...but that only returns the first tuple ([1, 2]).
Thanks in advance!
This works for me -
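#.str[0] alone gives only the first pair of each cell ([1, 2])
df['C'] = df['B'].str[0]
#taking the first value of every pair gives [1, 3, 5]
df['C'] = df['B'].apply(lambda x: [y[0] for y in x])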
Try this:
df['C'] = df["B"].apply(lambda x : [y[0] for y in list(x)])
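To see what each expression actually produces, here is a small self-contained check on the question's frame. It is only a sketch; first_pair is a hypothetical name used for the comparison.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a', 'b', 'c'],
    'B': [[[1, 2], [3, 4], [5, 6]]] * 3,
})

# .str[0] indexes into each cell, so it returns the first pair of every row.
first_pair = df['B'].str[0]

# The comprehension takes the first value of every pair inside each cell.
df['C'] = df['B'].apply(lambda pairs: [pair[0] for pair in pairs])

print(first_pair.tolist())  # [[1, 2], [1, 2], [1, 2]]
print(df['C'].tolist())     # [[1, 3, 5], [1, 3, 5], [1, 3, 5]]
```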

Pandas: Reading a CSV file with the intention of creating a 3D array

First time posting here.
My question is about how to read a CSV file in Pandas with the intention of creating a 2D array with a matrix within each element.
So for instance take this example CSV file
1,1,1;2,2,2;3,3,3
1,1,1;2,2,2;3,3,3
1,1,1;2,2,2;3,3,3
Where each new line represents a separate matrix
and each semicolon represents a separate row within each matrix
and each comma represents a separate element within each row
So from this I would like to get to this type of array:
[
[[1,1,1],[2,2,2],[3,3,3]],
[[1,1,1],[2,2,2],[3,3,3]],
[[1,1,1],[2,2,2],[3,3,3]]
]
Currently, when I use pandas.read_csv() on something like this it'll not read the semicolon as a separator and so something like 1;2 would be read as a string.
Thanks!
You can use read_csv with the parameters sep=';' and header=None (if the CSV has no header). Then apply str.split to each column, because the .str string methods work only on a Series (a column of the DataFrame):
import pandas as pd
import io
temp=u"""1,1,1;2,2,2;3,3,3
1,1,1;2,2,2;3,3,3
1,1,1;2,2,2;3,3,3"""
#after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep=";", header=None)
print (df)
0 1 2
0 1,1,1 2,2,2 3,3,3
1 1,1,1 2,2,2 3,3,3
2 1,1,1 2,2,2 3,3,3
print (df.apply(lambda x: x.str.split(',')))
0 1 2
0 [1, 1, 1] [2, 2, 2] [3, 3, 3]
1 [1, 1, 1] [2, 2, 2] [3, 3, 3]
2 [1, 1, 1] [2, 2, 2] [3, 3, 3]
print (df.apply(lambda x: x.str.split(',')).values.tolist())
[[['1', '1', '1'], ['2', '2', '2'], ['3', '3', '3']],
[['1', '1', '1'], ['2', '2', '2'], ['3', '3', '3']],
[['1', '1', '1'], ['2', '2', '2'], ['3', '3', '3']]]
If you need lists of ints instead:
import pandas as pd
import io
temp=u"""1,1,1;2,2,2;3,3,3
1,1,1;2,2,2;3,3,3
1,1,1;2,2,2;3,3,3"""
#after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep=";", header=None)
print (df)
0 1 2
0 1,1,1 2,2,2 3,3,3
1 1,1,1 2,2,2 3,3,3
2 1,1,1 2,2,2 3,3,3
for col in df.columns:
    df[col] = df[col].str.split(',')
    #convert the string numbers to int
    df[col] = [[int(y) for y in x] for x in df[col]]
print (df.values.tolist())
[[[1, 1, 1], [2, 2, 2], [3, 3, 3]],
[[1, 1, 1], [2, 2, 2], [3, 3, 3]],
[[1, 1, 1], [2, 2, 2], [3, 3, 3]]]
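For a file this simple, the nested structure can also be built without pandas. This is a minimal sketch that assumes every value is an integer and that the text has already been read into a string (the names text and matrices are illustrative).

```python
# Stand-in for the file contents; in practice read them with open(...).read().
text = """1,1,1;2,2,2;3,3,3
1,1,1;2,2,2;3,3,3
1,1,1;2,2,2;3,3,3"""

# One matrix per line, one row per ';'-separated chunk, one int per ',' value.
matrices = [
    [[int(value) for value in row.split(",")] for row in line.split(";")]
    for line in text.splitlines()
    if line.strip()  # ignore blank lines
]
print(matrices)
```

If a true 3D array is needed, the nested list can be passed to numpy.array, assuming all matrices have the same shape.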
