import numpy as np
import pandas as pd
def ExtractCsv(Start, End):
    lsta, lstb, lstc, lstd = list(), list(), list(), list()
    j = 0
    for i in range(Start, End + 1):
        f1 = pd.read_csv('C:/Users/sanilm/Desktop/Desktop_backup/opc/csv/Newfolder/fault_data.tar/fault_data/test/healthy' + str(i) + '.csv')
        listc = list(f1['c'])
        listd = list(f1['d'])
        liste = list(f1['e'])
        listf = list(f1['f'])
        lsta.append(listc)
        lstb.append(listd)
        lstc.append(liste)
        lstd.append(listf)
    print(lsta)
    return f1

f1 = ExtractCsv(1, 3)
The input CSV files (there are 3 of them) look like this:
a b c d e f
1 10 2901.1 13.915 39.812 6.2647
1 10 2906.1 13.368 42.083 12.945
1 10 2805.3 12.951 42.261 13.398
1 10 3049.2 14.101 43.499 15.237
1 10 2854.8 13.978 42.699 9.1297
Expected output:
[2901.1, 2906.1, 2805.3, 3049.2, 2854.8, 2860.9, 2992.9, 2867.1, 2947.6, 2679.4, 2891.2, 2853.8, 2896.4, 3114.6, 3155.3, 2930.2, 2810.0, 2903.5]
But the output I am getting is:
[[2901.1, 2906.1, 2805.3, 3049.2, 2854.8], [2860.9, 2992.9, 2867.1, 2947.6, 2679.4, 2891.2], [2853.8, 2896.4, 3114.6, 3155.3, 2930.2, 2810.0, 2903.5]]
Any suggestions on how I can achieve my expected output?
It looks like you just want to flatten your results.
Maybe try this:
Making a flat list out of list of lists in Python
From the link above (where you can find more details):
flat_list = [item for sublist in l for item in sublist]
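As a concrete sketch applied to your code, assuming lsta is the nested list built by ExtractCsv (the names come from your snippet):

# lsta looks like [[2901.1, 2906.1, ...], [2860.9, ...], [2853.8, ...]]
flat_lsta = [item for sublist in lsta for item in sublist]
print(flat_lsta)  # one flat list of floats, matching the expected output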
In the loop you can create a list of DataFrames and then concat them together.
Also, if the files have many columns, you can pass the usecols parameter to read_csv to read only the specified columns:
def ExtractCsv(Start, End):
    dfs = []
    for i in range(Start, End + 1):
        path = 'C:/Users/sanilm/Desktop/Desktop_backup/opc/csv/Newfolder/fault_data.tar/fault_data/test/healthy' + str(i) + '.csv'
        f1 = pd.read_csv(path, usecols=['c', 'd', 'e', 'f'])
        dfs.append(f1)
    return pd.concat(dfs, ignore_index=True)

df = ExtractCsv(1, 3)
Finally, if you need to extract a column to a list:
lsta = df['c'].tolist()
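If you need all four columns as flat lists, a small sketch reusing the df returned above:

lsta = df['c'].tolist()
lstb = df['d'].tolist()
lstc = df['e'].tolist()
lstd = df['f'].tolist()
print(lsta)  # already flat, since concat stacked the rows of all files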
Let's say I have the below pandas DataFrame:
import pandas as pd
dat = pd.DataFrame({'A' : [1,2,3,4], 'B' : [3,4,5,6]})
dat['A1'] = dat['A'].astype(str) + '_Something'
dat.set_index('A1')
While this is alright, I want to achieve the below things:
1. Instead of having the line dat['A1'] = dat['A'].astype(str) + '_Something', can I transform column A on the fly and directly pass the transformed values to dat.set_index? My transformation function is rather complex, so I am looking for a general approach.
2. After setting the index, can I remove A1, which now sits as the header of the index?
Any pointer will be very helpful.
You can pass a np.array to df.set_index. So, just chain Series.to_numpy after the transformation, and make sure that you set the inplace parameter to True inside set_index.
dat.set_index(
    (dat['A'].astype(str) + '_Something')  # transformation
    .to_numpy(),
    inplace=True)
print(dat)
             A  B
1_Something  1  3
2_Something  2  4
3_Something  3  5
4_Something  4  6
So, generalized with a function applied, that would be something like:
def f(x):
    y = f'{x}_Something'
    return y

dat.set_index(dat['A'].apply(f).to_numpy(), inplace=True)
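A minimal sketch of the same thing without inplace, if you prefer reassignment; note that an index built from a plain array has no name, so no A1 header appears above it:

# reassign instead of mutating in place; the new index is unnamed
dat = dat.set_index(dat['A'].apply(f).to_numpy())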
I have a list of items, like "A2BCO6" and "ABC2O6". I want to replace them as A2BCO6 --> AABCO6 and ABC2O6 --> ABCCO6. The number of items is much larger than presented here.
My dataframe is like:
listAB:
Finctional_Group
0 Ba2NbFeO6
1 Ba2ScIrO6
3 MnPb2WO6
I created a duplicate array and tried to replace in the following way:

B = ["Ba2", "Pb2"]
C = ["BaBa", "PbPb"]
for i,j in range(len(B)), range(len(C)):
    listAB["Finctional_Group"] = listAB["Finctional_Group"].str.strip().str.replace(B[i], C[j])
But it does not produce the correct output. The output is:
listAB:
Finctional_Group
0 PbPbNbFeO6
1 PbPbScIrO6
3 MnPb2WO6
Please suggest the necessary correction in the code.
Many thanks in advance.
For simplicity I used the chemparse package, which seems to suit your needs.
As always, we import the required packages, in this case chemparse and pandas.
import chemparse
import pandas as pd
Then we create a pandas.DataFrame object with your example data.
df = pd.DataFrame(
    columns=["Finctional_Group"], data=["Ba2NbFeO6", "Ba2ScIrO6", "MnPb2WO6"]
)
Our parser function will use chemparse.parse_formula, which returns a dict of elements and their frequencies in a molecular formula.
def parse_molecule(molecule: str) -> str:
    # initializing an empty string
    molecule_in_string = ""
    # iterating over all keys & values in the dict
    for key, value in chemparse.parse_formula(molecule).items():
        # appending the element `value` times to the string
        molecule_in_string += key * int(value)
    return molecule_in_string
molecule_in_string now contains the molecular formula without numbers. We just need to map this function to all elements of our DataFrame column. For that we can do:
df = df.applymap(parse_molecule)
print(df)
which returns:
  Finctional_Group
0   BaBaNbFeOOOOOO
1   BaBaScIrOOOOOO
2    MnPbPbWOOOOOO
Source code for chemparse: https://gitlab.com/gmboyer/chemparse
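As a side note on the original loop: for i,j in range(len(B)), range(len(C)): iterates over a tuple of two range objects and unpacks each of them, so both passes end up with i=0, j=1 and every "Ba2" is replaced by "PbPb", which is exactly the wrong output shown above. A minimal fix that stays with str.replace (a sketch, assuming the listAB DataFrame from the question):

B = ["Ba2", "Pb2"]
C = ["BaBa", "PbPb"]
# pair each pattern with its replacement instead of unpacking ranges
for b, c in zip(B, C):
    listAB["Finctional_Group"] = listAB["Finctional_Group"].str.strip().str.replace(b, c, regex=False)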
Just out of curiosity: I have a name list with phone numbers in a CSV file, and I want to change these phone numbers from ########### (11 digits) to the format ###-####-####, adding two hyphens, between the 3rd and 4th digit and between the 7th and 8th digit.
Is this possible?
If it's a DataFrame you can use apply with a format string:
df
           num
0  09187543839
1  08745763412

df.num = df.num.apply(lambda x: "{}-{}-{}".format(x[:3], x[3:7], x[7:]))

df
             num
0  091-8754-3839
1  087-4576-3412
Yes, it is possible. Below is a code-snippet that accomplishes what you want:
phone = str(55512354567)
print(f'{phone[:3]}-{phone[3:7]}-{phone[7:]}')
You can adapt the above idea to your Pandas dataframe as shown below:
# Sample data
data_df = pd.DataFrame([[55512345678], [55587654321]], columns=['phone'])
# Create a string column
data_df['phone_str'] = data_df['phone'].map(lambda x: str(x))
# Convert the column values to the right format
data_df['phone_str'] = data_df['phone_str'].map(lambda x: f'{x[:3]}-{x[3:7]}-{x[7:]}')
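A shorter variant of the same idea, using the vectorized .str accessor instead of map (a sketch that reuses the data_df defined above):

# .str supports positional slicing on a string column
s = data_df['phone'].astype(str)
data_df['phone_str'] = s.str[:3] + '-' + s.str[3:7] + '-' + s.str[7:]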
I may not be using pandas but this could potentially work...
n = 3
n1 = 7
s = "12345678901"  # avoid shadowing the built-in str
left, mid, right = s[:n], s[n:n1], s[n1:]
final = left + "-" + mid + "-" + right
print(final)
Output:
123-4567-8901
I have a file like this:
system
1000
1VEA C 1 9.294 11.244 11.083
1VEA C1 2 9.324 11.375 11.161
1VEA H 3 9.243 11.396 11.232
...
1203VEA H2092601 20.738 16.293 7.837
1203VEA H2192602 20.900 16.225 7.869
1203VEA H2292603 20.822 16.330 7.989
I want to generate a dataframe which includes 6 columns. I used the following command to generate this dataframe:

df = pd.read_csv('system.gro', skiprows=[0,1], delim_whitespace=True, header=None)

However, when it comes to the rows starting with 1203, there is no whitespace between the columns H20 and 92601, and I cannot just use the above command to split them. I used to split the line string by fixed positions like:
f1 = open(fileName, 'r')
for line in f1.readlines():
    atomName = line[8:15].strip(' ')
    globalIdx = int(line[15:20].strip(' '))
But it takes a really long time to process the file. Does anyone have an idea about how to deal with this using a dataframe?
As suggested by SRT HellKitty in the comments, use pd.read_fwf (see docs) like this:
import pandas as pd
from io import StringIO

data = """
1VEA C 1 9.294 11.244 11.083
1VEA C1 2 9.324 11.375 11.161
1VEA H 3 9.243 11.396 11.232
1203VEA H2092601 20.738 16.293 7.837
1203VEA H2192602 20.900 16.225 7.869
1203VEA H2292603 20.822 16.330 7.989
"""
### make sure that the widths are correct!
df = pd.read_fwf(StringIO(data), header=None, colspecs=[(0,8),(8,14),(14,20),(20,28),(28,36),(36,44)])
print(df)
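Applied to the actual system.gro file, a sketch could look like this; the widths are taken from the slicing in the question (line[8:15], line[15:20]) and the column names are only placeholders, so adjust both to your file:

import pandas as pd

# skip the two header lines ('system' and the atom count); the data itself has no header row
df = pd.read_fwf(
    'system.gro',
    skiprows=2,
    header=None,
    colspecs=[(0, 8), (8, 15), (15, 20), (20, 28), (28, 36), (36, 44)],
    names=['residue', 'atom', 'index', 'x', 'y', 'z'],  # hypothetical names
)
print(df.head())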
I am trying to get values from a CSV file using Python and pandas. To avoid the index and headers I am using .values with iloc, but the output values are stored in [] brackets. I don't want them, I just need the values. I don't want to print them but want to use them for other operations.
My code is :
import pandas as pd
ctr_x = []
ctr_y = []
tl_list = []
br_list = []
object_list = []
img = None
obj = 'red_hat'
df = pd.read_csv('ring_1_05_sam.csv')
ctr_x = df.iloc[10:12, 0:1].values #to avoid headers and index
ctr_y = df.iloc[10:12, 1:2].values #to avoid headers and index
ctr_x =[]
ctr_y =[]
If I print ctr_x and ctr_y to check whether the correct values are recorded, the output I get is:
[[1536.25]
[1536.5 ]]
[[895.25]
[896. ]]
So in short, I am getting the correct values but I don't want the brackets. Can anyone please suggest alternatives to my method? Note: I don't want to print the values, but store them (without index and headers) for further operations.
When you use a column slice, pandas returns a DataFrame. Try:
type(df.iloc[10:12, 0:1])
pandas.core.frame.DataFrame
This in turn will return a 2-D array when you use
df.iloc[10:12, 0:1].values
If you want a 1-dimensional array, you can use integer indexing, which will return a Series:
type(df.iloc[10:12, 0])
pandas.core.series.Series
And a one-dimensional array:
df.iloc[10:12, 0].values
So use
ctr_x = df.iloc[10:12, 0].values
ctr_y = df.iloc[10:12, 1].values
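If you would rather keep the original 2-D slices, an equivalent sketch is to flatten the arrays afterwards with NumPy's ravel:

ctr_x = df.iloc[10:12, 0:1].values.ravel()  # 2-D array -> 1-D array
ctr_y = df.iloc[10:12, 1:2].values.ravel()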