LabVIEW TDMS file read with python pandas - python-3.x

How can I read a standard LabVIEW-generated TDMS file using Python?

For the benefit of the community, posting the sample code base I have used to efficiently read a *.tdms file into a pandas DataFrame. After multiple trials, I simplified the code for ease of use and documentation.
#import required libraries
from nptdms import TdmsFile
import numpy as np
import pandas as pd
#bokeh plots
from bokeh.plotting import figure, output_file, show
from bokeh.io import output_notebook
#load the tdms file
tdms_file = TdmsFile("/Volumes/Data/dummy/sample.tdms")
#split all the tdms grouped channels to a separate dataframe
#tdms_file.as_dataframe()
grp1_data = tdms_file.object('grp1').as_dataframe()
grp2_data = tdms_file.object('grp2').as_dataframe()
#plot the data with bokeh (bokeh.charts was removed in newer Bokeh releases;
#use figure/line from bokeh.plotting, imported above)
p = figure(x_axis_label='time (h)', y_axis_label='values')
p.line(grp1_data['time'], grp1_data['values'])
# Display it
show(p)
Suggestions and improvements are welcome.

For clarity, I would further simplify the answer by Sundar to:
from nptdms import TdmsFile
tdms_file = TdmsFile(r"path_to_.tdms")
for group in tdms_file.groups():
    df = tdms_file.object(group).as_dataframe()
    print(df.head())
    print(df.keys())
    print(df.shape)
That will read the different groups of the TDMS file into pandas DataFrames.

This worked for me:
import pandas as pd
from nptdms import TdmsFile
tdms_file = TdmsFile("path/to/tdms_file.tdms")
df = tdms_file['group'].as_dataframe()
print(df.head())
print(df.keys())
print(df.shape)
npTDMS version 1.1.0, at least, no longer has the object method on TdmsFile objects that the previous examples here rely on.

Combination of the answers given by Joris and ax7ster -- for npTDMS v1.3.1.
import nptdms
from nptdms import TdmsFile
print(nptdms.__version__)
fn = 'foo.tdms'
tdms_file = TdmsFile(fn)
for group in tdms_file.groups():
    df = group.as_dataframe()
    print(group.name)
    print(df.head())
    print(df.keys())
    print(df.shape)
This reads all the groups in the TDMS file and doesn't require the group names to be known beforehand.
It is also possible to convert the whole TDMS file into one DataFrame; see the example below.
from nptdms import TdmsFile
fn = 'foo.tdms'
tdms_file = TdmsFile(fn)
df = tdms_file.as_dataframe()
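If you prefer working with per-group DataFrames but still want one combined frame, plain pandas can do the merge after the fact. A minimal sketch with two hand-built DataFrames standing in for the output of group.as_dataframe() (no TDMS file or npTDMS install needed; the group and column names are made up for illustration):

```python
import pandas as pd

# Stand-ins for the per-group DataFrames returned by group.as_dataframe()
grp1 = pd.DataFrame({"time": [0.0, 1.0], "values": [10.0, 11.0]})
grp2 = pd.DataFrame({"time": [0.0, 1.0], "values": [20.0, 21.0]})

# Prefix each column with its group name, then concatenate side by side,
# similar in spirit to the single-frame output of TdmsFile.as_dataframe()
combined = pd.concat(
    [grp1.add_prefix("grp1/"), grp2.add_prefix("grp2/")], axis=1
)
print(combined.columns.tolist())
# ['grp1/time', 'grp1/values', 'grp2/time', 'grp2/values']
```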

Related

why I get KeyError when I extract data with specific keywords from CSV file using python?

I am trying to use the code below to get posts with specific keywords from my CSV file, but I keep getting KeyError: 'Tags1'.
import re
import string
import pandas as pd
import openpyxl
import glob
import csv
import os
import xlsxwriter
import numpy as np
keywords = {"agile", "backlog"}  # all your keywords
df = pd.read_csv(r"C:\Users\ferr1982\Desktop\split1_out.csv",
                 error_bad_lines=False, encoding="utf-8")
output = pd.DataFrame(columns=df.columns)
for i in range(len(df.index)):
    #if (df.loc[df['Tags'].isin(keywords)]):
    if any(x in (df['Tags1'][i], df['Tags2'][i], df['Tags3'][i],
                 df['Tags4'][i], df['Tags5'][i]) for x in keywords):
        output.loc[len(output)] = [df[j][i] for j in df.columns]
output.to_csv("new_data5.csv", index=False)
Okay, it turned out that there is a little space before the "Tags" column names in my CSV file!
It is working now after I added the space to the names in the code above.
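A more robust fix is to normalise the header names right after read_csv, so stray spaces can't cause a KeyError, and to filter with a vectorised isin instead of a Python loop. A minimal sketch with made-up data (the column names and values are illustrative, not from the real file):

```python
import pandas as pd

keywords = {"agile", "backlog"}

# Toy frame standing in for the CSV; note the stray space in " Tags1"
df = pd.DataFrame({
    " Tags1": ["agile", "waterfall", "backlog"],
    "Tags2": ["x", "y", "z"],
})

df.columns = df.columns.str.strip()  # drop accidental whitespace in headers

# Keep rows where any Tags column matches a keyword
mask = df[["Tags1", "Tags2"]].isin(keywords).any(axis=1)
output = df[mask]
print(len(output))  # 2 matching rows
```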

import data from multiple file and summing column wise

I have n txt files, each with 99 floating-point numbers in 99 columns. I read each file and append all the data with the following script.
import glob
import numpy as np
import matplotlib.pyplot as plt
msd_files = (glob.glob('MSD_no_fs*'))
msd_all=[]
for msd_file in msd_files:
    # print(msd_file)
    msd = np.loadtxt(fname=msd_file, delimiter=',')
    msd_all.append(msd)
After that I need to compute a column-wise sum across the files, for example file1 column1 + file2 column1 + ... + file(n) column1, repeated for every column. What would be an efficient way to do this? Can I use a list comprehension?
Edit: the code below works fine now.
import glob
import numpy as np
import matplotlib.pyplot as plt
msd_files = (glob.glob('MSD_no_fs*'))
msd_all=[]
for msd_file in msd_files:
    with open(msd_file) as f:
        for line in f:
            # msd_all.append([float(v) for v in line.strip().split(',')])
            msd_all.append(float(line.strip()))
msa_array = np.array(msd_all)
x = np.split(msa_array, 99)
x = np.array(x)
result = np.mean(x, axis=0)
print(result.shape)
print(len(result))
It depends on the level of efficiency you want. Using numpy to load many CSV files might be a bad choice. Here is my suggestion.
import glob
import numpy as np
msd_files = (glob.glob('MSD_no_fs*'))
msd_all=[]
for msd_file in msd_files:
    with open(msd_file) as f:
        for line in f:
            msd_all.append([float(v) for v in line.strip().split(',')])
msa_array = np.array(msd_all)
result = msa_array.sum(axis=0)
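The column-wise sum the question asks for comes down to stacking one array per file and summing along the file axis. A sketch with in-memory data instead of glob'd files (three fake "files" of 5 columns each, instead of 99):

```python
import numpy as np

# One array of column values per file, as np.loadtxt would return
file_arrays = [
    np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
    np.array([10.0, 20.0, 30.0, 40.0, 50.0]),
    np.array([100.0, 200.0, 300.0, 400.0, 500.0]),
]

stacked = np.vstack(file_arrays)   # shape (n_files, n_columns)
column_sums = stacked.sum(axis=0)  # file1[c] + file2[c] + ... for each column c
print(column_sums)  # [111. 222. 333. 444. 555.]
```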

how to convert column titles from int to str in python

I imported a file from SPSS (a .sav file); however, the titles of my columns
appear as integers instead of strings. Is there a way to fix it? Below is the code I used. I would appreciate any help!
import fnmatch
import sys # import sys
import os
import pandas as pd #pandas importer
import savReaderWriter as spss # to import file from SPSS
import io #importing io
import codecs #to resolve the UTF-8 unicode
with spss.SavReader('file_name.sav') as reader: # Should I add "np"?
    records = reader.all()
with codecs.open('file_name.sav', "r", encoding='utf-8',
                 errors='strict') as fdata: # Not sure if the problem resides on this line
    df = pd.DataFrame(records)
df.head()
I am wondering whether there is a way to actually convert the titles from numbers to strings. The same thing can happen in Excel, but Excel has an easy fix for it.
Thanks in advance!
After you have created the DataFrame, you can use df.columns = df.columns.map(str) to change the column headers to strings.
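A minimal sketch of that fix with a toy DataFrame (no SPSS file needed): when no header names are supplied, pandas assigns integer column labels, and map(str) converts them.

```python
import pandas as pd

# Default integer column labels, as you get when no header names are supplied
df = pd.DataFrame([[1, 2], [3, 4]])
print(df.columns.tolist())  # [0, 1] -- integers

df.columns = df.columns.map(str)
print(df.columns.tolist())  # ['0', '1'] -- strings
```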

how to solve the keyerror when I load a CSV file using pandas

I use pandas to load a CSV file and want to print out one column of the data. I want to print out the 'violence' data to make a bar chart, but it raises a KeyError. Here is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
c_data=pd.read_csv('crime.csv')
print(c_data.head())
print (c_data['violence'])
and I get a KeyError. I also tried the capitalised name, print(c_data['VIOLENCE']), but that failed too (screenshots of the error details omitted). Can someone tell me how to work it out?
Try the following if your data is small:
import csv
with open('crime.csv', 'r') as my_file:
    reader = csv.reader(my_file)
    rows = list(reader)
print(rows[3])
If your data is big, try this:
import csv
from itertools import islice
with open('crime.csv', 'r') as my_file:
    reader = csv.reader(my_file)
    print(next(islice(reader, 3, 4)))
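Since the question already loads the file with pandas, it is also worth inspecting the real header names before indexing: a KeyError usually means the column label differs from what you typed (case, or stray whitespace). A sketch using an in-memory CSV in place of crime.csv (the headers and values are made up):

```python
import io
import pandas as pd

csv_text = "City, VIOLENCE \nA,10\nB,20\n"  # note stray spaces in the header
c_data = pd.read_csv(io.StringIO(csv_text))
print(c_data.columns.tolist())  # reveals the actual labels, spaces and all

c_data.columns = c_data.columns.str.strip()  # normalise whitespace
print(c_data["VIOLENCE"].tolist())  # [10, 20]
```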

How to import values from a column of csv dataset into python for t-test?

New coder here, trying to run some t-tests in Python 3.6. Right now, to run my t-tests between my 2 data sets, I have been doing the following:
import plotly.plotly as py
import plotly.graph_objs as go
from plotly.tools import FigureFactory as FF
import numpy as np
import pandas as pd
import scipy
from scipy import stats
long_term_survivor_GENE1 = [-0.38,-0.99,-1.04,0.1, etc..]
short_term_survivor_GENE1 = [0.32, 0.33,0.96, etc...]
stats.ttest_ind(long_term_survivor_GENE1,short_term_survivor_GENE1)
Which requires me to manually enter the values for each column of both data sets for each specific gene (GENE1 in this case). Is there any way to be able to call for the values from the data set so that Python can just read the values without me typing them out myself? For example, some way that I can just say:
long_term_survivor_GENE1 = ##call values from GENE1 column from dataset 1##
short_term_survivor_GENE1 = ## call values from GENE1 column from dataset 2##
Thanks for any help, and sorry that I'm not very well-versed in this stuff. Appreciate any feedback/tips. If you have any other questions, please let me know!
If you've shoved your data into the columns of a pandas dataframe then it might be as easy as this.
>>> import pandas as pd
>>> long_term_survivor_GENE1 = [-0.38,-0.99,-1.04,0.1]
>>> short_term_survivor_GENE1 = [0.32, 0.33,0.96, 0.56]
>>> df = pd.DataFrame({'long_term_survivor_GENE1': long_term_survivor_GENE1, 'short_term_survivor_GENE1': short_term_survivor_GENE1})
>>> from scipy import stats
>>> stats.ttest_ind(df['long_term_survivor_GENE1'], df['short_term_survivor_GENE1'])
Ttest_indResult(statistic=-3.615804684179662, pvalue=0.011153077626049458)
It might be a good idea to review the statistics behind this, though. If you haven't already got the data into a DataFrame, have a look at some of the many answers here on SO about using read_csv for assistance.
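A sketch of that read_csv route, using in-memory CSV text in place of the two data files (the GENE1 column name is from the question; the values reuse the sample numbers above):

```python
import io
import pandas as pd
from scipy import stats

# Stand-ins for the two CSV data sets, one GENE1 column each
long_csv = "GENE1\n-0.38\n-0.99\n-1.04\n0.1\n"
short_csv = "GENE1\n0.32\n0.33\n0.96\n0.56\n"

long_df = pd.read_csv(io.StringIO(long_csv))
short_df = pd.read_csv(io.StringIO(short_csv))

# Pass the columns straight to the t-test, no manual typing of values
result = stats.ttest_ind(long_df["GENE1"], short_df["GENE1"])
print(result.pvalue)  # a small p-value suggests the group means differ
```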
