Extract the onset of pulses in praat - pulse

I am currently wirting a script to calculte VOT of consonants in English
Tier 3 : phoneme
Tier 4 : CV
The goal is to detect in the interval of Tier 4 the onset of the pulses. Here is a part of the code :
thisSound$ = selected$("Sound")
thisTextGrid$ = selected$("TextGrid")
select TextGrid 'thisTextGrid$'
numberOfCV = Get number of intervals: 4
appendInfoLine: "There are ", numberOfCV, " intervals."
for i from 1 to numberOfCV
appendInfoLine: i
select TextGrid 'thisTextGrid$'
thisCV$ = Get label of interval: 4, i
appendInfoLine: thisCV$
When I try the command "pulse listing" it doesn't work...
Can someone help me with that ?
Many thanks !

Related

running for loop until arbitrary index (python 3.x)

So I have these strings that I split by spaces (' ') and I just rolled them into a single list I called 'keyLabelRun'
so it looks like this:
keyLabelRun[0-12]:
0 OS=Dengue
1 virus
2 3
3 PE=4
4 SV=1
5 Split=0
6
7 OS=Bacillus
8 subtilis
9 XF-1
10 GN=opuBA
11 PE=4
12 SV=1
I only want the elements that include and are after "OS=", anything else, whether it be "SV=" or "PE=" etc. I want to skip over those elements until I get to the next "OS="
The number of elements to the next "OS=" is arbitrary so that's where I'm having the problem.
This is what I'm currently trying:
OSarr = []
for i in range(len(keyLabelrun)):
if keyLabelrun[i].count('OS='):
OSarr.append(keyLabelrun[i])
if keyLabelrun[i+1].count('=') != 1:
continue
But the elements where "OS=" is not included is what is tripping me up I think.
Also at the end I'm going to join them all back together in their own elements but I feel like I will be able to handle that after this.
In my attempt, I am trying to append all elements I'm looking for in order to an new list 'OSarr'
If anyone can lend a hand, it would be much appreciated.
Thank you.
These list of strings came from a dataset that is a text file in the form:
>tr|W0FSK4|W0FSK4_9FLAV Genome polyprotein (Fragment) OS=Dengue virus 3 PE=4 SV=1 Split=0
MNNQRKKTGKPSINMLKRVRNRVSTGSQLAKRFSKGLLNGQGPMKLVMAFIAFLRFLAIPPTAGVLARWGTFKKSGAIKVLKGFKKEISNMLSIINKRKKTSLCLMMILPAALAFHLTSRDGEPRMIVGKNERGKSLLFKTASGINMCTLIAMDLGEMCDDTVTYKCPHITEVEPEDIDCWCNLTSTWVTYGTCNQAGEHRRDKRSVALAPHVGMGLDTRTQTWMSAEGAWRQVEKVETWALRHPGFTILALFLAHYIGTSLTQKVVIFILLMLVTPSMTMRCVGVGNRDFVEGLSGATWVDVVLEHGGCVTTMAKNKPTLDIELQKTEATQLATLRKLCIEGKITNITTDSRCPTQGEATLPEEQDQNYVCKHTYVDRGWGNGCGLFGKGSLVTCAKFQCLEPIEGKVVQYENLKYTVIITVHTGDQHQVGNETQGVTAEITPQASTTEAILPEYGTLGLECSPRTGLDFNEMILLTMKNKAWMVHRQWFFDLPLPWTSGATTETPTWNRKELLVTFKNAHAKKQEVVVLGSQEGAMHTALTGATEIQNSGGTSIFAGHLKCRLKMDKLELKGMSYAMCTNTFVLKKEVSETQHGTILIKVEYKGEDVPCKIPFSTEDGQGKAHNGRLITANPVVTKKEEPVNIEAEPPFGESNIVIGIGDNALKINWYKKGSSIGKMFEATARGARRMAILGDTAWDFGSVGGVLNSLGKMVHQIFGSAYTALFSGVSWVMKIGIGVLLTWIGLNSKNTSMSFSCIAIGIITLYLGAVVQADMGCVINWKGKELKCGSGIFVTNEVHTWTEQYKFQADSPKRLATAIAGAWENGVCGIRSTTRMENLLWKQIANELNYILWENNIKLTVVVGDIIGVLEQGKRTLTPQPMELKYSWKTWGKAKIVTAETQNSSFIIDGPNTPECPSVSRAWNVWEVEDYGFGVFTTNIWLKLREVYTQLCDHRLMSAAVKDERAVHADMGYWIESQKNGSWKLEKASLIEVKTCTWPKSHTLWSNGVLESDMIIPKSLAGPISQHNHRPGYHTQTAGPWHLGKLELDFNYCEGTTVVITENCGTRGPSLRTTTVSGKLIHEWCCRSCTLPPLRYMGEDGCWYGMEIRPISEKEENMVKSLVSAGSGKVDNFTMGVLCLAILFEEVMRGKFGKKHMIAGVFFTFVLLLSGQITWRDMAHTLIMIGSNASDRMGMGVTYLALIATFKIQPFLALGFFLRKLTSRENLLLGVGLAMATTLQLPEDIEQMANGIALGLMALKLITQFETYQLWTALISLTCSNTIFTLTVAWRTATLILAGVSLLPVCQSSSMRKTDWLPMAVAAMGVPPLPLFIFGLKDTLKRRSWPLNEGVMAVGLVSILASSLLRNDVPMAGPLVAGGLLIACYVITGTSADLTVEKAADITWEEEAEQTGVSHNLMITVDDDGTMRIKDDETENILTVLLKTALLIVSGIFPYSIPATLLVWHTWQKQTQRSGVLWDVPSPPETQKAELEEGVYRIKQQGIFGKTQVGVGVQKEGVFHTMWHVTRGAVLTYNGKRLEPNWASVKKDLISYGGGWRLSAQWQKGEEVQVIAVEPGKNPKNFQTMPGTFQTTTGEIGAIALDFKPGTSGSPIINREGKVVGLYGNGVVTKNGGYVSGIAQTNAEPDGPTPELEEEMFKKRNLTIMDLHPGSGKTRKYLPAIVREAIKRRLRTLILAPTRVVAAEMEEALKGLPIRYQTTATKSEHTGREIVDLMCHATFTMRLLSPVRVPNYNLIIMDEAHFTDPASIAARGYISTRVGMGEAAAIFMTATPPGTADAFPQSNAPIQDEERDIPERSWNSGNEWITDFAGKTVWFVPSIKAGNDIANCLRKNGKKVIQLSRKTFDTEYQKTKLNDWDFVV
>tr|M4KW32|M4KW32_BACIU Choline ABC transporter (ATP-binding protein) OS=Bacillus subtilis XF-1 GN=opuBA PE=4 SV=1 Split=0
MLTLENVSKTYKGGKKAVNNVNLKIAKGEFICFIGPSGCGKTTTMKMINRLIEPSAGKIFIDGENIMDQDPVELRRKIGYVIQQIGLFPHMTIQQNISLVPKLLKWPEQQRKERARELLKLVDMGPEYVDRYPHELSGGQQQRIGVLRALAAEPPLILMDEPFGALDPITRDSLQEEFKKLQKTLHKTIVFVTHDMDEAIKLADRIVILKAGEIVQVGTPDDILRNPADEFVEEFIGKERLIQSSSPDVERVDQIMNTQPVTITADKTLSEAIQLMRQERVDSLLVVDDEHVLQGYVDVEIIDQCRKKANLIGEVLHEDIYTVLGGTLLRDTVRKILKRGVKYVPVVDEDRRLIGIVTRASLVDIVYDSLWGEEKQLAALS
>sp|Q8AWH3|SX17A_XENTR Transcription factor Sox-17-alpha OS=Xenopus tropicalis GN=sox17a PE=2 SV=1 Split=0
MSSPDGGYASDDQNQGKCSVPIMMTGLGQCQWAEPMNSLGEGKLKSDAGSANSRGKAEARIRRPMNAFMVWAKDERKRLAQQNPDLHNAELSKMLGKSWKALTLAEKRPFVEEAERLRVQHMQDHPNYKYRPRRRKQVKRMKRADTGFMHMAEPPESAVLGTDGRMCLESFSLGYHEQTYPHSQLPQGSHYREPQAMAPHYDGYSLPTPESSPLDLAEADPVFFTSPPQDECQMMPYSYNASYTHQQNSGASMLVRQMPQAEQMGQGSPVQGMMGCQSSPQMYYGQMYLPGSARHHQLPQAGQNSPPPEAQQMGRADHIQQVDMLAEVDRTEFEQYLSYVAKSDLGMHYHGQESVVPTADNGPISSVLSDASTAVYYCNYPSA
I got it! :D
OSarr = []
G = 0
for i in range(len(keyLabelrun)):
OSarr.append(keyLabelrun[G])
G += 1
if keyLabelrun[G].count('='):
while keyLabelrun[G].count('OS=') != 1:
G+=1
Maybe next time everyone, thank you!
Due to the syntax, you have to keep track of which part (OS, PE, etc) you're currently parsing. Here's a function to extract the species name from the FASTA header:
def extract_species(description):
species_parts = []
is_os = False
for word in description.split():
if word[:3] == 'OS=':
is_os = True
species_parts.append(word[3:])
elif '=' in word:
is_os = False
elif is_os:
species_parts.append(word)
return ' '.join(species_parts)
You can call it when processing your input file, e.g.:
from Bio import SeqIO
for record in SeqIO.parse('input.fa', 'fasta'):
species = extract_species(record.description)

Convert user input to time which changes boolean value for the duration entered?

I'm working on this side project game to grasp python better. I'm trying to have the user enter the amount of time the character has to spend busy, then not allow the user to do the same thing until they have completed the original time entered. I have tried a few methods with varying error results from my noob ways. (timestamps, converting input to int and time in different spots, timeDelta)
def Gold_mining():
while P.notMining:
print('Welcome to the Crystal mines kid.\nYou will be paid in gold for your labour,\nif lucky you may get some skill points or bonus finds...\nGoodluck in there.')
print('How long do you wish to enter for?')
time_mining = int(input("10 Gold Per hour. Max 8 hours --> "))
if time_mining > 0 and time_mining <= 8:
time_started = current_time
print(f'You will spend {time_mining} hours digging in the mines.')
P.gold += time_mining * 10
print(P.gold)
P.notMining = False
End_Time = (current_time + timedelta(hours = 2))
print(f'{End_Time} time you exit the mines...')
elif time_mining > 8:
print("You can't possibly mine for that long kid, go back and think about it.")
else:
print('Invalid')
After the set amount of time i would like for it to change the bool value back to false so that you can mine again.
"Crystal Mining" is mapped to a different key for testing so my output says "Inventory" but would say "Crystal Mining" when it works properly and currently looks like this:
*** Page One ***
Intro Page
02:15:05
1 Character Stats
2 Rename Character
3 Inventory
4 Change Element
5 Menu
6 Exit
Num: 3
Welcome to the Crystal mines kid.
You will be paid in gold for your labour,
if lucky you may get some skill points or bonus finds...
Goodluck in there.
How long do you wish to enter for?
10 Gold Per hour. Max 8 hours --> 1
You will spend 1 hours digging in the mines.
60
Traceback (most recent call last):
File "H:\Python ideas\input_as_always.py", line 176, in <module>
intro.pageInput()
File "H:\Python ideas\input_as_always.py", line 45, in pageInput
self.pageOptions[pInput]['entry']()
File "H:\Python ideas\input_as_always.py", line 134, in Gold_mining
End_Time = (current_time + timedelta(hours = 2))
TypeError: can only concatenate str (not "datetime.timedelta") to str

Normalising units/Replace substrings based on lists using Python

I am trying to normalize weight units in a string.
Eg:
1.SUCO MARACUJA COM GENGIBRE PCS 300 Millilitre - SUCO MARACUJA COM GENGIBRE PCS 300 ML
2. OVOS CAIPIRAS ANA MARIA BRAGA 10UN - OVOS CAIPIRAS ANA MARIA BRAGA 10U
3. SUCO MARACUJA MAMAO PCS 300 Gram - SUCO MARACUJA MAMAO PCS 300 G
4. SUCO ABACAXI COM MACA PCS 300Milli litre - SUCO ABACAXI COM MACA PCS 300ML
The keyword table is :
unit = ['Kilo','Kilogram','Gram','Milligram','Millilitre','Milli
litre','Dozen','Litre','Un','Und','Unid','Unidad','Unidade','Unidades']
norm_unit = ['KG','KG','G','MG','ML','ML','DZ','L','U','U','U','U','U','U']
I tried to take up these lists as a table but am having difficulty in comparing two dataframes or tables in python.
I tried the below code.
unit = ['Kilo','Kilogram','Gram','Milligram','Millilitre','Milli
litre','Dozen','Litre','Un','Und','Unid','Unidad','Unidade','Unidades']
norm_unit = ['KG','KG','G','MG','ML','ML','DZ','L','U','U','U','U','U','U']
z='SUCO MARACUJA COM GENGIBRE PCS 300 Millilitre'
#for row in mongo_docs:
#z = row['clean_hntproductname']
for x in unit:
for y in norm_unit:
if (re.search(r'\s'+x+r'$',z,re.I)):
# clean_hntproductname = t.lower().replace(x.lower(),y.lower())
# myquery3 = { "_id" : row['_id']}
# newvalues3 = { "$set": {"clean_hntproductname" : 'clean_hntproductname'} }
# ds_hnt_prod_data.update_one(myquery3, newvalues3)
I'm using Python(Jupyter) with MongoDb(Compass). Fetching data from Mongo and writing back to it.
From my understanding you want to:
Update all the rows in a table which contain the words in the unit array, to the ones in norm_unit.
(Disclaimer: I'm not familiar with MongoDB or Python.)
What you want is to create a mapping (using a hash) of the words you want to change.
Here's a trivial solution (i.e. not best solution but would probably point you in the right direction.)
unit_conversions = {
'Kilo': 'KG'
'Kilogram': 'KG',
'Gram': 'G'
}
# pseudo-code
for each row that you want to update
item_description = get the value of the string in the column
for each key in unit_conversion (e.g. 'Kilo')
see if the item_description contains the key
if it does, replace it with unit_convertion[key] (e.g. 'KG')
update the row

Parsing heterogenous data from a text file in Python

I am trying to parse raw data results from a text file into an organised tuple but having trouble getting it right.
My raw data from the textfile looks something like this:
Episode Cumulative Results
EpisodeXD0281119
Date collected21/10/2019
Time collected10:00
Real time PCR for M. tuberculosis (Xpert MTB/Rif Ultra):
PCR result Mycobacterium tuberculosis complex NOT detected
Bacterial Culture:
Bottle: Type FAN Aerobic Plus
Result No growth after 5 days
EpisodeST32423457
Date collected23/02/2019
Time collected09:00
Gram Stain:
Neutrophils Occasional
Gram positive bacilli Moderate (2+)
Gram negative bacilli Numerous (3+)
Gram negative cocci Moderate (2+)
EpisodeST23423457
Date collected23/02/2019
Time collected09:00
Bacterial Culture:
A heavy growth of
1) Klebsiella pneumoniae subsp pneumoniae (KLEPP)
ensure that this organism does not spread in the ward/unit.
A heavy growth of
2) Enterococcus species (ENCSP)
Antibiotic/Culture KLEPP ENCSP
Trimethoprim-sulfam R
Ampicillin / Amoxic R S
Amoxicillin-clavula R
Ciprofloxacin R
Cefuroxime (Parente R
Cefuroxime (Oral) R
Cefotaxime / Ceftri R
Ceftazidime R
Cefepime R
Gentamicin S
Piperacillin/tazoba R
Ertapenem R
Imipenem S
Meropenem R
S - Sensitive ; I - Intermediate ; R - Resistant ; SDD - Sensitive Dose Dependant
Comment for organism KLEPP:
** Please note: this is a carbapenem-RESISTANT organism. Although some
carbapenems may appear susceptible in vitro, these agents should NOT be used as
MONOTHERAPY in the treatment of this patient. **
Please isolate this patient and practice strict contact precautions. Please
inform Infection Prevention and Control as contact screening might be
indicated.
For further advice on the treatment of this isolate, please contact.
The currently available laboratory methods for performing colistin
susceptibility results are unreliable and may not predict clinical outcome.
Based on published data and clinical experience, colistin is a suitable
therapeutic alternative for carbapenem resistant Acinetobacter spp, as well as
carbapenem resistant Enterobacteriaceae. If colistin is clinically indicated,
please carefully assess clinical response.
EpisodeST234234057
Date collected23/02/2019
Time collected09:00
Authorised by xxxx on 27/02/2019 at 10:35
MIC by E-test:
Organism Klebsiella pneumoniae (KLEPN)
Antibiotic Meropenem
MIC corrected 4 ug/mL
MIC interpretation Resistant
Antibiotic Imipenem
MIC corrected 1 ug/mL
MIC interpretation Sensitive
Antibiotic Ertapenem
MIC corrected 2 ug/mL
MIC interpretation Resistant
EpisodeST23423493
Date collected18/02/2019
Time collected03:15
Potassium 4.4 mmol/L 3.5 - 5.1
EpisodeST45445293
Date collected18/02/2019
Time collected03:15
Creatinine 32 L umol/L 49 - 90
eGFR (MDRD formula) >60 mL/min/1.73 m2
Creatinine 28 L umol/L 49 - 90
eGFR (MDRD formula) >60 mL/min/1.73 m2
Essentially the pattern is that ALL information starts with a unique EPISODE NUMBER and follows with a DATE and TIME and then the result of whatever test. This is the pattern throughout.
What I am trying to parse into my tuple is the date, time, name of the test and the result - whatever it might be. I have the following code:
with open(filename) as f:
data = f.read()
data = data.splitlines()
DS = namedtuple('DS', 'date time name value')
parsed = list()
idx_date = [i for i, r in enumerate(data) if r.strip().startswith('Date')]
for start, stop in zip(idx_date[:-1], idx_date[1:]):
chunk = data[start:stop]
date = time = name = value = None
for row in chunk:
if not row: continue
row = row.strip()
if row.startswith('Episode'): continue
if row.startswith('Date'):
_, date = row.split()
date = date.replace('collected', '')
elif row.startswith('Time'):
_, time = row.split()
time = time.replace('collected', '')
else:
name, value, *_ = row.split()
print (name)
parsed.append(DS(date, time, name, value))
print(parsed)
My error is that I am unable to find a way to parse the heterogeneity of the test RESULT in a way that I can use later, for example for the tuple DS ('DS', 'date time name value'):
DATE = 21/10/2019
TIME = 10:00
NAME = Real time PCR for M tuberculosis or Potassium
RESULT = Negative or 4.7
Any advice appreciated. I have hit a brick wall.

svm train output file has less lines than that of the input file

I am currently building a binary classification model and have created an input file for svm-train (svm_input.txt). This input file has 453 lines, 4 No. features and 2 No. classes [0,1].
i.e
0 1:15.0 2:40.0 3:30.0 4:15.0
1 1:22.73 2:40.91 3:36.36 4:0.0
1 1:31.82 2:27.27 3:22.73 4:18.18
0 1:22.73 2:13.64 3:36.36 4:27.27
1 1:30.43 2:39.13 3:13.04 4:17.39 ......................
My problem is that when I count the number of lines in the output model generated by svm-train (svm_train_model.txt), this has 12 fewer lines than that of the input file. The line count here shows 450, although there are obviously also 9 lines at the beginning showing the various parameters generated
i.e.
svm_type c_svc
kernel_type rbf
gamma 1
nr_class 2
total_sv 441
rho -0.156449
label 0 1
nr_sv 228 213
SV
Therefore 12 lines in total from the original input of 453 have gone. I am new to svm and was hoping that someone could shed some light on why this might have happened?
Thanks in advance
Updated.........
I now believe that in generating the model, it has removed lines whereby the labels and all the parameters are exactly the same.
To explain............... My input is a set of miRNAs which have been classified as 1 and 0 depending on their involvement in a particular process or not (i.e 1=Yes & 0=No). The input file looks something like.......
0 1:22 2:30 3:14 4:16
1 1:26 2:15 3:17 4:25
0 1:22 2:30 3:14 4:16
Whereby, lines one and three are exactly the same and as a result will be removed from the output model. My question is then both why the output model would do this and how I can get around this (whilst using the same features)?
Whilst both SOME OF the labels and their corresponding feature values are identical within the input file, these are still different miRNAs.
NOTE: The Input file does not have a feature for miRNA name (and this would clearly show the differences in each line) however, in terms of the features used (i.e Nucleotide Percentage Content), some of the miRNAs do have exactly the same percentage content of A,U,G & C and as a result are viewed as duplicates and then removed from the output model as it obviously views them as duplicates even though they are not (hence there are less lines in the output model).
the format of the input file is:
Where:
Column 0 - label (i.e 1 or 0): 1=Yes & 0=No
Column 1 - Feature 1 = Percentage Content "A"
Column 2 - Feature 2 = Percentage Content "U"
Column 3 - Feature 3 = Percentage Content "G"
Column 4 - Feature 4 = Percentage Content "C"
The input file actually looks something like (See the very first two lines below), as they appear identical, however each line represents a different miRNA):
1 1:23 2:36 3:23 4:18
1 1:23 2:36 3:23 4:18
0 1:36 2:32 3:5 4:27
1 1:14 2:41 3:36 4:9
1 1:18 2:50 3:18 4:14
0 1:36 2:23 3:23 4:18
0 1:15 2:40 3:30 4:15
In terms of software, I am using libsvm-3.22 and python 2.7.5
Align your input file properly, is my first observation. The code for libsvm doesnt look for exactly 4 features. I identifies by the string literals you have provided separating the features from the labels. I suggest manually converting your input file to create the desired input argument.
Try the following code in python to run
Requirements - h5py, if your input is from matlab. (.mat file)
pip install h5py
import h5py
f = h5py.File('traininglabel.mat', 'r')# give label.mat file for training
variables = f.items()
labels = []
c = []
import numpy as np
for var in variables:
data = var[1]
lables = (data.value[0])
trainlabels= []
for i in lables:
trainlabels.append(str(i))
finaltrain = []
trainlabels = np.array(trainlabels)
for i in range(0,len(trainlabels)):
if trainlabels[i] == '0.0':
trainlabels[i] = '0'
if trainlabels[i] == '1.0':
trainlabels[i] = '1'
print trainlabels[i]
f = h5py.File('training_features.mat', 'r') #give features here
variables = f.items()
lables = []
file = open('traindata.txt', 'w+')
for var in variables:
data = var[1]
lables = data.value
for i in range(0,1000): #no of training samples in file features.mat
file.write(str(trainlabels[i]))
file.write(' ')
for j in range(0,49):
file.write(str(lables[j][i]))
file.write(' ')
file.write('\n')

Resources