Handle the string - python-3.x

I have a problem with a Python str: I've tried multiple variations, but none of them seem to work.
Here is my problem:
string = '18.0 8 307.0 130.0 3504. 12.0 70 1\t"chevrolet chevelle malibu"'
I want to split this string and get a result like this:
['18.0','8','307.0','130.0','3504.','12.0','70','1','"chevrolet chevelle malibu"']
or like this:
['18.0','8','307.0','130.0','3504.','12.0','70','1','chevrolet chevelle malibu']
I have tried to use re.compile(), but I failed to build a rule.
Please help!

If the last value is always delimited by '\t' you can use this:
s = '18.0 8 307.0 130.0 3504. 12.0 70 1\t"chevrolet chevelle malibu"'
lst = [*s.split('\t')[0].split(), s.split('\t')[-1]]
print(lst)
Prints:
['18.0', '8', '307.0', '130.0', '3504.', '12.0', '70', '1', '"chevrolet chevelle malibu"']

It can be achieved with the following piece of code:
>>> string = '18.0 8 307.0 130.0 3504. 12.0 70 1\t"chevrolet chevelle malibu"'
>>> [y for (i, x) in enumerate(string.split('"')) for y in ([y.strip() for y in x.split()] if i % 2 == 0 else [x])]
['18.0', '8', '307.0', '130.0', '3504.', '12.0', '70', '1', 'chevrolet chevelle malibu']
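
Since the question mentions trying re.compile(), here is a minimal regex sketch of the same split. It assumes quoted values never contain an embedded double quote; the names TOKEN, with_quotes and without_quotes are only illustrative.

```python
import re

s = '18.0 8 307.0 130.0 3504. 12.0 70 1\t"chevrolet chevelle malibu"'

# Match either a double-quoted run (which may contain spaces)
# or a run of non-whitespace characters.
TOKEN = re.compile(r'"[^"]*"|\S+')

with_quotes = TOKEN.findall(s)                       # keeps the surrounding quotes
without_quotes = [tok.strip('"') for tok in with_quotes]  # drops them

print(with_quotes)
print(without_quotes)
```

The first list corresponds to the first desired output in the question and the second list to the variant without quotes.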

Related

How can I get specific values from a list within lists, inside a dictionary, and based on the keys

Hey there programmers, I'm a bit new...
I have a dictionary containing many lists within lists. The program asks the user for a specific year, which corresponds directly to a key of the dictionary. Once that key is found, I must print it along with some of the values stored under it, which are in the form of a list of lists.
In this example I will use a much smaller sample of the data, but hopefully the solution will also work on a huge data set:
dicti = {'1980': [['1','Emily', '800'], ['2','Steve', '20'],['3','france', '80']], '2000': [['1','jan', '8'],['2','aug', '0'], ['3','Ernest', '90']],
'2003': [['1','mul', '40'],['2','Inuyasha', '20'],['3','hulk', '50'],['4','pop smoke', '1'],['5','kendrick', '2'],['6','nick', '1']],
'2006': [['1','roger', '800'],['2','orochimaru', '1'], ['3', 'john', '783']]}
The output if the user enters 2003 should look something like this:
you selected 2003:
1. mul: 40
2. inuyasha: 20
3. hulk: 50
4. pop smoke: 1
5. kendrick: 2
6. nick: 1
I attempted the problem using this code:
yr = input('Enter yr: ')
for i in dicti.get(yr):
    for value in dicti[yr]:
        print(value)
but this code produces:
1. mul: 40
2. inuyasha: 20
3. hulk: 50
4. pop smoke: 1
5. kendrick: 2
6. nick: 1
1. mul: 40
2. inuyasha: 20
3. hulk: 50
4. pop smoke: 1
5. kendrick: 2
6. nick: 1
1. mul: 40
2. inuyasha: 20
3. hulk: 50
4. pop smoke: 1
5. kendrick: 2
6. nick: 1
1. mul: 40
2. inuyasha: 20
3. hulk: 50
4. pop smoke: 1
5. kendrick: 2
6. nick: 1
It produces what I need but repeatedly and I only need it to print the info off once.
I have found a solution to my question above...
yr = input('enter yr: ')
for I in dicti[yr]:
    print(f'{I[0]}\t{I[1]}: {I[2]}')
Please check this.
input_dict = {
    '1980': [['1', 'Emily', '800'], ['2', 'Steve', '20'], ['3', 'france', '80']],
    '2000': [['1', 'jan', '8'], ['2', 'aug', '0'], ['3', 'Ernest', '90']],
    '2003': [['1', 'mul', '40'], ['2', 'Inuyasha', '20'], ['3', 'hulk', '50'], ['4', 'pop smoke', '1'],
             ['5', 'kendrick', '2'], ['6', 'nick', '1']],
    '2006': [['1', 'roger', '800'], ['2', 'orochimaru', '1'], ['3', 'john', '783']]}
user_input = input("Please select year: ")
if input_dict.get(user_input):
    for item in input_dict[user_input]:
        print(f"{item[0]}. {item[1]}: {item[2]}")
Output:
Please select year: 2003
1. mul: 40
2. Inuyasha: 20
3. hulk: 50
4. pop smoke: 1
5. kendrick: 2
6. nick: 1
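
The same lookup can also be wrapped in a small function that reports a missing year instead of printing nothing. This is only a sketch: format_year and sample are hypothetical names, and the "No data" message is an assumption about what the program should do for an unknown year.

```python
def format_year(data, year):
    """Return the display lines for one year, or None if the year is not a key."""
    entries = data.get(year)  # None when the key is missing
    if entries is None:
        return None
    # Each entry is [rank, name, value]; unpack it instead of indexing.
    return [f"{rank}. {name}: {value}" for rank, name, value in entries]


sample = {
    '2000': [['1', 'jan', '8'], ['2', 'aug', '0'], ['3', 'Ernest', '90']],
    '2006': [['1', 'roger', '800'], ['2', 'orochimaru', '1'], ['3', 'john', '783']],
}

lines = format_year(sample, '2006')
if lines is None:
    print("No data for that year")
else:
    print("you selected 2006:")
    print("\n".join(lines))
```

A single loop over the entries of one key is enough; nesting a second loop over the same list is what makes the output repeat.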

Converting string into list of every two numbers in string

Given the string "1 2 3 4", the program should return [[1,2],[3,4]].
In Python, I want to convert the string into a list that groups every two elements of the string.
You could go for something very simple such as:
s = "10 2 3 4 5 6 7 8"
l = []
i = 0
list_split_str = s.split() # splitting the string on whitespace
while i < len(list_split_str) - 1:
    l.append([list_split_str[i], list_split_str[i + 1]])
    i += 2
This should output:
[['10', '2'], ['3', '4'], ['5', '6'], ['7', '8']]
You could also do something a little more complex like this in a two-liner:
list_split = s.split() # splitting the string on whitespace
l = [[a, b] for a, b in zip(list_split[0::2], list_split[1::2])]
The slice here means that the first list starts at index zero and has a step of two and so is equal to [10, 3, 5, ...]. The second means it starts at index 1 and has a step of two and so is equal to [2, 4, 6, ...]. So we iterate over the first list for the values of a and the second for those of b.
zip returns an iterator of tuples built from the elements of each list. In this case it yields ('10', '2'), ('3', '4'), ('5', '6'), and so on. It allows us to group the elements of the lists two by two and iterate over them as such.
This also works on lists with odd lengths.
For example, with s = "10 2 3 4 5 6 7 ", the above code would output:
[['10', '2'], ['3', '4'], ['5', '6']]
disregarding the 7 since it doesn't have a buddy.
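
Since the question asks for numbers rather than strings, the slicing approach can be combined with an int conversion. A minimal sketch, assuming every token in the string is a valid integer; pairs_from_string is just an illustrative name.

```python
def pairs_from_string(text):
    """Split text on whitespace and group the integers two by two."""
    numbers = [int(tok) for tok in text.split()]
    # Even-indexed items are paired with odd-indexed items; zip stops at
    # the shorter slice, so a trailing unpaired number is dropped.
    return [[a, b] for a, b in zip(numbers[0::2], numbers[1::2])]


print(pairs_from_string("1 2 3 4"))     # [[1, 2], [3, 4]]
print(pairs_from_string("10 2 3 4 5"))  # [[10, 2], [3, 4]]
```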
Here is a solution for when the number of values is divisible by 2:
def every_two_number(number_string):
    num = number_string.split(' ')
    templist = []
    if len(num) % 2 == 0:
        for i in range(0,len(num),2):
            templist.append([int(num[i]),int(num[i+1])])
    return templist
print(every_two_number('1 2 3 4'))
You can remove the if condition and wrap the loop in try/except if you want the string to be converted even when the number of values is not divisible by 2; the unpaired last value is then dropped:
def every_two_number(number_string):
    num = number_string.split(' ')
    templist = []
    try:
        for i in range(0,len(num),2):
            templist.append([int(num[i]),int(num[i+1])])
    except IndexError: # the last value has no partner
        pass
    return templist
print(every_two_number('1 2 3 4 5'))

Pandas - Fastest way indexing with 2 dataframes

I am developing software in Python 3 with the Pandas library.
Speed is very important; memory much less so.
For easier illustration I am using the names a and b with only a few values, although the real dataframes are much larger:
a -> 50000 rows
b -> 5000 rows
I need to select from dataframes a and b (using multiple conditions)
import numpy as np
import pandas as pd
a = pd.DataFrame({
    'a1': ['x', 'y', 'z'],
    'a2': [1, 2, 3],
    'a3': [3.14, 2.73, -23.00],
    'a4': [np.nan, np.nan, np.nan]  # pd.np is no longer available in recent pandas
})
a
a1 a2 a3 a4
0 x 1 3.14 NaN
1 y 2 2.73 NaN
2 z 3 -23.00 NaN
b = pd.DataFrame({
'b1': ['x', 'y', 'z', 'k', 'l'],
'b2': [2018, 2019, 2020, 2015, 2012]
})
b
b1 b2
0 x 2018
1 y 2019
2 z 2020
3 k 2015
4 l 2012
So far my code is like this:
for index, row in a.iterrows():
    try:
        # create a key
        a1 = row["a1"]
        mask = b.loc[(b['b1'] == a1) & (b['b2'] != 2019)]
        # check if exists
        if (len(mask.index) != 0): #not empty
            a.loc[[index], ['a4']] = mask.iloc[0]['b2']
    except KeyError: #not found
        pass
But as you can see, I'm using iterrows, which is very slow compared to other methods, and I'm changing values in the DataFrame I'm iterating over, which is not recommended.
Could you help me find a better way? The results should be like this:
a
a1 a2 a3 a4
0 x 1 3.14 2018
1 y 2 2.73 NaN
2 z 3 -23.00 2020
I tried things like the line below, but I couldn't make it work.
a.loc[ (a['a1'] == b['b1']) , 'a4'] = b.loc[b['b2'] != 2019]
*the real code has more conditions
Thanks!
EDIT
I benchmarked iterrows, merge, and set_index/loc. Here is the code:
import timeit
import numpy as np
import pandas as pd
def f_iterrows():
    for index, row in a.iterrows():
        try:
            # create a key
            a1 = row["a1"]
            a3 = row["a3"]
            mask = b.loc[(b['b1'] == a1) & (b['b2'] != 2019)]
            # check if exists
            if len(mask.index) != 0: # not empty
                a.loc[[index], ['a4']] = mask.iloc[0]['b2']
        except: # not found
            pass
def f_merge():
    a.merge(b[b.b2 != 2019], left_on='a1', right_on='b1', how='left').drop(columns=['a4', 'b1']).rename(columns={'b2': 'a4'})
def f_lock():
    df1 = a.set_index('a1')
    df2 = b.set_index('b1')
    df1.loc[:, 'a4'] = df2.b2[df2.b2 != 2019]
#variables for testing
number_rows = 100
number_iter = 100
a = pd.DataFrame({
    'a1': ['x', 'y', 'z'] * number_rows,
    'a2': [1, 2, 3] * number_rows,
    'a3': [3.14, 2.73, -23.00] * number_rows,
    'a4': [np.nan, np.nan, np.nan] * number_rows
})
b = pd.DataFrame({
    'b1': ['x', 'y', 'z', 'k', 'l'] * number_rows,
    'b2': [2018, 2019, 2020, 2015, 2012] * number_rows
})
print('For: %s s' % str(timeit.timeit(f_iterrows, number=number_iter)))
print('Merge: %s s' % str(timeit.timeit(f_merge, number=number_iter)))
# NOTE: this line times f_iterrows again, not f_lock, so the 'Loc' figure does not measure f_lock
print('Loc: %s s' % str(timeit.timeit(f_iterrows, number=number_iter)))
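They all worked :) and the time to run is:
For: 277.9994369489998 s
Loc: 274.04929955067564 s
Merge: 2.195712725706926 s
So far Merge is the fastest. Note that the 'Loc' timing was produced by calling f_iterrows a second time (see the last print line of the benchmark), so it does not measure the set_index/loc version.
If another option appears I will update here, thanks again.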
IIUC
a.merge(b[b.b2!=2019],left_on='a1',right_on='b1',how='left').drop(columns=['a4','b1']).rename(columns={'b2':'a4'})
Out[263]:
a1 a2 a3 a4
0 x 1 3.14 2018.0
1 y 2 2.73 NaN
2 z 3 -23.00 2020.0
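
Another option, sketched here with the small frames from the question, is to build a lookup Series from b and map it onto a['a1']. The name lookup is illustrative, and keeping the first match per key with drop_duplicates is an assumption that mirrors mask.iloc[0] in the original loop. Whether this beats merge on the real data would have to be measured.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    'a1': ['x', 'y', 'z'],
    'a2': [1, 2, 3],
    'a3': [3.14, 2.73, -23.00],
    'a4': [np.nan, np.nan, np.nan],
})
b = pd.DataFrame({
    'b1': ['x', 'y', 'z', 'k', 'l'],
    'b2': [2018, 2019, 2020, 2015, 2012],
})

# Keep only the rows of b that pass the condition, then build a b1 -> b2 lookup.
# drop_duplicates keeps the first match per key, like mask.iloc[0] in the loop.
lookup = (
    b[b['b2'] != 2019]
    .drop_duplicates('b1')
    .set_index('b1')['b2']
)

# Keys without a match (here 'y') become NaN.
a['a4'] = a['a1'].map(lookup)
print(a)
```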

How to convert "object" to numeric values including NaN

I have this data for many years and need to plot an error graph for different years. 1993 was selected with
fm93 = fmm[(fmm.Year == 1993)]
then the fm93 data frame is
Year moB m1 std1 co1 min1 max1 m2S std2S co2S min2S max2S
1993 1 296.42 18.91 31 262.4 336 -- -- -- -- --
1993 2 280.76 24.59 28 239.4 329.3 -- -- -- -- --
1993 3 271.41 19.16 31 236.4 304.8 285.80 20.09 20 251.6 319.7
1993 4 287.98 22.52 30 245.9 341 296.75 21.77 27 261.1 345.7
1993 5 287.05 30.79 30 229.2 335.7 300.06 27.64 24 249.5 351.8
1993 6 288.65 11.29 4 275.9 301.9 263.70 73.40 7 156.5 361
1993 7 280.11 36.01 12 237 363 302.67 26.39 22 262.9 377.1
1993 8 296.51 34.55 31 234.8 372.9 305.85 39.95 28 234.1 417.9
1993 9 321.31 34.54 25 263.8 396 309.01 42.52 29 205.9 403.2
1993 10 315.80 8.63 2 309.7 321.9 288.65 35.86 31 230.9 345.4
1993 11 288.26 24.07 30 231.4 322.8 297.99 23.81 28 238 336.5
1993 12 296.87 18.31 31 257.6 331.5 303.02 20.02 29 265.7 340.7
When I try to plot moB against m1 with std1 as the error, this error appears:
ValueError: err must be [ scalar | N, Nx1 or 2xN array-like ]
That is because the values are of dtype "object":
array([[1993, 1, '296.42', '18.91', '31', '262.4', '336', '--', '--', '--',
'--', '--'],
[1993, 2, '280.76', '24.59', '28', '239.4', '329.3', '--', '--',
'--', '--', '--'],
[1993, 3, '271.41', '19.16', '31', '236.4', '304.8', '285.80',
'20.09', '20', '251.6', '319.7'],
[1993, 4, '287.98', '22.52', '30', '245.9', '341', '296.75',
'21.77', '27', '261.1', '345.7'],
[1993, 5, '287.05', '30.79', '30', '229.2', '335.7', '300.06',
'27.64', '24', '249.5', '351.8'],
[1993, 6, '288.65', '11.29', '4', '275.9', '301.9', '263.70',
'73.40', '7', '156.5', '361'],
[1993, 7, '280.11', '36.01', '12', '237', '363', '302.67', '26.39',
'22', '262.9', '377.1'],
[1993, 8, '296.51', '34.55', '31', '234.8', '372.9', '305.85',
'39.95', '28', '234.1', '417.9'],
[1993, 9, '321.31', '34.54', '25', '263.8', '396', '309.01',
'42.52', '29', '205.9', '403.2'],
[1993, 10, '315.80', '8.63', '2', '309.7', '321.9', '288.65',
'35.86', '31', '230.9', '345.4'],
[1993, 11, '288.26', '24.07', '30', '231.4', '322.8', '297.99',
'23.81', '28', '238', '336.5'],
[1993, 12, '296.87', '18.31', '31', '257.6', '331.5', '303.02',
'20.02', '29', '265.7', '340.7']], dtype=object)
I tried to convert this data with
fm93_1 = fm93.astype('float64', raise_on_error = False)
But the problem remains. How can I convert the '--' values to NaN, or ignore them, so that I can plot my results?
Thanks in advance
First you should try to plot a sample of the data after the '--' rows to see if a plot can be generated. You can try:
# This should plot from the third row onwards, omitting the first two rows
df.iloc[2:].plot()
Assuming '--' is the problem, you can replace it with np.nan. NaN values are not plotted.
df.replace('--', np.nan, inplace=True)
df.plot()
Another way is to select the rows of df that do not contain '--':
mask = (df == '--').any(axis=1) # True for rows that contain '--'
df[~mask].plot() # plot only the rows without '--'
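
Because the remaining values are still strings after a plain replace, it may also help to convert the columns to real numbers. A minimal sketch using pd.to_numeric with errors='coerce', which turns anything it cannot parse (here '--') into NaN; the small df below is only a stand-in for fm93.

```python
import pandas as pd

# A small stand-in for fm93: numeric values stored as strings, '--' for missing.
df = pd.DataFrame({
    'moB':  [1, 2, 3],
    'm1':   ['296.42', '280.76', '271.41'],
    'std1': ['18.91', '24.59', '19.16'],
    'm2S':  ['--', '--', '285.80'],
})

# Apply to_numeric column by column; unparseable entries become NaN.
numeric = df.apply(pd.to_numeric, errors='coerce')

print(numeric.dtypes)
print(numeric)
```

With float columns, the error values can then be passed to the plotting call, for example as yerr=numeric['std1'] in matplotlib's errorbar.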

Matrix input from a text file(python 3)

I'm trying to find a way to read a matrix from a text file.
For example, a text file would contain
1 2 3
4 5 6
7 8 9
And it would make a matrix with those numbers and put it in matrix = [[1,2,3],[4,5,6],[7,8,9]]
And then this has to be compatible with the way I print the matrix:
print('\n'.join([' '.join(map(str, row)) for row in matrix]))
So far, I tried this:
path = input('enter file location')
f = open(path, 'r')
matrix = [map(int, line.split(',')) for line in f if line.strip() != ""]
All it does is give me map objects, and I get an error when I try to print the matrix.
What am I doing wrong? matrix should contain the matrix read from the text file and not map objects, and I don't want to use an external library such as numpy.
Thanks
You can use list comprehension as such:
myfile.txt:
1 2 3
4 5 6
7 8 9
>>> matrix = open('myfile.txt').read()
>>> matrix = [item.split() for item in matrix.split('\n')[:-1]]
>>> matrix
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
>>>
You can also create a function for this:
>>> def matrix(file):
...     contents = open(file).read()
...     return [item.split() for item in contents.split('\n')[:-1]]
...
>>> matrix('myfile.txt')
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
>>>
The following works with both Python 2 (e.g. Python 2.7.10) and Python 3 (e.g. Python 3.6.4):
rows=3
cols=3
with open('in.txt') as f:
    data = []
    for i in range(0, rows):
        data.append(list(map(int, f.readline().split()[:cols])))
print (data)
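
As for what goes wrong in the question's attempt: in Python 3, map is lazy, and the sample file is separated by spaces, so line.split(',') leaves each line in one piece and int() rejects it once the map is consumed. A minimal sketch without fixed row and column counts follows; read_matrix is a hypothetical helper, and the temporary file only stands in for the real input file.

```python
import os
import tempfile


def read_matrix(path):
    """Read a whitespace-separated matrix of integers, skipping blank lines."""
    with open(path) as f:
        # split() with no argument splits on any whitespace;
        # list(...) evaluates each row immediately instead of keeping a lazy map.
        return [list(map(int, line.split())) for line in f if line.strip()]


# Write a throwaway sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("1 2 3\n4 5 6\n7 8 9\n")

matrix = read_matrix(tmp.name)
os.remove(tmp.name)

print(matrix)
print('\n'.join(' '.join(map(str, row)) for row in matrix))
```

The last line uses the same join expression as the question, so the result is compatible with the existing printing code.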
