Writing row from structured numpy array with varying delimiter

I have some data stored in a structured numpy array that I would like to write to a file. Because I'm generating some code to match an existing output format that exported a more visually pleasant table I need to output this data with a varying whitespace delimiter.
Using a basic example:
import numpy as np
x = np.zeros((2,), dtype='i4,f4,U10')  # U10 (unicode) rather than a10, which yields bytes in Python 3
x[:] = [(1,2.,'Hello'),(2,3.,'World')]
With a desired output of:
1 2.00 Hello
2 3.00 World
Now, I know I could do this:
for row in x:
    print('%i %.2f %s' % (row[0], row[1], row[2]))
Which works fine but it seems overly verbose if you have a lot of columns. Right now there are 8 in my output.
I'm new to Python (coming from MATLAB), is there a more generic statement that I could use to 'roll up' all of the columns? I was thinking something along the lines of print('%i %.2f %s' % val for val in row) but this just prints 2 generators. When I ran into a generator while using re.split I could use list() to get my desired output, but I don't think using something similar with, say, str() for this case would do what I'm looking for.

If you wrap row in a tuple, you don't have to list all the elements. % formats take a tuple. row may display as a tuple but it is actually a numpy.void.
for row in x:
    print('%i %.2f %s' % tuple(row))
Printing or writing rows in a loop like this is perfectly normal Python.
Class __str__ methods are often written like:
astr = ['a header line']
for row in x:
    astr.append('%i %.2f %s' % tuple(row))
astr = '\n'.join(astr)
producing a string of joined lines.
A comprehension equivalent could be:
'\n'.join('%i %.2f %s'%tuple(row) for row in x)
Since you are using Python3, you could also use the .format approach, but I suspect the % style is closer to what you are using in MATLAB.
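For comparison, here is a minimal sketch of the .format approach. It assumes the rows are available as plain tuples (with the structured array above, x.tolist() should give an equivalent list as long as the string field holds str rather than bytes); the names rows, template and lines are only illustrative.

```python
# Rows as plain Python tuples (assumption: equivalent to x.tolist()).
rows = [(1, 2.0, 'Hello'), (2, 3.0, 'World')]

# One template for the whole row; *row unpacks the fields into the placeholders.
template = '{:d} {:.2f} {}'
lines = [template.format(*row) for row in rows]

print('\n'.join(lines))
```

As with the % version, adding a column only means adding one more placeholder to the template.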

First:
Regarding your implicit question with the generators;
you can "stringify" a generator for example with str.join:
>>> gen = (i.upper() for i in "teststring")
>>> print(''.join(gen))
TESTSTRING
Second:
To the actual question: this is a little more generic than your approach, but it is still not really satisfying.
formats = [
    ("{}", 1),      # format and number of trailing spaces for row[0]
    ("{:.2f}", 4),  # the same for row[1]
    ("{}", 0)       # and row[2]
]
for row in x:
    # each entry of formats is a (format, spaces) pair, paired with one field of the row
    lst = [f.format(d) + (" " * s) for (f, s), d in zip(formats, row)]
    print(*lst, sep='')
It just provides a better overview over each format.
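The same idea can be pushed a little further by building one template string from the per-column specs and reusing it for every row. This is only a sketch: the column specs and the padding values are made-up examples, and the rows are plain tuples standing in for the structured array.

```python
# Plain tuples standing in for the rows of the structured array.
rows = [(1, 2.0, 'Hello'), (2, 3.0, 'World')]

# (format spec, number of trailing spaces) per column; hypothetical values.
columns = [('{:d}', 1), ('{:.2f}', 4), ('{}', 0)]

# Build the template once, e.g. '{:d} {:.2f}    {}', then apply it to each row.
template = ''.join(fmt + ' ' * pad for fmt, pad in columns)
lines = [template.format(*row) for row in rows]

print('\n'.join(lines))
```

Changing the delimiter for one column then means editing a single entry in columns rather than the format string itself.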

Related

How to modify list of lists using str.format() so floats are 3 decimals and other data types remain the same

I have a list of lists that is very big, and looks like this:
list_of_lists = [[0,'pan', 17.892, 4.6555], [4, 'dogs', 19.2324, 1.4564], ...]
I need to modify it using the str.format() so the floats go to 3 decimal places and the rest of the data stays in its correct format. I also need to add a tab between each list entry so it looks organized and somewhat like this:
0 'pan' 17.892 4.655
4 'dogs' 19.232 1.456
...
And so on.
My problem is that I keep getting an error in my for loop and I don't know how to fix it.
for x in list_of_lists:
    print ("{:.2f}".format(x))
TypeError: unsupported format string passed to list.__format__

In your loop you are iterating through a nested list. This means that x is also a list itself, and not a valid argument to the format() function.
If the number of elements in the inner lists are small and it makes sense in the context of the problem, you can simply list all of them as arguments:
list_of_lists = [[0,'pan', 17.892, 4.6555], [4, 'dogs', 19.2324, 1.4564]]
for x in list_of_lists:
    print ("{:d}\t{:s}\t{:.3f}\t{:.3f}".format(x[0], x[1], x[2], x[3]))
These are now tab delimited, and the floats have three decimal places.
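If the inner lists are long or their layout varies, one possible alternative is to pick the format by type instead of by position. The sketch below assumes that "float" is the only type needing special treatment; format_cell is a hypothetical helper name, not something from str.format itself.

```python
def format_cell(value):
    """Format floats with 3 decimals; leave every other type as str(value)."""
    if isinstance(value, float):
        return '{:.3f}'.format(value)
    return str(value)

list_of_lists = [[0, 'pan', 17.892, 4.6555], [4, 'dogs', 19.2324, 1.4564]]

# Format each cell, then join the cells of a row with tabs.
lines = ['\t'.join(format_cell(v) for v in row) for row in list_of_lists]
print('\n'.join(lines))
```

This way the loop does not need to know how many columns there are or where the floats sit.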

for x in list_of_lists:
    print ("{:.2f}".format(x))
This loops only over the top-level list, not over the elements inside each inner list, so x is itself a list and you get the error.
Try addressing the elements individually:
# build manually, join with tab character and print in the loop
for i in list_of_lists:
    result = []
    result.append( f'{i[0]}' )
    result.append( f'{i[1]}' )
    result.append( f'{i[2]:.3f}' )
    result.append( f'{i[3]:.3f}' )
    print( '\t'.join(result) )
# build in one line and print
for i in list_of_lists:
    print( f'{i[0]}\t\'{i[1]}\'\t{i[2]:.3f}\t{i[3]:.3f}' )
Or as a list comprehension:
# build whole line from list comprehension, join on new-line char
result = [f'{i[0]}\t\'{i[1]}\'\t{i[2]:.3f}\t{i[3]:.3f}' for i in list_of_lists]
result = '\n'.join(result)
print(result)
# all in one line
print( '\n'.join([f'{i[0]}\t\'{i[1]}\'\t{i[2]:.3f}\t{i[3]:.3f}' for i in list_of_lists]))

Sliding window over a string using python

I am working on a dataset as a part of my course practice and am stuck in a particular step. I have tried that using R, but I wish to do the same in python. I am comparatively new to python and so require help.
The data set consists of a column with name 'Seq' with seq(5000+) records. I have another column of name 'MainSeq' that contains the substring seq values in it. I need to check the presence of seq on MainSeq based on the start position given and then print 7 letters before and after each letter of the seq. i.e.
I have a value in col 'MainSeq' as 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.
Col 'Seq' contains value JKLMNO
Start Position of J= 10 and O= 15
I need to create a new column such that it takes 7 letters before and after the start letter from J till O i.e. having a total length of 15
CDEFGHI**J**KLMNOPQ
DEFGHIJ**K**LMNOPQR
EFGHIJK**L**MNOPQRS
FGHIJKL**M**NOPQRST
GHIJKLM**N**OPQRSTU
HIJKLMN**O**PQRSTUV
I know to apply the logic on a specific seq. But since I have around 5000+ seq records, I need to figure out a way to apply the same on all the seq records.
seq = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
i = seq.index('J')
j = seq.index('O')
value = 7
for mid in range(i, 1+j):
    print(seq[mid-value:mid+value+1])

I'm not sure this will do exactly what you want, you've not really supplied a lot of data to test with, but it might work or at least give you a start.
import pandas as pd
df = pd.DataFrame({'MainSeq':['ABCDEFGHIJKLMNOPQRSTUVWXYZ','ABCDEFGHIJKLMNOPQRSTUVWXYZ'], 'Seq':'JKLMNO'})
def get_sequences(seq, letters, value):
    # one window per letter: `value` characters before and after its first occurrence
    sequences = [seq[seq.index(letter)-value:seq.index(letter)+value+1] for letter in letters]
    return sequences
df['new_seq'] = df.apply(lambda row : get_sequences(row['MainSeq'], row['Seq'], 7), axis = 1)
df = df.explode('new_seq')
print(df)
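One thing to watch with the slicing above: if a letter sits fewer than 7 positions from the start, the left index goes negative and the slice wraps around or comes back empty. A plain-Python sketch that clamps the left edge is shown here; flanking_windows is a hypothetical name, and it assumes Seq occurs as one contiguous substring of MainSeq.

```python
def flanking_windows(main_seq, seq, flank=7):
    """Return one window per letter of seq: up to `flank` letters on each side."""
    start = main_seq.find(seq)  # position of the first letter of seq, or -1
    if start == -1:
        return []
    windows = []
    for mid in range(start, start + len(seq)):
        left = max(0, mid - flank)  # clamp so a negative index does not wrap around
        windows.append(main_seq[left:mid + flank + 1])
    return windows

for window in flanking_windows('ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'JKLMNO'):
    print(window)
```

A function like this could be passed to df.apply in the same way as get_sequences, with row['MainSeq'] and row['Seq'] as arguments.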

Is this the valid "if" expression for not printing the names of less than four characters [duplicate]

I like to filter out data whose string length is not equal to 10.
If I try to filter out any row whose column A's or B's string length is not equal to 10, I tried this.
df=pd.read_csv('filex.csv')
df.A=df.A.apply(lambda x: x if len(x)== 10 else np.nan)
df.B=df.B.apply(lambda x: x if len(x)== 10 else np.nan)
df=df.dropna(subset=['A','B'], how='any')
This is slow, but it works.
However, it sometimes produces an error when the data in A is not a string but a number (interpreted as a number when read_csv reads the input file):
File "<stdin>", line 1, in <lambda>
TypeError: object of type 'float' has no len()
I believe there should be more efficient and elegant code instead of this.
Based on the answers and comments below, the simplest solutions I found are:
df=df[df.A.apply(lambda x: len(str(x))==10)]
df=df[df.B.apply(lambda x: len(str(x))==10)]
or
df=df[(df.A.apply(lambda x: len(str(x))==10)) & (df.B.apply(lambda x: len(str(x))==10))]
or
df=df[(df.A.astype(str).str.len()==10) & (df.B.astype(str).str.len()==10)]

import pandas as pd
df = pd.read_csv('filex.csv')
df['A'] = df['A'].astype('str')
df['B'] = df['B'].astype('str')
mask = (df['A'].str.len() == 10) & (df['B'].str.len() == 10)
df = df.loc[mask]
print(df)
Applied to filex.csv:
A,B
123,abc
1234,abcd
1234567890,abcdefghij
the code above prints
A B
2 1234567890 abcdefghij

A more Pythonic way of filtering out rows based on given conditions of other columns and their values:
Assuming a df of:
data = {
"names": ["Alice", "Zac", "Anna", "O"],
"cars": ["Civic", "BMW", "Mitsubishi", "Benz"],
"age": ["1", "4", "2", "0"],
}
df=pd.DataFrame(data)
df:
age cars names
0 1 Civic Alice
1 4 BMW Zac
2 2 Mitsubishi Anna
3 0 Benz O
Then:
df[
df["names"].apply(lambda x: len(x) > 1)
& df["cars"].apply(lambda x: "i" in x)
& df["age"].apply(lambda x: int(x) < 2)
]
We will have :
age cars names
0 1 Civic Alice
In the conditions above we are looking first at the length of strings, then we check whether a letter "i" exists in the strings or not, finally, we check for the value of integers in the first column.

I personally found this way to be the easiest:
df = df[df['column_name'].str.len() == 10]

You can also use query:
df.query('A.str.len() == 10 & B.str.len() == 10')

If you have numbers in the rows, read_csv will convert them to numeric types (floats, for example).
Convert all the values to strings after importing from the CSV. For better performance, split the lambdas across multiple threads.
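Another option, sketched here under the assumption that every column may safely be treated as text, is to stop read_csv from guessing numeric types in the first place by passing dtype=str. The CSV content is inlined with io.StringIO only to keep the example self-contained; with a real file you would pass the filename instead.

```python
import io

import pandas as pd

# Inline stand-in for filex.csv so the example runs on its own.
csv_text = "A,B\n123,abc\n1234,abcd\n1234567890,abcdefghij\n"

# dtype=str keeps every value as text, so len() never sees a float.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Vectorised length check on both columns; missing values compare as False.
mask = (df['A'].str.len() == 10) & (df['B'].str.len() == 10)
filtered = df[mask]
print(filtered)
```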

You can also use apply(len) on a single column, for example df['A'].astype(str).apply(len), which gives you the length of each value.

For string operations such as this, vanilla Python using built-in methods (without lambda) is much faster than apply() or str.len().
Building a boolean mask by mapping len to each string inside a list comprehension is approx. 40-70% faster than apply() and str.len() respectively.
For multiple columns, zip() lets you evaluate values from different columns together, row by row.
col_A_len = map(len, df['A'].astype(str))
col_B_len = map(len, df['B'].astype(str))
m = [a==3 and b==3 for a,b in zip(col_A_len, col_B_len)]
df1 = df[m]
For a single column, drop zip() and loop over the column and check if the length is equal to 3:
df2 = df[[a==3 for a in map(len, df['A'].astype(str))]]
This code can be written a little more concisely using the Series.map() method (but it is a little slower than the list comprehension due to pandas overhead):
df2 = df[df['A'].astype(str).map(len)==3]

To filter out values whose length is not 10 in columns A and B, pass a lambda expression to map(). Here map() is a Series method, so it is applied to one column at a time.
df = df[df['A'].map(lambda x: len(str(x)) == 10)]
df = df[df['B'].map(lambda x: len(str(x)) == 10)]

You could use applymap to filter all columns you want at once, followed by the .all() method to filter only the rows where both columns are True.
#The *mask* variable is a dataframe of booleans, giving you True or False for the selected condition
mask = df[['A','B']].applymap(lambda x: len(str(x)) == 10)
#Here you can just use the mask to filter your rows, using the method *.all()* to filter only rows that are all True, but you could also use the *.any()* method for other needs
df = df[mask.all(axis=1)]

What's x for x in input()?

I am new to coding and am trying to solve this Python question.
Question:
Write a program that calculates and prints the value according to the given formula:
Q = Square root of [(2 * C * D)/H]
Following are the fixed values of C and H:
C is 50. H is 30.
D is the variable whose values should be input to your program in a comma-separated sequence.
Example
Let us assume the following comma separated input sequence is given to the program:
100,150,180
The output of the program should be:
18,22,24
Hints:
If the output received is in decimal form, it should be rounded off to its nearest value (for example, if the output received is 26.0, it should be printed as 26)
In case of input data being supplied to the question, it should be assumed to be a console input.
This is the solution given. I have not seen the 'x for x in input()' expression before. What does this expression do?
import math
c=50
h=30
value = []
items=[x for x in input().split(',')]
for d in items:
    value.append(str(int(round(math.sqrt(2*c*float(d)/h)))))
print (','.join(value))
This is my own solution, but somehow I got a syntax error.
def sr(D):
    For item in D:
        return ((2*50*D)/30)**0.5
try:
    a=int(input())
    j=a.split(",")
    print(sr(j))
except:
    print('Please enter an integers or intergers seperated by comma')

The x is just a loop variable: it is assigned, one at a time, each piece of the string returned by input() after that string has been split on commas.
If you're aware of C style language (or Java), it's similar to
for(int i=0;<some_condition>;<some_operation>){}
This is just a condensed, pythonic and easy to read way to do this.
You can read more about Python loops here:
https://wiki.python.org/moin/ForLoop
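To make that concrete, here is a small sketch that spells the comprehension out as an ordinary loop. A fixed string (raw) stands in for input() so it can be run without typing anything; the variable names are only illustrative.

```python
import math

raw = "100,150,180"  # stands in for what input() would return

# The comprehension from the given solution...
items = [x for x in raw.split(',')]

# ...does the same as this explicit loop.
items_loop = []
for x in raw.split(','):
    items_loop.append(x)

# Apply Q = sqrt(2 * C * D / H) to every D and round to the nearest integer.
c, h = 50, 30
values = [str(round(math.sqrt(2 * c * float(d) / h))) for d in items]
print(','.join(values))  # 18,22,24
```

Since split() already returns a list, the comprehension in the given solution does not change anything; items = input().split(',') would behave the same.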

Arrange the string in every possible correct alphabetical sequence of three characters

Write a Python program to arrange the string in every possible
correct alphabetical sequence of three characters,
for example:
INPUT : "ahdgbice"
OUTPUT: {'abc', 'bcd', 'ghi', 'cde'}
Can anyone suggest an optimised method to do that? I have tried and was successful in generating the output, but I am not satisfied with my code, so please suggest a proper optimised way to solve this problem.

This is probably a decent result:
>>> import itertools as it
>>> in_s="ahdgbice"
>>> in_test=''.join([chr(e) for e in range(ord(min(in_s)),ord(max(in_s))+1)])
>>> {s for s in map(lambda e: ''.join(e), (it.combinations(sorted(in_s),3))) if s in in_test}
{'abc', 'ghi', 'bcd', 'cde'}
How it works:
Generate a string that goes abc..ghi in this case to test if the substrings are in alphabetical order: in_test=''.join([chr(e) for e in range(ord(min(in_s)),ord(max(in_s))+1)])
Generate every combination of 3 letter substrings from a sorted in_s with map(lambda e: ''.join(e), (it.combinations(sorted(in_s),3)))
Test if the substring is sorted by testing if it is a substring of abcd..[max letter of in_s]
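If generating every 3-letter combination feels heavy, a different technique (not the one used above) is to put the letters in a set and, for each letter, check whether the next two letters of the alphabet are also present. A rough sketch, with consecutive_triples as a made-up name:

```python
def consecutive_triples(text):
    """Return every run of three consecutive letters that can be formed from text."""
    letters = set(text)
    found = set()
    for ch in letters:
        second = chr(ord(ch) + 1)  # the letter right after ch
        third = chr(ord(ch) + 2)   # and the one after that
        if second in letters and third in letters:
            found.add(ch + second + third)
    return found

print(consecutive_triples("ahdgbice"))
```

This does one set lookup pair per distinct letter instead of testing all combinations, so it should scale better for long inputs.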

Solution: it's not an optimised solution, but it fulfils the requirement.
# for using array import numpy lib
import numpy as np
#input string
str_1="ahdgbice"
#breaking the string into characters by puting it into a list.
list_1=list(str_1)
# for sorting we copy that list value in an array
arr_1=np.array(list_1)
arr_2=np.sort(arr_1)
# some temp variables
previous=0
str_2=""
list_2=list()
#logic and loops starts here : looping outer loop from 0 to length of sorted array
for outer in range(0,len(arr_2)):
    #looping inner loop from outer index value to length of sorted array
    for inner in range(outer,len(arr_2)):
        value=str(arr_2[inner]) # plain str, so the final set prints as ordinary strings
        #ord() return an ascii value of characters
        if(previous == 0): # compare integers with ==, not "is"
            previous=ord(value)
        #difference between two consecutive sequence is always 1 or -1
        # e.g ascii of a= 97, b=98 ,So a-b=-1 or b-a=1 and used abs() to return absolute value
        if(abs(previous-ord(value)) == 1):
            str_2=str_2+value # appending character with previous str_2 values
            previous=ord(value) # storing current character's ascii value to previous
        else:
            str_2=value # assigning character value to str_2
            previous=ord(value) # storing current character's ascii value to previous
        # for making a string of three characters
        if(len(str_2) == 3):
            list_2.append(str_2)
# Logic and loops ends here
# put into the set to remove duplicate values
set_1=set(list_2)
#printing final output
print(set_1)
Output:
{'abc', 'bcd', 'ghi', 'cde'}

I would use the itertools module's permutations function to get a list of all three-element permutations of your input, and then for each result see if it is identical to a sorted version of itself.
