Getting all str type elements in a pd.DataFrame - python-3.x

From my limited knowledge of pandas, pandas.Series.str.contains can search for a specific string in a pd.Series. But what if the DataFrame is large and I just want to glance at all the distinct string elements in it before doing anything else?
For example:
pd.DataFrame({'x1':[1,2,3,'+'],'x2':[2,'a','c','this is']})
x1 x2
0 1 2
1 2 a
2 3 c
3 + this is
I need a function to return ['+','a','c','this is']

If you are looking strictly for values that are strings, and performance is not a concern, this is a very simple answer:
df.where(df.applymap(type).eq(str)).stack().tolist()
['a', 'c', '+', 'this is']
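If you want to avoid the applymap call (deprecated since pandas 2.1 in favour of DataFrame.map) and also drop duplicates, a plain-Python pass over the rows is enough. The sketch below assumes you give it something that yields rows, such as df.to_numpy(); unique_strings is a hypothetical helper name, and it keeps the first occurrence of each string in row-major order.

```python
def unique_strings(rows):
    """Return each str value once, in row-major order of first appearance."""
    seen = {}
    for row in rows:
        for value in row:
            if isinstance(value, str):
                # dict keys keep insertion order, so this also de-duplicates
                seen.setdefault(value, None)
    return list(seen)


# Plain lists mirroring the question's DataFrame, one inner list per row
rows = [[1, 2], [2, 'a'], [3, 'c'], ['+', 'this is']]
print(unique_strings(rows))  # ['a', 'c', '+', 'this is']
```

With pandas, unique_strings(df.to_numpy()) should give the same list for the question's DataFrame, because iterating over a 2-D array yields its rows.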

There are two possible approaches, depending on whether numeric values stored as strings should be treated as strings or not.
To see the difference:
import numpy as np
import pandas as pd

df = pd.DataFrame({'x1':[1,'2.78','3','+'],'x2':[2.8,'a','c','this is'], 'x3':[1,4,5,4]})
print (df)
x1 x2 x3
0 1 2.8 1
1 2.78 a 4 <- 2.78 is a float stored as a string
2 3 c 5 <- 3 is an int stored as a string
3 + this is 4
#flatten all values
ar = df.values.ravel()
# errors='coerce' makes pd.to_numeric return NaN for non-numeric values
L = np.unique(ar[np.isnan(pd.to_numeric(ar, errors='coerce'))]).tolist()
print (L)
['+', 'a', 'c', 'this is']
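Another solution is to use a custom function that checks whether a value can be converted to float:
def is_not_float_try(value):
    # True if value cannot be converted to float (avoids shadowing the built-in str)
    try:
        float(value)
        return False
    except (ValueError, TypeError):
        return True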
s = df.stack()
L = s[s.apply(is_not_float_try)].unique().tolist()
print (L)
['a', 'c', '+', 'this is']
If you need all values stored as strings, use isinstance:
s = df.stack()
L = s[s.apply(lambda x: isinstance(x, str))].unique().tolist()
print (L)
['2.78', 'a', '3', 'c', '+', 'this is']

You can use str.isdigit with unstack:
df[df.apply(lambda x : x.str.isdigit()).eq(0)].unstack().dropna().tolist()
Out[242]: ['+', 'a', 'c', 'this is']

Using regular expressions and set union, you could try something like:
>>> set.union(*[set(df[c][df[c].str.contains(r'[^\d]', na=False)].unique()) for c in df.columns])
{'+', 'a', 'c', 'this is'}
If you use a regular expression that matches numbers in general (including decimals), you could exclude floating-point numbers stored as strings as well.
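A minimal sketch of that idea in plain Python: NUMBER_RE is an assumed pattern covering an optional sign, integer or decimal digits, and an optional exponent (it does not treat strings such as 'nan', 'inf' or '1,000' as numbers). non_numeric_strings is a hypothetical helper that keeps only str values the pattern does not fully match.

```python
import re

# Assumed definition of "a number": optional sign, digits with an optional
# decimal part (or a leading-dot decimal), and an optional exponent.
NUMBER_RE = re.compile(r'[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?')


def non_numeric_strings(values):
    """Return str values that do not look like numbers, without duplicates, in order."""
    result = []
    for value in values:
        if isinstance(value, str) and not NUMBER_RE.fullmatch(value.strip()):
            if value not in result:
                result.append(value)
    return result


values = [1, '2.78', '3', '+', 2.8, 'a', 'c', 'this is']
print(non_numeric_strings(values))  # ['+', 'a', 'c', 'this is']
```

For a DataFrame you could pass df.to_numpy().ravel(), the same kind of flattened array used in the pd.to_numeric answer.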

Related

Karate - To find the occurrences of each element in a list and print the number of times it's present in the list

In my case:
list A = [a,a,a,b,b,c]
I have to find the occurrences of the elements in the list and print their counts,
for example as a=3, b=2 and c=1.
Just use JavaScript. The filter() operation is perfect for this:
* def data = ['a', 'c', 'b', 'c', 'c', 'd']
* def count = data.filter(x => x == 'c').length
* assert count == 3
Further reading: https://github.com/karatelabs/karate#json-transforms

Converting string into list of every two numbers in string

Given a string such as "1 2 3 4", the program should return [[1,2],[3,4]] (in Python).
I want the string converted into a list of consecutive pairs of elements.
You could go for something very simple such as:
s = "10 2 3 4 5 6 7 8"
l = []
i = 0
list_split_str = s.split() # splitting the string according to spaces
while i < len(s) - 1:
l.append([s[i], s[i + 1]])
i += 2
This should output:
[['10', '2'], ['3', '4'], ['5', '6'], ['7', '8']]
You could also do something a little more complex like this in a two-liner:
list_split = s.split()  # split the string on whitespace into a list of tokens
l = [[a, b] for a, b in zip(list_split[0::2], list_split[1::2])]
The first slice starts at index 0 with a step of 2, so it equals ['10', '3', '5', '7']. The second starts at index 1 with a step of 2, so it equals ['2', '4', '6', '8']. We iterate over the first list for the values of a and over the second for the values of b.
zip returns an iterator of tuples that pairs up the elements of the two lists, in this case ('10', '2'), ('3', '4'), ('5', '6'), and so on. This lets us group the elements two by two and iterate over them as pairs.
This also works on lists of odd length, because zip stops at the end of the shorter input.
For example, with s = "10 2 3 4 5 6 7 ", the code above would output:
[['10', '2'], ['3', '4'], ['5', '6']]
The 7 is dropped because it has no partner.
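The question asks for integers ([[1,2],[3,4]]), so here is a variant of the zip idea that converts each token with int() and pairs them by passing the same iterator to zip twice. It is a sketch: like the slicing version, it silently drops a trailing value that has no partner, and int() raises ValueError on non-integer tokens. pairs_of_ints is a hypothetical name.

```python
def pairs_of_ints(text):
    """Split text on whitespace and group the integers into consecutive pairs."""
    numbers = (int(token) for token in text.split())
    # zip pulls from the same generator twice per step,
    # so it yields (n0, n1), (n2, n3), ... and stops when the values run out
    return [list(pair) for pair in zip(numbers, numbers)]


print(pairs_of_ints("1 2 3 4"))  # [[1, 2], [3, 4]]
```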
Here is a solution for when the number of values is divisible by 2:
def every_two_number(number_string):
    num = number_string.split()  # split on any whitespace
    templist = []
    if len(num) % 2 == 0:
        for i in range(0, len(num), 2):
            templist.append([int(num[i]), int(num[i + 1])])
    return templist
print(every_two_number('1 2 3 4'))
You can remove the if condition and wrap the loop in try/except if you want the string to still be converted when the number of values is not divisible by 2:
def every_two_number(number_string):
    num = number_string.split()
    templist = []
    try:
        for i in range(0, len(num), 2):
            templist.append([int(num[i]), int(num[i + 1])])
    except IndexError:
        # the last value has no partner, so stop here
        pass
    return templist
print(every_two_number('1 2 3 4 5'))

Convert a string within a list to an element in the list in python

I am using python data to create a ReportLab report. I have a list that looks like this:
mylist = [['a b c d e f'],['g h i j k l']]
and want to convert it to look like this:
mylist2 = [[a,b,c,d,e],[g,h,i,j,k,l]]
The first list gives me a "List out of index" error when building the report.
The second list works in ReportLab, but the columns and formatting in this list aren't what I want.
What is the best method to convert mylist to mylist2 in Python?
Converting a string to a list can be done with the split() method.
Try mylist[0][0].split() and mylist[1][0].split().
Borrowing the idea from Jibin Mathews, I tried the following:
new_list = [mylist[0][0].split(), mylist[1][0].split()]
and it prints
[['a', 'b', 'c', 'd', 'e', 'f'], ['g', 'h', 'i', 'j', 'k', 'l']]
I noticed 'f' is missing from your final list. Is that a mistake?
import re

mylist = [['a b c d e f'], ['g h i j k l']]
space_re = re.compile(r'\s+')
output = []
for l in mylist:
    element = l[0]
    le = space_re.split(element)
    output.append(le)
This is not the best answer, but it works.
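Rather than hard-coding mylist[0][0] and mylist[1][0], a list comprehension can split every row, however many there are. This sketch assumes each inner list holds one or more strings whose words should all end up in the same row; split_rows is a hypothetical name.

```python
def split_rows(rows):
    """Split every string in each inner list on whitespace, keeping one list of words per row."""
    return [[word for text in row for word in text.split()] for row in rows]


mylist = [['a b c d e f'], ['g h i j k l']]
mylist2 = split_rows(mylist)
print(mylist2)  # [['a', 'b', 'c', 'd', 'e', 'f'], ['g', 'h', 'i', 'j', 'k', 'l']]
```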

Python: Symmetrical Difference Between List of Sets of Strings

I have a list that contains multiple sets of strings, and I would like to find the symmetric difference between each set and the other sets in the list.
For example, I have the following list:
targets = [{'B', 'C', 'A'}, {'E', 'C', 'D'}, {'F', 'E', 'D'}]
For the above, desired output is:
[2, 0, 1]
because in the first set, A and B are not found in any of the other sets; the second set has no elements unique to it; and in the third set, F is not found in any of the other sets.
I thought about approaching this backwards, by finding the intersection of each set and subtracting the length of the intersection from the length of the list, but set.intersection(*targets) does not appear to work on strings, so I'm stuck:
set1 = {'A', 'B', 'C'}
set2 = {'C', 'D', 'E'}
set3 = {'D', 'E', 'F'}
targets = [set1, set2, set3]
>>> set.intersection(*targets)
set()
The issue you're having is that there are no strings shared by all three sets, so your intersection comes up empty. That's not a string issue, it would work the same with numbers or anything else you can put in a set.
The only way I see to do a global calculation over all the sets and then use it to find the number of unique values in each one is to first count all the values (using collections.Counter), then, for each set, count how many of its values appear only once in the global count.
from collections import Counter

def unique_count(sets):
    count = Counter()
    for s in sets:
        count.update(s)
    return [sum(count[x] == 1 for x in s) for s in sets]
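The same Counter idea can be written more compactly by feeding every set's elements to Counter at once with itertools.chain.from_iterable. This is a sketch of an equivalent formulation, not a different algorithm; the second call checks the case where one element appears in every set.

```python
from collections import Counter
from itertools import chain


def unique_count(sets):
    """For each set, count the elements that appear in no other set."""
    # Each set contributes an element at most once, so the counter holds
    # the number of sets that contain each element.
    count = Counter(chain.from_iterable(sets))
    return [sum(count[x] == 1 for x in s) for s in sets]


targets = [{'B', 'C', 'A'}, {'E', 'C', 'D'}, {'F', 'E', 'D'}]
print(unique_count(targets))  # [2, 0, 1]

# 'X' is in every set, so only 'A' and 'B' are unique
print(unique_count([{'X', 'A'}, {'X'}, {'X', 'B'}]))  # [1, 0, 1]
```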
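Try something like the following: take the symmetric difference with every other set, then intersect the result with the original set.
Note that chained symmetric differences keep elements that occur an odd number of times overall, so an element shared by three sets would wrongly be counted as unique; no element does that in this example.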
def symVal(index, targets):
    bseSet = targets[index]
    symSet = bseSet
    for j in range(len(targets)):
        if index != j:
            symSet = symSet ^ targets[j]
    print(len(symSet & bseSet))

for i in range(len(targets)):
    symVal(i, targets)
Your code example doesn't work because it finds the intersection of all the sets, which is empty (no element occurs in every set). You want the difference between each set and the union of all the other sets. For example:
set1 = {'A', 'B', 'C'}
set2 = {'C', 'D', 'E'}
set3 = {'D', 'E', 'F'}
targets = [set1, set2, set3]
result = []
for set_element in targets:
    result.append(len(set_element.difference(set.union(*[x for x in targets if x is not set_element]))))
print(result)
(note that [x for x in targets if x is not set_element] is just the list of all the other sets)

Mathematical operation on a dictionary list (Python 3.6)

I am using Python 3.6 and I have a list of dictionaries like this:
list = [{'name': 'A', 'number':'1'}, {'name': 'B', 'number':'2'}, {'name': 'C', 'number':'3'}, {'name': 'D', 'number':'4'}]
I found out how to print the list in the desired format with:
for s in list:
name = s['name']
number = s['number']
print(name + " = "+ number)
Which gives:
A = 1
B = 2
C = 3
D = 4
I would like to be able to multiply the items 'number' by 2 for example and display:
A = 2
B = 4
C = 6
D = 8
Thank you!
If you only want to multiply the values temporarily and print them, change your last line to:
print(name + " = " + str(int(number) * 2))
If instead you want to multiply the values stored in the dictionaries themselves, you could do it like this:
for s in list:
    name = s['name']
    s['number'] = str(int(s['number']) * 2)  # multiply value by 2
    number = s['number']
    print(name + " = " + number)
Note that the problem arises because your dictionary values are stored as strings rather than integers, so to do arithmetic on them you must convert them to integers (and back to strings if you want to concatenate them with other strings).
You can multiply a number with the * operator: 2 * 2 evaluates to 4.
Because your values are stored as strings, you'll need to convert them to integers first: int('2') * 2 == 4.
Then, to concatenate an integer with a string, you need to convert it back to a string.
Change the last line to:
print(name + " = "+ str(int(number)*2))
You can always iterate over your list to modify the values nested inside it, for example:
your_list = [{'name': 'A', 'number': '1'},
             {'name': 'B', 'number': '2'},
             {'name': 'C', 'number': '3'},
             {'name': 'D', 'number': '4'}]
for item in your_list:  # iterate over each dictionary in your_list
    # since your dict contains strings we have to convert the value into a number/integer
    # before multiplying it by 2
    item["number"] = int(item["number"]) * 2  # turn it back to a string with str() if needed
# now, let's demonstrate that the data changed in `your_list`:
for item in your_list:  # iterate over each dictionary in your_list
    print("{} = {}".format(item["name"], item["number"]))  # print in your desired format
# A = 2
# B = 4
# C = 6
# D = 8
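If you would rather keep the original list unchanged, you can build new dictionaries with the multiplied value instead of mutating them in place. This sketch renames the list to records (an assumed name) so it does not shadow the built-in list, and doubled is a hypothetical helper whose factor defaults to 2 as in the question.

```python
records = [{'name': 'A', 'number': '1'}, {'name': 'B', 'number': '2'},
           {'name': 'C', 'number': '3'}, {'name': 'D', 'number': '4'}]


def doubled(records, factor=2):
    """Return new dicts with 'number' converted to int and multiplied; the input is not modified."""
    return [{**r, 'number': int(r['number']) * factor} for r in records]


for r in doubled(records):
    print(f"{r['name']} = {r['number']}")  # A = 2, B = 4, C = 6, D = 8 (one per line)
```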
