Python Pandas concatenate a Series of strings into one string

In python pandas, there is a Series/dataframe column of str values to combine into one long string:
df = pd.DataFrame({'text' : pd.Series(['Hello', 'world', '!'], index=['a', 'b', 'c'])})
Goal: 'Hello world !'
So far, methods such as df['text'].apply(lambda x: ' '.join(x)) only return a Series.
What is the best way to get the concatenated string?

You can join a string on the series directly:
In [3]:
' '.join(df['text'])
Out[3]:
'Hello world !'

Apart from join, you could also use the pandas string method .str.cat:
In [171]: df.text.str.cat(sep=' ')
Out[171]: 'Hello world !'
However, plain ' '.join() is generally faster for a simple Series of strings.
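One practical difference between the two is how they treat missing values. As far as I know, .str.cat leaves out missing values when it is called on a single Series, while str.join raises a TypeError on any item that is not a str. Here is a small sketch; the None entry is an assumption added for illustration and is not part of the question's data:

```python
import pandas as pd

# The question's values plus one missing entry (added for illustration only).
s = pd.Series(['Hello', None, 'world', '!'])

# With a single Series and the default na_rep, str.cat omits missing values.
joined_cat = s.str.cat(sep=' ')

# str.join needs every item to be a str, so drop the missing values first.
joined_join = ' '.join(s.dropna())

print(joined_cat)
print(joined_join)
```

Both lines should print the same sentence, so the choice mostly comes down to speed and whether the column can contain missing values.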

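Your code returns a Series because Series.apply calls the function on each element separately, so nothing is ever joined across rows. Apply the function to the DataFrame along axis=0 instead: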
df.apply(' '.join, axis=0)
text Hello world !
dtype: object
Specifying axis=0 passes each column to the function, which combines all of that column's values into a single string. The return type is a Series whose index labels are the column names and whose values are the corresponding joined strings. This is particularly useful if you want to combine more than one column at a time, each into its own string.
I generally find it confusing to work out which axis apply needs, so if it doesn't behave the way you expect, try applying along the other axis too.
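To see the difference between the two calls side by side, here is a minimal sketch using the DataFrame from the question. The variable names per_element and per_column are made up for this example:

```python
import pandas as pd

df = pd.DataFrame({'text': pd.Series(['Hello', 'world', '!'], index=['a', 'b', 'c'])})

# Series.apply runs the function once per element, so each string is
# joined with itself, character by character.
per_element = df['text'].apply(lambda x: ' '.join(x))

# DataFrame.apply with axis=0 passes each whole column to the function.
per_column = df.apply(' '.join, axis=0)

print(per_element.tolist())
print(per_column['text'])
```

The first result still has three entries, one per row, while the second has one entry per column.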

Related

Getting an error on input while writing a program to combine two sequences and arrange them alphabetically

def all_people(people_1, people_2):
    # update logic here
    people_1.split(',')
    people_2.split(',')
    people_1.extend(people_2)
    print(people_1.sort())
people_1 = input()
people_2 = input()
all_people(people_1, people_2)
For the two given sequences, I am getting AttributeError: 'str' object has no attribute 'extend'.
The problem is: how do I write a program that combines two sequences and arranges them alphabetically?
This is because input() always returns a string, so people_1 is a str and not a list.
For example, if you type ['Person 1', 'Person 2'] at the prompt, then people_1 == "['Person 1', 'Person 2']" (a string). To call .extend() on people_1 you first have to turn it into a list.
Try this:
import ast
people_1 = ast.literal_eval(input())
people_2 = ast.literal_eval(input())
That way, input typed as a Python list literal is parsed into a real list, and you can call .extend on it.
str.split does not turn the original string into a list in place (apart from the type difference, strings are immutable); it returns a new list. To use that list you need to assign it to a variable. If you don't need the original string, you can just overwrite it [1].
list_1 = input('first list, separated by , -> ')
list_1 = list_1.split(',')
list_2 = input('second list, separated by , -> ').split(',')
list_1.extend(list_2)
print(list_1)
Will result in
first list, separated by , -> a,b,some value with spaces
second list, separated by , -> in,the,second,one
['a', 'b', 'some value with spaces', 'in', 'the', 'second', 'one']
[1] This applies to a simple case where you convert a data structure once. Don't go around overwriting variables with totally unrelated stuff all over the place.
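Putting the pieces together, here is a minimal sketch of the whole function. It assumes the two inputs are comma-separated names typed as plain text (not Python list literals), and it returns the result instead of printing it. Note that list.sort() sorts in place and returns None, so print(people_1.sort()) would print None even once the split is fixed.

```python
def all_people(people_1, people_2):
    """Combine two comma-separated strings into one alphabetically sorted list."""
    # split() returns a new list; strip() removes stray spaces around each name.
    names = [name.strip() for name in people_1.split(',')]
    names.extend(name.strip() for name in people_2.split(','))
    # sorted() returns a new sorted list (list.sort() would return None).
    return sorted(names)


# Example values are made up; in the original program they come from input().
print(all_people('Carol, Alice', 'Bob'))
```

The default sort is case-sensitive, so uppercase names come before lowercase ones; pass key=str.lower to sorted() if that matters for your data.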

Looping over python dictionary of dictionaries

I have a python dictionary like below
car_dict = {
    'benz': {'usa': 876456, 'uk': 965471},
    'audi': {'usa': 523487, 'uk': 456879},
    'bmw': {'usa': 754235, 'uk': 543298}
}
I need the output like below
benz,876456,965471
audi,523487,456879
bmw,754235,543298
and also in sorted form, like below
audi,523487,456879
benz,876456,965471
bmw,754235,543298
Please help me in getting both outputs
You could do this:
car_dict = {
    'benz': {'usa': 876456, 'uk': 965471},
    'audi': {'usa': 523487, 'uk': 456879},
    'bmw': {'usa': 754235, 'uk': 543298}
}
cars = []
for car in car_dict:
    # Build one 'name,usa,uk' line per car
    cars.append('{},{},{}'.format(
        car,
        car_dict[car]['usa'],
        car_dict[car]['uk']
    ))
cars = sorted(cars)
for car in cars:
    print(car)
Result
audi,523487,456879
benz,876456,965471
bmw,754235,543298
Explanation
Loop through each car and store a line with the model, USA number and UK number in a list. Sort the list alphabetically, then print each line.
To print the data
# Use a list comprehension to build rows of [car, usa, uk], sorted by car name
data = [[car] + list(regions.values()) for car, regions in sorted(car_dict.items(), key=lambda x: x[0])]
for row in data:
    print(*row, sep=',')
Output
audi,523487,456879
benz,876456,965471
bmw,754235,543298
Explanation
Sort items by car
for car, regions in sorted(car_dict.items(), key=lambda x:x[0])
Each inner list in the comprehension is one row of car, USA and UK values
[car] + list(regions.values())
Print each row comma delimited
for row in data:
    print(*row, sep=',')
Are you looking to print the output, or just to organize the data in a better format for analysis?
If the latter, I would use pandas and do the following:
import pandas as pd
pd.DataFrame(car_dict).transpose().sort_index()
To view the output in the terminal the way you requested,
for index, row in pd.DataFrame(car_dict).transpose().sort_index().iterrows():
    print('{},{},{}'.format(index, row['usa'], row['uk']))
will print this out:
audi,523487,456879
benz,876456,965471
bmw,754235,543298
Here is a solution:
car_dict = {
    'benz': {'usa': 876456, 'uk': 965471},
    'audi': {'usa': 523487, 'uk': 456879},
    'bmw': {'usa': 754235, 'uk': 543298}
}
keys = list(car_dict.keys())
keys.sort()
for i in keys:
    # sep=',' gives the comma-separated format asked for in the question
    print(i, car_dict[i]['usa'], car_dict[i]['uk'], sep=',')
I like the list comprehension answer given by @DarrylG; however, you do not really need the lambda expression in this case.
sorted() compares the (key, value) tuples element by element, and since dictionary keys are unique the order is decided by the key alone, so you can just use:
data = [[car] + list(regions.values()) for car, regions in sorted(car_dict.items())]
I would also make another slight change. If you wanted more explicit control over the region ordering (or wanted a different ordering), you could replace [car] + list(regions.values()) with [car, regions['usa'], regions['uk']], like this:
data = [[car, regions['usa'], regions['uk']] for car, regions in sorted(car_dict.items())]
Of course, that means that if you added more regions you would have to change this, but I prefer setting the order explicitly.
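The question asked for both the original order and the sorted order. Here is a minimal sketch that produces either one, assuming Python 3.7 or later, where dictionaries keep insertion order. The helper name format_rows and its sort_by_name flag are made up for this example:

```python
car_dict = {
    'benz': {'usa': 876456, 'uk': 965471},
    'audi': {'usa': 523487, 'uk': 456879},
    'bmw': {'usa': 754235, 'uk': 543298},
}


def format_rows(cars, sort_by_name=False):
    """Return one 'name,usa,uk' string per car (hypothetical helper)."""
    # Iterating a dict yields its keys in insertion order; sorted() orders them.
    names = sorted(cars) if sort_by_name else list(cars)
    return ['{},{},{}'.format(n, cars[n]['usa'], cars[n]['uk']) for n in names]


print('\n'.join(format_rows(car_dict)))                     # original order
print('\n'.join(format_rows(car_dict, sort_by_name=True)))  # sorted by name
```

Naming the regions explicitly keeps the column order fixed even if the inner dictionaries were built in a different order.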

replacing a special character in a pandas dataframe

I have a dataset that uses '?' instead of NaN for missing values. I could go through each column using replace, but the problem is that I have 22 columns. I am trying to write a loop to do it efficiently, but I am getting it wrong. Here is what I am doing:
for col in adult.columns:
    if adult[col]=='?':
        adult[col]=adult[col].str.replace('?', 'NaN')
The plan is to turn them into NaN and then either fill them with fillna or drop them with dropna. The second problem is that not all the columns are categorical, so the str accessor is also wrong. How can I easily deal with this situation?
If you're reading the data from a .csv or .xlsx file you can use the na_values parameter:
adult = pd.read_csv('path/to/file.csv', na_values=['?'])
Otherwise do what @MasonCaiby said and use adult = adult.replace('?', float('nan')). Note that replace returns a new DataFrame, so assign the result back.
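A minimal sketch of the replace approach on a small frame. The column names and values below are stand-ins made up for this example, not the real adult dataset:

```python
import pandas as pd

# Hypothetical stand-in for the adult data: one numeric and two text columns.
adult = pd.DataFrame({
    'age': [39, 50, 38],
    'workclass': ['Private', '?', 'State-gov'],
    'occupation': ['?', 'Sales', '?'],
})

# DataFrame.replace checks every column at once, whatever its dtype,
# and returns a new DataFrame.
adult = adult.replace('?', float('nan'))

# Count missing values per column to confirm the replacement.
print(adult.isna().sum())
```

After this, fillna and dropna treat the former '?' entries like any other missing value, and no loop over the 22 columns is needed.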

Indexing the list in python

record=['MAT', '90', '62', 'ENG', '92','88']
course='MAT'
Suppose I want to get the marks for MAT or ENG; what do I do? I only know how to find the index of the course, which is new[4:10].index(course). I don't know how to get the marks.
Try this:
i = record.index('MAT')
grades = record[i+1:i+3]
In this case i is the index/position of the 'MAT' or whichever course, and grades are the items in a slice comprising the two slots after the course name.
You could also put it in a function:
def get_grades(course):
    i = record.index(course)
    return record[i+1:i+3]
Then you can just pass in the course name and get back the grades.
>>> get_grades('ENG')
['92', '88']
>>> get_grades('MAT')
['90', '62']
>>>
Edit
If you want to get a string of the two grades together instead of a list with the individual values you can modify the function as follows:
def get_grades(course):
    i = record.index(course)
    return ' '.join("'{}'".format(g) for g in record[i+1:i+3])
You can use the list's index method (see https://stackoverflow.com/a/176921/) and then read the positions that follow it, but I think you should use a dictionary.
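A minimal sketch of the dictionary idea, assuming the list always holds a course name followed by exactly two marks. The name marks_by_course is made up for this example:

```python
record = ['MAT', '90', '62', 'ENG', '92', '88']

# Step through the list three items at a time: course name, then two marks.
marks_by_course = {
    record[i]: record[i + 1:i + 3]
    for i in range(0, len(record), 3)
}

print(marks_by_course['MAT'])  # ['90', '62']
print(marks_by_course['ENG'])  # ['92', '88']
```

Once the dictionary is built, each lookup is a direct key access rather than a search through the list.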

Mask elements of a list that are words, and convert the rest to integers in python

Let's say I have a list of lists, where one of the lists within this big list looks like this:
['blabblah1234', '2013-08-23T22:52:08.060', '56527.9529', '56527.9544', '109.7147', '0.0089', '14.3638', '0.0779', '14.3136', '0.0775', '14.3305', '0.1049', '14.3628', '0.0837', '14.3628', '0.0837', '70.9990', '40.0050', '173.046', '-30.328', '73', '-99.175', '0.000', '0.000', '59.8', '0.0', '1.0']
My question is, how can I mask the first two terms (blablah1234 and 2013-08-23whatever) and then work only with the other elements by converting them into integers (that is, getting rid of the quotes, so that I can do computations with those numbers later)?
Thanks!!!
You could try to convert any strings you can to float. Anything that can't be converted will just be skipped.
new_list = []
for elem in your_list:
    try:
        val = float(elem)
        new_list.append(val)
    except ValueError:
        # Not a number (for example the name or the timestamp), so skip it
        pass
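If the first two items of every inner list are always the name and the timestamp, another option is to slice them off instead of relying on the failed conversion. This is a minimal sketch under that assumption; the helper name to_floats and the shortened sample row are made up for this example:

```python
def to_floats(row, skip=2):
    """Convert every item after the first `skip` entries to float."""
    return [float(item) for item in row[skip:]]


# Shortened sample row based on the one in the question.
rows = [
    ['blabblah1234', '2013-08-23T22:52:08.060', '56527.9529', '73', '-99.175'],
]

numeric_rows = [to_floats(row) for row in rows]
print(numeric_rows)
```

Unlike the try/except version, this raises a ValueError if a later item is not numeric, which can be useful for catching bad data early. Since most of the values have a decimal part, float is used here rather than int.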
