Create a tuple in Python 3 with string and number - python-3.x

I have an object in Python like below:
contributor_detail = contributorId + ',' + contentFileName + ',' + productPriority
Output:
CSFBW23_1194968,CSFBW23_1194968.pdf,10
CSFBW23_1194969,CSFBW23_1194968.pdf,11
CSFBW23_1194970,CSFBW23_1194968.pdf,13
Apologies if the question was unclear. I will try to reframe it here. This is how I am building an array, which I then need to turn into tuples so it can be sorted.
for record in event['Records']:
    # pull the body out & json load it
    jsonmaybe = record["body"]
    jsonmaybe = json.loads(jsonmaybe)
    jsonmaybe1 = jsonmaybe["Message"]
    jsonmaybe1 = json.loads(jsonmaybe1)
    # now the normal stuff works
    bucket_name = jsonmaybe1["Records"][0]["s3"]["bucket"]["name"]
    print(bucket_name)
    key = jsonmaybe1["Records"][0]["s3"]["object"]["key"]
    print(key)
    s3_clientobj = s3.get_object(Bucket=bucket_name, Key=key)
    s3_clientdata = s3_clientobj['Body'].read().decode('utf-8')
    employee_dict = json.loads(s3_clientdata)
    contributorId = employee_dict['Research']['Product']['#productID']
    contentFileName = employee_dict['Research']['Product']['Content']['Resource']['Name']
    productPriority = employee_dict['Research']['Product']['#productPriority']
    print("contributorId---------------" + contributorId)
    print("contentFileName---------------" + contentFileName)
    print("productPriority---------------" + productPriority)
    contributor_detail = contributorId + ',' + contentFileName + ',' + productPriority
    unsorted_contributors.append(contributor_detail)
I want to create a tuple for each record and collect all of those tuples so they can be sorted.
Output I am getting now:
['CSFBW23_1194968,CSFBW23_1194968.pdf,6', 'CSFBW23_1194968,CSFBW23_1194968.pdf,7', 'CSFBW23_1194968,CSFBW23_1194968.pdf,9']
Expected output
[("CSFBW23_1194968","CSFBW23_1194968.pdf",6),("CSFBW23_1194968","CSFBW23_1194968.pdf",7),("CSFBW23_1194968","CSFBW23_1194968.pdf",9)]
I need tuples as above so the list can be sorted on the third item of each tuple, which is a number:
sorted_contributors.sort(key=itemgetter(2))
Please help me create this format inside the loop.

If I understand you correctly, it looks like you just need to stop making contributor_detail a string, as you do with contributorId + ',' + contentFileName + ',' + productPriority. Assuming those values are strings and an int, you are joining them into a single comma-separated string. It isn't entirely clear what you're after, but I suspect what you want is:
contributor_detail = contributorId, contentFileName, productPriority
That yields a tuple, from which you can make a list.
from operator import itemgetter
contrib1 = "CSFBW23_1194968", "CSFBW23_1194968.pdf", 10
contrib2 = "CSFBW23_1194969", "CSFBW23_1194968.pdf", 11
contrib3 = "CSFBW23_1194970", "CSFBW23_1194968.pdf", 13
contributors = [contrib1, contrib2, contrib3]
sorted_contributors = sorted(contributors, key=itemgetter(2))
print(sorted_contributors)
Output:
[('CSFBW23_1194968', 'CSFBW23_1194968.pdf', 10), ('CSFBW23_1194969', 'CSFBW23_1194968.pdf', 11), ('CSFBW23_1194970', 'CSFBW23_1194968.pdf', 13)]
A simplified clarification of your issue:
# what you are doing (incorrect):
In [1]: contrib1 = "thingB"+","+"thingA"+","+"1"
In [2]: contrib2 = "thingC"+","+"thingD"+","+"2"
In [3]: print([contrib1, contrib2])
['thingB,thingA,1', 'thingC,thingD,2']
# what you really want is:
In [4]: contrib1 = "thingB", "thingA", 1
In [5]: contrib2 = "thingC", "thingD", 2
In [6]: print([contrib1, contrib2])
[('thingB', 'thingA', 1), ('thingC', 'thingD', 2)]
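Inside your loop that would look roughly like this; a minimal sketch, where the sample values stand in for what you actually read from employee_dict, and int() assumes productPriority arrives as a numeric string:
from operator import itemgetter

unsorted_contributors = []
for contributorId, contentFileName, productPriority in [
        ("CSFBW23_1194968", "CSFBW23_1194968.pdf", "10"),
        ("CSFBW23_1194969", "CSFBW23_1194968.pdf", "11"),
]:
    # build a tuple instead of a comma-joined string
    # (int() is an assumption: productPriority looks like a numeric string)
    contributor_detail = (contributorId, contentFileName, int(productPriority))
    unsorted_contributors.append(contributor_detail)

# sort on the numeric third element
sorted_contributors = sorted(unsorted_contributors, key=itemgetter(2))
print(sorted_contributors)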

This will create the proper format:
my_list = []
my_list.append(contributorId)
my_list.append(contentFileName)
my_list.append(bucket_name)
my_list.append(key)
my_list.append(int(productPriority))
unsorted_contributors.append(my_list)
Concatenation is creating a format mismatch, so build a list and append to it instead; that should work.
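One thing to watch: with this layout the numeric priority sits at index 4 (after contributorId, contentFileName, bucket_name and key), so the sort key changes accordingly; a minimal sketch:
from operator import itemgetter

# the priority is the fifth element of each inner list here
unsorted_contributors.sort(key=itemgetter(4))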

Related

Replace items like A2 as AA in the dataframe

I have a list of items like "A2BCO6" and "ABC2O6". I want to replace them so that A2BCO6 --> AABCO6 and ABC2O6 --> ABCCO6. The number of items is much larger than shown here.
My dataframe is like:
listAB:
Finctional_Group
0 Ba2NbFeO6
1 Ba2ScIrO6
3 MnPb2WO6
I created a duplicate array and tried to do the replacement the following way:
B = ["Ba2", "Pb2"]
C = ["BaBa", "PbPb"]
for i,j in range(len(B)), range(len(C)):
    listAB["Finctional_Group"] = listAB["Finctional_Group"].str.strip().str.replace(B[i], C[j])
But it does not produce the correct output. The output is:
listAB:
Finctional_Group
0 PbPbNbFeO6
1 PbPbScIrO6
3 MnPb2WO6
Please suggest the necessary correction in the code.
Many thanks in advance.
For simplicity I used the chemparse package, which seems to suit your needs.
As always we import the required packages, in this case chemparse and pandas.
import chemparse
import pandas as pd
Then we create a pandas.DataFrame object with your example data.
df = pd.DataFrame(
    columns=["Finctional_Group"], data=["Ba2NbFeO6", "Ba2ScIrO6", "MnPb2WO6"]
)
Our parser function will use chemparse.parse_formula, which returns a dict of the elements and their frequencies in a molecular formula.
def parse_molecule(molecule: str) -> str:
    # initializing an empty string
    molecule_in_string = ""
    # iterating over all keys & values in the dict
    for key, value in chemparse.parse_formula(molecule).items():
        # appending each element repeated by its count to the string
        molecule_in_string += key * int(value)
    return molecule_in_string
The returned string now contains the molecular formula written out without numbers. We just need to map this function over every element of our dataframe. For that we can do
df = df.applymap(parse_molecule)
print(df)
which returns:
0 BaBaNbFeOOOOOO
1 BaBaScIrOOOOOO
2 MnPbPbWOOOOOO
dtype: object
Source code for chemparse: https://gitlab.com/gmboyer/chemparse
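As an aside, the loop from the question can also be fixed without an extra package by iterating over the replacement pairs together with zip; a minimal sketch, using the same example data:
import pandas as pd

listAB = pd.DataFrame(columns=["Finctional_Group"], data=["Ba2NbFeO6", "Ba2ScIrO6", "MnPb2WO6"])
B = ["Ba2", "Pb2"]
C = ["BaBa", "PbPb"]
# walk the old/new pairs in step instead of unpacking two ranges
for old, new in zip(B, C):
    listAB["Finctional_Group"] = listAB["Finctional_Group"].str.strip().str.replace(old, new, regex=False)
print(listAB)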

How to create a dataframe from extracted hashtags?

I have used the code below to extract hashtags from tweets.
def find_tags(row_string):
    tags = [x for x in row_string if x.startswith('#')]
    return tags
df['split'] = df['text'].str.split(' ')
df['hashtags'] = df['split'].apply(lambda row : find_tags(row))
df['hashtags'] = df['hashtags'].apply(lambda x : str(x).replace('\\n', ',').replace('\\', '').replace("'", ""))
df.drop('split', axis=1, inplace=True)
df
However, when I count them using the code below, the output counts each individual character.
from collections import Counter
d = Counter(df.hashtags.sum())
data = pd.DataFrame([d]).T
data
I think the problem lies with the code I am using to extract the hashtags, but I don't know how to solve this issue.
Change find_tags so the replace happens inside the list comprehension together with split, and for counting the values use Series.explode with Series.value_counts:
def find_tags(row_string):
    return [x.replace('\\n', ',').replace('\\', '').replace("'", "")
            for x in row_string.split() if x.startswith('#')]
df['hashtags'] = df['text'].apply(find_tags)
and then:
data = df.hashtags.explode().value_counts().rename_axis('val').reset_index(name='count')
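A quick check on made-up sample data (the df below is an assumption, not the asker's actual tweets):
import pandas as pd

def find_tags(row_string):
    return [x.replace('\\n', ',').replace('\\', '').replace("'", "")
            for x in row_string.split() if x.startswith('#')]

# hypothetical sample frame standing in for the real tweet data
df = pd.DataFrame({'text': ['hello #a #b', 'bye #a']})
df['hashtags'] = df['text'].apply(find_tags)
data = df.hashtags.explode().value_counts().rename_axis('val').reset_index(name='count')
print(data)
# expected, roughly:
#   val  count
# 0  #a      2
# 1  #b      1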

Python 3 creating variable names from a list and adding a suffix

I would like to create a pandas dataframe using the names from a list and then append '_df' to the end of each, but I seem to have two issues. Here is my code below.
read_csv = ['apple', 'orange', 'bananna']
for f in read_csv:
    print('DEBUG 7: Value of f inside the loop: ', f)
    ##!!! ERROR HERE - We have reassigned the csv file to f
    ##!!! ERROR HERE - f now contains contents of f.csv (e.g. apple.csv)
    f = pd.read_csv(f + '.csv')
    ##!!! ERROR HERE - Fix above error and the spice shall flow.
    #print('DEBUG 8: Inside read_csv \n', f)
The for loop runs and reads in the first item in my list 'apple' and assigns it to f.
We drop into the loop. The first print statement, DEBUG 7, returns the value of f as 'apple'. So far so good.
Next, we run on to the pd.read_csv which is where my first issue is. How do I append '_df' to f? I have read a few answers on here and tried them but it's not working as I expect. I would like to have the loop run and create a new dataframe for apple_df, orange_df and bananna_df. But we can come back to that.
The second error I get here is "ValueError: Wrong number of items passed 8, placement implies 1" The CSV file has 8 columns and that is getting assigned to f instead of the dataframe name.
I can't for the life of me work out what's occurring to make that happen. Well, I can. If I fix the apple_df issue I believe the dataframe will read in the csv file fine.
Still learning so all help is appreciated.
Thanks
Tom
Use locals() to create local variables (apple_df, orange_df, ...)
read_csv = ['apple', 'orange', 'bananna']
for f in read_csv:
    locals()[f"{f}_df"] = pd.read_csv(f"{f}.csv")
>>> type(apple_df)
pandas.core.frame.DataFrame
ValueError: Wrong number of items passed 8, placement implies 1
You got that error because you can't assign a DataFrame to the f variable, which is a string, in that loop. You have to store it in a new variable, for example df:
df = pd.read_csv(f + '.csv')
If you want to create a new variable from f and "_df" you need to use exec:
exec(f + "_df" + " = pd.read_csv(f + '.csv')")
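As a plainer alternative to locals() and exec (not from either answer, just a common pattern), the dataframes can be kept in a dictionary keyed by name; a minimal sketch, assuming apple.csv, orange.csv and bananna.csv exist:
import pandas as pd

read_csv = ['apple', 'orange', 'bananna']
dfs = {}
for f in read_csv:
    # keys like 'apple_df' stand in for separate variable names
    dfs[f + '_df'] = pd.read_csv(f + '.csv')

print(type(dfs['apple_df']))  # pandas.core.frame.DataFrame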

Issues deleting discrete vals from transformed sparse vector list using regex in python

I'm trying to remove all values with index vals 1, 2, and 3, from a string list like
['1:1', '2:100.0', '3:100.0',...]. The data is in sparse vector format and was loaded as a pandas dataframe. I used an online regex tester to match the first three positions of this list with success.
But as it exists in my program, the same regex doesn't work. On running:
data = pd.read_csv("c:\data.csv")
for index, row in data.iterrows():
    line = parseline(row)

def parseline(line):
    line = line.values.flatten()       # data like: ['1:1 2:100.0 3:100.0...']
    stringLine = listToString(line)    # data like: 1:1 2:100.0 3:100.0...
    splitLine = stringLine.split(" ")  # data like: ['1:1', '2:100.0', '3:100.0',...]
    remove = re.findall(r"'1:1'|'[2,3]:\d+.\d+'")
    splitLine.remove(remove)
    print(splitLine)
I get the following error:
TypeError: findall() missing 1 required positional argument: 'string'
Does anyone have any ideas? Thanks in advance.
The splitLine object was actually a list, but the re.findall() method (and the re.sub() method, which is what was actually used) requires a string, not a list. I was just operating on the wrong data structure. Ultimately:
def parseline(line):
    line = line.values.flatten().tolist()
    stringLine = listToString(line)
    stringLine = re.sub(r"1:1 |2:\d+.\d+ ", "", stringLine)
    ...
did the trick.
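A self-contained check on a string in the question's format (the input literal is made up, and the pattern is widened to [23] so it also covers index 3, as in the original goal):
import re

# hypothetical sample line in the sparse-vector format from the question
stringLine = "1:1 2:100.0 3:100.0 4:5.0 5:7.5"
cleaned = re.sub(r"1:1 |[23]:\d+\.\d+ ", "", stringLine)
print(cleaned)             # 4:5.0 5:7.5
print(cleaned.split(" "))  # ['4:5.0', '5:7.5']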

How to get the specific value from the data using python?

data = ['{"osc":{"version":"1.0"}}']
or
data = ['{"device":{"network":{"ipv4_dante":{"auto":"testing"}}}}']
From the data above, I only get random outputs, but I need to get the last value, i.e. "1.0" or "testing", and so on.
I always need to get the last value. How can I do it using python?
Dictionaries have no "last" element. Assuming your dictionary doesn't branch and you want the "deepest" element, this should work:
import json

data = ['{"device":{"network":{"ipv4_dante":{"auto":"testing"}}}}']
obj = json.loads(data[0])
while isinstance(obj, dict):
    obj = obj[list(obj.keys())[0]]
print(obj)
This should work -
import ast

x = ast.literal_eval(data[0])
while type(x) == dict:
    # dict.keys() is not subscriptable in Python 3, so wrap it in list()
    key = list(x.keys())[0]
    x = x.get(key)
print(x)
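Both snippets keep following the first key until the value is no longer a dict, so they print testing for the second sample and 1.0 for the first.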
