This problem seems very simple, but I'm having trouble finding an existing solution on Stack Overflow.
When I run a SQLAlchemy query like the following:
valid_columns = db.session.query(CrmLabels).filter_by(user_id=client_id).first()
I get back a CrmLabels object that is not iterable. If I print this object, I get a list:
[Convert Source, Convert Medium, Landing Page]
But this is not iterable. I would like to get exactly what I've shown above, except as a list of strings:
['Convert Source', 'Convert Medium', 'Landing Page']
How can I run a query that will return this result?
The change below should do it:
valid_columns = (
    db.session.query(CrmLabels).filter_by(user_id=client_id)
    .statement.execute()  # just add this
    .first()
)
However, you need to be certain about the order of columns, and you can use valid_columns.keys() to make sure the values are in the expected order.
Alternatively, you can create a dictionary using dict(valid_columns.items()).
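For reference, here is a minimal sketch of the same idea executed through the session rather than through bound metadata. It assumes Flask-SQLAlchemy on SQLAlchemy 1.x, where a fetched row behaves like an ordered mapping, and that the query matches at least one row:

stmt = db.session.query(CrmLabels).filter_by(user_id=client_id).statement
row = db.session.execute(stmt).first()  # a RowProxy in SQLAlchemy 1.x

print(list(row))           # the row's values as a plain list
print(list(row.keys()))    # the column names, to confirm the order
print(dict(row.items()))   # column name -> value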
I am working on a data analysis project and I am trying to order my results in descending order. The first time I had a similar issue, I used sorted(dictt.items(), key=lambda x: x[1]) and it worked fine. Now I am getting this error: "AttributeError: 'set' object has no attribute 'items'".
What am I doing wrong?
[Screenshot of the actual (unsorted) output omitted.]
There is a special collection for what you are trying to do called OrderedDict:
https://docs.python.org/3/library/collections.html#collections.OrderedDict
The problem is that sorting the way you did returns a list of (key, value) tuples rather than a dict, so you have to turn it back into a dict-like object afterwards; the AttributeError itself means that dictt is a set by the time .items() is called on it, and sets have no .items() method. The standard dict doesn't guarantee entry order (prior to Python 3.7), so if you really want the sorted order preserved, plug the sorted items into an OrderedDict like so:
myOrderedDict = OrderedDict(sorted(dictt.items()))
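Since the original goal was descending order by value, the same key function from the question can be combined with reverse=True. A minimal sketch with made-up counts standing in for your real data:

from collections import OrderedDict

dictt = {'a': 3, 'b': 10, 'c': 1}  # made-up counts for illustration

ordered = OrderedDict(sorted(dictt.items(), key=lambda x: x[1], reverse=True))
print(ordered)  # OrderedDict([('b', 10), ('a', 3), ('c', 1)])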
I would like to create a SequenceTaggingDataset from two lists that I have created dynamically inside my code, train_sentences and train_tags. I want to write something like this:
train_data = SequenceTaggingDataset(examples=(zip(train_sentences, train_tags)))
However, the constructor must receive a path. And not only that: from the code, it looks as though even if I were to provide the examples, it would override them and initialize examples to an empty list.
For various reasons, I do not want to save the lists I created to a file for the SequenceTaggingDataset to read back in. Is there any way around this, short of defining my own custom class?
You will need to modify the source code for it (https://pytorch.org/text/_modules/torchtext/datasets/sequence_tagging.html#SequenceTaggingDataset). You can make a local copy and import it as your own module.
path is used in __init__. The important part is that it takes the lines from the file and splits them, using the given separator, into a list named columns. This columns list is then fed, together with fields, into another class method to construct the examples list. Please read the example provided here to understand fields (note that UDPOS is called there to create a SequenceTaggingDataset).
What you need is columns, which you don't have to read from a file since you already have all the components. You can feed it in directly by simplifying the class's __init__:
# Inside your local copy of the class (it relies on "from torchtext import data", as in the original module):
def __init__(self, columns, fields, encoding="utf-8", separator="\t", **kwargs):
    # columns is now passed in directly instead of being parsed out of the file at path.
    examples = [data.Example.fromlist(columns, fields)]
    super(SequenceTaggingDataset, self).__init__(examples, fields, **kwargs)
columns is a nested list of lists: [[word], [UD_TAG], [PTB_TAG]]. That means you need to feed the following into the modified class:
train = SequenceTaggingDataset([train_sentences, train_tags], fields=...)
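For completeness, a minimal sketch of how the modified class might be called. The field names and Field configuration below are assumptions for illustration, not something prescribed by the original class:

from torchtext import data  # legacy torchtext API, the same one SequenceTaggingDataset uses

# Hypothetical fields; configure tokenization and vocabularies as your task requires.
WORD = data.Field()
TAG = data.Field(unk_token=None)
fields = [("word", WORD), ("tag", TAG)]

# train_sentences and train_tags are the lists already built elsewhere in your code.
train = SequenceTaggingDataset([train_sentences, train_tags], fields=fields)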
I have a file with some crazy stuff in it (see the sample under Edit 2 below).
I attempted to get rid of it using this:
df['firstname'] = map(lambda x: x.decode('utf-8','ignore'), df['firstname'])
But I wound up with this in my dataframe: <map object at 0x0000022141F637F0>
I got that example from another question, and this seems to be the Python 3 method for doing this, but I'm not sure what I'm doing wrong.
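As a side note on that output: in Python 3, map returns a lazy iterator, so assigning it to a column stores the iterator object itself rather than the decoded values. A tiny illustration:

values = map(str.upper, ['a', 'b'])
print(values)        # <map object at 0x...> -- nothing has been computed yet
print(list(values))  # ['A', 'B'] -- wrapping it in list() materializes the results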
Edit: For some odd reason someone thinks that this has something to do with getting a map to return a list. The central issue is getting rid of non-UTF-8 characters. Whether or not I'm even doing that correctly has yet to be established.
As I understand it, I have to apply an operation to every character in a column of the dataframe. Is there another technique, or is map the correct way, and if it is, why am I getting the output I've indicated?
Edit 2: For some reason, my machine wouldn't let me create an example. I can now. This is what I'm dealing with; all those weird characters need to go.
import pandas as pd
data = [['🦎Ale','Αλέξανδρα'],['��Grain','Girl🌾'],['Đỗ Vũ','ên Anh'],['Don','Johnson']]
df = pd.DataFrame(data,columns=['firstname','lastname'])
print(df)
Edit 3: I tried doing this using a regex and for some reason, it still didn't work.
df['firstname'] = df['firstname'].replace('[^a-zA-z\s]',' ')
This regex works FINE in another process, but here, it still leaves the ugly characters.
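A likely reason the pattern above appears to do nothing is that Series.replace only treats its argument as a regex when regex=True is passed (and the character class should be a-zA-Z, not a-zA-z). A sketch of the corrected call, reusing the example frame from Edit 2; the str.replace accessor is shown as well since it is the more common route for string columns:

# Treat the pattern as a regex (Series.replace matches literal values by default)
# and fix the character range to a-zA-Z.
df['firstname'] = df['firstname'].replace(r'[^a-zA-Z\s]', ' ', regex=True)
# or, via the string accessor:
df['firstname'] = df['firstname'].str.replace(r'[^a-zA-Z\s]', ' ', regex=True)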
Edit 4: It turns out that it's image data that we're looking at.
Python scrub here.
I'm still learning Python, so sorry.
I'm trying to create a dict (I think) that then behaves as a variable called fileshare, and then I want to call each entry inside that variable in FileShareARN. So basically, inside the AWS ARN, I want each share to be called; for example, I want share-A, share-B, etc. to be called each time. I'm guessing I need to set up a function or an if statement, but I'm not sure.
import boto3
client = boto3.client('storagegateway')
fileshare = [share-A, share-B, share-C, share-D]
response = client.refresh_cache(
    FileShareARN='arn:aws:storagegateway:us-west-1:AWS-ID:share/{Fileshare-variable, share-ID should go here}.format',
    FolderList=['/'],
    Recursive=True
)
You're very close! A few notes to preface the answer to assist you on your Python journey:
Python does not allow hyphenated variable names, as the hyphen is a reserved operator for subtraction. You only had them listed as placeholders, but I figured it would be helpful to know.
Lists, arrays, and dictionaries are all different data structures in Python. You can read more about them here: https://docs.python.org/3/tutorial/datastructures.html, but for your particular use case, if you're simply trying to store a collection of values and iterate over them, a list or array works fine (although a dictionary is usable as well).
In Python, lists and arrays are iterables, which means they can naturally be iterated over to access their constituent values in sequence.
Let's go over an example using the following array:
fruits = ['apples', 'bananas', 'oranges']
In other languages, you're probably used to having to define your own loop with the following syntax:
for (int i = 0; i < sizeOf(fruits); i++)
{
    print(fruits[i]);
}
Python enables this same functionality much more easily.
for item in fruits:
    print(item)
Here, item takes on the value at the current position in the array (fruits) on each pass through the loop.
Now we can use the same technique to loop over your list of share IDs:
import boto3
client = boto3.client('storagegateway')
# Placeholder share IDs; substitute your real ones.
fileshare = ['share-A', 'share-B', 'share-C', 'share-D']
for path in fileshare:
    response = client.refresh_cache(
        FileShareARN='arn:aws:storagegateway:us-west-1:AWS-ID:share/' + path,
        FolderList=['/'],
        Recursive=True
    )
After changing the placeholder variables you had in fileshare, I wrapped the existing refresh_cache call in a for loop and made a slight change so that each share ID is appended to the end of your FileShareARN string.
Hope this helps, and welcome to Python!
Did some more research and found f-string formatting, which seems to make Python life easy. Also, since I am deploying this in AWS Lambda, I added a handler.
#!/usr/bin/env python3
import boto3

def default_handler(event, context):
    print(boto3.client('sts').get_caller_identity())
    client = boto3.client('storagegateway')
    fileshare = ['share-A', 'share-B', 'share-C', 'share-D']
    for path in fileshare:
        response = client.refresh_cache(
            FileShareARN=f"arn:aws:storagegateway:us-west-1:ARN-ID:share/{path}",
            FolderList=['/'],
            Recursive=True
        )
        print(response)

default_handler(None, None)
I'm trying to scrape the MTA website and need a little help scraping the "Train Lines Row" (website for reference: https://advisory.mtanyct.info/EEoutage/EEOutageReport.aspx?StationID=All).
The train line information is stored as image files (1 line subway, A line subway, etc.) describing each line that's accessible through a particular station. I've had success scraping info out of rows that only one train passes through, but I'm having difficulty figuring out how to iterate through the columns that have multiple trains passing through them, using a conditional statement to test whether a row has one line or multiple lines.
tableElements = table.find_elements_by_tag_name('tr')
That's the table I'm iterating through.
tableElements[2].find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_element_by_tag_name('img').get_attribute('alt')
This successfully gives me the value when only one image exists in the particular column.
tableElements[8].find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img')
This successfully gives me a list of values that I can iterate through to extract what I need.
Now I try to combine these lines of code in a for loop to extract all the information without stopping.
for info in tableElements[1:]:
    if info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img')[1] == True:
        for images in info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img'):
            print(images.get_attribute('alt'))
    else:
        print(info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_element_by_tag_name('img').get_attribute('alt'))
I'm getting the error message "list index out of range." I don't know why, as every iteration done in isolation seems to work. My hunch is that I haven't used the boolean operation properly here. My idea was that if find_elements_by_tag_name had an index of [1], that would mean there are multiple image alt texts for me to iterate through. Hence why I want to use this boolean operation.
Hi all, thanks so much for your help. I've uploaded my full code to GitHub for your reference: https://github.com/tsp2123/MTA-Scraping/blob/master/MTA.ElevatorData.ipynb
The end goal is to put this information into a dataframe, using some formulation of a for loop that will extract the image information I want:
dataframe = []
for elements in tableElements:
    row = {}
    columnName1 = find_element_by_class_name('td')
    ..
Your logic isn't off here.
"My hunch is I haven't correctly used the boolean operation properly here. My idea was that if find_elements_by_tag_name had an index of [1] that would mean multiple image text for me to iterate through."
The problem is it can't check if the statement is True if there's nothing in index position [1]. Hence the error at this point.
if info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img')[1] == True:
What you want to do is use a try/except. So something like:
for info in tableElements[1:]:
    try:
        if info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img')[1] == True:
            for images in info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img'):
                print(images.get_attribute('alt'))
        else:
            print(info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_element_by_tag_name('img').get_attribute('alt'))
    except:
        # do something else
        print('Nothing found in index position.')
Is it also possible to go back to your question and provide the full code? When I try this, I'm getting 11 table elements, so I want to test it with the specific table you're trying to scrape.
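As an aside, here is a hedged sketch of the same loop that fetches the img elements once and branches on how many were found, rather than relying on indexing [1] to fail. The method names follow the Selenium 3 API already used above; the printed messages are placeholders:

from selenium.common.exceptions import NoSuchElementException

for info in tableElements[1:]:
    try:
        # Collect every <img> in the second cell's <h4> once, then branch on the count.
        images = (info.find_elements_by_tag_name('td')[1]
                      .find_element_by_tag_name('h4')
                      .find_elements_by_tag_name('img'))
        if images:
            for image in images:
                print(image.get_attribute('alt'))
        else:
            print('No train-line images in this row.')
    except (IndexError, NoSuchElementException):
        # Rows without a second <td> or without an <h4>, e.g. header or spacer rows.
        print('Row does not have the expected structure.')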