Python groupby returning single value between carets - python-3.x

Long time listener first time caller. I am new to Python, about 3 days into this and I cannot figure out for the life of me, what is happening in this particular instance.
I brought in an XLSX file as a dataframe called dfInvoice. I want to use groupby on two columns (indexes?) but something funky is happening I think. I can't see my new grouped dataframe with the code below.
uniqueLocation = dfInvoice.groupby(['Location ID','Location'])
When I call uniqueLocation, all that is returned is this:
<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x000001B9A1C61198>
I have two questions from here.
1) what the heck is going on? I followed these steps almost identically to this (https://www.geeksforgeeks.org/python-pandas-dataframe-groupby).
2) this string of text between the carets, what should I refer to this as? I didn't know how to search for this happening because I don't exactly understand what this return is.

Related

Checking document count in PyMongo.cursor.Cursor inside if statement in python behaving wierdly

I have the following python code:
placeholder = db.user.aggregate(...)
if len(list(placeholder))!=0:
#execute something
Now the issue is even if the mongo cursor doesn't have any documents, it still getting inside the if statement.
But if I am modifying the code as:
placeholder = list(db.user.aggregate(...))
if len(placeholder)!=0:
#execute something
its working normally.
even if pymongo cursor object is an iterator and it can be exhausted but the only thing I am doing is checking the length and converting it into a list. Can anyone please advice why its behaving this way in the first case.
Thanks in advance.
As long as the query is the same, the two statements are identical.
In the first query - how do you know there's no records. You exhaust the cursor without saving any results to look at. Also how is it possible that "the mongo cursor doesn't have any documents", while len(list(cursor)) returns a non-zero value.
If you are really stuck, post a Minimal Reproducible Example.

Turning a Dataframe into a Series with .squeeze("columns")

I'm studying how to work with data right now and so I'm following along with a tutorial for working with Time Series data. Among the first things he does is read_csv on the file path and then use squeeze=True to read it as a Series. Unfortunately (and as you probably know), squeeze has been depricated from read_csv.
I've been reading documentation to figure out how to read a csv as a series, and everything I try fails. The documentation itself says to use pd.read_csv('filename').squeeze('columns') , but, when I check the type afterward, it is always still a Dataframe.
I've looked up various other methods online, but none of them seem to work. I'm doing this on a Jupyter Notebook using Python3 (which the tutorial uses as well).
If anyone has any insights into why I cannot change the type in this way, I would appreciate it. I'm not sure if I've misunderstood the tutorial altogether or if I'm not understanding the documentation.
I do literally type .squeeze("columns") when I write this out because when I write a column name or index, it fails completely. Am I doing that correctly? Is this the correct method or am I missing a better method?
Thanks for the help!
shampoo = pd.read_csv('shampoo_with_exog.csv',index_col= [0], parse_dates=True).squeeze("columns")
I would start with this...
#Change the the stuff between the '' to the entire file path of where your csv is located.
df = pd.read_csv(r'c:\user\documents\shampoo_with_exog.csv')
To start this will name your dataframe as df which is kind of the unspoken industry standard the same as pd for pandas.
Additionally, this will allow you to use a "raw" (the r) string which makes it easier to insert directories into your python code.
Once you are are able to successfully run this you can simply put df in a separate cell in jupyter. This will show you what your data looks like from your CSV. Once you have done all that you can start manipulating your data. While you can use the fancy stuff in pd.read_csv() I mostly just try to get the data and manipulate it from the code itself. Obviously there are reasons not to only do a pd.read_csv but as you progress you can start adding things here and there. I almost never use squeeze although I'm sure there will be those here to comment stating how "essential" it is for whatever the specific case might be.

Python 3 Tips to Shorten Code for Assignment and Getting Around TextIO

I've been going through a course and trying to find ways to shorten my code. I had this assignment to open a text file, split it, then add all of the unique values to a list, then finally sort it. I passed the assignment, but I have been trying to shorten it to learn some ways to apply any shortening concepts to future codes. The main issues I keep running into is trying to make the opened file into strings to turn them into lists to append and such without read(). If I don't used read() I get back TextIO errors. I tried looking into it but what I found involved importing os and doing some other funky stuff, which seems like it would take more time.
So if anyone would mind giving me tips to more effectively code this that are beginner friendly I would be appreciative.
romeo = open('romeo').read()
mylist = list()
for line in romeo.split() :
if line not in mylist:
mylist.append(line)
mylist.sort()
print(mylist)
I saw that set() is pretty good for unique values, but then I don't think I can sort it. Then trying flip flop between a list and set would seem wacky. I tried those swanky one line for loop boys, but couldn't get it to work. like for line not in mylist : mylist.append(line) I know that's not how to do it or even close, but I don't know how to convey what I mean.
So to iterate:
1. How to get the same result without read() / getting around textIO
2. How to write this code in a more stream lined way.
I'm new to the site and coding, so hopefully I didn't trigger anyone.

Questions regarding Python replace specific texts

I'm writing a script to scrape from another website with Python, and I am facing this question that I have yet to figure out a method to resolve it.
So say I have set to replace this particular string with something else.
word_replace_1 = 'dv'
namelist = soup.title.string.replace(word_replace_1,'11dv')
The script works fine, when the titles are dv234,dv123 etc.
The output will be 11dv234, 11dv123.
However if the titles are, dv234, mixed with dvab123, even though I did not set dvab to be replaced with anything, the script is going to replace it to 11dvab123. What should I do here?
Also, if the title is a combination of alphabits,numbers and Korean characters, say DAV123ㄱㄴㄷ,
how exactly should I make it to only spitting out DAV123, and adding - in between alphabits and numbers?
Python - making a function that would add "-" between letters
This gives me the idea to add - in between all characters, but is there a method to add - between character and number?
the only way atm I can think of is creating a table of replacing them, for example something like this
word_replace_3 = 'a1'
word_replace_4 = 'a2'
.......
and then print them out as
namelist3 = soup.title.string.replace(word_replace_3,'a-1').replace(word_replace_4,'a-2')
This is just slow and not efficient. What would be the best method to resolve this?
Thanks.

MATLAB selecting items considering the end of their name

I have to extract the onset times for a fMRI experiment. I have a nested output called "ResOut", which contains different matrices. One of these is called "cond", and I need the 4th element of it [1,2,3,4]. But I need to know its onset time just when the items in "pict" matrix (inside ResOut file) have a name that ends with "*v.JPG".
Here's the part of the code that I wrote (but it's not working):
for i=1:length(ResOut);
if ResOut(i).cond(4)==1 && ResOut(i).pict== endsWith(*"v.JPG")
What's wrong? Can you halp me to fix it out?
Thank you in advance,
Adriano
It's generally helpful to start with unfamiliar functions by reading their documentation to understand what inputs they are expecting. Per the documentation for endsWith, it expects two inputs: the input text and the pattern to match. In your example, you are only passing it one (incorrectly formatted) string input, so it's going to error out.
To fix this, call the function properly. For example:
filepath = ["./Some Path/mazeltov.jpg"; "~/Some Path/myfile.jpg"];
test = endsWith(filepath, 'v.jpg')
Returns:
test =
2×1 logical array
1
0
Or, more specifically to your code snippet:
endsWith(ResOut(i).pict, 'v.JPG')
Note that there is an optional third input, 'IgnoreCase', which you can pass as a boolean true/false to control whether or not the matching ignores case.

Resources