Removing comma coming after a list-item word - string

This is my Python code (I am using Python 2.7.6):
>>> cn = ['here,', 'there']
>>> for c in cn:
...     if c.endswith(','):
...         c = c[:-1]
...
>>> print(cn)
['here,', 'there']
As you can see, cn[0] still has the trailing comma even though I did c = c[:-1]. How do I change cn[0] so that it equals 'here' without the trailing comma?

The problem is that you assign a new value to c but do not update the list.
In order to manipulate the list in place you would need to do something like this:
cn = ['here,', 'there']
for index, c in enumerate(cn):
    if c.endswith(','):
        cn[index] = c[:-1]
print(cn)
['here', 'there']
enumerate gives you each element of the list along with its index; if the string has a trailing comma, you update the list element at that index.
The problem with your code was that c simply held the string 'here,'. You then created a new string without the comma and assigned it to c, which did not affect the list cn. To have any effect on the list, you need to store the new value at the desired position.
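To make the difference concrete, here is a minimal side-by-side sketch. The two function names are hypothetical, written only for this illustration: the first rebinds the loop variable (as in the question), the second writes back into the list by index (as in this answer).

```python
def rebind_only(items):
    """Rebinds the loop variable; the list itself is never modified."""
    for c in items:
        if c.endswith(','):
            c = c[:-1]  # c now names a new string; the list still holds the old one
    return items


def assign_by_index(items):
    """Stores the shortened string back into the list at the same position."""
    for index, c in enumerate(items):
        if c.endswith(','):
            items[index] = c[:-1]
    return items


print(rebind_only(['here,', 'there']))      # ['here,', 'there']
print(assign_by_index(['here,', 'there']))  # ['here', 'there']
```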
You could also use a list comprehension to achieve the same result, which might be more Pythonic for such a small task (as mentioned by @AdamSmith):
cn = [c[:-1] if c.endswith(',') else c for c in cn]
This creates a new list from cn in which each element is returned unchanged if it does not end with a comma; otherwise the comma is cut off before the string is returned.
Another option is the built-in str.rstrip method, but it removes all trailing commas, not just one. It would look something like this (again, @AdamSmith pointed this out):
cn = map(lambda x: x.rstrip(','), cn)  # Python 2: map returns a list; in Python 3 use list(map(...))
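The difference between the slicing approach and rstrip only shows up when a string ends in more than one comma. A small sketch, using made-up sample strings, that compares the two and also shows the list(map(...)) form needed on Python 3, where map() returns a lazy iterator:

```python
cn = ['here,,', 'there,', 'everywhere']  # hypothetical sample data

# Remove at most one trailing comma (the slicing approach)
one_comma = [c[:-1] if c.endswith(',') else c for c in cn]

# Remove every trailing comma (str.rstrip)
all_commas = [c.rstrip(',') for c in cn]

# On Python 3, map() is lazy, so wrap it in list() to get an actual list
mapped = list(map(lambda x: x.rstrip(','), cn))

print(one_comma)             # ['here,', 'there', 'everywhere']
print(all_commas)            # ['here', 'there', 'everywhere']
print(mapped == all_commas)  # True
```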

What you did was make an edited copy of the string and bind it to the name c, replacing the reference c previously held to the original string.
Since you didn't change the original (strings can't be changed in place anyway), the list still holds the unmodified string, and the new one is simply discarded when c is rebound on the next iteration.
You could do something like this to make a new list containing the changed strings:
newlist = [c.strip(',') for c in cn]  # note: strip() removes commas from both ends of each string
Or, more closely to your example while keeping this approach:
cn = [c[:-1] if c.endswith(',') else c for c in cn]
The loop-with-enumerate approach in another answer will work, but I'm avoiding it: in general, changing a list while iterating over it can lead to mistakes unless done carefully, so I think it's a bad habit to get into when there's another reasonably neat approach.
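Note that both comprehension versions rebind cn to a brand-new list. If some other name refers to the same list object, it keeps seeing the old contents. Assigning to the slice cn[:] instead replaces the contents of the existing list, and because the right-hand side is fully built before the assignment happens, the list is never modified mid-iteration. A minimal sketch, where alias is a hypothetical second reference to the list:

```python
cn = ['here,', 'there']
alias = cn  # a second name for the same list object

# Rebinding: cn now points at a new list; alias still sees the old one
cn = [c[:-1] if c.endswith(',') else c for c in cn]
print(alias)  # ['here,', 'there']

cn = alias  # point cn back at the original list
# Slice assignment: replace the contents of the existing list in place
cn[:] = [c[:-1] if c.endswith(',') else c for c in cn]
print(alias)        # ['here', 'there']
print(cn is alias)  # True
```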

You're changing the loop variable c, but you aren't changing the list cn. To change the list cn, instead of looping through its values, loop through its indexes. Here's an implementation.
for i in range(len(cn)):  # i is an index, not an element
    if cn[i].endswith(','):
        cn[i] = cn[i][:-1]

Related

Python: (partial) matching elements of a list to DataFrame columns, returning entry of a different column

I am a beginner in Python and have encountered the following problem: I have a long list of strings (I have taken 3 for this example):
ENSEMBL_IDs = ['ENSG00000040608',
'ENSG00000070371',
'ENSG00000070413']
which are partial matches of the data in column 0 of my DataFrame genes_df (first 3 entries shown):
import pandas as pd

genes_list = (['ENSG00000040608.28', 'RTN4R'],
              ['ENSG00000070371.91', 'CLTCL1'],
              ['ENSG00000070413.17', 'DGCR2'])
genes_df = pd.DataFrame(genes_list)
The task I want to perform is conceptually not that difficult: I want to compare each element of ENSEMBL_IDs to genes_df.iloc[:,0]. These are partial matches: each element of ENSEMBL_IDs is contained within an entry of column 0 of genes_df, as outlined above. If an element of ENSEMBL_IDs matches the element in genes_df.iloc[:,0] (which it does, apart from the extra numbers after the period, ".XX"), I want to return the "corresponding" value stored in column 1 of genes_df (genes_df.iloc[:,1]): the actual gene name, for example 'RTN4R'.
I want to store these in a list. So, in the end, I would be left with a list like follows:
`genenames = ['RTN4R', 'CLTCL1', 'DGCR2']`
Some info that might be helpful: all of the entries in ENSEMBL_IDs are unique, and all of them are for sure contained in column 0 of genes_df.
I think I am looking for something along the lines of:
genenames = []
for i in ENSEMBL_IDs:
    if i in genes_df.iloc[:,0]:
        genenames.append(# corresponding value in genes_df.iloc[:,1])
I am sorry if the question has been asked before; I kept looking and was not able to find a solution that was applicable to my problem.
Thank you for your help!
Thanks also for the edit; English is not my first language, so the improvements were insightful.
You can get rid of the part after the dot (with str.extract or str.replace) before matching the values with isin:
m = genes_df[0].str.extract(r'([^.]+)', expand=False).isin(ENSEMBL_IDs)
# or
m = genes_df[0].str.replace(r'\..*$', '', regex=True).isin(ENSEMBL_IDs)
out = genes_df.loc[m, 1].tolist()
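To see what the two patterns do to a single value, here is a plain re sketch on one ID. pandas' str.extract and str.replace apply regexes like these element-wise, so this only approximates what happens inside them. The r'' (raw string) prefix keeps Python from interpreting the backslash in \. itself.

```python
import re

full_id = 'ENSG00000040608.28'

# '([^.]+)': the first run of non-dot characters, i.e. everything before the dot
extracted = re.search(r'([^.]+)', full_id).group(1)

# r'\..*$': a literal dot followed by anything up to the end, replaced with ''
replaced = re.sub(r'\..*$', '', full_id)

print(extracted)  # ENSG00000040608
print(replaced)   # ENSG00000040608
```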
Or use a regex with str.match:
pattern = '|'.join(ENSEMBL_IDs)
m = genes_df[0].str.match(pattern)
out = genes_df.loc[m, 1].tolist()
Output: ['RTN4R', 'CLTCL1', 'DGCR2']
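Both approaches above return the names in the DataFrame's row order. If you need them in the order of ENSEMBL_IDs instead, one alternative (a different technique from the answer: a dictionary lookup) is to map each stripped ID to its gene name. The sketch runs on the plain genes_list rows so it works without pandas; the shortened, reordered ENSEMBL_IDs is hypothetical, chosen to show that the order follows the query. With a DataFrame, the same dictionary can be built from its two columns.

```python
genes_list = (['ENSG00000040608.28', 'RTN4R'],
              ['ENSG00000070371.91', 'CLTCL1'],
              ['ENSG00000070413.17', 'DGCR2'])
ENSEMBL_IDs = ['ENSG00000070413', 'ENSG00000040608']  # hypothetical, reordered

# Map the part before the first dot to the gene name
id_to_name = {full_id.split('.', 1)[0]: name for full_id, name in genes_list}

# Look up each requested ID; raises KeyError if an ID is missing,
# which the question rules out
genenames = [id_to_name[i] for i in ENSEMBL_IDs]
print(genenames)  # ['DGCR2', 'RTN4R']
```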

Changes in a temporary variable are affecting the variable that feeds from

I'm designing a Mastermind game, which basically compares 2 lists and marks the similarities. When a colour is found in the right place, a flag marking the correct position is added and the item found in the reference list is marked off. The reference list is fed from an array produced by another function. The problem is in the marking off: any change made to the reference list also changes the original array, which I don't want to happen.
tempCode = mCode  # mCode is the array combination randomly generated from another function
for i in range(len(uCode)):  # user input array
    for j in range(len(tempCode)):  # temp array
        if uCode[i] == tempCode[j]:  # compare individual chars
            if i == j:  # compare position
                flagMark = "*"
                tempCode.insert(j+1, "x")  # problem starts here
                tempCode.remove(tempCode[j])
                fCode.append(flagMark)
When the insert is reached, both tempCode and mCode change, which is not intended.
The code is written this way to handle the case where the user enters a combination with repeated colours: it checks the characters (the colours are just letters) and their positions, and then marks matches off with "x".
As it stands, when it gets to
tempCode.insert(j+1, "x")
the arrays will change to
mCode = ["B","R","x","G","Y"]
tempCode = ["B","R","x","G","Y"]
when I would just want
mCode = ["B","R","G","Y"]
tempCode = ["B","R","x","G","Y"]
See also this answer, which is a different presentation of the same problem.
Essentially, when you do tempCode = mCode, you're not making a copy of mCode, you're actually making another reference to it. Anything you do to tempCode thereafter affects the original as well, so at any given time the condition tempCode == mCode will be true (as they're the same object).
You probably want to make a copy of mCode, which can be done in either of the following ways:
tempCode = mCode.copy()  # list.copy() requires Python 3.3+
tempCode = mCode[:]
Either produces a different list with the same elements, rather than the same list.
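A quick way to check which situation you are in is the is operator, which tests whether two names refer to the same object. A small sketch using the colours from the question (the variable alias is hypothetical):

```python
mCode = ["B", "R", "G", "Y"]

# A copy: a new list with the same elements
tempCode = mCode[:]
tempCode.insert(2, "x")
print(mCode)      # ['B', 'R', 'G', 'Y']  (original untouched)
print(tempCode)   # ['B', 'R', 'x', 'G', 'Y']

# An alias: a second name for the very same list
alias = mCode
alias.insert(2, "x")
print(mCode)           # ['B', 'R', 'x', 'G', 'Y']  (original changed too)
print(alias is mCode)  # True
```

Both forms are shallow copies; that is enough here because the elements are immutable strings.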

Dict key getting overwritten when created in a loop

I'm trying to create individual dictionary entries while looping through some input data. Part of the data is used for the key, while a different part is used as the value associated with that key. I'm running into a problem (due to Python's "everything is an object, and you reference that object" method of operation): every iteration through my loop alters the key set in previous iterations, thus overwriting the previously set value instead of creating a new dict key and setting it with its own value.
popcount = {}
for oneline in datafile:
    if oneline[:3] == "POP":
        dat1, dat2, dat3, dat4, dat5, dat6 = oneline.split(":")
        datid = str.join(":", [dat2, dat3])
        if datid in popcount:
            popcount[datid] += int(dat4)
        else:
            popcount = { datid : int(dat4) }
This iterates over seven lines of data (datafile is a list containing that information) and should create four separate keys for datid, each with its own value. However, only the last value for datid ends up in the dictionary when the code is run. That happens to be the one that has duplicates, and they get summed properly (so at least I know that part of the code works), but the other key entries are just... gone.
The data is read from a file, is colon (:) separated, and is treated as a string even when it's numeric (hence the int() calls inside the if datid in popcount block).
What am I missing or doing wrong here? So far I haven't been able to find anything that helps me with this one (though you folks have answered a lot of other Python questions I've run into, even if you didn't know it). I know why it's failing; or, I think I do: it is because when I update the value of datid, the key gets pointed to the new datid value object even though I don't want it to, correct? I just don't know how to fix or work around this behavior. To be honest, it's the one thing I dislike about working in Python (hopefully once I grok it, I'll like it better; until then...).
Simply change your last line:
popcount = { datid : int(dat4) } # This does not do what you want
This creates a new dict and assigns it to popcount, throwing away your previous data.
What you want to do is add an entry to your dict instead:
popcount[datid] = int(dat4)
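For reference, here is a sketch of the whole loop with that fix applied. It is a variant rather than a literal copy: it folds the if/else into dict.get, which returns 0 for a key seen for the first time. The datafile lines are hypothetical sample data in the question's six-field, colon-separated format.

```python
# Hypothetical sample lines in the question's colon-separated format
datafile = [
    "POP:north:A:3:x:y",
    "POP:south:B:5:x:y",
    "HDR:ignored:line:0:x:y",
    "POP:north:A:4:x:y",
]

popcount = {}
for oneline in datafile:
    if oneline.startswith("POP"):
        dat1, dat2, dat3, dat4, dat5, dat6 = oneline.split(":")
        datid = ":".join([dat2, dat3])
        # Add a new key or update an existing one; never rebind popcount itself
        popcount[datid] = popcount.get(datid, 0) + int(dat4)

print(popcount["north:A"], popcount["south:B"])  # 7 5
```

collections.defaultdict(int) or collections.Counter would work equally well here.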

Dictionary lookup fails

I am writing an Excel VBA program that validates a school course schedule. A key component is a global dictionary object that keeps track of the course number (the key) and the number of times that course is scheduled (the item). I have successfully created and loaded the dictionary. I'm trying to look up the value associated with the course key, but have been unable to do so using the one-line examples I've found on this site. I'd like to use this line of code:
intCourseCnt = gdicCourses("BAAC 100")
or
intCourseCnt = gdicCourses.Item("BAAC 100")
but neither works (actually, the "BAAC 100" part is a string variable, but it won't work even if I hardcode a course). Instead, I have to use the kludgy loop code below to look up the course count:
Private Function Check_Course_Dup_Helper(strCourse As String) As Boolean
    Dim k As Variant
    Check_Course_Dup_Helper = False
    ' Read thru dictionary. Look to see if only 1 occurrence then jump out.
    For Each k In gdicCourses.Keys
        If k = strCourse Then
            If gdicCourses.Item(k) = 1 Then
                Check_Course_Dup_Helper = True
                Exit Function
            End If
            Exit Function
        End If
    Next
End Function
Is there a way to rewrite this so that I can look up the item value without the loop?
Thank you.
Thanks for the prompt replies. Answers below:
David, while the program is running, gdicCourses("BAAC 100") evaluates to "empty", which makes the receiving variable equal to 0. The result is the same if I use the strCourse variable. The dictionary-populating code is shown below. I do not believe it is the problem, because I can correctly access the values elsewhere in the program, where For-Each-Next loops that use a range variable are employed. Whitespace and non-printable characters are not present.
My guess is that I need to use a range to reference the position in the dictionary rather than a string. I've tried pretty much every combination of this that I can think of, but the value is still "empty".
Set gdicCourses = New Scripting.Dictionary
For Each c In Worksheets("Tables").Range("combined_courses").Cells
    If Not (gdicCourses.Exists(c)) Then
        gdicCourses.Add c, (Application.WorksheetFunction.CountIf(Range("MWF_Table_Full"), c) + _
            Application.WorksheetFunction.CountIf(Range("TTh_Table_Full"), c))
    End If
Next

Stata behaviour on macros, different outputs

I have a manual list I created in a macro in Stata, something like
global list1 "a b c d"
which I later iterate through with something like
foreach name in $list1 {
    action
}
I am trying to change this to a database-driven list because the list is getting big and changing quickly. I create a new $list1 with the following commands:
odbc load listitems=items, exec("SELECT items from my_table")
levelsof listitems
global list1=r(levels)
The items in each are the same, but this list seems to be different, and when I have too many items it breaks on the foreach loop with the error
{ required
r(100);
Also, when I run only levelsof listitems I get the output
`"a"' `"b"' `"c"' `"d"'
Which looks a little bit different than the other macros.
I've been stuck on this for a while. Again, it only fails when the number of items becomes large (over 15); any help would be very much appreciated.
Solution 1:
levelsof listitems, clean local(list1)
foreach name of local list1 {
    ...action with `name'...
}
Solution 2:
levelsof listitems, clean
global list1 `r(levels)'
foreach name of global list1 {
    ...action with `name'...
}
Explanation:
When you type
foreach name in $list1 {
then whatever is in $list1 gets substituted inline before Stata ever sees it. If global macro list1 contains a very long list of things, then Stata will see
foreach name in a b c d e .... very long list of things here ... {
It is more efficient to tell Stata that you have a list of things in a global or local macro, and that you want to loop over those things. You don't have to expand them out on the command line. That is what
foreach name of local list1 {
and
foreach name of global list1 {
are for. You can read about other capabilities of foreach in -help foreach-.
Also, you originally coded
levelsof listitems
global list1=r(levels)
and you noted that you saw
`"a"' `"b"' `"c"' ...
as a result. Those are what Stata calls "compound quoted" strings. A compound quoted string lets you effectively nest quoted things. So, you can have something like
`"This is a string with `"another quoted string"' inside it"'
You said you don't need that, so you can use the "clean" option of levelsof to leave the results unquoted. (See -help levelsof- for more info on this option.) Also, you were assigning the returned result of levelsof (which is in r(levels)) to a global macro afterward. It turns out -levelsof- actually has an option named -local()- where you can specify the name of a local (not global) macro in which to put the results directly. Thus, you can just type
levelsof listitems, clean local(list1)
to both omit the compound quotes and to directly put the results in a local macro named list1.
Finally, if you for some reason don't want to use that local() option and want to stick with putting your list in a global macro, you should code
global list1 `r(levels)'
rather than
global list1=r(levels)
The distinction is that the latter treats r(levels) as a function and runs it through Stata's string expression parser. In Stata, strings (strings, not macros containing strings) have a limit of 244 characters. Macros containing strings on the other hand can have thousands of characters in them. So, if r(levels) had more than 244 characters in it, then
global list1=r(levels)
would end up truncating the result stored in list1 at 244 characters.
When you instead code
global list1 `r(levels)'
then the contents of r(levels) are expanded in-line before the command is executed. So, Stata sees
global list1 a b c d e ... very long list ... x y z
and everything after the macro name (list1) is copied into that macro name, no matter how long it is.
