Remove list of strings from list of strings - string

I have a long list of strings (the main list) and a short list of strings. The short list is an exclude list, and I want to remove all occurrences of its elements from the main list.
I've found these two ways of doing it, but neither of them seems to work:
val fnrliste: MutableList<String> = ArrayList()
val excludeList = listOf("28030140259", "12050101833", "21089233132", "12050101833")
//Alternative 1:
for (fnr in excludeList) { fnrliste.remove(fnr) }
//Alternative 2:
fnrliste.removeAll(excludeList)
With both alternatives, some of the excluded strings are still there when I list the contents of fnrliste.
As can be seen in this screenshot, some entries have been removed (different result with the two methods), but the first entry in the exclude list is still present:
What am I missing here?

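It looks like your fnrliste contains duplicates, which is why the two alternatives give different results: remove(fnr) removes only the first occurrence of fnr, while removeAll(excludeList) removes every occurrence. Also, as far as I can tell from your screenshot, in the removeAll case the first element is "28030140259 " (with a trailing space) and not "28030140259", so it does not match the exclude list and the behavior is correct.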

Related

Python: (partial) matching elements of a list to DataFrame columns, returning entry of a different column

I am a beginner in python and have encountered the following problem: I have a long list of strings (I took 3 now for the example):
ENSEMBL_IDs = ['ENSG00000040608',
'ENSG00000070371',
'ENSG00000070413']
which are partial matches of the data in column 0 of my DataFrame genes_df (first 3 entries shown):
genes_list = (['ENSG00000040608.28', 'RTN4R'],
['ENSG00000070371.91', 'CLTCL1'],
['ENSG00000070413.17', 'DGCR2'])
genes_df = pd.DataFrame(genes_list)
The task I want to perform is conceptually not that difficult: I want to compare each element of ENSEMBL_IDs to genes_df.iloc[:,0] (which are partial matches: each element of ENSEMBL_IDs is contained within column 0 of genes_df, as outlined above). If the element of ENSEMBL_IDs matches the element in genes_df.iloc[:,0] (which it does, apart from the extra numbers after the period, ".XX"), I want to return the corresponding value that is stored in column 1 of the genes_df DataFrame: the actual gene name, 'RTN4R' as an example.
I want to store these in a list. So, in the end, I would be left with a list like follows:
`genenames = ['RTN4R', 'CLTCL1', 'DGCR2']`
Some info that might be helpful: all of the entries in ENSEMBL_IDs are unique, and all of them are for sure contained in column 0 of genes_df.
I think I am looking for something along the lines of:
`genenames = []
for i in ENSEMBL_IDs:
    if i in genes_df.iloc[:,0]:
        genenames.append(# corresponding value in genes_df.iloc[:,1])`
I am sorry if the question has been asked before; I kept looking and was not able to find a solution that was applicable to my problem.
Thank you for your help!
Thanks also for the edit, English is not my first language, so the improvements were insightful.
You can get rid of the part after the dot (with str.extract or str.replace) before matching the values with isin:
m = genes_df[0].str.extract('([^.]+)', expand=False).isin(ENSEMBL_IDs)
# or
m = genes_df[0].str.replace(r'\..*$', '', regex=True).isin(ENSEMBL_IDs)
out = genes_df.loc[m, 1].tolist()
Or use a regex with str.match:
pattern = '|'.join(ENSEMBL_IDs)
m = genes_df[0].str.match(pattern)
out = genes_df.loc[m, 1].tolist()
Output: ['RTN4R', 'CLTCL1', 'DGCR2']
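Note that the boolean mask returns the names in the row order of genes_df. If the result should instead follow the order of ENSEMBL_IDs, a lookup table is one option. This is a minimal sketch in plain Python, assuming the data is still available as the genes_list tuple from the question; name_by_id is an illustrative name, not something from the original code:

```python
ENSEMBL_IDs = ['ENSG00000040608', 'ENSG00000070371', 'ENSG00000070413']
genes_list = (['ENSG00000040608.28', 'RTN4R'],
              ['ENSG00000070371.91', 'CLTCL1'],
              ['ENSG00000070413.17', 'DGCR2'])

# Map each ID without its ".XX" version suffix to the gene name.
name_by_id = {versioned_id.split('.')[0]: name
              for versioned_id, name in genes_list}

# Look up the IDs in the order they appear in ENSEMBL_IDs.
# This raises KeyError for a missing ID; the question states all IDs are present.
genenames = [name_by_id[i] for i in ENSEMBL_IDs]
print(genenames)  # ['RTN4R', 'CLTCL1', 'DGCR2']
```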

Indexing inner lists in "list of lists"

bigList = [list1,list2,list3]
I need a way to get the name of list1 using the zero index.
bigList[0] just gives you all the items in the list.
My original code prints all the items in the lists, while the modified one says the index is out of range. I want it to give me the illness names, not all the symptoms.
Original code
https://i.stack.imgur.com/HDlA9.png
Modified
https://i.stack.imgur.com/5Qdh2.png

Check if string is in list with python

I'm new to python, and I'm trying to check if a String is inside a list.
I have these two variables:
new_filename: 'SOLICITUDES2_20201206.DAT' (str type)
and
downloaded_files:
[['SOLICITUDES-20201207.TXT'], ['SOLICITUDES-20201015.TXT'], ['SOLICITUDES2_20201206.DAT']] (list type)
To check if the string is inside the list, I'm using the following:
if new_filename in downloaded_files:
    print(new_filename,'downloaded')
and I never get inside the if.
But if I do the same, but with hard-coded text, it works:
if ['SOLICITUDES2_20201206.DAT'] in downloaded_files:
    print(new_filename,'downloaded')
What am I doing wrong?
Thanks!
Your downloaded_files is a list of lists. A list can contain anything inside it: numbers, lists, dictionaries, strings, etc. The in operator compares your string with each element of the outer list and only finds equal elements. Here every element is itself a list, so a plain string never matches.
What I suggest is that you collect all the strings into a flat list instead of a list of lists. You can do it with a list comprehension:
downloaded_files = [['SOLICITUDES-20201207.TXT'], ['SOLICITUDES-20201015.TXT'], ['SOLICITUDES2_20201206.DAT']]
downloaded_files_list = [file[0] for file in downloaded_files]
Then, your if statement should work:
new_filename = 'SOLICITUDES2_20201206.DAT'
if new_filename in downloaded_files_list:
    print(new_filename,'downloaded')
Your code is asking if a string is in a list of lists of a single string each, which is why it doesn't find any.
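If you would rather keep the nested structure, the same check can be written without building a flat list first. This is a minimal sketch using the built-in any(); the name is_downloaded is illustrative:

```python
downloaded_files = [['SOLICITUDES-20201207.TXT'],
                    ['SOLICITUDES-20201015.TXT'],
                    ['SOLICITUDES2_20201206.DAT']]
new_filename = 'SOLICITUDES2_20201206.DAT'

# True as soon as one inner list contains the filename.
is_downloaded = any(new_filename in inner for inner in downloaded_files)
if is_downloaded:
    print(new_filename, 'downloaded')
```

Unlike file[0], this also works when an inner list holds more than one filename.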

Is there a simple way to remove sublists

I have a list (rs_data) with sublists obtained from a DataFrame, and some rows of the DataFrame contain multiple elements, like these:
print(rs_data)
rs1791690, rs1815739, rs2275998
rs6552828
rs1789891
rs1800849, rs2016520, rs2010963, rs4253778
rs1042713, rs1042714, rs4994, rs1801253
I want to obtain a list in which each element (rs….) is separated, something like this:
{'rs1791690', 'rs1815739', 'rs227599', 'rs401681', 'rs2180062', 'rs9018'….}
How can I eliminate the sublists, or generate a new list without sublists in which each element is unique?
To generate a new list you could iterate over the old one and throw out the elements you don't like.
Something like this:
for i in rs_data:
    if i in bad_values:
        pass  # do something
    else:
        pass  # do something else
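If you just want to eliminate duplicates, it would be best to use a set. Note that this only works if the elements of rs_data are hashable (strings, for example, but not lists). Like this: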
mynewset = set(rs_data)
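To also split rows that hold several IDs, the rows can be flattened while the set is built. This is a minimal sketch, assuming each entry of rs_data is a comma-separated string as the printed output in the question suggests; if the entries are already lists, iterate over them directly instead of calling split. The name unique_ids is illustrative:

```python
rs_data = [
    "rs1791690, rs1815739, rs2275998",
    "rs6552828",
    "rs1789891",
    "rs1800849, rs2016520, rs2010963, rs4253778",
    "rs1042713, rs1042714, rs4994, rs1801253",
]

# Split every row on commas, strip the surrounding spaces,
# and let the set drop any duplicates.
unique_ids = {rs_id.strip() for row in rs_data for rs_id in row.split(",")}
print(len(unique_ids))  # 13
```

A set has no defined order; use sorted(unique_ids) if a sorted list is needed.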

How to check if there are two identical strings in a list

I'm making a game of hangman. I use a list to keep track of the word that you are guessing for, and a list of blanks that you fill in. But I can't figure out what to do if for example someone's word was apple, and I guessed p.
My immediate thought was to find out whether a letter is in the word twice, figure out where it is, and, when they guess that letter, put it in both the first and second spot where it occurs. But I can't figure out:
How to test if two strings in a list are duplicates, and
If I were to use list.index to find where the duplicate letters are, how do I find both positions instead of just one?
Create a string for your word
Create a string for user input
Cut your string into letters and keep it on a list/array
Get input
Cut input into letters and keep it on another array
Create a string = "--------" as displayed message
Using a for loop, check every position in both arrays and compare them
If yourArray[i] == inputArray[i]
then set displayedString[i] = inputArray[i], display the message, and get another input
If it doesn't match, leave the "-" sign
Display the "---a--b" string
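The steps above can be sketched for the usual hangman case where each guess is a single letter. This is a minimal sketch rather than the answerer's exact method; the function name reveal and the variable names are illustrative:

```python
def reveal(word, displayed, guess):
    """Return a copy of displayed with every position where word has guess filled in."""
    return [letter if letter == guess else shown
            for letter, shown in zip(word, displayed)]

word = list("apple")
displayed = ["-"] * len(word)

# Guessing "p" fills in both positions at once.
displayed = reveal(word, displayed, "p")
print("".join(displayed))  # -pp--
```

Because every position is compared, repeated letters need no special handling.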
One way to do it would be to go through the list one by one and check if something comes up twice.
def isDuplicate(myList):
    a = []
    index = 0
    for item in myList:
        if type(item) == str:
            if item in a:
                return index
            else:
                a.append(item)
        index += 1
    return False
This function goes through the list and adds what it has seen so far into another list. Each time it also checks if the item it is looking at is already in that list, meaning it has already been seen before. If it gets through the whole list without any duplicates, it returns False.
It also keeps track of the index it is on, so it can return that index if it does find a duplicate.
Alternatively, if you want to find multiple occurrences of a given string, you would use the same structure with some modifications.
def isDuplicate(myList, query):
    index = 0
    foundIndexes = []
    for item in myList:
        if item == query:
            foundIndexes.append(index)
        index += 1
    return foundIndexes
This would return a list of the indexes of all instances of query in myList.
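The same lookup can be written more compactly with the built-in enumerate(), which yields each index together with its item. A minimal sketch; find_all is an illustrative name:

```python
def find_all(items, query):
    """Return the indexes of every item equal to query."""
    return [index for index, item in enumerate(items) if item == query]

print(find_all(list("apple"), "p"))  # [1, 2]
```

Unlike list.index, which stops at the first match, this returns every position, and an empty list when there is no match.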
