Need help organizing text in Python 3 - python-3.x

First time here. I need help parsing a text file that I read in with Python.
The text file contains basketball team names, their conferences, and their wins and losses.
Example:
Miami (FL) (Atlantic Coast) 32 5
UCLA (Pac-12) 32 5
Fresno St. (Western Athletic) 25 8
I need each line turned into an object with these fields:
team_name
conf
wins
losses
My problem is the varied format these come in. For instance, "Miami (FL)" is a single name; some names are two words and others are one. Some conferences are one word and others are two, and in the Miami example the name itself contains parentheses. I can't figure out a good way to split the fields apart because the lines are all so different. Any help would be amazing, thanks!
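One possible approach, as a minimal sketch: assume every line ends with the conference in parentheses followed by two integers (wins, then losses), so that everything before that last parenthesized group is the team name. The `Team` class and `parse_team` function are hypothetical names introduced here for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class Team:
    team_name: str
    conf: str
    wins: int
    losses: int

# Assumption: the LAST parenthesized group is the conference, and it is
# followed by exactly two integers. Anything before it is the team name,
# which may itself contain parentheses, as in "Miami (FL)".
LINE_RE = re.compile(
    r"^(?P<team>.+?)\s+\((?P<conf>[^()]+)\)\s+(?P<wins>\d+)\s+(?P<losses>\d+)\s*$"
)

def parse_team(line):
    """Parse one line into a Team, or raise ValueError if it does not fit."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    return Team(
        team_name=match["team"],
        conf=match["conf"],
        wins=int(match["wins"]),
        losses=int(match["losses"]),
    )
```

Because the name group is non-greedy and the pattern is anchored at the end of the line, the regex backtracks until the final parenthesized group lines up with the two trailing numbers. Lines that do not follow this layout raise an error rather than being silently misparsed.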

Related

Determining a value depending on where you are on a list in Excel

I am tracking the progress of students reading books in class. For the number of books they read they get a reward. It is not 1 book = prize, 2 books = prize. Instead there are dead spaces along the way, for example there is no reward for books 3 and 4 but there is for 5 books. I want to be able to input the number of books each student read and have it update as to what the next reward will be. For example:
List one
Name   Books  Next Tier  Prize
Sally  4      5          Candy Store
Luke   1      2          Extra coloring time
Jane   8      10         10 Extra minutes on the playground
And so on
The table for rewards would be
Books  Prize
1      Ribbon
2      Extra coloring time
5      Candy Store
7      Prize bucket
10     10 Extra minutes on the playground
And so on
This is just an abbreviated list. I have used IF statements in combination with VLOOKUP in the past, but the previous list of 18 values was already cumbersome, and the new list has 35 values.
I could still use an IF statement, but with the increased number of values it seemed daunting, and I was hoping there would be an easier way.
Assuming the rewards table sits in F2:G6 (books in column F, prizes in column G), put this in C2 and copy over and down:
=INDEX(F$2:F$6,IFERROR(MATCH($B2,$F$2:$F$6),0)+1)
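For readers who want to see the same "find the last tier reached, then step one row further" idea outside Excel, here is a rough Python sketch. It uses the abbreviated rewards table from the question; `REWARDS` and `next_reward` are hypothetical names, and returning `None` once the top tier has been reached is an assumption of this sketch, not something the formula does.

```python
from bisect import bisect_right

# Rewards table from the question: (books required, prize), sorted by books.
REWARDS = [
    (1, "Ribbon"),
    (2, "Extra coloring time"),
    (5, "Candy Store"),
    (7, "Prize bucket"),
    (10, "10 Extra minutes on the playground"),
]

def next_reward(books_read, rewards=REWARDS):
    """Return (books, prize) for the first tier above books_read, or None."""
    thresholds = [books for books, _ in rewards]
    # bisect_right gives the index of the first threshold greater than
    # books_read, i.e. the row after the last tier already reached.
    i = bisect_right(thresholds, books_read)
    return rewards[i] if i < len(rewards) else None
```

Calling `next_reward(4)` gives `(5, "Candy Store")`, which matches Sally's row in the question.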

Find a value depending on first, what type of event it is, second, what group is playing and, finally, where it is between two dates. Excel attached

I created a dataset to illustrate what I need help with. It's made-up data, so please pardon the strangeness.
I am trying to get a value from Col E "RESULT" and get it into Col K "Result", based on data like:
Tennis 1 01/01/2020
Tennis 2 04/01/2021
Basketball 2 25/05/2018
Squash 2 11/09/2019
Football 1 18/02/2016
The sport can vary, so that's the first variable.
The group that plays can vary, so that's the second variable. (I've only used 2 groups to make the dataset easier to create, but assume there could be up to 6 or 7 groups playing.)
Finally, the third variable is the date, which falls between a start date and an end date (Col C and D).
I could use the lookup function to find where a date fits between two other dates, but with two other variables to match beforehand, I have no idea how to deal with that.
BTW, just a final comment in case anyone asks: I don't know how to use VBA yet :/
Can anyone please help?
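The logic being asked for is to match sport and group exactly, then check that the date lies inside the row's start/end range. It can be sketched in Python as below. This is only an illustration of the three-condition lookup, not an Excel formula. The rows in `TABLE` are hypothetical, `find_result` and `parse_date` are made-up names, and the sketch assumes the sport and group sit in the first two columns and that dates are written dd/mm/yyyy as in the question.

```python
from datetime import datetime

def parse_date(text):
    """Parse a dd/mm/yyyy string into a date."""
    return datetime.strptime(text, "%d/%m/%Y").date()

# Hypothetical rows: (sport, group, start date, end date, result).
TABLE = [
    ("Tennis", 1, parse_date("01/07/2019"), parse_date("30/06/2020"), "A"),
    ("Tennis", 2, parse_date("01/07/2020"), parse_date("30/06/2021"), "B"),
    ("Squash", 2, parse_date("01/01/2019"), parse_date("31/12/2019"), "C"),
]

def find_result(sport, group, when, table=TABLE):
    """Return the result of the first row matching all three conditions."""
    day = parse_date(when)
    for row_sport, row_group, start, end, result in table:
        if row_sport == sport and row_group == group and start <= day <= end:
            return result
    return None  # no row matches
```

With these sample rows, `find_result("Tennis", 1, "01/01/2020")` returns `"A"`, and a combination with no matching row returns `None`.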

How to tidy up postal addresses in an excel sheet?

I'm using a spreadsheet to process information on people as it comes in, including names, addresses, etc.
The addresses arrive in a very messy state and vary a great deal. We also have Eircodes, which are seven characters long but cause further problems: some people enter them as "XXX XXXX", others as "XXXXXXX", and some enter only half of the Eircode. All this information appears in a single cell with no commas to separate anything.
The addresses come in all over the place, examples would include:
12 Example Street Made Up Area Co. Dublin Dublin Dublin 22 D220000
12 Example Street Made Up Area Dublin D 22
12 Example Street Made Up Area Made Up Area Dublin D220000
I want to make them look roughly like this:
12 Example Street Made Up Area Dublin
The County name is different across the addresses but is always included and is always one of the 26 Counties in ROI.
Can anyone help me with VBA code for this, please? I have a few VBA macros to sort out other information, and I've combined them all into a master command with a single button on the main page, so I'd like to add this one to it.
P.S. I'm having a similar issue with phone number prefixes so I'm hoping the logic here will apply there too.

Merging data points in NLTK's ConditionalFreqDist

I have a Prussian newspaper corpus covering the years from 1863 to 1894 and want to plot the word usage over time. The corpus consists of roughly 2400 XML files, one file for each issue. If I plotted the ConditionalFreqDist as it is, I would get a graph with 2400 data points on the x-axis, which renders the graph unreadable.
How can I merge the information concerning the same year, displaying the average usage of each word in my search list u_input? For example: I have 3 files for the year 1863 and am looking for the word 'König' (king), among other search terms. The first file contains 1 mention, the 2nd file 3 and the 3rd file 2. I would like the graph to have only one data point '1863' with the value '2'.
The plotting function:
def _plot_input():
    cfd = nltk.ConditionalFreqDist(
        (target, fileid[:-4])  # takes first 4 characters as label names = year
        for fileid in reader.fileids()  # for all files in directory
        for w in reader.words(fileid)  # for all words in each file
        for target in u_input
        if w.lower().startswith(target)  # includes words like 'königlich' if search term was 'König'
    )
    cfd.plot(title='Word usage over time in Prussian Newspapers')
u_input is a list containing the words I'm analyzing, reader is my corpusreader object, files are named like this yyyy-mm-dd.xml, e.g. "1867-03-06.xml".
Thanks in advance.
Edit:
The quick fix would be to loop over all files, read all files beginning with the same year and write the contents into one single new file for each year.
To extract the year from the filename you must write fileid[:4], not fileid[:-4]. Once you do that, you'll have only as many x positions as there are distinct years in your corpus. This is exactly equivalent to the "quick fix" you suggest.
However, the y values will be totals for the year, not per-file averages within each year as you ask. If this is really what you needed, edit your question to clarify. (I suspect that what you really need is an average over the total number of words in a year; anything else is nonsense, unless all your files are exactly the same size.)
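To make the "average over the total number of words in a year" idea concrete, here is a minimal sketch in plain Python. It does not use NLTK. Instead it assumes you can supply `(fileid, words)` pairs yourself, for example from `reader.fileids()` and `reader.words(fileid)`. The function name `yearly_relative_usage` is hypothetical, and lowercasing the search term before comparing is an assumption of this sketch.

```python
from collections import Counter, defaultdict

def yearly_relative_usage(files, targets):
    """Return {year: {target: hits / total words in that year}}.

    files: iterable of (fileid, words) pairs, with fileid like
    '1867-03-06.xml' so that fileid[:4] is the year.
    """
    hits = defaultdict(Counter)  # year -> target -> number of matching words
    totals = Counter()           # year -> total number of words
    for fileid, words in files:
        year = fileid[:4]
        for w in words:
            totals[year] += 1
            lowered = w.lower()
            for target in targets:
                # Assumption: compare case-insensitively on both sides.
                if lowered.startswith(target.lower()):
                    hits[year][target] += 1
    return {
        year: {t: hits[year][t] / totals[year] for t in targets}
        for year in totals
    }
```

Each year then contributes a single value per search term, which can be plotted with any plotting library in place of `cfd.plot`.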

EXCEL: multiple condition count and data listing/summary

I need to do a multiple condition count and data listing/summary in EXCEL.
I have an EXCEL file with the following:
A         B      C        D        <-- Columns
tennis    Jan 4  Smith    John
tennis    Jan 4  Fellows  Todd
tennis    Jan 4  Biebs    Justin
football  Jul 8  Smith    John
football  Jul 8  Rucker   Pete
tennis    Aug 7  Smith    John
etc...
I have to figure out, by Last name/First Name (col D/col C), which activities each person participated in. The same activity could appear multiple times (e.g. tennis on Jan 4 and Aug 7).
I've researched VLOOKUP and COUNTIF and I can make them work on other files, but I can't get them to work with this one. I know I could sort by names and count manually, but I'm trying to figure out a way to use multiple conditions to get the answer without having to manipulate the file too much, because the file is not mine and someone else enters the information.
Any help is appreciated (even if it requires partial manual effort!) Anything is better than manually sorting the file multiple different ways.
Thanks!
-Dan
I suggest:
two extra columns in your source data. One (say wk) with =WEEKNUM(B2), the other (say bod) with =E2&", "&F2 and both copied down to suit.
a single pivot table (the fields can be rearranged to get different 'views' and/or filtered/sorted/grouped to suit)
If you find multiple activities for the same bod for the same wk, then click on the counts that are greater than 1 and the details should help to determine whether adding a flag may be appropriate, to distinguish camps.
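As a rough illustration of what such a pivot computes, the same per-person, per-activity count can be sketched in Python using the sample rows from the question. This is not a replacement for the pivot table, only a way to see the grouping logic. `activities_by_person` is a hypothetical name, and the column order (activity, date, last name, first name) is assumed from the sample data.

```python
from collections import Counter, defaultdict

# Sample rows from the question: (activity, date, last name, first name).
ROWS = [
    ("tennis", "Jan 4", "Smith", "John"),
    ("tennis", "Jan 4", "Fellows", "Todd"),
    ("tennis", "Jan 4", "Biebs", "Justin"),
    ("football", "Jul 8", "Smith", "John"),
    ("football", "Jul 8", "Rucker", "Pete"),
    ("tennis", "Aug 7", "Smith", "John"),
]

def activities_by_person(rows):
    """Count how many times each person took part in each activity."""
    summary = defaultdict(Counter)
    for activity, _date, last, first in rows:
        # Same "Last, First" key as the helper column in the answer.
        summary[f"{last}, {first}"][activity] += 1
    return summary
```

For the sample rows, `activities_by_person(ROWS)["Smith, John"]` counts tennis twice and football once.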
