How can I find text between two headings from docx in python - python-3.x

I want to extract information from a resume. To do this, I have to identify the headings and take the text underneath each heading.

I think you need to be more specific about your issue and the approach you want to take. For now, for heading extraction, you can first build a corpus from all the headings after reading the document in with Beautiful Soup. Once that corpus is created, you can match it against the headings of the resume and get each section by defining its starting and ending points, and then match skills etc., or whatever you want to do with it.
This is the simplest approach based on your current question. Be more specific so I can guide you with a more precise approach.
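The corpus-matching idea above can be sketched roughly as follows. This is a minimal illustration, not a complete resume parser: the heading list `KNOWN_HEADINGS` and the function name `split_by_headings` are made up for the example, and it assumes the document has already been read into a list of paragraph strings.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical heading corpus; extend it with the headings seen in your resumes.
KNOWN_HEADINGS = {"summary", "experience", "education", "skills", "projects"}


def split_by_headings(
    lines: Iterable[str], headings: Set[str] = KNOWN_HEADINGS
) -> Dict[str, List[str]]:
    """Group paragraph texts under the most recent known heading."""
    sections: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # skip empty paragraphs
        key = text.rstrip(":").lower()
        if key in headings:
            # A known heading starts a new section (its starting point).
            current = key
            sections[current] = []
        elif current is not None:
            # Everything up to the next known heading belongs to the section.
            sections[current].append(text)
    return sections


lines = ["John Doe", "Skills:", "Python", "SQL", "", "Education", "BSc Physics"]
print(split_by_headings(lines))
```

With the python-docx package, the paragraph strings could come from `[p.text for p in Document(path).paragraphs]`. Text that appears before the first recognised heading (the name line here) is ignored in this sketch.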

Related

Fetching data from one sheet, to produce organised summary and display it in sections

I am trying to generate a summary page for a list of lessons from a different sheet.
I'm currently using the formula =UNIQUE(FILTER('Lessons NEW'!$E2:$E1009,(RIGHT(LEFT('Lessons NEW'!$E2:$E1009,5),1)="1")+(LEN('Lessons NEW'!$E2:$E1009)=3))) to do so.
This is displaying my list like so, with the code column being the only really important one, as the rest could be fetched from its result.
This works, but there are two features I want that I've not been able to find a way to do:
1. Split the output into groups. I am after a title for each group/section, and a gap between them too, as in the screenshot here.
2. Arrange it to display in multiple columns (for example, half the results in column B and half in column G). In the process, I'd prefer that the resulting sections (as in point 1) aren't broken, and are kept together instead of being split between columns.
I'm not sure if what I'm asking is too much, or very much doable, but keen for suggestions or ideas if there is a way.
Thanks in advance!
EDIT:
I've updated the formula (above) and added a title to the source column that it's fetching from. It's now producing this.
What I want it to do is break the output up further, for aesthetics and for easy separation when others are looking at it, and bold the title row for each section. (I think I can work out the conditional formatting for the title row...)
This is what I want it to end up looking like.
Google drive link to demo sheet: https://docs.google.com/spreadsheets/d/1yx9LWeV7RHfmlldUpdUZjaU8eVdOsUVaeFrfoBypeDs/edit?usp=sharing

how to add columns with 'filled data' after filling missing values in pandas or python using different techniques?

How to add columns with 'filled data' after filling missing values in pandas or python, using different or several techniques like various statistical techniques or machine learning techniques.
What I want to do is this: after filling the data, say with mean, median or standard deviation values, or with machine learning algorithms such as KNN or XGBoost, I want to append the resulting column(s) to the CSV or Excel file, not below the actual data but at the right-hand end of the file.
For instance, if I've filled the missing data of a particular column using statistical and ML techniques, I want to add those filled values, together with the original values, in a new column named after the original column, followed by an underscore and the technique used to fill it, placed to the right of the existing data. For example, if the column or feature is 'phone', then after filling the missing values the right-hand end must show all the original values plus the values calculated by statistical or ML means, with a column name like "phone_Mean", "phone_interpolation", 'phone_KNN' or 'phone_XGBoost'.
What I've done so far ?
I've tried the approaches from the pandas documentation and Stack Overflow, the ones that rank highly and appear in the top 7 to 10 results on Google or DuckDuckGo, but all in vain.
I've been stuck on this issue for the last few days, which is making it hard to convince my client. So it would be a great help if you could include a code example using pandas or core Python to support your answer.
Here's the snippet of the dataset. Let's say I'm applying techniques on a feature/column named 'phone':
One way is to use pandas, like this:
df_01["phone_mean"] = df_01["phone"].fillna(df_01["phone"].mean())
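The same pattern can be repeated for several techniques, with each result appended as a new right-hand column named after the technique. The sketch below is only an illustration under assumptions: the toy data, the `fillers` dictionary and the output file name are made up, and only simple statistical fills are shown (a model-based fill such as KNN would follow the same naming scheme).

```python
import pandas as pd

# Toy data standing in for the real dataset (values are made up).
df = pd.DataFrame(
    {"name": ["a", "b", "c", "d"], "phone": [10.0, None, 30.0, None]}
)

col = "phone"

# Each entry maps a technique name to a function that returns a filled copy
# of the column; the original column is left untouched.
fillers = {
    "mean": lambda s: s.fillna(s.mean()),
    "median": lambda s: s.fillna(s.median()),
    "interpolation": lambda s: s.interpolate(limit_direction="both"),
}

for technique, fill in fillers.items():
    # Assigning to a new column name appends it at the right-hand end.
    df[f"{col}_{technique}"] = fill(df[col])

print(df)

# Writing the result back out keeps the new columns on the right:
# df.to_csv("filled.csv", index=False)  # hypothetical output path
```

Because every fill is computed from the untouched `phone` column, the techniques do not interfere with each other and can be compared side by side.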

Alteryx Analyse the similarity of the words

I am currently building a chart of the top 10 types of fault. The user keys in what the fault is about, e.g. "light bulb fused". As it is a free-text box, the wording may not be the same each time. Is there any way to make Alteryx understand that some words mean the same thing, allowing me to find the top 10 types of fault? Thank you.
You have a couple of ways. You can use the Fuzzy Match tools in the Join category to sort out slight spelling mistakes. You can find Alteryx examples of Fuzzy Match on Youtube.
You can also use the Record ID followed by Text to Columns (Split to Rows based on space) to get a list of single words.
In what you are trying to do, I would advise building up a bit of a lookup table. You can then use the Find-Replace Tool to Append the Category from the lookup depending on the words that are found.
How far down the above paths you should go depends on the cleanliness of your data and how different each category is.
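Outside Alteryx, the lookup-table idea combined with tolerance for small spelling mistakes can be sketched in Python. This is not how the Alteryx tools are implemented; it is a rough stand-in using the standard library, and the lookup table, cutoff value and sample faults are all made up for illustration.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical lookup table: keyword -> fault category.
LOOKUP = {
    "bulb": "Lighting",
    "light": "Lighting",
    "leak": "Plumbing",
    "pipe": "Plumbing",
}


def categorize(description, lookup=LOOKUP, cutoff=0.75):
    """Return the category of the first word that closely matches a keyword."""
    for word in description.lower().split():
        # get_close_matches tolerates small typos such as "lihgt" for "light".
        match = get_close_matches(word, list(lookup), n=1, cutoff=cutoff)
        if match:
            return lookup[match[0]]
    return "Uncategorized"


faults = ["light bulb fused", "Lihgt not working", "pipe leaking", "door jammed"]

# Count the categories and keep the ten most common ones.
top = Counter(categorize(f) for f in faults).most_common(10)
print(top)
```

Entries that land in "Uncategorized" show which keywords are still missing from the lookup table.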

excel vba Delete entire row if cell contains the GREP search

I have a single column of text in Excel that is to be used for translating into foreign languages. The text is automatically generated from an InDesign file. I would like to clean it up for the translator by removing rows that simply contain a number ("20", "34.5", etc.) or a measurement ("5mm", "3.5 µm", etc.). I've found many posts (see link below) on how to remove a row containing a specific string, but none that use search patterns, such as those I typically use with GREP searches: "\d+" and "\d.\d µm"
How would I do this? I am on a Mac (macOS), if that helps.
Note that I would need to delete the row if the cell only contains a number or a measurement, not if the number is contained within a phrase, sentence, or paragraph, etc.
https://stackoverflow.com/a/30569969
It may not be what you are looking for, but how about just sorting the column and remove the rows starting with numbers? It is a manual approach but from what I understand this translation process only happens from time to time. Am I right?
I see two possible issues in your question:
How to work with regular expressions in Excel?
How to delete rows in a loop?
Let me start with the second question: when you want to create a for-loop in order to remove items from a list, you MUST start at the end and go back to the beginning (it's a beginner's trick, but a lot of people trip over it).
About the first question: this is a very useful post about this subject, it's too large to even give a summary here.
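Both points, matching whole cells against a pattern and deleting from the end backwards, can be illustrated with a short Python sketch. It is not VBA and does not touch Excel; the unit list in the pattern and the sample cells are assumptions made for the example.

```python
import re

# Matches a cell holding only a number, optionally followed by a unit.
# The unit list is an assumption; extend it to the units in your file.
NUMBER_ONLY = re.compile(r"\s*\d+(?:\.\d+)?\s*(?:mm|cm|µm|m)?\s*")


def drop_numeric_rows(cells):
    """Return a copy of cells without number-only or measurement-only rows."""
    rows = list(cells)
    # Walk from the last index to the first so that a deletion never shifts
    # the position of a row that still has to be checked.
    for i in range(len(rows) - 1, -1, -1):
        # fullmatch requires the WHOLE cell to match, so a number inside
        # a phrase or sentence does not trigger a deletion.
        if NUMBER_ONLY.fullmatch(rows[i]):
            del rows[i]
    return rows


cells = ["Install the filter", "20", "34.5", "5mm", "3.5 µm", "Use 5mm screws"]
print(drop_numeric_rows(cells))
```

The row "Use 5mm screws" survives because the pattern has to cover the entire cell, which is the behaviour the question asks for.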

Replacing numeric values in Excel sheet with text values from other sheet

I am using SurveyMonkey for a questionnaire. Most of my data has a regular scale from 0-6, and additionally an "Other" option that people can use in case they choose not to answer the item. However, when I download the data, SurveyMonkey automatically assigns a value of 0 to that no-answer category, and it appears this can't be changed.
This leads to me not knowing whether a zero in my numeric dataset actually means zero or just a participant choosing not to answer the question. I can only figure that out by looking at another file that includes the labels of the participants' answers (all answers are provided by the corresponding labels, so this data file misses all non-labeled answers...).
This leads me to my problem: I have two Excel files of the same size. I need a way to find certain values in one dataset (text values, scattered randomly over the dataset) and replace the corresponding numeric values in the other dataset (at the same position in the dataset) with those values.
I thought it would just be possible to find all values and copy paste in the same pattern, but I cannot seem to find a way to do that. I feel like I am missing an obvious solution, but after searching for quite a while I really could not find an answer to my specific question.
I have never worked with macros or more advanced excel programming before, but have a bit of knowledge about programming in itself. I hope I explained this well, I would be very thankful for any suggestions or scripts that could help me out here!
Thank you!
Alex
I don't know how your Excel file is organised, but if it's like the legacy Condensed format, all you should need to do is to select the column corresponding to a given question (if that's what you have), and search and replace all 0 (match entire cell) with the text you want.
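If a script is an option, the position-by-position replacement described in the question can be sketched with pandas. This is only an illustration under assumptions: the two small frames stand in for the numeric and label exports, and "Other" is assumed to be the label of the no-answer option. Both frames must have the same shape, index and column names.

```python
import pandas as pd

# Toy stand-ins for the two exports (same shape, same column names).
# With real files these could come from pd.read_excel(...).
numeric = pd.DataFrame({"q1": [0, 3, 0], "q2": [6, 0, 2]})
labels = pd.DataFrame(
    {"q1": ["Other", "Often", "Never"], "q2": ["Always", "Other", "Rarely"]}
)

NO_ANSWER = "Other"  # assumed label of the no-answer option

# Wherever the label file says "Other", overwrite the numeric value at the
# same position; all other cells keep their original number.
merged = numeric.astype(object).mask(labels == NO_ANSWER, NO_ANSWER)

print(merged)
```

After this step a remaining 0 is a real zero, and the no-answer cells are clearly marked.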
