Excel file clean up - excel

I have an exported Excel file in which sentences are repeated with no space or period between the copies. Is there any way to clean this up by removing the repeated sentence without doing it manually? Here is a sample:
Integrase, superantigen-encoding pathogenicity islands SaPIIntegrase, superantigen-encoding pathogenicity islands SaPI

If your sentence is repeated only once, then this will do:
=LEFT(A1,LEN(A1)/2)

Put the following formula in cell B1
=LEFT(A1, LEN(A1)/2)
Then select B1 and double-click the fill handle (the little black box in the bottom-right corner). If I understood your problem correctly, you now have "single instances" in column B. Next, select all of column B, copy, and do Paste Special > Values into column A. Finally, delete column B.
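The formula above halves every cell, whether or not the cell really contains a doubled sentence. As a rough illustration of the same idea with a safety check, here is a minimal Python sketch (not Excel; the function name `remove_doubled` is made up for this example) that only halves a string when its two halves are identical:

```python
def remove_doubled(text: str) -> str:
    """Return the first half of text if text is one string repeated twice."""
    half, remainder = divmod(len(text), 2)
    # Only cut when the length is even and both halves match exactly.
    if remainder == 0 and text[:half] == text[half:]:
        return text[:half]
    return text  # not an exact doubling: leave the value unchanged


sample = ("Integrase, superantigen-encoding pathogenicity islands SaPI"
          "Integrase, superantigen-encoding pathogenicity islands SaPI")
print(remove_doubled(sample))
```

With the sample from the question this prints the sentence once; a value that is not an exact doubling is returned as it was.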

Related

Add blank column to query

Is there a way to add a blank column to a query in Query Studio? I tried to use a calculation on an existing column but the only options that I get are for First Characters, Last Characters, Concatenation, and Remove Trailing Spaces. None of these options allow you to enter a decode, case or IF statement.
Any assistance is greatly appreciated. Thanks.
It's a bit of a hack as Query Studio is really all about making it easy to get data and doing anything with layout is really a job for Report Studio, but you can do the following:
a) create a calculated column on a text field. Select 'Concatenation' as the operation and put a space as the preceding text. Click ok
b1) right-click on the new column and select 'Format', then 'Text' and enter 1 for the number of characters
or
b2) create another calculated column from the first calculated column, set it to 'first characters' and enter 1 for the number of characters. The first calculated column can now be deleted.
Both of these approaches will give a column that only contains a single space - not actually blank but close enough for most purposes. The first approach is a little quicker but may result in the text still existing in some output versions (e.g. csv) - I'd need to do more testing to confirm.
The column title can be edited (to be set to blank) by double clicking it, of course.

How to apply conditional formatting in excel for multiple text conditions

I am trying to delete a large number of cases (Tweets) in excel based on certain words. Only one word has to be present for me to delete it.
example:
blue big bird
orange bird flies
elephant is angry
cool cat in tree
List of words I want to delete on: bird, blue and cat. The function should therefore delete rows 1, 2 and 4, whether all of the words are present or only one or two. Currently I only know how to format based on one word, but I have roughly 50 words per file to filter on, so a function would save a lot of time. I am not sure which function works for this. I already have a list of the words I want to delete on in another spreadsheet.
The formula you are looking for is
=IF(SUM(COUNTIF(B2,"*"&{"cool","orange"}&"*"))>0,B2,"")
where B2 is a cell in the row with your values (e.g. 1. blue big bird).
Apply this for every cell and you get the cell value for every hit and "" for no hit.
If you already have a list of words in a spreadsheet, you can assign that list a range name, since 50 words are a bit much for a formula and hard to maintain.
Consider the following screenshot. The highlighted range has the range name TriggerWords.
The formula in cell B1 is
=IF(SUM(COUNTIF(A1,"*"&TriggerWords&"*"))>0,"",A1)
which is an array formula that must be confirmed with Ctrl-Shift-Enter. Then copy down.
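For readers who want to see the matching logic outside Excel, here is a small Python sketch of what the TriggerWords formula does, assuming plain case-insensitive substring matching (which is how the "*"&word&"*" wildcard pattern in COUNTIF behaves). The names `keep_row`, `rows` and `trigger_words` are made up for this illustration:

```python
def keep_row(text, trigger_words):
    """Return True when none of the trigger words occurs anywhere in text."""
    lowered = text.lower()
    return not any(word.lower() in lowered for word in trigger_words)


rows = ["blue big bird", "orange bird flies", "elephant is angry", "cool cat in tree"]
trigger_words = ["bird", "blue", "cat"]

kept = [row for row in rows if keep_row(row, trigger_words)]
print(kept)  # ['elephant is angry']
```

Because this is substring matching, a trigger word such as "cat" would also hit "category", just as the wildcard pattern does.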

Excel macro? formula? for duplicate lines

I have an excel file with patient ID entries and date of blood draw and which tubes were drawn on that date. The way it's currently set up, there's a new line entry for each date (regardless of whether the patient already exists).
How can I easily "transpose" (correct word?) the spreadsheet so that there is only one line per unique patient and the multiple dates are converted into additional columns instead of duplicate records?
As suggested by @chancea, you might apply a PivotTable:
If you want actual dates in each row, rather than a count indicating which column is relevant, then enter the following in E11 (in the example) and copy it across and down to suit:
=IF(ISNUMBER(E3),E$2,"")
To compact such results, select them (D11:L13 in the example), Copy, then Paste Special..., Values (to D16). Then select E16:L18 and use HOME > Editing > Find & Select, Replace: Replace with: z, Replace All, OK; then Find what: z, Replace with: nothing, Replace All, OK. Follow this with Find & Select, Go To Special..., check Blanks (only), then Delete..., check Shift cells left, OK.
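The end result of the PivotTable plus compaction steps is one row per patient with that patient's dates packed to the left. A minimal Python sketch of that reshaping follows; the patient IDs and dates are invented sample data, the tube information from the question is left out, and `one_row_per_patient` is a hypothetical name:

```python
from collections import defaultdict


def one_row_per_patient(records):
    """Group (patient_id, draw_date) pairs into one row per patient."""
    grouped = defaultdict(list)
    for patient_id, draw_date in records:
        grouped[patient_id].append(draw_date)
    # ISO-formatted date strings sort chronologically as plain text.
    return [[patient_id, *sorted(dates)] for patient_id, dates in grouped.items()]


# Invented sample data: one record per blood draw.
records = [
    ("P001", "2014-01-05"),
    ("P002", "2014-01-07"),
    ("P001", "2014-02-10"),
]

for row in one_row_per_patient(records):
    print(row)
```

Each printed row starts with the patient ID, followed by as many date columns as that patient has draws.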

Formula for excel

Please help me find an Excel formula that takes all the words in a text (for example, the text in column A) and lists each word, without repeats, in column B.
For example,
Column A
Text
Although simplicity is a virtue, theories regarding pedagogy do not work in practice if they are black and white. To say that the best way to teach is only to praise positive actions and to ignore negative ones is like saying that strawberries reduce one’s risk for cancer so people should cut apples out of their diet and only eat strawberries. In both situations, there does not have to be a choice.
Column B - Words from text
Although
simplicity
is
a
virtue,
theories
regarding
pedagogy
do
not
work
in
practice
if
they
are
black
and
white.
To
say
that
the
best
way
to
teach
is
only
to
praise
positive
actions
and
to
ignore
negative
ones
is
like
saying
that
strawberries
reduce
one’s
risk
for
cancer
so
people
should
cut
apples
out
of
their
diet
and
only
eat
strawberries.
In
both
situations,
there
does
not
have
to
be
a
choice.
This is a rather complex thing for a single formula, so here's a method.
Part 1: splitting a text into single words:
A1: your text
A3: =SUBSTITUTE(A1,",","") .... removing commas
A5: =SUBSTITUTE(A3,".","") .... removing full stops (repeat this for any other punctuation you might have)
A8: constant value 0
A9: =FIND(" ",$A$5,A8+1) .... find the first blank in $A$5 after the position indicated by the cell above .... copy this formula down until you get the first #VALUE error
B9: =MID($A$5,A8+1,A9-A8-1) .... extract the word between the previous and this blank position .... copy this formula down until you get the first #VALUE error
When you are happy with your split list, copy the list, paste it as values, and add some headers.
Part 2: finding unique words:
You need to find each unique word exactly once. A method strictly without VBA would consist of the following:
sort the text in column B ascending
enter in C8: =IF(B8=B7,C7+1,1) and copy down to the end of the list ... this creates a running number that starts at 1 and keeps incrementing as long as the word remains the same
autofilter column C for value = 1 ... this will display the first occurrence of each word
copy / paste the filtered list to wherever you want to store it for further processing ... I recommend a sheet different from your raw data
You can restore the original sort order of the result by sorting on the numeric values in column A.
As you can see in the example of the words "in" and "to", this method is case insensitive. A limitation is a possible false separation between "ones" and "one's" ... this needs to be decided.
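The two parts above (split into words, then keep each word once, ignoring case) can be sketched in a few lines of Python for comparison. This is only an illustration, not a translation of the worksheet: it strips punctuation from the ends of each word rather than substituting it out everywhere, it keeps words in order of first appearance instead of sorting, and `unique_words` is a made-up name:

```python
import string


def unique_words(text):
    """Return each word of text once, case-insensitively, in first-seen order."""
    seen = set()
    result = []
    for raw in text.split():
        word = raw.strip(string.punctuation)  # drop leading/trailing punctuation
        key = word.lower()                    # compare case-insensitively
        if word and key not in seen:
            seen.add(key)
            result.append(word)
    return result


text = "To say that the best way to teach is only to praise"
print(unique_words(text))
# ['To', 'say', 'that', 'the', 'best', 'way', 'teach', 'is', 'only', 'praise']
```

Since only the ends of each word are stripped, an apostrophe inside a word such as "one's" is kept, so "ones" and "one's" stay distinct here.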
You can try this formula:
=TRIM(MID(SUBSTITUTE($A$1;" ";REPT(" ";LEN($A$1)));1+(ROW(A1)-1)*LEN($A$1);LEN($A$1)))
Assuming the text is in A1, enter the formula in B1 and copy it down until you reach the last word.
Depending on your regional settings, you may need to replace ";" with ",".
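The formula works by replacing every space with a run of spaces as long as the whole text, so that the n-th word is guaranteed to sit somewhere inside the n-th chunk of that width; TRIM then removes the padding. A Python sketch of the same trick, with the hypothetical name `nth_word` and `n` playing the role of ROW(A1), might look like this:

```python
def nth_word(text, n):
    """Return the n-th word (1-based) of text using the padding trick."""
    width = len(text)
    # Mirror SUBSTITUTE(text, " ", REPT(" ", LEN(text))).
    padded = text.replace(" ", " " * width)
    # Mirror MID(padded, 1 + (n-1)*width, width), converted to 0-based slicing.
    start = (n - 1) * width
    return padded[start:start + width].strip()  # strip() stands in for TRIM


text = "black and white"
print([nth_word(text, n) for n in range(1, 5)])
# ['black', 'and', 'white', '']
```

Past the last word the result is an empty string, which matches the blank cells you get when the formula is copied down too far.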

How do I remove duplicate content within a single Excel cell

I have individual cells in excel with the following content in each of them
http://www.teng.mossdemo.com.au/wp-content/uploads/images/products/m1423.jpg|http://www.teng.mossdemo.com.au/wp-content/uploads/images/products/m1423.jpg
http://www.teng.mossdemo.com.au/wp-content/uploads/images/products/rt2899.jpg|http://www.teng.mossdemo.com.au/wp-content/uploads/images/products/rt2899.jpg
This is one cell in a long row of a data dump for products within an ecommerce site. A data migration has somehow added the same image more than once to the same product. Each image is separated by the pipe "|" symbol.
I want to search each cell in this column of the sheet and remove the duplicated image reference and the Pipe symbol.
So the examples above become
http://www.teng.mossdemo.com.au/wp-content/uploads/images/products/m1423.jpg
and
http://www.teng.mossdemo.com.au/wp-content/uploads/images/products/rt2899.jpg
The suggested answer of finding the pipe with SEARCH is a good general answer. However, in this instance the source string is always twice the length of the desired result plus one character for the pipe, so we can just chop it in half with the formula below and drag it down.
=LEFT(A1,(LEN(A1)-1)/2)
In addition to a formula, you can use Data > Text to Columns, which is a good thing to know about. Select the entire column and then open the dialog. In step one choose "Delimited" and in step two choose the pipe symbol:
When you're finished, delete the first column.
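Both approaches above rely on the pipe as a delimiter. As an illustration of the same idea outside Excel, here is a short Python sketch (the function name `dedupe_pipe_list` is made up) that splits a cell on the pipe and keeps each entry once, in its original order:

```python
def dedupe_pipe_list(cell):
    """Split cell on '|' and rejoin it with duplicate entries removed."""
    # dict.fromkeys keeps the first occurrence of each entry, in order.
    return "|".join(dict.fromkeys(cell.split("|")))


url = "http://www.teng.mossdemo.com.au/wp-content/uploads/images/products/m1423.jpg"
print(dedupe_pipe_list(url + "|" + url))
```

For the sample cell this prints the URL once. Unlike halving the string, it would also leave a product with several different images intact, which may or may not be what you want.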
I figured out that this works for some more complex scenarios. I think it should work for this one as well.
=IFERROR(LEFT(C2,(FIND(LEFT(C2,20),C2,2)-2)),C2)
I entered this into D2 and copied it all the way down the column. I then copied and pasted the values back into Column C.
The problem I had was that not all of the cells in my column had duplicate text. Of those that did, the duplications were not delineated by any unique character (There was a single space in front of each duplication.), and the duplicated text was often an incomplete duplication so the length was not consistently symmetrical.
The "20" is an arbitrary number of characters I picked for excel to use from the front of the text to identify where the text started to repeat. There are enough people here who know excel better than I who can explain what the rest of the formula does. I figured it out by poking around.
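As far as I can tell, the formula takes the first 20 characters of the cell, looks for where they appear again starting from the second character, and keeps everything before that point minus the separating space; IFERROR returns the cell unchanged when no repeat is found. A Python sketch of that reading follows (the name `cut_at_repeat` and the sample text are made up for illustration):

```python
def cut_at_repeat(text, prefix_len=20):
    """Cut text just before the point where its opening characters repeat."""
    prefix = text[:prefix_len]
    # Search from the second character, like FIND(..., C2, 2).
    pos = text.find(prefix, 1)
    if pos == -1:
        return text  # no repeat found: same effect as the IFERROR fallback
    # Drop the repeat and the single character (a space) in front of it.
    return text[:pos - 1]


first = "Lorem ipsum dolor sit amet, first part"
# A partial duplication, separated from the original by one space.
print(cut_at_repeat(first + " " + first[:30]))
```

This prints the text once even though the duplicate is incomplete, which is the case the halving formula cannot handle.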
