How can I batch format string attributes in a CSV file?

I have a CSV file of around 15,000 rows with 10 columns and no headers. I only need to edit one column, which holds a postcode. It is currently in the format 'AB101AA', which I need to change to 'AB10 1AA'.
First off, is there a way to do this for every row?
It then gets more complicated, because postcodes come in these four formats:
'A1 1AA',
'A10 1AA',
'AB1 1AA' and
'AB10 1AA'.
What I'm trying to do is run through every row, test whether the postcode is already in one of the formats above, and edit it if need be to force that space.
Any help would be much appreciated.
Cheers.

How about opening it in Excel, then
Use a formula to add a column which takes the first LEN(A1)-3 characters, a space, and the last three characters, for example =LEFT(A1,LEN(A1)-3)&" "&RIGHT(A1,3) if the postcode is in column A and has no space yet (copy/paste, or drag the + on the lower right corner of the cell, to make sure the formula is replicated in every row of that new column)
Copy the extra column
Paste the values over the original column
Delete the extra column
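If you would rather script it than use Excel, here is a minimal Python sketch of the same idea. It assumes the four postcode shapes listed in the question are the only ones that occur, and that the postcode sits in the third column; the names format_postcode and fix_column and the sample rows are made up for illustration. Values that do not match are left untouched, which covers the "test the format first" part.

```python
import csv
import io
import re

# Outward code (1-2 letters, 1-2 digits) then inward code (digit + 2 letters),
# i.e. the four shapes from the question, with or without the space.
POSTCODE = re.compile(r"^([A-Z]{1,2}[0-9]{1,2})\s*([0-9][A-Z]{2})$")

def format_postcode(value):
    """Return the postcode with a single space before the last three characters."""
    match = POSTCODE.match(value.strip().upper())
    if match is None:
        return value  # leave anything unrecognised untouched
    return f"{match.group(1)} {match.group(2)}"

def fix_column(rows, column):
    """Yield rows with the postcode in the given zero-based column reformatted."""
    for row in rows:
        row = list(row)
        if column < len(row):
            row[column] = format_postcode(row[column])
        yield row

# Hypothetical sample: postcode in the third column (index 2), no header row.
sample = "1,Smith,AB101AA\n2,Jones,A1 1AA\n3,Brown,AB11AA\n"
reader = csv.reader(io.StringIO(sample))
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(fix_column(reader, 2))
print(out.getvalue())
```

For a real file, swap the two StringIO objects for files opened with open(path, newline="") and adjust the column index to wherever the postcode actually is.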

Related

Excel selective increment

Hi I'm trying to write a script in excel that returns a correlating job serial number (which resides on a different sheet) when a cell is filled with the job name.
=IF(D5="Misc",Jobs!A2,IF(D5=1715,Jobs!A3,IF(D5=1725,Jobs!A4,IF(D5=5640,Jobs!A5,IF(D5=6121,Jobs!A7,IF(D5=6150,Jobs!A8,IF(D5="6161-IDC",Jobs!A10,IF(D5="6161-JM",Jobs!A11,IF(D5=6161,Jobs!A12,IF(D5=6535,Jobs!A14,IF(D5="Hudson",Jobs!A14,IF(D5="Berendo",Jobs!A15,IF(D5="Berendo-Move",Jobs!A16,IF(D5="Bungalos",Jobs!A17,IF(D5="Bungalo",Jobs!A17,IF(D5="Camarillo",Jobs!A18,IF(D5="Indio",Jobs!A19,IF(D5="Lillian",Jobs!A20,IF(D5="6161-Beam",Jobs!A21,IF(D5="6161-Roof",Jobs!A22))))))))))))))))))))
The above script does what I need it to do, the problem is I need it copied to 30+ rows with ONLY the "D5" value incrementing. When I hover in the cell corner and drag down to new cells it increments other references besides the "D5" value.
I did a search and replace and manually copied the script to each of the 30 cells to get it functional for now, but I'm going to have to do that every time I add a job. I'd like to just add the new job condition and use the fill handle to drag it to all cells. I feel like I need an escape character to limit which cells are being incremented, but I'm not sure what that would be (it's always going to be column "D"; I just need the row incremented to D6, D7, etc.). Thank you in advance for your help!
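A case-sensitive find-and-replace of A with A$ seems adequate. The $ anchors the row number, so the Jobs!A$n references stay fixed while D5 still increments to D6, D7, etc. when you drag the formula down: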
=IF(D5="Misc",Jobs!A$2,IF(D5=1715,Jobs!A$3,IF(D5=1725,Jobs!A$4,IF(D5=5640,Jobs!A$5,IF(D5=6121,Jobs!A$7,IF(D5=6150,Jobs!A$8,IF(D5="6161-IDC",Jobs!A$10,IF(D5="6161-JM",Jobs!A$11,IF(D5=6161,Jobs!A$12,IF(D5=6535,Jobs!A$14,IF(D5="Hudson",Jobs!A$14,IF(D5="Berendo",Jobs!A$15,IF(D5="Berendo-Move",Jobs!A$16,IF(D5="Bungalos",Jobs!A$17,IF(D5="Bungalo",Jobs!A$17,IF(D5="Camarillo",Jobs!A$18,IF(D5="Indio",Jobs!A$19,IF(D5="Lillian",Jobs!A$20,IF(D5="6161-Beam",Jobs!A$21,IF(D5="6161-Roof",Jobs!A$22))))))))))))))))))))
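If you need to redo that replacement whenever a job is added, it can be scripted outside Excel. The sketch below is only an illustration in Python: anchor_rows is a made-up helper, and the formula is a shortened version of the one in the question. It inserts the $ only in references to column A of the Jobs sheet, so D5 is left relative.

```python
import re

def anchor_rows(formula, sheet="Jobs", column="A"):
    """Turn references such as Jobs!A2 into Jobs!A$2 so the row stays fixed on fill-down."""
    pattern = re.compile(rf"({re.escape(sheet)}!{re.escape(column)})(\d+)")
    # "$" has no special meaning in a Python replacement string, so it is inserted literally.
    return pattern.sub(r"\1$\2", formula)

# Shortened version of the formula from the question.
formula = '=IF(D5="Misc",Jobs!A2,IF(D5=1715,Jobs!A3))'
print(anchor_rows(formula))
```

Running it on a formula that is already anchored changes nothing, because A$2 no longer has a digit directly after the column letter.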

Add blank column to query

Is there a way to add a blank column to a query in Query Studio? I tried to use a calculation on an existing column but the only options that I get are for First Characters, Last Characters, Concatenation, and Remove Trailing Spaces. None of these options allow you to enter a decode, case or IF statement.
Any assistance is greatly appreciated. Thanks.
It's a bit of a hack as Query Studio is really all about making it easy to get data and doing anything with layout is really a job for Report Studio, but you can do the following:
a) create a calculated column on a text field. Select 'Concatenation' as the operation and put a space as the preceding text. Click ok
b1) right-click on the new column and select 'Format', then 'Text' and enter 1 for the number of characters
or
b2) create another calculated column from the first calculated column, set it to 'first characters' and enter 1 for the number of characters. The first calculated column can now be deleted.
Both of these approaches will give a column that only contains a single space - not actually blank but close enough for most purposes. The first approach is a little quicker but may result in the text still existing in some output versions (e.g. csv) - I'd need to do more testing to confirm.
The column title can be edited (to be set to blank) by double clicking it, of course.

EXCEL Filter Columns - Containing Numbers Greater Than X

So here's an example of my data in one of the columns
/festivals/upcoming
/win/competitions/572
/latest/reviews/14940 --- THIS ONE TO BE SHOWN
/latest/news/15521
/download-festival-2014/lineup
Is it possible to filter this column so that only the 3rd one down (as commented) is shown? What I need to do is take the last string after the last forward slash and check whether it's a number; if it is a number and it's less than 14650, keep it.
I exported the excel from google analytics so if it's easier to do there, then all good!
Thanks
If the data is in A1 to An, enter this in the first row of a spare column and drag it down:
=IF(IFERROR(VALUE(TRIM(CLEAN(TRIM(LEFT(RIGHT(SUBSTITUTE($A1,"/",REPT(" ",99)),(2-COLUMN(A1))*99),99))))),"")>14650,IFERROR(VALUE(TRIM(CLEAN(TRIM(LEFT(RIGHT(SUBSTITUTE($A1,"/",REPT(" ",99)),(2-COLUMN(A1))*99),99))))),""),"")
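The formula takes the text after the last slash, tries to read it as a number, and returns it only when it is above 14650. A rough Python sketch of the same logic may make that easier to follow. The function names are hypothetical, and the "greater than" comparison follows the formula and the question title; flip it if you actually want the smaller values.

```python
def trailing_number(path):
    """Return the integer after the last '/', or None if that segment is not a number."""
    segment = path.strip().rsplit("/", 1)[-1]
    return int(segment) if segment.isdecimal() else None

def keep_rows(paths, threshold=14650):
    """Keep paths whose trailing number is greater than the threshold, as in the formula."""
    kept = []
    for path in paths:
        number = trailing_number(path)
        if number is not None and number > threshold:
            kept.append(path)
    return kept

# Sample rows taken from the question.
paths = [
    "/festivals/upcoming",
    "/win/competitions/572",
    "/latest/reviews/14940",
    "/latest/news/15521",
    "/download-festival-2014/lineup",
]
print(keep_rows(paths))
```

With this threshold both /latest/reviews/14940 and /latest/news/15521 pass, since both numbers are above 14650.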

Remove Duplicates with or without sorting

I have a large column of texts (5 digit integers concatenated with two letters, like: 12345AB ) and values (up to 8 digit positive integers, like: 12345678) . The list is around 12,200 total and when I do remove duplicates, it reduces to 7015 total. If I sort the result and then do another remove duplicates, I am left with 6324 entries. On the other hand if I sort first and then do remove duplicates, I am left with 6324 entries.
Is it a common issue that, when numbers and text are mixed, removing duplicates works only after sorting?
I can upload my file if this is not a common issue and is a problem with my file. I'm guessing if the row starts with numbers (text) then the excel search algorithm only goes down the column till such a point that it stops seeing numbers (text) and we miss out on the duplicates that show up later?
I shudder at the thought that I've been using remove duplicates incorrectly all this while.
Please help. Thanks.
EDIT To Include the actual file I am working with:
Link here
Seems like what you want is to ensure that they're all the same type, no? An easy way to coerce a cell to text is:
=A1 & ""
and to a number is:
=A1 * 1
I was able to accomplish this by using the Text to Columns option.
Select the column (B)
Select Text to Columns on the Data tab
Select Delimited and click Next
Click Next again, as there are no delimiters
Under Column data format, select Text
Then remove duplicates
I ran into this issue with VLOOKUP before as well; converting with Text to Columns ensures consistent formatting of all data in the column.
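Both answers come down to making every entry the same type before removing duplicates. As an analogy only (this is Python, not a description of how Excel's Remove Duplicates works internally), the sketch below shows a number and its text form being counted as different values until everything is coerced to text. The sample column and the name dedupe_as_text are made up.

```python
def dedupe_as_text(values):
    """Drop duplicates after coercing every value to text, keeping first occurrences."""
    seen = set()
    unique = []
    for value in values:
        key = str(value).strip()
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

# Hypothetical column: 12345678 appears once as a number and once as text.
column = [12345678, "12345AB", "12345678", "12345AB", 999]
print(len(set(column)))        # 4: 12345678 and "12345678" count as different values
print(dedupe_as_text(column))  # ['12345678', '12345AB', '999']
```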

How do I remove duplicate content within a single Excel cell

I have individual cells in excel with the following content in each of them
http://www.teng.mossdemo.com.au/wp-content/uploads/images/products/m1423.jpg|http://www.teng.mossdemo.com.au/wp-content/uploads/images/products/m1423.jpg
http://www.teng.mossdemo.com.au/wp-content/uploads/images/products/rt2899.jpg|http://www.teng.mossdemo.com.au/wp-content/uploads/images/products/rt2899.jpg
This is one cell in a long row for a dump of data for products within an ecommerce site. A data migration has somehow added the same image more than once to the same product. Each image is separated by the pipe "|" symbol.
I want to search each cell in this column of the sheet and remove the duplicated image reference and the Pipe symbol.
So the examples above become
http://www.teng.mossdemo.com.au/wp-content/uploads/images/products/m1423.jpg
and
http://www.teng.mossdemo.com.au/wp-content/uploads/images/products/rt2899.jpg
The suggested answer of finding the pipe with SEARCH is a good general answer. However, in this instance the source string is always the desired string twice with one pipe in between, so we can just chop it in half with the formula below and drag it down.
=LEFT(A1,(LEN(A1)-1)/2)
In addition to a formula, you can use Data > Text to Columns, which is a good thing to know about. Select the entire column and then open the dialog. In step one choose "Delimited" and in step two choose "Other" and enter the pipe symbol as the delimiter.
When you're finished, delete the first column.
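For cells that might hold more than two copies, or a mix of different images, splitting on the pipe and keeping each distinct part once is a more general version of the same idea. Here is a small Python sketch of that; unique_parts is a hypothetical helper, not something from the answers above.

```python
def unique_parts(cell, delimiter="|"):
    """Split a cell on the delimiter and rejoin the distinct parts in original order."""
    parts = [part.strip() for part in cell.split(delimiter)]
    # dict.fromkeys keeps the first occurrence of each part and preserves order.
    distinct = list(dict.fromkeys(part for part in parts if part))
    return delimiter.join(distinct)

# First example cell from the question.
cell = ("http://www.teng.mossdemo.com.au/wp-content/uploads/images/products/m1423.jpg|"
        "http://www.teng.mossdemo.com.au/wp-content/uploads/images/products/m1423.jpg")
print(unique_parts(cell))
```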
I figured out that this works for some more complex scenarios. I think it should work for this one as well.
=IFERROR(LEFT(C2,(FIND(LEFT(C2,20),C2,2)-2)),C2)
I entered this into D2 and copied it all the way down the column. I then copied and pasted the values back into Column C.
The problem I had was that not all of the cells in my column had duplicate text. Of those that did, the duplications were not delimited by any unique character (there was only a single space in front of each duplication), and the duplicated text was often an incomplete duplication, so the length was not consistently symmetrical.
The "20" is an arbitrary number of characters I picked for excel to use from the front of the text to identify where the text started to repeat. There are enough people here who know excel better than I who can explain what the rest of the formula does. I figured it out by poking around.
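As far as I can tell, the formula looks for the second occurrence of the cell's first 20 characters, keeps everything before it minus the one separator character, and falls back to the original cell when there is no repeat. Below is a Python sketch of that reading; strip_repeat and the sample text are made up for illustration.

```python
def strip_repeat(text, probe_length=20):
    """Cut text just before the second occurrence of its first probe_length characters."""
    probe = text[:probe_length]
    # Excel's FIND(..., C2, 2) starts at the second character, which is index 1 here.
    position = text.find(probe, 1) if probe else -1
    if position == -1:
        return text  # IFERROR branch: no repeat found, keep the cell as is
    # FIND is 1-based, so LEFT(C2, FIND(...)-2) keeps position-1 characters,
    # which also drops the single separator character before the repeat.
    return text[:position - 1]

# Hypothetical cell with an incomplete duplication after a single space.
sample = "Alpha beta gamma delta epsilon Alpha beta gamma delta eps"
print(strip_repeat(sample))
```

A cell shorter than the probe length cannot contain a repeat of itself, so it comes back unchanged, which matches the IFERROR fallback.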
