Web scraping in VBA and Excel

I'm looking to download data tables from websites into Excel and pull specific pieces of data into a separate worksheet to create a database. I'm having trouble parsing numbers that are imported into one cell. For example, the numbers "-7 -110", written exactly like that on the website, are entered into a single cell as "=-7-110", so the cell displays "-117". When the cell is selected, the original =-7-110 appears in the formula bar, but I don't know how to either a) import the data as-is and turn off Excel's automatic formatting, or b) grab the cell's original contents rather than the result of the formula Excel built from them.
Any help is appreciated - thanks!
Jason

Microsoft offers three methods to counteract built-in number formatting:
Avoiding Automatic Number Formatting
If you want to type a value such as 10e5, 1 p, or 1-2, and you do not want the value to be converted to a built-in number format, type the number as a text value. To type a number as a text value, use any of the appropriate methods below.
Method 1
Place a space at the beginning of the entry.
NOTE: This method does not work if the entry resembles a number formatted in scientific notation. For example, typing 1e9 results in a scientific number.
Method 2
Select a range of cells, and then click Cells on the Format menu.
Click the Number tab.
Click Text, and then click OK.
This method allows you to type data in the selected cells as text. You must perform these steps before you type the numbers in the cells.
Method 3
Precede the entry with an apostrophe.
For example, type the following: '1 p

Adam has covered the options if you're making changes within the workbook.
Of course, you can also prepare the area into which the data is being imported directly from VBA, using the .NumberFormat property.
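To set the NumberFormat property of a range or cell(s) to Text, use
.NumberFormat = "@"
For example
Range("C:C").NumberFormat = "@"
will set all cells in column C to Text format. Set the format before the data is written; changing it afterwards does not convert values that are already stored.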
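A minimal sketch of this from the scraping side, assuming the scraped strings are already in VBA variables. `WriteAsText` and `WriteWithApostrophe` are hypothetical helper names: the first applies the Text format before writing, the second uses the apostrophe prefix (Method 3) and leaves the cell's format alone.

```vba
Option Explicit

' Hypothetical helper: store a scraped string verbatim, without Excel parsing it.
' The Text format must be applied BEFORE the value is written.
Public Sub WriteAsText(ByVal target As Range, ByVal rawText As String)
    target.NumberFormat = "@"          ' "@" is the Text number format
    target.Value = rawText             ' "-7 -110" stays the string "-7 -110"
End Sub

' Hypothetical alternative (Method 3): keep the cell's format and prefix an
' apostrophe. Excel treats the apostrophe as a text prefix; it is not part of .Value.
Public Sub WriteWithApostrophe(ByVal target As Range, ByVal rawText As String)
    target.Value = "'" & rawText
End Sub
```

Usage: `WriteAsText Range("C1"), "-7 -110"` inside the loop that writes the scraped cells.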
As for retrieving "-7-110" if it was entered into a cell with General format (and the cell is displaying the result of the formula, i.e. -117), you can access the .Formula property of the cell.
For example, if the above formula was entered into cell C1, the VBA code below
Range("C1").Formula
or equivalently
Cells(1, 3).Formula
would return the string "=-7-110".
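If the text was already turned into a formula, a hypothetical helper like the one below could recover the two original numbers from `.Formula`. It is a sketch that assumes the formula consists only of plain numbers joined by + or - (no cell references, no scientific notation). Since the space between the numbers is gone, it splits at each sign that follows a number.

```vba
Option Explicit

' Hypothetical helper: split a formula string such as "=-7-110" back into the
' numbers it was built from. Assumes only plain numbers joined by + or -.
Public Function SplitSignedNumbers(ByVal formulaText As String) As Variant
    Dim s As String, ch As String, token As String
    Dim parts As New Collection
    Dim result() As Double
    Dim i As Long

    s = Trim$(formulaText)
    If Left$(s, 1) = "=" Then s = Mid$(s, 2)   ' drop the "=" Excel added

    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        If (ch = "-" Or ch = "+") And Len(token) > 0 Then
            parts.Add Val(token)               ' a sign after a number starts the next one
            token = ch
        ElseIf ch <> " " Then
            token = token & ch
        End If
    Next i
    If Len(token) > 0 Then parts.Add Val(token)

    If parts.Count = 0 Then
        SplitSignedNumbers = Array()           ' nothing to split
        Exit Function
    End If

    ReDim result(0 To parts.Count - 1)
    For i = 1 To parts.Count
        result(i - 1) = parts(i)
    Next i
    SplitSignedNumbers = result
End Function
```

Usage: `v = SplitSignedNumbers(Range("C1").Formula)` gives `v(0) = -7` and `v(1) = -110`.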

Related

Retroactively stop Excel from interpreting cell values as dates

I have a series of spreadsheets filled by farmers that list the composition of NPK fertiliser applied to their fields.
In some of these something happened with the cell format (that was originally set to text) and now the fertiliser composition is read by Excel as a date.
Example of text entered in the cell (N:P:K):
25:00:14
Excel now reads this as:
01/01/1900 01:00:14
When looking at the cell, I see the correct value (25:00:14). However, when importing the data into Python, or when copying the value to a different cell, the date (01/01/1900 01:00:14) gets exported instead.
I have tried: changing the cell type to text, copying the contents of the cell into another cell set as text, replacing ':' with a different special character. Nothing along these lines seems to work.
I have a few hundred of these entries, so any ideas on how to avoid re-entering them manually would be greatly appreciated!
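One way to avoid re-entering them, sketched under assumptions: the affected cells hold the time value Excel parsed (25:00:14 is stored as 25 h 0 min 14 s, i.e. 1 day plus 3614 seconds, which a date format shows as 01/01/1900 01:00:14), every numeric cell in the range you pass is one of these, and two-digit components are wanted. `DurationToNPK` and `RepairNPK` are hypothetical names. A worksheet-only alternative is a helper column with =TEXT(A1,"[hh]:mm:ss"), since the [h] elapsed-time format does not wrap at 24 hours.

```vba
Option Explicit

' Hypothetical helper: rebuild "N:P:K" text from a value Excel parsed as a time.
' Excel stores times as fractions of a day, so 25:00:14 became
' 1 day + 3614 seconds.
Public Function DurationToNPK(ByVal serial As Double) As String
    Dim totalSec As Long
    totalSec = CLng(serial * 86400#)        ' round to the nearest whole second
    DurationToNPK = Format$(totalSec \ 3600, "00") & ":" & _
                    Format$((totalSec Mod 3600) \ 60, "00") & ":" & _
                    Format$(totalSec Mod 60, "00")
End Function

' Hypothetical driver: convert every numeric constant in a range back to text in place.
Public Sub RepairNPK(ByVal rng As Range)
    Dim c As Range
    Dim fixed As String
    For Each c In rng.Cells
        If Not c.HasFormula And VarType(c.Value2) = vbDouble Then
            fixed = DurationToNPK(c.Value2)  ' Value2 returns the serial, not a Date
            c.NumberFormat = "@"             ' Text format, so the string is not re-parsed
            c.Value = fixed
        End If
    Next c
End Sub
```

Usage: `RepairNPK Worksheets("Sheet1").Range("C2:C500")`, where the sheet name and range are placeholders for the fertiliser column.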

Excel paste special using values only, also copies the data type of the value along with the value into the destination cell

I've been trying to understand Excel cells better, specifically their data types. If anyone is interested in the details, my investigation is in the points below. My conclusions are labelled A to D. I'm really interested in whether anyone has anything to add.
A. Each Excel cell has a property that defines the expected "data type" of the data it will store. The data value stored in a cell also has a "data type" property, which does not have to be the same as the cell's "data type" property. (Data types are General, Text or Date.) (The cell's data type is not the same as the cell's formatting set using "Format Cells" > "Number" tab > "Category", but it is related.)
B. When data is entered into a cell, the data type given to the data value is inherited from the cell's data type (when the data value can be converted into that data type). If the data starts with a ' character, the Text data type is allocated to the value, regardless of the cell's data type.
C. When you use Excel's Copy > Paste (Values Only) to copy the data value from one cell to another, the data type is also copied. (This is a bit nuts, as there is no override for this in Paste Special. You could do with a Paste Special > Raw option.)
Note: If you paste data in from a text document (e.g. 01234), the default paste converts the text 01234 into a number. With Values Only paste, 01234 is pasted as a value with the Text data type, if the cell's data type is Text.
D. Entering =SUM(1,2) into a cell with the Text data type causes the formula to be displayed and not calculated, i.e. it is treated as text data.
Note that this VBA function can be used to convert a number stored in a cell into a value with the Text data type and the same characters. Simply reference the cell you want to convert in the formula. You can then use Copy > Paste Special (Values Only) to move the converted value into a cell so it is stored as a value.
' UDF: returns the referenced value as a String, so the result has the Text data type
Public Function ConvertValueToHaveTextDataType(Avalue As Variant) As String
ConvertValueToHaveTextDataType = CStr(Avalue) ' CStr gives the text form of the value
End Function
Here's why I came to these conclusions.
I've been experimenting and found:
When loading data into worksheet cells with QueryTables, the TextFileColumnDataTypes property can be assigned xlTextFormat, xlGeneralFormat and several date formats. There is no numeric option, so using xlGeneralFormat to load numeric data seems to be "how it's done".
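For the original web-scraping question, the same property can be used to import every column as text. A sketch, assuming the scraped table has first been saved as a comma-delimited file; `ImportCsvAsText` and the file path are hypothetical.

```vba
Option Explicit

' Hypothetical helper: import a comma-delimited file with every column typed
' as Text, so values such as "-7 -110" or "0123" arrive unchanged.
Public Sub ImportCsvAsText(ByVal csvPath As String, ByVal dest As Range, _
                           ByVal columnCount As Long)
    Dim qt As QueryTable
    Dim types() As Variant
    Dim i As Long

    ReDim types(0 To columnCount - 1)
    For i = 0 To columnCount - 1
        types(i) = xlTextFormat                 ' one entry per column
    Next i

    Set qt = dest.Worksheet.QueryTables.Add( _
        Connection:="TEXT;" & csvPath, Destination:=dest)
    With qt
        .TextFileParseType = xlDelimited
        .TextFileCommaDelimiter = True
        .TextFileColumnDataTypes = types
        .Refresh BackgroundQuery:=False         ' import synchronously
    End With
End Sub
```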
Based on this I assume each Excel cell has a "data type" (i.e. General, Text or Date).
"Format Cells" changes how data is displayed, but it also changes how cells behave when data is being input, i.e. changing the cell format can also change a cell's "data type". When a cell's format is TEXT, entering "01234" stores the entered data as text and the leading "0" is kept. When a cell has GENERAL format, entering "0123" stores it as the number 123 (note that the 0 is removed).
Consider a cell with TEXT format and "0123" stored as a text value. Changing the format of the cell to GENERAL or NUMBER will not change the characters stored in the cell, but a green tag appears on the cell with a popup warning saying "Number Stored as Text". (Paste Special > Multiply can be used to convert these to numbers.)
This is where it gets weird.
If you edit the text value "0123" stored in a GENERAL format cell, on pressing ENTER the "0" is removed and it becomes 123, which is what you would expect. However, if you copy the "0123" cell and paste it using the VALUES ONLY option, the data is pasted as "0123"; the same thing happens if you paste it into a NUMBER format cell. So when you use Excel's Copy > Paste (Values Only) to copy the data value, the data type is also copied. (This is nuts!)
If you paste 123.12 stored as a number from a NUMBER or GENERAL format cell into a TEXT format cell, the value remains 123.12 and the cell format remains TEXT; no popup warning is shown, and I'm not sure what the data type of the value is. However, if you edit the cell and press RETURN to enter 123.12, you do get a warning tag and a popup saying "Number Stored as Text".
As a further complication, if you want to enter 0123 into a cell with NUMBER format and have it stored as the text "0123", you can enter it as '0123; the ' tells Excel to store the value with the Text data type.
This is useful.
Note that LEFT(<a cell storing '0123>, 1) returns "0", which is fine. Also note that if you copy and paste (values only) a NUMBER format cell storing '0123 into a GENERAL format cell, the ' is no longer displayed for the cell and you get a green "Number Stored as Text" warning. That is OK, but worth being aware of.
The formula =TEXT("123","0###") creates a value with a text data type. If you copy and paste the result, it becomes a data value with the text data type.
The function T is useful, as it can be used to find whether the data type of a value is Text or not: =T(0123) returns "" and =T("0123") returns "0123".
If you reference a cell in a formula that adds 1 to the cell value, neither the cell's data type nor the value's data type makes any difference to the calculation. The value is implicitly converted to a number as necessary.
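A VBA counterpart to the T() check, written as a hypothetical UDF: `TypeName` reports the type of the value a cell actually stores, which can then be compared with the cell's NumberFormat. This is a sketch that assumes a single-cell reference.

```vba
Option Explicit

' Hypothetical UDF: report the VBA type of the value a cell actually stores,
' independent of the cell's number format. Typical results: "String",
' "Double", "Date", "Boolean", "Empty", "Error".
Public Function StoredTypeOf(ByVal v As Variant) As String
    Dim stored As Variant
    If IsObject(v) Then
        stored = v.Cells(1, 1).Value   ' called as =StoredTypeOf(A1): v holds a Range
    Else
        stored = v                     ' called from VBA with a plain value
    End If
    StoredTypeOf = TypeName(stored)
End Function
```

Usage: `=StoredTypeOf(A1)` next to `=CELL("format",A1)` shows the stored type and the format side by side.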
As you would expect, when pasting data into cells, if you choose to match the destination formatting and the cells you are pasting the data into uses the TEXT format, any data with leading "0"'s will have the 0's kept.
Did anyone read this far?
If you want to show leading zeros on a number, provided you have a fixed number of digits before the decimal point, you can custom format the cells as e.g. '0000.00'.
This example always shows four digits before the decimal point and two after; it has its limits but is very useful for e.g. product codes. If you standardise on 7-digit codes, set the custom format to '0000000' and the code will display as '0001234' (not '1234') while still being recognised as a number.
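Setting the same custom format from VBA is a one-liner. `PadCodes` is a hypothetical wrapper, and the digit count is an assumption you would match to your own codes.

```vba
Option Explicit

' Hypothetical helper: display numeric codes with a fixed number of digits.
' The value stays a number; only the display is padded with zeros.
Public Sub PadCodes(ByVal rng As Range, ByVal digits As Long)
    rng.NumberFormat = String$(digits, "0")    ' e.g. 7 digits -> "0000000"
End Sub
```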

Formatting cells in excel

I have excel sheets with thousands rows and columns of numeric data, and need to do some calculations on this data. But in few files there is a cell or two which have their format as text even when they contain a number. The data is so huge that it is not possible to check each and every cell for the format. So is there a way I could rectify these errors?
If you are using Excel 2007 or later, use Find and Replace.
Leave Find What and Replace With blank, select Text as the Find What format, and General (or another numeric format) as the Replace With format.
Run Replace All and it's done!
Use ASAP Utilities (www.asap-utilities.com/).
It has a macro to turn text into numbers.
You could also use the VALUE function
=VALUE(A1)
which converts text to numbers.
I would start by selecting the row/column headers you wish to have formatted as numbers. Then just choose "Number" from the drop-down menu that lets you select the cell format. This will apply the format to all value cells in that row/column.
This is of course assuming that all cells read as text will be valid numbers. Otherwise, you'll have to employ additional functionality.
Or, copy the values to a new worksheet: first format all cells there as Number, then use Paste Special and choose Values.
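If you don't have any formulas, just values, you could run this code:
Sub MakeValues()
' Reset the number format so values are no longer forced to be text
ActiveSheet.UsedRange.NumberFormat = "General"
' Re-enter every value; Excel re-parses numeric-looking text as numbers.
' This also replaces any formulas with their current values.
ActiveSheet.UsedRange.Value = ActiveSheet.UsedRange.Value
End Sub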
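A more selective sketch that leaves formulas alone, assuming the text cells hold plain numbers in the current locale's format (IsNumeric and CDbl are locale-aware and also accept forms such as "1,000"). `ConvertTextNumbers` is a hypothetical name. It would also convert codes such as "0123" that were deliberately stored as text, so run it only on numeric data.

```vba
Option Explicit

' Hypothetical helper: convert numbers stored as text to real numbers,
' leaving formulas and non-numeric text alone.
Public Sub ConvertTextNumbers(ByVal rng As Range)
    Dim c As Range
    For Each c In rng.Cells
        If Not c.HasFormula Then
            If VarType(c.Value) = vbString Then
                If IsNumeric(c.Value) Then          ' locale-dependent test
                    c.NumberFormat = "General"
                    c.Value = CDbl(c.Value)
                End If
            End If
        End If
    Next c
End Sub
```

Usage: `ConvertTextNumbers ActiveSheet.UsedRange`. Looping cell by cell is slow on very large ranges; it is meant as a readable starting point.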

Getting formula of another cell in target cell

How does one cell obtain the formula of another cell as text without using VBA? I can see this question has already been asked many times and the answer is always to write a custom function in VBA.
However, I found a post made in 2006 which claimed to have found the non-VBA solution but the link provided in that post is already broken.
=FORMULATEXT(reference) will do the trick; see Microsoft's documentation for FORMULATEXT.
There is a nice way of doing this without VBA. It uses XL4 macros (these are macros, but not VBA, as asked).
With reference to the figure 1, cells A2:A4 contain usual formulas.
Going to Formulas -> Define Name, I defined two named ranges (see fig. 2), with the information shown in cells A6:B8.
Enter in cell B2 =FormulaAsText. This will retrieve the formula in cell A2 as text.
Explanation:
The named range FormulaAsText uses =GET.CELL(info_type,reference). In this case, info_type = 6 retrieves the formula, and reference = OFFSET(INDIRECT("RC",FALSE),0,-1) uses the cell 0 rows and -1 columns away from the one the formula is used in.
Copy B2 and paste into B3:B4. This will show formulas in A3:A4. Cell A4 shows that the worksheet function CELL only retrieves values, not formulas (as opposed to GET.CELL).
Since FormulaAsText gets the formula from a cell at a fixed offset (0,-1) from the current one, I defined another name, FormulaAsText2, which uses an offset (rows, cols) read from the worksheet itself. Cells D2:D4 contain =FormulaAsText2. Thus, cell D2 shows the contents of cell B3 (=OFFSET(D2,1,-2)), which is =FormulaAsText, while cells D3:D4 show their own contents. This adds some flexibility. YMMV.
PS1: The essence was taken from
http://www.mrexcel.com/forum/excel-questions/20611-info-only-get-cell-arguments.html
PS2: Tim Williams mentioned in a comment "the old XLM GET.FORMULA()". This answer is possibly related (not the same, since this one uses GET.CELL()).
PS3: A simple VBA solution is given, e.g., in
http://dmcritchie.mvps.org/excel/formula.htm
EDIT: Complementing this nice answer, the worksheet function FORMULATEXT is available in Excel 2013 and later.
This suggestion may be helpful for those who, after retrieving a block of formulas and transporting them to a new spreadsheet, want to put them to work again. Excel's FORMULATEXT function is great for picking up formulas, but it leaves them as unusable text strings. If you want them back as fully functioning formulas, you have to edit each one individually to remove the string character, but here is a shortcut for larger blocks.
Get to the position where you have the required formulas as text (in other words, after using FORMULATEXT you have done a copy and values-only paste). Next, highlight all the cells you want to convert and go to the Text to Columns option (on the Data tab in Excel 2016). You can select 'Delimited', but on the next screen make sure you deselect any delimiters that appear in your formulas. Then click 'Finish'. Excel should recognise the cells as containing formulas, and they should work again.
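The same revival can be done from VBA, swapping Text to Columns for a direct re-entry of each cell. A sketch assuming the cells hold formula text beginning with "="; `ReviveFormulaText` is a hypothetical name. It writes through FormulaLocal because FORMULATEXT returns the formula as shown in the formula bar, i.e. with the local function names and separators.

```vba
Option Explicit

' Hypothetical helper: turn text such as "=SUM(A1:A3)" (for example values pasted
' from FORMULATEXT) back into working formulas.
Public Sub ReviveFormulaText(ByVal rng As Range)
    Dim c As Range
    Dim f As String
    For Each c In rng.Cells
        If VarType(c.Value) = vbString Then
            f = c.Value
            If Left$(f, 1) = "=" Then
                c.NumberFormat = "General"   ' a Text-formatted cell would keep it as text
                c.FormulaLocal = f           ' formula-bar (local) syntax, as FORMULATEXT returns
            End If
        End If
    Next c
End Sub
```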
There is a way to do this. In my example I had a table that showed a date. The date comes from Sheet!G91. In my table I also had a column that showed the sheet name. I added two more columns to my table. The first column had =COLUMN(Sheet!G91), which returns the number 7, because G is the seventh letter of the alphabet. I then converted the number to a letter (G) using another table in my workbook. In the second column that I added, I used the formula =ROW(Sheet!G91), which returns the number 91. Note: ROW and COLUMN may appear as volatile formulas, which recalculate with every calculation of the workbook.
I wanted another column to show the formula contents of the date cell mentioned at the beginning of this post. I included the following string function (you can also use CONCATENATE).
"=" & AJ9 & "!" & AM9 & AN9
The items separated by ampersands get strung together (that is, concatenated). AJ9 in my example contains the sheet name, AM9 contains the column letter, and AN9 contains the row number.
I now have a column that dynamically updates its contents to reflect the sheet name and cell reference. The results in my workbook cell are
=Sheet!G91.
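Instead of a separate lookup table for the column letter, a small hypothetical function can do the conversion. On the worksheet, =ADDRESS(ROW(Sheet!G91),COLUMN(Sheet!G91),4) should also return G91 directly (abs_num 4 requests a relative reference).

```vba
Option Explicit

' Hypothetical helper: convert a column number to its letters (7 -> "G", 27 -> "AA").
Public Function ColumnLetter(ByVal colNum As Long) As String
    Dim n As Long
    Dim s As String
    n = colNum
    Do While n > 0
        s = Chr$(65 + ((n - 1) Mod 26)) & s   ' 65 is the character code of "A"
        n = (n - 1) \ 26
    Loop
    ColumnLetter = s
End Function
```

Usage on the worksheet: `=ColumnLetter(COLUMN(Sheet!G91))` returns G, which can replace the lookup table in the concatenation above.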
You can't. This is most likely a design choice to prevent the average Excel user from accidentally getting something they did not want.
What you are reading is correct - writing a UDF is the solution you want.
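For versions without FORMULATEXT, the UDF can be as short as this sketch (`GetFormula` is a hypothetical name). It returns `.Formula`, which gives the formula for formula cells and the constant for other cells, in English syntax; `.FormulaLocal` would return the localized form instead.

```vba
Option Explicit

' Hypothetical UDF: =GetFormula(A2) returns the formula in A2 as text.
Public Function GetFormula(ByVal cell As Range) As String
    ' .Formula returns the formula for formula cells and the constant for others
    GetFormula = cell.Cells(1, 1).Formula
End Function
```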

Force Excel 2007 to treat all values in a column as text

I have a large column of data in Excel. This data should all be treated as text, but in some cells Excel is "magically" changing the data to numeric. This is screwing up my VLOOKUP() functions in another part of the spreadsheet, and I need to override Excel's automatic data type detection.
If I manually go through the cells, and append ' to each numeric cell, it works. I just don't want to do this by hand for several thousand cells.
For example, this works:
Manually type '209
And this does not work:
Manually type 209, right click and format as text.
If changing the format of the column is not an option, it's helpful sometimes to create another column that's 'vlookup friendly' and leave your main column alone.
This is a trick I've used a few times:
Say your 'mixed' column is column A.
In column B, enter the formula:
=CONCATENATE(A1)
or as Jean-François pointed out in a comment, the shorter version:
=A1 & ""
And drag it down to the bottom row.
Column B will be all strings, so VLOOKUP can then use column B.
Under the Data Tab, open the Text to Columns wizard and set the Column data format to Text. The destination cell can be set to the original data cell, but it will replace the original formatting with text formatting. Other aspects of formatting e.g. Bold, color, font, etc. are not changed using this method.
Setting the cells to "Text" format, as Jean mentioned, should work. The easiest way to do this, in any version of Excel, is:
Right-click cell, "Format Cells", "Number" tab, select "Text" format.
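Formatting as Text only affects what is entered afterwards, which is why "type 209, then format as text" fails in the question. A sketch that formats the cells and then re-enters each value as a string, assuming the column holds numbers or codes rather than dates (CStr would produce a locale-dependent date string); `ForceToText` is a hypothetical name. It re-enters the stored number, not the characters originally typed, so it cannot bring back notation such as 7E10 that Excel has already converted.

```vba
Option Explicit

' Hypothetical helper: format cells as Text AND re-enter their values, so numbers
' that were already typed (e.g. 209) become the text "209".
Public Sub ForceToText(ByVal rng As Range)
    Dim c As Range
    Dim s As String
    For Each c In rng.Cells
        If Not c.HasFormula And Not IsEmpty(c.Value) Then
            s = CStr(c.Value)           ' e.g. 209 -> "209"
            c.NumberFormat = "@"        ' Text format first...
            c.Value = s                 ' ...then re-enter, so it is stored as text
        End If
    Next c
End Sub
```

Usage: `ForceToText Range("A2:A5000")`, after which VLOOKUP with a text key matches every row.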
Have you tried setting the cells' number format to "Text"?
In Excel 2003: Format > Cells... > Number > Category: Text.
I don't have the more recent Excel versions, but it has to be something similar.
I tried all of the above, but it didn't work. Then I added an apostrophe before the number, and only then did it change from exponential notation to text.
If you already have your data and manually adding a quote in front of each cell is not an option, you can use a helper column and write
="'"&A1
in B1, where A1 is the reference to your cell, and drag the formula in B1 down to the bottom. At this point you will see the quote, and you need to paste the data in column B as values (Ctrl+C, then Alt+E+S and select Values, or Paste Special > Values from the menu). Then use Find & Replace to replace ' (quote) with ' (quote): re-entering each cell this way makes Excel treat the leading quote as the text prefix, and you will have a column with values forced to text.
Updated for Office 365 / Excel 365:
CONCATENATE has been replaced by CONCAT (CONCATENATE remains available for backward compatibility).
This method still works, e.g. when I need 7E10 to appear as 7E10 and not 7.00E+10.
See Microsoft's documentation for CONCAT.
