I have a .tsv file in which some fields are ranges such as 1 - 4. I want these fields to be read exactly as they are written. However, when Excel opens the file it automatically converts those range fields to dates. For instance, 1 - 4 becomes 4-Jan. If I then reformat the cell to another type, the value has already been changed and all I get is a useless number (39816). Even if the range fields are enclosed in double quotes, the conversion to a date still takes place. How can I avoid this behavior?
I think your best option is Excel's import facility, although you may have to change the file extension to .csv manually first.
When importing, be sure to select Text for all the columns that contain these values.
My question is in fact a duplicate of at least:
1) Stop Excel from automatically converting certain text values to dates
2) Excel: Default to TEXT rather than GENERAL when opening a .csv file
The possible solutions for Excel are 1) writing the fields with special double quotes, e.g. "May 16, 2011" as "=""May 16, 2011""", or 2) importing the csv/tsv file with the external data wizard and manually selecting which columns should be read as TEXT rather than GENERAL (which can convert fields to dates).
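A minimal sketch of option 1 in Python, assuming every data field should be protected and the first row is a header. The helper names as_excel_text and protect_tsv are hypothetical; the csv module adds the outer quotes and doubles the inner ones.

```python
import csv
import io


def as_excel_text(value: str) -> str:
    """Wrap a value as an Excel formula string literal, e.g. 1 - 4 -> ="1 - 4"."""
    # Quotes inside a formula string literal are escaped by doubling them.
    return '="' + value.replace('"', '""') + '"'


def protect_tsv(text: str) -> str:
    """Return the TSV text with every data field wrapped; the header row is kept as-is."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for i, row in enumerate(reader):
        writer.writerow(row if i == 0 else [as_excel_text(f) for f in row])
    return out.getvalue()


sample = "id\trange\n7\t1 - 4\n"
print(protect_tsv(sample), end="")
```

The drawback mentioned in the question still applies: the rewritten file is only convenient for Excel, since other tools will see the extra quoting.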
As for my use case, I was only using Excel to remove some columns. Neither solution appealed to me: I didn't want to rewrite the tsv files with special quotes, and with hundreds of columns I didn't want to select each one manually to be read as TEXT.
Therefore I wrote a Scala script to filter tsv files by column names:
package com.jmcejuela.ml
import java.io.InputStream
import java.io.Writer
import scala.io.Codec
import scala.io.Source
import Table._
/**
* Class to represent tables with a fixed size of columns. All rows have the same columns.
*/
class Table(val rows: Seq[Row]) {
lazy val numDiffColumns = rows.foldLeft(Set[Int]())((set, row) => set + row.size)
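def toTSV(out: Writer): Unit = {
// always close the writer, including when the table is empty or a write fails
try {
if (rows.isEmpty) out.write(TableEmpty.toString)
else {
out.write(writeLineTSV(rows.head.map(_.name))) //header
rows.foreach(r => out.write(writeLineTSV(r.map(_.value))))
}
} finally out.close()
}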
/**
* Get a Table with only the given columns.
*/
def filterColumnsByName(columnNames: Set[String]): Table = {
val existingNames = rows.head.map(_.name).toSet
assert(columnNames.forall(n => existingNames.contains(n)), "You want to include column names that do not exist")
new Table(rows.map { row => row.filter(col => columnNames.contains(col.name)) })
}
}
object TableEmpty extends Table(Seq.empty) {
override def toString = "Table(Empty)"
}
object Table {
def apply(rows: Row*) = new Table(rows)
type Row = Array[Column]
/**
* Column representation. Note that each column has a name and a value. Since the class Table
* is a sequence of rows which are a size-fixed array of columns, the name field is redundant
* for Table. However, this column representation could be used in the future to support
* schemata-less tables.
*/
case class Column(name: String, value: String)
private def parseLineTSV(line: String) = line.split("\t")
private def writeLineTSV(line: Seq[String]) = line.mkString("", "\t", "\n")
/**
* It is assumed that the first row gives the names to the columns
*/
def fromTSV(in: InputStream)(implicit encoding: Codec = Codec.UTF8): Table = {
val linesIt = Source.fromInputStream(in).getLines
if (linesIt.isEmpty) TableEmpty
else {
val columnNames = parseLineTSV(linesIt.next)
val padding = {
//add padding of empty columns-fields to lines that do not include last fields because they are empty
def infinite[A](x: A): Stream[A] = x #:: infinite(x)
infinite("")
}
val rows = linesIt.map { line =>
((0 until columnNames.size).zip(parseLineTSV(line) ++: padding).map { case (index, field) => Column(columnNames(index), field) }).toArray
}.toStream
new Table(rows)
}
}
}
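For comparison, the same column filtering can be sketched in a few lines of Python. This is not a port of the Scala class, only an illustration of the idea under the same assumption that the first row holds the column names; filter_tsv_columns is a hypothetical name. Fields are never parsed, so a value like 1 - 4 passes through untouched.

```python
import io
from typing import Iterable, TextIO


def filter_tsv_columns(src: TextIO, dst: TextIO, keep: Iterable[str]) -> None:
    """Copy src to dst, keeping only the named columns (first row is the header)."""
    wanted = set(keep)
    first = src.readline()
    if first == "":
        return  # empty input: nothing to copy
    header = first.rstrip("\r\n").split("\t")
    missing = wanted - set(header)
    if missing:
        raise KeyError(f"unknown column names: {sorted(missing)}")
    indices = [i for i, name in enumerate(header) if name in wanted]
    dst.write("\t".join(header[i] for i in indices) + "\n")
    for line in src:
        fields = line.rstrip("\r\n").split("\t")
        # Pad short rows so that missing trailing fields become empty strings.
        fields += [""] * (len(header) - len(fields))
        dst.write("\t".join(fields[i] for i in indices) + "\n")


source = io.StringIO("id\trange\tnote\n1\t1 - 4\tx\n2\t5 - 14\n")
target = io.StringIO()
filter_tsv_columns(source, target, ["id", "range"])
print(target.getvalue(), end="")
```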
Write 01-04 instead of 1-4 in Excel.
I had a cell formatted as "text" in Excel that was populated with a chemical CAS number, "8013-07-8", and it was being reformatted as a date. To fix the problem, I prepended a single quote to the value, and it then rendered correctly. When you click on the cell you can see the single-quote prefix, but at least it is no longer displayed as a date.
In my case, when I typed 5-14 into cell D2, Excel converted it to the date 14 May. With some help, I was able to turn the date back into the number range (5-14) using the following approach, which I want to share (using my case as an example).
Using the cell format dialog, I first converted the date in D2 (14 May) to a number (in my case, 43599).
Then I used the formula below to convert it to 5-14:
=IF(EXACT(D2, 43599), "5-14", D2)
I have a column with dates stored that need to be cleared if they match a variable.
I've tried a ton of different ways, but this is my most recent attempt:
let dateRange = selectedTable
.getColumnByName("Date")
.getRangeBetweenHeaderAndTotal()
.getTexts();
let date: string = "12/2/2022"
console.log(dateRange);
dateRange.forEach(dates => {
if (dates === date){
ExcelScript.ClearApplyTo.contents
}
})
This one won't work as 'dates' is an array and can't be compared to the 'date' variable as far as I can tell.
I think there are two issues you might be running into here:
First: getTexts() returns a 2D array to preserve the row/column structure of the grid. So even though dateRange is a column, it's still a 2D array - something like [['value1'], ['value2'], ...]. You can get a single cell with the expression dateRange[rowIndex][0].
Second: ExcelScript.ClearApplyTo.contents is simply an enum member and does not do anything on its own. To clear the contents of a specific cell/range, you need to call the clear() method on the corresponding Range object.
Putting this together, you get the following script (assuming you've defined selectedTable elsewhere):
let dateRange = selectedTable
.getColumnByName("Date")
.getRangeBetweenHeaderAndTotal();
let texts = dateRange.getTexts();
let date: string = "12/2/2022"
texts.forEach((text, row) => {
if (text[0] === date) {
dateRange.getCell(row, 0).clear();
}
})
Additionally, as pointed out in the comments, you should be careful about date formatting. Since you're comparing strings, this script will fail to clear cells that contain 12/02/2022, December 2, 2022, etc. even though the underlying date is the same.
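One way around the formatting problem is to parse both sides into real dates before comparing. The sketch below shows the idea in Python rather than Office Scripts; the list of accepted formats is an assumption and would need to match your data, and parse_date is a hypothetical helper. The nested lists mimic the 2D shape that getTexts() returns.

```python
from datetime import date, datetime
from typing import Optional

# Formats to try, in order. This list is an assumption; extend it for your data.
KNOWN_FORMATS = ("%m/%d/%Y", "%B %d, %Y", "%Y-%m-%d")


def parse_date(text: str) -> Optional[date]:
    """Return the date a cell text represents, or None if no format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


target = parse_date("12/2/2022")

# One inner list per row, one element per column, like getTexts().
texts = [["12/02/2022"], ["December 2, 2022"], ["12/3/2022"], [""]]

# Compare parsed dates, not strings; skip cells that do not parse.
rows_to_clear = [
    row for row, cells in enumerate(texts)
    if target is not None and parse_date(cells[0]) == target
]
print(rows_to_clear)
```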
Hopefully that helps!
I previously posted this question as "using “.between” for string values not working in python", but I was not clear enough and could not edit it, so I am reposting it here with more detail.
I have a DataFrame. Cell [0,61] holds a string and cell [0,69] holds another string. I want to slice all the data in the cells between these two, [0,62:68], merge it, and paste the result into [1,61]. Afterwards [0,62:68] will be blank, but that is not important.
However, I have several hundred documents, and I want to write a script that runs on all of them. The strings in [0,61] and [0,69] are always present in all the documents, but at different positions in that column. So I tried using:
For_Paste = df[0][df[0].between('DESCRIPTION OF WORK / STATEMENT OF WORK', 'ADDITIONAL REQUIREMENTS / SUPPORTING DOCUMENTATION', inclusive = False)]
But the output I get is: Series([], Name: 0, dtype: object)
I was expecting a list or array with the desired data that I could merge and paste. Thanks.
If you want to select the rows between two index labels (say idx_start and idx_end, excluding both of those rows) in column col of the DataFrame df, keep in mind that .loc slices include the end label. With the default integer index you will therefore want to use
df.loc[idx_start + 1 : idx_end - 1, col]
To find the first index matching a string s, use
idx = df.index[df[col] == s][0]
So for your case, to return a Series of the rows between these two indices, try the following:
start_string = 'DESCRIPTION OF WORK / STATEMENT OF WORK'
end_string = 'ADDITIONAL REQUIREMENTS / SUPPORTING DOCUMENTATION'
idx_start = df.index[df[0] == start_string][0]
idx_end = df.index[df[0] == end_string][0]
For_Paste = df.loc[idx_start + 1 : idx_end - 1, 0]
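As for why the original attempt came back empty: Series.between compares each cell's value against the two bounds, it does not look at positions. The plain-Python sketch below illustrates the difference with a made-up list standing in for column 0 (the task rows are invented for the example).

```python
start = "DESCRIPTION OF WORK / STATEMENT OF WORK"
end = "ADDITIONAL REQUIREMENTS / SUPPORTING DOCUMENTATION"

# Made-up stand-in for column 0 of the DataFrame.
column = ["TITLE", start, "Install pump", "Test pump", end, "Drawings"]

# Value comparison: a cell must sort after `start` and before `end`.
# Because "D" sorts after "A", start > end and no string can qualify.
by_value = [cell for cell in column if start < cell < end]

# Position comparison: locate the two markers, then slice between them.
i_start = column.index(start)
i_end = column.index(end)
by_position = column[i_start + 1 : i_end]  # list slices exclude the end

merged = " ".join(by_position)
print(by_value)
print(merged)
```

Note that a Python list slice excludes its end position, whereas a label-based .loc slice includes it.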
I have seen some questions (like this one) here asking whether a cell in Excel can be formatted by NPOI/POI as if it had been formatted by Excel. Like most of you, I have to deal with Currency and DateTime formatting issues. So let me ask: how can this formatting be achieved so that it looks as if Excel itself applied it? (I will answer this question myself to demonstrate how to do it.)
Setting: Windows 10, English, Region: Taiwan
Excel format: XLSX (version 2007 and later)
(Sorry about the various edits to this question; I pressed the Enter key at unexpected times.)
If you format a cell as Currency, you have 4 choices:
The internal format of each style is as follows:
-NT$1,234.10
<numFmt formatCode="&quot;NT$&quot;#,##0.00" numFmtId="164"/>
[RED]NT$1,234.10
<numFmt formatCode="&quot;NT$&quot;#,##0.00;[Red]&quot;NT$&quot;#,##0.00" numFmtId="164"/>
-NT$1,234.10
<numFmt formatCode="&quot;NT$&quot;#,##0.00_);(&quot;NT$&quot;#,##0.00)" numFmtId="7"/>
[RED]-NT$1,234.10
<numFmt formatCode="&quot;NT$&quot;#,##0.00_);[Red](&quot;NT$&quot;#,##0.00)" numFmtId="8"/>
Note: NT$ is enclosed in a pair of double quotes ("), which are written as &quot; inside the XML attribute.
(To see the internal format of an XLSX file, just unzip it. The style information is available in <unzip dir>\xl\Styles.xml. Check out this answer if you need more information.)
(FYI: In formatCode, '0' represents a digit that is always shown. '#' also represents a digit, but it is not shown when the number has no digit in that position, so any number less than 1000 will not contain a comma. '_' is a space holder: it leaves a space the width of the character that follows it. In format 3, '1.75' appears as 'NT$1.75 '; the last character is a space.)
(FYI: In numFmtId, the number 164 in cases 1 and 2 denotes a user-defined format. In cases 3 and 4, the numbers 7 and 8 are built-in styles.)
Developers using POI/NPOI may find that formatting a currency column with the built-in formats 0x7 or 0x8 yields only the third or fourth choice. The first and second choices are not available that way.
To get the first choice, build upon style 0x7 "$#,##0.00);($#,##0.00)": add the currency symbol and the pair of double quotes in front of it.
styleCurrency.DataFormat = workbook.CreateDataFormat().GetFormat("\"NT$\"#,##0.00");
Apply this format to a cell containing a number. When you open the resulting Excel file and right-click the cell to check its formatting, you will see the first choice.
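The four format codes listed earlier differ only in how the negative section is written, so they can be generated from the currency symbol. Here is a small Python sketch of that string construction; currency_format_codes is a hypothetical helper, not part of POI or NPOI, and the resulting strings are what you would pass to GetFormat.

```python
def currency_format_codes(symbol: str) -> dict:
    """Build the four Excel currency format codes for a given symbol."""
    quoted = f'"{symbol}"'  # literal text in a format code goes inside double quotes
    number = "#,##0.00"
    return {
        1: f"{quoted}{number}",
        2: f"{quoted}{number};[Red]{quoted}{number}",
        3: f"{quoted}{number}_);({quoted}{number})",
        4: f"{quoted}{number}_);[Red]({quoted}{number})",
    }


for choice, code in currency_format_codes("NT$").items():
    print(choice, code)
```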
Please feel free to comment on this post.
var cell5 = row.CreateCell(5, CellType.Numeric);
cell5.SetCellValue(item.OrderTotal);
var styleCurrency = workbook.CreateCellStyle();
styleCurrency.DataFormat = workbook.CreateDataFormat().GetFormat(string.Format("\"{0}\"#,##0.00", item.CurrencySymbol));
cell5.CellStyle = styleCurrency;
styleCurrency = null;
Run this inside a loop to handle multiple currencies.
A C# function to get the currency symbol for an ISO currency code:
private string GetCurencySymbol(string isOcurrencyCode)
{
return CultureInfo.GetCultures(CultureTypes.AllCultures).Where(c => !c.IsNeutralCulture)
.Select(culture =>
{
try
{
return new RegionInfo(culture.LCID);
}
catch
{
return null;
}
})
.Where(ri => ri != null && ri.ISOCurrencySymbol == isOcurrencyCode)
.Select(ri => ri.CurrencySymbol).FirstOrDefault();
}
I have numbers stored in my database as Strings, and I would like to sort them numerically using Grails sortableColumns. Is there any way to do this?
Storing numbers as formatted strings prevents them from taking advantage of native numerical sorting. Have a look at the Grails formatNumber tag that can use your desired locale to display decimal separators so you can use the actual numeric data and not have to store the formatted string for display purposes.
If the domain class you want to sort has both the formatted and unformatted numeric data, you could try something like this, substituting the sort column param as necessary:
def list = {
if (params?.sort == 'formattedNumber') {
params.sort = 'rawNumber'
}
[ records : Record.list(params) ]
}
If your domain class only has the formatted string you can try parsing it to BigDecimal (or whatever the matching numeric type is) but this may not work properly if your server's locale does not match the string format's decimal separator locale.
def list = {
// sort in memory by numeric value only when the formatted column is requested
def records = (params?.sort == 'formattedNumber') ?
Record.list().sort { it.formattedNumber.toBigDecimal() } :
Record.list(params)
[ records : records ]
}
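The difference between string order and numeric order, and the separator problem mentioned above, can be illustrated with a short Python sketch. The numeric_key helper and its separator parameters are assumptions for the example; in Grails the equivalent parsing would happen inside the sort closure.

```python
from decimal import Decimal


def numeric_key(text: str, decimal_sep: str = ".", group_sep: str = ",") -> Decimal:
    """Turn a formatted number string into a Decimal usable as a sort key."""
    # Drop grouping separators first, then normalise the decimal separator.
    plain = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(plain)


values = ["10", "9", "1,200.50", "100"]

as_strings = sorted(values)                   # character-by-character order
as_numbers = sorted(values, key=numeric_key)  # numeric order

# Strings written with a different locale need matching separators.
european = sorted(["1.200,50", "9,75"],
                  key=lambda s: numeric_key(s, decimal_sep=",", group_sep="."))

print(as_strings)
print(as_numbers)
print(european)
```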
Current code:
row.column.each(){column ->
println column.attributes()['name']
println column.value()
}
Column is a Node that has a single attribute and a single value. I am parsing an XML file to create INSERT statements for Access. Is there a Groovy way to create a statement with the following structure:
Insert INTO tablename (col1, col2, col3) VALUES (1,2,3)
I am currently storing the attribute and value to separate arrays then popping them into the correct order.
I think it can be a lot easier in Groovy than the currently accepted answer. The collect and join methods are built for this kind of thing: join takes care of the concatenation and does not leave a trailing comma on the string.
def names = row.column.collect { it.attributes()['name'] }.join(",")
def values = row.column.collect { it.value() }.join(",")
def result = "INSERT INTO tablename($names) VALUES($values)"
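The same collect-then-join pattern, sketched in Python for comparison. The list of (name, value) pairs stands in for the parsed column nodes and build_insert is a hypothetical helper. Values are inserted as-is, so text values would still need quoting or a parameterized query.

```python
from typing import List, Tuple


def build_insert(table: str, columns: List[Tuple[str, object]]) -> str:
    """Build an INSERT statement from (column name, value) pairs."""
    # join() puts the separator only between items, so no trailing comma to trim.
    names = ", ".join(name for name, _ in columns)
    values = ", ".join(str(value) for _, value in columns)
    return f"INSERT INTO {table} ({names}) VALUES ({values})"


row = [("col1", 1), ("col2", 2), ("col3", 3)]
print(build_insert("tablename", row))
```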
You could just use two StringBuilders. Something like this, which is rough and untested:
def columns = new StringBuilder("Insert INTO tablename(")
def values = new StringBuilder("VALUES (")
row.column.each() { column ->
columns.append(column.attributes()['name'])
columns.append(", ")
values.append(column.value())
values.append(", ")
}
// chop off the trailing ", " separators in place (substring would return a String, which has no append)
columns.setLength(columns.length() - 2)
columns.append(") ")
values.setLength(values.length() - 2)
values.append(")")
columns.append(values)
def result = columns.toString()
You can find all sorts of Groovy string manipulation operators here.