The knitr book (p. 118, §12.3.5) has an example of how to suppress long output by modifying
the output chunk hook, but it isn't general at all, because it changes the behaviour globally for all chunks.
I've tried to generalize it to allow a chunk option, output.lines, which, if NULL, has no
effect, and otherwise selects and prints only the first output.lines lines. However, this version
seems to have no effect when I try it, and I can't figure out why.
More generally, I think this is useful enough to be included in knitr, and it would be better if one
could specify a range of lines, e.g. output.lines=3:15, as is possible with echo=.
library(knitr)
# get the default output hook
hook_output <- knit_hooks$get("output")
knit_hooks$set(output = function(x, options) {
  lines <- options$output.lines
  if (is.null(lines)) {
    hook_output(x, options)  # pass to the default hook
  } else {
    x <- unlist(stringr::str_split(x, "\n"))
    if (length(x) > lines) {
      # truncate the output, but add ....
      x <- c(head(x, lines), "...\n")
    }
    # paste these lines back together
    x <- paste(x, collapse = "\n")
    hook_output(x, options)
  }
})
Test case:
<<print-painters, output.lines=8>>=
library(MASS)
painters
@
Actually, this solution does work. My actual test example was flawed. Maybe others will find this helpful.
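As for the range idea: below is an untested sketch of how the hook could accept output.lines = 3:15 as well. The range handling is my own extension, not something knitr provides.
library(knitr)
hook_output <- knit_hooks$get("output")
knit_hooks$set(output = function(x, options) {
  lines <- options$output.lines
  if (is.null(lines)) {
    return(hook_output(x, options))  # option unset: default behaviour
  }
  x <- unlist(stringr::str_split(x, "\n"))
  if (length(lines) == 1) {
    # a single number: keep the first `lines` lines
    if (length(x) > lines) x <- c(head(x, lines), "...\n")
  } else {
    # a range such as 3:15 (assumed to overlap the actual output)
    keep <- intersect(lines, seq_along(x))
    x <- c(if (min(keep) > 1) "..." else NULL,
           x[keep],
           if (max(keep) < length(x)) "...\n" else NULL)
  }
  hook_output(paste(x, collapse = "\n"), options)
})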
I have a Node.js test in Jest that compares very long strings with many lines. In my real code the strings are pretty-printed JSON, but the simple example below illustrates the problem:
describe('Stuff', () => {
  it('should make it clear where the diff is but does not', () => {
    const str1 = 'Hellox\nx\nx\nx\nx\nx\nx\nx\nx\nx\nx\nx\nworldx\nx\nx\nx\nx\nx\nx\nx\nx\nx\n'
    const str2 = 'Hellox\nx\nx\nx\nx\nx\n123\nx\nx\nx\nx\nx\nx\nworldx\nx\nx\nx\nx\nx\nx\nx\nx\nx\n'
    expect(str1).toEqual(str2)
  })
})
When I run it, I get this:
● Stuff › should make it clear where the diff is but does not
expect(received).toEqual(expected) // deep equality
- Expected - 1
+ Received + 0
@@ -2,11 +2,10 @@
x
x
x
x
x
- 123
x
x
x
x
x
There are so many similar or identical lines on either side of the diff that it's very difficult to see where the problem is. Is there a way I can control or expand the context, or is it hard-coded as a 5-before, 5-after rule?
Not cleanly. There are options in jest-diff to set this:
https://github.com/facebook/jest/tree/main/packages/jest-diff#options
However, at the time of writing, these are not passed through from the jest config - there's an open feature request for that:
https://github.com/facebook/jest/issues/12576
Hence, there's no clean way to configure this. It is possible to simply edit the value in node_modules/jest-diff/build/normalizeDiffOptions.js, which works but is nasty for obvious reasons.
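A less invasive workaround is to call jest-diff directly in the failing test and pass the options yourself. contextLines and expand are documented jest-diff options; the assertion wiring below is just a sketch, and the import style depends on your jest-diff version.
const {diff} = require('jest-diff')

it('shows more context around the difference', () => {
  const expected = 'Hellox\nx\nx\nx\nx\nx\n123\nx\nx\nx\nworldx\n'
  const received = 'Hellox\nx\nx\nx\nx\nx\nx\nx\nx\nworldx\n'
  if (received !== expected) {
    // contextLines widens the default 5-line context; expand: false keeps
    // the compact unified-diff layout instead of printing both strings whole
    throw new Error(diff(expected, received, {contextLines: 15, expand: false}))
  }
})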
I'm trying to use the wavefront-obj package to read an OBJ file. Here is an example OBJ file.
After downloading this file, I do
import Data.WaveFrontObj
x <- loadWavefrontObj "pinecone.obj"
Then:
> :t x
x :: Either String WavefrontModel
import Data.Either.Extra
y = fromRight' x
Then:
> :t y
y :: WavefrontModel
> y
WavefrontModel []
Looks like the result is empty. What am I doing wrong?
Looks like your OBJ file has some directives that wavefront-obj doesn't recognize. You can see in the source that wavefront-obj only understands the #, v, vt, vn, and f directives. Your file kicks off with mtllib and o directives, and appears to have several others not in the supported list.
A priori, I would therefore expect a Left result instead of the Right you're getting. But the wavefront-obj author fell into a common parser-combinator pitfall: their top-level parser does not end with eof. So it sees the first two comment lines, then none of its parsers match the next line; but since it doesn't mind not being at the end of the file, it reports successfully parsing an empty list of directives.
Between this and a few other things I noticed while source-diving (comments are almost certainly not treated correctly; the predictable structure of directives is not exploited, leading to code duplication), I expect you're going to have to do quite a bit of work if you want this package to work reliably and correctly.
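To make the pitfall concrete, here is a toy sketch (attoparsec for illustration; wavefront-obj's actual parser and types differ) showing how a top-level "many" without an end-of-input check "succeeds" with an empty list on unrecognized input:
import Data.Attoparsec.Text
import qualified Data.Text as T

-- a toy parser that, like wavefront-obj, only understands some lines
vLine :: Parser T.Text
vLine = string (T.pack "v ") *> takeTill (== '\n') <* endOfLine

lenient, strict :: Parser [T.Text]
lenient = many' vLine                 -- succeeds with [] on unknown input
strict  = many' vLine <* endOfInput   -- rejects input it can't parse

main :: IO ()
main = do
  let input = T.pack "mtllib pinecone.mtl\nv 1 2 3\n"
  print (parseOnly lenient input)  -- Right []   (the silent empty parse)
  print (parseOnly strict input)   -- Left ...   (the error you'd want)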
Using the tm package in R, I want to transform a corpus with a fairly complicated function, and I need some side effects for storing pertinent information. Since content_transformer requires a specific function format, the easy way is to use <<- in my function. The problem occurs with the code below:
library(tm)
a <- 4
n <- 2
corp <- VCorpus(VectorSource(rep("fish", n)))
(func <- content_transformer(
  function(x) {
    a <<- 42
    return(x)
  }))
corp <- tm_map(corp, func)
print(a)
It prints the wrong answer, i.e. 4, but with n = 1 it prints the right one. So I assume the multi-threading that tm does fails to behave like ordinary sequential R. I guess it's a bug, since it works on Windows. (Note: I use R 3.1.2 on Linux and R 3.1.1 on Windows.)
Questions: is it a bug? If yes, is it a known bug? Is there an easy solution that does not require refactoring the code?
Thanks!
Edit: an additional example, using assign:
rm(list = ls())
library(tm)
env <- new.env()
a <- 1
n <- 2
corp <- VCorpus(VectorSource(rep("fish", n)))
(func <- content_transformer(
  function(x, e) {
    assign("a", 42, envir = e)
    print(e)
    print(ls.str(e))
    return(x)
  }))
corp <- tm_map(corp, func, env)
print(env)
print(ls.str(env))
In fact this works accidentally under Windows, because mclapply is not defined there and is just a call to lapply.
Indeed, when you call tm_map, you are using this function:
tm:::tm_map.VCorpus
function (x, FUN, ..., lazy = FALSE)
{
    if (lazy) {
        fun <- function(x) FUN(x, ...)
        if (is.null(x$lazy))
            x$lazy <- list(index = rep(TRUE, length(x)), maps = list(fun))
        else x$lazy$maps <- c(x$lazy$maps, list(fun))
    }
    else x$content <- mclapply(content(x), FUN, ...)  ## this is the important line
    x
}
So you can reproduce the "odd/normal" behavior by calling mclapply:
library(parallel)
res <- mclapply(1:2, function(x) { a <<- 20; x })
a
[1] 4
a is unchanged and is still equal to 4. This is the normal parallel behavior, since side effects are avoided: each forked child works on its own copy of the workspace. Under Windows, mclapply is just a call to lapply, so the value of the global variable is changed as expected.
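A minimal sketch of the same fork semantics, showing that return values (unlike globals) do propagate back to the parent:
library(parallel)
## side effects don't escape the forked children, but return values do
res <- mclapply(1:2, function(x) { a <- 20; a + x })
unlist(res)  ## [1] 21 22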
Pseudo-solution
It is better to use lapply here if you want the global side effect, but you can emulate a read-only global variable by passing a as a second argument to your function...
func <-
  function(x, a) {
    a <- 42      ## use a here
    x$a <- a     ## assign it to the document x
    return(x)    ## but the new value of a cannot be seen by other documents
  }
(Func <- content_transformer(func))
res <- tm_map(corp, Func, a = 4)
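Alternatively, a sketch (my own, not from the tm documentation) that avoids side effects entirely: do the transformation with tm_map, then compute the per-document information as ordinary return values in the parent process after tm_map has finished.
library(tm)

corp <- VCorpus(VectorSource(rep("fish", 2)))
corp <- tm_map(corp, content_transformer(toupper))

## harvest per-document information with a plain sapply in the parent
## process, where no forking is involved
info <- sapply(content(corp), function(doc) nchar(content(doc)))
print(info)  ## e.g. 4 4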
I have a list of 96 files that I would like to open and perform some functions on the data. I am VERY new to R, and am unsure how to manipulate strings to open the sequential file names. Here is my code below, which clearly does not work:
for (N in (1:96)){
  if (N > 10){
    TrackID <- "000$N"
  }
  else{
    TrackID <- "00$N"
  }
  fname_in <- 'input/intersections_track_calibrated_jma_from1951_$TrackID.csv'
  fname_out <- 'output/tracks_crossing_regional_polygon_$TrackID.csv'
  ......data manipulations.....
}
So basically, when N=1, for instance, I need to reference a file called intersections_track_calibrated_jma_from1951_0001.csv.
Thanks in advance!
Kimberly
I think what you are looking for is the sprintf() function.
sprintf() saves you from having to test n to know how many leading zeros are needed.
Combined with the paste() or paste0() function, producing the desired file name becomes a one-liner.
Indeed, it would be possible to just use the sprintf() function alone, as in
sprintf("intersections_track_calibrated_jma_from1951_%04d.csv", n), but having a function to produce the file names and/or the "TrackID" allows hiding all these file-naming-convention details away.
Below, see sprintf() and paste0() in action, in the context of a convenience function created to produce the filename given a number n.
> GetFileName <- function(n)
paste0("intersections_track_calibrated_jma_from1951_",
sprintf("%04d", n),
".csv")
> GetFileName(1)
[1] "intersections_track_calibrated_jma_from1951_0001.csv"
> GetFileName(13)
[1] "intersections_track_calibrated_jma_from1951_0013.csv"
> GetFileName(321)
[1] "intersections_track_calibrated_jma_from1951_0321.csv"
>
Of course, you could make the GetFileName function more versatile by adding parameters, some with default values. In that fashion it could be used for both input and output file names (or any other file prefix/extension). For example:
GetFileName <- function(n, prefix=NA, ext="csv") {
if (is.na(prefix)) {
prefix <- "intersections_track_calibrated_jma_from1951_"
}
paste0(prefix, sprintf("%04d", n), ".", ext)
}
> GetFileName(12)
[1] "intersections_track_calibrated_jma_from1951_0012.csv"
> GetFileName(12, "output/tracks_crossing_regional_polygon_", "txt")
[1] "output/tracks_crossing_regional_polygon_0012.txt"
> GetFileName(12, "output/tracks_crossing_regional_polygon_")
[1] "output/tracks_crossing_regional_polygon_0012.csv"
>
Try using paste and paste0 to generate strings like this instead.
for (N in (1:96)){
  if (N < 10){
    TrackID <- paste0("000", N)
  } else {
    TrackID <- paste0("00", N)
  }
  fname_in <- paste0('input/intersections_track_calibrated_jma_from1951_',
                     TrackID, '.csv')
  fname_out <- paste0('output/tracks_crossing_regional_polygon_',
                      TrackID, '.csv')
  # ......data manipulations.....
}
paste0 just saves you from writing sep = "" if you don't require a separator (as in your case).
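Combining this with the previous answer, sprintf() can do the zero-padding so the if/else disappears entirely; a minimal sketch:
for (N in 1:96) {
  fname_in  <- sprintf('input/intersections_track_calibrated_jma_from1951_%04d.csv', N)
  fname_out <- sprintf('output/tracks_crossing_regional_polygon_%04d.csv', N)
  # ......data manipulations.....
}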
I'm writing a pretty-printer for a simple whitespace-sensitive language.
I like the Leijen pretty-printer library more than the Wadler library, but the Leijen library has one problem in my domain: any line break I insert may be overridden by the group construct, which can compress any line, and that might change the semantics of the output.
I don't think I can implement an ungroupable line in wl-pprint (although I'd love to be wrong).
Looking a bit at the wl-pprint-extras package, I don't think even the exposed internal interface allows me to create a line that will not be squashed by group.
Do I just have to rely on the fact that I never use group, or do I have some better option?
Given that you want to be able to group, and you also need to ensure some lines can't be removed,
why don't we use the fact that the library designers encoded the semantics in the data type,
instead of in code? This fabulous decision makes it eminently re-engineerable.
The Doc data type encodes a line break using the constructor Line :: Bool -> Doc.
The Bool represents whether to omit a space when removing a line. (Lines indent when they're there.)
Let's replace the Bool:
data LineBehaviour = OmitSpace | AddSpace | Keep

data Doc = ...
         ...
         | Line !LineBehaviour  -- not Bool any more
The beautiful thing about the semantics-as-data design is that if we replace
this Bool data with LineBehaviour data, functions that didn't use it but
passed it on unchanged don't need editing. Functions that look inside at what
the Bool is break with the change - we'll rewrite exactly the parts of the code
that need changing to support the new semantics by changing the data type where
the old semantics resided. The program won't compile until we've made all the
changes we should, while we won't need to touch a line of code that doesn't
depend on line break semantics. Hooray!
For example, renderPretty uses the Line constructor, but in the pattern Line _,
so we can leave that alone.
First, we need to replace Line True with Line OmitSpace, and Line False with Line AddSpace,
line = Line AddSpace
linebreak = Line OmitSpace
but perhaps we should add our own
hardline :: Doc
hardline = Line Keep
and we could perhaps do with a binary operator that uses it
infixr 5 <->
(<->) :: Doc -> Doc -> Doc
x <-> y = x <> hardline <> y
and the equivalent of the vertical separator, for which I can't think of a better name than the very vertical separator:
vvsep,vvcat :: [Doc] -> Doc
vvsep = fold (<->)
vvcat = fold (<->)
The actual removal of lines happens in the group function. Everything can stay the same except:
flatten (Line break) = if break then Empty else Text 1 " "
should be changed to
flatten (Line OmitSpace) = Empty
flatten (Line AddSpace) = Text 1 " "
flatten (Line Keep) = Line Keep
That's it: I can't find anything else to change!
You do need to avoid group, yes. The library's designed to facilitate wrapping or not wrapping based on the width of the output that you specify.
Depending on the syntax of the language you're implementing, you should also be cautious about softline and softbreak and the </> and <//> operators that use them. There's no reason I can see that you can't use <$> and <$$> instead.
sep, fillSep, cat and fillCat all use group directly or indirectly (and have the width-dependent line breaks you want to avoid). However, given your purpose, I don't think you need them:
Use vsep or hsep instead of sep or fillSep.
Use hcat or vcat instead of cat or fillCat.
You could use a line like
import Text.PrettyPrint.Leijen hiding (group,softline,softbreak,
(</>),(<//>),
sep,fillSep,cat,fillCat)
to make sure you don't call these functions.
I can't think of a way to ensure that functions you do use don't call group somewhere along the line, but I think those are the ones to avoid.
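To see the hazard concretely, here is a small runnable sketch against stock wl-pprint (the example document is made up for illustration):
import Text.PrettyPrint.Leijen

-- a two-line, indentation-sensitive fragment built with vsep
body :: Doc
body = vsep [text "if x:", indent 4 (text "return 1")]

main :: IO ()
main = do
  putDoc body            -- prints on two lines, as written
  putStrLn "\n--"
  putDoc (group body)    -- group flattens it onto one line, destroying
                         -- the whitespace-sensitive meaning
  putStrLn ""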