Text encoding - fine on Windows, not on *nix (Linux)

I have an issue loading data on Windows and *nix machines, whose default encodings differ (ISO-8859-1 and UTF-8 respectively).
Example - Windows first:
library(stringi)
dummy <- as.character("BØÅS")
write(dummy, "saveFile")
getData <- read.table("saveFile", header=F, sep="\t", quote="\"")
reEncode=function(x) {
stri_trans_general(x, "Latin-ASCII")
}
enCoded <- apply(getData, 1, reEncode)
result <- as.data.frame(enCoded)
In Windows the above produces "BOAS" as desired.
Now move to nix and use the saved file:
getData <- read.table("saveFile", header=F, sep="\t", quote="\"")
reEncode=function(x) {
stri_trans_general(x, "Latin-ASCII")
}
enCoded <- apply(getData, 1, reEncode)
result <- as.data.frame(enCoded)
Nix gives "B??S".
I believe this is a read.table encoding issue but haven't been able to figure out how to get nix to use ISO-8859-1. Any suggestions?

read.table("saveFile", header=F, sep="\t", quote="\"",encoding="latin1")

Related

Cabal package difference between readPackageDescription and parsePackageDescription

The Haskell package Cabal-1.24.2 has a module Distribution.PackageDescription.Parse.
The module has two functions: readPackageDescription and parsePackageDescription.
When I run in ghci:
let d = readPackageDescription normal "C:\\somefile.cabal"
I get a parsed GenericPackageDescription.
But when I run in ghci:
content <- readFile "C:\\somefile.cabal"
let d = parsePackageDescription content
I get a parse error:
ParseFailed (FromString "Plain fields are not allowed in between stanzas: F 2 \"version\" \"0.1.0.0\"" (Just 2))
The example file was generated using cabal init.
parsePackageDescription expects the file contents themselves to be passed to it, not the path of the file they are stored in. You'll want to readFile first, though beware of file encoding issues: http://www.snoyman.com/blog/2016/12/beware-of-readfile

Haskell: simplehttp appending "%0D"?

I am using simpleHttp to query a web page, e.g. with let webLink = "www.example.com/" and number = 257 (number is read from a file).
res <- simpleHttp $ webLink ++ number
It works fine on Windows, but on Mac it throws a 404 error, and the path shows up as "www.example.com/257%0D".
I have no idea where this "%0D" is coming from, because printing number gives me 257. I have also tried filtering out "%0D" as shown below, but the Mac still reports a 404 error due to %0D in the path. Please suggest a fix.
res <- simpleHttp $ (filter (not . (`elem` "%0D")) (webLink ++ number))
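0x0D is the carriage return character (\r), which is part of the newline sequence on Windows (\r\n) but not on Mac (\n). You are probably reading a line from a file with Windows line endings, so on the Mac the trailing \r stays attached to number; printing it shows nothing visible, but it is percent-encoded as %0D in the URL. Your filter doesn't help because it removes the literal characters '%', '0' and 'D' rather than the carriage return itself; strip '\r' from number before building the URL.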
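As a minimal sketch of stripping the carriage return before building the URL, assuming number is a String read from a file with Windows line endings: trimEnd is a hypothetical helper name, and the URL below is a placeholder rather than the asker's real address. The simpleHttp call itself is left out so the snippet only needs base.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- | Hypothetical helper: drop trailing whitespace, which includes
-- the carriage return ('\r') left over from a Windows line ending.
trimEnd :: String -> String
trimEnd = dropWhileEnd isSpace

main :: IO ()
main = do
  let webLink = "http://www.example.com/"  -- placeholder URL (assumption)
      number  = "257\r"                    -- a line as read from a CRLF file
  -- 'print' uses 'show', so the otherwise invisible '\r' becomes visible.
  print (webLink ++ number)
  print (webLink ++ trimEnd number)
```

Printing with print (rather than putStrLn) is a handy way to spot invisible trailing characters like this one.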

Problems with Character Encoding Using Haskell's Text.Pandoc

I want to parse a LaTeX file using Pandoc and output the text, like this:
import qualified Text.Pandoc as P
import Text.Pandoc.Error (handleError)
tex2Str = do
  file <- readFile "test.tex"
  let p = handleError $ P.readLaTeX P.def file
  writeFile "A.txt" $ P.writePlain P.def p
  writeFile "B.txt" $ file
While the encoding in file B.txt seems to be "right" (i.e. UTF-8), the encoding in file A.txt is not correct.
Here the respective extracts of the files:
A.txt:
...
Der _Crawler_ läuft hierbei über die Dokumentenbasis
...
B.txt:
...
\usepackage[utf8]{inputenc}
...
Der \emph{Crawler} läuft hierbei über die Dokumentenbasis
...
Does anyone know how to fix this? Why does Pandoc use the wrong encoding (I thought it used UTF-8 by default)?
Update:
I found a (partial) solution: using the readFile and writeFile functions from Text.Pandoc.UTF8 seems to fix some of the problems, i.e.
import qualified Text.Pandoc as P
import Text.Pandoc.Error (handleError)
import qualified Text.Pandoc.UTF8 as UTF (readFile, writeFile)
tex2Str = do
  file <- UTF.readFile "test.tex"
  let p = handleError $ P.readLaTeX P.def file
  UTF.writeFile "A.txt" $ P.writePlain P.def p
  UTF.writeFile "B.txt" $ file
However, I still don't understand what the actual problem was, since both Prelude.readFile and Prelude.writeFile seem to be UTF-8-aware...
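One thing worth checking here: Prelude.readFile and Prelude.writeFile use the encoding of the current locale, so they only behave as UTF-8 when the environment happens to be set that way. The sketch below, using only System.IO from base, pins the encoding explicitly. The names readFileUtf8 and writeFileUtf8 and the file name roundtrip.txt are made up for illustration, and this is not a claim about how Text.Pandoc.UTF8 is implemented.

```haskell
import System.IO

-- | Hypothetical helper: read a whole file as UTF-8, regardless of locale.
readFileUtf8 :: FilePath -> IO String
readFileUtf8 path = do
  h <- openFile path ReadMode
  hSetEncoding h utf8
  contents <- hGetContents h
  -- Force the lazily read contents before closing the handle.
  length contents `seq` hClose h
  return contents

-- | Hypothetical helper: write a String as UTF-8, regardless of locale.
writeFileUtf8 :: FilePath -> String -> IO ()
writeFileUtf8 path str = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStr h str

main :: IO ()
main = do
  -- The encoding that Prelude.readFile / writeFile would pick up.
  print localeEncoding
  writeFileUtf8 "roundtrip.txt" "läuft über"
  s <- readFileUtf8 "roundtrip.txt"
  print (s == "läuft über")
```

If localeEncoding prints something other than UTF-8 on the machine where the problem occurs, that would be consistent with the behaviour described above.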

can snap handle utf8 text? [duplicate]

I have used writeBS and writeText from Snap and renderTemplate from Heist, but none of them seems to support Unicode.
site :: Snap ()
site = do
    ifTop (writeBS "你好世界") <|>
      route [("test", testSnap)]

testSnap :: Snap ()
testSnap = do
    fromJust $ C.renderTemplate hs "test"
-- test.tpl
你好世界
I expected it to output "你好世界" for the / or /test route, but in fact the output is just garbled text.
The problem here is not with writeBS or writeText. It's with the conversion used by the OverloadedStrings extension. It is also important to understand the distinction between ByteString and Text. ByteString is for raw bytes. There is no concept of characters or an encoding. That is where Text comes in. The Data.Text.Encoding module has a bunch of functions for converting between Text and ByteString using different encodings. For me, both of the following generate the same output:
writeBS $ encodeUtf8 "你好世界"
writeText "你好世界"
Your code didn't work because the OverloadedStrings extension converts your string literal to a ByteString, and that conversion truncates every character to a single byte instead of encoding it as UTF-8. The solution is to treat the literal as the proper type: Text.
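The difference between the two conversions can be seen without Snap at all. The following sketch uses only the bytestring and text packages; the variable names are illustrative. It compares the bytes produced by an OverloadedStrings ByteString literal with those produced by encodeUtf8.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)

main :: IO ()
main = do
  let literalBytes = "你好世界" :: B.ByteString        -- literal converted via IsString
      utf8Bytes    = encodeUtf8 ("你好世界" :: T.Text)  -- explicit UTF-8 encoding
  -- One truncated byte per character: 4 bytes in total.
  print (B.length literalBytes)
  -- Three UTF-8 bytes per character: 12 bytes in total.
  print (B.length utf8Bytes)
  -- The raw byte values show that the two results are unrelated.
  print (B.unpack literalBytes)
  print (B.unpack utf8Bytes)
```

The four-byte result has lost information for good, which is why no content-type header can make the first variant render correctly.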
On the Heist side of things, the following works fine for me:
route [("test", cRender "test")]
In fact, this one renders correctly in my browser, while the previous two don't. The difference is that cRender sets an appropriate content-type. I found it enlightening to observe the differences using the following snippet.
site = route [ ("/test1", writeBS "你好世界")
             , ("/test2", writeBS $ encodeUtf8 "你好世界")
             , ("/test3", writeText "你好世界")
             , ("/test4", modifyResponse (setContentType "text/html;charset=utf-8") >> writeText "你好世界")
             , ("/testHeist", cRender "test")
             ]
In my browser test4 and testHeist work correctly. Tests 2 and 3 give you the correct behavior but might not be rendered properly by browsers because of the lack of content-type.

Lattice problems: lattice objects coming from JAGS, but device can't be set

I ran JAGS with runjags in R and I got a giant list back (named results for this example).
Whenever I access results$density, two lattice plots (one for each parameter) pop up in the default quartz device.
I need to combine these with par(mfrow=c(2, 1)) or with a similar approach, and send them to the pdf device.
Nothing I tried is working. Any ideas?
I've tried dev.print, pdf() with dev.off(), etc. with no luck.
Here's a way to ditch the "V1" panels by manipulation of the Trellis structure:
p1 <- results$density$c
p2 <- results$density$m
p1$layout <- c(1,1)
p1$index.cond[[1]] <- 1 # remove second index
p1$condlevels[[1]] <- "c" # remove "V1"
class(p1) <- "trellis" # overwrite class "plotindpages"
p2$layout <- c(1,1)
p2$index.cond[[1]] <- 1 # remove second index
p2$condlevels[[1]] <- "m" # remove "V1"
class(p2) <- "trellis" # overwrite class "plotindpages"
library(grid)
layout <- grid.layout(2, 1, heights=unit(c(1, 1), c("null", "null")))
grid.newpage()
pushViewport(viewport(layout=layout))
pushViewport(viewport(layout.pos.row=1))
print(p1, newpage=FALSE)
popViewport()
pushViewport(viewport(layout.pos.row=2))
print(p2, newpage=FALSE)
popViewport()
popViewport()
plot of c.trellis() result http://img142.imageshack.us/img142/3272/ctrellisa.png
The easiest way to combine the plots is to use the results stored in results$mcmc:
# prepare data, see source code of "run.jags"
thinned.mcmc <- combine.mcmc(list(results$mcmc),
                             collapse.chains=FALSE,
                             return.samples=1000)
print(densityplot(thinned.mcmc[,c(1,2)], layout=c(1,2),
                  ylab="Density", xlab="Value"))
For instance, for the included example from run.jags, check the structure of the list using
sink("results_str.txt")
str(results$density)
sink()
Then you will see components named layout. The layout for the two plots of each variable can be set using
results$density$m$layout <- c(1,2)
print(results$density$m)
The plots for different parameters can be combined using the c.trellis method from the latticeExtra package.
class(results$density$m) <- "trellis" # overwrite class "plotindpages"
class(results$density$c) <- "trellis" # overwrite class "plotindpages"
library("latticeExtra")
update(c(results$density$m, results$density$c), layout=c(2,2))
output of c.trellis http://img88.imageshack.us/img88/6481/ctrellis.png
Another approach is to use grid viewports:
library("grid")
results$density$m$layout <- c(2,1)
results$density$c$layout <- c(2,1)
class(results$density$m) <- "trellis"
class(results$density$c) <- "trellis"
layout <- grid.layout(2, 1, heights=unit(c(1, 1), c("null", "null")))
grid.newpage()
pushViewport(viewport(layout=layout))
pushViewport(viewport(layout.pos.row=1))
print(results$density$m, newpage=FALSE)
popViewport()
pushViewport(viewport(layout.pos.row=2))
print(results$density$c, newpage=FALSE)
popViewport()
popViewport()
grid output http://img88.imageshack.us/img88/5967/grida.png
