Smalltalk stdin nextLine for fixed number of lines of input - io

I am currently trying to get 3 specific lines of input of the form:
XX.XX (float)
XX (1-3 digit integer)
XX (1-3 digit integer)
Below is the current code I have:
inputStringOne := stdin nextLine.
mealCost := inputStringOne.
Transcript show: inputStringOne; cr .
inputStringTwo := stdin nextLine.
tipPercent := inputStringTwo.
Transcript show: inputStringTwo; cr .
inputString := stdin nextLine.
taxPercent := inputString .
mealCost inspect .
tipPercent inspect .
taxPercent inspect .
I have been at this for a good 3-4 hours programming, scouring the Internet, etc. I am an uber-beginner teaching myself Smalltalk, so I am trying to see why the nextLine isn't reading the inputs correctly. Below are the sample inputs put into stdin, and the results of the Transcript show and inspect:
Sample Input on stdin:
10.25
17
5
Output on stdout:
10.25
.25
An instance of String
contents: [
[1]: $1
[2]: $0
[3]: $.
[4]: $2
[5]: $5
]
An instance of String
contents: [
[1]: $.
[2]: $2
[3]: $5
]
An instance of String
contents: [
[1]: $.
[2]: $2
[3]: $5
]
10.25
When I just do the following code, I see all 3 inputs separated by <10>, which I assume is the carriage-return or linefeed.
"just print all contents in stdin"
inputS := stdin contents.
inputS inspect .
Sample input (stdin):
10.25
17
5
Output (stdout):
An instance of String
contents: [
[1]: $1
[2]: $0
[3]: $.
[4]: $2
[5]: $5
[6]: $<10>
[7]: $1
[8]: $7
[9]: $<10>
[10]: $5
]
So it seems that, for some reason, only the first call to stdin nextLine reads its line correctly; the remaining calls return only the last 3 characters of that first line (the decimal point and the 2 digits after it). I am not sure why this is the case.
I have tried nextAvailable, flush, commit, stdin close, and a handful of other methods, all to no avail. One idea I have is to read the whole stdin contents, split on <10>, and save each part to its own variable, but I want to learn more about how stdin nextLine works and how it relates to stdin contents. Is there no good way to just call stdin nextLine 3 times, given that we know there are only 3 inputs?

This appears to be a bug in GNU smalltalk. If you type in the input directly from the keyboard then stdin nextLine works just fine. However, if you redirect the input from a file then stdin nextLine messes it up.
Significantly, I can't find a single answer in HackerRank that is written in smalltalk so this must be an old problem.

I can not reproduce this on macOS with GNU Smalltalk 3.2.5:
→ gst -f foo.st
12.23
12.23
12
12
7
An instance of String
contents: [
[1]: $1
[2]: $2
[3]: $.
[4]: $2
[5]: $3
]
An instance of String
contents: [
[1]: $1
[2]: $2
]
An instance of String
contents: [
[1]: $7
]
As you can see every input is echoed back from your Transcript show: messages and the output of inspect looks correct too. Which OS and version of GNU Smalltalk are you using?

I am using a very buggy version of Smalltalk on another site. The best approach I could find builds on contents.
Note: array in smalltalk starts at 1, not 0.
"split the whole input on the newline character (ASCII 10)"
ins := stdin contents substrings: (String with: Character nl).
loopMax := 3.
1 to: loopMax do: [ :ix |
    line := ins at: ix.
    Transcript show: 'ins at ', ix printString;
        show: ' =', line; cr ].


Convert carriage return (\r) to actual overwrite

Questions
Is there a way to convert the carriage returns to actual overwrite in a string so that 000000000000\r1010 is transformed to 101000000000?
Context
1. Initial objective:
Having a number x (between 0 and 255) in base 10, I want to convert this number in base 2, add trailing zeros to get a 12-digits long binary representation, generate 12 different numbers (each of them made of the last n digits in base 2, with n between 1 and 12) and print the base 10 representation of these 12 numbers.
2. Example:
With x = 10
Base 2 is 1010
With trailing zeros 101000000000
Extract the 12 "leading" numbers: 1, 10, 101, 1010, 10100, 101000, ...
Convert to base 10: 1, 2, 5, 10, 20, 40, ...
3. What I have done (it does not work):
x=10
x_base2="$(echo "obase=2;ibase=10;${x}" | bc)"
x_base2_padded="$(printf '%012d\r%s' 0 "${x_base2}")"
for i in {1..12}
do
t=$(echo ${x_base2_padded:0:${i}})
echo "obase=10;ibase=2;${t}" | bc
done
4. Why it does not work
Because the variable x_base2_padded contains the whole sequence 000000000000\r1010. This can be confirmed using hexdump for instance. In the for loop, when I extract the first 12 characters, I only get zeros.
5. Alternatives
I know I can find alternatives by literally adding zeros to the variable as follows:
x_base2=1010
x_base2_padded="$(printf '%s%0.*d' "${x_base2}" $((12-${#x_base2})) 0)"
Or by padding with zeros using printf and rev
x_base2=1010
x_base2_padded="$(printf '%012s' "$(printf "${x_base2}" | rev)" | rev)"
Although these alternatives solve my problem for now and let me continue my work, they do not really answer my question.
Related issue
The same problem may be observed in different contexts. For instance if one tries to concatenate multiple strings containing carriage returns. The result may be hard to predict.
str=$'bar\rfoo'
echo "${str}"
echo "${str}${str}"
echo "${str}${str}${str}"
echo "${str}${str}${str}${str}"
echo "${str}${str}${str}${str}${str}"
The first echo will output foo. Although you might expect the other echo to output foofoofoo..., they all output foobar.
The following function overwrite transforms its argument such that after each carriage return \r the beginning of the string is actually overwritten:
overwrite() {
    local segment result=
    # Each \r-terminated segment replaces the start of the result so far
    while IFS= read -rd $'\r' segment; do
        result="$segment${result:${#segment}}"
    done < <(printf '%s\r' "$@")
    printf %s "$result"
}
Example
$ overwrite $'abcdef\r0123\rxy'
xy23ef
Note that the printed string is actually xy23ef, unlike echo $'abcdef\r0123\rxy' which only seems to print the same string, but still prints \r which is then interpreted by your terminal such that the result looks the same. You can confirm this with hexdump:
$ echo $'abcdef\r0123\rxy' | hexdump -c
0000000 a b c d e f \r 0 1 2 3 \r x y \n
000000f
$ overwrite $'abcdef\r0123\rxy' | hexdump -c
0000000 x y 2 3 e f
0000006
The function overwrite also supports overwriting by arguments instead of \r-delimited segments:
$ overwrite abcdef 0123 xy
xy23ef
To convert variables in-place, use a subshell: myvar=$(overwrite "$myvar")
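For comparison, the same overwrite rule can be written outside the shell. The following is only a minimal Python sketch of the idea, not a port of the function above in every detail: the name overwrite is reused for illustration, and it accepts either one string with embedded \r or several separate arguments.

```python
def overwrite(*segments):
    """Return the string a terminal would show after carriage-return overwriting."""
    # Treat separate arguments and \r-delimited pieces the same way.
    parts = []
    for seg in segments:
        parts.extend(seg.split("\r"))
    result = ""
    for part in parts:
        # The new piece replaces the start of the result; anything that
        # extends past the new piece stays visible.
        result = part + result[len(part):]
    return result


if __name__ == "__main__":
    print(overwrite("abcdef\r0123\rxy"))    # xy23ef
    print(overwrite("000000000000\r1010"))  # 101000000000
```

Each piece is processed left to right, so a later, shorter piece only hides the first characters of what came before it.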
With awk, you'd set the field delimiter to \r and iterate through fields printing only the visible portions of them.
awk -F'\r' '{
offset = 1
for (i=NF; i>0; i--) {
if (offset <= length($i)) {
printf "%s", substr($i, offset)
offset = length($i) + 1
}
}
print ""
}'
This is indeed too long to put into a command substitution. So you better wrap this in a function, and pipe the lines to be resolved to that.
To answer the specific question, how to convert 000000000000\r1010 to 101000000000, refer to Socowi's answer.
However, I wouldn't introduce the carriage return in the first place and solve the problem like this:
#!/usr/bin/env bash
x=$1
# Start with 12 zeroes
var='000000000000'
# Convert input to binary
binary=$(bc <<< "obase = 2; $x")
# Rightpad with zeroes: ${#binary} is the number of characters in $binary,
# and ${var:x} removes the first x characters from $var
var=$binary${var:${#binary}}
# Print 12 substrings, convert to decimal: ${var:0:i} extracts the first
# i characters from $var, and $((x#$var)) interprets $var in base x
for ((i = 1; i <= ${#var}; ++i)); do
echo "$((2#${var:0:i}))"
done
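As a cross-check of the pad-then-slice approach in the script above, here is a small Python sketch of the same steps. The function name leading_values and the width parameter are made up for illustration; it assumes x fits in width binary digits, as in the question (0 to 255 with 12 digits).

```python
def leading_values(x, width=12):
    """Right-pad the binary form of x with zeros and decode every prefix."""
    binary = format(x, "b")            # 10 -> "1010"
    padded = binary.ljust(width, "0")  # "1010" -> "101000000000"
    # The prefix of length i, read as base 2, for i = 1..width
    return [int(padded[:i], 2) for i in range(1, width + 1)]


if __name__ == "__main__":
    print(leading_values(10))
    # [1, 2, 5, 10, 20, 40, 80, 160, 320, 640, 1280, 2560]
```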

Finding character location of all instances of a string in bash

I'm trying to find the location of all instances of a string in a particular file; however, the code I'm currently running only returns the location of the first instance and then stops there. Here is what I'm currently running:
str=$(cat temp1.txt)
tmp="${str%%<C>*}"
if [ "$tmp" != "$str" ]; then
echo ${#tmp}
fi
The file is only one line of string and I would display it but the format questions need to be in won't allow me to add the proper amount of spaces between each character.
I am not sure of many details of your requirements, however this is an awk one-liner:
awk -vRS='<C>' '{printf("%u:",a+=length($0));a+=length(RS)}END{print ""}' temp1.txt
Let’s test it with an actual line of input:
$ awk -vRS='<C>' \
'{printf("%u:",a+=length($0));a+=length(RS)}END{print ""}' \
<<<" <C> <C> "
4:14:20:
This means: the first <C> is at byte 4, the second <C> is at byte 14 (including the three bytes of the first <C>), and the whole line is 20 bytes long (including final newline).
Is this what you want?
Explanation
We set (-v) record separator (RS) as <C>. Then we keep a variable a with the count of all bytes processed so far. For each “line” (i.e., <C>-separated substrings) we add the length of the current line to a, printf it with a suitable format "%u:", and increase a by the length of the separator which ended the current line. Since no printing so far included newlines, at the END we print an empty string, which is an idiom to output a final newline.
Look at basically the same question asked here.
In particular, your question may be answered for multiple instances thanks to user JRFerguson's response using perl.
EDIT: I found another solution that might just do the trick here. (The main question and response post is found here.)
I changed the shell from ksh to bash, changed the searched string to include multiple <C>'s to better demonstrate an answer to the question, and named it "tester":
#!/bin/bash
printf '%s\n' '<C>abc<C>xyz<C>123456<C>zzz<C>' | awk -v s="$1" '
{ d = ""
for(i = 1; x = index(substr($0, i), s); i = i + x + length(s) - 1) {
printf("%s%d", d, i + x - 1)
d = ":"
}
print ""
}'
This is how I ran it:
$ tester '<C>'
1:7:13:22:28
I haven't figured the code out (I like to know why it works) but it seems to work! It would be nice to get an explanation and an elegant way to feed your string into this script. Cheers.
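As for why the awk loop works: index(substr($0, i), s) returns the 1-based position x of the next match within the part of the line that starts at i, so the absolute position is i + x - 1, and the next search starts just past that match at i + x + length(s) - 1. The loop ends when index returns 0. The same search can be sketched in Python; the name find_all is hypothetical, and the 1-based, non-overlapping convention is chosen to match the tester output above.

```python
def find_all(text, needle):
    """Return the 1-based positions of all non-overlapping matches of needle."""
    if not needle:
        raise ValueError("needle must not be empty")
    positions = []
    start = 0
    while True:
        idx = text.find(needle, start)  # 0-based index, or -1 if no more matches
        if idx == -1:
            break
        positions.append(idx + 1)       # report 1-based, like awk's index()
        start = idx + len(needle)       # continue just past this match
    return positions


if __name__ == "__main__":
    hits = find_all("<C>abc<C>xyz<C>123456<C>zzz<C>", "<C>")
    print(":".join(str(p) for p in hits))  # 1:7:13:22:28
```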

Two file numeric comparison in awk

I'm trying to compare the contents of two files, both of which are just a single column of numbers, i.e.
File1:
1.2
2.6
3.4
4.7
5.3
File2:
5.1
4.8
3.2
2.5
1.6
The output should just be the number of lines in file1 that are greater than the corresponding line in file2; so in this case it'd just be
3
A single awk process can do that job:
awk 'NR==FNR{a[NR]=$0;next}a[FNR]>$0{i++}END{print i}' file1 file2
outputs:
3
EDIT
After reading JonathanLeffler's and steveha's comments, I would add another solution that avoids loading a monster file into memory. It is still a single awk process:
awk '{getline x < "file2"}$0>x{i++}END{print i}' file1
outputs:
3
Try using paste followed by awk
paste file1 file2 | awk '$1>$2 {i++} END {print i}'
Output:
3
Here is a solution using only AWK, reading only one line at a time from each input file.
BEGIN {
    if (ARGC != 3)
    {
        print "Usage: this_program <file1> <file2>"
        exit(1)
    }
    c = 0
    for (;;)
    {
        result = getline < ARGV[1]
        if (1 != result)
            break
        n1 = $1 + 0
        result = getline < ARGV[2]
        if (1 != result)
            break
        n2 = $1 + 0
        if (n1 > n2)
            ++c;
    }
    print c
}
P.S. I'm a fan of Python and for fun I also solved this in Python.
import sys
if sys.version_info.major < 3:
    import itertools
    zip = itertools.izip
with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
    print(sum(float(x) > float(y) for x, y in zip(f1, f2)))
Notes:
zip() pairs values read from two sources. zip(f1, f2) pairs a line read from each of the two input files.
I made this use itertools.izip() when you run it on Python 2.x, so it will handle one line at a time. The built-in zip() in Python 2 reads all the data at once and builds a list.
The error checking isn't obvious but it is there. If an input doesn't work as a float value, you will get an exception; if the user doesn't specify at least two input files, you will get an exception.
This is using a slightly sleazy trick: sum() will treat a Boolean True value as a 1, and a Boolean False value as a 0. Thus this gets a count of all the lines for which the > comparison is true.
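To see the counting trick on the sample data from the question without creating files, here is a short sketch that uses in-memory lists in place of the two files. The helper name count_greater is made up for illustration.

```python
def count_greater(lines1, lines2):
    """Count positions where the number in lines1 exceeds the one in lines2."""
    # Each comparison yields True or False; sum() adds them up as 1 and 0.
    return sum(float(x) > float(y) for x, y in zip(lines1, lines2))


if __name__ == "__main__":
    file1 = ["1.2", "2.6", "3.4", "4.7", "5.3"]
    file2 = ["5.1", "4.8", "3.2", "2.5", "1.6"]
    print(count_greater(file1, file2))  # 3
```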

Best way to split large string in linux with single character shift

I have a large file that contains a single example string
ABCDEFGHI (example length 9 characters).
Actual file length could be millions of characters.
I would like to split the string into multiple lines of a predetermined length, shifting the window by 1 character at a time. This means that after splitting,
no. of lines = string length - split size + 1
For example, if I split it 3 characters at a time, the desired output is
ABC
BCD
CDE
DEF
...
If I split by 4 characters then
ABCD
BCDE
CDEF
DEFG
What is the best way of doing this split using shell commands or scripting?
Thanks for any hints
You can try something like this:
gawk -v FS="" '{
r=3 # Set the length
s=1 # Set the start point
while(s<=NF-r+1) {
for (i=s;i<r+s;i++) {
printf "%s", $i   # do not use the data itself as the format string
}
s++
print ""
}
}'
Test:
$ echo "ABCDEFGHI" | gawk -v FS="" '{r=4; s=1; while(s<=NF-r+1) { for (i=s;i<r+s;i++) printf $i ; s++; print ""}}'
ABCD
BCDE
CDEF
DEFG
EFGH
FGHI
$ echo "ABCDEFGHI" | gawk -v FS="" '{r=3; s=1; while(s<=NF-r+1) { for (i=s;i<r+s;i++) printf $i ; s++; print ""}}'
ABC
BCD
CDE
DEF
EFG
FGH
GHI
Here is a way with sed (in bash):
GNU sed:
sed -r ':a;s/([^\n])([^\n]{'$(( n-1 ))'})([^\n])/\1\2\n\2\3/;ta' filename
or POSIX sed (I think):
sed ':a;s/\([^\n]\)\([^\n]\{'$(( n-1 ))'\}\)\([^\n]\)/\1\2\n\2\3/;ta' filename
Output:
with n=3:
ABC
BCD
CDE
DEF
EFG
FGH
GHI
with n=4:
ABCD
BCDE
CDEF
DEFG
EFGH
FGHI
Another awk-based option, involving substr
echo 'abcdefghi' |
awk -v limit=3 'BEGIN{FS=""};
{value=$0; for (i=1; i<= NF-limit +1; ++i) print substr(value, i, limit)}'
abc
bcd
cde
def
efg
fgh
ghi
While I generally dislike bringing in heavyweight scripting languages like this, python makes this pretty much trivial
$ cat test.py
#!/usr/bin/env python
import sys
n = int(sys.argv[1])
s = sys.argv[2]
while len(s) > 0:
    print(s[:n])
    s = s[1:]
$ python test.py 3 abcdef
abc
bcd
cde
def
ef
f
$ python test.py 4 abcdef
abcd
bcde
cdef
def
ef
f
$
If you want to stop once you run out of characters, you can change the while condition to len(s) >= n.
using python you could write something like this:
import itertools
filename = "myfile"
length = 4
with open(filename, 'r') as f:
    out = ''
    # get your input character by character
    for c in itertools.chain.from_iterable(f):
        # append it to your output buffer
        out += c
        # if your buffer is more than N characters, remove the first char
        if len(out) > length:
            out = out[1:]
        # if your buffer is exactly N characters, print it out (or do something else)
        if len(out) == length:
            print(out)
    # if the last iteration was less than N characters, print it out (or do something else)
    if len(out) < length:
        print(out)
where filename is a string containing the full path of your file. You can also use raw_input() (input() in Python 3) instead of open()/read(). There surely is a neat solution using awk, but I would need to RTFM to tell you how to do it.
Whatever your solution is, this algorithm is a good way to do it, as you always keep only up to N+1 characters for the buffer, plus one character for the new read. So the complexity of this algorithm is linear (O(n)) to the input character stream.
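The bounded-buffer idea described above can also be sketched with collections.deque, whose maxlen argument drops the oldest character automatically. This is only an illustration under a few assumptions: the name windows is made up, the input is any iterable of single characters, and newline characters in a real file would have to be filtered out by the caller.

```python
from collections import deque


def windows(chars, size):
    """Yield every run of `size` consecutive characters, shifted by one."""
    buf = deque(maxlen=size)  # holds at most `size` characters
    for c in chars:
        buf.append(c)         # when full, the oldest character falls out
        if len(buf) == size:
            yield "".join(buf)


if __name__ == "__main__":
    for line in windows("ABCDEFGHI", 3):
        print(line)
```

For a 9-character input and a window of 3 this yields 9 - 3 + 1 = 7 lines, in line with the formula from the question.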

What's the fastest/most efficient way to count lines in Rebol?

Given a string string, what is the fastest/most-efficient way to count lines therein? Will accept best answers for any flavour of Rebol. I've been working under the assumption that the parse [some [thru]] combination was the fastest way to traverse a string, but then I don't know that for certain, hence turning to SO:
count-lines: func [string [string!] /local count][
parse/all string [
(count: 1) some [thru newline (count: count + 1)]
]
count
]
Or:
count-lines: func [string [string!] /local count][
count: 0
until [
count: count + 1
not string: find/tail string newline
]
count
]
And how about counters? How efficient is repeat?
count-lines: func [string [string!]][
repeat count length? string [
unless string: find/tail string newline [
break/return count
]
]
]
Update: line count goes by the Text Editor principle:
An empty document still has a line count of one. So:
>> count-lines ""
== 1
>> count-lines "^/"
== 2
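For readers less familiar with Rebol, the Text Editor principle stated above amounts to "number of newline characters plus one". A tiny Python sketch of that rule follows; the name count_lines mirrors the Rebol function and the snippet is illustrative only.

```python
def count_lines(text):
    """Line count by the text editor rule: newline characters plus one."""
    return text.count("\n") + 1


if __name__ == "__main__":
    print(count_lines(""))    # 1
    print(count_lines("\n"))  # 2
```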
count-lines: func [
str
/local sort-str ][
sort-str: sort join str "^/"
1 + subtract index? find/last sort-str "^/" index? find sort-str "^/"
]
Enhanced PARSE version, as suggested by BrianH:
i: 1 ; add one as TextMate
parse text [any [thru newline (++ i)]]
print i
Here's the best simple non-parse version I can think of:
count-lines: function [text [string!]] [
i: 1
find-all text newline [++ i]
i
]
It uses function and ++ from more recent versions of Rebol, and find-all from either R3 or R2/Forward. You could look at the source of find-all and inline what you find and optimize, but situations like this are exactly what we wrote find-all for, so why not use it?
Here is the best for me:
temp: read/lines %mytext.txt
length? temp
remove-each can be fast as it is native
s: "1^/2^/3"
a: length? s
print a - length? remove-each v s [v = #"^/"]
; >> 2
or as a function
>> f: func [s] [print [(length? s) - (length? remove-each v s [v = #"^/"])]]
>> f "1^/2^/3"
== 2
I wonder why no one came up with the simplest solution :)
t: "abc^/de^/f^/ghi"
i: 0 until [i: i + 1 not t: find/tail t newline] i
== 4
Not sure about the performance but I think it's quite fast, as UNTIL and FIND are natives.
WHILE could be used as well.
i: 1 while [t: find/tail t newline] [i: i + 1] i
== 4
Just need to check for empty string. And if it would be a function, argument series needs to be HEADed.
Not the most efficient, but probably one of the fastest solutions (in any case, if a benchmark is run, I would like to see how this solution performs):
>> s: "1^/2^/ ^/^/3"
>> (length? s) - length? trim/with copy s newline
== 4
Do not know about performance, and the last line rule (r3).
>> length? parse "1^/2^/3" "^/"
== 3
hehehe, the read/lines and length? temp approach is a great thing. I thought about read/lines -> foreach lines temps [ count: count + 1]
another way to do it would be to do
temp: "line 1 ^M line2 ^M line3 ^M "
length? parse temp newline ; that cuts the strings into a block
;of multiple strings that represent each a line [ "line 1" "line2" "line3" ]
;then you count how many strings you have in the block with length?
I like to code in rebol it is so funny
Edit: I didn't read the whole post, so my solution was already proposed in a different way...
OK, to atone for my sin of posting an already posted solution, I will add a comment on an unexpected behavior of that solution: multiple chained carriage returns are not counted (using Rebol 3 on Linux ...)
>> a: "line1 ^M line2 ^M line3 ^M^M"
== "line1 ^M line2 ^M line3 ^M^M"
>> length? parse a newline
== 3
