Why is the complexity of simple string concatenation O(n^2)?

I read in several manuals and online sources that the running time of "simple string concatenation" is O(n^2).
The algorithm is this: we take the first 2 strings, create a new string, copy the characters of the 2 original strings in the new string, and repeat this process over and over again until all strings are concatenated. We are not using StringBuilder or similar implementations: just a simple string concatenation.
I think the running time should be something like O(kn) where k = number of strings, n = total number of characters.
You don't copy the same characters n times, but k times, so it should not be O(n^2). For example, if you have 2 strings, it's just O(n).
Basically it's n + (n-x) + (n-y) + (n-z)... but k times, not n times.
Where am I wrong?

A precise problem statement is necessary here:
There are two metrics to consider: How much space is required and how much time is required.
This note looks at the time requirements.
The concatenation operation is specified to concatenate only two of the strings at a time, with concatenation being performed with left association:
((k1 + k2) + k3) ...
There are two parameters that may be considered, and two ways of looking at the second parameter.
The first parameter is the total size (in characters) of the strings which are to be concatenated.
The second parameter is either the number of strings which are to be concatenated, or is the size of each of the strings which are to be concatenated.
Considering the first case:
n - Total size (in characters) of the strings to be concatenated.
k - Total number of strings to be concatenated.
The concatenation time is roughly:
(n/k) * (k^2) / 2
Or, to within a constant factor:
n * k
Then, for a fixed 'k', the concatenation time is linear!
Considering instead the second case:
n - Total size of the strings
m - Size of each of the sub-strings
This corresponds to the prior case but with:
k = n / m
The prior estimate then becomes:
n * k = n * (n / m) = n^2 / m
That is, for a fixed 'm', the concatenation time is quadratic.
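Both cases can be checked by counting copied characters directly. The following is a minimal Python sketch, not taken from the answer above: the function name `copies_for_left_assoc_concat` is hypothetical, and it assumes the model from the question, where every two-string concatenation copies all characters of both operands into a new string.

```python
def copies_for_left_assoc_concat(pieces):
    """Count characters copied by ((p1 + p2) + p3) + ... (assumed cost model)."""
    copied = 0
    result = pieces[0]
    for piece in pieces[1:]:
        result = result + piece
        # Assumption: building the new string copies every character in it.
        copied += len(result)
    return copied


# Fixed k = 4 strings: doubling the piece size doubles the work (linear in n).
small = copies_for_left_assoc_concat(["abc"] * 4)
large = copies_for_left_assoc_concat(["abcdef"] * 4)

# Fixed m = 3 characters: doubling k roughly quadruples the work (quadratic in n).
more_pieces = copies_for_left_assoc_concat(["abc"] * 8)

print(small, large, more_pieces)
```

With k equal strings of size m, the count is m * (2 + 3 + ... + k), which is on the order of n * k / 2, in line with the estimate above.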

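If you write some tests and look at the bytecode, you will see that the compiler uses StringBuilder to implement concatenation, and that it sometimes pre-allocates the internal array to make this more efficient. For a single concatenation expression that is clearly not O(n^2) complexity; note, though, that in the loop below a new StringBuilder is created on every iteration (offset 76 in the bytecode), so the accumulated string is still copied each time.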
Here is the Java code.
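// The class name is arbitrary; only main matters for the bytecode shown below.
public class ConcatDemo {
    public static void main(String[] args) {
        String[] william = {
            "To ", "be ", "or ", "not ", "to ", ", that", "is ", "the ",
            "question."
        };
        String quote = "";
        // Each += produces a new String built from quote and word.
        for (String word : william) {
            quote += word;
        }
    }
}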
Here is the bytecode.
public static void main(java.lang.String[] args);
0 bipush 9
2 anewarray java.lang.String [16]
5 dup
6 iconst_0
7 ldc <String "To "> [18]
9 aastore
10 dup
11 iconst_1
12 ldc <String "be "> [20]
14 aastore
15 dup
16 iconst_2
17 ldc <String "or "> [22]
19 aastore
20 dup
21 iconst_3
22 ldc <String "not "> [24]
24 aastore
25 dup
26 iconst_4
27 ldc <String "to "> [26]
29 aastore
30 dup
31 iconst_5
32 ldc <String ", that"> [28]
34 aastore
35 dup
36 bipush 6
38 ldc <String "is "> [30]
40 aastore
41 dup
42 bipush 7
44 ldc <String "the "> [32]
46 aastore
47 dup
48 bipush 8
50 ldc <String "question."> [34]
52 aastore
53 astore_1 [william]
54 ldc <String ""> [36]
56 astore_2 [quote]
57 aload_1 [william]
58 dup
59 astore 6
61 arraylength
62 istore 5
64 iconst_0
65 istore 4
67 goto 98
70 aload 6
72 iload 4
74 aaload
75 astore_3 [word]
76 new java.lang.StringBuilder [38]
79 dup
80 aload_2 [quote]
81 invokestatic java.lang.String.valueOf(java.lang.Object) : java.lang.String [40]
84 invokespecial java.lang.StringBuilder(java.lang.String) [44]
87 aload_3 [word]
88 invokevirtual java.lang.StringBuilder.append(java.lang.String) : java.lang.StringBuilder [47]
91 invokevirtual java.lang.StringBuilder.toString() : java.lang.String [51]
94 astore_2 [quote]
95 iinc 4 1
98 iload 4
100 iload 5
102 if_icmplt 70


VBA string to byte array conversion giving incorrect results for certain values only

I have a routine that takes a string and makes it into an array of numbers. This is in VBA running in Excel as part of Office Professional 2019.
The code below is a demo version to illustrate the problem, which encapsulates the original code.
I need to display the numerical equivalent of each char in the string, so I am using CStr(by) elsewhere in the code.
Public Sub TestByteFromString()
    '### vars
    Dim ss As String, i As Integer
    Dim arrBytes() As Byte
    Dim by As Byte
    '### build a string from character codes 126 to 253
    ss = ""
    For i = 0 To 127 Step 1
        ss = ss & Chr(Val(i + 126))
    Next i
    ' Assigning a String to a Byte array yields two bytes per character
    arrBytes = ss
    '### print every second byte, i.e. the first byte of each character
    For i = LBound(arrBytes) To UBound(arrBytes) Step 2
        by = arrBytes(i)
        Debug.Print "Index " & CStr(i) & " Byte " & CStr(by) & " Original " & CStr((i / 2 + 126)) & " Difference = " & CStr(((i / 2 + 126)) - CInt(by))
    Next i
    '###
End Sub
It seems to work fine except for certain values greater than 126, some of which are shown by the demo above.
I am getting these results and cannot see an explanation or a consistent pattern. Can anyone see what is wrong?
Index 0 Byte 126 Original 126 Difference = 0
Index 2 Byte 127 Original 127 Difference = 0
Index 4 Byte 172 Original 128 Difference = -44
Index 6 Byte 129 Original 129 Difference = 0
Index 8 Byte 26 Original 130 Difference = 104
Index 10 Byte 146 Original 131 Difference = -15
Index 12 Byte 30 Original 132 Difference = 102
Index 14 Byte 38 Original 133 Difference = 95
Index 16 Byte 32 Original 134 Difference = 102
Index 18 Byte 33 Original 135 Difference = 102
Index 20 Byte 198 Original 136 Difference = -62
Index 22 Byte 48 Original 137 Difference = 89
Index 24 Byte 96 Original 138 Difference = 42
Index 26 Byte 57 Original 139 Difference = 82
Index 28 Byte 82 Original 140 Difference = 58
Index 30 Byte 141 Original 141 Difference = 0
Index 32 Byte 125 Original 142 Difference = 17
Index 34 Byte 143 Original 143 Difference = 0
Index 36 Byte 144 Original 144 Difference = 0
Index 38 Byte 24 Original 145 Difference = 121
Index 40 Byte 25 Original 146 Difference = 121
Index 42 Byte 28 Original 147 Difference = 119
Index 44 Byte 29 Original 148 Difference = 119
Index 46 Byte 34 Original 149 Difference = 115
Index 48 Byte 19 Original 150 Difference = 131
Index 50 Byte 20 Original 151 Difference = 131
Index 52 Byte 220 Original 152 Difference = -68
Index 54 Byte 34 Original 153 Difference = 119
Index 56 Byte 97 Original 154 Difference = 57
Index 58 Byte 58 Original 155 Difference = 97
Index 60 Byte 83 Original 156 Difference = 73
Index 62 Byte 157 Original 157 Difference = 0
Index 64 Byte 126 Original 158 Difference = 32
Index 66 Byte 120 Original 159 Difference = 39
Index 68 Byte 160 Original 160 Difference = 0
It seems fine for everything beyond 160 and below 126.
I don't think it is the CStr() function. If I multiply the byte value by 2 and use CStr(), I get this kind of result, suggesting that the byte's numerical value is the problem.
Index 66 Byte 120 Original 159 Difference = 39
Index 66 2*Byte 240
Other causes I have investigated, without finding an explanation:
- two-byte storage of chars in strings
- the ASCII char set
- bytes being decoded as negative numbers if the MSB is set, but this is unlikely as 160 onwards is correct
There may be much better ways to get the array, and they would be very useful, but if possible I would like to also know what has gone wrong so I, and anyone reading, would not make the same mistake again.
Thanks for any assistance, R.
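One explanation that is consistent with the numbers above, offered here as an assumption rather than something confirmed in the thread: Chr() interprets codes 128 to 159 through the Windows-1252 code page, which maps most of them to Unicode characters above 255, and the loop then prints only the low byte of each two-byte character. A minimal Python sketch of that idea follows; the function name `low_byte_seen_by_vba` is hypothetical, and Python's `cp1252` codec stands in for whatever code page the machine actually uses.

```python
def low_byte_seen_by_vba(code):
    """Low byte of the UTF-16 character that a Windows-1252 byte maps to."""
    try:
        char = bytes([code]).decode("cp1252")
    except UnicodeDecodeError:
        # Assumption: codes undefined in Windows-1252 (129, 141, 143, 144, 157)
        # pass through unchanged, which matches the zero differences above.
        char = chr(code)
    # Little-endian UTF-16 stores the low byte first, like arrBytes(i) for even i.
    return char.encode("utf-16-le")[0]


for code in (126, 128, 129, 130, 131, 159, 160):
    print("Original", code, "Byte", low_byte_seen_by_vba(code))
```

For example, code 128 maps to the euro sign U+20AC, whose low byte is 0xAC = 172, the value reported at Index 4. If this is the cause, building the string with ChrW() instead of Chr() would be one thing to try.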

replacing a value in python

I'm writing a bingo game in python. So far I can generate a bingo card and print it.
My problem is after I've randomly generated a number to call out, I don't know how to 'cross out' that number on the card to note that it's been called out.
This is the output; it's a randomly generated card:
B 11 13 14 2 1
I 23 28 26 27 22
N 42 45 40 33 44
G 57 48 59 56 55
O 66 62 75 63 67
I was thinking of shuffling the numbers and using pop to draw a number to call out (in bingo the numbers go from 1 to 75):
random_draw_list = random.sample(range(1, 76), 75)
number_drawn = random_draw_list.pop()
How can I write a function that will 'cross out' a number on the card after it's been called?
So for example, if number_drawn results in 11, it should replace 11 on the card with an x or a zero.
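One possible approach, sketched under assumptions since the question does not show how the card is stored: keep the card as a dict that maps each letter to a list of five numbers, and replace the matching entry in place. The names `card` and `cross_out` are hypothetical.

```python
import random

# Assumed representation: one list of numbers per letter, as printed above.
card = {
    "B": [11, 13, 14, 2, 1],
    "I": [23, 28, 26, 27, 22],
    "N": [42, 45, 40, 33, 44],
    "G": [57, 48, 59, 56, 55],
    "O": [66, 62, 75, 63, 67],
}


def cross_out(card, number, mark="x"):
    """Replace number with mark if it is on the card; return True when found."""
    for row in card.values():
        for index, value in enumerate(row):
            if value == number:
                row[index] = mark
                return True
    return False


random_draw_list = random.sample(range(1, 76), 75)
number_drawn = random_draw_list.pop()
cross_out(card, number_drawn)

for letter, row in card.items():
    print(letter, *row)
```

Returning True or False lets the caller tell whether the drawn number was on the card at all. Passing mark=0 would give the zero variant mentioned in the question.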

Perl string weirdness : equal strings being not equal?

I am using Perl v5.16.2
I am using the Net::SMPP modules and it returns me some data.
If I show this data, I get this (simplified) :
$VAR1 = bless( {
    'receipted_message_id' => '400002F6E09C61701222120140',
    '30' => '400002F6E09C61701222120140'
}, 'Net::SMPP::PDU' );
Now, let's assume this data is in $pdu and I do this :
$message_id = $pdu->{30}; # or $pdu->{receipted_message_id}, same result
myfunction($message_id);
Then, I have myfunction defined as :
sub myfunction {
    my $message_id = shift;
    my $message_id_static = '400002F6E09C61701222120140';
    print Dumper($message_id);
    print Dumper($message_id_static);
    print hexdump($message_id);
    print hexdump($message_id_static);
    if ($message_id eq $message_id_static)
    {
        print "match\n";
    }
    else
    {
        print "no match\n";
    }
}
The output of the program is :
$VAR1 = '400002F6E09C61701222120140';
$VAR1 = '400002F6E09C61701222120140';
Data::Hexdumper: data length isn't an integer multiple of lines
so has been padded with NULLs at the end.
0x0000 : 34 30 30 30 30 32 46 36 45 30 39 43 36 31 37 30 : 400002F6E09C6170
0x0010 : 31 32 32 32 31 32 30 31 34 30 00 00 00 00 00 00 : 1222120140......
Data::Hexdumper: data length isn't an integer multiple of lines
so has been padded with NULLs at the end.
0x0000 : 34 30 30 30 30 32 46 36 45 30 39 43 36 31 37 30 : 400002F6E09C6170
0x0010 : 31 32 32 32 31 32 30 31 34 30 00 00 00 00 00 00 : 1222120140......
no match
Which doesn't make any sense to me... !
If I try to use $message_id to do a SQLite query, it fails miserably. If I use $message_id_static instead, it works perfectly.
So, is this a weird internal Perl bug, or am I missing something ?
This has been driving me nuts for hours...
EDIT :
Using the perl debugger, I get this :
DB<3> x $message_id_static
0 '400002F6E09C61701222120140'
DB<4> x $message_id
0 "400002F6E09C61701222120140\c#"
So at least I see there is a difference in the strings, but why isn't it seen by the hexdump, and what is that \c# ?
Thanks !
The \c# character is Ctrl-#, which is the ASCII NUL character at code point zero.
You can't see it in your hexdump output because it is indistinguishable from the 00 padding at the end of the dump.
If you set $Data::Dumper::Useqq = 1 then it will be visible in the output from print Dumper $message_id.
You can remove it from the variable by using s/\0\z// or tr/\0//d, but you should really investigate why it is there in the first place.
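The same effect is easy to reproduce outside Perl. The following is a minimal Python sketch, given only as an analogy to the Perl situation above; the variable names are hypothetical, and the value of `received` assumes the trailing NUL described in the answer.

```python
# Assumption: the received ID carries one trailing NUL, as the debugger output suggests.
received = "400002F6E09C61701222120140\0"
expected = "400002F6E09C61701222120140"

# The two strings look identical when printed plainly, but compare as different.
print(received == expected)

# An escaped representation reveals the extra character,
# much like Useqq does for Data::Dumper.
print(repr(received))

# Stripping trailing NULs makes the comparison succeed.
cleaned = received.rstrip("\0")
print(cleaned == expected)
```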

the iterator is printing till 48 when argument supplied is 1

static void main(args){
    System.in.withReader {
        def input = it.readLine()
        for(def i = 0; i < input; i++){
            println i
        }
    }
}
This is the source code. It is a simple one, I guess, but I don't know why it is printing up to 48. Here is the output if the argument supplied is 1.
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
What could be the problem?
Tartar is right, the solution is to change
def input = it.readLine()
To
def input = Integer.parseInt( it.readLine() )
Or (more Groovy)
def input = it.readLine().toInteger()
(The reason it is using the ASCII value of 1 is that Groovy will convert single-character strings to their ASCII value if you try to coerce them into an int. It has been argued that this is confusing, and it may change in future versions of Groovy, but for now it remains for backward-compatibility reasons.)
The ASCII value of the character 1 is 49, so maybe convert the input to an integer?
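The arithmetic behind the 0 to 48 output can be sketched in a few lines. This is a Python illustration of the character-code effect described above, not Groovy code; the variable names are hypothetical, and `ord` plays the role of the implicit coercion.

```python
line = "1"  # what readLine() returns for the input 1

as_char_code = ord(line)  # the character code of "1", standing in for the coercion
as_number = int(line)     # what Integer.parseInt or toInteger() would give

# A loop bounded by the character code runs from 0 up to code - 1.
printed_with_char_code = list(range(as_char_code))
printed_with_number = list(range(as_number))

print(as_char_code, printed_with_char_code[-1])
print(as_number, printed_with_number)
```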

Haskell doubt: how to transform a Matrix represented as: [String] to a Matrix Represented as [[Int]]?

I'm trying to solve Problem 11 of Project Euler in Haskell. I have almost done it, but right now I'm stuck: I want to transform a matrix represented as [String] into a matrix represented as [[Int]].
I "drew" the matrices:
What I want:
"08 02 22 97 38 15 00 40
 49 49 99 40 17 81 18 57
 81 49 31 73 55 79 14 29
 52 70 95 23 04 60 11 42
 22 31 16 71 51 67 63 89
 24 47 32 60 99 03 45 02"

    |  map words . lines
    v

[ ["08","02","22","97","38","15","00","40"],
  ["49","49","99","40","17","81","18","57"],
  ["81","49","31","73","55","79","14","29"],
  ["52","70","95","23","04","60","11","42"],
  ["22","31","16","71","51","67","63","89"],
  ["24","47","32","60","99","03","45","02"] ]

    |  ??a
    v

[ [08,02,22,97,38,15,00,40],
  [49,49,99,40,17,81,18,57],
  [81,49,31,73,55,79,14,29],
  [52,70,95,23,04,60,11,42],
  [22,31,16,71,51,67,63,89],
  [24,47,32,60,99,03,45,02] ]
I'm stuck on the last transformation (??a).
Out of curiosity (and to learn), I also want to know how to build a matrix of digits:
Input:
"123456789
 124834924
 328423423
 334243423
 932402343"

    |  lines
    v

[ "123456789",
  "124834924",
  "328423423",
  "334243423",
  "932402343" ]

    |  ??b
    v

[ [1,2,3,4,5,6,7,8,9],
  [1,2,4,8,3,4,9,2,4],
  [3,2,8,4,2,3,4,2,3],
  [3,3,4,2,4,3,4,2,3],
  [9,3,2,4,0,2,3,4,3] ]
What is the best way to do (??a) and (??b)?
What you want is the read function:
read :: (Read a) => String -> a
This thoughtfully parses a string into whatever you're expecting (as long as it's an instance of the class Read, but fortunately Int is such).
So just map that over the words, like so:
parseMatrix :: (Read a) => String -> [[a]]
parseMatrix s = map (map read . words) $ lines s
Just use that in a context that expects [[Int]] and Haskell's type inference will take it from there.
To get the digits, just remember that String is actually just [Char]. Instead of using words, map a function that turns each Char into a single-element list; everything else is the same.
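For comparison, the same two transformations can be sketched in Python. This is an analogy to the Haskell answer above, not a translation of it; the function names `parse_matrix` and `parse_digit_matrix` are hypothetical, with `int` playing the role of `read` and `str.split` the role of `words`.

```python
def parse_matrix(text):
    """Whitespace-separated numbers, one row per line (the ??a step)."""
    return [[int(word) for word in line.split()] for line in text.splitlines()]


def parse_digit_matrix(text):
    """One digit per character, one row per line (the ??b step)."""
    return [[int(char) for char in line.strip()] for line in text.splitlines()]


numbers = parse_matrix("08 02 22 97\n49 49 99 40")
digits = parse_digit_matrix("123456789\n124834924")

print(numbers)
print(digits)
```

Iterating over the characters of each line corresponds to the hint above that a String is just a list of Char.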
