Home > Uncategorized > Strings, lists & gatherers

Strings, lists & gatherers

Problem statement: You need to send a dynamically constructed stream of bytes over some I/O channel.

An illustrative example

As always, I shall take an example in the internet space and more specifically in HTML. Let us say that you have a classical two dimensional array whose dimensions are not known at compile time. The contents of this array happens to be strings. This 2d array needs to be printed in the form of a valid HTML table. To make things more interesting, each cell in the table may be given a CSS class based on the content of the cell. The output could look as follows:
 <table>
<tr>
<td class="ve">one</td>
<td class="ve">two</td>
<td class="ve">three</td>
</tr>
<tr>
<td>four</td>
<td class="ve">five</td>
<td>six</td>
</tr>
<tr>
<td>seven</td>
<td>eight</td>
<td class="ve">nine</td>
</tr>
</table>

The rule for having the optional CSS class is left as a puzzle for the bored reader.

A naïve approach

The most straightforward  technique is to keep writing one chunk at a time, peeking into the the array as necessary. The reality is that very few people do it since everyone is taught that unbuffered I/O is expensive. The origin of this problem comes the fact that in most mainstream operating systems, only the kernel is allowed to do actual I/O and hence a system call needs to be made. The cost of switching from user space to kernel space and back is considered to be high.

The “fix”

Hence, the strategy is to accumulate a fair amount bytes that needs to be transmitted and then do fewer system calls. I/O APIs in programming languages usually provide transparent APIs where this accumulation happens. It usually comes with a burden of expecting the programmer to indicate the end of the stream so that any left over bytes can be flushed by making one last syscall.

Memory copy

One of sources of overhead (not the largest) of making syscall arises from having to copy data from user space to kernel space. This data copy operation happens ever so often in user space when string are concatenated. If either the user program directly concatenates or the buffering implementation does so in the end, part of the syscall overhead is incurred. The plausible reason is that the most common form of output APIs that programmers are exposed to takes a single stream/string.

Fewer copies

A far less known technique known as scatter/gather I/O exists that sort of addresses this problem. The gather operation is used for writing output in a single shot from multiple input byte buffers. This API (in both POSIX and Windows) accepts an array of buffers over which is sequentially iterates and writes the output. The problem is now reduced to having an array with pointers/references to all the buffers. If you have come down to having to operate at this level, chances are, you might not want to deal with magically expanding arrays. Your option at that point would be to use a linked list to accumulate all the references to the buffers and then turn it into an array just before performing the write operation.

But why all this you ask

… because unless you are coding at what is considered fairly low levels these days (such a POSIX & C), you do not realize how many times you are ripping up and recreating little byte streams in multiple layers of code using both direct and indirect constructs. This problem becomes very evident when trying to generate  outputs in formats such as XML or JSON where there is a mix of a lot of what I call gluing bytes sprinkled liberally between the actual payload. Given an arbitrarily nested variable loosely typed languages like say the ones available in perl/php/python/javascript, I am wondering what is the most elegant way to arrive a representation like JSON and perform an output operation without mindless string concatenation. Thoughts are welcome.

  1. No comments yet.
Comment pages
1 3 4 5
  1. January 15th, 2012 at 14:59 | #1
  2. January 15th, 2012 at 20:28 | #2
  3. January 16th, 2012 at 03:04 | #3
  4. January 16th, 2012 at 06:23 | #4
  5. January 17th, 2012 at 12:28 | #5
  6. January 18th, 2012 at 06:27 | #6
  7. January 18th, 2012 at 06:47 | #7
  8. January 19th, 2012 at 06:41 | #8
  9. January 19th, 2012 at 07:32 | #9
  10. January 19th, 2012 at 07:34 | #10
  11. January 19th, 2012 at 08:45 | #11
  12. January 19th, 2012 at 09:08 | #12
  13. January 19th, 2012 at 09:33 | #13
  14. January 19th, 2012 at 10:51 | #14
  15. January 20th, 2012 at 22:50 | #15
  16. January 20th, 2012 at 23:04 | #16
  17. January 20th, 2012 at 23:48 | #17
  18. January 22nd, 2012 at 19:10 | #18
  19. January 22nd, 2012 at 21:39 | #19
  20. January 23rd, 2012 at 06:46 | #20
  21. January 23rd, 2012 at 12:05 | #21
  22. January 25th, 2012 at 17:27 | #22
  23. January 25th, 2012 at 19:24 | #23
  24. January 25th, 2012 at 20:52 | #24
  25. January 25th, 2012 at 21:32 | #25
  26. January 25th, 2012 at 21:33 | #26
  27. January 25th, 2012 at 22:33 | #27
  28. January 25th, 2012 at 23:40 | #28
  29. January 26th, 2012 at 00:05 | #29
  30. January 26th, 2012 at 00:16 | #30
  31. January 26th, 2012 at 00:29 | #31
  32. January 26th, 2012 at 00:39 | #32
  33. January 26th, 2012 at 01:24 | #33
  34. January 26th, 2012 at 03:07 | #34
  35. January 26th, 2012 at 06:18 | #35
  36. January 26th, 2012 at 09:23 | #36
  37. January 26th, 2012 at 09:46 | #37
  38. January 26th, 2012 at 11:16 | #38
  39. January 27th, 2012 at 00:12 | #39
  40. January 27th, 2012 at 01:53 | #40
  41. January 27th, 2012 at 02:37 | #41
  42. January 27th, 2012 at 03:03 | #42
  43. January 27th, 2012 at 04:37 | #43
  44. January 27th, 2012 at 04:57 | #44
  45. January 27th, 2012 at 05:16 | #45
  46. January 27th, 2012 at 05:19 | #46
  47. January 27th, 2012 at 05:44 | #47
  48. January 27th, 2012 at 05:54 | #48
  49. January 27th, 2012 at 06:03 | #49
  50. January 27th, 2012 at 06:20 | #50
  51. January 27th, 2012 at 06:44 | #51
  52. January 27th, 2012 at 06:58 | #52
  53. January 27th, 2012 at 07:06 | #53
  54. January 27th, 2012 at 07:23 | #54
  55. January 27th, 2012 at 07:45 | #55
  56. January 27th, 2012 at 08:19 | #56
  57. January 27th, 2012 at 08:37 | #57
  58. January 27th, 2012 at 09:34 | #58
  59. January 27th, 2012 at 09:39 | #59
  60. January 27th, 2012 at 16:21 | #60
  61. January 27th, 2012 at 17:25 | #61
  62. January 27th, 2012 at 18:02 | #62
  63. January 27th, 2012 at 18:36 | #63
  64. January 27th, 2012 at 18:37 | #64
  65. January 27th, 2012 at 19:18 | #65
  66. January 27th, 2012 at 21:53 | #66
  67. January 27th, 2012 at 22:57 | #67
  68. January 27th, 2012 at 23:43 | #68
  69. January 28th, 2012 at 00:04 | #69
  70. January 28th, 2012 at 02:53 | #70
  71. January 28th, 2012 at 04:34 | #71
  72. January 28th, 2012 at 05:14 | #72
  73. January 28th, 2012 at 05:38 | #73
  74. January 28th, 2012 at 05:50 | #74
  75. January 28th, 2012 at 13:59 | #75
  76. January 28th, 2012 at 14:37 | #76
  77. January 28th, 2012 at 19:02 | #77
  78. January 28th, 2012 at 19:41 | #78
  79. January 28th, 2012 at 19:41 | #79
  80. January 28th, 2012 at 20:11 | #80
  81. January 28th, 2012 at 20:49 | #81
  82. January 28th, 2012 at 21:19 | #82
  83. January 28th, 2012 at 22:27 | #83
  84. January 28th, 2012 at 22:51 | #84
  85. January 28th, 2012 at 23:50 | #85
  86. January 29th, 2012 at 02:02 | #86
  87. January 29th, 2012 at 02:24 | #87
  88. January 29th, 2012 at 03:04 | #88
  89. January 29th, 2012 at 03:11 | #89
  90. January 29th, 2012 at 03:33 | #90
  91. January 29th, 2012 at 04:16 | #91
  92. January 29th, 2012 at 06:14 | #92
  93. January 29th, 2012 at 19:23 | #93
  94. January 29th, 2012 at 19:59 | #94
  95. January 29th, 2012 at 20:31 | #95
  96. January 29th, 2012 at 20:46 | #96
  97. January 29th, 2012 at 20:55 | #97
  98. January 29th, 2012 at 21:10 | #98
  99. January 29th, 2012 at 21:16 | #99
  100. January 29th, 2012 at 21:18 | #100
  101. January 29th, 2012 at 21:31 | #101
  102. January 29th, 2012 at 21:50 | #102
  103. January 29th, 2012 at 22:03 | #103
  104. January 29th, 2012 at 22:24 | #104
  105. January 29th, 2012 at 22:31 | #105
  106. January 29th, 2012 at 22:32 | #106
  107. January 29th, 2012 at 22:48 | #107
  108. January 29th, 2012 at 23:01 | #108
  109. January 29th, 2012 at 23:32 | #109
  110. January 30th, 2012 at 00:02 | #110
  111. January 30th, 2012 at 00:39 | #111
  112. January 30th, 2012 at 00:52 | #112
  113. January 30th, 2012 at 01:16 | #113
  114. January 30th, 2012 at 01:23 | #114
  115. January 30th, 2012 at 01:40 | #115
  116. January 30th, 2012 at 01:54 | #116
  117. January 30th, 2012 at 02:17 | #117
  118. January 30th, 2012 at 02:29 | #118
  119. January 30th, 2012 at 03:05 | #119
  120. January 30th, 2012 at 04:22 | #120
  121. January 30th, 2012 at 04:25 | #121
  122. January 30th, 2012 at 04:35 | #122
  123. January 30th, 2012 at 05:06 | #123
  124. January 30th, 2012 at 05:20 | #124
  125. January 30th, 2012 at 05:50 | #125
  126. January 30th, 2012 at 12:46 | #126
  127. January 30th, 2012 at 13:27 | #127
  128. January 30th, 2012 at 14:37 | #128
  129. January 30th, 2012 at 15:20 | #129
  130. January 30th, 2012 at 15:57 | #130
  131. January 30th, 2012 at 16:02 | #131
  132. January 30th, 2012 at 17:01 | #132
  133. January 30th, 2012 at 17:49 | #133
  134. January 30th, 2012 at 18:24 | #134
  135. January 30th, 2012 at 19:02 | #135
  136. January 30th, 2012 at 19:48 | #136
  137. January 30th, 2012 at 22:05 | #137
  138. January 30th, 2012 at 23:26 | #138
  139. January 30th, 2012 at 23:46 | #139
  140. January 31st, 2012 at 00:05 | #140
  141. January 31st, 2012 at 00:23 | #141
  142. January 31st, 2012 at 01:11 | #142
  143. January 31st, 2012 at 02:34 | #143
  144. January 31st, 2012 at 03:15 | #144
  145. January 31st, 2012 at 03:52 | #145
  146. January 31st, 2012 at 13:15 | #146
  147. January 31st, 2012 at 15:38 | #147
  148. January 31st, 2012 at 16:23 | #148
  149. January 31st, 2012 at 17:26 | #149
  150. January 31st, 2012 at 18:10 | #150
  151. January 31st, 2012 at 18:43 | #151
  152. January 31st, 2012 at 20:11 | #152
  153. January 31st, 2012 at 20:39 | #153
  154. February 1st, 2012 at 01:14 | #154
  155. February 1st, 2012 at 01:36 | #155
  156. February 1st, 2012 at 01:45 | #156
  157. February 1st, 2012 at 01:56 | #157
  158. February 1st, 2012 at 02:19 | #158
  159. February 1st, 2012 at 03:14 | #159
  160. February 1st, 2012 at 03:32 | #160
  161. February 1st, 2012 at 12:46 | #161
  162. February 1st, 2012 at 13:25 | #162
  163. February 1st, 2012 at 14:16 | #163
  164. February 1st, 2012 at 14:52 | #164
  165. February 1st, 2012 at 15:00 | #165
  166. February 1st, 2012 at 15:49 | #166
  167. February 1st, 2012 at 16:04 | #167
  168. February 1st, 2012 at 16:35 | #168
  169. February 1st, 2012 at 16:55 | #169
  170. February 1st, 2012 at 18:21 | #170
  171. February 1st, 2012 at 18:23 | #171
  172. February 1st, 2012 at 19:16 | #172
  173. February 1st, 2012 at 19:28 | #173
  174. February 1st, 2012 at 20:05 | #174
  175. February 1st, 2012 at 20:22 | #175
  176. February 1st, 2012 at 20:58 | #176
  177. February 2nd, 2012 at 01:02 | #177
  178. February 2nd, 2012 at 01:46 | #178
  179. February 2nd, 2012 at 01:58 | #179
  180. February 2nd, 2012 at 02:16 | #180
  181. February 2nd, 2012 at 02:30 | #181
  182. February 2nd, 2012 at 02:52 | #182
  183. February 2nd, 2012 at 03:28 | #183
  184. February 2nd, 2012 at 03:40 | #184
  185. February 2nd, 2012 at 04:13 | #185
  186. February 2nd, 2012 at 04:27 | #186
  187. February 2nd, 2012 at 04:55 | #187
  188. February 2nd, 2012 at 05:07 | #188
  189. February 2nd, 2012 at 05:28 | #189
  190. February 2nd, 2012 at 05:41 | #190
  191. February 2nd, 2012 at 05:56 | #191
  192. February 2nd, 2012 at 06:08 | #192
  193. February 2nd, 2012 at 12:52 | #193
  194. February 2nd, 2012 at 12:53 | #194
  195. February 2nd, 2012 at 14:19 | #195
  196. February 2nd, 2012 at 14:23 | #196
  197. February 2nd, 2012 at 15:54 | #197
  198. February 2nd, 2012 at 23:50 | #198
  199. February 3rd, 2012 at 02:55 | #199
  200. February 3rd, 2012 at 13:57 | #200
  201. February 3rd, 2012 at 17:10 | #201
  202. February 3rd, 2012 at 17:14 | #202
  203. February 3rd, 2012 at 17:19 | #203
  204. February 3rd, 2012 at 18:14 | #204
  205. February 3rd, 2012 at 19:27 | #205
  206. February 3rd, 2012 at 19:33 | #206
  207. February 3rd, 2012 at 19:52 | #207
  208. February 3rd, 2012 at 19:59 | #208
  209. February 3rd, 2012 at 23:13 | #209
  210. February 4th, 2012 at 01:00 | #210
  211. February 4th, 2012 at 01:02 | #211
  212. February 4th, 2012 at 02:06 | #212
  213. February 4th, 2012 at 09:11 | #213
  214. February 4th, 2012 at 12:51 | #214
  215. February 4th, 2012 at 15:41 | #215
  216. February 4th, 2012 at 20:29 | #216
  217. February 5th, 2012 at 08:22 | #217
  218. February 5th, 2012 at 08:27 | #218
  219. February 5th, 2012 at 09:03 | #219
  220. February 5th, 2012 at 09:27 | #220
  221. February 5th, 2012 at 10:08 | #221
  222. February 5th, 2012 at 18:02 | #222
  223. February 5th, 2012 at 18:07 | #223
  224. February 5th, 2012 at 20:04 | #224
  225. February 5th, 2012 at 22:24 | #225
  226. February 6th, 2012 at 01:25 | #226
  227. February 6th, 2012 at 02:32 | #227
  228. February 6th, 2012 at 02:36 | #228
  229. February 6th, 2012 at 05:16 | #229
  230. February 6th, 2012 at 05:20 | #230
  231. February 6th, 2012 at 06:34 | #231
  232. February 6th, 2012 at 06:54 | #232
  233. February 6th, 2012 at 07:11 | #233
  234. February 6th, 2012 at 07:36 | #234
  235. February 6th, 2012 at 08:21 | #235