Message1464

Author	pekka.klarck
Recipients
Date	2007-02-17.13:48:57

Content

I was able to fix join; the patch is below. Before actually submitting it, I want to write tests for it, try to fix the other affected methods as well, and get some comments on the patch.

The patch itself is not too complicated, but I'm a bit worried about the overhead of iterating over the given sequence twice: first in PyString.str_join and then again in this code. I made the check for possible unicode items short-circuit, but the worst case, iterating over the whole sequence when nothing in it is unicode, is unfortunately also the common case. I'd say a better approach would be to determine the return type already in PyString.str_join, but that requires changes in so many places that it is better done by someone who also understands the big picture behind this expose/derive system. Even changing PyString.str_join to return PyString instead of String requires a few changes elsewhere in PyString.


Index: src/templates/str.expose
===================================================================
--- src/templates/str.expose    (revision 3110)
+++ src/templates/str.expose    (working copy)
@@ -40,12 +40,17 @@
 expose_meth: :b isupper
 expose_meth: join o
     String result = self.str_join(arg0);
-    //XXX: do we really need to check self?
-    if (self instanceof PyUnicode || arg0 instanceof PyUnicode) {
+    if (self instanceof PyUnicode) {
         return new PyUnicode(result);
-    } else {
-        return new PyString(result);
     }
+    PyObject iter = arg0.__iter__();
+    PyObject obj = null;
+    for (int i = 0; (obj = iter.__iternext__()) != null; i++) {
+        if (obj instanceof PyUnicode) {
+            return new PyUnicode(result);
+        }
+    }
+    return new PyString(result);
 expose_meth: :s ljust i
 expose_meth: :s lower
 expose_meth: :s lstrip S?
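
For comparison, here is a rough sketch of the single-pass alternative mentioned above, where the join itself decides the return type while it builds the result. It is not Jython code: JoinSketch, Str and Uni are hypothetical stand-ins for PyString and PyUnicode, and the real str_join signature and the expose/derive plumbing are not modelled.

```java
// Hypothetical stand-in types; these are NOT the Jython classes.
class JoinSketch {
    // Plays the role of a byte string (PyString in the patch).
    static class Str {
        final String value;
        Str(String value) { this.value = value; }
    }

    // Plays the role of a unicode string (PyUnicode in the patch).
    static class Uni extends Str {
        Uni(String value) { super(value); }
    }

    // Joins the items with self as the separator and tracks, in the same
    // loop, whether the separator or any item is unicode.
    static Str join(Str self, Iterable<? extends Str> items) {
        StringBuilder buf = new StringBuilder();
        boolean unicode = self instanceof Uni;
        boolean first = true;
        for (Str item : items) {
            if (!first) {
                buf.append(self.value);
            }
            first = false;
            buf.append(item.value);
            // No second pass: the type check rides along with the join.
            unicode = unicode || item instanceof Uni;
        }
        String result = buf.toString();
        return unicode ? new Uni(result) : new Str(result);
    }
}
```

With this shape the sequence is walked only once, so the common case of no unicode items costs nothing extra beyond one instanceof check per item.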

History
Date	User	Action	Args
2008-02-20 17:17:43	admin	link	issue1659819 messages
2008-02-20 17:17:43	admin	create