Message1496

Author kzuberi
Date 2007-02-20.00:23:17
Content
To clarify your description, it's a limit on the size of string constants in the source and not a limit on the size of strings handled by the program (I think that's what you mean anyway). Looking in the source history, it seems to have been introduced with this ancient checkin:

  http://jython.svn.sourceforge.net/viewvc/jython?view=rev&revision=131

But the bug number mentioned there refers to a system that predated our use of the SourceForge trackers (a JitterBug instance?), and I've not been able to dig up the actual bug report.

Experimenting with that limit removed in CodeCompiler.java using a little one-liner like:

 exec('x="%s"' % ('1' * 65536 ))

shows an underlying problem. The relevant bit of the stack trace is:
 
 java.io.UTFDataFormatException: encoded string too long: 65536 bytes
         at java.io.DataOutputStream.writeUTF(DataOutputStream.java:347)
         at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
         at org.python.compiler.ConstantPool.UTF8(ConstantPool.java:88)
         at org.python.compiler.ConstantPool.String(ConstantPool.java:188)
 
So I think what's happening here is that the string constants that appear in the source are stored in the Java class's constant pool, but the maximum size allowed there, and by writeUTF(), is 65535 bytes (just under 64k). Here's an old reference to this limit:

 http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4071592

Notice that the check in CodeCompiler.java compares the number of (presumably 16-bit encoded) characters in the string to this 32767 limit, not the length of its encoding in UTF-8. So it's possible that we are disallowing string constants that would actually fit, for example strings from the plain old ASCII subset, which UTF-8 represents with 1 byte per character.
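
To make the difference between character count and encoded length concrete, here is a rough sketch in Python. It assumes the encoding rules of Java's modified UTF-8 as used by writeUTF() and the 65535-byte ceiling seen in the stack trace above; the helper names are made up for illustration and are not part of Jython.

```python
# Largest encoded length accepted by DataOutputStream.writeUTF (assumed
# from the "encoded string too long: 65536 bytes" error above).
MAX_UTF_BYTES = 65535


def modified_utf8_length(s):
    """Return the number of bytes Java's modified UTF-8 needs for s.

    Hypothetical helper, not a Jython API.
    """
    total = 0
    for ch in s:
        cp = ord(ch)
        if 0x0001 <= cp <= 0x007F:
            total += 1          # plain ASCII, except NUL
        elif cp <= 0x07FF:
            total += 2          # includes U+0000, which takes two bytes
        elif cp <= 0xFFFF:
            total += 3
        else:
            total += 6          # surrogate pair, three bytes per half
    return total


def fits_in_constant_pool(s):
    """True if s could be written as a single constant under that limit."""
    return modified_utf8_length(s) <= MAX_UTF_BYTES
```

Under these assumptions a 65535-character ASCII string fits, while a 32767-character string made of three-byte characters does not, so a check based on encoded length would behave differently from the character-count check in both directions.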

Anyhow, if you can control your input, you may be able to work around this by transforming your large string constants into smaller constants concatenated at runtime. It would be interesting to see whether Jython could do a similar transformation automagically, but I wouldn't expect it for the upcoming release.
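
As a minimal sketch of that workaround, assuming you generate the Python source yourself: the function below rewrites one large string value as an expression that joins smaller literals at runtime. The name split_literal and the chunk size of 10000 characters are assumptions chosen for illustration (small enough that each piece should stay under both the 32767-character check and the 65535-byte encoding limit); nothing like this exists in Jython itself.

```python
def split_literal(value, chunk_size=10000):
    """Return Python source for an expression that rebuilds value at runtime.

    Hypothetical helper: each piece becomes its own string constant,
    so no single constant exceeds chunk_size characters.
    """
    pieces = [repr(value[i:i + chunk_size])
              for i in range(0, len(value), chunk_size)]
    if not pieces:
        return "''"
    # join() runs at runtime; adjacent literals would be merged back
    # into one constant at compile time, which defeats the purpose.
    return "''.join([%s])" % ", ".join(pieces)


# Usage: emit an assignment instead of one huge literal.
big = '1' * 65536
source = 'x = ' + split_literal(big)
```

Executing the generated source should leave x equal to the original string, with the compiler only ever seeing the smaller pieces.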

Lowering priority and removing assignment to next beta.

- kz 
History
Date                 User   Action  Args
2008-02-20 17:17:45  admin  link    issue1663711 messages
2008-02-20 17:17:45  admin  create