Message8611

Author jeff.allen
Recipients jeff.allen, rpan, zyasoft
Date 2014-06-09.07:57:34
Message-id <1402300655.63.0.188911675695.issue2123@psf.upfronthosting.co.za>
In-reply-to
Content
I poked around at this yesterday. A couple of problems combine here.

The absence of the codec is the larger obstacle:

Jython 2.7b3+ (default:6cee6fef06f0+, Jun 8 2014, 19:49:20)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_51
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> codecs.lookup("cp936")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding 'cp936'

The same happens for "ms936" and "x-mswin-936". A look at aliases.py shows "ms936" and "cp936" should map to the gbk codec, which we don't have yet (#1066).
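
For comparison, here is a minimal sketch of how the alias table treats these three names. It assumes a CPython standard library, where encodings/aliases.py is importable; the helper name resolve_alias is hypothetical, and its normalisation only approximates what the encodings package really does.

```python
from encodings.aliases import aliases

def resolve_alias(name):
    """Return the codec module name that aliases.py maps `name` to, or None."""
    # Rough normalisation (an assumption): lower-case, hyphens to underscores.
    key = name.lower().replace("-", "_")
    return aliases.get(key)

for name in ("cp936", "ms936", "x-mswin-936"):
    print(name, "->", resolve_alias(name))
```

The Java canonical name has no entry in the table, so it would stay unknown to Python even once the gbk codec exists.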

Secondly, in a couple of places, when no encoding is explicit, I've chosen to get a name from the console Charset like this:
    encoding = Py.getConsole().getEncodingCharset().name();
thinking the "canonical name" would be the one we want. In this case, that returns "x-mswin-936", which wouldn't be recognised in aliases.py even if we had the gbk codec.

Stepping through the part of InteractiveConsole that parses Python, I find that I'm using the canonical name to set the cflags.encoding consulted by the compiler. Oddly, when parsing the quoted string, it reacts to the missing "x-mswin-936" codec in the same way as to an incomplete line, hence the continuation prompt. It is also odd to me that it doesn't notice sooner that the codec is missing: almost as if it only used the Python codec for string parsing. Elsewhere, it is definitely using a Java codec, which of course it has no trouble obtaining by the canonical name.
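
To illustrate the "incomplete line" signal mentioned above: in the standard library, codeop.compile_command returns None when the source is a valid prefix that needs more input, and that is what produces a continuation prompt. This is a sketch of the general mechanism, not of Jython's InteractiveConsole internals; the helper name needs_more_input is hypothetical.

```python
import codeop

def needs_more_input(source):
    """Return True if `source` is an incomplete statement (continuation prompt)."""
    # compile_command returns None for incomplete input and a code object
    # for a complete statement; it raises SyntaxError for invalid input.
    return codeop.compile_command(source, "<stdin>", "single") is None

print(needs_more_input("if True:"))   # block still open
print(needs_more_input("x = 1"))      # complete statement
```

A missing codec would ideally surface as an error here, not as the None result.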

At present, I would like to see if we could use one codec consistently when parsing. I can see that using the Python codec is preferable in some ways, but this code uses the Java one predominantly (I think). It would be cool if we could make Java codecs into Python ones. Or the other way around.

I'll think about the name confusion too. Amongst the alias names (in Java) for "x-mswin-936" is "ms936", which Python would accept. It's a bit ugly to sort through them until we find a Python-acceptable one, but it may come to that. Perhaps Python (all Pythons) should accept the (Java) canonical codec name for a codec.
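
The "sort through them" idea could look roughly like this on the Python side. This is only a sketch: the function name first_python_codec_name is hypothetical, and the candidate list is assumed to be the Java canonical name followed by its aliases, of which only "ms936" is mentioned above.

```python
import codecs

def first_python_codec_name(candidates):
    """Return the first name that codecs.lookup() accepts, or None if none do."""
    for name in candidates:
        try:
            codecs.lookup(name)
        except LookupError:
            continue  # unknown to Python, try the next alias
        return name
    return None

# Assumed input: canonical name first, then the aliases Java reports.
print(first_python_codec_name(["x-mswin-936", "ms936"]))
```

On an interpreter that has the gbk codec this picks "ms936"; without the codec (our situation until #1066) it finds nothing and returns None.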