Message11070

Author stefan.richthofer
Recipients asmeurer, stefan.richthofer
Date 2017-02-02.20:07:55
Message-id <1486066075.67.0.830980860601.issue2548@psf.upfronthosting.co.za>
Content
Aaron: Alright, my pleasure.

Creating ucnhash.dat for the current Unicode 9.0 turns out to be more challenging than I expected. The script responsible for this, Misc/make_ucnhashdat.py, seems to be ancient. It writes several values as 16-bit unsigned integers, and with the current UnicodeData.txt some of those values exceed 65535, e.g.:

Raw size = 184608
    writeUcnhashDat()
  File "/data/workspace/linux/Jython/stewori/jython/Misc/make_ucnhashdat.py", line 340, in writeUcnhashDat
    raw.writeto(outf)
  File "/data/workspace/linux/Jython/stewori/jython/Misc/make_ucnhashdat.py", line 188, in writeto
    file.write(struct.pack("!H", self.size()))
struct.error: 'H' format requires 0 <= number <= 65535

So I'll have to switch some of these fields to 32-bit numbers, which will also require matching changes in the parser, ucnhash.java.
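
A minimal sketch of the overflow and of the kind of change I have in mind. This is not the actual patch to make_ucnhashdat.py; the variable names are only illustrative, and the value is the raw size from the output above:

```python
import struct

# "Raw size" reported in the output above; it no longer fits in 16 bits.
raw_size = 184608

# "!H" is a big-endian unsigned 16-bit field (max 65535), so this fails.
try:
    struct.pack("!H", raw_size)
    overflowed = False
except struct.error as exc:
    overflowed = True
    print("16-bit pack failed:", exc)

# "!I" is a big-endian unsigned 32-bit field, which holds the value.
packed = struct.pack("!I", raw_size)
roundtrip = struct.unpack("!I", packed)[0]
print(len(packed), roundtrip)
```

Each field widened this way takes four bytes instead of two in ucnhash.dat, so the reader has to consume four bytes at exactly the same positions or everything after that point will be misread.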

Stay tuned...