The ravings of a sane person. [entries|archive|friends|userinfo]
Josiah Carlson

[ website | My Website ]
[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

Interesting OSX/wxPython issue [Jun. 19th, 2008|08:48 am]
[Tags|]

One thing that has cropped up as I attempt to use OS X as a real development platform is that on occasion in PyPE, the Undo/Redo and Cut/Copy menu commands go gray, making it so that I can't do any of those operations. Now, I can ctrl+click on a selection and use the underlying Scintilla cut/copy, but I can't Undo/Redo, which sucks.

A quick restart of PyPE and I'm good to go, but that the menu options (and thusly the keyboard shortcuts) get disabled is a mystery.

I thought that it was a double-tapping of the command key when Google Desktop was open (that was my hotkey), but I can't seem to reproduce the issue when I start both up fresh.

Anyone have any ideas?


Also, I'm going to have to update PyPE to properly handle hotkeys in menus on OS X. It doesn't work the same as on Windows and Linux, and I can't seem to figure out the pattern (some Alt+key combos work, others don't, setting Ctrl turns into Command in the menus, Command is actually Meta according to the underlying layer, but Meta is not understood by the OS X menuing system, ...).

Is it so much to ask for Apple to be consistent with the two other major platforms?

Speaking of which, I actually had an idea for some GUI hacker to play with out there. One of the biggest complaints (at least among those that I've spoken to) about switching from any other platform to OS X is the single menu bar at the top of the screen, making it bothersome to get to the menu for smaller windows at the bottom of the screen. Here's the idea; hack the menuing system to do nothing. Then re-implement the menu drawing, selection, hotkey support, etc. (none of which is terribly difficult) and draw it yourself at the top of your application. In wxPython, I've seen some very slick custom menuing systems, some of which are a lot more convenient to use than the regular menus system that we are all used to.
link4 comments|post comment

Interesting applied cryptography (I am not a cryptographer) [Jun. 17th, 2008|12:40 pm]
[Tags|]

This will probably only be interesting to a few of you, but that's ok.

I need to encrypt data in a platform-independent way (Windows, Mac, Linux), so I need to implement an encryption algorithm in pure Python (I don't want to deal with C compilers). I started with the ARCFOUR algorithm (RC4). Implementing straight RC4 is quite easy, but it has a few security issues.

The first security issue is that if you use the same password over and over, you can leak details about your data and password every time you use it. In order to handle this issue, most people end up using a random salt either prepended or appended to the plaintext password. I have chosen to generate two 10-byte random salts, one as a prefix, one as a suffix, which are prepended to the output file in plaintext (necessary for all random salts). Why not just a suffix? There are known attacks against plaintext prefixes and suffixes.

I apply the setup routine not a single time (as is typical in RC4), but cycled 256 times (thus increasing setup time for the algorithm by a factor of 256), which also allows for passwords of up to around 235 bytes to still reasonably apply. Expanding the table from 256 bytes to 65536 bytes would give us the opportunity to take keys of up to ~64k, but it would also increase setup time significantly for short sessions (never mind result in questions about endianness for this "stream" cipher).

I discard the first 2048 bytes of the keystream (as per recommendations online, which is greater than the 256 byte recommendation from "Foundations of Security").

For verification, I set up a sha1 MAC with the presalt+key+postsalt, which is appended to the data stream and encrypted (so if the wrong password is provided, the hash is also garbage ;) ).


The results? ~400KB/second encode/decode speed on a modern machine in pure Python, over 1.2Mb/second on my 1.3 ghz celeron laptop with Psyco, and an algorithm that attempts to address all of the known attacks against RC4. Assuming your /dev/random doesn't suck (or Mersenne Twister initialized with the system timer gives you decent random data), this algorithm would seem to be pretty reasonable.

Now all I need to do is to finish the interface, get my passwords stored properly, implement this with a stream interface without the checksum (it currently does single-shot string encryption/decryption), and make all of the extra "improve security" features options for both the one-shot and stream versions (the latter of which will become the only version).

Update: Using Psyco 1.6 on my OSX laptop gets me 5.7 megs/second. Not bad.
link8 comments|post comment

PyPE [Mar. 3rd, 2008|09:26 am]
[Tags|]

At some point in the future, I presume I will have some free time. There are a few things I would like to do with PyPE so as to help support the tasks that I have been doing recently.


Among the changes that I'm planning on is to replace the hodge-podge of events/notifications with a unified architecture. Basically using wx.lib.pubsub (a publish/subscribe architecture) to handle event notifications and plugin callback processing.

I'm also looking to switch to a better scheduling system to handle little things like auto-refreshes (parsing/spell checking/etc.). Currently I use a wx.Timer() instance, but it seems as though over long periods of time, timers leak, and the Python process begins to take up more and more processor time. Because there are easy ways around wx.Timer() instances, I may as well exploit those.

I'm also looking to change the way I do widget layout. Some of you may be thinking, "finally, he's going to use XRC to define layout". No, I'm not. I'm going to be using an idea I had when Python 2.5's context managers were just being thought up; in-code hierarchy-based layout...

with Panel(name='toplevel') as p:
    with HorizontalSizer:
        add(Label('entry 1:'))
        add(SPACE)
        add(Text(name='entry'))


Only the plan is to do it without context managers (so it's Python 2.3+ compatible). Basically, you use all of the same stuff, but instead of using 'with', you use 'for'. The effect is that by using for p in Panel(name='toplevel'), p will leak into the surrounding scope. I'm not worried.

What's even better is that if I write the wrapper objects correctly, I get an automatic object hierarchy created (toplevel.entry in the above for the text box), as well as simple event bindings ( pub.subscribe('toplevel.entry.event', myfcn) ).


Some will say, but Josiah, that sounds *a lot* like MVC/MVP design! To that I say somewhat, only without the code ugly that typically follows MVC/MVP design. If I ever get around to actually making it happen, the code will actually be visually appealing.


Now all I need is a month of down time to make it happen.
linkpost comment

First programming languages [Apr. 11th, 2007|06:09 pm]
[Tags|, ]

For those in computer science, more or less everyone has their opinions on what programming languages to teach first, second, third, etc. I know I do (it's not Python, despite Python being my favorite). It's also not uncommon for people who know a particular language well to see it as better than other languages for a first language. This guy doesn't seem to be very different; he learned BASIC, then VBA, then Perl. And what does he recommend? Perl.

I disagree on many points; from his claim that being able to do one thing in a bazillion different ways is good for a first language, to his claim that a first language needs to be able to grow with the programmer, and more. A first programming language only needs to do a few things well:
1. help the programmer understand the basics of algorithmic computation
2. try not to get in the way of the programmer expressing algorithmic computation
3. get the programmer excited to program more (possibly in other languages), because they see the potential in what they are doing

In that sense, feel free to grade any and all languages according to the above three requirements as a first language. In my opinion, Perl obviously fails for the first 2, and whether it gets a passing grade on 3 depends on whether the student is able to learn the language to see the potential.

Of all the languages I've learned over the years, I still agree with MIT that Scheme is the proper first language. Yes, the syntax can be clunky, but it is consistent, and the standard Scheme interpreter will happily match parenthesis for you (with a bit of work, maybe one could even get it to colorize those parens and braces), and after you have learned it, not only do you understand the concept of "divide and conquer" (which many people struggle with in other languages for years), but many people in my class (quite a few first-time programmers) were excited to take the next in the series.

As for some later on programming language for learning algorithms, I fully agree with Professor Chu, Python should be used. Go ahead and read the article.
link7 comments|post comment

Better luck next time, eh? [Mar. 15th, 2007|06:28 pm]
[Tags|, , , ]

So...there is a problem that David and I have been working on and off on for the last year or two. Nothing huge, but it is interesting, and neither of us had seen any progress on the problem since 1997 (the paper was submitted in 1995, published in 1997).

On Tuesday I managed to solve the problem in the shower, and continued working on the paper that I had started over a year ago. I met with David this afternoon, gave him a description of the algorithm (because the ICS email server is being a bit wonky), and started gathering references.

In my search, I wandered upon a paper that seemed to have already solved the problem that was published in 2000. But it didn't. Then I checked one of the papers it referenced, which was published in 1996, a full year prior to when the 1995/1997 paper was published, though a year after it was submitted. It turns out *that* one solves the problem, almost exactly the way I came up with (there are some differences, namely that my version is simpler to understand and implement) - though they claim a dynamic solution that I don't believe works.


Disappointed, I started looking for prior examples of my LZ77 variant. I had spent a few days during Christmas looking around, but today I followed a link I hadn't before, and found a simplified version of the algorithm I had been using. Specifically, it was a tuned variant of what I called "Algorithm 2", and it manages to be competitive with Bzip2. A commercial version actually seems to beat Bzip2 and is sometimes even competitive with PAQ. From what I can tell, they don't seem to be bothering with "Algorithm 3", though I could be misreading the algorithm description they are offering. Further looking around leads me to a published paper from 1996 describing "Algorithm 2" in detail.

Well, I suppose I'll keep going at Algorithm 3, 5, and possibly even Algorithm 6.
link1 comment|post comment

email clients [Mar. 3rd, 2007|06:33 pm]
[Tags|]

The search for a good email client, like the search for just about any good piece of software, is fraught with "if only it worked like X instead". After having used my current email software (Becky! Internet Email) for the last 8 years, I'm feeling its limits. Most folders have at least 5,000 messages, some are even over 60,000 (my suspected spam folder, which I keep around just in case I missed a message, and haven't the time to go through it).

The problem I'm currently running into is that switching to a folder I haven't clicked on in a while results in a 5-10 second delay while the program loads up its index for that folder. It's understandable, the index file for my 9000 message "UCI" folder is 2 1/4 megs, which it needs to parse in order to construct the message listing. Never mind that whenever I do a search, I cannot do anything else with the software (single-threaded applications can be a pain).

I've considered switching to Thunderbird, but about two years ago I imported all of my email into Thunderbird because of issues that Becky! was having (the drive it was installed on ran out of space); it suffered the same lag, if not worse, and offered a horrible interface for adding automatic filtering (messages from or to person X go in folder Y).

I just checked out Evolution (inside Ubuntu on a VMWare virtual machine), but its filtering and virtual folder dialog is just as horrible as Thunderbird's, and being that I'm not going to be switching to Ubuntu as my main platform any time soon, the delays (and memory use) involved with using Evolution in a VM just aren't worth it.

Through the internets I hear that Opera's mail client is pretty good, and its support for virtual folders is quite convenient, though the fact that all folders are virtual irks me - it feels restricting. I am also disappointed that it embeds itself as part of the Opera web browser, and I suspect that it opens links in Opera, uck.


The ongoing search for an email client that doesn't suck continues. I have talked about writing one before, and while I have the experience in writing high-capacity mail backends, the purpose and features are not so subtly different to make a complete rewrite necessary. I've also considered a database-backed storage mechanism (SQLite specifically), but then I get to thinking about a database schema and want to cry. I would consider mbox or one of the other standard formats, but the size of the folders I have necessitates a secondary index, and if one has a secondary index anyways, one may as well store email messages in a format that isn't ambiguous (mbox can be). I'll probably end up using the SQLite database as a transactional data store, if I ever have the time to write an email client.
link13 comments|post comment

Progress update, part whatever [Jan. 25th, 2007|08:03 pm]
[Tags|, , ]

Since I last posted, I have been able to squeeze another 32K from the calgary corpus, even managing to get the GEO file down to 66396 bytes (without pre-transforming the data). With these changes, it puts me at 2.44 weighted bits per character, or 2.65 unweighted bits per character, which is just about in the middle of this listing.

I've also managed to translate a 0-order arithmetic coder from C to Python, and I don't believe it will be terribly difficult to add in k-order markov modeling (possibly even easier than it was for adaptive huffman); which should shave a couple bits here and there (and acutally be faster than the adaptive huffman implementation I'm currently using). We'll see how effective it is.

I'm putting the transformation ideas on hold for the time being in order to take the time to implement some of the other ideas I've had. Why? Because transformations and/or data modeling is something that I could possibly spend years on without producing anything worthwhile, whereas I'm fairly certain my other ideas have at least a leg to stand on (I keep getting improvements).

I'll probably post the 0-order arithmetic coder in the Python cookbook at some point, if only because I've not seen any practical arithmetic coders in Python available. I will say that the one that used the Rational number library and Python infinite precision integers was very cute (though even after heavy modifications, falls behind a "real" arithmetic coder in both speed and compression ratio).
linkpost comment

One interesting thing about doing compression research... [Jan. 22nd, 2007|01:43 pm]
[Tags|, , ]

... at least for me is that I have far more ideas to implement than time to implement (and test) them.

Say, for example, an idea I borrowed from databases research; if you know a column of data is integers, and you need to create an index of the integers, you can sort the integers, then represent the list of integers as 4 (or 8 for 64 bit integers) lists of each the least to most significant byte of the integers, and compress the 4 lists individually. Generally speaking, the data correlation of each of those lists will be high amongst itself, but low (in many cases) to the other lists.

An example is in the 'geo' file of the calgary corpus. It seems to be 25600 4-byte integers (total size 102400 bytes) in little-endian format. Standard lz77 compressors (like zlib) do relatively poorly, only getting it down to perhaps 68k. However, compression algorithms which reorder the content in some way do quite well; bzip2, for example, gets it down to about 57k. This particular example shows us that the BWT is a very interesting transform; it is able to manipulate very different kinds of input data in such a way that (when coupled with RLE, MTF, and Arithmetic coding) it can become *more* compressable after the transform than before. In fact, performing the BWT on the 'geo' file and running it through zlib allows zlib to do almost as well as bzip2 at 59k (it does worse with all other calgary corpus files).

Then again, if one is willing to perform the 'break it into 4 lists, compress them individually and concatenate', one ends up with fairly amazing ratios: bzip2 is able to get down to 53482 bytes, zlib is able to get down to 52436 bytes, and the latest version of my algorithm is able to get down to 50819 bytes. It's still a ways away from the PAQ family of compressors (which are able to get down to about 45k), but it is not bad - it may be one of the overall best lz77 compressors around; who knows?


I've implemented the above mechanism, which works quite well on certain inputs, but I'm in a situation where I have about a dozen possible invertible transforms that could be applied to input data in various orders, each of which could apply to different kinds of data in different situations. The question is "when do you apply them?" The PAQ family has more or less answered these questions; don't apply the transformations (except in certain situations), but have a bunch of different models that predict what byte is going to come next, and combine those predictions based on how accurate they were in the past. It actually works quite well, which is why the PAQ family is currently the record holder in various compression benchmarks.
linkpost comment

And yet another data compression post [Jan. 8th, 2007|10:50 pm]
[Tags|, , ]

Two nights ago I prototyped algorithm 5 (which is algorithm 2 + algorithm 3, and algorithm 4 was already taken), yesterday I debugged and tested, today I post. The ideas, generally speaking, panned out.

In my last entry, I stated that the version of zlib I was using compressed the calgary corpus to 1,025,183 bytes, with algorithm 2 compressing it to 1,024,802 bytes. Algorithm 5 manages to do significantly better at 998,766 bytes, a savings of 26,417 bytes over zlib.

There are some inputs for which zlib does better, sometimes significantly, but on larger files, my algorithm tends to win. Here is some samples of the particular calgary corpus files, their original size, the size of the compressed representation that my algorithm, zlib, and bzip2 produce (I usually run my algorithm on the smallest stuff first, just so I can see relevant output faster)

name   size      alg5    zlib    bzip2
----------------------------------------
obj1   21504  -> 11881   10317   10787
progc  39611  -> 14506   13343   12544
progp  49379  -> 11654   11228   10710
paper1 53161  -> 19501   18558   16558
progl  71646  -> 16809   16255   15579
paper2 82199  -> 30290   29760   25041
trans  93695  -> 19227   19045   17899
geo    102400 -> 77655   68433   56921
bib    111261 -> 34500   35228   27467
obj2   246814 -> 87453   81505   76441
news   377109 -> 136305  144800  118600
pic    513216 -> 58509   56465   49759
book2  610856 -> 190542  206664  157443
book1  768771 -> 289904  313582  232598


For larger natural language texts, my algorithm does pretty well, the overall ~26k improvement coming primarily from the 23.5k improvement on book1 alone, with book2's improvements overcompensating for the mild (except for geo) losses in the other examples.

I had started trying out some other inputs, like the Canterbury corpus, some artifical inputs, and some "large" inputs, all made available with the Canterbury corpus. It only got a little ways in when I stopped it to muck about with some other tunings (which didn't work out), but here are some sample results from the other corpuses, corpi?

name                  size      alg5    zlib    bzip2
--------------------------------------------------------
artificl\a.txt        1      -> 3       9       37
cantrbry\grammar.lsp  3721   -> 1349    1222    1283
cantrbry\xargs.1      4227   -> 1927    1736    1762
cantrbry\fields.c     11150  -> 3356    3122    3039
cantrbry\cp.html      24603  -> 8832    7961    7624
cantrbry\sum          38240  -> 14891   12990   12909
artificl\aaa.txt      100000 -> 5       121     47
artificl\alphabet.txt 100000 -> 33      290     131
artificl\random.txt   100000 -> 91321   75735   75684
cantrbry\asyoulik.txt 125179 -> 48731   48897   39569
cantrbry\alice29.txt  152089 -> 52814   54404   43202
cantrbry\lcet10.txt   426754 -> 130877  144904  107706
(updated)
cantrbry\plrabn12.txt 481861 -> 181116  195261  145577
cantrbry\ptt5         513216 -> 58509   56465   49759
cantrbry\kennedy.xls 1029744 -> 228718  203992  130280
large\world192.txt   2473400 -> 570187  724970  489583
large\bible.txt      4047392 -> 997609  1191885 845623
large\E.coli         4638690 -> 1451107 1341990 1251004


Again, for larger natural language texts, my algorithm does pretty well against zlib, and even manages to do pretty well on the most compressable of inputs (aaa.txt and alphabet.txt), due to some implementation quirks.

I'm curious to see how well it does on the bible (included in the "large" inputs), but it probably won't get to that for a little while longer.

Updated:
I added a few more results, the most interesting being a portion of the 1992 edition of the CIA world Factbook (large\world192.txt) where my algorithm is able to improve upon zlib by almost 150,000 bytes, and a portion of the bible also beating zlib by just under 195,000 bytes. It still generally loses to bzip2, sometimes by a large margin, but that isn't terribly surprising.

I still have some ideas, which is good, as so far at least a few of my ideas seem to offer improvements.

Hopefully I'll eventually be able to win on average bits per character...
bpci = 8*compressedsizei/inputsizei
avg_bpc = SUM(bpci/n)
linkpost comment

By a hair... [Dec. 28th, 2006|01:49 pm]
[Tags|, , ]

Using the zlib supplied with Python 2.3.3 for Windows (the Python I have installed on my laptop), the total size of the calgary corpus compressed with zlib is 1,025,183 bytes. The total size using the algorithm I have developed is 1,024,802 bytes. A savings of 381 bytes. It took a week and 2 days, but I did it (please understand that while this is a LZ77 family compressor, like zlib, it uses a different semantic, and lacks the long-term analysis and tweaking present in the 14+ year old DEFLATE method).

I've got some other ideas, but in the short term, I should spend some time writing a paper that describes the intuition behind my current method, polishing up another paper, and continue expanding two other papers for journals.
link1 comment|post comment

Part X in a Y part series of my data compression posts. [Dec. 25th, 2006|11:23 pm]
[Tags|, , ]
[Current Location |Phoenix, AZ]

Tweaking things a bit, I've gotten it down to 33.3%. Still need to add the "the next X characters are plain", which has been giving me troubles (I haven't worked on it in a day or so, Christmas and all). Can I beat zlib in a week on the Calgary Corpus? Maybe a week and a couple days. Right now I'm running into some hard to track down bugs that are preventing me from adding the features I described in the previous post. Simulating them suggests that there could be significant savings, but until I can work out the bugs, I'm not going to make any solid claims.

I also have an idea for merging algorithm 3 with algorithm 2, which may produce something better than either of them, but that will come after I fix the bugs with the algorithm 2 variant I'm dealing with now.
linkpost comment

Another compression update [Dec. 23rd, 2006|08:05 am]
[Tags|, , ]

I had tried the idea that I have alluded to in my previous two posts (I'll call it algorithm 3), and unfortunately, it didn't seem to work out all that well, doing worse than what I already had. I went back to my original idea, tuned it a bit more, and because I messed up in a few places, took the time to actually start producing a binary format (I had previously been producing huffman encoding sizes based on ceil(-log_2(prob)) zero-order entropy calculations, which is a known over-estimation), and managed to eek out 34.5%. I'll call this algorithm 2. zlib is currently sticking its tongue out at me. Can I beat it in a week's worth of work? We'll see.

Where to go from here?

I'll revisit algorithm 3 and add the binary format production to it, that should get it to where algorithm 2 was a couple days ago.

I need to add some symbols to the encoding format: 1) don't process, just store the next X bytes, 2) reset your state. The first preventing LZ overhead during ranges of data that are not compressible, the second to say "history does not predict the future", both of which are used to significant success in other compression packages.

Anyways, time to get ready for Phoenix.
linkpost comment

Update on compressor [Dec. 20th, 2006|04:52 pm]
[Tags|, , ]

Last night I was looking at 40.7% (lower is better). Today I am looking at 38.2% after a small change in semantics (not the change I was previously considering). Switching to a (very naive) huffman encoder for offsets/lengths gets me down to 35.6%. Nicely within spitting distance of zlib's 32.6% .

The next step is trying what I had been considering last night. I think that could result in the most significant savings, and with todays advancements, may even be competitive against zlib or bzip2.
link4 comments|post comment

[Dec. 19th, 2006|10:02 pm]
[Tags|, , ]

Last night I had an idea for an LZ77 style compressor*. While doing various chores today, I managed to hack together a compressor with a handful of tunable parameters, and did a bit of research to see if my idea had been tried before. I've not yet found an algorithm that does anything like it in any family of compressors, but I've only read short descriptions of perhaps 20 or so algorithms.

The original variant I wrote was able to compress the calgary corpus to 52.2%. Not terribly good, but not bad considering I was using a worst-case approach to encoding symbols. For comparison, a naive LZW compressor gets around 58.6%, a "pure" LZ77 algorithm without tuning gets 67.5%, raw zlib (WinZip, pkzip, gzip) are able to get 32.6%, bzip2 CL1 gets 29.1%, and bzip2 CL9 gets 26.4%).

While I was looking around, I managed to happen upon the LZRW-* group of compressors, most of which derive from LZ77. While Dr. Ross' algorithms didn't offer methods that were similar to what I had been doing, he did offer a couple ideas or two that pushed me towards some experiments that led me to what I currently have. And what is that? As of right now, I've managed to get it to 40.7%, with it being competitive with or beating LZRW-4, and generally beating unix compress (which uses a heavily tuned LZW compressor). Not bad for a few hours work.

Where to go from here? There are various methods I can use to tune symbol encodings, and I believe that I may be able to remove around 10-20% of the output, which could get me down to 32-36%. I'm planning on checking on the idea tonight or tomorrow.

How fast is it? Compressing is significantly slower than decompressing, as is generally the case for LZ77 compressors, running around 17k/second using slow algorithms in Python, decompression running close to 200k/second. Or really, barely competitive against an 8mhz Macintosh SE running assembly or C. :D I haven't bothered to attempt to speed it up using any fast dictionary search techniques, but this is so that it stays easy to describe, understand, tune (for size), and/or fix. Once I get it compressing better, I'll see about making it run faster.


* LZ77 style compressors generally take a stream of bytes and consider an output window and input buffer. During the course of execution, they generally either produce an (offset, length) pair for when some amount of data in the input buffer matches something in the output window, or they produce a sequence of uncompressed bytes. These are called "copy" and "store" respectively.
linkpost comment

OpenSolaris 10 [Nov. 1st, 2006|05:11 pm]
[Tags|, ]

For the past while, I've been reading about ZFS in OpenSolaris 10. Generally speaking, it makes creating and manipulating filesystems involving many disks in various RAID configurations very easy. How easy? This link has a sample of creating a file system with mirrored and striped disks.

Well, I don't have a spare computer to run this kind of thing, nor do I have the time or money to build/buy a machine to run OpenSolaris 10. What to do?

I've recently switched from using Parallels Workstation to using VMware Server (for the VMware tools for linux for faster video performance, which I use to test PyPE), and VMware Server has support for a Solaris guest operating system. Even better, VMware offers raw disk access to drives, which would allow me to run Solaris inside VMware, inside Windows, and get all of the goodness that is ZFS without dealing with another computer, etc. Toss in Samba/CIFS mounting, and I've got a complete storage solution.

It's just finishing up the installation, and aside from the huge download (around 3gigs), and a really awful prerequisite check (if you want package A that requires package B, it will tell you that it requires it, but will neither automatically add it (good) nor offer to add it (bad)), the installation wasn't that bad.

Now I just need a bunch of big disks.


On a completely unrelated note, WinTabber is quite convenient, even if it did crash one of my SSH clients once. It has reduced by 10 the number of windows I have sitting in my taskbar.

Update:
The network speed between the Solaris on VM and my local machine over sftp is horrible; 30k/second. Even worse than its speed is that it randomly hangs. Http, ftp, etc., all suffer the same (crap) speed. Adding users is a pain (still haven't gotten it to work), configuring everything is a pain, and even when comparing it to Ubuntu without VMware tools, it's awfully slow. Checking df tells me that it's not using any swap, so I don't know. Maybe someone else has some ideas, but for right now, I'm blowing away the Solaris virtual machine and calling the time I spent trying to get it to work a learning experience.

Ah well, I don't have the big disks to use anyways.
link2 comments|post comment

fractals [Oct. 26th, 2006|11:37 pm]
[Tags|, ]

I've been mulling over this particular blog entry for a while now, and every time "Mandlebrot Set" by Jonathan Coulton comes up on the mp3 player in the car, I think about writing...so I will.

For a while, I've been searching for a good fractal viewer (not actively mind you, I've got things to do). Be it for Mandlebrot, Juliet, etc., any would be fine. My problem? Every one I've managed to get my hands on limits its precision and the fractals break down after only a minute or two of zooming. At first becoming blurry, then going single-color. Ruins the experience of zooming into a fractal, especially when the claim is that the patterns repeat forever. Lies I say! Well, they are only lies when one is limited by numeric precision.

At some point I suppose I'll have to break down and implement everything in terms of some rational number class (paired up for imaginary representation), and deal with the fact that as one zooms in, processing time will increase. But not today.
linkpost comment

Parsers [Jul. 24th, 2006|11:11 pm]
[Tags|]

For a while I've been looking for a C/C++ parser for Python so that PyPE could gain a source tree and various other tools that come along with having a parser for the language. While casually browsing the net, I happened upon Synopsis, which includes support for both C/C++ and Python. In my attempts to spelunk through their documentation, I've come to the conclusion that their documentation sucks. I actually learned about as much as I needed to know from a single post in their mail archives from last week available here, then I probably could have hoped to gain from reading their documentation and trying out their (broken) sample.

A bit of spelunking around by parsing a bit of Python suggests that Synopsis is significantly faster than my attempts to spelunk through the Python tokenizer output (though Synopsis can't seem to get line information correct), and though I'm not terribly enamoured with its interface, it would probably be an upgrade. The only thing that is causing me pause is that the parsers would run around 15 megs uncompressed, 5 megs compressed, which would double the Windows distribution size.

Eh. So close, but so far.

Update:
I can pull out the Python parsing stuff, as it uses the standard Python parser guts. The C/C++ stuff...still 15 megs uncompressed. Perhaps it could be a user-installed optional component?

Update 2:
The other day I ended up hacking together a 95% solution using a few regular expressions and some post-processing. It works for everything I've thrown at it as of yet. It'll be in the next release.
link5 comments|post comment

Argh. [Jul. 24th, 2006|04:02 am]
[Tags|]

Couldn't sleep, so I got up to release version 2.6 of PyPE, the culmination of focusing way too much on non-work-related work.

After releasing, I started mucking about with Scintilla indicators, under the belief that if I could get indicators working properly, I may just be able to do that crazy remote group programming thing that SubErthaEdit does. As I was hacking away, using the newly-released macro functionality to quickly test out different approaches to highlighting specific blocks of text, and discovering that you lose any indicators the moment that an 'indicated' block of text is styled by the standard lexer (talk about a PITA, a reasonable assumption would be that stc.StartStyling(posn, mask) would tell the lexer what bits it can use...as it is documented), but I digress... PyPE crashed.

I had been running two copies, one to edit the current version of PyPE, and the other to actually experiment with. The one I was using to edit the current version crashed. "The instruction at ... referenced memory at "0x00000004". The memory could not be "read"."

Then the being experimented with PyPE crashed with the same error (but different instruction). This does not bode well. Running with Python 2.4 works fine, except that it crashes on shutdown. 3 reboots, a few Python DLL wipes, a reinstall of Python 2.3, 2.4, and both versions of wxPython for each Python version, and I'm not sure that I have the problem licked.

What's the problem? Well, aside from a seeming change in semantics between wxPython 2.6.3 and 2.6.3.3 (isinstance(item, (wx.MenuItem, wx.MenuItemPtr)) no longer returns true in some cases, because for some reason, multiple wx imports import different copies of itself...), but both Python 2.3 and Python 2.4 crash on exit. How they crash on exit is arbitrary. Sometimes it is in pythonxy.dll, _core_.pyd, _sre.pyd, etc. Basically every DLL that could be loaded has crashed on exit. This suggests, to me, that in mucking about with scintilla, I may have accidentally changed some base OS runtime DLL, which seems to fail only when Python and wxPython are run together.


While I have been mucking about, my virus scanner has been doing its job and scanning files. It has found that a few files have been changed, but I'm unable to find the md5s anywhere on the net. It's 5AM, I'm tired, and Python crashes on exit. Maybe I'll figure it out tomorrow when I've gotten some sleep.

still not fixed )

fixed! )

explanation for why both PyPEs crashed )
linkpost comment

A seemingly novel method of encrypting information [Jul. 16th, 2006|06:19 pm]
[Tags|, ]

There are many different methods for encryping/decrypting information, some better than others, some using number theory, others using interesting 'bit twiddling' methods. The method I'm going to describe in the cut is what I would consider a novel method of 'bit twiddling', which may or may not be secure. I suspect it is secure, but I lack the tools (education, software, etc.) to prove as much. It does seem generally resistant to statistical attacks (in as much as I have tested*), but I'm not a cryptographer.

Generally speaking, this particular 'bit twiddling' cypher constructs a mapping of input to output symbols by generating a permutation of the output symbols and choosing the symbol at the current input symbol offset. The interesting bit about this permutation generation is that it is reliant on the somewhat-random behavior of 'good' hash functions.

Click here to read the details )
link5 comments|post comment

Josiah's 10 minute wxPython tutorial... [Mar. 22nd, 2006|09:30 am]
[Tags|]

(or at least it will take me 10 minutes to give; this is unrelated to recent livejournal goings-on)

1. Download and install Python.
1.a. If you don't have experience with Python and classes in Python, go through the Python tutorial http://docs.python.org/tut/tut.html .
2. Download and install wxPython.
3. Download and install the wxPython demo (available at http://www.wxpython.org/download.php , search for 'demo').
4. Go through every single demo. Not once, not twice, but at least until you have an idea of what building blocks you can start with in wxPython.
5. Look through the how to learn wxPython wiki page.
6. Look through the general wxPython wiki.
7. If you have a question, go through the demo again.
8. If you still have a question, check the wxPython documentation (while not all methods or classes made the conversion at 100%, they are at least 95%).
9. If you really still have a question, check the wiki (via google site:wiki.wxpython.org ).
10. If your question is about threads:
10.a. Don't try to use multiple threads to control the GUI. wxPython doesn't like that. One GUI thread.
10.b. Use events to post updates to the GUI thread (see Processes and Events -> Threads in the demo).
10.c. Use queue(s) to push events from the GUI thread to the worker thread(s) (import Queue).
11. If your question hasn't been answered by steps 7-9, ask on the wxpython-users mailing list (available via http://www.wxpython.org/maillist.php ).

Update: Added wiki references, also good resources.
link2 comments|post comment

navigation
[ viewing | most recent entries ]
[ go | earlier ]