Software Projects TODO

This is a list of software projects that I would like to do, but probably won't get to any time soon. I write things here so that I don't forget about them. I make this public so that others interested in similar projects can stumble upon it, have another interested party to chat with, benefit from any resources and references I may have already collected, and if you want to leave contact info or a link to your own project page, hook up with others who find this page later.

This is no longer a wiki because spam traffic overwhelmed legitimate traffic and MediaWiki's anti-spam features don't scale down very well. Instead, you can update this page by dropping me a note here:


ACL3: More CL syntax for ACL2

"A CL Layer for ACL2". Add apply, mapcar, etc. to ACL2. Take this augmented language (ACL3) as input and emit plain ol' ACL2 by re-writing into the less human-friendly but equivalently expressive forms required by ACL2.

Android: Larger unlock pattern grid

3x3 is not large enough for security enthusiasts. For fun, it could prompt for a normal 3x3, and then a larger one.

Update: Cyanogen mod allows this

Audacity: Paint in the spectrogram

In the spirit of Sound Mural, add some simple paint controls to Audacity's spectrogram view. Most useful: erase and copy/paste. In the Audacity feature request tracker.

Ballbot: Smooth motion with neural net

Add a neural net to ballbot to smooth out the motion.

Manually-coded physics software provides the basic movement plan. A neural net modifies the plan to minimize jerkiness.

Then dump the result of training the neural net (ask it about every point in its input space) to see what it learned. Use this insight to write a better manually-coded basic movement planner.

Bash: heredocs should be seekable

Heredocs ( << ) and herestrings ( <<< ) should be seekable.

Bit display

Patch everything to display sizes in bits instead of bytes.

The byte is getting more and more arbitrary. No buses are eight bits wide anymore. Even characters aren't 8 (or 7) bits anymore.

Really, why 8? Why tell the user that they have 128 octets instead of 1 kbit? Is it the same reason that we used to use scores?

Maybe make this a locale thing. Group bits by 2LC_BIT: 0 to show bits, 1 to show 'quads' (for Trekkies), 2 to show nibbles, 3 to show bytes/octets, and 4-6 to show 'words' depending upon your hardware. Or follow the Intel convention and call 25 bits a 'doubleword', 26 a 'quadword', and 27 a 'double quadword' (ugh).

Boot time reduction via disk layout

GNU boots way too slowly. Hypothesis: most of the delay is disk seeks. Disk seeks are slow.

  1. Use bootchart or similar to determine what files are accessed during boot.
  2. Use the ext4 defragmenter or similar to pack those files in one contiguous lump.
  3. Add a kernel hack to load that lump into the disk cache in full-size reads just before passing control to init.

The only disk activity after that should be writing to log files and whatnot. If this becomes the bottleneck, experiment with more write caching.

Update: The e4rat project is working on this.

Update: sReadahead does #1 and #3, skipping #2 because boot drives are largely SSDs these days.

Block puzzle generator

Write a program to automatically generate 3d block-grid wood puzzles. Search through the space of possible decompositions for one that meets constraints. Algorithm sketch:

frontier_insert(frontier, candidate) {
  if seen(candidate):
  foreach extraction_plan:
    if candidate.current_piece can be removed with this extraction plan:

search(input_model, min_size, max_size) {
  frontier = [ { pieces = [], current_piece = nil, unallocated_blocks = num_input_blocks } ]
  candidate = frontier.pop()
  if candidate.current_piece && candidate.current_piece.size < max_size:
    foreach grow_point in [candidate.current_piece.head, candidate.current_piece.tail]:
      foreach block adjacent to grow_point new_block:
        frontier_insert(frontier, candidate.extend_current_piece(new_block))
  if candidate.current_piece == nil || candidate.current_piece.size > min_size:
    free_blocks = input_model
    foreach piece in candidate.pieces:
      foreach block in piece:
        free_blocks -= block
    foreach block in free_blocks:
      frontier_insert(frontier, candidate.extend_as_new_piece(block))

An extraction plan is a list of (direction, distance) tuples. Distance in the final tuple is ∞.

Bulk torrent seeder

Torrent seeding software for library hosts. A library host has much more data than bandwidth -- think, non-latest versions of GNU distros, etc. Maintaining active connections to very large numbers of torrents incurs significant overhead. A bulk torrent seeding program would connect to each torrent briefly, note the time and the peers that are waiting, and disconnect. It would remain connected to and seed only a few least well-served torrents on some metric -- longest waiting peer, slowest transfer speed, longest ETA, blocks with the most peers waiting for them, etc. The goal would be to let the bittorrent protocol shine on popular torrents (not participate in them) and to catch torrents/peers that have fallen through the cracks -- to be Databringer.


Make a C → lisp compiler.

CellWriter: Integrate PrimWriter

CellWriter has good character recognition, but you have to write in those tiny, constricting cells. PrimWriter lets you scrawl wherever you like, but the character recognition isn't as good nor as well integrated.

Add PrimWriter's input widget to CellWriter. Cut out each character and put it into a PrimWriter cell.

Bonus: Incorrectly guessed characters can be corrected in the CellWriter cells.

Chaos detector

Simulate a double-inverted pendulum, Furuta pendulum, etc. Like this

Maintain error margins. See how fast the error margins overwhelm the simulation.

Use large-precision arithmetic. Visually display how many bits are known.

Chat: Reliable chat

Problem: IRC is unreliable. Messages are sometimes not delivered. Network interruptions that affect a single server cause a service disruption.

Proposed solution: A different network topology with redundant paths:

Every message is sent to every server and received from every server. Only one server need be operating for full functionality of the network. Additional servers provide reliability.

Update: I made a prototype.

Peer-to-peer chat

The method of reliable chat but without servers. Or, equivalently, each client is also a server. Web browsers can participate as servers if the protocol is WebSocket-based.

Chat: Reliable encrypted chat

The method of reliable chat above, plus encryption.

Every message is a tuple of (message cyphertext, map of peer public key hash → message key encrypted with peer's public key). A new randomly-generated message key is used for every message.

Message keys are encrypted with each peer's public key. Upon seeing a new peer, clients exchange public keys and record them in persistent storage associated with the peer's nickname (and also associate unclaimed nicks with the known key upon witnessing nick changes). Nicknames are thus by default locked to their first user. Clients should display nicks using non-first keys as "<nick (impostor!)> ..." or similar. Because all this state is retained separately in each client, different clients may have different opinions of which key is the impostor for a given nickname. Users are encouraged to not share nicknames to reduce confusion. Users are also encouraged to speak up when their client is proclaiming impersonation to allow the discrepancy to be resolved socially. Clients should provide a way for users to re-associate nicknames to new keys with appropriately stern messaging and confirmation.

Used with a 128-bit block cipher such as AES, this method imposes a typical overhead of 8 bytes to pad messages out to the block size plus 24-40 bytes (8 bytes for the public key hash + 16-32 bytes for the encrypted message key) per recipient.

Clients should provide a way to export and import the private key so that a user can present a consistent identity from multiple clients. Users may also use different nicks from different clients (eg: joe-irssi, joe-firefox, joe-chrome, joe-at-work) to avoid the hassle of key management.

Update: Multi-party Off-the-Record Messaging has way more attention to detail than the above rambling. Just do that, with reliable delivery.

Chat: multi-line input buffer

Irssi / silc-client / ERC should have a multi-line input buffer. Scrolling back and forth on one line while composing a message is annoying.

Maybe add readline to irssi?

Chat: Join/part summaries

IRC clients offer two levels of verbosity for join/part messages: all or none. This sucks. I want summary lines:

*** You are now chatting in #channel
*** users: +Alice +Sam
<Alice> Hi Bob
*** users: +Fred -Lisa -Joe
<Bob> Hi Alice
*** users: +Joe

This requires the capability to edit lines that have already been displayed on the screen. Joins/parts cannot be queued until someone speaks -- you need to know who's in the channel when you're about to speak. I'm not sure which clients' plugin APIs support this kind of history editing.

Chat: s/// substitutions

On IRC, people use ed s/// syntax to indicate corrections for typos, etc. It would be neat for the client to actually interpret these and apply them.

Update: I hear that hipchat supports this.

Chat: spellcheck

Pidgin provides spellcheck in the input buffer. Irssi / silc-client / ERC should too.

Chrome extension source verifier

Many chrome extensions have links to github or otherwise purport to contain publicly accessible software. But looking inside these extensions before installing them is tedious.

A service could fetch these and automatically compare the contents of the extension to the contents of the public repository. It could then report on the veracity of these extensions with green badges or something.

Commentroversy: update on vote

Commentroversy should update its counts by one when you vote.

Date: strptime

BSD date provides a command-line interface to strptime(). GNU date doesn't. It should.

DeHasp: Haskell to Hasp

Haskell in, Hasp out.

Ali, creator of Hasp, writes:

A first step towards this might be to create a script which takes the Haskell code and simply adds (:hs ) around it, ie.

putStr "foo"

would give

(:hs "putStr \"foo\"")

In fact there is very little that could be directly changed from Haskell to Hasp, since Hasp code assumes -XNoMonomorphismRestriction, whereas normal Haskell code might not. Additionally a few Haskell forms do not have equivalents in Hasp, such as record syntax and the default form, so these would need to be wrapped in an :hs form.

That's not to say you can't automate the conversion of Haskell to Hasp, but you would probably want to check over it yourself.

Dictionaryokie editor

Hack together a quick little gui to place words in a stream on top of a midi track. Key idea: stretch each box to the appropriate size to match the music. The program then applies Sound Touch to stretch the audio clip to the proportional size of the word box. Automatically fetch word audio clips from the web, of course.

Maybe write this on top of Audacity?

Disable screensaver when cleared quickly

When the screensaver kicks on but is cleared within 10 seconds, prompt: "Disable screensaver for 2 hours? [Yes] [No]". This solves the "I forgot to turn off my screensaver before starting this presentation/movie/etc." problem.

Distributed Email

Store documents by running them through Staple, breaking them into pieces, and sending each piece to a different host in a different locality / legal jurisdiction. This prevents one corrupt national government from being able to illegitimately subpoena or otherwise seize these documents. Email is just storing documents for someone else to retrieve. Use Reed-Solomon to allow for some of the repositories to be down (sporadically or maliciously), but not with so much redundancy that small collusion of repositories could reconstruct your data without your permission.

This is orthogonal to encryption. Encrypted documents stored in this system would not even be available to the hosts for cryptanalysis.

Down to Size

A simple RTS game. More complex than the minimalistic Liquid War. About on the same order of complexity as Globulation (less complex buildings, more varied units).

Many RTS games have complicated units. They have health. They have a type, which often dictates size, shape, weaponry, armor, speed, mobility, sight distance, etc. Many games' units also have experience points, training, or component-specific damage. All this adds up to a gigantic unit struct.

Question: What's the minimum in per-unit properties that an RTS can have and still be fun to play?

Proposed answer: Health!

Have the unit's health dictate all its other properties. Most notably, size. When a unit takes damage, it shrinks. Isn't that cute?

Bigger units can deal more damage and take more abuse. Bigger units move slower. Bigger units can see farther. Bigger units strike less frequently, but do much more damage per strike. Sufficiently large units get ranged-attack capability ... because they're big enough to pick up smaller units and throw them. In the heat of battle, prefer your enemies' units for this purpose. =)

All units are cubes. This makes them really, really fast to draw (good for hand-held ports). It also allows for some interesting mechanics. Eight cubes can meld together to make a double-size cube. This is the equivalent of the more units vs. high technology trade-off in other RTS games: N units, N/8 big units, or N/64 really big units? Note how there's no end to this progression -- games can get arbitrarily large-scale (or arbitrarily small-scale).

Unit production facilities are 11 cubes melded together in a U shape. The size of the cubes that composes the facility dictate the size of the units it produces. This is the same tech trade-off: invest 11 units in a factory, or 88 units to make a factory of big units. Bigger factories lay down material at a faster rate, but churn out units at a slower rate. But since they're bigger units, bigger factories produce more firepower/second.

Resource model: Any unit can carry an arm-load of resources (larger units carry more, of course). Unit factories build units from resources within arms' reach. Other units go out, find resources, and carry them back to factories. Permanent deposits (TA metal patches, DK gems), large deposits (Warcraft gold mines), regenerating deposits (C&C tiberium), and randomly-appearing deposits (SimAnt) are all available in the map editor.

Controls: Select groups of units by dragging rectangles. A "stack" button causes them to organize themselves into groups of eight and up-size. Similarly, a "become factories" button. If the player cares much about factory placement, they can do it one factory at a time. Any unit without standing orders to move or hold position goes resource gathering.

Dynamic Temperament

In even temperament, mean-tone temperament, etc., you must assign tones to all the notes and then play all the music. In dynamic temperament, the tone assigned to a note is determined by what other notes are to be played concurrently, just before, and just after. For much music, I suspect all intervals could be pure.

Composition with dynamic temperment in mind is another layer of fun. An new kind of endlessly rising canon is possible where a carefully chosen chord progression causes the tone assigned to a note to shift an entire half-step on each round. Notationally and for the performer it is merely repetition, but the sound the instrument produces changes with each round.

Update: Justonic sort of does this, but without hysteresis, so you can't push the tuning around over time with playful composition.

ERC: silc

Add silc support to ERC.

Fancy clock

Make a really fancy clock. Inspiration. Maybe as a screensaver.

Filesystem recursive usage metadata

Add to every directory inode a field for its size including all children.

Hack up kdirstat to use this metadata.

Problem: hard links.

Workaround: Keep a map of filenames to inodes with link count > 1?

Firefox: Detect useless 'parked' domains

Detect 'parked' domains, 'landing' pages, etc. Display a browser error message similar to the 'Address Not Found' DNS failure message. There's no actual content there, so don't present something that looks like a web page to the user.

There are only a few big vendors of these things, so a little detection effort can go a long way.

Firefox: Plugin bundles

Allow Firefox to install multiple plugins from one file.

Firefox: redirect loops and cookies

When Firefox (3.0.x) gets stuck in a redirect loop, it tells the user "Redirection limit for this URL exceeded. Unable to load the requested page. This may be caused by cookies that are blocked."

Firefox should usually know if cookies are being blocked or not. If Firefox itself is blocking cookies, this is an appropriate message. If Firefox isn't blocking any cookies, then it shouldn't introduce cookie-blocking as a suggested cause in this error notification because other causes (like the server just being broken) are much more likely. Yes, there could be a proxy (or transparent proxy) blocking cookies, but is so common that it needs to be included in end-user error messaging for redirect loops?

Flexible transactions

The SQL transaction paradigm has some concurrency problems. For example, say you want to put some URLs in a normalized database:

Two tables: Page and Domain.  The Page table has rows domain_id and path.  The Domain table has rows id and domain_name.  An arrow conects Page.domain_id to

domain_id path
1 /
1 /links.html
2 /
id domain_name

This would represent the URLs:

The obvious method of adding a URL to this database is

domain_id := SELECT id FROM domain WHERE domain_name = :domain_name;
if (! domain_id) {
  domain_id := generate an id;
  INSERT INTO domain VALUES (:domain_id, :domain_name);

INSERT INTO path VALUES (:domain_id, :path);

There is a race condition here. If two threads try to insert a URL with the same domain, they can both determine that the domain needs to be added to the domain table and generate different ids for it. The second insert (or the second commit) will fail with a unique key constraint violation on the domain_name column.

In essence, the application has to be involved in rolling back and retrying the transaction because the domain_id leaked into the application. The database has a fighting chance of retrying the transaction on its own without application involvement if the application does it like this:

 INSERT INTO domain (domain_name) VALUES (:domain_name) ON DUPLICATE KEY DO NOTHING;
 INSERT INTO path (domain_id, path)
   SELECT id, :path FROM domain WHERE domain.domain_name = :domain_name;

Here, the database somehow generates the and the application never sees it. If two transactions both try to insert the same domain, the database is free to use the same id for both transactions and commit the row if either transaction commits. However, even if the application is written in this later way, the behavior of all(?) existing database implementations degenerates to the former. They behave as if the results of the first INSERT have to be concrete before execution can continue. Really, things only have to be concrete at commit. Other transactions in flight should remain as vague as possible until commit in order to maximize their chance of being applied successfully. If the database supports this, then applications can use the later, more vague SQL syntax to indicate which details they don't care about -- which details the database is free to continue to change as other transactions post. In order to implement this, the database must keep track of what information it has passed back to the application and what information it has not, and the application must be scrupulous about not accessing irrelevant information.

Fmod-compatible API

Software is being developed against FMOD. FMOD is not Free Software. There is even some otherwise Free software encumbered by a dependency on FMOD (eg: TA3D).

Like Harmony, Lesstif, Classpath, Mono, Wine, ReactOS, etc., it would be nice to have a Free implementation of this API.

GDB script: Attach Apache, replace content

As a form of documentation in practical GDB usage, make a script that attaches to Apache and replaces the document-body of the next request with a given string. This would also handy for debugging in web development.

Remove gets()

gets() should never be used. Remove it from the C library, recompile everything, & see what breaks. Submit patches upstream.

glsa-check --fix all should support --ask

glsa-check runs each update individually. Instead, it should determine all the work that needs to be done in one pass and then do it in a second pass. This would allow the administrator to see the scope of proposed changes before committing to the update.

Gnome password prompt and login screen: Add date/time

All computers know what time it is. Sometimes, that's all you want from one.

You shouldn't have to log in to get the current time.

You should be able to get the current time even when the machine is locked by another user.

All auth screens should display the current time. Start with gdm and the gnome-screensaver prompt.

Haskell with tactics

I want a clean Haskell with no UnsafePerformIO, but I also want a Haskell in which it is possible to express some of the semi-legitimate uses of UnsafePerformIO. For example, memoization. Classically, memoization is implemented with UnsafePerformIO on a shared memoization table to avoid the dilemma of either 1) having many copies of the memoization table (risking defeating its purpose), or 2) introducing monads everywhere in order to pass the table around (risking unnecessary serialization).

So, how might one permit the semi-legitimate uses of UnsafePerformIO without leaving a gaping hole in language safety? Well, Isabelle and other proof systems allow the user to direct the proof without leaving a gaping hole in proof soundness through the use of tactics. Tactics are advice to the proof engine that it can take or leave. They needn't be correct. They needn't even be helpful. They can even be malicious, and the only consequence is that the exploration for the proof takes a little longer. Contrast this with malicious use of UnsafePerformIO, which can be arbitrarily destructive to language safety.

So, consider a Haskell in which you must at first write everything 100% pure, but then you can add alternate implementations for certain segments of the program that use something akin to UnsafePerformIO that are more performant or whatever. Then, only in cases where you can prove to the compiler that the alternate implementation will have the same result as the pure implementation, the compiler can substitute in the syntactically unsafe code, because it has been shown to be semantically safe.

At first, proving equivalence sounds like an onerous burden on the programmer. However, there's an out: the proof can leak. The proof can use all kinds of pre-conditions (assumptions, premises) that make expressing the proof simple (perhaps even trivial), and the compiler can insert runtime checks for these and fall back to the 100% pure implementation in cases where they don't hold. This lets the programmer focus their efforts and not end up more entangled in the formalism than they want to be. Just as in other languages you can drop down to assembly for the performance of just one innermost loop (and have a fallback implementation in the high-level language for other architectures), you should be able to drop down to unsafe IO syntax safely for one small piece of the program (or even some subset of inputs to that small piece of the program) with automatic fallback to the pure safe implementation.

Going back to the memoization example, one could implement it in two parts:

  1. Specify an alternate implementation that replaces the entire call and simply returns a value from the table with a precondition that the relevant entry exists in the table, and
  2. Specify an alternate implementation of the empty space in the source program after the value is calculated but before it is returned (think wrapper) that adds the arguments and return value to the memoization table.

The proof that #2 is safe is trivial. It replaces nothing in the original program, and its only effect is to touch the memoization table, which is not referenced by any pure code. The only tricky part might be to make sure that it doesn't get optimized away. The proof that #1 is safe is simply the definition of purity -- same args, same result. Note that in a language with UnsafePerformIO you can't appeal to purity in a proof, but you can here.

Hasp macros in Hasp

Hasp macros should be evaluated as Hasp code, not scheme. Maybe.

Ali, creator of Hasp, writes:

This is very feasible in the sense that one could safely compile that Hasp code to Haskell before it is download, and then all you need on your computer is a Haskell interpreter.

The basic issue here is that all of the Scheme code would need to be ported to Hasp, and this would require writing a new "read" procedure plus quasiquote, and the indentation reading is written quite imperatively, so may have to just rewrite that too.

First steps towards this goal would be to start recoding the Scheme more functionally so the port is straight-forward.

One other difference is that since Hasp does not allow variable arity functions, a slightly different syntax would be needed for defmac.

The reason I have not already done this port is because I wanted Hasp to be reasonably language agnostic, so that later it might output code in a similar language like Clean or Curry, at which point having Hasp written in Haskell would be equally as meaningless as having it written in Scheme.

Thank you, Ali. Excellent point about language independence. If Hasp is a meta-language for multiple underlying languages, it would be inappropriate to tie Hasp to any one of them. In this case, the Hasp semantics would ideally be simple as possible, and Scheme is a good choice for that.

On the other hand, one of the things that I really like about Common Lisp is that the language available for macro expansion is exactly the same language used at runtime. Macro expansion can do anything that normal execution can, including make calls into complex CL libraries, reach out over the network and talk to databases, etc.. Naturally, this makes macro expansion much more strongly tied to the runtime language.

I think there is value in both of these approaches, but it would be difficult for both to exist within one implementation. A runtime-identical-semantics macro expander would behave differently depending upon the target output language. Writing a portable Hasp parser (even using Hasp syntax) that would run correctly in all possible output languages would be brittle and needlessly complex. It would be the first step on the path of trying to create a new programming language that could be compiled to any of Haskell, Curry, or Clean. While potentially interesting, that would be a project with a much larger scope than Hasp aims to have.


Resurrect Hoard, the token collection game for AI development. Send to Rodger.

C: Run unit tests with different integer layouts

The C spec allows a great deal of flexibility with respect to the size and layout of integers. Key points from sections and 6.2.6 (imprecisely summarized):

Anything not nailed down is permitted. You can have 33-bit ints that are three 11-bit chars. You can have 33-bit ints that are one 33-bit char. You can have inefficient ints that take up more space than the value range the provide. Multiple representations can map to the same value, and/or some of the representations can be illegal (eg: to implement parity). Negative zero can be permitted or not. etc. etc.

When I run my unit tests normally, I exercise my code on my platform, which is a single point in this extremely flexible space. I am coding in the programming language C-on-my-platform, not the programming language C.

I want a thing that will run my unit tests cross-compiled to a bunch of emulated platforms that systematically explore this space, ensuring that my program is written in C, not an unknown subset.

Jbofi'e: automake/autoconf -ify

Jbofi'e's build process is ancient. Modernize it. This probably entails becoming the maintainer.

Jump mechanic - Input after jump begins changes initial parameters

In a video game, the user holds down the jump button for a variable amount of time to control the height of the jump and can usually move around in the air somewhat. Instead of fudging that by modifying velocities after the jump, go back in time and recompute the initial jump parameters. All jumps, in the end, are physics-perfect parabolas with the only mid-air force being gravity.

Bonus: If the path of the jump is significant, such as whether a mid-air token is collected or not, and later input changes the jump path such that a collected token is no longer collected, put it back on the screen and roll back its effects!

Kernel image in /proc

Linux should have an option to retain the original compressed bootable kernel image instead of discarding it after decompression and make it accessible as a file in /proc. This is handy for machines that boot from removable media, initrds, or otherwise no longer have the original bootable kernel image easily at hand.

Bonus feature: proactively swap it out when swap appears.

Kstars: Add Earth's shadows

Add Earth's penumbral and umbral shadows at the orbit of the moon to kstars so that you can see lunar eclipses.

Lambda calculus visualization

To Dissect a Mockingbird: A Graphical Notation for the Lambda Calculus with Animated Reduction doesn't appear to have the actual animating visualizer written yet. Write it.

Other lambda calculus visualization stuff:

  1. Alligator Eggs!
  2. Visual Languages: Alligator Eggs revisited via LC for kids (alligators, oh my!) | Lambda the Ultimate

Lisp: store type info in pointer

Write a scheme compiler that stores type information in the reference, not in the object.

Consider a point type, defined as a tuple type, defined in terms of a pair of integers.

In C++:

template <T1,T2>    struct pair { T1 car; T2 cdr; }
   template <T1>       struct tuple { T1 first; }
   template <T1,T2>    struct tuple { pair<T1,T2> p; }
   template <T1,T2,T3> struct tuple { pair<T1,pair<T2,T3> > p; }
   struct point { tuple<int, int> t; }

An object of type point is two machine words -- two integers. No type information is stored in the object. All types are resolved at compile time.

In Java:

   class pair<T1,T2> { T1 car; T2 cdr; }
   class tuple<T1>       { T1 first; }
   class tuple<T1,T2>    { pair<T1,T2 p; }
   class tuple<T1,T2,T3> { pair<T1,pair<T2,T3>> p; }
   class point { tuple<Integer,Integer> t; }

An object of type point is seven machine words:

  1. Object of type point follows:
  2. Object of type tuple follows:
  3. Object of type pair follows:
  4. Object of type Integer follows;
  5. <an integer>
  6. Object of type Integer follows;
  7. <an integer>

Clearly, storing the type information in the object is a potentially large overhead. So, instead of defining a reference to be (address) → (type, data), use (address,type) → (data). Initially, this does nothing. But, as types nest, inner type information can be omitted. (address,point) → (int, int). This way, types don't need to be fully resolved at compile time. Types that are known can be omitted.

Update: This is how interface types are implemented in Go.

Lock queue administration

System administrators should be able to manipulate kernel lock queues. When multiple threads are blocked on a kernel lock (sysv semaphore, futex, fcntl(F_SETLK), etc.), the order in which they get the lock is unspecified. There should be an interface for an admin to step in and specify an order.

Motivating example: Dspam has a token database. Each incoming mail spawns a process that takes a lock on the token database. Also, retraining a message from the quarantine takes a lock on the token database. The retrain thread consumes a lot of ram (by holding the quarantine in memory while retraining). When a retrain starts, it has a 1/N chance of acquiring the token database lock. N might be 5 initially. But, then the ram pressure of holding the entire quarantine in memory kicks the token database out of cache, and processes have to start going to disk for it. Mail processing slows down and N increases dramatically, further reducing the chance that the situation will be resolved any time soon.

An admin could step in and place the retrain thread as the next lock taker and immediately resolve the situation.

This situation could also be resolved by fixing dspam to not be stupid (not hold the quarantine in ram (or have a two-tier lock to prioritize the retrain thread)). However, this doesn't resolve the problem in production -- the machine is still swap-thrashing and the application may or may not gracefully handle being unexpectedly killed. Giving administrators this capability gives them the power to respond to current problems, not just to fix things for next time they happen.

Logflash: Port to gtk

There's a lojban learning assistant for DOS. It runs in dosemu. I think it's called Logflash. It needs to be ported to a modern platform.

Ls: command line order

Add an option to ls to emit output in the order that it was specified on the command line. This is approximately equivalent to invoking ls many times with one file each time, except that with one invocation the columns line up correctly.

Ltris: difficulty levels

Add difficulty levels to ltris' expert mode. Full expert mode is too hard for the average user, but the current watered-down mode is too easy.

Lyric graph hand mapper

Map lyric graphs onto hands for the production of daft-hands -like performances.


Finally realize the M-expression aspiration of Lisp. Except, instead of inventing a new language to be the M-expression language, just borrow all the existing ones. Compile C, ML, Java, JVM bytecode, assembly, etc. to Lisp.

This was one of the original reasons why GNU chose Scheme (by way of Guile) for the official GNU extension language. I heard something about this being based on later-retracted research. Look this up.

See also C2lisp, DeHasp.

Mercurial shouldn't recursively scan ignored directories

Mercurial matches filenames to ignore with regular expressions. It does this by recursively exploring the repository root, then matching every filename against every regular expression. Thus, even if ^build/.* is ignored, Mercurial recursively scans build/ to verify that every file in there matches that regular expression. This is very expensive for large ignored directories in mercurial repositories.

By being smarter about regular expressions (not just treating them as opaque things to pass to the RE library), mercurial could catch some cases where it can avoid these large recursive scans. Doing this in the general case is hard, but handling a few simple cases, like .endswith(".*"), would help a lot.

ML S-expressions

Make an S-expression syntax for ML. Implement defmacro.

Microsoft Singularity clone

Use managed languages and typed assembly to prove that programs don't create references (don't cast integers to pointers, don't exceed array bounds, etc.).

Use Linux to bootstrap. User code is compiled to Linux kernel modules. Kernel modules provide a mechanism for loading and unloading code and for managing dependencies (think dynamically-linked libraries).

Address space split, usually 3/1, 2/2, or 1/3. Can it be 4/0 ?

Problem: Kernel allocations cannot be swapped out.

Update: The Chrome Native Client stuff does a little tiny bit of this general sort of thing.

In-order Markov generator

From a Markov Transition Matrix, generate chains in maximum-likelihood order. Also, subject to some constraint, like

Minilight: Go

Port Minilight to Go. And other fun languages (prolog, haskell)?

Moodle should use revision control

Use subversion or git or something for Moodle's database of grades, student-uploaded assignments, etc. This data is precious -- far more precious than the storage required to keep the full history. Old revisions should not be thrown away. Accidents happen.



Split a media stream into its component packets. Run some kind of user-specified program on each packet. Conditionally emit the packet into the output stream.

MrSID image format decoder

There is no Free Software to decode this image format. There is otherwise public data (maps, the Voynich manuscript) locked up in this format.

Suggestive music Editor

Reduce the task of music composition to the simpler task of music critic by repeatedly randomly generating the next note in a sequence and asking the user if it sounds good or not.

Narrative generator

Encode and decode an arbitrary bit sequence as a narrative. This makes it much easier for a human to recognize a bit sequence. Being able to verify a long bit sequence from memory is useful for signature/hash checking.

This is mostly constructing lists of story elements that are a power-of-two long and then selecting elements from them by consuming an appropriate number of bits from the input stream.

This is fully-generalized Mad Libs.

See also xkcd/904


Make something like stunnel, but for one time pads. Discussion.

Pam: accept reverse-case passwords

Add an option to PAM to optionally accept reverse-case passwords. IE, if a password check fails, flip the case of every character and try again before returning failure.

The state of the capslock key conveys only one bit of information. That bit is more clearly visible than any other keyboard activity during password entry (it turns on an LED). Further, it can be inadvertently toggled. If a user can supply the exact reverse-case password, they obviously know the correct password, so there's no good reason to deny their authentication attempt.

Perl pack() varint

Add support for Protocol Buffers' varint to perl's pack() and unpack().

Pipe splicing

There should be an administrative interface for re-routing pipes in-flight. For example, I'm copying a bunch of files over the network. It's going slower than I thought it would be because it gets network bound whenever it hits a large file and disk-bound whenever it hits small files. If I had added |buffer| to the original command, it could saturate both network and disk, sloshing load from one to the other automatically. I could stop it and add |buffer|, but it's been running for an hour already and it would be complicated to try to figure out which files have already been transferred to avoid re-transferring them. I want to add |buffer| to the currently-running pipeline, between nc | cpio. Call that pipe's descriptors (0,1). In sequence:

  1. Make new pipes (2,3) and (4,5)
  2. fork() a new process
  3. Attach fd's 2 and 5 to the new process' stdin and stdout.
  4. exec() buffer.
  5. Block new writes from nc to 1.
  6. Wait for reads on 0 to empty the internal kernel buffer for that pipe.
  7. Replace fd 0 with 4.
  8. Replace fd 1 with 3.
  9. Unblock writes from nc to 3 if this doesn't happen automatically with the replace in the previous step.
  10. Destroy the now unused (0,1) pipe.

Steps 1-4 are existing syscalls. I'm not sure if it would be a good idea to provide a syscall for Steps 5 and 9 (start and stop pipes). That just sounds like another piece of userland-visible state that the kernel has to keep track of and could randomly get messed up, making debugging unix stuff even more unnecessarily complicated. I think it makes more sense to have an interface like select where you pass a list of pipe splice changes that you would like to perform and the kernel handles blocking and buffer draining automatically within the space of one syscall rather than having this process be a bunch of smaller syscalls that aren't really useful on their own.

There's a risk of introducing deadlocks here if applications depend upon there being a non-zero kernel pipe buffer. One can argue that applications should not depend upon this.

Pitch shifter

A non-realtime pitch shifter / tempo stretcher. Burn CPU cycles to try to get better-quality audio.

Key idea: measure error. What does the correct output look like? After each pass, use the error from the previous pass to generate a new candidate output stream. Iteratively converge on the perfect output.

Pitch visualizer

Display in realtime the pitch on the microphone.

Display a ruler with the lettered notes and and indicator of the current pitch so that a vocalist can see if they are over or under their target pitch. Allow the user to set the key so that notes not in the key are not labeled on the ruler.

Show a pitch history so that the user can see scoops and the magnitude of vibrato. (Maybe include an intensity display for tremolo).

Bonus: Make this an OpenMoko app so that users are not not tied to computers.

Starting points: Aubio and the Goertzel algorithm.

Update: I tried a few of these. The ones I tried had serious noise / signal quality issues.

Play with large address spaces

The very large address space made available by new 64-bit architectures offers an opportunity for rethinking some OS design decisions made for much smaller address spaces.

Political sign image maker website

Political sign image maker website -- like

Starry, stripy, red, white, blue motif. (For the US. Other locales later.)

Make it dirt simple: Key in name/phrase → get sign.

Political speech parser

Much political speech (eg: on the US senate floor) is empty of denotative meaning. It is name dropping, referring to historical events, quoting not-immediately-relevant passages from founding documents, etc. However, this speech is rich in implied meaning. It is the foundation of the social network of the governing body. Also, it is largely transcribed and available for data mining.

Turn this mountain of transcripts of flowery speech into a reference web. See which name-drops, historical allusions, etc. are statistically significant with respect to correlated voting, introducing a bill with co-sponsors, having an amendment accepted, having a bill sent in or out of committee, passed, etc. Look for correlation with out-of-character behaviour, which (cynically) would indicate successful negotiation of a horse-trade.

Portage: remove unused configuration

/etc/portage/package.* allow Gentoo administrators to influence which versions of packages are selected for installation. The most common operation is switching a package from 'stable' to 'unstable' to permit a newer but not yet vetted version to be selected. To avoid jumping onto the unstable branch indefinitely (the version that is marked unstable today may be marked stable tomorrow), these directives can be version-limited.

For example, suppose the foo package has available a version 2.5 marked stable and a version 3.0 marked unstable. You need specific functionality from 3.0. You do not want to be randomly upgraded to an unstable 4.0 at some point in the future. You would specify this preference with =foo-3.* or <foo-4.

In the future when a 3.x version is marked stable, this directive will no longer be needed. The configuration directive will remain in /etc/portage/package.*, but it will do nothing. Unused directives can be detected by removing them one at a time and running emerge -DuNp and checking if anything has changed, but this can be achieved much more efficiently by modifying portage to detect unused configuration directives directly.

Subtlety: In this example, there may be a stable 3.0.1 and an unstable 3.0.2. The configuration directive will still be 'used' to pull in 3.0.2 instead of 3.0.1, but 3.0.1 would satisfy your 3.x requirement. This is a limitation in portage's version specification syntax. There is no way to directly express the target notion "if <there are stable packages that match foo-3.*> then nothing else foo-3.*". If this was available, it would give the slightly counter-intuitive behavior of downgrading a package when the first stable version became available, but this would considered correct behavior for those who wish to maximize their use of packages considered stable and/or minimize their local customization.

Portage: revdep-rebuild as an RDEPEND check

In Gentoo, programs should only link against things that they RDEPEND upon (transatively / indirectly). revdep-reubild scans the dynamic link structure. It should complain when it finds a link not implied by an RDEPEND.

PS5: Post-command-entry 'prompt'

bash has \\D in PS1 and/or PROMPT_COMMAND to show the date in the prompt. Since the prompt is customarily printed immediately after the previous command finishes, this is an excellent way to determine the stop-time of commands by looking at a console log. There is no corresponding general way to record the start time of each command. So: PS5, a prompt string that is expanded and printed after command entry, just before command execution. With \\D in both PS5 and PS1, it would be clear from the console log exactly when each command started and stopped.

This can be done in a clunky way by trapping DEBUG.

Prolog: explain plan

Prolog should have an EXPLAIN PLAN operation like SQL.

For every combination of bound and unbound arguments, show the execution plan. Also show which combinations are erroneous.

Protocol interpreter

It looks like Viewpoints does something along this direction.

Pv library

Pipe view is neat, but it imposes an overhead -- all the data must be delivered to and collected from an additional process, just for the purpose of counting it. It would be better if pipe view was a library that could be invoked by any process reading from or writing to any file descriptor. At each read/write, the process would notify libpipeview of how much it read or wrote.

Of course, the point of pipe view is that it can be added to any pipeline without the components knowing about it. libpipeview would need to intercept calls to read and write (and readv, writev, pread, and pwrite) by shimming itself between the program and libc with LD_PRELOAD (like artsdsp). This method isn't limited to stdin and stdout. Intercepting open (and popen, connect, etc.) would be interesting too because it could get -N labels for each descriptor. Think of wrapping cpio in libpipeview and getting not only a progress bar for the output archive but also for each input file as it is scanned (intercept stat to get -s sizes).

Pv Unicode option

pv should optionally use Unicode block characters for the progress bar, for greater horizontal precision. Demo.

Reddit comment preview

Reddit's comment system should have preview capability.

Reddit social context

For logged-in users, /user/<other_username> should list your past interactions with other_user. Have you commented on their comments/submissions? Have they commented on yours? Grandchild/grandparent? How many up and down votes have you cast for them? Reddit is a big place. I can't keep all this information in my head.

Rrdtool: compression control

Add a command line option to rrdtool graph to control the .png compression level.

This used to be as simple as

-  png_set_compression_level(png_ptr,1);
+  png_set_compression_level(png_ptr,9);

in rrd_gfx.c in 1.2.11, but things have been reorganized since then.

pngcrush can reduce the size of the graphs page from 1354 kB to 1114 kB, a savings of 240 kB (18%). Its favorite method is 113 (fm 0 zl 9 zs 0).

SIGSTOP swapoff

One ought to be able to ^Z swapoff and resume it with fg or bg. Instead, it fails:

# strace swapoff LABEL=theswap
execve("/sbin/swapoff", ["swapoff", "LABEL=theswap"], [/* 64 vars */]) = 0
[1]+  Stopped                 strace swapoff LABEL=theswap
# fg
strace swapoff LABEL=theswap
)                    = -1 EINTR (Interrupted system call)
--- SIGCONT (Continued) @ 0 (0) ---
write(2, "swapoff: ", 9)                = 9
write(2, "LABEL=theswap: swapoff failed"..., 33) = 33
write(2, ": ", 2)                       = 2
write(2, "Interrupted system call\n", 24) = 24
exit_group(-1)                          = ?

The swapoff(2) manpage should mention that swapoff() can fail with EINTR and the signal(7) manpage should list swapoff in the "Interruption of System Calls and Library Functions by Stop Signals" section.

sort should coalesce duplicate lines

When /usr/bin/sort processes input with many exactly identical lines, it stores a copy of each. If, instead, it stored only one copy and a count, it could run over highly-duplicative input with drastically reduced memory and disk requirements.

SQL dotted join syntax

An SQL evaluator aware of foreign key relationships would enable easy implicit joining syntax by just reaching through the key field. For example, suppose table A has a column b_id that is an index into table B's id column:

I ought to be able to simply say

SELECT A.a_val, A.b_id.b_val FROM A

and have the evaluator automatically expand that to

SELECT A.a_val, B.b_val FROM A,B WHERE A.b_id =

SQL progress indicator

Add a progress indicator to SQLite, PostgreSQL, and/or MySQL. Researchy Microsoft SQL implementation.

Or, for something different, deliver the query response in packets. Return data as it becomes available. For results that count(*), return 1 upon finding the first, then update that cell in the result by sending updates the client result-set. This would be especially handy for interactively exploring data. Progress indication is implied -- the client could include a 'rows/range not yet examined' display that decrements with the same type of updates.

SQLite: Benchmark with proper sysv locks

This sqlite benchmark uses sleep(0.1) between lock attempts. A real (eg: sysv) lock with a sleep queue and wakeup() calls can do better. Try this benchmark with real locks and see if it does even better.

STL: radix sort

Add radix sort to the C++ STL. It's O(kn).


Make a SWF to SVG+SMIL compiler.

Bonus: Integrate with browser for seamless flash playback without a flash plugin.

Double bonus: Write it in javascript so that the browser need not even be aware of the transition.

Update: Adobe did this

Scheduling feedback tool

Feedback is an important part of developing intuition. A time visualizer can provide feedback for learning how to schedule (eg: software) projects.

GUI mockup of a project with a perpetually-retreating final milestone:

A series of timelines arranged vertically.  The top timeline shows three milestones as blue triangles.  The passage of time is depicted by a yellow background that grows at a steady rate as one scans down the figure.  As milestone triangles pass into the yellow reigon (into the past) they turn green.  Future milestones move about as estimates change while past milestones remain fixed.  The project is slightly late on the first milestone, slightly early on the second milestone, and significantly late on the third milestone.

Scheme compiler

Write a scheme complier. Everyone should do this. I haven't yet.

screen: Sort window list alphabetically

GNU screen's windowlist command has -m to sort the view by last access time. It should have a flag to sort them alphabetically by title as well.

Screensaver: Birds

A bird's-eye view of a flock of birds flying / fighting.

Screensaver: Mondrian

Make a screensaver that paints like Piet Mondrian. Example.

Screensaver: Graphs

Make a screensaver that gradually draws planar graphs (nodes and edges).

Algorithm sketch:

// Part 1: Construct the graph (invisibly):
Working on a buffer that overhangs the screen by ~10% on all edges:
Pick a random point in the buffer.  Put the first node there.
  Center = center of mass of nodes so far
  Initial_Angle = a random number
  Number_of_Approaches = ~50
  Loop: For each i 1 to Number_of_Approaches:
    Angle = Initial_Angle + i * 360 degrees / Number_of_Approaches
    // Approach Center from Angle
    Loop: Binary search along the line projected out of Center at Angle:
      // Determine if there's room for a new node here
      Loop: For points around a circle ~3x node diameter, linewidth-1 apart:
        If this pixel is colored:
          Binary search test result: Too close.  Break.
      Binary search test result: Too far.
    Distances[Angle] = Binary search result
  If Distances is empty:
    // We are done.  The screen is full.
  The new node is at Min(Distances)
  Loop: For each other node N in a random order (or sorted by distance?):
    // Try drawing some edges.  Edge paths are ellipse arcs from the centers
    // of nodes.  The portion inside the node is not drawn, obviously.  The
    // ellipses' eccentricity is such that they inscribe a golden rectangle.
    Loop: For each Curve_Direction in [clockwise, couter-clockwise]:
      // Try drawing an edge between the new node and N
      // First, test some pixels along the path
      Loop: For points spaced linewidth-1 apart along the path:
        If pixel is colored, Try next Curve_Direction
      // Looks like this one might work
      Copy the image buffer
      Try drawing the edge.  If any pixel filled in was already colored:
        Abandon the scribbled-on buffer; go back to the copy from before.
        Try next Curve_Direction
// We now have a random planar graph that fills the screen.

// Part 2: Reveal it slowly.
// Goal: After the first pixel, no pixel is painted that is not
// adjacent to another already-painted pixel.

Revealed = []
Currently_Revealing = []
Eligible_for_Revealing = [(A random node, a random angle)]
For each frame:
  Frames_Between_New_Reveals = ~200
  If Currently_Revealing is empty:
    Frames_Between_New_Reveals = ~50
  Every Frames_Between_New_Reveals-th frame:
    Move a random item from Eligible_for_Revealing to Currently_Revealing
  For each Thing in Currently_Revealing:
    Paint another ~1 pixel of length
    If Thing is a node:
      For each Edge out of the node:
        If Edge departs though the pixel just painted:
          If Edge is not already in Revealed or Currently_Revealing:
            Add Edge to Eligible_for_Revealing if it's not there already
    If Thing is now completely revealed:
      Move it from Currently_Revealing to Revealed.
      If Thing is an Edge:
        Paint the arrowhead
        If Thing's other node is not already in Revealed or Currently_Revealing:
          Add Thing's other node to Eligible_for_Revealing, to begin its
            reveal at the angle at which this Edge intersects it.

Screensaver: coincidence

A physics simulation of a lot of objects bouncing around apparently randomly, but with each interaction adjusted imperceptibly (less than 5%) to bring objects into alignment for future unlikely interactions. For example, a bunch of spheres and tori, where the spheres pass through the tori frequently. Or a bunch of basketballs that pass through stationary hoops frequently (reminiscent of Pleasantville).

Inspiration: this spinning taijitu.

Screensaver: conch clock

Turn Mike's Conch Shell Clock into an xscreensaver hack.

Screensaver: water caustics

Update: There's now a good software renderer for these! Video! All that's left is to translate it into an xscreensaver hack and add some recurring surface disturbances (simulate wind?) to keep the caustics moving.


Screensaver: Footprints amongst debris

Footprints walk across the screen. Also, separately, debris accumulates. The footprints must step around, walk around, and eventually jump over the debris.

Even though only the 2d footprints are visible, 3d body models and physics are probably necessary to make the step patterns look natural.

Screensaver: Throwing for Thrust

This could also be a game, but watching perfect computer play at difficulty settings that would be impossible for humans would make this a great screensaver.

The player-object (a cartoon space ship?) continuously throws things out the back for thrust. The playing field is screen-width but indefinitely tall. There is gravity. As things are thrown out the back for thrust, they pile up. Play moves upwards, to avoid being buried in all the stuff.

Tokens appear. The player must collect them by colliding with them. If the player covers the token with thrust-rubbish before collecting it, the game is lost. To be sporting, tokens don't appear too close to the pile of rubbish. Each token appears at a random position when the previous token is collected.

Objects are emitted at a roughly constant rate by mass. I.e., the magnitude of the thrust is not under player control, only the direction of thrust. Also, there is a limit to the speed at which the ship can turn. The difficulty is primarily determined by the mass of the ship -- how fast is space-filling crap generated? The difficulty increases gradually: each token adds mass to the ship and correspondingly increases the thrust volume so that it can continue to stay aloft in the presence of gravity. The penalty for crashing into the pile of rubbish is getting buried to the point at which you can no longer emit thrust, at which point you explode. Also, you die if you crash into the wall too hard. To be sporting, tokens don't appear too close to the walls either.

Generating a library of objects to use for thrust is the fun part. A diversity of sizes and shapes is important for visual interest. Soccer balls, wrenches, chairs, the kitchen sink, etc.

Box2d can provide the physics. Thrust rubbish that has scrolled off the bottom of the screen a sufficient distance and is not likely to move much more on account of being covered with a bunch of other rubbish can be fixed in place (in Box2d terms, converted from a dynamicBody to a staticBody) to reduce load on the physics engine. Thrust rubbish a ways below that layer can be deleted.

Probably inspired in part by Saishū Heiki Kanojo, Osmos, Flyboard videos on youtube, and thinking about thrust after reading The Martian.

Screensaver: Progress Clocks

Display our progress as a species on appropriate scales. For example:

% of mass converted to high-density information processing equipment (brains + CPUs):
Earth               0.0000000000000016306%
Solar Planets       0.0000000000000000036498361%
Solar System        0.0000000000000000000048909285%
Milky Way           0.0000000000000000000000000000000084439505%
Local Group         0.0000000000000000000000000000000037965049%
Virgo Supercluster  0.0000000000000000000000000000000000048974913%
Laniakea            0.000000000000000000000000000000000000048974913%
Observable Universe 0.000000000000000000000000000000000000000000097416%

These would update continuously, showing enough figures to have them change noticeably frame-to-frame (far more figures than we have accurate measurements for, unfortunately). The data would be generated by trend-fitted curves. The curve parameters would be updated periodically in new xscreensaver releases.

Screensaver: Paperclipper

Draw models of various paperclip manufacturing machines in motion, producing paperclips.

There are many designs. Model several of them parametrically and animate them, choosing random parameters to generate differently-shaped paperclips. Some inspiration:

(The joke)

Simplify x86

Start with an x86 emulator like valgrind's or bochs'. Remove all the instructions that aren't commonly used (eg: PHMINPOSUW). See what still works.

Port Linux to the simplified x86 instruction set.

To get really ambitious:

Once there is software to run on this hardware (including dynamic translators for old software), have the hardware built.


Extract collection information from a Sirsi interface and dump it into a Koha database.

Slashcode user merge

Add a user-merge feature to slash-code.

I have two user accounts. (I lost the password to the first one). I want them merged.

Interface mock-up:

old account: chkno new account: chkn0
use old combine use new
ID # x
Acct name x
Password x
Email addy x
Description x
Comments x
Friends x
Tags x
... x

This would need to leave "This account has been merged with" forwarding pages for the abandoned ID# and user name to make archived posts' links work.

Smalloc: secure memory

A separate malloc() call for secure memory -- memory used to store encryption keys, passwords, or other secure data. Such memory, in addition to being flagged not-swappable, is tracked by the kernel to be explicitly zeroed on deallocation or shutdown/reboot (including panic).

Cold-boot forensic tools are now available that fish through memory to recover data from RAM that was unpowered for only a few seconds. Explicitly zeroing sensitive data in RAM on shutdown is a partial defense against this attack. Especially fancy hardware would detect loss of power and alert the OS even when there are only milliseconds of power remaining as capacitors in the power supply drain. Modern machines have impressive memory bandwidth and could do some useful cleanup in these precious few milliseconds.

PaX Linux implementation.

Solitaire goal reachability

Hack a solitaire client to exhaustively search the game state space before play. Record whether each state can reach a goal state. Then, during play, interrupt the user if they transition from a winnable state to a not-winnable state.

This trains the user's intuition of exactly when they cross the line from risky play to losing play.

Spam filters that can change their minds

Spam filters make correctable mistakes. If a spam filter passes mail A, but then later receives mail B which is similar to mail A but is more clearly spam, and the mailbox owner has not yet checked their mail, the spam filter should be able to change its mind -- go back and flag message A as spam.

However, this exacerbates the mail poisoning problem where a malicious user can hide mails from a target user by sending copies of that mail with spammy keywords added, causing their filter to associate specific legitimate content with spammy subjects.

Sort: semi-random

/usr/bin/sort either sorts or randomizes ( -R ). It should be able to randomize 'a little', not completely. For example, add the line number to the hash. Things initially toward the front of the stream tend to stay toward the front, but each individual line is jostled somewhat.

How to specify the magnitude of the shuffle without knowing the input size? Maybe average displacement.

A possible lead: soft sorting with soft heaps. Question: would this approach add disorder to an already-sorted list? Could it generate all possible disorderings or only a subset of them?

Strongly-typed C compiler

C is a weakly-typed language. It could be compiled into a strongly-typed executable if the compiler is willing to put in a lot of extra work:

Update: Fail-Safe C does this sort of thing.

Update: gcc and LLVM now both have an Undefined Behaviour Sanitizer.

Sundaram's sieve: order

Write a program to walk Sundaram's sieve in ascending numerical order, thus generating primes. Apparently it takes θ(n lg n) time and θ(n) space? Find out why by implementing it.

TaggedRevs: prefer no page to untagged pages

TaggedRevs can almost create a moderated wiki. Except:

Unreviewed content should not show anywhere because googlebot will find it.

Tetex: 'access' should be in coreutils

tetex provides /usr/bin/access, a command line interface to access(2). This shouldn't be in tetex. It should be in coreutils.

Timestamped terminal

Terminals (xterm, urxvt, etc.) should have an option to display a timestamp to the left of each line indicating when it was last changed. For normal, scrolling output, this would be a bit like piping the output through awk '{ print systime(), $0; fflush(); }', but it would be outside the terminal area so selecting the text would not pick up the timestamps, lines would not wrap into the timestamp collumn, etc. Doing it in the terminal means that it could also do something reasonable for ncurses programs, which |awk cannot.

Tune2fs should be able to create and destroy inodes

Currently, the inode count of ext2, ext3, and ext4 filesystems is fixed at mke2fs time. It's easy to get this wrong (through inattention or imperfect foresight) and painful to correct (move data, mke2fs again, move data back). The ext4 default of allocating 1.6% of the disk to inodes (up from 0.8% in ext3) doesn't help. On a 2T disk, that's 32GB of inodes. Rather than continuing to fight over what reasonable defaults are, let's just make it easy to change.

The trick is to do this without making inode → address mapping arbitrarily complicated. Inodes are distributed throughout the disk. It's not good enough to just repeat the initial distribution process. Consider a disk with 10 groups that started with 100 inodes, then added 100 inodes ten times. Group 0 would have inodes 0-9, 100-109, 200-209, ... 900-909. The amount of on-disk configuration information (and possibly inode lookup time) would scale linearly with the number of inode adjustments ever made to the filesystem--not acceptable.

The alternative is changing the mapping of existing inodes. Group 0 initially had inodes 0-9 and should end up with inodes 0-99. But moving inode data to a new location defeats the point of distributing it across the disk--so that it's close to the file data. So if we're going to move the home location of inode numbers, we're going to have to change existing inodes' numbers, which means rewriting all the directories to update all inode references. This is an extended operation that needs to be done in a safe, resumable way (eg: journaled) for protection against loss of power, etc.

There is a way to avoid the need for this: Use inodes' offsets as their inode number. This way, inodes own their number just by virtue of being at that place on the disk. If there's room on the disk for more inodes, there's room in the inode number space for them there also. The problem with this is that ext2/3/4 inode numbers are 32-bit integers, limiting this arrangement to 4GB filesystems. Dividing through by the minimum inode size of 128 bytes would allow still-inadequate 512GB filesystems. 64-bit inode numbers would make this approach feasible.

Visualize light propagation

This room has a dark spot. How big is it? What shape is it? What path does the light that almost gets there take? How far can you move the light before the dark spot goes away? It would be straightforward to create a program that visualizes this -- shine a beam out from a wall segment, find the opposite wall sections and vertices it illuminates, repeat.

Vim syntax: awk, perl, etc in bash

Vim's syntax highlighting for bash should, inside awk '...', switch to awk syntax highlighting instead of it just being one giant bash literal red string.

Wget: accept encodings

curl --compress sends Accept-Encoding: deflate, gzip. wget cannot accept any non-identity encodings. It should.

Wget: modernize --page-requisites --convert-links

HTML pages have acquired all kinds of new and interesting ways to reference other resources since wget was created. Wget's --page-requisites and --convert-links options seem to have fallen behind a bit -- they don't pull in and/or rewrite everything needed to render pages offline anymore.

How tangled does this get, and how much should wget attempt to tackle? If a <script> uses DOM functions to insert another <script> tag, there's no way wget can know about this unless it actually executes all the javascript. Is this feasible? Are the gecko / webkit / etc. interfaces flexible enough to build a DOM and execute javascript without rendering the page to a display? Is it less effort to add wget's functionality to a browser rather than adding all this browser functionality to wget?

Wget saves the URL to the file, but firefox upon seeing that link will look for a in the filesystem, discarding the ? and everything after it. Which behaviour is correct? Should firefox fall back and look for the second if the first isn't available?

  • Wget: Verbs
  • curl allows you to specify the HTTP verb (GET/POST/PUT/DELETE/PROPFIND/COPY/MOVE/etc.) via -X. wget should allow this also.

    Whirlwind WIL: Lisp

    Make an s-expression syntax for Whirlwind WIL. Then, implement lisp on top of it.

    White Elephant game: analyze

    Do a full statistical analysis of the White elephant gift exchange game. Under the assumption of a globally consistent value ordering, what are the odds of ending up with the best gift for each player (by play order)? With two value orderings (eg: masculine gifts vs. feminine gifts)? When the categories of the wrapped gifts are marked and unmarked? What is the ideal play strategy?

    Maybe write a Monte-Carlo simulator.

    Whole system PIE

    Set up a hardened system with PaX and PIE. Get everything working. Apparently X, mplayer, and xine are known to be broken. I noticed that ffmpeg doesn't compile with -fPIC. Probably a lot more brokenness.

    PIC tips & tricks.


    A giant colorful distortion follows the mouse cursor.

    With all these LCD screens everywhere, we need software to bring back the fun of playing with magnets on old CRT displays.

    Challenge: It has to be *fast* to be realistic.

    Bonus: Multitouch for holding the magnet closer or further and for twisting.

    For fun: lasting, lower-intensity color distortions and a degauss feature that cleans them up.

    Yellow dot print filter

    Make a printer filter that adds extra yellow dots before sending a document to a color printer. If these are indistinguishable from the dots added by the printer hardware for tracking purposes, the final on-the-page pattern would be ambiguous. This would of course need to be tailored to each printer model. Several printers of each model would need to be examined to see what kind of dot patterns they produce.