wiki:Minutes-08-10-29

Oct 29, 2008

Application characterization discussion

Options for characterizing memory usage:

  • The initial approach is to mark pages that change between checkpoints.
  • Look at an rsync-like (checksum-based) model for detecting changes.
    • We should talk to Hargrove about his future BLCR work on this subject.
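As an illustration of the rsync-like option only, here is a minimal sketch that detects changed pages by comparing per-page checksums of two memory snapshots. It assumes snapshots are available as plain byte strings and a 4 KiB page size; the names `PAGE_SIZE`, `page_digests`, and `changed_pages` are hypothetical and do not come from Kitten or BLCR. Unlike marking dirty pages, this needs no kernel or hardware tracking, but it must hash every page at every checkpoint.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical; typical x86 base page)


def page_digests(image: bytes, page_size: int = PAGE_SIZE) -> list:
    """Return one SHA-256 digest per page of a memory image (last page may be short)."""
    return [
        hashlib.sha256(image[off:off + page_size]).digest()
        for off in range(0, len(image), page_size)
    ]


def changed_pages(prev: list, curr: list) -> list:
    """Indices of pages whose digest differs from the previous checkpoint,
    plus any pages added since then. Pages removed since prev are not reported."""
    changed = [i for i, (a, b) in enumerate(zip(prev, curr)) if a != b]
    changed.extend(range(len(prev), len(curr)))
    return changed


if __name__ == "__main__":
    old = bytes(3 * PAGE_SIZE)          # checkpoint N: three zeroed pages
    new = bytearray(old)
    new[PAGE_SIZE + 10] = 0xFF          # dirty one byte in page 1
    new += b"x" * 100                   # grow the image into a partial page 3
    print(changed_pages(page_digests(old), page_digests(bytes(new))))  # [1, 3]
```

Only the pages reported as changed would need to be saved in an incremental checkpoint; the cost of hashing every page is the trade-off to weigh against page marking.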

Issues:

  • What about multiple cores sharing pages?
    • If the same app runs on all cores, is there an opportunity to condense state?
    • Probably want to focus on single-core, single address space issues first.
  • What apps are we interested in, and can apps run on Kitten yet?
    • Haven't discussed particular apps.
    • Apps like CTH should be able to run on Kitten within a couple months.

Quiescence discussion

Brightwell mentioned some recent changes to Portals to support quiescence for BLCR on CNL. He took an action item to find out whether we can use these changes.

CIFTS

Should we be involved in CIFTS? What can we leverage?

  • It might be worthwhile to develop to their APIs, even if we don't use their implementation.

Their design may have inherent flaws:

  • It assumes TCP.
  • To them, network state means MPI state. Our network state is a layer below MPI.

Reasons to be involved:

  • Influence the design for improved scalability, although this approach didn't work well in SciDAC SSS.
  • Improve visibility to counter criticism from external reviewers that we work in isolation.
  • Participate in a standardization effort. If CIFTS gets broad adoption, we don't want to be left behind.

Actions before next meeting

  • Pedretti: Enumerate issues with memory characterization work in preparation for a detailed design and implementation plan.
  • Brightwell: Look into recent changes to Portals for BLCR on CNL. What can we leverage?
  • Oldfield: Look at BLCR to identify system state that needs to be managed.
  • Brightwell: Email Geist and Beckman about the best course of action for involvement in CIFTS. Copy the 9lives mailing list.