wiki:InputOutput

MTGL provides functions to facilitate reading and writing data from and to disk. All of the functions described in this tutorial can be used after including the header mtgl/mtgl_io.hpp.

Non-XMT Definitions for Snapshot Functions

The first thing included by the header file is a set of definitions for the XMT snapshotting functions for use on non-XMT UNIX systems. These definitions are based on standard C input and output calls and are included whenever MTGL is compiled on a non-XMT UNIX machine. This makes all of the input and output using MTGL work seamlessly on both the XMT and other UNIX machines.
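To make the idea concrete, here is a minimal sketch of what such a fallback might look like. This is not MTGL's actual code; the function name snap_restore_sketch and its signature are assumptions for illustration, showing how a snapshot "restore" reduces to a plain buffered read on a non-XMT system.

```cpp
#include <cstdio>
#include <cstdint>

// Hypothetical sketch (not MTGL's implementation): on a non-XMT system,
// restoring a snapshot file is just a buffered read of the whole file
// into a caller-supplied buffer.
int snap_restore_sketch(const char* filename, void* buffer,
                        int64_t buffer_size, int64_t* /* error */)
{
  FILE* fp = fopen(filename, "rb");
  if (!fp) return -1;

  // Read exactly buffer_size bytes into the buffer.
  size_t nread = fread(buffer, 1, (size_t) buffer_size, fp);
  fclose(fp);

  return nread == (size_t) buffer_size ? 0 : -1;
}
```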

Reading and Writing Binary Arrays

MTGL includes functions for reading and writing arrays in binary format. Here are the functions.

template <typename array_t>
inline
array_t* read_array(char* filename, uint64_t& array_size)

template <typename array_t>
inline
void write_array(char* filename, array_t* array, int array_size)

The read function allocates memory for the array and reads the array from filename into the allocated memory. The size of the array is returned in array_size, and the newly allocated memory containing the array is returned by the function. The user MUST be sure to deallocate this memory.

The write function writes the data in array, which is of size array_size, to filename.
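The behavior described above can be sketched in standard C++ as follows. These are stand-in implementations for illustration only (note the _sketch suffixes), not MTGL's source; they show the allocate-on-read / caller-deallocates contract.

```cpp
#include <cstdio>
#include <cstdint>

// Sketch of the read function's contract: allocate memory, fill it from
// the file, and return both the array and its size. The caller must
// delete [] the returned pointer.
template <typename array_t>
array_t* read_array_sketch(const char* filename, uint64_t& array_size)
{
  FILE* fp = fopen(filename, "rb");
  if (!fp) { array_size = 0; return 0; }

  // Determine the file size to compute the number of records.
  fseek(fp, 0, SEEK_END);
  long bytes = ftell(fp);
  rewind(fp);

  array_size = (uint64_t) bytes / sizeof(array_t);
  array_t* array = new array_t[array_size];

  size_t nread = fread(array, sizeof(array_t), array_size, fp);
  (void) nread;
  fclose(fp);

  return array;
}

// Sketch of the write function: dump the raw records to the file.
template <typename array_t>
void write_array_sketch(const char* filename, array_t* array,
                        uint64_t array_size)
{
  FILE* fp = fopen(filename, "wb");
  if (!fp) return;

  fwrite(array, sizeof(array_t), array_size, fp);
  fclose(fp);
}
```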

Reading and Writing Matrix Market Graphs

MTGL can read a subset of the Matrix Market graph format ( http://math.nist.gov/MatrixMarket/formats.html). MTGL reads only the 'matrix coordinate' type. It supports 'real', 'integer', and 'pattern' entry types and 'general', 'symmetric', and 'skew-symmetric' symmetry types. MTGL currently supports only writing Matrix Market format files using the 'matrix coordinate pattern general' format, and the writing is done in serial. At some point in the future, we plan to parallelize the code to write Matrix Market format graphs and to support more types.

Here is the function for reading Matrix Market files.

template <typename Graph, typename WT>
void read_matrix_market(Graph& g, char* filename, dynamic_array<WT>& weights)

After the function returns, the graph g will be populated with the graph given in filename. If the graph format supports matrix entries, they will be populated into weights. If the graph format doesn't support matrix entries, weights will remain empty.
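For reference, here is a small example file in the 'matrix coordinate pattern general' format: a three-vertex, three-edge triangle graph. The header line names the format, the size line gives the matrix dimensions and the number of entries, and each remaining line is one edge (the 'pattern' entry type carries no matrix values, so weights would stay empty).

```
%%MatrixMarket matrix coordinate pattern general
3 3 3
1 2
2 3
3 1
```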

Reading and Writing DIMACS Graphs

MTGL supports reading a version of the DIMACS graph format ( http://dimacs.rutgers.edu/Challenges/). As the DIMACS format has changed over the years to meet the needs of the various challenges, we decided to support the graph file format for the 9th challenge for shortest paths ( http://www.dis.uniroma1.it/~challenge9/format.shtml).

Comment lines give human-readable information and are ignored. They start with the letter 'c'. Here's an example.

c This is a comment line describing the graph.

The file will have a single problem line that must come before the lines describing the graph edges. The problem line has the following format, where the problem_type is ignored.

p <problem_type> <num_vertices> <num_edges>

The lines describing the edges have the following format.

a <from_vertex> <to_vertex> <weight>
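Putting the three line types together, a complete DIMACS file for a three-vertex triangle graph with edge weights 4, 5, and 6 might look like this ('sp' is the problem type used by the 9th challenge, though MTGL ignores this field):

```
c A triangle graph with three weighted arcs.
p sp 3 3
a 1 2 4
a 2 3 5
a 3 1 6
```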

Currently, MTGL only supports writing the DIMACS format in serial. We hope to parallelize the code sometime in the near future.

Here is the function for reading DIMACS format files.

template <typename Graph, typename WT>
void read_dimacs(Graph& g, char* filename, dynamic_array<WT>& weights)

After the function returns, the graph g will be populated with the graph given in filename, and the weights will be stored in the weights array.

Reading and Writing Binary Graphs

MTGL has a simple binary format that it uses to read and write graphs from and to disk. The format consists of two binary files: the sources file and the destinations file. We typically use the extension .srcs for the sources file and the extension .dests for the destinations file. The sources file is an array of ids for the source vertices of the edges. The destinations file is an array of ids for the target vertices of the edges.

Here are the functions.

template <typename Graph>
bool read_binary(Graph& g, char* src_filename, char* dest_filename)

template <typename Graph>
bool write_binary(Graph& g, char* src_filename, char* dest_filename)

Both functions take the graph, the file name for the sources file, and the file name for the destinations file as parameters. The functions use the graph's size_type as the record type.

The file tutorial/input_output.cpp gives an example of reading a graph from standard input, writing the graph to disk, and reading it from disk into a new graph.

Here's code that initializes a graph and writes it to a file specified as a program argument.

  typedef compressed_sparse_row_graph<directedS> Graph;
  typedef graph_traits<Graph>::size_type size_type;

  // Usage error message.
  if (argc < 2)
  {
    std::cerr << "Usage: " << argv[0] << " <fileroot>" << std::endl << std::endl
              << "    fileroot: filename root for output file" << std::endl;
    exit(1);
  }

  size_type n;
  size_type m;

  // Read in the number of vertices and edges.
  std::cin >> n;
  std::cin >> m;

  size_type* srcs = new size_type[m];
  size_type* dests = new size_type[m];

  // Read in the ids of each edge's vertices.
  for (size_type i = 0; i < m; ++i)
  {
    std::cin >> srcs[i] >> dests[i];
  }

  // Initialize the graph.
  Graph g;
  init(n, m, srcs, dests, g);

  delete [] srcs;
  delete [] dests;

  // Get the output filenames from the file root supplied as a program
  // argument.
  char srcs_fname[256];
  char dests_fname[256];

  strcpy(srcs_fname, argv[1]);
  strcpy(dests_fname, argv[1]);
  strcat(srcs_fname, ".srcs");
  strcat(dests_fname, ".dests");

  // Write the graph to disk.
  write_binary(g, srcs_fname, dests_fname);

Here's code that initializes a new graph with the graph we just wrote to disk.

  // Restore the graph from disk to a different graph.
  Graph dg;
  read_binary(dg, srcs_fname, dests_fname);

Remember that reading and writing on the XMT only occurs via the parallel file system, so you must be sure to give a file root that is on the parallel file system when on the XMT. To use the parallel file system, a user must do some setup. For instructions, check here.

There's another definition for both read_binary() and write_binary() that allows the user to specify the record type. One circumstance where this is useful is when you want to use the same input/output file on multiple systems that define the type used for size_type differently. (For example, size_type is usually defined as unsigned int. On many machines an unsigned int is 4 bytes, but an unsigned int is 8 bytes on the XMT.)
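The conversion these overloads perform can be sketched as a simple widening loop. This is an illustration of the idea, not MTGL's code: records stored on disk in a fixed-width portable type (here uint32_t) are cast one at a time to the machine's in-memory type (here modeled as uint64_t, as on the XMT).

```cpp
#include <cstdint>
#include <cstddef>

// Sketch: convert fixed-width on-disk records to the wider in-memory
// size_type so the same binary file works across systems where
// unsigned int has different sizes.
void widen_records(const uint32_t* on_disk, uint64_t* in_memory, size_t n)
{
  for (size_t i = 0; i < n; ++i)
  {
    in_memory[i] = static_cast<uint64_t>(on_disk[i]);
  }
}
```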

The functions are:

template <typename Graph, typename int_type>
bool read_binary(Graph& g, char* src_filename, char* dest_filename,
                 int_type val)

template <typename Graph, typename int_type>
bool write_binary(Graph& g, char* src_filename, char* dest_filename,
                  int_type val)

The user would typically call these functions by passing a default-constructed value of the desired type as the last parameter. We strongly recommend that the user only use the standard portable types (int16_t, uint32_t, etc.); otherwise, you can still have issues with type differences.

Here's an example of writing the triangle graph using a record type of int64_t.

  write_binary(g, sfname, dfname, int64_t());

And here is the corresponding example of reading in that same graph.

  read_binary(g, sfname, dfname, int64_t());

The functions cast between the record type provided by the user and size_type, so the standard caveats that come with type casting apply. You also have to worry about changes in endianness when moving files between machines.
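MTGL does not perform byte swapping for you; if a file written on a little-endian machine is read on a big-endian one (or vice versa), each record must be swapped by hand. A minimal sketch of a 32-bit byte swap:

```cpp
#include <cstdint>

// Sketch: reverse the byte order of a 32-bit record when moving a
// binary file between machines of different endianness.
uint32_t swap32(uint32_t v)
{
  return (v >> 24) | ((v >> 8) & 0x0000FF00u)
       | ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

Applying the swap twice returns the original value, which makes it easy to sanity-check in either direction.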