15 File Operation

15.1 Overview

Gura provides two mechanisms for working with files, Stream and Directory: a Stream reads and writes the content of a file, and a Directory retrieves lists of files stored in a container. Here, the term "file" is not limited to what is stored in the file system of an OS. You can also use Stream and Directory to access files over networks and even files stored in archive files. Gura provides a generic framework to handle these resources, so you can expand its capabilities by importing Modules.

Each Stream and Directory is associated with a uniquely identifying string called a pathname. A framework called Path Manager is responsible for recognizing the pathname of a Stream or Directory and dispatching file operations to the appropriate handlers registered by built-in and imported Modules.

15.2 Pathname

15.2.1 Acceptable Format of Pathname

A pathname is a string that identifies a Stream or Directory and is handled by Path Manager.

By default, the built-in module fs is registered with Path Manager. It tries to recognize a pathname as one that refers to an item stored in a file system. Below are some examples of such pathnames:

/home/foo/work/example.txt
C:\Users\foo\source\main.cpp

You can use either a slash or a backslash as a directory separator for a file in a file system; the fs module converts it to what the current OS accepts. The variable path.sep_file holds the separator character preferred by the OS.

15.2.2 Utility Functions to Parse Pathname

Function path.dirname() extracts a directory part by eliminating a file part from a pathname.

rtn = path.dirname('/home/foo/work/example.txt')
// rtn is '/home/foo/work/'

If the pathname ends with a directory separator, the function determines it doesn't contain a file part and returns the whole string.

rtn = path.dirname('/home/foo/work/')
// rtn is '/home/foo/work/'

Function path.filename() extracts a file part from a pathname.

rtn = path.filename('/home/foo/work/example.txt')
// rtn is 'example.txt'

When given a pathname that ends with a directory separator, the function determines that it doesn't contain a file part and returns an empty string.

rtn = path.filename('/home/foo/work/')
// rtn is ''

Function path.split() splits a pathname by a directory separator and returns a list containing its directory part and file part. This works the same as a combination of path.dirname() and path.filename().

rtn = path.split('/home/foo/work/example.txt')
// rtn is ['/home/foo/work/', 'example.txt']

Function path.cutbottom() eliminates the last element in the directory hierarchy. This works the same as path.dirname() when the pathname ends with a file part.

rtn = path.cutbottom('/home/foo/work/example.txt')
// rtn is '/home/foo/work/'

Note that the result differs when the pathname ends with a directory separator.

rtn = path.cutbottom('/home/foo/work/')
// rtn is '/home/foo/'

Function path.bottom() splits a pathname and returns the last element. This works the same as path.filename() when the pathname ends with a file part.

rtn = path.bottom('/home/foo/work/example.txt')
// rtn is 'example.txt'

Note that the result differs when the pathname ends with a directory separator.

rtn = path.bottom('/home/foo/work/')
// rtn is 'work'

Function path.splitext() splits a pathname at its last period and returns a list containing the preceding part and the suffix part.

rtn = path.splitext('/home/foo/work/example.txt')
// rtn is ['/home/foo/work/example', 'txt']

Function path.absname() takes a relative pathname in a file system and returns an absolute pathname based on the current directory.
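
As a minimal sketch, the call below converts a relative pathname. The pathname work/example.txt is a hypothetical example, and the result is not shown because it depends on the current directory at the time of the call.

```gura
rtn = path.absname('work/example.txt')
// rtn is an absolute pathname built from the current directory
println(rtn)
```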

15.3 Stream

15.3.1 Stream Instance

A Stream is a data object that reads and writes the content of a file. It is represented by a stream instance created by the constructor function stream(). The declaration of the constructor function is shown below:

stream(pathname:string, mode?:string, codec?:codec):map {block?}

On many platforms and in many languages, people are accustomed to using the term open as the name of a function that opens a file or a stream. So the function open() is provided as a complete synonym for stream(), with the same declaration.

open(pathname:string, mode?:string, codec?:codec):map {block?}

In many cases, this document uses function open() instead of stream() to create a stream instance.

Function open() takes a pathname string as its argument and returns a stream instance.

fd = open('foo.txt')
// fd is a stream to read data from "foo.txt"

When it is called with its second argument mode set to 'w', the function creates a new file and returns a stream instance for writing data into it.

fd = open('foo.txt', 'w')
// fd is a stream to write data into "foo.txt"

A stream instance will be closed when method stream#close() is called on it.

fd.close()

When a stream for writing is closed, all the data remaining in its buffer is flushed out before the instance is invalidated.

Method stream#close() is also called automatically when the instance is destroyed after its reference count decreases to zero. Because it is not always obvious when the instance is destroyed, it may be better to call stream#close() explicitly when you want to control the timing of the close.

Another way to create and utilize a stream instance is to call open() function with a block procedure that will take a stream instance through its block parameter.

open('foo.txt') {|fd|
    // any jobs here
}

With this form, you can access the created instance only within the block, and the instance is automatically destroyed at the end of the procedure.

15.3.2 Cast from String to Stream Instance

If a certain function has an argument that expects a stream instance, you can pass it a string of a pathname, which will automatically be converted to a stream instance by a casting mechanism. The stream instance would be created as one for reading.

f(fd:stream) = {
    // fd is a stream instance for reading
    // any jobs here
}
f('foo.txt')   // same as f(open('foo.txt'))

If the argument is declared with :w attribute, the stream instance would be created for writing.

f(fd:stream:w) = {
    // fd is a stream instance for writing
    // any jobs here
}
f('foo.txt')   // same as f(open('foo.txt', 'w'))

Attribute :r is also available to explicitly declare that the stream is to be opened for reading.
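
A sketch of the :r form, following the same pattern as the :w example above. The function name f and the file foo.txt are placeholders, and the equivalence noted in the comment is inferred from the description of the default casting behavior.

```gura
f(fd:stream:r) = {
    // fd is a stream instance for reading
    // any jobs here
}
f('foo.txt')   // expected to behave like f(open('foo.txt'))
```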

15.3.3 Stream Instance to Access Memory

Besides strings, instances of classes that access data stored in memory can also be cast to stream. These classes are binary, memory and pointer. Using this mechanism, you can read and write memory content through stream methods.

Below is an example to cast binary to stream.

f(fd:stream) = {
    // read/write access to content of buff through fd
}
buff = binary()
f(buff)

15.3.4 Stream Instance for Standard Input/Output

There are three stream instances for the access to standard input and output, which are assigned to variables in sys module.

  • sys.stdin … Standard input that retrieves data from the keyboard.
  • sys.stdout … Standard output that outputs text to the console screen.
  • sys.stderr … Standard error output that outputs text to the console screen without interference from pipe redirection.

Functions print(), printf() and println() output text to the stream sys.stdout. This means that the following two lines produce the same result.

println('Hello world')
sys.stdout.println('Hello world')

You can also assign a stream instance you create to these variables. Assignment to sys.stdout would affect the behavior of functions such as println().

sys.stdout = open('foo.txt', 'w')
println('Hello world')   // result will be written into 'foo.txt'.
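
If the redirection is only temporary, one possible approach is to save the original stream and put it back afterwards. The sketch below mirrors the save-and-restore pattern this chapter uses for os.stdout; the variable name saved is an arbitrary choice.

```gura
saved = sys.stdout                  // keep the original console stream
sys.stdout = open('foo.txt', 'w')
println('Hello world')              // written into 'foo.txt'
sys.stdout.close()                  // flush and close the file
sys.stdout = saved                  // println() goes to the console again
println('done')
```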

15.3.5 Stream with Text Data

There are fundamental functions that print texts out to standard output stream.

  • Function print() takes multiple values that are to be printed out to sys.stdout in a proper format.
  • Function println() works the same as print() but also puts a line feed at the end.
  • Function printf() works similarly to the C language's printf() function and prints values to sys.stdout based on format specifiers. See chapter String Operation for more details about the formatter.

Below is sample code that uses each of the above functions to produce the same result.

n = 3, name = 'Tanaka'
print('No.', n, ': ', name, '\n')
println('No.', n, ': ', name)
printf('No.%d: %s\n', n, name)

Class stream is equipped with methods stream#print(), stream#println() and stream#printf() that correspond to functions print(), println() and printf() respectively, but output the result to the target stream instead of sys.stdout. The code below outputs strings to a file foo.txt.

n = 3, name = 'Tanaka'
open('foo.txt', 'w') {|fd|
    fd.print('No.', n, ': ', name, '\n')
    fd.println('No.', n, ': ', name)
    fd.printf('No.%d: %s\n', n, name)
}

Method stream#readline() returns a string containing one line of text from the stream. It returns nil when it reaches the end of the stream, so you can write a program that prints the content of a file as below:

fd = open('foo.txt')
while (line = fd.readline()) {
    print(line)
}

Because you often need to read multiple lines from a stream, method stream#readlines() may be more useful. It creates an iterator that returns each line's string as its element. A program that prints the content of a file looks like this:

fd = open('foo.txt')
lines = fd.readlines()
print(lines)

If you use function readlines(), which takes a stream instance as its argument, you don't need to open a stream explicitly, thanks to the casting mechanism from string to stream. This is the simplest way to read text files.

lines = readlines('foo.txt')
print(lines)

If you want to eliminate the line feed character at the end of each line, specify the :chop attribute.

lines = readlines('foo.txt'):chop
println(lines)

An iterator created by method stream#readlines() or function readlines() owns a reference to the stream instance because it is designed to read data from the stream during iteration. This means that the stream instance won't be released while such an iterator is running.

Consider the following code that is expected to read text from foo.txt and write text back to the same file after converting alphabet characters to upper case.

lines = readlines('foo.txt')
open('foo.txt', 'w').print(lines:*upper())

Unfortunately, this program doesn't work correctly. The iterator lines owns a stream that reads content from the file foo.txt, which conflicts with the attempt to open foo.txt for writing. To avoid this, call the readlines() function with the :list attribute, which reads all the lines from the stream and stores them in a list instance. The function releases the stream and then returns the list instance as its result.

lines = readlines('foo.txt'):list
open('foo.txt', 'w').print(lines:*upper())

Method stream#readtext() returns a string containing the whole content of the stream.

txt = fd.readtext()

Two character sequences are accepted as the end of a line in a file: LF (0x0a) and CR (0x0d)-LF (0x0a). UNIX-derived systems such as Linux use LF at the end of a line, while Windows uses the CR-LF sequence. By default, the following policies are applied so that a string read from a file contains only LF codes.

  • When reading, all the CR codes are removed.
  • When writing, there's no modification about the sequence of end of line. This results in a file containing only LF code.

To change this behavior, use methods stream#delcr() and stream#addcr(). If you want to keep CR codes in the text that is read, call the stream#delcr() method with an argument set to false.

fd.delcr(false)

If you want to append CR code at each end of line in a file to write, call stream#addcr() method with an argument set to true.

fd.addcr(true)
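
As an illustration, the sketch below rewrites a text file with CR-LF line endings by combining the default reading policy with stream#addcr(). The file names unix.txt and dos.txt are hypothetical.

```gura
src = open('unix.txt')        // CR codes, if any, are removed while reading
dst = open('dos.txt', 'w')
dst.addcr(true)               // append a CR code at each end of line when writing
dst.print(src.readtext())
dst.close()
src.close()
```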

15.3.6 Character Codecs

While a string instance holds string data in UTF-8 format, files may describe text in various character code sets. To handle them, a stream instance may contain an instance of the codec class, which is responsible for converting characters between UTF-8 and those codes. You can specify a codec instance for a stream by passing it as the third argument of the open() function.

fd = open('foo.txt', 'r', codec('cp932'))

Since there's a casting feature from string to codec instance, you can simply specify a codec name to the argument as well.

fd = open('foo.txt', 'r', 'cp932')

Below is a table that shows what codecs are available and what module provides them.

Module            Available Codec Names
codecs.basic      base64, us-ascii, utf-8, utf-16
codecs.chinese    big5, cp936, cp950, gb2312
codecs.iso8859    iso8859-1, .. iso8859-16
codecs.japanese   cp932, euc-jp, iso-2022-jp, jis, ms_kanji, shift_jis
codecs.korean     cp949, euc-kr
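
Putting these pieces together, a text file could be converted from one character code set to another roughly as follows. This is a sketch under assumptions: the file names are hypothetical, and it assumes that the module listed in the table has to be imported with import(codecs.japanese) before the codec name cp932 can be used.

```gura
import(codecs.japanese)                 // assumed to make 'cp932' available
src = open('sjis.txt', 'r', 'cp932')    // decoded to UTF-8 while reading
dst = open('utf8.txt', 'w', 'utf-8')    // encoded from UTF-8 while writing
dst.print(src.readtext())
dst.close()
src.close()
```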

Codecs only have effect on methods to read/write text data that are summarized below:

stream#print(), stream#println(), stream#printf()
stream#readline(), stream#readlines(), stream#readtext()

The standard input/output streams sys.stdin, sys.stdout and sys.stderr must be equipped with a codec for the character code set that the console device expects. A proper codec is detected from the value of the environment variable LANG or from the result of system API functions, but you may sometimes want to change it. In such a case, you can use stream#setcodec() as below:

sys.stdout.setcodec('utf-8')

15.3.7 Stream with Binary Data

In addition to methods to handle text data, class stream is equipped with methods to read/write binary data as well.

Method stream#read() reads specified size of data into a binary instance and returns it. When the stream reaches its end, the method returns nil.

open('foo.bin') {|fd|
    while (buff = fd.read(512)) {
        // some jobs with buff
    }
}

Method stream#write() writes content of a binary instance to the stream.

open('foo.bin', 'w') {|fd|
    fd.write(buff)
}

Method stream#seek() moves the current offset at which read/write operations are applied.

Method stream#tell() returns the current offset.

Methods stream.copy(), stream#copyto() and stream#copyfrom() are responsible for copying data from one stream to another. They produce the same result but take stream instances in different ways. The calls below show how they are used, where src is the source stream and dst the destination.

stream.copy(src, dst)
src.copyto(dst)
dst.copyfrom(src)

These methods can take a block procedure that receives a binary instance containing each data segment during the copy process. The size of a data segment can be specified by an argument named bytesunit.

stream.copy(src, dst) {|buff:binary|
    // any job during copying process
}

You can use the block procedure with a copying method to implement a progress indicator that shows how much of the copy has been done.
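
A very simple indicator could print one dot per copied segment, as sketched below. The file names are hypothetical, the segment size of 4096 is an arbitrary choice, and passing bytesunit as a named argument with => is an assumption about the calling syntax.

```gura
src = open('src.bin')
dst = open('dst.bin', 'w')
stream.copy(src, dst, bytesunit => 4096) {|buff:binary|
    print('.')        // one dot for each data segment copied
}
println()
dst.close()
src.close()
```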

Method stream#compare() compares contents between two streams and returns true if there's no difference and false otherwise.

15.3.8 Filter Stream

A Filter Stream is a stream that is attached to another stream instance and modifies data during reading or writing operations.

There are two types of Filter Stream: reader and writer.

A Filter Stream of reader type applies its operation to the methods for reading data, including stream#read(), stream#readline(), stream#readlines() and stream#readtext().

+--------+    +---------------+
| stream |--->| filter stream |---> (reading data)
|        |    |   (reader)    |
+--------+    +---------------+

A Filter Stream of writer type applies its operation to the methods for writing data, including stream#write(), stream#print(), stream#println() and stream#printf().

+--------+    +---------------+
| stream |<---| filter stream |<--- (writing data)
|        |    |   (writer)    |
+--------+    +---------------+

Module gzip provides functions to read and write files in gzip format, which usually have the suffix .gz. Importing the module adds the following methods to the stream class.

  • stream#gzipreader() returns a stream from which you can read data after decompressing a sequence of gzip format from the attached stream.
  • stream#gzipwriter() returns a stream to which you can write data that are to be compressed to a sequence of gzip format into the attached stream.

Module bzip2 provides functions to read and write files in bzip2 format, which usually have the suffix .bz2. Importing the module adds the following methods to the stream class.

  • stream#bzip2reader() returns a stream from which you can read data after decompressing a sequence of bzip2 format from the attached stream.
  • stream#bzip2writer() returns a stream to which you can write data that are to be compressed to a sequence of bzip2 format into the attached stream.

Module base64 provides functions to encode and decode files in Base64 format, which often appears in network protocols. It's a built-in module that you can use without importing, and it makes the following methods available in the stream class.

  • stream#base64reader() returns a stream from which you can read data after decoding a sequence of Base64 format from the attached stream.
  • stream#base64writer() returns a stream to which you can write data that are to be encoded to a sequence of Base64 format into the attached stream.

The following code is an example that reads the content of a file in gzip format:

import(gzip)
open('foo.gz') {|fd_gzip|
    fd = fd_gzip.gzipreader()
    // reading process from fd
    fd.close()
}

These methods that generate a Filter Stream can accept a block procedure just like open() function, in which you can take the instance of Filter Stream as a block parameter.

import(gzip)
open('foo.gz') {|fd_gzip|
    fd_gzip.gzipreader {|fd|
        // reading process from fd
    }
}

Or simply, you can write it as below:

import(gzip)
open('foo.gz').gzipreader {|fd|
    // reading process from fd
}
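
The bzip2 methods can presumably be used in the same way. The sketch below assumes a hypothetical file foo.bz2 and prints its decompressed text, combining stream#bzip2reader() with stream#readlines().

```gura
import(bzip2)
open('foo.bz2').bzip2reader {|fd|
    // fd yields the decompressed content of foo.bz2
    print(fd.readlines())
}
```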

The same applies to writing. In this case, the attached stream must be opened for writing.

import(gzip)
open('foo.gz', 'w') {|fd_gzip|
    fd = fd_gzip.gzipwriter()
    // writing process to fd
    fd.close()
}

You can also attach a Filter Stream to yet another Filter Stream, which enables you to compose a chain of streams. The following code decodes a sequence in Base64 and then decompresses it with the gzip algorithm:

import(gzip)
open('foo.gz.hex') {|fd_hex|
    fd_hex.base64reader().gzipreader {|fd|
        // reading process from fd
    }
}

Below shows a diagram of the process:

+--------+    +-----------------+    +---------------+
| stream |--->|  filter stream  |--->| filter stream |---> (reading data)
|        |    | (base64 reader) |    | (gzip reader) |
+--------+    +-----------------+    +---------------+

You can construct a chain of streams for a writing process, too.

import(gzip)
open('foo.gz.hex', 'w') {|fd_hex|
    fd_hex.base64writer().gzipwriter {|fd|
        // writing process to fd
    }
}

Below shows a diagram of the process:

+--------+    +-----------------+    +---------------+
| stream |<---|  filter stream  |<---| filter stream |<--- (writing data)
|        |    | (base64 writer) |    | (gzip writer) |
+--------+    +-----------------+    +---------------+

15.3.9 Stream with Archive File and Network

After importing the tar module, you can create a stream that reads an item stored in a TAR archive file. When a pathname contains a filename suffixed with .tar, .tgz, .tar.gz or .tar.bz2, Path Manager decompresses the content in accordance with the TAR format. The example below opens an item named src/main.cpp in a TAR file foo/example.tar.gz.

import(tar)
open('foo/example.tar.gz/src/main.cpp') {|fd|
    // reading process from fd
}

Since all the work necessary to decompress the content of archive files is encapsulated in the Path Manager framework, you can access archived items just like ordinary files in a file system. Below is an example that prints the content of src/main.cpp in foo/example.tar.gz.

import(tar)
print(readlines('foo/example.tar.gz/src/main.cpp'))

After importing the zip module, you can create a stream that reads an item stored in a ZIP archive file. When a pathname contains a filename suffixed with .zip, Path Manager decompresses the content in accordance with the ZIP format. The example below opens an item named src/main.cpp in a ZIP file foo/example.zip.

import(zip)
open('foo/example.zip/src/main.cpp') {|fd|
    // reading process from fd
}
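
By analogy with the TAR example above, the content of an item in a ZIP file can presumably be printed in one line through the casting mechanism from string to stream. This is a sketch that reuses the same hypothetical archive.

```gura
import(zip)
// The pathname is cast to a stream that reads src/main.cpp inside the archive
print(readlines('foo/example.zip/src/main.cpp'))
```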

Importing the curl module, which provides network access through the curl library, or importing the http module enables Path Manager to recognize URIs that begin with protocol names like "http" and "ftp".

import(curl)
open('http://www.example.com/doc/index.html') {|fd|
    // reading process from fd
}

15.4 Directory

15.4.1 Operations

A Directory is a data object used to look up a list of files and sub directories, and is represented by the directory class. Currently, there is little occasion to use a directory instance explicitly, since it is usually built into other objects such as iterators and hidden from users. One thing to note is that a string containing a pathname can be cast to a directory instance, so you can pass a pathname to an argument declared with the directory type.

There are three functions that search for items like files and sub directories: path.dir(), path.glob() and path.walk(). Consider the following directory structure to see how these functions work.

example
|
+--dir-A
|  +--file-4.txt
|  `--file-5.txt
+--dir-B
|  +--dir-C
|  |  +--file-6.doc
|  |  `--file-7.doc
|  `--dir-D
+--file-1.txt
+--file-2.doc
`--file-3.txt

Function path.dir() creates an iterator that returns the pathnames of items that exist in the specified directory. For example, a call path.dir('example') creates an iterator that returns the following strings.

example/dir-A/
example/dir-B/
example/file-1.txt
example/file-2.doc
example/file-3.txt

Function path.glob() creates an iterator that returns the pathnames of items matching the given pattern with wild cards. For example, a call path.glob('example/*.txt') creates an iterator that returns the following strings.

example/file-1.txt
example/file-3.txt

Function path.walk() creates an iterator that searches the directory structure recursively and returns the pathnames of items. For example, a call path.walk('example') creates an iterator that returns the following strings.

example/dir-A/
example/dir-B/
example/file-1.txt
example/file-2.doc
example/file-3.txt
example/dir-A/file-4.txt
example/dir-A/file-5.txt
example/dir-B/dir-C/
example/dir-B/dir-D/
example/dir-B/dir-C/file-6.doc
example/dir-B/dir-C/file-7.doc
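
The three calls can be tried side by side with a short script like the one below. It is a sketch that assumes the example directory shown above exists under the current directory; each println() call prints the elements that the iterator returns.

```gura
println(path.dir('example'))           // items directly under example
println(path.glob('example/*.txt'))    // items matching the wild card pattern
println(path.walk('example'))          // items found recursively
```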

15.4.2 Status Object

By default, functions path.dir(), path.glob() and path.walk() create an iterator that returns pathname strings. Specifying the :stat attribute creates an iterator that generates objects called stat, which contain more detailed information about items.

There are several different stat instances depending on the container in which an item exists, which provide various properties for additional information as well as the item's name.

An item in a file system returns an fs.stat instance that has the following properties.

Property Name   Data Type   Content
pathname        string
dirname         string
filename        string
size            number
uid             number
gid             number
atime           datetime
mtime           datetime
ctime           datetime
isdir           boolean
ischr           boolean
isblk           boolean
isreg           boolean
isfifo          boolean
islnk           boolean
issock          boolean

The code below shows an example that prints each filename and size of items under a directory example.

stats = path.dir('example'):stat
printf('%-16s %d\n', stats:*filename, stats:*size)

15.4.3 Directory in Archive File

After importing tar module, you can get a list of items stored in a TAR archive file. The code below prints all the items stored in example.tar.gz by path.walk().

println(path.walk('example.tar.gz/'))

Note that you have to append a directory separator after the archive filename so that Path Manager recognizes it as a container, not an ordinary file.

An item in a TAR archive file returns a tar.stat instance that has the following properties.

Property Name   Data Type   Content
name            string
filename        string
linkname        string
uname           string
gname           string
mode            number
uid             number
gid             number
size            number
mtime           datetime
atime           datetime
ctime           datetime
chksum          number
typeflag        number
devmajor        number
devminor        number
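
Following the same pattern as the fs.stat example in the previous section, the properties of tar.stat could be listed as sketched below. The archive name is the hypothetical example.tar.gz used above, and the column width in the format string is an arbitrary choice.

```gura
import(tar)
stats = path.walk('example.tar.gz/'):stat
// print the name and size of each item stored in the archive
printf('%-32s %d\n', stats:*name, stats:*size)
```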

After importing the zip module, you can get a list of items stored in a ZIP archive file. The code below prints all the items stored in example.zip by path.walk().

println(path.walk('example.zip/'))

An item in a ZIP archive file returns a zip.stat instance that has the following properties.

Property Name        Data Type   Content
filename             string
comment              string
mtime                datetime
crc32                number
compression_method   number
size                 number
compressed_size      number
attributes           number
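
As a sketch in the style of the fs.stat example, the code below lists each item in the hypothetical example.zip with its original and compressed sizes, using the property names from the table above. The column width in the format string is an arbitrary choice.

```gura
import(zip)
stats = path.walk('example.zip/'):stat
// print filename, original size and compressed size of each item
printf('%-32s %d %d\n', stats:*filename, stats:*size, stats:*compressed_size)
```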

15.5 OS-specific Operations

15.5.1 Operation on File System

Module fs provides functions that work with file systems.

Function fs.mkdir() creates a directory. If the specified pathname contains directories that don't exist, an error occurs. Specifying the attribute :tree creates any intermediate directories in the pathname that don't exist.

Function fs.rmdir() removes a directory. If the specified directory contains files or sub directories, an error occurs. Specifying the attribute :tree removes all such items before deleting the specified directory.

Function fs.remove() removes a file.

Function fs.rename() renames a file or a directory.

Function fs.chmod() modifies the attributes of a file or a directory.

Function fs.cpdir() copies content of a directory to another directory.
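
The sketch below combines some of these functions to create a nested working directory, write a file into it and then clean everything up. The pathnames are hypothetical, and it assumes that each function takes the target pathname as its first argument.

```gura
fs.mkdir('work/tmp/out'):tree              // also creates work and work/tmp
open('work/tmp/out/note.txt', 'w') {|fd|
    fd.println('temporary data')
}
fs.remove('work/tmp/out/note.txt')         // removes the file
fs.rmdir('work'):tree                      // removes work and everything under it
```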

15.5.2 Execute Other Process

Function os.exec() executes a process and waits until it finishes. While the process runs, its standard output and standard error are redirected to streams defined by variables os.stdout and os.stderr, and its standard input is redirected from os.stdin. By default, variables os.stdin, os.stdout and os.stderr are assigned with sys.stdin, sys.stdout and sys.stderr respectively. You can modify those variables to retrieve console output from the process and feed text data to it through standard input. Below is an example to run an executable foo after redirecting the standard output to a memory buffer.

buff = binary()
saved = os.stdout
os.stdout = buff.writer()
os.exec('foo')
os.stdout = saved
print(os.fromnative(buff))

Function os.fromnative() converts a binary instance that contains raw data from the process to a string in UTF-8 format.