Course: JVM Fundamentals for Clojure
In this video, we go over some common patterns for using the java.io package and how the clojure.java.io namespace handles the most common cases for you.
The Java standard library has an interesting I/O story. It's an object-oriented API with a lot of types: classes, subclasses, and so on. But it's actually one of the better-designed parts of Java, and it's built to be very composable. We'll see that in a minute.
So there are two ideas: streams, and reading and writing. These are the two basic kinds of things you can do with data, and we'll go over both. Streams come in two directions, input and output. An input stream is a stream of bytes, so you get direct access to the raw bytes. With readers and writers, you get characters in a particular encoding. So if you just want bytes, say binary data, you use the streams: InputStream and OutputStream.
So I'm here in the Javadoc, in the package java.io, on the class InputStream. If we scroll down to the methods, we see there are very few of them. That's nice: it means it's a clean, coherent interface, and the smaller the interface the better. InputStream isn't a Java interface, though. It's an abstract class, because it implements some of the methods for you, and it's meant to be subclassed. Notice that the main operation on an input stream is read(), which reads a single byte and returns it as an int. Here's why: Java does not have unsigned numeric types.
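Here is a minimal plain-Java sketch that makes the "small interface" point concrete. It is an InputStream subclass that implements only read(). CountingInputStream is a hypothetical name chosen for illustration. The inherited methods, such as read(byte[]), are built on top of that one method, and that is what makes subclassing cheap.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical example class: an InputStream that counts up from 0 (mod 256)
// and ends after `limit` bytes. Only read() is implemented; read(byte[]),
// read(byte[], int, int), skip(), and so on are inherited from InputStream
// and work by calling read().
class CountingInputStream extends InputStream {
    private final int limit;
    private int next = 0;

    CountingInputStream(int limit) {
        this.limit = limit;
    }

    @Override
    public int read() {
        // Return the next byte as an int in 0..255, or -1 at end of stream.
        if (next >= limit) {
            return -1;
        }
        return (next++) & 0xFF;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new CountingInputStream(5)) {
            byte[] buf = new byte[10];
            int n = in.read(buf);          // inherited bulk read, built on read()
            System.out.println(n);         // prints 5
            System.out.println(buf[4]);    // prints 4
            System.out.println(in.read()); // prints -1: the stream is exhausted
        }
    }
}
```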
So you use an int to represent one byte, even though an int is four bytes. Having no unsigned types means every numeric type can be negative. A Java byte ranges from -128 to 127, so read() can't hand back a plain byte with values 0 to 255. Returning an int lets read() give you 0 to 255 for a byte and reserve -1 to signal the end of the stream. So there are three ways to read: one byte at a time, into a buffer (an array of bytes), and into an array with an offset and a length, where the length says how many bytes you want to read.
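Here is a small plain-Java sketch of the int-versus-byte point. It uses an in-memory ByteArrayInputStream, so no file is needed. ReadLoopDemo and sumUnsigned are illustrative names and are not part of java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadLoopDemo {
    // Sum the unsigned values of every byte in the stream, one read() at a time.
    static int sumUnsigned(InputStream in) throws IOException {
        int sum = 0;
        int b;
        // read() returns 0..255 for a byte, or -1 when the stream is exhausted,
        // so -1 can never be confused with real data.
        while ((b = in.read()) != -1) {
            sum += b;
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xFF, 1, 2};

        // Java's byte is signed: the bit pattern 0xFF stored in a byte reads as -1 ...
        System.out.println(data[0]); // prints -1
        // ... but read() returns it as the int 255.
        System.out.println(new ByteArrayInputStream(data).read()); // prints 255
        System.out.println(sumUnsigned(new ByteArrayInputStream(data))); // prints 258

        // The (array, offset, length) overload: read up to 3 bytes into buf[2..4].
        byte[] buf = new byte[8];
        int n = new ByteArrayInputStream(data).read(buf, 2, 3);
        System.out.println(n);             // prints 3
        System.out.println(buf[2] & 0xFF); // prints 255: mask to recover the unsigned value
    }
}
```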
There are some other operations. skip() skips over a number of bytes without reading them. Then there's the idea of marking and resetting. The Javadoc lists the methods alphabetically, so it's not obvious, but mark() and reset() are related: when you mark, you can later reset to that same point. Only some streams support this, so there's a method called markSupported() that tells you whether you can use the mark/reset pattern. You need to close a stream when you're done, so it has a close() method. And available() returns an estimate of the number of bytes you can read without blocking.
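Here is a plain-Java sketch of skip, mark/reset, and available on a ByteArrayInputStream, which supports mark. FileInputStream does not support mark, but wrapping it in a BufferedInputStream adds that support. The peek helper and MarkResetDemo are hypothetical names used for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkResetDemo {
    // Hypothetical helper: look at the next byte without consuming it.
    static int peek(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("mark/reset not supported by " + in.getClass().getName());
        }
        in.mark(1);        // remember this position; allow reading 1 byte before reset
        int b = in.read(); // read ahead
        in.reset();        // rewind to the marked position
        return b;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30, 40})) {
            System.out.println(in.available()); // prints 4 (exact for an in-memory stream)
            System.out.println(in.skip(1));     // prints 1: byte 10 is discarded
            System.out.println(peek(in));       // prints 20
            System.out.println(in.read());      // prints 20 again: peek did not consume it
        }
    }
}
```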
All I/O in java.io is blocking. So people typically read on a separate thread, one that can sit blocked waiting for input without holding up the rest of the program. Now, I said these classes are composable. Here's one very common pattern. There's FileInputStream, which reads bytes from a file, and down here there's BufferedInputStream. If we go to BufferedInputStream and scroll down to the constructors, you see it takes an input stream. So we're going to build a pipeline of input streams. Usually the pipeline isn't very long, but in Java it can get kind of annoying, because it's, let's say, verbose.
So you write something like BufferedInputStream in = new java.io.BufferedInputStream(new java.io.FileInputStream("/path/to/file")). It's annoying because every time you want a file input stream, you also want it buffered, obviously. It's more efficient to read in a chunk at a time instead of one byte at a time. So the buffered input stream fills up a whole buffer at once.
Every time you read, it gives you the next byte from the buffer if there is one. If the buffer is empty, it reads enough from the file to fill the whole buffer again. That's more efficient because on a spinning disk you have to do seeks: the read head has to move, and that physical motion is slow compared to reading in a lot of data at once.
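Here is the wrapping pipeline described above, written out in plain Java. It's a sketch that assumes a writable temp directory. BufferedFileRead and countBytes are illustrative names.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class BufferedFileRead {
    // Count the bytes in a file, one read() call at a time. The BufferedInputStream
    // serves most of those calls from its in-memory buffer instead of the file.
    static long countBytes(File file) throws IOException {
        long count = 0;
        // Closing the outer stream also closes the FileInputStream it wraps.
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            while (in.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("somefile", ".txt");
        tmp.deleteOnExit();
        try (OutputStream out = new FileOutputStream(tmp)) {
            out.write("Here is a message.".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(countBytes(tmp)); // prints 18
    }
}
```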
This is a very common pattern, and in Clojure there's a namespace called clojure.java.io that tries to make all these patterns much easier. Let's go into it. Here's clojure.java.io, and you see that BufferedInputStream is imported. There's a function called input-stream, and it always returns a BufferedInputStream. That's because there's almost no penalty for buffering and you almost always want it, so why not just do it. Let's see what that looks like: (io/input-stream ...).
That's the Clojure side, and there's another function, io/file. We haven't looked at that yet, but we will in a minute. It creates a java.io.File, which represents a path on the disk, so I'll pass it "/path/to/file". Now I have an input stream for this file, and I'm going to run it. I don't think it'll work, because I don't actually have that file, but let's see. I could make a file. Let's go to /tmp/somefile.txt.
I'll put "Here is a message." in it. My first try failed because I left /tmp off the path. With /tmp/somefile.txt it works, and we get a buffered input stream for this file. clojure.java.io, which wraps the java.io package, is smart. If you pass it a File, it makes a FileInputStream, just like we did by hand above, but you don't have to think about that. It makes the right one, and then it always wraps it in a BufferedInputStream.
Now we can read bytes from this thing, which is nice. What about output? If you have some binary data and want to output bytes, you use an output stream. If we look at OutputStream's methods, we see the primary operation is write().
There are three write() methods: one for a single byte, one for a whole array, and one for an array with an offset and length. There's also close(). You have to close the stream when you're done; that's part of the contract. And there's flush(). A buffered output stream, for example, buffers bytes, and flush() tells it, "Whatever you've got in the buffer, send it out right now instead of waiting until the buffer is full." Normally it waits until the buffer is full and then sends out a big burst of bytes. flush() makes sure everything goes out and leaves the buffer empty. Closing does this automatically, but you don't want to close the stream every time. So when would you need to flush?
Say you want to print a prompt to the user. It's a short string, so it won't fill up the buffer. If you just said, "Write these bytes," it wouldn't show up, because it's just sitting in the buffer. So you flush, and then it shows up. Very common. Then there are the different types: FileOutputStream, BufferedOutputStream right here, and ByteArrayOutputStream, which writes into a byte array in memory. That's the composability again. Some code might say, "I want to write this to an output stream," and you might reply, "I want it in memory; don't send it over the Internet or anywhere else." In that case you can just give it a ByteArrayOutputStream. Let's look at that.
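Here is a plain-Java sketch of the prompt-and-flush case. It uses an in-memory ByteArrayOutputStream as a stand-in for the console, so we can observe exactly when bytes arrive. FlushDemo and sinkSizes are hypothetical names, and the sketch assumes the prompt is shorter than the buffer.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class FlushDemo {
    // Write a short prompt through a BufferedOutputStream and report how many
    // bytes had reached the underlying sink before and after flush().
    static int[] sinkSizes(String prompt) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferedOutputStream out = new BufferedOutputStream(sink)) {
            out.write(prompt.getBytes(StandardCharsets.UTF_8));
            int before = sink.size(); // the prompt is still sitting in the buffer
            out.flush();
            int after = sink.size();  // flush pushed it through to the sink
            return new int[] {before, after};
        } // close() would also have flushed anything left in the buffer
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sinkSizes("Name? ");
        System.out.println(sizes[0] + " -> " + sizes[1]); // prints 0 -> 6
    }
}
```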
You construct a ByteArrayOutputStream. It has no fixed capacity; its internal array grows as you write. Then you get the bytes that were written back with toByteArray(). So it's a way of connecting things together. BufferedOutputStream buffers just like BufferedInputStream does. And of course, clojure.java.io has an output-stream function that does the same kind of thing for output, so we can write some bytes with it.
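Here is a plain-Java sketch of that composability: a method that only knows it has "some OutputStream", handed a ByteArrayOutputStream so the output stays in memory. writeGreeting and InMemorySink are hypothetical names for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class InMemorySink {
    // Hypothetical API: it writes to whatever OutputStream it is given.
    // That could be a file, a socket, or, as below, a byte array in memory.
    static void writeGreeting(OutputStream out, String name) throws IOException {
        out.write(("Hello, " + name + "\n").getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(); // grows as needed
        writeGreeting(bytes, "Clojure");
        byte[] written = bytes.toByteArray(); // a copy of everything written so far
        System.out.println(written.length);   // prints 15
        System.out.print(new String(written, StandardCharsets.UTF_8)); // prints Hello, Clojure
    }
}
```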
Let's actually do it. I make an output stream on our file with io/output-stream, then call .write with some bytes. What's the letter "A" in ASCII? Not 32, that's space; it's 65. So I write 65, then close. Closing flushes the buffer and makes sure the byte actually reaches the file. Now let's go back to somefile.txt and read it back in, and there we go: capital letter A is in there. Pretty cool.
But that's bytes: I had to write a number, because it's a byte. If you want to read and write characters, which is usually what we're doing, you use readers and writers. java.io.Reader is the reading side. Its class description says it's for reading character streams, so it gives you characters. If we scroll down, we see that read() reads a single character.
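Here is the same round trip sketched in plain Java rather than at the Clojure REPL. It writes the byte 65 to a temp file, then reads it back both as a raw byte and as a character through a Reader. It assumes a writable temp directory. WriteThenRead and roundTrip are illustrative names.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class WriteThenRead {
    // Write one byte to the file, then read it back twice: as a raw byte through
    // an InputStream and as a character through a Reader.
    // Assumes `value` is in the ASCII range (0..127), where one byte is one UTF-8 char.
    static int[] roundTrip(File file, int value) throws IOException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file))) {
            out.write(value);
        } // close() flushes the buffer, so the byte actually reaches the file

        int asByte;
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            asByte = in.read();
        }

        int asChar;
        try (Reader r = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)) {
            asChar = r.read();
        }
        return new int[] {asByte, asChar};
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("somefile", ".txt");
        file.deleteOnExit();
        int[] result = roundTrip(file, 65);   // 65 is "A" in ASCII
        System.out.println(result[0]);        // prints 65
        System.out.println((char) result[1]); // prints A
    }
}
```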
That means you need an encoding. It's very much the same kind of hierarchy, with different types of readers. For instance, StringReader reads characters from a string, CharArrayReader reads from a char array, and there's BufferedReader. But the really important and interesting one is InputStreamReader. It says, "This input stream is giving me bytes, and I'm going to read them as characters." That means I need to give it a character set, a Charset. You can pass either a Charset object or a charset name.
There's also a constructor that takes a CharsetDecoder, but that's less commonly used. Let's look at Charset. You get a charset by its name, canonical name or alias, and you can get a list of all the available charsets. Charset lives in java.nio.charset. I think of the "n" as "new", because NIO was added later, in Java 1.4, while java.io has been there since 1.0. But java.nio is still commonly used. You can use a string like "UTF-8" to make the charset. clojure.java.io defaults to UTF-8, which is what you should use: the Internet uses UTF-8, and what other networks are we really using?
So that's the reader, and it's easy to do in Clojure too: call io/reader and give it a file. I'll def it as r. Now that we have the reader, I can do (.read r), and this gives me, yes, 65, the same character code we wrote before. If you use a charset with multibyte characters, you can get bigger numbers than ASCII can represent. You just have to know what characters you're using. And then we close it: (.close r).
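Here is a plain-Java sketch of InputStreamReader decoding bytes with an explicit Charset. It also shows what "bigger numbers than ASCII" looks like for a multibyte character. DecodeDemo and decode are illustrative names. The ISO-8859-1 line is included only to show that decoding with the wrong charset silently gives different characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Read every character from the bytes, decoding them with the given charset.
    static String decode(byte[] bytes, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            // Reader.read() returns one char (0..65535), or -1 at the end.
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Charset utf8 = Charset.forName("UTF-8"); // same charset as StandardCharsets.UTF_8
        byte[] bytes = "A\u00e9".getBytes(utf8); // "A" followed by "é"

        System.out.println(bytes.length);         // prints 3: "é" is two bytes in UTF-8
        String text = decode(bytes, utf8);
        System.out.println(text.length());        // prints 2: but it decodes to one char
        System.out.println((int) text.charAt(1)); // prints 233, beyond ASCII's 0..127
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1).length()); // prints 3
    }
}
```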
Now, the interesting thing about readers: let's go back to the Javadoc. A BufferedReader gives me some nice extra methods, like readLine(). I can read a whole line, and look, it returns a String. There's also lines(), new in Java 8, which returns a Stream<String>. But if I just want one whole line, readLine() does it. Let's do that. I'll open a new reader, because I've already closed this one. Instead of .read, I'll call (.readLine r), and you see we get the string "A". Then we close it. Awesome.
The counterpart of Reader is Writer, which writes characters out in a particular encoding. You can write a single char (a char is Java's internal representation of a character), or you can write a whole string.
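Here is a plain-Java sketch of readLine(), lines(), and the Writer side, using in-memory StringReader and StringWriter so it runs without files. LinesDemo and its method names are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;
import java.util.stream.Collectors;

class LinesDemo {
    // readLine() returns one line without its line terminator, or null at end of input.
    static String firstLine(String text) throws IOException {
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            return r.readLine();
        }
    }

    // lines() (Java 8+) returns the remaining lines as a lazy Stream<String>;
    // collect it before the reader is closed.
    static List<String> allLines(String text) throws IOException {
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            return r.lines().collect(Collectors.toList());
        }
    }

    // Writer is the character-output side: write a single char or a whole String.
    static String writeChars() throws IOException {
        try (Writer w = new StringWriter()) {
            w.write('A');  // write(int): one character
            w.write("BC"); // write(String)
            return w.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLine("A\nsecond line")); // prints A
        System.out.println(allLines("A\nB\nC"));         // prints [A, B, C]
        System.out.println(writeChars());                // prints ABC
    }
}
```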
So that's the basics. They're composable: if you want buffering, you wrap in a buffer. You can write to a character array with CharArrayWriter, or to a PipedWriter. The byte streams have piped versions too, PipedInputStream and PipedOutputStream. Pipes are for connecting an output to an input: whatever you write into the piped output stream can be read out of the connected piped input stream, and the same goes for a writer and a reader. If you look at PipedWriter's constructor, you see it takes a PipedReader and connects to it.
Everything you write to this writer can then be read from the connected PipedReader. It's confusing to think about at first, but it works from either end: PipedReader also has a constructor that takes a PipedWriter, so the two are equivalent. It depends on what you need. If you're given a writer and you need a reader, you make a PipedReader and connect them up. This is sort of like Unix pipes, where stuff goes in one end and comes out the other.
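Here is a plain-Java sketch of a PipedWriter connected to a PipedReader. The Javadoc for the piped classes warns that using both ends from a single thread can deadlock, so the sketch writes on a separate producer thread and reads on the calling thread. PipeDemo and sendThroughPipe are hypothetical names.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.UncheckedIOException;

class PipeDemo {
    // Send text through a pipe: a producer thread writes into the PipedWriter,
    // and the calling thread reads it back out of the connected PipedReader.
    static String sendThroughPipe(String message) throws IOException, InterruptedException {
        PipedReader reader = new PipedReader();
        PipedWriter writer = new PipedWriter(reader); // connect writer -> reader

        Thread producer = new Thread(() -> {
            try (PipedWriter w = writer) { // closing the writer signals end of input
                w.write(message);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();

        StringBuilder received = new StringBuilder();
        try (PipedReader r = reader) {
            int c;
            // read() blocks until data arrives, and returns -1 once the writer
            // has closed and everything written has been read.
            while ((c = r.read()) != -1) {
                received.append((char) c);
            }
        }
        producer.join();
        return received.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sendThroughPipe("hello through a pipe")); // prints hello through a pipe
    }
}
```

Using two threads also lets messages longer than the pipe's internal buffer go through: the writer blocks when the buffer is full and resumes as the reader drains it.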
So that's the basics of input and output streams and of readers and writers. Let me open the clojure.java.io documentation, which wraps all of this; this is the Clojure 1.8 API documentation. The namespace is small, but its functions apply the patterns that even the Java docs recommend. The Java docs say, "This is how you should make readers": wrap a FileReader in a BufferedReader. Clojure just does it that way every time and always adds the buffer. It has other functions too, which we'll get to in another lesson, all about the file system in Java.
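Here is a plain-Java sketch of roughly the chain clojure.java.io builds for a java.io.File with default options: a buffered byte stream, decoded as UTF-8, wrapped in a buffered reader or writer. This is an approximation of what the library does, not its actual source. openReader, openWriter, and ReaderChain are hypothetical names, and the sketch assumes a writable temp directory.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

class ReaderChain {
    // Roughly what (io/reader file) builds with default options:
    // FileInputStream -> BufferedInputStream -> InputStreamReader (UTF-8) -> BufferedReader.
    static BufferedReader openReader(File file) throws IOException {
        return new BufferedReader(
            new InputStreamReader(
                new BufferedInputStream(new FileInputStream(file)),
                StandardCharsets.UTF_8));
    }

    // The mirror image for writing, roughly what (io/writer file) builds:
    // FileOutputStream -> BufferedOutputStream -> OutputStreamWriter (UTF-8) -> BufferedWriter.
    static BufferedWriter openWriter(File file) throws IOException {
        return new BufferedWriter(
            new OutputStreamWriter(
                new BufferedOutputStream(new FileOutputStream(file)),
                StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("somefile", ".txt");
        file.deleteOnExit();
        try (BufferedWriter w = openWriter(file)) {
            w.write("Here is a message.");
            w.newLine();
        } // closing the outermost writer flushes and closes the whole chain
        try (BufferedReader r = openReader(file)) {
            System.out.println(r.readLine()); // prints Here is a message.
        }
    }
}
```

Passing StandardCharsets.UTF_8 explicitly mirrors clojure.java.io's UTF-8 default. A bare FileReader or FileWriter would instead use the platform's default charset.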