Dynamic iterators

Introduction

If you ever tried to return an Iterator from a function and had difficulties with it, this article might help.

Let’s introduce a context first. Imagine for a minute you’re trying to define a trait to list the colors of anything. For example, you could implement this for vegetables, logos or even flags.

You’ll probably define the trait like this.

#[derive(Clone, Copy, Eq, Hash)]
struct Color(u8, u8, u8);
trait Colors {
	fn colors(&self) -> Vec<Color>;
}

Very simple right? But what about the zero-cost abstraction that Rust embraces. Why would we want to return a Vec<Color> if when we use it, we only need the first one for example? No need to allocate memory space for an entire Vec.

First try: return an Iterator type

Fine, let’s use an Iterator instead: the one returned by Vec::iter().

trait Colors<'a> {
	fn colors(&'a self) -> std::slice::Iter<'a, Self>;
}

At this point, we are enforcing which type of Iterator any implementor of Colors would need to use. If you don’t see the problem, let’s understand what is wrong with it before going forward.

Imagine the following implementation for struct Flag.

struct Flag {
	country: String,
	colors: Vec<Color>,
}
impl Colors<'a> for Flag {
	fn colors(&'a self) -> std::slice::Iter<'a, Color> {
		self.colors.iter()
    }
}

Everything looks fine so far. But why do we use a Vec<Color> for the list of colors. We shouldn’t have twice the same color in this list so let’s replace Vec<Color> with HashSet<Color> (Eq and Hash are implemented on Color).

You will get the following compilation error because self.colors.iter() doesn’t produce a std::slice::Iter but a std::collections::hash_set::Iter.

error[E0308]: mismatched types
  --> src/lib.rs:15:9
   |
15 |         self.colors.iter()
   |         ^^^^^^^^^^^^^^^^^^ expected struct `std::slice::Iter`, found struct `std::collections::hash_set::Iter`
   |
   = note: expected struct `std::slice::Iter<'a, Color>`
              found struct `std::collections::hash_set::Iter<'_, Color>`

Second try: associated type

Let’s change a bit our trait definition so the implementor can decide the Iterator type.

trait Colors<'a> {
    type ColorsIter: Iterator<Item = &'a Color>;
    fn colors(&'a self) -> Self::ColorsIter;
}

And then the implementation would become the following.

impl<'a> Colors<'a> for Flag {
    type ColorsIter = std::collections::hash_set::Iter<'a, Color>;
    fn colors(&'a self) -> Self::ColorsIter {
        self.colors.iter()
    }
}

Perfect, we’re done. Are we, really? Something itches me. Indeed, we basically push the complexity to the implementor. If I could, I’d like to have an API as simple as possible. On top of that, the complexity here might actually become close to a nightmare. Let’s say for example that Colors trait is documented as follow “list all the colors of an object (black is not considered a color)”.

But you’re the implementor of struct Flag and you do not really care of the consideration of some scientists to consider that black is not a color: some flags contains black and you have to store it. But that’s fine, you can still implement it by filtering out the black.

impl<'a> Colors<'a> for Flag {
    type ColorsIter = std::collections::hash_set::Iter<'a, Color>;
    fn colors(&'a self) -> Self::ColorsIter {
        self.colors
            .iter()
            .filter(|color| color.0 != 0 && color.1 != 0 && color.2 != 0)
    }
}

Here is the compilation error you’ll get.

error[E0308]: mismatched types
  --> src/lib.rs:17:9
   |
16 |       fn colors(&'a self) -> Self::ColorsIter {
   |                              ---------------- expected `std::collections::hash_set::Iter<'_, Color>` because of return type
17 | /         self.colors
18 | |             .iter()
19 | |             .filter(|color| color.0 != 0 && color.1 != 0 && color.2 != 0)
   | |_________________________________________________________________________^ expected struct `std::collections::hash_set::Iter`, found struct `std::iter::Filter`
   |
   = note: expected struct `std::collections::hash_set::Iter<'_, _>`
              found struct `std::iter::Filter<std::collections::hash_set::Iter<'_, _>, [closure@src/lib.rs:19:21: 19:73]>`

Of course, using .filter() doesn’t get you a std::collections::hash_set::Iter anymore but a std::iter::Filter. Lucky for you, the compiler is nice and tell you the type you need to use…​ or is it? Did you notice the part that looks like [closure@src/lib.rs:19:21: 19:73]? How do you define that?

Either you’re experienced enough and know how to get out of this situation, or you’re not and feel like maybe, you’re not going to use this Colors trait afterall (read fourth try below, it might help!). In any case, I hope you’re convinced our API leaked some bad complexity to the implementor. Can we do better?

Third try: impl Iterator

Since Rust 1.26, we can use the impl keyword in function parameters and return types. Let’s try it.

trait Colors<'a> {
    fn colors(&'a self) -> impl Iterator<Item = &'a Color>;
}

impl<'a> Colors<'a> for Flag {
    fn colors(&'a self) -> impl Iterator<Item = &'a Color> {
        self.colors
            .iter()
            .filter(|color| color.0 != 0 && color.1 != 0 && color.2 != 0)
    }
}

Well, here is what the compiler will tell you.

error[E0562]: `impl Trait` not allowed outside of function and inherent method return types
 --> src/lib.rs:6:28
  |
6 |     fn colors(&'a self) -> impl Iterator<Item = &'a Color>;
  |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

And indeed, if we jump to the RFC1522 that introduced this nice behavior, there is a limitation (might be lifted in the future but not now).

`impl Trait` may only be written within the return type of a freestanding or
inherent-impl function, not in trait definitions or any non-return type
position. They may also not appear in the return type of closure traits or
function pointers, unless these are themselves part of a legal return type.

Fork! It seems like the ideal solution but is not yet supported in the language.

Fourth try: Box<dyn Iterator>

Associated types were not a nice solution because we let the implementor deal with the complexity. But maybe there is a solution not so complex for the implementor. We know we want to return something that implements Iterator. Trait-objects might be the solution here.

Let’s get back to our previous definition using an associated type with a little tweak.

trait Colors<'a> {
    type ColorsIter: Iterator<Item = &'a Color> + 'a;
    fn colors(&'a self) -> Self::ColorsIter;
}

Note that since we iterate over references of colors, the Iterator itself must live at least as long as these references, which explains why we need to add + 'a.

Then we now can use some boxed trait-object to solve our problems.

impl<'a> Colors<'a> for Flag {
    type ColorsIter = Box<dyn Iterator<Item = &'a Color> + 'a>;
    fn colors(&'a self) -> Self::ColorsIter {
        Box::new(
            self.colors
                .iter()
                .filter(|color| color.0 != 0 && color.1 != 0 && color.2 != 0),
        )
    }
}

This works just fine. Using trait-object, the implementor don’t need to know the exact Iterator type, just that it implements Iterator.

Fifth try: wrapping Box<dyn Iterator>

Are we not done here? Well, here is me splitting hairs. Sure we’ve found something manageable but can we simplify a bit the life of a potential implementor by hiding some of these Box and dyn? How about we create a type to hide this complexity.

pub struct DynIter<'iter, V> {
	iter: Box<dyn Iterator<Item = V> + 'iter>,
}

And then we need to implement Iterator on it.

impl<'iter, V> Iterator for DynIter<'iter, V> {
    type Item = V;
    fn next(&mut self) -> Option<Self::Item> {
        self.iter.next()
    }
}

Let’s also create a new() method to remove some of the boilerplate of creating a DynIter.

impl<'iter, V> DynIter<'iter, V> {
    pub fn new<I>(iter: I) -> Self
    where
        I: Iterator<Item = V> + 'iter,
    {
        Self {
            iter: Box::new(iter),
        }
    }
}

And here we are, no more dyn or Box.

impl<'a> Colors<'a> for Flag {
    type ColorsIter = DynIter<'a, &'a Color>;
    fn colors(&'a self) -> Self::ColorsIter {
        DynIter::new(
            self.colors
                .iter()
                .filter(|color| color.0 != 0 && color.1 != 0 && color.2 != 0),
        )
    }
}

Conclusion

Since I’ve found out that the above solution could be reusable, I actually made a crate out of it: dyn-iter. Please report issue or improvement in the project repository.

I hope you learned something. At least, I’ve learned a lot by trying to make all of this work. I’m sure there is still some hidden corners that I don’t master yet. If you found any mistake in what I’m saying, or some ideas to improve this blog post, please let me know on the blog repository.

Take second try and third try, shake, and here we go

After the first publication of this article, chrysn provided some more information about RFC2515 that actually brings the impl keyword into associated types. At the time of writing [1], this is already available in nightly compiler.

#![feature(type_alias_impl_trait)]

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Color(u8, u8, u8);

struct Flag {
    country: String,
    colors: Vec<Color>,
}

trait Colors<'a> {
    type ColorsIter: Iterator<Item = &'a Color>;
    fn colors(&'a self) -> Self::ColorsIter;
}

impl<'a> Colors<'a> for Flag {
    type ColorsIter = impl Iterator<Item = &'a Color>;

    fn colors(&'a self) -> Self::ColorsIter {
        self.colors
            .iter()
            .filter(|color| color.0 != 0 && color.1 != 0 && color.2 != 0)
    }
}

Once this is stable, I’ll pretty much forget about dyn-iter crate and use this new solution instead.


1. Written on October 24th, 2020

links

social