wolfygame 2/25/19


The engine now supports variable height ceilings and floors, plus distance fog.

Distance Fog

There are two kinds of distance fog applied: linear and exponential. The linear fog linearly interpolates between the original pixel color and the fog color over the range between a start and an end distance. The exponential fog causes the non-fog amount to decay exponentially with distance.

Linear Fog

     1 |              --------------
 fog   |             /
 amt   |            /
     0 | __________/
       0          s   e

    function Unlerp(v, v1, v2) = (v - v1) / (v2 - v1)
    function Clamp(v, v1, v2)  = min(max(v, v1), v2)
    fog_amt = Clamp(Unlerp(distance, start, end), 0, 1)

Linear fog is zero until the start distance and then linearly rises to 1 at the end distance. All distances beyond the end are 100% fog. Linear fog does not look as nice as other fog models, but the explicit end distance provides another benefit.

The engine casts a single ray for each screen column. In a Wolfenstein 3D style world, all of the walls are the same maximum height and they occlude all walls behind them. As a result, you can stop casting as soon as you hit a single wall and render it. However, when rendering with variable floor and ceiling heights, each collision with a block may occlude some of the bottom and top of the column, but not all, so you cannot stop casting at the first wall.

At each tile edge, if the floor height changes, the old and new floor heights are perspective projected onto the column. Then either a floor followed by a wall, or just a floor (if the new height is lower), is drawn upwards from the top of the last floor/wall segment drawn. The same thing happens for the ceiling heights, drawing from the top down. Rendering is not done until the last rendered bottom pixel (from the floors) meets the last rendered top pixel (from the ceilings) and all of the column's pixels are written. This, however, can take a while: the rays may have to cast out to a very far distance just to render a small number of floor and ceiling pixels in the center of the column.
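As a rough sketch of that loop (in C; project_row, step_ray, floor_z_at, ceil_z_at, and draw_span are invented names for illustration, not the engine's actual code, and the map queries are stubbed with a toy corridor):

```c
enum { SCREEN_W = 640, SCREEN_H = 360 };
#define PIXELS_PER_PLANE_UNIT (SCREEN_W / 2)   /* 90 degree FOV */

/* Project a world-space height onto a screen row, given the distance
 * perpendicular to the projection plane. Higher points land on
 * smaller row numbers; eye_z is the camera height. */
int project_row(double world_z, double eye_z, double perp_dist) {
    double proj_y = (world_z - eye_z) / perp_dist;
    return (int)(SCREEN_H / 2 - PIXELS_PER_PLANE_UNIT * proj_y);
}

/* Toy stand-ins so the sketch is self-contained: a ray that advances
 * one unit per step through a corridor whose floor steps up and whose
 * ceiling steps down with distance. The real engine queries the map. */
static double g_dist = 0.0;
static double step_ray(void)       { return g_dist += 1.0; }
static double floor_z_at(double d) { return 0.1 * d; }
static double ceil_z_at(double d)  { return 3.0 - 0.1 * d; }
static void   draw_span(int y0, int y1) { (void)y0; (void)y1; }

/* Skeleton of the column fill. y_top is the first unfilled row from
 * the top, y_bot is one past the last unfilled row from the bottom.
 * The column is done when the two meet. */
void render_column(double eye_z) {
    int y_top = 0, y_bot = SCREEN_H;
    while (y_top < y_bot) {
        double d = step_ray();  /* advance ray to the next tile edge */
        int f = project_row(floor_z_at(d), eye_z, d);
        int c = project_row(ceil_z_at(d), eye_z, d);
        if (f < y_bot) { draw_span(f, y_bot); y_bot = f; } /* floor/wall, upward   */
        if (c > y_top) { draw_span(y_top, c); y_top = c; } /* ceiling, downward    */
    }
}
```

The real loop also has to handle texturing and stopping at solid walls; the point here is just how y_top and y_bot close in on each other.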

The linear distance fog allows you to end the ray cast early. Once the ray length is beyond the end distance for the fog, the remaining pixels in the column can all be written as the fog color. When rendering a large level, this can provide a huge speedup. However, if the end distance for the fog is short and the color of the fog is a strong color, this creates an obvious "fog wall" beyond which nothing is rendered.

To make nice, "realistic" fog, like what you would use for atmosphere, the exponential fog equation is better.

Exponential Fog

Exponential fog rises quickly but slows as it approaches full fog, never quite reaching it.

    fog_amt = Clamp(1 - exp(-(distance * density)), 0, 1)

The density parameter replaces the start and end distances. A higher density makes the fog thicker and brings it in closer. This looks much more like actual fog, but it no longer has an explicit end distance. Still, past a certain point, nothing is visible.

In the engine, both of these fog effects are used at the same time. The exponential fog is used for effect, but at a far distance (where the exponential fog is already thick) linear fog is applied too, allowing the rays cast for each column to end early.


The lodev raycasting tutorial (see the previous devlog entry) gives the following formula for determining the height of walls:

    lineHeight = screen_height / perpWallDist;
    lineTop    = screen_height / 2 - lineHeight / 2;
    lineBottom = screen_height / 2 + lineHeight / 2;

This code works and is easy to implement, but it is only correct for a 90 degree field of view when the screen buffer has a 2:1 aspect ratio. With a 16:9 aspect ratio, it makes walls that are a bit too tall.

To get the correct equation, let's break the projection calculation down into parts. The player is at some position P, facing along a unit direction vector D. A unit normal N of D spans the projection plane onto which the scene is rendered. For a 90 degree field of view, you cast rays sweeping from D - N to D + N.

Let's assume that we are rendering to a 640x360 screen buffer. We can get the screen X coordinate by mapping [-1, 1] on the projection plane to [0, 640] on the screen. In practice, this is done the other way around, by iterating the screen columns and generating a ray through the corresponding projection plane point:

    for (int column_number = start_column;
         column_number < end_column;
         column_number++) {
        f64  plane_scale   = (2.0 * (f64)column_number / (f64)cam->target.width) - 1.0;
        V2   ray_direction = Unit(cam->direction + cam->plane * plane_scale);
        f64  ray_cos       = Dot(ray_direction, cam->direction);
        auto ray_cast      = BeginRayCast(position, ray_direction, map->tile_size);

        ... etc ...
        ... etc ...
        ... etc ...

This mapping means that, with a 90 degree field of view, one unit along the projection plane corresponds to screen_width / 2 == 320 pixels.
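That mapping can be isolated into a small helper (hypothetical name) to make the endpoints easy to check:

```c
/* The column-to-plane mapping from the loop above, isolated. With a
 * 640-wide buffer and a 90 degree field of view, one projection
 * plane unit spans 320 columns. */
double plane_scale_for_column(int column, int screen_width) {
    return (2.0 * (double)column / (double)screen_width) - 1.0;
}
```

Column 0 maps to -1 on the plane, column 320 to 0, and column 640 to +1.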

Perspective Projection

When doing rasterization (ie, rendering the normal, familiar way), you have a camera and some objects somewhere in world coordinate space. The coordinates of all of the vertices of the objects are transformed from world space into camera space: a coordinate system with the camera at the origin and with two of the axes (typically X and Y) aligned with the axes of the 2D projection plane. In this space, (typically) Z is perpendicular to the projection plane, parallel to the direction vector of the camera. Once transformed into camera space, the X and Y coordinates of a perspective projection onto the projection plane are simply:

    proj_x = vertex_camera_space_x / vertex_camera_space_z;
    proj_y = vertex_camera_space_y / vertex_camera_space_z;

This same transform from world space to camera space to the projection plane occurs in ray casting, but is never made clear by tutorials. Because each ray is cast per screen column, we already know the X coordinate for a column:

    proj_x = 2 * column_number / screen_width - 1

The Y coordinate of the point still needs to be found. From the ray cast, we know the distance (full_distance below) between the player and the point in question. This full_distance needs to be transformed into a distance perpendicular to the projection plane in order to use the perspective transform above. This can be done by multiplying the full distance by the cosine of the angle between the ray and direction vectors (you can see this by drawing a right triangle using the ray, the direction vector, and the point). This gives you the following:

    // Dot product is an easy way to get the cosine when both are unit vectors:
    ray_cos = Dot(ray, direction)

    perpendicular_distance = full_distance * ray_cos
    camera_space_z = point_z - camera_z
    proj_y = camera_space_z / perpendicular_distance

Note that the projection plane Y is computed from the camera space Z. X and Y are the top down coordinates corresponding to the 2D tile map and Z is height. Now that we have coordinates for a point on the projection plane, we can convert them to pixel coordinates on the screen buffer. Remember, our 90 degree field of view gives a [-1, 1] to [0, 640] mapping, so 1 projection plane unit is 320 pixels. Thus:

    pixels_per_proj_plane_unit = screen_width / 2
    screen_x = screen_width / 2  + pixels_per_proj_plane_unit * proj_x
    // screen_x here is superfluous, since it should be equal to
    // column_number, which is already known
    screen_y = screen_height / 2 - pixels_per_proj_plane_unit * proj_y

Just getting the height of a wall (like in the lodev code that started this section) is:

    screen_line_height = pixels_per_proj_plane_unit *
                         wall_height / perpendicular_distance
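Collecting the steps in this section into one hedged sketch (C; the function names are invented, and a 640x360 buffer with a 90 degree field of view is assumed):

```c
#include <math.h>

enum { SCREEN_W = 640, SCREEN_H = 360 };
#define PPU (SCREEN_W / 2)     /* pixels per projection plane unit */

/* 2D vector helpers for the top-down map plane. */
typedef struct { double x, y; } V2;
static double v2_dot(V2 a, V2 b) { return a.x * b.x + a.y * b.y; }

/* Screen row for a point at world height point_z, hit by a ray of
 * length full_distance; ray and direction must be unit vectors. */
double screen_y_for_hit(V2 ray, V2 direction,
                        double full_distance,
                        double point_z, double camera_z) {
    double ray_cos   = v2_dot(ray, direction);   /* cosine, both unit */
    double perp_dist = full_distance * ray_cos;  /* perpendicular distance */
    double proj_y    = (point_z - camera_z) / perp_dist;
    return SCREEN_H / 2.0 - PPU * proj_y;
}

/* On-screen pixel height of a wall of the given world height. */
double screen_line_height(double wall_height, double perp_dist) {
    return PPU * wall_height / perp_dist;
}
```

For a ray pointing straight along the direction vector, ray_cos is 1 and the full distance is already perpendicular; for rays out toward the screen edges the cosine shrinks the distance, which is exactly what prevents the classic fisheye distortion.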

Different FOV

But what if we have a 60 degree field of view?

For a 60 degree field of view, the rays are cast from D - (2/3)N to D + (2/3)N. So [-0.66, 0.66] on the projection plane maps to [0, 640] on the screen. This means we have:

    pixels_per_proj_plane_unit = (screen_width / 2) / 0.66

In general, for any field of view, we can get this value:

    fov_ratio = field_of_view / 90.0
    pixels_per_proj_plane_unit = (screen_width / 2) / fov_ratio

(Strictly speaking, angle does not map linearly onto the projection plane: for a field of view of fov degrees, the exact plane half-width is tan(fov / 2), which is 1 at 90 degrees but about 0.577, not 0.66, at 60 degrees. The fov / 90 ratio is a simplification; a plane half-width of 0.66 actually corresponds to a field of view of 2 * atan(0.66), about 67 degrees.)

Putting this all together

This gives us the following alternative to the lodev tutorial's formula for getting wall heights.

    fov_ratio = field_of_view / 90.0
    pixels_per_proj_plane_unit = (screen_width / 2) / fov_ratio
    screen_line_height = pixels_per_proj_plane_unit * wall_height / perpendicular_distance
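As a sanity check, here is that general formula next to the lodev one (a sketch; with a 90 degree field of view, a 2:1 640x320 buffer, and walls of height 1.0, the two agree exactly):

```c
/* General wall height on screen, per the formula above. */
double line_height(double field_of_view, int screen_width,
                   double wall_height, double perp_dist) {
    double fov_ratio = field_of_view / 90.0;
    double ppu = (screen_width / 2.0) / fov_ratio;
    return ppu * wall_height / perp_dist;
}

/* lodev tutorial formula: screen_height / perpWallDist. */
double lodev_line_height(int screen_height, double perp_dist) {
    return (double)screen_height / perp_dist;
}
```

For a 640x360 buffer the lodev formula overestimates: 360 / d versus 320 / d, making walls about 12% too tall, which is the 16:9 problem described earlier.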

This shows the unexplained assumptions that are present in the lodev tutorial code:

  • It assumes that the walls have a height of 1.0.
  • It assumes that the field of view is 90 degrees with a 2:1 aspect ratio,
  • OR it assumes that the field of view is 60 degrees with a 4:3 aspect ratio (see below).

    // The lodev tutorial uses a 60 degree field of view and a 640x480 screen.
    // Let's compute the pixels per projection plane unit:
    pixels_per_proj_plane_unit == (640 / 2) / 0.66 == 484.84... ~= 480
    // HUH, the result is almost exactly the screen_height, making the formula
    // in the tutorial approximately correct.

In general, the tutorial uses default values of 1.0 in many places to simplify the code, but this comes at the cost of not fully explaining how or why things work. The screen size and its relationship to the projection is another thing left unexplained.

The point of all of this

So why explain all of that? Am I just here to attack someone's tutorial? No, I actually like the tutorial a lot; it and the permadi.com tutorial are both good resources for raycasting. As far as I can tell, they are also the only ones, as the other tutorials I have found are mostly re-hashes of the same content.

The real reason is that, when you move to having variable floor and ceiling heights, along with other features, you end up having to change so much that the tutorials don't help enough.

If I get around to it, I'd like to write a new raycasting tutorial that incorporates all the things I think are missing from these two.

Some thoughts on ray casting

A good approach to thinking about raycasting is to consider it a process that is half ray tracing and half rasterization.

When ray tracing (assuming you are just using textured voxels, not doing lighting, etc.), you do not need to do any rasterization or coordinate transforms. You cast rays through the projection plane and color the corresponding pixel based on what each ray hits.

When doing full rasterization, you transform all of the vertices of objects into camera space and get a perspective projection of them to render.

In raycasting you do a little of both of these things. The X/Y plane of the world and the X column of the screen/projection are handled by the ray. The ray searches for anything of interest, and on a hit, the distance along the ray and the height information about what was hit are used to rasterize part of a column. So when you draw wall and floor textures in a column, it is best to reason about this using coordinate transforms and projection, only from two dimensions to one (height + distance to screen Y) instead of three to two (X, Y, Z to screen X, Y).

What next

The next task, before adding more graphics features like sprites and basic lighting, is to get a map editor and the ability to edit the map "in-game". It will be hard to test later features with the arbitrarily generated mess of a map currently in use. (For instance, right now the floor and ceiling heights are set by testing bit masks on the tile coordinates; there is no stored height info per tile yet.)

Prev Entry

Back to Index