
Converting glTF to X3D

Michalis Kamburelis edited this page Feb 21, 2023 · 34 revisions


What is this?

This document reflects my (Michalis) experience of implementing glTF in Castle Game Engine. We load glTF files in Castle Game Engine, and convert them internally to X3D nodes before doing anything substantial (like rendering or animating). So I needed to express every idea from glTF (that I want to support in CGE) as some X3D node/field construction.

And I wanted to have lots of glTF features: Physically-Based Rendering, animations (skinned and not skinned), lights, cameras etc.

The unit that implements the conversion is here: X3DLoadInternalGltf source code. If anything in this document is not clear, just go there and read the actual source.

I welcome feedback from other browsers about how they implement glTF. I have collected some information about X3DOM parts on the Binary meshes page. You can write e.g. on the x3d-public mailing list and I will try to incorporate it here.

Two most important accompanying documents for this:

Who is this document for?

  1. For people using glTF in conjunction with X3D. It may be helpful to you to know what happens under the hood when you do Inline { url "model.gltf" } in X3D.

    E.g. it is useful to know that each glTF animation is a TimeSensor in X3D, and you can control it from X3D. We demonstrate this case in skinned_anim_run_animations_from_x3d.x3dv (to test it, download demo models and open blender/skinned_animation/skinned_anim_run_animations_from_x3d.x3dv with view3dscene).

  2. For browser implementors that already support X3D, and want to add glTF. This means you probably want to implement something similar to what I'm doing.

Sample glTF files

Meshes

glTF "mesh" is a collection of glTF "primitives". In X3D this is just a Group of Shape nodes.

Most glTF primitive modes translate naturally to X3D:

  • glTF Triangles -> X3D [Indexed]TriangleSet (indexed or not, depending on whether indices are provided in glTF)
  • glTF TriangleStrip -> X3D [Indexed]TriangleStripSet
  • glTF TriangleFan -> X3D [Indexed]TriangleFanSet
  • glTF LineStrip -> X3D [Indexed]LineSet
  • glTF Points -> X3D PointSet (note that X3D has no IndexedPointSet; it could make sense for consistency, but its usefulness would probably be very low; for now CGE just ignores the indices of the glTF Points primitive, so we possibly display more points)

Careful: glTF Lines do not naturally map to X3D [Indexed]LineSet. glTF Lines are like the OpenGL GL_LINES primitive, i.e. 2 vertices for each line. X3D [Indexed]LineSet is more like a number of OpenGL GL_LINE_STRIP primitives.
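Without any extensions, glTF Lines can still be expressed with a standard IndexedLineSet, by turning each index pair into a 2-vertex polyline separated by -1 in coordIndex. A Python sketch of this idea (the function name is illustrative):

```python
def lines_to_coord_index(indices):
    """Convert glTF Lines indices (a flat list, 2 per segment) to an X3D
    IndexedLineSet coordIndex, where -1 terminates each polyline."""
    out = []
    for i in range(0, len(indices) - 1, 2):
        out.extend([indices[i], indices[i + 1], -1])
    return out
```

The downside is a larger index array (3 entries per segment instead of 2).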

In CGE/view3dscene, we have introduced the [Indexed]LineSet.mode field (see docs: https://castle-engine.io/apidoc/html/X3DNodes.html#TLineMode ):

  • STRIP (default): results in X3D spec behavior, like GL_LINE_STRIP.
  • LOOP: similar to STRIP, but each polyline is automatically closed, so it's like GL_LINE_LOOP.
  • PAIR: results in "each 2 vertexes form a line", like GL_LINES.

In effect we can also handle:

  • glTF Lines -> X3D [Indexed]LineSet with mode = PAIR
  • glTF LineLoop -> X3D [Indexed]LineSet with mode = LOOP
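The primitive mapping above can be summarized in code. A Python sketch, using the numeric primitive.mode values from the glTF 2.0 specification (the dictionary name and tuple layout are illustrative; the second tuple element is the CGE line mode extension value, when applicable):

```python
# glTF 2.0 primitive.mode values (per the spec; 4, TRIANGLES, is the default)
# mapped to X3D geometry. The "Indexed" prefix applies when the glTF
# primitive provides indices.
GLTF_MODE_TO_X3D = {
    0: ("PointSet", None),          # POINTS (indices ignored in CGE)
    1: ("LineSet", "PAIR"),         # LINES (needs the CGE mode extension)
    2: ("LineSet", "LOOP"),         # LINE_LOOP (needs the CGE mode extension)
    3: ("LineSet", "STRIP"),        # LINE_STRIP (default X3D behavior)
    4: ("TriangleSet", None),       # TRIANGLES
    5: ("TriangleStripSet", None),  # TRIANGLE_STRIP
    6: ("TriangleFanSet", None),    # TRIANGLE_FAN
}
```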

Most vertex attributes and texture parameters have a straightforward translation.

For explicit tangent information, CGE has the extension Tangent node.

Transformations and their animations

glTF "node" is X3D Transform.

glTF samplers that animate transformations with Linear interpolation can be expressed in X3D perfectly using TimeSensor + PositionInterpolator (to animate translation/scale) or OrientationInterpolator (to animate rotation).
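For example, a glTF sampler that linearly animates a node translation could become X3D like this (the key times and values here are made up for illustration):

```
DEF Timer TimeSensor {
  cycleInterval 2.0
  loop TRUE
}
DEF TranslationInterp PositionInterpolator {
  key [ 0 0.5 1 ]
  keyValue [ 0 0 0, 1 0 0, 0 0 0 ]
}
DEF AnimatedNode Transform {
  children Shape { geometry Box { } }
}
ROUTE Timer.fraction_changed TO TranslationInterp.set_fraction
ROUTE TranslationInterp.value_changed TO AnimatedNode.set_translation
```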

glTF samplers that animate transformations with Step or CubicSpline could be expressed in X3D:

  • By simulating them using existing X3D linear interpolation.

    • For Step, you can just duplicate appropriate keys and values. It's not efficient (you'll have 2x more time points), but it is correct.
    • For CubicSpline, you can calculate a number of values in-between to approximate the curve e.g. by 10 points. This means you'll have more time points, and it is not fully precise (you'll approximate curve by a number of points), but in practice it works well for typical models.
  • Or add a field to X3D interpolators, like mode, with allowed values [LINEAR, STEP, CUBIC_SPLINE]. This is more efficient, and CGE is going in this direction.
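The simulation approach from the first bullet can be sketched in Python. The cubic formula is the Hermite interpolation from the glTF 2.0 animation specification; the function names and the 10-subdivision default are just illustrative:

```python
def cubic_spline_sample(t0, v0, out_tangent0, t1, in_tangent1, v1, s):
    """glTF CUBICSPLINE interpolation between two scalar keyframes,
    per the glTF 2.0 specification; s in [0, 1] is the normalized time."""
    td = t1 - t0
    s2, s3 = s * s, s * s * s
    return ((2 * s3 - 3 * s2 + 1) * v0
            + td * (s3 - 2 * s2 + s) * out_tangent0
            + (-2 * s3 + 3 * s2) * v1
            + td * (s3 - s2) * in_tangent1)

def cubic_spline_as_linear(t0, v0, out_tangent0, t1, in_tangent1, v1,
                           subdivisions=10):
    """Approximate one CUBICSPLINE segment by linear keyframes."""
    keys = [t0 + (i / subdivisions) * (t1 - t0)
            for i in range(subdivisions + 1)]
    values = [cubic_spline_sample(t0, v0, out_tangent0, t1, in_tangent1, v1,
                                  i / subdivisions)
              for i in range(subdivisions + 1)]
    return keys, values

def step_as_linear(keys, values):
    """Express STEP by duplicating keys (2x more time points, but correct)."""
    out_keys, out_values = [keys[0]], [values[0]]
    for i in range(1, len(keys)):
        out_keys += [keys[i], keys[i]]
        out_values += [values[i - 1], values[i]]
    return out_keys, out_values
```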

For now, Castle Game Engine

  • approximates CubicSpline by a Linear with more points (to simulate a curve).
  • as an extension, adds STEP mode for interpolators.

Extras (metadata)

glTF "extras" is a key->value dictionary to express "any additional data" at various places of the model. The idea is identical to X3D "metadata".

We convert a glTF object with extras by adding a MetadataSet to the relevant X3D node. The name of this MetadataSet is "ContainerForAllMetadataValues" (we have to invent some name for a MetadataSet that merely acts as a container). Then, as the value, we place a number of MetadataString, MetadataBoolean, MetadataDouble nodes that correspond to the glTF extras.

For example this glTF:

            "extras" : {
                "MyObjectProperty" : "object prop value",
                "FloatProperty" : 456.789
            },

-> gets converted to this X3D:

	metadata MetadataSet {
		name "ContainerForAllMetadataValues"
		value [
			MetadataString {
				name "MyObjectProperty"
				value "object prop value"
			}
			MetadataDouble {
				name "FloatProperty"
				value 456.78899999999999
			}
		]
	}
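The per-value type dispatch can be sketched as follows (the function name and return shape are hypothetical; only the X3D node names are from the spec). Note that in Python bool must be tested before int/float, since bool is an int subclass:

```python
def metadata_node_for(value):
    """Pick an X3D metadata node type for one glTF "extras" value."""
    if isinstance(value, bool):  # must come before the int/float check
        return ("MetadataBoolean", [value])
    if isinstance(value, (int, float)):
        return ("MetadataDouble", [float(value)])
    if isinstance(value, str):
        return ("MetadataString", [value])
    # nested objects / arrays would need a nested MetadataSet
    raise ValueError("unsupported extras value type: %r" % type(value))
```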

Cameras

A glTF node with a camera translates into an X3D OrthoViewpoint or Viewpoint node wrapped in a Transform. This is mostly straightforward.

X3D Viewpoint.fieldOfView is equal to glTF camera.perspective.yfov.

Remember when converting that the default X3D Viewpoint.position is 0 0 10; for a glTF camera you want to set it to 0 0 0.

glTF says that "+Y is up", which means that e.g. gravity should work in the -Y direction, regardless of the camera node transformation. But in X3D, the transformation of X3DViewpointNode changes the gravity vector. This can be solved by:

  • (complicated) an extra Transform node that "cancels out" the transformation around the viewpoint; then you specify the view using only the orientation field of the viewpoint. This requires calculating the accumulated rotation during conversion.

  • (simpler) in CGE, we just use the X3DViewpointNode.gravityTransform extension. Setting it to FALSE means that the gravity vector is not transformed by the viewpoint transformation.
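Putting this together, a glTF perspective camera node could translate to something like this (the transformation values are made up; gravityTransform is the CGE extension mentioned above):

```
Transform {
  # transformation taken from the glTF node hierarchy (example values)
  translation 0 2 5
  children Viewpoint {
    position 0 0 0          # override the X3D default "0 0 10"
    fieldOfView 1.0         # copied from glTF camera.perspective.yfov (radians)
    gravityTransform FALSE  # CGE extension: gravity stays along global -Y
  }
}
```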

Materials

  • The standard glTF pbrMetallicRoughness material should be converted to the X3D 4.0 PhysicalMaterial node. The names and interpretation of the fields base*, emissive*, metallicRoughness*, normalTexture are deliberately consistent between PhysicalMaterial and the glTF standard material, to make this a straightforward conversion. All texture data is treated the same way (the same channels are used for the same purposes, the same channels are ignored).

  • glTF materials specified with the KHR_materials_unlit extension should be converted to X3D 4.0 UnlitMaterial. Note that baseColor/baseTexture are converted to X3D emissiveColor/emissiveTexture. We are deliberately inconsistent with the glTF naming here (base -> emissive) because it is better: this color is really used like "emissive", and it allows X3DOneSidedMaterialNode to have emissive* fields that are inherited by all materials.

  • glTF materials specified with the KHR_materials_pbrSpecularGlossiness extension cannot, for now, be reliably converted to a standard X3D node. You can handle them by converting to pbrMetallicRoughness coefficients, but this is far from perfect. My X3DLoadInternalGltf source code has some code to convert them, but only on the CPU at loading time (so textures with SpecularGlossiness coefficients are ignored, which breaks the look of some models).

    X3DOM has a slightly different variant of PhysicalMaterial that seems to account for specular/glossiness, judging from the field names in the example on Binary meshes.

    glTF is backing off from KHR_materials_pbrSpecularGlossiness, recommending KHR_materials_specular instead.

  • We plan to introduce additional fields in the X3D PhysicalMaterial to support features consistent with glTF PBR extensions. There are 6 relevant PBR extensions now, see

Gamma Correction

While gamma correction is not something to take into account at conversion time (you don't need to convert nodes/colors differently), it is something to take into account when rendering.

Gamma correction is necessary to get the same rendering results as glTF. X3D does not specify whether to do gamma correction.

CGE does gamma correction by default only on PBR materials. This gives a good look for PBR materials coming from glTF, while keeping a backward-compatible look for Phong and unlit materials. To be precise:

  • by default gamma correction is enabled for PhysicalMaterial (regardless if it comes from glTF or explicit X3D),
  • by default it is disabled for Material and UnlitMaterial (again, regardless if it comes from glTF or explicit X3D).

This is not perfect (for 100% glTF compatibility one should enable it always, so also on UnlitMaterial -- yes, it has an effect on how the emissive color field is processed). But this default seems best: it does "what authors expect" while "not breaking a lot of existing models" (we already use UnlitMaterial a lot internally in CGE, and the Phong Material is used in almost all existing VRML/X3D models).

For 100% glTF correctness, you should use gamma on all possible glTF materials (so PBR and unlit). In CGE, the user can set GammaCorrection := gcAlways to achieve this.
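The correction itself is the standard linear-to-sRGB transfer function applied to the final rendered color. A Python sketch of the math (CGE's actual shader code may differ in details):

```python
def linear_to_srgb(c):
    """Encode one linear color component (0..1) to sRGB.

    Standard sRGB transfer function; without this step, PBR output
    looks too dark compared to glTF reference viewers.
    """
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1.0 / 2.4) - 0.055
```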

X3DOM does gamma correction always, by default.

Future X3D specification may address this.

alphaMode

glTF allows specifying alphaMode, which forces the author to explicitly choose the alpha treatment: opaque, blend or mask (alpha-test).

X3D 4 now includes the Appearance.alphaMode field to express it too.

I strongly encourage all browsers to implement it.

This is a great feature IMHO, because auto-detecting this unavoidably fails in some complicated situations. The X3D 3 specification didn't say how to decide whether to use blending or alpha-testing. While some cases are easy to auto-detect (if Material.transparency is 0.75 then you probably want blending), other cases are harder (you need to analyze the texture contents to differentiate a yes/no alpha channel from a smooth alpha channel; and what do you do when multiple textures (using MultiTexture or various texture slots) indicate different blending?).

In Castle Game Engine we also had an older solution to this, Appearance.alphaChannel. It is now deprecated in favor of X3D 4 standard Appearance.alphaMode.

alphaCutoff

glTF alphaMode "MASK" comes with an alphaCutoff threshold. X3D 4 now includes the Appearance.alphaCutoff field to express it.

Note: X3DOM Appearance.alphaClipThreshold seems to provide a straightforward translation of this. (TODO: Not tested in X3DOM. Do linked X3DOM docs show good default (0.1)? CGE and glTF alphaCutoff is by default 0.5.)
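For example, a glTF material with "alphaMode": "MASK" could map to this X3D 4 Appearance (material content shortened; alphaCutoff shown with the glTF default):

```
appearance Appearance {
  alphaMode "MASK"  # from glTF material.alphaMode
  alphaCutoff 0.5   # glTF default; fragments with alpha below this are discarded
  material PhysicalMaterial {
    baseColor 1 1 1
  }
}
```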

Per-vertex colors

A glTF mesh can contain a COLOR_0 attribute. This can be translated to an X3D Color or ColorRGBA node (depending on whether the accessor type is vec3 or vec4), with a caveat: the X3D Color or ColorRGBA values replace the material color by default, while the glTF attributes multiply it.

In Castle Game Engine we introduced a mode field on X3DColorNode to address this.

SFString  []  mode  "REPLACE"  # allowed values: ["REPLACE","MODULATE"]
  • "REPLACE" is default, and is compatible with X3D 3.
  • "MODULATE" means to multiply the per-vertex colors with the value that "REPLACE" would replace (like Material.diffuseColor or PhysicalMaterial.baseColor or UnlitMaterial.emissiveColor, with alpha taken from XxxMaterial.transparency).

See https://castle-engine.io/x3d_implementation_rendering_extensions.php#section_ext_color_mode .

So when you import glTF, simply set mode to "MODULATE" on the Color / ColorRGBA node, to get the behavior required by glTF.
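For example (the geometry values are made up for illustration):

```
geometry IndexedTriangleSet {
  index [ 0 1 2 ]
  coord Coordinate { point [ 0 0 0, 1 0 0, 0 1 0 ] }
  color ColorRGBA {
    color [ 1 0 0 1, 0 1 0 1, 0 0 1 1 ]
    mode "MODULATE"  # CGE extension: multiply per-vertex colors with the material color
  }
}
```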

Texture coordinates

glTF says that vertical texture coordinates 0..1 go from top to bottom.

X3D, like OpenGL, says that vertical texture coordinates 0..1 go from bottom to top.

There are various possible ways to reconcile this.

  • In Castle Game Engine I introduced flipVertically field for this purpose. It is set to TRUE for all texture nodes created when importing glTF. This allows me to forget about this problem later (in shader code), and I don't need to process texture coordinates. I only need to flip image vertically at loading, which can be done in zero time (because many graphic formats, like PNG, actually already store the data from bottom to top).

  • X3DOM just flips the Y texture coordinate in the shader for PhysicalMaterial. This is simple, but it also assumes that PhysicalMaterial always comes from a glTF model. This is not true in X3D 4, where PhysicalMaterial "stands on its own" -- X3D authors may use it independently from glTF.

  • You could also use TextureTransform to achieve this, i.e. flip texture coordinates.

    appearance Appearance {
      textureTransform TextureTransform {
        translation 0 -1
        scale 1 -1
      }
      ...
    }
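If you prefer to process the texture coordinates themselves instead, the conversion is just t -> 1 - t. A Python sketch (the function name is illustrative):

```python
def flip_texcoords(tex_coords):
    """Convert glTF texture coords (origin at top-left) to X3D convention
    (origin at bottom-left). tex_coords: list of (s, t) pairs in 0..1."""
    return [(s, 1.0 - t) for (s, t) in tex_coords]
```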

Skinned mesh animation

TODO: work in progress.

  • In Castle Game Engine we simply read the glTF skinned animation data and "unpack" it at loading time, using the CPU, into TimeSensor + CoordinateInterpolator. This means that at runtime we just do a CoordinateInterpolator animation, not skinned mesh animation anymore. This is not the final solution, although in practice it works very nicely:

    • It is very efficient, even on large models (since CoordinateInterpolator is so simple, it is very nicely optimized, even though it means we update the vertex data on the GPU every frame).
    • The loading time (when we calculate CoordinateInterpolator) isn't a practical problem.
    • It cooperates nicely with animation blending.
    • The bones can still be animated, to attach additional objects to bones, e.g. attach a weapon to the animated hand.

    Still, there are some big drawbacks:

    • You can no longer transform bones (just Transform nodes) to modify the skin at runtime. I mean, you can move bones (translate, rotate) at runtime, but it has no effect on the skinned mesh, since its animation is now expressed as a CoordinateInterpolator and is already calculated. So you cannot do procedural animation, e.g. you cannot do inverse kinematics at runtime. You can only play the animations that were designed.
    • The memory use of long-running animations is significant. As we precalculate positions, normal vectors, and (in case of bump mapping) tangent vectors for all keyframes, the memory usage is non-trivial when the animation is long and the model is high-poly. We log a message when it exceeds 10 MB.

    In CGE we have also implemented H-Anim, which is the X3D way of doing skinned mesh animation. However, our implementation of H-Anim is not optimized. It moves bones at runtime, but on the CPU (not the GPU), and this is slow for non-trivial models. Contrary to glTF, it is not obvious how to implement H-Anim on the GPU; likely we should calculate "inverse bind matrices" from the H-Anim nodes (thus essentially converting H-Anim -> glTF animation) and then follow the glTF skinned animation approach to make it suitable for the GPU.

    The future:

    • We must convert glTF skinned mesh animation into some X3D nodes: either H-Anim nodes (with additional information to preserve the "inverse bind matrices", not calculate them again when not needed), or some new node like a SkinAnimation node (that would be designed to match the glTF approach easily). These nodes should allow straightforward conversion from glTF, and efficient playback of the animation on the GPU.

      glTF animation data already leans extremely nicely toward GPU calculation, and we definitely want to use that. I.e. the pipeline "glTF -> X3D nodes -> rendering" must preserve the "inverse bind matrices" information and the nice GPU-friendly layout. It would be bad if, in the middle of this pipeline, we had to lose and then recalculate the data to make it efficient.

    • Eventually we want to also speed up the existing H-Anim implementation. Whether this happens depends on user needs.

      If any major 3D authoring software becomes capable of exporting to H-Anim, then it will have higher priority. Otherwise it may remain at lower priority, I'm afraid. By "major 3D authoring software" I mean this trio: Blender, 3ds Max, Maya (following Unity, Unreal, CGE and Babylon docs). Right now none of them has any support for exporting to H-Anim, as far as I know.

    As an optimization, in CGE we also use Shape.collision to make the animated shape collide as a bounding box, and an animated Shape.bbox to make it reflect the current animation. This makes the animation faster (no need to recalculate bounding boxes when the shape is changing).

  • In X3DOM, Andreas Plesch started investigating how to convert glTF skinned animation into H-Anim. It isn't finished (and so is not yet actually implemented in X3DOM), but should be a great starting point to resume. Thank you for documenting it! The links:
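The "unpack at loading time" step in the CGE approach boils down to evaluating the glTF skinning equation on the CPU, for every vertex of every sampled frame. A self-contained sketch (matrix helpers and names are illustrative, not CGE API):

```python
# glTF skinning equation for one vertex:
#   p' = ( sum_i  weight_i * globalJointTransform_i * inverseBindMatrix_i ) * p
# Matrices are 4x4 nested lists, row-major.

def mat_mul(a, b):
    # 4x4 matrix product a * b
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def mat_add_scaled(acc, m, s):
    # acc + m * s, elementwise
    return [[acc[r][c] + m[r][c] * s for c in range(4)] for r in range(4)]

def transform_point(m, p):
    # apply a 4x4 matrix to a 3D point (w = 1)
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3]
                 for r in range(3))

def skin_position(position, joints, weights, global_joint, inverse_bind):
    """position: (x, y, z); joints: up to 4 joint indices (glTF JOINTS_0);
    weights: matching weights (glTF WEIGHTS_0, should sum to 1)."""
    skin = [[0.0] * 4 for _ in range(4)]
    for j, w in zip(joints, weights):
        if w == 0.0:
            continue  # unused joint slot
        skin = mat_add_scaled(skin, mat_mul(global_joint[j], inverse_bind[j]), w)
    return transform_point(skin, position)
```

Doing this for every sampled frame of every animation yields the CoordinateInterpolator keyValue arrays; at runtime only linear interpolation remains.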

Punctual lights

glTF punctual lights mostly map nicely to X3D PointLight, SpotLight, DirectionalLight.

The lighting equations follow X3D: basically "sum material.emissive + contribution from each light".
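For example, a KHR_lights_punctual point light could become something like this (values are illustrative; note that glTF light intensity uses physical units, candela, so some scaling may be needed for the X3D intensity field):

```
PointLight {
  color 1 1 1    # from light.color
  intensity 1.0  # glTF intensity is in candela; may need scaling to X3D range
  location 0 0 0 # the light is placed by the parent Transform (the glTF node)
  radius 100     # from glTF light.range (X3D calls it radius)
}
```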

Image-based lighting

I made an initial implementation / specification sketch of the X3D EnvironmentLight node to express this. It is not 100% ready yet (neither the spec nor the implementation in CGE). Hopefully we will add it in a future X3D version :)

See Image Based Lighting (EnvironmentLight node).
