Before this change, a skeleton that was not updated every frame would
result in a difference of 2+ between last_change and frame index every
frame, which would disable the buffer rotation and set motion vectors to
zero. This results in significant visual artifacts for FSR2 that are
especially prominent on the characters that move together with the view
such as the main character in third person mode.
This is a significant problem for high refresh rate displays: at 120 Hz,
we are effectively guaranteed to skip skeleton updates every other frame
with skeleton update happening during physics processing, and the lack
of physics interpolation for skeletons. This happens by default in TPS
demo when FSR2 is enabled.
In other places where motion vectors are disabled, such as multi-mesh
and mesh rendering (where previous transform is updated), the logic
effectively allows for a single-frame gap in updates, because it
compares the frame where the update happened (which is the current frame
if updates are consistent) with the current frame, so the latency of 0
means "update just happened", but both multi-mesh and mesh transform
updates permit a latency of 1 as well.
Here, however, last_change is updated *after* the frame processing has
concluded, so a zero-latency update has a distance of 1. Allowing a
distance of 2 (latency 1) reduces the severity of the problem and aligns
the logic with transform updates.
Note that the problem will still happen when refresh rate is noticeably
higher than physics rate times 2. For example, it still happens at 240
Hz. However, a longer latency allowance is inconsistent with other
transforms and could lead to issues, so ideally long term physical
interpolation of skeleton transforms would completely solve this.
Use correct shadow sampling for omni and spot lights
Disable transmittance if shadows are disabled
Correct DirectionalLight transmittance bias to match shadow bias (its still pretty sensitive though)
* Servers now use WorkerThreadPool for background computation.
* This helps keep the number of threads used fixed at all times.
* It also ensures everything works on HTML5 with threads.
* And makes it easier to support disabling threads for also HTML5.
CommandQueueMT now syncs with the servers via the WorkerThreadPool
yielding mechanism, which makes its classic main sync semaphore
superfluous.
Also, some warnings about calls that kill performance when using
threaded rendering are removed because there's a mechanism that
warns about that in a more general fashion.
Co-authored-by: Pedro J. Estébanez <pedrojrulez@gmail.com>
Selecting "Generate AABB" on a 3D particle node in the editor would not work
and printed an error about incorrect buffer size if the particle shader used
one or more of the USERDATA build-ins.
- Supporting custom AABB on the MultiMesh resource itself allows us to prevent costly runtime AABB recalculations.
- Should also help improve CPU Particle performance.
Adds a new system to automatically reorder commands, perform layout transitions and insert synchronization barriers based on the commands issued to RenderingDevice.
A new `Math::division_round_up()` function was added, allowing for easy
and correct computation of integer divisions when the result needs to
be rounded up.
Fixes#80358.
Co-authored-by: Rémi Verschelde <rverschelde@gmail.com>
Credit and thanks to @bruzvg for multiple build fixes, update of 3rd-party items and MinGW support.
Co-authored-by: bruvzg <7645683+bruvzg@users.noreply.github.com>
This fixes a bugs where per-viewport samplers were being used for internal texture fetches (probes, sky, etc.).
This also fixes a bug when using multiple viewports in the same scene.
This also fixes a bug where the texture bias would override the bias from 3D scale.
This cleans up a few more cases of uint32_t->uint64_t
Importantly this fixes an edge case in the axis-angle compression by
using the pre-existing Basis methods instead
Fixes issue #83152. Due to how BLUR_0 is reused for multiple purposes and requires being at native resolution for some post-processing effects to work, FSR2 will use an alternate texture at internal size to use as the screen texture read by shaders instead. The rendering pipeline will prefer using this texture if it exists.
This is a longstanding issue in both the Mobile and GL Compatibility renderer.
Meshes pair with all lights that touch them, and then at draw time, we send all paired lights indices to the shader (even if that light isn't visible). The problem is that non-visible lights aren't uploaded to the GPU and don't have an index. So we end up using a bogus index
This allows Godot to automatically compress meshes to save a lot of bandwidth.
In general, this requires no interaction from the user and should result in
no noticable quality loss.
This scheme is not backwards compatible, so we have provided an upgrade
mechanism, and a mesh versioning mechanism.
Existing meshes can still be used as a result, but users can get a
performance boost by reimporting assets.
Introduces support for FSR2 as a new upscaler option available from the project settings. Also introduces an specific render list for surfaces that require motion and the ability to derive motion vectors from depth buffer and camera motion.
TAA + MSAA would make Godot request unnecessary flags for an MSAA
velocity texture. flags that were not even actually needed.
This was causing:
1. Unsupported GPUs to fail completely (e.g. Intel Arc 770)
2. Wrong codepaths to be followed (causing validation errors, possibly
crashes or glitches)
3. Unnecessary performance impact in all GPUs.
See
https://github.com/godotengine/godot/issues/71929#issuecomment-1722274359
Introduces a new structure to store samplers created with certain parameters instead of storing a 'custom' set of samplers. Allows viewports to correctly configure the mipmap bias and use it when rendering the scene.
Add the capability of resizing the transforms buffer for particles to be double its size and alternate where the current output is written to. Only works for particles that use index as their draw order.
Extends mesh instances that required custom vertex buffers to create two alternating buffers that are written to and binds them to use them as the previous vertex buffer when generating motion vectors.
There was an error in the other branch of the refactored function where the size of the array was not properly multiplied by the size of the float to check against the buffer size. This was only an error in the error-checking itself and not the functionality. There was also an error where the proper notification was not emitted whenever the buffer for the multimesh is recreated to invalidate the previous references the renderer might've created to it. This fixes CPU Particles getting corrupted when they're created without emission being enabled.
Direct buffer copies are required to perform certain operations more efficiently, as the only current alternative is to download the buffer to the CPU and upload it again. As the first use case, the new function is used when enabling motion vectors on multimeshes.
Fixes#67287. There was a subtle error where due to how enabling motion vectors for multi-meshes was handled, only the first instance would have a valid transforms buffer and the rest would point to an invalid buffer. This change moves over the responsibility of enabling motion vectors only when changes happen to the individual 3D transforms or the entire buffer itself. It also fixes an unnecessary download of the existing buffer that'd get overwritten by the current cache if it exists. Another fix is handling the case where the buffer was not set, and enabling motion vectors would not cause the buffer to be recreated correctly.
This is needed to allow 2D to fully make use of 3D effects (e.g. glow), and can be used to substantially improve quality of 2D rendering at the cost of performance
Additionally, the 2D rendering pipeline is done in linear space (we skip linear_to_srgb conversion in 3D tonemapping) so the entire Viewport can be kept linear.
This is necessary for proper HDR screen support in the future.
See Issue #69528. When building with precision=double, the TAA pass would break due to the motion vectors being corrupted. It was apparent the origin of the camera itself was corrupted in the UBO for the previous frame because the camera origin was only being split correctly for the current block but not for the previous block (to effectively support the double precision float on the shader).
Fixes bug where bounding box of 1 unit was used in some skinned models and had wrong LODs.
(this could become very large if the mesh is scaled, such as FBX conversions)
Also fixes a mistake in calcualting bone index.