Also adds a new possible texture layout and API trait to support a particular behavior in D3D12 where only the COMMON layout is supported in copy queues. Fixes#98158.
- Implements asynchronous transfer queues from PR #87590.
- Adds ubershaders that can run with specialization constants specified as push constants.
- Pipelines with specialization constants can compile in the background.
- Added monitoring for pipeline compilations.
- Materials and shaders can now be created asynchronously on background threads.
- Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
PR #90993 added several debugging utilities.
Among them, advanced memory tracking through the use of custom
allocators and VK_EXT_device_memory_report.
However as issue #95967 reveals, it is dangerous to leave it on by
default because drivers (or even the Vulkan loader) can too easily
accidentally break custom allocators by allocating memory through std
malloc but then request us to deallocate it (or viceversa).
This PR fixes the following problems:
- Adds --extra-gpu-memory-tracking cmd line argument
- Adds missing enum entries to
RenderingContextDriverVulkan::VkTrackedObjectType
- Adds RenderingDevice::get_driver_and_device_memory_report
- GDScript users can easily check via print(
RenderingServer.get_rendering_device().get_driver_and_device_memory_report()
)
- Uses get_driver_and_device_memory_report on device lost for appending
further info.
Fixes#95967
Features:
- Debug-only tracking of objects by type. See
get_driver_allocs_by_object_type et al.
- Debug-only Breadcrumb info for debugging GPU crashes and device lost
- Performance report per frame from get_perf_report
- Some VMA calls had to be modified in order to insert the necessary
memory callbacks
Functionality marked as "debug-only" is only available in debug or dev
builds.
Misc fixes:
- Early break optimization in RenderingDevice::uniform_set_create
============================
The work was performed by collaboration of TheForge and Google. I am
merely splitting it up into smaller PRs and cleaning it up.
Enables support for enhanced barriers if available.
Gets rid of the implementation of [CROSS_FAMILY_FALLBACK] in the D3D12 driver. The logic has been reimplemented at a higher level in RenderingDevice itself.
This fallback is only used if the RenderingDeviceDriver reports the API traits and the capability of sharing texture formats correctly. Aliases created in this way can only be used for sampling: never for writing. In most cases, the formats that do not support sharing do not support unordered access/storage writes in the first place.
* Replaces `find(...) != -1` with `contains` for `String`
* Replaces `find(...) == -1` with `!contains` for `String`
* Replaces `find(...) != -1` with `has` for containers
* Replaces `find(...) == -1` with `!has` for containers
It used to warn when opening a new project because no cache pre-exists,
which isn't particularly helpful.
Also include the rendering method in the cache filename, as it differs
between Forward+ and Mobile for a same GPU.
* Do not print empty line when header is disabled
* Do not print Vulcan header
* Also add "Print header" project setting (default On)
(suggested by @kaissouDev)
* Add docs for the project setting
(with suggestions by @Mickeon and @akien-mga)
Co-authored-by: Micky <66727710+Mickeon@users.noreply.github.com>
Co-authored-by: A Thousand Ships <96648715+AThousandShips@users.noreply.github.com>
Co-authored-by: Rémi Verschelde <rverschelde@gmail.com>
Adds a new system to automatically reorder commands, perform layout transitions and insert synchronization barriers based on the commands issued to RenderingDevice.
A new `Math::division_round_up()` function was added, allowing for easy
and correct computation of integer divisions when the result needs to
be rounded up.
Fixes#80358.
Co-authored-by: Rémi Verschelde <rverschelde@gmail.com>
Credit and thanks to @bruzvg for multiple build fixes, update of 3rd-party items and MinGW support.
Co-authored-by: bruvzg <7645683+bruvzg@users.noreply.github.com>
This allows Godot to automatically compress meshes to save a lot of bandwidth.
In general, this requires no interaction from the user and should result in
no noticable quality loss.
This scheme is not backwards compatible, so we have provided an upgrade
mechanism, and a mesh versioning mechanism.
Existing meshes can still be used as a result, but users can get a
performance boost by reimporting assets.
- Add compatibility methods for `RenderingDevice::shader_create_from_bytecode`
and `CodeEdit::get_text_for_symbol_loopup`.
- Silence errors which now have compatibility methods.
- Acknowledge GraphEdit/GraphNode compat breakage, intended and WIP.
This allows us to specify a subset of variants to compile at load time and conditionally other variants later.
This works seamlessly with shader caching.
Needed to ensure that users only pay the cost for variants they use
* This should optimize GDScript function calling _enormously_.
* It also should simplify the GDScript VM considerably.
NOTE: GDExtension calling performance has most likely been affected until going via ptrcall is fixed.
As many open source projects have started doing it, we're removing the
current year from the copyright notice, so that we don't need to bump
it every year.
It seems like only the first year of publication is technically
relevant for copyright notices, and even that seems to be something
that many companies stopped listing altogether (in a version controlled
codebase, the commits are a much better source of date of publication
than a hardcoded copyright statement).
We also now list Godot Engine contributors first as we're collectively
the current maintainers of the project, and we clarify that the
"exclusive" copyright of the co-founders covers the timespan before
opensourcing (their further contributions are included as part of Godot
Engine contributors).
Also fixed "cf." Frenchism - it's meant as "refer to / see".
Adds a FramebufferCache singletion that operates the same way as UniformSetCache.
Allows creating framebuffers on the fly (and keep them cached if re-requested) such as:
```C++
RID fb = FramebufferCache::get_singleton()->get_cache(texture1,texture2);
```
On the only platform where PVRTC is supported (iOS),
ETC2 generally supersedes PVRTC in every possible way. The increased
memory usage is not really a problem thanks to modern iOS' devices
processing power being higher than its Android counterparts.
This can be used to distinguish between integrated, dedicated, virtual
and software-emulated GPUs. This in turn can be used to automatically
adjust graphics settings, or warn users about features that may run
slowly on their hardware.