This fixes UBSAN errors reported by running our testsuite, importing the
TPS demo, and running the TPS demo. I have tried, wherever possible, to
fix issues related to reported issues but not directly reported by UBSAN
because thse code paths just happened to not have been exercised in
these cases.
These fixes apply only to errors reported, and caused by, core/
The following things have been changed:
* Make sure there are no implicit sign changing casts in core.
* Explicitly type enums that are part of a public API such that users of
the API cannot pass in wrongly-sized values leading to potential stack
corruption.
* Ensure that memcpy is never called with invalid or null pointers as
this is undefined behavior, and when the engine is built with
optimizations turned on leads to memory corruption and hard to debug
crashes.
* Replace enum values only used as static values with constexpr static
const values instead. This has no runtime overhead. This makes it so
that the size of the enums is explicit.
* Make sure that nan and inf is handled consistently in String.
* Implement a _to_int template to ensure that all of the paths use the
same algorhithm, and correct the negative integer case.
* Changed the way the json serializer precision work, and added tests to
verify the new behavior. The behavior doesn't quite match master in
particulary for negative doubles as the original code tried to cast -inf
to an int. This then led to negative doubles losing all but one of
their decimal points when serializing. Behavior in GDScript remains
unchanged.
3205a92ad8 was a major commit which removed `PoolVector`, and replaced
most references to `PoolVector` with `Vector` instead. In most cases,
this was appropriate, given that `PoolVector` was being replaced with
`Vector`, as an effective generalist in 64-bit address space layouts.
However, vector.h itself was left with an artifact advising the reader
to use `Vector` instead of `Vector` for large arrays. While this led
to a fascinating deep dive, and hopefully improved some of the
documentation along the way, it's probably best to clean this up for
the next person.
This helps, for importers spitting out new resources to the res://
filesystem to actually hash them to generate deterministic UIDs.
This PR also fixes the determinism for translations.
This PR makes RID_Owner lock free for fetching values, this should give a very
significant peformance boost where used.
Some considerations:
* A maximum number of elements to alocate must be given (by default 256k).
* Access to the RID structure is still safe given they are independent from addition/removals.
* RID access was never really thread-safe in the sense that the contents of the data are not protected anyway. Each server needs to implement locking as it sees fit.
Refactored the following template classes by replacing runtime checks with compile-time checks using if constexpr for improved code clarity and maintainability:
- RID_Alloc
- SortArray
- PagedAllocator
Changes made:
- Updated conditional checks for THREAD_SAFE in the RID_Alloc class.
- Updated conditional checks for Validate in the SortArray class.
- Updated conditional checks for thread_safe in the PagedAllocator class.
ResourceLoader:
- Fix invalid tokens being returned.
- Remove no longer written `ThreadLoadTask::dependent_path` and the code reading from it.
- Clear deadlock hazard by keeping the mutex unlocked during userland polling.
WorkerThreadPool:
- Include thread call queue override in the thread state reset set, which allows to simplify the code that handled that (imperfectly) in the ResourceLoader.
- Handle the mutex type correctly on entering an allowance zone.
CommandQueueMT:
- Handle the additional possibility of command buffer reallocation that mutex unlock allowance introduces.
Update CowData::insert to call ptrw() before loop, to avoid calling _copy_on_write for each item in the array, as well as repeated index checks in set and get. For larger Vectors/Arrays, this makes inserts around 10x faster for ints, 3x faster for Strings, and 2x faster for Variants. Less of an impact on smaller Vectors/Arrays, as a larger percentage of the time is spent allocating.
Random-access access to `List` when iterating is `O(n^2)` (`O(n)` when
accessing a single element)
* Removed subscript operator, in favor of a more explicit `get`
* Added conversion from `Iterator` to `ConstIterator`
* Remade existing operations into other solutions when applicable
* Servers now use WorkerThreadPool for background computation.
* This helps keep the number of threads used fixed at all times.
* It also ensures everything works on HTML5 with threads.
* And makes it easier to support disabling threads for also HTML5.
CommandQueueMT now syncs with the servers via the WorkerThreadPool
yielding mechanism, which makes its classic main sync semaphore
superfluous.
Also, some warnings about calls that kill performance when using
threaded rendering are removed because there's a mechanism that
warns about that in a more general fashion.
Co-authored-by: Pedro J. Estébanez <pedrojrulez@gmail.com>
Adds fixed timestep interpolation to the rendering server (2D only).
Switchable on and off with a project setting (default is off).
Co-authored-by: lawnjelly <lawnjelly@gmail.com>
Following upgrades to `CowData` to 64 bit indices these helpers are no
longer able to handle the index ranges, possibly causing bugs on sort
and search.
Existing shadow caster culling takes no account of the camera.
This PR adds the highly encapsulated class RenderingLightCuller which can cut down the casters in the shadow volume to only those which can cast shadows on the camera frustum.
Just a little optimization.
**NOTE:**
With `RID_Owner` we could replace each pair of `PagedAllocator` and
`HashMap`-of-ids-to-pointers. However, that would force us to expose
`RID` as the task/group id, instead of `int`, which would break the
API. Too bad. Let's wait until Godot 5.0.
This commit lets CommandQueueMT play nicely with the WorkerThreadPool to avoid
non-progressable situations caused by an interdependence between both. While a
command queue is being flushed, it allows the WTP to release its lock while tasks
are being awaited so they can make progress in case they need in turn to post
to the command queue.
Copying of these types is unsafe and should be detected
Also removed unnecessary constructors for `TileMap` `DebugQuadrant` and
`RenderingQuadrant` which used copying of `SelfList::List`
* Node processing works on the concept of process groups.
* A node group can be inherited, run on main thread, or a sub-thread.
* Groups can be ordered.
* Process priority is now present for physics.
This is the first steps towards implementing https://github.com/godotengine/godot-proposals/issues/6424.
No threading or thread guards exist yet in most of the scene code other than Node. That will have to be added later.
As many open source projects have started doing it, we're removing the
current year from the copyright notice, so that we don't need to bump
it every year.
It seems like only the first year of publication is technically
relevant for copyright notices, and even that seems to be something
that many companies stopped listing altogether (in a version controlled
codebase, the commits are a much better source of date of publication
than a hardcoded copyright statement).
We also now list Godot Engine contributors first as we're collectively
the current maintainers of the project, and we clarify that the
"exclusive" copyright of the co-founders covers the timespan before
opensourcing (their further contributions are included as part of Godot
Engine contributors).
Also fixed "cf." Frenchism - it's meant as "refer to / see".
Android was the last platform to still attempt to disable RTTI (for binary
size), but both the Android editor and now the ICU library used by templates
need RTTI.
There could still be the possibility to support this for non-ICU template
builds (i.e. without the TextServerAdvanced module), but since this isn't one
of the build configurations we test regularly it's pretty risky to keep this
option only for that specific use case. And our code is already littered with
`dynamic_cast`s which weren't guarded with `!defined(NO_SAFE_CAST)`.
Adds a FramebufferCache singletion that operates the same way as UniformSetCache.
Allows creating framebuffers on the fly (and keep them cached if re-requested) such as:
```C++
RID fb = FramebufferCache::get_singleton()->get_cache(texture1,texture2);
```
This is not enabled by default in the core version for performance reasons,
as Vector/CowData are used in critical code paths where not zero'ing memory
which is going to be set later on can be important.
But for bindings / the scripting API, we make zero the new items by default
(which already happened for built types like Vector3, etc., but not for
trivial types like int, float).
Fixes#43033.
Co-authored-by: David Hoppenbrouwers <david@salt-inc.org>
Implement built-in classes Vector4, Vector4i and Projection.
* Two versions of Vector4 (float and integer).
* A Projection class, which is a 4x4 matrix specialized in projection types.
These types have been requested for a long time, but given they were very corner case they were not added before.
Because in Godot 4, reimplementing parts of the rendering engine is now possible, access to these types (heavily used by the rendering code) becomes a necessity.
**Q**: Why Projection and not Matrix4?
**A**: Godot does not use Matrix2, Matrix3, Matrix4x3, etc. naming convention because, within the engine, these types always have a *purpose*. As such, Godot names them: Transform2D, Transform3D or Basis. In this case, this 4x4 matrix is _always_ used as a _Projection_, hence the naming.
This PR implements a worked thread pool. It uses a fixed amount of threads in a pool and allows scheduling tasks
that can be run on threads (and then waited for). It satisfies the following use cases:
* HTML5 thread count is fixed (and similar restrictions are known in consoles) so we need to reuse threads.
* Thread spawning is slow in general, so reusing threads is faster anyway.
* This implementation supports recursive waiting for tasks, making it less prone to deadlocks if threads from the pool also run tasks.
After this is approved and merged, subsequent PRs will be needed to replace the ThreadWorkPool usage by this class.
Clean up and do fixes to hash functions and newly introduced murmur3 hashes in #61934
* Clean up usage of murmur3
* Fixed usages of binary murmur3 on floats (this is invalid)
* Changed DJB2 to use xor (which seems to be better)
* Map is unnecessary and inefficient in almost every case.
* Replaced by the new HashMap.
* Renamed Map to RBMap and Set to RBSet for cases that still make sense
(order matters) but use is discouraged.
There were very few cases where replacing by HashMap was undesired because
keeping the key order was intended.
I tried to keep those (as RBMap) as much as possible, but might have missed
some. Review appreciated!
Adds a new, cleaned up, HashMap implementation.
* Uses Robin Hood Hashing (https://en.wikipedia.org/wiki/Hash_table#Robin_Hood_hashing).
* Keeps elements in a double linked list for simpler, ordered, iteration.
* Allows keeping iterators for later use in removal (Unlike Map<>, it does not do much
for performance vs keeping the key, but helps replace old code).
* Uses a more modern C++ iterator API, deprecates the old one.
* Supports custom allocator (in case there is a wish to use a paged one).
This class aims to unify all the associative template usage and replace it by this one:
* Map<> (whereas key order does not matter, which is 99% of cases)
* HashMap<>
* OrderedHashMap<>
* OAHashMap<>
* Changed syntax usage for RD::Uniform to create faster with a single RID
* Converted render pass setup to use this in clustered renderer to test.
This is the first step into creating a proper uniform set cache system to simplify large parts of the codebase.
The same is done for `Vector` (and thus `Packed*Array`).
`begin` and `end` can now take any value and will be clamped to
`[-size(), size()]`. Negative values are a shorthand for indexing the array
from the last element upward.
`end` is given a default `INT_MAX` value (which will be clamped to `size()`)
so that the `end` parameter can be omitted to go from `begin` to the max size
of the array.
This makes `slice` works similarly to numpy's and JavaScript's.
Each file in Godot has had multiple contributors who co-authored it over the
years, and the information of who was the original person to create that file
is not very relevant, especially when used so inconsistently.
`git blame` is a much better way to know who initially authored or later
modified a given chunk of code, and most IDEs now have good integration to
show this information.
We prefer to prevent using chained assignment (`T a = b = c = T();`) as this
can lead to confusing code and subtle bugs.
According to https://en.wikipedia.org/wiki/Assignment_operator_(C%2B%2B), C++
allows any arbitrary return type, so this is standard compliant.
This could be re-assessed if/when we have an actual need for a behavior more
akin to that of the C++ STL, for now this PR simply changes a handful of
cases which were inconsistent with the rest of the codebase (`void` return
type was already the most common case prior to this commit).