On 08 May, 21:53, tevador tevador@gmail.com wrote:
In particular, the following parameters should be set differently from Monero:
RANDOMX_ARGON_SALT = "RandomX-TOR-v1"
The unique RandomX salt means we do not need to use a separate salt as PoW input as specified in § 3.2.
RANDOMX_ARGON_ITERATIONS = 1
RANDOMX_CACHE_ACCESSES = 4
RANDOMX_DATASET_BASE_SIZE = 1073741824
RANDOMX_DATASET_EXTRA_SIZE = 16777216
These 4 changes reduce the RandomX Dataset size to ~1 GiB, which allows the number of iterations to be reduced from 8 to 4. The combined effect is that Dataset initialization becomes 4 times faster, which is needed because the seed is updated more frequently (Monero updates once per ~3 days).
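As a rough sanity check of these numbers, here is a minimal Python sketch. The Tor values are the ones listed above. The Monero values are what I understand to be the RandomX defaults and are not stated in this message, so treat them as an assumption. The cost model (initialization work proportional to Dataset size times cache accesses per item) is also a simplification.

```python
MIB = 1024 * 1024

# Values proposed above for RandomX-TOR.
TOR_DATASET_BASE_SIZE = 1073741824
TOR_DATASET_EXTRA_SIZE = 16777216
TOR_CACHE_ACCESSES = 4

# Assumed Monero defaults (not stated in this message).
MONERO_DATASET_BASE_SIZE = 2147483648
MONERO_DATASET_EXTRA_SIZE = 33554368
MONERO_CACHE_ACCESSES = 8


def dataset_bytes(base_size, extra_size):
    """Total Dataset size in bytes."""
    return base_size + extra_size


def init_work(base_size, extra_size, cache_accesses):
    """Simplified cost model: bytes to fill times cache accesses per item."""
    return dataset_bytes(base_size, extra_size) * cache_accesses


tor_dataset_mib = dataset_bytes(TOR_DATASET_BASE_SIZE, TOR_DATASET_EXTRA_SIZE) // MIB

init_speedup = (
    init_work(MONERO_DATASET_BASE_SIZE, MONERO_DATASET_EXTRA_SIZE, MONERO_CACHE_ACCESSES)
    / init_work(TOR_DATASET_BASE_SIZE, TOR_DATASET_EXTRA_SIZE, TOR_CACHE_ACCESSES)
)

print(tor_dataset_mib)         # Dataset size in MiB
print(round(init_speedup, 2))  # estimated initialization speedup
```

Under these assumptions the Dataset comes out at 1040 MiB, and halving both the size and the cache accesses gives the factor of about 4.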
RANDOMX_PROGRAM_COUNT = 2
RANDOMX_SCRATCHPAD_L3 = 1048576
Additionally, reducing the number of programs from 8 to 2 makes the hash calculation about 4 times faster, while still providing resistance against program filtering strategies (see [REF_RANDOMX_PROGRAMS]). Since there are 4 times fewer writes, we also have to reduce the scratchpad size. I suggest using a 1 MiB scratchpad as a compromise between scratchpad write density and memory hardness. Most x86 CPUs will perform roughly the same with a 512 KiB and a 1024 KiB scratchpad, while the larger size provides higher resistance against specialized hardware, at the cost of possible time-memory tradeoffs (see [REF_RANDOMX_TMTO] for details).
Lastly, we reduce the output of RandomX to just 8 bytes:
RANDOMX_HASH_SIZE = 8
64-bit preimage security is more than sufficient for proof-of-work and it allows the result to be treated as a little-endian encoded unsigned integer for easy effort calculation.
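To make the effort calculation concrete, here is a minimal Python sketch. The formula effort = floor((2^64 - 1) / result) is an assumption on my part: it is not spelled out in this message, but it is consistent with the sample Result and Effort values in the benchmark output later in this message. The function name and the handling of a zero result are hypothetical.

```python
UINT64_MAX = 2**64 - 1


def effort_from_result(result: bytes) -> int:
    """Interpret an 8-byte RandomX-TOR result as a little-endian uint64
    and convert it to an effort value (smaller result = higher effort)."""
    if len(result) != 8:
        raise ValueError("expected an 8-byte result")
    value = int.from_bytes(result, "little")
    if value == 0:
        # Arbitrary choice for this sketch: treat zero as the maximum effort.
        return UINT64_MAX
    # Assumed formula, chosen to match the benchmark output.
    return UINT64_MAX // value


# Sample result taken from the benchmark output in this message.
sample = bytes.fromhex("d947ceff08750300")
print(effort_from_result(sample))  # 18956
```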
I have implemented this in the tor-pow branch of the RandomX repository:
https://github.com/tevador/RandomX/tree/tor-pow
Namely, I have changed the API to return the hash value as a uint64_t and made corresponding changes in the benchmark.
Benchmark example:
    ./randomx-benchmark --mine \
                        --avx2 \
                        --jit \
                        --largePages \
                        --nonces 10000 \
                        --seed 1234 \
                        --init 1 \
                        --threads 1 \
                        --batch

    RandomX-TOR-v1 benchmark
     - Argon2 implementation: AVX2
     - full memory mode (1040 MiB)
     - JIT compiled mode
     - hardware AES mode
     - large pages mode
     - batch mode
    Initializing (1 thread) ...
    Memory initialized in 5.32855 s
    Initializing 1 virtual machine(s) ...
    Running benchmark (10000 nonces) ...
    Performance: 2535.43 hashes per second
    Best result:
      Nonce: 8bc3ded34d2dcdeed9000000f95cd20c
      Result: d947ceff08750300
      Effort: 18956
      Valid: 1
At the end, it prints out the nonce that gives the highest effort value and validates it.
For the actual implementation in TOR, the RandomX validator should run in a separate thread that doesn't do anything else apart from validation and moving valid requests into the Intro Queue. This way we can reach the maximum performance of ~2000 processed requests per second.
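The threading model described above could look roughly like the following Python sketch. It only illustrates the structure (one dedicated thread that validates and forwards requests); the names run_validator, pending and verify are hypothetical, and verify is a stand-in for the actual RandomX check. A plain FIFO queue is used for the Intro Queue here purely for simplicity.

```python
import queue
import threading


def run_validator(pending, intro_queue, verify):
    """Dedicated validator loop: check each request and forward only the
    valid ones to the Intro Queue. Stops when it receives None."""
    while True:
        request = pending.get()
        if request is None:
            break
        if verify(request):
            intro_queue.put(request)


def demo():
    pending = queue.Queue()
    intro_queue = queue.Queue()

    # Stand-in for RandomX verification: accept requests whose claimed
    # effort meets a hypothetical minimum.
    def verify(request):
        return request["effort"] >= 1000

    worker = threading.Thread(
        target=run_validator, args=(pending, intro_queue, verify), daemon=True
    )
    worker.start()

    for effort in (500, 18956, 999, 2000):
        pending.put({"effort": effort})
    pending.put(None)  # sentinel: no more requests
    worker.join()

    accepted = []
    while not intro_queue.empty():
        accepted.append(intro_queue.get()["effort"])
    return accepted


print(demo())
```

Keeping the validator on its own thread means slow or invalid requests never block whatever consumes the Intro Queue.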
Finally, here are some disadvantages of RandomX-TOR:
1) Fast verification requires ~1 GiB of memory. If we decide to use two overlapping seed epochs, each service will need to allocate >2 GiB of RAM just to verify the PoW. Alternatively, it is possible to use the slow mode, which requires only 256 MiB per seed, but runs 4x slower.
2) The fast mode needs about 5 seconds to initialize every time the seed is changed (can be reduced to under 1 second using multiple threads). The slow mode needs about 0.1 seconds to initialize.
3) RandomX includes a JIT compiler for maximum performance. The iOS operating system doesn't support JIT compilation, so RandomX runs about 10x slower there.
4) The JIT compiler in RandomX is currently implemented only for x86-64 and ARM64 CPU architectures. Other architectures will run very slowly (especially 32-bit systems). However, the two supported architectures cover the vast majority of devices, so this should not be an issue.
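The memory and speed tradeoff in item 1 can be worked out with a short Python sketch. The figures are taken from this message (1040 MiB and 2535.43 hashes per second from the benchmark output, 256 MiB and the 4x slowdown from item 1); the slow-mode rate is an estimate derived from them, not a measurement.

```python
FAST_MODE_MIB = 1040           # "full memory mode (1040 MiB)" in the benchmark
SLOW_MODE_MIB = 256            # slow mode, per seed
FAST_HASHES_PER_SEC = 2535.43  # single-thread benchmark result
SLOW_MODE_SLOWDOWN = 4         # slow mode runs about 4x slower


def verifier_memory_mib(per_seed_mib, seed_epochs):
    """Memory needed to keep one verifier state per active seed epoch."""
    return per_seed_mib * seed_epochs


fast_two_epochs = verifier_memory_mib(FAST_MODE_MIB, 2)
slow_two_epochs = verifier_memory_mib(SLOW_MODE_MIB, 2)
slow_rate_estimate = FAST_HASHES_PER_SEC / SLOW_MODE_SLOWDOWN

print(fast_two_epochs)            # MiB for two seeds in fast mode
print(slow_two_epochs)            # MiB for two seeds in slow mode
print(round(slow_rate_estimate))  # estimated hashes per second in slow mode
```

With two overlapping epochs, fast mode needs 2080 MiB (just over 2 GiB), while slow mode needs 512 MiB at an estimated verification rate of roughly 630 hashes per second.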