-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support partial vector extension instructions #545
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmarks
Benchmark suite | Current: 7e23801 | Previous: edfa83e | Ratio |
---|---|---|---|
Dhrystone |
1306 Average DMIPS over 10 runs |
1316 Average DMIPS over 10 runs |
1.01 |
Coremark |
972.894 Average iterations/sec over 10 runs |
939.158 Average iterations/sec over 10 runs |
0.97 |
This comment was automatically generated by workflow using github-action-benchmark.
|
||
/* standard uncompressed instruction */ | ||
const uint32_t index = (insn & INSN_6_2) >> 2; | ||
static inline bool op_000000(rv_insn_t *ir, const uint32_t insn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
op_000000
looks misleading. Can you improve its naming scheme?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming scheme is based on the function6
field listed in riscv-v-spec/inst-table.adoc. Since each function6
may include OPI
, OPM
, or OPF
functions, often corresponding to unrelated operations. I chose to name them directly based on the function6
for consistency.
This might seem unclear without additional context. To improve clarity, I could add comments explaining the naming convention for each op_function6
. Would this address your concern?
Could we create an CI like using ROSCOF for this? |
I'm not familiar with ROSCOF, but I'll look into it and give it a try. |
Suggest using |
Thank you all for your feedback and suggestions! I will fix the typos, add a newline at the end of files, and remove any unnecessary elements. I also noticed that the current code does not fully meet the contributing guidelines, so I will make sure to address those issues. In addition, I will add more detailed comments in Since some of the code was misplaced from the beginning, and as @eleanorLYJ mentioned, there are non-compliant comments in an early commit, I’m considering I’d appreciate your guidance. Thank you! |
This comment was marked as resolved.
This comment was marked as resolved.
src/rv32_template.c
Outdated
} \ | ||
} | ||
|
||
#define VMV_LOOP(des, op1, op2, op, SHIFT, MASK, i, j, itr, vm) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may I ask where this is one used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The VMV_LOOP
macro is used in the implementation of vmv_v_i
(at src/rv32_template.c
, line 6366), as the riscv-vector-tests frequently utilize vmv_v_i
to clear bits in vector registers during each test. This serves as a quick implementation for the vmv_v_i
instruction.
Additionally, I noticed that the implementations of vmv_v_*
(representing vmv_v_v
, vmv_v_x
, and vmv_v_i
) can be refactored to reuse existing macros such as VV_LOOP
, VX_LOOP
, VI_LOOP
, and their _LEFT
variants (collectively referred to as V*_LOOP
and V*_LOOP_LEFT
). I will remove the VMV_LOOP
and related _LEFT
macros accordingly. Thank you for pointing this out!
#define op_sll(a, b) \ | ||
((a) << ((b) & ((8 << ((rv->csr_vtype >> 3) & 0b111)) - 1))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shift amount calculation in op_sll
macro may need bounds checking to prevent undefined behavior when shifting by amounts >= bit width. Consider adding explicit bounds check.
Code suggestion
Check the AI-generated fix before applying
#define op_sll(a, b) \ | |
((a) << ((b) & ((8 << ((rv->csr_vtype >> 3) & 0b111)) - 1))) | |
#define op_sll(a, b) do { \ | |
int _shift = (b) & ((8 << ((rv->csr_vtype >> 3) & 0b111)) - 1); \ | |
_shift = _shift >= sizeof(a) * 8 ? sizeof(a) * 8 - 1 : _shift; \ | |
((a) << _shift); } while(0) |
Code Review Run #005f29
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
uint8_t eew = decode_eew(insn); | ||
ir->eew = 8 << eew; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding error handling for invalid eew
values. Currently, if decode_eew()
returns an invalid value, it could lead to undefined behavior when calculating ir->eew = 8 << eew
.
Code suggestion
Check the AI-generated fix before applying
uint8_t eew = decode_eew(insn); | |
ir->eew = 8 << eew; | |
uint8_t eew = decode_eew(insn); | |
if (eew > 3) { | |
return false; | |
} | |
ir->eew = 8 << eew; |
Code Review Run #005f29
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
In |
Add decode stage for RISC-V "V" Vector extension instructions from version 1.0, excluding VXUNARY0, VRFUNARY0, VWFUNARY0, VFUNARY1, vmv<nr>r, and VFUNARY0. This commit focuses on the decode stage to ensure correct instructions parsing before proceeding to the execution stage. Verification is currently done through hand-written code. Modify Makefile to support VLEN configuration, via make ENABLE_EXT_V=1 VLEN=<value>. The default value for VLEN is set to 128. The current implementation only supports VLEN=128. Enabling ENABLE_EXT_V=1 will also enable ENABLE_EXT_F=1, as vector load/ store instruction shares the same opcode with load_fp and store_fp.
Add support for vset{i}vl{i} instructions following the RISC-V vector extension version 1.0. Simplify avlmax calculation by directly computing avlmax = lmul * vlen / sew instead of converting to floating-point as described in the specification.
Implement vle8_v, vle16_v, vle32_v, vse8_v, vse16_v, vse32_v. Using loop unrolling technique to handle a word at a time. The implementation assumes VLEN = 128. There are two types of illegal instructions: 1. When eew is narrower than csr_vl. Set vill in vtype to 1 and other bits to 0, set csr_vl to 0. 2. When LMUL > 1 and trying to access a vector register that is larger than 31. Use assert to handle this case.
To emulate vector registers of length VLEN using an array of uint32_t, we first handle different SEW values (8, 16, 32) using sew_*b_handler. Inside the handler, the V*_LOOP macro expands to process different VL values and operand types, along with its corresponding V*_LOOP_LEFT. The goal is to maximize code reuse by defining individual operations next to their respective vector instructions, which can be easily applied using the OPT() macro. V*_LOOP execution steps: 1. Copy the operand op1 (op2). 2. Align op1 to the right. 3. Perform the specified operation between op1 and op2. 4. Mask the result according to the corresponding SEW. 5. Shift the result left to align with the corresponding position. 6. Accumulate the result. In vector register groups, registers should follow the pattern v2*n, v2*n+1 when lmul = 2, etc. The current implementation allows using any vector registers except those exceeding v31. For vector masking, if the corresponding mask bit is 0, the value of the destination vector register is preserved. The process is as follows: 1. Copy the destination register. 2. Clear the bits corresponding to VL. 3. Store the computed result in ans. 4. Update the destination register with ans. If ir->vm == 0, vector masking is activated.
Code Review Agent Run #6921bbActionable Suggestions - 6
Additional Suggestions - 6
Review Details
|
Here’s the current status: Done:
Pending:
|
for (int i = 0; i < 4; i++) { | ||
rv->V[rv_reg_zero][i] = 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of vse64_v
appears to only clear the first 4 elements of rv_reg_zero
vector register without implementing the actual vector store operation. Consider implementing the full vector store functionality similar to vse32_v
.
Code suggestion
Check the AI-generated fix before applying
for (int i = 0; i < 4; i++) { | |
rv->V[rv_reg_zero][i] = 0; | |
} | |
uint8_t sew = 8 << ((rv->csr_vtype >> 3) & 0b111); | |
uint32_t addr = rv->X[ir->rs1]; | |
if (ir->eew > sew) { | |
rv->csr_vtype = 0x80000000; | |
rv->csr_vl = 0; | |
return true; | |
} else { | |
uint8_t i = 0; | |
uint8_t j = 0; | |
for (uint32_t cnt = 0; rv->csr_vl > cnt;) { | |
i %= VREG_U32_COUNT; | |
assert(ir->vs3 + j < 32); | |
rv->io.mem_write_w(rv, addr, rv->V[ir->vs3 + j][i]); | |
rv->io.mem_write_w(rv, addr + 4, rv->V[ir->vs3 + j][i + 1]); | |
cnt += 1; | |
i += 2; | |
if (!(cnt % (VREG_U32_COUNT/2))) { | |
j++; | |
i = 0; | |
} | |
addr += 8; | |
} | |
} |
Code Review Run #6921bb
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
#define op_add(a, b) ((a) + (b)) | ||
RVOP(vadd_vv, {OPT(ir->vd, ir->vs2, ir->vs1, add, VV)}, GEN({ | ||
assert; /* FIXME: Implement */ | ||
})) | ||
RVOP(vadd_vx, {OPT(ir->vd, ir->vs2, ir->rs1, add, VX)}, GEN({ | ||
assert; /* FIXME: Implement */ | ||
})) | ||
RVOP(vadd_vi, {OPT(ir->vd, ir->vs2, ir->imm, add, VI)}, GEN({ | ||
assert; /* FIXME: Implement */ | ||
})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider implementing vector addition operations instead of using placeholder assert
statements. The vector add operations (vadd_vv
, vadd_vx
, vadd_vi
) are currently unimplemented which could impact vector arithmetic functionality.
Code suggestion
Check the AI-generated fix before applying
@@ -5474,9 +5474,15 @@
RVOP(vadd_vv, {OPT(ir->vd, ir->vs2, ir->vs1, add, VV)}, GEN({
if (vl <= 0) return;
for(int i = 0; i < vl; i++) {
vd[i] = vs2[i] + vs1[i];
}
}))
Code Review Run #6921bb
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
if (decode_funct3(insn) != 0b010) { | ||
uint8_t eew = decode_eew(insn); | ||
ir->eew = 8 << eew; | ||
uint8_t nf = decode_nf(insn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding validation for eew
and nf
values before using them in calculations. The current code assumes valid values but may lead to undefined behavior if invalid values are passed.
Code suggestion
Check the AI-generated fix before applying
@@ -1430,4 +1430,10 @@
if (decode_funct3(insn) != 0b010) {
uint8_t eew = decode_eew(insn);
+ if (eew > 3) { // Max valid EEW value is 3 (64-bit)
+ return false;
+ }
uint8_t nf = decode_nf(insn);
+ if (nf > 7) { // Max valid NF value is 7 (8 segments)
+ return false;
+ }
ir->eew = 8 << eew;
Code Review Run #6921bb
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
ir->zimm = decode_vsetvli_zimm(insn); | ||
ir->rs1 = decode_rs1(insn); | ||
ir->rd = decode_rd(insn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider validating the decoded values for zimm
, rs1
and rd
before assigning them to the instruction struct. The current implementation assumes valid values but adding bounds checking could prevent potential issues.
Code suggestion
Check the AI-generated fix before applying
ir->zimm = decode_vsetvli_zimm(insn); | |
ir->rs1 = decode_rs1(insn); | |
ir->rd = decode_rd(insn); | |
uint32_t zimm = decode_vsetvli_zimm(insn); | |
uint32_t rs1 = decode_rs1(insn); | |
uint32_t rd = decode_rd(insn); | |
/* Validate decoded values */ | |
if (rs1 >= N_RV_REGS || rd >= N_RV_REGS) | |
return; | |
ir->zimm = zimm; | |
ir->rs1 = rs1; | |
ir->rd = rd; |
Code Review Run #6921bb
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
static inline void decode_mtype(rv_insn_t *ir, const uint32_t insn) | ||
{ | ||
ir->vs2 = decode_rs2(insn); | ||
ir->rd = decode_rd(insn); | ||
ir->vm = decode_vm(insn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider validating input parameters in decode_mtype()
. The function assumes ir
is non-null but doesn't validate it. Adding a null check could prevent potential segmentation faults.
Code suggestion
Check the AI-generated fix before applying
static inline void decode_mtype(rv_insn_t *ir, const uint32_t insn) | |
{ | |
ir->vs2 = decode_rs2(insn); | |
ir->rd = decode_rd(insn); | |
ir->vm = decode_vm(insn); | |
static inline void decode_mtype(rv_insn_t *ir, const uint32_t insn) | |
{ | |
assert(ir != NULL); | |
ir->vs2 = decode_rs2(insn); | |
ir->rd = decode_rd(insn); | |
ir->vm = decode_vm(insn); |
Code Review Run #6921bb
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
case 5: /* Reserved */ | ||
decode_vxtype(ir, insn); | ||
ir->opcode = rv_insn_vfmin_vf; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code marks case 5 as /* Reserved */
but then proceeds to implement it. Consider either removing the comment or making it a reserved case that returns false
.
Code suggestion
Check the AI-generated fix before applying
case 5: /* Reserved */ | |
decode_vxtype(ir, insn); | |
ir->opcode = rv_insn_vfmin_vf; | |
break; | |
case 5: /* Reserved */ |
Code Review Run #6921bb
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
Add support for the RISC-V "V" Vector Extension. This pull request implements decoding for 585 out of 616 version 1.0 spec vector instructions, with partial interpreter implementation.
The decoding method for vector instructions, including vector configuration and load/store instructions, follows the approach used in
rv32emu
. The newrvv_jumptable
is introduced to handle remaining arithmetic instructions.The interpreter implementation is tested using the riscv-vector-tests repository, with current limitations, as outlined in the repo. Included partial support for vector load/store instructions and single-width arithmetic instructions. The architecture now supports different settings for
sew
,lmul
, and vector masking.Vector instructions passing the tests include:
vle8.v
,vle16.v
,vle32.v
vse8.v
,vse16.v
,vse32.v
vadd.vv
,vadd.vx
,vadd.vi
vsub.vv
,vsub.vx
,vsub.vi
,vand.vv
,vand.vx
,vand.vi
vor.vv
,vor.vx
,vor.vi
vxor.vv
,vxor.vx
,vxor.vi
vsll.vv
,vsll.vx
,vsll.vi
vmul.vv
,vmul.vx
,vmul.vi
Close #504
Summary by Bito
This PR implements extensive support for the RISC-V Vector Extension (V), adding decoding capabilities for 585 out of 616 vector instructions from v1.0 spec. The implementation includes comprehensive instruction decoding, vector register handling, and CSR management. The changes support various vector operations including load/store, arithmetic, comparison, and floating-point operations with different data widths (8-64 bits). The code includes validation checks for vector parameters and register indices.Unit tests added: False
Estimated effort to review (1-5, lower is better): 5