-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fast Schema alter. #666
base: master
Are you sure you want to change the base?
Fast Schema alter. #666
Conversation
610becc
to
5c86243
Compare
987274f
to
73ba9fc
Compare
c163299
to
2945e34
Compare
02d6e04
to
b4007d9
Compare
f9d9bd6
to
9b3dedf
Compare
b779677
to
90ff09d
Compare
46ea9c8
to
ad334ab
Compare
for (int t_offset = 1; t_offset < num_threads; t_offset++) { | ||
thread_id = (thread_id + t_offset) % num_threads; | ||
if (thread_state[thread_id]->state == THREAD_STEALING) continue; | ||
if (ready_handle) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里修改的目的是,并行测试的时候,会有其他的线程修改omp 的并行数量,如果单独运行相关的测试是没问题的。omp并行数量修改会导致SEGSEGV(并行部分会拿到线程号访问控制数据结构,这个线程号可能比预期的大),所以这里添加了检查。这里没有使用omp设定线程池大小的方法,因为可能会影响现实业务
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里修改的目的是,并行测试的时候,会有其他的线程修改omp 的并行数量,如果单独运行相关的测试是没问题的。omp并行数量修改会导致SEGSEGV(并行部分会拿到线程号访问控制数据结构,这个线程号可能比预期的大),所以这里添加了检查。这里没有使用omp设定线程池大小的方法,因为可能会影响现实业务
这个哪里会修改线程数量啊?以及这个不会影响快速字段变更吧
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我运行 ut_no_asan 的时候会出这个问题,我暂时没找到具体哪个测试会修改线程数量,有点奇怪的是这个问题在ut_no_asan 必现的。
我的代码里不会修改线程数量,我注意到的是这段代码有时候启动的时候线程数是16,但是在下面并发处理的时候会变成24,多出来的线程会导致SEGSEGV。
有不少测试是修改线程数的,这个修改操作是全局的,所以我在这里加了一个判断,防止出现非法访问。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
跟我修改的部分没有关系,我初步的怀疑是有些操作时间变短了,改变测试之间的时序,两个会用到并设置线程数的测试出现了并行?
/** @brief is this field deleted? */ | ||
bool deleted; | ||
/** @brief id of this field, starts from 0 */ | ||
uint16_t id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个id是从0开始,还是从1开始?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ID 从0 开始。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ID 从0 开始。
这个ID就是设计里面的每个属性的ID吗?那个也是从0开始的吗?设计稿里是从1开始的
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
属性property的count是从1 开始的,表示property里的数量,field 标记是从0 开始的。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
属性property的count是从1 开始的,表示property里的数量,field 标记是从0 开始的。
我看了下设计稿,确实FieldID 是从 1 开始的,我代码里的实现是从0 开始的,因为我看到之前的实现里field 是从0 开始的,所以为了保持两边含义的一致性(因为之前的实现里,fieldId 只表示属性在数组中的位置,所以从0 开始的),从0 开始与数组位置对应。这里需要修改为从1 开始吗?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
属性property的count是从1 开始的,表示property里的数量,field 标记是从0 开始的。
我看了下设计稿,确实FieldID 是从 1 开始的,我代码里的实现是从0 开始的,因为我看到之前的实现里field 是从0 开始的,所以为了保持两边含义的一致性(因为之前的实现里,fieldId 只表示属性在数组中的位置,所以从0 开始的),从0 开始与数组位置对应。这里需要修改为从1 开始吗?
没事儿,按你的来吧,改一下设计稿就行
/** @brief is set init value? */ | ||
bool set_init_value; | ||
/** @brief the default value when inserting data. */ | ||
FieldData default_value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default_value和init_value的使用区别在哪里?给个例子?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default_value 的用法还没加上。两个的区别在于:
- init_value 在用户进行增加列的时候设置的值,这个值表示没有数据时返回给客户端的数据(即这个列初始化时的数据)
- default_value 是在后期可能加上的,用户如果设置了某个属性的默认值,用户在创建某个点边的属性时,默认采用这个值,default_value 可以变化,变化后,用户创建 某个点边属性时保存的值也会发生变化。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default_value 可以去掉。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2. default_value
感觉这两个可以合并成一个,default value,表示属性没有设置时的默认值
/** @brief is set init value? */ | ||
bool set_init_value; | ||
/** @brief the default value when inserting data. */ | ||
FieldData default_value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2. default_value
感觉这两个可以合并成一个,default value,表示属性没有设置时的默认值
for (int t_offset = 1; t_offset < num_threads; t_offset++) { | ||
thread_id = (thread_id + t_offset) % num_threads; | ||
if (thread_state[thread_id]->state == THREAD_STEALING) continue; | ||
if (ready_handle) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里修改的目的是,并行测试的时候,会有其他的线程修改omp 的并行数量,如果单独运行相关的测试是没问题的。omp并行数量修改会导致SEGSEGV(并行部分会拿到线程号访问控制数据结构,这个线程号可能比预期的大),所以这里添加了检查。这里没有使用omp设定线程池大小的方法,因为可能会影响现实业务
这个哪里会修改线程数量啊?以及这个不会影响快速字段变更吧
src/core/field_extractor_v2.cpp
Outdated
namespace _detail { | ||
|
||
bool FieldExtractorV2::DataInRecord(const Value& record) const { | ||
if (GetFieldId() + 1 > GetRecordCount(record)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetFieldId() >= GetRecordCount(record) 就可以?
src/core/field_extractor_v2.cpp
Outdated
return ::lgraph::_detail::UnalignedGet<DataOffset>(record.Data() + var_offset); | ||
} else { | ||
int id_offset = 1; | ||
while (GetFieldOffset(record, GetFieldId() + id_offset) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里可以写个注释,并改用for循环,作用是跳过已删除的offset
void FieldExtractorBase::GetCopy(const Value& record, std::string& data) const { | ||
FMA_DBG_ASSERT(Type() != FieldType::BLOB); | ||
if (!DataInRecord(record)) { | ||
const Value v = GetInitedValue(); | ||
data.resize(v.Size()); | ||
memcpy(&data[0], v.Data(), v.Size()); | ||
return; | ||
} | ||
data.resize(GetDataSize(record)); | ||
GetCopyRaw(record, &data[0], data.size()); | ||
} | ||
|
||
void FieldExtractorBase::GetCopy(const Value& record, Value& data) const { | ||
if (!DataInRecord(record)) { | ||
data = GetInitedValue(); | ||
return; | ||
} | ||
data.Resize(GetDataSize(record)); | ||
GetCopyRaw(record, data.Data(), data.Size()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这两个GetCopy函数怎么dst不一样啊,data和record哪个是要拷贝生成的对象呢?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
record 是数据来源,这两个函数都是从record 中获取该field_extractor指定的数据,区别在于一个存储到string中,一个存储到Value 中,上面的GetInitedValue 会返回一个FieldData。
|
||
if (temp > std::numeric_limits<T>::max()) { | ||
*dst = std::numeric_limits<T>::max(); | ||
} else if (temp < - std::numeric_limits<T>::max()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这儿是不是应该用std::numeric_limits<int64_t>::min()?
DataOffset var_offset = | ||
::lgraph::_detail::UnalignedGet<DataOffset>(record.Data() + GetFieldOffset(record)); | ||
// For variable data, return data-raw's pointer. | ||
return (char*)record.Data() + sizeof(uint32_t) + var_offset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里改成sizeof(DataOffset)吧
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GET
while (vit->IsValid()) { | ||
Value prop = vit->GetProperty(); | ||
if (curr_sm->GetRecordLabelId(prop) == curr_lid) { | ||
if (!new_schema->GetFastAlterSchema()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个是没处理fast alter schema的情况吗?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
如果是fast alter schema 的话,这里可以直接跳出,因为不再做数据扫描了
::lgraph::_detail::UnalignedSet<DataOffset>(offsets + sizeof(DataOffset) * (i - 1), | ||
static_cast<DataOffset>(min_size)); | ||
size_t num_fields = fields_.size(); | ||
// version - [label] - count - null_array - offset_array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里version删掉了吧
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
是的,我修改一下注释。
is_set[extr->GetFieldId()] = true; | ||
extr->ParseAndSet(v, data); | ||
if (fast_alter_schema) { | ||
ParseAndSet(v, data, extr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里是因为fast schema的变更需要schema的信息,才没有把ParseAndSet放到fieldextractor中吗?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
是的,主要有两种需要:
- 在fast_alter_schema 下,要修改的property 可能很落后,所以需要全量生成一个record,再对其进行修改。
- 在现在的这种编码下,修改变长数据后,会影响变长属性的偏移,需要遍历修改。
@@ -470,7 +470,7 @@ def test_label_field(self): | |||
if field.get("name") == "jeep": | |||
assert field.get("type") == "INT8" | |||
|
|||
ret = ha_client.callCypher("CALL db.alterLabelModFields('vertex', 'animal',['run', 'int8', false], ['jeep', 'int32', true])", "default") | |||
ret = ha_client.callCypher("CALL db.alterLabelModFields('vertex', 'animal',['run', 'string', false], ['jeep', 'int32', true])", "default") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里为什么改类型
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
因为快速字段变更不支持将string 转换为整型。所以我在这里改了类型,确保所有的测试都能通过。
后面 fast_alter_schema 是关的,这里也会改回去。
这里可能得根据不同的配置设置不同的测试文件和操作。
@@ -1,5 +1,5 @@ | |||
# unit test cmake | |||
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -DFMA_IN_UNIT_TEST -DNO_STACKTRACE") | |||
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O0 -g -Wall -DFMA_IN_UNIT_TEST -DNO_STACKTRACE") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
为什么这里不优化了啊?
@@ -38,4 +38,5 @@ add_executable(fma_unit_test | |||
|
|||
target_link_libraries (fma_unit_test | |||
libstdc++fs.a | |||
lgraph_server_lib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里为什么要加上lgraph_server_lib呢?fma的test感觉用不到server吧
@@ -27,14 +27,15 @@ TEST_F(TestLGraphCLI, LGraphCLI) { | |||
CALL db.createVertexLabel('person', 'int8', | |||
'bool' ,'BOOL', false, | |||
'int8' ,'INT8', false, | |||
'datetime' ,'DATETIME', false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里的作用是为了避免顺序不一致吗?
@@ -391,7 +391,7 @@ TEST_P(TestSpatial, Spatial_Schema) { | |||
Schema s2(s1); | |||
s2.AddFields(std::vector<FieldSpec>({FieldSpec("Point2", FieldType::POINT, false)})); | |||
UT_EXPECT_TRUE(s2.GetFieldExtractor("Point2")->GetFieldSpec() == | |||
FieldSpec("Point2", FieldType::POINT, false)); | |||
FieldSpec("Point2", FieldType::POINT, false, 9)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里为什么加上一个9?
@@ -409,7 +409,11 @@ TEST_P(TestSpatial, Spatial_Schema) { | |||
UT_EXPECT_TRUE(!s2.HasBlob()); | |||
auto fmap = s2.GetFieldSpecsAsMap(); | |||
auto old_fields = fields; | |||
for (auto& f : mod) old_fields[f.name] = f; | |||
for (auto& f : mod) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里保留id的目的是什么啊?
@@ -463,37 +467,37 @@ TEST_P(TestSpatial, Spatial_Schema) { | |||
std::vector<FieldData>({FieldData::Point | |||
("0101000020231C0000000000000000F03F0000000000000040"), | |||
FieldData(), FieldData(), FieldData()}), true, &n_changed)); | |||
UT_EXPECT_EQ(n_changed, 3); | |||
UT_EXPECT_EQ(n_changed, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里为什么改成0啊?
auto txn = graph.CreateReadTxn(); | ||
UT_EXPECT_TRUE(txn.GetVertexField(0, std::string("Point2")) == FieldData::Point | ||
("0101000020231C0000000000000000F03F0000000000000040")); | ||
UT_EXPECT_TRUE(txn.GetVertexField(0, std::string("Spatial2")) == FieldData()); | ||
} | ||
|
||
UT_LOG() << "Testing modify"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里现在可以不注释掉了吗?
@@ -1084,11 +1084,11 @@ void test_label_field(lgraph::RpcClient& client) { | |||
|
|||
ret = client.CallCypher(str, | |||
"CALL db.alterLabelModFields('vertex', 'animal'," | |||
"['run', 'int8', false], ['jeep', 'int32', true])"); | |||
"['run', 'string', false], ['jeep', 'int32', true])"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这也是string不能改int导致的是吗?
if (enable_fast_alter) { | ||
fids = {0, 1}; // there is no need to reorder the fields. | ||
} else { | ||
fids = {1, 0}; // 域为string类型会放在label的最后面 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
注释用英文吧
In this PR, I will modify the encoding method of metadata and data in Tugraph to ensure quick changes to schema.
This is a work in progress, and to prevent wasting CI resources, the related tests have been temporarily disabled.
For detailed design information, please refer to: https://alidocs.dingtalk.com/i/nodes/14dA3GK8gj560ryRUgzjPoO5J9ekBD76?utm_scene=person_space.