Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast Schema alter. #666

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

ColinLeeo
Copy link

@ColinLeeo ColinLeeo commented Sep 13, 2024

In this PR, I will modify the encoding method of metadata and data in Tugraph to ensure quick changes to schema.
This is a work in progress, and to prevent wasting CI resources, the related tests have been temporarily disabled.
For detailed design information, please refer to: https://alidocs.dingtalk.com/i/nodes/14dA3GK8gj560ryRUgzjPoO5J9ekBD76?utm_scene=person_space.

@ColinLeeo ColinLeeo force-pushed the colin_fast_schema_alter branch 2 times, most recently from 610becc to 5c86243 Compare September 13, 2024 09:25
@ColinLeeo ColinLeeo force-pushed the colin_fast_schema_alter branch 2 times, most recently from 987274f to 73ba9fc Compare October 14, 2024 15:53
@CLAassistant
Copy link

CLAassistant commented Oct 15, 2024

CLA assistant check
All committers have signed the CLA.

@ColinLeeo ColinLeeo force-pushed the colin_fast_schema_alter branch from c163299 to 2945e34 Compare October 15, 2024 22:49
@ColinLeeo ColinLeeo force-pushed the colin_fast_schema_alter branch from 02d6e04 to b4007d9 Compare October 29, 2024 13:26
@ColinLeeo ColinLeeo force-pushed the colin_fast_schema_alter branch 2 times, most recently from f9d9bd6 to 9b3dedf Compare December 2, 2024 11:01
@ColinLeeo ColinLeeo force-pushed the colin_fast_schema_alter branch 2 times, most recently from b779677 to 90ff09d Compare December 18, 2024 04:32
@ColinLeeo ColinLeeo force-pushed the colin_fast_schema_alter branch from 46ea9c8 to ad334ab Compare December 25, 2024 00:23
for (int t_offset = 1; t_offset < num_threads; t_offset++) {
thread_id = (thread_id + t_offset) % num_threads;
if (thread_state[thread_id]->state == THREAD_STEALING) continue;
if (ready_handle) {
Copy link
Author

@ColinLeeo ColinLeeo Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里修改的目的是,并行测试的时候,会有其他的线程修改omp 的并行数量,如果单独运行相关的测试是没问题的。omp并行数量修改会导致SEGSEGV(并行部分会拿到线程号访问控制数据结构,这个线程号可能比预期的大),所以这里添加了检查。这里没有使用omp设定线程池大小的方法,因为可能会影响现实业务

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里修改的目的是,并行测试的时候,会有其他的线程修改omp 的并行数量,如果单独运行相关的测试是没问题的。omp并行数量修改会导致SEGSEGV(并行部分会拿到线程号访问控制数据结构,这个线程号可能比预期的大),所以这里添加了检查。这里没有使用omp设定线程池大小的方法,因为可能会影响现实业务

这个哪里会修改线程数量啊?以及这个不会影响快速字段变更吧

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我运行 ut_no_asan 的时候会出这个问题,我暂时没找到具体哪个测试会修改线程数量,有点奇怪的是这个问题在ut_no_asan 必现的。
我的代码里不会修改线程数量,我注意到的是这段代码有时候启动的时候线程数是16,但是在下面并发处理的时候会变成24,多出来的线程会导致SEGSEGV。
有不少测试是修改线程数的,这个修改操作是全局的,所以我在这里加了一个判断,防止出现非法访问。

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

跟我修改的部分没有关系,我初步的怀疑是有些操作时间变短了,改变测试之间的时序,两个会用到并设置线程数的测试出现了并行?

/** @brief is this field deleted? */
bool deleted;
/** @brief id of this field, starts from 0 */
uint16_t id;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个id是从0开始,还是从1开始?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ID 从0 开始。

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ID 从0 开始。

这个ID就是设计里面的每个属性的ID吗?那个也是从0开始的吗?设计稿里是从1开始的

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

属性property的count是从1 开始的,表示property里的数量,field 标记是从0 开始的。

Copy link
Author

@ColinLeeo ColinLeeo Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

属性property的count是从1 开始的,表示property里的数量,field 标记是从0 开始的。

我看了下设计稿,确实FieldID 是从 1 开始的,我代码里的实现是从0 开始的,因为我看到之前的实现里field 是从0 开始的,所以为了保持两边含义的一致性(因为之前的实现里,fieldId 只表示属性在数组中的位置,所以从0 开始的),从0 开始与数组位置对应。这里需要修改为从1 开始吗?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

属性property的count是从1 开始的,表示property里的数量,field 标记是从0 开始的。

我看了下设计稿,确实FieldID 是从 1 开始的,我代码里的实现是从0 开始的,因为我看到之前的实现里field 是从0 开始的,所以为了保持两边含义的一致性(因为之前的实现里,fieldId 只表示属性在数组中的位置,所以从0 开始的),从0 开始与数组位置对应。这里需要修改为从1 开始吗?

没事儿,按你的来吧,改一下设计稿就行

/** @brief is set init value? */
bool set_init_value;
/** @brief the default value when inserting data. */
FieldData default_value;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_value和init_value的使用区别在哪里?给个例子?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_value 的用法还没加上。两个的区别在于:

  1. init_value 在用户进行增加列的时候设置的值,这个值表示没有数据时返回给客户端的数据(即这个列初始化时的数据)
  2. default_value 是在后期可能加上的,用户如果设置了某个属性的默认值,用户在创建某个点边的属性时,默认采用这个值,default_value 可以变化,变化后,用户创建 某个点边属性时保存的值也会发生变化。

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_value 可以去掉。

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2. default_value

感觉这两个可以合并成一个,default value,表示属性没有设置时的默认值

/** @brief is set init value? */
bool set_init_value;
/** @brief the default value when inserting data. */
FieldData default_value;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2. default_value

感觉这两个可以合并成一个,default value,表示属性没有设置时的默认值

for (int t_offset = 1; t_offset < num_threads; t_offset++) {
thread_id = (thread_id + t_offset) % num_threads;
if (thread_state[thread_id]->state == THREAD_STEALING) continue;
if (ready_handle) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里修改的目的是,并行测试的时候,会有其他的线程修改omp 的并行数量,如果单独运行相关的测试是没问题的。omp并行数量修改会导致SEGSEGV(并行部分会拿到线程号访问控制数据结构,这个线程号可能比预期的大),所以这里添加了检查。这里没有使用omp设定线程池大小的方法,因为可能会影响现实业务

这个哪里会修改线程数量啊?以及这个不会影响快速字段变更吧

namespace _detail {

bool FieldExtractorV2::DataInRecord(const Value& record) const {
if (GetFieldId() + 1 > GetRecordCount(record)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetFieldId() >= GetRecordCount(record) 就可以?

return ::lgraph::_detail::UnalignedGet<DataOffset>(record.Data() + var_offset);
} else {
int id_offset = 1;
while (GetFieldOffset(record, GetFieldId() + id_offset) == 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里可以写个注释,并改用for循环,作用是跳过已删除的offset

Comment on lines +22 to +41
void FieldExtractorBase::GetCopy(const Value& record, std::string& data) const {
FMA_DBG_ASSERT(Type() != FieldType::BLOB);
if (!DataInRecord(record)) {
const Value v = GetInitedValue();
data.resize(v.Size());
memcpy(&data[0], v.Data(), v.Size());
return;
}
data.resize(GetDataSize(record));
GetCopyRaw(record, &data[0], data.size());
}

void FieldExtractorBase::GetCopy(const Value& record, Value& data) const {
if (!DataInRecord(record)) {
data = GetInitedValue();
return;
}
data.Resize(GetDataSize(record));
GetCopyRaw(record, data.Data(), data.Size());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这两个GetCopy函数怎么dst不一样啊,data和record哪个是要拷贝生成的对象呢?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

record 是数据来源,这两个函数都是从record 中获取该field_extractor指定的数据,区别在于一个存储到string中,一个存储到Value 中,上面的GetInitedValue 会返回一个FieldData。

src/core/field_extractor_base.h Show resolved Hide resolved

if (temp > std::numeric_limits<T>::max()) {
*dst = std::numeric_limits<T>::max();
} else if (temp < - std::numeric_limits<T>::max()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这儿是不是应该用std::numeric_limits<int64_t>::min()?

DataOffset var_offset =
::lgraph::_detail::UnalignedGet<DataOffset>(record.Data() + GetFieldOffset(record));
// For variable data, return data-raw's pointer.
return (char*)record.Data() + sizeof(uint32_t) + var_offset;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里改成sizeof(DataOffset)吧

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GET

while (vit->IsValid()) {
Value prop = vit->GetProperty();
if (curr_sm->GetRecordLabelId(prop) == curr_lid) {
if (!new_schema->GetFastAlterSchema()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个是没处理fast alter schema的情况吗?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果是fast alter schema 的话,这里可以直接跳出,因为不再做数据扫描了

::lgraph::_detail::UnalignedSet<DataOffset>(offsets + sizeof(DataOffset) * (i - 1),
static_cast<DataOffset>(min_size));
size_t num_fields = fields_.size();
// version - [label] - count - null_array - offset_array
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里version删掉了吧

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是的,我修改一下注释。

is_set[extr->GetFieldId()] = true;
extr->ParseAndSet(v, data);
if (fast_alter_schema) {
ParseAndSet(v, data, extr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里是因为fast schema的变更需要schema的信息,才没有把ParseAndSet放到fieldextractor中吗?

Copy link
Author

@ColinLeeo ColinLeeo Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是的,主要有两种需要:

  1. 在fast_alter_schema 下,要修改的property 可能很落后,所以需要全量生成一个record,再对其进行修改。
  2. 在现在的这种编码下,修改变长数据后,会影响变长属性的偏移,需要遍历修改。

@@ -470,7 +470,7 @@ def test_label_field(self):
if field.get("name") == "jeep":
assert field.get("type") == "INT8"

ret = ha_client.callCypher("CALL db.alterLabelModFields('vertex', 'animal',['run', 'int8', false], ['jeep', 'int32', true])", "default")
ret = ha_client.callCypher("CALL db.alterLabelModFields('vertex', 'animal',['run', 'string', false], ['jeep', 'int32', true])", "default")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里为什么改类型

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

因为快速字段变更不支持将string 转换为整型。所以我在这里改了类型,确保所有的测试都能通过。
后面 fast_alter_schema 是关的,这里也会改回去。
这里可能得根据不同的配置设置不同的测试文件和操作。

@@ -1,5 +1,5 @@
# unit test cmake
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -DFMA_IN_UNIT_TEST -DNO_STACKTRACE")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O0 -g -Wall -DFMA_IN_UNIT_TEST -DNO_STACKTRACE")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为什么这里不优化了啊?

@@ -38,4 +38,5 @@ add_executable(fma_unit_test

target_link_libraries (fma_unit_test
libstdc++fs.a
lgraph_server_lib
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里为什么要加上lgraph_server_lib呢?fma的test感觉用不到server吧

@@ -27,14 +27,15 @@ TEST_F(TestLGraphCLI, LGraphCLI) {
CALL db.createVertexLabel('person', 'int8',
'bool' ,'BOOL', false,
'int8' ,'INT8', false,
'datetime' ,'DATETIME', false,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里的作用是为了避免顺序不一致吗?

@@ -391,7 +391,7 @@ TEST_P(TestSpatial, Spatial_Schema) {
Schema s2(s1);
s2.AddFields(std::vector<FieldSpec>({FieldSpec("Point2", FieldType::POINT, false)}));
UT_EXPECT_TRUE(s2.GetFieldExtractor("Point2")->GetFieldSpec() ==
FieldSpec("Point2", FieldType::POINT, false));
FieldSpec("Point2", FieldType::POINT, false, 9));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里为什么加上一个9?

@@ -409,7 +409,11 @@ TEST_P(TestSpatial, Spatial_Schema) {
UT_EXPECT_TRUE(!s2.HasBlob());
auto fmap = s2.GetFieldSpecsAsMap();
auto old_fields = fields;
for (auto& f : mod) old_fields[f.name] = f;
for (auto& f : mod) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里保留id的目的是什么啊?

@@ -463,37 +467,37 @@ TEST_P(TestSpatial, Spatial_Schema) {
std::vector<FieldData>({FieldData::Point
("0101000020231C0000000000000000F03F0000000000000040"),
FieldData(), FieldData(), FieldData()}), true, &n_changed));
UT_EXPECT_EQ(n_changed, 3);
UT_EXPECT_EQ(n_changed, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里为什么改成0啊?

auto txn = graph.CreateReadTxn();
UT_EXPECT_TRUE(txn.GetVertexField(0, std::string("Point2")) == FieldData::Point
("0101000020231C0000000000000000F03F0000000000000040"));
UT_EXPECT_TRUE(txn.GetVertexField(0, std::string("Spatial2")) == FieldData());
}

UT_LOG() << "Testing modify";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里现在可以不注释掉了吗?

@@ -1084,11 +1084,11 @@ void test_label_field(lgraph::RpcClient& client) {

ret = client.CallCypher(str,
"CALL db.alterLabelModFields('vertex', 'animal',"
"['run', 'int8', false], ['jeep', 'int32', true])");
"['run', 'string', false], ['jeep', 'int32', true])");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这也是string不能改int导致的是吗?

if (enable_fast_alter) {
fids = {0, 1}; // there is no need to reorder the fields.
} else {
fids = {1, 0}; // 域为string类型会放在label的最后面
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

注释用英文吧

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants