Internals
File Structure on Server
/data (in container, but configurable on the command line)
- folders.json - a listing of every Google shared folder
- One folder for each drive
- A second folder with the same name plus _transform on the end, which holds the markdown version
- quota.json - throttling data for Google's API rate limits
/data# more folders.json
{
"0APmwe3yIhGabUk9PVA": {
"id": "0APmwe3yIhGabUk9PVA",
"name": "A Test WikiGDrive"
}
}
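The layout above implies that both per-drive folders can be derived from a drive id found in folders.json. A minimal sketch of that mapping, assuming /data as the data root; the type and function names (FolderEntry, drivePaths, allDrivePaths) are hypothetical and not part of WikiGDrive's code:

```typescript
// Hypothetical types and helpers; the names are illustrative, not WikiGDrive's API.
interface FolderEntry {
  id: string;
  name: string;
}

// folders.json: one entry per drive, keyed by drive id.
type FoldersJson = Record<string, FolderEntry>;

interface DrivePaths {
  source: string;    // files fetched from Google
  transform: string; // markdown version
}

// Build both folder paths for one drive id.
function drivePaths(dataRoot: string, driveId: string): DrivePaths {
  const root = dataRoot.replace(/\/+$/, ''); // drop trailing slashes
  return {
    source: `${root}/${driveId}`,
    transform: `${root}/${driveId}_transform`,
  };
}

// Build the folder paths for every drive listed in folders.json.
function allDrivePaths(dataRoot: string, folders: FoldersJson): DrivePaths[] {
  return Object.keys(folders).map((id) => drivePaths(dataRoot, id));
}

// Sample data copied from the folders.json listing above.
const folders: FoldersJson = {
  '0APmwe3yIhGabUk9PVA': { id: '0APmwe3yIhGabUk9PVA', name: 'A Test WikiGDrive' },
};

console.log(allDrivePaths('/data', folders));
```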
/data/0APmwe3yIhGabUk9PVA# tree -a
.
|-- .drive.json # delete this
|-- .folder-files.json # Each file - coming from google API
|-- .folder.json # https://github.com/mieweb/wikiGDrive/blob/8609077ee14501c80acbd97a61c9fbdfbb0fc6fc/src/containers/google_folder/TaskFetchFolder.ts#L68
|-- .tree.json # a listing of all the files
|-- 1KZ45LytrvLZ3Np_EC_x5Uv6fy8xHLhvJyDNfC6i4xtc.odt
|-- 1wlRv3bZ5Z84TD9Oba4-lEorfV_R9aKhJyRS2iCInA7w.odt
|-- .user_config.yaml
`-- .private
    |-- id_rsa
    `-- id_rsa.pub
/data/0APmwe3yIhGabUk9PVA_transform# tree -a
|-- .git.json
|-- .gitignore
|-- .tree.json
|-- .wgd-directory.yaml
|-- .wgd-local-links.csv
|-- .wgd-local-log.csv
|-- example-folder
|   |-- .wgd-directory.yaml
|   `-- 1
|       |-- .wgd-directory.yaml
|       `-- 2
|           |-- .wgd-directory.yaml
|           `-- 3
|               |-- .wgd-directory.yaml
|               `-- 4
|                   |-- .wgd-directory.yaml
|                   |-- sub-folder-example-file.assets
|                   |-- sub-folder-example-file.debug.xml
|                   `-- sub-folder-example-file.md
|-- index.assets
|-- index.debug.xml
|-- index.md
|-- readme.assets
|-- readme.debug.xml
`-- readme.md
.wgd dir structure
drive.json:
{
"drive": "https://drive.google.com/drive/folders/FOLDER_ID",
"dest": "/home/user/mieweb/wikigdrive-test",
"link_mode": "mdURLs",
"service_account": "wikigdrive.json"
}
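The drive field carries the folder id as the last path segment of the URL. A small sketch of reading that id, assuming the URL form shown in the sample; the DriveConfig interface and the folderIdFromUrl helper are hypothetical names introduced for illustration only:

```typescript
// Hypothetical shape for drive.json, based only on the sample above.
interface DriveConfig {
  drive: string;
  dest: string;
  link_mode: string;
  service_account: string;
}

// Extract the folder id from a URL containing /folders/FOLDER_ID.
// Any query string or fragment after the id is ignored.
function folderIdFromUrl(driveUrl: string): string | null {
  const match = /\/folders\/([^/?#]+)/.exec(driveUrl);
  return match?.[1] ?? null;
}

const config: DriveConfig = JSON.parse(`{
  "drive": "https://drive.google.com/drive/folders/FOLDER_ID",
  "dest": "/home/user/mieweb/wikigdrive-test",
  "link_mode": "mdURLs",
  "service_account": "wikigdrive.json"
}`);

console.log(folderIdFromUrl(config.drive)); // FOLDER_ID
```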
google_files.json is indexed by Google's fileId. Its data comes from Google, with parentId added and lastAuthor simplified.
Note: this file is going away. The single database will be replaced with a multi-file version for scale.
- id - Google's fileId
- name - Title set inside Google Docs; it is not unique
- mimeType - Google's mime type or 'conflict' or 'redirect'
- modifiedTime - Server-side mtime
- localPath - real local path; unique once conflicts and redirects (for example after a title rename) have been handled
- lastAuthor - Google's last author if available
{
"123123123": {
"id": "123123123",
"name": "A title of document",
"mimeType": "application/vnd.google-apps.document",
"modifiedTime": "2020-02-27T20:20:20.123Z",
"desiredLocalPath": "a-title-of-document",
"lastAuthor": "John Smith"
}
}
download.json is indexed by Google's fileId. It contains gdoc JSON sources, SVG for diagrams, and a zip with images:
{
"123123": {
"id": "123123",
"name": "System Conversion",
"mimeType": "application/vnd.google-apps.document",
"modifiedTime": "2020-02-27T21:31:21.718Z",
"images": [
{
"docUrl": "i.0",
"pngUrl": "https://lh6.googleusercontent.com/123123123123",
"zipImage": {
"zipPath": "image1.png",
"width": 704,
"height": 276,
"hash": "0000001101010111101111010010101001010110001011101000111100110111"
}
}
]
}
}
local_files.json is indexed by file id
- desiredLocalPath - slugified name. It is not unique; wikigdrive handles redirects, so it is NOT the real path on the local system
- dirty - file needs to be downloaded
- conflicting - array of fileIds when mimeType = 'conflict'
- localPath - path to transformed markdown file
- modifiedTime - fetched from google server
{
"123123123": {
"desiredLocalPath": "a-title-of-document",
"localPath": "external_path/123123123.png",
"md5Checksum": "123123123"
}
}
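Because desiredLocalPath is a slugified title, two documents with similar titles can end up with the same value, which is why it cannot serve as the real path. A minimal sketch of the idea, assuming a simple ASCII-only slug rule; slugify here is a hypothetical helper and WikiGDrive's actual rules may differ:

```typescript
// Illustrative ASCII-only slugifier; an assumption, not WikiGDrive's real implementation.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // each run of other characters becomes one dash
    .replace(/^-+|-+$/g, '');    // trim leading and trailing dashes
}

console.log(slugify('A title of document'));  // a-title-of-document
console.log(slugify('A Title of Document!')); // a-title-of-document
```

Both titles map to the same slug, so a separate, unique localPath is still needed.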
Conflict resolution and redirect algorithm
Sync stage: get files from Google by listing the root directory or watching changes, and save them into google_files.json
Download stage: download all files that do not exist in download.json, and save them into download.json
Transform stage:
- Get the files to transform (those that do not exist in local_files.json, have a different modifiedTime, or are trashed) and generate each desiredLocalPath based on parents
- If a file is removed, remove its .md file and its images
- If a file is new (does not exist in local_files.json), add it to localFiles and schedule it for generation
- If a file exists but with a different desiredLocalPath:
  - Remove the old .md and the old images
  - Schedule it for generation
  - Generate a redirect with the old localPath
- Remove dangling redirects
- Check whether there are any conflicts (same desiredLocalPath)
- Check if any conflicts can be removed
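The download and transform decisions above can be sketched as three small functions. This is an illustration under stated assumptions, not WikiGDrive's code: the record shapes are reduced to the fields the steps mention, the trashed flag and the action names ('remove', 'generate', 'move') are hypothetical, and redirect cleanup is left out.

```typescript
// Hypothetical record shapes, reduced to the fields the steps above mention.
interface GoogleFile {
  id: string;
  modifiedTime: string;
  desiredLocalPath: string;
  trashed?: boolean; // assumed flag; the text only says files can be trashed
}

interface LocalFile {
  desiredLocalPath: string;
  modifiedTime: string;
}

type Action = 'remove' | 'generate' | 'move';

// Download stage: ids present in google_files.json but absent from download.json.
function pendingDownloads(
  googleFiles: Record<string, GoogleFile>,
  downloaded: Record<string, unknown>,
): string[] {
  return Object.keys(googleFiles).filter(
    (id) => !Object.prototype.hasOwnProperty.call(downloaded, id),
  );
}

// Transform stage: decide what to do with each file; unchanged files get no entry.
function planTransform(
  googleFiles: Record<string, GoogleFile>,
  localFiles: Record<string, LocalFile>,
): Record<string, Action> {
  const plan: Record<string, Action> = {};
  for (const id of Object.keys(googleFiles)) {
    const file = googleFiles[id];
    const local: LocalFile | undefined = localFiles[id];
    if (file.trashed) {
      if (local) plan[id] = 'remove'; // remove the .md file and images
    } else if (!local) {
      plan[id] = 'generate'; // new file
    } else if (local.desiredLocalPath !== file.desiredLocalPath) {
      plan[id] = 'move'; // remove old output, regenerate, leave a redirect
    } else if (local.modifiedTime !== file.modifiedTime) {
      plan[id] = 'generate'; // content changed
    }
  }
  return plan;
}

// Conflicts: desiredLocalPath values claimed by more than one non-trashed file.
function findConflicts(googleFiles: Record<string, GoogleFile>): Record<string, string[]> {
  const byPath: Record<string, string[]> = Object.create(null);
  for (const id of Object.keys(googleFiles)) {
    const file = googleFiles[id];
    if (file.trashed) continue;
    (byPath[file.desiredLocalPath] ??= []).push(id);
  }
  const conflicts: Record<string, string[]> = {};
  for (const path of Object.keys(byPath)) {
    if (byPath[path].length > 1) conflicts[path] = byPath[path];
  }
  return conflicts;
}
```

Returning a plan instead of touching the file system keeps the decision logic easy to test; a caller would then apply each action and write the results back to local_files.json.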