Deploy Symfony project to App Engine - ERROR: Too many files
Increase the deployment verbosity using the --verbosity option of the gcloud app deploy command and you'll get the list of all the files uploaded. Then use the skip_files option in your app.yaml to specify the ones you want ignored:
Optional. The skip_files element specifies which files in the
application directory are not to be uploaded to App Engine. The value
is either a regular expression, or a list of regular expressions. Any
filename that matches any of the regular expressions is omitted from
the list of files to upload when the application is uploaded.
Filenames are relative to the project directory. The skip_files element has the following default:
skip_files:
- ^(.*/)?#.*#$
- ^(.*/)?.*~$
- ^(.*/)?.*\.py[co]$
- ^(.*/)?.*/RCS/.*$
- ^(.*/)?\..*$
Note: watch out for overwriting the defaults for this config.
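To see in advance which paths the default patterns would drop, you can try them locally. This is only a minimal sketch: it assumes each pattern is applied with re.match to a forward-slash path relative to the project directory, and the helper name is_skipped is hypothetical (it is not part of the App Engine tooling).

```python
import re

# The default skip_files patterns quoted above.
DEFAULT_SKIP_FILES = [
    r"^(.*/)?#.*#$",
    r"^(.*/)?.*~$",
    r"^(.*/)?.*\.py[co]$",
    r"^(.*/)?.*/RCS/.*$",
    r"^(.*/)?\..*$",
]
_COMPILED = [re.compile(pattern) for pattern in DEFAULT_SKIP_FILES]


def is_skipped(relative_path, patterns=_COMPILED):
    """Return True if any pattern matches the project-relative path.

    Assumption: matching is anchored at the start of the path, which the
    leading ^ in every default pattern implies anyway.
    """
    return any(pattern.match(relative_path) for pattern in patterns)


if __name__ == "__main__":
    for candidate in ["main.pyc", "src/.git/config", "src/Controller/Home.php"]:
        print(candidate, is_skipped(candidate))
```

Any custom pattern you add to skip_files can be appended to the list and checked the same way before deploying.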
I may be wrong, but your project structure image suggests your app code resides in the src directory. If so, I'd suggest moving the app.yaml file inside it: the directory containing the app.yaml file being deployed is considered to be the top dir of the app/service, and its entire content will be uploaded to GAE. You may need to adjust some paths after such a move, since GAE considers all app/service paths relative to this app/service top dir. If you need them, you can selectively symlink some files/directories from the project directory into the src dir; deployment follows symlinks, replacing them with their actual content.
Some related posts:
- How to properly deploy node apps to GAE with secret keys?
- gcloud app deploy : This deployment has too many files
Determine a file's path(s) relative to a directory, including symlinks
This, like many things, is more complex than it might appear on the surface.
Each entity in the file system points at an inode, which describes the content of the file. Entities are the things you see: files, directories, sockets, block devices, character devices, etc.
The content of a single "file" can be accessed via one or more paths - each of these paths is called a "hard link". Hard links can only point at files on the same filesystem, they cannot cross the boundary of a filesystem.
It is also possible for a path to address a "symbolic link", which can point at another path - that path doesn't have to exist, it can be another symbolic link, it can be on another filesystem, or it can point back at the original path producing an infinite loop.
It is impossible to locate all links (symbolic or hard) that point at a particular entity without scanning the entire tree.
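The difference between the two kinds of link is easy to see from Python. The following is a small sketch, assuming a POSIX filesystem that supports both hard links and symlinks; the function name describe_links and the file names are made up for illustration.

```python
import os
import tempfile


def describe_links(workdir):
    """Create a file, a hard link and a symlink to it, and compare inodes."""
    target = os.path.join(workdir, "original")
    hard = os.path.join(workdir, "hard")
    soft = os.path.join(workdir, "soft")

    with open(target, "w") as handle:
        handle.write("data")
    os.link(target, hard)      # second path to the same inode
    os.symlink(target, soft)   # separate entity that stores a path

    return {
        # A hard link shares the inode of the original path.
        "hard_link_shares_inode": os.stat(target).st_ino == os.stat(hard).st_ino,
        # The inode records how many hard links point at it.
        "link_count": os.stat(target).st_nlink,
        # lstat() looks at the symlink itself, stat() follows it.
        "symlink_has_own_inode": os.lstat(soft).st_ino != os.stat(soft).st_ino,
        "symlink_resolves_to_target": os.stat(soft).st_ino == os.stat(target).st_ino,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(describe_links(tmp))
```

Note that the inode only stores a count of hard links, not their paths, which is why finding the other paths requires a scan.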
Before we get into this... some comments:
- See the end for some benchmarks. I'm not convinced that this is a significant issue, though admittedly this filesystem is on a 6-disk ZFS array, on an i7, so using a lower spec system will take longer...
- Given that this is impossible without calling stat() on every file at some point, you're going to struggle to come up with a better solution that isn't significantly more complex (such as maintaining an index database, with all the issues that introduces)
As mentioned, we must scan (index) the whole tree. I know it's not what you want to do, but it's impossible without doing this...
To do this, you need to collect inodes, not filenames, and review them after the fact... there may be some optimisation here, but I've tried to keep it simple to prioritise understanding.
The following function will produce this structure for us:
def get_map(scan_root):
    # this dict will have device IDs at the first level (major / minor) ...
    # ... and inodes IDs at the second level
    # each inode will have the following keys:
    #   - 'type'     the entity's type - i.e: dir, file, socket, etc...
    #   - 'links'    a list of all found hard links to the inode
    #   - 'symlinks' a list of all found symlinks to the inode
    # e.g: entities[2049][4756]['links'][0]     path to a hard link for inode 4756
    #      entities[2049][4756]['symlinks'][0]  path to a symlink that points at an entity with inode 4756
    entity_map = {}

    for root, dirs, files in os.walk(scan_root):
        root = '.' + root[len(scan_root):]
        for path in [ os.path.join(root, _) for _ in files ]:
            try:
                p_stat = os.stat(path)
            except OSError as e:
                if e.errno == 2:
                    print('Broken symlink [%s]... skipping' % ( path ))
                    continue
                if e.errno == 40:
                    print('Too many levels of symbolic links [%s]... skipping' % ( path ))
                    continue
                raise

            p_dev = p_stat.st_dev
            p_ino = p_stat.st_ino

            if p_dev not in entity_map:
                entity_map[p_dev] = {}
            e_dev = entity_map[p_dev]

            if p_ino not in e_dev:
                e_dev[p_ino] = {
                    'type': get_type(p_stat.st_mode),
                    'links': [],
                    'symlinks': [],
                }
            e_ino = e_dev[p_ino]

            if os.lstat(path).st_ino == p_ino:
                e_ino['links'].append(path)
            else:
                e_ino['symlinks'].append(path)

    return entity_map
I've produced an example tree that looks like this:
$ tree --inodes
.
├── [ 67687] 4 -> 5
├── [ 67676] 5 -> 4
├── [ 67675] 6 -> dead
├── [ 67676] a
│   └── [ 67679] 1
├── [ 67677] b
│   └── [ 67679] 2 -> ../a/1
├── [ 67678] c
│   └── [ 67679] 3
└── [ 67687] d
    └── [ 67688] 4

4 directories, 7 files
The output of this function is:
$ places
Broken symlink [./6]... skipping
Too many levels of symbolic links [./5]... skipping
Too many levels of symbolic links [./4]... skipping
{201: {67679: {'links': ['./a/1', './c/3'],
               'symlinks': ['./b/2'],
               'type': 'file'},
       67688: {'links': ['./d/4'], 'symlinks': [], 'type': 'file'}}}
If we are interested in ./c/3, then you can see that just looking at symlinks (and ignoring hard links) would cause us to miss ./a/1...
By subsequently searching for the path we are interested in, we can find all other references within this tree:
def filter_map(entity_map, filename):
    for dev, inodes in entity_map.items():
        for inode, info in inodes.items():
            if filename in info['links'] or filename in info['symlinks']:
                return info
$ places ./a/1
Broken symlink [./6]... skipping
Too many levels of symbolic links [./5]... skipping
Too many levels of symbolic links [./4]... skipping
{'links': ['./a/1', './c/3'], 'symlinks': ['./b/2'], 'type': 'file'}
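If you need to look up many paths, scanning every inode for each query gets repetitive. One possible variation (not part of the original demo) is to invert the map once into a path-keyed dict. The helper name index_by_path is hypothetical, and the sample data is simply the output shown above.

```python
def index_by_path(entity_map):
    """Build {path: info} from the {device: {inode: info}} structure."""
    index = {}
    for inodes in entity_map.values():
        for info in inodes.values():
            # Every hard link and symlink path maps to the same info dict.
            for path in info['links'] + info['symlinks']:
                index[path] = info
    return index


# Sample data copied from the example output above.
sample_map = {
    201: {
        67679: {'links': ['./a/1', './c/3'], 'symlinks': ['./b/2'], 'type': 'file'},
        67688: {'links': ['./d/4'], 'symlinks': [], 'type': 'file'},
    }
}

path_index = index_by_path(sample_map)

if __name__ == "__main__":
    print(path_index['./b/2']['links'])
```

Each lookup is then a single dict access, at the cost of holding one extra entry per path in memory.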
The full source for this demo is below. Note that I've used relative paths to keep things simple, but it would be sensible to update this to use absolute paths. Additionally, any symlink that points outside the tree will not currently have a corresponding link... that's an exercise for the reader.
It might also be an idea to collect the data while you're filling the tree (if that's something that would work with your process)... you can use inotify to deal with this nicely - there's even a python module.
#!/usr/bin/env python3

import os, sys, stat
from pprint import pprint

def get_type(mode):
    if stat.S_ISDIR(mode):
        return 'directory'
    if stat.S_ISCHR(mode):
        return 'character'
    if stat.S_ISBLK(mode):
        return 'block'
    if stat.S_ISREG(mode):
        return 'file'
    if stat.S_ISFIFO(mode):
        return 'fifo'
    if stat.S_ISLNK(mode):
        return 'symlink'
    if stat.S_ISSOCK(mode):
        return 'socket'
    return 'unknown'

def get_map(scan_root):
    # this dict will have device IDs at the first level (major / minor) ...
    # ... and inodes IDs at the second level
    # each inode will have the following keys:
    #   - 'type'     the entity's type - i.e: dir, file, socket, etc...
    #   - 'links'    a list of all found hard links to the inode
    #   - 'symlinks' a list of all found symlinks to the inode
    # e.g: entities[2049][4756]['links'][0]     path to a hard link for inode 4756
    #      entities[2049][4756]['symlinks'][0]  path to a symlink that points at an entity with inode 4756
    entity_map = {}

    for root, dirs, files in os.walk(scan_root):
        root = '.' + root[len(scan_root):]
        for path in [ os.path.join(root, _) for _ in files ]:
            try:
                p_stat = os.stat(path)
            except OSError as e:
                if e.errno == 2:
                    print('Broken symlink [%s]... skipping' % ( path ))
                    continue
                if e.errno == 40:
                    print('Too many levels of symbolic links [%s]... skipping' % ( path ))
                    continue
                raise

            p_dev = p_stat.st_dev
            p_ino = p_stat.st_ino

            if p_dev not in entity_map:
                entity_map[p_dev] = {}
            e_dev = entity_map[p_dev]

            if p_ino not in e_dev:
                e_dev[p_ino] = {
                    'type': get_type(p_stat.st_mode),
                    'links': [],
                    'symlinks': [],
                }
            e_ino = e_dev[p_ino]

            if os.lstat(path).st_ino == p_ino:
                e_ino['links'].append(path)
            else:
                e_ino['symlinks'].append(path)

    return entity_map

def filter_map(entity_map, filename):
    for dev, inodes in entity_map.items():
        for inode, info in inodes.items():
            if filename in info['links'] or filename in info['symlinks']:
                return info

entity_map = get_map(os.getcwd())

if len(sys.argv) == 2:
    entity_info = filter_map(entity_map, sys.argv[1])
    pprint(entity_info)
else:
    pprint(entity_map)
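As a starting point for the exercise mentioned earlier (symlinks whose target lies outside the scanned tree), one option is to resolve both the scan root and the link, then compare them. This is a rough sketch rather than part of the demo; points_outside_tree is a hypothetical helper and it assumes both paths live on the same drive or mount namespace.

```python
import os


def points_outside_tree(scan_root, link_path):
    """Return True if link_path resolves to somewhere outside scan_root."""
    root = os.path.realpath(scan_root)
    target = os.path.realpath(link_path)
    # If the resolved target is inside the tree, the common prefix of the
    # two paths is the (resolved) root itself.
    return os.path.commonpath([root, target]) != root


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        print(points_outside_tree(sys.argv[1], sys.argv[2]))
```

Links flagged this way could be recorded under a separate key so that they are not silently lost from the map.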
I've run this on my system out of curiosity. It's a 6x disk ZFS RAID-Z2 pool on an i7-7700K with plenty of data to play with. Admittedly this will run somewhat slower on lower-spec systems...
Some benchmarks to consider:
- A dataset of ~3.1k files and links in ~850 directories.
  This runs in less than 3.5 seconds, ~80ms on subsequent runs
- A dataset of ~30k files and links in ~2.2k directories.
  This runs in less than 30 seconds, ~300ms on subsequent runs
- A dataset of ~73.5k files and links in ~8k directories.
  This runs in approx 60 seconds, ~800ms on subsequent runs
Using simple maths, that's about 1140 stat() calls per second with an empty cache, or ~90k stat() calls per second once the cache has been filled - I don't think that stat() is as slow as you think it is!
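If you want comparable numbers for your own system, a rough timing loop is enough. The sketch below is not the script used for the figures above; measure_stat_rate is a hypothetical helper that simply issues one lstat() per directory entry and times the whole walk.

```python
import os
import time


def measure_stat_rate(scan_root):
    """Walk scan_root, lstat() every entry, return (calls, elapsed_seconds)."""
    calls = 0
    start = time.perf_counter()
    for root, dirs, files in os.walk(scan_root):
        for name in dirs + files:
            # lstat() does not follow symlinks, so broken links do not raise.
            os.lstat(os.path.join(root, name))
            calls += 1
    elapsed = time.perf_counter() - start
    return calls, elapsed


if __name__ == "__main__":
    total_calls, seconds = measure_stat_rate(os.getcwd())
    if seconds > 0:
        print('%d calls in %.3fs (%.0f calls/s)' % (total_calls, seconds, total_calls / seconds))
```

Run it twice in a row to see the difference between a cold and a warm cache.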
gcloud app deploy : This deployment has too many files
If you really have more than the 10000 files quota in the service you're trying to deploy then you might have to reduce the number accordingly.
Other things to try:
- you might be able to get a quota increase, see Getting error on GAE: Max number of files and blobs is 10000
- delete whatever files are not actually needed, or just skip them during deployment; see skip_files or, for the more recent cloud SDK versions, the .gcloudignore file.
- if you have a lot of static files consider moving (some of) them to GCS instead, see Approaches for overcoming 10000 file limit on Google App Engine?
- split the service into multiple smaller services - each with its own 10000 files limit.
Assuming you do not actually hit the files quota, the error usually indicates that you have looping/circular symlinks in your app directory. That could also explain a path like the one you mentioned in a comment to this post: https://stackoverflow.com/a/42425048/4495081. You just have to fix the offending symlink(s). Again, a simple, consistent directory structure could help prevent such issues.
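To check both possibilities locally (too many files, or a symlink leading back into the tree), you can count files while following symlinks the way the deployment does. This is only a sketch under assumptions: the helper name is hypothetical, it relies on st_dev/st_ino being meaningful on your platform, and a directory reached twice through two legitimate symlinks is reported the same way as a real loop.

```python
import os


def count_files_following_symlinks(top):
    """Return (file_count, revisited_dirs) for the tree under top."""
    seen_dirs = set()
    revisited = []
    total = 0
    for root, dirs, files in os.walk(top, followlinks=True):
        info = os.stat(root)
        key = (info.st_dev, info.st_ino)
        if key in seen_dirs:
            # Already walked this directory via another path: record it and
            # stop descending, otherwise a circular symlink never terminates.
            revisited.append(root)
            dirs[:] = []
            continue
        seen_dirs.add(key)
        total += len(files)
    return total, revisited


if __name__ == "__main__":
    count, suspects = count_files_following_symlinks(os.getcwd())
    print('%d files' % count)
    for path in suspects:
        print('directory reached more than once: %s' % path)
```

Compare the count against the 10000 limit, and inspect any reported paths for symlinks that point at one of their own parent directories.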
Listing non symbolic link on Windows
There are some problems in your code:
- you need delayed expansion because you are setting (writing) and expanding (reading) the variable count within the same parenthesised block of code (namely the for /F %%a loop);
- in your for /F %%a loop you need to state the options "eol=| delims=" in order not to run into trouble with files whose names begin with ; (such would be ignored due to the default eol=; option) and those which have white-spaces in their names (you would receive only the portion before the first white-space because of the default delims, SPACE and TAB, and the default option tokens=1; see for /? for details about that);
- dir /B returns file names only, so %%a actually points to files in the current directory rather than to C:\TEMP\; to fix that, simply change to that directory first by cd;
- to capture the output of a command (line) and assign it to a variable, use another for /F loop and set; this loop is going to iterate once only, because find /C returns only a single line; note the escaped pipe ^| below, which is required to not execute it immediately;
- there is no comparison operator -EQU, you need to remove the - to check for equality;
- it is a good idea to use the quoted set syntax as it is most robust against poisonous characters;
- file and directory paths should generally be quoted since they might contain token delimiters or other poisonous characters;
Here is the fixed script:
@echo off
setlocal EnableDelayedExpansion
pushd "C:\TEMP\" || exit /B 1
for /F "eol=| delims=" %%a in ('dir /B "."') do (
    for /F %%b in ('
        fsutil hardlink list "%%a" ^| find /C /V ""
    ') do (
        set "count=%%b"
    )
    if !count! EQU 1 del "%%a"
)
popd
endlocal
This can even be simplified:
@echo off
pushd "C:\TEMP\" || exit /B 1
for /F "eol=| delims=" %%a in ('dir /B "."') do (
    for /F %%b in ('
        fsutil hardlink list "%%a" ^| find /C /V ""
    ') do (
        if %%b EQU 1 del "%%a"
    )
)
popd
Since the inner for /F loop always iterates exactly once, we can move the if query inside, thus avoiding the definition of an auxiliary variable, which is the only one we needed delayed expansion for.
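For comparison, the same check (does a file have exactly one hard link?) can be sketched in Python using the link count reported by the filesystem. This is not a translation of the batch script: the helper name is hypothetical, it only lists the names instead of deleting anything, and it assumes os.lstat() reports a usable st_nlink on your platform.

```python
import os
import stat


def files_with_single_link(directory):
    """Return sorted names of regular files in directory with one hard link."""
    names = []
    for name in sorted(os.listdir(directory)):
        info = os.lstat(os.path.join(directory, name))
        # Skip directories, symlinks and other non-regular entries.
        if stat.S_ISREG(info.st_mode) and info.st_nlink == 1:
            names.append(name)
    return names


if __name__ == "__main__":
    for entry in files_with_single_link(os.getcwd()):
        print(entry)
```

Listing first and deleting in a separate step makes it easier to review the result before anything is removed.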
why are some cygwin symlinks not visible from a cmd.exe session
The links are not visible because of their file attribute: S = System. By DOS/Windows design, files with the System attribute are not listed by dir in CMD unless you ask for them explicitly.
From CMD (output translated from a German-locale system), we have:
$ cmd
Microsoft Windows [Version 10.0.19041.450]
(c) 2020 Microsoft Corporation. All rights reserved.
D:\cygwin64\bin>attrib zipinfo
S D:\cygwin64\bin\zipinfo
D:\cygwin64\bin>dir zipinfo
Volume in drive D is DATA
Volume Serial Number is D603-FB6E
Directory of D:\cygwin64\bin
File Not Found
D:\cygwin64\bin>dir /A:S zipinfo
Volume in drive D is DATA
Volume Serial Number is D603-FB6E
Directory of D:\cygwin64\bin
19.06.2018 22:17 16 zipinfo
1 File(s), 16 bytes
0 Dir(s), 542.542.495.744 bytes free
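The same attribute can be inspected programmatically. The sketch below is illustrative only: the helper names are hypothetical, 0x4 is the Win32 value of FILE_ATTRIBUTE_SYSTEM, and st_file_attributes is only present on Windows builds of Python, so the second helper returns None elsewhere.

```python
import os

# Win32 attribute bit for "System" files.
FILE_ATTRIBUTE_SYSTEM = 0x4


def has_system_attribute(attributes):
    """Return True if the attribute bitmask has the System bit set."""
    return bool(attributes & FILE_ATTRIBUTE_SYSTEM)


def read_file_attributes(path):
    """Return the Windows attribute bitmask of path, or None if unavailable."""
    # lstat() inspects the entry itself rather than following a link.
    return getattr(os.lstat(path), 'st_file_attributes', None)


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        attributes = read_file_attributes(path)
        if attributes is None:
            print('%s: attributes not available on this platform' % path)
        else:
            print('%s: system=%s' % (path, has_system_attribute(attributes)))
```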